> Low network utilization yes and perhaps superior to X11 core protocol, but it ...

_flux · on Sept 19, 2022

> You can have your cake and eat it too: You can disable the compression if it's a problem. It's highly configurable if you want to play with it.

Well you've of course lost the game at that point if your surface is very tiny, given you have no primitives to express "write text hello at x, y" e.g. for a terminal; without compression the bitmap presentation of a terminal—or a text document—is very large. Scrolling will damage the whole screen.

> Each surface has its own stream, and is updated independently. E.g., a video player on a webpage will generally be a subsurface, a context menu or plugin is a popup surface. They're all processed independently, with each their own damage tracking (and if applicable, video compression). If content is stretched or scaled up, only the original source buffer will be transmitted, allowing the display server to take care of this.

That's pretty nice and better than I assumed! Some hardware has limited number of hardware video encoders, though, for example in NVIDIA 1080GTX it's four for the _complete system_. I think for this reason waypipe reuses the contexts for different surfaces, resulting in lost video quality? Without reuse I expect one to run out of contexts quite fast—and then you spill to CPU.

Video sending can be a nice way to provide remote frames because it naturally batches all drawing operations into one compressed frame and really works if your task is to send 4k 60Hz video. But can it really beat display server -side composition or even things such as text rendering? As a tool in the toolkit it's a very nice thing to have, but it shouldn't be the complete toolkit. For X11 Someone(TM) could implement XPutVideo.

I think the bad performance of many X11 apps is not inherent to the protocol but inherent to the programs written in synchronous style (libX11 included; libxcb fixes this), assuming immediate access to the display server, instead of putting in requests while waiting for the answer for previous ones. Worst offender: Virtualbox. Access to a local surface and sending that surface as a whole can be a good solution to such code. And this kind of code is possibly so common because doing everything asynchronously can be tedious, and even more tedious in some enviroments such as C..

arghwhat · on Sept 20, 2022

The X11 draw APIs are not really used outside stuff written in Motif or hand-written X11 clients like st. Instead, modern X11 clients sidestep all this for performance reasons and render on their own and post buffers (e.g. GLX, cairo, vulkan, whatever they may like). This means copying bitmaps when forwarding, over a protocol that is not made for doing so efficiently.

Sure, if an application is just rendering nothing but text through X (and not using e.g. pango rendering to a cairo context), and you do not prefer using the resources on the machine running the application rather than the one displaying it, and you don't care about performance when using the application locally, then X11's might be more efficient. But for a purely text application, SSH puts X11 and Wayland to shame.

Re: encoder limits, Intel Quick Sync (built into Intel CPUs since 2011) documentation suggests the only limitation there with respect to parallel encoding is whether or not you can keep up with frame-rate requirements - an old example being 10 streams at full HD 30fps. I believe waypipe only focuses on video-memory buffers right now for video encoding right now as a simple heuristic, as CPU memory (shm buffers) are only used for "low performance" content on Wayland.

> I think the bad performance of many X11 apps is not inherent to the protocol but inherent to the programs written in synchronous style ... And this kind of code is possibly so common because doing everything asynchronously can be tedious, and even more tedious in some enviroments such as C.

I believe even libxcb has forced synchronous parts, which are annoying Wayland compositor developers a lot as Xwayland-supporting compositors need a bit of X11 WM code.

The Wayland protocol is asynchronous by nature, and the primary client and server library exposes this with no synchronous pretenses in idiomatic C. Every function you call only queues a request that will be sent when your event loop dispatches next time, and when you receive events your event loop will fire callbacks in bulk.

Nothing is synchronous, and updating your window requires no wait. In the simple case, the only message you'll get back at some point is one informing you that the previous buffer can now be reused (buffer release).