2017-07-10: vc5, raspbian performance

Last week I got permission to open source my work on a Mesa-based VC5 3D driver for BCM7268. You can see the announcement here which I won’t replicate on this blog. TWIVC4 is going to be a lot of TWIVC5 from here on out!

I spent the rest of the week working on fixing performance regressions when Raspbian switches from software rendering to to using the vc4 driver without a compositor enabled. The current concern is that window dragging gets slower, and in the worst case can end up with seconds of window dragging queued up behind the motion of the mouse cursor.

Past debugging of mine into how we end up with seconds of window movement queued was fruitless. I suspect it’s “each mouse position is streamed out to the window manager, and the window manager naively queues up a window move for each position it gets, rather than reading through all the position events it gets at once and sending a single move for the last one”. Instead, I worked on just seeing if we can speed up enough that we don’t care.

X11 opaque window dragging is a tough case, because unlike compositors, the contents of the window are stored on the screen (saving gobs of memory, which is important on the Raspberry Pi). When you drag the window, the src and dst regions usually overlap, so we have to be careful to not overwrite src pixels before they’ve been copied to the dst. In software rasterization, we just arrange the memcpy to happen in the correct order. For GL, we have no such control.

What glamor does instead is make a temporary copy of the src pixels, and then copy from the temp to the dst. This creates dependencies between the screen-to-temp and temp-to-screen jobs, so we flush the rendering job at least twice per copy of the window, not counting any flushes that happen for the rendering of the exposed contents from whatever was underneath the window’s old position.

In my tracing, I found that the jobs being generated during window dragging were saying that they could modify any tile on the screen, not just the tiles being affected by the copy (so we read and write each tile on the screen for those jobs). In many paths in glamor we use glScissor() to limit our rendering to some subset of the screen, and this lets the GL keep our jobs trimmed to the appropriate size. However, copies and rectangle fills were scissoring only to the destination drawable’s area, which for LXDE was everything but the global menu bar.

I made a small series that looks at small drawing operations and uses glScissor() to clip to their bounds to help tiled renderers like vc4. I was careful to try to limit the impact of these changes on non-vc4 – fast desktop renderers don’t want to spend the CPU to compute the bounds of the operations when they don’t use the bounds.

It hasn’t completely fixed RPi window dragging, but things are a lot smoother. We may find more paths that need this treatment as more people switch to using vc4 for X11 drawing.