2018-05-30: V4L2, GNOME performance hackfest, V3D userspace, VC4 fencing

Early this month, I started on the upstreaming the prerequisite patches for Dave Stevenson’s MMAL V4L2 codec driver. The Kodi developers have tested that their pipeline works on the KMS backend using that driver, so once we get this upstream it should mean full media decode acceleration support for Kodi and gstreamer-based applications. All the prep patches were quickly accepted by gregkh, and now the MMAL camera driver will automatically probe at boot time when enabled.

The next week, I took a trip out to Cambridge to meet up with GNOME developers to work on performance issues on their software stack with lower-spec devices.

My primary task I adopted during the hackfest was building up support for capturing DRM events into sysprof. For anyone who hasn’t used it, sysprof is a beautiful tool for doing systemwide sampling profiling. Its maintainer, Christian Hergert, has been building up some timeline infrastructure in it, so you can see CPU utilization over time and look at samples within a chosen time range. By including some useful perf tracepoints in our captures too, we can see when vblank is and when GPU jobs were running relative to it.

vc4 timeline

That’s the first cut of throwing the events from a Raspberry Pi running fullscreen glxgears into the UI – you’ve got the GPU’s binner on top, then the GPU’s renderer, and the vblank event at the bottom. These aren’t great yet, but for a couple of days of hacking it’s a good start. We should be able to do AMDGPU, V3D, and etnaviv timelines easily as well, but i915 unfortunately hides the necessary tracepoints under an off-by-default Kconfig option.

There exist other tools that capture GPU timelines already, like gpuvis. However, gpuvis seems to be composed of a bunch of shell scripts and commands on a wiki, and isn’t packaged on debian. Incorporating support for DRM events into sysprof means that anyone updating the package will get this code, complete with nice authentication for getting the privileges necessary to start profiling. While I was there, Mathias worked on having GTK provide tracepoints from some of its notable codepaths, and Jonas worked on having gnome-shell get tracepoints as well. Hopefully, all of these tracepoints glued together can give us some visualization of what went wrong across the system when we miss a vblank.

Ultimately, though, I’m skeptical of GNOME 3 ever being usable on a Raspberry Pi. The clutter-based gnome-shell painting is too slow (60% of a CPU burned in the shell just trying to present a single 60fps glxgears), and there doesn’t seem to be a plan for improving it other than “maybe we’ll delete clutter some day?” Also, the javascipt extension system being in the compositor thread means that you drop application frames when something else (network statechanges, notifications, etc) happens in the system. This was a bad software architecture choice, and digging out of that hole now would take a long time. (I’m agnostic on whether it was wrong to move those into the same process as the compositor, but same thread was definitely wrong). I’ll keep working on the debugging tools to try to enable anyone to work on these problems, though.

After the hackfest, I stopped by the Raspberry Pi offices to talk multimedia again. Dave Stevenson’s going to take on upstreaming the v4l2 codec driver, having seen how easy the camera upstreaming to staging was. I respun the SAND display patches to get them upstreamed, so we can start talking about SAND with V4L2 soon. I also did some review of the core DRM writeback connector patches, so that hopefully we can push Boris’s VC4 support and be able to use that from things like HWC2 to do scene flattening.

While at the hotel, I tried out Stefan Schake’s vc4 fencing patches, wrote a couple of little fixes, and pushed the lot to Mesa.

For V3D, I wrote a patch series for supporting a bunch of GLES3 bitfield conversion operations on HW with no explicit instructions for them. Unfortunately, as the first one with hardware needing them, there hasn’t been any review yet. I also implemented glSampleMask/glSampleCoverage, and fixed some NaN bugs. Finally, I pushed the patches enabling the Mesa driver by default now that the kernel side has been accepted upstream!

After that busy couple of weeks, I took a week off at Portland’s regional burn, and I’m back to work now.