2017-05-09: CEC, writeback, panel-bridge, pl111, DEQP, meson

I’ve been procrastinating on writing up status, so this update’s a big one.

Boris Brezillon has been hacking away at implementing the VC4 transposer module as a DRM writeback connector, and has things running in test apps. My immediate goal with his work is to enable IGT testing of VC4’s HVS (hardware compositor) features, so that we can implement plane tiling and formats with confidence. However, there’s also a long term goal to extend X11’s modesetting driver to do Present into DRM planes, and only collapse planes down to the primary framebuffer once an atomic check fails or the display goes idle. This is something that other compositors like Raspberry Pi’s DispmanX firmware, Android HWC2, or Weston can all do, so we should fix X11 to do it as well.

Hans Verkuil has started on a CEC implementation for vc4’s HDMI module. The hardware looks fairly simple – unlike many other HDMI CEC implementations, ours is right there in the HDMI module, so there are no complicated inter-kernel-module dependencies to implement. I’ve set him up with register definitions and answered some hardware questions, and I suspect he’ll have results soon.

On my end, I’m excited to announce that the ARM pl111 module DRM KMS driver for Broadcom’s Cygnus is now merged to drm-misc-next, along with V3D support. This has been a more or less pleasant process: I got useful review feedback from Linus Walleij (who’s being involved in the fbdev version of this driver), some cleanups on the V3D code from Florian, and I got the code landed in a reasonable amount of time. The drm-misc process may not be perfect, but it’s the best I’ve experienced in the Linux kernel yet.

Next steps on Cygnus: I need to land a reset controller driver and the associated DT (right now if you lock up the GPU, it never resets). On Raspberry Pi reset works by us turning the power domain off and on, which causes the firmware to go through the reset process, but on Cygnus we don’t have a separate power domain for V3D. I also need to finish the panel driver and submit it (I’m using a stub for now, and have a proper panel driver in progress). And I’m working on using gallium’s “renderonly” helper functions to allow you to bring up GBM with GL accelerated on vc4 on a pl111 DRM node. This should get X, glmark, and kmscube (and any other GBM applications), working on Cygnus with no userspace changes.

I’ve also been reworking Raspberry Pi’s DSI panel support. I got really discouraged on this after trying to submit to the DRM panel tree back in December. Based on that feedback, I’m moving most of the code to a DRM bridge driver, with just the panel timing in the DRM panel tree. However, writing drivers that talk to DRM panels sucks, so I’ve written the “panel-bridge” helper that cuts about 100 lines of boilerplate from most DRM drivers that talk to a panel, by wrapping the panel in a DRM bridge structure. I submitted an early version for review last week, which was met with some skepticism. I need to re-submit my updated version that converts more drivers – probably when the patch series removes 400 lines of code from DRM, it’ll get a warmer reception.

I’ve also been working on cleaning up our GLES2 conformance test results. I’ve landed 4 patches to the GLSL compiler, and one more Mesa patch is in the queue. The most interesting problem for conformance is that some of the GLES2 conformance tests fail in our register allocator: vc4 can’t spill registers, so we need to be really good at register allocation to make arbitrary programs render successfully.

My current goal for allocation is to implement Sethi-Ullman based instruction scheduling at the QIR stage to try to keep register pressure down. However, register pressure is weighted at a middle level in the QIR instruction scheduler’s heuristics, because otherwise later on we end up too constrained for the QPU instruction scheduler and introduce stalls. To fix the QPU scheduling, I wrote a patch series that gives the driver a chance to choose among available registers during register coloring. With that, the driver can prioritize the accumulators while rotating through the available registers. Initial performance results look excellent, so I should be able to bump register pressure up in priority next and do the Sethi-Ullman work. Interestingly, this callback could resolve a longstanding issue that Intel’s pre-Sandybridge driver had in register allocation as well!

In the X Server, I’ve landed the meson build system. I’m not quite dogfooding it yet, because I can successfully run my desktop with startx but not with gdm (it looks like something is going wrong in the systemd integration). I’ve received patches from a few other X developers fixing up the meson build system for themselves, so I think we’re on track toward general adoption. I’ve made some more progress on implementing Travis CI for the X Server now, and on integrating the current X Server unit tests into meson.