2017-06-26: vc4-xml, panel, raspbian vc4, memory debug

This week I picked up my old vc4-xml branch. This rework was inspired by the Intel driver, where they wrote an XML description of the hardware packets and use that to code-generate the packet packing and debug dumping code. Given that vc4’s debug dumping has always been somewhat of a mess (and its code is duplicated between mesa and vc4-gpu-tools), it would be great to do the same thing to vc4. More importantly, the XML-generated pack code easily lets you do things like precompute part of your packed state packet at gallium CSO generation time, and then just memcpy (or OR together two copies) at draw time.

My problem with the branch had been that it bloated the size of the vc4_emit.c code (the draw-time path), which probably meant that it reduced performance compared to my old hand-written packing. I had spent a couple of weeks writing fast paths for things like moving a float into the unaligned CL, or packing a couple of flag bits into the bottom of a 32-bit address, but that only took the bloat from like 20% to 10%. Last week, I decided to stop using the size as a proxy for performance and just test performance, and it turns out that the difference was negligible or slightly positive! Now I need to get an Android build done, and merge.

In the process of doing this draw overhead testing, I turned on a new gallium flag that cut the CPU overhead of draw calls by 5%, which was more than any of my vc4-xml overhead ever was!

I also spent more time on the 7” panel again, trying a rework of the load order in response to review feedback. It turns out that the DSI portion of DRM isn’t built to support drivers the way that previous feedback requested, and nobody has a concrete plan for how it would work. I’ve tried one avenue of fixing it, but that ran into another mess in the DSI subsystem.

Switching Raspbian over to vc4 fkms is currently stalled on Simon rebuilding the packages with the current Mesa patchset. I’ve fixed a minor issue in the fkms overlay that requested aligned CMA areas despite my having removed that requirement a while back, so hopefully they’ll finally enable vc4 on Pi0/1 as well once they get around to updating.

I did another round of review on the piglit series for ANDROID_native_fence and they’re ready to land now.

I sent out a rework to make some VC4 NIR lowering code shareable with other drivers. Freedreno and Intel have both wanted it at some point, so hopefully I can get some review on it.

In the kernel, I polished up my BO-labeling code that gives you detailed graphics memory usage information in /debug/dri/0/bo_stats. The cleanup was to effectively eliminate the CPU overhead, unless you choose do labeling from userspace. Adding this userspace interface required adding intel-gpu-tools testcases, so I wrote those. The Mesa side isn’t merged yet since we need kernel review first, and will probably only be enabled in debug driver builds. To make it really fancy, I should also hook up glObjectLabel() all the way to the kernel, so that /debug/dri/0/bo_stats can have things like “X11 ARGB glyph cache” instead of “resource 1024x1024@4” for that mystery 4MB buffer you’ve got.

Finally, I did some cleanup of the VC4 modesetting code, prompted by Boris’s recent cleanups. We’re much closer to matching the common DRM helpers now, with just our async pageflip code still being special. Once Gustavo’s async cursor bits land, we may be able to remove our async pageflip special case as well!