2018-02-12: LCA 2018, VC4 media, VC5 progress

I spent the end of January gearing up for LCA, where I gave a talk about what I’ve done in Broadcom graphics since my last LCA talk 3 years earlier. Video is here.

(Unfortunately, I failed to notice the time countdown, so I didn’t make it to my fun VC5 demo, which had to be done in the hallway afterward.)

I then spent the first week of February in Cambridge at the Raspberry Pi office working on vc4. The goal was to come up with a plan for switching to at least the “fkms” mode with no regressions, with a route to full KMS by default.

The first step was just fixing regressions for fkms in 4.14. The amusing one was mouse lag: we had accidentally started syncing mouse updates to vblank, and an old patch that reduced HID device polling to ~60fps had been dropped in the 4.14 rebase. I think we should be at parity-or-better compared to 4.9 now.

For full KMS, the biggest thing we need to fix is getting media decode / camera capture feeding into both VC4 GL and VC4 KMS. I wrote some magic shader code to turn linear Y/U/V or Y/UV planes into tiled textures on the GPU, so that they can be sampled using GL_OES_EGL_image_external. The kmscube demo works, and working with Dave Stevenson I mostly got a demo working of H.264 decode of Big Buck Bunny into a GL texture under X11.
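
For reference, here’s a minimal sketch of the standard EGL_EXT_image_dma_buf_import path that this kind of demo builds on, assuming a linear NV12 (Y/UV) dma-buf coming from the decoder; the function name, fd, and stride parameters are placeholders, and the linear-to-tiled conversion itself happens in the driver’s shaders, not in the application:

```c
/* Minimal sketch, not the actual vc4/Mesa code: import a linear NV12
 * (Y/UV) dma-buf as an EGLImage and bind it as an external texture so a
 * shader can sample it with samplerExternalOES.  dpy, dmabuf_fd, and the
 * sizes/strides are placeholders for whatever the decoder hands back. */
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <GLES2/gl2.h>
#include <GLES2/gl2ext.h>
#include <drm_fourcc.h>

static GLuint import_nv12_as_external_texture(EGLDisplay dpy, int dmabuf_fd,
                                              EGLint width, EGLint height,
                                              EGLint y_pitch, EGLint uv_offset,
                                              EGLint uv_pitch)
{
        PFNEGLCREATEIMAGEKHRPROC create_image =
                (PFNEGLCREATEIMAGEKHRPROC)eglGetProcAddress("eglCreateImageKHR");
        PFNGLEGLIMAGETARGETTEXTURE2DOESPROC image_target_texture =
                (PFNGLEGLIMAGETARGETTEXTURE2DOESPROC)
                eglGetProcAddress("glEGLImageTargetTexture2DOES");

        const EGLint attrs[] = {
                EGL_WIDTH, width,
                EGL_HEIGHT, height,
                EGL_LINUX_DRM_FOURCC_EXT, DRM_FORMAT_NV12,
                EGL_DMA_BUF_PLANE0_FD_EXT, dmabuf_fd,
                EGL_DMA_BUF_PLANE0_OFFSET_EXT, 0,
                EGL_DMA_BUF_PLANE0_PITCH_EXT, y_pitch,
                EGL_DMA_BUF_PLANE1_FD_EXT, dmabuf_fd,
                EGL_DMA_BUF_PLANE1_OFFSET_EXT, uv_offset,
                EGL_DMA_BUF_PLANE1_PITCH_EXT, uv_pitch,
                EGL_NONE,
        };
        EGLImageKHR image = create_image(dpy, EGL_NO_CONTEXT,
                                         EGL_LINUX_DMA_BUF_EXT, NULL, attrs);

        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_EXTERNAL_OES, tex);
        /* The linear->tiled and YUV->RGB work happens in the driver behind
         * this binding; the application just samples the texture. */
        image_target_texture(GL_TEXTURE_EXTERNAL_OES, image);
        return tex;
}
```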

While I was there, Dave kept hammering away at the dma-buf sharing work he’s been doing. Our demo worked by having a vc4 fd create the dma-bufs and importing them into vcsm (which is what we talk MMAL to) and into the vc4 fd used by Mesa (MMAL needs the buffers to meet its own size restrictions, so VC4 GL can’t do the allocations for it). The extra vc4 fd is a bit silly – we should be able to take vcsm buffers and export them to vc4.
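
For illustration, the generic DRM PRIME plumbing underneath that sharing looks roughly like the sketch below; the vcsm/MMAL half of the demo isn’t shown, and the function and parameter names are placeholders:

```c
/* Hedged sketch of the generic PRIME sharing underneath the demo: export
 * a GEM buffer from the DRM fd that allocated it as a dma-buf, then
 * import that dma-buf into the DRM fd Mesa is rendering with.  The vcsm
 * import on the firmware side isn't shown. */
#include <stdint.h>
#include <xf86drm.h>

static int share_buffer(int allocator_drm_fd, uint32_t gem_handle,
                        int renderer_drm_fd, uint32_t *renderer_handle)
{
        int dmabuf_fd;
        int ret;

        /* Export the allocation as a dma-buf file descriptor. */
        ret = drmPrimeHandleToFD(allocator_drm_fd, gem_handle,
                                 DRM_CLOEXEC, &dmabuf_fd);
        if (ret)
                return ret;

        /* Import the dma-buf into the other vc4 fd as a new GEM handle. */
        return drmPrimeFDToHandle(renderer_drm_fd, dmabuf_fd, renderer_handle);
}
```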

Also, if VCSM could do CMA allocations for us, then we could potentially have VCSM take over the role of allocating heap for the firmware, meaning that you wouldn’t need big permanent gpu_mem= memory carveouts in order for camera and video to work.

Finally, on the last day Dave got a bit distracted and put together VC4 HVS support for the SAND tiling modifier. He showed me a demo of BBB H.264 decode directly to KMS on the console, and sent me the patch. I’ll do a little bit of polish, and send it out once I get back from vacation.
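
For a sense of what that enables, here’s a sketch of handing a SAND-tiled NV12 frame straight to KMS as a framebuffer; the SAND128 modifier macro and its column-height encoding are assumptions about how the modifier gets exposed in drm_fourcc.h, not necessarily what Dave’s patch does:

```c
/* Sketch only: register a SAND-tiled NV12 buffer as a KMS framebuffer so
 * the HVS can scan it out without a copy.  The modifier macro name and
 * column-height encoding are assumptions; handles, pitches, and offsets
 * come from wherever the decoded frame was allocated. */
#include <stdint.h>
#include <xf86drm.h>
#include <xf86drmMode.h>
#include <drm_fourcc.h>

static int add_sand_nv12_fb(int drm_fd, uint32_t width, uint32_t height,
                            uint32_t gem_handle, uint32_t y_pitch,
                            uint32_t uv_offset, uint32_t col_height,
                            uint32_t *fb_id)
{
        uint64_t modifier = DRM_FORMAT_MOD_BROADCOM_SAND128_COL_HEIGHT(col_height);
        uint32_t handles[4] = { gem_handle, gem_handle };
        uint32_t pitches[4] = { y_pitch, y_pitch };
        uint32_t offsets[4] = { 0, uv_offset };
        uint64_t modifiers[4] = { modifier, modifier };

        /* DRM_MODE_FB_MODIFIERS tells the kernel the modifier array is
         * valid, so the driver knows the planes are SAND-laid-out. */
        return drmModeAddFB2WithModifiers(drm_fd, width, height,
                                          DRM_FORMAT_NV12,
                                          handles, pitches, offsets, modifiers,
                                          fb_id, DRM_MODE_FB_MODIFIERS);
}
```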

We also talked about plans for future work. I need to:

  • Polish and merge the YUV texturing support.
  • Extend the YUV texturing support to import SAND-layout buffers with no extra copies (I suspect this will be higher performance media decode into GL than the closed driver stack offered).
  • Make a (downstream-only) QPU user job submit interface so that John Cox’s HEVC decoder can cooperate with the VC4 driver to do deinterlace. (I have a long-term idea of us shipping the deinterlace code as a “firmware” blob from the Linux kernel’s perspective and using that blessed blob to securely do deinterlace in the upstream kernel.)
  • Make an interface for the firmware to request a QPU user job submission from VC4, so that the firmware’s fancy camera AWB algorithm can work in the presence of the VC4 driver (right now I believe we fall back to a simpler algorithm on the VPU).
  • Investigate reports of slow PutImage-style uploads from SDL/emulators/etc.

Dave plans to:

  • Finish the VCSM rewrite to export dma-bufs and not need gpu_mem= any more.
  • Make a dma-buf enabled V4L2 mem2mem driver for H.264 decode, JPEG decode, etc. using MMAL and VCSM.

Someone needs to:

  • Use the writeback connector in X to implement rotation (which should be cheaper than using GL to do so).
  • Backdoor the dispmanx library in Raspbian to talk KMS instead when the full vc4 KMS driver is loaded (at least on the console. Maybe with some simple support for X11?).

Finally, other little updates:

  • Ported Mesa to V3D 4.2.
  • Fixed some GLES3 conformance bugs for VC5.
  • Fixed 3D textures for VC5.
  • Worked with Boris on debugging HDMI failures in KMS, and reviewed his patches. Finally the flip_done timeouts should be gone!