2018-09-07: baby, VC4 texture uploads, conformance

TWIV3D has been inactive for a bit, because I’ve been inactive for a bit. On August 15th, we added Moss Anholt to our family:

moss

So most of my time has been spent laying around with them and keeping us fed instead of writing software.

Here and there I’ve managed to write some patches. I’ve been working on meeting the current GLES3 CTS’s requirement for a 565 window system buffer being available. We don’t expose those on X11 today, because why would you? Your screen is 8888, so all you could do with 565 is render to a pixmap or buffer. Rendering to a pixmap is useless, becasue EGL doesn’t tell you whether R is in the top or bottom bits, so the X11 client doesn’t know how to interpret the rendered contents other than glReadPixels() (at which point, just use an FBO). My solution for this bad requirement from the CTS has been to add support for 565 pbuffers, which is pointless but is the minimum. This led to a patch to the xserver and a fix to V3D for blending against RTs without alpha channels.

I’m particularly proud of a small series to create a default case handler function for gallium pipe caps. Previously, every addition of a PIPE_CAP (which happens probably at least 10 times a year) has to touch every single driver, or that driver will start failing. It’s easy to forget a driver, particularly when rebasing a series across the addition of a new driver. And for driver authors, you’re faced with this list of 100 values which you don’t know the correct state of when starting to write a driver. The new helper lets CAP creators set a default value (usually whatever the hardcoded value was before their new CAP), and lets driver authors specify a much smaller set of required caps to get started.

Just before Moss arrived, I had landed some patches to VC4 to improve texture upload/download performance. Non-GL SDL apps on glamor were really slow, because the window wouldn’t be aligned to tiles and we’d download the tile-aligned area from the framebuffer to a raster-order temporary, store our new contents into it, then reupload that to the screen. The download from write-combined memory is brutally slow. My new code (mostly a rebase of patches I wrote in January 2017!) got the following results:

  • Improves x11perf -putimage10 performance by 139.777% +/- 2.83464% (n=5)
  • Improves x11perf -putimage100 performance by 383.908% +/- 22.6297% (n=11)
  • Improves x11perf -getimage10 performance by 2.75731% +/- 0.585054% (n=145)

I also wrote a followup patch to avoid the copy from the user’s buffer into the temporary in the driver for texture uploads, for another 12.1586% +/- 1.38155% (n=145)

Other things since my last post:

  • Fixed VCM cache size setup and the V3D driver’s idea of the VPM size.
  • Fixed a leak of X11 pixmaps backing pbuffers on DRI3.
  • Fixed vc4_fence_server_sync() on pre-syncobj kernels.
  • Fixed vc4 uniform offsets after a sampler in a structure.
  • Improved vc4 uniform upload performance.
  • Fixed vc4 MSAA tile loads when both Z and color are needed.
  • Fixed vc4 rendering to cubemap EGL images.
  • Fixed vc4 leak of the no-vertex-elements workaround BO.
  • Fixed some v3d register spilling bugs.