2017-12-11: VC5 GL, Vulkan, VC4 perf

It’s been a while since I posted a TWIV update, so this one will be big:

For VC5 GL features:

  • Implemented shader storage buffer objects
  • Fixed 1D texture mipmapping
  • Worked on 3D texture mipmapping again.
  • Reworked the Z32F_S8 support based on feedback from Rob Clark, and then rebuilt on his alternative Z32F_S8 patch.
  • Fixed GL_CLAMP with texture offsets in NIR lowering.
  • Added support for textureGrad().
  • Fixed textureLod() GL_BASE_LEVEL handling.
  • Fixed sampling from array textures.
  • Fixed pausing of transform feedback primitive queries for meta-ops.
  • Fixed incorrect padding in transform feedback output.
  • Fixed ARB_framebuffer_object mismatching RB size handling.

While running DEQP tests on all this (which unfortunately don’t complete yet due to running out of memory on my 7268 without swap), I’ve also rebased my Vulkan series and started on implementing image layout for it.

I also tested Timothy Arceri’s gallium NIR linking pass. The goal of that is to pack and dead-code eliminate varyings up in shared code. It’s a net ~0 effect on vc4 currently, but it will help vc5, and I may be able to dead-code eliminate some of the vc4 compiler backend now that the IR coming in to the driver is cleaner.

On the VC4 front, Boris has posted a series for performance counter support. This was a pretty big piece of work, and our hope is that with the addition of performance counters we’ll be able to dig into those workloads where vc4 is slower than the closed driver and actually fix them. Unfortunately he hasn’t managed to build frameretrace yet, so we haven’t really tested it on its final intended workload.

For VC4 GL, I did a bit of work on minetest performance, improving the game’s fps from around 15 to around 17. Its desktop GL renderer is really unfortunate, using a lot of immediate-mode GL, but I was completely unable to get its GLES renderer branch to build. It also lacks a reproducable/scriptable benchmark mode, so most of my testing was against an apitrace, which is very hard to get useful performance data from.

I debugged a crash in vc4 with large vertex counts that a user had reported, landed a fix for a kernel memory leak, and landed Dave Stevenson’s HVS format support (part of his work on getting video decode into vc4 GL).

Finally, I did a bit of research and work to help unblock Dave Stevenson’s unicam driver (the open source camera driver). Now that we have an ack for the DT binding, we should be able to get it merged for 4.16!