This week I worked on stabilizing the VC5 DRM. I got the MMU support almost working – rendering triangles, but eventually crashing the kernel, and then decided to simplify my plan a bit. First, a bit of background on DRM memory management:
One of the things I’m excited about for VC5 is having each process have a separate address space. It has always felt wrong to me on older desktop chips that we would let any DRI client read/write the contents of other DRI clients’ memory. We just didn’t have any hardware support to help us protect them, without effectively copying out and zeroing the other clients’ on-GPU memory. With i915 we gained page tables that we could swap out, at least, but we didn’t improve the DRM to do this for a long time (I’m actually not entirely sure if we even do so now). One of our concerns was that it could be a cost increase when switching between clients.
However, one of the benefits of having each process have a separate address space is that then we can give the client addresses for its buffer that it can always rely on. Instead of each reference to a buffer needing to be tracked so that kernel (or userspace, with new i915 ABI) can update them when the buffer address changes, we just keep track of the list of all buffers that have been referenced and need to be at their given offsets. This should be a win for CPU overhead.
What I built at first was each VC5 DRM fd having its own page table and address space that it would bind buffers into. The problem was that the page tables are expensive (up to 4MB of contiguous memory), and so I’d need to keep the page tables separate from the address spaces so that I could allocate them on demand.
After a bit of hacking on that plan, I decided to simplify things for now. Given that today’s boards have less than 4 GB of memory, I could have all the clients share the same page tables for a 4GB address space and use the GMP (an 8kb bitmask for 128kb-granularity access control) to mask each client’s access to the address space. With a shared page table and the GMP disabled I soon got this stable enough to do basic piglit texturing tests reliably. The GMP plan should also reduce our context switch overhead, by replacing the small 8kb GMP memory instead of flushing all the MMU caches.
Somewhere down the line, if we find we need 4GB per client, we can build kernel support to have clients that exhaust the shared table get kicked out to their own page tables.
Next up for the kernel module: GPU reset (so I can start running piglit tests and settling my V3D DT binding), and filling out those GMP bitmasks.
The last couple of days of the week were spent on forking the anv driver to start building a VC5 Vulkan driver (“bcmv”). I’ll talk about it soon (next week is my bike trip, so no TWIV then, and after that is XDC 2017).