[RFC PATCH 00/10] drm/panthor: Add user submission

Mihail Atanassov posted 10 patches 1 year, 3 months ago
Only 8 patches received!
drivers/gpu/drm/panthor/Makefile          |   4 +-
drivers/gpu/drm/panthor/panthor_device.c  |  66 ++-
drivers/gpu/drm/panthor/panthor_device.h  |  35 +-
drivers/gpu/drm/panthor/panthor_drv.c     | 233 +++++++-
drivers/gpu/drm/panthor/panthor_fw.c      |   2 +-
drivers/gpu/drm/panthor/panthor_sched.c   | 408 +++++++++-----
drivers/gpu/drm/panthor/panthor_sched.h   |   8 +-
drivers/gpu/drm/panthor/panthor_syncobj.c | 167 ++++++
drivers/gpu/drm/panthor/panthor_syncobj.h |  27 +
drivers/gpu/drm/panthor/panthor_xgs.c     | 638 ++++++++++++++++++++++
drivers/gpu/drm/panthor/panthor_xgs.h     |  42 ++
include/uapi/drm/panthor_drm.h            | 243 +++++++-
12 files changed, 1696 insertions(+), 177 deletions(-)
create mode 100644 drivers/gpu/drm/panthor/panthor_syncobj.c
create mode 100644 drivers/gpu/drm/panthor/panthor_syncobj.h
create mode 100644 drivers/gpu/drm/panthor/panthor_xgs.c
create mode 100644 drivers/gpu/drm/panthor/panthor_xgs.h
[RFC PATCH 00/10] drm/panthor: Add user submission
Posted by Mihail Atanassov 1 year, 3 months ago
Hello all,

This series implements a mechanism to expose Mali CSF GPUs' queue
ringbuffers directly to userspace, along with paraphernalia to allow
userspace to control job synchronisation between the CPU and GPU.

The goal of these changes is to allow userspace to control work
submission to the FW/HW directly without kernel intervention in the
common case, thereby reducing context switching overhead. It also allows
for greater flexibility in the way work is enqueued in the ringbufs.
For example, the current kernel submit path only supports indirect
calls, which is inefficient for small command buffers. Userspace can
also skip unnecessary sync operations.

This is still a work-in-progress, there's an outstanding issue with
multiple processes using different submission flows triggering
scheduling bugs (e.g. the same group getting scheduled twice), but we'd
love to gather some feedback on the suitability of the approach in
general and see if there's a clear path to merging something like this
eventually.

I've also CCd AMD maintainers because they have in the past done
something similar[1], in case they want to chime in.

There are two uses of this new uAPI in Mesa, one in gallium/panfrost
(link TBD), and one in panvk [2].

The Gallium implementation is a naïve change just to switch the
submission model and exercise the new kernel code, and we don't plan
on pursuing this at this time.

The panvk driver changes are, however, a better representation of the
intent behind this new uAPI, so please consider that as the reference
userspace. It is still very much also a work in progress.

 * patch 1 adds all the uAPI changes;
 * patch 2 implements the GROUP_CREATE ioctl changes necessary to expose
   the required objects to userspace;
 * patch 3 maps the doorbell pages, similarly to how the user I/O page is
   mapped;
 * patch 4 implements GROUP_KICK, which lets userspace request an
   inactive group to be scheduled on the GPU;
 * patches 5 & 6 implement XGS queues, a way for userspace to
   synchronise GPU queue progress with DRM syncobjs;
 * patches 7 & 8 add notification mechanisms for user & kernel to signal
   changes to native GPU syncobjs.

[1] https://lore.kernel.org/amd-gfx/CADnq5_N61q_o+5WYUZsZ=qu7VmeXTFHQSxLwTco05gLzHaiswA@mail.gmail.com/t/#m116a36a598d8fad1329e053974ad37a4dc0f28ed
[2] https://gitlab.freedesktop.org/larsivsi/mesa/-/commits/panvk-v10-usersubmit?ref_type=heads

Ketil Johnsen (7):
  drm/panthor: Add uAPI to submit from user space
  drm/panthor: Extend GROUP_CREATE for user submission
  drm/panthor: Map doorbell pages
  drm/panthor: Add GROUP_KICK ioctl
  drm/panthor: Factor out syncobj handling
  drm/panthor: Implement XGS queues
  drm/panthor: Add SYNC_UPDATE ioctl

Mihail Atanassov (1):
  drm/panthor: Add sync_update eventfd handling

 drivers/gpu/drm/panthor/Makefile          |   4 +-
 drivers/gpu/drm/panthor/panthor_device.c  |  66 ++-
 drivers/gpu/drm/panthor/panthor_device.h  |  35 +-
 drivers/gpu/drm/panthor/panthor_drv.c     | 233 +++++++-
 drivers/gpu/drm/panthor/panthor_fw.c      |   2 +-
 drivers/gpu/drm/panthor/panthor_sched.c   | 408 +++++++++-----
 drivers/gpu/drm/panthor/panthor_sched.h   |   8 +-
 drivers/gpu/drm/panthor/panthor_syncobj.c | 167 ++++++
 drivers/gpu/drm/panthor/panthor_syncobj.h |  27 +
 drivers/gpu/drm/panthor/panthor_xgs.c     | 638 ++++++++++++++++++++++
 drivers/gpu/drm/panthor/panthor_xgs.h     |  42 ++
 include/uapi/drm/panthor_drm.h            | 243 +++++++-
 12 files changed, 1696 insertions(+), 177 deletions(-)
 create mode 100644 drivers/gpu/drm/panthor/panthor_syncobj.c
 create mode 100644 drivers/gpu/drm/panthor/panthor_syncobj.h
 create mode 100644 drivers/gpu/drm/panthor/panthor_xgs.c
 create mode 100644 drivers/gpu/drm/panthor/panthor_xgs.h

-- 
2.45.0

Re: [RFC PATCH 00/10] drm/panthor: Add user submission
Posted by Ketil Johnsen 1 year, 3 months ago
On 28.08.2024 19:25, Mihail Atanassov wrote:
> Hello all,
> 
> This series implements a mechanism to expose Mali CSF GPUs' queue
> ringbuffers directly to userspace, along with paraphernalia to allow
> userspace to control job synchronisation between the CPU and GPU.
> 
> The goal of these changes is to allow userspace to control work
> submission to the FW/HW directly without kernel intervention in the
> common case, thereby reducing context switching overhead. It also allows
> for greater flexibility in the way work is enqueued in the ringbufs.
> For example, the current kernel submit path only supports indirect
> calls, which is inefficient for small command buffers. Userspace can
> also skip unnecessary sync operations.
> 
> This is still a work-in-progress, there's an outstanding issue with
> multiple processes using different submission flows triggering
> scheduling bugs (e.g. the same group getting scheduled twice), but we'd
> love to gather some feedback on the suitability of the approach in
> general and see if there's a clear path to merging something like this
> eventually.
> 
> I've also CCd AMD maintainers because they have in the past done
> something similar[1], in case they want to chime in.
> 
> There are two uses of this new uAPI in Mesa, one in gallium/panfrost
> (link TBD), and one in panvk [2].

Gallium/Panfrost changes to make use of this new user submission API can 
now be found here:
https://gitlab.freedesktop.org/ketil.johnsen/mesa/-/commits/panthor_usersubmit/?ref_type=heads

It is worth repeating, this is just a dumb switch from kernel submission 
to user submission for the Panfrost Gallium driver, no optimizations 
attempted. We have no concrete plans to pursue upstreaming of this for 
the time being. Panvk is our focus in that regard.

--
Regards,
Ketil Johnsen

> The Gallium implementation is a naïve change just to switch the
> submission model and exercise the new kernel code, and we don't plan
> on pursuing this at this time.
> 
> The panvk driver changes are, however, a better representation of the
> intent behind this new uAPI, so please consider that as the reference
> userspace. It is still very much also a work in progress.
> 
>   * patch 1 adds all the uAPI changes;
>   * patch 2 implements the GROUP_CREATE ioctl changes necessary to expose
>     the required objects to userspace;
>   * patch 3 maps the doorbell pages, similarly to how the user I/O page is
>     mapped;
>   * patch 4 implements GROUP_KICK, which lets userspace request an
>     inactive group to be scheduled on the GPU;
>   * patches 5 & 6 implement XGS queues, a way for userspace to
>     synchronise GPU queue progress with DRM syncobjs;
>   * patches 7 & 8 add notification mechanisms for user & kernel to signal
>     changes to native GPU syncobjs.
> 
> [1] https://lore.kernel.org/amd-gfx/CADnq5_N61q_o+5WYUZsZ=qu7VmeXTFHQSxLwTco05gLzHaiswA@mail.gmail.com/t/#m116a36a598d8fad1329e053974ad37a4dc0f28ed
> [2] https://gitlab.freedesktop.org/larsivsi/mesa/-/commits/panvk-v10-usersubmit?ref_type=heads
> 
> Ketil Johnsen (7):
>    drm/panthor: Add uAPI to submit from user space
>    drm/panthor: Extend GROUP_CREATE for user submission
>    drm/panthor: Map doorbell pages
>    drm/panthor: Add GROUP_KICK ioctl
>    drm/panthor: Factor out syncobj handling
>    drm/panthor: Implement XGS queues
>    drm/panthor: Add SYNC_UPDATE ioctl
> 
> Mihail Atanassov (1):
>    drm/panthor: Add sync_update eventfd handling
> 
>   drivers/gpu/drm/panthor/Makefile          |   4 +-
>   drivers/gpu/drm/panthor/panthor_device.c  |  66 ++-
>   drivers/gpu/drm/panthor/panthor_device.h  |  35 +-
>   drivers/gpu/drm/panthor/panthor_drv.c     | 233 +++++++-
>   drivers/gpu/drm/panthor/panthor_fw.c      |   2 +-
>   drivers/gpu/drm/panthor/panthor_sched.c   | 408 +++++++++-----
>   drivers/gpu/drm/panthor/panthor_sched.h   |   8 +-
>   drivers/gpu/drm/panthor/panthor_syncobj.c | 167 ++++++
>   drivers/gpu/drm/panthor/panthor_syncobj.h |  27 +
>   drivers/gpu/drm/panthor/panthor_xgs.c     | 638 ++++++++++++++++++++++
>   drivers/gpu/drm/panthor/panthor_xgs.h     |  42 ++
>   include/uapi/drm/panthor_drm.h            | 243 +++++++-
>   12 files changed, 1696 insertions(+), 177 deletions(-)
>   create mode 100644 drivers/gpu/drm/panthor/panthor_syncobj.c
>   create mode 100644 drivers/gpu/drm/panthor/panthor_syncobj.h
>   create mode 100644 drivers/gpu/drm/panthor/panthor_xgs.c
>   create mode 100644 drivers/gpu/drm/panthor/panthor_xgs.h
> 

Re: [RFC PATCH 00/10] drm/panthor: Add user submission
Posted by Christian König 1 year, 3 months ago
Am 28.08.24 um 19:25 schrieb Mihail Atanassov:
> Hello all,
>
> This series implements a mechanism to expose Mali CSF GPUs' queue
> ringbuffers directly to userspace, along with paraphernalia to allow
> userspace to control job synchronisation between the CPU and GPU.
>
> The goal of these changes is to allow userspace to control work
> submission to the FW/HW directly without kernel intervention in the
> common case, thereby reducing context switching overhead. It also allows
> for greater flexibility in the way work is enqueued in the ringbufs.
> For example, the current kernel submit path only supports indirect
> calls, which is inefficient for small command buffers. Userspace can
> also skip unnecessary sync operations.

Question is how do you guarantee forward progress for fence signaling?

E.g. when are fences created and published? How do they signal?

How are dependencies handled? How can the kernel suspend an userspace queue?

How does memory management work in this case?

Regards,
Christian.

>
> This is still a work-in-progress, there's an outstanding issue with
> multiple processes using different submission flows triggering
> scheduling bugs (e.g. the same group getting scheduled twice), but we'd
> love to gather some feedback on the suitability of the approach in
> general and see if there's a clear path to merging something like this
> eventually.
>
> I've also CCd AMD maintainers because they have in the past done
> something similar[1], in case they want to chime in.
>
> There are two uses of this new uAPI in Mesa, one in gallium/panfrost
> (link TBD), and one in panvk [2].
>
> The Gallium implementation is a naïve change just to switch the
> submission model and exercise the new kernel code, and we don't plan
> on pursuing this at this time.
>
> The panvk driver changes are, however, a better representation of the
> intent behind this new uAPI, so please consider that as the reference
> userspace. It is still very much also a work in progress.
>
>   * patch 1 adds all the uAPI changes;
>   * patch 2 implements the GROUP_CREATE ioctl changes necessary to expose
>     the required objects to userspace;
>   * patch 3 maps the doorbell pages, similarly to how the user I/O page is
>     mapped;
>   * patch 4 implements GROUP_KICK, which lets userspace request an
>     inactive group to be scheduled on the GPU;
>   * patches 5 & 6 implement XGS queues, a way for userspace to
>     synchronise GPU queue progress with DRM syncobjs;
>   * patches 7 & 8 add notification mechanisms for user & kernel to signal
>     changes to native GPU syncobjs.
>
> [1] https://lore.kernel.org/amd-gfx/CADnq5_N61q_o+5WYUZsZ=qu7VmeXTFHQSxLwTco05gLzHaiswA@mail.gmail.com/t/#m116a36a598d8fad1329e053974ad37a4dc0f28ed
> [2] https://gitlab.freedesktop.org/larsivsi/mesa/-/commits/panvk-v10-usersubmit?ref_type=heads
>
> Ketil Johnsen (7):
>    drm/panthor: Add uAPI to submit from user space
>    drm/panthor: Extend GROUP_CREATE for user submission
>    drm/panthor: Map doorbell pages
>    drm/panthor: Add GROUP_KICK ioctl
>    drm/panthor: Factor out syncobj handling
>    drm/panthor: Implement XGS queues
>    drm/panthor: Add SYNC_UPDATE ioctl
>
> Mihail Atanassov (1):
>    drm/panthor: Add sync_update eventfd handling
>
>   drivers/gpu/drm/panthor/Makefile          |   4 +-
>   drivers/gpu/drm/panthor/panthor_device.c  |  66 ++-
>   drivers/gpu/drm/panthor/panthor_device.h  |  35 +-
>   drivers/gpu/drm/panthor/panthor_drv.c     | 233 +++++++-
>   drivers/gpu/drm/panthor/panthor_fw.c      |   2 +-
>   drivers/gpu/drm/panthor/panthor_sched.c   | 408 +++++++++-----
>   drivers/gpu/drm/panthor/panthor_sched.h   |   8 +-
>   drivers/gpu/drm/panthor/panthor_syncobj.c | 167 ++++++
>   drivers/gpu/drm/panthor/panthor_syncobj.h |  27 +
>   drivers/gpu/drm/panthor/panthor_xgs.c     | 638 ++++++++++++++++++++++
>   drivers/gpu/drm/panthor/panthor_xgs.h     |  42 ++
>   include/uapi/drm/panthor_drm.h            | 243 +++++++-
>   12 files changed, 1696 insertions(+), 177 deletions(-)
>   create mode 100644 drivers/gpu/drm/panthor/panthor_syncobj.c
>   create mode 100644 drivers/gpu/drm/panthor/panthor_syncobj.h
>   create mode 100644 drivers/gpu/drm/panthor/panthor_xgs.c
>   create mode 100644 drivers/gpu/drm/panthor/panthor_xgs.h
>

Re: [RFC PATCH 00/10] drm/panthor: Add user submission
Posted by Ketil Johnsen 1 year, 3 months ago
On 29/08/2024 11:40, Christian König wrote:
> Am 28.08.24 um 19:25 schrieb Mihail Atanassov:
>> Hello all,
>>
>> This series implements a mechanism to expose Mali CSF GPUs' queue
>> ringbuffers directly to userspace, along with paraphernalia to allow
>> userspace to control job synchronisation between the CPU and GPU.
>>
>> The goal of these changes is to allow userspace to control work
>> submission to the FW/HW directly without kernel intervention in the
>> common case, thereby reducing context switching overhead. It also allows
>> for greater flexibility in the way work is enqueued in the ringbufs.
>> For example, the current kernel submit path only supports indirect
>> calls, which is inefficient for small command buffers. Userspace can
>> also skip unnecessary sync operations.
> 
> Question is how do you guarantee forward progress for fence signaling?
> 
> E.g. when are fences created and published? How do they signal?

Current XGS queue is built upon an instance of the DRM scheduler, and 
XGS jobs which can not complete immediately are assigned a fence (as one 
would expect). This proposal rely on the DRM scheduler timeout to ensure 
forward progress if user space have encoded a "bad" stream of commands.

PS: We have tried to consider error propagation in case of timeouts, but 
the implementation in this area is most likely somewhat missing ATM (not 
tested).

> How are dependencies handled? How can the kernel suspend an userspace 
> queue?

Mali FW will send IDLE interrupts when a group has become idle. This is 
already (mostly) handled in Panthor today.
There is of course the race to handle then, between GPU IDLE and user 
space submitting new work.

I'm actually working on this part right now. As this patchset stands, it 
doesn't check in the RTPM suspend callback if user space have managed to 
submit more work in the timeframe between "IDLE" and RTPM suspend 
callback. I just need to correctly "unbind" the group/queues, unmap the 
IO pages used, and abort the RTPM suspend if I detect that user space 
have managed to submit more work.

> How does memory management work in this case?

Not sure exactly what you refer to here. There has basically been no 
change to how we handle memory.

If you think of how GPU jobs and VM_BIND interacts, then the change here 
is that it is now the XGS job and VM_BIND which interact. XGS takes on 
the same duties as the GPU jobs with kernel submission (in this regard).

Actually, if you see the submission flow for GPU jobs and XGS jobs, you 
will find that they are virtually identical when it comes to setting up 
fences and dependencies. Same goes for VM_BIND jobs.

--
Regards,
Ketil

> Regards,
> Christian.
> 
>>
>> This is still a work-in-progress, there's an outstanding issue with
>> multiple processes using different submission flows triggering
>> scheduling bugs (e.g. the same group getting scheduled twice), but we'd
>> love to gather some feedback on the suitability of the approach in
>> general and see if there's a clear path to merging something like this
>> eventually.
>>
>> I've also CCd AMD maintainers because they have in the past done
>> something similar[1], in case they want to chime in.
>>
>> There are two uses of this new uAPI in Mesa, one in gallium/panfrost
>> (link TBD), and one in panvk [2].
>>
>> The Gallium implementation is a naïve change just to switch the
>> submission model and exercise the new kernel code, and we don't plan
>> on pursuing this at this time.
>>
>> The panvk driver changes are, however, a better representation of the
>> intent behind this new uAPI, so please consider that as the reference
>> userspace. It is still very much also a work in progress.
>>
>>   * patch 1 adds all the uAPI changes;
>>   * patch 2 implements the GROUP_CREATE ioctl changes necessary to expose
>>     the required objects to userspace;
>>   * patch 3 maps the doorbell pages, similarly to how the user I/O 
>> page is
>>     mapped;
>>   * patch 4 implements GROUP_KICK, which lets userspace request an
>>     inactive group to be scheduled on the GPU;
>>   * patches 5 & 6 implement XGS queues, a way for userspace to
>>     synchronise GPU queue progress with DRM syncobjs;
>>   * patches 7 & 8 add notification mechanisms for user & kernel to signal
>>     changes to native GPU syncobjs.
>>
>> [1] 
>> https://lore.kernel.org/amd-gfx/CADnq5_N61q_o+5WYUZsZ=qu7VmeXTFHQSxLwTco05gLzHaiswA@mail.gmail.com/t/#m116a36a598d8fad1329e053974ad37a4dc0f28ed
>> [2] 
>> https://gitlab.freedesktop.org/larsivsi/mesa/-/commits/panvk-v10-usersubmit?ref_type=heads
>>
>> Ketil Johnsen (7):
>>    drm/panthor: Add uAPI to submit from user space
>>    drm/panthor: Extend GROUP_CREATE for user submission
>>    drm/panthor: Map doorbell pages
>>    drm/panthor: Add GROUP_KICK ioctl
>>    drm/panthor: Factor out syncobj handling
>>    drm/panthor: Implement XGS queues
>>    drm/panthor: Add SYNC_UPDATE ioctl
>>
>> Mihail Atanassov (1):
>>    drm/panthor: Add sync_update eventfd handling
>>
>>   drivers/gpu/drm/panthor/Makefile          |   4 +-
>>   drivers/gpu/drm/panthor/panthor_device.c  |  66 ++-
>>   drivers/gpu/drm/panthor/panthor_device.h  |  35 +-
>>   drivers/gpu/drm/panthor/panthor_drv.c     | 233 +++++++-
>>   drivers/gpu/drm/panthor/panthor_fw.c      |   2 +-
>>   drivers/gpu/drm/panthor/panthor_sched.c   | 408 +++++++++-----
>>   drivers/gpu/drm/panthor/panthor_sched.h   |   8 +-
>>   drivers/gpu/drm/panthor/panthor_syncobj.c | 167 ++++++
>>   drivers/gpu/drm/panthor/panthor_syncobj.h |  27 +
>>   drivers/gpu/drm/panthor/panthor_xgs.c     | 638 ++++++++++++++++++++++
>>   drivers/gpu/drm/panthor/panthor_xgs.h     |  42 ++
>>   include/uapi/drm/panthor_drm.h            | 243 +++++++-
>>   12 files changed, 1696 insertions(+), 177 deletions(-)
>>   create mode 100644 drivers/gpu/drm/panthor/panthor_syncobj.c
>>   create mode 100644 drivers/gpu/drm/panthor/panthor_syncobj.h
>>   create mode 100644 drivers/gpu/drm/panthor/panthor_xgs.c
>>   create mode 100644 drivers/gpu/drm/panthor/panthor_xgs.h
>>
> 

Re: [RFC PATCH 00/10] drm/panthor: Add user submission
Posted by Steven Price 1 year, 3 months ago
Hi Christian,

Mihail should be able to give more definitive answers, but I think I can
answer your questions.

On 29/08/2024 10:40, Christian König wrote:
> Am 28.08.24 um 19:25 schrieb Mihail Atanassov:
>> Hello all,
>>
>> This series implements a mechanism to expose Mali CSF GPUs' queue
>> ringbuffers directly to userspace, along with paraphernalia to allow
>> userspace to control job synchronisation between the CPU and GPU.
>>
>> The goal of these changes is to allow userspace to control work
>> submission to the FW/HW directly without kernel intervention in the
>> common case, thereby reducing context switching overhead. It also allows
>> for greater flexibility in the way work is enqueued in the ringbufs.
>> For example, the current kernel submit path only supports indirect
>> calls, which is inefficient for small command buffers. Userspace can
>> also skip unnecessary sync operations.

> Question is how do you guarantee forward progress for fence signaling?

A timeout. Although looking at it I think it's probably set too high
currently:

> +#define JOB_TIMEOUT_MS				5000

But basically the XGS queue is a DRM scheduler just like a normal GPU
queue and the jobs have a timeout. If the timeout is hit then any fences
will be signalled (with an error).

> E.g. when are fences created and published? How do they signal?
> 
> How are dependencies handled? How can the kernel suspend an userspace
> queue?

The actual userspace queue can be suspended. This is actually a
combination of firmware and kernel driver, and this functionality is
already present without the user submission. The firmware will multiplex
multiple 'groups' onto the hardware, and if there are too many for the
firmware then the kernel multiplexes the extra groups onto the ones the
firmware supports.

I haven't studied Mihail's series in detail yet, but if I understand
correctly, the XGS queues are handled separately and are not suspended
when the hardware queues are suspended. I guess this might be an area
for improvement and might explain the currently very high timeout (to
deal with the case where the actual GPU work has been suspended).

> How does memory management work in this case?

I'm not entirely sure what you mean here. If you are referring to the
potential memory issues with signalling path then this should be handled
by the timeout - although I haven't studied the code to check for bugs here.

The actual new XGS queues don't allocate/free memory during the queue
execution - so it's just the memory usage related to fences (and the
other work which could be blocked on the fence).

In terms of memory management for the GPU work itself, this is handled
the same as before. The VM_BIND mechanism allows dependencies to be
created between syncobjs and VM operations, with XGS these can then be
tied to GPU HW events.


Fundamentally (modulo bugs) there is little change compared to kernel
submission - it's already fairly trivial to write GPU job which will
make no forward progress (a 'while (1)' equivalent job). The only
difference here is that XGS makes this 'easy' and doesn't involve the
GPU spinning. Either way we rely on a timeout to recover from these
situations.

Thanks,
Steve

> Regards,
> Christian.
> 
>>
>> This is still a work-in-progress, there's an outstanding issue with
>> multiple processes using different submission flows triggering
>> scheduling bugs (e.g. the same group getting scheduled twice), but we'd
>> love to gather some feedback on the suitability of the approach in
>> general and see if there's a clear path to merging something like this
>> eventually.
>>
>> I've also CCd AMD maintainers because they have in the past done
>> something similar[1], in case they want to chime in.
>>
>> There are two uses of this new uAPI in Mesa, one in gallium/panfrost
>> (link TBD), and one in panvk [2].
>>
>> The Gallium implementation is a naïve change just to switch the
>> submission model and exercise the new kernel code, and we don't plan
>> on pursuing this at this time.
>>
>> The panvk driver changes are, however, a better representation of the
>> intent behind this new uAPI, so please consider that as the reference
>> userspace. It is still very much also a work in progress.
>>
>>   * patch 1 adds all the uAPI changes;
>>   * patch 2 implements the GROUP_CREATE ioctl changes necessary to expose
>>     the required objects to userspace;
>>   * patch 3 maps the doorbell pages, similarly to how the user I/O
>> page is
>>     mapped;
>>   * patch 4 implements GROUP_KICK, which lets userspace request an
>>     inactive group to be scheduled on the GPU;
>>   * patches 5 & 6 implement XGS queues, a way for userspace to
>>     synchronise GPU queue progress with DRM syncobjs;
>>   * patches 7 & 8 add notification mechanisms for user & kernel to signal
>>     changes to native GPU syncobjs.
>>
>> [1]
>> https://lore.kernel.org/amd-gfx/CADnq5_N61q_o+5WYUZsZ=qu7VmeXTFHQSxLwTco05gLzHaiswA@mail.gmail.com/t/#m116a36a598d8fad1329e053974ad37a4dc0f28ed
>> [2]
>> https://gitlab.freedesktop.org/larsivsi/mesa/-/commits/panvk-v10-usersubmit?ref_type=heads
>>
>> Ketil Johnsen (7):
>>    drm/panthor: Add uAPI to submit from user space
>>    drm/panthor: Extend GROUP_CREATE for user submission
>>    drm/panthor: Map doorbell pages
>>    drm/panthor: Add GROUP_KICK ioctl
>>    drm/panthor: Factor out syncobj handling
>>    drm/panthor: Implement XGS queues
>>    drm/panthor: Add SYNC_UPDATE ioctl
>>
>> Mihail Atanassov (1):
>>    drm/panthor: Add sync_update eventfd handling
>>
>>   drivers/gpu/drm/panthor/Makefile          |   4 +-
>>   drivers/gpu/drm/panthor/panthor_device.c  |  66 ++-
>>   drivers/gpu/drm/panthor/panthor_device.h  |  35 +-
>>   drivers/gpu/drm/panthor/panthor_drv.c     | 233 +++++++-
>>   drivers/gpu/drm/panthor/panthor_fw.c      |   2 +-
>>   drivers/gpu/drm/panthor/panthor_sched.c   | 408 +++++++++-----
>>   drivers/gpu/drm/panthor/panthor_sched.h   |   8 +-
>>   drivers/gpu/drm/panthor/panthor_syncobj.c | 167 ++++++
>>   drivers/gpu/drm/panthor/panthor_syncobj.h |  27 +
>>   drivers/gpu/drm/panthor/panthor_xgs.c     | 638 ++++++++++++++++++++++
>>   drivers/gpu/drm/panthor/panthor_xgs.h     |  42 ++
>>   include/uapi/drm/panthor_drm.h            | 243 +++++++-
>>   12 files changed, 1696 insertions(+), 177 deletions(-)
>>   create mode 100644 drivers/gpu/drm/panthor/panthor_syncobj.c
>>   create mode 100644 drivers/gpu/drm/panthor/panthor_syncobj.h
>>   create mode 100644 drivers/gpu/drm/panthor/panthor_xgs.c
>>   create mode 100644 drivers/gpu/drm/panthor/panthor_xgs.h
>>
>