[v5] gpu: nova-core: gsp: add locking to Cmdq

[PATCH v5 0/5] gpu: nova-core: gsp: add locking to Cmdq

Posted by Eliot Courtney 2 weeks, 5 days ago

Add locking to Cmdq. This is required e.g. for unloading the driver,
which needs to send the UnloadingGuestDriver command via the command
queue on unbind, which may happen on a different thread.

We have commands that need a reply and commands that don't. For
commands with a reply we want to make sure that they don't get
the reply of a different command back. This patch series handles that
by making those commands block until they get a response. For now this
should be ok, and we expect GSP to be fast anyway.

To do this, we need to know which commands expect a reply and which
don't. John's existing series[1] adds IS_ASYNC which solves part of the
problem, but we need to know a bit more. So instead, add an
associated type called Reply which tells us what the reply is.
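
As a rough illustration of the associated-type idea, here is a userspace sketch, not the code from this series. `CommandToGsp`, `Reply`, `NoReply`, `send_command` and `send_command_no_wait` are names used in the series; the `FromGsp` trait, the `Ping`/`Pong`/`Notify` types, the trait bounds, and the echoing fake queue are all assumptions made for the example.

```rust
/// Marker for commands that expect no reply. The name comes from the
/// series; this definition is a guess.
pub struct NoReply;

/// Hypothetical helper trait: a reply that can be decoded from raw bytes.
pub trait FromGsp: Sized {
    fn decode(raw: &[u8]) -> Option<Self>;
}

/// Simplified stand-in for the driver's `CommandToGsp` trait.
pub trait CommandToGsp {
    /// The reply this command expects, or `NoReply` if there is none.
    type Reply;
    fn encode(&self) -> Vec<u8>;
}

/// Hypothetical command that expects a `Pong` back.
pub struct Ping(pub u8);
pub struct Pong(pub u8);

impl FromGsp for Pong {
    fn decode(raw: &[u8]) -> Option<Self> {
        raw.first().copied().map(Pong)
    }
}

impl CommandToGsp for Ping {
    type Reply = Pong;
    fn encode(&self) -> Vec<u8> {
        vec![self.0]
    }
}

/// Hypothetical fire-and-forget command.
pub struct Notify(pub u8);

impl CommandToGsp for Notify {
    type Reply = NoReply;
    fn encode(&self) -> Vec<u8> {
        vec![self.0]
    }
}

/// Fake queue: records what was sent and echoes each command back as its reply.
#[derive(Default)]
pub struct Cmdq {
    pub sent: Vec<Vec<u8>>,
}

impl Cmdq {
    /// Only accepts commands whose `Reply` can be decoded, so `Notify`
    /// is rejected at compile time.
    pub fn send_command<C>(&mut self, cmd: C) -> Option<C::Reply>
    where
        C: CommandToGsp,
        C::Reply: FromGsp,
    {
        let raw = cmd.encode();
        self.sent.push(raw.clone());
        <C::Reply as FromGsp>::decode(&raw)
    }

    /// Only accepts commands declared with `Reply = NoReply`, so `Ping`
    /// is rejected at compile time.
    pub fn send_command_no_wait<C>(&mut self, cmd: C)
    where
        C: CommandToGsp<Reply = NoReply>,
    {
        self.sent.push(cmd.encode());
    }
}
```

With these definitions, `Cmdq::default().send_command(Ping(7))` yields a `Pong`, while passing `Ping` to `send_command_no_wait` fails to compile, which is the kind of caller-side explicitness the series is after.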

An alternative would be to define traits inheriting CommandToGsp, e.g.
CommandWithReply and CommandWithoutReply, instead of using the
associated type. I implemented the associated type version because it
feels more compositional than inheritance-based, so it seemed a bit
better to me. But both of these approaches work and are fine, IMO.
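
For comparison, the supertrait alternative could look roughly like the following userspace sketch. Only the trait names `CommandToGsp`, `CommandWithReply` and `CommandWithoutReply` come from the text above; how the reply type would be expressed under this alternative is not stated, so the associated type on the subtrait, the `decode_reply` method, and the `Ping`/`Notify` commands are assumptions.

```rust
/// Simplified stand-in for the driver's base trait.
pub trait CommandToGsp {
    fn encode(&self) -> Vec<u8>;
}

/// Commands that expect a reply (assumed shape).
pub trait CommandWithReply: CommandToGsp {
    type Reply;
    fn decode_reply(raw: &[u8]) -> Option<Self::Reply>;
}

/// Commands that expect no reply: a pure marker trait.
pub trait CommandWithoutReply: CommandToGsp {}

/// Hypothetical command with a one-byte reply.
pub struct Ping(pub u8);

impl CommandToGsp for Ping {
    fn encode(&self) -> Vec<u8> {
        vec![self.0]
    }
}

impl CommandWithReply for Ping {
    type Reply = u8;
    fn decode_reply(raw: &[u8]) -> Option<u8> {
        raw.first().copied()
    }
}

/// Hypothetical fire-and-forget command.
pub struct Notify(pub u8);

impl CommandToGsp for Notify {
    fn encode(&self) -> Vec<u8> {
        vec![self.0]
    }
}

impl CommandWithoutReply for Notify {}

/// Fake send path: echoes the encoded command back as the reply.
pub fn send_command<C: CommandWithReply>(cmd: C) -> Option<C::Reply> {
    let raw = cmd.encode();
    C::decode_reply(&raw)
}

/// Fake send path for commands without a reply; returns the bytes written.
pub fn send_command_no_wait<C: CommandWithoutReply>(cmd: C) -> usize {
    cmd.encode().len()
}
```

One difference visible in this sketch is that nothing stops a type from implementing both subtraits, whereas a single associated type can only name one reply.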

In summary, this patch series has three steps:
1. Add the type infrastructure to know what replies are expected for a
   command and update each caller to explicitly wait for the reply or
   not.
2. Make Cmdq pinned so we can use Mutex
3. Add a Mutex to protect Cmdq by moving the relevant state to an
   inner struct.
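
Step 3 can be pictured with a small userspace sketch. It uses `std::sync::Mutex`, which does not need pinning, so step 2 has no counterpart here; the series uses the kernel's `Mutex` instead. `Cmdq` and `CmdqInner` are names from the series, while the fields, the fake reply (payload plus 1000), and `run_concurrently` are assumptions for illustration.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// State that must not be touched by two threads at once (assumed fields).
struct CmdqInner {
    replies: VecDeque<u32>,
    sent: u32,
}

impl CmdqInner {
    /// Fake firmware: every command queues a reply of `payload + 1000`.
    fn send(&mut self, payload: u32) {
        self.sent += 1;
        self.replies.push_back(payload + 1000);
    }

    fn receive(&mut self) -> Option<u32> {
        self.replies.pop_front()
    }
}

/// Outer type: methods take `&self` and lock the inner state.
pub struct Cmdq {
    inner: Mutex<CmdqInner>,
}

impl Cmdq {
    pub fn new() -> Self {
        Cmdq {
            inner: Mutex::new(CmdqInner {
                replies: VecDeque::new(),
                sent: 0,
            }),
        }
    }

    /// The lock is held across send and receive, so another thread
    /// cannot consume this command's reply.
    pub fn send_command(&self, payload: u32) -> Option<u32> {
        let mut inner = self.inner.lock().unwrap();
        inner.send(payload);
        inner.receive()
    }

    pub fn sent(&self) -> u32 {
        self.inner.lock().unwrap().sent
    }
}

/// Sends one command per thread and checks each thread got its own reply.
pub fn run_concurrently(threads: u32) -> bool {
    let cmdq = Arc::new(Cmdq::new());
    let handles: Vec<_> = (0..threads)
        .map(|id| {
            let cmdq = Arc::clone(&cmdq);
            thread::spawn(move || cmdq.send_command(id) == Some(id + 1000))
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .fold(true, |all_ok, ok| all_ok && ok)
}
```

Because the guard lives for the whole body of `send_command`, blocking for the reply and holding the lock are the same thing in this sketch, which matches the "block until they get a response" approach described earlier.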

[1]: https://lore.kernel.org/all/20260211000451.192109-1-jhubbard@nvidia.com/

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
Changes in v5:
- Tweak documentation
- Rebase on drm-rust-next
- Link to v4: https://lore.kernel.org/r/20260310-cmdq-locking-v4-0-4e5c4753c408@nvidia.com

Changes in v4:
- Change RECEIVE_TIMEOUT from 10s to 5s
- Update NoReply doc comment
- Move CommandToGsp doc comment update
- Fix import formatting in continuation.rs
- Move CmdqInner after Cmdq to reduce diff size
- Link to v3: https://lore.kernel.org/r/20260304-cmdq-locking-v3-0-a6314b708850@nvidia.com

Changes in v3:
- Rename send_sync_command/send_async_command to
  send_command/send_command_no_wait.
- Move `dev` field into `CmdqInner` to avoid passing it through method
  parameters.
- Add `RECEIVE_TIMEOUT` constant for the 10s receive timeout.
- Link to v2: https://lore.kernel.org/r/20260226-cmdq-locking-v2-0-c7e16a6d5885@nvidia.com

Changes in v2:
- Rebase on drm-rust-next
- Link to v1: https://lore.kernel.org/r/20260225-cmdq-locking-v1-0-bbf6b4156706@nvidia.com

---
Eliot Courtney (5):
      gpu: nova-core: gsp: fix stale doc comments on command queue methods
      gpu: nova-core: gsp: add `RECEIVE_TIMEOUT` constant for command queue
      gpu: nova-core: gsp: add reply/no-reply info to `CommandToGsp`
      gpu: nova-core: gsp: make `Cmdq` a pinned type
      gpu: nova-core: gsp: add mutex locking to Cmdq

 drivers/gpu/nova-core/gsp.rs                   |   5 +-
 drivers/gpu/nova-core/gsp/boot.rs              |  13 +-
 drivers/gpu/nova-core/gsp/cmdq.rs              | 159 +++++++++++++++++++------
 drivers/gpu/nova-core/gsp/cmdq/continuation.rs |   8 +-
 drivers/gpu/nova-core/gsp/commands.rs          |  23 ++--
 drivers/gpu/nova-core/gsp/sequencer.rs         |   4 +-
 6 files changed, 151 insertions(+), 61 deletions(-)
---
base-commit: d19ab42867ae7c68be84ed957d95712b7934773f
change-id: 20260225-cmdq-locking-d32928a2c2cf

Best regards,
--
Eliot Courtney <ecourtney@nvidia.com>

Re: [PATCH v5 0/5] gpu: nova-core: gsp: add locking to Cmdq

Posted by Alexandre Courbot 2 weeks, 4 days ago

On Wed Mar 18, 2026 at 1:07 PM JST, Eliot Courtney wrote:
> Add locking to Cmdq. This is required e.g. for unloading the driver,
> which needs to send the UnloadingGuestDriver via the command queue
> on unbind which may be on a different thread.
>
> We have commands that need a reply and commands that don't. For
> commands with a reply we want to make sure that they don't get
> the reply of a different command back. The approach this patch series
> takes is by making those commands block until they get a response. For
> now this should be ok, and we expect GSP to be fast anyway.
>
> To do this, we need to know which commands expect a reply and which
> don't. John's existing series[1] adds IS_ASYNC which solves part of the
> problem, but we need to know a bit more. So instead, add an
> associated type called Reply which tells us what the reply is.
>
> An alternative would be to define traits inheriting CommandToGsp, e.g.
> CommandWithReply and CommandWithoutReply, instead of using the
> associated type. I implemented the associated type version because it
> feels more compositional rather than inherity so seemed a bit better
> to me. But both of these approaches work and are fine, IMO.
>
> In summary, this patch series has three steps:
> 1. Add the type infrastructure to know what replies are expected for a
>    command and update each caller to explicitly wait for the reply or
>    not.
> 2. Make Cmdq pinned so we can use Mutex
> 3. Add a Mutex to protect Cmdq by moving the relevant state to an
>    inner struct.
>
> [1]: https://lore.kernel.org/all/20260211000451.192109-1-jhubbard@nvidia.com/
>
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>

Merged into drm-rust-next, thanks!