[v3] gpu: nova-core: gsp: add locking to Cmdq

[PATCH v3 0/5] gpu: nova-core: gsp: add locking to Cmdq

Posted by Eliot Courtney 1 month, 1 week ago

Add locking to Cmdq. This is required e.g. for unloading the driver,
which needs to send the UnloadingGuestDriver command via the command
queue on unbind, which may happen on a different thread.

We have commands that need a reply and commands that don't. For
commands with a reply we want to make sure that they don't get
the reply of a different command back. The approach this patch series
takes is to make those commands block until they get a response. For
now this should be ok, and we expect GSP to be fast anyway.

To do this, we need to know which commands expect a reply and which
don't. John's existing series[1] adds IS_ASYNC which solves part of the
problem, but we need to know a bit more. So instead, add an
associated type called Reply which tells us what the reply is.

An alternative would be to define traits inheriting CommandToGsp, e.g.
CommandWithReply and CommandWithoutReply, instead of using the
associated type. I implemented the associated type version because it
feels more compositional than inheritance-based, so it seemed a bit better
to me. But both of these approaches work and are fine, IMO.
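
As a rough userspace illustration of the associated-type idea (this is not the code from the series), the sketch below gives each command a `Reply` type and uses trait bounds to decide which send method accepts it. Only `CommandToGsp`, `MessageFromGsp`, `Reply`, `send_command` and `send_command_no_wait` are names from this thread; `NoReply`, `FakeCmdq`, `GetVersion`, `Version`, `Unload`, `encode`, `decode` and `demo` are hypothetical, and the real signatures may well differ.

```rust
// Userspace sketch only: no kernel APIs, no real GSP protocol.

/// A message that can be decoded from what the GSP sends back.
trait MessageFromGsp: Sized {
    fn decode(raw: &[u8]) -> Option<Self>;
}

/// Hypothetical marker for commands that expect no reply.
/// It deliberately does not implement `MessageFromGsp`.
struct NoReply;

trait CommandToGsp {
    /// The reply type the GSP is expected to send for this command.
    type Reply;
    fn encode(&self) -> Vec<u8>;
}

/// Hypothetical command with a reply.
struct GetVersion;
struct Version(u8);

impl MessageFromGsp for Version {
    fn decode(raw: &[u8]) -> Option<Self> {
        raw.first().copied().map(Version)
    }
}

impl CommandToGsp for GetVersion {
    type Reply = Version;
    fn encode(&self) -> Vec<u8> {
        vec![0x01]
    }
}

/// Hypothetical command without a reply.
struct Unload;

impl CommandToGsp for Unload {
    type Reply = NoReply;
    fn encode(&self) -> Vec<u8> {
        vec![0x02]
    }
}

/// Fake queue that records what was sent and hands back a canned reply.
struct FakeCmdq {
    sent: Vec<Vec<u8>>,
    canned_reply: Vec<u8>,
}

impl FakeCmdq {
    /// Only callable when the command's `Reply` can be decoded.
    fn send_command<C>(&mut self, cmd: C) -> Option<C::Reply>
    where
        C: CommandToGsp,
        C::Reply: MessageFromGsp,
    {
        self.sent.push(cmd.encode());
        <C::Reply as MessageFromGsp>::decode(&self.canned_reply)
    }

    /// Only callable for commands that declare `Reply = NoReply`.
    fn send_command_no_wait<C>(&mut self, cmd: C)
    where
        C: CommandToGsp<Reply = NoReply>,
    {
        self.sent.push(cmd.encode());
    }
}

/// Sends one command of each kind and reports what happened.
fn demo() -> (u8, usize) {
    let mut q = FakeCmdq { sent: Vec::new(), canned_reply: vec![7] };
    let version = q.send_command(GetVersion).expect("canned reply decodes");
    q.send_command_no_wait(Unload);
    (version.0, q.sent.len())
}
```

With these bounds, `q.send_command(Unload)` and `q.send_command_no_wait(GetVersion)` are both rejected at compile time, so a caller cannot pick the wrong waiting behaviour for a command.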

In summary, this patch series has three steps:
1. Add the type infrastructure to know what replies are expected for a
   command and update each caller to explicitly wait for the reply or
   not.
2. Make Cmdq pinned so we can use Mutex.
3. Add a Mutex to protect Cmdq by moving the relevant state to an
   inner struct.
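
A minimal userspace sketch of step 3, assuming `std::sync::Mutex` in place of the kernel `Mutex` (which, unlike the std one, has to be pin-initialized, hence step 2). Only `Cmdq`, `CmdqInner` and `send_command` are names from this series; the fields, the fake reply rule and `run_two_threads` are made up for illustration. The point it shows is that holding the lock across both the send and the receive keeps one thread from picking up another thread's reply.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// State that may only be touched with the lock held (hypothetical field).
struct CmdqInner {
    /// Replies queued by the fake "GSP", oldest first.
    replies: VecDeque<u32>,
}

impl CmdqInner {
    fn send(&mut self, cmd: u32) {
        // Fake GSP: the reply to command `n` is `n + 1000`.
        self.replies.push_back(cmd + 1000);
    }

    fn receive(&mut self) -> Option<u32> {
        self.replies.pop_front()
    }
}

/// Outer type: shared between threads, all mutable state behind the mutex.
struct Cmdq {
    inner: Mutex<CmdqInner>,
}

impl Cmdq {
    fn new() -> Self {
        Cmdq { inner: Mutex::new(CmdqInner { replies: VecDeque::new() }) }
    }

    /// Sends a command and blocks for its reply. The guard lives until the
    /// end of the function, so send and receive form one critical section.
    fn send_command(&self, cmd: u32) -> Option<u32> {
        let mut inner = self.inner.lock().unwrap();
        inner.send(cmd);
        inner.receive()
    }
}

/// Two threads issue commands concurrently; returns true if every thread
/// always got the reply that matches its own command.
fn run_two_threads() -> bool {
    let cmdq = Arc::new(Cmdq::new());
    let handles: Vec<_> = (0..2u32)
        .map(|t| {
            let cmdq = Arc::clone(&cmdq);
            thread::spawn(move || {
                (0..100u32).all(|i| {
                    let cmd = t * 100 + i;
                    cmdq.send_command(cmd) == Some(cmd + 1000)
                })
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .fold(true, |all_ok, ok| all_ok && ok)
}
```

If the lock were dropped between `send` and `receive`, a second thread could enqueue its own command in the gap and the first thread could pop the wrong reply, which is the mix-up the series is trying to rule out.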

[1]: https://lore.kernel.org/all/20260211000451.192109-1-jhubbard@nvidia.com/

Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
Changes in v3:
- Rename send_sync_command/send_async_command to
  send_command/send_command_no_wait.
- Move `dev` field into `CmdqInner` to avoid passing it through method
  parameters.
- Add `RECEIVE_TIMEOUT` constant for the 10s receive timeout.
- Link to v2: https://lore.kernel.org/r/20260226-cmdq-locking-v2-0-c7e16a6d5885@nvidia.com

Changes in v2:
- Rebase on drm-rust-next
- Link to v1: https://lore.kernel.org/r/20260225-cmdq-locking-v1-0-bbf6b4156706@nvidia.com

---
Eliot Courtney (5):
gpu: nova-core: gsp: fix stale doc comments on command queue methods
gpu: nova-core: gsp: add `RECEIVE_TIMEOUT` constant for command queue
gpu: nova-core: gsp: add reply/no-reply info to `CommandToGsp`
gpu: nova-core: gsp: make `Cmdq` a pinned type
gpu: nova-core: gsp: add mutex locking to Cmdq

drivers/gpu/nova-core/gsp.rs | 5 +-
drivers/gpu/nova-core/gsp/boot.rs | 13 +-
drivers/gpu/nova-core/gsp/cmdq.rs | 225 +++++++++++++++++--------
drivers/gpu/nova-core/gsp/cmdq/continuation.rs | 5 +-
drivers/gpu/nova-core/gsp/commands.rs | 23 +--
drivers/gpu/nova-core/gsp/sequencer.rs | 4 +-
6 files changed, 180 insertions(+), 95 deletions(-)
---
base-commit: 4a49fe23e357b48845e31fe9c28a802c05458198
change-id: 20260225-cmdq-locking-d32928a2c2cf
prerequisite-message-id: <20260304-cmdq-continuation-v5-0-3f19d759ed93@nvidia.com>
prerequisite-patch-id: fd45bc5b8eda5e2b54a052dddb1a1c363107f0a7
prerequisite-patch-id: 06fe65f900206c44b5ba52286ca4ce1ca42b55d5
prerequisite-patch-id: 8844970d0e387488c70979a73732579ba174b46c
prerequisite-patch-id: e138a94ed48fa8d50d5ed1eb36524f98923ed478
prerequisite-patch-id: dccf2b12b176947e89b44baafda9c5a0aa0a93bc
prerequisite-patch-id: 30ed64c398e541d6efbcb2e46ed9a9e6cf953f4f
prerequisite-patch-id: ba1c8da0cbdb4682b879633a94a172d1b2b6bc8e
prerequisite-patch-id: 081d4a4198a0bf09f3480cb8baf296db585decce
prerequisite-patch-id: 56c8c25e7362178cd019c8f03954a6bcdb72b1b5

Best regards,
--
Eliot Courtney <ecourtney@nvidia.com>

Re: [PATCH v3 0/5] gpu: nova-core: gsp: add locking to Cmdq

Posted by Alexandre Courbot 1 month, 1 week ago

On Wed Mar 4, 2026 at 11:46 AM JST, Eliot Courtney wrote:
> Add locking to Cmdq. This is required e.g. for unloading the driver,
> which needs to send the UnloadingGuestDriver command via the command
> queue on unbind, which may happen on a different thread.
>
> We have commands that need a reply and commands that don't. For
> commands with a reply we want to make sure that they don't get
> the reply of a different command back. The approach this patch series
> takes is to make those commands block until they get a response. For
> now this should be ok, and we expect GSP to be fast anyway.
>
> To do this, we need to know which commands expect a reply and which
> don't. John's existing series[1] adds IS_ASYNC which solves part of the
> problem, but we need to know a bit more. So instead, add an
> associated type called Reply which tells us what the reply is.
>
> An alternative would be to define traits inheriting CommandToGsp, e.g.
> CommandWithReply and CommandWithoutReply, instead of using the
> associated type. I implemented the associated type version because it
> feels more compositional than inheritance-based, so it seemed a bit better
> to me. But both of these approaches work and are fine, IMO.

The associated type seems to work just fine. I was wondering whether we
could mess with it by using other types that do not implement
`MessageFromGsp`, but in that case we cannot call any of the
`send_command` methods, so this approach looks good to me.