The current code hands out buffers that go all the way up to and
including `rx - 1`, but we need to maintain an empty slot to prevent the
ring buffer from wrapping around into having 'tx == rx', which means
empty.
Also add more rigorous no-panic proofs.
Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
drivers/gpu/nova-core/gsp/cmdq.rs | 31 +++++++++++++++++--------------
1 file changed, 17 insertions(+), 14 deletions(-)
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 09c28eeb6f12..b6d6093e3ac0 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -227,21 +227,24 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
// PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
- if rx <= tx {
- // The area from `tx` up to the end of the ring, and from the beginning of the ring up
- // to `rx`, minus one unit, belongs to the driver.
- if rx == 0 {
- let last = after_tx.len() - 1;
- (&mut after_tx[..last], &mut before_tx[0..0])
- } else {
- (after_tx, &mut before_tx[..rx])
- }
+ // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
+ // belongs to the driver for writing.
+ if rx == 0 {
+ // Since `rx` is zero, leave an empty slot at end of the buffer.
+ let last = after_tx.len() - 1;
+ (&mut after_tx[..last], &mut before_tx[0..0])
+ } else if rx > tx {
+ // The area is contiguous and we leave an empty slot before `rx`.
+ // PANIC: since `rx > tx` we have `rx - tx - 1 >= 0`
+ // PANIC: since `tx < rx < MSGQ_NUM_PAGES && after_tx.len() == MSGQ_NUM_PAGES - tx`:
+ // `rx - 1 <= MSGQ_NUM_PAGES` -> `rx - tx - 1 <= MSGQ_NUM_PAGES - tx`
+ // -> `rx - tx - 1 <= after_tx.len()`
+ (&mut after_tx[..(rx - tx - 1)], &mut before_tx[0..0])
} else {
- // The area from `tx` to `rx`, minus one unit, belongs to the driver.
- //
- // PANIC: per the invariants of `cpu_write_ptr` and `gsp_read_ptr`, `rx` and `tx` are
- // `<= MSGQ_NUM_PAGES`, and the test above ensured that `rx > tx`.
- (after_tx.split_at_mut(rx - tx).0, &mut before_tx[0..0])
+ // The area is discontiguous and we leave an empty slot before `rx`.
+ // PANIC: since `rx != 0 && rx is unsigned` we have `rx - 1 >= 0`
+ // PANIC: since `rx <= tx && before_tx.len() == tx` we have `rx - 1 <= before_tx.len()`
+ (after_tx, &mut before_tx[..(rx - 1)])
}
}
--
2.52.0
On Thu Jan 22, 2026 at 2:59 AM GMT, Eliot Courtney wrote:
> The current code hands out buffers that go all the way up to and
> including `rx - 1`, but we need to maintain an empty slot to prevent the
> ring buffer from wrapping around into having 'tx == rx', which means
> empty.
Doesn't this mean that you're effectively wasting a slot? You can easily
implementing a ring buffer that allows you to disambiguate empty/full while
still using all slots.
A common approach is to only do modulo/masking operation before accessing the
slot. Then `write_ptr.wrapping_sub(read_ptr)` will give you the accurate length of
things inside the ring buffer.
Best,
Gary
>
> Also add more rigorous no-panic proofs.
>
> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
> ---
> drivers/gpu/nova-core/gsp/cmdq.rs | 31 +++++++++++++++++--------------
> 1 file changed, 17 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index 09c28eeb6f12..b6d6093e3ac0 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -227,21 +227,24 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
> // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
> let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
>
> - if rx <= tx {
> - // The area from `tx` up to the end of the ring, and from the beginning of the ring up
> - // to `rx`, minus one unit, belongs to the driver.
> - if rx == 0 {
> - let last = after_tx.len() - 1;
> - (&mut after_tx[..last], &mut before_tx[0..0])
> - } else {
> - (after_tx, &mut before_tx[..rx])
> - }
> + // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
> + // belongs to the driver for writing.
> + if rx == 0 {
> + // Since `rx` is zero, leave an empty slot at end of the buffer.
> + let last = after_tx.len() - 1;
> + (&mut after_tx[..last], &mut before_tx[0..0])
> + } else if rx > tx {
> + // The area is contiguous and we leave an empty slot before `rx`.
> + // PANIC: since `rx > tx` we have `rx - tx - 1 >= 0`
> + // PANIC: since `tx < rx < MSGQ_NUM_PAGES && after_tx.len() == MSGQ_NUM_PAGES - tx`:
> + // `rx - 1 <= MSGQ_NUM_PAGES` -> `rx - tx - 1 <= MSGQ_NUM_PAGES - tx`
> + // -> `rx - tx - 1 <= after_tx.len()`
> + (&mut after_tx[..(rx - tx - 1)], &mut before_tx[0..0])
> } else {
> - // The area from `tx` to `rx`, minus one unit, belongs to the driver.
> - //
> - // PANIC: per the invariants of `cpu_write_ptr` and `gsp_read_ptr`, `rx` and `tx` are
> - // `<= MSGQ_NUM_PAGES`, and the test above ensured that `rx > tx`.
> - (after_tx.split_at_mut(rx - tx).0, &mut before_tx[0..0])
> + // The area is discontiguous and we leave an empty slot before `rx`.
> + // PANIC: since `rx != 0 && rx is unsigned` we have `rx - 1 >= 0`
> + // PANIC: since `rx <= tx && before_tx.len() == tx` we have `rx - 1 <= before_tx.len()`
> + (after_tx, &mut before_tx[..(rx - 1)])
> }
> }
>
On Sat Jan 24, 2026 at 3:31 AM JST, Gary Guo wrote: > On Thu Jan 22, 2026 at 2:59 AM GMT, Eliot Courtney wrote: >> The current code hands out buffers that go all the way up to and >> including `rx - 1`, but we need to maintain an empty slot to prevent the >> ring buffer from wrapping around into having 'tx == rx', which means >> empty. > > Doesn't this mean that you're effectively wasting a slot? You can easily > implementing a ring buffer that allows you to disambiguate empty/full while > still using all slots. > > A common approach is to only do modulo/masking operation before accessing the > slot. Then `write_ptr.wrapping_sub(read_ptr)` will give you the accurate length of > things inside the ring buffer. Yes, we are wasting a slot (the current code also does this). But, this is the protocol between the CPU and GSP, so the GSP actually expects this and we can't change the semantics easily, IIUC.
On 1/21/26 6:59 PM, Eliot Courtney wrote:
> + // The area is contiguous and we leave an empty slot before `rx`.
> + // PANIC: since `rx > tx` we have `rx - tx - 1 >= 0`
> + // PANIC: since `tx < rx < MSGQ_NUM_PAGES && after_tx.len() == MSGQ_NUM_PAGES - tx`:
> + // `rx - 1 <= MSGQ_NUM_PAGES` -> `rx - tx - 1 <= MSGQ_NUM_PAGES - tx`
> + // -> `rx - tx - 1 <= after_tx.len()`
Hi Eliot,
Documentation nit: the proofs are great, but the above just does
not go into my head easily, because it's a proof, rather than a
sentence.
Can you please reword these PANIC comments so that they are complete
sentences, along the lines of:
// PANIC: a > b and therefore c cannot overflow, therefore this
// cannot panic.
And please also use words "and", "therefore", "because", instead of
symbols such as "&&".
Same for the other patch with PANIC comments.
I did a quick search in Rust for Linux, before writing this, in order to
ensure that what I'm recommending is How It Is Done. (Not to claim that
Nova in particular is fully correct, though.)
thanks,
--
John Hubbard
On Thu Jan 22, 2026 at 12:26 PM JST, John Hubbard wrote: > On 1/21/26 6:59 PM, Eliot Courtney wrote: >> + // The area is contiguous and we leave an empty slot before `rx`. >> + // PANIC: since `rx > tx` we have `rx - tx - 1 >= 0` >> + // PANIC: since `tx < rx < MSGQ_NUM_PAGES && after_tx.len() == MSGQ_NUM_PAGES - tx`: >> + // `rx - 1 <= MSGQ_NUM_PAGES` -> `rx - tx - 1 <= MSGQ_NUM_PAGES - tx` >> + // -> `rx - tx - 1 <= after_tx.len()` > > Hi Eliot, > > Documentation nit: the proofs are great, but the above just does > not go into my head easily, because it's a proof, rather than a > sentence. I had a look now and I agree that it looks like plain English is the defacto standard for the PANIC comments, so I will update them. But, I wonder what people think about this. IMO it makes sense to have SAFETY and PANIC comments as rigorous proofs (where practical and possible) to match the level of work the compiler does for us in the infalliable areas of the code - if an issue occurs, unsafe or panicking code is often the root cause IMO. Writing these in plain English is easier to read but also harder to verify that the proof is correct and harder to verify if there are any implicit assumptions. I see there are some guidelines about SAFETY: comments but not about PANIC: comments in Documentation/rust/coding-guidelines.rst.
On Thu Jan 22, 2026 at 2:07 PM JST, Eliot Courtney wrote: > On Thu Jan 22, 2026 at 12:26 PM JST, John Hubbard wrote: >> On 1/21/26 6:59 PM, Eliot Courtney wrote: >>> + // The area is contiguous and we leave an empty slot before `rx`. >>> + // PANIC: since `rx > tx` we have `rx - tx - 1 >= 0` >>> + // PANIC: since `tx < rx < MSGQ_NUM_PAGES && after_tx.len() == MSGQ_NUM_PAGES - tx`: >>> + // `rx - 1 <= MSGQ_NUM_PAGES` -> `rx - tx - 1 <= MSGQ_NUM_PAGES - tx` >>> + // -> `rx - tx - 1 <= after_tx.len()` >> >> Hi Eliot, >> >> Documentation nit: the proofs are great, but the above just does >> not go into my head easily, because it's a proof, rather than a >> sentence. > I had a look now and I agree that it looks like plain English is the > defacto standard for the PANIC comments, so I will update them. > > But, I wonder what people think about this. IMO it makes sense to have > SAFETY and PANIC comments as rigorous proofs (where practical and > possible) to match the level of work the compiler does for us in the > infalliable areas of the code - if an issue occurs, unsafe or panicking > code is often the root cause IMO. Writing these in plain English is > easier to read but also harder to verify that the proof is correct and > harder to verify if there are any implicit assumptions. > > I see there are some guidelines about SAFETY: comments but not about > PANIC: comments in Documentation/rust/coding-guidelines.rst. Would be interesting to have Miguel/Alice and the core team's opinion on this IMHO.
On Thu, Jan 22, 2026 at 10:29 AM Alexandre Courbot <acourbot@nvidia.com> wrote: > > Would be interesting to have Miguel/Alice and the core team's opinion > on this IMHO. [ Re-replying to this so that it gets saved in the list. ] This is something that we have discussed back and forth, including making it way more formal, even using special notation, etc. Please take a look at Benno’s proposal at: https://lore.kernel.org/all/20240717221133.459589-1-benno.lossin@proton.me/ (He may have a newer link) Cheers, Miguel
© 2016 - 2026 Red Hat, Inc.