Add general `flush_into_vec` function. Add `flush_into_kvvec`
convenience wrapper alongside the existing `flush_into_kvec` function.
This is generally useful but immediately used for e.g. holding RM
control payloads, which can be large (~>=20 KiB).
Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
---
drivers/gpu/nova-core/sbuffer.rs | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/nova-core/sbuffer.rs b/drivers/gpu/nova-core/sbuffer.rs
index 3a41d224c77a..38f8a8426521 100644
--- a/drivers/gpu/nova-core/sbuffer.rs
+++ b/drivers/gpu/nova-core/sbuffer.rs
@@ -2,7 +2,13 @@
use core::ops::Deref;
-use kernel::prelude::*;
+use kernel::{
+ alloc::{
+ Allocator,
+ KVec, //
+ },
+ prelude::*, //
+};
/// A buffer abstraction for discontiguous byte slices.
///
@@ -162,11 +168,14 @@ pub(crate) fn read_exact(&mut self, mut dst: &mut [u8]) -> Result {
Ok(())
}
- /// Read all the remaining data into a [`KVec`].
+ /// Read all the remaining data into a [`Vec`] with the given allocator.
///
/// `self` will be empty after this operation.
- pub(crate) fn flush_into_kvec(&mut self, flags: kernel::alloc::Flags) -> Result<KVec<u8>> {
- let mut buf = KVec::<u8>::new();
+ pub(crate) fn flush_into_vec<A: Allocator>(
+ &mut self,
+ flags: kernel::alloc::Flags,
+ ) -> Result<Vec<u8, A>> {
+ let mut buf = Vec::<u8, A>::new();
if let Some(slice) = core::mem::take(&mut self.cur_slice) {
buf.extend_from_slice(slice, flags)?;
@@ -177,6 +186,20 @@ pub(crate) fn flush_into_kvec(&mut self, flags: kernel::alloc::Flags) -> Result<
Ok(buf)
}
+
+ /// Read all the remaining data into a [`KVec`].
+ ///
+ /// `self` will be empty after this operation.
+ pub(crate) fn flush_into_kvec(&mut self, flags: kernel::alloc::Flags) -> Result<KVec<u8>> {
+ self.flush_into_vec(flags)
+ }
+
+ /// Read all the remaining data into a [`KVVec`].
+ ///
+ /// `self` will be empty after this operation.
+ pub(crate) fn flush_into_kvvec(&mut self, flags: kernel::alloc::Flags) -> Result<KVVec<u8>> {
+ self.flush_into_vec(flags)
+ }
}
/// Provides a way to get mutable slices of data to write into.
--
2.53.0
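[Editor's note: for readers outside the kernel tree, the pattern the patch generalizes — draining discontiguous byte slices into one growable vector — can be sketched in plain user-space Rust. This is a hypothetical stand-in: `std::vec::Vec` replaces `KVec`/`KVVec`, allocation flags are omitted, and all names besides `flush_into_vec` are invented for illustration.]

```rust
// Sketch of the pattern only: a reader over discontiguous byte slices
// whose remaining contents can be flushed into a single growable buffer.
// The kernel version is generic over `Allocator` and takes
// `kernel::alloc::Flags`, letting callers choose KVec vs KVVec.
struct SBufferSketch<'a> {
    cur_slice: &'a [u8],
    slices: std::vec::IntoIter<&'a [u8]>,
}

impl<'a> SBufferSketch<'a> {
    fn new(parts: Vec<&'a [u8]>) -> Self {
        let mut it = parts.into_iter();
        let cur = it.next().unwrap_or(&[]);
        Self { cur_slice: cur, slices: it }
    }

    /// Read all remaining data into a `Vec`; `self` is empty afterwards.
    fn flush_into_vec(&mut self) -> Vec<u8> {
        let mut buf = Vec::new();
        // First drain whatever is left of the current slice...
        buf.extend_from_slice(std::mem::take(&mut self.cur_slice));
        // ...then every remaining slice, leaving the iterator exhausted.
        for slice in self.slices.by_ref() {
            buf.extend_from_slice(slice);
        }
        buf
    }
}

fn main() {
    let parts: Vec<&[u8]> = vec![b"abc", b"de", b"fgh"];
    let mut sb = SBufferSketch::new(parts);
    let flat = sb.flush_into_vec();
    assert_eq!(flat, b"abcdefgh");
    // A second flush finds nothing: the buffer was emptied.
    assert!(sb.flush_into_vec().is_empty());
}
```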
On 2/27/2026 1:32 PM, Eliot Courtney wrote:
> Add general `flush_into_vec` function. Add `flush_into_kvvec`
> convenience wrapper alongside the existing `flush_into_kvec` function.
> This is generally useful but immediately used for e.g. holding RM
> control payloads, which can be large (~>=20 KiB).
Why not just always use KVVec? It also seems that the KVec variant is not used?
If there's no reason for having both, I'd also just call this into_vec().
On Mon Mar 9, 2026 at 10:57 PM CET, Danilo Krummrich wrote:
> On 2/27/2026 1:32 PM, Eliot Courtney wrote:
>> Add general `flush_into_vec` function. Add `flush_into_kvvec`
>> convenience wrapper alongside the existing `flush_into_kvec` function.
>> This is generally useful but immediately used for e.g. holding RM
>> control payloads, which can be large (~>=20 KiB).
>
> Why not just always use KVVec? It also seems that the KVec variant is not used?
(Besides its single usage in GspSequence, which wouldn't hurt to be a KVVec.)
> If there's no reason for having both, I'd also just call this into_vec().
On Tue Mar 10, 2026 at 7:01 AM JST, Danilo Krummrich wrote:
> On Mon Mar 9, 2026 at 10:57 PM CET, Danilo Krummrich wrote:
>> On 2/27/2026 1:32 PM, Eliot Courtney wrote:
>>> Add general `flush_into_vec` function. Add `flush_into_kvvec`
>>> convenience wrapper alongside the existing `flush_into_kvec` function.
>>> This is generally useful but immediately used for e.g. holding RM
>>> control payloads, which can be large (~>=20 KiB).
>>
>> Why not just always use KVVec? It also seems that the KVec variant is not used?
>
> (Besides its single usage in GspSequence, which wouldn't hurt to be a KVVec.)
>
>> If there's no reason for having both, I'd also just call this into_vec().
I think always using KVVec should be fine, thanks!
For the naming, I think `read_to_vec` may be more conventional for this
-- `into_vec` implies consuming the object, but if we want to keep the
warning in `Cmdq::receive_msg` if not all the data is consumed we need
to take &mut self.
(Cc: Gary)
On Mon Mar 16, 2026 at 12:44 PM CET, Eliot Courtney wrote:
> On Tue Mar 10, 2026 at 7:01 AM JST, Danilo Krummrich wrote:
>> On Mon Mar 9, 2026 at 10:57 PM CET, Danilo Krummrich wrote:
>>> On 2/27/2026 1:32 PM, Eliot Courtney wrote:
>>>> Add general `flush_into_vec` function. Add `flush_into_kvvec`
>>>> convenience wrapper alongside the existing `flush_into_kvec` function.
>>>> This is generally useful but immediately used for e.g. holding RM
>>>> control payloads, which can be large (~>=20 KiB).
>>>
>>> Why not just always use KVVec? It also seems that the KVec variant is not used?
>>
>> (Besides its single usage in GspSequence, which wouldn't hurt to be a KVVec.)
>>
>>> If there's no reason for having both, I'd also just call this into_vec().
>
> I think always using KVVec should be fine, thanks!
>
> For the naming, I think `read_to_vec` may be more conventional for this
> -- `into_vec` implies consuming the object, but if we want to keep the
> warning in `Cmdq::receive_msg` if not all the data is consumed we need
> to take &mut self.
I had another look at this and especially how the SBuffer you refer to is used.
Unfortunately, the underlying code is broken.
driver_read_area() creates a reference to the whole DMA object, including the
area the GSP might concurrently write to. This is undefined behavior. See also
commit 0073a17b4666 ("gpu: nova-core: gsp: fix UB in DmaGspMem pointer
accessors"), where I fixed something similar.
Additionally, even if it would only create a reference to the part of the buffer
that can be considered untouched by the GSP, and hence suitable for creating a
reference, driver_read_area() and all subsequent callers would still need to be
unsafe as they would need to promise to not keep the reference alive beyond GSP
accessing that memory region again.
(The situation is similar for driver_write_area().)
So, unfortunately, commit 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command
queue bindings and handling") seems broken in this aspect.
This needs to be fixed first, and I think we should probably create a copy in
driver_read_area() right away.
I don't want to merge any code that builds on top of this before we have sorted
this out.
On Mon Mar 16, 2026 at 9:21 PM JST, Danilo Krummrich wrote:
> (Cc: Gary)
>
> On Mon Mar 16, 2026 at 12:44 PM CET, Eliot Courtney wrote:
>> On Tue Mar 10, 2026 at 7:01 AM JST, Danilo Krummrich wrote:
>>> On Mon Mar 9, 2026 at 10:57 PM CET, Danilo Krummrich wrote:
>>>> On 2/27/2026 1:32 PM, Eliot Courtney wrote:
>>>>> Add general `flush_into_vec` function. Add `flush_into_kvvec`
>>>>> convenience wrapper alongside the existing `flush_into_kvec` function.
>>>>> This is generally useful but immediately used for e.g. holding RM
>>>>> control payloads, which can be large (~>=20 KiB).
>>>>
>>>> Why not just always use KVVec? It also seems that the KVec variant is not used?
>>>
>>> (Besides its single usage in GspSequence, which wouldn't hurt to be a KVVec.)
>>>
>>>> If there's no reason for having both, I'd also just call this into_vec().
>>
>> I think always using KVVec should be fine, thanks!
>>
>> For the naming, I think `read_to_vec` may be more conventional for this
>> -- `into_vec` implies consuming the object, but if we want to keep the
>> warning in `Cmdq::receive_msg` if not all the data is consumed we need
>> to take &mut self.
>
> I had another look at this and especially how the SBuffer you refer to is used.
> Unfortunately, the underlying code is broken.
>
> driver_read_area() creates a reference to the whole DMA object, including the
> area the GSP might concurrently write to. This is undefined behavior. See also
> commit commit 0073a17b4666 ("gpu: nova-core: gsp: fix UB in DmaGspMem pointer
> accessors"), where I fixed something similar.
We shouldn't be doing that - I think we are limited by the current
CoherentAllocation API though. But IIUC this is something that I/O
projections will allow us to handle properly?
>
> Additionally, even if it would only create a reference to the part of the buffer
> that can be considerd untouched by the GSP and hence suits for creating a
> reference, driver_read_area() and all subsequent callers would still need to be
> unsafe as they would need to promise to not keep the reference alive beyond GSP
> accessing that memory region again.
This is guaranteed by the inability to update the CPU read pointer for
as long as the slices exist.
To expand a bit: `driver_read_area` returns a slice to the area of the
DMA object that the GSP is guaranteed *not* to write into until the
driver updates the CPU read pointer.
This area is between the CPU read pointer (which signals the next bytes
the CPU has to read, and which the GSP won't cross) and the GSP write
pointer (i.e. the next page to be written by the GSP).
Everything in this zone is data that the GSP has already written but the
driver hasn't read yet at the time of the call.
The CPU read pointer cannot be updated for as long as the returned
slices exist - the slices hold a reference to the `DmaGspMem`, and
updating the read pointer requires a mutable reference to the same
`DmaGspMem`.
Meanwhile, the GSP can keep writing data while the slice exists but that
data will be past the area of the slice, and the GSP will never write
past the CPU read pointer.
So the data in the returned slices is guaranteed to be there at the time
of the call, and immutable for as long as the slices exist. Thus, they
can be provided by a safe method.
Unless we decide to not trust the GSP, but that would be opening a whole
new can of worms.
> I don't want to merge any code that builds on top of this before we have sorted
> this out.
If what I have written above is correct, then the fix should simply be
to use I/O projections to create properly-bounded references. Any more
immediate fix would need to be much more intrusive and require a
refactoring that is imho more risky than carrying on for a bit with the
current behavior.
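[Editor's note: the rx/tx geometry described above can be illustrated with a small, self-contained Rust sketch. Names and the range-returning shape are hypothetical; the real `driver_read_area` returns two borrowed slices of the DMA pages rather than index ranges.]

```rust
// Illustration of the message-queue read area: the driver may read the
// region from the CPU read pointer `rx` up to (but excluding) the GSP
// write pointer `tx`, modulo the number of pages. The readable region
// is therefore either one contiguous run or two runs that wrap around
// the end of the ring.
use std::ops::Range;

fn driver_read_ranges(rx: usize, tx: usize, num_pages: usize) -> (Range<usize>, Range<usize>) {
    // Mirrors the PANIC invariants: both pointers stay in bounds.
    assert!(rx < num_pages && tx < num_pages);
    if rx <= tx {
        // Contiguous: one run, second range empty.
        (rx..tx, 0..0)
    } else {
        // Wrapped: tail of the buffer, then the head up to `tx`.
        (rx..num_pages, 0..tx)
    }
}

fn main() {
    // Contiguous case: pages 2..5 are readable.
    assert_eq!(driver_read_ranges(2, 5, 8), (2..5, 0..0));
    // Wrapped case: pages 6..8 then 0..3 are readable.
    assert_eq!(driver_read_ranges(6, 3, 8), (6..8, 0..3));
    // rx == tx: nothing to read.
    assert_eq!(driver_read_ranges(4, 4, 8), (4..4, 0..0));
}
```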
On Tue Mar 17, 2026 at 2:55 AM CET, Alexandre Courbot wrote:
> We shouldn't be doing that - I think we are limited by the current
> CoherentAllocation API though. But IIUC this is something that I/O
> projections will allow us to handle properly?
Why do we need projections to avoid UB here? driver_read_area() already even
peeks into the firmware abstraction layer, which is where MsgqData technically
belongs (despite being trivial).
let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
let data = &gsp_mem.gspq.msgq.data;
Why do we need I/O projections to do raw pointer arithmetic where creating a
reference is UB?
(Eventually, we want to use IoView of course, as this is a textbook example of
what I proposed IoSlice for.)
Another option in the meantime would be / have been to use dma_read!() and
extract (copy) the data right away in driver_read_area(), which I'd probably
prefer over raw pointer arithmetic.
But in any case, this can (and should) be fixed even without IoView.
Besides that, nothing prevents us doing the same thing I did for gsp_write_ptr()
in the meantime to not break out of the firmware abstraction layer.
> This is guaranteed by the inability to update the CPU read pointer for
> as long as the slices exists.
Fair enough.
> Unless we decide to not trust the GSP, but that would be opening a whole
> new can of worms.
I thought about this as well, and I think it's fine. The safety comment within
the function has to justify why the device won't access the memory. If the
device does so regardless, it's simply a bug.
>> I don't want to merge any code that builds on top of this before we have sorted
>> this out.
>
> If what I have written above is correct, then the fix should simply be
> to use I/O projections to create properly-bounded references.
I still don't think we need I/O projections for a reasonable fix and I also
don't agree that we should keep UB until new features land.
On Tue Mar 17, 2026 at 7:49 PM JST, Danilo Krummrich wrote:
> On Tue Mar 17, 2026 at 2:55 AM CET, Alexandre Courbot wrote:
>> We shouldn't be doing that - I think we are limited by the current
>> CoherentAllocation API though. But IIUC this is something that I/O
>> projections will allow us to handle properly?
>
> Why do we need projections to avoid UB here? driver_read_area() already even
> peeks into the firmware abstraction layer, which is where MsgqData technically
> belongs into (despite being trivial).
>
> let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
> let data = &gsp_mem.gspq.msgq.data;
>
> Why do we need I/O projections to do raw pointer arithmetic where creating a
> reference is UB?
>
> (Eventually, we want to use IoView of course, as this is a textbook example of
> what I proposed IoSlice for.)
Limiting the amount of `unsafe`s, but I guess we can live with that as
this is going to be short-term anyway.
>
> Another option in the meantime would be / have been to use dma_read!() and
> extract (copy) the data right away in driver_read_area(), which I'd probably
> prefer over raw pointer arithmetic.
I'd personally like to keep the current "no-copy" approach as it
implements the right reference discipline (i.e. you need a mutable
reference to update the read pointer, which cannot be done if the buffer
is read by the driver) and moving to copy semantics would open a window
of opportunity to mess with that balance further (on top of requiring
bigger code changes that will be temporary).
>
> But in any case, this can (and should) be fixed even without IoView.
>
> Besides that, nothing prevents us doing the same thing I did for gsp_write_ptr()
> in the meantime to not break out of the firmware abstraction layer.
>
>> This is guaranteed by the inability to update the CPU read pointer for
>> as long as the slices exists.
>
> Fair enough.
>
>> Unless we decide to not trust the GSP, but that would be opening a whole
>> new can of worms.
>
> I thought about this as well, and I think it's fine. The safety comment within
> the function has to justify why the device won't access the memory. If the
> device does so regardless, it's simply a bug.
>
>>> I don't want to merge any code that builds on top of this before we have sorted
>>> this out.
>>
>> If what I have written above is correct, then the fix should simply be
>> to use I/O projections to create properly-bounded references.
>
> I still don't think we need I/O projections for a reasonable fix and I also
> don't agree that we should keep UB until new features land.
I have the following (modulo missing safety comments) to fix
`driver_read_area` - does it look acceptable to you? If so I'll go
ahead and fix `driver_write_area` as well.
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index efa1aab1568f..3bddb5a2923f 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -296,24 +296,53 @@ fn driver_write_area_size(&self) -> usize {
let tx = self.gsp_write_ptr() as usize;
let rx = self.cpu_read_ptr() as usize;
+ // Pointer to the start of the GSP message queue.
+ //
// SAFETY:
- // - The `CoherentAllocation` contains exactly one object.
- // - We will only access the driver-owned part of the shared memory.
- // - Per the safety statement of the function, no concurrent access will be performed.
- let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
- let data = &gsp_mem.gspq.msgq.data;
+ // - `self.0` contains exactly one element.
+ // - `gspq.msgq.data[0]` is within the bounds of that element.
+ let data = unsafe { &raw const (*self.0.start_ptr()).gspq.msgq.data[0] };
+
+ // Safety/Panic comments to be referenced by the code below.
+ //
+ // SAFETY[1]:
+ // - `data` contains `MSGQ_NUM_PAGES` elements.
+ // - The area starting at `rx` and ending at `tx - 1` modulo `MSGQ_NUM_PAGES`,
+ // inclusive, belongs to the driver for reading and is not accessed concurrently by
+ // the GSP.
+ //
+ // PANIC[1]:
+ // - Per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`.
+ // - Per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`.
- // The area starting at `rx` and ending at `tx - 1` modulo MSGQ_NUM_PAGES, inclusive,
- // belongs to the driver for reading.
- // PANIC:
- // - per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`
- // - per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`
if rx <= tx {
// The area is contiguous.
- (&data[rx..tx], &[])
+ (
+ // SAFETY: See SAFETY[1].
+ //
+ // PANIC:
+ // - See PANIC[1].
+ // - Per the branch test, `rx <= tx`.
+ unsafe { core::slice::from_raw_parts(data.add(rx), tx - rx) },
+ &[],
+ )
} else {
// The area is discontiguous.
- (&data[rx..], &data[..tx])
+ (
+ // SAFETY: See SAFETY[1].
+ //
+ // PANIC: See PANIC[1].
+ unsafe {
+ core::slice::from_raw_parts(
+ data.add(rx),
+ num::u32_as_usize(MSGQ_NUM_PAGES) - rx,
+ )
+ },
+ // SAFETY: See SAFETY[1].
+ //
+ // PANIC: See PANIC[1].
+ unsafe { core::slice::from_raw_parts(data, tx) },
+ )
}
}
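[Editor's note: the crux of the fix above is that `&raw const` projects a field address without ever materializing a Rust reference to the whole object, which would otherwise assert validity over memory the device may be writing. A minimal user-space illustration of the distinction, with hypothetical stand-in types and no DMA involved:]

```rust
// Projecting a field pointer with `&raw const` so that no reference to
// the containing struct is ever created. With a real DMA buffer,
// creating `&*ptr` to the whole object would be UB if the device can
// write to any part of it concurrently; `&raw const` only computes an
// address. `Msgq` here is a hypothetical stand-in type.
#[repr(C)]
struct Msgq {
    header: u32,
    data: [u8; 8],
}

fn first_data_byte_ptr(p: *const Msgq) -> *const u8 {
    // This performs pointer arithmetic only: it never dereferences `p`
    // and never creates an `&Msgq` or `&[u8; 8]`.
    unsafe { &raw const (*p).data[0] }
}

fn main() {
    let m = Msgq { header: 0, data: *b"ABCDEFGH" };
    let p = first_data_byte_ptr(&m as *const Msgq);
    // Reading here is sound because we own `m` and nothing mutates it.
    assert_eq!(unsafe { *p }, b'A');
    // A bounded slice can then be built from the raw pointer, mirroring
    // `core::slice::from_raw_parts(data.add(rx), len)` in the fix.
    let tail = unsafe { std::slice::from_raw_parts(p.add(2), 3) };
    assert_eq!(tail, b"CDE");
    let _ = m.header;
}
```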
On Tue Mar 17, 2026 at 2:41 PM CET, Alexandre Courbot wrote:
> On Tue Mar 17, 2026 at 7:49 PM JST, Danilo Krummrich wrote:
>> On Tue Mar 17, 2026 at 2:55 AM CET, Alexandre Courbot wrote:
>>> We shouldn't be doing that - I think we are limited by the current
>>> CoherentAllocation API though. But IIUC this is something that I/O
>>> projections will allow us to handle properly?
>>
>> Why do we need projections to avoid UB here? driver_read_area() already even
>> peeks into the firmware abstraction layer, which is where MsgqData technically
>> belongs into (despite being trivial).
>>
>> let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
>> let data = &gsp_mem.gspq.msgq.data;
>>
>> Why do we need I/O projections to do raw pointer arithmetic where creating a
>> reference is UB?
>>
>> (Eventually, we want to use IoView of course, as this is a textbook example of
>> what I proposed IoSlice for.)
>
> Limiting the amount of `unsafe`s, but I guess we can live with that as
> this is going to be short-term anyway.
Of course it is going to be better with IoSlice, but limiting the number of
unsafe calls regardless is a bit pointless if the "safe" ones can cause
undefined behavior. :)
>> Another option in the meantime would be / have been to use dma_read!() and
>> extract (copy) the data right away in driver_read_area(), which I'd probably
>> prefer over raw pointer arithmetic.
>
> I'd personally like to keep the current "no-copy" approach as it
> implements the right reference discipline (i.e. you need a mutable
> reference to update the read pointer, which cannot be done if the buffer
> is read by the driver) and moving to copy semantics would open a window
> of opportunity to mess with that balance further (on top of requiring
> bigger code changes that will be temporary).
I don't even know if we want them to be temporary, i.e. we can copy right away
and IoSlice would still be an improvement in order to make the copy in the first
place.
Also, you say "no-copy", but that's not true, we do copy eventually. In fact,
the whole point of this patch is to copy this buffer into a KVVec.
So, why not copy it right away with dma_read!() (later replaced with an IoSlice
copy) and then process it further?
I am also very sceptical of the "holding on to the reference prevents the read
pointer update" argument. Once we have a copy, there is no need not to update
the read pointer anymore in the first place, no?
>> But in any case, this can (and should) be fixed even without IoView.
>>
>> Besides that, nothing prevents us doing the same thing I did for gsp_write_ptr()
>> in the meantime to not break out of the firmware abstraction layer.
>>
>>> This is guaranteed by the inability to update the CPU read pointer for
>>> as long as the slices exists.
>>
>> Fair enough.
>>
>>> Unless we decide to not trust the GSP, but that would be opening a whole
>>> new can of worms.
>>
>> I thought about this as well, and I think it's fine. The safety comment within
>> the function has to justify why the device won't access the memory. If the
>> device does so regardless, it's simply a bug.
>>
>>>> I don't want to merge any code that builds on top of this before we have sorted
>>>> this out.
>>>
>>> If what I have written above is correct, then the fix should simply be
>>> to use I/O projections to create properly-bounded references.
>>
>> I still don't think we need I/O projections for a reasonable fix and I also
>> don't agree that we should keep UB until new features land.
>
> I have the following (modulo missing safety comments) to fix
> `driver_read_area` - does it look acceptable to you? If so I'll go
> ahead and fix `driver_write_area` as well.
Not pretty (which is of course not on you :), but looks correct.
I still feel like we should just copy right away, as mentioned above.
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index efa1aab1568f..3bddb5a2923f 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -296,24 +296,53 @@ fn driver_write_area_size(&self) -> usize {
> let tx = self.gsp_write_ptr() as usize;
> let rx = self.cpu_read_ptr() as usize;
>
> + // Pointer to the start of the GSP message queue.
> + //
> // SAFETY:
> - // - The `CoherentAllocation` contains exactly one object.
> - // - We will only access the driver-owned part of the shared memory.
> - // - Per the safety statement of the function, no concurrent access will be performed.
> - let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
> - let data = &gsp_mem.gspq.msgq.data;
> + // - `self.0` contains exactly one element.
> + // - `gspq.msgq.data[0]` is within the bounds of that element.
> + let data = unsafe { &raw const (*self.0.start_ptr()).gspq.msgq.data[0] };
> +
> + // Safety/Panic comments to be referenced by the code below.
> + //
> + // SAFETY[1]:
> + // - `data` contains `MSGQ_NUM_PAGES` elements.
> + // - The area starting at `rx` and ending at `tx - 1` modulo `MSGQ_NUM_PAGES`,
> + // inclusive, belongs to the driver for reading and is not accessed concurrently by
> + // the GSP.
> + //
> + // PANIC[1]:
> + // - Per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`.
> + // - Per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`.
>
> - // The area starting at `rx` and ending at `tx - 1` modulo MSGQ_NUM_PAGES, inclusive,
> - // belongs to the driver for reading.
> - // PANIC:
> - // - per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`
> - // - per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`
> if rx <= tx {
> // The area is contiguous.
> - (&data[rx..tx], &[])
> + (
> + // SAFETY: See SAFETY[1].
> + //
> + // PANIC:
> + // - See PANIC[1].
> + // - Per the branch test, `rx <= tx`.
> + unsafe { core::slice::from_raw_parts(data.add(rx), tx - rx) },
> + &[],
> + )
> } else {
> // The area is discontiguous.
> - (&data[rx..], &data[..tx])
> + (
> + // SAFETY: See SAFETY[1].
> + //
> + // PANIC: See PANIC[1].
> + unsafe {
> + core::slice::from_raw_parts(
> + data.add(rx),
> + num::u32_as_usize(MSGQ_NUM_PAGES) - rx,
> + )
> + },
> + // SAFETY: See SAFETY[1].
> + //
> + // PANIC: See PANIC[1].
> + unsafe { core::slice::from_raw_parts(data, tx) },
> + )
> }
> }
On Tue Mar 17, 2026 at 11:12 PM JST, Danilo Krummrich wrote:
> On Tue Mar 17, 2026 at 2:41 PM CET, Alexandre Courbot wrote:
>> On Tue Mar 17, 2026 at 7:49 PM JST, Danilo Krummrich wrote:
>>> On Tue Mar 17, 2026 at 2:55 AM CET, Alexandre Courbot wrote:
>>>> We shouldn't be doing that - I think we are limited by the current
>>>> CoherentAllocation API though. But IIUC this is something that I/O
>>>> projections will allow us to handle properly?
>>>
>>> Why do we need projections to avoid UB here? driver_read_area() already even
>>> peeks into the firmware abstraction layer, which is where MsgqData technically
>>> belongs into (despite being trivial).
>>>
>>> let gsp_mem = &unsafe { self.0.as_slice(0, 1) }.unwrap()[0];
>>> let data = &gsp_mem.gspq.msgq.data;
>>>
>>> Why do we need I/O projections to do raw pointer arithmetic where creating a
>>> reference is UB?
>>>
>>> (Eventually, we want to use IoView of course, as this is a textbook example of
>>> what I proposed IoSlice for.)
>>
>> Limiting the amount of `unsafe`s, but I guess we can live with that as
>> this is going to be short-term anyway.
>
> Of course it is going to be better with IoSlice, but limiting the number of
> unsafe calls regardless is a bit pointless if the "safe" ones can cause
> undefined behavior. :)
No arguing. :)
>
>>> Another option in the meantime would be / have been to use dma_read!() and
>>> extract (copy) the data right away in driver_read_area(), which I'd probably
>>> prefer over raw pointer arithmetic.
>>
>> I'd personally like to keep the current "no-copy" approach as it
>> implements the right reference discipline (i.e. you need a mutable
>> reference to update the read pointer, which cannot be done if the buffer
>> is read by the driver) and moving to copy semantics would open a window
>> of opportunity to mess with that balance further (on top of requiring
>> bigger code changes that will be temporary).
>
> I don't even know if we want them to be temporary, i.e. we can copy right away
> and IoSlice would still be an improvement in order to make the copy in the first
> place.
>
> Also, you say "no-copy", but that's not true, we do copy eventually. In fact,
> the whole point of this patch is to copy this buffer into a KVVec.
>
> So, why not copy it right away with dma_read!() (later replaced with an IoSlice
> copy) and then process it further?
So as you said we are already making one copy: `receive_msg` for
instance does not return any reference to the command buffer, but an
object created from the slices returned by `driver_read_area`. What you
are suggesting would introduce another copy in the command queue
internals, for no tangible benefit as it would not allow us to
advance the read pointer sooner anyway.
Because even if we did that extra copy using `dma_read`, we would still
need to make sure the read pointer doesn't move as we are doing it, lest
we return corrupted messages - so the situation would be more or less
the same, and the `&mut self` required to advance the read pointer would
still be needed to shield us from doing it inadvertently.
In addition, we need space to hold that additional copy. It doesn't have
a fixed size, and can cover several memory pages. Does that mean we
would need to allocate a KVec or KVVec for every message received, and
drop it even before `receive_msg` returns?
The additional code to do that allocation would likely be larger than
the fix I proposed below, and we know it is very temporary and will be
replaced by mostly safe code soon. I would understand the trade-off of
an extra copy if this resulted in simpler code, but I don't think that
would be the case here.
>
> I am also very sceptical of the "holding on to the reference prevents the read
> pointer update" argument. Once we have a copy, there is no need not to update
> the read pointer anymore in the first place, no?
Correct, and that's exactly what `receive_msg` does - as soon as it has
created the object to return from the slices, it advances the read
pointer. So the read pointer is already updated when `receive_msg`
returns, and that "holding on to the reference" thing is purely for
internal safety.
Introducing another copy won't allow us to update it sooner - copying
the bytes into a KVec is not faster than processing the slices' data
into the final returned value. Actually it may even be slower if that
involves a dynamic allocation.
On Fri, Feb 27, 2026 at 09:32:11PM +0900, Eliot Courtney wrote:
> Add general `flush_into_vec` function. Add `flush_into_kvvec`
> convenience wrapper alongside the existing `flush_into_kvec` function.
> This is generally useful but immediately used for e.g. holding RM
> control payloads, which can be large (~>=20 KiB).
>
> Signed-off-by: Eliot Courtney <ecourtney@nvidia.com>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
thanks,
--
Joel Fernandes
> ---
> drivers/gpu/nova-core/sbuffer.rs | 31 +++++++++++++++++++++++++++----
> 1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/sbuffer.rs b/drivers/gpu/nova-core/sbuffer.rs
> index 3a41d224c77a..38f8a8426521 100644
> --- a/drivers/gpu/nova-core/sbuffer.rs
> +++ b/drivers/gpu/nova-core/sbuffer.rs
> @@ -2,7 +2,13 @@
>
> use core::ops::Deref;
>
> -use kernel::prelude::*;
> +use kernel::{
> + alloc::{
> + Allocator,
> + KVec, //
> + },
> + prelude::*, //
> +};
>
> /// A buffer abstraction for discontiguous byte slices.
> ///
> @@ -162,11 +168,14 @@ pub(crate) fn read_exact(&mut self, mut dst: &mut [u8]) -> Result {
> Ok(())
> }
>
> - /// Read all the remaining data into a [`KVec`].
> + /// Read all the remaining data into a [`Vec`] with the given allocator.
> ///
> /// `self` will be empty after this operation.
> - pub(crate) fn flush_into_kvec(&mut self, flags: kernel::alloc::Flags) -> Result<KVec<u8>> {
> - let mut buf = KVec::<u8>::new();
> + pub(crate) fn flush_into_vec<A: Allocator>(
> + &mut self,
> + flags: kernel::alloc::Flags,
> + ) -> Result<Vec<u8, A>> {
> + let mut buf = Vec::<u8, A>::new();
>
> if let Some(slice) = core::mem::take(&mut self.cur_slice) {
> buf.extend_from_slice(slice, flags)?;
> @@ -177,6 +186,20 @@ pub(crate) fn flush_into_kvec(&mut self, flags: kernel::alloc::Flags) -> Result<
>
> Ok(buf)
> }
> +
> + /// Read all the remaining data into a [`KVec`].
> + ///
> + /// `self` will be empty after this operation.
> + pub(crate) fn flush_into_kvec(&mut self, flags: kernel::alloc::Flags) -> Result<KVec<u8>> {
> + self.flush_into_vec(flags)
> + }
> +
> + /// Read all the remaining data into a [`KVVec`].
> + ///
> + /// `self` will be empty after this operation.
> + pub(crate) fn flush_into_kvvec(&mut self, flags: kernel::alloc::Flags) -> Result<KVVec<u8>> {
> + self.flush_into_vec(flags)
> + }
> }
>
> /// Provides a way to get mutable slices of data to write into.
>
> --
> 2.53.0
>