[v5] gpu: nova-core: gsp: fix undefined behavior in command queue code

[PATCH v5] gpu: nova-core: gsp: fix undefined behavior in command queue code

Posted by Alexandre Courbot 2 days, 8 hours ago

`driver_read_area` and `driver_write_area` are internal methods that
return slices containing the area of the command queue buffer that the
driver has exclusive read or write access to, respectively.

While their returned value is correct and safe to use, internally they
temporarily create a reference to the whole command-buffer slice,
including GSP-owned regions. These regions can change without notice,
and thus creating a slice to them, even if never accessed, is undefined
behavior.

Fix this by making these methods create slices to valid regions only.

Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
Reported-by: Danilo Krummrich <dakr@kernel.org>
Closes: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
Since we are still getting `build_error`s on some configurations, this
revision reverts to building raw slices from computed ending indices.
---
Changes in v5:
- Eschew pointer projections with runtime-computed indices to avoid
  spurious `build_error`s.
- Drop `Reviewed-by` tags since the code has changed significantly.
- Link to v4: https://patch.msgid.link/20260401-cmdq-ub-fix-v4-0-a9a9cf982485@nvidia.com

Changes in v4:
- Make some methods providing the `ptr_project!` invariants inline.
- Use code paths that preserve the invariants `ptr_project!` depends on
  more obviously to fix these testbot build failures:
  - https://lore.kernel.org/all/202603280326.ucDKVaf2-lkp@intel.com/
  - https://lore.kernel.org/all/202603281331.1ESuqgfz-lkp@intel.com/
- Improve safety comment when creating the mutable slices (thanks Danilo!).
- Link to v3: https://patch.msgid.link/20260326-cmdq-ub-fix-v3-1-96af2148ca5c@nvidia.com

Changes in v3:
- Rebase on top of latest `drm-rust-next` (with `Coherent` patches).
- Use pointer projections. (thanks Gary!)
- Link to v2: https://patch.msgid.link/20260323-cmdq-ub-fix-v2-1-77d1213c3f7f@nvidia.com

Changes in v2:
- Use `u32_as_usize` consistently.
- Reduce the number of `unsafe` blocks by computing the end offset of
  the returned slices and creating them at the end, in one step.
- Take advantage of the fact that both slices have the same start index
  regardless of the branch chosen.
- Improve safety comments.
- Link to v1: https://patch.msgid.link/20260319-cmdq-ub-fix-v1-1-0f9f6e8f3ce3@nvidia.com
---
 drivers/gpu/nova-core/gsp/cmdq.rs | 116 +++++++++++++++++++++++---------------
 1 file changed, 69 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 2224896ccc89..569bb1a2501c 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -17,6 +17,7 @@
     },
     new_mutex,
     prelude::*,
+    ptr,
     sync::{
         aref::ARef,
         Mutex, //
@@ -255,37 +256,46 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
     /// As the message queue is a circular buffer, the region may be discontiguous in memory. In
     /// that case the second slice will have a non-zero length.
     fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
-        let tx = self.cpu_write_ptr() as usize;
-        let rx = self.gsp_read_ptr() as usize;
+        let tx = self.cpu_write_ptr();
+        let rx = self.gsp_read_ptr();
+
+        // Pointer to the first entry of the CPU message queue.
+        let data = ptr::project!(mut self.0.as_mut_ptr(), .cpuq.msgq.data[0]);
+
+        let (tail_end, wrap_end) = if rx == 0 {
+            // The write area is non-wrapping, and stops at the second-to-last entry of the command
+            // queue (to leave the last one empty).
+            (MSGQ_NUM_PAGES - 1, 0)
+        } else if rx <= tx {
+            // The write area wraps and continues until `rx - 1`.
+            (MSGQ_NUM_PAGES, rx - 1)
+        } else {
+            // The write area doesn't wrap and stops at `rx - 1`.
+            (rx - 1, 0)
+        };
 
         // SAFETY:
-        // - We will only access the driver-owned part of the shared memory.
-        // - Per the safety statement of the function, no concurrent access will be performed.
-        let gsp_mem = unsafe { &mut *self.0.as_mut() };
-        // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
-        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
-
-        // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
-        // belongs to the driver for writing.
-
-        if rx == 0 {
-            // Since `rx` is zero, leave an empty slot at end of the buffer.
-            let last = after_tx.len() - 1;
-            (&mut after_tx[..last], &mut [])
-        } else if rx <= tx {
-            // The area is discontiguous and we leave an empty slot before `rx`.
-            // PANIC:
-            // - The index `rx - 1` is non-negative because `rx != 0` in this branch.
-            // - The index does not exceed `before_tx.len()` (which equals `tx`) because
-            //   `rx <= tx` in this branch.
-            (after_tx, &mut before_tx[..(rx - 1)])
-        } else {
-            // The area is contiguous and we leave an empty slot before `rx`.
-            // PANIC:
-            // - The index `rx - tx - 1` is non-negative because `rx > tx` in this branch.
-            // - The index does not exceed `after_tx.len()` (which is `MSGQ_NUM_PAGES - tx`)
-            //   because `rx < MSGQ_NUM_PAGES` by the `gsp_read_ptr` invariant.
-            (&mut after_tx[..(rx - tx - 1)], &mut [])
+        // - `data` was created from a valid pointer, and `rx` and `tx` are in the
+        //   `0..MSGQ_NUM_PAGES` range per the invariants of `cpu_write_ptr` and `gsp_read_ptr`,
+        //   thus the created slices are valid.
+        // - The area starting at `tx` and ending at `rx - 2` modulo `MSGQ_NUM_PAGES`,
+        //   inclusive, belongs to the driver for writing and is not accessed concurrently by
+        //   the GSP.
+        // - The caller holds a reference to `self` for as long as the returned slices are live,
+        //   meaning the CPU write pointer cannot be advanced and thus that the returned area
+        //   remains exclusive to the CPU for the duration of the slices.
+        // - The created slices point to non-overlapping sub-ranges of `data` in all
+        //   branches (in the `rx <= tx` case, the second slice ends at `rx - 1` which is strictly
+        //   less than `tx` where the first slice starts; in the other cases the second slice is
+        //   empty), so creating two `&mut` references from them does not violate aliasing rules.
+        unsafe {
+            (
+                core::slice::from_raw_parts_mut(
+                    data.add(num::u32_as_usize(tx)),
+                    num::u32_as_usize(tail_end - tx),
+                ),
+                core::slice::from_raw_parts_mut(data, num::u32_as_usize(wrap_end)),
+            )
         }
     }
 
@@ -308,26 +318,38 @@ fn driver_write_area_size(&self) -> usize {
     /// As the message queue is a circular buffer, the region may be discontiguous in memory. In
     /// that case the second slice will have a non-zero length.
     fn driver_read_area(&self) -> (&[[u8; GSP_PAGE_SIZE]], &[[u8; GSP_PAGE_SIZE]]) {
-        let tx = self.gsp_write_ptr() as usize;
-        let rx = self.cpu_read_ptr() as usize;
+        let tx = self.gsp_write_ptr();
+        let rx = self.cpu_read_ptr();
+
+        // Pointer to the first entry of the GSP message queue.
+        let data = ptr::project!(self.0.as_ptr(), .gspq.msgq.data[0]);
+
+        let (tail_end, wrap_end) = if rx <= tx {
+            // Read area is non-wrapping and stops right before `tx`.
+            (tx, 0)
+        } else {
+            // Read area is wrapping and stops right before `tx`.
+            (MSGQ_NUM_PAGES, tx)
+        };
 
         // SAFETY:
-        // - We will only access the driver-owned part of the shared memory.
-        // - Per the safety statement of the function, no concurrent access will be performed.
-        let gsp_mem = unsafe { &*self.0.as_ptr() };
-        let data = &gsp_mem.gspq.msgq.data;
-
-        // The area starting at `rx` and ending at `tx - 1` modulo MSGQ_NUM_PAGES, inclusive,
-        // belongs to the driver for reading.
-        // PANIC:
-        // - per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`
-        // - per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`
-        if rx <= tx {
-            // The area is contiguous.
-            (&data[rx..tx], &[])
-        } else {
-            // The area is discontiguous.
-            (&data[rx..], &data[..tx])
+        // - `data` was created from a valid pointer, and `rx` and `tx` are in the
+        //   `0..MSGQ_NUM_PAGES` range per the invariants of `gsp_write_ptr` and `cpu_read_ptr`,
+        //   thus the created slices are valid.
+        // - The area starting at `rx` and ending at `tx - 1` modulo `MSGQ_NUM_PAGES`,
+        //   inclusive, belongs to the driver for reading and is not accessed concurrently by
+        //   the GSP.
+        // - The caller holds a reference to `self` for as long as the returned slices are live,
+        //   meaning the CPU read pointer cannot be advanced and thus that the returned area
+        //   remains exclusive to the CPU for the duration of the slices.
+        unsafe {
+            (
+                core::slice::from_raw_parts(
+                    data.add(num::u32_as_usize(rx)),
+                    num::u32_as_usize(tail_end - rx),
+                ),
+                core::slice::from_raw_parts(data, num::u32_as_usize(wrap_end)),
+            )
         }
     }
 

---
base-commit: 7c50d748b4a635bc39802ea3f6b120e66b1b9067
change-id: 20260319-cmdq-ub-fix-d57b09a745b9

Best regards,
--  
Alexandre Courbot <acourbot@nvidia.com>
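
For readers following along outside the kernel tree, the write-area logic of the patch can be modelled in a standalone sketch. Everything here is an assumption made for illustration: `NUM_PAGES`, `PAGE_SIZE`, `Page`, `write_area_ends` and `write_area` are hypothetical names, plain `as usize` casts stand in for `num::u32_as_usize`, and an ordinary array stands in for the shared GSP memory. The point it tries to show is that the slices are built from a raw pointer to the first entry, so no reference to the whole buffer is ever created.

```rust
/// Number of pages in the model ring (hypothetical value, for illustration only).
const NUM_PAGES: u32 = 8;
/// Size of one page in the model ring (hypothetical value).
const PAGE_SIZE: usize = 16;

type Page = [u8; PAGE_SIZE];

/// Returns `(tail_end, wrap_end)`: the first slice covers `tx..tail_end` and the
/// second covers `0..wrap_end`. One slot before `rx` is always left empty.
fn write_area_ends(tx: u32, rx: u32) -> (u32, u32) {
    if rx == 0 {
        // Non-wrapping; the last entry of the ring stays empty.
        (NUM_PAGES - 1, 0)
    } else if rx <= tx {
        // Wrapping; the second slice stops right before `rx - 1`.
        (NUM_PAGES, rx - 1)
    } else {
        // Non-wrapping; the first slice stops right before `rx - 1`.
        (rx - 1, 0)
    }
}

/// Builds the two writable slices directly from a raw pointer to the first page.
///
/// # Safety
///
/// - `data` must point to `NUM_PAGES` contiguous, initialized pages.
/// - `tx` and `rx` must both be `< NUM_PAGES`.
/// - Nothing else may access pages `tx..=rx - 2` (modulo `NUM_PAGES`) while the
///   returned slices are live.
unsafe fn write_area<'a>(data: *mut Page, tx: u32, rx: u32) -> (&'a mut [Page], &'a mut [Page]) {
    let (tail_end, wrap_end) = write_area_ends(tx, rx);

    // SAFETY: per the function's safety requirements, both sub-ranges are in bounds
    // and exclusively owned. They never overlap: the second slice is either empty or
    // ends at `rx - 1`, which is strictly less than `tx` in the wrapping branch.
    unsafe {
        (
            core::slice::from_raw_parts_mut(data.add(tx as usize), (tail_end - tx) as usize),
            core::slice::from_raw_parts_mut(data, wrap_end as usize),
        )
    }
}
```

With `tx = 5` and `rx = 2` in this model, the first slice covers pages 5 to 7 and the second covers page 0 only, leaving page 1 as the empty slot before `rx`.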

Re: [PATCH v5] gpu: nova-core: gsp: fix undefined behavior in command queue code

Posted by Danilo Krummrich 22 hours ago

On Sat Apr 4, 2026 at 7:04 AM CEST, Alexandre Courbot wrote:
> `driver_read_area` and `driver_write_area` are internal methods that
> return slices containing the area of the command queue buffer that the
> driver has exclusive read or write access, respectively.
>
> While their returned value is correct and safe to use, internally they
> temporarily create a reference to the whole command-buffer slice,
> including GSP-owned regions. These regions can change without notice,
> and thus creating a slice to them, even if never accessed, is undefined
> behavior.
>
> Fix this by making these methods create slices to valid regions only.
>
> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
> Reported-by: Danilo Krummrich <dakr@kernel.org>
> Closes: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>

Applied to drm-rust-next, thanks!

Re: [PATCH v5] gpu: nova-core: gsp: fix undefined behavior in command queue code

Posted by Gary Guo 2 days ago

On Sat Apr 4, 2026 at 6:04 AM BST, Alexandre Courbot wrote:
> `driver_read_area` and `driver_write_area` are internal methods that
> return slices containing the area of the command queue buffer that the
> driver has exclusive read or write access, respectively.
>
> While their returned value is correct and safe to use, internally they
> temporarily create a reference to the whole command-buffer slice,
> including GSP-owned regions. These regions can change without notice,
> and thus creating a slice to them, even if never accessed, is undefined
> behavior.
>
> Fix this by making these methods create slices to valid regions only.
>
> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
> Reported-by: Danilo Krummrich <dakr@kernel.org>
> Closes: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
> Since we are still getting `build_error`s on some configurations, this
> revision reverts to building raw slices from computed ending indices.
> ---
> Changes in v5:
> - Eschew pointer projections with runtime-computed indices to avoid
>   spurious `build_error`s.
> - Drop `Reviewed-by` tags since the code has changed significantly.
> - Link to v4: https://patch.msgid.link/20260401-cmdq-ub-fix-v4-0-a9a9cf982485@nvidia.com
>
> Changes in v4:
> - Make some methods providing the `ptr_project!` invariants inline.
> - Use code paths that preserve the invariants `ptr_project!` depends on
>   more obviously to fix these testbot build failures:
>   - https://lore.kernel.org/all/202603280326.ucDKVaf2-lkp@intel.com/
>   - https://lore.kernel.org/all/202603281331.1ESuqgfz-lkp@intel.com/
> - Improve safety comment when creating the mutable slices (thanks Danilo!).
> - Link to v3: https://patch.msgid.link/20260326-cmdq-ub-fix-v3-1-96af2148ca5c@nvidia.com
>
> Changes in v3:
> - Rebase on top of latest `drm-rust-next` (with `Coherent` patches).
> - Use pointer projections. (thanks Gary!)
> - Link to v2: https://patch.msgid.link/20260323-cmdq-ub-fix-v2-1-77d1213c3f7f@nvidia.com
>
> Changes in v2:
> - Use `u32_as_usize` consistently.
> - Reduce the number of `unsafe` blocks by computing the end offset of
>   the returned slices and creating them at the end, in one step.
> - Take advantage of the fact that both slices have the same start index
>   regardless of the branch chosen.
> - Improve safety comments.
> - Link to v1: https://patch.msgid.link/20260319-cmdq-ub-fix-v1-1-0f9f6e8f3ce3@nvidia.com
> ---
>  drivers/gpu/nova-core/gsp/cmdq.rs | 116 +++++++++++++++++++++++---------------
>  1 file changed, 69 insertions(+), 47 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index 2224896ccc89..569bb1a2501c 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -17,6 +17,7 @@
>      },
>      new_mutex,
>      prelude::*,
> +    ptr,
>      sync::{
>          aref::ARef,
>          Mutex, //
> @@ -255,37 +256,46 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>      /// As the message queue is a circular buffer, the region may be discontiguous in memory. In
>      /// that case the second slice will have a non-zero length.
>      fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
> -        let tx = self.cpu_write_ptr() as usize;
> -        let rx = self.gsp_read_ptr() as usize;
> +        let tx = self.cpu_write_ptr();
> +        let rx = self.gsp_read_ptr();
> +
> +        // Pointer to the first entry of the CPU message queue.
> +        let data = ptr::project!(mut self.0.as_mut_ptr(), .cpuq.msgq.data[0]);
> +
> +        let (tail_end, wrap_end) = if rx == 0 {
> +            // The write area is non-wrapping, and stops at the second-to-last entry of the command
> +            // queue (to leave the last one empty).
> +            (MSGQ_NUM_PAGES - 1, 0)
> +        } else if rx <= tx {
> +            // The write area wraps and continues until `rx - 1`.
> +            (MSGQ_NUM_PAGES, rx - 1)
> +        } else {
> +            // The write area doesn't wrap and stops at `rx - 1`.
> +            (rx - 1, 0)
> +        };
>  
>          // SAFETY:
> -        // - We will only access the driver-owned part of the shared memory.
> -        // - Per the safety statement of the function, no concurrent access will be performed.
> -        let gsp_mem = unsafe { &mut *self.0.as_mut() };
> -        // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
> -        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
> -
> -        // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
> -        // belongs to the driver for writing.
> -
> -        if rx == 0 {
> -            // Since `rx` is zero, leave an empty slot at end of the buffer.
> -            let last = after_tx.len() - 1;
> -            (&mut after_tx[..last], &mut [])
> -        } else if rx <= tx {
> -            // The area is discontiguous and we leave an empty slot before `rx`.
> -            // PANIC:
> -            // - The index `rx - 1` is non-negative because `rx != 0` in this branch.
> -            // - The index does not exceed `before_tx.len()` (which equals `tx`) because
> -            //   `rx <= tx` in this branch.
> -            (after_tx, &mut before_tx[..(rx - 1)])
> -        } else {
> -            // The area is contiguous and we leave an empty slot before `rx`.
> -            // PANIC:
> -            // - The index `rx - tx - 1` is non-negative because `rx > tx` in this branch.
> -            // - The index does not exceed `after_tx.len()` (which is `MSGQ_NUM_PAGES - tx`)
> -            //   because `rx < MSGQ_NUM_PAGES` by the `gsp_read_ptr` invariant.
> -            (&mut after_tx[..(rx - tx - 1)], &mut [])
> +        // - `data` was created from a valid pointer, and `rx` and `tx` are in the
> +        //   `0..MSGQ_NUM_PAGES` range per the invariants of `cpu_write_ptr` and `gsp_read_ptr`,
> +        //   thus the created slices are valid.
> +        // - The area starting at `tx` and ending at `rx - 2` modulo `MSGQ_NUM_PAGES`,
> +        //   inclusive, belongs to the driver for writing and is not accessed concurrently by
> +        //   the GSP.
> +        // - The caller holds a reference to `self` for as long as the returned slices are live,
> +        //   meaning the CPU write pointer cannot be advanced and thus that the returned area
> +        //   remains exclusive to the CPU for the duration of the slices.
> +        // - The created slices point to non-overlapping sub-ranges of `data` in all
> +        //   branches (in the `rx <= tx` case, the second slice ends at `rx - 1` which is strictly
> +        //   less than `tx` where the first slice starts; in the other cases the second slice is
> +        //   empty), so creating two `&mut` references from them does not violate aliasing rules.
> +        unsafe {
> +            (
> +                core::slice::from_raw_parts_mut(
> +                    data.add(num::u32_as_usize(tx)),

The code would be simpler if `num::u32_as_usize` were applied directly to `rx`
and `tx`. But other than that the code looks fine to me.

Reviewed-by: Gary Guo <gary@garyguo.net>

> +                    num::u32_as_usize(tail_end - tx),
> +                ),
> +                core::slice::from_raw_parts_mut(data, num::u32_as_usize(wrap_end)),
> +            )
>          }
>      }
>  
> @@ -308,26 +318,38 @@ fn driver_write_area_size(&self) -> usize {
>      /// As the message queue is a circular buffer, the region may be discontiguous in memory. In
>      /// that case the second slice will have a non-zero length.
>      fn driver_read_area(&self) -> (&[[u8; GSP_PAGE_SIZE]], &[[u8; GSP_PAGE_SIZE]]) {
> -        let tx = self.gsp_write_ptr() as usize;
> -        let rx = self.cpu_read_ptr() as usize;
> +        let tx = self.gsp_write_ptr();
> +        let rx = self.cpu_read_ptr();
> +
> +        // Pointer to the first entry of the GSP message queue.
> +        let data = ptr::project!(self.0.as_ptr(), .gspq.msgq.data[0]);
> +
> +        let (tail_end, wrap_end) = if rx <= tx {
> +            // Read area is non-wrapping and stops right before `tx`.
> +            (tx, 0)
> +        } else {
> +            // Read area is wrapping and stops right before `tx`.
> +            (MSGQ_NUM_PAGES, tx)
> +        };
>  
>          // SAFETY:
> -        // - We will only access the driver-owned part of the shared memory.
> -        // - Per the safety statement of the function, no concurrent access will be performed.
> -        let gsp_mem = unsafe { &*self.0.as_ptr() };
> -        let data = &gsp_mem.gspq.msgq.data;
> -
> -        // The area starting at `rx` and ending at `tx - 1` modulo MSGQ_NUM_PAGES, inclusive,
> -        // belongs to the driver for reading.
> -        // PANIC:
> -        // - per the invariant of `cpu_read_ptr`, `rx < MSGQ_NUM_PAGES`
> -        // - per the invariant of `gsp_write_ptr`, `tx < MSGQ_NUM_PAGES`
> -        if rx <= tx {
> -            // The area is contiguous.
> -            (&data[rx..tx], &[])
> -        } else {
> -            // The area is discontiguous.
> -            (&data[rx..], &data[..tx])
> +        // - `data` was created from a valid pointer, and `rx` and `tx` are in the
> +        //   `0..MSGQ_NUM_PAGES` range per the invariants of `gsp_write_ptr` and `cpu_read_ptr`,
> +        //   thus the created slices are valid.
> +        // - The area starting at `rx` and ending at `tx - 1` modulo `MSGQ_NUM_PAGES`,
> +        //   inclusive, belongs to the driver for reading and is not accessed concurrently by
> +        //   the GSP.
> +        // - The caller holds a reference to `self` for as long as the returned slices are live,
> +        //   meaning the CPU read pointer cannot be advanced and thus that the returned area
> +        //   remains exclusive to the CPU for the duration of the slices.
> +        unsafe {
> +            (
> +                core::slice::from_raw_parts(
> +                    data.add(num::u32_as_usize(rx)),
> +                    num::u32_as_usize(tail_end - rx),
> +                ),
> +                core::slice::from_raw_parts(data, num::u32_as_usize(wrap_end)),

Same here.

Best,
Gary

> +            )
>          }
>      }
>  
>
> ---
> base-commit: 7c50d748b4a635bc39802ea3f6b120e66b1b9067
> change-id: 20260319-cmdq-ub-fix-d57b09a745b9
>
> Best regards,
> --  
> Alexandre Courbot <acourbot@nvidia.com>

Re: [PATCH v5] gpu: nova-core: gsp: fix undefined behavior in command queue code

Posted by Alexandre Courbot 2 days ago

On Sat Apr 4, 2026 at 10:06 PM JST, Gary Guo wrote:
> On Sat Apr 4, 2026 at 6:04 AM BST, Alexandre Courbot wrote:
>> `driver_read_area` and `driver_write_area` are internal methods that
>> return slices containing the area of the command queue buffer that the
>> driver has exclusive read or write access, respectively.
>>
>> While their returned value is correct and safe to use, internally they
>> temporarily create a reference to the whole command-buffer slice,
>> including GSP-owned regions. These regions can change without notice,
>> and thus creating a slice to them, even if never accessed, is undefined
>> behavior.
>>
>> Fix this by making these methods create slices to valid regions only.
>>
>> Fixes: 75f6b1de8133 ("gpu: nova-core: gsp: Add GSP command queue bindings and handling")
>> Reported-by: Danilo Krummrich <dakr@kernel.org>
>> Closes: https://lore.kernel.org/all/DH47AVPEKN06.3BERUSJIB4M1R@kernel.org/
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>> Since we are still getting `build_error`s on some configurations, this
>> revision reverts to building raw slices from computed ending indices.
>> ---
>> Changes in v5:
>> - Eschew pointer projections with runtime-computed indices to avoid
>>   spurious `build_error`s.
>> - Drop `Reviewed-by` tags since the code has changed significantly.
>> - Link to v4: https://patch.msgid.link/20260401-cmdq-ub-fix-v4-0-a9a9cf982485@nvidia.com
>>
>> Changes in v4:
>> - Make some methods providing the `ptr_project!` invariants inline.
>> - Use code paths that preserve the invariants `ptr_project!` depends on
>>   more obviously to fix these testbot build failures:
>>   - https://lore.kernel.org/all/202603280326.ucDKVaf2-lkp@intel.com/
>>   - https://lore.kernel.org/all/202603281331.1ESuqgfz-lkp@intel.com/
>> - Improve safety comment when creating the mutable slices (thanks Danilo!).
>> - Link to v3: https://patch.msgid.link/20260326-cmdq-ub-fix-v3-1-96af2148ca5c@nvidia.com
>>
>> Changes in v3:
>> - Rebase on top of latest `drm-rust-next` (with `Coherent` patches).
>> - Use pointer projections. (thanks Gary!)
>> - Link to v2: https://patch.msgid.link/20260323-cmdq-ub-fix-v2-1-77d1213c3f7f@nvidia.com
>>
>> Changes in v2:
>> - Use `u32_as_usize` consistently.
>> - Reduce the number of `unsafe` blocks by computing the end offset of
>>   the returned slices and creating them at the end, in one step.
>> - Take advantage of the fact that both slices have the same start index
>>   regardless of the branch chosen.
>> - Improve safety comments.
>> - Link to v1: https://patch.msgid.link/20260319-cmdq-ub-fix-v1-1-0f9f6e8f3ce3@nvidia.com
>> ---
>>  drivers/gpu/nova-core/gsp/cmdq.rs | 116 +++++++++++++++++++++++---------------
>>  1 file changed, 69 insertions(+), 47 deletions(-)
>>
>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>> index 2224896ccc89..569bb1a2501c 100644
>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>> @@ -17,6 +17,7 @@
>>      },
>>      new_mutex,
>>      prelude::*,
>> +    ptr,
>>      sync::{
>>          aref::ARef,
>>          Mutex, //
>> @@ -255,37 +256,46 @@ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
>>      /// As the message queue is a circular buffer, the region may be discontiguous in memory. In
>>      /// that case the second slice will have a non-zero length.
>>      fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
>> -        let tx = self.cpu_write_ptr() as usize;
>> -        let rx = self.gsp_read_ptr() as usize;
>> +        let tx = self.cpu_write_ptr();
>> +        let rx = self.gsp_read_ptr();
>> +
>> +        // Pointer to the first entry of the CPU message queue.
>> +        let data = ptr::project!(mut self.0.as_mut_ptr(), .cpuq.msgq.data[0]);
>> +
>> +        let (tail_end, wrap_end) = if rx == 0 {
>> +            // The write area is non-wrapping, and stops at the second-to-last entry of the command
>> +            // queue (to leave the last one empty).
>> +            (MSGQ_NUM_PAGES - 1, 0)
>> +        } else if rx <= tx {
>> +            // The write area wraps and continues until `rx - 1`.
>> +            (MSGQ_NUM_PAGES, rx - 1)
>> +        } else {
>> +            // The write area doesn't wrap and stops at `rx - 1`.
>> +            (rx - 1, 0)
>> +        };
>>  
>>          // SAFETY:
>> -        // - We will only access the driver-owned part of the shared memory.
>> -        // - Per the safety statement of the function, no concurrent access will be performed.
>> -        let gsp_mem = unsafe { &mut *self.0.as_mut() };
>> -        // PANIC: per the invariant of `cpu_write_ptr`, `tx` is `< MSGQ_NUM_PAGES`.
>> -        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
>> -
>> -        // The area starting at `tx` and ending at `rx - 2` modulo MSGQ_NUM_PAGES, inclusive,
>> -        // belongs to the driver for writing.
>> -
>> -        if rx == 0 {
>> -            // Since `rx` is zero, leave an empty slot at end of the buffer.
>> -            let last = after_tx.len() - 1;
>> -            (&mut after_tx[..last], &mut [])
>> -        } else if rx <= tx {
>> -            // The area is discontiguous and we leave an empty slot before `rx`.
>> -            // PANIC:
>> -            // - The index `rx - 1` is non-negative because `rx != 0` in this branch.
>> -            // - The index does not exceed `before_tx.len()` (which equals `tx`) because
>> -            //   `rx <= tx` in this branch.
>> -            (after_tx, &mut before_tx[..(rx - 1)])
>> -        } else {
>> -            // The area is contiguous and we leave an empty slot before `rx`.
>> -            // PANIC:
>> -            // - The index `rx - tx - 1` is non-negative because `rx > tx` in this branch.
>> -            // - The index does not exceed `after_tx.len()` (which is `MSGQ_NUM_PAGES - tx`)
>> -            //   because `rx < MSGQ_NUM_PAGES` by the `gsp_read_ptr` invariant.
>> -            (&mut after_tx[..(rx - tx - 1)], &mut [])
>> +        // - `data` was created from a valid pointer, and `rx` and `tx` are in the
>> +        //   `0..MSGQ_NUM_PAGES` range per the invariants of `cpu_write_ptr` and `gsp_read_ptr`,
>> +        //   thus the created slices are valid.
>> +        // - The area starting at `tx` and ending at `rx - 2` modulo `MSGQ_NUM_PAGES`,
>> +        //   inclusive, belongs to the driver for writing and is not accessed concurrently by
>> +        //   the GSP.
>> +        // - The caller holds a reference to `self` for as long as the returned slices are live,
>> +        //   meaning the CPU write pointer cannot be advanced and thus that the returned area
>> +        //   remains exclusive to the CPU for the duration of the slices.
>> +        // - The created slices point to non-overlapping sub-ranges of `data` in all
>> +        //   branches (in the `rx <= tx` case, the second slice ends at `rx - 1` which is strictly
>> +        //   less than `tx` where the first slice starts; in the other cases the second slice is
>> +        //   empty), so creating two `&mut` references from them does not violate aliasing rules.
>> +        unsafe {
>> +            (
>> +                core::slice::from_raw_parts_mut(
>> +                    data.add(num::u32_as_usize(tx)),
>
> The code would be simpler if these `num::u32_as_usize` are just applied to `rx`
> and `tx`. But other than that the code looks fine to me.

That's what I did initially - but the problem then becomes that
`MSGQ_NUM_PAGES`, which is also a `u32`, needs to be converted to
`usize` as well, resulting in more conversions. I figured that
working with `u32`s until the very end was the least convoluted solution.

One could argue that `MSGQ_NUM_PAGES` should be a `usize` as well, but
that would require more changes to `cpu_write_ptr` and friends.
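
The trade-off discussed above can be made concrete with a small standalone model of the read-area computation. This is only a sketch under stated assumptions: the value of `MSGQ_NUM_PAGES` is made up, `u32_as_usize` is a local stand-in for the driver's `num::u32_as_usize`, and `read_area_late` / `read_area_early` are hypothetical names. Each function returns `(start, first_len, second_len)` instead of real slices.

```rust
/// Hypothetical ring size; the driver's constant is a `u32` as well.
const MSGQ_NUM_PAGES: u32 = 8;

/// Local stand-in for `num::u32_as_usize` (assumption: a lossless widening
/// conversion, which holds on 32-bit and 64-bit targets).
fn u32_as_usize(v: u32) -> usize {
    v as usize
}

/// Variant used in v5: stay in `u32` and convert only the values that are
/// handed to the slice constructors.
fn read_area_late(tx: u32, rx: u32) -> (usize, usize, usize) {
    let (tail_end, wrap_end) = if rx <= tx {
        (tx, 0)
    } else {
        (MSGQ_NUM_PAGES, tx)
    };

    (u32_as_usize(rx), u32_as_usize(tail_end - rx), u32_as_usize(wrap_end))
}

/// Suggested alternative: convert `rx` and `tx` up front. The `u32` constant
/// then needs its own conversion wherever it takes part in the arithmetic.
fn read_area_early(tx: u32, rx: u32) -> (usize, usize, usize) {
    let (tx, rx) = (u32_as_usize(tx), u32_as_usize(rx));

    let (tail_end, wrap_end) = if rx <= tx {
        (tx, 0)
    } else {
        (u32_as_usize(MSGQ_NUM_PAGES), tx)
    };

    (rx, tail_end - rx, wrap_end)
}
```

Both variants compute the same offsets; they differ only in where the conversions sit. In the write-area function the constant appears in two branches, so converting early would presumably add one conversion per branch there.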