Introduce core infrastructure for handling GSP command and message
queues in the nova-core driver. The command queue system enables
bidirectional communication between the host driver and GSP firmware
through a remote message-passing interface.
The interface is based on passing serialised data structures over a ring
buffer with separate transmit and receive queues. Commands are sent by
writing to the CPU transmit queue and waiting for completion via the
receive queue.
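The transmit-side accounting can be sketched as a standalone miniature
(names and sizes hypothetical, not the driver's actual types): one page
is always kept in reserve so that an equal read and write pointer
unambiguously means "empty" rather than "full".

```rust
// Miniature of the ring-buffer free-space rule used by the command queue:
// `tx` is the CPU write pointer, `rx` the GSP read pointer, and one page
// is always left unused to disambiguate the full and empty states.
const NUM_PAGES: usize = 8; // stand-in for the real queue page count

/// Number of pages the driver may currently write into the ring.
fn free_pages(tx: usize, rx: usize) -> usize {
    if rx <= tx {
        // Writable area wraps: from `tx` to the end of the ring, then
        // from the start of the ring up to `rx`, minus the reserved page.
        NUM_PAGES - tx + rx - 1
    } else {
        // Writable area is contiguous: from `tx` up to `rx`, minus one.
        rx - tx - 1
    }
}

fn main() {
    assert_eq!(free_pages(0, 0), NUM_PAGES - 1); // empty ring
    assert_eq!(free_pages(5, 2), 4);             // wrapped writable area
    assert_eq!(free_pages(2, 5), 2);             // contiguous writable area
    println!("ok");
}
```

Reserving one page costs a little capacity but avoids needing a separate
element counter shared with the firmware.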
To ensure safety, a mutable or immutable reference (depending on
whether the operation is a send or a receive) is taken on the command
queue when allocating the message to be written or read. This ensures
the message memory remains valid and the command queue can't be
mutated whilst an operation is in progress.
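The borrow discipline described above can be illustrated with a minimal
sketch (all names hypothetical, not the driver's actual API): because
sending takes a mutable borrow of the queue for the whole operation, the
borrow checker statically rules out any other access to the queue while
a message buffer is live.

```rust
// Hypothetical miniature of the queue's borrow-based safety: `send`
// takes `&mut self`, so no other reference to the queue can exist
// while the message closure is running.
struct Cmdq {
    seq: u32,
}

impl Cmdq {
    // The message buffer handed to `init` cannot outlive the mutable
    // borrow of the queue taken by this call.
    fn send(&mut self, init: impl FnOnce(&mut [u8])) -> u32 {
        let mut msg = [0u8; 16]; // stand-in for a slot in the shared ring
        init(&mut msg);
        self.seq += 1;
        self.seq
    }
}

fn main() {
    let mut q = Cmdq { seq: 0 };
    // Each send exclusively borrows the queue for its duration; a second
    // send is only possible once the first borrow has ended.
    let first = q.send(|buf| buf[0] = 0xaa);
    let second = q.send(|buf| buf[0] = 0xbb);
    assert_eq!((first, second), (1, 2));
    println!("ok");
}
```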
Currently this is only used by the probe() routine and can therefore
only be used by a single thread of execution. Locking to enable safe
access from multiple threads will be introduced in a future series
when it becomes necessary.
Signed-off-by: Alistair Popple <apopple@nvidia.com>
---
Changes for v2:
- Rebased on Alex's latest series
---
drivers/gpu/nova-core/gsp.rs | 20 +-
drivers/gpu/nova-core/gsp/cmdq.rs | 423 ++++++++++++++++++
drivers/gpu/nova-core/gsp/fw.rs | 116 +++++
.../gpu/nova-core/gsp/fw/r570_144/bindings.rs | 374 ++++++++++++++++
drivers/gpu/nova-core/regs.rs | 4 +
drivers/gpu/nova-core/sbuffer.rs | 2 -
6 files changed, 932 insertions(+), 7 deletions(-)
create mode 100644 drivers/gpu/nova-core/gsp/cmdq.rs
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
index 2daa46f2a514..3d4028d67d2e 100644
--- a/drivers/gpu/nova-core/gsp.rs
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -5,6 +5,7 @@
pub(crate) use fw::{GspFwWprMeta, LibosParams};
+use kernel::alloc::flags::GFP_KERNEL;
use kernel::device;
use kernel::dma::CoherentAllocation;
use kernel::dma_write;
@@ -14,8 +15,12 @@
use kernel::transmute::{AsBytes, FromBytes};
use crate::fb::FbLayout;
+use crate::gsp::cmdq::GspCmdq;
+
use fw::LibosMemoryRegionInitArgument;
+pub(crate) mod cmdq;
+
pub(crate) const GSP_PAGE_SHIFT: usize = 12;
pub(crate) const GSP_PAGE_SIZE: usize = 1 << GSP_PAGE_SHIFT;
pub(crate) const GSP_HEAP_ALIGNMENT: Alignment = Alignment::new::<{ 1 << 20 }>();
@@ -27,10 +32,11 @@ pub(crate) struct Gsp {
pub loginit: CoherentAllocation<u8>,
pub logintr: CoherentAllocation<u8>,
pub logrm: CoherentAllocation<u8>,
+ pub cmdq: GspCmdq,
}
/// Creates a self-mapping page table for `obj` at its beginning.
-fn create_pte_array(obj: &mut CoherentAllocation<u8>) {
+fn create_pte_array<T: AsBytes + FromBytes>(obj: &mut CoherentAllocation<T>, skip: usize) {
let num_pages = obj.size().div_ceil(GSP_PAGE_SIZE);
let handle = obj.dma_handle();
@@ -42,7 +48,7 @@ fn create_pte_array(obj: &mut CoherentAllocation<u8>) {
// - The allocation size is at least as long as 8 * num_pages as
// GSP_PAGE_SIZE is larger than 8 bytes.
let ptes = unsafe {
- let ptr = obj.start_ptr_mut().cast::<u64>().add(1);
+ let ptr = obj.start_ptr_mut().cast::<u64>().add(skip);
core::slice::from_raw_parts_mut(ptr, num_pages)
};
@@ -76,17 +82,21 @@ pub(crate) fn new(pdev: &pci::Device<device::Bound>) -> Result<impl PinInit<Self
GFP_KERNEL | __GFP_ZERO,
)?;
let mut loginit = create_coherent_dma_object::<u8>(dev, "LOGINIT", 0x10000, &mut libos, 0)?;
- create_pte_array(&mut loginit);
+ create_pte_array(&mut loginit, 1);
let mut logintr = create_coherent_dma_object::<u8>(dev, "LOGINTR", 0x10000, &mut libos, 1)?;
- create_pte_array(&mut logintr);
+ create_pte_array(&mut logintr, 1);
let mut logrm = create_coherent_dma_object::<u8>(dev, "LOGRM", 0x10000, &mut libos, 2)?;
- create_pte_array(&mut logrm);
+ create_pte_array(&mut logrm, 1);
+
+ // The command queue creates its own PTE array.
+ let cmdq = GspCmdq::new(dev)?;
Ok(try_pin_init!(Self {
libos,
loginit,
logintr,
logrm,
+ cmdq,
}))
}
}
diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
new file mode 100644
index 000000000000..a9ba1a4c73d8
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -0,0 +1,423 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use core::mem::offset_of;
+use core::sync::atomic::fence;
+use core::sync::atomic::Ordering;
+
+use kernel::alloc::flags::GFP_KERNEL;
+use kernel::device;
+use kernel::dma::{CoherentAllocation, DmaAddress};
+use kernel::prelude::*;
+use kernel::sync::aref::ARef;
+use kernel::time::Delta;
+use kernel::transmute::{AsBytes, FromBytes};
+use kernel::{dma_read, dma_write};
+
+use super::fw::{
+ NV_VGPU_MSG_EVENT_GSP_INIT_DONE, NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE,
+ NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD, NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER,
+ NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED, NV_VGPU_MSG_EVENT_OS_ERROR_LOG,
+ NV_VGPU_MSG_EVENT_POST_EVENT, NV_VGPU_MSG_EVENT_RC_TRIGGERED,
+ NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA,
+ NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA, NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE,
+ NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT,
+ NV_VGPU_MSG_FUNCTION_ALLOC_ROOT, NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA, NV_VGPU_MSG_FUNCTION_FREE,
+ NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO, NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO,
+ NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU, NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL,
+ NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO, NV_VGPU_MSG_FUNCTION_LOG,
+ NV_VGPU_MSG_FUNCTION_MAP_MEMORY, NV_VGPU_MSG_FUNCTION_NOP,
+ NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO, NV_VGPU_MSG_FUNCTION_SET_REGISTRY,
+};
+use crate::driver::Bar0;
+use crate::gsp::create_pte_array;
+use crate::gsp::fw::{GspMsgElement, MsgqRxHeader, MsgqTxHeader};
+use crate::gsp::{GSP_PAGE_SHIFT, GSP_PAGE_SIZE};
+use crate::regs::NV_PGSP_QUEUE_HEAD;
+use crate::sbuffer::SBuffer;
+use crate::util::wait_on;
+
+pub(crate) trait GspCommandToGsp: Sized + FromBytes + AsBytes {
+ const FUNCTION: u32;
+}
+
+pub(crate) trait GspMessageFromGsp: Sized + FromBytes + AsBytes {
+ const FUNCTION: u32;
+}
+
+/// Number of GSP pages making up the Msgq.
+pub(crate) const MSGQ_NUM_PAGES: u32 = 0x3f;
+
+#[repr(C, align(0x1000))]
+#[derive(Debug)]
+struct MsgqData {
+ data: [[u8; GSP_PAGE_SIZE]; MSGQ_NUM_PAGES as usize],
+}
+
+// Annoyingly there is no real equivalent of #define so we're forced to use a
+// literal to specify the alignment above. So check that against the actual GSP
+// page size here.
+static_assert!(align_of::<MsgqData>() == GSP_PAGE_SIZE);
+
+// There is no struct defined for this in the open-gpu-kernel-source headers.
+// Instead it is defined by code in GspMsgQueuesInit().
+#[repr(C)]
+struct Msgq {
+ tx: MsgqTxHeader,
+ rx: MsgqRxHeader,
+ msgq: MsgqData,
+}
+
+#[repr(C)]
+struct GspMem {
+ ptes: [u8; GSP_PAGE_SIZE],
+ cpuq: Msgq,
+ gspq: Msgq,
+}
+
+// SAFETY: These structs don't meet the no-padding requirements of AsBytes but
+// that is not a problem because they are not used outside the kernel.
+unsafe impl AsBytes for GspMem {}
+
+// SAFETY: These structs don't meet the no-padding requirements of FromBytes but
+// that is not a problem because they are not used outside the kernel.
+unsafe impl FromBytes for GspMem {}
+
+/// `GspMem` struct that is shared with the GSP.
+struct DmaGspMem(CoherentAllocation<GspMem>);
+
+impl DmaGspMem {
+ fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
+ const MSGQ_SIZE: u32 = size_of::<Msgq>() as u32;
+ const RX_HDR_OFF: u32 = offset_of!(Msgq, rx) as u32;
+
+ let mut gsp_mem =
+ CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
+ create_pte_array(&mut gsp_mem, 0);
+ dma_write!(gsp_mem[0].cpuq.tx = MsgqTxHeader::new(MSGQ_SIZE, RX_HDR_OFF))?;
+ dma_write!(gsp_mem[0].cpuq.rx = MsgqRxHeader::new())?;
+
+ Ok(Self(gsp_mem))
+ }
+
+ #[expect(unused)]
+ fn dma_handle(&self) -> DmaAddress {
+ self.0.dma_handle()
+ }
+
+ /// # Safety
+ ///
+ /// The caller must ensure that the device doesn't access the parts of the [`GspMem`] it works
+ /// with.
+ unsafe fn access_mut(&mut self) -> &mut GspMem {
+ // SAFETY:
+ // - The [`CoherentAllocation`] contains exactly one object.
+ // - Per the safety statement of the function, no concurrent access will be performed.
+ &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0]
+ }
+
+ /// # Safety
+ ///
+ /// The caller must ensure that the device doesn't access the parts of the [`GspMem`] it works
+ /// with.
+ unsafe fn access(&self) -> &GspMem {
+ // SAFETY:
+ // - The [`CoherentAllocation`] contains exactly one object.
+ // - Per the safety statement of the function, no concurrent access will be performed.
+ &unsafe { self.0.as_slice(0, 1) }.unwrap()[0]
+ }
+
+ fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
+ let tx = self.cpu_write_ptr() as usize;
+ let rx = self.gsp_read_ptr() as usize;
+
+ // SAFETY: we will only access the driver-owned part of the shared memory.
+ let gsp_mem = unsafe { self.access_mut() };
+ let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
+
+ if rx <= tx {
+ // The area from `tx` up to the end of the ring, and from the beginning of the ring up
+ // to `rx`, minus one unit, belongs to the driver.
+ if rx == 0 {
+ let last = after_tx.len() - 1;
+ (&mut after_tx[..last], &mut before_tx[0..0])
+ } else {
+ (after_tx, &mut before_tx[..rx])
+ }
+ } else {
+ // The area from `tx` to `rx`, minus one unit, belongs to the driver.
+ (after_tx.split_at_mut(rx - tx).0, &mut before_tx[0..0])
+ }
+ }
+
+ fn driver_read_area(&self) -> (&[[u8; GSP_PAGE_SIZE]], &[[u8; GSP_PAGE_SIZE]]) {
+ let tx = self.gsp_write_ptr() as usize;
+ let rx = self.cpu_read_ptr() as usize;
+
+ // SAFETY: we will only access the driver-owned part of the shared memory.
+ let gsp_mem = unsafe { self.access() };
+ let (before_rx, after_rx) = gsp_mem.gspq.msgq.data.split_at(rx);
+
+ if tx <= rx {
+ // The area from `rx` up to the end of the ring, and from the beginning of the ring up
+ // to `tx`, minus one unit, belongs to the driver.
+ if tx == 0 {
+ let last = after_rx.len() - 1;
+ (&after_rx[..last], &before_rx[0..0])
+ } else {
+ (after_rx, &before_rx[..tx])
+ }
+ } else {
+ // The area from `rx` to `tx`, minus one unit, belongs to the driver.
+ (after_rx.split_at(tx - rx).0, &before_rx[0..0])
+ }
+ }
+
+ fn gsp_write_ptr(&self) -> u32 {
+ let gsp_mem = &self.0;
+ dma_read!(gsp_mem[0].gspq.tx.writePtr).unwrap() % MSGQ_NUM_PAGES
+ }
+
+ fn gsp_read_ptr(&self) -> u32 {
+ let gsp_mem = &self.0;
+ dma_read!(gsp_mem[0].gspq.rx.readPtr).unwrap() % MSGQ_NUM_PAGES
+ }
+
+ fn cpu_read_ptr(&self) -> u32 {
+ let gsp_mem = &self.0;
+ dma_read!(gsp_mem[0].cpuq.rx.readPtr).unwrap() % MSGQ_NUM_PAGES
+ }
+
+ /// Inform the GSP that it can send `elem_count` new pages into the message queue.
+ fn advance_cpu_read_ptr(&mut self, elem_count: u32) {
+ let gsp_mem = &self.0;
+ let rptr = self.cpu_read_ptr().wrapping_add(elem_count) % MSGQ_NUM_PAGES;
+
+ // Ensure all reads of the message data complete before the GSP can reuse the pages
+ fence(Ordering::SeqCst);
+
+ dma_write!(gsp_mem[0].cpuq.rx.readPtr = rptr).unwrap();
+ }
+
+ fn cpu_write_ptr(&self) -> u32 {
+ let gsp_mem = &self.0;
+ dma_read!(gsp_mem[0].cpuq.tx.writePtr).unwrap() % MSGQ_NUM_PAGES
+ }
+
+ /// Inform the GSP that it can process `elem_count` new pages from the command queue.
+ fn advance_cpu_write_ptr(&mut self, elem_count: u32) {
+ let gsp_mem = &self.0;
+ let wptr = self.cpu_write_ptr().wrapping_add(elem_count) % MSGQ_NUM_PAGES;
+
+ // Ensure all command data is visible before publishing the new write pointer
+ fence(Ordering::SeqCst);
+ dma_write!(gsp_mem[0].cpuq.tx.writePtr = wptr).unwrap();
+ }
+}
+
+pub(crate) struct GspCmdq {
+ dev: ARef<device::Device>,
+ seq: u32,
+ gsp_mem: DmaGspMem,
+ pub _nr_ptes: u32,
+}
+
+impl GspCmdq {
+ pub(crate) fn new(dev: &device::Device<device::Bound>) -> Result<GspCmdq> {
+ let gsp_mem = DmaGspMem::new(dev)?;
+ let nr_ptes = size_of::<GspMem>() >> GSP_PAGE_SHIFT;
+ build_assert!(nr_ptes * size_of::<u64>() <= GSP_PAGE_SIZE);
+
+ Ok(GspCmdq {
+ dev: dev.into(),
+ seq: 0,
+ gsp_mem,
+ _nr_ptes: nr_ptes as u32,
+ })
+ }
+
+ fn calculate_checksum<T: Iterator<Item = u8>>(it: T) -> u32 {
+ let sum64 = it
+ .enumerate()
+ .map(|(idx, byte)| (((idx % 8) * 8) as u32, byte))
+ .fold(0, |acc, (rol, byte)| acc ^ u64::from(byte).rotate_left(rol));
+
+ ((sum64 >> 32) as u32) ^ (sum64 as u32)
+ }
+
+ #[expect(unused)]
+ pub(crate) fn send_gsp_command<M: GspCommandToGsp>(
+ &mut self,
+ bar: &Bar0,
+ payload_size: usize,
+ init: impl FnOnce(&mut M, SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result,
+ ) -> Result {
+ // TODO: a method that extracts the regions for a given command?
+ // ... and another that reduces the region to a given number of bytes!
+ let driver_area = self.gsp_mem.driver_write_area();
+ let free_tx_pages = driver_area.0.len() + driver_area.1.len();
+
+ // Total size of the message, including the headers, command, and optional payload.
+ let msg_size = size_of::<GspMsgElement>() + size_of::<M>() + payload_size;
+ if free_tx_pages < msg_size.div_ceil(GSP_PAGE_SIZE) {
+ return Err(EAGAIN);
+ }
+
+ let (msg_header, cmd, payload_1, payload_2) = {
+ // TODO: find an alternative to as_flattened_mut()
+ #[allow(clippy::incompatible_msrv)]
+ let (msg_header_slice, slice_1) = driver_area
+ .0
+ .as_flattened_mut()
+ .split_at_mut(size_of::<GspMsgElement>());
+ let msg_header = GspMsgElement::from_bytes_mut(msg_header_slice).ok_or(EINVAL)?;
+ let (cmd_slice, payload_1) = slice_1.split_at_mut(size_of::<M>());
+ let cmd = M::from_bytes_mut(cmd_slice).ok_or(EINVAL)?;
+ // TODO: find an alternative to as_flattened_mut()
+ #[allow(clippy::incompatible_msrv)]
+ let payload_2 = driver_area.1.as_flattened_mut();
+ // TODO: Replace this workaround to cut the payload size.
+ let (payload_1, payload_2) = match payload_size.checked_sub(payload_1.len()) {
+ // The payload is longer than `payload_1`, set `payload_2` size to the difference.
+ Some(payload_2_len) => (payload_1, &mut payload_2[..payload_2_len]),
+ // `payload_1` is longer than the payload, we need to reduce its size.
+ None => (&mut payload_1[..payload_size], payload_2),
+ };
+
+ (msg_header, cmd, payload_1, payload_2)
+ };
+
+ let sbuffer = SBuffer::new_writer([&mut payload_1[..], &mut payload_2[..]]);
+ init(cmd, sbuffer)?;
+
+ *msg_header = GspMsgElement::new(self.seq, size_of::<M>() + payload_size, M::FUNCTION);
+ msg_header.checkSum = GspCmdq::calculate_checksum(SBuffer::new_reader([
+ msg_header.as_bytes(),
+ cmd.as_bytes(),
+ payload_1,
+ payload_2,
+ ]));
+
+ let rpc_header = &msg_header.rpc;
+ dev_info!(
+ &self.dev,
+ "GSP RPC: send: seq# {}, function=0x{:x} ({}), length=0x{:x}\n",
+ self.seq,
+ rpc_header.function,
+ decode_gsp_function(rpc_header.function),
+ rpc_header.length,
+ );
+
+ let elem_count = msg_header.elemCount;
+ self.seq += 1;
+ self.gsp_mem.advance_cpu_write_ptr(elem_count);
+ NV_PGSP_QUEUE_HEAD::default().set_address(0).write(bar);
+
+ Ok(())
+ }
+
+ #[expect(unused)]
+ pub(crate) fn receive_msg_from_gsp<M: GspMessageFromGsp, R>(
+ &mut self,
+ timeout: Delta,
+ init: impl FnOnce(&M, SBuffer<core::array::IntoIter<&[u8], 2>>) -> Result<R>,
+ ) -> Result<R> {
+ let (driver_area, msg_header, slice_1) = wait_on(timeout, || {
+ let driver_area = self.gsp_mem.driver_read_area();
+ // TODO: find an alternative to as_flattened()
+ #[allow(clippy::incompatible_msrv)]
+ let (msg_header_slice, slice_1) = driver_area
+ .0
+ .as_flattened()
+ .split_at(size_of::<GspMsgElement>());
+
+ // Can't fail because msg_header_slice will always be
+ // size_of::<GspMsgElement>() bytes long by the above split.
+ let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap();
+ if msg_header.rpc.length < size_of::<M>() as u32 {
+ return None;
+ }
+
+ Some((driver_area, msg_header, slice_1))
+ })?;
+
+ let (cmd_slice, payload_1) = slice_1.split_at(size_of::<M>());
+ let cmd = M::from_bytes(cmd_slice).ok_or(EINVAL)?;
+ // TODO: find an alternative to as_flattened()
+ #[allow(clippy::incompatible_msrv)]
+ let payload_2 = driver_area.1.as_flattened();
+
+ // Log RPC receive with message type decoding
+ dev_info!(
+ self.dev,
+ "GSP RPC: receive: seq# {}, function=0x{:x} ({}), length=0x{:x}\n",
+ msg_header.rpc.sequence,
+ msg_header.rpc.function,
+ decode_gsp_function(msg_header.rpc.function),
+ msg_header.rpc.length,
+ );
+
+ if GspCmdq::calculate_checksum(SBuffer::new_reader([
+ msg_header.as_bytes(),
+ cmd.as_bytes(),
+ payload_1,
+ payload_2,
+ ])) != 0
+ {
+ dev_err!(
+ self.dev,
+ "GSP RPC: receive: Call {} - bad checksum",
+ msg_header.rpc.sequence
+ );
+ return Err(EIO);
+ }
+
+ let result = if msg_header.rpc.function == M::FUNCTION {
+ let sbuffer = SBuffer::new_reader([payload_1, payload_2]);
+ init(cmd, sbuffer)
+ } else {
+ Err(ERANGE)
+ };
+
+ self.gsp_mem
+ .advance_cpu_read_ptr(msg_header.rpc.length.div_ceil(GSP_PAGE_SIZE as u32));
+ result
+ }
+}
+
+fn decode_gsp_function(function: u32) -> &'static str {
+ match function {
+ // Common function codes
+ NV_VGPU_MSG_FUNCTION_NOP => "NOP",
+ NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO => "SET_GUEST_SYSTEM_INFO",
+ NV_VGPU_MSG_FUNCTION_ALLOC_ROOT => "ALLOC_ROOT",
+ NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE => "ALLOC_DEVICE",
+ NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY => "ALLOC_MEMORY",
+ NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA => "ALLOC_CTX_DMA",
+ NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA => "ALLOC_CHANNEL_DMA",
+ NV_VGPU_MSG_FUNCTION_MAP_MEMORY => "MAP_MEMORY",
+ NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA => "BIND_CTX_DMA",
+ NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT => "ALLOC_OBJECT",
+ NV_VGPU_MSG_FUNCTION_FREE => "FREE",
+ NV_VGPU_MSG_FUNCTION_LOG => "LOG",
+ NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO => "GET_GSP_STATIC_INFO",
+ NV_VGPU_MSG_FUNCTION_SET_REGISTRY => "SET_REGISTRY",
+ NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO => "GSP_SET_SYSTEM_INFO",
+ NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU => "GSP_INIT_POST_OBJGPU",
+ NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL => "GSP_RM_CONTROL",
+ NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO => "GET_STATIC_INFO",
+
+ // Event codes
+ NV_VGPU_MSG_EVENT_GSP_INIT_DONE => "INIT_DONE",
+ NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER => "RUN_CPU_SEQUENCER",
+ NV_VGPU_MSG_EVENT_POST_EVENT => "POST_EVENT",
+ NV_VGPU_MSG_EVENT_RC_TRIGGERED => "RC_TRIGGERED",
+ NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED => "MMU_FAULT_QUEUED",
+ NV_VGPU_MSG_EVENT_OS_ERROR_LOG => "OS_ERROR_LOG",
+ NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD => "NOCAT",
+ NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE => "LOCKDOWN_NOTICE",
+ NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT => "LIBOS_PRINT",
+
+ // Default for unknown codes
+ _ => "UNKNOWN",
+ }
+}
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index 7f4fe684ddaf..2e4255301e58 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -15,7 +15,9 @@
use crate::firmware::gsp::GspFirmware;
use crate::gpu::Chipset;
use crate::gsp;
+use crate::gsp::cmdq::MSGQ_NUM_PAGES;
use crate::gsp::FbLayout;
+use crate::gsp::GSP_PAGE_SIZE;
/// Dummy type to group methods related to heap parameters for running the GSP firmware.
pub(crate) struct GspFwHeapParams(());
@@ -159,6 +161,37 @@ pub(crate) fn new(gsp_firmware: &GspFirmware, fb_layout: &FbLayout) -> Self {
// GSP firmware constants
GSP_FW_WPR_META_MAGIC,
GSP_FW_WPR_META_REVISION,
+
+ // GSP events
+ NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
+ NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE,
+ NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD,
+ NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER,
+ NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED,
+ NV_VGPU_MSG_EVENT_OS_ERROR_LOG,
+ NV_VGPU_MSG_EVENT_POST_EVENT,
+ NV_VGPU_MSG_EVENT_RC_TRIGGERED,
+ NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT,
+
+ // GSP function calls
+ NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA,
+ NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA,
+ NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE,
+ NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY,
+ NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT,
+ NV_VGPU_MSG_FUNCTION_ALLOC_ROOT,
+ NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA,
+ NV_VGPU_MSG_FUNCTION_FREE,
+ NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO,
+ NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO,
+ NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU,
+ NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL,
+ NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO,
+ NV_VGPU_MSG_FUNCTION_LOG,
+ NV_VGPU_MSG_FUNCTION_MAP_MEMORY,
+ NV_VGPU_MSG_FUNCTION_NOP,
+ NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO,
+ NV_VGPU_MSG_FUNCTION_SET_REGISTRY,
};
#[repr(transparent)]
@@ -197,3 +230,86 @@ fn id8(name: &str) -> u64 {
})
}
}
+
+pub(crate) type MsgqTxHeader = bindings::msgqTxHeader;
+
+// SAFETY: Padding is explicit and will not contain uninitialized data.
+unsafe impl AsBytes for MsgqTxHeader {}
+
+impl MsgqTxHeader {
+ pub(crate) fn new(msgq_size: u32, rx_hdr_offset: u32) -> Self {
+ Self {
+ version: 0,
+ size: msgq_size,
+ msgSize: GSP_PAGE_SIZE as u32,
+ msgCount: MSGQ_NUM_PAGES,
+ writePtr: 0,
+ flags: 1,
+ rxHdrOff: rx_hdr_offset,
+ entryOff: GSP_PAGE_SIZE as u32,
+ }
+ }
+}
+
+/// RX header for setting up a message queue with the GSP.
+///
+/// # Invariants
+///
+/// [`Self::read_ptr`] is guaranteed to return a value in the range `0..NUM_PAGES`.
+pub(crate) type MsgqRxHeader = bindings::msgqRxHeader;
+
+// SAFETY: Padding is explicit and will not contain uninitialized data.
+unsafe impl AsBytes for MsgqRxHeader {}
+
+impl MsgqRxHeader {
+ pub(crate) fn new() -> Self {
+ Default::default()
+ }
+}
+
+pub(crate) type GspRpcHeader = bindings::rpc_message_header_v;
+
+// SAFETY: Padding is explicit and will not contain uninitialized data.
+unsafe impl AsBytes for GspRpcHeader {}
+
+// SAFETY: This struct only contains integer types for which all bit patterns
+// are valid.
+unsafe impl FromBytes for GspRpcHeader {}
+
+impl GspRpcHeader {
+ pub(crate) fn new(cmd_size: u32, function: u32) -> Self {
+ Self {
+ // TODO: magic number
+ header_version: 0x03000000,
+ signature: bindings::NV_VGPU_MSG_SIGNATURE_VALID,
+ function,
+ // TODO: overflow check?
+ length: size_of::<Self>() as u32 + cmd_size,
+ rpc_result: 0xffffffff,
+ rpc_result_private: 0xffffffff,
+ ..Default::default()
+ }
+ }
+}
+
+pub(crate) type GspMsgElement = bindings::GSP_MSG_QUEUE_ELEMENT;
+
+// SAFETY: Padding is explicit and will not contain uninitialized data.
+unsafe impl AsBytes for GspMsgElement {}
+
+// SAFETY: This struct only contains integer types for which all bit patterns
+// are valid.
+unsafe impl FromBytes for GspMsgElement {}
+
+impl GspMsgElement {
+ pub(crate) fn new(sequence: u32, cmd_size: usize, function: u32) -> Self {
+ Self {
+ seqNum: sequence,
+ // TODO: overflow check and fallible div?
+ elemCount: (size_of::<Self>() + cmd_size).div_ceil(GSP_PAGE_SIZE) as u32,
+ // TODO: fallible conversion.
+ rpc: GspRpcHeader::new(cmd_size as u32, function),
+ ..Default::default()
+ }
+ }
+}
diff --git a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
index 392b25dc6991..3d96d91e5b12 100644
--- a/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
+++ b/drivers/gpu/nova-core/gsp/fw/r570_144/bindings.rs
@@ -1,5 +1,36 @@
// SPDX-License-Identifier: GPL-2.0
+#[repr(C)]
+#[derive(Default)]
+pub struct __IncompleteArrayField<T>(::core::marker::PhantomData<T>, [T; 0]);
+impl<T> __IncompleteArrayField<T> {
+ #[inline]
+ pub const fn new() -> Self {
+ __IncompleteArrayField(::core::marker::PhantomData, [])
+ }
+ #[inline]
+ pub fn as_ptr(&self) -> *const T {
+ self as *const _ as *const T
+ }
+ #[inline]
+ pub fn as_mut_ptr(&mut self) -> *mut T {
+ self as *mut _ as *mut T
+ }
+ #[inline]
+ pub unsafe fn as_slice(&self, len: usize) -> &[T] {
+ ::core::slice::from_raw_parts(self.as_ptr(), len)
+ }
+ #[inline]
+ pub unsafe fn as_mut_slice(&mut self, len: usize) -> &mut [T] {
+ ::core::slice::from_raw_parts_mut(self.as_mut_ptr(), len)
+ }
+}
+impl<T> ::core::fmt::Debug for __IncompleteArrayField<T> {
+ fn fmt(&self, fmt: &mut ::core::fmt::Formatter<'_>) -> ::core::fmt::Result {
+ fmt.write_str("__IncompleteArrayField")
+ }
+}
+pub const NV_VGPU_MSG_SIGNATURE_VALID: u32 = 1129337430;
pub const GSP_FW_HEAP_PARAM_OS_SIZE_LIBOS2: u32 = 0;
pub const GSP_FW_HEAP_PARAM_OS_SIZE_LIBOS3_BAREMETAL: u32 = 23068672;
pub const GSP_FW_HEAP_PARAM_BASE_RM_SIZE_TU10X: u32 = 8388608;
@@ -19,6 +50,312 @@
pub type u16_ = __u16;
pub type u32_ = __u32;
pub type u64_ = __u64;
+pub const NV_VGPU_MSG_FUNCTION_NOP: _bindgen_ty_2 = 0;
+pub const NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO: _bindgen_ty_2 = 1;
+pub const NV_VGPU_MSG_FUNCTION_ALLOC_ROOT: _bindgen_ty_2 = 2;
+pub const NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE: _bindgen_ty_2 = 3;
+pub const NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY: _bindgen_ty_2 = 4;
+pub const NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA: _bindgen_ty_2 = 5;
+pub const NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA: _bindgen_ty_2 = 6;
+pub const NV_VGPU_MSG_FUNCTION_MAP_MEMORY: _bindgen_ty_2 = 7;
+pub const NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA: _bindgen_ty_2 = 8;
+pub const NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT: _bindgen_ty_2 = 9;
+pub const NV_VGPU_MSG_FUNCTION_FREE: _bindgen_ty_2 = 10;
+pub const NV_VGPU_MSG_FUNCTION_LOG: _bindgen_ty_2 = 11;
+pub const NV_VGPU_MSG_FUNCTION_ALLOC_VIDMEM: _bindgen_ty_2 = 12;
+pub const NV_VGPU_MSG_FUNCTION_UNMAP_MEMORY: _bindgen_ty_2 = 13;
+pub const NV_VGPU_MSG_FUNCTION_MAP_MEMORY_DMA: _bindgen_ty_2 = 14;
+pub const NV_VGPU_MSG_FUNCTION_UNMAP_MEMORY_DMA: _bindgen_ty_2 = 15;
+pub const NV_VGPU_MSG_FUNCTION_GET_EDID: _bindgen_ty_2 = 16;
+pub const NV_VGPU_MSG_FUNCTION_ALLOC_DISP_CHANNEL: _bindgen_ty_2 = 17;
+pub const NV_VGPU_MSG_FUNCTION_ALLOC_DISP_OBJECT: _bindgen_ty_2 = 18;
+pub const NV_VGPU_MSG_FUNCTION_ALLOC_SUBDEVICE: _bindgen_ty_2 = 19;
+pub const NV_VGPU_MSG_FUNCTION_ALLOC_DYNAMIC_MEMORY: _bindgen_ty_2 = 20;
+pub const NV_VGPU_MSG_FUNCTION_DUP_OBJECT: _bindgen_ty_2 = 21;
+pub const NV_VGPU_MSG_FUNCTION_IDLE_CHANNELS: _bindgen_ty_2 = 22;
+pub const NV_VGPU_MSG_FUNCTION_ALLOC_EVENT: _bindgen_ty_2 = 23;
+pub const NV_VGPU_MSG_FUNCTION_SEND_EVENT: _bindgen_ty_2 = 24;
+pub const NV_VGPU_MSG_FUNCTION_REMAPPER_CONTROL: _bindgen_ty_2 = 25;
+pub const NV_VGPU_MSG_FUNCTION_DMA_CONTROL: _bindgen_ty_2 = 26;
+pub const NV_VGPU_MSG_FUNCTION_DMA_FILL_PTE_MEM: _bindgen_ty_2 = 27;
+pub const NV_VGPU_MSG_FUNCTION_MANAGE_HW_RESOURCE: _bindgen_ty_2 = 28;
+pub const NV_VGPU_MSG_FUNCTION_BIND_ARBITRARY_CTX_DMA: _bindgen_ty_2 = 29;
+pub const NV_VGPU_MSG_FUNCTION_CREATE_FB_SEGMENT: _bindgen_ty_2 = 30;
+pub const NV_VGPU_MSG_FUNCTION_DESTROY_FB_SEGMENT: _bindgen_ty_2 = 31;
+pub const NV_VGPU_MSG_FUNCTION_ALLOC_SHARE_DEVICE: _bindgen_ty_2 = 32;
+pub const NV_VGPU_MSG_FUNCTION_DEFERRED_API_CONTROL: _bindgen_ty_2 = 33;
+pub const NV_VGPU_MSG_FUNCTION_REMOVE_DEFERRED_API: _bindgen_ty_2 = 34;
+pub const NV_VGPU_MSG_FUNCTION_SIM_ESCAPE_READ: _bindgen_ty_2 = 35;
+pub const NV_VGPU_MSG_FUNCTION_SIM_ESCAPE_WRITE: _bindgen_ty_2 = 36;
+pub const NV_VGPU_MSG_FUNCTION_SIM_MANAGE_DISPLAY_CONTEXT_DMA: _bindgen_ty_2 = 37;
+pub const NV_VGPU_MSG_FUNCTION_FREE_VIDMEM_VIRT: _bindgen_ty_2 = 38;
+pub const NV_VGPU_MSG_FUNCTION_PERF_GET_PSTATE_INFO: _bindgen_ty_2 = 39;
+pub const NV_VGPU_MSG_FUNCTION_PERF_GET_PERFMON_SAMPLE: _bindgen_ty_2 = 40;
+pub const NV_VGPU_MSG_FUNCTION_PERF_GET_VIRTUAL_PSTATE_INFO: _bindgen_ty_2 = 41;
+pub const NV_VGPU_MSG_FUNCTION_PERF_GET_LEVEL_INFO: _bindgen_ty_2 = 42;
+pub const NV_VGPU_MSG_FUNCTION_MAP_SEMA_MEMORY: _bindgen_ty_2 = 43;
+pub const NV_VGPU_MSG_FUNCTION_UNMAP_SEMA_MEMORY: _bindgen_ty_2 = 44;
+pub const NV_VGPU_MSG_FUNCTION_SET_SURFACE_PROPERTIES: _bindgen_ty_2 = 45;
+pub const NV_VGPU_MSG_FUNCTION_CLEANUP_SURFACE: _bindgen_ty_2 = 46;
+pub const NV_VGPU_MSG_FUNCTION_UNLOADING_GUEST_DRIVER: _bindgen_ty_2 = 47;
+pub const NV_VGPU_MSG_FUNCTION_TDR_SET_TIMEOUT_STATE: _bindgen_ty_2 = 48;
+pub const NV_VGPU_MSG_FUNCTION_SWITCH_TO_VGA: _bindgen_ty_2 = 49;
+pub const NV_VGPU_MSG_FUNCTION_GPU_EXEC_REG_OPS: _bindgen_ty_2 = 50;
+pub const NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO: _bindgen_ty_2 = 51;
+pub const NV_VGPU_MSG_FUNCTION_ALLOC_VIRTMEM: _bindgen_ty_2 = 52;
+pub const NV_VGPU_MSG_FUNCTION_UPDATE_PDE_2: _bindgen_ty_2 = 53;
+pub const NV_VGPU_MSG_FUNCTION_SET_PAGE_DIRECTORY: _bindgen_ty_2 = 54;
+pub const NV_VGPU_MSG_FUNCTION_GET_STATIC_PSTATE_INFO: _bindgen_ty_2 = 55;
+pub const NV_VGPU_MSG_FUNCTION_TRANSLATE_GUEST_GPU_PTES: _bindgen_ty_2 = 56;
+pub const NV_VGPU_MSG_FUNCTION_RESERVED_57: _bindgen_ty_2 = 57;
+pub const NV_VGPU_MSG_FUNCTION_RESET_CURRENT_GR_CONTEXT: _bindgen_ty_2 = 58;
+pub const NV_VGPU_MSG_FUNCTION_SET_SEMA_MEM_VALIDATION_STATE: _bindgen_ty_2 = 59;
+pub const NV_VGPU_MSG_FUNCTION_GET_ENGINE_UTILIZATION: _bindgen_ty_2 = 60;
+pub const NV_VGPU_MSG_FUNCTION_UPDATE_GPU_PDES: _bindgen_ty_2 = 61;
+pub const NV_VGPU_MSG_FUNCTION_GET_ENCODER_CAPACITY: _bindgen_ty_2 = 62;
+pub const NV_VGPU_MSG_FUNCTION_VGPU_PF_REG_READ32: _bindgen_ty_2 = 63;
+pub const NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO_EXT: _bindgen_ty_2 = 64;
+pub const NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO: _bindgen_ty_2 = 65;
+pub const NV_VGPU_MSG_FUNCTION_RMFS_INIT: _bindgen_ty_2 = 66;
+pub const NV_VGPU_MSG_FUNCTION_RMFS_CLOSE_QUEUE: _bindgen_ty_2 = 67;
+pub const NV_VGPU_MSG_FUNCTION_RMFS_CLEANUP: _bindgen_ty_2 = 68;
+pub const NV_VGPU_MSG_FUNCTION_RMFS_TEST: _bindgen_ty_2 = 69;
+pub const NV_VGPU_MSG_FUNCTION_UPDATE_BAR_PDE: _bindgen_ty_2 = 70;
+pub const NV_VGPU_MSG_FUNCTION_CONTINUATION_RECORD: _bindgen_ty_2 = 71;
+pub const NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO: _bindgen_ty_2 = 72;
+pub const NV_VGPU_MSG_FUNCTION_SET_REGISTRY: _bindgen_ty_2 = 73;
+pub const NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU: _bindgen_ty_2 = 74;
+pub const NV_VGPU_MSG_FUNCTION_SUBDEV_EVENT_SET_NOTIFICATION: _bindgen_ty_2 = 75;
+pub const NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL: _bindgen_ty_2 = 76;
+pub const NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO2: _bindgen_ty_2 = 77;
+pub const NV_VGPU_MSG_FUNCTION_DUMP_PROTOBUF_COMPONENT: _bindgen_ty_2 = 78;
+pub const NV_VGPU_MSG_FUNCTION_UNSET_PAGE_DIRECTORY: _bindgen_ty_2 = 79;
+pub const NV_VGPU_MSG_FUNCTION_GET_CONSOLIDATED_STATIC_INFO: _bindgen_ty_2 = 80;
+pub const NV_VGPU_MSG_FUNCTION_GMMU_REGISTER_FAULT_BUFFER: _bindgen_ty_2 = 81;
+pub const NV_VGPU_MSG_FUNCTION_GMMU_UNREGISTER_FAULT_BUFFER: _bindgen_ty_2 = 82;
+pub const NV_VGPU_MSG_FUNCTION_GMMU_REGISTER_CLIENT_SHADOW_FAULT_BUFFER: _bindgen_ty_2 = 83;
+pub const NV_VGPU_MSG_FUNCTION_GMMU_UNREGISTER_CLIENT_SHADOW_FAULT_BUFFER: _bindgen_ty_2 = 84;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_SET_VGPU_FB_USAGE: _bindgen_ty_2 = 85;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_NVFBC_SW_SESSION_UPDATE_INFO: _bindgen_ty_2 = 86;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_NVENC_SW_SESSION_UPDATE_INFO: _bindgen_ty_2 = 87;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_RESET_CHANNEL: _bindgen_ty_2 = 88;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_RESET_ISOLATED_CHANNEL: _bindgen_ty_2 = 89;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GPU_HANDLE_VF_PRI_FAULT: _bindgen_ty_2 = 90;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_CLK_GET_EXTENDED_INFO: _bindgen_ty_2 = 91;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_PERF_BOOST: _bindgen_ty_2 = 92;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_PERF_VPSTATES_GET_CONTROL: _bindgen_ty_2 = 93;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GET_ZBC_CLEAR_TABLE: _bindgen_ty_2 = 94;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_SET_ZBC_COLOR_CLEAR: _bindgen_ty_2 = 95;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_SET_ZBC_DEPTH_CLEAR: _bindgen_ty_2 = 96;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GPFIFO_SCHEDULE: _bindgen_ty_2 = 97;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_SET_TIMESLICE: _bindgen_ty_2 = 98;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_PREEMPT: _bindgen_ty_2 = 99;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_FIFO_DISABLE_CHANNELS: _bindgen_ty_2 = 100;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_SET_TSG_INTERLEAVE_LEVEL: _bindgen_ty_2 = 101;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_SET_CHANNEL_INTERLEAVE_LEVEL: _bindgen_ty_2 = 102;
+pub const NV_VGPU_MSG_FUNCTION_GSP_RM_ALLOC: _bindgen_ty_2 = 103;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GET_P2P_CAPS_V2: _bindgen_ty_2 = 104;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_CIPHER_AES_ENCRYPT: _bindgen_ty_2 = 105;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_CIPHER_SESSION_KEY: _bindgen_ty_2 = 106;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_CIPHER_SESSION_KEY_STATUS: _bindgen_ty_2 = 107;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DBG_CLEAR_ALL_SM_ERROR_STATES: _bindgen_ty_2 = 108;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DBG_READ_ALL_SM_ERROR_STATES: _bindgen_ty_2 = 109;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DBG_SET_EXCEPTION_MASK: _bindgen_ty_2 = 110;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GPU_PROMOTE_CTX: _bindgen_ty_2 = 111;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GR_CTXSW_PREEMPTION_BIND: _bindgen_ty_2 = 112;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GR_SET_CTXSW_PREEMPTION_MODE: _bindgen_ty_2 = 113;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GR_CTXSW_ZCULL_BIND: _bindgen_ty_2 = 114;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GPU_INITIALIZE_CTX: _bindgen_ty_2 = 115;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_VASPACE_COPY_SERVER_RESERVED_PDES: _bindgen_ty_2 = 116;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_FIFO_CLEAR_FAULTED_BIT: _bindgen_ty_2 = 117;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GET_LATEST_ECC_ADDRESSES: _bindgen_ty_2 = 118;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_MC_SERVICE_INTERRUPTS: _bindgen_ty_2 = 119;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DMA_SET_DEFAULT_VASPACE: _bindgen_ty_2 = 120;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GET_CE_PCE_MASK: _bindgen_ty_2 = 121;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GET_ZBC_CLEAR_TABLE_ENTRY: _bindgen_ty_2 = 122;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GET_NVLINK_PEER_ID_MASK: _bindgen_ty_2 = 123;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GET_NVLINK_STATUS: _bindgen_ty_2 = 124;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GET_P2P_CAPS: _bindgen_ty_2 = 125;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GET_P2P_CAPS_MATRIX: _bindgen_ty_2 = 126;
+pub const NV_VGPU_MSG_FUNCTION_RESERVED_0: _bindgen_ty_2 = 127;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_RESERVE_PM_AREA_SMPC: _bindgen_ty_2 = 128;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_RESERVE_HWPM_LEGACY: _bindgen_ty_2 = 129;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_B0CC_EXEC_REG_OPS: _bindgen_ty_2 = 130;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_BIND_PM_RESOURCES: _bindgen_ty_2 = 131;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DBG_SUSPEND_CONTEXT: _bindgen_ty_2 = 132;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DBG_RESUME_CONTEXT: _bindgen_ty_2 = 133;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DBG_EXEC_REG_OPS: _bindgen_ty_2 = 134;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DBG_SET_MODE_MMU_DEBUG: _bindgen_ty_2 = 135;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DBG_READ_SINGLE_SM_ERROR_STATE: _bindgen_ty_2 = 136;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DBG_CLEAR_SINGLE_SM_ERROR_STATE: _bindgen_ty_2 = 137;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DBG_SET_MODE_ERRBAR_DEBUG: _bindgen_ty_2 = 138;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DBG_SET_NEXT_STOP_TRIGGER_TYPE: _bindgen_ty_2 = 139;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_ALLOC_PMA_STREAM: _bindgen_ty_2 = 140;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_PMA_STREAM_UPDATE_GET_PUT: _bindgen_ty_2 = 141;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_FB_GET_INFO_V2: _bindgen_ty_2 = 142;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_FIFO_SET_CHANNEL_PROPERTIES: _bindgen_ty_2 = 143;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GR_GET_CTX_BUFFER_INFO: _bindgen_ty_2 = 144;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_KGR_GET_CTX_BUFFER_PTES: _bindgen_ty_2 = 145;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GPU_EVICT_CTX: _bindgen_ty_2 = 146;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_FB_GET_FS_INFO: _bindgen_ty_2 = 147;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GRMGR_GET_GR_FS_INFO: _bindgen_ty_2 = 148;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_STOP_CHANNEL: _bindgen_ty_2 = 149;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GR_PC_SAMPLING_MODE: _bindgen_ty_2 = 150;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_PERF_RATED_TDP_GET_STATUS: _bindgen_ty_2 = 151;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_PERF_RATED_TDP_SET_CONTROL: _bindgen_ty_2 = 152;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_FREE_PMA_STREAM: _bindgen_ty_2 = 153;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_TIMER_SET_GR_TICK_FREQ: _bindgen_ty_2 = 154;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_FIFO_SETUP_VF_ZOMBIE_SUBCTX_PDB: _bindgen_ty_2 = 155;
+pub const NV_VGPU_MSG_FUNCTION_GET_CONSOLIDATED_GR_STATIC_INFO: _bindgen_ty_2 = 156;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DBG_SET_SINGLE_SM_SINGLE_STEP: _bindgen_ty_2 = 157;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GR_GET_TPC_PARTITION_MODE: _bindgen_ty_2 = 158;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GR_SET_TPC_PARTITION_MODE: _bindgen_ty_2 = 159;
+pub const NV_VGPU_MSG_FUNCTION_UVM_PAGING_CHANNEL_ALLOCATE: _bindgen_ty_2 = 160;
+pub const NV_VGPU_MSG_FUNCTION_UVM_PAGING_CHANNEL_DESTROY: _bindgen_ty_2 = 161;
+pub const NV_VGPU_MSG_FUNCTION_UVM_PAGING_CHANNEL_MAP: _bindgen_ty_2 = 162;
+pub const NV_VGPU_MSG_FUNCTION_UVM_PAGING_CHANNEL_UNMAP: _bindgen_ty_2 = 163;
+pub const NV_VGPU_MSG_FUNCTION_UVM_PAGING_CHANNEL_PUSH_STREAM: _bindgen_ty_2 = 164;
+pub const NV_VGPU_MSG_FUNCTION_UVM_PAGING_CHANNEL_SET_HANDLES: _bindgen_ty_2 = 165;
+pub const NV_VGPU_MSG_FUNCTION_UVM_METHOD_STREAM_GUEST_PAGES_OPERATION: _bindgen_ty_2 = 166;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_INTERNAL_QUIESCE_PMA_CHANNEL: _bindgen_ty_2 = 167;
+pub const NV_VGPU_MSG_FUNCTION_DCE_RM_INIT: _bindgen_ty_2 = 168;
+pub const NV_VGPU_MSG_FUNCTION_REGISTER_VIRTUAL_EVENT_BUFFER: _bindgen_ty_2 = 169;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_EVENT_BUFFER_UPDATE_GET: _bindgen_ty_2 = 170;
+pub const NV_VGPU_MSG_FUNCTION_GET_PLCABLE_ADDRESS_KIND: _bindgen_ty_2 = 171;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_PERF_LIMITS_SET_STATUS_V2: _bindgen_ty_2 = 172;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_INTERNAL_SRIOV_PROMOTE_PMA_STREAM: _bindgen_ty_2 = 173;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GET_MMU_DEBUG_MODE: _bindgen_ty_2 = 174;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_INTERNAL_PROMOTE_FAULT_METHOD_BUFFERS: _bindgen_ty_2 = 175;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_FLCN_GET_CTX_BUFFER_SIZE: _bindgen_ty_2 = 176;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_FLCN_GET_CTX_BUFFER_INFO: _bindgen_ty_2 = 177;
+pub const NV_VGPU_MSG_FUNCTION_DISABLE_CHANNELS: _bindgen_ty_2 = 178;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_FABRIC_MEMORY_DESCRIBE: _bindgen_ty_2 = 179;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_FABRIC_MEM_STATS: _bindgen_ty_2 = 180;
+pub const NV_VGPU_MSG_FUNCTION_SAVE_HIBERNATION_DATA: _bindgen_ty_2 = 181;
+pub const NV_VGPU_MSG_FUNCTION_RESTORE_HIBERNATION_DATA: _bindgen_ty_2 = 182;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_INTERNAL_MEMSYS_SET_ZBC_REFERENCED: _bindgen_ty_2 = 183;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_EXEC_PARTITIONS_CREATE: _bindgen_ty_2 = 184;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_EXEC_PARTITIONS_DELETE: _bindgen_ty_2 = 185;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GPFIFO_GET_WORK_SUBMIT_TOKEN: _bindgen_ty_2 = 186;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GPFIFO_SET_WORK_SUBMIT_TOKEN_NOTIF_INDEX: _bindgen_ty_2 = 187;
+pub const NV_VGPU_MSG_FUNCTION_PMA_SCRUBBER_SHARED_BUFFER_GUEST_PAGES_OPERATION: _bindgen_ty_2 =
+    188;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_MASTER_GET_VIRTUAL_FUNCTION_ERROR_CONT_INTR_MASK:
+    _bindgen_ty_2 = 189;
+pub const NV_VGPU_MSG_FUNCTION_SET_SYSMEM_DIRTY_PAGE_TRACKING_BUFFER: _bindgen_ty_2 = 190;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_SUBDEVICE_GET_P2P_CAPS: _bindgen_ty_2 = 191;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_BUS_SET_P2P_MAPPING: _bindgen_ty_2 = 192;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_BUS_UNSET_P2P_MAPPING: _bindgen_ty_2 = 193;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_FLA_SETUP_INSTANCE_MEM_BLOCK: _bindgen_ty_2 = 194;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GPU_MIGRATABLE_OPS: _bindgen_ty_2 = 195;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GET_TOTAL_HS_CREDITS: _bindgen_ty_2 = 196;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GET_HS_CREDITS: _bindgen_ty_2 = 197;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_SET_HS_CREDITS: _bindgen_ty_2 = 198;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_PM_AREA_PC_SAMPLER: _bindgen_ty_2 = 199;
+pub const NV_VGPU_MSG_FUNCTION_INVALIDATE_TLB: _bindgen_ty_2 = 200;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GPU_QUERY_ECC_STATUS: _bindgen_ty_2 = 201;
+pub const NV_VGPU_MSG_FUNCTION_ECC_NOTIFIER_WRITE_ACK: _bindgen_ty_2 = 202;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DBG_GET_MODE_MMU_DEBUG: _bindgen_ty_2 = 203;
+pub const NV_VGPU_MSG_FUNCTION_RM_API_CONTROL: _bindgen_ty_2 = 204;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_CMD_INTERNAL_GPU_START_FABRIC_PROBE: _bindgen_ty_2 = 205;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_NVLINK_GET_INBAND_RECEIVED_DATA: _bindgen_ty_2 = 206;
+pub const NV_VGPU_MSG_FUNCTION_GET_STATIC_DATA: _bindgen_ty_2 = 207;
+pub const NV_VGPU_MSG_FUNCTION_RESERVED_208: _bindgen_ty_2 = 208;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_GPU_GET_INFO_V2: _bindgen_ty_2 = 209;
+pub const NV_VGPU_MSG_FUNCTION_GET_BRAND_CAPS: _bindgen_ty_2 = 210;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_CMD_NVLINK_INBAND_SEND_DATA: _bindgen_ty_2 = 211;
+pub const NV_VGPU_MSG_FUNCTION_UPDATE_GPM_GUEST_BUFFER_INFO: _bindgen_ty_2 = 212;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_CMD_INTERNAL_CONTROL_GSP_TRACE: _bindgen_ty_2 = 213;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_SET_ZBC_STENCIL_CLEAR: _bindgen_ty_2 = 214;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_SUBDEVICE_GET_VGPU_HEAP_STATS: _bindgen_ty_2 = 215;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_SUBDEVICE_GET_LIBOS_HEAP_STATS: _bindgen_ty_2 = 216;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DBG_SET_MODE_MMU_GCC_DEBUG: _bindgen_ty_2 = 217;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_DBG_GET_MODE_MMU_GCC_DEBUG: _bindgen_ty_2 = 218;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_RESERVE_HES: _bindgen_ty_2 = 219;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_RELEASE_HES: _bindgen_ty_2 = 220;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_RESERVE_CCU_PROF: _bindgen_ty_2 = 221;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_RELEASE_CCU_PROF: _bindgen_ty_2 = 222;
+pub const NV_VGPU_MSG_FUNCTION_RESERVED: _bindgen_ty_2 = 223;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_CMD_GET_CHIPLET_HS_CREDIT_POOL: _bindgen_ty_2 = 224;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_CMD_GET_HS_CREDITS_MAPPING: _bindgen_ty_2 = 225;
+pub const NV_VGPU_MSG_FUNCTION_CTRL_EXEC_PARTITIONS_EXPORT: _bindgen_ty_2 = 226;
+pub const NV_VGPU_MSG_FUNCTION_NUM_FUNCTIONS: _bindgen_ty_2 = 227;
+pub type _bindgen_ty_2 = ffi::c_uint;
+pub const NV_VGPU_MSG_EVENT_FIRST_EVENT: _bindgen_ty_3 = 4096;
+pub const NV_VGPU_MSG_EVENT_GSP_INIT_DONE: _bindgen_ty_3 = 4097;
+pub const NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER: _bindgen_ty_3 = 4098;
+pub const NV_VGPU_MSG_EVENT_POST_EVENT: _bindgen_ty_3 = 4099;
+pub const NV_VGPU_MSG_EVENT_RC_TRIGGERED: _bindgen_ty_3 = 4100;
+pub const NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED: _bindgen_ty_3 = 4101;
+pub const NV_VGPU_MSG_EVENT_OS_ERROR_LOG: _bindgen_ty_3 = 4102;
+pub const NV_VGPU_MSG_EVENT_RG_LINE_INTR: _bindgen_ty_3 = 4103;
+pub const NV_VGPU_MSG_EVENT_GPUACCT_PERFMON_UTIL_SAMPLES: _bindgen_ty_3 = 4104;
+pub const NV_VGPU_MSG_EVENT_SIM_READ: _bindgen_ty_3 = 4105;
+pub const NV_VGPU_MSG_EVENT_SIM_WRITE: _bindgen_ty_3 = 4106;
+pub const NV_VGPU_MSG_EVENT_SEMAPHORE_SCHEDULE_CALLBACK: _bindgen_ty_3 = 4107;
+pub const NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT: _bindgen_ty_3 = 4108;
+pub const NV_VGPU_MSG_EVENT_VGPU_GSP_PLUGIN_TRIGGERED: _bindgen_ty_3 = 4109;
+pub const NV_VGPU_MSG_EVENT_PERF_GPU_BOOST_SYNC_LIMITS_CALLBACK: _bindgen_ty_3 = 4110;
+pub const NV_VGPU_MSG_EVENT_PERF_BRIDGELESS_INFO_UPDATE: _bindgen_ty_3 = 4111;
+pub const NV_VGPU_MSG_EVENT_VGPU_CONFIG: _bindgen_ty_3 = 4112;
+pub const NV_VGPU_MSG_EVENT_DISPLAY_MODESET: _bindgen_ty_3 = 4113;
+pub const NV_VGPU_MSG_EVENT_EXTDEV_INTR_SERVICE: _bindgen_ty_3 = 4114;
+pub const NV_VGPU_MSG_EVENT_NVLINK_INBAND_RECEIVED_DATA_256: _bindgen_ty_3 = 4115;
+pub const NV_VGPU_MSG_EVENT_NVLINK_INBAND_RECEIVED_DATA_512: _bindgen_ty_3 = 4116;
+pub const NV_VGPU_MSG_EVENT_NVLINK_INBAND_RECEIVED_DATA_1024: _bindgen_ty_3 = 4117;
+pub const NV_VGPU_MSG_EVENT_NVLINK_INBAND_RECEIVED_DATA_2048: _bindgen_ty_3 = 4118;
+pub const NV_VGPU_MSG_EVENT_NVLINK_INBAND_RECEIVED_DATA_4096: _bindgen_ty_3 = 4119;
+pub const NV_VGPU_MSG_EVENT_TIMED_SEMAPHORE_RELEASE: _bindgen_ty_3 = 4120;
+pub const NV_VGPU_MSG_EVENT_NVLINK_IS_GPU_DEGRADED: _bindgen_ty_3 = 4121;
+pub const NV_VGPU_MSG_EVENT_PFM_REQ_HNDLR_STATE_SYNC_CALLBACK: _bindgen_ty_3 = 4122;
+pub const NV_VGPU_MSG_EVENT_NVLINK_FAULT_UP: _bindgen_ty_3 = 4123;
+pub const NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE: _bindgen_ty_3 = 4124;
+pub const NV_VGPU_MSG_EVENT_MIG_CI_CONFIG_UPDATE: _bindgen_ty_3 = 4125;
+pub const NV_VGPU_MSG_EVENT_UPDATE_GSP_TRACE: _bindgen_ty_3 = 4126;
+pub const NV_VGPU_MSG_EVENT_NVLINK_FATAL_ERROR_RECOVERY: _bindgen_ty_3 = 4127;
+pub const NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD: _bindgen_ty_3 = 4128;
+pub const NV_VGPU_MSG_EVENT_FECS_ERROR: _bindgen_ty_3 = 4129;
+pub const NV_VGPU_MSG_EVENT_RECOVERY_ACTION: _bindgen_ty_3 = 4130;
+pub const NV_VGPU_MSG_EVENT_NUM_EVENTS: _bindgen_ty_3 = 4131;
+pub type _bindgen_ty_3 = ffi::c_uint;
+#[repr(C)]
+#[derive(Copy, Clone)]
+pub union rpc_message_rpc_union_field_v03_00 {
+    pub spare: u32_,
+    pub cpuRmGfid: u32_,
+}
+impl Default for rpc_message_rpc_union_field_v03_00 {
+    fn default() -> Self {
+        let mut s = ::core::mem::MaybeUninit::<Self>::uninit();
+        unsafe {
+            ::core::ptr::write_bytes(s.as_mut_ptr(), 0, 1);
+            s.assume_init()
+        }
+    }
+}
+pub type rpc_message_rpc_union_field_v = rpc_message_rpc_union_field_v03_00;
+#[repr(C)]
+pub struct rpc_message_header_v03_00 {
+    pub header_version: u32_,
+    pub signature: u32_,
+    pub length: u32_,
+    pub function: u32_,
+    pub rpc_result: u32_,
+    pub rpc_result_private: u32_,
+    pub sequence: u32_,
+    pub u: rpc_message_rpc_union_field_v,
+    pub rpc_message_data: __IncompleteArrayField<u8_>,
+}
+impl Default for rpc_message_header_v03_00 {
+    fn default() -> Self {
+        let mut s = ::core::mem::MaybeUninit::<Self>::uninit();
+        unsafe {
+            ::core::ptr::write_bytes(s.as_mut_ptr(), 0, 1);
+            s.assume_init()
+        }
+    }
+}
+pub type rpc_message_header_v = rpc_message_header_v03_00;
#[repr(C)]
#[derive(Copy, Clone)]
pub struct GspFwWprMeta {
@@ -145,3 +482,40 @@ pub struct LibosMemoryRegionInitArgument {
pub loc: u8_,
pub __bindgen_padding_0: [u8; 6usize],
}
+#[repr(C)]
+#[derive(Debug, Default, Copy, Clone)]
+pub struct msgqTxHeader {
+    pub version: u32_,
+    pub size: u32_,
+    pub msgSize: u32_,
+    pub msgCount: u32_,
+    pub writePtr: u32_,
+    pub flags: u32_,
+    pub rxHdrOff: u32_,
+    pub entryOff: u32_,
+}
+#[repr(C)]
+#[derive(Debug, Default, Copy, Clone)]
+pub struct msgqRxHeader {
+    pub readPtr: u32_,
+}
+#[repr(C)]
+#[repr(align(8))]
+pub struct GSP_MSG_QUEUE_ELEMENT {
+    pub authTagBuffer: [u8_; 16usize],
+    pub aadBuffer: [u8_; 16usize],
+    pub checkSum: u32_,
+    pub seqNum: u32_,
+    pub elemCount: u32_,
+    pub __bindgen_padding_0: [u8; 4usize],
+    pub rpc: rpc_message_header_v,
+}
+impl Default for GSP_MSG_QUEUE_ELEMENT {
+    fn default() -> Self {
+        let mut s = ::core::mem::MaybeUninit::<Self>::uninit();
+        unsafe {
+            ::core::ptr::write_bytes(s.as_mut_ptr(), 0, 1);
+            s.assume_init()
+        }
+    }
+}
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 206dab2e1335..0585699ae951 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -71,6 +71,10 @@ pub(crate) fn chipset(self) -> Result<Chipset> {
30:30 ecc_mode_enabled as bool;
});
+register!(NV_PGSP_QUEUE_HEAD @ 0x00110c00 {
+    31:0 address as u32;
+});
+
impl NV_PFB_PRI_MMU_LOCAL_MEMORY_RANGE {
/// Returns the usable framebuffer size, in bytes.
pub(crate) fn usable_fb_size(self) -> u64 {
diff --git a/drivers/gpu/nova-core/sbuffer.rs b/drivers/gpu/nova-core/sbuffer.rs
index e768e4f1cb7d..e6b18ecb7a55 100644
--- a/drivers/gpu/nova-core/sbuffer.rs
+++ b/drivers/gpu/nova-core/sbuffer.rs
@@ -37,7 +37,6 @@ impl<'a, I> SBuffer<I>
where
I: Iterator,
{
-    #[expect(unused)]
pub(crate) fn new_reader(slices: impl IntoIterator<IntoIter = I>) -> Self
where
I: Iterator<Item = &'a [u8]>,
@@ -45,7 +44,6 @@ pub(crate) fn new_reader(slices: impl IntoIterator<IntoIter = I>) -> Self
Self::new(slices)
}
-    #[expect(unused)]
pub(crate) fn new_writer(slices: impl IntoIterator<IntoIter = I>) -> Self
where
I: Iterator<Item = &'a mut [u8]>,
--
2.50.1
Hi Alistair,

On Mon Sep 22, 2025 at 8:30 PM JST, Alistair Popple wrote:

<snip>

> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> new file mode 100644
> index 000000000000..a9ba1a4c73d8
> --- /dev/null
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -0,0 +1,423 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +use core::mem::offset_of;
> +use core::sync::atomic::fence;
> +use core::sync::atomic::Ordering;
> +
> +use kernel::alloc::flags::GFP_KERNEL;
> +use kernel::device;
> +use kernel::dma::{CoherentAllocation, DmaAddress};
> +use kernel::prelude::*;
> +use kernel::sync::aref::ARef;
> +use kernel::time::Delta;
> +use kernel::transmute::{AsBytes, FromBytes};
> +use kernel::{dma_read, dma_write};
> +
> +use super::fw::{
> +    NV_VGPU_MSG_EVENT_GSP_INIT_DONE, NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE,
> +    NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD, NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER,
> +    NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED, NV_VGPU_MSG_EVENT_OS_ERROR_LOG,
> +    NV_VGPU_MSG_EVENT_POST_EVENT, NV_VGPU_MSG_EVENT_RC_TRIGGERED,
> +    NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA,
> +    NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA, NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE,
> +    NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT,
> +    NV_VGPU_MSG_FUNCTION_ALLOC_ROOT, NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA, NV_VGPU_MSG_FUNCTION_FREE,
> +    NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO, NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO,
> +    NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU, NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL,
> +    NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO, NV_VGPU_MSG_FUNCTION_LOG,
> +    NV_VGPU_MSG_FUNCTION_MAP_MEMORY, NV_VGPU_MSG_FUNCTION_NOP,
> +    NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO, NV_VGPU_MSG_FUNCTION_SET_REGISTRY,
> +};

As I mentioned in v1, let's turn these into two enums to avoid this big import
and make sure we can never mix up the values.
This can be something like this in `fw.rs`:

    #[repr(u32)]
    pub(crate) enum VgpuMsgEvent {
        GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE,
        GspLockDownNotice = bindings::NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE,
        ...
    }

Then you just need to import `VgpuMsgEvent`, and can use that type where
appropriate, e.g. for the `FUNCTION` associated type of `GspMessageFromGsp`.

> +use crate::driver::Bar0;
> +use crate::gsp::create_pte_array;
> +use crate::gsp::fw::{GspMsgElement, MsgqRxHeader, MsgqTxHeader};
> +use crate::gsp::{GSP_PAGE_SHIFT, GSP_PAGE_SIZE};
> +use crate::regs::NV_PGSP_QUEUE_HEAD;
> +use crate::sbuffer::SBuffer;
> +use crate::util::wait_on;
> +
> +pub(crate) trait GspCommandToGsp: Sized + FromBytes + AsBytes {
> +    const FUNCTION: u32;
> +}
> +
> +pub(crate) trait GspMessageFromGsp: Sized + FromBytes + AsBytes {
> +    const FUNCTION: u32;
> +}

Do we need to repeat `Gsp` in these trait names? `CommandToGsp` and
`MessageFromGsp` should be fine.

(I am also guilty of prefixing type names to avoid name collisions - a habit
inherited from years of C programming, but since we are already in the `gsp`
module we can forgo this habit as users can just import the module and refer
to the type as `gsp::MessageFromGsp` if there is any ambiguity).

> +
> +/// Number of GSP pages making the Msgq.
> +pub(crate) const MSGQ_NUM_PAGES: u32 = 0x3f;
> +
> +#[repr(C, align(0x1000))]
> +#[derive(Debug)]
> +struct MsgqData {
> +    data: [[u8; GSP_PAGE_SIZE]; MSGQ_NUM_PAGES as usize],
> +}
> +
> +// Annoyingly there is no real equivalent of #define so we're forced to use a
> +// literal to specify the alignment above. So check that against the actual GSP
> +// page size here.
> +static_assert!(align_of::<MsgqData>() == GSP_PAGE_SIZE);
> +
> +// There is no struct defined for this in the open-gpu-kernel-source headers.
> +// Instead it is defined by code in GspMsgQueuesInit().
> +#[repr(C)]
> +struct Msgq {
> +    tx: MsgqTxHeader,
> +    rx: MsgqRxHeader,
> +    msgq: MsgqData,
> +}
> +
> +#[repr(C)]
> +struct GspMem {
> +    ptes: [u8; GSP_PAGE_SIZE],
> +    cpuq: Msgq,
> +    gspq: Msgq,
> +}
> +
> +// SAFETY: These structs don't meet the no-padding requirements of AsBytes but
> +// that is not a problem because they are not used outside the kernel.
> +unsafe impl AsBytes for GspMem {}
> +
> +// SAFETY: These structs don't meet the no-padding requirements of FromBytes but
> +// that is not a problem because they are not used outside the kernel.
> +unsafe impl FromBytes for GspMem {}

These ARE used outside the kernel, since they are shared with the GSP. :)

I'd say the reason these are safe is just because we satisfy the requirements
of AsBytes and FromBytes:

- No uninitialized bytes,
- No interior mutability,
- All byte patterns are valid (for some generous definition of "valid" limited
  to not triggering UB).

> +
> +/// `GspMem` struct that is shared with the GSP.
> +struct DmaGspMem(CoherentAllocation<GspMem>);
> +
> +impl DmaGspMem {
> +    fn new(dev: &device::Device<device::Bound>) -> Result<Self> {
> +        const MSGQ_SIZE: u32 = size_of::<Msgq>() as u32;
> +        const RX_HDR_OFF: u32 = offset_of!(Msgq, rx) as u32;
> +
> +        let mut gsp_mem =
> +            CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?;
> +        create_pte_array(&mut gsp_mem, 0);
> +        dma_write!(gsp_mem[0].cpuq.tx = MsgqTxHeader::new(MSGQ_SIZE, RX_HDR_OFF))?;
> +        dma_write!(gsp_mem[0].cpuq.rx = MsgqRxHeader::new())?;
> +
> +        Ok(Self(gsp_mem))
> +    }
> +
> +    #[expect(unused)]
> +    fn dma_handle(&self) -> DmaAddress {
> +        self.0.dma_handle()
> +    }
> +
> +    /// # Safety
> +    ///
> +    /// The caller must ensure that the device doesn't access the parts of the [`GspMem`] it works
> +    /// with.
> +    unsafe fn access_mut(&mut self) -> &mut GspMem {
> +        // SAFETY:
> +        // - The [`CoherentAllocation`] contains exactly one object.
> +        // - Per the safety statement of the function, no concurrent access wil be performed.
> +        &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0]
> +    }
> +
> +    /// # Safety
> +    ///
> +    /// The caller must ensure that the device doesn't access the parts of the [`GspMem`] it works
> +    /// with.
> +    unsafe fn access(&self) -> &GspMem {
> +        // SAFETY:
> +        // - The [`CoherentAllocation`] contains exactly one object.
> +        // - Per the safety statement of the function, no concurrent access wil be performed.
> +        &unsafe { self.0.as_slice(0, 1) }.unwrap()[0]
> +    }

Since these two methods are only called once each from `driver_write/read_area`,
let's inline them there and reduce our `unsafe` keyword counter.

> +
> +    fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) {
> +        let tx = self.cpu_write_ptr() as usize;
> +        let rx = self.gsp_read_ptr() as usize;
> +
> +        // SAFETY: we will only access the driver-owned part of the shared memory.
> +        let gsp_mem = unsafe { self.access_mut() };
> +        let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx);
> +
> +        if rx <= tx {
> +            // The area from `tx` up to the end of the ring, and from the beginning of the ring up
> +            // to `rx`, minus one unit, belongs to the driver.
> +            if rx == 0 {
> +                let last = after_tx.len() - 1;
> +                (&mut after_tx[..last], &mut before_tx[0..0])
> +            } else {
> +                (after_tx, &mut before_tx[..rx])
> +            }
> +        } else {
> +            // The area from `tx` to `rx`, minus one unit, belongs to the driver.
> +            (after_tx.split_at_mut(rx - tx).0, &mut before_tx[0..0])
> +        }
> +    }
> +
> +    fn driver_read_area(&self) -> (&[[u8; GSP_PAGE_SIZE]], &[[u8; GSP_PAGE_SIZE]]) {
> +        let tx = self.gsp_write_ptr() as usize;
> +        let rx = self.cpu_read_ptr() as usize;
> +
> +        // SAFETY: we will only access the driver-owned part of the shared memory.
> +        let gsp_mem = unsafe { self.access() };
> +        let (before_rx, after_rx) = gsp_mem.gspq.msgq.data.split_at(rx);
> +
> +        if tx <= rx {
> +            // The area from `rx` up to the end of the ring, and from the beginning of the ring up
> +            // to `tx`, minus one unit, belongs to the driver.
> +            if tx == 0 {
> +                let last = after_rx.len() - 1;
> +                (&after_rx[..last], &before_rx[0..0])
> +            } else {
> +                (after_rx, &before_rx[..tx])
> +            }
> +        } else {
> +            // The area from `rx` to `tx`, minus one unit, belongs to the driver.
> +            (after_rx.split_at(tx - rx).0, &before_rx[0..0])
> +        }
> +    }

As we discussed offline, this method is incorrect (amongst other things, it
returns the whole ring buffer when it should be empty). Posting my suggested
diff for the record:

--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -152,17 +152,12 @@ unsafe fn access(&self) -> &GspMem {
         let gsp_mem = unsafe { self.access() };
         let (before_rx, after_rx) = gsp_mem.gspq.msgq.data.split_at(rx);
-        if tx <= rx {
+        if tx < rx {
             // The area from `rx` up to the end of the ring, and from the beginning of the ring up
-            // to `tx`, minus one unit, belongs to the driver.
-            if tx == 0 {
-                let last = after_rx.len() - 1;
-                (&after_rx[..last], &before_rx[0..0])
-            } else {
-                (after_rx, &before_rx[..tx])
-            }
+            // to `tx` belongs to the driver.
+            (after_rx, &before_rx[0..tx])
         } else {
-            // The area from `rx` to `tx`, minus one unit, belongs to the driver.
+            // The area from `rx` to `tx` belongs to the driver.
             (after_rx.split_at(tx - rx).0, &before_rx[0..0])
         }
     }

> +
> +    fn gsp_write_ptr(&self) -> u32 {
> +        let gsp_mem = &self.0;
> +        dma_read!(gsp_mem[0].gspq.tx.writePtr).unwrap() % MSGQ_NUM_PAGES
> +    }
> +
> +    fn gsp_read_ptr(&self) -> u32 {
> +        let gsp_mem = &self.0;
> +        dma_read!(gsp_mem[0].gspq.rx.readPtr).unwrap() % MSGQ_NUM_PAGES
> +    }
> +
> +    fn cpu_read_ptr(&self) -> u32 {
> +        let gsp_mem = &self.0;
> +        dma_read!(gsp_mem[0].cpuq.rx.readPtr).unwrap() % MSGQ_NUM_PAGES
> +    }

Maybe add one line of documentation for these. Generally I think we want a bit
more high-level documentation explaining how the ring buffers are operating.

> +
> +    /// Inform the GSP that it can send `elem_count` new pages into the message queue.
> +    fn advance_cpu_read_ptr(&mut self, elem_count: u32) {
> +        let gsp_mem = &self.0;
> +        let rptr = self.cpu_read_ptr().wrapping_add(elem_count) % MSGQ_NUM_PAGES;
> +
> +        // Ensure read pointer is properly ordered
> +        fence(Ordering::SeqCst);
> +
> +        dma_write!(gsp_mem[0].cpuq.rx.readPtr = rptr).unwrap();
> +    }
> +
> +    fn cpu_write_ptr(&self) -> u32 {
> +        let gsp_mem = &self.0;
> +        dma_read!(gsp_mem[0].cpuq.tx.writePtr).unwrap() % MSGQ_NUM_PAGES
> +    }
> +
> +    /// Inform the GSP that it can process `elem_count` new pages from the command queue.
> +    fn advance_cpu_write_ptr(&mut self, elem_count: u32) {
> +        let gsp_mem = &self.0;
> +        let wptr = self.cpu_write_ptr().wrapping_add(elem_count) & MSGQ_NUM_PAGES;
> +        dma_write!(gsp_mem[0].cpuq.tx.writePtr = wptr).unwrap();
> +
> +        // Ensure all command data is visible before triggering the GSP read
> +        fence(Ordering::SeqCst);
> +    }
> +}
> +
> +pub(crate) struct GspCmdq {

Similar to my previous comment, we can just name this `Cmdq` since we are
already in the `gsp` module.

As a general comment, let's also document our types/methods a bit more,
explaining at least what they are for.
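For what it's worth, the corrected read-area computation from the diff above can be checked in isolation. The following standalone sketch (plain Rust, page indices only, no kernel types; `driver_read_area` here is an illustration, not the driver's actual method) implements the `[rx, tx)` modulo-ring convention where `tx == rx` means "empty":

```rust
const NUM_PAGES: usize = 0x3f;

/// Returns the two spans of `data` (in page units) that the driver may read:
/// `tx` is the GSP's write pointer, `rx` the driver's read pointer. The
/// readable region is [rx, tx) modulo the ring size, so tx == rx is empty.
fn driver_read_area<T>(data: &[T], rx: usize, tx: usize) -> (&[T], &[T]) {
    assert_eq!(data.len(), NUM_PAGES);
    let (before_rx, after_rx) = data.split_at(rx);
    if tx < rx {
        // Readable area wraps: from `rx` to the end, then from 0 to `tx`.
        (after_rx, &before_rx[..tx])
    } else {
        // Readable area is contiguous: from `rx` to `tx` (empty if equal).
        (&after_rx[..tx - rx], &before_rx[0..0])
    }
}

fn main() {
    let pages: Vec<usize> = (0..NUM_PAGES).collect();
    // Empty queue: tx == rx yields two empty spans.
    let (a, b) = driver_read_area(&pages, 10, 10);
    assert!(a.is_empty() && b.is_empty());
    // Contiguous case: rx=2, tx=5 exposes pages 2, 3, 4.
    let (a, b) = driver_read_area(&pages, 2, 5);
    assert_eq!((a.len(), b.len()), (3, 0));
    // Wrapped case: rx=60, tx=2 exposes pages 60..63 plus 0, 1.
    let (a, b) = driver_read_area(&pages, 60, 2);
    assert_eq!((a.len(), b.len()), (3, 2));
    println!("ok");
}
```

Note that unlike the write side, the read side does not need to keep one unit free; only the producer has to avoid making the ring look empty when it is full.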
> +    dev: ARef<device::Device>,
> +    seq: u32,
> +    gsp_mem: DmaGspMem,
> +    pub _nr_ptes: u32,
> +}
> +
> +impl GspCmdq {
> +    pub(crate) fn new(dev: &device::Device<device::Bound>) -> Result<GspCmdq> {
> +        let gsp_mem = DmaGspMem::new(dev)?;
> +        let nr_ptes = size_of::<GspMem>() >> GSP_PAGE_SHIFT;
> +        build_assert!(nr_ptes * size_of::<u64>() <= GSP_PAGE_SIZE);
> +
> +        Ok(GspCmdq {
> +            dev: dev.into(),
> +            seq: 0,
> +            gsp_mem,
> +            _nr_ptes: nr_ptes as u32,
> +        })
> +    }
> +
> +    fn calculate_checksum<T: Iterator<Item = u8>>(it: T) -> u32 {
> +        let sum64 = it
> +            .enumerate()
> +            .map(|(idx, byte)| (((idx % 8) * 8) as u32, byte))
> +            .fold(0, |acc, (rol, byte)| acc ^ u64::from(byte).rotate_left(rol));
> +
> +        ((sum64 >> 32) as u32) ^ (sum64 as u32)
> +    }
> +
> +    #[expect(unused)]
> +    pub(crate) fn send_gsp_command<M: GspCommandToGsp>(
> +        &mut self,
> +        bar: &Bar0,
> +        payload_size: usize,
> +        init: impl FnOnce(&mut M, SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result,

This works pretty well for in-place initialization. There are a couple of small
drawbacks though: `M` must implement `FromBytes` even though we only send it,
and (more serious) there is no guarantee that `init` will fully initialize the
command - a forgetful caller can omit some of its fields, or even the whole
structure, and in that case we will send a command with what happened to be at
that position of the queue at that time.

I think this is a good case for using the `Init` trait: it's like `PinInit`,
but without the `Pin`, and it ensures that the whole argument is initialized.
So this method would change into something like:

    pub(crate) fn send_gsp_command<M: GspCommandToGsp>(
        &mut self,
        bar: &Bar0,
        payload_size: usize,
        init: impl Init<M, Error>,
        init_sbuf: impl FnOnce(SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result,

This also allows us to drop the `FromBytes` requirement on `GspCommandToGsp`!
But also requires us to use `unsafe` to call `Init::__init` on the pointer to
the command.
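To make the failure mode concrete, here is a toy model (deliberately not the kernel's `Init`/`PinInit` API; the names and helpers below are made up for illustration) contrasting a closure that may forget fields with an approach that forces a complete value to be written:

```rust
use std::mem::MaybeUninit;

#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
struct Cmd {
    function: u32,
    flags: u32,
}

// Closure-based scheme: `init` gets a &mut Cmd that still holds whatever the
// ring contained at that position. A forgetful closure leaks stale data.
fn send_with_closure(ring_slot: &mut Cmd, init: impl FnOnce(&mut Cmd)) -> Cmd {
    init(ring_slot);
    *ring_slot
}

// Value/initializer scheme: the caller must hand over a complete value, so
// every byte of the slot is rewritten before the command is sent.
fn send_with_value(ring_slot: &mut MaybeUninit<Cmd>, value: Cmd) -> Cmd {
    *ring_slot.write(value)
}

fn main() {
    // Pretend a previous message left garbage in the ring slot.
    let mut slot = Cmd { function: 0xdead, flags: 0xbeef };
    // A forgetful closure only sets `function`...
    let sent = send_with_closure(&mut slot, |c| c.function = 0x4a);
    // ...and silently ships the stale `flags` value.
    assert_eq!(sent.flags, 0xbeef);

    // The value-based style cannot exhibit that bug.
    let mut slot = MaybeUninit::new(Cmd { function: 0xdead, flags: 0xbeef });
    let sent = send_with_value(&mut slot, Cmd { function: 0x4a, flags: 0 });
    assert_eq!(sent, Cmd { function: 0x4a, flags: 0 });
    println!("ok");
}
```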
I think it's worth it though, as this removes the risk of sending
partially-uninitialized commands.

Then there is what to do with the `SBuffer`. I'd like to think a bit more about
this, as not all commands require it, and those that do typically send arrays
of particular types. I think we should be able to use the type system to have
more control over this, but let's keep that for the next revision.

> +    ) -> Result {
> +        // TODO: a method that extracts the regions for a given command?
> +        // ... and another that reduces the region to a given number of bytes!
> +        let driver_area = self.gsp_mem.driver_write_area();
> +        let free_tx_pages = driver_area.0.len() + driver_area.1.len();
> +
> +        // Total size of the message, including the headers, command, and optional payload.
> +        let msg_size = size_of::<GspMsgElement>() + size_of::<M>() + payload_size;
> +        if free_tx_pages < msg_size.div_ceil(GSP_PAGE_SIZE) {
> +            return Err(EAGAIN);
> +        }
> +
> +        let (msg_header, cmd, payload_1, payload_2) = {
> +            // TODO: find an alternative to as_flattened_mut()

I think we can use it if we enable the "slice_flatten" feature (stable since
1.80, introduced in 1.67).

> +            #[allow(clippy::incompatible_msrv)]
> +            let (msg_header_slice, slice_1) = driver_area
> +                .0
> +                .as_flattened_mut()
> +                .split_at_mut(size_of::<GspMsgElement>());
> +            let msg_header = GspMsgElement::from_bytes_mut(msg_header_slice).ok_or(EINVAL)?;
> +            let (cmd_slice, payload_1) = slice_1.split_at_mut(size_of::<M>());
> +            let cmd = M::from_bytes_mut(cmd_slice).ok_or(EINVAL)?;
> +            // TODO: find an alternative to as_flattened_mut()
> +            #[allow(clippy::incompatible_msrv)]
> +            let payload_2 = driver_area.1.as_flattened_mut();
> +            // TODO: Replace this workaround to cut the payload size.
> +            let (payload_1, payload_2) = match payload_size.checked_sub(payload_1.len()) {
> +                // The payload is longer than `payload_1`, set `payload_2` size to the difference.
> +                Some(payload_2_len) => (payload_1, &mut payload_2[..payload_2_len]),
> +                // `payload_1` is longer than the payload, we need to reduce its size.
> +                None => (&mut payload_1[..payload_size], payload_2),
> +            };

We will want (either you or I) to address these TODOs for the next revision.

> +
> +            (msg_header, cmd, payload_1, payload_2)
> +        };
> +
> +        let sbuffer = SBuffer::new_writer([&mut payload_1[..], &mut payload_2[..]]);
> +        init(cmd, sbuffer)?;
> +
> +        *msg_header = GspMsgElement::new(self.seq, size_of::<M>() + payload_size, M::FUNCTION);
> +        msg_header.checkSum = GspCmdq::calculate_checksum(SBuffer::new_reader([
> +            msg_header.as_bytes(),
> +            cmd.as_bytes(),
> +            payload_1,
> +            payload_2,
> +        ]));
> +
> +        let rpc_header = &msg_header.rpc;
> +        dev_info!(
> +            &self.dev,
> +            "GSP RPC: send: seq# {}, function=0x{:x} ({}), length=0x{:x}\n",
> +            self.seq,
> +            rpc_header.function,
> +            decode_gsp_function(rpc_header.function),
> +            rpc_header.length,
> +        );
> +
> +        let elem_count = msg_header.elemCount;
> +        self.seq += 1;
> +        self.gsp_mem.advance_cpu_write_ptr(elem_count);
> +        NV_PGSP_QUEUE_HEAD::default().set_address(0).write(bar);

I'm a bit surprised that we always write `0` here, can we document this
behavior, maybe in the definition of `NV_PGSP_QUEUE_HEAD`?

> +
> +        Ok(())
> +    }
> +
> +    #[expect(unused)]
> +    pub(crate) fn receive_msg_from_gsp<M: GspMessageFromGsp, R>(
> +        &mut self,
> +        timeout: Delta,
> +        init: impl FnOnce(&M, SBuffer<core::array::IntoIter<&[u8], 2>>) -> Result<R>,
> +    ) -> Result<R> {
> +        let (driver_area, msg_header, slice_1) = wait_on(timeout, || {
> +            let driver_area = self.gsp_mem.driver_read_area();
> +            // TODO: find an alternative to as_flattened()
> +            #[allow(clippy::incompatible_msrv)]
> +            let (msg_header_slice, slice_1) = driver_area
> +                .0
> +                .as_flattened()
> +                .split_at(size_of::<GspMsgElement>());

Beware that `split_at` will panic if the slice is shorter than the passed
argument.
So we must check here that the driver area is larger than `GspMsgElement`. > + > + // Can't fail because msg_slice will always be > + // size_of::<GspMsgElement>() bytes long by the above split. > + let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap(); > + if msg_header.rpc.length < size_of::<M>() as u32 { > + return None; > + } If we have a message in the queue and this condition doesn't hold, I don't think we can hope that it will in a further iteration - this should be an error. > + > + Some((driver_area, msg_header, slice_1)) > + })?; > + > + let (cmd_slice, payload_1) = slice_1.split_at(size_of::<M>()); > + let cmd = M::from_bytes(cmd_slice).ok_or(EINVAL)?; > + // TODO: find an alternative to as_flattened() > + #[allow(clippy::incompatible_msrv)] > + let payload_2 = driver_area.1.as_flattened(); There is an issue here - payload_1 and payload_2 cover the *whole* area that is readable by the driver, not just the first message in the queue. If there is more than one message pending when this method is called, we will get a wrong checksum and skip all the following messages. We need to truncate payload_1/payload_2 to cover the exact area of the first message only. > + > + // Log RPC receive with message type decoding > + dev_info!( > + self.dev, > + "GSP RPC: receive: seq# {}, function=0x{:x} ({}), length=0x{:x}\n", > + msg_header.rpc.sequence, > + msg_header.rpc.function, > + decode_gsp_function(msg_header.rpc.function), > + msg_header.rpc.length, > + ); > + > + if GspCmdq::calculate_checksum(SBuffer::new_reader([ > + msg_header.as_bytes(), > + cmd.as_bytes(), > + payload_1, > + payload_2, > + ])) != 0 > + { > + dev_err!( > + self.dev, > + "GSP RPC: receive: Call {} - bad checksum", > + msg_header.rpc.sequence > + ); > + return Err(EIO); > + } > + > + let result = if msg_header.rpc.function == M::FUNCTION { This should be done way earlier. 
At this point we have already converted the command bytes slices into M, which is invalid if it happens that this condition doesn't hold. (on a related note, the checksum verification should also be done before we interpret the message, as a corrupted message could make us cast `cmd` into something that it is not) > + let sbuffer = SBuffer::new_reader([payload_1, payload_2]); > + init(cmd, sbuffer) > + } else { > + Err(ERANGE) > + }; > + > + self.gsp_mem > + .advance_cpu_read_ptr(msg_header.rpc.length.div_ceil(GSP_PAGE_SIZE as u32)); There is a landmine here. `msg_header.rpc.length` contains the length of the payload, INCLUDING the RPC header itself, but NOT INCLUDING the remainder of `msg_header`. Therefore the correct statement should be: self.gsp_mem.advance_cpu_read_ptr( (size_of_val(msg_header) as u32 - size_of_val(&msg_header.rpc) as u32 + msg_header.rpc.length) .div_ceil(GSP_PAGE_SIZE as u32), ); Of course, it looks horrible, so we want to hide this member altogether and provide a nice, well-documented method that returns something that is immediately useful for us. More on that in `fw.rs`. (the previous use of `length` in this method is also incorrect). > + result > + } > +} > + > +fn decode_gsp_function(function: u32) -> &'static str { Once we have proper enums for the function and events, this can be an `as_str` method. 
> + match function { > + // Common function codes > + NV_VGPU_MSG_FUNCTION_NOP => "NOP", > + NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO => "SET_GUEST_SYSTEM_INFO", > + NV_VGPU_MSG_FUNCTION_ALLOC_ROOT => "ALLOC_ROOT", > + NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE => "ALLOC_DEVICE", > + NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY => "ALLOC_MEMORY", > + NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA => "ALLOC_CTX_DMA", > + NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA => "ALLOC_CHANNEL_DMA", > + NV_VGPU_MSG_FUNCTION_MAP_MEMORY => "MAP_MEMORY", > + NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA => "BIND_CTX_DMA", > + NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT => "ALLOC_OBJECT", > + NV_VGPU_MSG_FUNCTION_FREE => "FREE", > + NV_VGPU_MSG_FUNCTION_LOG => "LOG", > + NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO => "GET_GSP_STATIC_INFO", > + NV_VGPU_MSG_FUNCTION_SET_REGISTRY => "SET_REGISTRY", > + NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO => "GSP_SET_SYSTEM_INFO", > + NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU => "GSP_INIT_POST_OBJGPU", > + NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL => "GSP_RM_CONTROL", > + NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO => "GET_STATIC_INFO", > + > + // Event codes > + NV_VGPU_MSG_EVENT_GSP_INIT_DONE => "INIT_DONE", > + NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER => "RUN_CPU_SEQUENCER", > + NV_VGPU_MSG_EVENT_POST_EVENT => "POST_EVENT", > + NV_VGPU_MSG_EVENT_RC_TRIGGERED => "RC_TRIGGERED", > + NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED => "MMU_FAULT_QUEUED", > + NV_VGPU_MSG_EVENT_OS_ERROR_LOG => "OS_ERROR_LOG", > + NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD => "NOCAT", > + NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE => "LOCKDOWN_NOTICE", > + NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT => "LIBOS_PRINT", > + > + // Default for unknown codes > + _ => "UNKNOWN", > + } > +} > diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs > index 7f4fe684ddaf..2e4255301e58 100644 > --- a/drivers/gpu/nova-core/gsp/fw.rs > +++ b/drivers/gpu/nova-core/gsp/fw.rs > @@ -15,7 +15,9 @@ > use crate::firmware::gsp::GspFirmware; > use crate::gpu::Chipset; 
> use crate::gsp; > +use crate::gsp::cmdq::MSGQ_NUM_PAGES; I guess the number of pages in the message queue is firmware-dependent, so would it make sense to move its declaration to this module? > use crate::gsp::FbLayout; > +use crate::gsp::GSP_PAGE_SIZE; > > /// Dummy type to group methods related to heap parameters for running the GSP firmware. > pub(crate) struct GspFwHeapParams(()); > @@ -159,6 +161,37 @@ pub(crate) fn new(gsp_firmware: &GspFirmware, fb_layout: &FbLayout) -> Self { > // GSP firmware constants > GSP_FW_WPR_META_MAGIC, > GSP_FW_WPR_META_REVISION, > + > + // GSP events > + NV_VGPU_MSG_EVENT_GSP_INIT_DONE, > + NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE, > + NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD, > + NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER, > + NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED, > + NV_VGPU_MSG_EVENT_OS_ERROR_LOG, > + NV_VGPU_MSG_EVENT_POST_EVENT, > + NV_VGPU_MSG_EVENT_RC_TRIGGERED, > + NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, > + > + // GSP function calls > + NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA, > + NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA, > + NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE, > + NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, > + NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT, > + NV_VGPU_MSG_FUNCTION_ALLOC_ROOT, > + NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA, > + NV_VGPU_MSG_FUNCTION_FREE, > + NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO, > + NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO, > + NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU, > + NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL, > + NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO, > + NV_VGPU_MSG_FUNCTION_LOG, > + NV_VGPU_MSG_FUNCTION_MAP_MEMORY, > + NV_VGPU_MSG_FUNCTION_NOP, > + NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO, > + NV_VGPU_MSG_FUNCTION_SET_REGISTRY, > }; > > #[repr(transparent)] > @@ -197,3 +230,86 @@ fn id8(name: &str) -> u64 { > }) > } > } > + > +pub(crate) type MsgqTxHeader = bindings::msgqTxHeader; This should be wrapped into a newtype that provides the exact set of features required by the gsp module, like what is done for 
`LibosMemoryRegionInitArgument`. For this type we just need two things: return the `writePtr`, and advance it by a given value. That's all the parent module needs to see. By just aliasing this type, we make all its members visible to the `gsp` module. This increases its dependency on a given firmware version, carries a risk that the GSP module will mess with what it is not supposed to, and introduces inconsistency in how we abstract the firmware types - some are wrapped, others are not. Let's be consistent and make all bindings completely opaque outside of `fw.rs`. > + > +// SAFETY: Padding is explicit and will not contain uninitialized data. > +unsafe impl AsBytes for MsgqTxHeader {} > + > +impl MsgqTxHeader { > + pub(crate) fn new(msgq_size: u32, rx_hdr_offset: u32) -> Self { > + Self { > + version: 0, > + size: msgq_size, > + msgSize: GSP_PAGE_SIZE as u32, > + msgCount: MSGQ_NUM_PAGES, > + writePtr: 0, > + flags: 1, > + rxHdrOff: rx_hdr_offset, > + entryOff: GSP_PAGE_SIZE as u32, > + } > + } > +} > + > +/// RX header for setting up a message queue with the GSP. > +/// > +/// # Invariants > +/// > +/// [`Self::read_ptr`] is guaranteed to return a value in the range `0..NUM_PAGES`. > +pub(crate) type MsgqRxHeader = bindings::msgqRxHeader; > + > +// SAFETY: Padding is explicit and will not contain uninitialized data. > +unsafe impl AsBytes for MsgqRxHeader {} > + > +impl MsgqRxHeader { > + pub(crate) fn new() -> Self { > + Default::default() > + } > +} > + > +pub(crate) type GspRpcHeader = bindings::rpc_message_header_v; This type too is another good illustration of why we should make our bindings opaque. In `cmdq.rs` we access `GspRpcHeader::length` multiple times, ignoring the fact that it also includes the size of the RPC header itself - not just what comes after it! Since it doesn't come with any documentation, we can be forgiven for assuming the obvious - that it is just the size of what follows, but it is not. 
What we actually want is a method on `GspMsgElement` that returns what we actually want (and is documented as such): the actual size of the payload following the whole header. That way there can be no room for mistake. > + > +// SAFETY: Padding is explicit and will not contain uninitialized data. > +unsafe impl AsBytes for GspRpcHeader {} > + > +// SAFETY: This struct only contains integer types for which all bit patterns > +// are valid. > +unsafe impl FromBytes for GspRpcHeader {} > + > +impl GspRpcHeader { > + pub(crate) fn new(cmd_size: u32, function: u32) -> Self { > + Self { > + // TODO: magic number > + header_version: 0x03000000, > + signature: bindings::NV_VGPU_MSG_SIGNATURE_VALID, > + function, > + // TODO: overflow check? > + length: size_of::<Self>() as u32 + cmd_size, > + rpc_result: 0xffffffff, > + rpc_result_private: 0xffffffff, > + ..Default::default() > + } > + } > +} > + > +pub(crate) type GspMsgElement = bindings::GSP_MSG_QUEUE_ELEMENT; Hammering my previous argument a bit more: in `cmdq.rs`, we do the following: let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap(); Even though we explicitly used `GspMsgElement`, `msg_header` appears as being of type `GSP_MSG_QUEUE_ELEMENT` in LSP. That's super confusing and can be avoided by using the newtype pattern. Lastly, the bindings generated from C headers are supposed to be temporary, and we eventually want to replace them with direct IDL-to-Rust bindings. Not leaking the C types let us design `fw.rs` as a blueprint for the ideal interface we would want to generate - so the limited extra labor that comes with wrapping these types would also pay off in that respect.
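A minimal standalone sketch of the opaque-wrapper approach suggested above (the `bindings` module is a stand-in for the generated C bindings, and the method set and `advance_write_ptr` signature are illustrative only, not the actual nova-core code):

```rust
// Illustrative only: `bindings` stands in for the bindgen output, and
// the wrapper exposes just the two operations the parent module needs.
#[allow(non_snake_case, non_camel_case_types)]
mod bindings {
    // Stand-in for the generated struct; in the real driver its fields
    // would stay invisible outside `fw.rs`.
    #[derive(Default)]
    pub struct msgqTxHeader {
        pub writePtr: u32,
    }
}

/// Opaque newtype over the firmware binding: the rest of the driver can
/// only read the write pointer and advance it.
#[repr(transparent)]
pub struct MsgqTxHeader(bindings::msgqTxHeader);

impl MsgqTxHeader {
    pub fn new() -> Self {
        Self(bindings::msgqTxHeader::default())
    }

    /// Current write pointer, in queue pages.
    pub fn write_ptr(&self) -> u32 {
        self.0.writePtr
    }

    /// Advance the write pointer by `n` pages, wrapping at `msg_count`.
    pub fn advance_write_ptr(&mut self, n: u32, msg_count: u32) {
        self.0.writePtr = self.0.writePtr.wrapping_add(n) % msg_count;
    }
}

fn main() {
    let mut tx = MsgqTxHeader::new();
    tx.advance_write_ptr(5, 63);
    assert_eq!(tx.write_ptr(), 5);
    // Wrapping stays inside the wrapper instead of leaking `writePtr`.
    tx.advance_write_ptr(60, 63);
    assert_eq!(tx.write_ptr(), 2);
}
```

The same shape applies to `GspMsgElement`: a documented method computing the total on-ring size can hide the `rpc.length` subtlety entirely.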
On 2025-09-26 at 14:37 +1000, Alexandre Courbot <acourbot@nvidia.com> wrote... > Hi Alistair, > > On Mon Sep 22, 2025 at 8:30 PM JST, Alistair Popple wrote: > <snip> > > diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs > > new file mode 100644 > > index 000000000000..a9ba1a4c73d8 > > --- /dev/null > > +++ b/drivers/gpu/nova-core/gsp/cmdq.rs > > @@ -0,0 +1,423 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > + > > +use core::mem::offset_of; > > +use core::sync::atomic::fence; > > +use core::sync::atomic::Ordering; > > + > > +use kernel::alloc::flags::GFP_KERNEL; > > +use kernel::device; > > +use kernel::dma::{CoherentAllocation, DmaAddress}; > > +use kernel::prelude::*; > > +use kernel::sync::aref::ARef; > > +use kernel::time::Delta; > > +use kernel::transmute::{AsBytes, FromBytes}; > > +use kernel::{dma_read, dma_write}; > > + > > +use super::fw::{ > > + NV_VGPU_MSG_EVENT_GSP_INIT_DONE, NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE, > > + NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD, NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER, > > + NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED, NV_VGPU_MSG_EVENT_OS_ERROR_LOG, > > + NV_VGPU_MSG_EVENT_POST_EVENT, NV_VGPU_MSG_EVENT_RC_TRIGGERED, > > + NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA, > > + NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA, NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE, > > + NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT, > > + NV_VGPU_MSG_FUNCTION_ALLOC_ROOT, NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA, NV_VGPU_MSG_FUNCTION_FREE, > > + NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO, NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO, > > + NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU, NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL, > > + NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO, NV_VGPU_MSG_FUNCTION_LOG, > > + NV_VGPU_MSG_FUNCTION_MAP_MEMORY, NV_VGPU_MSG_FUNCTION_NOP, > > + NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO, NV_VGPU_MSG_FUNCTION_SET_REGISTRY, > > +}; > > As I mentioned in v1, let's turn these into two 
enums to avoid this big > import and make sure we can never mix up the values. > > This can be something like this in `fw.rs`: > > #[repr(u32)] > pub(crate) enum VgpuMsgEvent { > GspInitDone = bindings::NV_VGPU_MSG_EVENT_GSP_INIT_DONE, > GspLockDownNotice = bindings::NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE, > ... > } > > Then you just need to import `VgpuMsgEvent`, and can use that type where > appropriate, e.g. for the `FUNCTION` associated type of > `GspMessageFromGsp`. Yep, makes sense. I'd just overlooked this one. > > +use crate::driver::Bar0; > > +use crate::gsp::create_pte_array; > > +use crate::gsp::fw::{GspMsgElement, MsgqRxHeader, MsgqTxHeader}; > > +use crate::gsp::{GSP_PAGE_SHIFT, GSP_PAGE_SIZE}; > > +use crate::regs::NV_PGSP_QUEUE_HEAD; > > +use crate::sbuffer::SBuffer; > > +use crate::util::wait_on; > > + > > +pub(crate) trait GspCommandToGsp: Sized + FromBytes + AsBytes { > > + const FUNCTION: u32; > > +} > > + > > +pub(crate) trait GspMessageFromGsp: Sized + FromBytes + AsBytes { > > + const FUNCTION: u32; > > +} > > Do we need to repeat `Gsp` in these trait names? `CommandToGsp` and > `MessageFromGsp` should be fine. Nah, and actually they're quite confusing really. Gsp sending a message from the Gsp. Will fix them up. > (I am also guilty of prefixing type names to avoid name collisions - a > habit inherited from years of C programming, but since we are already in > the `gsp` module we can forgo this habit as users can just import the > module and refer to the type as `gsp::MessageFromGsp` if there is any > ambiguity). Indeed. Old habits die hard and all that. > > + > > +/// Number of GSP pages making the Msgq. > > +pub(crate) const MSGQ_NUM_PAGES: u32 = 0x3f; > > + > > +#[repr(C, align(0x1000))] > > +#[derive(Debug)] > > +struct MsgqData { > > + data: [[u8; GSP_PAGE_SIZE]; MSGQ_NUM_PAGES as usize], > > +} > > + > > +// Annoyingly there is no real equivalent of #define so we're forced to use a > > +// literal to specify the alignment above. 
So check that against the actual GSP > > +// page size here. > > +static_assert!(align_of::<MsgqData>() == GSP_PAGE_SIZE); > > + > > +// There is no struct defined for this in the open-gpu-kernel-source headers. > > +// Instead it is defined by code in GspMsgQueuesInit(). > > +#[repr(C)] > > +struct Msgq { > > + tx: MsgqTxHeader, > > + rx: MsgqRxHeader, > > + msgq: MsgqData, > > +} > > + > > +#[repr(C)] > > +struct GspMem { > > + ptes: [u8; GSP_PAGE_SIZE], > > + cpuq: Msgq, > > + gspq: Msgq, > > +} > > + > > +// SAFETY: These structs don't meet the no-padding requirements of AsBytes but > > +// that is not a problem because they are not used outside the kernel. > > +unsafe impl AsBytes for GspMem {} > > + > > +// SAFETY: These structs don't meet the no-padding requirements of FromBytes but > > +// that is not a problem because they are not used outside the kernel. > > +unsafe impl FromBytes for GspMem {} > > These ARE used outside the kernel, since they are shared with the GSP. > :) I'd say the reason these are safe is just because we satisfy the > requirements of AsBytes and FromBytes: Yes, and they're actually old safety comments that I missed updating to be the same as the rest. > - No initialized bytes, > - No interior mutability, > - All bytes patterns are valid (for some generous definition of > "valid" limited to not triggering UB). I was under the impression that that's all `unsafe` is really marking - code the compiler can't prove won't trigger UB. So I'd assume that's all we'd have to prove in safety comments anyway. > > + > > +/// `GspMem` struct that is shared with the GSP. 
> > +struct DmaGspMem(CoherentAllocation<GspMem>); > > + > > +impl DmaGspMem { > > + fn new(dev: &device::Device<device::Bound>) -> Result<Self> { > > + const MSGQ_SIZE: u32 = size_of::<Msgq>() as u32; > > + const RX_HDR_OFF: u32 = offset_of!(Msgq, rx) as u32; > > + > > + let mut gsp_mem = > > + CoherentAllocation::<GspMem>::alloc_coherent(dev, 1, GFP_KERNEL | __GFP_ZERO)?; > > + create_pte_array(&mut gsp_mem, 0); > > + dma_write!(gsp_mem[0].cpuq.tx = MsgqTxHeader::new(MSGQ_SIZE, RX_HDR_OFF))?; > > + dma_write!(gsp_mem[0].cpuq.rx = MsgqRxHeader::new())?; > > + > > + Ok(Self(gsp_mem)) > > + } > > + > > + #[expect(unused)] > > + fn dma_handle(&self) -> DmaAddress { > > + self.0.dma_handle() > > + } > > + > > + /// # Safety > > + /// > > + /// The caller must ensure that the device doesn't access the parts of the [`GspMem`] it works > > + /// with. > > + unsafe fn access_mut(&mut self) -> &mut GspMem { > > + // SAFETY: > > + // - The [`CoherentAllocation`] contains exactly one object. > > + // - Per the safety statement of the function, no concurrent access wil be performed. > > + &mut unsafe { self.0.as_slice_mut(0, 1) }.unwrap()[0] > > + } > > + > > + /// # Safety > > + /// > > + /// The caller must ensure that the device doesn't access the parts of the [`GspMem`] it works > > + /// with. > > + unsafe fn access(&self) -> &GspMem { > > + // SAFETY: > > + // - The [`CoherentAllocation`] contains exactly one object. > > + // - Per the safety statement of the function, no concurrent access wil be performed. > > + &unsafe { self.0.as_slice(0, 1) }.unwrap()[0] > > + } > > Since these two methods are only called once each from > `driver_write/read_area`, let's inline them there and reduce our > `unsafe` keyword counter. Sure. 
> > + > > + fn driver_write_area(&mut self) -> (&mut [[u8; GSP_PAGE_SIZE]], &mut [[u8; GSP_PAGE_SIZE]]) { > > + let tx = self.cpu_write_ptr() as usize; > > + let rx = self.gsp_read_ptr() as usize; > > + > > + // SAFETY: we will only access the driver-owned part of the shared memory. > > + let gsp_mem = unsafe { self.access_mut() }; > > + let (before_tx, after_tx) = gsp_mem.cpuq.msgq.data.split_at_mut(tx); > > + > > + if rx <= tx { > > + // The area from `tx` up to the end of the ring, and from the beginning of the ring up > > + // to `rx`, minus one unit, belongs to the driver. > > + if rx == 0 { > > + let last = after_tx.len() - 1; > > + (&mut after_tx[..last], &mut before_tx[0..0]) > > + } else { > > + (after_tx, &mut before_tx[..rx]) > > + } > > + } else { > > + // The area from `tx` to `rx`, minus one unit, belongs to the driver. > > + (after_tx.split_at_mut(rx - tx).0, &mut before_tx[0..0]) > > + } > > + } > > + > > + fn driver_read_area(&self) -> (&[[u8; GSP_PAGE_SIZE]], &[[u8; GSP_PAGE_SIZE]]) { > > + let tx = self.gsp_write_ptr() as usize; > > + let rx = self.cpu_read_ptr() as usize; > > + > > + // SAFETY: we will only access the driver-owned part of the shared memory. > > + let gsp_mem = unsafe { self.access() }; > > + let (before_rx, after_rx) = gsp_mem.gspq.msgq.data.split_at(rx); > > + > > + if tx <= rx { > > + // The area from `rx` up to the end of the ring, and from the beginning of the ring up > > + // to `tx`, minus one unit, belongs to the driver. > > + if tx == 0 { > > + let last = after_rx.len() - 1; > > + (&after_rx[..last], &before_rx[0..0]) > > + } else { > > + (after_rx, &before_rx[..tx]) > > + } > > + } else { > > + // The area from `rx` to `tx`, minus one unit, belongs to the driver. > > + (after_rx.split_at(tx - rx).0, &before_rx[0..0]) > > + } > > + } > > As we discussed offline, this method is incorrect (amongst other things, > it returns the whole ring buffer when it should be empty). Posting my > suggested diff for the record: Yep. 
I'm not sure how it was working on my setup but not yours but clearly wrong. > --- a/drivers/gpu/nova-core/gsp/cmdq.rs > +++ b/drivers/gpu/nova-core/gsp/cmdq.rs > @@ -152,17 +152,12 @@ unsafe fn access(&self) -> &GspMem { > let gsp_mem = unsafe { self.access() }; > let (before_rx, after_rx) = gsp_mem.gspq.msgq.data.split_at(rx); > > - if tx <= rx { > + if tx < rx { > // The area from `rx` up to the end of the ring, and from the beginning of the ring up > - // to `tx`, minus one unit, belongs to the driver. > - if tx == 0 { > - let last = after_rx.len() - 1; > - (&after_rx[..last], &before_rx[0..0]) > - } else { > - (after_rx, &before_rx[..tx]) > - } > + // to `tx` belongs to the driver. > + (after_rx, &before_rx[0..tx]) > } else { > - // The area from `rx` to `tx`, minus one unit, belongs to the driver. > + // The area from `rx` to `tx` belongs to the driver. > (after_rx.split_at(tx - rx).0, &before_rx[0..0]) > } > } Thanks, similar to what I had. Although I also had this to avoid the panic later when no message is present: @@ -321,6 +315,10 @@ pub(crate) fn receive_msg_from_gsp<M: GspMessageFromGsp, R>( ) -> Result<R> { let (driver_area, msg_header, slice_1) = wait_on(timeout, || { let driver_area = self.gsp_mem.driver_read_area(); + if driver_area.0.is_empty() { + return None; + } + // TODO: find an alternative to as_flattened() #[allow(clippy::incompatible_msrv)] let (msg_header_slice, slice_1) = driver_area > > + > > + fn gsp_write_ptr(&self) -> u32 { > > + let gsp_mem = &self.0; > > + dma_read!(gsp_mem[0].gspq.tx.writePtr).unwrap() % MSGQ_NUM_PAGES > > + } > > + > > + fn gsp_read_ptr(&self) -> u32 { > > + let gsp_mem = &self.0; > > + dma_read!(gsp_mem[0].gspq.rx.readPtr).unwrap() % MSGQ_NUM_PAGES > > + } > > + > > + fn cpu_read_ptr(&self) -> u32 { > > + let gsp_mem = &self.0; > > + dma_read!(gsp_mem[0].cpuq.rx.readPtr).unwrap() % MSGQ_NUM_PAGES > > + } > > Maybe add one line of documentation for these. 
Generally I think we want > a bit more high-level documentation explaining how the ring buffers are > operating. Yeah, good idea. Something I've been meaning to do (document the ring buffer a bit better). > > + > > + /// Inform the GSP that it can send `elem_count` new pages into the message queue. > > + fn advance_cpu_read_ptr(&mut self, elem_count: u32) { > > + let gsp_mem = &self.0; > > + let rptr = self.cpu_read_ptr().wrapping_add(elem_count) % MSGQ_NUM_PAGES; > > + > > + // Ensure read pointer is properly ordered > > + fence(Ordering::SeqCst); > > + > > + dma_write!(gsp_mem[0].cpuq.rx.readPtr = rptr).unwrap(); > > + } > > + > > + fn cpu_write_ptr(&self) -> u32 { > > + let gsp_mem = &self.0; > > + dma_read!(gsp_mem[0].cpuq.tx.writePtr).unwrap() % MSGQ_NUM_PAGES > > + } > > + > > + /// Inform the GSP that it can process `elem_count` new pages from the command queue. > > + fn advance_cpu_write_ptr(&mut self, elem_count: u32) { > > + let gsp_mem = &self.0; > > + let wptr = self.cpu_write_ptr().wrapping_add(elem_count) & MSGQ_NUM_PAGES; > > + dma_write!(gsp_mem[0].cpuq.tx.writePtr = wptr).unwrap(); > > + > > + // Ensure all command data is visible before triggering the GSP read > > + fence(Ordering::SeqCst); > > + } > > +} > > + > > +pub(crate) struct GspCmdq { > > Similar to my previous comment, we can just name this `Cmdq` since we > are already in the `gsp` module. > > As a general comment, let's also document our types/methods a bit more, > explaining at least what they are for. Ack. 
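For the record, the post-fix semantics of `driver_read_area` are easy to state with plain page indices. A standalone sketch (the index-based form and the `read_area` name are mine; it assumes the corrected `tx < rx` test):

```rust
const MSGQ_NUM_PAGES: usize = 0x3f;

/// Index-only sketch of the corrected read-area logic: the driver may
/// read the pages from `rx` (its own read pointer) up to, but not
/// including, `tx` (the GSP's write pointer), wrapping at the end of
/// the ring. When `rx == tx` the queue is empty.
fn read_area(rx: usize, tx: usize) -> (std::ops::Range<usize>, std::ops::Range<usize>) {
    if tx < rx {
        // Wrapped: from `rx` to the end of the ring, then from 0 to `tx`.
        (rx..MSGQ_NUM_PAGES, 0..tx)
    } else {
        // Not wrapped: a single span from `rx` to `tx`.
        (rx..tx, 0..0)
    }
}

fn main() {
    // Equal pointers mean an empty queue - the case the original
    // `tx <= rx` test got wrong by returning the whole ring.
    assert_eq!(read_area(5, 5), (5..5, 0..0));
    // Wrapped case: three pages at the tail, two at the head.
    assert_eq!(read_area(60, 2), (60..63, 0..2));
    // Simple case: one contiguous span.
    assert_eq!(read_area(2, 10), (2..10, 0..0));
}
```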
> > + dev: ARef<device::Device>, > > + seq: u32, > > + gsp_mem: DmaGspMem, > > + pub _nr_ptes: u32, > > +} > > + > > +impl GspCmdq { > > + pub(crate) fn new(dev: &device::Device<device::Bound>) -> Result<GspCmdq> { > > + let gsp_mem = DmaGspMem::new(dev)?; > > + let nr_ptes = size_of::<GspMem>() >> GSP_PAGE_SHIFT; > > + build_assert!(nr_ptes * size_of::<u64>() <= GSP_PAGE_SIZE); > > + > > + Ok(GspCmdq { > > + dev: dev.into(), > > + seq: 0, > > + gsp_mem, > > + _nr_ptes: nr_ptes as u32, > > + }) > > + } > > + > > + fn calculate_checksum<T: Iterator<Item = u8>>(it: T) -> u32 { > > + let sum64 = it > > + .enumerate() > > + .map(|(idx, byte)| (((idx % 8) * 8) as u32, byte)) > > + .fold(0, |acc, (rol, byte)| acc ^ u64::from(byte).rotate_left(rol)); > > + > > + ((sum64 >> 32) as u32) ^ (sum64 as u32) > > + } > > + > > + #[expect(unused)] > > + pub(crate) fn send_gsp_command<M: GspCommandToGsp>( > > + &mut self, > > + bar: &Bar0, > > + payload_size: usize, > > + init: impl FnOnce(&mut M, SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result, > > This works pretty well for in-place initialization. > > There a couple of small drawbacks though: `M` must implement `FromBytes` > even though we only send it, and (more serious) there is no guarantee Yes, that detail annoyed me slightly too. > that `init` will fully initialize the command - a forgetful caller can > omit some of its fields, or even the whole structure, and in that case > we will send a command with what happened to be at that position of the > queue at that time. Good timing as I was just looking to see if there wasn't some way of ensuring this happened, or at least was much more explicit than what we currently do. > I think this is a good case for using the `Init` trait: it's like > `PinInit`, but without the `Pin`, and it ensures that the whole argument > is initialized. 
So this method would change into something like: > > pub(crate) fn send_gsp_command<M: GspCommandToGsp>( > &mut self, > bar: &Bar0, > payload_size: usize, > init: impl Init<M, Error>, > init_sbuf: impl FnOnce(SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result, > > This also allows us to drop the `FromBytes` requirement on > `GspCommandToGsp`! But also requires us to use `unsafe` to call > `Init::__init` on the pointer to the command. I think it's worth it > though, as this removes the risk of sending partially-uninitialized > commands. Agree on it being worth the unsafe call, especially because it is unsafe if the caller doesn't do what it's supposed to regardless. But how does this contrast with `MaybeUninit`? I was wondering if something like this would work: pub(crate) fn send_gsp_command<M: GspCommandToGsp>( &mut self, bar: &Bar0, payload_size: usize, init: impl FnOnce(MaybeUninit<&mut M>, SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result<&M>, Note I'm not sure we could actually make that work as I haven't tried it yet. I suspect I've missed something with lifetimes there :-) > Then there is what to do with the `SBuffer`. I'd like to think a bit > more about this, as not all commands require it, and those that do > typically send arrays of particular types. I think we should be able to > use the type system to have more control over this, but let's keep that > for the next revision. Sure. You are correct that not all (most?) commands need it, and it's a shame we don't have proper bindings for them anyway. Although in some cases that still wouldn't work anyway (e.g. for registry keys) where it really is just a packed variable number of variable-length strings. Not sure what a Rust native representation of that would be.
> > + let driver_area = self.gsp_mem.driver_write_area(); > > + let free_tx_pages = driver_area.0.len() + driver_area.1.len(); > > + > > + // Total size of the message, including the headers, command, and optional payload. > > + let msg_size = size_of::<GspMsgElement>() + size_of::<M>() + payload_size; > > + if free_tx_pages < msg_size.div_ceil(GSP_PAGE_SIZE) { > > + return Err(EAGAIN); > > + } > > + > > + let (msg_header, cmd, payload_1, payload_2) = { > > + // TODO: find an alternative to as_flattened_mut() > > I think we can use it if we enable the "slice_flatten" feature (stable > since 1.80, introduced in 1.67). Ok, will try that. Hopefully explicitly using the feature also makes clippy happy but if not I guess we can leave the over-rides below. > > + #[allow(clippy::incompatible_msrv)] > > + let (msg_header_slice, slice_1) = driver_area > > + .0 > > + .as_flattened_mut() > > + .split_at_mut(size_of::<GspMsgElement>()); > > + let msg_header = GspMsgElement::from_bytes_mut(msg_header_slice).ok_or(EINVAL)?; > > + let (cmd_slice, payload_1) = slice_1.split_at_mut(size_of::<M>()); > > + let cmd = M::from_bytes_mut(cmd_slice).ok_or(EINVAL)?; > > + // TODO: find an alternative to as_flattened_mut() > > + #[allow(clippy::incompatible_msrv)] > > + let payload_2 = driver_area.1.as_flattened_mut(); > > + // TODO: Replace this workaround to cut the payload size. > > + let (payload_1, payload_2) = match payload_size.checked_sub(payload_1.len()) { > > + // The payload is longer than `payload_1`, set `payload_2` size to the difference. > > + Some(payload_2_len) => (payload_1, &mut payload_2[..payload_2_len]), > > + // `payload_1` is longer than the payload, we need to reduce its size. > > + None => (&mut payload_1[..payload_size], payload_2), > > + }; > > We will want (either you or I) to address these TODOs for the next > revision. Yep, I'm happy to (along with any others that are still floating about). 
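The payload-trimming workaround quoted above can be exercised standalone. One detail worth noting: in the `None` arm the quoted code leaves `payload_2` at its full length; the sketch below trims it to zero instead, which appears to be the intent (the `trim_payload` name is mine):

```rust
/// Trim two scatter regions so that together they expose exactly
/// `payload_size` writable bytes (sketch of the quoted workaround).
fn trim_payload<'a>(
    payload_1: &'a mut [u8],
    payload_2: &'a mut [u8],
    payload_size: usize,
) -> (&'a mut [u8], &'a mut [u8]) {
    match payload_size.checked_sub(payload_1.len()) {
        // The payload spills past `payload_1`: keep it whole and cut
        // `payload_2` down to the remainder.
        Some(payload_2_len) => (payload_1, &mut payload_2[..payload_2_len]),
        // `payload_1` alone suffices: cut it, and (unlike the quoted
        // code) empty `payload_2` so nothing past the payload is exposed.
        None => (&mut payload_1[..payload_size], &mut payload_2[..0]),
    }
}

fn main() {
    let mut a = [0u8; 4];
    let mut b = [0u8; 4];
    let (p1, p2) = trim_payload(&mut a, &mut b, 6);
    assert_eq!((p1.len(), p2.len()), (4, 2));
    let (p1, p2) = trim_payload(&mut a, &mut b, 3);
    assert_eq!((p1.len(), p2.len()), (3, 0));
    // Exact fit: `checked_sub` yields `Some(0)`, not `None`.
    let (p1, p2) = trim_payload(&mut a, &mut b, 4);
    assert_eq!((p1.len(), p2.len()), (4, 0));
}
```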
> > + > > + (msg_header, cmd, payload_1, payload_2) > > + }; > > + > > + let sbuffer = SBuffer::new_writer([&mut payload_1[..], &mut payload_2[..]]); > > + init(cmd, sbuffer)?; > > + > > + *msg_header = GspMsgElement::new(self.seq, size_of::<M>() + payload_size, M::FUNCTION); > > + msg_header.checkSum = GspCmdq::calculate_checksum(SBuffer::new_reader([ > > + msg_header.as_bytes(), > > + cmd.as_bytes(), > > + payload_1, > > + payload_2, > > + ])); > > + > > + let rpc_header = &msg_header.rpc; > > + dev_info!( > > + &self.dev, > > + "GSP RPC: send: seq# {}, function=0x{:x} ({}), length=0x{:x}\n", > > + self.seq, > > + rpc_header.function, > > + decode_gsp_function(rpc_header.function), > > + rpc_header.length, > > + ); > > + > > + let elem_count = msg_header.elemCount; > > + self.seq += 1; > > + self.gsp_mem.advance_cpu_write_ptr(elem_count); > > + NV_PGSP_QUEUE_HEAD::default().set_address(0).write(bar); > > I'm a bit surprised that we always write `0` here, can we document this > behavior, maybe in the definition of `NV_PGSP_QUEUE_HEAD`? I was also surprised the value doesn't matter but this is what both Nouveau and OpenRM do. I guess it just triggers some kind of notification on the GSP and the actual queue head is read from the shared memory pointers.
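For reference, the XOR-fold checksum used above is self-contained and easy to sanity-check outside the driver (the function body matches the quoted `calculate_checksum`; the test values are mine):

```rust
// XOR-fold checksum as quoted: each byte is rotated into its lane of a
// u64 (byte index modulo 8, times 8 bits), all lanes are XOR-combined,
// and the two 32-bit halves of the result are folded together.
fn calculate_checksum<T: Iterator<Item = u8>>(it: T) -> u32 {
    let sum64 = it
        .enumerate()
        .map(|(idx, byte)| (((idx % 8) * 8) as u32, byte))
        .fold(0u64, |acc, (rol, byte)| acc ^ u64::from(byte).rotate_left(rol));
    ((sum64 >> 32) as u32) ^ (sum64 as u32)
}

fn main() {
    // An empty message folds to zero.
    assert_eq!(calculate_checksum(std::iter::empty::<u8>()), 0);
    // A single byte lands in the lowest lane unchanged.
    assert_eq!(calculate_checksum([0xab_u8].into_iter()), 0xab);
    // Eight identical bytes give identical 32-bit halves, which cancel.
    assert_eq!(calculate_checksum([0x01_u8; 8].into_iter()), 0);
}
```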
So we must check here that the driver area is larger > than `GspMsgElement`. Yep, see the diff I had above. > > + > > + // Can't fail because msg_slice will always be > > + // size_of::<GspMsgElement>() bytes long by the above split. > > + let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap(); > > + if msg_header.rpc.length < size_of::<M>() as u32 { > > + return None; > > + } > > If we have a message in the queue and this condition doesn't hold, I > don't think we can hope that it will in a further iteration - this > should be an error. I was thinking messages may get partially written and we'd see this state, but you're right - the shared pointers shouldn't be updated until the entire message is written so this should be an error. That will require the `wait_on_result()` function, because `wait_on()` doesn't allow the closure to return a failure. We could just return Option<Result<_>> from the closure but that's a bit gross, because wait_on() would then return Result<Option<Result<_>>>. > > + > > + Some((driver_area, msg_header, slice_1)) > > + })?; > > + > > + let (cmd_slice, payload_1) = slice_1.split_at(size_of::<M>()); > > + let cmd = M::from_bytes(cmd_slice).ok_or(EINVAL)?; > > + // TODO: find an alternative to as_flattened() > > + #[allow(clippy::incompatible_msrv)] > > + let payload_2 = driver_area.1.as_flattened(); > > There is an issue here - payload_1 and payload_2 cover the *whole* area > that is readable by the driver, not just the first message in the queue. Oh good catch. > If there is more than one message pending when this method is called, we > will get a wrong checksum and skip all the following messages. We need > to truncate payload_1/payload_2 to cover the exact area of the first > message only. 
> > > + > > + // Log RPC receive with message type decoding > > + dev_info!( > > + self.dev, > > + "GSP RPC: receive: seq# {}, function=0x{:x} ({}), length=0x{:x}\n", > > + msg_header.rpc.sequence, > > + msg_header.rpc.function, > > + decode_gsp_function(msg_header.rpc.function), > > + msg_header.rpc.length, > > + ); > > + > > + if GspCmdq::calculate_checksum(SBuffer::new_reader([ > > + msg_header.as_bytes(), > > + cmd.as_bytes(), > > + payload_1, > > + payload_2, > > + ])) != 0 > > + { > > + dev_err!( > > + self.dev, > > + "GSP RPC: receive: Call {} - bad checksum", > > + msg_header.rpc.sequence > > + ); > > + return Err(EIO); > > + } > > + > > + let result = if msg_header.rpc.function == M::FUNCTION { > > This should be done way earlier. At this point we have already converted > the command bytes slices into M, which is invalid if it happens that > this condition doesn't hold. > > (on a related note, the checksum verification should also be done before > we interpret the message, as a corrupted message could make us cast > `cmd` into something that it is not) Sounds good. > > + let sbuffer = SBuffer::new_reader([payload_1, payload_2]); > > + init(cmd, sbuffer) > > + } else { > > + Err(ERANGE) > > + }; > > + > > + self.gsp_mem > > + .advance_cpu_read_ptr(msg_header.rpc.length.div_ceil(GSP_PAGE_SIZE as u32)); > > There is a landmine here. `msg_header.rpc.length` contains the length of > the payload, INCLUDING the RPC header itself, but NOT INCLUDING the > remainder of `msg_header`. Therefore the correct statement should be: > > self.gsp_mem.advance_cpu_read_ptr( > (size_of_val(msg_header) as u32 - size_of_val(&msg_header.rpc) as u32 > + msg_header.rpc.length) > .div_ceil(GSP_PAGE_SIZE as u32), > ); > > Of course, it looks horrible, so we want to hide this member altogether > and provide a nice, well-documented method that returns something that > is immediately useful for us. More on that in `fw.rs`. 
> > (the previous use of `length` in this method is also incorrect). > > > + result > > + } > > +} > > + > > +fn decode_gsp_function(function: u32) -> &'static str { > > Once we have proper enums for the function and events, this can be an > `as_str` method. Yep. > > + match function { > > + // Common function codes > > + NV_VGPU_MSG_FUNCTION_NOP => "NOP", > > + NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO => "SET_GUEST_SYSTEM_INFO", > > + NV_VGPU_MSG_FUNCTION_ALLOC_ROOT => "ALLOC_ROOT", > > + NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE => "ALLOC_DEVICE", > > + NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY => "ALLOC_MEMORY", > > + NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA => "ALLOC_CTX_DMA", > > + NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA => "ALLOC_CHANNEL_DMA", > > + NV_VGPU_MSG_FUNCTION_MAP_MEMORY => "MAP_MEMORY", > > + NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA => "BIND_CTX_DMA", > > + NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT => "ALLOC_OBJECT", > > + NV_VGPU_MSG_FUNCTION_FREE => "FREE", > > + NV_VGPU_MSG_FUNCTION_LOG => "LOG", > > + NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO => "GET_GSP_STATIC_INFO", > > + NV_VGPU_MSG_FUNCTION_SET_REGISTRY => "SET_REGISTRY", > > + NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO => "GSP_SET_SYSTEM_INFO", > > + NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU => "GSP_INIT_POST_OBJGPU", > > + NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL => "GSP_RM_CONTROL", > > + NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO => "GET_STATIC_INFO", > > + > > + // Event codes > > + NV_VGPU_MSG_EVENT_GSP_INIT_DONE => "INIT_DONE", > > + NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER => "RUN_CPU_SEQUENCER", > > + NV_VGPU_MSG_EVENT_POST_EVENT => "POST_EVENT", > > + NV_VGPU_MSG_EVENT_RC_TRIGGERED => "RC_TRIGGERED", > > + NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED => "MMU_FAULT_QUEUED", > > + NV_VGPU_MSG_EVENT_OS_ERROR_LOG => "OS_ERROR_LOG", > > + NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD => "NOCAT", > > + NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE => "LOCKDOWN_NOTICE", > > + NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT => "LIBOS_PRINT", > > + > > + // Default for 
unknown codes > > + _ => "UNKNOWN", > > + } > > +} > > diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs > > index 7f4fe684ddaf..2e4255301e58 100644 > > --- a/drivers/gpu/nova-core/gsp/fw.rs > > +++ b/drivers/gpu/nova-core/gsp/fw.rs > > @@ -15,7 +15,9 @@ > > use crate::firmware::gsp::GspFirmware; > > use crate::gpu::Chipset; > > use crate::gsp; > > +use crate::gsp::cmdq::MSGQ_NUM_PAGES; > > I guess the number of pages in the message queue is firmware-dependent, > so would it make sense to move its declaration to this module? It's not firmware dependent - it appears we can set it to whatever we reasonably want (for some value of "reasonable") so long as we allocate the right buffer sizes, etc. I noticed and tested this a while back when removing some magic numbers. That said 0x3f is what I chose because that's what Nouveau and I think OpenRM use. > > use crate::gsp::FbLayout; > > +use crate::gsp::GSP_PAGE_SIZE; > > > > /// Dummy type to group methods related to heap parameters for running the GSP firmware.
> > pub(crate) struct GspFwHeapParams(()); > > @@ -159,6 +161,37 @@ pub(crate) fn new(gsp_firmware: &GspFirmware, fb_layout: &FbLayout) -> Self { > > // GSP firmware constants > > GSP_FW_WPR_META_MAGIC, > > GSP_FW_WPR_META_REVISION, > > + > > + // GSP events > > + NV_VGPU_MSG_EVENT_GSP_INIT_DONE, > > + NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE, > > + NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD, > > + NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER, > > + NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED, > > + NV_VGPU_MSG_EVENT_OS_ERROR_LOG, > > + NV_VGPU_MSG_EVENT_POST_EVENT, > > + NV_VGPU_MSG_EVENT_RC_TRIGGERED, > > + NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, > > + > > + // GSP function calls > > + NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA, > > + NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA, > > + NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE, > > + NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, > > + NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT, > > + NV_VGPU_MSG_FUNCTION_ALLOC_ROOT, > > + NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA, > > + NV_VGPU_MSG_FUNCTION_FREE, > > + NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO, > > + NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO, > > + NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU, > > + NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL, > > + NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO, > > + NV_VGPU_MSG_FUNCTION_LOG, > > + NV_VGPU_MSG_FUNCTION_MAP_MEMORY, > > + NV_VGPU_MSG_FUNCTION_NOP, > > + NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO, > > + NV_VGPU_MSG_FUNCTION_SET_REGISTRY, > > }; > > > > #[repr(transparent)] > > @@ -197,3 +230,86 @@ fn id8(name: &str) -> u64 { > > }) > > } > > } > > + > > +pub(crate) type MsgqTxHeader = bindings::msgqTxHeader; > > This should be wrapped into a newtype that provides the exact set of > features required by the gsp module, like what is done for > `LibosMemoryRegionInitArgument`. For this type we just need two things: > return the `writePtr`, and advance it by a given value. That's all > the parent module needs to see. 
Except that doesn't actually work, because we need to use DMA access methods on bindings::msgqTxHeader which we can't do from an accessor. > By just aliasing this type, we make all its members visible to the `gsp` > module. This increases its dependency on a given firmware version, > carries a risk that the GSP module will mess with what it is not > supposed to, and introduces inconsistency in how we abstract the > firmware types - some are wrapped, others are not. Let's be consistent > and make all bindings completely opaque outside of `fw.rs`. I can't see how accessors in `fw.rs` for every firmware binding we use is useful though[1]? In what way does that reduce dependency on a given firmware version? I can't see how it isn't just adding boiler plate for every struct field we want to access. We already have higher level abstractions (eg. DmaGspMem) to ensure the GSP won't mess with what it isn't supposed to, and these higher level abstractions also provide more options to reduce dependency on a given firmware version as the logic isn't as tightly coupled to the binding representation. It also allows us to access the fields using the proper DMA access methods. [1] - Except for maybe removing some nonRustySnakeyNames > > + > > +// SAFETY: Padding is explicit and will not contain uninitialized data. > > +unsafe impl AsBytes for MsgqTxHeader {} > > + > > +impl MsgqTxHeader { > > + pub(crate) fn new(msgq_size: u32, rx_hdr_offset: u32) -> Self { > > + Self { > > + version: 0, > > + size: msgq_size, > > + msgSize: GSP_PAGE_SIZE as u32, > > + msgCount: MSGQ_NUM_PAGES, > > + writePtr: 0, > > + flags: 1, > > + rxHdrOff: rx_hdr_offset, > > + entryOff: GSP_PAGE_SIZE as u32, > > + } > > + } > > +} > > + > > +/// RX header for setting up a message queue with the GSP. > > +/// > > +/// # Invariants > > +/// > > +/// [`Self::read_ptr`] is guaranteed to return a value in the range `0..NUM_PAGES`. 
> > +pub(crate) type MsgqRxHeader = bindings::msgqRxHeader; > > + > > +// SAFETY: Padding is explicit and will not contain uninitialized data. > > +unsafe impl AsBytes for MsgqRxHeader {} > > + > > +impl MsgqRxHeader { > > + pub(crate) fn new() -> Self { > > + Default::default() > > + } > > +} > > + > > +pub(crate) type GspRpcHeader = bindings::rpc_message_header_v; > > This type too is another good illustration of why we should make our > bindings opaque. In `cmdq.rs` we access `GspRpcHeader::length` multiple > times, ignoring the fact that it also includes the size of the RPC > header itself - not just what comes after it! Since it doesn't come with > any documentation, we can be forgiven for assuming the obvious - that it > is just the size of what follows, but it is not. This has nothing to do with the bindings being opaque or not though. I can almost gurantee that after writing ten brain dead accessor functions that the eleventh would have looked pretty similar and included the same bug! > What we actually want is a method on `GspMsgElement` that returns what > we actually want (and is documented as such): the actual size of the > payload following the whole header. That way there can be no room for > mistake. The length calculation can just be a method on GspRpcHeader without requiring the rest of the fields to need accessor functions as welli though. Take for example a structure that we will need in future, GspStaticConfigInfo_t. This currently has 53 fields, although to be fair maybe we only care about half of them. My proposal[2] is to convert them into an abstract nova-core specific type, GspConfigInfo, that contains the specific details Nova users care about. Following the implied rule that no field accessors shall leak from fw.rs would require accessors for all fields. That's at least another 78 lines (assuming a minimal 3 line accessor function) of boiler plate for one FW structure. 
Multiply that by many structures and you now have a lot of boiler plate for no real gain and a very large fw.rs (note my example[2] only contains one field, but I expect we will need a lot more if not most of them). [2] - https://github.com/apopple-nvidia/linux/blob/nova-core-unstable/drivers/gpu/nova-core/gsp/commands.rs#L32 > > + > > +// SAFETY: Padding is explicit and will not contain uninitialized data. > > +unsafe impl AsBytes for GspRpcHeader {} > > + > > +// SAFETY: This struct only contains integer types for which all bit patterns > > +// are valid. > > +unsafe impl FromBytes for GspRpcHeader {} > > + > > +impl GspRpcHeader { > > + pub(crate) fn new(cmd_size: u32, function: u32) -> Self { > > + Self { > > + // TODO: magic number > > + header_version: 0x03000000, > > + signature: bindings::NV_VGPU_MSG_SIGNATURE_VALID, > > + function, > > + // TODO: overflow check? > > + length: size_of::<Self>() as u32 + cmd_size, > > + rpc_result: 0xffffffff, > > + rpc_result_private: 0xffffffff, > > + ..Default::default() > > + } > > + } > > +} > > + > > +pub(crate) type GspMsgElement = bindings::GSP_MSG_QUEUE_ELEMENT; > > Hammering my previous argument a bit more: in `cmdq.rs`, we do the > following: > > let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap(); > > Even though we explicitly used `GspMsgElement`, `msg_header` appears as > being of type `GSP_MSG_QUEUE_ELEMENT` in LSP. That's super confusing and > can be avoided by using the newtype pattern. That's surprising. I would have hoped the LSP would be more helpful than that. > Lastly, the bindings generated from C headers are supposed to be temporary, and > we eventually want to replace them with direct IDL-to-Rust bindings. Not > leaking the C types let us design `fw.rs` as a blueprint for the ideal > interface we would want to generate - so the limited extra labor that > comes with wrapping these types would also pay off in that respect. 
I agree with this - `fw.rs` should be the blueprint for the ideal bindings that we could conceivably generate directly from IDL. But this is also why I'm pushing back against the idea of putting any kind of logic (in the form of accessors/constructors) into `fw.rs` - I doubt you will ever be able to generate those from IDL. For example I doubt it would be possible to generate the length logic, and I'm even more doubtful we could figure out how to generate the constructors based on Rust type.

So to summarise where I think we're at, we have two proposals for how to treat/use the generated bindings:

1) Make them all a newtype[3] using a transparent wrapper to make the fields opaque and accessor functions to implement the logic. This would all be in fw.rs to stop the C types leaking out.

2) Build a completely new type based on the underlying bindings plus whatever other logic is needed. The C types could be used within the `gsp` module to implement the new types but would not leak further than that.

Using GspStaticConfigInfo_t as a concrete example I think option (1) looks like this:

In fw.rs:

    #[repr(transparent)]
    pub(crate) struct GspStaticConfigInfo(bindings::GspStaticConfigInfo_t);

    impl GspStaticConfigInfo {
        pub(crate) fn new(gsp: &mut Gsp) -> Result<Self> {
            // Create GspStaticConfigInfo sending whatever messages needed to the Gsp.
        }

        pub(crate) fn gpu_name(self) -> String { ... }

        pub(crate) fn sriov_max_gfid(self) -> u32 {
            self.0.sriovMaxGfid
        }

        pub(crate) fn fb_length(self) -> u64 {
            self.0.fb_length
        }

        pub(crate) fn ats_supported(self) -> bool {
            self.0.bAtsSupported != 0
        }

        // More for every field we need ...
    }

And Option (2) could look more like this:

In commands.rs:

    pub(crate) struct GspStaticConfigInfo {
        pub gpu_name: String,
        pub sriov_max_gfid: u32,
        pub fb_length: u64,
        pub ats_supported: bool,
    }

    impl GspStaticConfigInfo {
        pub(crate) fn new(gsp: &mut Gsp) -> Result<Self> {
            // Create GspStaticConfigInfo sending whatever messages needed to the Gsp.
        }
    }

Personally I favour option (2) because it is more concise and allows us to split the code up more. It is also more resilient to FW changes as it doesn't depend on the underlying representation at all. However based on your feedback I think you favour option (1) more so I want to make sure I'm understanding it correctly.

[3] - https://doc.rust-lang.org/rust-by-example/generics/new_types.html
On Mon Sep 29, 2025 at 3:19 PM JST, Alistair Popple wrote: <snip> >> > + >> > +/// Number of GSP pages making the Msgq. >> > +pub(crate) const MSGQ_NUM_PAGES: u32 = 0x3f; >> > + >> > +#[repr(C, align(0x1000))] >> > +#[derive(Debug)] >> > +struct MsgqData { >> > + data: [[u8; GSP_PAGE_SIZE]; MSGQ_NUM_PAGES as usize], >> > +} >> > + >> > +// Annoyingly there is no real equivalent of #define so we're forced to use a >> > +// literal to specify the alignment above. So check that against the actual GSP >> > +// page size here. >> > +static_assert!(align_of::<MsgqData>() == GSP_PAGE_SIZE); >> > + >> > +// There is no struct defined for this in the open-gpu-kernel-source headers. >> > +// Instead it is defined by code in GspMsgQueuesInit(). >> > +#[repr(C)] >> > +struct Msgq { >> > + tx: MsgqTxHeader, >> > + rx: MsgqRxHeader, >> > + msgq: MsgqData, >> > +} >> > + >> > +#[repr(C)] >> > +struct GspMem { >> > + ptes: [u8; GSP_PAGE_SIZE], >> > + cpuq: Msgq, >> > + gspq: Msgq, >> > +} >> > + >> > +// SAFETY: These structs don't meet the no-padding requirements of AsBytes but >> > +// that is not a problem because they are not used outside the kernel. >> > +unsafe impl AsBytes for GspMem {} >> > + >> > +// SAFETY: These structs don't meet the no-padding requirements of FromBytes but >> > +// that is not a problem because they are not used outside the kernel. >> > +unsafe impl FromBytes for GspMem {} >> >> These ARE used outside the kernel, since they are shared with the GSP. >> :) I'd say the reason these are safe is just because we satisfy the >> requirements of AsBytes and FromBytes: > > Yes, and they're actually old safety comments that I missed updating to be the > same as the rest. > >> - No initialized bytes, >> - No interior mutability, >> - All bytes patterns are valid (for some generous definition of >> "valid" limited to not triggering UB). > > I was under the impression that that's all `unsafe` is really marking - code > the compiler can't prove won't trigger UB. 
So I'd assume that's all we'd have to > prove in safety comments anyway. A good rule of thumb for writing `SAFETY` statements is to look at the `Safety` section of the unsafe code we call (here `FromBytes`), and justify why our calling code satisfies each of these. >> > + dev: ARef<device::Device>, >> > + seq: u32, >> > + gsp_mem: DmaGspMem, >> > + pub _nr_ptes: u32, >> > +} >> > + >> > +impl GspCmdq { >> > + pub(crate) fn new(dev: &device::Device<device::Bound>) -> Result<GspCmdq> { >> > + let gsp_mem = DmaGspMem::new(dev)?; >> > + let nr_ptes = size_of::<GspMem>() >> GSP_PAGE_SHIFT; >> > + build_assert!(nr_ptes * size_of::<u64>() <= GSP_PAGE_SIZE); >> > + >> > + Ok(GspCmdq { >> > + dev: dev.into(), >> > + seq: 0, >> > + gsp_mem, >> > + _nr_ptes: nr_ptes as u32, >> > + }) >> > + } >> > + >> > + fn calculate_checksum<T: Iterator<Item = u8>>(it: T) -> u32 { >> > + let sum64 = it >> > + .enumerate() >> > + .map(|(idx, byte)| (((idx % 8) * 8) as u32, byte)) >> > + .fold(0, |acc, (rol, byte)| acc ^ u64::from(byte).rotate_left(rol)); >> > + >> > + ((sum64 >> 32) as u32) ^ (sum64 as u32) >> > + } >> > + >> > + #[expect(unused)] >> > + pub(crate) fn send_gsp_command<M: GspCommandToGsp>( >> > + &mut self, >> > + bar: &Bar0, >> > + payload_size: usize, >> > + init: impl FnOnce(&mut M, SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result, >> >> This works pretty well for in-place initialization. >> >> There a couple of small drawbacks though: `M` must implement `FromBytes` >> even though we only send it, and (more serious) there is no guarantee > > Yes, that detail annoyed me slightly too. > >> that `init` will fully initialize the command - a forgetful caller can >> omit some of its fields, or even the whole structure, and in that case >> we will send a command with what happened to be at that position of the >> queue at that time. 
> > Good timing as I was just looking to see if there wasn't some way of ensuring > this happened, or at least was much more explicit than what we currently do. > >> I think this is a good case for using the `Init` trait: it's like >> `PinInit`, but without the `Pin`, and it ensures that the whole argument >> is initialized. So this method would change into something like: >> >> pub(crate) fn send_gsp_command<M: GspCommandToGsp>( >> &mut self, >> bar: &Bar0, >> payload_size: usize, >> init: impl Init<M, Error>, >> init_sbuf: impl FnOnce(SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result, >> >> This also allows us to drop the `FromBytes` requirement on >> `GspCommandToGsp`! But also requires us to use `unsafe` to call >> `Init::__init` on the pointer to the command. I think it's worth it >> though, as this removes the risk of sending partially-uninitialized >> commands. > > Agree on it being worth the unsafe call, especially because it is unsafe if the > caller doesn't do what it's supposed to regardless. But how does this contrast > with `MaybeUninit`? I was wondering if something like this would work: > > pub(crate) fn send_gsp_command<M: GspCommandToGsp>( > &mut self, > bar: &Bar0, > payload_size: usize, > init: impl FnOnce(MaybeUninit<&mut M>, SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result<&M>, > > Note I'm not sure we could actually make that work as I haven't tried it yet. I > suspect I've missed something with lifetimes there :-) The difference with using `MaybeUninit` is that the unsafe code would need to be *within* each `init` argument - IOW, within each caller. It also does not provide any guarantee that the whole `M` is initialized, we have to trust each caller for doing that properly. I've tried implementing using `Init` and it works even better than I thought. 
Each type actually automatically implements `Init` on itself, which means that instead of explicitly calling init!() as in cmdq.send_gsp_command(bar, init!(MyCommand { foo: ..., }), ...); you can just pass an object as a stack argument and it will also work! cmdq.send_gsp_command(bar, MyCommand { foo: ..., }, ...); Of course in this case the `MyCommand` instance is first passed on the stack before being copied into the command-queue, but for small commands this is pretty convenient. Larger ones can use the `init!` syntax for full in-place initialization. > > >> Then there is what to do with the `SBuffer`. I'd like to think a bit >> more about this, as not all commands require it, and those that do >> typically send arrays of particular types. I think we should be able to >> use the type system to have more control over this, but let's keep that >> for the next revision. > > Sure. You are correct that not all (most?) commands need it, and it's a > shame we don't have proper bindings for them anyway. Although in some cases that > still wouldn't work anyway (eg. for registry keys) where it really is just a > packed variable number of variable length strings. Not sure what a Rust native > representation of that would be. For now the best I could think of is to have not one but two traits for GSP commands: one for commands with payload and another one for commands without payload. And two variants of `send_gsp_command` for each case. That way we can at least guarantee that we won't pass a payload where we shouldn't, or forget one where we should. I got a prototype of this working and along with the `Init` thing it feels pretty right. > >> > + ) -> Result { >> > + // TODO: a method that extracts the regions for a given command? >> > + // ... and another that reduces the region to a given number of bytes!
>> > + let driver_area = self.gsp_mem.driver_write_area(); >> > + let free_tx_pages = driver_area.0.len() + driver_area.1.len(); >> > + >> > + // Total size of the message, including the headers, command, and optional payload. >> > + let msg_size = size_of::<GspMsgElement>() + size_of::<M>() + payload_size; >> > + if free_tx_pages < msg_size.div_ceil(GSP_PAGE_SIZE) { >> > + return Err(EAGAIN); >> > + } >> > + >> > + let (msg_header, cmd, payload_1, payload_2) = { >> > + // TODO: find an alternative to as_flattened_mut() >> >> I think we can use it if we enable the "slice_flatten" feature (stable >> since 1.80, introduced in 1.67). > > Ok, will try that. Hopefully explicitly using the feature also makes clippy > happy but if not I guess we can leave the over-rides below. I think you will also need to explicitly enable the feature somewhere - for the kernel crate it is in `rust/kernel/lib.rs`, but Nova being a different crate I am not sure where we are supposed to do it... <snip> >> > + >> > + (msg_header, cmd, payload_1, payload_2) >> > + }; >> > + >> > + let sbuffer = SBuffer::new_writer([&mut payload_1[..], &mut payload_2[..]]); >> > + init(cmd, sbuffer)?; >> > + >> > + *msg_header = GspMsgElement::new(self.seq, size_of::<M>() + payload_size, M::FUNCTION); >> > + msg_header.checkSum = GspCmdq::calculate_checksum(SBuffer::new_reader([ >> > + msg_header.as_bytes(), >> > + cmd.as_bytes(), >> > + payload_1, >> > + payload_2, >> > + ])); >> > + >> > + let rpc_header = &msg_header.rpc; >> > + dev_info!( >> > + &self.dev, >> > + "GSP RPC: send: seq# {}, function=0x{:x} ({}), length=0x{:x}\n", >> > + self.seq, >> > + rpc_header.function, >> > + decode_gsp_function(rpc_header.function), >> > + rpc_header.length, >> > + ); >> > + >> > + let elem_count = msg_header.elemCount; >> > + self.seq += 1; >> > + self.gsp_mem.advance_cpu_write_ptr(elem_count); >> > + NV_PGSP_QUEUE_HEAD::default().set_address(0).write(bar); >> >> I'm a bit surprised that we always write `0` here, can 
we document this >> behavior, maybe in the definition of `NV_PGSP_QUEUE_HEAD`? > I was also surprised the value doesn't matter but this is what both Nouveau and > OpenRM do. I guess it just triggers some kind of notification on the GSP and the > actual queue head is read from the shared memory pointers. In that case I think we should definitely document this - or even better, if it never makes sense to write anything other than 0 to `NV_PGSP_QUEUE_HEAD`, to add an explicit method (e.g. `kick`?) that does this for us. Can you try and go to the bottom of this? >> > + >> > + // Can't fail because msg_slice will always be >> > + // size_of::<GspMsgElement>() bytes long by the above split. >> > + let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap(); >> > + if msg_header.rpc.length < size_of::<M>() as u32 { >> > + return None; >> > + } >> >> If we have a message in the queue and this condition doesn't hold, I >> don't think we can hope that it will in a further iteration - this >> should be an error. > > I was thinking messages may get partially written and we'd see this state, but > you're right - the shared pointers shouldn't be updated until the entire message > is written so this should be an error. > > That will require the `wait_on_result()` function, because `wait_on()` doesn't > allow the closure to return a failure. We could just return Option<Result<_>> > from the closure but that's a bit gross, because wait_on() would then return > Result<Option<Result<_>>>. We can now use `read_poll_timeout` (from `kernel::io::poll`) which separates the acquisition of the state and the test of the condition into two separate closures - the first one returns a `Result`, and its error is propagated as-is, which is what we want in this case. Actually this reminded me that I should send a patch to remove `wait_on` altogether now that we have a better alternative!
:) >> > diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs >> > index 7f4fe684ddaf..2e4255301e58 100644 >> > --- a/drivers/gpu/nova-core/gsp/fw.rs >> > +++ b/drivers/gpu/nova-core/gsp/fw.rs >> > @@ -15,7 +15,9 @@ >> > use crate::firmware::gsp::GspFirmware; >> > use crate::gpu::Chipset; >> > use crate::gsp; >> > +use crate::gsp::cmdq::MSGQ_NUM_PAGES; >> >> I guess the number of pages in the message queue is firmware-dependent, >> so would it make sense to move its declaration to this module? > > It's not firmware dependent - it appears we can set it to whatever we reasonably > want (for some value of "reasonable) so long as we allocate the right buffer > sizes, etc. I noticed and tested this a while back when removing some magic > numbers. > > That said 0x3f is what I chose because that's what Nouveau and I think OpenRM > use. Ah, that's great and makes perfect sense! So I suppose the `pageTableEntryCount` member of the `MESSAGE_QUEUE_INIT_ARGUMENT` is to pass the total size of the message queue. I'm starting to think about a version of the command queue where its size would be a generic argument, but let's keep that for a later patch, if we ever want it at all. :) > >> > use crate::gsp::FbLayout; >> > +use crate::gsp::GSP_PAGE_SIZE; >> > >> > /// Dummy type to group methods related to heap parameters for running the GSP firmware. 
>> > pub(crate) struct GspFwHeapParams(()); >> > @@ -159,6 +161,37 @@ pub(crate) fn new(gsp_firmware: &GspFirmware, fb_layout: &FbLayout) -> Self { >> > // GSP firmware constants >> > GSP_FW_WPR_META_MAGIC, >> > GSP_FW_WPR_META_REVISION, >> > + >> > + // GSP events >> > + NV_VGPU_MSG_EVENT_GSP_INIT_DONE, >> > + NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE, >> > + NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD, >> > + NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER, >> > + NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED, >> > + NV_VGPU_MSG_EVENT_OS_ERROR_LOG, >> > + NV_VGPU_MSG_EVENT_POST_EVENT, >> > + NV_VGPU_MSG_EVENT_RC_TRIGGERED, >> > + NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, >> > + >> > + // GSP function calls >> > + NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA, >> > + NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA, >> > + NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE, >> > + NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, >> > + NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT, >> > + NV_VGPU_MSG_FUNCTION_ALLOC_ROOT, >> > + NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA, >> > + NV_VGPU_MSG_FUNCTION_FREE, >> > + NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO, >> > + NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO, >> > + NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU, >> > + NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL, >> > + NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO, >> > + NV_VGPU_MSG_FUNCTION_LOG, >> > + NV_VGPU_MSG_FUNCTION_MAP_MEMORY, >> > + NV_VGPU_MSG_FUNCTION_NOP, >> > + NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO, >> > + NV_VGPU_MSG_FUNCTION_SET_REGISTRY, >> > }; >> > >> > #[repr(transparent)] >> > @@ -197,3 +230,86 @@ fn id8(name: &str) -> u64 { >> > }) >> > } >> > } >> > + >> > +pub(crate) type MsgqTxHeader = bindings::msgqTxHeader; >> >> This should be wrapped into a newtype that provides the exact set of >> features required by the gsp module, like what is done for >> `LibosMemoryRegionInitArgument`. For this type we just need two things: >> return the `writePtr`, and advance it by a given value. That's all >> the parent module needs to see. 
> > Except that doesn't actually work, because we need to use DMA access methods on > bindings::msgqTxHeader which we can't do from an accessor. If the accessor uses `read_volatile` and `write_volatile` then I guess it should work? These are unsafe though, so this is not ideal, but at least they would be contained into the `fw` module and limited to one getter and one setter. > >> By just aliasing this type, we make all its members visible to the `gsp` >> module. This increases its dependency on a given firmware version, >> carries a risk that the GSP module will mess with what it is not >> supposed to, and introduces inconsistency in how we abstract the >> firmware types - some are wrapped, others are not. Let's be consistent >> and make all bindings completely opaque outside of `fw.rs`. > > I can't see how accessors in `fw.rs` for every firmware binding we use is useful > though[1]? In what way does that reduce dependency on a given firmware version? > I can't see how it isn't just adding boiler plate for every struct field we want > to access. While the main point of this remains hiding stuff other modules don't need to see, it also allows us to choose stable method names that are independent from the firmware. > > We already have higher level abstractions (eg. DmaGspMem) to ensure the GSP > won't mess with what it isn't supposed to, and these higher level abstractions > also provide more options to reduce dependency on a given firmware version as > the logic isn't as tightly coupled to the binding representation. It also allows > us to access the fields using the proper DMA access methods. `DmaGspMem` is a special case - it is in shared memory and requires volatile reads and writes at runtime. And yes it also provides a higher-level abstraction to access the driver's readable and writable areas of the ring buffer - I think these are at the right place in `cmdq.rs`. 
But we also have other types that we build on the stack, even if we move them to a DmaObject afterwards. `fw.rs` should provide constructors and simple primitives for these, on top of which the command queue logic can be built. Right now we are initializing their fields in `cmdq` and I think we can agree this is not very portable. To clarify further, our sole business with half of the bindings types is initializing them to some value that makes sense and put it somewhere. For this we only need a constructor. We don't need to know the names of their fields, and we don't want to know. Other bindings types are message-related, and we need to access their contents. You discuss these in more detail below so let's address them there. Then the last kind are the types related to the command queue runtime operation - namely the Rx/Tx headers and their read/write pointers, and the message headers that we need to read and write. For the former we only need to poke at their pointers using volatile accesses, which is two methods that I propose to have for that purpose. For the latter we only need to access a handful of fields, some of which can be tricky (like `length`). In any case, we are talking about a handful of methods here to keep the abstraction opaque. More on messages below as you make an interesting point on them. > [1] - Except for maybe removing some nonRustySnakeyNames > >> > + >> > +// SAFETY: Padding is explicit and will not contain uninitialized data. >> > +unsafe impl AsBytes for MsgqTxHeader {} >> > + >> > +impl MsgqTxHeader { >> > + pub(crate) fn new(msgq_size: u32, rx_hdr_offset: u32) -> Self { >> > + Self { >> > + version: 0, >> > + size: msgq_size, >> > + msgSize: GSP_PAGE_SIZE as u32, >> > + msgCount: MSGQ_NUM_PAGES, >> > + writePtr: 0, >> > + flags: 1, >> > + rxHdrOff: rx_hdr_offset, >> > + entryOff: GSP_PAGE_SIZE as u32, >> > + } >> > + } >> > +} >> > + >> > +/// RX header for setting up a message queue with the GSP. 
>> > +///
>> > +/// # Invariants
>> > +///
>> > +/// [`Self::read_ptr`] is guaranteed to return a value in the range `0..NUM_PAGES`.
>> > +pub(crate) type MsgqRxHeader = bindings::msgqRxHeader;
>> > +
>> > +// SAFETY: Padding is explicit and will not contain uninitialized data.
>> > +unsafe impl AsBytes for MsgqRxHeader {}
>> > +
>> > +impl MsgqRxHeader {
>> > +    pub(crate) fn new() -> Self {
>> > +        Default::default()
>> > +    }
>> > +}
>> > +
>> > +pub(crate) type GspRpcHeader = bindings::rpc_message_header_v;
>>
>> This type too is another good illustration of why we should make our
>> bindings opaque. In `cmdq.rs` we access `GspRpcHeader::length` multiple
>> times, ignoring the fact that it also includes the size of the RPC
>> header itself - not just what comes after it! Since it doesn't come with
>> any documentation, we can be forgiven for assuming the obvious - that it
>> is just the size of what follows, but it is not.
>
> This has nothing to do with the bindings being opaque or not though. I can
> almost guarantee that after writing ten brain-dead accessor functions the
> eleventh would have looked pretty similar and included the same bug!

It very much does have to do with making the binding opaque. If it is not
opaque, and the original `length` is accessible, what is preventing code from
using it and repeating the same mistake again?

I don't quite get where the "ten brain-dead accessor functions" would come
from. I propose we write one, that returns a value that makes sense (contrary
to the original member), and most importantly is documented so users can know
exactly what it does. Tell me what is brain-dead about that? This very patch
misused the `length` member on at least two different occasions, isn't the
benefit clear?

>
>> What we actually want is a method on `GspMsgElement` that returns what
>> we actually want (and is documented as such): the actual size of the
>> payload following the whole header. That way there can be no room for
>> mistake.
>
> The length calculation can just be a method on GspRpcHeader without requiring
> the rest of the fields to need accessor functions as well though.
>
> Take for example a structure that we will need in future, GspStaticConfigInfo_t.
> This currently has 53 fields, although to be fair maybe we only care about half
> of them.
>
> My proposal[2] is to convert them into an abstract nova-core specific type,
> GspConfigInfo, that contains the specific details Nova users care about.
> Following the implied rule that no field accessors shall leak from fw.rs would
> require accessors for all fields. That's at least another 78 lines (assuming a
> minimal 3 line accessor function) of boilerplate for one FW structure. Multiply
> that by many structures and you now have a lot of boilerplate for no real gain
> and a very large fw.rs (note my example[2] only contains one field, but I expect
> we will need a lot more if not most of them).
>
> [2] - https://github.com/apopple-nvidia/linux/blob/nova-core-unstable/drivers/gpu/nova-core/gsp/commands.rs#L32

I am not opposed to having a nova-core specific type, and in the case of
GspStaticConfigInfo_t it certainly seems to make sense. Actually it even does
what we want here, since with that design this type should not be visible
outside of the `fw` module, right?

Now to be fair, one also needs to consider that this new GspStaticConfigInfo
type will need its own boilerplate-y code to initialize itself from the
original GspStaticConfigInfo_t, field by field. But in the end we would have a
type that provides us with the abstraction that we need for the GSP module,
while making the bindings type invisible - this fits the bill, and looks
arguably nicer than having too many accessors, so why not!

One can even argue that the wrapping pattern is just a simpler version of
this - we create a new type to hide the bindings' internals and provide us
with the right interface.
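As a concrete sketch of that wrapping pattern, something along these lines (std-only, with a local stand-in for the generated binding rather than the real `bindings::msgqRxHeader`):

```rust
// Stand-in for the bindgen-generated type; real code would use the binding.
#[allow(non_camel_case_types, non_snake_case)]
#[repr(C)]
#[derive(Default)]
struct msgqRxHeader {
    readPtr: u32,
}

/// RX header for setting up a message queue with the GSP.
///
/// The transparent newtype hides the binding's fields from the rest of the
/// `gsp` module and exposes stable, documented method names instead.
#[repr(transparent)]
struct MsgqRxHeader(msgqRxHeader);

impl MsgqRxHeader {
    fn new() -> Self {
        Self(msgqRxHeader::default())
    }

    /// Read pointer, in message-queue entries.
    fn read_ptr(&self) -> u32 {
        self.0.readPtr
    }
}
```

Being `#[repr(transparent)]`, the wrapper keeps the exact layout of the binding, so it can still be placed in shared memory or converted with `FromBytes`/`AsBytes` as before.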
It is just that for many commands/messages the payload is so small that it is actually less code to just wrap it and provide a few utility methods. GspStaticConfigInfo_t is certainly an exception. > >> > + >> > +// SAFETY: Padding is explicit and will not contain uninitialized data. >> > +unsafe impl AsBytes for GspRpcHeader {} >> > + >> > +// SAFETY: This struct only contains integer types for which all bit patterns >> > +// are valid. >> > +unsafe impl FromBytes for GspRpcHeader {} >> > + >> > +impl GspRpcHeader { >> > + pub(crate) fn new(cmd_size: u32, function: u32) -> Self { >> > + Self { >> > + // TODO: magic number >> > + header_version: 0x03000000, >> > + signature: bindings::NV_VGPU_MSG_SIGNATURE_VALID, >> > + function, >> > + // TODO: overflow check? >> > + length: size_of::<Self>() as u32 + cmd_size, >> > + rpc_result: 0xffffffff, >> > + rpc_result_private: 0xffffffff, >> > + ..Default::default() >> > + } >> > + } >> > +} >> > + >> > +pub(crate) type GspMsgElement = bindings::GSP_MSG_QUEUE_ELEMENT; >> >> Hammering my previous argument a bit more: in `cmdq.rs`, we do the >> following: >> >> let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap(); >> >> Even though we explicitly used `GspMsgElement`, `msg_header` appears as >> being of type `GSP_MSG_QUEUE_ELEMENT` in LSP. That's super confusing and >> can be avoided by using the newtype pattern. > > That's surprising. I would have hoped the LSP would be more helpful than that. > >> Lastly, the bindings generated from C headers are supposed to be temporary, and >> we eventually want to replace them with direct IDL-to-Rust bindings. Not >> leaking the C types let us design `fw.rs` as a blueprint for the ideal >> interface we would want to generate - so the limited extra labor that >> comes with wrapping these types would also pay off in that respect. > > I agree with this - `fw.rs` should be the blueprint for the ideal bindings > that we could conceivably generate directly from IDL. 
> But this is also why
> I'm pushing back against the idea of putting any kind of logic (in the form of
> accessors/constructors) into `fw.rs` - I doubt you will ever be able to generate
> those from IDL.

`fw.rs` should contain the minimum required to build the logic on top of it,
agreed. Doing so might require having another type on top of the one defined
in `fw` to implement said logic - if you remember I did something like that
for MsgqTxHeader and you thought that was too convoluted. :)

>
> For example I doubt it would be possible to generate the length logic, and I'm
> even more doubtful we could figure out how to generate the constructors based on
> Rust type.

Where to put the length logic depends on how our firmware interface is
defined. If we agree that `length` is "the size of the RPC header, the
following command, and its payload - but oh wait, it doesn't cover the rest of
the enclosing message element, so don't forget to add that" then we can just
return or use the member as-is. But it's a bad and error-prone interface. We
can document it, and that in itself would be an improvement, but it would
still require some mental gymnastics to use and is just a giant footgun.

What is safe and useful is the length of the entire message, header, command
and payload included, because that's how we use it in the code. It's also
trivial to produce, and if we do it here we only do that operation in one
place instead of at each calling site. So yes, it belongs here, as part of the
firmware abstraction, and if we have to write it by hand then so be it.

The same goes for the constructor. Of course there will be some manual work
needed to include a new incompatible firmware version. But if you don't do it
here, you will have to do that work somewhere else anyway - probably `gsp` or
`cmdq`, except without the elegance of a method containing that code. How is
that better?
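To make the proposed single accessor concrete, a sketch along these lines - the header layout, field sizes and the element-header constant are all illustrative stand-ins, not the real `rpc_message_header_v` binding:

```rust
use core::mem::size_of;

// Stand-in layout for the RPC header binding.
#[repr(C)]
struct RpcHeaderRaw {
    function: u32,
    /// Size of this RPC header plus the command and payload that follow,
    /// but NOT the enclosing message element header - the footgun.
    length: u32,
}

/// Assumed size of the enclosing message element header, for illustration.
const MSG_ELEMENT_HDR_SIZE: u32 = 16;

#[repr(transparent)]
struct GspRpcHeader(RpcHeaderRaw);

impl GspRpcHeader {
    /// Total size of the message on the queue: element header, RPC header,
    /// command and payload. The only length callers should ever need.
    fn total_msg_size(&self) -> u32 {
        MSG_ELEMENT_HDR_SIZE + self.0.length
    }

    /// Size of what follows the RPC header (command plus payload).
    fn payload_size(&self) -> u32 {
        self.0.length - size_of::<RpcHeaderRaw>() as u32
    }
}
```

With the raw `length` member inaccessible, there is exactly one documented definition of each quantity, computed in one place.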
>
> So to summarise where I think we're at, we have two proposals for how to
> treat/use the generated bindings:
>
> 1) Make them all a newtype[3] using a transparent wrapper to make the fields
> opaque and accessor functions to implement the logic. This would all be in fw.rs
> to stop the C types leaking out.
>
> 2) Build a completely new type based on the underlying bindings plus whatever
> other logic is needed. The C types could be used within the `gsp` module to
> implement the new types but would not leak further than that.
>
> Using GspStaticConfigInfo_t as a concrete example I think option (1) looks like
> this:
>
> In fw.rs:
>
>     #[repr(transparent)]
>     pub(crate) struct GspStaticConfigInfo(bindings::GspStaticConfigInfo_t);
>
>     impl GspStaticConfigInfo {
>         pub(crate) fn new(gsp: &mut Gsp) -> Result<Self> {
>             // Create GspStaticConfigInfo sending whatever messages needed to the Gsp.
>         }
>
>         pub(crate) fn gpu_name(self) -> String {
>             ...
>         }
>
>         pub(crate) fn sriov_max_gfid(self) -> u32 {
>             self.0.sriovMaxGfid
>         }
>
>         pub(crate) fn fb_length(self) -> u64 {
>             self.0.fb_length
>         }
>
>         pub(crate) fn ats_supported(self) -> bool {
>             self.0.bAtsSupported != 0
>         }
>
>         // More for every field we need
>         ...
>     }
>
> And Option (2) could look more like this:
>
> In commands.rs:
>
>     pub(crate) struct GspStaticConfigInfo {
>         pub gpu_name: String,
>         pub sriov_max_gfid: u32,
>         pub fb_length: u64,
>         pub ats_supported: bool,
>     }
>
>     impl GspStaticConfigInfo {
>         pub(crate) fn new(gsp: &mut Gsp) -> Result<Self> {
>             // Create GspStaticConfigInfo sending whatever messages needed to the Gsp.
>         }
>     }

To make a fair comparison your Option (2) should include this:

    impl TryFrom<GspStaticConfigInfo_t> for GspStaticConfigInfo {
        type Error = kernel::Error;

        fn try_from(value: GspStaticConfigInfo_t) -> Result<Self, Self::Error> {
            let gpu_name = // extract gpu string from `value`.
            ...
            Ok(Self {
                gpu_name,
                sriov_max_gfid: value.sriovMaxGfid,
                fb_length: value.fb_length,
                ats_supported: value.bAtsSupported != 0,
                // More for every field we need
            })
        }
    }

Because you are not only going to create this type from the `Gsp`, but also
from a message you have received, as per the code.

>
> Personally I favour option (2) because it is more concise and allows us to
> split the code up more. It is also more resilient to FW changes as it doesn't
> depend on the underlying representation at all. However based on your feedback
> I think you favour option (1) more so I want to make sure I'm understanding
> it correctly.

My constraints are that I want the binding types to be opaque and abstracted
in a way that is portable and makes sense for the higher levels. If Option (2)
satisfies these constraints (and in the case of GspStaticConfigInfo_t I
believe it does), then I'm fine with it where it makes sense, and for large
types with many fields I think we can make the case that it does.

I also think this really only applies to messages received from the GSP, as
commands NEED to be laid out in-place in the format that the firmware expects,
so there is little room for using a temporary type where we can have an
in-place constructor that is just as convenient and will use less stack space.
Still, in theory we could also use that technique for e.g. very complex
commands that would benefit from the constructor pattern.

But for messages from the GSP, why not. There is an additional cost to
consider related to the conversion from one type to the other, and that will
require extra complexity like an associated type in the MessageFromGsp trait,
so for very simple messages a wrapper might be a better fit, but otherwise, I
think it's fair game.
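Stripped of the kernel types, the Option (2) shape plus its `TryFrom` conversion boils down to a sketch like this (all field names are stand-ins for the generated `GspStaticConfigInfo_t`, and the error type is simplified):

```rust
// Stand-in for the bindgen-generated firmware struct.
#[allow(non_camel_case_types, non_snake_case)]
#[derive(Default)]
struct GspStaticConfigInfo_t {
    sriovMaxGfid: u32,
    fb_length: u64,
    bAtsSupported: u8,
}

/// Nova-core specific type: idiomatic names and types, no dependency on the
/// binding's representation outside this conversion.
struct GspStaticConfigInfo {
    sriov_max_gfid: u32,
    fb_length: u64,
    ats_supported: bool,
}

impl TryFrom<GspStaticConfigInfo_t> for GspStaticConfigInfo {
    type Error = ();

    fn try_from(value: GspStaticConfigInfo_t) -> Result<Self, Self::Error> {
        Ok(Self {
            sriov_max_gfid: value.sriovMaxGfid,
            fb_length: value.fb_length,
            // C boolean-as-integer becomes a real bool at the boundary.
            ats_supported: value.bAtsSupported != 0,
        })
    }
}
```

The field-by-field conversion is the boilerplate being weighed against per-field accessors; it runs once per received message, after which the binding type never escapes.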
On 2025-09-30 at 00:34 +1000, Alexandre Courbot <acourbot@nvidia.com> wrote... > On Mon Sep 29, 2025 at 3:19 PM JST, Alistair Popple wrote: > <snip> > >> > + > >> > +/// Number of GSP pages making the Msgq. > >> > +pub(crate) const MSGQ_NUM_PAGES: u32 = 0x3f; > >> > + > >> > +#[repr(C, align(0x1000))] > >> > +#[derive(Debug)] > >> > +struct MsgqData { > >> > + data: [[u8; GSP_PAGE_SIZE]; MSGQ_NUM_PAGES as usize], > >> > +} > >> > + > >> > +// Annoyingly there is no real equivalent of #define so we're forced to use a > >> > +// literal to specify the alignment above. So check that against the actual GSP > >> > +// page size here. > >> > +static_assert!(align_of::<MsgqData>() == GSP_PAGE_SIZE); > >> > + > >> > +// There is no struct defined for this in the open-gpu-kernel-source headers. > >> > +// Instead it is defined by code in GspMsgQueuesInit(). > >> > +#[repr(C)] > >> > +struct Msgq { > >> > + tx: MsgqTxHeader, > >> > + rx: MsgqRxHeader, > >> > + msgq: MsgqData, > >> > +} > >> > + > >> > +#[repr(C)] > >> > +struct GspMem { > >> > + ptes: [u8; GSP_PAGE_SIZE], > >> > + cpuq: Msgq, > >> > + gspq: Msgq, > >> > +} > >> > + > >> > +// SAFETY: These structs don't meet the no-padding requirements of AsBytes but > >> > +// that is not a problem because they are not used outside the kernel. > >> > +unsafe impl AsBytes for GspMem {} > >> > + > >> > +// SAFETY: These structs don't meet the no-padding requirements of FromBytes but > >> > +// that is not a problem because they are not used outside the kernel. > >> > +unsafe impl FromBytes for GspMem {} > >> > >> These ARE used outside the kernel, since they are shared with the GSP. > >> :) I'd say the reason these are safe is just because we satisfy the > >> requirements of AsBytes and FromBytes: > > > > Yes, and they're actually old safety comments that I missed updating to be the > > same as the rest. 
> >
> >> - No uninitialized bytes,
> >> - No interior mutability,
> >> - All byte patterns are valid (for some generous definition of
> >> "valid" limited to not triggering UB).
> >
> > I was under the impression that that's all `unsafe` is really marking - code
> > the compiler can't prove won't trigger UB. So I'd assume that's all we'd have to
> > prove in safety comments anyway.
>
> A good rule of thumb for writing `SAFETY` statements is to look at the
> `Safety` section of the unsafe code we call (here `FromBytes`), and
> justify why our calling code satisfies each of these.

Oh, that's usually where I get my inspiration for them from. This was more a
statement of my understanding of the language design, that unsafe is mostly
about UB rather than overall struct validity. After all, I think it is still
possible for safe code to initialise fields of a struct in "invalid" ways
(depending on the struct).

> >> > +    dev: ARef<device::Device>,
> >> > +    seq: u32,
> >> > +    gsp_mem: DmaGspMem,
> >> > +    pub _nr_ptes: u32,
> >> > +}
> >> > +
> >> > +impl GspCmdq {
> >> > +    pub(crate) fn new(dev: &device::Device<device::Bound>) -> Result<GspCmdq> {
> >> > +        let gsp_mem = DmaGspMem::new(dev)?;
> >> > +        let nr_ptes = size_of::<GspMem>() >> GSP_PAGE_SHIFT;
> >> > +        build_assert!(nr_ptes * size_of::<u64>() <= GSP_PAGE_SIZE);
> >> > +
> >> > +        Ok(GspCmdq {
> >> > +            dev: dev.into(),
> >> > +            seq: 0,
> >> > +            gsp_mem,
> >> > +            _nr_ptes: nr_ptes as u32,
> >> > +        })
> >> > +    }
> >> > +
> >> > +    fn calculate_checksum<T: Iterator<Item = u8>>(it: T) -> u32 {
> >> > +        let sum64 = it
> >> > +            .enumerate()
> >> > +            .map(|(idx, byte)| (((idx % 8) * 8) as u32, byte))
> >> > +            .fold(0, |acc, (rol, byte)| acc ^ u64::from(byte).rotate_left(rol));
> >> > +
> >> > +        ((sum64 >> 32) as u32) ^ (sum64 as u32)
> >> > +    }
> >> > +
> >> > +    #[expect(unused)]
> >> > +    pub(crate) fn send_gsp_command<M: GspCommandToGsp>(
> >> > +        &mut self,
> >> > +        bar: &Bar0,
> >> > +        payload_size: usize,
> >> > +        init:
impl FnOnce(&mut M, SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result, > >> > >> This works pretty well for in-place initialization. > >> > >> There a couple of small drawbacks though: `M` must implement `FromBytes` > >> even though we only send it, and (more serious) there is no guarantee > > > > Yes, that detail annoyed me slightly too. > > > >> that `init` will fully initialize the command - a forgetful caller can > >> omit some of its fields, or even the whole structure, and in that case > >> we will send a command with what happened to be at that position of the > >> queue at that time. > > > > Good timing as I was just looking to see if there wasn't some way of ensuring > > this happened, or at least was much more explicit than what we currently do. > > > >> I think this is a good case for using the `Init` trait: it's like > >> `PinInit`, but without the `Pin`, and it ensures that the whole argument > >> is initialized. So this method would change into something like: > >> > >> pub(crate) fn send_gsp_command<M: GspCommandToGsp>( > >> &mut self, > >> bar: &Bar0, > >> payload_size: usize, > >> init: impl Init<M, Error>, > >> init_sbuf: impl FnOnce(SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result, > >> > >> This also allows us to drop the `FromBytes` requirement on > >> `GspCommandToGsp`! But also requires us to use `unsafe` to call > >> `Init::__init` on the pointer to the command. I think it's worth it > >> though, as this removes the risk of sending partially-uninitialized > >> commands. > > > > Agree on it being worth the unsafe call, especially because it is unsafe if the > > caller doesn't do what it's supposed to regardless. But how does this contrast > > with `MaybeUninit`? 
I was wondering if something like this would work: > > > > pub(crate) fn send_gsp_command<M: GspCommandToGsp>( > > &mut self, > > bar: &Bar0, > > payload_size: usize, > > init: impl FnOnce(MaybeUninit<&mut M>, SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result<&M>, > > > > Note I'm not sure we could actually make that work as I haven't tried it yet. I > > suspect I've missed something with lifetimes there :-) > > The difference with using `MaybeUninit` is that the unsafe code would > need to be *within* each `init` argument - IOW, within each caller. It > also does not provide any guarantee that the whole `M` is initialized, > we have to trust each caller for doing that properly. I suggested it because I was actually thinking that would be a feature :-) After all callers are the ones doing the initialisation and so have to be trusted to do that, and the unsafe call to `assume_init()` would be the indication they have to do something right. But I'm not particularly tied to using that, it was an idle thought/question. I do wonder how Init gets around callers having to guarantee the whole of `M` is initialised, so will have to go study it a bit more closely. > I've tried implementing using `Init` and it works even better than I > thought. Each type actually automatically implement `Init` on itself, > which means that instead of implicitly calling init!() as in > > cmdq.send_gsp_command(bar, init!(MyCommand { > foo: ..., > }), ...); > > You can just pass an object as a stack argument and it will also work! > > cmdq.send_gsp_command(bar, MyCommand { > foo: ..., > }, ...); Nice! > Of course in this case the `MyCommand` instance is first passed on the > stack before being copied into the command-queue, but for small commands > this is pretty convenient. Larger ones can use the `init!` syntax for > full in-place initialization. Yeah I think that's fine. Most commands are small and should fit on the stack anyway. 
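A std-only illustration of the trade-off discussed above (this is not the kernel's `Init` trait, and `Command` is a made-up example type): with `MaybeUninit`, the "fully initialized" guarantee rests on each caller, and `assume_init` is the unsafe step that encodes that trust.

```rust
use core::mem::MaybeUninit;

#[derive(Debug, PartialEq)]
struct Command {
    function: u32,
    arg: u32,
}

/// In-place initialization via a caller-supplied closure. Nothing forces
/// `init` to write every field - a forgetful closure hands back whatever
/// was in the slot, which is exactly the hole the `Init` trait closes by
/// construction.
fn with_in_place_init<F>(init: F) -> Command
where
    F: FnOnce(&mut MaybeUninit<Command>),
{
    let mut slot = MaybeUninit::<Command>::uninit();
    init(&mut slot);
    // SAFETY: we must trust `init` to have fully initialized the slot.
    unsafe { slot.assume_init() }
}
```

In the command-queue setting the slot would be the reserved region of the ring buffer rather than a stack local, but the trust relationship is the same.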
> >
> >> Then there is what to do with the `SBuffer`. I'd like to think a bit
> >> more about this, as not all commands require it, and those that do
> >> typically send arrays of particular types. I think we should be able to
> >> use the type system to have more control over this, but let's keep that
> >> for the next revision.
> >
> > Sure. You are correct that many (most?) commands don't need it, and it's a
> > shame we don't have proper bindings for them anyway. Although in some cases that
> > still wouldn't work anyway (eg. for registry keys) where it really is just a
> > packed variable number of variable length strings. Not sure what a Rust native
> > representation of that would be.
>
> For now the best I could think of is to have not one but two traits for
> GSP commands: one for commands with payload and another one for commands
> without payload. And two variants of `send_gsp_command` for each case.
> That way we can at least guarantee that we won't pass a payload where we
> shouldn't, or forget one where we should.

We can't guarantee that (at least with the current bindings) because we still
need to manually mark up the types with the correct traits. But if the
majority of commands don't have a payload I agree it would provide some nice
simplifications for callers.

> I got a prototype of this working and along with the `Init` thing it
> feels pretty right.
>
> >
> >> > +    ) -> Result {
> >> > +        // TODO: a method that extracts the regions for a given command?
> >> > +        // ... and another that reduces the region to a given number of bytes!
> >> > +        let driver_area = self.gsp_mem.driver_write_area();
> >> > +        let free_tx_pages = driver_area.0.len() + driver_area.1.len();
> >> > +
> >> > +        // Total size of the message, including the headers, command, and optional payload.
> >> > + let msg_size = size_of::<GspMsgElement>() + size_of::<M>() + payload_size; > >> > + if free_tx_pages < msg_size.div_ceil(GSP_PAGE_SIZE) { > >> > + return Err(EAGAIN); > >> > + } > >> > + > >> > + let (msg_header, cmd, payload_1, payload_2) = { > >> > + // TODO: find an alternative to as_flattened_mut() > >> > >> I think we can use it if we enable the "slice_flatten" feature (stable > >> since 1.80, introduced in 1.67). > > > > Ok, will try that. Hopefully explicitly using the feature also makes clippy > > happy but if not I guess we can leave the over-rides below. > > I think you will also need to explicitly enable the feature somewhere - > for the kernel crate it is in `rust/kernel/lib.rs`, but Nova being a > different crate I am not sure where we are supposed to do it... > > <snip> > >> > + > >> > + (msg_header, cmd, payload_1, payload_2) > >> > + }; > >> > + > >> > + let sbuffer = SBuffer::new_writer([&mut payload_1[..], &mut payload_2[..]]); > >> > + init(cmd, sbuffer)?; > >> > + > >> > + *msg_header = GspMsgElement::new(self.seq, size_of::<M>() + payload_size, M::FUNCTION); > >> > + msg_header.checkSum = GspCmdq::calculate_checksum(SBuffer::new_reader([ > >> > + msg_header.as_bytes(), > >> > + cmd.as_bytes(), > >> > + payload_1, > >> > + payload_2, > >> > + ])); > >> > + > >> > + let rpc_header = &msg_header.rpc; > >> > + dev_info!( > >> > + &self.dev, > >> > + "GSP RPC: send: seq# {}, function=0x{:x} ({}), length=0x{:x}\n", > >> > + self.seq, > >> > + rpc_header.function, > >> > + decode_gsp_function(rpc_header.function), > >> > + rpc_header.length, > >> > + ); > >> > + > >> > + let elem_count = msg_header.elemCount; > >> > + self.seq += 1; > >> > + self.gsp_mem.advance_cpu_write_ptr(elem_count); > >> > + NV_PGSP_QUEUE_HEAD::default().set_address(0).write(bar); > >> > >> I'm a bit surprised that we always write `0` here, can we document this > >> behavior, maybe in the definition of `NV_PGSP_QUEUE_HEAD`? 
> > I was also surprised the value doesn't matter but this is what both Nouveau and
> > OpenRM do. I guess it just triggers some kind of notification on the GSP and the
> > actual queue head is read from the shared memory pointers.
>
> In that case I think we should definitely document this - or even
> better, if it never makes sense to write anything else than 0 to
> `NV_PGSP_QUEUE_HEAD`, to add an explicit method (e.g. `kick`?) that does
> this for us. Can you try and go to the bottom of this?

Sure, I'll put a kick method on the command queue and some documentation
covering that.

> >> > +
> >> > +        // Can't fail because msg_slice will always be
> >> > +        // size_of::<GspMsgElement>() bytes long by the above split.
> >> > +        let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap();
> >> > +        if msg_header.rpc.length < size_of::<M>() as u32 {
> >> > +            return None;
> >> > +        }
> >>
> >> If we have a message in the queue and this condition doesn't hold, I
> >> don't think we can hope that it will in a further iteration - this
> >> should be an error.
> >
> > I was thinking messages may get partially written and we'd see this state, but
> > you're right - the shared pointers shouldn't be updated until the entire message
> > is written so this should be an error.
> >
> > That will require the `wait_on_result()` function, because `wait_on()` doesn't
> > allow the closure to return a failure. We could just return Option<Result<_>>
> > from the closure but that's a bit gross, because wait_on() would then return
> > Result<Option<Result<_>>>.
>
> We can now use `read_poll_timeout` (from `kernel::io::poll`) which
> separates the acquisition of the state and the test of the condition
> into two separate closures - the first one returns a `Result`, and its
> error is propagated as-is, which is what we want in this case.

Oh nice. That looks like it should do the job.
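The shape being described - separate acquisition and condition closures so errors propagate without nested `Result`s - can be sketched in plain Rust (this is a toy, not the kernel's `read_poll_timeout` signature, which also takes sleep/timeout parameters):

```rust
/// Poll `read` until `cond` holds or `max_tries` attempts are exhausted.
/// An acquisition error short-circuits immediately via `?`, so callers get
/// `Result<Option<T>, E>` instead of `Result<Option<Result<T, E>>, _>`.
fn poll_until<T, E>(
    mut read: impl FnMut() -> Result<T, E>,
    cond: impl Fn(&T) -> bool,
    max_tries: usize,
) -> Result<Option<T>, E> {
    for _ in 0..max_tries {
        let state = read()?; // acquisition error propagated as-is
        if cond(&state) {
            return Ok(Some(state)); // condition met
        }
    }
    Ok(None) // "timed out"
}
```

Mapping back to the discussion: `read` would pull and validate the message header (erroring out on a short `length`), while `cond` just checks whether a message is available yet.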
> Actually this reminded me that I should send a patch to remove `wait_on` > altogether now that we have a better alternative! :) I guess the implies we shouldn't add new callers of `wait_on` ... not sure if there are others in this series but will take a look and convert them to read_poll_timeout. > >> > diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs > >> > index 7f4fe684ddaf..2e4255301e58 100644 > >> > --- a/drivers/gpu/nova-core/gsp/fw.rs > >> > +++ b/drivers/gpu/nova-core/gsp/fw.rs > >> > @@ -15,7 +15,9 @@ > >> > use crate::firmware::gsp::GspFirmware; > >> > use crate::gpu::Chipset; > >> > use crate::gsp; > >> > +use crate::gsp::cmdq::MSGQ_NUM_PAGES; > >> > >> I guess the number of pages in the message queue is firmware-dependent, > >> so would it make sense to move its declaration to this module? > > > > It's not firmware dependent - it appears we can set it to whatever we reasonably > > want (for some value of "reasonable) so long as we allocate the right buffer > > sizes, etc. I noticed and tested this a while back when removing some magic > > numbers. > > > > That said 0x3f is what I chose because that's what Nouveau and I think OpenRM > > use. > > Ah, that's great and makes perfect sense! So I suppose the > `pageTableEntryCount` member of the `MESSAGE_QUEUE_INIT_ARGUMENT` is to > pass the total size of the message queue. I'm starting to think about a > version of the command queue where its size would be a generic argument, > but let's keep that for a later patch, if we ever want it at all. :) Ha. A fun coding project for sure but I'm not sure we'd ever really use it. > > > >> > use crate::gsp::FbLayout; > >> > +use crate::gsp::GSP_PAGE_SIZE; > >> > > >> > /// Dummy type to group methods related to heap parameters for running the GSP firmware. 
> >> > pub(crate) struct GspFwHeapParams(()); > >> > @@ -159,6 +161,37 @@ pub(crate) fn new(gsp_firmware: &GspFirmware, fb_layout: &FbLayout) -> Self { > >> > // GSP firmware constants > >> > GSP_FW_WPR_META_MAGIC, > >> > GSP_FW_WPR_META_REVISION, > >> > + > >> > + // GSP events > >> > + NV_VGPU_MSG_EVENT_GSP_INIT_DONE, > >> > + NV_VGPU_MSG_EVENT_GSP_LOCKDOWN_NOTICE, > >> > + NV_VGPU_MSG_EVENT_GSP_POST_NOCAT_RECORD, > >> > + NV_VGPU_MSG_EVENT_GSP_RUN_CPU_SEQUENCER, > >> > + NV_VGPU_MSG_EVENT_MMU_FAULT_QUEUED, > >> > + NV_VGPU_MSG_EVENT_OS_ERROR_LOG, > >> > + NV_VGPU_MSG_EVENT_POST_EVENT, > >> > + NV_VGPU_MSG_EVENT_RC_TRIGGERED, > >> > + NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT, > >> > + > >> > + // GSP function calls > >> > + NV_VGPU_MSG_FUNCTION_ALLOC_CHANNEL_DMA, > >> > + NV_VGPU_MSG_FUNCTION_ALLOC_CTX_DMA, > >> > + NV_VGPU_MSG_FUNCTION_ALLOC_DEVICE, > >> > + NV_VGPU_MSG_FUNCTION_ALLOC_MEMORY, > >> > + NV_VGPU_MSG_FUNCTION_ALLOC_OBJECT, > >> > + NV_VGPU_MSG_FUNCTION_ALLOC_ROOT, > >> > + NV_VGPU_MSG_FUNCTION_BIND_CTX_DMA, > >> > + NV_VGPU_MSG_FUNCTION_FREE, > >> > + NV_VGPU_MSG_FUNCTION_GET_GSP_STATIC_INFO, > >> > + NV_VGPU_MSG_FUNCTION_GET_STATIC_INFO, > >> > + NV_VGPU_MSG_FUNCTION_GSP_INIT_POST_OBJGPU, > >> > + NV_VGPU_MSG_FUNCTION_GSP_RM_CONTROL, > >> > + NV_VGPU_MSG_FUNCTION_GSP_SET_SYSTEM_INFO, > >> > + NV_VGPU_MSG_FUNCTION_LOG, > >> > + NV_VGPU_MSG_FUNCTION_MAP_MEMORY, > >> > + NV_VGPU_MSG_FUNCTION_NOP, > >> > + NV_VGPU_MSG_FUNCTION_SET_GUEST_SYSTEM_INFO, > >> > + NV_VGPU_MSG_FUNCTION_SET_REGISTRY, > >> > }; > >> > > >> > #[repr(transparent)] > >> > @@ -197,3 +230,86 @@ fn id8(name: &str) -> u64 { > >> > }) > >> > } > >> > } > >> > + > >> > +pub(crate) type MsgqTxHeader = bindings::msgqTxHeader; > >> > >> This should be wrapped into a newtype that provides the exact set of > >> features required by the gsp module, like what is done for > >> `LibosMemoryRegionInitArgument`. 
For this type we just need two things: > >> return the `writePtr`, and advance it by a given value. That's all > >> the parent module needs to see. > > > > Except that doesn't actually work, because we need to use DMA access methods on > > bindings::msgqTxHeader which we can't do from an accessor. > > If the accessor uses `read_volatile` and `write_volatile` then I guess > it should work? These are unsafe though, so this is not ideal, but at > least they would be contained into the `fw` module and limited to one > getter and one setter. Sure, `read_volatile()` and `write_volatile` would work but then we're adding more unsafe calls and going around the DMA subsystem which has special exceptions for justifying the SAFETY comments. > > > >> By just aliasing this type, we make all its members visible to the `gsp` > >> module. This increases its dependency on a given firmware version, > >> carries a risk that the GSP module will mess with what it is not > >> supposed to, and introduces inconsistency in how we abstract the > >> firmware types - some are wrapped, others are not. Let's be consistent > >> and make all bindings completely opaque outside of `fw.rs`. > > > > I can't see how accessors in `fw.rs` for every firmware binding we use is useful > > though[1]? In what way does that reduce dependency on a given firmware version? > > I can't see how it isn't just adding boiler plate for every struct field we want > > to access. > > While the main point of this remains hiding stuff other modules don't > need to see, it also allows us to choose stable method names that are > independent from the firmware. So the concern is firmware will rename struct fields? > > > > We already have higher level abstractions (eg. DmaGspMem) to ensure the GSP > > won't mess with what it isn't supposed to, and these higher level abstractions > > also provide more options to reduce dependency on a given firmware version as > > the logic isn't as tightly coupled to the binding representation. 
It also allows > > us to access the fields using the proper DMA access methods. > > `DmaGspMem` is a special case - it is in shared memory and requires > volatile reads and writes at runtime. And yes it also provides a > higher-level abstraction to access the driver's readable and writable > areas of the ring buffer - I think these are at the right place in > `cmdq.rs`. > > But we also have other types that we build on the stack, even if we move > them to a DmaObject afterwards. `fw.rs` should provide constructors and > simple primitives for these, on top of which the command queue logic can > be built. Right now we are initializing their fields in `cmdq` and I > think we can agree this is not very portable. I agree but I also can't see where we are doing that in `cmdq`? I already added constructors for those after the last round of review but maybe I missed one? > To clarify further, our sole business with half of the bindings types is > initializing them to some value that makes sense and put it somewhere. > For this we only need a constructor. We don't need to know the names of > their fields, and we don't want to know. So an example of that would be GspSystemInfo right? I can see why a constructor would be helpful for that, but does it have to go in fw.rs? What is wrong with having the constructor implementation in eg. commands.rs where it is currently[1]? That would help us split up fw.rs so it doesn't become unwieldy. [1] - Admittedly there isn't an explicit constructor at the moment, because initialisation needs to happen in place in the closure, but conceptually it could have one and serves as an example. > Other bindings types are message-related, and we need to access their > contents. You discuss these in more detail below so let's address them > there. > > Then the last kind are the types related to the command queue runtime > operation - namely the Rx/Tx headers and their read/write pointers, and > the message headers that we need to read and write. 
> For the former we
> only need to poke at their pointers using volatile accesses, which is
> two methods that I propose to have for that purpose. For the latter we
> only need to access a handful of fields, some of which can be tricky
> (like `length`). In any case, we are talking about a handful of methods
> here to keep the abstraction opaque.

It's largely the whole concept of keeping the details of the command queue opaque from the thing implementing the command queue (the cmdq module) that seems strange to me. If anything in the firmware command queue changes it's unlikely the command queue implementation will remain the same.

> More on messages below as you make an interesting point on them.
>
> > [1] - Except for maybe removing some nonRustySnakeyNames
> >
> >> > +
> >> > +// SAFETY: Padding is explicit and will not contain uninitialized data.
> >> > +unsafe impl AsBytes for MsgqTxHeader {}
> >> > +
> >> > +impl MsgqTxHeader {
> >> > +    pub(crate) fn new(msgq_size: u32, rx_hdr_offset: u32) -> Self {
> >> > +        Self {
> >> > +            version: 0,
> >> > +            size: msgq_size,
> >> > +            msgSize: GSP_PAGE_SIZE as u32,
> >> > +            msgCount: MSGQ_NUM_PAGES,
> >> > +            writePtr: 0,
> >> > +            flags: 1,
> >> > +            rxHdrOff: rx_hdr_offset,
> >> > +            entryOff: GSP_PAGE_SIZE as u32,
> >> > +        }
> >> > +    }
> >> > +}
> >> > +
> >> > +/// RX header for setting up a message queue with the GSP.
> >> > +///
> >> > +/// # Invariants
> >> > +///
> >> > +/// [`Self::read_ptr`] is guaranteed to return a value in the range `0..NUM_PAGES`.
> >> > +pub(crate) type MsgqRxHeader = bindings::msgqRxHeader;
> >> > +
> >> > +// SAFETY: Padding is explicit and will not contain uninitialized data.
> >> > +unsafe impl AsBytes for MsgqRxHeader {}
> >> > +
> >> > +impl MsgqRxHeader {
> >> > +    pub(crate) fn new() -> Self {
> >> > +        Default::default()
> >> > +    }
> >> > +}
> >> > +
> >> > +pub(crate) type GspRpcHeader = bindings::rpc_message_header_v;
> >>
> >> This type too is another good illustration of why we should make our
> >> bindings opaque. In `cmdq.rs` we access `GspRpcHeader::length` multiple
> >> times, ignoring the fact that it also includes the size of the RPC
> >> header itself - not just what comes after it! Since it doesn't come with
> >> any documentation, we can be forgiven for assuming the obvious - that it
> >> is just the size of what follows, but it is not.
> >
> > This has nothing to do with the bindings being opaque or not though. I can
> > almost guarantee that after writing ten brain dead accessor functions that the
> > eleventh would have looked pretty similar and included the same bug!
>
> It does very much have to do with making the binding opaque. If it is
> not opaque, and the original `length` is accessible, what is preventing
> code from using it and repeating the same mistake again?

Nothing, and nothing is preventing someone writing an accessor to access the length or some other field and using it incorrectly if that's what they think they need. Making the fields opaque just means more code, and more code generally means more bugs and more maintenance when things inevitably change.

> I don't quite get where the "ten brain dead accessor functions" would
> come from. I propose we write one, that returns a value that makes sense
> (contrary to the original member), and most importantly is documented so
> users can know exactly what it does. Tell me what is brain-dead about
> that? This very patch misused the `length` member on at least two
> different occasions, isn't the benefit clear?
To be clear I'm not disagreeing with the need for a method to get the length of a message - I'm disagreeing with the requirement to make the binding fields opaque to the rest of the Gsp module that needs them (as distinct from the rest of nova-core - bindings should definitely not leak outside the Gsp abstraction).

The "ten brain dead accessor functions" comes from an assumption that eventually we will need to access most fields in these structs (otherwise why would they exist?) which would require many accessor functions to simply read/write the fields. This is still an early implementation and I imagine as we grow to support more features most of the fields will be needed.

> >
> >> What we actually want is a method on `GspMsgElement` that returns what
> >> we actually want (and is documented as such): the actual size of the
> >> payload following the whole header. That way there can be no room for
> >> mistake.
> >
> > The length calculation can just be a method on GspRpcHeader without requiring
> > the rest of the fields to need accessor functions as well though.
> >
> > Take for example a structure that we will need in future, GspStaticConfigInfo_t.
> > This currently has 53 fields, although to be fair maybe we only care about half
> > of them.
> >
> > My proposal[2] is to convert them into an abstract nova-core specific type,
> > GspConfigInfo, that contains the specific details Nova users care about.
> > Following the implied rule that no field accessors shall leak from fw.rs would
> > require accessors for all fields. That's at least another 78 lines (assuming a
> > minimal 3 line accessor function) of boilerplate for one FW structure. Multiply
> > that by many structures and you now have a lot of boilerplate for no real gain
> > and a very large fw.rs (note my example[2] only contains one field, but I expect
> > we will need a lot more if not most of them).
> >
> > [2] - https://github.com/apopple-nvidia/linux/blob/nova-core-unstable/drivers/gpu/nova-core/gsp/commands.rs#L32
>
> I am not opposed to having a nova-core specific type, and in the case of
> GspStaticConfigInfo_t it certainly seems to make sense. Actually it even
> does what we want here, since with that design this type should not be
> visible outside of the `fw` module, right?

I feel like this is largely where the confusion/disagreement lies - what the `gsp` module provides versus what the `fw` module provides. My thinking is that the types don't leak outside the `gsp` module; the `fw` module is really just providing raw binding information that the `gsp` module turns into something nice/abstract for the rest of nova-core in the form of functions (or types) to do whatever is needed.

> Now to be fair, one needs to also consider that this new
> GspStaticConfigInfo type will also need its own boilerplate-y code to
> initialize itself from the original GspStaticConfigInfo_t, field by
> field. But in the end we would have a type that provides us with the
> abstraction that we need for the GSP module, while making the bindings
> type invisible - this fits the bill, and looks arguably nicer than
> having too many accessors so why not!
>
> One can even argue that the wrapping pattern is just a simpler version
> of this - we create a new type to hide the bindings' internals and
> provide us with the right interface. It is just that for many
> commands/messages the payload is so small that it is actually less code
> to just wrap it and provide a few utility methods. GspStaticConfigInfo_t
> is certainly an exception.

Exactly! Although it would be nice if we could wrap without needing all the accessors, hence the suggestion of a type alias which could easily be turned into a separate struct.
If say the firmware binding type changed we could easily construct the old struct from whatever the new type(s) are (which could even involve extra GSP RPC calls or something else for example).

> >
> >> > +
> >> > +// SAFETY: Padding is explicit and will not contain uninitialized data.
> >> > +unsafe impl AsBytes for GspRpcHeader {}
> >> > +
> >> > +// SAFETY: This struct only contains integer types for which all bit patterns
> >> > +// are valid.
> >> > +unsafe impl FromBytes for GspRpcHeader {}
> >> > +
> >> > +impl GspRpcHeader {
> >> > +    pub(crate) fn new(cmd_size: u32, function: u32) -> Self {
> >> > +        Self {
> >> > +            // TODO: magic number
> >> > +            header_version: 0x03000000,
> >> > +            signature: bindings::NV_VGPU_MSG_SIGNATURE_VALID,
> >> > +            function,
> >> > +            // TODO: overflow check?
> >> > +            length: size_of::<Self>() as u32 + cmd_size,
> >> > +            rpc_result: 0xffffffff,
> >> > +            rpc_result_private: 0xffffffff,
> >> > +            ..Default::default()
> >> > +        }
> >> > +    }
> >> > +}
> >> > +
> >> > +pub(crate) type GspMsgElement = bindings::GSP_MSG_QUEUE_ELEMENT;
> >>
> >> Hammering my previous argument a bit more: in `cmdq.rs`, we do the
> >> following:
> >>
> >>     let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap();
> >>
> >> Even though we explicitly used `GspMsgElement`, `msg_header` appears as
> >> being of type `GSP_MSG_QUEUE_ELEMENT` in LSP. That's super confusing and
> >> can be avoided by using the newtype pattern.
> >
> > That's surprising. I would have hoped the LSP would be more helpful than that.
> >
> >> Lastly, the bindings generated from C headers are supposed to be temporary, and
> >> we eventually want to replace them with direct IDL-to-Rust bindings. Not
> >> leaking the C types lets us design `fw.rs` as a blueprint for the ideal
> >> interface we would want to generate - so the limited extra labor that
> >> comes with wrapping these types would also pay off in that respect.
> >
> > I agree with this - `fw.rs` should be the blueprint for the ideal bindings
> > that we could conceivably generate directly from IDL. But this is also why
> > I'm pushing back against the idea of putting any kind of logic (in the form of
> > accessors/constructors) into `fw.rs` - I doubt you will ever be able to generate
> > those from IDL.
>
> `fw.rs` should contain the minimum required to build the logic on top of
> it, agreed. Doing so might require having another type on top of the one
> defined in `fw` to implement said logic - if you remember I did
> something like that for MsgqTxHeader and you thought that was too
> convoluted. :)

Types (plural) yes, but a type I have no problem with :-) My thinking is DmaGspMem is that type. In other words things like the read/writePtr accessors (which if I recall correctly you had spread across 3 types) can just live there - we don't need to hide them behind some opaque type in fw.rs (and indeed doing so introduces more unsafe code as discussed above).

> >
> > For example I doubt it would be possible to generate the length logic, and I'm
> > even more doubtful we could figure out how to generate the constructors based on
> > Rust type.
>
> Where to put the length logic depends on how our firmware interface is
> defined. If we agree that `length` is "The size of the RPC header, the
> following command, and its payload but oh wait we don't include the rest
> of the message element which includes the RPC header so don't forget to
> add it" then we can just return or use the member as-is. But it's a bad
> and error-prone interface. We can document it, and that in itself would
> be an improvement, but it would still require some mental gymnastics to
> use and is just a giant footgun.

Oh yeah, for sure. I'm not arguing against a method to calculate the length we want. I'm just thinking it lives on the binding type as a method.
Eg:

    impl bindings::GSP_MSG_QUEUE_ELEMENT {
        pub(crate) fn message_length(self) -> u32 {
            size_of::<Self>() as u32 - size_of::<GspRpcHeader>() as u32 + self.rpc.length
        }
    }

And that probably lives in the cmdq code in order to keep any non-generatable logic out of fw.rs (but I could be convinced otherwise). And yes the name is horrible, but we can fix that easily enough in the binding generation.

> What is safe and useful is the length of the entire message, header,
> command and payload included, because that's how we use it in the code.
> It's also trivial to produce, and if we do it here we only do that
> operation in one place instead of at each calling site. So yes, it
> belongs here, as part of the firmware abstraction, and if we have to
> write it by hand then so be it.

Just to reiterate, I am not insane enough to want that mistake spread everywhere. I just don't understand the advantage of putting this in fw.rs rather than in the code using it and providing the abstraction (cmdq.rs), nor the need to hide the rest of the rpc message header fields from the thing implementing the abstraction. IOW the rpc header shouldn't be used outside the cmdq.rs code anyway so why jump through hoops in fw.rs to hide the fields.

> The same goes for the constructor. Of course there will be some manual
> work needed to include a new incompatible firmware version. But if you
> don't do it here, you will have to do that work somewhere else anyway -
> probably `gsp` or `cmdq`, except without the elegance of a method
> containing that code. How is that better?

Again, not against the methods at all. Just that we should add them as needed rather than force them for every single field that will probably never change anyway.

> >
> > So to summarise where I think we're at we have two proposals for how to
> > treat/use the generated bindings:
> >
> > 1) Make them all a newtype[3] using a transparent wrapper to make the fields
> > opaque and accessor functions to implement the logic.
> > This would all be in fw.rs
> > to stop the C types leaking out.
> >
> > 2) Build a completely new type based on the underlying bindings plus whatever
> > other logic is needed. The C types could be used within the `gsp` module to
> > implement the new types but would not leak further than that.
> >
> > Using GspStaticConfigInfo_t as a concrete example I think option (1) looks like
> > this:
> >
> > In fw.rs:
> >
> > #[repr(transparent)]
> > pub(crate) struct GspStaticConfigInfo(bindings::GspStaticConfigInfo_t);
> >
> > impl GspStaticConfigInfo {
> >     pub(crate) fn new(gsp: &mut Gsp) -> Result<Self> {
> >         // Create GspStaticConfigInfo sending whatever messages needed to the Gsp.
> >     }
> >
> >     pub(crate) fn gpu_name(self) -> String {
> >         ...
> >     }
> >
> >     pub(crate) fn sriov_max_gfid(self) -> u32 {
> >         self.0.sriovMaxGfid
> >     }
> >
> >     pub(crate) fn fb_length(self) -> u64 {
> >         self.0.fb_length
> >     }
> >
> >     pub(crate) fn ats_supported(self) -> bool {
> >         self.0.bAtsSupported != 0
> >     }
> >
> >     // More for every field we need
> >     ...
> > }
> >
> > And Option (2) could look more like this:
> >
> > In commands.rs:
> >
> > pub(crate) struct GspStaticConfigInfo {
> >     pub gpu_name: String,
> >     pub sriov_max_gfid: u32,
> >     pub fb_length: u64,
> >     pub ats_supported: bool,
> > }
> >
> > impl GspStaticConfigInfo {
> >     pub(crate) fn new(gsp: &mut Gsp) -> Result<Self> {
> >         // Create GspStaticConfigInfo sending whatever messages needed to the Gsp.
> >     }
> > }
>
> To make a fair comparison your Option (2) should include this:
>
> impl TryFrom<GspStaticConfigInfo_t> for GspStaticConfigInfo {
>     type Error = kernel::Error;
>
>     fn try_from(value: GspStaticConfigInfo_t) -> Result<Self, Self::Error> {
>         let gpu_name = // extract gpu string from `value`.
>         ...
>
>         Ok(Self {
>             gpu_name,
>             sriov_max_gfid: value.sriovMaxGfid,
>             fb_length: value.fb_length,
>             ats_supported: value.bAtsSupported != 0,
>             // More for every field we need
>         })
>     }
> }
>
> Because you are not only going to create this type from the `Gsp`, but
> also from a message you have received as per the code.

Hmm. I hadn't thought of TryFrom! I like this idea - but if I'm understanding your previous comments correctly this would also have to live in fw.rs rather than commands.rs?

> >
> > Personally I favour option (2) because it is more concise and allows us to
> > split the code up more. It is also more resilient to FW changes as it doesn't
> > depend on the underlying representation at all. However based on your feedback
> > I think you favour option (1) more so I want to make sure I'm understanding
> > it correctly.
>
> My constraints are that I want the binding types to be opaque and
> abstracted in a way that is portable and makes sense for the higher
> levels. If Option (2) satisfies these constraints (and in the case of
> GspStaticConfigInfo_t I believe it does), then I'm fine with it where it
> makes sense, and for large types with many fields I think we can make
> the case that it does.
>
> I also think this really only applies to messages received from the GSP,
> as commands NEED to be laid out in-place in the format that the firmware
> expects, so there is little room for using a temporary type where we can
> have an in-place constructor that is just as convenient and will use
> less stack space. Still, in theory we could also use that technique for
> e.g. very complex commands that would benefit from the constructor
> pattern.
>
> But for messages from the GSP, why not.
> There is an additional cost to
> consider related to the conversion from one type to the other, and that
> will require extra complexity like an associated type in the
> MessageFromGsp trait, so for very simple messages a wrapper might be a
> better fit, but otherwise, I think it's fair game.

I suppose what I'm looking for (and maybe I've read too much into previous comments) is a design for how we want to deal with all of this, and I was extrapolating it from the comments on "bindings shall not leak out of fw.rs". It seems this may be being done in a piecemeal manner as we're merging patches, so maybe it would be helpful to get a design or some guidelines written up here?
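[For concreteness, the `length` footgun discussed in this thread can be captured in a standalone sketch. The struct below is a simplified, hypothetical stand-in for the generated `rpc_message_header_v` binding - the field set, the `cmd_len` method name, and the constant values are illustrative only, not the driver's actual interface:]

```rust
use core::mem::size_of;

// Hypothetical stand-in for the generated binding: as discussed above,
// `length` counts the RPC header itself *plus* the command that follows it.
#[repr(C)]
struct RpcMessageHeader {
    header_version: u32,
    signature: u32,
    function: u32,
    length: u32,
}

// Newtype wrapper: the raw `length` field stays private, and callers only
// see a method whose meaning is unambiguous.
#[repr(transparent)]
struct GspRpcHeader(RpcMessageHeader);

impl GspRpcHeader {
    fn new(cmd_size: u32, function: u32) -> Self {
        GspRpcHeader(RpcMessageHeader {
            header_version: 0x0300_0000,
            signature: 0, // placeholder for NV_VGPU_MSG_SIGNATURE_VALID
            function,
            length: size_of::<RpcMessageHeader>() as u32 + cmd_size,
        })
    }

    /// Size of the command following the header - the raw `length` with
    /// the header size subtracted back out, which is what callers want.
    fn cmd_len(&self) -> u32 {
        self.0.length - size_of::<RpcMessageHeader>() as u32
    }
}

fn main() {
    let hdr = GspRpcHeader::new(128, 0x42);
    // The raw field holds 16 (header) + 128, but the accessor returns the
    // unambiguous value.
    assert_eq!(hdr.cmd_len(), 128);
    println!("cmd_len = {}", hdr.cmd_len());
}
```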
On Tue Sep 30, 2025 at 9:36 AM JST, Alistair Popple wrote:
> On 2025-09-30 at 00:34 +1000, Alexandre Courbot <acourbot@nvidia.com> wrote...
>> On Mon Sep 29, 2025 at 3:19 PM JST, Alistair Popple wrote:
>> <snip>
>> >> > +
>> >> > +/// Number of GSP pages making the Msgq.
>> >> > +pub(crate) const MSGQ_NUM_PAGES: u32 = 0x3f;
>> >> > +
>> >> > +#[repr(C, align(0x1000))]
>> >> > +#[derive(Debug)]
>> >> > +struct MsgqData {
>> >> > +    data: [[u8; GSP_PAGE_SIZE]; MSGQ_NUM_PAGES as usize],
>> >> > +}
>> >> > +
>> >> > +// Annoyingly there is no real equivalent of #define so we're forced to use a
>> >> > +// literal to specify the alignment above. So check that against the actual GSP
>> >> > +// page size here.
>> >> > +static_assert!(align_of::<MsgqData>() == GSP_PAGE_SIZE);
>> >> > +
>> >> > +// There is no struct defined for this in the open-gpu-kernel-source headers.
>> >> > +// Instead it is defined by code in GspMsgQueuesInit().
>> >> > +#[repr(C)]
>> >> > +struct Msgq {
>> >> > +    tx: MsgqTxHeader,
>> >> > +    rx: MsgqRxHeader,
>> >> > +    msgq: MsgqData,
>> >> > +}
>> >> > +
>> >> > +#[repr(C)]
>> >> > +struct GspMem {
>> >> > +    ptes: [u8; GSP_PAGE_SIZE],
>> >> > +    cpuq: Msgq,
>> >> > +    gspq: Msgq,
>> >> > +}
>> >> > +
>> >> > +// SAFETY: These structs don't meet the no-padding requirements of AsBytes but
>> >> > +// that is not a problem because they are not used outside the kernel.
>> >> > +unsafe impl AsBytes for GspMem {}
>> >> > +
>> >> > +// SAFETY: These structs don't meet the no-padding requirements of FromBytes but
>> >> > +// that is not a problem because they are not used outside the kernel.
>> >> > +unsafe impl FromBytes for GspMem {}
>> >>
>> >> These ARE used outside the kernel, since they are shared with the GSP.
>> >> :) I'd say the reason these are safe is just because we satisfy the
>> >> requirements of AsBytes and FromBytes:
>> >
>> > Yes, and they're actually old safety comments that I missed updating to be the
>> > same as the rest.
>> >
>> >> - No uninitialized bytes,
>> >> - No interior mutability,
>> >> - All byte patterns are valid (for some generous definition of
>> >>   "valid" limited to not triggering UB).
>> >
>> > I was under the impression that that's all `unsafe` is really marking - code
>> > the compiler can't prove won't trigger UB. So I'd assume that's all we'd have to
>> > prove in safety comments anyway.
>>
>> A good rule of thumb for writing `SAFETY` statements is to look at the
>> `Safety` section of the unsafe code we call (here `FromBytes`), and
>> justify why our calling code satisfies each of these.
>
> Oh that's usually where I get my inspiration for them from. This was more a
> statement of my understanding of the language design, that unsafe is mostly
> about UB rather than overall struct validity. After all I think it is still
> possible to have safe code initialise fields of a struct in "invalid" ways
> (depending on the struct).

No, you are correct: unsafe blocks are blocks of code where the responsibility for preventing UB shifts from the compiler to the developer. The statements in `FromBytes` are here to prevent UB. For instance, if you have the following:

    #[repr(u8, C)]
    enum Switch {
        Off = 0,
        On = 1,
    }

    #[repr(C)]
    struct Light {
        switch: Switch,
        intensity: u8,
    }

Then the compiler can make the assumption that `switch` will only ever be 0 or 1, and allow UB when it is not. `FromBytes` cannot be implemented for that structure. So in that sense, struct validity must be upheld.

What does not need to be upheld is e.g. GSP messages that are structurally sound from a type point of view, but don't make any sense semantically. For these the code should produce a runtime error while validating them - but this is not a source of UB as long as the types they contain indeed satisfy the requirement that any bit pattern is valid for them.
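[The runtime-validation alternative described above can be sketched standalone. This mirrors the `Switch` example from the discussion, but `from_byte` is an illustrative helper invented for the sketch, not driver code:]

```rust
// Sketch: an enum with invalid bit patterns cannot be blindly transmuted
// from bytes (no `FromBytes`), but it can be *validated* at runtime,
// turning a bad bit pattern into an error instead of UB.
#[repr(u8)]
#[derive(Debug, PartialEq)]
enum Switch {
    Off = 0,
    On = 1,
}

impl Switch {
    // Fallible conversion: structurally invalid input becomes a runtime
    // error, never undefined behavior.
    fn from_byte(b: u8) -> Result<Self, &'static str> {
        match b {
            0 => Ok(Switch::Off),
            1 => Ok(Switch::On),
            _ => Err("invalid bit pattern for Switch"),
        }
    }
}

fn main() {
    assert_eq!(Switch::from_byte(1), Ok(Switch::On));
    // Any other byte is rejected rather than triggering UB.
    assert!(Switch::from_byte(2).is_err());
    println!("ok");
}
```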
(tbh I am not sure myself why interior mutability is not allowed)

>
>> >> > +    dev: ARef<device::Device>,
>> >> > +    seq: u32,
>> >> > +    gsp_mem: DmaGspMem,
>> >> > +    pub _nr_ptes: u32,
>> >> > +}
>> >> > +
>> >> > +impl GspCmdq {
>> >> > +    pub(crate) fn new(dev: &device::Device<device::Bound>) -> Result<GspCmdq> {
>> >> > +        let gsp_mem = DmaGspMem::new(dev)?;
>> >> > +        let nr_ptes = size_of::<GspMem>() >> GSP_PAGE_SHIFT;
>> >> > +        build_assert!(nr_ptes * size_of::<u64>() <= GSP_PAGE_SIZE);
>> >> > +
>> >> > +        Ok(GspCmdq {
>> >> > +            dev: dev.into(),
>> >> > +            seq: 0,
>> >> > +            gsp_mem,
>> >> > +            _nr_ptes: nr_ptes as u32,
>> >> > +        })
>> >> > +    }
>> >> > +
>> >> > +    fn calculate_checksum<T: Iterator<Item = u8>>(it: T) -> u32 {
>> >> > +        let sum64 = it
>> >> > +            .enumerate()
>> >> > +            .map(|(idx, byte)| (((idx % 8) * 8) as u32, byte))
>> >> > +            .fold(0, |acc, (rol, byte)| acc ^ u64::from(byte).rotate_left(rol));
>> >> > +
>> >> > +        ((sum64 >> 32) as u32) ^ (sum64 as u32)
>> >> > +    }
>> >> > +
>> >> > +    #[expect(unused)]
>> >> > +    pub(crate) fn send_gsp_command<M: GspCommandToGsp>(
>> >> > +        &mut self,
>> >> > +        bar: &Bar0,
>> >> > +        payload_size: usize,
>> >> > +        init: impl FnOnce(&mut M, SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result,
>> >>
>> >> This works pretty well for in-place initialization.
>> >>
>> >> There are a couple of small drawbacks though: `M` must implement `FromBytes`
>> >> even though we only send it, and (more serious) there is no guarantee
>> >
>> > Yes, that detail annoyed me slightly too.
>> >
>> >> that `init` will fully initialize the command - a forgetful caller can
>> >> omit some of its fields, or even the whole structure, and in that case
>> >> we will send a command with what happened to be at that position of the
>> >> queue at that time.
>> >
>> > Good timing as I was just looking to see if there wasn't some way of ensuring
>> > this happened, or at least was much more explicit than what we currently do.
>> >
>> >> I think this is a good case for using the `Init` trait: it's like
>> >> `PinInit`, but without the `Pin`, and it ensures that the whole argument
>> >> is initialized. So this method would change into something like:
>> >>
>> >>     pub(crate) fn send_gsp_command<M: GspCommandToGsp>(
>> >>         &mut self,
>> >>         bar: &Bar0,
>> >>         payload_size: usize,
>> >>         init: impl Init<M, Error>,
>> >>         init_sbuf: impl FnOnce(SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result,
>> >>
>> >> This also allows us to drop the `FromBytes` requirement on
>> >> `GspCommandToGsp`! But also requires us to use `unsafe` to call
>> >> `Init::__init` on the pointer to the command. I think it's worth it
>> >> though, as this removes the risk of sending partially-uninitialized
>> >> commands.
>> >
>> > Agree on it being worth the unsafe call, especially because it is unsafe if the
>> > caller doesn't do what it's supposed to regardless. But how does this contrast
>> > with `MaybeUninit`? I was wondering if something like this would work:
>> >
>> >     pub(crate) fn send_gsp_command<M: GspCommandToGsp>(
>> >         &mut self,
>> >         bar: &Bar0,
>> >         payload_size: usize,
>> >         init: impl FnOnce(MaybeUninit<&mut M>, SBuffer<core::array::IntoIter<&mut [u8], 2>>) -> Result<&M>,
>> >
>> > Note I'm not sure we could actually make that work as I haven't tried it yet. I
>> > suspect I've missed something with lifetimes there :-)
>>
>> The difference with using `MaybeUninit` is that the unsafe code would
>> need to be *within* each `init` argument - IOW, within each caller. It
>> also does not provide any guarantee that the whole `M` is initialized,
>> we have to trust each caller for doing that properly.
>
> I suggested it because I was actually thinking that would be a feature :-) After
> all callers are the ones doing the initialisation and so have to be trusted to
> do that, and the unsafe call to `assume_init()` would be the indication they
> have to do something right.
> But I'm not particularly tied to using that, it was an idle thought/question. I
> do wonder how Init gets around callers having to guarantee the whole of `M` is
> initialised, so will have to go study it a bit more closely.

The only reason for `MaybeUninit` to be `unsafe` is because the user might leave fields uninitialized or set to values that are invalid for their type. The unsafe operation is here to shift the blame onto the developer who forgets to set their fields properly.

With `Init`, such a mistake cannot happen, as all fields must be initialized (like a regular initialization) or you get a build error. So we know for sure that the structure is properly initialized, and UB cannot occur, hence no need for unsafe.

Of course the command thus initialized can still be invalid in the semantic sense - but in this case the GSP will detect that upon reception, and send us an error. As far as the compiler is concerned, this is working as intended.

>> >
>> >
>> >> Then there is what to do with the `SBuffer`. I'd like to think a bit
>> >> more about this, as not all commands require it, and those that do
>> >> typically send arrays of particular types. I think we should be able to
>> >> use the type system to have more control over this, but let's keep that
>> >> for the next revision.
>> >
>> > Sure. You are correct that not all (most?) commands need it, and it's a
>> > shame we don't have proper bindings for them anyway. Although in some cases that
>> > still wouldn't work anyway (eg. for registry keys) where it really is just a
>> > packed variable number of variable length strings. Not sure what a Rust native
>> > representation of that would be.
>>
>> For now the best I could think of is to have not one but two traits for
>> GSP commands: one for commands with payload and another one for commands
>> without payload. And two variants of `send_gsp_command` for each case.
>> That way we can at least guarantee that we won't pass a payload where we >> shouldn't, or forget one where we should. > > We can't guarantee that (at least with the current bindings) because we > still need to manually mark up the types with the correct traits. But if the > majority of commands don't have a payload I agree it would provide some nice > simplifications for callers. Yes, the idea would be that each command implements one trait or the other depending on whether it needs an extra payload. This needs to be done manually, but we are already manually implementing `CommandToGsp` so this is not much different. >> >> > + >> >> > + // Can't fail because msg_slice will always be >> >> > + // size_of::<GspMsgElement>() bytes long by the above split. >> >> > + let msg_header = GspMsgElement::from_bytes(msg_header_slice).unwrap(); >> >> > + if msg_header.rpc.length < size_of::<M>() as u32 { >> >> > + return None; >> >> > + } >> >> >> >> If we have a message in the queue and this condition doesn't hold, I >> >> don't think we can hope that it will in a further iteration - this >> >> should be an error. >> > >> > I was thinking messages may get partially written and we'd see this state, but >> > you're right - the shared pointers shouldn't be updated until the entire message >> > is written so this should be an error. >> > >> > That will require the `wait_on_result()` function, because `wait_on()` doesn't >> > allow the closure to return a failure. We could just return Option<Result<_>> >> > from the closure but that's a bit gross, because wait_on() would then return >> > Result<Option<Result<_>>>. >> >> We can now use `read_poll_timeout` (from `kernel::io::poll`) which >> separates the acquisition of the state and the test of the condition >> into two separate closures - the first one returns a `Result`, and its >> error is propagated as-is, which is what we want in this case. > > Oh nice. That looks like it should do the job. 
Note that this won't be usable in drm-rust-next until -rc1 is tagged, but we can assume it will be usable for us. > >> Actually this reminded me that I should send a patch to remove `wait_on` >> altogether now that we have a better alternative! :) > > I guess the implies we shouldn't add new callers of `wait_on` ... not sure > if there are others in this series but will take a look and convert them to > read_poll_timeout. There are a few - if we could use read_poll_timeout proactively that would be great. >> >> > +pub(crate) type MsgqTxHeader = bindings::msgqTxHeader; >> >> >> >> This should be wrapped into a newtype that provides the exact set of >> >> features required by the gsp module, like what is done for >> >> `LibosMemoryRegionInitArgument`. For this type we just need two things: >> >> return the `writePtr`, and advance it by a given value. That's all >> >> the parent module needs to see. >> > >> > Except that doesn't actually work, because we need to use DMA access methods on >> > bindings::msgqTxHeader which we can't do from an accessor. >> >> If the accessor uses `read_volatile` and `write_volatile` then I guess >> it should work? These are unsafe though, so this is not ideal, but at >> least they would be contained into the `fw` module and limited to one >> getter and one setter. > > Sure, `read_volatile()` and `write_volatile` would work but then we're adding > more unsafe calls and going around the DMA subsystem which has special > exceptions for justifying the SAFETY comments. Honestly I myself don't understand why we can call `dma_write!` safely wherever we want - technically there is nothing that guarantees that what we access is not concurrently accessed by the device. The volatile accesses are unsafe because they operate on pointers. I'd rather do without but we will need them if we want to keep the bindings opaque. > >> > >> >> By just aliasing this type, we make all its members visible to the `gsp` >> >> module. 
>> >> This increases its dependency on a given firmware version,
>> >> carries a risk that the GSP module will mess with what it is not
>> >> supposed to, and introduces inconsistency in how we abstract the
>> >> firmware types - some are wrapped, others are not. Let's be consistent
>> >> and make all bindings completely opaque outside of `fw.rs`.
>> >
>> > I can't see how accessors in `fw.rs` for every firmware binding we use is useful
>> > though[1]? In what way does that reduce dependency on a given firmware version?
>> > I can't see how it isn't just adding boilerplate for every struct field we want
>> > to access.
>>
>> While the main point of this remains hiding stuff other modules don't
>> need to see, it also allows us to choose stable method names that are
>> independent from the firmware.
>
> So the concern is firmware will rename struct fields?

That's one of the funny things the firmware can do, amongst others. :) And yes, I agree that if the firmware changes too much this can invalidate the whole of the GSP module; no abstraction will ever protect us completely against that.

But `fw` can be our first line of defense, and also a logical break similar to the role `rust/kernel` fulfills for the larger kernel [1]: the `kernel` crate is allowed to call the C bindings, but not the drivers. Similarly, to access our GSP bindings, we need to go through the `fw` sanitization layer.

[1] https://docs.kernel.org/next/rust/general-information.html#abstractions-vs-bindings

So to conclude on this topic (since we also synced offline and I see you already have a v3 in which the bindings don't leak out of `fw`): yes, we need a thin shim on top of the bindings to make them opaque, and yes this results in a bit more code overall, but it also makes the upper layer (gsp, cmdq) more concise and easier to read.
No design is set in stone, and as we get more visibility into what we need,
things can certainly evolve, but I think this is a reasonable state for now
that requires only a very modest amount of labor.
On Tue Sep 30, 2025 at 2:36 AM CEST, Alistair Popple wrote:
> I feel like this is largely where the confusion/disagreement lies - what the
> `gsp` module provides versus what the `fw` module provides. My thinking is that
> the types don't leak outside the `gsp` module, the `fw` is really just providing
> raw binding information that the `gsp` module turns into something nice/abstract
> for the rest of nova-core in the form of functions (or types) to do whatever
> is needed.

The goal is that firmware-specific structures (and the majority of their
implementation details) don't leak into the business logic of the driver
code. Of course, due to its nature the `gsp` module can't be entirely
independent from firmware specifics. However, think of it in different
layers:

If we encapsulate the layout of firmware-specific structures in new types
(i.e. wrap firmware structures into new types in the `fw` module), all
changes to the layout of a structure, new features, etc. are local to and
can be handled transparently by this abstraction layer.

Once we actually introduce a second firmware version, we will also
introduce the corresponding proc macros to annotate those structures
accordingly with `#[versions(GSP)]` and let the proc macro generate the
version-specific structures, which subsequently allows us to treat the new
types in the `fw` module as if they existed only once (and not per
firmware version).

Hence, all the other code in the GSP module can be left untouched unless a
new firmware version introduces semantic changes (which should be rare).
This is not only a major advantage when it comes to maintainability (which
is one of the major motivations for the Nova project to begin with), but
also leads to a much cleaner structure and naturally separates the layout
of data from its semantics.
Yes, it does require a bit more code to set it up to begin with, but it
avoids the horrible mess of trivial firmware changes leaking into the
semantic layers of the GSP implementation. And even if that never happened
and the firmware interface never changed (which I think will not and should
not be the case), the separation between how data is laid out and how its
semantics are implemented by itself adds clarity and helps with
maintainability.

The feedback Alex gave in this context looks pretty much in line with the
above.

I hope this helps.

- Danilo
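The layering described above can be sketched in a few lines of standalone Rust. Names and the single field are made up for illustration; the point is the visibility structure: the raw bindings are private to `fw`, so code at the `gsp` level can only compile against the stable wrapper API.

```rust
mod fw {
    // In the real driver these are bindgen-generated, per firmware
    // version; a single hand-written stand-in is used here.
    mod bindings {
        #[repr(C)]
        pub(super) struct MsgqTxHeaderRaw {
            pub(super) write_ptr: u32,
        }
    }

    /// Stable wrapper: callers outside `fw` never see the raw layout.
    pub struct MsgqTxHeader(bindings::MsgqTxHeaderRaw);

    impl MsgqTxHeader {
        pub fn new(write_ptr: u32) -> Self {
            Self(bindings::MsgqTxHeaderRaw { write_ptr })
        }

        /// Stable method name, independent of the firmware field name.
        pub fn write_ptr(&self) -> u32 {
            self.0.write_ptr
        }
    }
}

// "gsp"-level code: it can only use the wrapper API, so a firmware
// rename of the underlying field is invisible to it.
fn next_slot(hdr: &fw::MsgqTxHeader) -> u32 {
    hdr.write_ptr()
}
```

If the firmware renames `write_ptr`, only the `fw` module changes; `next_slot` and everything above it are untouched, mirroring the `rust/kernel` vs. drivers split.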
On Mon, Sep 29, 2025 at 4:34 PM Alexandre Courbot <acourbot@nvidia.com> wrote:
>
> I think you will also need to explicitly enable the feature somewhere -
> for the kernel crate it is in `rust/kernel/lib.rs`, but Nova being a
> different crate I am not sure where we are supposed to do it...

`rust_allowed_features` in `scripts/Makefile.build`.

I hope that helps!

Cheers,
Miguel
On Mon Sep 29, 2025 at 11:38 PM JST, Miguel Ojeda wrote:
> On Mon, Sep 29, 2025 at 4:34 PM Alexandre Courbot <acourbot@nvidia.com> wrote:
>>
>> I think you will also need to explicitly enable the feature somewhere -
>> for the kernel crate it is in `rust/kernel/lib.rs`, but Nova being a
>> different crate I am not sure where we are supposed to do it...
>
> `rust_allowed_features` in `scripts/Makefile.build`.

Ah, that's where it was! Thanks a lot!
On 2025-09-30 at 00:45 +1000, Alexandre Courbot <acourbot@nvidia.com> wrote...
> On Mon Sep 29, 2025 at 11:38 PM JST, Miguel Ojeda wrote:
> > On Mon, Sep 29, 2025 at 4:34 PM Alexandre Courbot <acourbot@nvidia.com> wrote:
> >>
> >> I think you will also need to explicitly enable the feature somewhere -
> >> for the kernel crate it is in `rust/kernel/lib.rs`, but Nova being a
> >> different crate I am not sure where we are supposed to do it...
> >
> > `rust_allowed_features` in `scripts/Makefile.build`.
>
> Ah, that's where it was! Thanks a lot!

Thanks. Is it still expected that `#[allow(clippy::incompatible_msrv)]` is
required? Just adding it to `rust_allowed_features` doesn't make the
warning go away without the allow, but maybe I'm just doing something
wrong...
On Tue, Sep 30, 2025 at 1:42 PM Alistair Popple <apopple@nvidia.com> wrote:
>
> Thanks. Is it still expected that `#[allow(clippy::incompatible_msrv)]` is
> required? Just adding it to `rust_allowed_features` doesn't make the warning go
> away without the allow, but maybe I'm just doing something wrong...

The warning is independent of the allowed features, and yeah, it is
expected that Clippy may trigger (at this moment, when we dance with these
features), so you can just allow it locally.

If it is too cumbersome, I may disable it globally.

I hope that helps!

Cheers,
Miguel
On 2025-09-30 at 21:58 +1000, Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> wrote...
> On Tue, Sep 30, 2025 at 1:42 PM Alistair Popple <apopple@nvidia.com> wrote:
> >
> > Thanks. Is it still expected that `#[allow(clippy::incompatible_msrv)]` is
> > required? Just adding it to `rust_allowed_features` doesn't make the warning go
> > away without the allow, but maybe I'm just doing something wrong...
>
> The warning is independent of the allowed features, and yeah, it is
> expected that the Clippy may trigger (at this moment when we dance
> with these features), so you can just allow it locally.
>
> If it is too cumbersome, I may disable it globally.
>
> I hope that helps!

Thanks! That does help. It's not too cumbersome, as we only use it in a few
places; I was just wondering whether it was expected or not. A nice future
quality-of-life improvement might be to automatically disable the
corresponding Clippy warnings for everything in `rust_allowed_features`,
but I think that's pretty low priority.

> Cheers,
> Miguel
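For reference, the local allow discussed above looks like this; the function and its body are made-up stand-ins for whatever item actually uses the unstable feature.

```rust
// Allow the MSRV lint only on the item that needs it, rather than
// disabling `clippy::incompatible_msrv` globally.
#[allow(clippy::incompatible_msrv)]
fn uses_recent_api() -> u32 {
    // Imagine a call here that Clippy would flag as requiring a newer
    // Rust version than the declared MSRV.
    42
}
```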
On Mon, 2025-09-22 at 21:30 +1000, Alistair Popple wrote:
>  /// Creates a self-mapping page table for `obj` at its beginning.
> -fn create_pte_array(obj: &mut CoherentAllocation<u8>) {
> +fn create_pte_array<T: AsBytes + FromBytes>(obj: &mut CoherentAllocation<T>, skip: usize) {

Please add documentation for the `skip` parameter.
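A possible shape for the requested doc comment, sketched with simplified stand-in types (a byte slice instead of `CoherentAllocation`) since the kernel types aren't available here; the exact semantics attributed to `skip` are an assumption and should match what the real patch does.

```rust
/// Creates a self-mapping page table for `obj` at its beginning.
///
/// `skip` is the number of bytes at the start of `obj` to exclude from
/// the mapping, e.g. space already reserved ahead of the mapped data.
/// (Assumed semantics for illustration only.)
fn create_pte_array(obj: &mut [u8], skip: usize) -> usize {
    // Stand-in body: report how many bytes would actually be mapped.
    obj.len().saturating_sub(skip)
}
```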