[PATCH 2/2] gpu: nova-core: improve GSP RPC debug logging with message classification

John Hubbard posted 2 patches 1 month, 2 weeks ago
Posted by John Hubbard 1 month, 2 weeks ago
The debug logging printed a flat "send: seq#" and "receive: seq#" for
all GSP RPC messages, with no distinction between async events from GSP
(like GspLockdownNotice or GspInitDone) and command responses (like
GetGspStaticInfo).

Add driver-side tx_async_seq and rx_event_seq counters to independently
track async sends and async events. Move the receive debug log from
wait_for_msg() into receive_msg() where the message function is known.
Label all four message kinds (async send, async receive, sync send, response):

  GSP RPC: async send: seq# 0, function=GSP_SET_SYSTEM_INFO, length=0x3f0
  GSP RPC: async send: seq# 1, function=SET_REGISTRY, length=0xc5
  GSP RPC: async received: seq# 0, function=LOCKDOWN_NOTICE, length=0x51
  GSP RPC: async received: seq# 17, function=INIT_DONE, length=0x50
  GSP RPC: send: seq# 2, function=GET_GSP_STATIC_INFO, length=0x6c8
  GSP RPC: response received: seq# 2, function=GET_GSP_STATIC_INFO, length=0x6c8

The async received seq# values are driver-counted for now. For command
responses, GSP echoes back the inner rpc.sequence that the CPU sent, so
the response seq# matches the send seq#.

Cc: Maneet Singh <mmaneetsingh@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/gsp/cmdq.rs | 67 ++++++++++++++++++++++++-------
 drivers/gpu/nova-core/gsp/fw.rs   |  1 -
 2 files changed, 52 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 7d6d7d81287c..295e1a80d64d 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -449,6 +449,11 @@ pub(crate) struct Cmdq {
     /// Transport-level sequence number, incremented for every send. Used for the outer
     /// GSP_MSG_QUEUE_ELEMENT.seqNum. Also used as the inner rpc.sequence for sync commands.
     seq: u32,
+    /// Async (fire-and-forget) send sequence number, for debug logging.
+    tx_async_seq: u32,
+    /// Async event receive sequence number, for debug logging. GSP does not populate
+    /// rpc.sequence for async events today, so the driver counts them itself.
+    rx_event_seq: u32,
     /// Memory area shared with the GSP for communicating commands and messages.
     gsp_mem: DmaGspMem,
 }
@@ -477,6 +482,8 @@ pub(crate) fn new(dev: &device::Device<device::Bound>) -> Result<Cmdq> {
         Ok(Cmdq {
             dev: dev.into(),
             seq: 0,
+            tx_async_seq: 0,
+            rx_event_seq: 0,
             gsp_mem,
         })
     }
@@ -555,13 +562,24 @@ pub(crate) fn send_command<M>(&mut self, bar: &Bar0, command: M) -> Result
                 dst.contents.1,
             ])));
 
-        dev_dbg!(
-            &self.dev,
-            "GSP RPC: send: seq# {}, function={}, length=0x{:x}\n",
-            self.seq,
-            M::FUNCTION,
-            dst.header.length(),
-        );
+        if M::IS_ASYNC {
+            dev_dbg!(
+                &self.dev,
+                "GSP RPC: async send: seq# {}, function={}, length=0x{:x}\n",
+                self.tx_async_seq,
+                M::FUNCTION,
+                dst.header.length(),
+            );
+            self.tx_async_seq += 1;
+        } else {
+            dev_dbg!(
+                &self.dev,
+                "GSP RPC: send: seq# {}, function={}, length=0x{:x}\n",
+                self.seq,
+                M::FUNCTION,
+                dst.header.length(),
+            );
+        }
 
         // All set - update the write pointer and inform the GSP of the new command.
         let elem_count = dst.header.element_count();
@@ -606,14 +624,6 @@ fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
         // Extract the `GspMsgElement`.
         let (header, slice_1) = GspMsgElement::from_bytes_prefix(slice_1).ok_or(EIO)?;
 
-        dev_dbg!(
-            self.dev,
-            "GSP RPC: receive: seq# {}, function={:?}, length=0x{:x}\n",
-            header.sequence(),
-            header.function(),
-            header.length(),
-        );
-
         let payload_length = header.payload_length();
 
         // Check that the driver read area is large enough for the message.
@@ -680,6 +690,27 @@ pub(crate) fn receive_msg<M: MessageFromGsp>(&mut self, timeout: Delta) -> Resul
     {
         let message = self.wait_for_msg(timeout)?;
         let function = message.header.function().map_err(|_| EINVAL)?;
+        let is_event = function.is_event();
+
+        if is_event {
+            dev_dbg!(
+                &self.dev,
+                "GSP RPC: async received: seq# {}, function={}, length=0x{:x}\n",
+                self.rx_event_seq,
+                function,
+                message.header.length(),
+            );
+        } else {
+            // GSP echoes back the inner rpc.sequence that the CPU sent with the
+            // corresponding command, so this should match the send seq#.
+            dev_dbg!(
+                &self.dev,
+                "GSP RPC: response received: seq# {}, function={}, length=0x{:x}\n",
+                message.header.sequence(),
+                function,
+                message.header.length(),
+            );
+        }
 
         // Extract the message. Store the result as we want to advance the read pointer even in
         // case of failure.
@@ -697,6 +728,12 @@ pub(crate) fn receive_msg<M: MessageFromGsp>(&mut self, timeout: Delta) -> Resul
             message.header.length().div_ceil(GSP_PAGE_SIZE),
         )?);
 
+        // Deferred past message consumption to satisfy the borrow checker: message
+        // holds a reference into self.gsp_mem, so we can't mutate self until it's dropped.
+        if is_event {
+            self.rx_event_seq += 1;
+        }
+
         result
     }
 
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index e417ed58419f..1535969c3ba9 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -263,7 +263,6 @@ pub(crate) enum MsgFunction {
 impl MsgFunction {
     /// Returns true if this is a GSP-initiated async event (NV_VGPU_MSG_EVENT_*), as opposed to
     /// a command response (NV_VGPU_MSG_FUNCTION_*).
-    #[expect(dead_code)]
     pub(crate) fn is_event(&self) -> bool {
         matches!(
             self,
-- 
2.53.0
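
The counting scheme described in the commit message can be modeled outside
the kernel. The following is a minimal userspace sketch, not the nova-core
implementation: only the field names seq, tx_async_seq and rx_event_seq come
from the patch. The Queue type, the Function enum, the inbox field and the
demo() helper are hypothetical stand-ins, and the log strings omit the
length field. It also shows the same ordering as the patch, where the event
counter is bumped only after the borrowed message is no longer in use.

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for MsgFunction; only a few variants are modeled.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Function {
    SetSystemInfo,
    SetRegistry,
    GetStaticInfo,
    LockdownNotice,
}

impl Function {
    fn name(self) -> &'static str {
        match self {
            Function::SetSystemInfo => "GSP_SET_SYSTEM_INFO",
            Function::SetRegistry => "SET_REGISTRY",
            Function::GetStaticInfo => "GET_GSP_STATIC_INFO",
            Function::LockdownNotice => "LOCKDOWN_NOTICE",
        }
    }

    // True for GSP-initiated events, false for command responses.
    fn is_event(self) -> bool {
        matches!(self, Function::LockdownNotice)
    }
}

// Hypothetical model of the command queue: three counters plus an inbox of
// (echoed sequence, function) pairs standing in for the shared memory area.
#[derive(Default)]
struct Queue {
    seq: u32,
    tx_async_seq: u32,
    rx_event_seq: u32,
    inbox: VecDeque<(u32, Function)>,
}

impl Queue {
    // Returns the log line for a send. The transport-level seq advances on
    // every send; tx_async_seq advances only on async sends.
    fn send(&mut self, function: Function, is_async: bool) -> String {
        let line = if is_async {
            let n = self.tx_async_seq;
            self.tx_async_seq += 1;
            format!("async send: seq# {}, function={}", n, function.name())
        } else {
            format!("send: seq# {}, function={}", self.seq, function.name())
        };
        self.seq += 1;
        line
    }

    // Returns the log line for the next received message, if any.
    fn receive(&mut self) -> Option<String> {
        // `msg` is a shared borrow into self.inbox.
        let msg = self.inbox.front()?;
        let is_event = msg.1.is_event();
        let line = if is_event {
            format!("async received: seq# {}, function={}", self.rx_event_seq, msg.1.name())
        } else {
            format!("response received: seq# {}, function={}", msg.0, msg.1.name())
        };
        // The borrow held by `msg` ends at its last use above, so mutating
        // self is allowed from here on.
        self.inbox.pop_front();
        if is_event {
            self.rx_event_seq += 1;
        }
        Some(line)
    }
}

// Replays a short exchange and collects the log lines.
fn demo() -> Vec<String> {
    let mut q = Queue::default();
    let mut log = vec![
        q.send(Function::SetSystemInfo, true),
        q.send(Function::SetRegistry, true),
        q.send(Function::GetStaticInfo, false),
    ];
    // The model ignores the sequence field of events and echoes the sync
    // command's seq (2) in the response.
    q.inbox.push_back((0, Function::LockdownNotice));
    q.inbox.push_back((2, Function::GetStaticInfo));
    while let Some(line) = q.receive() {
        log.push(line);
    }
    log
}
```

Calling demo() yields five lines in the order async send, async send, send,
async received, response received. In this model the response carries the
same seq# as the matching sync send, while the two async counters are purely
local to the sender.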
Re: [PATCH 2/2] gpu: nova-core: improve GSP RPC debug logging with message classification
Posted by Alexandre Courbot 1 month, 2 weeks ago
On Wed Feb 11, 2026 at 9:04 AM JST, John Hubbard wrote:
> The debug logging printed a flat "send: seq#" and "receive: seq#" for
> all GSP RPC messages, with no distinction between async events from GSP
> (like GspLockdownNotice or GspInitDone) and command responses (like
> GetGspStaticInfo).
>
> Add driver-side tx_async_seq and rx_event_seq counters to independently
> track async sends and async events. Move the receive debug log from
> wait_for_msg() into receive_msg() where the message function is known.
> Label all four message kinds (async send, async receive, sync send, response):
>
>   GSP RPC: async send: seq# 0, function=GSP_SET_SYSTEM_INFO, length=0x3f0
>   GSP RPC: async send: seq# 1, function=SET_REGISTRY, length=0xc5
>   GSP RPC: async received: seq# 0, function=LOCKDOWN_NOTICE, length=0x51
>   GSP RPC: async received: seq# 17, function=INIT_DONE, length=0x50
>   GSP RPC: send: seq# 2, function=GET_GSP_STATIC_INFO, length=0x6c8
>   GSP RPC: response received: seq# 2, function=GET_GSP_STATIC_INFO, length=0x6c8
>
> The async received seq# values are driver-counted for now. For command
> responses, GSP echoes back the inner rpc.sequence that the CPU sent, so
> the response seq# matches the send seq#.
>
> Cc: Maneet Singh <mmaneetsingh@nvidia.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>

I took a good look at it, but still don't understand the point of this
patch. On top of the two sequence numbers we already have, this adds two
more that are purely driver-internal and have no relation to the GSP.
Async messages and events have no sequence numbers, so the new counters
cannot be cross-checked against anything. What do they help with?