[v1] gpu: nova-core: Fixups for GSP message queue and bindings

[PATCH 2/4] gpu: nova-core: gsp: Fix length of received messages

Posted by Alexandre Courbot 2 months, 2 weeks ago

The size of messages' payload is miscalculated, leading to extra data
passed to the message handler. While this is not a problem with our
current set of commands, others with a variable-length payload may
misbehave. Fix this.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/gsp/cmdq.rs | 11 +++++++----
 drivers/gpu/nova-core/gsp/fw.rs   |  2 +-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
index 6f946d14868a..dab73377c526 100644
--- a/drivers/gpu/nova-core/gsp/cmdq.rs
+++ b/drivers/gpu/nova-core/gsp/cmdq.rs
@@ -588,21 +588,24 @@ fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
             header.length(),
         );
 
+        // The length of the message that follows the header.
+        let msg_length = header.length() - size_of::<GspMsgElement>();
+
         // Check that the driver read area is large enough for the message.
-        if slice_1.len() + slice_2.len() < header.length() {
+        if slice_1.len() + slice_2.len() < msg_length {
             return Err(EIO);
         }
 
         // Cut the message slices down to the actual length of the message.
-        let (slice_1, slice_2) = if slice_1.len() > header.length() {
+        let (slice_1, slice_2) = if slice_1.len() > msg_length {
             // PANIC: we checked above that `slice_1` is at least as long as `msg_header.length()`.
-            (slice_1.split_at(header.length()).0, &slice_2[0..0])
+            (slice_1.split_at(msg_length).0, &slice_2[0..0])
         } else {
             (
                 slice_1,
                 // PANIC: we checked above that `slice_1.len() + slice_2.len()` is at least as
                 // large as `msg_header.length()`.
-                slice_2.split_at(header.length() - slice_1.len()).0,
+                slice_2.split_at(msg_length - slice_1.len()).0,
             )
         };
 
diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
index abffd6beec65..7fcba5afb0a3 100644
--- a/drivers/gpu/nova-core/gsp/fw.rs
+++ b/drivers/gpu/nova-core/gsp/fw.rs
@@ -853,7 +853,7 @@ pub(crate) fn set_checksum(&mut self, checksum: u32) {
         self.inner.checkSum = checksum;
     }
 
-    /// Returns the total length of the message.
+    /// Returns the total length of the message, message and RPC headers included.
     pub(crate) fn length(&self) -> usize {
         // `rpc.length` includes the length of the GspRpcHeader but not the message header.
         size_of::<Self>() - size_of::<bindings::rpc_message_header_v>()

-- 
2.51.2

Re: [PATCH 2/4] gpu: nova-core: gsp: Fix length of received messages

Posted by Joel Fernandes 1 month, 4 weeks ago

Hi Alex,

> On Nov 22, 2025, at 12:00 AM, Alexandre Courbot <acourbot@nvidia.com> wrote:
> 
> The size of messages' payload is miscalculated, leading to extra data
> passed to the message handler. While this is not a problem with our
> current set of commands, others with a variable-length payload may
> misbehave. Fix this.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
> drivers/gpu/nova-core/gsp/cmdq.rs | 11 +++++++----
> drivers/gpu/nova-core/gsp/fw.rs   |  2 +-
> 2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> index 6f946d14868a..dab73377c526 100644
> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> @@ -588,21 +588,24 @@ fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
>             header.length(),
>         );
> 
> +        // The length of the message that follows the header.
> +        let msg_length = header.length() - size_of::<GspMsgElement>();

Is this immune to under flow without one of the checked subtraction wrappers? Either way, we should not tolerate the underflow I think. Which means it can panic when the rust overflow checks are enabled. Since the header length comes from firmware, this cannot be guaranteed to not underflow in the event of a malformed message.

thanks,

 - Joel



> +
>         // Check that the driver read area is large enough for the message.
> -        if slice_1.len() + slice_2.len() < header.length() {
> +        if slice_1.len() + slice_2.len() < msg_length {
>             return Err(EIO);
>         }
> 
>         // Cut the message slices down to the actual length of the message.
> -        let (slice_1, slice_2) = if slice_1.len() > header.length() {
> +        let (slice_1, slice_2) = if slice_1.len() > msg_length {
>             // PANIC: we checked above that `slice_1` is at least as long as `msg_header.length()`.
> -            (slice_1.split_at(header.length()).0, &slice_2[0..0])
> +            (slice_1.split_at(msg_length).0, &slice_2[0..0])
>         } else {
>             (
>                 slice_1,
>                 // PANIC: we checked above that `slice_1.len() + slice_2.len()` is at least as
>                 // large as `msg_header.length()`.
> -                slice_2.split_at(header.length() - slice_1.len()).0,
> +                slice_2.split_at(msg_length - slice_1.len()).0,
>             )
>         };
> 
> diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
> index abffd6beec65..7fcba5afb0a3 100644
> --- a/drivers/gpu/nova-core/gsp/fw.rs
> +++ b/drivers/gpu/nova-core/gsp/fw.rs
> @@ -853,7 +853,7 @@ pub(crate) fn set_checksum(&mut self, checksum: u32) {
>         self.inner.checkSum = checksum;
>     }
> 
> -    /// Returns the total length of the message.
> +    /// Returns the total length of the message, message and RPC headers included.
>     pub(crate) fn length(&self) -> usize {
>         // `rpc.length` includes the length of the GspRpcHeader but not the message header.
>         size_of::<Self>() - size_of::<bindings::rpc_message_header_v>()
> 
> --
> 2.51.2
>

Re: [PATCH 2/4] gpu: nova-core: gsp: Fix length of received messages

Posted by Alistair Popple 1 month, 3 weeks ago

On 2025-12-12 at 18:59 +1100, Joel Fernandes <joelagnelf@nvidia.com> wrote...
> Hi Alex,
> 
> > On Nov 22, 2025, at 12:00 AM, Alexandre Courbot <acourbot@nvidia.com> wrote:
> > 
> > The size of messages' payload is miscalculated, leading to extra data
> > passed to the message handler. While this is not a problem with our
> > current set of commands, others with a variable-length payload may
> > misbehave. Fix this.
> > 
> > Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> > ---
> > drivers/gpu/nova-core/gsp/cmdq.rs | 11 +++++++----
> > drivers/gpu/nova-core/gsp/fw.rs   |  2 +-
> > 2 files changed, 8 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> > index 6f946d14868a..dab73377c526 100644
> > --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> > +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> > @@ -588,21 +588,24 @@ fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
> >             header.length(),
> >         );
> > 
> > +        // The length of the message that follows the header.
> > +        let msg_length = header.length() - size_of::<GspMsgElement>();

Wouldn't it be better to add a new method to GspMsgElement to get the size of
the associated message rather than open coding it here?

 - Alistair

> Is this immune to under flow without one of the checked subtraction wrappers? Either way, we should not tolerate the underflow I think. Which means it can panic when the rust overflow checks are enabled. Since the header length comes from firmware, this cannot be guaranteed to not underflow in the event of a malformed message.
> 
> thanks,
> 
>  - Joel
> 
> 
> 
> > +
> >         // Check that the driver read area is large enough for the message.
> > -        if slice_1.len() + slice_2.len() < header.length() {
> > +        if slice_1.len() + slice_2.len() < msg_length {
> >             return Err(EIO);
> >         }
> > 
> >         // Cut the message slices down to the actual length of the message.
> > -        let (slice_1, slice_2) = if slice_1.len() > header.length() {
> > +        let (slice_1, slice_2) = if slice_1.len() > msg_length {
> >             // PANIC: we checked above that `slice_1` is at least as long as `msg_header.length()`.
> > -            (slice_1.split_at(header.length()).0, &slice_2[0..0])
> > +            (slice_1.split_at(msg_length).0, &slice_2[0..0])
> >         } else {
> >             (
> >                 slice_1,
> >                 // PANIC: we checked above that `slice_1.len() + slice_2.len()` is at least as
> >                 // large as `msg_header.length()`.
> > -                slice_2.split_at(header.length() - slice_1.len()).0,
> > +                slice_2.split_at(msg_length - slice_1.len()).0,
> >             )
> >         };
> > 
> > diff --git a/drivers/gpu/nova-core/gsp/fw.rs b/drivers/gpu/nova-core/gsp/fw.rs
> > index abffd6beec65..7fcba5afb0a3 100644
> > --- a/drivers/gpu/nova-core/gsp/fw.rs
> > +++ b/drivers/gpu/nova-core/gsp/fw.rs
> > @@ -853,7 +853,7 @@ pub(crate) fn set_checksum(&mut self, checksum: u32) {
> >         self.inner.checkSum = checksum;
> >     }
> > 
> > -    /// Returns the total length of the message.
> > +    /// Returns the total length of the message, message and RPC headers included.
> >     pub(crate) fn length(&self) -> usize {
> >         // `rpc.length` includes the length of the GspRpcHeader but not the message header.
> >         size_of::<Self>() - size_of::<bindings::rpc_message_header_v>()
> > 
> > --
> > 2.51.2
> >

Re: [PATCH 2/4] gpu: nova-core: gsp: Fix length of received messages

Posted by Alexandre Courbot 1 month, 3 weeks ago

On Mon Dec 15, 2025 at 8:29 AM JST, Alistair Popple wrote:
> On 2025-12-12 at 18:59 +1100, Joel Fernandes <joelagnelf@nvidia.com> wrote...
>> Hi Alex,
>> 
>> > On Nov 22, 2025, at 12:00 AM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>> > 
>> > The size of messages' payload is miscalculated, leading to extra data
>> > passed to the message handler. While this is not a problem with our
>> > current set of commands, others with a variable-length payload may
>> > misbehave. Fix this.
>> > 
>> > Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> > ---
>> > drivers/gpu/nova-core/gsp/cmdq.rs | 11 +++++++----
>> > drivers/gpu/nova-core/gsp/fw.rs   |  2 +-
>> > 2 files changed, 8 insertions(+), 5 deletions(-)
>> > 
>> > diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>> > index 6f946d14868a..dab73377c526 100644
>> > --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>> > +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>> > @@ -588,21 +588,24 @@ fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
>> >             header.length(),
>> >         );
>> > 
>> > +        // The length of the message that follows the header.
>> > +        let msg_length = header.length() - size_of::<GspMsgElement>();
>
> Wouldn't it be better to add a new method to GspMsgElement to get the size of
> the associated message rather than open coding it here?

Agreed, that seems to make sense and should provide us with a safe
operation.

Re: [PATCH 2/4] gpu: nova-core: gsp: Fix length of received messages

Posted by Dirk Behme 1 month, 4 weeks ago

On 12.12.25 08:59, Joel Fernandes wrote:
> Hi Alex,
> 
>> On Nov 22, 2025, at 12:00 AM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>>
>> The size of messages' payload is miscalculated, leading to extra data
>> passed to the message handler. While this is not a problem with our
>> current set of commands, others with a variable-length payload may
>> misbehave. Fix this.
>>
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>> drivers/gpu/nova-core/gsp/cmdq.rs | 11 +++++++----
>> drivers/gpu/nova-core/gsp/fw.rs   |  2 +-
>> 2 files changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>> index 6f946d14868a..dab73377c526 100644
>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>> @@ -588,21 +588,24 @@ fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
>>             header.length(),
>>         );
>>
>> +        // The length of the message that follows the header.
>> +        let msg_length = header.length() - size_of::<GspMsgElement>();
> 
> Is this immune to under flow without one of the checked subtraction wrappers? Either way, we should not tolerate the underflow I think. Which means it can panic when the rust overflow checks are enabled. Since the header length comes from firmware, this cannot be guaranteed to not underflow in the event of a malformed message.

Would this be a possible use case for the untrusted data proposal

https://lwn.net/Articles/1034603/

?

Cheers

Dirk

Re: [PATCH 2/4] gpu: nova-core: gsp: Fix length of received messages

Posted by Alistair Popple 1 month, 3 weeks ago

On 2025-12-12 at 19:10 +1100, Dirk Behme <dirk.behme@gmail.com> wrote...
> On 12.12.25 08:59, Joel Fernandes wrote:
> > Hi Alex,
> > 
> >> On Nov 22, 2025, at 12:00 AM, Alexandre Courbot <acourbot@nvidia.com> wrote:
> >>
> >> The size of messages' payload is miscalculated, leading to extra data
> >> passed to the message handler. While this is not a problem with our
> >> current set of commands, others with a variable-length payload may
> >> misbehave. Fix this.
> >>
> >> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> >> ---
> >> drivers/gpu/nova-core/gsp/cmdq.rs | 11 +++++++----
> >> drivers/gpu/nova-core/gsp/fw.rs   |  2 +-
> >> 2 files changed, 8 insertions(+), 5 deletions(-)
> >>
> >> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> >> index 6f946d14868a..dab73377c526 100644
> >> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> >> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> >> @@ -588,21 +588,24 @@ fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
> >>             header.length(),
> >>         );
> >>
> >> +        // The length of the message that follows the header.
> >> +        let msg_length = header.length() - size_of::<GspMsgElement>();
> > 
> > Is this immune to under flow without one of the checked subtraction wrappers? Either way, we should not tolerate the underflow I think. Which means it can panic when the rust overflow checks are enabled. Since the header length comes from firmware, this cannot be guaranteed to not underflow in the event of a malformed message.

I think we're guaranteed not to underflow here - check out the implementation for header.length():
 
    /// Returns the total length of the message.
    pub(crate) fn length(&self) -> usize {
        // `rpc.length` includes the length of the GspRpcHeader but not the message header.
        size_of::<Self>() - size_of::<bindings::rpc_message_header_v>()
            + num::u32_as_usize(self.inner.rpc.length)
    }

So the above calculation expands to:

msg_length = size_of::<Self>() - size_of::<bindings::rpc_message_header_v>()
            + num::u32_as_usize(self.inner.rpc.length) - size_of::<GspMsgElement>();

Where self.inner.rpc.length is guaranteed to be >= size_of::<rpc_message_header_v>() by the construction of the type.

> Would this be a possible use case for the untrusted data proposal
> 
> https://lwn.net/Articles/1034603/
> 
> ?

Responding here because Joel appears to have sent a HTML only response ;-)

I agree with Joel's points - this does sound useful but as a separate project.
I'd imagine we'd want to it one layer lower though - ie. in the construction of
the GspMsgElement.

> Cheers
> 
> Dirk

Re: [PATCH 2/4] gpu: nova-core: gsp: Fix length of received messages

Posted by Alexandre Courbot 1 month, 3 weeks ago

On Mon Dec 15, 2025 at 8:43 AM JST, Alistair Popple wrote:
> On 2025-12-12 at 19:10 +1100, Dirk Behme <dirk.behme@gmail.com> wrote...
>> On 12.12.25 08:59, Joel Fernandes wrote:
>> > Hi Alex,
>> > 
>> >> On Nov 22, 2025, at 12:00 AM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>> >>
>> >> The size of messages' payload is miscalculated, leading to extra data
>> >> passed to the message handler. While this is not a problem with our
>> >> current set of commands, others with a variable-length payload may
>> >> misbehave. Fix this.
>> >>
>> >> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> >> ---
>> >> drivers/gpu/nova-core/gsp/cmdq.rs | 11 +++++++----
>> >> drivers/gpu/nova-core/gsp/fw.rs   |  2 +-
>> >> 2 files changed, 8 insertions(+), 5 deletions(-)
>> >>
>> >> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>> >> index 6f946d14868a..dab73377c526 100644
>> >> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>> >> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>> >> @@ -588,21 +588,24 @@ fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
>> >>             header.length(),
>> >>         );
>> >>
>> >> +        // The length of the message that follows the header.
>> >> +        let msg_length = header.length() - size_of::<GspMsgElement>();
>> > 
>> > Is this immune to under flow without one of the checked subtraction wrappers? Either way, we should not tolerate the underflow I think. Which means it can panic when the rust overflow checks are enabled. Since the header length comes from firmware, this cannot be guaranteed to not underflow in the event of a malformed message.
>
> I think we're guaranteed not to underflow here - check out the implementation for header.length():
>  
>     /// Returns the total length of the message.
>     pub(crate) fn length(&self) -> usize {
>         // `rpc.length` includes the length of the GspRpcHeader but not the message header.
>         size_of::<Self>() - size_of::<bindings::rpc_message_header_v>()
>             + num::u32_as_usize(self.inner.rpc.length)
>     }
>
> So the above calculation expands to:
>
> msg_length = size_of::<Self>() - size_of::<bindings::rpc_message_header_v>()
>             + num::u32_as_usize(self.inner.rpc.length) - size_of::<GspMsgElement>();
>
> Where self.inner.rpc.length is guaranteed to be >= size_of::<rpc_message_header_v>() by the construction of the type.

That's correct, although it defers the possibility of underflow to
`length`, albeit using two constants. Still, doing this as a const would
catch any potential issue at build-time:

    pub(crate) fn length(&self) -> usize {
        // `rpc.length` includes the length of the GspRpcHeader but not the message header.
        const RPC_LENGTH: usize = size_of::<Self>() - size_of::<bindings::rpc_message_header_v>();

        RPC_LENGTH + num::u32_as_usize(self.inner.rpc.length)
    }

This length computation has been the subject of debate between me and
Alistair back when we were writing the code, so this part can be subject
to change if doing so limits the amount of potentially panicking
operations.

>
>> Would this be a possible use case for the untrusted data proposal
>> 
>> https://lwn.net/Articles/1034603/
>> 
>> ?
>
> Responding here because Joel appears to have sent a HTML only response ;-)
>
> I agree with Joel's points - this does sound useful but as a separate project.
> I'd imagine we'd want to it one layer lower though - ie. in the construction of
> the GspMsgElement.

How would that be beneficial? We would need to use `unsafe` to access
the data, but then unless we can encode the guarantees that we verified
in the type itself (or its invariants, but that would quickly become
cumbersome to manage) we would still have the same ops ongoing.

IMHO the simple and safe way is to use checked operations with data that
comes from the GSP.

Re: [PATCH 2/4] gpu: nova-core: gsp: Fix length of received messages

Posted by Joel Fernandes 1 month, 3 weeks ago

Hi Alistair,

> On Dec 15, 2025, at 8:43 AM, Alistair Popple <apopple@nvidia.com> wrote:
> 
> On 2025-12-12 at 19:10 +1100, Dirk Behme <dirk.behme@gmail.com> wrote...
>>> On 12.12.25 08:59, Joel Fernandes wrote:
>>> Hi Alex,
>>> 
>>>> On Nov 22, 2025, at 12:00 AM, Alexandre Courbot <acourbot@nvidia.com> wrote:
>>>> 
>>>> The size of messages' payload is miscalculated, leading to extra data
>>>> passed to the message handler. While this is not a problem with our
>>>> current set of commands, others with a variable-length payload may
>>>> misbehave. Fix this.
>>>> 
>>>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>>>> ---
>>>> drivers/gpu/nova-core/gsp/cmdq.rs | 11 +++++++----
>>>> drivers/gpu/nova-core/gsp/fw.rs   |  2 +-
>>>> 2 files changed, 8 insertions(+), 5 deletions(-)
>>>> 
>>>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
>>>> index 6f946d14868a..dab73377c526 100644
>>>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
>>>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
>>>> @@ -588,21 +588,24 @@ fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
>>>>            header.length(),
>>>>        );
>>>> 
>>>> +        // The length of the message that follows the header.
>>>> +        let msg_length = header.length() - size_of::<GspMsgElement>();
>>> 
>>> Is this immune to under flow without one of the checked subtraction wrappers? Either way, we should not tolerate the underflow I think. Which means it can panic when the rust overflow checks are enabled. Since the header length comes from firmware, this cannot be guaranteed to not underflow in the event of a malformed message.
> 
> I think we're guaranteed not to underflow here - check out the implementation for header.length():
> 
>    /// Returns the total length of the message.
>    pub(crate) fn length(&self) -> usize {
>        // `rpc.length` includes the length of the GspRpcHeader but not the message header.
>        size_of::<Self>() - size_of::<bindings::rpc_message_header_v>()
>            + num::u32_as_usize(self.inner.rpc.length)
>    }
> 
> So the above calculation expands to:
> 
> msg_length = size_of::<Self>() - size_of::<bindings::rpc_message_header_v>()
>            + num::u32_as_usize(self.inner.rpc.length) - size_of::<GspMsgElement>();
> 
> Where self.inner.rpc.length is guaranteed to be >= size_of::<rpc_message_header_v>() by the construction of the type.

But this length field is coming from the firmware, correct? The guarantee is provided by firmware, not by Rust code calculating the length.

Maybe Rust validating that the length matches, or is greater than or equal to, the message header would be one way to avoid doing the checked subtraction. I would still be comfortable doing the checked subtraction in case the firmware payload in the message buffer is corrupted and the length field is corrupted.

I think Rust cannot trust fields coming from the firmware and needs to check them to prevent undefined behavior. Or maybe the policy is to include safety comments, like we do when expecting C code to behave in a certain way. I do not know. But we should identify the policy for this and stick to it for future such issues.

I think one way to verify that there is a Rust guarantee about the length field is to do the unchecked subtraction, compile the code, and then see if the generated code has any panics in it (With the overflow checking config enabled.)

If it does, then dead code elimination could not determine at compile time that the subtraction would not overflow. Right?

> 
>> Would this be a possible use case for the untrusted data proposal
>> 
>> https://lwn.net/Articles/1034603/
>> 
>> ?
> 
> Responding here because Joel appears to have sent a HTML only response ;-)

Sorry. :)

> 
> I agree with Joel's points - this does sound useful but as a separate project.
> I'd imagine we'd want to it one layer lower though - ie. in the construction of
> the GspMsgElement.

Agreed, thanks.

 - Joel 



> 
>> Cheers
>> 
>> Dirk

Re: [PATCH 2/4] gpu: nova-core: gsp: Fix length of received messages

Posted by Alistair Popple 1 month, 3 weeks ago

On 2025-12-15 at 12:22 +1100, Joel Fernandes <joelagnelf@nvidia.com> wrote...
> Hi Alistair,
> 
> > On Dec 15, 2025, at 8:43 AM, Alistair Popple <apopple@nvidia.com> wrote:
> > 
> > On 2025-12-12 at 19:10 +1100, Dirk Behme <dirk.behme@gmail.com> wrote...
> >>> On 12.12.25 08:59, Joel Fernandes wrote:
> >>> Hi Alex,
> >>> 
> >>>> On Nov 22, 2025, at 12:00 AM, Alexandre Courbot <acourbot@nvidia.com> wrote:
> >>>> 
> >>>> The size of messages' payload is miscalculated, leading to extra data
> >>>> passed to the message handler. While this is not a problem with our
> >>>> current set of commands, others with a variable-length payload may
> >>>> misbehave. Fix this.
> >>>> 
> >>>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> >>>> ---
> >>>> drivers/gpu/nova-core/gsp/cmdq.rs | 11 +++++++----
> >>>> drivers/gpu/nova-core/gsp/fw.rs   |  2 +-
> >>>> 2 files changed, 8 insertions(+), 5 deletions(-)
> >>>> 
> >>>> diff --git a/drivers/gpu/nova-core/gsp/cmdq.rs b/drivers/gpu/nova-core/gsp/cmdq.rs
> >>>> index 6f946d14868a..dab73377c526 100644
> >>>> --- a/drivers/gpu/nova-core/gsp/cmdq.rs
> >>>> +++ b/drivers/gpu/nova-core/gsp/cmdq.rs
> >>>> @@ -588,21 +588,24 @@ fn wait_for_msg(&self, timeout: Delta) -> Result<GspMessage<'_>> {
> >>>>            header.length(),
> >>>>        );
> >>>> 
> >>>> +        // The length of the message that follows the header.
> >>>> +        let msg_length = header.length() - size_of::<GspMsgElement>();
> >>> 
> >>> Is this immune to under flow without one of the checked subtraction wrappers? Either way, we should not tolerate the underflow I think. Which means it can panic when the rust overflow checks are enabled. Since the header length comes from firmware, this cannot be guaranteed to not underflow in the event of a malformed message.
> > 
> > I think we're guaranteed not to underflow here - check out the implementation for header.length():
> > 
> >    /// Returns the total length of the message.
> >    pub(crate) fn length(&self) -> usize {
> >        // `rpc.length` includes the length of the GspRpcHeader but not the message header.
> >        size_of::<Self>() - size_of::<bindings::rpc_message_header_v>()
> >            + num::u32_as_usize(self.inner.rpc.length)
> >    }
> > 
> > So the above calculation expands to:
> > 
> > msg_length = size_of::<Self>() - size_of::<bindings::rpc_message_header_v>()
> >            + num::u32_as_usize(self.inner.rpc.length) - size_of::<GspMsgElement>();
> > 
> > Where self.inner.rpc.length is guaranteed to be >= size_of::<rpc_message_header_v>() by the construction of the type.
> 
> But this length field is coming from the firmware, correct? The guarantee is provided by firmware, not by Rust code calculating the length.

Oh you're right - I had this around the wrong way, I forgot our constructors were only used for creating messages not receiving them. Obviously killing time reading mailing lists in airports is not such a good idea :-)

> Maybe Rust validating that the length matches, or is greater than or equal to, the message header would be one way to avoid doing the checked subtraction. I would still be comfortable doing the checked subtraction in case the firmware payload in the message buffer is corrupted and the length field is corrupted.
> 
> I think Rust cannot trust fields coming from the firmware and needs to check them to prevent undefined behavior. Or maybe the policy is to include safety comments, like we do when expecting C code to behave in a certain way. I do not know. But we should identify the policy for this and stick to it for future such issues.
> 
> I think one way to verify that there is a Rust guarantee about the length field is to do the unchecked subtraction, compile the code, and then see if the generated code has any panics in it (With the overflow checking config enabled.)
> 
> If it does, then dead code elimination could not determine at compile time that the subtraction would not overflow. Right?

Agreed. Whilst we can't avoid needing to trust the firmware at some level crashing the kernel in response to a bad message would be bad and make debugging it a pain.

> > 
> >> Would this be a possible use case for the untrusted data proposal
> >> 
> >> https://lwn.net/Articles/1034603/
> >> 
> >> ?
> > 
> > Responding here because Joel appears to have sent a HTML only response ;-)
> 
> Sorry. :)
> 
> > 
> > I agree with Joel's points - this does sound useful but as a separate project.
> > I'd imagine we'd want to it one layer lower though - ie. in the construction of
> > the GspMsgElement.

So I guess we'd need our own implementation of a from_bytes trait or some such that would also validate the fields when deserialising the struct.

> Agreed, thanks.
> 
>  - Joel 
> 
> 
> 
> > 
> >> Cheers
> >> 
> >> Dirk

Re: [PATCH 2/4] gpu: nova-core: gsp: Fix length of received messages

Posted by John Hubbard 1 month, 3 weeks ago

On 12/15/25 10:22 AM, Joel Fernandes wrote:
>> On Dec 15, 2025, at 8:43 AM, Alistair Popple <apopple@nvidia.com> wrote:...
>> So the above calculation expands to:
>>
>> msg_length = size_of::<Self>() - size_of::<bindings::rpc_message_header_v>()
>>             + num::u32_as_usize(self.inner.rpc.length) - size_of::<GspMsgElement>();
>>
>> Where self.inner.rpc.length is guaranteed to be >= size_of::<rpc_message_header_v>() by the construction of the type.
> 
> But this length field is coming from the firmware, correct? The guarantee is provided by firmware, not by Rust code calculating the length.
> 
> Maybe Rust validating that the length matches, or is greater than or equal to, the message header would be one way to avoid doing the checked subtraction. I would still be comfortable doing the checked subtraction in case the firmware payload in the message buffer is corrupted and the length field is corrupted.
> 
> I think Rust cannot trust fields coming from the firmware and needs to check them to prevent undefined behavior.

Right. The firmware is a separate code base, running on a separate
processor, and it is not part of the Rust driver. So it cannot
participate in any of the various Rust guarantees.

We should treat data that comes from the firmware as not yet
validated, external data.

  Or maybe the policy is to include safety comments, like we do when 
expecting C code to behave in a certain way. I
do not know. But we should identify the policy for this and stick to it 
for future such issues.es

Yes. I've written above what I believe we should use for a policy.

thanks,
-- 
John Hubbard

Re: [PATCH 2/4] gpu: nova-core: gsp: Fix length of received messages

Posted by Timur Tabi 1 month, 3 weeks ago

On Mon, 2025-12-15 at 11:46 +0900, John Hubbard wrote:
> We should treat data that comes from the firmware as not yet
> validated, external data.

Maybe we should make it a policy that all data read from firmware / disk needs to be instantiate
via a constructor that validates all the fields.

Re: [PATCH 2/4] gpu: nova-core: gsp: Fix length of received messages

Posted by John Hubbard 1 month, 3 weeks ago

On 12/15/25 11:47 AM, Timur Tabi wrote:
> On Mon, 2025-12-15 at 11:46 +0900, John Hubbard wrote:
>> We should treat data that comes from the firmware as not yet
>> validated, external data.
> 
> Maybe we should make it a policy that all data read from firmware / disk needs to be instantiate
> via a constructor that validates all the fields.

That's one approach, and maybe it will end up being the best and
most desirable approach.

But I'd stop short of making it a full-blown policy, because there
are other ways of validating the data that comes from the firmware,
and there's not (yet?) any real need to require one particular
solution, I think.

thanks,
-- 
John Hubbard