[v2] gpu: nova-core: process and prepare more firmwares to boot GSP

[PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header

Posted by Alexandre Courbot 1 month, 1 week ago

Several firmware files loaded from userspace feature a common header
that describes their payload. Add basic support for it so subsequent
patches can leverage it.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/firmware.rs | 62 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 2931912ddba0ea1fe6d027ccec70b39cdb40344a..ccb4d19f8fa76b0e844252dede5f50b37c590571 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -4,11 +4,13 @@
 //! to be loaded into a given execution unit.
 
 use core::marker::PhantomData;
+use core::mem::size_of;
 
 use kernel::device;
 use kernel::firmware;
 use kernel::prelude::*;
 use kernel::str::CString;
+use kernel::transmute::FromBytes;
 
 use crate::dma::DmaObject;
 use crate::falcon::FalconFirmware;
@@ -150,6 +152,66 @@ fn no_patch_signature(self) -> FirmwareDmaObject<F, Signed> {
     }
 }
 
+/// Header common to most firmware files.
+#[repr(C)]
+#[derive(Debug, Clone)]
+struct BinHdr {
+    /// Magic number, must be `0x10de`.
+    bin_magic: u32,
+    /// Version of the header.
+    bin_ver: u32,
+    /// Size in bytes of the binary (to be ignored).
+    bin_size: u32,
+    /// Offset of the start of the application-specific header.
+    header_offset: u32,
+    /// Offset of the start of the data payload.
+    data_offset: u32,
+    /// Size in bytes of the data payload.
+    data_size: u32,
+}
+
+// SAFETY: all bit patterns are valid for this type, and it doesn't use interior mutability.
+unsafe impl FromBytes for BinHdr {}
+
+// A firmware blob starting with a `BinHdr`.
+struct BinFirmware<'a> {
+    hdr: BinHdr,
+    fw: &'a [u8],
+}
+
+#[expect(dead_code)]
+impl<'a> BinFirmware<'a> {
+    /// Interpret `fw` as a firmware image starting with a [`BinHdr`], and returns the
+    /// corresponding [`BinFirmware`] that can be used to extract its payload.
+    fn new(fw: &'a firmware::Firmware) -> Result<Self> {
+        const BIN_MAGIC: u32 = 0x10de;
+        let fw = fw.data();
+
+        fw.get(0..size_of::<BinHdr>())
+            // Extract header.
+            .and_then(BinHdr::from_bytes_copy)
+            // Validate header.
+            .and_then(|hdr| {
+                if hdr.bin_magic == BIN_MAGIC {
+                    Some(hdr)
+                } else {
+                    None
+                }
+            })
+            .map(|hdr| Self { hdr, fw })
+            .ok_or(EINVAL)
+    }
+
+    /// Returns the data payload of the firmware, or `None` if the data range is out of bounds of
+    /// the firmware image.
+    fn data(&self) -> Option<&[u8]> {
+        let fw_start = self.hdr.data_offset as usize;
+        let fw_size = self.hdr.data_size as usize;
+
+        self.fw.get(fw_start..fw_start + fw_size)
+    }
+}
+
 pub(crate) struct ModInfoBuilder<const N: usize>(firmware::ModInfoBuilder<N>);
 
 impl<const N: usize> ModInfoBuilder<N> {

-- 
2.50.1

Re: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header

Posted by John Hubbard 1 month, 1 week ago

On 8/25/25 9:07 PM, Alexandre Courbot wrote:
> Several firmware files loaded from userspace feature a common header
> that describes their payload. Add basic support for it so subsequent
> patches can leverage it.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  drivers/gpu/nova-core/firmware.rs | 62 +++++++++++++++++++++++++++++++++++++++
>  1 file changed, 62 insertions(+)
> 
> diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
> index 2931912ddba0ea1fe6d027ccec70b39cdb40344a..ccb4d19f8fa76b0e844252dede5f50b37c590571 100644
> --- a/drivers/gpu/nova-core/firmware.rs
> +++ b/drivers/gpu/nova-core/firmware.rs
> @@ -4,11 +4,13 @@
>  //! to be loaded into a given execution unit.
>  
>  use core::marker::PhantomData;
> +use core::mem::size_of;
>  
>  use kernel::device;
>  use kernel::firmware;
>  use kernel::prelude::*;
>  use kernel::str::CString;
> +use kernel::transmute::FromBytes;
>  
>  use crate::dma::DmaObject;
>  use crate::falcon::FalconFirmware;
> @@ -150,6 +152,66 @@ fn no_patch_signature(self) -> FirmwareDmaObject<F, Signed> {
>      }
>  }
>  
> +/// Header common to most firmware files.
> +#[repr(C)]
> +#[derive(Debug, Clone)]
> +struct BinHdr {
> +    /// Magic number, must be `0x10de`.

How about:

       /// Magic number, required to be equal to BIN_MAGIC

...and see below.

> +    bin_magic: u32,
> +    /// Version of the header.
> +    bin_ver: u32,
> +    /// Size in bytes of the binary (to be ignored).
> +    bin_size: u32,
> +    /// Offset of the start of the application-specific header.
> +    header_offset: u32,
> +    /// Offset of the start of the data payload.
> +    data_offset: u32,
> +    /// Size in bytes of the data payload.
> +    data_size: u32,
> +}
> +
> +// SAFETY: all bit patterns are valid for this type, and it doesn't use interior mutability.
> +unsafe impl FromBytes for BinHdr {}
> +
> +// A firmware blob starting with a `BinHdr`.
> +struct BinFirmware<'a> {
> +    hdr: BinHdr,
> +    fw: &'a [u8],
> +}
> +
> +#[expect(dead_code)]
> +impl<'a> BinFirmware<'a> {
> +    /// Interpret `fw` as a firmware image starting with a [`BinHdr`], and returns the
> +    /// corresponding [`BinFirmware`] that can be used to extract its payload.
> +    fn new(fw: &'a firmware::Firmware) -> Result<Self> {
> +        const BIN_MAGIC: u32 = 0x10de;

Let's promote this to approximately file scope and put it just
above the BinHdr definition. Then we end up with one definition
at the ideal scope, and the comment docs take better care of 
themselves.

> +        let fw = fw.data();
> +
> +        fw.get(0..size_of::<BinHdr>())
> +            // Extract header.
> +            .and_then(BinHdr::from_bytes_copy)
> +            // Validate header.
> +            .and_then(|hdr| {
> +                if hdr.bin_magic == BIN_MAGIC {
> +                    Some(hdr)
> +                } else {
> +                    None
> +                }
> +            })
> +            .map(|hdr| Self { hdr, fw })
> +            .ok_or(EINVAL)
> +    }
> +
> +    /// Returns the data payload of the firmware, or `None` if the data range is out of bounds of
> +    /// the firmware image.
> +    fn data(&self) -> Option<&[u8]> {
> +        let fw_start = self.hdr.data_offset as usize;
> +        let fw_size = self.hdr.data_size as usize;
> +
> +        self.fw.get(fw_start..fw_start + fw_size)

This worries me a bit, because we never checked that these bounds
are reasonable: within the range of the firmware, and not overflowing
(.checked_add() for example), that sort of thing.

Thoughts?

> +    }
> +}
> +
>  pub(crate) struct ModInfoBuilder<const N: usize>(firmware::ModInfoBuilder<N>);
>  
>  impl<const N: usize> ModInfoBuilder<N> {
> 

thanks,
-- 
John Hubbard

Re: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header

Posted by Alexandre Courbot 1 month, 1 week ago

On Wed Aug 27, 2025 at 10:34 AM JST, John Hubbard wrote:
<snip>
>> +    /// Returns the data payload of the firmware, or `None` if the data range is out of bounds of
>> +    /// the firmware image.
>> +    fn data(&self) -> Option<&[u8]> {
>> +        let fw_start = self.hdr.data_offset as usize;
>> +        let fw_size = self.hdr.data_size as usize;
>> +
>> +        self.fw.get(fw_start..fw_start + fw_size)
>
> This worries me a bit, because we never checked that these bounds
> are reasonable: within the range of the firmware, and not overflowing
> (.checked_add() for example), that sort of thing.
>
> Thoughts?

`get` returns `None` if the requested slice is out of bounds, so there
should be no risk of panicking here.

However, `fw_start + fw_size` can panic in debug configuration if it
overflows. In a release build I believe it will just happily wrap, and
`get` should consequently return `None` at the invalid range... Although
we can also get unlucky and produce a valid, yet incorrect, one.

This is actually something I've been thinking about while writing this
series and could not really decide upon: how to deal with operands and
functions in Rust that can potentially panic. Using `checked` operands
everywhere is a bit tedious, and even with great care there is no way to
guarantee that no panic occurs in a given function.

Panics are a big no-no in the kernel, yet I don't feel like we have the
proper tools to ensure they do not happen.

User-space has some crates like `no_panic`, but even these feel more
like hacks than anything else. Something at the compiler level would be
nice.

Maybe that would be a good discussion topic for the Plumber
Microconference?

Re: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header

Posted by Miguel Ojeda 1 month ago

On Wed, Aug 27, 2025 at 10:47 AM Alexandre Courbot <acourbot@nvidia.com> wrote:
>
> However, `fw_start + fw_size` can panic in debug configuration if it
> overflows. In a release build I believe it will just happily wrap, and

In the kernel, it is a panic in the default configuration, not just a debug one.

We have debug assertions too -- and those are disabled by default, but
they are separate from the overflow checking, which is the one enabled
by default.

So, any use of those operators is limited to cases where one knows,
somehow, that it will not overflow. And e.g. user-controlled inputs
cannot use them at all.

So, conceptually, something like this:

  - Static assert if the compiler knows it cannot fail.
  - Build assert if the optimizer knows it cannot fail.
  - Unfallible (like the possibly panicking operators) if the
developer knows it cannot fail.
  - Fallible/wrapping/saturating/... if the developer isn't sure or it
simply cannot be known until runtime. User-derived inputs must use
this option (or rarely the unsafe one).
  - Unsafe if the developer knows it cannot fail and the other options
are not acceptable for some reason. Ideally paired with a debug
assertion (the compiler adds these already for many unsafe
preconditions).

In the past I requested upstream Rust a way to have a "third mode"
("report and continue") for the operators so that it would wrap (like
the non-panicking mode) but allowing us to add a customization point
so that we can e.g. `WARN_ON_ONCE`.

That would be ideal, because it is a nice middle ground that is not as
punishing if a developer gets it wrong, especially for "normal users",
where panics typically mean lost reports etc. And other kernel users,
like cloud operators, can keep the panicking mode.

As for discussing no-panic, sure!

Cheers,
Miguel

Re: Implicit panics (was: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header)

Posted by Alexandre Courbot 3 weeks, 2 days ago

Hi Miguel, sorry for the delay in replying!

On Thu Aug 28, 2025 at 8:26 PM JST, Miguel Ojeda wrote:
> On Wed, Aug 27, 2025 at 10:47 AM Alexandre Courbot <acourbot@nvidia.com> wrote:
>>
>> However, `fw_start + fw_size` can panic in debug configuration if it
>> overflows. In a release build I believe it will just happily wrap, and
>
> In the kernel, it is a panic in the default configuration, not just a debug one.
>
> We have debug assertions too -- and those are disabled by default, but
> they are separate from the overflow checking, which is the one enabled
> by default.
>
> So, any use of those operators is limited to cases where one knows,
> somehow, that it will not overflow. And e.g. user-controlled inputs
> cannot use them at all.
>
> So, conceptually, something like this:
>
>   - Static assert if the compiler knows it cannot fail.
>   - Build assert if the optimizer knows it cannot fail.
>   - Unfallible (like the possibly panicking operators) if the
> developer knows it cannot fail.
>   - Fallible/wrapping/saturating/... if the developer isn't sure or it
> simply cannot be known until runtime. User-derived inputs must use
> this option (or rarely the unsafe one).
>   - Unsafe if the developer knows it cannot fail and the other options
> are not acceptable for some reason. Ideally paired with a debug
> assertion (the compiler adds these already for many unsafe
> preconditions).
>
> In the past I requested upstream Rust a way to have a "third mode"
> ("report and continue") for the operators so that it would wrap (like
> the non-panicking mode) but allowing us to add a customization point
> so that we can e.g. `WARN_ON_ONCE`.

That would be nice, but also wouldn't cover all the cases where implicit
panics can happen, like out-of-bounds slice accesses - we can't have a
"report-and-continue" mode for these.

And that's really the elephant in the room IMHO: such panic sites can be
introduced implicitly, without the programmer realizing it, potentially
resulting in more runtime panics for Rust modules than one might expect
from a language whose main selling point is safety. I understand that
the previous sentence is a bit fallacious, since such panics indicate
bugs in the code that would likely go unnoticed in C (which is arguably
worse). But perception matters, and such crashes can be damaging to the
reputation of the project.

In user-space, crates like `no_panic` can provide a compile-time
guarantee that a given function cannot panic. I don't know how that
would translate to the kernel, but ideally we could have some support
from tooling (compiler and/or LSP?) to warn us of sites introduced in
the code. After all, since the compiler inserts these panic sites, it
should also be able to tell us where they are, allowing us to evaluate
(and hopefully remove) them before the code ships to users. Most of them
could then be eliminated by constraining inputs or using checked
variants.

I am not suggesting we should mandate that ALL Rust kernel code be
proven panic-free at compile time, however since I started writing
kernel code in Rust, I've often wished I had a simple way to check
whether my carefully-crafted function processing user-space data really
*is* panic-free.

> As for discussing no-panic, sure!

Writing a uC topic proposal for Plumbers right now. :)

Re: Implicit panics (was: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header)

Posted by Miguel Ojeda 3 weeks, 2 days ago

On Wed, Sep 10, 2025 at 7:45 AM Alexandre Courbot <acourbot@nvidia.com> wrote:
>
> That would be nice, but also wouldn't cover all the cases where implicit
> panics can happen, like out-of-bounds slice accesses - we can't have a
> "report-and-continue" mode for these.

In principle, it could cover OOBs (even if it is a bad idea).

> But perception matters, and such crashes can be damaging to the reputation of the project.

Yes, we are well aware -- we have had it in our wish list for upstream
Rust for a long time.

We are tackling these things as we go -- e.g. we solved the `alloc`
panics and the ball on the report-and-continue mode for overflows
started moving.

Part of Rust for Linux is about making Rust the best language for
kernel development it can be, after all, and so far upstream Rust has
been quite helpful on giving us the features we need -- we meet with
them every two weeks, please join if you have time!

(Side note: the "safety" that Rust "sells" isn't really about avoiding
panics, although obviously it would be a nice feature to have.)

> Writing a uC topic proposal for Plumbers right now. :)

I see it there, thanks! I can briefly mention the topic in Kangrejos,
since we will have Rust representation, including from the language
team.

I don't think the discussion should focus much on "Do we need this?"
but rather more on "What exactly do we want? Would we be OK with a
local solution? Do we need/want a global one? Would we be OK with LSP?
Would we be OK with no panics after optimizations, e.g. a link time
check? Or do we want full support in the language for guaranteed
non-panicking functions? Do we need exceptional carve-outs on such
checking for particular language constructs?" and so on. And, of
course, "Who has time to write an RFC and implement an experiment
upstream if an approach is decided".

Getting data on "in practice, how much of an issue it is on the Rust
side" would help too -- those with actual users running Rust kernel
code probably can tell us something.

What I would personally expect to happen is that, over time, we
understand better what are the worst cases we must tackle.

Cheers,
Miguel

Re: Implicit panics (was: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header)

Posted by Alexandre Courbot 3 weeks, 2 days ago

On Wed Sep 10, 2025 at 7:00 PM JST, Miguel Ojeda wrote:
> On Wed, Sep 10, 2025 at 7:45 AM Alexandre Courbot <acourbot@nvidia.com> wrote:
>>
>> That would be nice, but also wouldn't cover all the cases where implicit
>> panics can happen, like out-of-bounds slice accesses - we can't have a
>> "report-and-continue" mode for these.
>
> In principle, it could cover OOBs (even if it is a bad idea).
>
>> But perception matters, and such crashes can be damaging to the reputation of the project.
>
> Yes, we are well aware -- we have had it in our wish list for upstream
> Rust for a long time.
>
> We are tackling these things as we go -- e.g. we solved the `alloc`
> panics and the ball on the report-and-continue mode for overflows
> started moving.
>
> Part of Rust for Linux is about making Rust the best language for
> kernel development it can be, after all, and so far upstream Rust has
> been quite helpful on giving us the features we need -- we meet with
> them every two weeks, please join if you have time!
>
> (Side note: the "safety" that Rust "sells" isn't really about avoiding
> panics, although obviously it would be a nice feature to have.)

That's right, these panics are actually the last line of safety to
prevent a program for doing something damaging. It is just that the
consequences in a regular program are not as heavy as in the kernel.

The only two options are either allowing user-space to crash the kernel
through a module with a missing bound check, or letting it tamper with
data it is not supposed to access. While the first option is terrible,
the second one is unacceptable - so at the end of the day what we likely
want is to keep the panic behavior and limit these occurrences as much
as possible through information to the programmer.

Build errors on such panic sites insertions, with the option to relax
the rule locally if a justifying SAFETY comment is provided? And as you
said, what do we do if a panic can be removed through a particular
optimization - does it become mandatory to build the kernel? Is it
applicable to all architectures and (in the future) all supported
compilers?

I suspect it will take more than Plumbers to get to the bottom of this.
:)

>
>> Writing a uC topic proposal for Plumbers right now. :)
>
> I see it there, thanks! I can briefly mention the topic in Kangrejos,
> since we will have Rust representation, including from the language
> team.
>
> I don't think the discussion should focus much on "Do we need this?"
> but rather more on "What exactly do we want? Would we be OK with a
> local solution? Do we need/want a global one? Would we be OK with LSP?
> Would we be OK with no panics after optimizations, e.g. a link time
> check? Or do we want full support in the language for guaranteed
> non-panicking functions? Do we need exceptional carve-outs on such
> checking for particular language constructs?" and so on. And, of
> course, "Who has time to write an RFC and implement an experiment
> upstream if an approach is decided".
>
> Getting data on "in practice, how much of an issue it is on the Rust
> side" would help too -- those with actual users running Rust kernel
> code probably can tell us something.
>
> What I would personally expect to happen is that, over time, we
> understand better what are the worst cases we must tackle.

Thanks, these are great directions to explore. I see that some thinking
has already been done on this, do we have a bug or tracking issue so I
can catch up with the discussions that have already taken place?

Re: Implicit panics

Posted by John Hubbard 3 weeks, 1 day ago

On 9/10/25 6:54 AM, Alexandre Courbot wrote:
> On Wed Sep 10, 2025 at 7:00 PM JST, Miguel Ojeda wrote:
>> On Wed, Sep 10, 2025 at 7:45 AM Alexandre Courbot <acourbot@nvidia.com> wrote:
...> Build errors on such panic sites insertions, with the option to relax
> the rule locally if a justifying SAFETY comment is provided? And as you
> said, what do we do if a panic can be removed through a particular
> optimization - does it become mandatory to build the kernel? Is it
> applicable to all architectures and (in the future) all supported
> compilers?

We have a very solid precedent for that: building the kernel already
requires -O2 optimization for *functional* correctness.

So Rust for Linux certainly can specify a certain set of build
options that are required--including options that are normally
considered optimizations.

thanks,
-- 
John Hubbard

Re: Implicit panics (was: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header)

Posted by Miguel Ojeda 3 weeks, 1 day ago

On Wed, Sep 10, 2025 at 3:55 PM Alexandre Courbot <acourbot@nvidia.com> wrote:
>
> The only two options are either allowing user-space to crash the kernel
> through a module with a missing bound check, or letting it tamper with

So we are definitely not aiming to allow that on purpose, i.e. I know
you said "missing", but just to clarify: they would of course be
considered a bug, just like hitting similar facilities in C code.

(In general, if we get into the situation where we can actually crash
the kernel from userspace, that is a CVE with the current rules
(modulo exceptional cases, e.g. the recent discussion about debugging
tools, for instance), regardless of whether it was due to a Rust panic
or not.)

The Rust panic is, as you say, the last line of defense, attempting to
limit the damage. I think that is worth it for at least some users
that want it (like cloud providers) and some parts of the kernel.
Sometimes it can be worse, though, i.e. perhaps the OOB or the
overflow was not actually a "big issue" "in practice" for a particular
case.

I think it will likely depend on the kind of code we are talking about
-- not all panics are the same, and not all code is the same. For
random modules, for instance, yes, we should definitely prevent
developers from writing panics as much as possible (or not at all);
the same way we try to prevent them from writing unsafe code -- after
all, panicking operations and unsafe code are both an statement that
we are sure "something cannot happen".

So the more we prevent APIs that do not need to rely on panicking (nor
unsafe code) on drivers etc., the better.

I also think forcing to have no panics at all (i.e. globally) would
force us to handle way more truly impossible cases than C does
(unsafely so), which isn't great and has downsides too (increased
complexity, for one, which can also lead to more bugs). And we don't
want to go memory unsafe for all those either (I mean, it could be an
option, but I wouldn't recommend it, and it would still be a bug if
hit). So for certain cases panicking may be a reasonable option in
practice -- that is where I wanted us to get more experience to know
how we fare vs. C here and thus the data request.

It will also depend on what upstream Rust can give us (more on that
below). For instance, having an enforced way with carve-outs that need
to be annotated would at least make it stand out to reviewers and make
developers think thrice.

> Thanks, these are great directions to explore. I see that some thinking
> has already been done on this, do we have a bug or tracking issue so I
> can catch up with the discussions that have already taken place?

In general, I do my best to track this kind of thing in the issues
linked from issue #2 ("Rust features" in this case:
https://github.com/Rust-for-Linux/linux/issues/354), and you can see
some pointers for the overflow part. For no panics there is also an
entry for a long time, but I don't think there has been any "deep"
discussion on it -- I mean between Rust and the kernel (there are of
course discussions in upstream Rust, and there also have been in our
side in the mailing list). I just added a couple recent pointers, I
can look for more.

In that meeting I mentioned with them, I raised this a month or two
ago, and I did so today too, and mentioned that you wanted to discuss
it in LPC. If you (or e.g. someone from NVIDIA) have time to work on
this, including in upstream Rust (e.g. writing an RFC, implementing an
experiment...), then please let me know. They are open to the
possibility of having a Rust project goal for the next round on no
panics (2026H1), so it is a good chance to move things forward. We had
today a discussion on potential forms it could take (an attribute on a
function, guaranteed or not, escape hatches or not, lints, an enforced
comment, Klint, the report-and-continue feature...) and related
ongoing efforts (having a `nopanic` effect, `target_feature(enable
=`...).

I hope that helps, and thanks!

Cheers,
Miguel

Re: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header

Posted by John Hubbard 1 month ago

On 8/27/25 1:47 AM, Alexandre Courbot wrote:
> On Wed Aug 27, 2025 at 10:34 AM JST, John Hubbard wrote:
> <snip>
>>> +    /// Returns the data payload of the firmware, or `None` if the data range is out of bounds of
>>> +    /// the firmware image.
>>> +    fn data(&self) -> Option<&[u8]> {
>>> +        let fw_start = self.hdr.data_offset as usize;
>>> +        let fw_size = self.hdr.data_size as usize;
>>> +
>>> +        self.fw.get(fw_start..fw_start + fw_size)
>>
>> This worries me a bit, because we never checked that these bounds
>> are reasonable: within the range of the firmware, and not overflowing
>> (.checked_add() for example), that sort of thing.
>>
>> Thoughts?
> 
> `get` returns `None` if the requested slice is out of bounds, so there
> should be no risk of panicking here.

I was wondering about the bounds themselves, though. Couldn't they
be wrong? (Do we care?)

> 
> However, `fw_start + fw_size` can panic in debug configuration if it
> overflows. In a release build I believe it will just happily wrap, and
> `get` should consequently return `None` at the invalid range... Although
> we can also get unlucky and produce a valid, yet incorrect, one.
> 
> This is actually something I've been thinking about while writing this
> series and could not really decide upon: how to deal with operands and
> functions in Rust that can potentially panic. Using `checked` operands
> everywhere is a bit tedious, and even with great care there is no way to
> guarantee that no panic occurs in a given function.

Yes, .checked_add() all over the place is just awful, would like to 
avoid that for sure.

> 
> Panics are a big no-no in the kernel, yet I don't feel like we have the
> proper tools to ensure they do not happen.
> 
> User-space has some crates like `no_panic`, but even these feel more
> like hacks than anything else. Something at the compiler level would be
> nice.
> 
> Maybe that would be a good discussion topic for the Plumber
> Microconference?

Yes. And maybe even for Kangrejos.

thanks,
-- 
John Hubbard

Re: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header

Posted by Alexandre Courbot 1 month ago

On Thu Aug 28, 2025 at 6:50 AM JST, John Hubbard wrote:
> On 8/27/25 1:47 AM, Alexandre Courbot wrote:
>> On Wed Aug 27, 2025 at 10:34 AM JST, John Hubbard wrote:
>> <snip>
>>>> +    /// Returns the data payload of the firmware, or `None` if the data range is out of bounds of
>>>> +    /// the firmware image.
>>>> +    fn data(&self) -> Option<&[u8]> {
>>>> +        let fw_start = self.hdr.data_offset as usize;
>>>> +        let fw_size = self.hdr.data_size as usize;
>>>> +
>>>> +        self.fw.get(fw_start..fw_start + fw_size)
>>>
>>> This worries me a bit, because we never checked that these bounds
>>> are reasonable: within the range of the firmware, and not overflowing
>>> (.checked_add() for example), that sort of thing.
>>>
>>> Thoughts?
>> 
>> `get` returns `None` if the requested slice is out of bounds, so there
>> should be no risk of panicking here.
>
> I was wondering about the bounds themselves, though. Couldn't they
> be wrong? (Do we care?)

Not sure what you mean by wrong bounds here? Do you mean what if the
header data is incorrect?

Re: [PATCH v2 2/8] gpu: nova-core: firmware: add support for common firmware header

Posted by John Hubbard 1 month ago

On 8/28/25 12:08 AM, Alexandre Courbot wrote:
...
>>>> This worries me a bit, because we never checked that these bounds
>>>> are reasonable: within the range of the firmware, and not overflowing
>>>> (.checked_add() for example), that sort of thing.
>>>>
>>>> Thoughts?
>>>
>>> `get` returns `None` if the requested slice is out of bounds, so there
>>> should be no risk of panicking here.
>>
>> I was wondering about the bounds themselves, though. Couldn't they
>> be wrong? (Do we care?)
> 
> Not sure what you mean by wrong bounds here? Do you mean what if the
> header data is incorrect?

Yes, that's what I meant. And I'm mainly trying to get some perspective
about what kinds of checking we should be doing.

In this case, it seems that we don't actually need anything more than
what you already have, so we're all good here.


thanks,
-- 
John Hubbard