[v2] gpu: nova-core: process and prepare more firmwares to boot GSP

[PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware

Posted by Alexandre Courbot 1 month, 1 week ago

The GSP firmware is a binary blob that is verified, loaded, and run by
the GSP bootloader. Its presentation is a bit peculiar as the GSP
bootloader expects to be given a DMA address to a 3-levels page table
mapping the GSP firmware at address 0 of its own address space.

Prepare such a structure containing the DMA-mapped firmware as well as
the DMA-mapped page tables, and a way to obtain the DMA handle of the
level 0 page table.

As we are performing the required ELF section parsing and radix3 page
table building, remove these items from the TODO file.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 Documentation/gpu/nova/core/todo.rst  |  17 -----
 drivers/gpu/nova-core/firmware.rs     | 110 +++++++++++++++++++++++++++++++-
 drivers/gpu/nova-core/firmware/gsp.rs | 117 ++++++++++++++++++++++++++++++++++
 drivers/gpu/nova-core/gsp.rs          |   4 ++
 drivers/gpu/nova-core/nova_core.rs    |   1 +
 5 files changed, 229 insertions(+), 20 deletions(-)

diff --git a/Documentation/gpu/nova/core/todo.rst b/Documentation/gpu/nova/core/todo.rst
index 89431fec9041b1f35cc55799c91f48dc6bc918eb..0972cb905f7ae64dfbaef4808276757319009e9c 100644
--- a/Documentation/gpu/nova/core/todo.rst
+++ b/Documentation/gpu/nova/core/todo.rst
@@ -229,23 +229,6 @@ Rust abstraction for debugfs APIs.
 GPU (general)
 =============
 
-Parse firmware headers
-----------------------
-
-Parse ELF headers from the firmware files loaded from the filesystem.
-
-| Reference: ELF utils
-| Complexity: Beginner
-| Contact: Abdiel Janulgue
-
-Build radix3 page table
------------------------
-
-Build the radix3 page table to map the firmware.
-
-| Complexity: Intermediate
-| Contact: Abdiel Janulgue
-
 Initial Devinit support
 -----------------------
 
diff --git a/drivers/gpu/nova-core/firmware.rs b/drivers/gpu/nova-core/firmware.rs
index 9bee0e0a0ab99d10be7e56d366970fdf4c813fc4..fb751287e938e6a323db185ff8c4ba2781d25285 100644
--- a/drivers/gpu/nova-core/firmware.rs
+++ b/drivers/gpu/nova-core/firmware.rs
@@ -7,6 +7,7 @@
 use core::mem::size_of;
 
 use booter::BooterFirmware;
+use gsp::GspFirmware;
 use kernel::device;
 use kernel::firmware;
 use kernel::prelude::*;
@@ -19,14 +20,100 @@
 use crate::falcon::FalconFirmware;
 use crate::falcon::{sec2::Sec2, Falcon};
 use crate::gpu;
-use crate::gpu::Chipset;
+use crate::gpu::{Architecture, Chipset};
 
 pub(crate) mod booter;
 pub(crate) mod fwsec;
+pub(crate) mod gsp;
 pub(crate) mod riscv;
 
 pub(crate) const FIRMWARE_VERSION: &str = "535.113.01";
 
+/// Ad-hoc and temporary module to extract sections from ELF images.
+///
+/// Some firmware images are currently packaged as ELF files, where sections names are used as keys
+/// to specific and related bits of data. Future firmware versions are scheduled to move away from
+/// that scheme before nova-core becomes stable, which means this module will eventually be
+/// removed.
+mod elf {
+    use kernel::bindings;
+    use kernel::str::CStr;
+    use kernel::transmute::FromBytes;
+
+    /// Newtype to provide a [`FromBytes`] implementation.
+    #[repr(transparent)]
+    struct Elf64Hdr(bindings::elf64_hdr);
+
+    // SAFETY: all bit patterns are valid for this type, and it doesn't use interior mutability.
+    unsafe impl FromBytes for Elf64Hdr {}
+
+    /// Tries to extract section with name `name` from the ELF64 image `elf`, and returns it.
+    pub(super) fn elf64_section<'a, 'b>(elf: &'a [u8], name: &'b str) -> Option<&'a [u8]> {
+        let hdr = &elf
+            .get(0..size_of::<bindings::elf64_hdr>())
+            .and_then(Elf64Hdr::from_bytes)?
+            .0;
+
+        // Get all the section headers.
+        let shdr = {
+            let shdr_num = usize::from(hdr.e_shnum);
+            let shdr_start = usize::try_from(hdr.e_shoff).ok()?;
+            let shdr_end = shdr_num
+                .checked_mul(size_of::<bindings::elf64_shdr>())
+                .and_then(|v| v.checked_add(shdr_start))?;
+
+            elf.get(shdr_start..shdr_end)
+                .map(|slice| slice.as_ptr())
+                .filter(|ptr| ptr.align_offset(align_of::<bindings::elf64_shdr>()) == 0)
+                // `FromBytes::from_bytes` does not support slices yet, so build it manually.
+                //
+                // SAFETY:
+                // * `get` guarantees that the slice is within the bounds of `elf` and of size
+                //   `elf64_shdr * shdr_num`.
+                // * We checked that `ptr` had the correct alignment for `elf64_shdr`.
+                .map(|ptr| unsafe {
+                    core::slice::from_raw_parts(ptr.cast::<bindings::elf64_shdr>(), shdr_num)
+                })?
+        };
+
+        // Get the strings table.
+        let strhdr = shdr.get(usize::from(hdr.e_shstrndx))?;
+
+        // Find the section which name matches `name` and return it.
+        shdr.iter()
+            .find(|sh| {
+                let Some(name_idx) = strhdr
+                    .sh_offset
+                    .checked_add(u64::from(sh.sh_name))
+                    .and_then(|idx| usize::try_from(idx).ok())
+                else {
+                    return false;
+                };
+
+                // Get the start of the name.
+                elf.get(name_idx..)
+                    // Stop at the first `0`.
+                    .and_then(|nstr| nstr.get(0..=nstr.iter().position(|b| *b == 0)?))
+                    // Convert into CStr. This should never fail because of the line above.
+                    .and_then(|nstr| CStr::from_bytes_with_nul(nstr).ok())
+                    // Convert into str.
+                    .and_then(|c_str| c_str.to_str().ok())
+                    // Check that the name matches.
+                    .map(|str| str == name)
+                    .unwrap_or(false)
+            })
+            // Return the slice containing the section.
+            .and_then(|sh| {
+                let start = usize::try_from(sh.sh_offset).ok()?;
+                let end = usize::try_from(sh.sh_size)
+                    .ok()
+                    .and_then(|sh_size| start.checked_add(sh_size))?;
+
+                elf.get(start..end)
+            })
+    }
+}
+
 /// Structure encapsulating the firmware blobs required for the GPU to operate.
 #[expect(dead_code)]
 pub(crate) struct Firmware {
@@ -36,7 +123,10 @@ pub(crate) struct Firmware {
     booter_unloader: BooterFirmware,
     /// GSP bootloader, verifies the GSP firmware before loading and running it.
     gsp_bootloader: RiscvFirmware,
-    gsp: firmware::Firmware,
+    /// GSP firmware.
+    gsp: Pin<KBox<GspFirmware>>,
+    /// GSP signatures, to be passed as parameter to the bootloader for validation.
+    gsp_sigs: DmaObject,
 }
 
 impl Firmware {
@@ -56,13 +146,27 @@ pub(crate) fn new(
                 .and_then(|path| firmware::Firmware::request(&path, dev))
         };
 
+        let gsp_fw = request("gsp")?;
+        let gsp = elf::elf64_section(gsp_fw.data(), ".fwimage")
+            .ok_or(EINVAL)
+            .map(|data| GspFirmware::new(dev, data))?;
+
+        let gsp_sigs_section = match chipset.arch() {
+            Architecture::Ampere => ".fwsignature_ga10x",
+            _ => return Err(ENOTSUPP),
+        };
+        let gsp_sigs = elf::elf64_section(gsp_fw.data(), gsp_sigs_section)
+            .ok_or(EINVAL)
+            .and_then(|data| DmaObject::from_data(dev, data))?;
+
         Ok(Firmware {
             booter_loader: request("booter_load")
                 .and_then(|fw| BooterFirmware::new(dev, &fw, sec2, bar))?,
             booter_unloader: request("booter_unload")
                 .and_then(|fw| BooterFirmware::new(dev, &fw, sec2, bar))?,
             gsp_bootloader: request("bootloader").and_then(|fw| RiscvFirmware::new(dev, &fw))?,
-            gsp: request("gsp")?,
+            gsp: KBox::pin_init(gsp, GFP_KERNEL)?,
+            gsp_sigs,
         })
     }
 }
diff --git a/drivers/gpu/nova-core/firmware/gsp.rs b/drivers/gpu/nova-core/firmware/gsp.rs
new file mode 100644
index 0000000000000000000000000000000000000000..f37bd619bfb71629ed86ee8b7828971bbe4c5916
--- /dev/null
+++ b/drivers/gpu/nova-core/firmware/gsp.rs
@@ -0,0 +1,117 @@
+// SPDX-License-Identifier: GPL-2.0
+
+use kernel::device;
+use kernel::dma::DataDirection;
+use kernel::dma::DmaAddress;
+use kernel::prelude::*;
+use kernel::scatterlist::Owned;
+use kernel::scatterlist::SGTable;
+
+use crate::dma::DmaObject;
+use crate::gsp::GSP_PAGE_SIZE;
+
+/// A device-mapped firmware with a set of (also device-mapped) pages tables mapping the firmware
+/// to the start of their own address space, also known as a `Radix3` firmware.
+#[pin_data]
+pub(crate) struct GspFirmware {
+    /// The GSP firmware inside a [`VVec`], device-mapped via a SG table.
+    #[pin]
+    fw: SGTable<Owned<VVec<u8>>>,
+    /// The level 2 page table, mapping [`Self::fw`] at its beginning.
+    #[pin]
+    lvl2: SGTable<Owned<VVec<u8>>>,
+    /// The level 1 page table, mapping [`Self::lvl2`] at its beginning.
+    #[pin]
+    lvl1: SGTable<Owned<VVec<u8>>>,
+    /// The level 0 page table, mapping [`Self::lvl1`] at its beginning.
+    lvl0: DmaObject,
+    /// Size in bytes of the firmware contained in [`Self::fw`].
+    pub size: usize,
+}
+
+impl GspFirmware {
+    /// Maps the GSP firmware image `fw` into `dev`'s address-space, and creates the page tables
+    /// expected by the GSP bootloader to load it.
+    pub(crate) fn new<'a>(
+        dev: &'a device::Device<device::Bound>,
+        fw: &'a [u8],
+    ) -> impl PinInit<Self, Error> + 'a {
+        try_pin_init!(&this in Self {
+            fw <- {
+                // Move the firmware into a vmalloc'd vector and map it into the device address
+                // space.
+                VVec::with_capacity(fw.len(), GFP_KERNEL)
+                .and_then(|mut v| {
+                    v.extend_from_slice(fw, GFP_KERNEL)?;
+                    Ok(v)
+                })
+                .map_err(|_| ENOMEM)
+                .map(|v| SGTable::new(dev, v, DataDirection::ToDevice, GFP_KERNEL))?
+            },
+            lvl2 <- {
+                // Allocate the level 2 page table, map the firmware onto it, and map it into the
+                // device address space.
+                // SAFETY: `this` is a valid pointer, and `fw` has been initialized.
+                let fw_sg_table = unsafe { &(*this.as_ptr()).fw };
+                VVec::<u8>::with_capacity(
+                    fw_sg_table.iter().count() * core::mem::size_of::<u64>(),
+                    GFP_KERNEL,
+                )
+                .map_err(|_| ENOMEM)
+                .and_then(|lvl2| map_into_lvl(fw_sg_table, lvl2))
+                .map(|lvl2| SGTable::new(dev, lvl2, DataDirection::ToDevice, GFP_KERNEL))?
+            },
+            lvl1 <- {
+                // Allocate the level 1 page table, map the level 2 page table onto it, and map it
+                // into the device address space.
+                // SAFETY: `this` is a valid pointer, and `lvl2` has been initialized.
+                let lvl2_sg_table = unsafe { &(*this.as_ptr()).lvl2 };
+                VVec::<u8>::with_capacity(
+                    lvl2_sg_table.iter().count() * core::mem::size_of::<u64>(),
+                    GFP_KERNEL,
+                )
+                .map_err(|_| ENOMEM)
+                .and_then(|lvl1| map_into_lvl(lvl2_sg_table, lvl1))
+                .map(|lvl1| SGTable::new(dev, lvl1, DataDirection::ToDevice, GFP_KERNEL))?
+            },
+            lvl0: {
+                // Allocate the level 0 page table as a device-visible DMA object, and map the
+                // level 1 page table onto it.
+                // SAFETY: `this` is a valid pointer, and `lvl1` has been initialized.
+                let lvl1_sg_table = unsafe { &(*this.as_ptr()).lvl1 };
+                let mut lvl0 = DmaObject::new(dev, GSP_PAGE_SIZE)?;
+                // SAFETY: we are the only owner of this newly-created object, making races
+                // impossible.
+                let lvl0_slice = unsafe { lvl0.as_slice_mut(0, GSP_PAGE_SIZE) }?;
+                lvl0_slice[0..core::mem::size_of::<u64>()].copy_from_slice(
+                    #[allow(clippy::useless_conversion)]
+                    &(u64::from(lvl1_sg_table.iter().next().unwrap().dma_address())).to_le_bytes(),
+                );
+
+                lvl0
+            },
+            size: fw.len(),
+        })
+    }
+
+    #[expect(unused)]
+    /// Returns the DMA handle of the level 0 page table.
+    pub(crate) fn lvl0_dma_handle(&self) -> DmaAddress {
+        self.lvl0.dma_handle()
+    }
+}
+
+/// Create a linear mapping the device mapping of the buffer described by `sg_table` into `dst`.
+fn map_into_lvl(sg_table: &SGTable<Owned<VVec<u8>>>, mut dst: VVec<u8>) -> Result<VVec<u8>> {
+    for sg_entry in sg_table.iter() {
+        // Number of pages we need to map.
+        let num_pages = (sg_entry.dma_len() as usize).div_ceil(GSP_PAGE_SIZE);
+
+        for i in 0..num_pages {
+            let entry = sg_entry.dma_address() + (i as u64 * GSP_PAGE_SIZE as u64);
+            dst.extend_from_slice(&entry.to_le_bytes(), GFP_KERNEL)?;
+        }
+    }
+
+    Ok(dst)
+}
diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
new file mode 100644
index 0000000000000000000000000000000000000000..a0e7ec5f6c9c959d57540b3ebf4b782f2e002b08
--- /dev/null
+++ b/drivers/gpu/nova-core/gsp.rs
@@ -0,0 +1,4 @@
+// SPDX-License-Identifier: GPL-2.0
+
+pub(crate) const GSP_PAGE_SHIFT: usize = 12;
+pub(crate) const GSP_PAGE_SIZE: usize = 1 << GSP_PAGE_SHIFT;
diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
index cb2bbb30cba142265b354c9acf70349a6e40759e..fffcaee2249fe6cd7f55a7291c1e44be42e791d9 100644
--- a/drivers/gpu/nova-core/nova_core.rs
+++ b/drivers/gpu/nova-core/nova_core.rs
@@ -9,6 +9,7 @@
 mod firmware;
 mod gfw;
 mod gpu;
+mod gsp;
 mod regs;
 mod util;
 mod vbios;

-- 
2.50.1

Re: [PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware

Posted by Danilo Krummrich 1 month ago

On 8/26/25 6:07 AM, Alexandre Courbot wrote:
>   /// Structure encapsulating the firmware blobs required for the GPU to operate.
>   #[expect(dead_code)]
>   pub(crate) struct Firmware {
> @@ -36,7 +123,10 @@ pub(crate) struct Firmware {
>       booter_unloader: BooterFirmware,
>       /// GSP bootloader, verifies the GSP firmware before loading and running it.
>       gsp_bootloader: RiscvFirmware,
> -    gsp: firmware::Firmware,
> +    /// GSP firmware.
> +    gsp: Pin<KBox<GspFirmware>>,

Is there a reason why we don't just propagate it through struct Gpu, which uses 
pin-init already?

You can make Firmware pin_data too and then everything is within the single 
allocation of struct Gpu.

Re: [PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware

Posted by Alexandre Courbot 1 month ago

On Thu Aug 28, 2025 at 8:27 PM JST, Danilo Krummrich wrote:
> On 8/26/25 6:07 AM, Alexandre Courbot wrote:
>>   /// Structure encapsulating the firmware blobs required for the GPU to operate.
>>   #[expect(dead_code)]
>>   pub(crate) struct Firmware {
>> @@ -36,7 +123,10 @@ pub(crate) struct Firmware {
>>       booter_unloader: BooterFirmware,
>>       /// GSP bootloader, verifies the GSP firmware before loading and running it.
>>       gsp_bootloader: RiscvFirmware,
>> -    gsp: firmware::Firmware,
>> +    /// GSP firmware.
>> +    gsp: Pin<KBox<GspFirmware>>,
>
> Is there a reason why we don't just propagate it through struct Gpu, which uses 
> pin-init already?
>
> You can make Firmware pin_data too and then everything is within the single 
> allocation of struct Gpu.

I tried doing that at first, and hit the problem that the `impl PinInit`
returned by `GspFirmware::new` borrows a reference to the GSP firmware
binary loaded by `Firmware::new` - when `Firmware::new` returns, the
firmware gets freed, and the borrow checker complains.

We could move the GSP firmware loading code into the `pin_init!` of
`Firmware::new`, but now we hit another problem: in `Gpu::new` the
following code is executed:

    FbLayout::new(chipset, bar, &fw.gsp_bootloader, &fw.gsp)?

which requires the `Firmware` instance, which doesn't exist yet as the
`Gpu` object isn't initialized until the end of the method.

So we could move `FbLayout`, and everything else created by `Gpu::new`
to become members of the `Gpu` instance. It does make sense actually:
this `new` method is doing a lot of stuff, such as running FRTS, and
with Alistair's series it even runs Booter, the sequencer and so on.
Maybe we should move all firmware execution to a separate method that is
called by `probe` after the `Gpu` is constructed, as right now the `Gpu`
constructor looks like it does a bit more than it should.

... but even when doing that, `Firmware::new` and `FbLayout::new` still
require a reference to the `Bar`, and... you get the idea. :)

So I don't think the current design allows us to do that easily or at
all, and even if it does, it will be at a significant cost in code
clarity. There is also the fact that I am considering making the
firmware member of `Gpu` a trait object: the boot sequence is so
different between pre and post-Hopper that I don't think it makes sense
to share the same `Firmware` structure between the two. I would rather
see `Firmware` as an opaque trait object, which provides high-level
methods such as "start GSP" behind which the specifics of each GPU
family are hidden. If we go with this design, `Firmware` will become a
trait object and so cannot be pinned into `Gpu`.

This doesn't change my observation that `Gpu::new` should not IMHO do
steps like booting the GSP - it should just acquire the resources it
needs, return the pinned GPU object, and then `probe` can continue the
boot sequence. Having the GPU object pinned and constructed early
simplifies things quite a bit as the more we progress with boot, the
harder it becomes to construct everything in place (and the `PinInit`
closure also becomes more and more complex).

I'm still laying down the general design, but I'm pretty convinced that
having `Firmware` as a trait object is the right way to abstract the
differences between GPU families.

Re: [PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware

Posted by Danilo Krummrich 1 month ago

On 8/29/25 1:16 PM, Alexandre Courbot wrote:
> On Thu Aug 28, 2025 at 8:27 PM JST, Danilo Krummrich wrote:
>> On 8/26/25 6:07 AM, Alexandre Courbot wrote:
>>>    /// Structure encapsulating the firmware blobs required for the GPU to operate.
>>>    #[expect(dead_code)]
>>>    pub(crate) struct Firmware {
>>> @@ -36,7 +123,10 @@ pub(crate) struct Firmware {
>>>        booter_unloader: BooterFirmware,
>>>        /// GSP bootloader, verifies the GSP firmware before loading and running it.
>>>        gsp_bootloader: RiscvFirmware,
>>> -    gsp: firmware::Firmware,
>>> +    /// GSP firmware.
>>> +    gsp: Pin<KBox<GspFirmware>>,
>>
>> Is there a reason why we don't just propagate it through struct Gpu, which uses
>> pin-init already?
>>
>> You can make Firmware pin_data too and then everything is within the single
>> allocation of struct Gpu.

Thanks a lot for the write-up below!

> I tried doing that at first, and hit the problem that the `impl PinInit`
> returned by `GspFirmware::new` borrows a reference to the GSP firmware
> binary loaded by `Firmware::new` - when `Firmware::new` returns, the
> firmware gets freed, and the borrow checker complains.
> 
> We could move the GSP firmware loading code into the `pin_init!` of
> `Firmware::new`, but now we hit another problem: in `Gpu::new` the
> following code is executed:
> 
>      FbLayout::new(chipset, bar, &fw.gsp_bootloader, &fw.gsp)?
> 
> which requires the `Firmware` instance, which doesn't exist yet as the
> `Gpu` object isn't initialized until the end of the method.
> 
> So we could move `FbLayout`, and everything else created by `Gpu::new`
> to become members of the `Gpu` instance. It does make sense actually:
> this `new` method is doing a lot of stuff, such as running FRTS, and
> with Alistair's series it even runs Booter, the sequencer and so on.
> Maybe we should move all firmware execution to a separate method that is
> called by `probe` after the `Gpu` is constructed, as right now the `Gpu`
> constructor looks like it does a bit more than it should.

Absolutely, executing the firmware should be a separate method. Having it in the
constructor makes things more difficult.
> ... but even when doing that, `Firmware::new` and `FbLayout::new` still
> require a reference to the `Bar`, and... you get the idea. :)

Lifetime wise this should be fine, the &Bar out-lives the constructor, since
it's lifetime is bound to the &Device<Bound> which lives for the entire duration
of probe().
> So I don't think the current design allows us to do that easily or at
> all, and even if it does, it will be at a significant cost in code
> clarity. There is also the fact that I am considering making the
> firmware member of `Gpu` a trait object: the boot sequence is so
> different between pre and post-Hopper that I don't think it makes sense
> to share the same `Firmware` structure between the two. I would rather
> see `Firmware` as an opaque trait object, which provides high-level
> methods such as "start GSP" behind which the specifics of each GPU
> family are hidden. If we go with this design, `Firmware` will become a
> trait object and so cannot be pinned into `Gpu`.
> 
> This doesn't change my observation that `Gpu::new` should not IMHO do
> steps like booting the GSP - it should just acquire the resources it
> needs, return the pinned GPU object, and then `probe` can continue the
> boot sequence. Having the GPU object pinned and constructed early
> simplifies things quite a bit as the more we progress with boot, the
> harder it becomes to construct everything in place (and the `PinInit`
> closure also becomes more and more complex).
> 
> I'm still laying down the general design, but I'm pretty convinced that
> having `Firmware` as a trait object is the right way to abstract the
> differences between GPU families.

Makes sense, it's fine with me to keep this in its separate allocation for the
purpose of making Firmware an opaque trait object, which sounds reasonable.

But we should really properly separate construction of the GPU structure from
firmware boot code execution as you say. And actually move the construction of
the GPU object into try_pin_init!().

Re: [PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware

Posted by Alexandre Courbot 1 month ago

On Sat Aug 30, 2025 at 9:56 PM JST, Danilo Krummrich wrote:
> On 8/29/25 1:16 PM, Alexandre Courbot wrote:
>> On Thu Aug 28, 2025 at 8:27 PM JST, Danilo Krummrich wrote:
>>> On 8/26/25 6:07 AM, Alexandre Courbot wrote:
>>>>    /// Structure encapsulating the firmware blobs required for the GPU to operate.
>>>>    #[expect(dead_code)]
>>>>    pub(crate) struct Firmware {
>>>> @@ -36,7 +123,10 @@ pub(crate) struct Firmware {
>>>>        booter_unloader: BooterFirmware,
>>>>        /// GSP bootloader, verifies the GSP firmware before loading and running it.
>>>>        gsp_bootloader: RiscvFirmware,
>>>> -    gsp: firmware::Firmware,
>>>> +    /// GSP firmware.
>>>> +    gsp: Pin<KBox<GspFirmware>>,
>>>
>>> Is there a reason why we don't just propagate it through struct Gpu, which uses
>>> pin-init already?
>>>
>>> You can make Firmware pin_data too and then everything is within the single
>>> allocation of struct Gpu.
>
> Thanks a lot for the write-up below!
>
>> I tried doing that at first, and hit the problem that the `impl PinInit`
>> returned by `GspFirmware::new` borrows a reference to the GSP firmware
>> binary loaded by `Firmware::new` - when `Firmware::new` returns, the
>> firmware gets freed, and the borrow checker complains.
>> 
>> We could move the GSP firmware loading code into the `pin_init!` of
>> `Firmware::new`, but now we hit another problem: in `Gpu::new` the
>> following code is executed:
>> 
>>      FbLayout::new(chipset, bar, &fw.gsp_bootloader, &fw.gsp)?
>> 
>> which requires the `Firmware` instance, which doesn't exist yet as the
>> `Gpu` object isn't initialized until the end of the method.
>> 
>> So we could move `FbLayout`, and everything else created by `Gpu::new`
>> to become members of the `Gpu` instance. It does make sense actually:
>> this `new` method is doing a lot of stuff, such as running FRTS, and
>> with Alistair's series it even runs Booter, the sequencer and so on.
>> Maybe we should move all firmware execution to a separate method that is
>> called by `probe` after the `Gpu` is constructed, as right now the `Gpu`
>> constructor looks like it does a bit more than it should.
>
> Absolutely, executing the firmware should be a separate method. Having it in the
> constructor makes things more difficult.
>> ... but even when doing that, `Firmware::new` and `FbLayout::new` still
>> require a reference to the `Bar`, and... you get the idea. :)
>
> Lifetime wise this should be fine, the &Bar out-lives the constructor, since
> it's lifetime is bound to the &Device<Bound> which lives for the entire duration
> of probe().

The &Bar is actually obtained inside the constructor (which is passed
the `Arc<Devres<Bar0>>`), so I don't think that would even work without
more clever tricks. :)

>> So I don't think the current design allows us to do that easily or at
>> all, and even if it does, it will be at a significant cost in code
>> clarity. There is also the fact that I am considering making the
>> firmware member of `Gpu` a trait object: the boot sequence is so
>> different between pre and post-Hopper that I don't think it makes sense
>> to share the same `Firmware` structure between the two. I would rather
>> see `Firmware` as an opaque trait object, which provides high-level
>> methods such as "start GSP" behind which the specifics of each GPU
>> family are hidden. If we go with this design, `Firmware` will become a
>> trait object and so cannot be pinned into `Gpu`.
>> 
>> This doesn't change my observation that `Gpu::new` should not IMHO do
>> steps like booting the GSP - it should just acquire the resources it
>> needs, return the pinned GPU object, and then `probe` can continue the
>> boot sequence. Having the GPU object pinned and constructed early
>> simplifies things quite a bit as the more we progress with boot, the
>> harder it becomes to construct everything in place (and the `PinInit`
>> closure also becomes more and more complex).
>> 
>> I'm still laying down the general design, but I'm pretty convinced that
>> having `Firmware` as a trait object is the right way to abstract the
>> differences between GPU families.
>
> Makes sense, it's fine with me to keep this in its separate allocation for the
> purpose of making Firmware an opaque trait object, which sounds reasonable.
>
> But we should really properly separate construction of the GPU structure from
> firmware boot code execution as you say. And actually move the construction of
> the GPU object into try_pin_init!().

Yes, and I'm glad you agree with this idea as the current design of
putting everything inside the GPU is a bit limiting.

Regarding the firmware, I also think this should undergo a redesign -
right now we are putting unrelated stuff inside the `Firmware`
structure, and this won't scale well when we start supporting Hopper+
which use a very different way to start the GSP.

I'll give more details in my review of Alistair's series, and hopefully
send a v3 with some of these ideas implemented soon.

Re: [PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware

Posted by John Hubbard 1 month ago

On 8/25/25 9:07 PM, Alexandre Courbot wrote:
> The GSP firmware is a binary blob that is verified, loaded, and run by
> the GSP bootloader. Its presentation is a bit peculiar as the GSP
> bootloader expects to be given a DMA address to a 3-levels page table
> mapping the GSP firmware at address 0 of its own address space.
> 
> Prepare such a structure containing the DMA-mapped firmware as well as
> the DMA-mapped page tables, and a way to obtain the DMA handle of the
> level 0 page table.
> 
> As we are performing the required ELF section parsing and radix3 page
> table building, remove these items from the TODO file.
> 
> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
> ---
>  Documentation/gpu/nova/core/todo.rst  |  17 -----
>  drivers/gpu/nova-core/firmware.rs     | 110 +++++++++++++++++++++++++++++++-
>  drivers/gpu/nova-core/firmware/gsp.rs | 117 ++++++++++++++++++++++++++++++++++
>  drivers/gpu/nova-core/gsp.rs          |   4 ++
>  drivers/gpu/nova-core/nova_core.rs    |   1 +
>  5 files changed, 229 insertions(+), 20 deletions(-)
 
Code looks good. Or more accurately, it's working on my machine, and
I think I understand it, aside from the SG Table internals.

The documentation on the whole "radix 3" aspect is too light though, so
I've created some that you can add in if you agree with it.

...
> diff --git a/drivers/gpu/nova-core/firmware/gsp.rs b/drivers/gpu/nova-core/firmware/gsp.rs
...
> +/// A device-mapped firmware with a set of (also device-mapped) pages tables mapping the firmware
> +/// to the start of their own address space, also known as a `Radix3` firmware.

I'd like to replace the above two lines with something like this:

/// GSP firmware with 3-level radix page tables for the GSP bootloader.
///
/// The bootloader expects firmware to be mapped starting at address 0 in GSP's virtual address
/// space:
///
/// ```text
/// Level 0:  1 page, 1 entry         -> points to first level 1 page
/// Level 1:  Multiple pages/entries  -> each entry points to a level 2 page
/// Level 2:  Multiple pages/entries  -> each entry points to a firmware page
/// ```
///
/// Each page is 4KB, each entry is 8 bytes (64-bit DMA address).
/// Also known as "Radix3" firmware.


> +#[pin_data]
> +pub(crate) struct GspFirmware {

And then a slightly higher-level set of inline comments will help
developers, I think:

> +    /// The GSP firmware inside a [`VVec`], device-mapped via a SG table.
> +    #[pin]
> +    fw: SGTable<Owned<VVec<u8>>>,
> +    /// The level 2 page table, mapping [`Self::fw`] at its beginning.

Instead, how about:

      /// Level 2 page table(s) whose entries contain DMA addresses of firmware pages.

> +    #[pin]
> +    lvl2: SGTable<Owned<VVec<u8>>>,
> +    /// The level 1 page table, mapping [`Self::lvl2`] at its beginning.

       /// Level 1 page table(s) whose entries contain DMA addresses of level 2 pages.

> +    #[pin]
> +    lvl1: SGTable<Owned<VVec<u8>>>,
> +    /// The level 0 page table, mapping [`Self::lvl1`] at its beginning.

       /// Level 0 page table (single 4KB page) with one entry: DMA address of first level 1 page.

> +    lvl0: DmaObject,
> +    /// Size in bytes of the firmware contained in [`Self::fw`].
> +    pub size: usize,
> +}
> +
> +impl GspFirmware {
> +    /// Maps the GSP firmware image `fw` into `dev`'s address-space, and creates the page tables
> +    /// expected by the GSP bootloader to load it.
> +    pub(crate) fn new<'a>(
> +        dev: &'a device::Device<device::Bound>,
> +        fw: &'a [u8],
> +    ) -> impl PinInit<Self, Error> + 'a {
> +        try_pin_init!(&this in Self {
> +            fw <- {
> +                // Move the firmware into a vmalloc'd vector and map it into the device address
> +                // space.
> +                VVec::with_capacity(fw.len(), GFP_KERNEL)
> +                .and_then(|mut v| {
> +                    v.extend_from_slice(fw, GFP_KERNEL)?;
> +                    Ok(v)
> +                })
> +                .map_err(|_| ENOMEM)
> +                .map(|v| SGTable::new(dev, v, DataDirection::ToDevice, GFP_KERNEL))?
> +            },
> +            lvl2 <- {

Why must we use a strange vowel-removal algorithm for these vrbl nms? I'll let you have
a few extra characters and you can spell out "level2"...

> +                // Allocate the level 2 page table, map the firmware onto it, and map it into the
> +                // device address space.
> +                // SAFETY: `this` is a valid pointer, and `fw` has been initialized.
> +                let fw_sg_table = unsafe { &(*this.as_ptr()).fw };
> +                VVec::<u8>::with_capacity(
> +                    fw_sg_table.iter().count() * core::mem::size_of::<u64>(),
> +                    GFP_KERNEL,
> +                )
> +                .map_err(|_| ENOMEM)
> +                .and_then(|lvl2| map_into_lvl(fw_sg_table, lvl2))
> +                .map(|lvl2| SGTable::new(dev, lvl2, DataDirection::ToDevice, GFP_KERNEL))?
> +            },
> +            lvl1 <- {
> +                // Allocate the level 1 page table, map the level 2 page table onto it, and map it
> +                // into the device address space.
> +                // SAFETY: `this` is a valid pointer, and `lvl2` has been initialized.
> +                let lvl2_sg_table = unsafe { &(*this.as_ptr()).lvl2 };
> +                VVec::<u8>::with_capacity(
> +                    lvl2_sg_table.iter().count() * core::mem::size_of::<u64>(),
> +                    GFP_KERNEL,
> +                )
> +                .map_err(|_| ENOMEM)
> +                .and_then(|lvl1| map_into_lvl(lvl2_sg_table, lvl1))
> +                .map(|lvl1| SGTable::new(dev, lvl1, DataDirection::ToDevice, GFP_KERNEL))?
> +            },
> +            lvl0: {
> +                // Allocate the level 0 page table as a device-visible DMA object, and map the
> +                // level 1 page table onto it.
> +                // SAFETY: `this` is a valid pointer, and `lvl1` has been initialized.
> +                let lvl1_sg_table = unsafe { &(*this.as_ptr()).lvl1 };
> +                let mut lvl0 = DmaObject::new(dev, GSP_PAGE_SIZE)?;
> +                // SAFETY: we are the only owner of this newly-created object, making races
> +                // impossible.
> +                let lvl0_slice = unsafe { lvl0.as_slice_mut(0, GSP_PAGE_SIZE) }?;
> +                lvl0_slice[0..core::mem::size_of::<u64>()].copy_from_slice(
> +                    #[allow(clippy::useless_conversion)]
> +                    &(u64::from(lvl1_sg_table.iter().next().unwrap().dma_address())).to_le_bytes(),
> +                );
> +
> +                lvl0
> +            },
> +            size: fw.len(),
> +        })
> +    }
> +
> +    #[expect(unused)]
> +    /// Returns the DMA handle of the level 0 page table.
> +    pub(crate) fn lvl0_dma_handle(&self) -> DmaAddress {
> +        self.lvl0.dma_handle()
> +    }
> +}
> +
> +/// Create a linear mapping the device mapping of the buffer described by `sg_table` into `dst`.

How about this:

/// Build a page table from a scatter-gather list.
///
/// Takes each DMA-mapped region from `sg_table` and writes page table entries
/// for all 4KB pages within that region. For example, a 16KB SG entry becomes
/// 4 consecutive page table entries.

> +fn map_into_lvl(sg_table: &SGTable<Owned<VVec<u8>>>, mut dst: VVec<u8>) -> Result<VVec<u8>> {
> +    for sg_entry in sg_table.iter() {
> +        // Number of pages we need to map.
> +        let num_pages = (sg_entry.dma_len() as usize).div_ceil(GSP_PAGE_SIZE);
> +
> +        for i in 0..num_pages {
> +            let entry = sg_entry.dma_address() + (i as u64 * GSP_PAGE_SIZE as u64);
> +            dst.extend_from_slice(&entry.to_le_bytes(), GFP_KERNEL)?;
> +        }
> +    }
> +
> +    Ok(dst)
> +}
> diff --git a/drivers/gpu/nova-core/gsp.rs b/drivers/gpu/nova-core/gsp.rs
> new file mode 100644
> index 0000000000000000000000000000000000000000..a0e7ec5f6c9c959d57540b3ebf4b782f2e002b08
> --- /dev/null
> +++ b/drivers/gpu/nova-core/gsp.rs
> @@ -0,0 +1,4 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +pub(crate) const GSP_PAGE_SHIFT: usize = 12;
> +pub(crate) const GSP_PAGE_SIZE: usize = 1 << GSP_PAGE_SHIFT;
> diff --git a/drivers/gpu/nova-core/nova_core.rs b/drivers/gpu/nova-core/nova_core.rs
> index cb2bbb30cba142265b354c9acf70349a6e40759e..fffcaee2249fe6cd7f55a7291c1e44be42e791d9 100644
> --- a/drivers/gpu/nova-core/nova_core.rs
> +++ b/drivers/gpu/nova-core/nova_core.rs
> @@ -9,6 +9,7 @@
>  mod firmware;
>  mod gfw;
>  mod gpu;
> +mod gsp;
>  mod regs;
>  mod util;
>  mod vbios;
> 

thanks,
-- 
John Hubbard

Re: [PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware

Posted by Alexandre Courbot 1 month ago

On Thu Aug 28, 2025 at 1:01 PM JST, John Hubbard wrote:
> On 8/25/25 9:07 PM, Alexandre Courbot wrote:
>> The GSP firmware is a binary blob that is verified, loaded, and run by
>> the GSP bootloader. Its presentation is a bit peculiar as the GSP
>> bootloader expects to be given a DMA address to a 3-levels page table
>> mapping the GSP firmware at address 0 of its own address space.
>> 
>> Prepare such a structure containing the DMA-mapped firmware as well as
>> the DMA-mapped page tables, and a way to obtain the DMA handle of the
>> level 0 page table.
>> 
>> As we are performing the required ELF section parsing and radix3 page
>> table building, remove these items from the TODO file.
>> 
>> Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
>> ---
>>  Documentation/gpu/nova/core/todo.rst  |  17 -----
>>  drivers/gpu/nova-core/firmware.rs     | 110 +++++++++++++++++++++++++++++++-
>>  drivers/gpu/nova-core/firmware/gsp.rs | 117 ++++++++++++++++++++++++++++++++++
>>  drivers/gpu/nova-core/gsp.rs          |   4 ++
>>  drivers/gpu/nova-core/nova_core.rs    |   1 +
>>  5 files changed, 229 insertions(+), 20 deletions(-)
>  
> Code looks good. Or more accurately, it's working on my machine, and
> I think I understand it, aside from the SG Table internals.
>
> The documentation on the whole "radix 3" aspect is too light though, so
> I've created some that you can add in if you agree with it.
>
> ...
>> diff --git a/drivers/gpu/nova-core/firmware/gsp.rs b/drivers/gpu/nova-core/firmware/gsp.rs
> ...
>> +/// A device-mapped firmware with a set of (also device-mapped) pages tables mapping the firmware
>> +/// to the start of their own address space, also known as a `Radix3` firmware.
>
> I'd like to replace the above two lines with something like this:
>
> /// GSP firmware with 3-level radix page tables for the GSP bootloader.
> ///
> /// The bootloader expects firmware to be mapped starting at address 0 in GSP's virtual address
> /// space:
> ///
> /// ```text
> /// Level 0:  1 page, 1 entry         -> points to first level 1 page
> /// Level 1:  Multiple pages/entries  -> each entry points to a level 2 page
> /// Level 2:  Multiple pages/entries  -> each entry points to a firmware page
> /// ```
> ///
> /// Each page is 4KB, each entry is 8 bytes (64-bit DMA address).
> /// Also known as "Radix3" firmware.

Thanks, this is perfect!

>
>
>> +#[pin_data]
>> +pub(crate) struct GspFirmware {
>
> And then a slightly higher-level set of inline comments will help
> developers, I think:
>
>> +    /// The GSP firmware inside a [`VVec`], device-mapped via a SG table.
>> +    #[pin]
>> +    fw: SGTable<Owned<VVec<u8>>>,
>> +    /// The level 2 page table, mapping [`Self::fw`] at its beginning.
>
> Instead, how about:
>
>       /// Level 2 page table(s) whose entries contain DMA addresses of firmware pages.
>
>> +    #[pin]
>> +    lvl2: SGTable<Owned<VVec<u8>>>,
>> +    /// The level 1 page table, mapping [`Self::lvl2`] at its beginning.
>
>        /// Level 1 page table(s) whose entries contain DMA addresses of level 2 pages.

Looking good. But isn't it singular "table"? We have one table, with
potentialy several pages, but each page is not a table in itself IIUC.

>
>> +    #[pin]
>> +    lvl1: SGTable<Owned<VVec<u8>>>,
>> +    /// The level 0 page table, mapping [`Self::lvl1`] at its beginning.
>
>        /// Level 0 page table (single 4KB page) with one entry: DMA address of first level 1 page.
>
>> +    lvl0: DmaObject,
>> +    /// Size in bytes of the firmware contained in [`Self::fw`].
>> +    pub size: usize,
>> +}
>> +
>> +impl GspFirmware {
>> +    /// Maps the GSP firmware image `fw` into `dev`'s address-space, and creates the page tables
>> +    /// expected by the GSP bootloader to load it.
>> +    pub(crate) fn new<'a>(
>> +        dev: &'a device::Device<device::Bound>,
>> +        fw: &'a [u8],
>> +    ) -> impl PinInit<Self, Error> + 'a {
>> +        try_pin_init!(&this in Self {
>> +            fw <- {
>> +                // Move the firmware into a vmalloc'd vector and map it into the device address
>> +                // space.
>> +                VVec::with_capacity(fw.len(), GFP_KERNEL)
>> +                .and_then(|mut v| {
>> +                    v.extend_from_slice(fw, GFP_KERNEL)?;
>> +                    Ok(v)
>> +                })
>> +                .map_err(|_| ENOMEM)
>> +                .map(|v| SGTable::new(dev, v, DataDirection::ToDevice, GFP_KERNEL))?
>> +            },
>> +            lvl2 <- {
>
> Why must we use a strange vowel-removal algorithm for these vrbl nms? I'll let you have
> a few extra characters and you can spell out "level2"...

Yeah, let me spell these fully.

>
>> +                // Allocate the level 2 page table, map the firmware onto it, and map it into the
>> +                // device address space.
>> +                // SAFETY: `this` is a valid pointer, and `fw` has been initialized.
>> +                let fw_sg_table = unsafe { &(*this.as_ptr()).fw };
>> +                VVec::<u8>::with_capacity(
>> +                    fw_sg_table.iter().count() * core::mem::size_of::<u64>(),
>> +                    GFP_KERNEL,
>> +                )
>> +                .map_err(|_| ENOMEM)
>> +                .and_then(|lvl2| map_into_lvl(fw_sg_table, lvl2))
>> +                .map(|lvl2| SGTable::new(dev, lvl2, DataDirection::ToDevice, GFP_KERNEL))?
>> +            },
>> +            lvl1 <- {
>> +                // Allocate the level 1 page table, map the level 2 page table onto it, and map it
>> +                // into the device address space.
>> +                // SAFETY: `this` is a valid pointer, and `lvl2` has been initialized.
>> +                let lvl2_sg_table = unsafe { &(*this.as_ptr()).lvl2 };
>> +                VVec::<u8>::with_capacity(
>> +                    lvl2_sg_table.iter().count() * core::mem::size_of::<u64>(),
>> +                    GFP_KERNEL,
>> +                )
>> +                .map_err(|_| ENOMEM)
>> +                .and_then(|lvl1| map_into_lvl(lvl2_sg_table, lvl1))
>> +                .map(|lvl1| SGTable::new(dev, lvl1, DataDirection::ToDevice, GFP_KERNEL))?
>> +            },
>> +            lvl0: {
>> +                // Allocate the level 0 page table as a device-visible DMA object, and map the
>> +                // level 1 page table onto it.
>> +                // SAFETY: `this` is a valid pointer, and `lvl1` has been initialized.
>> +                let lvl1_sg_table = unsafe { &(*this.as_ptr()).lvl1 };
>> +                let mut lvl0 = DmaObject::new(dev, GSP_PAGE_SIZE)?;
>> +                // SAFETY: we are the only owner of this newly-created object, making races
>> +                // impossible.
>> +                let lvl0_slice = unsafe { lvl0.as_slice_mut(0, GSP_PAGE_SIZE) }?;
>> +                lvl0_slice[0..core::mem::size_of::<u64>()].copy_from_slice(
>> +                    #[allow(clippy::useless_conversion)]
>> +                    &(u64::from(lvl1_sg_table.iter().next().unwrap().dma_address())).to_le_bytes(),
>> +                );
>> +
>> +                lvl0
>> +            },
>> +            size: fw.len(),
>> +        })
>> +    }
>> +
>> +    #[expect(unused)]
>> +    /// Returns the DMA handle of the level 0 page table.
>> +    pub(crate) fn lvl0_dma_handle(&self) -> DmaAddress {
>> +        self.lvl0.dma_handle()
>> +    }
>> +}
>> +
>> +/// Create a linear mapping the device mapping of the buffer described by `sg_table` into `dst`.
>
> How about this:
>
> /// Build a page table from a scatter-gather list.
> ///
> /// Takes each DMA-mapped region from `sg_table` and writes page table entries
> /// for all 4KB pages within that region. For example, a 16KB SG entry becomes
> /// 4 consecutive page table entries.

Much better. And I mixed some words in the original message which didn't
even make sense to begin with.

Re: [PATCH v2 5/8] gpu: nova-core: firmware: process and prepare the GSP firmware

Posted by John Hubbard 1 month ago

On 8/28/25 4:13 AM, Alexandre Courbot wrote:
> On Thu Aug 28, 2025 at 1:01 PM JST, John Hubbard wrote:
>> On 8/25/25 9:07 PM, Alexandre Courbot wrote:
...
>>        /// Level 1 page table(s) whose entries contain DMA addresses of level 2 pages.
> 
> Looking good. But isn't it singular "table"? We have one table, with
> potentialy several pages, but each page is not a table in itself IIUC.

Oops, yes, good catch.


thanks,
-- 
John Hubbard