[v6] gpu: nova: add boot42 support for next-gen GPUs

[PATCH v6 4/4] gpu: nova-core: add boot42 support for next-gen GPUs

Posted by John Hubbard 3 months ago

NVIDIA GPUs are moving away from using NV_PMC_BOOT_0 to contain
architecture and revision details, and will instead use NV_PMC_BOOT_42
in the future. NV_PMC_BOOT_0 will contain a specific set of values
that will mean "go read NV_PMC_BOOT_42 instead".

Change the selection logic in Nova so that it will claim Turing and
later GPUs. This will work for the foreseeable future, without any
further code changes here, because all NVIDIA GPUs are considered, from
the oldest supported on Linux (NV04), through the future GPUs.

Add some comment documentation to explain, chronologically, how boot0
and boot42 change with the GPU eras, and how that affects the selection
logic.

Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs  | 35 ++++++++++++++++++++++++++++-
 drivers/gpu/nova-core/regs.rs | 42 +++++++++++++++++++++++++++++++++++
 2 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 94a6054bab95..5650c115c613 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -171,9 +171,31 @@ pub(crate) struct Spec {
 
 impl Spec {
     fn new(bar: &Bar0) -> Result<Spec> {
+        // Some brief notes about boot0 and boot42, in chronological order:
+        //
+        // NV04 through NV50:
+        //
+        //     Not supported by Nova. boot0 is necessary and sufficient to identify these GPUs.
+        //     boot42 may not even exist on some of these GPUs.
+        //
+        // Fermi through Volta:
+        //
+        //     Not supported by Nova. boot0 is still sufficient to identify these GPUs, but boot42
+        //     is also guaranteed to be both present and accurate.
+        //
+        // Turing and later:
+        //
+        //     Supported by Nova. Identified by first checking boot0 to ensure that the GPU is not
+        //     from an earlier (pre-Fermi) era, and then using boot42 to precisely identify the GPU.
+        //     Somewhere in the Rubin timeframe, boot0 will no longer have space to add new GPU IDs.
+
         let boot0 = regs::NV_PMC_BOOT_0::read(bar);
 
-        Spec::try_from(boot0)
+        if boot0.use_boot42_instead() {
+            Spec::try_from(regs::NV_PMC_BOOT_42::read(bar))
+        } else {
+            Spec::try_from(boot0)
+        }
     }
 }
 
@@ -188,6 +210,17 @@ fn try_from(boot0: regs::NV_PMC_BOOT_0) -> Result<Self> {
     }
 }
 
+impl TryFrom<regs::NV_PMC_BOOT_42> for Spec {
+    type Error = Error;
+
+    fn try_from(boot42: regs::NV_PMC_BOOT_42) -> Result<Self> {
+        Ok(Self {
+            chipset: boot42.chipset()?,
+            revision: boot42.revision(),
+        })
+    }
+}
+
 impl fmt::Display for Spec {
     fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
         write!(
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 8c9af3c59708..5d6397f6450a 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -41,6 +41,22 @@
 });
 
 impl NV_PMC_BOOT_0 {
+    fn older_than_fermi(self) -> bool {
+        // From https://github.com/NVIDIA/open-gpu-doc/tree/master/manuals :
+        const NV_PMC_BOOT_0_ARCHITECTURE_GF100: u8 = 0xc;
+
+        // Older chips left arch1 zeroed out. That, combined with an arch0 value that is less than
+        // GF100, means "older than Fermi".
+        self.architecture_1() == 0 && self.architecture_0() < NV_PMC_BOOT_0_ARCHITECTURE_GF100
+    }
+
+    pub(crate) fn use_boot42_instead(self) -> bool {
+        // For Fermi+ GPUs, boot42 is guaranteed to be both present and accurate, so that's the
+        // point at which we switch over to relying on boot42 for precise identification.
+
+        !self.older_than_fermi()
+    }
+
     /// Combines `architecture_0` and `architecture_1` to obtain the architecture of the chip.
     pub(crate) fn architecture(self) -> Result<Architecture> {
         Architecture::try_from(
@@ -67,6 +83,32 @@ pub(crate) fn revision(self) -> Revision {
     }
 }
 
+register!(NV_PMC_BOOT_42 @ 0x00000a00, "Extended architecture information" {
+    15:12   minor_revision as u8, "Minor revision of the chip";
+    19:16   major_revision as u8, "Major revision of the chip";
+    23:20   implementation as u8, "Implementation version of the architecture";
+    29:24   architecture as u8 ?=> Architecture, "Architecture value";
+});
+
+impl NV_PMC_BOOT_42 {
+    pub(crate) fn chipset(self) -> Result<Chipset> {
+        self.architecture()
+            .map(|arch| {
+                ((arch as u32) << Self::IMPLEMENTATION_RANGE.len())
+                    | u32::from(self.implementation())
+            })
+            .and_then(Chipset::try_from)
+    }
+
+    /// Returns the revision information of the chip.
+    pub(crate) fn revision(self) -> Revision {
+        Revision {
+            major: self.major_revision(),
+            minor: self.minor_revision(),
+        }
+    }
+}
+
 // PBUS
 
 register!(NV_PBUS_SW_SCRATCH @ 0x00001400[64]  {});
-- 
2.51.2
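
For readers without the kernel tree at hand, the boot42 decoding in the patch above can be sketched in standalone Rust. This is only an illustration: the bit ranges are copied from the `register!(NV_PMC_BOOT_42 ...)` definition in the patch, but `Boot42Fields`, `bits`, `decode_boot42` and `chipset_code` are hypothetical names that do not exist in nova-core, and the validation done by `Architecture::try_from` and `Chipset::try_from` is deliberately left out.

```rust
/// Fields decoded from a raw NV_PMC_BOOT_42 value.
/// Hypothetical type for this sketch; nova-core generates accessors via `register!`.
#[derive(Debug, PartialEq, Eq)]
pub struct Boot42Fields {
    pub architecture: u8,
    pub implementation: u8,
    pub major_revision: u8,
    pub minor_revision: u8,
}

/// Extracts the inclusive bit range `hi:lo` from `value`.
/// Only intended for ranges narrower than 32 bits.
fn bits(value: u32, hi: u32, lo: u32) -> u32 {
    (value >> lo) & ((1u32 << (hi - lo + 1)) - 1)
}

/// Splits a raw boot42 value into its fields, using the bit ranges from the patch:
/// 29:24 architecture, 23:20 implementation, 19:16 major revision, 15:12 minor revision.
pub fn decode_boot42(raw: u32) -> Boot42Fields {
    Boot42Fields {
        architecture: bits(raw, 29, 24) as u8,
        implementation: bits(raw, 23, 20) as u8,
        major_revision: bits(raw, 19, 16) as u8,
        minor_revision: bits(raw, 15, 12) as u8,
    }
}

/// Combines architecture and implementation into one code, mirroring the shift by
/// `IMPLEMENTATION_RANGE.len()` in the patch (the implementation field is 4 bits wide).
/// Unlike the patch, this does not check that the result is a known chipset.
pub fn chipset_code(fields: &Boot42Fields) -> u32 {
    const IMPLEMENTATION_BITS: u32 = 4;
    (u32::from(fields.architecture) << IMPLEMENTATION_BITS) | u32::from(fields.implementation)
}
```

As a usage example, passing the raw value `0x162a_1000` to `decode_boot42` yields architecture `0x16`, implementation `0x2`, major revision `0xa` and minor revision `0x1`, and `chipset_code` then returns `0x162`.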

Re: [PATCH v6 4/4] gpu: nova-core: add boot42 support for next-gen GPUs

Posted by Timur Tabi 3 months ago

On Fri, 2025-11-07 at 20:39 -0800, John Hubbard wrote:
> +        // Turing and later:
> +        //
> +        //     Supported by Nova. Identified by first checking boot0 to ensure that the GPU is not
> +        //     from an earlier (pre-Fermi) era, and then using boot42 to precisely identify the GPU.
> +        //     Somewhere in the Rubin timeframe, boot0 will no longer have space to add new GPU IDs.
> +
>          let boot0 = regs::NV_PMC_BOOT_0::read(bar);
>  
> -        Spec::try_from(boot0)
> +        if boot0.use_boot42_instead() {
> +            Spec::try_from(regs::NV_PMC_BOOT_42::read(bar))
> +        } else {
> +            Spec::try_from(boot0)
> +        }
>      }

Spec::try_from(boot0) will always fail, because we can't generate a Spec from a pre-Turing GPU,
so it seems weird that we have it as an else condition.

I don't think the comment and the code align.  The code implies that sometimes we'll be using
boot0 to generate the Spec, but that isn't true.  However, the comment makes it clear that we'll
be using boot42 only.

Re: [PATCH v6 4/4] gpu: nova-core: add boot42 support for next-gen GPUs

Posted by John Hubbard 3 months ago

On 11/7/25 9:09 PM, Timur Tabi wrote:
> On Fri, 2025-11-07 at 20:39 -0800, John Hubbard wrote:
...
>>          let boot0 = regs::NV_PMC_BOOT_0::read(bar);
>>  
>> -        Spec::try_from(boot0)
>> +        if boot0.use_boot42_instead() {
>> +            Spec::try_from(regs::NV_PMC_BOOT_42::read(bar))
>> +        } else {
>> +            Spec::try_from(boot0)
>> +        }
>>      }
> 
> Spec::try_from(boot0) will always fail, because we can't generate a Spec from a pre-Turing GPU,
> so it seems weird that we have it as an else condition.
> 
> I don't think the comment and the code align.  The code implies that sometimes we'll be using
> boot0 to generate the Spec, but that isn't true.  However, the comment makes it clear that we'll
> be using boot42 only.

Hmmm, yes, the new use_boot42_instead() logic means that most of the
boot0 logic should actually be deleted now. OK, so I can apply this
diff on top, and everything still works:

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index 5650c115c613..6d17ad3cec40 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -194,22 +194,11 @@ fn new(bar: &Bar0) -> Result<Spec> {
         if boot0.use_boot42_instead() {
             Spec::try_from(regs::NV_PMC_BOOT_42::read(bar))
         } else {
-            Spec::try_from(boot0)
+            Err(ENOTSUPP)
         }
     }
 }
 
-impl TryFrom<regs::NV_PMC_BOOT_0> for Spec {
-    type Error = Error;
-
-    fn try_from(boot0: regs::NV_PMC_BOOT_0) -> Result<Self> {
-        Ok(Self {
-            chipset: boot0.chipset()?,
-            revision: boot0.revision(),
-        })
-    }
-}
-
 impl TryFrom<regs::NV_PMC_BOOT_42> for Spec {
     type Error = Error;
 
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 5d6397f6450a..018bee114a3f 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -56,31 +56,6 @@ pub(crate) fn use_boot42_instead(self) -> bool {
 
         !self.older_than_fermi()
     }
-
-    /// Combines `architecture_0` and `architecture_1` to obtain the architecture of the chip.
-    pub(crate) fn architecture(self) -> Result<Architecture> {
-        Architecture::try_from(
-            self.architecture_0() | (self.architecture_1() << Self::ARCHITECTURE_0_RANGE.len()),
-        )
-    }
-
-    /// Combines `architecture` and `implementation` to obtain a code unique to the chipset.
-    pub(crate) fn chipset(self) -> Result<Chipset> {
-        self.architecture()
-            .map(|arch| {
-                ((arch as u32) << Self::IMPLEMENTATION_RANGE.len())
-                    | u32::from(self.implementation())
-            })
-            .and_then(Chipset::try_from)
-    }
-
-    /// Returns the revision information of the chip.
-    pub(crate) fn revision(self) -> Revision {
-        Revision {
-            major: self.major_revision(),
-            minor: self.minor_revision(),
-        }
-    }
 }
 
 register!(NV_PMC_BOOT_42 @ 0x00000a00, "Extended architecture information" {



thanks,
-- 
John Hubbard

Re: [PATCH v6 4/4] gpu: nova-core: add boot42 support for next-gen GPUs

Posted by Timur Tabi 3 months ago

On Fri, 2025-11-07 at 21:19 -0800, John Hubbard wrote:
> > 
> > Spec::try_from(boot0) will always fail, because we can't generate a Spec from a pre-Turing GPU,
> > so it seems weird that we have it as an else condition.
> > 
> > I don't think the comment and the code align.  The code implies that sometimes we'll be using
> > boot0 to generate the Spec, but that isn't true.  However, the comment makes it clear that we'll
> > be using boot42 only.
> 
> Hmmm, yes, the new use_boot42_instead() logic means that most of the
> boot0 logic should actually be deleted now. OK, so I can apply this
> diff on top, and everything still works:

...

This is much better, thanks.
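
The control flow the thread converged on (reject pre-Fermi GPUs based on boot0, otherwise identify the GPU from boot42 only) can be sketched in standalone Rust as follows. This is a rough illustration under stated assumptions: `NotSupported` stands in for the kernel's `ENOTSUPP`, `identify` and its closure parameter are hypothetical, and the two architecture fields are passed in already extracted because the boot0 bit layout is not shown in the thread. Only the `0xc` constant and the comparison itself are taken from the patch.

```rust
/// Stand-in for the kernel's `ENOTSUPP` error (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
pub struct NotSupported;

/// Architecture value of GF100 (Fermi) in boot0, as given in the patch.
const ARCHITECTURE_GF100: u8 = 0xc;

/// Mirrors `older_than_fermi()` from the patch: older chips left `architecture_1`
/// zeroed, so a zero there plus an `architecture_0` below GF100 means pre-Fermi.
pub fn older_than_fermi(architecture_0: u8, architecture_1: u8) -> bool {
    architecture_1 == 0 && architecture_0 < ARCHITECTURE_GF100
}

/// Returns the raw boot42 value for Fermi and later GPUs, or `NotSupported` for
/// pre-Fermi GPUs, where boot42 may not exist and is therefore never read.
/// `read_boot42` is a hypothetical callback standing in for the register read.
pub fn identify(
    architecture_0: u8,
    architecture_1: u8,
    read_boot42: impl FnOnce() -> u32,
) -> Result<u32, NotSupported> {
    if older_than_fermi(architecture_0, architecture_1) {
        return Err(NotSupported);
    }
    Ok(read_boot42())
}
```

Note that a successful return here only means boot42 was read. In the patch, Fermi through Volta GPUs still fail afterwards, because converting the boot42 contents into a `Chipset` is fallible and those GPUs are not supported by Nova.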