[v7] gpu: nova: add boot42 support for next-gen GPUs

[PATCH v7 4/4] gpu: nova-core: add boot42 support for next-gen GPUs

Posted by John Hubbard 2 months, 4 weeks ago

NVIDIA GPUs are moving away from using NV_PMC_BOOT_0 to contain
architecture and revision details, and will instead use NV_PMC_BOOT_42
in the future. NV_PMC_BOOT_0 will contain a specific set of values
that will mean "go read NV_PMC_BOOT_42 instead".

Change the selection logic in Nova so that it will claim Turing and
later GPUs. This will work for the foreseeable future, without any
further code changes here, because all NVIDIA GPUs are considered, from
the oldest supported on Linux (NV04), through the future GPUs.

Add some comment documentation to explain, chronologically, how boot0
and boot42 change with the GPU eras, and how that affects the selection
logic.

Cc: Alexandre Courbot <acourbot@nvidia.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/gpu.rs  | 32 +++++++++++++++++++++++++++-----
 drivers/gpu/nova-core/regs.rs | 22 ++++++++++++++++------
 2 files changed, 43 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
index cd58040b681b..8c5f46f6aaac 100644
--- a/drivers/gpu/nova-core/gpu.rs
+++ b/drivers/gpu/nova-core/gpu.rs
@@ -175,19 +175,41 @@ pub(crate) struct Spec {
 
 impl Spec {
     fn new(bar: &Bar0) -> Result<Spec> {
+        // Some brief notes about boot0 and boot42, in chronological order:
+        //
+        // NV04 through NV50:
+        //
+        //    Not supported by Nova. boot0 is necessary and sufficient to identify these GPUs.
+        //    boot42 may not even exist on some of these GPUs.
+        //
+        // Fermi through Volta:
+        //
+        //     Not supported by Nova. boot0 is still sufficient to identify these GPUs, but boot42
+        //     is also guaranteed to be both present and accurate.
+        //
+        // Turing and later:
+        //
+        //     Supported by Nova. Identified by first checking boot0 to ensure that the GPU is not
+        //     from an earlier (pre-Fermi) era, and then using boot42 to precisely identify the GPU.
+        //     Somewhere in the Rubin timeframe, boot0 will no longer have space to add new GPU IDs.
+
         let boot0 = regs::NV_PMC_BOOT_0::read(bar);
 
-        Spec::try_from(boot0)
+        if boot0.is_older_than_fermi() {
+            return Err(ENOTSUPP);
+        }
+
+        Spec::try_from(regs::NV_PMC_BOOT_42::read(bar))
     }
 }
 
-impl TryFrom<regs::NV_PMC_BOOT_0> for Spec {
+impl TryFrom<regs::NV_PMC_BOOT_42> for Spec {
     type Error = Error;
 
-    fn try_from(boot0: regs::NV_PMC_BOOT_0) -> Result<Self> {
+    fn try_from(boot42: regs::NV_PMC_BOOT_42) -> Result<Self> {
         Ok(Self {
-            chipset: boot0.chipset()?,
-            revision: boot0.revision(),
+            chipset: boot42.chipset()?,
+            revision: boot42.revision(),
         })
     }
 }
diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
index 8c9af3c59708..81097e83c276 100644
--- a/drivers/gpu/nova-core/regs.rs
+++ b/drivers/gpu/nova-core/regs.rs
@@ -41,14 +41,24 @@
 });
 
 impl NV_PMC_BOOT_0 {
-    /// Combines `architecture_0` and `architecture_1` to obtain the architecture of the chip.
-    pub(crate) fn architecture(self) -> Result<Architecture> {
-        Architecture::try_from(
-            self.architecture_0() | (self.architecture_1() << Self::ARCHITECTURE_0_RANGE.len()),
-        )
+    pub(crate) fn is_older_than_fermi(self) -> bool {
+        // From https://github.com/NVIDIA/open-gpu-doc/tree/master/manuals :
+        const NV_PMC_BOOT_0_ARCHITECTURE_GF100: u8 = 0xc;
+
+        // Older chips left arch1 zeroed out. That, combined with an arch0 value that is less than
+        // GF100, means "older than Fermi".
+        self.architecture_1() == 0 && self.architecture_0() < NV_PMC_BOOT_0_ARCHITECTURE_GF100
     }
+}
+
+register!(NV_PMC_BOOT_42 @ 0x00000a00, "Extended architecture information" {
+    15:12   minor_revision as u8, "Minor revision of the chip";
+    19:16   major_revision as u8, "Major revision of the chip";
+    23:20   implementation as u8, "Implementation version of the architecture";
+    29:24   architecture as u8 ?=> Architecture, "Architecture value";
+});
 
-    /// Combines `architecture` and `implementation` to obtain a code unique to the chipset.
+impl NV_PMC_BOOT_42 {
     pub(crate) fn chipset(self) -> Result<Chipset> {
         self.architecture()
             .map(|arch| {
-- 
2.51.2
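
For readers following the boot42 field layout outside the kernel tree, here is a minimal standalone sketch of the decoding that the `register!` definition above describes. The bit ranges are taken from the patch. The `Boot42Fields` struct and the `bits` and `decode_boot42` helpers are hypothetical names used only for illustration and are not part of nova-core.

```rust
/// Fields decoded from a raw NV_PMC_BOOT_42 value.
/// Hypothetical type for illustration; nova-core uses the `register!` macro instead.
#[derive(Debug, PartialEq, Eq)]
pub struct Boot42Fields {
    pub minor_revision: u8,
    pub major_revision: u8,
    pub implementation: u8,
    pub architecture: u8,
}

/// Extracts the inclusive bit range `hi:lo` from `value`.
fn bits(value: u32, hi: u32, lo: u32) -> u8 {
    let width = hi - lo + 1;
    ((value >> lo) & ((1u32 << width) - 1)) as u8
}

/// Decodes the four fields using the bit ranges listed in the patch:
/// 15:12 minor revision, 19:16 major revision, 23:20 implementation,
/// 29:24 architecture.
pub fn decode_boot42(raw: u32) -> Boot42Fields {
    Boot42Fields {
        minor_revision: bits(raw, 15, 12),
        major_revision: bits(raw, 19, 16),
        implementation: bits(raw, 23, 20),
        architecture: bits(raw, 29, 24),
    }
}
```

For example, `decode_boot42(0x162a_1000)` yields architecture 0x16, implementation 0x2, major revision 0xa and minor revision 0x1. Bits outside the four listed ranges are ignored.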

Re: [PATCH v7 4/4] gpu: nova-core: add boot42 support for next-gen GPUs

Posted by Joel Fernandes 2 months, 3 weeks ago

Hi John,

On 11/11/2025 11:30 PM, John Hubbard wrote:
> NVIDIA GPUs are moving away from using NV_PMC_BOOT_0 to contain
> architecture and revision details, and will instead use NV_PMC_BOOT_42
> in the future. NV_PMC_BOOT_0 will contain a specific set of values
> that will mean "go read NV_PMC_BOOT_42 instead".
> 
> Change the selection logic in Nova so that it will claim Turing and
> later GPUs. This will work for the foreseeable future, without any
> further code changes here, because all NVIDIA GPUs are considered, from
> the oldest supported on Linux (NV04), through the future GPUs.

[...]

> diff --git a/drivers/gpu/nova-core/gpu.rs b/drivers/gpu/nova-core/gpu.rs
> index cd58040b681b..8c5f46f6aaac 100644
> --- a/drivers/gpu/nova-core/gpu.rs
> +++ b/drivers/gpu/nova-core/gpu.rs
> @@ -175,19 +175,41 @@ pub(crate) struct Spec {
>  
>  impl Spec {
>      fn new(bar: &Bar0) -> Result<Spec> {
> +        // Some brief notes about boot0 and boot42, in chronological order:
> +        //
> +        // NV04 through NV50:
> +        //
> +        //    Not supported by Nova. boot0 is necessary and sufficient to identify these GPUs.
> +        //    boot42 may not even exist on some of these GPUs.
> +        //
> +        // Fermi through Volta:
> +        //
> +        //     Not supported by Nova. boot0 is still sufficient to identify these GPUs, but boot42
> +        //     is also guaranteed to be both present and accurate.
> +        //
> +        // Turing and later:
> +        //
> +        //     Supported by Nova. Identified by first checking boot0 to ensure that the GPU is not
> +        //     from an earlier (pre-Fermi) era, and then using boot42 to precisely identify the GPU.
> +        //     Somewhere in the Rubin timeframe, boot0 will no longer have space to add new GPU IDs.
> +
>          let boot0 = regs::NV_PMC_BOOT_0::read(bar);
>  
> -        Spec::try_from(boot0)
> +        if boot0.is_older_than_fermi() {
> +            return Err(ENOTSUPP);
> +        }
> +
> +        Spec::try_from(regs::NV_PMC_BOOT_42::read(bar))

There is an inconsistency in the error return here: for NV04 through NV50, it
returns -ENOTSUPP. For Fermi through Volta, it will read boot42 but will return
-ENODEV, because `Spec::try_from()` -> `boot42.chipset()` will return -ENODEV.
I am OK with either error return, but it would be good to make it consistent.

There also does not seem to be a diagnostic if the chipset is not supported. It
would be good to have a diagnostic saying that the chipset did not match; right
now it will return -ENODEV, which could mean the device does not exist.
-ENOTSUPP is better, but an actual dmesg error message would be nice.

With these,

Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>

Thanks.
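
One way to picture the two requests above (a single error code for every unsupported GPU, plus a diagnostic when the chipset does not match) is the standalone sketch below. It is not the nova-core implementation: `ProbeError`, `arch_name` and `probe` are hypothetical names, `eprintln!` stands in for a kernel log call, and the architecture values in the table are assumptions made for the example.

```rust
/// Hypothetical stand-in for the kernel's -ENOTSUPP error code.
#[derive(Debug, PartialEq, Eq)]
pub enum ProbeError {
    NotSupported,
}

/// Hypothetical table of supported boot42 architecture values.
/// The numeric values are assumptions for this example.
fn arch_name(arch: u8) -> Option<&'static str> {
    match arch {
        0x16 => Some("Turing"),
        0x17 => Some("Ampere"),
        0x19 => Some("Ada"),
        _ => None,
    }
}

/// Returns the architecture name, or the same error for every unsupported GPU.
pub fn probe(pre_fermi: bool, boot42_arch: u8) -> Result<&'static str, ProbeError> {
    if pre_fermi {
        // boot42 may not exist on these GPUs, so it is never consulted.
        return Err(ProbeError::NotSupported);
    }

    arch_name(boot42_arch).ok_or_else(|| {
        // Stand-in for a dmesg message; a driver would use its logging macros.
        eprintln!("unsupported chipset (boot42 architecture = {:#x})", boot42_arch);
        ProbeError::NotSupported
    })
}
```

With this shape, a caller sees `Err(ProbeError::NotSupported)` for both the pre-Fermi case and the unmatched-chipset case, and only the latter emits a message.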

Re: [PATCH v7 4/4] gpu: nova-core: add boot42 support for next-gen GPUs

Posted by John Hubbard 2 months, 3 weeks ago

On 11/13/25 11:59 AM, Joel Fernandes wrote:
> Hi John,
> 
...
>> -        Spec::try_from(boot0)
>> +        if boot0.is_older_than_fermi() {
>> +            return Err(ENOTSUPP);
>> +        }
>> +
>> +        Spec::try_from(regs::NV_PMC_BOOT_42::read(bar))
> 
> There is an inconsistency in the error return here: for NV04 through NV50, it
> returns -ENOTSUPP. For Fermi through Volta, it will read boot42 but will return
> -ENODEV, because `Spec::try_from()` -> `boot42.chipset()` will return -ENODEV.
> I am OK with either error return, but it would be good to make it consistent.
> 

Yes, good catch. It should be -ENOTSUPP for sure.

> There also does not seem to be a diagnostic if the chipset is not supported. It
> would be good to have a diagnostic saying that the chipset did not match; right
> now it will return -ENODEV, which could mean the device does not exist.
> -ENOTSUPP is better, but an actual dmesg error message would be nice.

Yes. The "not supported" case would happen in two situations:

a) Someone found a pre-Fermi GPU to use for (probably) display, and they
also have a Turing+ GPU in the same system (!), and they have both Nouveau
and Nova drivers available.

Here, it's not really an error situation. If this actually works, then
Nova not supporting the older GPU is just expected operation.

But these older GPUs are not even really directly supported, so this
is a fringe case anyway.

b) A newer GPU is installed, and Nova does not yet support it. Here,
an error message is OK, because Nova is eventually (soon) going to 
support that GPU.

So I think that means an error message is reasonable here.

> 
> With these,
> 
> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
> 

Thanks for the review!

Alex, I think I'd better re-spin and re-test, in order to safely collect
the various small fixes from you and Joel. I can do that today.

thanks,
-- 
John Hubbard

Re: [PATCH v7 4/4] gpu: nova-core: add boot42 support for next-gen GPUs

Posted by Alexandre Courbot 2 months, 3 weeks ago

On Fri Nov 14, 2025 at 5:16 AM JST, John Hubbard wrote:
> On 11/13/25 11:59 AM, Joel Fernandes wrote:
>> Hi John,
>> 
> ...
>>> -        Spec::try_from(boot0)
>>> +        if boot0.is_older_than_fermi() {
>>> +            return Err(ENOTSUPP);
>>> +        }
>>> +
>>> +        Spec::try_from(regs::NV_PMC_BOOT_42::read(bar))
>> 
>> There is an inconsistency in the error return here: for NV04 through NV50, it
>> returns -ENOTSUPP. For Fermi through Volta, it will read boot42 but will return
>> -ENODEV, because `Spec::try_from()` -> `boot42.chipset()` will return -ENODEV.
>> I am OK with either error return, but it would be good to make it consistent.
>> 
>
> Yes, good catch. It should be -ENOTSUPP for sure.
>
>> There also does not seem to be a diagnostic if the chipset is not supported. It
>> would be good to have a diagnostic saying that the chipset did not match; right
>> now it will return -ENODEV, which could mean the device does not exist.
>> -ENOTSUPP is better, but an actual dmesg error message would be nice.
>
> Yes. The "not supported" case would happen in two situations:
>
> a) Someone found a pre-Fermi GPU to use for (probably) display, and they
> also have a Turing+ GPU in the same system (!), and they have both Nouveau
> and Nova drivers available.
>
> Here, it's not really an error situation. If this actually works, then
> Nova not supporting the older GPU is just expected operation.
>
> But these older GPUs are not even really directly supported, so this
> is a fringe case anyway.
>
> b) A newer GPU is installed, and Nova does not yet support it. Here,
> an error message is OK, because Nova is eventually (soon) going to 
> support that GPU.
>
> So I think that means an error message is reasonable here.
>
>> 
>> With these,
>> 
>> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
>> 
>
> Thanks for the review!
>
> Alex, I think I'd better re-spin and re-test, in order to safely collect
> the various small fixes from you and Joel. I can do that today.

Agreed, that will limit the risk of me not capturing everything
properly. Thanks!

Re: [PATCH v7 4/4] gpu: nova-core: add boot42 support for next-gen GPUs

Posted by Alexandre Courbot 2 months, 3 weeks ago

On Wed Nov 12, 2025 at 1:30 PM JST, John Hubbard wrote:
> NVIDIA GPUs are moving away from using NV_PMC_BOOT_0 to contain
> architecture and revision details, and will instead use NV_PMC_BOOT_42
> in the future. NV_PMC_BOOT_0 will contain a specific set of values
> that will mean "go read NV_PMC_BOOT_42 instead".
>
> Change the selection logic in Nova so that it will claim Turing and
> later GPUs. This will work for the foreseeable future, without any
> further code changes here, because all NVIDIA GPUs are considered, from
> the oldest supported on Linux (NV04), through the future GPUs.
>
> Add some comment documentation to explain, chronologically, how boot0
> and boot42 change with the GPU eras, and how that affects the selection
> logic.
>
> Cc: Alexandre Courbot <acourbot@nvidia.com>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Timur Tabi <ttabi@nvidia.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>

Love it, it's super simple now. :)

<snip>
> diff --git a/drivers/gpu/nova-core/regs.rs b/drivers/gpu/nova-core/regs.rs
> index 8c9af3c59708..81097e83c276 100644
> --- a/drivers/gpu/nova-core/regs.rs
> +++ b/drivers/gpu/nova-core/regs.rs
> @@ -41,14 +41,24 @@
>  });
>  
>  impl NV_PMC_BOOT_0 {
> -    /// Combines `architecture_0` and `architecture_1` to obtain the architecture of the chip.
> -    pub(crate) fn architecture(self) -> Result<Architecture> {
> -        Architecture::try_from(
> -            self.architecture_0() | (self.architecture_1() << Self::ARCHITECTURE_0_RANGE.len()),
> -        )
> +    pub(crate) fn is_older_than_fermi(self) -> bool {
> +        // From https://github.com/NVIDIA/open-gpu-doc/tree/master/manuals :
> +        const NV_PMC_BOOT_0_ARCHITECTURE_GF100: u8 = 0xc;
> +
> +        // Older chips left arch1 zeroed out. That, combined with an arch0 value that is less than
> +        // GF100, means "older than Fermi".
> +        self.architecture_1() == 0 && self.architecture_0() < NV_PMC_BOOT_0_ARCHITECTURE_GF100

We could also keep `architecture` (making it private) and just test for
`self.architecture < NV_PMC_BOOT_0_ARCHITECTURE_GF100`. John, I can do
that when applying the series if you think that makes sense.

Considering that the series has been extensively reviewed during the
previous iterations, I think we can safely apply it for 6.19, so will
proceed once I have an answer.

Re: [PATCH v7 4/4] gpu: nova-core: add boot42 support for next-gen GPUs

Posted by John Hubbard 2 months, 3 weeks ago

On 11/13/25 12:03 AM, Alexandre Courbot wrote:
> On Wed Nov 12, 2025 at 1:30 PM JST, John Hubbard wrote:
...
>>  impl NV_PMC_BOOT_0 {
>> -    /// Combines `architecture_0` and `architecture_1` to obtain the architecture of the chip.
>> -    pub(crate) fn architecture(self) -> Result<Architecture> {
>> -        Architecture::try_from(
>> -            self.architecture_0() | (self.architecture_1() << Self::ARCHITECTURE_0_RANGE.len()),
>> -        )
>> +    pub(crate) fn is_older_than_fermi(self) -> bool {
>> +        // From https://github.com/NVIDIA/open-gpu-doc/tree/master/manuals :
>> +        const NV_PMC_BOOT_0_ARCHITECTURE_GF100: u8 = 0xc;
>> +
>> +        // Older chips left arch1 zeroed out. That, combined with an arch0 value that is less than
>> +        // GF100, means "older than Fermi".
>> +        self.architecture_1() == 0 && self.architecture_0() < NV_PMC_BOOT_0_ARCHITECTURE_GF100
> 
> We could also keep `architecture` (making it private) and just test for
> `self.architecture < NV_PMC_BOOT_0_ARCHITECTURE_GF100`. John, I can do
> that when applying the series if you think that makes sense.
> 
> Considering that the series has been extensively reviewed during the
> previous iterations, I think we can safely apply it for 6.19, so will
> proceed once I have an answer.

Hi Alex,

It turns out that this doesn't work well, because architecture()
returns an Architecture, not a u8, and then we have to map it back
and the whole thing looks a lot worse: complexity on the screen
that serves no purpose.

After looking at several approaches, I've come full circle back to
what this patch has.


thanks,
-- 
John Hubbard

Re: [PATCH v7 4/4] gpu: nova-core: add boot42 support for next-gen GPUs

Posted by Alexandre Courbot 2 months, 3 weeks ago

On Fri Nov 14, 2025 at 10:54 AM JST, John Hubbard wrote:
> On 11/13/25 12:03 AM, Alexandre Courbot wrote:
>> On Wed Nov 12, 2025 at 1:30 PM JST, John Hubbard wrote:
> ...
>>>  impl NV_PMC_BOOT_0 {
>>> -    /// Combines `architecture_0` and `architecture_1` to obtain the architecture of the chip.
>>> -    pub(crate) fn architecture(self) -> Result<Architecture> {
>>> -        Architecture::try_from(
>>> -            self.architecture_0() | (self.architecture_1() << Self::ARCHITECTURE_0_RANGE.len()),
>>> -        )
>>> +    pub(crate) fn is_older_than_fermi(self) -> bool {
>>> +        // From https://github.com/NVIDIA/open-gpu-doc/tree/master/manuals :
>>> +        const NV_PMC_BOOT_0_ARCHITECTURE_GF100: u8 = 0xc;
>>> +
>>> +        // Older chips left arch1 zeroed out. That, combined with an arch0 value that is less than
>>> +        // GF100, means "older than Fermi".
>>> +        self.architecture_1() == 0 && self.architecture_0() < NV_PMC_BOOT_0_ARCHITECTURE_GF100
>> 
>> We could also keep `architecture` (making it private) and just test for
>> `self.architecture < NV_PMC_BOOT_0_ARCHITECTURE_GF100`. John, I can do
>> that when applying the series if you think that makes sense.
>> 
>> Considering that the series has been extensively reviewed during the
>> previous iterations, I think we can safely apply it for 6.19, so will
>> proceed once I have an answer.
>
> Hi Alex,
>
> It turns out that this doesn't work well, because architecture()
> returns an Architecture, not a u8, and then we have to map it back
> and the whole thing looks a lot worse: complexity on the screen
> that serves no purpose.
>
> After looking at several approaches, I've come full circle back to
> what this patch has.

Ah right, we would need to add `Fermi` to `Architecture`, which would
now include some architecture we don't support...

In the light of this I agree your original approach makes the most
sense - thanks for checking it out!
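
To make the trade-off concrete, here is a small standalone sketch contrasting the two approaches. It assumes an `Architecture` enum that lists only supported architectures. The enum values and the `architecture_from` helper are assumptions made for illustration, and the boot0 fields are passed in as plain `u8` parameters rather than read from a register.

```rust
/// boot0 architecture value of GF100 (Fermi), as quoted in the patch.
const NV_PMC_BOOT_0_ARCHITECTURE_GF100: u8 = 0xc;

/// Illustrative enum containing only supported architectures.
/// The numeric values are assumptions for this example.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Architecture {
    Turing = 0x16,
    Ampere = 0x17,
    Ada = 0x19,
}

/// Hypothetical conversion: anything older than Turing has no variant.
pub fn architecture_from(value: u8) -> Option<Architecture> {
    match value {
        0x16 => Some(Architecture::Turing),
        0x17 => Some(Architecture::Ampere),
        0x19 => Some(Architecture::Ada),
        _ => None,
    }
}

/// Raw-field comparison, mirroring the logic in the patch: older chips left
/// `architecture_1` zeroed, so a zero there plus an `architecture_0` value
/// below GF100 means "older than Fermi".
pub fn is_older_than_fermi(architecture_0: u8, architecture_1: u8) -> bool {
    architecture_1 == 0 && architecture_0 < NV_PMC_BOOT_0_ARCHITECTURE_GF100
}
```

In this sketch, `architecture_from(0x5)` and `architecture_from(0xc)` both return `None`, so the enum alone cannot tell a pre-Fermi chip from a Fermi one without gaining variants for unsupported hardware. The raw comparison separates the two cases directly: `is_older_than_fermi(0x5, 0)` is true and `is_older_than_fermi(0xc, 0)` is false.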

Re: [PATCH v7 4/4] gpu: nova-core: add boot42 support for next-gen GPUs

Posted by John Hubbard 2 months, 3 weeks ago

On 11/13/25 12:03 AM, Alexandre Courbot wrote:
> On Wed Nov 12, 2025 at 1:30 PM JST, John Hubbard wrote:
...
>>   impl NV_PMC_BOOT_0 {
>> -    /// Combines `architecture_0` and `architecture_1` to obtain the architecture of the chip.
>> -    pub(crate) fn architecture(self) -> Result<Architecture> {
>> -        Architecture::try_from(
>> -            self.architecture_0() | (self.architecture_1() << Self::ARCHITECTURE_0_RANGE.len()),
>> -        )
>> +    pub(crate) fn is_older_than_fermi(self) -> bool {
>> +        // From https://github.com/NVIDIA/open-gpu-doc/tree/master/manuals :
>> +        const NV_PMC_BOOT_0_ARCHITECTURE_GF100: u8 = 0xc;
>> +
>> +        // Older chips left arch1 zeroed out. That, combined with an arch0 value that is less than
>> +        // GF100, means "older than Fermi".
>> +        self.architecture_1() == 0 && self.architecture_0() < NV_PMC_BOOT_0_ARCHITECTURE_GF100
> 
> We could also keep `architecture` (making it private) and just test for
> `self.architecture < NV_PMC_BOOT_0_ARCHITECTURE_GF100`. John, I can do
> that when applying the series if you think that makes sense.

Yes, I think that's a good change to make, sounds good to me.

> 
> Considering that the series has been extensively reviewed during the
> previous iterations, I think we can safely apply it for 6.19, so will
> proceed once I have an answer.

Great!

thanks,
-- 
John Hubbard