[PATCH v4 2/3] gpu: nova-core: avoid probing non-display/compute PCI functions

John Hubbard posted 3 patches 1 month, 2 weeks ago
There is a newer version of this series
[PATCH v4 2/3] gpu: nova-core: avoid probing non-display/compute PCI functions
Posted by John Hubbard 1 month, 2 weeks ago
NovaCore has so far been too imprecise about figuring out if .probe()
has found a supported PCI PF (Physical Function). By that I mean:
.probe() sets up BAR0 (which involves a lot of very careful devres and
Device<Bound> details behind the scenes). And then if it is dealing with
a non-supported device such as the .1 audio PF on many GPUs, it fails
out due to an unexpected BAR0 size. We have been fortunate that the BAR0
sizes are different.

Really, we should be filtering on PCI class ID instead. These days I
think we can confidently pick out Nova's supported PF's via PCI class
ID. And if not, then we'll revisit.

The approach here is to filter on "Display VGA" or "Display 3D", which
is how PCI class IDs express "this is a modern GPU's PF".

Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/driver.rs | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 274989ea1fb4..b60c9defa9d1 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -1,6 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 
-use kernel::{auxiliary, bindings, c_str, device::Core, pci, prelude::*, sizes::SZ_16M, sync::Arc};
+use kernel::{
+    auxiliary, bindings, c_str, device::Core, pci, pci::Class, pci::ClassMask, prelude::*,
+    sizes::SZ_16M, sync::Arc,
+};
 
 use crate::gpu::Gpu;
 
@@ -18,10 +21,25 @@ pub(crate) struct NovaCore {
     PCI_TABLE,
     MODULE_PCI_TABLE,
     <NovaCore as pci::Driver>::IdInfo,
-    [(
-        pci::DeviceId::from_id(bindings::PCI_VENDOR_ID_NVIDIA, bindings::PCI_ANY_ID as u32),
-        ()
-    )]
+    [
+        // Modern NVIDIA GPUs will show up as either VGA or 3D controllers.
+        (
+            pci::DeviceId::from_class_and_vendor(
+                Class::DISPLAY_VGA,
+                ClassMask::ClassSubclass,
+                bindings::PCI_VENDOR_ID_NVIDIA
+            ),
+            ()
+        ),
+        (
+            pci::DeviceId::from_class_and_vendor(
+                Class::DISPLAY_3D,
+                ClassMask::ClassSubclass,
+                bindings::PCI_VENDOR_ID_NVIDIA
+            ),
+            ()
+        ),
+    ]
 );
 
 impl pci::Driver for NovaCore {
-- 
2.50.1
Re: [PATCH v4 2/3] gpu: nova-core: avoid probing non-display/compute PCI functions
Posted by Alexandre Courbot 1 month, 2 weeks ago
On Wed Aug 20, 2025 at 12:08 PM JST, John Hubbard wrote:
> NovaCore has so far been too imprecise about figuring out if .probe()
> has found a supported PCI PF (Physical Function). By that I mean:
> .probe() sets up BAR0 (which involves a lot of very careful devres and
> Device<Bound> details behind the scenes). And then if it is dealing with
> a non-supported device such as the .1 audio PF on many GPUs, it fails
> out due to an unexpected BAR0 size. We have been fortunate that the BAR0
> sizes are different.
>
> Really, we should be filtering on PCI class ID instead. These days I
> think we can confidently pick out Nova's supported PF's via PCI class
> ID. And if not, then we'll revisit.
>
> The approach here is to filter on "Display VGA" or "Display 3D", which
> is how PCI class IDs express "this is a modern GPU's PF".
>
> Cc: Danilo Krummrich <dakr@kernel.org>
> Cc: Alexandre Courbot <acourbot@nvidia.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/gpu/nova-core/driver.rs | 28 +++++++++++++++++++++++-----
>  1 file changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
> index 274989ea1fb4..b60c9defa9d1 100644
> --- a/drivers/gpu/nova-core/driver.rs
> +++ b/drivers/gpu/nova-core/driver.rs
> @@ -1,6 +1,9 @@
>  // SPDX-License-Identifier: GPL-2.0
>  
> -use kernel::{auxiliary, bindings, c_str, device::Core, pci, prelude::*, sizes::SZ_16M, sync::Arc};
> +use kernel::{
> +    auxiliary, bindings, c_str, device::Core, pci, pci::Class, pci::ClassMask, prelude::*,
> +    sizes::SZ_16M, sync::Arc,
> +};
>  
>  use crate::gpu::Gpu;
>  
> @@ -18,10 +21,25 @@ pub(crate) struct NovaCore {
>      PCI_TABLE,
>      MODULE_PCI_TABLE,
>      <NovaCore as pci::Driver>::IdInfo,
> -    [(
> -        pci::DeviceId::from_id(bindings::PCI_VENDOR_ID_NVIDIA, bindings::PCI_ANY_ID as u32),
> -        ()
> -    )]
> +    [
> +        // Modern NVIDIA GPUs will show up as either VGA or 3D controllers.
> +        (
> +            pci::DeviceId::from_class_and_vendor(
> +                Class::DISPLAY_VGA,
> +                ClassMask::ClassSubclass,
> +                bindings::PCI_VENDOR_ID_NVIDIA
> +            ),
> +            ()
> +        ),
> +        (
> +            pci::DeviceId::from_class_and_vendor(
> +                Class::DISPLAY_3D,
> +                ClassMask::ClassSubclass,
> +                bindings::PCI_VENDOR_ID_NVIDIA
> +            ),

This is making use of `from_class_and_vendor`, which is modified in the
next patch, requiring to modify this part of the file again. How about
switching this patch with 3/3 so we only modify the nova-core code once?

I also wonder if we want to merge 1/3 and (the current) 3/3, since 1/3
alone leaves `from_class_and_vendor` into some intermediate state that
nobody will ever get a chance to use anyway, and one doesn't really make
sense without the other. WDYT?
Re: [PATCH v4 2/3] gpu: nova-core: avoid probing non-display/compute PCI functions
Posted by Danilo Krummrich 1 month, 2 weeks ago
On Wed Aug 20, 2025 at 3:52 PM CEST, Alexandre Courbot wrote:
> This is making use of `from_class_and_vendor`, which is modified in the
> next patch, requiring to modify this part of the file again. How about
> switching this patch with 3/3 so we only modify the nova-core code once?

I think that makes sense.

> I also wonder if we want to merge 1/3 and (the current) 3/3, since 1/3
> alone leaves `from_class_and_vendor` into some intermediate state that
> nobody will ever get a chance to use anyway, and one doesn't really make
> sense without the other. WDYT?

Let's not merge them please, the intermediate state is not that bad, currently
we deal with raw integers for representing vendor IDs as well. So, patch 1 is an
improvement even by itself.
Re: [PATCH v4 2/3] gpu: nova-core: avoid probing non-display/compute PCI functions
Posted by John Hubbard 1 month, 2 weeks ago
On 8/20/25 6:56 AM, Danilo Krummrich wrote:
> On Wed Aug 20, 2025 at 3:52 PM CEST, Alexandre Courbot wrote:
>> This is making use of `from_class_and_vendor`, which is modified in the
>> next patch, requiring to modify this part of the file again. How about
>> switching this patch with 3/3 so we only modify the nova-core code once?
> 
> I think that makes sense.

Sure.

> 
>> I also wonder if we want to merge 1/3 and (the current) 3/3, since 1/3
>> alone leaves `from_class_and_vendor` into some intermediate state that
>> nobody will ever get a chance to use anyway, and one doesn't really make
>> sense without the other. WDYT?
> 
> Let's not merge them please, the intermediate state is not that bad, currently
> we deal with raw integers for representing vendor IDs as well. So, patch 1 is an
> improvement even by itself.

OK.

thanks,
-- 
John Hubbard