[PATCH] gpu: nova-core: fix aux device registration for multi-GPU systems

John Hubbard posted 1 patch 2 days, 21 hours ago
There is a newer version of this series
Posted by John Hubbard 2 days, 21 hours ago
The auxiliary device registration was using a hardcoded ID of 0, which
caused probe() to fail on multi-GPU systems with:

   sysfs: cannot create duplicate filename '/bus/auxiliary/devices/NovaCore.nova-drm.0'

Fix this by using an atomic counter to generate unique IDs for each
GPU's aux device registration. The TODO item to eventually use XArray
for recycling aux device IDs is retained, but for now, this works very
nicely.

This has the side effect of making debugfs[1] work on multi-GPU systems.

[1] https://lore.kernel.org/20260203224757.871729-1-ttabi@nvidia.com

Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/gpu/nova-core/driver.rs | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Hi,

This is based on today's (Feb 4, 2026) linux-next/master branch.

thanks,
John Hubbard

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index 5a4cc047bcfc..a542ec0b40fa 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
+use core::sync::atomic::{AtomicU32, Ordering};
+
 use kernel::{
     auxiliary,
     device::Core,
@@ -19,6 +21,9 @@
 
 use crate::gpu::Gpu;
 
+/// Counter for generating unique auxiliary device IDs.
+static AUXILIARY_ID_COUNTER: AtomicU32 = AtomicU32::new(0);
+
 #[pin_data]
 pub(crate) struct NovaCore {
     #[pin]
@@ -85,12 +90,17 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> impl PinInit<Self, E
                 GFP_KERNEL,
             )?;
 
+            // TODO[XARR]: Use XArray for proper ID allocation/recycling; for now we use a simple
+            // atomic counter which never recycles IDs. A unique ID is required for multi-GPU
+            // systems; without it, probe() fails for all but the first GPU.
+            let aux_id = AUXILIARY_ID_COUNTER.fetch_add(1, Ordering::Relaxed);
+
             Ok(try_pin_init!(Self {
                 gpu <- Gpu::new(pdev, bar.clone(), bar.access(pdev.as_ref())?),
                 _reg <- auxiliary::Registration::new(
                     pdev.as_ref(),
                     c"nova-drm",
-                    0, // TODO[XARR]: Once it lands, use XArray; for now we don't use the ID.
+                    aux_id,
                     crate::MODULE_NAME
                 ),
             }))

base-commit: 0f8a890c4524d6e4013ff225e70de2aed7e6d726
-- 
2.53.0
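
Editor's sketch (not part of the patch): the counter idea above can be
tried in userspace. This is a minimal model, assuming std atomics in place
of the kernel types; `ID_COUNTER`, `next_aux_id()` and `simulate_probes()`
are hypothetical names, and each thread stands in for one GPU's probe().

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Userspace stand-in for the patch's AUXILIARY_ID_COUNTER.
static ID_COUNTER: AtomicU32 = AtomicU32::new(0);

/// Returns the next ID. `fetch_add` returns the value held *before* the
/// increment, so no two callers can ever observe the same ID.
fn next_aux_id() -> u32 {
    // Relaxed is enough here: only the atomicity of the read-modify-write
    // matters, not ordering against other memory accesses.
    ID_COUNTER.fetch_add(1, Ordering::Relaxed)
}

/// Simulates `n` concurrent probe() calls and collects the IDs they got.
fn simulate_probes(n: usize) -> Vec<u32> {
    let handles: Vec<_> = (0..n).map(|_| thread::spawn(next_aux_id)).collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

The IDs may come back in any order, but they are pairwise distinct, which
is the property the duplicate sysfs filename error was missing.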
Re: [PATCH] gpu: nova-core: fix aux device registration for multi-GPU systems
Posted by Gary Guo 2 days, 11 hours ago
On Thu Feb 5, 2026 at 4:11 AM GMT, John Hubbard wrote:
> The auxiliary device registration was using a hardcoded ID of 0, which
> caused probe() to fail on multi-GPU systems with:
>
>    sysfs: cannot create duplicate filename '/bus/auxiliary/devices/NovaCore.nova-drm.0'
>
> Fix this by using an atomic counter to generate unique IDs for each
> GPU's aux device registration. The TODO item to eventually use XArray
> for recycling aux device IDs is retained, but for now, this works very
> nicely.
>
> This has the side effect of making debugfs[1] work on multi-GPU systems.

Hi John,

Looks like this is something that should be achieved via IDA?

Cc: Matthew Wilcox <willy@infradead.org>

>
> [1] https://lore.kernel.org/20260203224757.871729-1-ttabi@nvidia.com
>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/gpu/nova-core/driver.rs | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> Hi,
>
> This is based on today's (Feb 4, 2026) linux-next/master branch.
>
> thanks,
> John Hubbard
>
> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
> index 5a4cc047bcfc..a542ec0b40fa 100644
> --- a/drivers/gpu/nova-core/driver.rs
> +++ b/drivers/gpu/nova-core/driver.rs
> @@ -1,5 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  
> +use core::sync::atomic::{AtomicU32, Ordering};

We're stopping the use of Rust atomics. Please use LKMM atomics available from
`kernel::sync::atomic`.

Best,
Gary

> +
>  use kernel::{
>      auxiliary,
>      device::Core,
> @@ -19,6 +21,9 @@
>  
>  use crate::gpu::Gpu;
>  
> +/// Counter for generating unique auxiliary device IDs.
> +static AUXILIARY_ID_COUNTER: AtomicU32 = AtomicU32::new(0);
> +
>  #[pin_data]
>  pub(crate) struct NovaCore {
>      #[pin]
> @@ -85,12 +90,17 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> impl PinInit<Self, E
>                  GFP_KERNEL,
>              )?;
>  
> +            // TODO[XARR]: Use XArray for proper ID allocation/recycling; for now we use a simple
> +            // atomic counter which never recycles IDs. A unique ID is required for multi-GPU
> +            // systems; without it, probe() fails for all but the first GPU.
> +            let aux_id = AUXILIARY_ID_COUNTER.fetch_add(1, Ordering::Relaxed);
> +
>              Ok(try_pin_init!(Self {
>                  gpu <- Gpu::new(pdev, bar.clone(), bar.access(pdev.as_ref())?),
>                  _reg <- auxiliary::Registration::new(
>                      pdev.as_ref(),
>                      c"nova-drm",
> -                    0, // TODO[XARR]: Once it lands, use XArray; for now we don't use the ID.
> +                    aux_id,
>                      crate::MODULE_NAME
>                  ),
>              }))
>
> base-commit: 0f8a890c4524d6e4013ff225e70de2aed7e6d726
Re: [PATCH] gpu: nova-core: fix aux device registration for multi-GPU systems
Posted by Matthew Wilcox 2 days, 11 hours ago
On Thu, Feb 05, 2026 at 01:44:27PM +0000, Gary Guo wrote:
> > Fix this by using an atomic counter to generate unique IDs for each
> > GPU's aux device registration. The TODO item to eventually use XArray
> > for recycling aux device IDs is retained, but for now, this works very
> > nicely.
> >
> > This has the side effect of making debugfs[1] work on multi-GPU systems.
> 
> Hi John,
> 
> Looks like this is something that should be achieved via IDA?

Yes, if you have no need to go from ID to pointer, an IDA is better.
That said, as far as I understand what this code is doing, an atomic_t
solves the problem just fine and is cheaper.
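
Editor's sketch: the practical difference between an ID allocator and a
bare counter is that an allocator can hand a released ID out again. The
following is a simplified userspace model of that behaviour only. It is
not how the kernel's IDA is implemented, and `IdPool`, `alloc()` and
`release()` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// Simplified model of an ID allocator that recycles released IDs.
#[derive(Default)]
struct IdPool {
    /// IDs that were released and may be handed out again.
    free: BTreeSet<u32>,
    /// Smallest ID that has never been handed out.
    next: u32,
}

impl IdPool {
    /// Hands out the smallest available ID, or None once the space is used up.
    fn alloc(&mut self) -> Option<u32> {
        // Released IDs are always below `next`, so prefer them.
        if let Some(id) = self.free.pop_first() {
            return Some(id);
        }
        let id = self.next;
        // u32::MAX itself is never handed out; that keeps the overflow
        // check to a single checked_add.
        self.next = self.next.checked_add(1)?;
        Some(id)
    }

    /// Returns an ID to the pool. The caller must only release IDs it owns.
    fn release(&mut self, id: u32) {
        self.free.insert(id);
    }
}
```

With a bare counter, an unbind followed by a rebind consumes a new ID each
time; with a pool like this, the rebind would get the old ID back.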
Re: [PATCH] gpu: nova-core: fix aux device registration for multi-GPU systems
Posted by Danilo Krummrich 2 days, 11 hours ago
On Thu Feb 5, 2026 at 2:48 PM CET, Matthew Wilcox wrote:
> On Thu, Feb 05, 2026 at 01:44:27PM +0000, Gary Guo wrote:
>> > Fix this by using an atomic counter to generate unique IDs for each
>> > GPU's aux device registration. The TODO item to eventually use XArray
>> > for recycling aux device IDs is retained, but for now, this works very
>> > nicely.
>> >
>> > This has the side effect of making debugfs[1] work on multi-GPU systems.
>> 
>> Hi John,
>> 
>> Looks like this is something that should be achieved via IDA?
>
> Yes, if you have no need to go from ID to pointer, an IDA is better.
> That said, as far as I understand what this code is doing, an atomic_t
> solves the problem just fine and is cheaper.

I agree, for now an atomic should be perfectly fine. Though, with enough
patience binding/unbinding the driver from sysfs you can probably make this
overflow. :)

The reason for the XArray TODO is that it is one option for a place where
nova-core can store nova-drm / vGPU specific data, once either vGPU or nova-drm
attaches to the auxiliary device. But I think there may be better alternatives.
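
Editor's sketch: the overflow mentioned above can be shown in userspace
with std atomics (an assumption; the kernel types are not used here).
`next_id_wrapping()` mirrors the patch's fetch_add, while
`next_id_checked()` is one hypothetical way to fail instead of wrapping.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Wrapping allocation, as with a plain fetch_add: after u32::MAX the
/// counter silently restarts at 0 and IDs can collide with live ones.
fn next_id_wrapping(counter: &AtomicU32) -> u32 {
    counter.fetch_add(1, Ordering::Relaxed)
}

/// Checked allocation: returns None instead of wrapping once the counter
/// has reached u32::MAX (so u32::MAX itself is never handed out).
fn next_id_checked(counter: &AtomicU32) -> Option<u32> {
    counter
        // The closure returns None on overflow, which makes fetch_update
        // leave the counter untouched and report an error.
        .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |v| v.checked_add(1))
        .ok()
}
```

Reaching the limit takes 2^32 bind/unbind cycles, so this is mostly of
theoretical interest.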
Re: [PATCH] gpu: nova-core: fix aux device registration for multi-GPU systems
Posted by John Hubbard 2 days, 3 hours ago
On 2/5/26 6:19 AM, Danilo Krummrich wrote:
> On Thu Feb 5, 2026 at 2:48 PM CET, Matthew Wilcox wrote:
>> On Thu, Feb 05, 2026 at 01:44:27PM +0000, Gary Guo wrote:
>>>> Fix this by using an atomic counter to generate unique IDs for each
>>>> GPU's aux device registration. The TODO item to eventually use XArray
>>>> for recycling aux device IDs is retained, but for now, this works very
>>>> nicely.
>>>>
>>>> This has the side effect of making debugfs[1] work on multi-GPU systems.
>>>
>>> Hi John,
>>>
>>> Looks like this is something that should be achieved via IDA?
>>
>> Yes, if you have no need to go from ID to pointer, an IDA is better.
>> That said, as far as I understand what this code is doing, an atomic_t
>> solves the problem just fine and is cheaper.
> 
> I agree, for now an atomic should be perfectly fine. Though, with enough
> patience binding/unbinding the driver from sysfs you can probably make this
> overflow. :)
> 
> The reason for the XArray TODO is that it is one option for a place where
> nova-core can store nova-drm / vGPU specific data, once either vGPU or nova-drm
> attaches to the auxiliary device. But I think there may be better alternatives.

OK, this seems like enough information to post a v2, thanks!

thanks,
-- 
John Hubbard
Re: [PATCH] gpu: nova-core: fix aux device registration for multi-GPU systems
Posted by John Hubbard 2 days, 21 hours ago
On 2/4/26 8:11 PM, John Hubbard wrote:
> The auxiliary device registration was using a hardcoded ID of 0, which
> caused probe() to fail on multi-GPU systems with:
> 
>    sysfs: cannot create duplicate filename '/bus/auxiliary/devices/NovaCore.nova-drm.0'
> 
> Fix this by using an atomic counter to generate unique IDs for each
> GPU's aux device registration. The TODO item to eventually use XArray
> for recycling aux device IDs is retained, but for now, this works very
> nicely.
> 
> This has the side effect of making debugfs[1] work on multi-GPU systems.
> 
> [1] https://lore.kernel.org/20260203224757.871729-1-ttabi@nvidia.com
> 
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/gpu/nova-core/driver.rs | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> Hi,
> 
> This is based on today's (Feb 4, 2026) linux-next/master branch.
> 
> thanks,
> John Hubbard
> 
> diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
> index 5a4cc047bcfc..a542ec0b40fa 100644
> --- a/drivers/gpu/nova-core/driver.rs
> +++ b/drivers/gpu/nova-core/driver.rs
> @@ -1,5 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>  
> +use core::sync::atomic::{AtomicU32, Ordering};

Somehow the wrong (non-vertical) formatting snuck back into
my patch! Arggh. I'll be glad when rustfmt support for this
can help me catch this.

> +
>  use kernel::{
>      auxiliary,
>      device::Core,
> @@ -19,6 +21,9 @@
>  
>  use crate::gpu::Gpu;
>  
> +/// Counter for generating unique auxiliary device IDs.
> +static AUXILIARY_ID_COUNTER: AtomicU32 = AtomicU32::new(0);
> +
>  #[pin_data]
>  pub(crate) struct NovaCore {
>      #[pin]
> @@ -85,12 +90,17 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> impl PinInit<Self, E
>                  GFP_KERNEL,
>              )?;
>  
> +            // TODO[XARR]: Use XArray for proper ID allocation/recycling; for now we use a simple

I also did *not* mean to leave the word "we" in there.

Lots of little glitches tonight, sorry about those.

thanks,
-- 
John Hubbard

> +            // atomic counter which never recycles IDs. A unique ID is required for multi-GPU
> +            // systems; without it, probe() fails for all but the first GPU.
> +            let aux_id = AUXILIARY_ID_COUNTER.fetch_add(1, Ordering::Relaxed);
> +
>              Ok(try_pin_init!(Self {
>                  gpu <- Gpu::new(pdev, bar.clone(), bar.access(pdev.as_ref())?),
>                  _reg <- auxiliary::Registration::new(
>                      pdev.as_ref(),
>                      c"nova-drm",
> -                    0, // TODO[XARR]: Once it lands, use XArray; for now we don't use the ID.
> +                    aux_id,
>                      crate::MODULE_NAME
>                  ),
>              }))
> 
> base-commit: 0f8a890c4524d6e4013ff225e70de2aed7e6d726