[PATCH 1/7] rust: pci: pass driver data by value to `unbind`

Alexandre Courbot posted 7 patches 1 month, 3 weeks ago
[PATCH 1/7] rust: pci: pass driver data by value to `unbind`
Posted by Alexandre Courbot 1 month, 3 weeks ago
When unbinding a PCI driver, the `T::unbind` callback is invoked by the
driver framework, passing the driver data as a `Pin<&T>`.

This artificially restricts what the driver can do, as it cannot mutate
any state on the data. This becomes a problem in e.g. Nova, which needs
to invoke mutable methods when unbinding.

`remove_callback` retrieves the driver data by value, and drops it right
after the call to `T::unbind`, meaning it is the only reference to the
driver data by the time `T::unbind` is called.

There is thus no reason for not granting full ownership of the data to
`T::unbind`, so do it. This allows the driver mutate the state of its
data while unbinding, and also makes the API more symmetric as the
framework grants back to the driver the data it created in the first
place.

Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
---
 drivers/gpu/nova-core/driver.rs | 2 +-
 rust/kernel/pci.rs              | 4 ++--
 samples/rust/rust_driver_pci.rs | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/nova-core/driver.rs b/drivers/gpu/nova-core/driver.rs
index b8b0cc0f2d93..b0e3e41ae9d8 100644
--- a/drivers/gpu/nova-core/driver.rs
+++ b/drivers/gpu/nova-core/driver.rs
@@ -98,7 +98,7 @@ fn probe(pdev: &pci::Device<Core>, _info: &Self::IdInfo) -> impl PinInit<Self, E
         })
     }
 
-    fn unbind(pdev: &pci::Device<Core>, this: Pin<&Self>) {
+    fn unbind(pdev: &pci::Device<Core>, this: Pin<KBox<Self>>) {
         this.gpu.unbind(pdev.as_ref());
     }
 }
diff --git a/rust/kernel/pci.rs b/rust/kernel/pci.rs
index 82e128431f08..39e70adde484 100644
--- a/rust/kernel/pci.rs
+++ b/rust/kernel/pci.rs
@@ -116,7 +116,7 @@ extern "C" fn remove_callback(pdev: *mut bindings::pci_dev) {
         // and stored a `Pin<KBox<T>>`.
         let data = unsafe { pdev.as_ref().drvdata_obtain::<T>() };
 
-        T::unbind(pdev, data.as_ref());
+        T::unbind(pdev, data);
     }
 }
 
@@ -303,7 +303,7 @@ pub trait Driver: Send {
     /// operations to gracefully tear down the device.
     ///
     /// Otherwise, release operations for driver resources should be performed in `Self::drop`.
-    fn unbind(dev: &Device<device::Core>, this: Pin<&Self>) {
+    fn unbind(dev: &Device<device::Core>, this: Pin<KBox<Self>>) {
         let _ = (dev, this);
     }
 }
diff --git a/samples/rust/rust_driver_pci.rs b/samples/rust/rust_driver_pci.rs
index 5823787bea8e..ee8fc8d48ff3 100644
--- a/samples/rust/rust_driver_pci.rs
+++ b/samples/rust/rust_driver_pci.rs
@@ -95,7 +95,7 @@ fn probe(pdev: &pci::Device<Core>, info: &Self::IdInfo) -> impl PinInit<Self, Er
         })
     }
 
-    fn unbind(pdev: &pci::Device<Core>, this: Pin<&Self>) {
+    fn unbind(pdev: &pci::Device<Core>, this: Pin<KBox<Self>>) {
         if let Ok(bar) = this.bar.access(pdev.as_ref()) {
             // Reset pci-testdev by writing a new test index.
             bar.write8(this.index.0, Regs::TEST);

-- 
2.52.0
Re: [PATCH 1/7] rust: pci: pass driver data by value to `unbind`
Posted by Danilo Krummrich 1 month, 3 weeks ago
On Tue Dec 16, 2025 at 6:13 AM CET, Alexandre Courbot wrote:
> When unbinding a PCI driver, the `T::unbind` callback is invoked by the
> driver framework, passing the driver data as a `Pin<&T>`.
>
> This artificially restricts what the driver can do, as it cannot mutate
> any state on the data. This becomes a problem in e.g. Nova, which needs
> to invoke mutable methods when unbinding.
>
> `remove_callback` retrieves the driver data by value, and drops it right
> after the call to `T::unbind`, meaning it is the only reference to the
> driver data by the time `T::unbind` is called.
>
> There is thus no reason for not granting full ownership of the data to
> `T::unbind`, so do it.

There are multiple reasons I did avoid this for:

(1) Race conditions

A driver can call Device::drvdata() and obtain a reference to the driver's
device private data as long as it has a &Device<Bound> and asserts the correct
type of the driver's device private data [1].

Assume you have an IRQ registration, for instance, that lives within this device
private data.  Within the IRQ handler, nothing prevents us from calling
Device::drvdata() given that the IRQ handler has a Device<Bound> reference.

Consequently, with passing the device private data by value to unbind() it can
happen that we have both a mutable and immutable reference at of the device
private data at the same time.

The same is true for a lot of other cases, such as work items or workqueues that
are scoped to the Device being bound living within the device private data.

More generally, you can't take full ownership of the device private data as long
as the device is not yet fully unbound (which is not yet the case in unbind()).

(2) Design

It is intentional that the device private data has a defined lifetime that ends
with the device being unbound from its driver. This way we get the guarantee
that any object stored in the device private data won't survive device / driver
unbind. If we give back the ownership to the driver, this guarantee is lost.

Conclusion:

Having that said, if you need mutable access to the fields of the device private
data within unbind() the corresponding field(s) should be protected by a lock.

Alternatively, you have mutable access within the destructor as well, but there
you don't have a bound device anymore. Which is only consequent, since we can't
call the destructor of the device private data before the device is fully
unbound.

(In fact, as by now, there is a bug with this, which I noticed a few days ago
when I still was on vacation: The bus implementations call the destructor of the
device private data too early, i.e. when the device is not fully unbound yet.

I'm working on a fix for this. Luckily, as by now, this is not a real bug in
practice, yet it has to be fixed.)

From your end you don't have to worry about this though. nova-core should just
employ a lock for this, as we will need it in the future anyways, since we will
have concurrent access to the GSP.

[1] https://rust.docs.kernel.org/kernel/device/struct.Device.html#method.drvdata
Re: [PATCH 1/7] rust: pci: pass driver data by value to `unbind`
Posted by Alexandre Courbot 1 month, 3 weeks ago
On Tue Dec 16, 2025 at 9:14 PM JST, Danilo Krummrich wrote:
> On Tue Dec 16, 2025 at 6:13 AM CET, Alexandre Courbot wrote:
>> When unbinding a PCI driver, the `T::unbind` callback is invoked by the
>> driver framework, passing the driver data as a `Pin<&T>`.
>>
>> This artificially restricts what the driver can do, as it cannot mutate
>> any state on the data. This becomes a problem in e.g. Nova, which needs
>> to invoke mutable methods when unbinding.
>>
>> `remove_callback` retrieves the driver data by value, and drops it right
>> after the call to `T::unbind`, meaning it is the only reference to the
>> driver data by the time `T::unbind` is called.
>>
>> There is thus no reason for not granting full ownership of the data to
>> `T::unbind`, so do it.
>
> There are multiple reasons I did avoid this for:
>
> (1) Race conditions
>
> A driver can call Device::drvdata() and obtain a reference to the driver's
> device private data as long as it has a &Device<Bound> and asserts the correct
> type of the driver's device private data [1].
>
> Assume you have an IRQ registration, for instance, that lives within this device
> private data.  Within the IRQ handler, nothing prevents us from calling
> Device::drvdata() given that the IRQ handler has a Device<Bound> reference.
>
> Consequently, with passing the device private data by value to unbind() it can
> happen that we have both a mutable and immutable reference at of the device
> private data at the same time.
>
> The same is true for a lot of other cases, such as work items or workqueues that
> are scoped to the Device being bound living within the device private data.
>
> More generally, you can't take full ownership of the device private data as long
> as the device is not yet fully unbound (which is not yet the case in unbind()).

Ah, I completely ignored the fact that we can indeed have other
references to the private data! The fact that `unbind` works with
`Device<Bounded>` should have given me a hint, but somehow I blissfully
ignored that. ^_^;

I will implement some basic locking on the command queue so we can work
with a non-mutable reference.