Add a method to return the Physical Function (PF) device for a Virtual
Function (VF) device in the bound device context.
Unlike for a PCI driver written in C, guarantee that when a VF device is
bound to a driver, the underlying PF device is bound to a driver, too.
When a device with enabled VFs is unbound from a driver, invoke the
sriov_configure() callback to disable SR-IOV before the unbind()
callback. To ensure the guarantee is upheld, call disable_sriov()
to remove all VF devices if the driver has not done so already.
Suggested-by: Danilo Krummrich <dakr@kernel.org>
Signed-off-by: Peter Colberg <pcolberg@redhat.com>
---
rust/kernel/pci.rs | 47 +++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 47 insertions(+)
diff --git a/rust/kernel/pci.rs b/rust/kernel/pci.rs
index f9054c52a3bdff2c42273366a4058d943ee5fd76..d6cc5d7e7cd7a3b6e451c0ff5aea044b89c6774a 100644
--- a/rust/kernel/pci.rs
+++ b/rust/kernel/pci.rs
@@ -120,6 +120,19 @@ extern "C" fn remove_callback(pdev: *mut bindings::pci_dev) {
// and stored a `Pin<KBox<T>>`.
let data = unsafe { pdev.as_ref().drvdata_obtain::<T>() };
+ // Always disable SR-IOV before `unbind()` to guarantee that when a VF device is bound
+ // to a driver, the underlying PF device is bound to a driver, too.
+ #[cfg(CONFIG_PCI_IOV)]
+ {
+ // First, allow the driver to gracefully disable SR-IOV by itself.
+ if T::HAS_SRIOV_CONFIGURE && pdev.num_vf() != 0 {
+ let _ = T::sriov_configure(pdev, 0);
+ }
+ // Then, forcibly disable SR-IOV if the driver has not done so already,
+ // to uphold the guarantee with regard to driver binding described above.
+ pdev.disable_sriov();
+ }
+
T::unbind(pdev, data.as_ref());
}
@@ -332,6 +345,11 @@ fn unbind(dev: &Device<device::Core>, this: Pin<&Self>) {
/// [`Device`] by writing the number of Virtual Functions (VF), `nr_virtfn` or zero to the
/// sysfs file `sriov_numvfs` for this device. Implementing this callback is optional.
///
+ /// Further, and unlike for a PCI driver written in C, when a PF device with enabled VFs is
+ /// unbound from its bound [`Driver`], the `sriov_configure()` callback is invoked to disable
+ /// SR-IOV before the `unbind()` callback. This guarantees that when a VF device is bound to a
+ /// driver, the underlying PF device is bound to a driver, too.
+ ///
/// Upon success, this callback must return the number of VFs that were enabled, or zero if
/// SR-IOV was disabled.
///
@@ -500,6 +518,35 @@ pub fn pci_class(&self) -> Class {
}
}
+impl Device<device::Bound> {
+ /// Returns the Physical Function (PF) device for a Virtual Function (VF) device.
+ ///
+ /// # Examples
+ ///
+ /// The following example illustrates how to obtain the private driver data of the PF device,
+ /// where `vf_pdev` is the VF device of reference type `&Device<Core>` or `&Device<Bound>`.
+ ///
+ /// ```
+ /// let pf_pdev = vf_pdev.physfn()?;
+ /// let pf_drvdata = pf_pdev.as_ref().drvdata::<MyDriver>()?;
+ /// ```
+ #[cfg(CONFIG_PCI_IOV)]
+ pub fn physfn(&self) -> Result<&Device<device::Bound>> {
+ if !self.is_virtfn() {
+ return Err(EINVAL);
+ }
+ // SAFETY:
+ // `self.as_raw` returns a valid pointer to a `struct pci_dev`.
+ //
+ // `physfn` is a valid pointer to a `struct pci_dev` since self.is_virtfn() is `true`.
+ //
+ // `physfn` may be cast to a `Device<device::Bound>` since `pci::Driver::remove()` calls
+ // `disable_sriov()` to remove all VF devices, which guarantees that the underlying
+ // PF device is always bound to a driver when the VF device is bound to a driver.
+ Ok(unsafe { &*(*self.as_raw()).__bindgen_anon_1.physfn.cast() })
+ }
+}
+
impl Device<device::Core> {
/// Enable memory resources for this device.
pub fn enable_device_mem(&self) -> Result {
--
2.51.1
On Wed, Nov 19, 2025 at 05:19:11PM -0500, Peter Colberg wrote:
> Add a method to return the Physical Function (PF) device for a Virtual
> Function (VF) device in the bound device context.
>
> Unlike for a PCI driver written in C, guarantee that when a VF device is
> bound to a driver, the underlying PF device is bound to a driver, too.
You can't do this as an absolutely statement from rust code alone,
this statement is confused.
> When a device with enabled VFs is unbound from a driver, invoke the
> sriov_configure() callback to disable SR-IOV before the unbind()
> callback. To ensure the guarantee is upheld, call disable_sriov()
> to remove all VF devices if the driver has not done so already.
This doesn't seem like it should be in this patch.
Good drivers using the PCI APIs should be calling pci_disable_sriov()
during their unbind.
The prior patch #3 should not have added "safe" bindings for
enable_sriov that allow this lifetime rule to be violated.
> + #[cfg(CONFIG_PCI_IOV)]
> + pub fn physfn(&self) -> Result<&Device<device::Bound>> {
> + if !self.is_virtfn() {
> + return Err(EINVAL);
> + }
> + // SAFETY:
> + // `self.as_raw` returns a valid pointer to a `struct pci_dev`.
> + //
> + // `physfn` is a valid pointer to a `struct pci_dev` since self.is_virtfn() is `true`.
> + //
> + // `physfn` may be cast to a `Device<device::Bound>` since `pci::Driver::remove()` calls
> + // `disable_sriov()` to remove all VF devices, which guarantees that the underlying
> + // PF device is always bound to a driver when the VF device is bound to a driver.
Wrong safety statement. There are drivers that don't call
disable_sriov(). You need to also check that the driver attached to
the PF is actually working properly.
I do not like to see this attempt to open code the tricky login of
pci_iov_get_pf_drvdata() in rust without understanding the issues :(
Jason
On Sat Nov 22, 2025 at 12:26 PM NZDT, Jason Gunthorpe wrote:
> On Wed, Nov 19, 2025 at 05:19:11PM -0500, Peter Colberg wrote:
>> Add a method to return the Physical Function (PF) device for a Virtual
>> Function (VF) device in the bound device context.
>>
>> Unlike for a PCI driver written in C, guarantee that when a VF device is
>> bound to a driver, the underlying PF device is bound to a driver, too.
>
> You can't do this as an absolutely statement from rust code alone,
> this statement is confused.
Indeed! However, I'd like to see that we actually provide this guarantee from
the C PCI code.
So far I haven't heard a convincing reason for not providing this guarantee. The
only reason not to guarantee this I have heard is that some PF drivers only
enable SR-IOV and hence could be unloaded afterwards. However, I think there is
no strong reason to do so.
What I would like to see is that we unbind VF drivers when the PF driver is
unbound in general, analogous to what we are guaranteed by the auxiliary bus.
>> + #[cfg(CONFIG_PCI_IOV)]
>> + pub fn physfn(&self) -> Result<&Device<device::Bound>> {
>> + if !self.is_virtfn() {
>> + return Err(EINVAL);
>> + }
>> + // SAFETY:
>> + // `self.as_raw` returns a valid pointer to a `struct pci_dev`.
>> + //
>> + // `physfn` is a valid pointer to a `struct pci_dev` since self.is_virtfn() is `true`.
>> + //
>> + // `physfn` may be cast to a `Device<device::Bound>` since `pci::Driver::remove()` calls
>> + // `disable_sriov()` to remove all VF devices, which guarantees that the underlying
>> + // PF device is always bound to a driver when the VF device is bound to a driver.
>
> Wrong safety statement. There are drivers that don't call
> disable_sriov(). You need to also check that the driver attached to
> the PF is actually working properly.
Indeed, with this patch, only Rust drivers provide this guarantee of the VF
being bound when the PF is bound.
> I do not like to see this attempt to open code the tricky login of
> pci_iov_get_pf_drvdata() in rust without understanding the issues :(
I discussed this with Peter in advance (thanks Peter for your work on this
topic!), and as mentioned above I'd like to see this series to propose that we
always guarantee that a VF is bound when the corresponding PF is bound.
With this, the above code will be correct and a driver can use the generic
infrastructure to:
1) Call pci::Device<Bound>::physfn() returning a Result<pci::Device<Bound>>
2) Grab the driver's device private data from the returned Device<Bound>
Note that 2) (i.e. accessing the driver's device private data with
Device::drvdata() [1]) ensures that the device is actually bound (drvdata() is
only implemented for Device<Bound>) and that the returned type actually matches
the type of the object that has been stored.
Since we always need those two checks when accessing a driver's device private
data, it is already done in the generic drvdata() accessor.
Therefore the only additional guarantee we have to give is that VF bound implies
PF bound. Otherwise physfn() would need to be unsafe and the driver would need
to promise that this is the case. From there on drvdata() already does the other
checks as mentioned.
I suggest to have a look at [2] for an example with how this turns out with the
auxiliary bus [2][3], since in the end it's the same problem, i.e. an auxiliary
driver calling into its parent, except that the auxiliary bus already guarantees
that the parent is bound when the child is bound.
Given that, there is no value in using pci_iov_get_pf_drvdata(), in Rust you'd
just call
// `vfdev` must be a `pci::Device<Bound>` for `physfn()` to be
// available; `pfdev` will therefore be a `pci::Device<Bound>` too
// (assuming we provide the guarantee for this, otherwise this would
// need to be unsafe).
let pfdev = vfdev.phyfn();
// `FooData` is the type of the PF drvier's device private data. The
// call to `drvdata()` will fail with an error of the asserted type is
// wrong.
let drvdata = pfdev.drvdata::<FooData>()?;
So, if we'd provide a Rust accessor for the PF's device driver data, we'd
implement it like above, because Device::drvdata() is already safe. If we want
pci::Device::pf_drvdata() to be safe, we'd otherwise need to do all the checks
Device::drvdata() already does again before we call into
pci_iov_get_pf_drvdata().
(Note that I'm currently travelling, hence I might not be as responsive as
usual. I'm travelling until after LPC; I plan to take a detailed look at this
series in the week of the conference).
[1] https://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git/tree/rust/kernel/device.rs?h=driver-core-next#n313
[2] https://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git/tree/samples/rust/rust_driver_auxiliary.rs?h=driver-core-next#n81
[3] https://lore.kernel.org/all/20251020223516.241050-1-dakr@kernel.org/
On Sat, Nov 22, 2025 at 11:23:16PM +1300, Danilo Krummrich wrote: > So far I haven't heard a convincing reason for not providing this guarantee. The > only reason not to guarantee this I have heard is that some PF drivers only > enable SR-IOV and hence could be unloaded afterwards. However, I think there is > no strong reason to do so. I know some people have work flows around this. I think they are wrong. When we "fixed" mlx5 a while back there was some pushback and some weird things did stop working. So while I support this goal, I don't know if enough people will agree.. > With this, the above code will be correct and a driver can use the generic > infrastructure to: > > 1) Call pci::Device<Bound>::physfn() returning a Result<pci::Device<Bound>> > 2) Grab the driver's device private data from the returned Device<Bound> > > Note that 2) (i.e. accessing the driver's device private data with > Device::drvdata() [1]) ensures that the device is actually bound (drvdata() is > only implemented for Device<Bound>) and that the returned type actually matches > the type of the object that has been stored. This is what the core helper does, with the twist that it "validates" the PF driver is working right by checking its driver binding.. > I suggest to have a look at [2] for an example with how this turns out with the > auxiliary bus [2][3], since in the end it's the same problem, i.e. an auxiliary > driver calling into its parent, except that the auxiliary bus already guarantees > that the parent is bound when the child is bound. Aux bus is a little different because it should be used in a way where there are stronger guarantees about what the parent is. Ie the aux bus names should be unique and limited to the type of parent. > So, if we'd provide a Rust accessor for the PF's device driver data, we'd > implement it like above, because Device::drvdata() is already safe. If we want > pci::Device::pf_drvdata() to be safe, we'd otherwise need to do all the checks > Device::drvdata() already does again before we call into > pci_iov_get_pf_drvdata(). I think to make progress along this line you need to still somehow validate that the PF driver is working right, either by checking that the driver is bound to a rust driver somehow or using the same approach as the core helper. I'm not sure the idea to force all drivers to do disable sriov is going to be easy, and I'd rather see rust bindings progress without opening such a topic.. Jason
On Sun Nov 23, 2025 at 5:16 AM NZDT, Jason Gunthorpe wrote: > I think to make progress along this line you need to still somehow > validate that the PF driver is working right, either by checking that > the driver is bound to a rust driver somehow or using the same > approach as the core helper. Do you refer to the if (pf_dev->driver != pf_driver) check? If so, that's (in a slightly different form) already part of the generic Device::drvdata() accessor. > I'm not sure the idea to force all drivers to do disable sriov is > going to be easy, and I'd rather see rust bindings progress without > opening such a topic.. I'm sorry, I should have mentioned what I actually propose: My idea would be to provide a bool in struct pci_driver, which, if set, guarantees that all VFs are unbound when the PF is unbound. With this bool being set, the PCI bus can provide the guarantee that VF bound implies PF bound; the Rust accessor can then leverage this guarantee. This can also be leveraged by the C code, where we could have a separate accessor that checks the bool rather than askes the driver to promise that the PF is bound, which pci_iov_get_pf_drvdata() does. (Although I have to admit that without the additional type system capabilities we have in Rust, it is not that big of an improvement.)
On Sun, Nov 23, 2025 at 11:43:08AM +1300, Danilo Krummrich wrote: > On Sun Nov 23, 2025 at 5:16 AM NZDT, Jason Gunthorpe wrote: > > I think to make progress along this line you need to still somehow > > validate that the PF driver is working right, either by checking that > > the driver is bound to a rust driver somehow or using the same > > approach as the core helper. > > Do you refer to the > > if (pf_dev->driver != pf_driver) > > check? If so, that's (in a slightly different form) already part of the generic > Device::drvdata() accessor. Yeah, but by that point it has already returned a bound device pointer without proving it is safe to do so.. If you check the flag you are proposing below then it would be OK: > > I'm not sure the idea to force all drivers to do disable sriov is > > going to be easy, and I'd rather see rust bindings progress without > > opening such a topic.. > > I'm sorry, I should have mentioned what I actually propose: > > My idea would be to provide a bool in struct pci_driver, which, if set, > guarantees that all VFs are unbound when the PF is unbound. > > With this bool being set, the PCI bus can provide the guarantee that VF bound > implies PF bound; the Rust accessor can then leverage this guarantee. > > This can also be leveraged by the C code, where we could have a separate > accessor that checks the bool rather than askes the driver to promise that the > PF is bound, which pci_iov_get_pf_drvdata() does. > > (Although I have to admit that without the additional type system capabilities > we have in Rust, it is not that big of an improvement.) This seems like it has a decent chance of succeeding.. Though as I said in my other email, you'd probably want a version where there is WARN_ON if that common code is actually triggered as it would be a driver bug to misorder its destruction. Jason
On Sat, Nov 22, 2025 at 12:16:15PM -0400, Jason Gunthorpe wrote: > On Sat, Nov 22, 2025 at 11:23:16PM +1300, Danilo Krummrich wrote: > > So far I haven't heard a convincing reason for not providing this guarantee. The > > only reason not to guarantee this I have heard is that some PF drivers only > > enable SR-IOV and hence could be unloaded afterwards. However, I think there is > > no strong reason to do so. > > I know some people have work flows around this. I think they are > wrong. When we "fixed" mlx5 a while back there was some pushback and > some weird things did stop working. I don't think that this is correct. The main use case for keeping VFs, while unloading PF driver is to allow reuse this PF for VFIO too. It is very natural and common flow for "real" SR-IOV devices which present all PF/VFs as truly separate devices. The pushback for mlx5 was related to change in eswitch mode and not to driver unload. In very corner case, mlx5 kept VFs to protect change in netdevs. > > So while I support this goal, I don't know if enough people will > agree.. I don't see that Bjorn is CCed here, so it is unclear to whom Danilo talked to get rationale for this new behaviour. > > > With this, the above code will be correct and a driver can use the generic > > infrastructure to: > > > > 1) Call pci::Device<Bound>::physfn() returning a Result<pci::Device<Bound>> > > 2) Grab the driver's device private data from the returned Device<Bound> > > > > Note that 2) (i.e. accessing the driver's device private data with > > Device::drvdata() [1]) ensures that the device is actually bound (drvdata() is > > only implemented for Device<Bound>) and that the returned type actually matches > > the type of the object that has been stored. > > This is what the core helper does, with the twist that it "validates" > the PF driver is working right by checking its driver binding.. > > > I suggest to have a look at [2] for an example with how this turns out with the > > auxiliary bus [2][3], since in the end it's the same problem, i.e. an auxiliary > > driver calling into its parent, except that the auxiliary bus already guarantees > > that the parent is bound when the child is bound. > > Aux bus is a little different because it should be used in a way where > there are stronger guarantees about what the parent is. Ie the aux bus > names should be unique and limited to the type of parent. Right, aux bus is completely unrelated to real physical PCI bus. Thanks
On Sun Nov 23, 2025 at 7:57 AM NZDT, Leon Romanovsky wrote: > On Sat, Nov 22, 2025 at 12:16:15PM -0400, Jason Gunthorpe wrote: >> On Sat, Nov 22, 2025 at 11:23:16PM +1300, Danilo Krummrich wrote: >> > So far I haven't heard a convincing reason for not providing this guarantee. The >> > only reason not to guarantee this I have heard is that some PF drivers only >> > enable SR-IOV and hence could be unloaded afterwards. However, I think there is >> > no strong reason to do so. >> >> I know some people have work flows around this. I think they are >> wrong. When we "fixed" mlx5 a while back there was some pushback and >> some weird things did stop working. > > I don't think that this is correct. The main use case for keeping VFs, > while unloading PF driver is to allow reuse this PF for VFIO too. It is > very natural and common flow for "real" SR-IOV devices which present all > PF/VFs as truly separate devices. That sounds a bit odd to me, what exactly do you mean with "reuse the PF for VFIO"? What do you do with the PF after driver unload instead? Load another driver? If so, why separate ones? >> So while I support this goal, I don't know if enough people will >> agree.. > > I don't see that Bjorn is CCed here, so it is unclear to whom Danilo > talked to get rationale for this new behaviour. Not sure what you mean, Bjorn is CC'd on the whole series and hence also this conversation. Otherwise, my proposal to guarantee that the PF is bound as long as the VF(s) are bound stems from the corresponding beneficial lifecycle implications for the driver model (some more on this below). >> > With this, the above code will be correct and a driver can use the generic >> > infrastructure to: >> > >> > 1) Call pci::Device<Bound>::physfn() returning a Result<pci::Device<Bound>> >> > 2) Grab the driver's device private data from the returned Device<Bound> >> > >> > Note that 2) (i.e. accessing the driver's device private data with >> > Device::drvdata() [1]) ensures that the device is actually bound (drvdata() is >> > only implemented for Device<Bound>) and that the returned type actually matches >> > the type of the object that has been stored. >> >> This is what the core helper does, with the twist that it "validates" >> the PF driver is working right by checking its driver binding.. >> >> > I suggest to have a look at [2] for an example with how this turns out with the >> > auxiliary bus [2][3], since in the end it's the same problem, i.e. an auxiliary >> > driver calling into its parent, except that the auxiliary bus already guarantees >> > that the parent is bound when the child is bound. >> >> Aux bus is a little different because it should be used in a way where >> there are stronger guarantees about what the parent is. Ie the aux bus >> names should be unique and limited to the type of parent. > > Right, aux bus is completely unrelated to real physical PCI bus. Of course the auxiliary and the PCI bus are fundamentally different busses, *but* they also have commonalities: the driver lifecycle requirements drivers have to deal with are the same. For instance, if you call from an auxiliary driver into the parent driver to let the parent driver do some work on behalf, the auxiliary driver has to guarantee that the parent device is guranteed to remain bound for the duration of the call, otherwise the parent driver can't call dev_get_drvdata() without risking a UAF, since a concurrent unbind might free the parent driver's device private data. The same is true for PCI when a VF driver calls into the PF driver to do some work on behalf. The VF driver has to make sure that the PF driver remains bound for the whole duration of the call for the exact same reasons. I.e. it is a common driver lifecycle pattern for interactions between drivers in general. As for Rust specifically, we can actually model such driver lifecycle patterns with the type system and with that do a lot of the driver lifecycle checks (that drivers love to ignore :) even at compile time, such that your code doesn't even compile when you violate the driver lifecycle rules. For the auxiliary bus that's already the case and I'd love to get to this point with PF <-> VF interactions (in Rust) as well. I hope this helps. - Danilo
On Sun, Nov 23, 2025 at 11:26:52AM +1300, Danilo Krummrich wrote: > On Sun Nov 23, 2025 at 7:57 AM NZDT, Leon Romanovsky wrote: > > On Sat, Nov 22, 2025 at 12:16:15PM -0400, Jason Gunthorpe wrote: > >> On Sat, Nov 22, 2025 at 11:23:16PM +1300, Danilo Krummrich wrote: > >> > So far I haven't heard a convincing reason for not providing this guarantee. The > >> > only reason not to guarantee this I have heard is that some PF drivers only > >> > enable SR-IOV and hence could be unloaded afterwards. However, I think there is > >> > no strong reason to do so. > >> > >> I know some people have work flows around this. I think they are > >> wrong. When we "fixed" mlx5 a while back there was some pushback and > >> some weird things did stop working. > > > > I don't think that this is correct. The main use case for keeping VFs, > > while unloading PF driver is to allow reuse this PF for VFIO too. It is > > very natural and common flow for "real" SR-IOV devices which present all > > PF/VFs as truly separate devices. > > That sounds a bit odd to me, what exactly do you mean with "reuse the PF for > VFIO"? What do you do with the PF after driver unload instead? Load another > driver? If so, why separate ones? One of the main use cases for SR-IOV is to provide to VM users/customers devices with performance and security promises as physical ones. In this case, the VMs are created through PF and not bound to any driver. Once customer/user requests VM, that VF is bound to vfio-pci driver and attached to that VM. In many cases, PF is unbound too from its original driver and attached to some other VM. It allows for these VM providers to maximize utilization of their SR-IOV devices. At least in PCI spec 6.0.1, it stays clearly that PF can be attached to SI (VM in spec language). "Physical Function (PF) - A PF is a PCIe Function that supports the SR-IOV Extended Capability and is accessible to an SR-PCIM, a VI, or an SI." > > >> So while I support this goal, I don't know if enough people will > >> agree.. > > > > I don't see that Bjorn is CCed here, so it is unclear to whom Danilo > > talked to get rationale for this new behaviour. > > Not sure what you mean, Bjorn is CC'd on the whole series and hence also this > conversation. > > Otherwise, my proposal to guarantee that the PF is bound as long as the VF(s) > are bound stems from the corresponding beneficial lifecycle implications for the > driver model (some more on this below). > > >> > With this, the above code will be correct and a driver can use the generic > >> > infrastructure to: > >> > > >> > 1) Call pci::Device<Bound>::physfn() returning a Result<pci::Device<Bound>> > >> > 2) Grab the driver's device private data from the returned Device<Bound> > >> > > >> > Note that 2) (i.e. accessing the driver's device private data with > >> > Device::drvdata() [1]) ensures that the device is actually bound (drvdata() is > >> > only implemented for Device<Bound>) and that the returned type actually matches > >> > the type of the object that has been stored. > >> > >> This is what the core helper does, with the twist that it "validates" > >> the PF driver is working right by checking its driver binding.. > >> > >> > I suggest to have a look at [2] for an example with how this turns out with the > >> > auxiliary bus [2][3], since in the end it's the same problem, i.e. an auxiliary > >> > driver calling into its parent, except that the auxiliary bus already guarantees > >> > that the parent is bound when the child is bound. > >> > >> Aux bus is a little different because it should be used in a way where > >> there are stronger guarantees about what the parent is. Ie the aux bus > >> names should be unique and limited to the type of parent. > > > > Right, aux bus is completely unrelated to real physical PCI bus. > > Of course the auxiliary and the PCI bus are fundamentally different busses, > *but* they also have commonalities: the driver lifecycle requirements drivers > have to deal with are the same. > > For instance, if you call from an auxiliary driver into the parent driver to let > the parent driver do some work on behalf, the auxiliary driver has to guarantee > that the parent device is guranteed to remain bound for the duration of the > call, otherwise the parent driver can't call dev_get_drvdata() without risking a > UAF, since a concurrent unbind might free the parent driver's device private > data. > > The same is true for PCI when a VF driver calls into the PF driver to do some > work on behalf. The VF driver has to make sure that the PF driver remains bound > for the whole duration of the call for the exact same reasons. And this section is not entirely correct. It is one of the possible usages of PF. In many devices, PF and its VFs are independent and don't communicate through kernel at all. There is no need for VF to have PF be bounded to its original driver. > > I.e. it is a common driver lifecycle pattern for interactions between drivers in > general. And it is not true as well. In case you need to interact between devices/drivers, it is up to the caller to ensure that this driver isn't unloaded. > > As for Rust specifically, we can actually model such driver lifecycle patterns > with the type system and with that do a lot of the driver lifecycle checks (that > drivers love to ignore :) even at compile time, such that your code doesn't even > compile when you violate the driver lifecycle rules. > > For the auxiliary bus that's already the case and I'd love to get to this point > with PF <-> VF interactions (in Rust) as well. It is absolutely correct thing to do for auxbus. It is very questionable for SR-IOV. > > I hope this helps. Thanks > > - Danilo
On Sun Nov 23, 2025 at 7:34 PM NZDT, Leon Romanovsky wrote: > On Sun, Nov 23, 2025 at 11:26:52AM +1300, Danilo Krummrich wrote: >> On Sun Nov 23, 2025 at 7:57 AM NZDT, Leon Romanovsky wrote: >> > On Sat, Nov 22, 2025 at 12:16:15PM -0400, Jason Gunthorpe wrote: >> >> On Sat, Nov 22, 2025 at 11:23:16PM +1300, Danilo Krummrich wrote: >> >> > So far I haven't heard a convincing reason for not providing this guarantee. The >> >> > only reason not to guarantee this I have heard is that some PF drivers only >> >> > enable SR-IOV and hence could be unloaded afterwards. However, I think there is >> >> > no strong reason to do so. >> >> >> >> I know some people have work flows around this. I think they are >> >> wrong. When we "fixed" mlx5 a while back there was some pushback and >> >> some weird things did stop working. >> > >> > I don't think that this is correct. The main use case for keeping VFs, >> > while unloading PF driver is to allow reuse this PF for VFIO too. It is >> > very natural and common flow for "real" SR-IOV devices which present all >> > PF/VFs as truly separate devices. >> >> That sounds a bit odd to me, what exactly do you mean with "reuse the PF for >> VFIO"? What do you do with the PF after driver unload instead? Load another >> driver? If so, why separate ones? > > One of the main use cases for SR-IOV is to provide to VM users/customers > devices with performance and security promises as physical ones. In this > case, the VMs are created through PF and not bound to any driver. Once > customer/user requests VM, that VF is bound to vfio-pci driver and > attached to that VM. > > In many cases, PF is unbound too from its original driver and attached > to some other VM. It allows for these VM providers to maximize > utilization of their SR-IOV devices. > > At least in PCI spec 6.0.1, it stays clearly that PF can be attached to SI (VM in spec language). > "Physical Function (PF) - A PF is a PCIe Function that supports the SR-IOV Extended Capability > and is accessible to an SR-PCIM, a VI, or an SI." Hm, that's possible, but do we have cases of this in practice where we bind and unbind the same PF multiple times, pass it to different VMs, etc.? In any case, what we can always do is what I propose in [1], i.e. add a bool to struct pci_driver that indicates whether the VFs should be unbound when the PF is unbound, such that we can uphold the guarantee of VF bound implies PF bound at least conditionally. [1] https://lore.kernel.org/all/DEFL4TG0WX1C.2GLH4417EPU3V@kernel.org/ >> >> So while I support this goal, I don't know if enough people will >> >> agree.. >> > >> > I don't see that Bjorn is CCed here, so it is unclear to whom Danilo >> > talked to get rationale for this new behaviour. >> >> Not sure what you mean, Bjorn is CC'd on the whole series and hence also this >> conversation. >> >> Otherwise, my proposal to guarantee that the PF is bound as long as the VF(s) >> are bound stems from the corresponding beneficial lifecycle implications for the >> driver model (some more on this below). >> >> >> > With this, the above code will be correct and a driver can use the generic >> >> > infrastructure to: >> >> > >> >> > 1) Call pci::Device<Bound>::physfn() returning a Result<pci::Device<Bound>> >> >> > 2) Grab the driver's device private data from the returned Device<Bound> >> >> > >> >> > Note that 2) (i.e. accessing the driver's device private data with >> >> > Device::drvdata() [1]) ensures that the device is actually bound (drvdata() is >> >> > only implemented for Device<Bound>) and that the returned type actually matches >> >> > the type of the object that has been stored. >> >> >> >> This is what the core helper does, with the twist that it "validates" >> >> the PF driver is working right by checking its driver binding.. >> >> >> >> > I suggest to have a look at [2] for an example with how this turns out with the >> >> > auxiliary bus [2][3], since in the end it's the same problem, i.e. an auxiliary >> >> > driver calling into its parent, except that the auxiliary bus already guarantees >> >> > that the parent is bound when the child is bound. >> >> >> >> Aux bus is a little different because it should be used in a way where >> >> there are stronger guarantees about what the parent is. Ie the aux bus >> >> names should be unique and limited to the type of parent. >> > >> > Right, aux bus is completely unrelated to real physical PCI bus. >> >> Of course the auxiliary and the PCI bus are fundamentally different busses, >> *but* they also have commonalities: the driver lifecycle requirements drivers >> have to deal with are the same. >> >> For instance, if you call from an auxiliary driver into the parent driver to let >> the parent driver do some work on behalf, the auxiliary driver has to guarantee >> that the parent device is guranteed to remain bound for the duration of the >> call, otherwise the parent driver can't call dev_get_drvdata() without risking a >> UAF, since a concurrent unbind might free the parent driver's device private >> data. >> >> The same is true for PCI when a VF driver calls into the PF driver to do some >> work on behalf. The VF driver has to make sure that the PF driver remains bound >> for the whole duration of the call for the exact same reasons. > > And this section is not entirely correct. It is one of the possible > usages of PF. In many devices, PF and its VFs are independent and don't > communicate through kernel at all. There is no need for VF to have PF be > bounded to its original driver. Of course -- similarly there might be cases where an auxiliary driver might not cummunicate with its parent, but that's not the case I'm referring to above. > >> >> I.e. it is a common driver lifecycle pattern for interactions between drivers in >> general. > > And it is not true as well. In case you need to interact between > devices/drivers, it is up to the caller to ensure that this driver isn't > unloaded. You're mixing two things here. The driver model lifecycle requires that if driver A calls into driver B - where B accesses its device private data - that B is bound for the full duration of the call. This is true for any such interaction, the question of who is responsible to uphold this guarantee, e.g. the bus layer or the driver itself is orthogonal. Whenever possible, it's best to let the bus layer uphold the guarantee though, since it's one pitfall less a driver can fall into. >> >> As for Rust specifically, we can actually model such driver lifecycle patterns >> with the type system and with that do a lot of the driver lifecycle checks (that >> drivers love to ignore :) even at compile time, such that your code doesn't even >> compile when you violate the driver lifecycle rules. >> >> For the auxiliary bus that's already the case and I'd love to get to this point >> with PF <-> VF interactions (in Rust) as well. > > It is absolutely correct thing to do for auxbus. It is very questionable for SR-IOV. At least conditionally (as proposed above), it's an improvement for cases where there is PF <-> VF interactions, i.e. why let drivers take care if the bus can already do it for them.
On Sun, Nov 23, 2025 at 11:07:47PM +1300, Danilo Krummrich wrote: > On Sun Nov 23, 2025 at 7:34 PM NZDT, Leon Romanovsky wrote: > > On Sun, Nov 23, 2025 at 11:26:52AM +1300, Danilo Krummrich wrote: > >> On Sun Nov 23, 2025 at 7:57 AM NZDT, Leon Romanovsky wrote: > >> > On Sat, Nov 22, 2025 at 12:16:15PM -0400, Jason Gunthorpe wrote: > >> >> On Sat, Nov 22, 2025 at 11:23:16PM +1300, Danilo Krummrich wrote: > >> >> > So far I haven't heard a convincing reason for not providing this guarantee. The > >> >> > only reason not to guarantee this I have heard is that some PF drivers only > >> >> > enable SR-IOV and hence could be unloaded afterwards. However, I think there is > >> >> > no strong reason to do so. > >> >> > >> >> I know some people have work flows around this. I think they are > >> >> wrong. When we "fixed" mlx5 a while back there was some pushback and > >> >> some weird things did stop working. > >> > > >> > I don't think that this is correct. The main use case for keeping VFs, > >> > while unloading PF driver is to allow reuse this PF for VFIO too. It is > >> > very natural and common flow for "real" SR-IOV devices which present all > >> > PF/VFs as truly separate devices. > >> > >> That sounds a bit odd to me, what exactly do you mean with "reuse the PF for > >> VFIO"? What do you do with the PF after driver unload instead? Load another > >> driver? If so, why separate ones? > > > > One of the main use cases for SR-IOV is to provide to VM users/customers > > devices with performance and security promises as physical ones. In this > > case, the VMs are created through PF and not bound to any driver. Once > > customer/user requests VM, that VF is bound to vfio-pci driver and > > attached to that VM. > > > > In many cases, PF is unbound too from its original driver and attached > > to some other VM. It allows for these VM providers to maximize > > utilization of their SR-IOV devices. > > > > At least in PCI spec 6.0.1, it stays clearly that PF can be attached to SI (VM in spec language). > > "Physical Function (PF) - A PF is a PCIe Function that supports the SR-IOV Extended Capability > > and is accessible to an SR-PCIM, a VI, or an SI." > > Hm, that's possible, but do we have cases of this in practice where we bind and > unbind the same PF multiple times, pass it to different VMs, etc.? It is very common case, when the goal is to maximize hardware utilization. > > In any case, what we can always do is what I propose in [1], i.e. add a bool to > struct pci_driver that indicates whether the VFs should be unbound when the PF > is unbound, such that we can uphold the guarantee of VF bound implies PF bound > at least conditionally. > > [1] https://lore.kernel.org/all/DEFL4TG0WX1C.2GLH4417EPU3V@kernel.org/ > > >> >> So while I support this goal, I don't know if enough people will > >> >> agree.. > >> > > >> > I don't see that Bjorn is CCed here, so it is unclear to whom Danilo > >> > talked to get rationale for this new behaviour. > >> > >> Not sure what you mean, Bjorn is CC'd on the whole series and hence also this > >> conversation. > >> > >> Otherwise, my proposal to guarantee that the PF is bound as long as the VF(s) > >> are bound stems from the corresponding beneficial lifecycle implications for the > >> driver model (some more on this below). > >> > >> >> > With this, the above code will be correct and a driver can use the generic > >> >> > infrastructure to: > >> >> > > >> >> > 1) Call pci::Device<Bound>::physfn() returning a Result<pci::Device<Bound>> > >> >> > 2) Grab the driver's device private data from the returned Device<Bound> > >> >> > > >> >> > Note that 2) (i.e. accessing the driver's device private data with > >> >> > Device::drvdata() [1]) ensures that the device is actually bound (drvdata() is > >> >> > only implemented for Device<Bound>) and that the returned type actually matches > >> >> > the type of the object that has been stored. > >> >> > >> >> This is what the core helper does, with the twist that it "validates" > >> >> the PF driver is working right by checking its driver binding.. > >> >> > >> >> > I suggest to have a look at [2] for an example with how this turns out with the > >> >> > auxiliary bus [2][3], since in the end it's the same problem, i.e. an auxiliary > >> >> > driver calling into its parent, except that the auxiliary bus already guarantees > >> >> > that the parent is bound when the child is bound. > >> >> > >> >> Aux bus is a little different because it should be used in a way where > >> >> there are stronger guarantees about what the parent is. Ie the aux bus > >> >> names should be unique and limited to the type of parent. > >> > > >> > Right, aux bus is completely unrelated to real physical PCI bus. > >> > >> Of course the auxiliary and the PCI bus are fundamentally different busses, > >> *but* they also have commonalities: the driver lifecycle requirements drivers > >> have to deal with are the same. > >> > >> For instance, if you call from an auxiliary driver into the parent driver to let > >> the parent driver do some work on behalf, the auxiliary driver has to guarantee > >> that the parent device is guranteed to remain bound for the duration of the > >> call, otherwise the parent driver can't call dev_get_drvdata() without risking a > >> UAF, since a concurrent unbind might free the parent driver's device private > >> data. > >> > >> The same is true for PCI when a VF driver calls into the PF driver to do some > >> work on behalf. The VF driver has to make sure that the PF driver remains bound > >> for the whole duration of the call for the exact same reasons. > > > > And this section is not entirely correct. It is one of the possible > > usages of PF. In many devices, PF and its VFs are independent and don't > > communicate through kernel at all. There is no need for VF to have PF be > > bounded to its original driver. > > Of course -- similarly there might be cases where an auxiliary driver might not > cummunicate with its parent, but that's not the case I'm referring to above. Not really. Auxbus is a logical in-kernel separation of one physical device. The promise is that parent auxdevice is tied to its children. It was done intentionally to make sure that children don't outlive their parent. It is not the case for SR-IOV. > > > > >> > >> I.e. it is a common driver lifecycle pattern for interactions between drivers in > >> general. > > > > And it is not true as well. In case you need to interact between > > devices/drivers, it is up to the caller to ensure that this driver isn't > > unloaded. > > You're mixing two things here. The driver model lifecycle requires that if > driver A calls into driver B - where B accesses its device private data - that B > is bound for the full duration of the call. I'm aware of this, and we are not talking about driver model. Whole discussion is "if PF can be unbound, while VFs exist". The answer is yes, it can, both from PCI spec perspective and from operational one. > > This is true for any such interaction, the question of who is responsible to > uphold this guarantee, e.g. the bus layer or the driver itself is orthogonal. > > Whenever possible, it's best to let the bus layer uphold the guarantee though, > since it's one pitfall less a driver can fall into. > > >> > >> As for Rust specifically, we can actually model such driver lifecycle patterns > >> with the type system and with that do a lot of the driver lifecycle checks (that > >> drivers love to ignore :) even at compile time, such that your code doesn't even > >> compile when you violate the driver lifecycle rules. > >> > >> For the auxiliary bus that's already the case and I'd love to get to this point > >> with PF <-> VF interactions (in Rust) as well. > > > > It is absolutely correct thing to do for auxbus. It is very questionable for SR-IOV. > > At least conditionally (as proposed above), it's an improvement for cases where > there is PF <-> VF interactions, i.e. why let drivers take care if the bus can > already do it for them. I'm not against some opt-in/opt-out solution. I'm against "one size fits all" proposal, where PF is tied to its original driver as long as VFs exist. Thanks
On Sun, Nov 23, 2025 at 01:18:23PM +0200, Leon Romanovsky wrote: > > >> That sounds a bit odd to me, what exactly do you mean with "reuse the PF for > > >> VFIO"? What do you do with the PF after driver unload instead? Load another > > >> driver? If so, why separate ones? > > > > > > One of the main use cases for SR-IOV is to provide to VM users/customers > > > devices with performance and security promises as physical ones. In this > > > case, the VMs are created through PF and not bound to any driver. Once > > > customer/user requests VM, that VF is bound to vfio-pci driver and > > > attached to that VM. > > > > > > In many cases, PF is unbound too from its original driver and attached > > > to some other VM. It allows for these VM providers to maximize > > > utilization of their SR-IOV devices. > > > > > > At least in PCI spec 6.0.1, it stays clearly that PF can be attached to SI (VM in spec language). > > > "Physical Function (PF) - A PF is a PCIe Function that supports the SR-IOV Extended Capability > > > and is accessible to an SR-PCIM, a VI, or an SI." > > > > Hm, that's possible, but do we have cases of this in practice where we bind and > > unbind the same PF multiple times, pass it to different VMs, etc.? > > It is very common case, when the goal is to maximize hardware utilization. It is a sort of common configuration, but VFIO should be driving the PF directly using its native SRIOV support. There is no need to rebind a driver while SRIOV is still enabled. > > You're mixing two things here. The driver model lifecycle requires that if > > driver A calls into driver B - where B accesses its device private data - that B > > is bound for the full duration of the call. > > I'm aware of this, and we are not talking about driver model. Whole > discussion is "if PF can be unbound, while VFs exist". The answer is yes, it can, > both from PCI spec perspective and from operational one. This whole discussion highlights my original feeling.. While I think it makes alot of sense to tie the VF lifecycle to the PF driver binding universally there are enough contrary opinions. > > At least conditionally (as proposed above), it's an improvement for cases where > > there is PF <-> VF interactions, i.e. why let drivers take care if the bus can > > already do it for them. Drivers like mlx5 have a sequencing requirement during shutdown, it wants to see SRIOV turned off before it moves on to other steps. This is probably somewhat common.. So while it is nice for the bus to guarentee it, it probably also signals there is a bug somewhere if that code gets used.. Jason
On Mon, Nov 24, 2025 at 09:57:25AM -0400, Jason Gunthorpe wrote: > On Sun, Nov 23, 2025 at 01:18:23PM +0200, Leon Romanovsky wrote: > > > >> That sounds a bit odd to me, what exactly do you mean with "reuse the PF for > > > >> VFIO"? What do you do with the PF after driver unload instead? Load another > > > >> driver? If so, why separate ones? > > > > > > > > One of the main use cases for SR-IOV is to provide to VM users/customers > > > > devices with performance and security promises as physical ones. In this > > > > case, the VMs are created through PF and not bound to any driver. Once > > > > customer/user requests VM, that VF is bound to vfio-pci driver and > > > > attached to that VM. > > > > > > > > In many cases, PF is unbound too from its original driver and attached > > > > to some other VM. It allows for these VM providers to maximize > > > > utilization of their SR-IOV devices. > > > > > > > > At least in PCI spec 6.0.1, it stays clearly that PF can be attached to SI (VM in spec language). > > > > "Physical Function (PF) - A PF is a PCIe Function that supports the SR-IOV Extended Capability > > > > and is accessible to an SR-PCIM, a VI, or an SI." > > > > > > Hm, that's possible, but do we have cases of this in practice where we bind and > > > unbind the same PF multiple times, pass it to different VMs, etc.? > > > > It is very common case, when the goal is to maximize hardware utilization. > > It is a sort of common configuration, but VFIO should be driving the > PF directly using its native SRIOV support. There is no need to rebind > a driver while SRIOV is still enabled. It depends on how you created these VFs. If you created them through native driver, you will need to unbind and bind it to VFIO. Thanks
On Mon, Nov 24, 2025 at 04:53:32PM +0200, Leon Romanovsky wrote: > On Mon, Nov 24, 2025 at 09:57:25AM -0400, Jason Gunthorpe wrote: > > On Sun, Nov 23, 2025 at 01:18:23PM +0200, Leon Romanovsky wrote: > > > > >> That sounds a bit odd to me, what exactly do you mean with "reuse the PF for > > > > >> VFIO"? What do you do with the PF after driver unload instead? Load another > > > > >> driver? If so, why separate ones? > > > > > > > > > > One of the main use cases for SR-IOV is to provide to VM users/customers > > > > > devices with performance and security promises as physical ones. In this > > > > > case, the VMs are created through PF and not bound to any driver. Once > > > > > customer/user requests VM, that VF is bound to vfio-pci driver and > > > > > attached to that VM. > > > > > > > > > > In many cases, PF is unbound too from its original driver and attached > > > > > to some other VM. It allows for these VM providers to maximize > > > > > utilization of their SR-IOV devices. > > > > > > > > > > At least in PCI spec 6.0.1, it stays clearly that PF can be attached to SI (VM in spec language). > > > > > "Physical Function (PF) - A PF is a PCIe Function that supports the SR-IOV Extended Capability > > > > > and is accessible to an SR-PCIM, a VI, or an SI." > > > > > > > > Hm, that's possible, but do we have cases of this in practice where we bind and > > > > unbind the same PF multiple times, pass it to different VMs, etc.? > > > > > > It is very common case, when the goal is to maximize hardware utilization. > > > > It is a sort of common configuration, but VFIO should be driving the > > PF directly using its native SRIOV support. There is no need to rebind > > a driver while SRIOV is still enabled. > > It depends on how you created these VFs. If you created them through > native driver, you will need to unbind and bind it to VFIO. That should not be done or encouraged. Jason
On Mon, Nov 24, 2025 at 11:04:50AM -0400, Jason Gunthorpe wrote: > On Mon, Nov 24, 2025 at 04:53:32PM +0200, Leon Romanovsky wrote: > > On Mon, Nov 24, 2025 at 09:57:25AM -0400, Jason Gunthorpe wrote: > > > On Sun, Nov 23, 2025 at 01:18:23PM +0200, Leon Romanovsky wrote: > > > > > >> That sounds a bit odd to me, what exactly do you mean with "reuse the PF for > > > > > >> VFIO"? What do you do with the PF after driver unload instead? Load another > > > > > >> driver? If so, why separate ones? > > > > > > > > > > > > One of the main use cases for SR-IOV is to provide to VM users/customers > > > > > > devices with performance and security promises as physical ones. In this > > > > > > case, the VMs are created through PF and not bound to any driver. Once > > > > > > customer/user requests VM, that VF is bound to vfio-pci driver and > > > > > > attached to that VM. > > > > > > > > > > > > In many cases, PF is unbound too from its original driver and attached > > > > > > to some other VM. It allows for these VM providers to maximize > > > > > > utilization of their SR-IOV devices. > > > > > > > > > > > > At least in PCI spec 6.0.1, it stays clearly that PF can be attached to SI (VM in spec language). > > > > > > "Physical Function (PF) - A PF is a PCIe Function that supports the SR-IOV Extended Capability > > > > > > and is accessible to an SR-PCIM, a VI, or an SI." > > > > > > > > > > Hm, that's possible, but do we have cases of this in practice where we bind and > > > > > unbind the same PF multiple times, pass it to different VMs, etc.? > > > > > > > > It is very common case, when the goal is to maximize hardware utilization. > > > > > > It is a sort of common configuration, but VFIO should be driving the > > > PF directly using its native SRIOV support. There is no need to rebind > > > a driver while SRIOV is still enabled. > > > > It depends on how you created these VFs. If you created them through > > native driver, you will need to unbind and bind it to VFIO. > > That should not be done or encouraged. Maybe, but this is how it is done now. Thanks > > Jason
Hi Peter,
kernel test robot noticed the following build errors:
[auto build test ERROR on e4addc7cc2dfcc19f1c8c8e47f3834b22cb21559]
url: https://github.com/intel-lab-lkp/linux/commits/Peter-Colberg/rust-pci-add-is_virtfn-to-check-for-VFs/20251120-062302
base: e4addc7cc2dfcc19f1c8c8e47f3834b22cb21559
patch link: https://lore.kernel.org/r/20251119-rust-pci-sriov-v1-7-883a94599a97%40redhat.com
patch subject: [PATCH 7/8] rust: pci: add physfn(), to return PF device for VF device
config: um-randconfig-001-20251121 (https://download.01.org/0day-ci/archive/20251121/202511211511.hUcSTiSF-lkp@intel.com/config)
compiler: clang version 22.0.0git (https://github.com/llvm/llvm-project 9e9fe08b16ea2c4d9867fb4974edf2a3776d6ece)
rustc: rustc 1.88.0 (6b00bc388 2025-06-23)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20251121/202511211511.hUcSTiSF-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202511211511.hUcSTiSF-lkp@intel.com/
All errors (new ones prefixed by >>):
>> error[E0609]: no field `__bindgen_anon_1` on type `bindings::pci_dev`
--> rust/kernel/pci.rs:546:40
|
546 | Ok(unsafe { &*(*self.as_raw()).__bindgen_anon_1.physfn.cast() })
| ^^^^^^^^^^^^^^^^ unknown field
|
= note: available field is: `_address`
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
© 2016 - 2025 Red Hat, Inc.