[PATCH v10 0/2] PCI/IOV: Fix SR-IOV locking races and AB-BA deadlock

Ionut Nechita (Wind River) posted 2 patches 2 weeks, 4 days ago
There is a newer version of this series
drivers/pci/iov.c       |  9 +++++----
drivers/pci/pci-sysfs.c | 20 +++++++++++++++++++-
drivers/pci/probe.c     | 11 +++++++++--
3 files changed, 33 insertions(+), 7 deletions(-)
[PATCH v10 0/2] PCI/IOV: Fix SR-IOV locking races and AB-BA deadlock
Posted by Ionut Nechita (Wind River) 2 weeks, 4 days ago
From: Ionut Nechita <ionut.nechita@windriver.com>

Hi Bjorn,

This is v10 of the fix for the SR-IOV race between driver .remove()
and concurrent hotplug events.  v10 adds a second patch to fix the
AB-BA deadlock between device_lock and pci_rescan_remove_lock that
was reported by Guenter Roeck (via Google's AI review agent) and
confirmed by Benjamin Block.

The AB-BA deadlock:

  CPU0 (remove_store)               CPU1 (unbind_store)
  --------------------              --------------------
  pci_lock_rescan_remove()
                                    device_lock()
                                    driver .remove()
                                      sriov_del_vfs()
                                        pci_lock_rescan_remove()  <-- WAITS
  pci_stop_bus_device()
    device_release_driver()
      device_lock()                                               <-- WAITS

Patch 2/2 fixes this by calling device_release_driver() in
remove_store() before pci_stop_and_remove_bus_device_locked(), so
that the driver is already unbound when pci_rescan_remove_lock is
acquired. Both paths then take locks in the same order: device_lock
first, then pci_rescan_remove_lock.

Note: the concurrent unbind_store + hotplug-event case (where the
hotplug handler takes pci_rescan_remove_lock before device_lock)
remains a known limitation.  This is a pre-existing issue that
Benjamin Block is addressing separately in:
  https://lore.kernel.org/linux-pci/354b9e4a54ced67f3c89df198041df19434fe4c8.1773235561.git.bblock@linux.ibm.com/

Changes since v9 (Mar 10):
  - NEW patch 2/2: fix AB-BA deadlock in remove_store() by calling
    device_release_driver() before pci_stop_and_remove_bus_device_locked(),
    as suggested by Benjamin Block (addresses Guenter Roeck's report)
  - Patch 1/2 unchanged from v9

Changes since v8 (Mar 9):
  - Added Reviewed-by from Niklas Schnelle (IBM) and Tested-by (s390)
  - Added Fixes tags for the three related commits
  - Removed rescan/remove locking from sriov_numvfs_store() since
    locking is now handled in sriov_add_vfs() and sriov_del_vfs()
  - Rebased on linux-next (20260309)

This race has been independently observed by multiple organizations:
  - IBM (s390 platform-generated hot-unplug events racing with
    sriov_del_vfs during PF driver unload)
  - NVIDIA (tested by Dragos Tatulea in earlier versions)
  - Intel (xe driver hitting lockdep warnings and deadlocks when
    calling pci_disable_sriov from .remove)
  - Wind River (original reporter and patch author)

Test environment:
  - Tested on s390 by Benjamin Block and Niklas Schnelle (IBM)
  - Tested on x86_64 with Intel and NVIDIA SR-IOV devices (earlier
    versions)

Based on linux-next (next-20260318).

Link: https://lore.kernel.org/linux-pci/20260214193235.262219-3-ionut.nechita@windriver.com/ [v1]
Link: https://lore.kernel.org/linux-pci/20260219212648.82606-1-ionut.nechita@windriver.com/ [v2]
Link: https://lore.kernel.org/lkml/20260225202434.18737-1-ionut.nechita@windriver.com/ [v3]
Link: https://lore.kernel.org/linux-pci/20260228120138.51197-2-ionut.nechita@windriver.com/ [v4]
Link: https://lore.kernel.org/linux-pci/20260303080903.28693-1-ionut.nechita@windriver.com/ [v5]
Link: https://lore.kernel.org/linux-pci/20260306082108.17322-1-ionut.nechita@windriver.com/ [v6]
Link: https://lore.kernel.org/linux-pci/20260308135352.80346-1-ionut.nechita@windriver.com/ [v7]
Link: https://lore.kernel.org/linux-pci/20260309194920.16459-1-ionut.nechita@windriver.com/ [v8]
Link: https://lore.kernel.org/linux-pci/20260310074303.17480-1-ionut.nechita@windriver.com/ [v9]

Ionut Nechita (2):
  PCI/IOV: Make pci_lock_rescan_remove() reentrant and protect
    sriov_add_vfs/sriov_del_vfs
  PCI: Fix AB-BA deadlock between device_lock and
    pci_rescan_remove_lock in remove_store

 drivers/pci/iov.c       |  9 +++++----
 drivers/pci/pci-sysfs.c | 20 +++++++++++++++++++-
 drivers/pci/probe.c     | 11 +++++++++--
 3 files changed, 33 insertions(+), 7 deletions(-)

--
2.43.0
Re: [PATCH v10 0/2] PCI/IOV: Fix SR-IOV locking races and AB-BA deadlock
Posted by Niklas Schnelle 2 weeks, 4 days ago
On Wed, 2026-03-18 at 23:03 +0200, Ionut Nechita (Wind River) wrote:
> From: Ionut Nechita <ionut.nechita@windriver.com>
> 
> Hi Bjorn,
> 
> This is v10 of the fix for the SR-IOV race between driver .remove()
> and concurrent hotplug events.  v10 adds a second patch to fix the
> AB-BA deadlock between device_lock and pci_rescan_remove_lock that
> was reported by Guenter Roeck (via Google's AI review agent) and
> confirmed by Benjamin Block.
> 
> The AB-BA deadlock:
> 
>   CPU0 (remove_store)               CPU1 (unbind_store)
>   --------------------              --------------------
>   pci_lock_rescan_remove()
>                                     device_lock()
>                                     driver .remove()
>                                       sriov_del_vfs()
>                                         pci_lock_rescan_remove()  <-- WAITS
>   pci_stop_bus_device()
>     device_release_driver()
>       device_lock()                                               <-- WAITS
> 
> Patch 2/2 fixes this by calling device_release_driver() in
> remove_store() before pci_stop_and_remove_bus_device_locked(), so
> that the driver is already unbound when pci_rescan_remove_lock is
> acquired. Both paths then take locks in the same order: device_lock
> first, then pci_rescan_remove_lock.
> 
> Note: the concurrent unbind_store + hotplug-event case (where the
> hotplug handler takes pci_rescan_remove_lock before device_lock)
> remains a known limitation.  This is a pre-existing issue that
> Benjamin Block is addressing separately in:
>   https://lore.kernel.org/linux-pci/354b9e4a54ced67f3c89df198041df19434fe4c8.1773235561.git.bblock@linux.ibm.com/
> 
--- snip ---
> 
> Ionut Nechita (2):
>   PCI/IOV: Make pci_lock_rescan_remove() reentrant and protect
>     sriov_add_vfs/sriov_del_vfs
>   PCI: Fix AB-BA deadlock between device_lock and
>     pci_rescan_remove_lock in remove_store
> 
>  drivers/pci/iov.c       |  9 +++++----
>  drivers/pci/pci-sysfs.c | 20 +++++++++++++++++++-
>  drivers/pci/probe.c     | 11 +++++++++--
>  3 files changed, 33 insertions(+), 7 deletions(-)
> 
> --
> 2.43.0

Hi Ionut,

For your awareness, I saw that this series has some findings on
Google's new Sashiko AI reviewing tool[0]. At a quick glance the
findings seem like at least reasonable concerns to me. I'm still
looking at this independently also of course.

Thanks,
Niklas

[0]
https://sashiko.dev/#/patchset/20260318210316.61975-1-ionut.nechita%40windriver.com
Re: [PATCH v10 0/2] PCI/IOV: Fix SR-IOV locking races and AB-BA deadlock
Posted by Ionut Nechita (Wind River) 2 weeks, 3 days ago
On Thu, 19 Mar 2026 13:31:39 +0100, Niklas Schnelle wrote:
> For your awareness, I saw that this series has some findings on
> Google's new Sashiko AI reviewing tool[0]. At a quick glance the
> findings seem like at least reasonable concerns to me. I'm still
> looking at this independently also of course.

Hi Niklas,

Thanks for pointing this out. I went through each of the Sashiko
findings carefully(+ AI). Here is my analysis:


1) Incomplete Deadlock Fix (Hierarchical Topology)
   — remove_store on a bridge still has the AB-BA between
     pci_rescan_remove_lock and device_lock(child)

   This is correct, but it is a pre-existing issue, not introduced
   by this series. The v10 cover letter explicitly acknowledges this:

     "Note: the concurrent unbind_store + hotplug-event case (where
      the hotplug handler takes pci_rescan_remove_lock before
      device_lock) remains a known limitation."

   and points to Benjamin Block's separate series that addresses the
   broader hierarchical locking problem:
     https://lore.kernel.org/linux-pci/354b9e4a54ced67f3c89df198041df19434fe4c8.1773235561.git.bblock@linux.ibm.com/

   Patch 2/2 fixes the specific deadlock reported by Guenter Roeck
   (remove_store vs unbind_store on the same device). It does not
   claim to solve all lock ordering issues in the PCI subsystem.
   Trying to do so in a stable-targeted fix would be too invasive.


2) Driver Teardown Ordering Violation
   — unbinding the bridge driver before its children causes PCIe
     errors because pci_disable_device() clears Memory/IO Enable

   This is a partially false positive. Sashiko assumes that
   pci_disable_device() on a bridge disables forwarding of
   transactions to child devices. It does not — pci_disable_device()
   clears the bus master bit and decrements the enable count, but
   does not touch the Memory Space Enable or I/O Space Enable bits
   that control transaction forwarding on bridges. Child devices
   can still access their MMIO regions through the bridge.

   What does happen is that pcie_portdrv_remove() tears down port
   services (AER, hotplug, PME, DPC) before the children are
   unbound. This means error reporting is degraded during child
   teardown, but it does not cause Unsupported Requests, Completer
   Aborts, or system crashes as Sashiko claims.

   Also worth noting: this scenario (remove_store on a bridge)
   is not the path that was deadlocking. The reported deadlocks
   were on the PF device itself, not on its parent bridge.


3) TOCTOU Race Condition / Lock Window Vulnerability
   — a driver can rebind between device_release_driver() and
     pci_stop_and_remove_bus_device_locked()

   This is theoretically valid but practically impossible. The
   window is a few instructions wide. For this race to trigger:

   a) device_remove_file_self() has already removed the "remove"
      sysfs attribute, signaling the device is being torn down
   b) a bind_store or udev probe would need to fire in exactly
      that window
   c) the newly bound driver's probe() would need to call
      pci_enable_sriov() and block on pci_rescan_remove_lock

   This is the same pattern used elsewhere in the kernel (e.g.,
   the existing remove_store already had no synchronization between
   device_remove_file_self() and pci_stop_and_remove_bus_device_locked()
   — the patch just adds one more call in between).

   If this is a real concern, it would need to be addressed as a
   separate improvement, not as a blocker for this fix.


4) Use-After-Free in remove_store
   — device_remove_file_self() breaks active sysfs protection,
     allowing concurrent device_del() to free dev

   This is a false positive. kernfs_remove_self() does three things:

   a) breaks active protection (decrements active ref)
   b) calls __kernfs_remove(kn) to unlink the node
   c) calls kernfs_unbreak_active_protection(kn) to restore the
      active ref

   After step (c), the active reference is restored. The sysfs
   write handler (kernfs_fop_write_iter) still holds this active
   reference for the duration of the store callback. A concurrent
   device_del() calling kernfs_remove() on the parent directory
   will call kernfs_drain() on all remaining children, which blocks
   until active references drop to zero. Since remove_store still
   holds the active ref (restored in step c), device_del() will
   block until remove_store returns.

   Additionally, pci_rescan_remove_lock serializes any concurrent
   PCI device removal (hotplug events also take this lock), so
   the scenario described by Sashiko cannot actually occur.

   This is the same lifetime model that the existing remove_store
   (without this patch) has always relied on.


In summary: finding 1 is a known pre-existing limitation documented
in the cover letter, findings 2 and 4 are false positives, and
finding 3 is a theoretical race that is not practically exploitable
and is not new to this patch.

I don't think a v11 is needed based on these findings. But I'm
happy to discuss further if you see something I missed in my
analysis.

Thanks,
Ionut
Re: [PATCH v10 0/2] PCI/IOV: Fix SR-IOV locking races and AB-BA deadlock
Posted by Benjamin Block 1 week, 5 days ago
On Thu, Mar 19, 2026 at 10:27:55PM +0200, Ionut Nechita (Wind River) wrote:
> On Thu, 19 Mar 2026 13:31:39 +0100, Niklas Schnelle wrote:
> > For your awareness, I saw that this series has some findings on
> > Google's new Sashiko AI reviewing tool[0]. At a quick glance the
> > findings seem like at least reasonable concerns to me. I'm still
> > looking at this independently also of course.
> 
--8<--
> 3) TOCTOU Race Condition / Lock Window Vulnerability
>    — a driver can rebind between device_release_driver() and
>      pci_stop_and_remove_bus_device_locked()
> 
>    This is theoretically valid but practically impossible. The
>    window is a few instructions wide. For this race to trigger:
> 
>    a) device_remove_file_self() has already removed the "remove"
>       sysfs attribute, signaling the device is being torn down
>    b) a bind_store or udev probe would need to fire in exactly
>       that window
>    c) the newly bound driver's probe() would need to call
>       pci_enable_sriov() and block on pci_rescan_remove_lock
> 
>    This is the same pattern used elsewhere in the kernel (e.g.,
>    the existing remove_store already had no synchronization between
>    device_remove_file_self() and pci_stop_and_remove_bus_device_locked()
>    — the patch just adds one more call in between).
> 
>    If this is a real concern, it would need to be addressed as a
>    separate improvement, not as a blocker for this fix.

I haven't had time to fully review all this yet, but one quick comment: after
the first idea to unbind the device driver I also realized we could have a
race here between unbinding, and then possibly re-binding. We could probably
prevent that by marking the device as dead:

+	if (val && device_remove_file_self(dev, attr)) {
+		device_lock(dev);
+		kill_device(dev);
+		device_unlock(dev);

This doesn't modify the reference count or anything, but only sets the private
member of the `struct device` `dead` to true. This can't be undone using the
device core's public API, and once set, a device can not be bound to a new
device-driver.

This should prevent any such race AFAICS. The unbind is protected by the
device-mutex, so once the flag is set, and the unbind is done, this device
will stay unbound.

It's not really "pretty" though.

-- 
Best Regards, Benjamin Block        /        Linux on IBM Z Kernel Development
IBM Deutschland Research & Development GmbH    /   https://www.ibm.com/privacy
Vors. Aufs.-R.: Wolfgang Wendt         /        Geschäftsführung: David Faller
Sitz der Ges.: Ehningen     /     Registergericht: AmtsG Stuttgart, HRB 243294
Re: [PATCH v10 0/2] PCI/IOV: Fix SR-IOV locking races and AB-BA deadlock
Posted by Guenter Roeck 2 weeks, 3 days ago
On 3/19/26 05:31, Niklas Schnelle wrote:
> On Wed, 2026-03-18 at 23:03 +0200, Ionut Nechita (Wind River) wrote:
>> From: Ionut Nechita <ionut.nechita@windriver.com>
>>
>> Hi Bjorn,
>>
>> This is v10 of the fix for the SR-IOV race between driver .remove()
>> and concurrent hotplug events.  v10 adds a second patch to fix the
>> AB-BA deadlock between device_lock and pci_rescan_remove_lock that
>> was reported by Guenter Roeck (via Google's AI review agent) and
>> confirmed by Benjamin Block.
>>
>> The AB-BA deadlock:
>>
>>    CPU0 (remove_store)               CPU1 (unbind_store)
>>    --------------------              --------------------
>>    pci_lock_rescan_remove()
>>                                      device_lock()
>>                                      driver .remove()
>>                                        sriov_del_vfs()
>>                                          pci_lock_rescan_remove()  <-- WAITS
>>    pci_stop_bus_device()
>>      device_release_driver()
>>        device_lock()                                               <-- WAITS
>>
>> Patch 2/2 fixes this by calling device_release_driver() in
>> remove_store() before pci_stop_and_remove_bus_device_locked(), so
>> that the driver is already unbound when pci_rescan_remove_lock is
>> acquired. Both paths then take locks in the same order: device_lock
>> first, then pci_rescan_remove_lock.
>>
>> Note: the concurrent unbind_store + hotplug-event case (where the
>> hotplug handler takes pci_rescan_remove_lock before device_lock)
>> remains a known limitation.  This is a pre-existing issue that
>> Benjamin Block is addressing separately in:
>>    https://lore.kernel.org/linux-pci/354b9e4a54ced67f3c89df198041df19434fe4c8.1773235561.git.bblock@linux.ibm.com/
>>
> --- snip ---
>>
>> Ionut Nechita (2):
>>    PCI/IOV: Make pci_lock_rescan_remove() reentrant and protect
>>      sriov_add_vfs/sriov_del_vfs
>>    PCI: Fix AB-BA deadlock between device_lock and
>>      pci_rescan_remove_lock in remove_store
>>
>>   drivers/pci/iov.c       |  9 +++++----
>>   drivers/pci/pci-sysfs.c | 20 +++++++++++++++++++-
>>   drivers/pci/probe.c     | 11 +++++++++--
>>   3 files changed, 33 insertions(+), 7 deletions(-)
>>
>> --
>> 2.43.0
> 
> Hi Ionut,
> 
> For your awareness, I saw that this series has some findings on
> Google's new Sashiko AI reviewing tool[0]. At a quick glance the
> findings seem like at least reasonable concerns to me. I'm still
> looking at this independently also of course.
> 

It is almost scary to see how many problems Sashiko is able to find.
The AB-BA deadlock that the second patch in the series tries to fix
was reported by a prototype version of it when running it on an LTS
backport.

Guenter