Hi Bjorn,

This is v5 of the fix for the SR-IOV race between driver .remove()
and concurrent hotplug events (particularly on s390).

Changes since v4 (Feb 28):
- Replaced local pci_rescan_remove_owner variable with
  mutex_get_owner() to check lock ownership, as suggested by
  Manivannan Sadhasivam and agreed to by Benjamin Block
- Removed owner tracking from pci_lock_rescan_remove() and
  pci_unlock_rescan_remove(); they are now unchanged from upstream
- Rebased on linux-next (next-20260302)

Changes since v3 (Feb 25):
- Rebased on linux-next (next-20260227)
- Declared pci_rescan_remove_owner as const pointer
  (const struct task_struct *) to make clear it is not meant to
  modify the task (Benjamin Block)
- Added Reviewed-by and Tested-by from Benjamin Block (IBM)

Changes since v2 (Feb 19):
- Rebased on linux-next (next-20260225)
- Added Tested-by from Dragos Tatulea (NVIDIA)
- No code changes from v2

Changes since v1 (Feb 14):
- Renamed from pci_lock_rescan_remove_nested() to
  pci_lock_rescan_remove_reentrant() to avoid confusion with
  mutex_lock_nested() lockdep annotations (Benjamin Block)
- Added pci_unlock_rescan_remove_reentrant(const bool locked) helper
  to avoid open-coding conditional unlock at each call site
  (Benjamin Block)
- Moved declarations from drivers/pci/pci.h to include/linux/pci.h
  alongside existing lock/unlock declarations (Benjamin Block)
- Simplified callers: removed negation of return value and manual
  conditional unlock in favor of the paired lock/unlock helpers

The problem: on s390, platform-generated hot-unplug events for VFs
can race with sriov_del_vfs() when a PF driver is being unloaded.
The platform event handler takes pci_rescan_remove_lock, but
sriov_del_vfs() does not, leading to double removal and list
corruption. We cannot use a plain mutex_lock() because
sriov_del_vfs() may be called from paths that already hold the
lock (deadlock), and mutex_trylock() cannot distinguish self from
other holders.

The fix introduces pci_lock_rescan_remove_reentrant() which uses
mutex_get_owner() to check if the current task already holds the
lock, avoiding both deadlock and the trylock problem.
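
For readers who want to experiment with the idea outside the kernel, the
owner check can be modeled in userspace. This is only a rough sketch under
stated assumptions, not the code in the patch: the model_* functions,
lock_owner and self_id() are hypothetical names, and because pthreads has no
equivalent of mutex_get_owner(), the model records the owner in a separate
variable (closer to the v4 approach than to v5).

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for pci_rescan_remove_lock (assumption, not kernel code). */
static pthread_mutex_t rescan_remove_lock = PTHREAD_MUTEX_INITIALIZER;
/* Hypothetical owner record; 0 means "not held". */
static _Atomic(uintptr_t) lock_owner;

/* The address of a thread-local object is unique among live threads. */
static uintptr_t self_id(void)
{
	static _Thread_local char marker;
	return (uintptr_t)&marker;
}

/* Models the plain lock/unlock pair. */
static void model_lock(void)
{
	pthread_mutex_lock(&rescan_remove_lock);
	atomic_store(&lock_owner, self_id());
}

static void model_unlock(void)
{
	atomic_store(&lock_owner, 0);
	pthread_mutex_unlock(&rescan_remove_lock);
}

/*
 * Models the reentrant helper: returns true if this call took the lock,
 * false if the calling thread already held it. Only the owner can observe
 * its own id here, so the check cannot misfire for other threads.
 */
static bool model_lock_reentrant(void)
{
	if (atomic_load(&lock_owner) == self_id())
		return false;
	model_lock();
	return true;
}

/* Paired unlock: releases only what the matching lock call acquired. */
static void model_unlock_reentrant(const bool locked)
{
	if (locked)
		model_unlock();
}

/* Hypothetical callee standing in for sriov_del_vfs(). */
static int model_del_vfs(void)
{
	const bool locked = model_lock_reentrant();
	/* VF removal would happen here, always under the lock. */
	int held = (atomic_load(&lock_owner) == self_id());

	model_unlock_reentrant(locked);
	return held;
}
```

In this model, model_del_vfs() runs under the lock whether or not its caller
already holds it, and a caller that took the lock beforehand still owns it
when model_del_vfs() returns.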
Link: https://lore.kernel.org/linux-pci/20260214193235.262219-3-ionut.nechita@windriver.com/ [v1]
Link: https://lore.kernel.org/linux-pci/20260219212648.82606-1-ionut.nechita@windriver.com/ [v2]
Link: https://lore.kernel.org/linux-pci/20260225202434.18737-1-ionut.nechita@windriver.com/ [v3]
Link: https://lore.kernel.org/linux-pci/20260228120138.51197-2-ionut.nechita@windriver.com/ [v4]

Ionut Nechita (1):
  PCI/IOV: Add reentrant locking in sriov_add_vfs/sriov_del_vfs for
    complete serialization

 drivers/pci/iov.c   |  7 +++++++
 drivers/pci/probe.c | 16 ++++++++++++++++
 include/linux/pci.h |  2 ++
 3 files changed, 25 insertions(+)

--
2.53.0