The generic PCI configuration space accessors are globally serialized via
pci_lock. On larger systems this causes massive lock contention when the
configuration space has to be accessed frequently. One such access pattern
is the Intel Uncore performance counter unit.
All x86 PCI configuration space accessors have either their own
serialization or can operate completely locklessly, so disable the
global lock in the generic PCI configuration space accessors.
Signed-off-by: Zijiang Huang <kerayhuang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
---
drivers/pci/access.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 3c230ca3d..5200f7bbc 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -216,20 +216,21 @@ static noinline void pci_wait_cfg(struct pci_dev *dev)
}
/* Returns 0 on success, negative values indicate error. */
-#define PCI_USER_READ_CONFIG(size, type) \
+#define PCI_USER_READ_CONFIG(size, type) \
int pci_user_read_config_##size \
(struct pci_dev *dev, int pos, type *val) \
{ \
int ret = PCIBIOS_SUCCESSFUL; \
u32 data = -1; \
+ unsigned long flags; \
if (PCI_##size##_BAD) \
return -EINVAL; \
- raw_spin_lock_irq(&pci_lock); \
+ pci_lock_config(flags); \
if (unlikely(dev->block_cfg_access)) \
pci_wait_cfg(dev); \
ret = dev->bus->ops->read(dev->bus, dev->devfn, \
pos, sizeof(type), &data); \
- raw_spin_unlock_irq(&pci_lock); \
+ pci_unlock_config(flags); \
if (ret) \
PCI_SET_ERROR_RESPONSE(val); \
else \
@@ -244,14 +245,15 @@ int pci_user_write_config_##size \
(struct pci_dev *dev, int pos, type val) \
{ \
int ret = PCIBIOS_SUCCESSFUL; \
+ unsigned long flags; \
if (PCI_##size##_BAD) \
return -EINVAL; \
- raw_spin_lock_irq(&pci_lock); \
+ pci_lock_config(flags); \
if (unlikely(dev->block_cfg_access)) \
pci_wait_cfg(dev); \
ret = dev->bus->ops->write(dev->bus, dev->devfn, \
pos, sizeof(type), val); \
- raw_spin_unlock_irq(&pci_lock); \
+ pci_unlock_config(flags); \
return pcibios_err_to_errno(ret); \
} \
EXPORT_SYMBOL_GPL(pci_user_write_config_##size);
--
2.43.5
[cc += Thomas, start of thread:
https://lore.kernel.org/r/20250507073028.2071852-1-kerayhuang@tencent.com
]
On Wed, May 07, 2025 at 03:30:28PM +0800, Zijiang Huang wrote:
> The generic PCI configuration space accessors are globally serialized via
> pci_lock. On larger systems this causes massive lock contention when the
> configuration space has to be accessed frequently. One such access pattern
> is the Intel Uncore performance counter unit.
Verbatim copy-paste from Thomas Gleixner's commit 714fe383d6c9
("PCI: Provide Kconfig option for lockless config space accessors").
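For reference, the mechanism that commit introduced makes the lock compile away when the architecture opts in via CONFIG_PCI_LOCKLESS_CONFIG. Roughly (paraphrased from drivers/pci/access.c; check the tree for the exact definitions):

```c
/* Paraphrased from drivers/pci/access.c after commit 714fe383d6c9 */
#ifdef CONFIG_PCI_LOCKLESS_CONFIG
# define pci_lock_config(f)	do { (void)(f); } while (0)
# define pci_unlock_config(f)	do { (void)(f); } while (0)
#else
# define pci_lock_config(f)	raw_spin_lock_irqsave(&pci_lock, f)
# define pci_unlock_config(f)	raw_spin_unlock_irqrestore(&pci_lock, f)
#endif
```

So the patch above only changes behavior on configurations that select the lockless option; everywhere else it merely switches from the _irq to the _irqsave locking variant.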
> All x86 PCI configuration space accessors have either their own
> serialization or can operate completely lockless, So Disable the global
> lock in the generic PCI configuration space accessors
Also copied and rephrased from the above-mentioned commit.
The question is, why did the commit only replace raw_spin_lock()
with pci_lock_config() in the in-kernel PCI accessors, but not in
the user space accessors? Is it safe to replace it there as well?
Why is performance of the user space accessors important?
Perhaps because of vfio?
That's the information I'm missing in the commit message.
Thanks,
Lukas
> Signed-off-by: Zijiang Huang <kerayhuang@tencent.com>
> Reviewed-by: Hao Peng <flyingpeng@tencent.com>
> ---
> drivers/pci/access.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/pci/access.c b/drivers/pci/access.c
> index 3c230ca3d..5200f7bbc 100644
> --- a/drivers/pci/access.c
> +++ b/drivers/pci/access.c
> @@ -216,20 +216,21 @@ static noinline void pci_wait_cfg(struct pci_dev *dev)
> }
>
> /* Returns 0 on success, negative values indicate error. */
> -#define PCI_USER_READ_CONFIG(size, type) \
> +#define PCI_USER_READ_CONFIG(size, type) \
> int pci_user_read_config_##size \
> (struct pci_dev *dev, int pos, type *val) \
> { \
> int ret = PCIBIOS_SUCCESSFUL; \
> u32 data = -1; \
> + unsigned long flags; \
> if (PCI_##size##_BAD) \
> return -EINVAL; \
> - raw_spin_lock_irq(&pci_lock); \
> + pci_lock_config(flags); \
> if (unlikely(dev->block_cfg_access)) \
> pci_wait_cfg(dev); \
> ret = dev->bus->ops->read(dev->bus, dev->devfn, \
> pos, sizeof(type), &data); \
> - raw_spin_unlock_irq(&pci_lock); \
> + pci_unlock_config(flags); \
> if (ret) \
> PCI_SET_ERROR_RESPONSE(val); \
> else \
> @@ -244,14 +245,15 @@ int pci_user_write_config_##size \
> (struct pci_dev *dev, int pos, type val) \
> { \
> int ret = PCIBIOS_SUCCESSFUL; \
> + unsigned long flags; \
> if (PCI_##size##_BAD) \
> return -EINVAL; \
> - raw_spin_lock_irq(&pci_lock); \
> + pci_lock_config(flags); \
> if (unlikely(dev->block_cfg_access)) \
> pci_wait_cfg(dev); \
> ret = dev->bus->ops->write(dev->bus, dev->devfn, \
> pos, sizeof(type), val); \
> - raw_spin_unlock_irq(&pci_lock); \
> + pci_unlock_config(flags); \
> return pcibios_err_to_errno(ret); \
> } \
> EXPORT_SYMBOL_GPL(pci_user_write_config_##size);
> --
> 2.43.5
On Wed, May 07 2025 at 09:53, Lukas Wunner wrote:
> On Wed, May 07, 2025 at 03:30:28PM +0800, Zijiang Huang wrote:
> The question is, why did the commit only replace raw_spin_lock()
> with pci_lock_config() in the in-kernel PCI accessors, but not in
> the user space accessors? Is it safe to replace it there as well?
See comment above pci_cfg_access_lock() ...
Hi Lukas,
I think it's safe to make this change for user-space accessors as well,
since user-space only reads from proc files.
> Why is performance of the user space accessors important?
> Perhaps because of vfio?
During stability testing on large-scale machines (384+ CPUs), we
consistently observed that heavy concurrent user-space access to PCI
config space triggers kernel softlockups.
Reproduction method: stress-ng --pci 384
Thanks,
keray
> That's the information I'm missing in the commit message.
>
> Thanks,
>
> Lukas
>
> > Signed-off-by: Zijiang Huang <kerayhuang@tencent.com>
> > Reviewed-by: Hao Peng <flyingpeng@tencent.com>
> > ---
> > drivers/pci/access.c | 12 +++++++-----
> > 1 file changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/pci/access.c b/drivers/pci/access.c
> > index 3c230ca3d..5200f7bbc 100644
> > --- a/drivers/pci/access.c
> > +++ b/drivers/pci/access.c
> > @@ -216,20 +216,21 @@ static noinline void pci_wait_cfg(struct pci_dev *dev)
> > }
> >
> > /* Returns 0 on success, negative values indicate error. */
> > -#define PCI_USER_READ_CONFIG(size, type) \
> > +#define PCI_USER_READ_CONFIG(size, type) \
> > int pci_user_read_config_##size \
> > (struct pci_dev *dev, int pos, type *val) \
> > { \
> > int ret = PCIBIOS_SUCCESSFUL; \
> > u32 data = -1; \
> > + unsigned long flags; \
> > if (PCI_##size##_BAD) \
> > return -EINVAL; \
> > - raw_spin_lock_irq(&pci_lock); \
> > + pci_lock_config(flags); \
> > if (unlikely(dev->block_cfg_access)) \
> > pci_wait_cfg(dev); \
> > ret = dev->bus->ops->read(dev->bus, dev->devfn, \
> > pos, sizeof(type), &data); \
> > - raw_spin_unlock_irq(&pci_lock); \
> > + pci_unlock_config(flags); \
> > if (ret) \
> > PCI_SET_ERROR_RESPONSE(val); \
> > else \
> > @@ -244,14 +245,15 @@ int pci_user_write_config_##size \
> > (struct pci_dev *dev, int pos, type val) \
> > { \
> > int ret = PCIBIOS_SUCCESSFUL; \
> > + unsigned long flags; \
> > if (PCI_##size##_BAD) \
> > return -EINVAL; \
> > - raw_spin_lock_irq(&pci_lock); \
> > + pci_lock_config(flags); \
> > if (unlikely(dev->block_cfg_access)) \
> > pci_wait_cfg(dev); \
> > ret = dev->bus->ops->write(dev->bus, dev->devfn, \
> > pos, sizeof(type), val); \
> > - raw_spin_unlock_irq(&pci_lock); \
> > + pci_unlock_config(flags); \
> > return pcibios_err_to_errno(ret); \
> > } \
> > EXPORT_SYMBOL_GPL(pci_user_write_config_##size);
> > --
> > 2.43.5
On Wed, May 07 2025 at 17:04, Zijiang Huang wrote:
> I think it's safe to make this change for user-space accessors as well,
> since user-space only reads from proc files.
Again. See pci_cfg_access_lock()
>> Why is performance of the user space accessors important?
>> Perhaps because of vfio?
>
> During stability testing on large-scale machines (384+ CPUs), we always
> observed that heavy concurrent user-space access to PCI config space triggers
> kernel softlockups.
>
> Reproduction method: stress-ng --pci 384
This is not really interesting as stress-ng is not a real-world
workload.
What's the actual real world use case which uses those interfaces so
that the lock becomes an issue?
Thanks,
tglx