[PATCH] PCI: Using lockless config space accessors based on Kconfig option

Zijiang Huang posted 1 patch 7 months, 1 week ago
drivers/pci/access.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
[PATCH] PCI: Using lockless config space accessors based on Kconfig option
Posted by Zijiang Huang 7 months, 1 week ago
The generic PCI configuration space accessors are globally serialized via
pci_lock. On larger systems this causes massive lock contention when the
configuration space has to be accessed frequently. One such access pattern
is the Intel Uncore performance counter unit.

All x86 PCI configuration space accessors either have their own
serialization or can operate completely lockless, so disable the global
lock in the generic PCI configuration space accessors.

Signed-off-by: Zijiang Huang <kerayhuang@tencent.com>
Reviewed-by: Hao Peng <flyingpeng@tencent.com>
---
 drivers/pci/access.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 3c230ca3d..5200f7bbc 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -216,20 +216,21 @@ static noinline void pci_wait_cfg(struct pci_dev *dev)
 }
 
 /* Returns 0 on success, negative values indicate error. */
-#define PCI_USER_READ_CONFIG(size, type)					\
+#define PCI_USER_READ_CONFIG(size, type)				\
 int pci_user_read_config_##size						\
 	(struct pci_dev *dev, int pos, type *val)			\
 {									\
 	int ret = PCIBIOS_SUCCESSFUL;					\
 	u32 data = -1;							\
+	unsigned long flags;						\
 	if (PCI_##size##_BAD)						\
 		return -EINVAL;						\
-	raw_spin_lock_irq(&pci_lock);				\
+	pci_lock_config(flags);						\
 	if (unlikely(dev->block_cfg_access))				\
 		pci_wait_cfg(dev);					\
 	ret = dev->bus->ops->read(dev->bus, dev->devfn,			\
 					pos, sizeof(type), &data);	\
-	raw_spin_unlock_irq(&pci_lock);				\
+	pci_unlock_config(flags);					\
 	if (ret)							\
 		PCI_SET_ERROR_RESPONSE(val);				\
 	else								\
@@ -244,14 +245,15 @@ int pci_user_write_config_##size					\
 	(struct pci_dev *dev, int pos, type val)			\
 {									\
 	int ret = PCIBIOS_SUCCESSFUL;					\
+	unsigned long flags;						\
 	if (PCI_##size##_BAD)						\
 		return -EINVAL;						\
-	raw_spin_lock_irq(&pci_lock);				\
+	pci_lock_config(flags);						\
 	if (unlikely(dev->block_cfg_access))				\
 		pci_wait_cfg(dev);					\
 	ret = dev->bus->ops->write(dev->bus, dev->devfn,		\
 					pos, sizeof(type), val);	\
-	raw_spin_unlock_irq(&pci_lock);				\
+	pci_unlock_config(flags);					\
 	return pcibios_err_to_errno(ret);				\
 }									\
 EXPORT_SYMBOL_GPL(pci_user_write_config_##size);
-- 
2.43.5
Re: [PATCH] PCI: Using lockless config space accessors based on Kconfig option
Posted by Lukas Wunner 7 months, 1 week ago
[cc += Thomas, start of thread:
https://lore.kernel.org/r/20250507073028.2071852-1-kerayhuang@tencent.com
]

On Wed, May 07, 2025 at 03:30:28PM +0800, Zijiang Huang wrote:
> The generic PCI configuration space accessors are globally serialized via
> pci_lock. On larger systems this causes massive lock contention when the
> configuration space has to be accessed frequently. One such access pattern
> is the Intel Uncore performance counter unit.

Verbatim copy-paste from Thomas Gleixner's commit 714fe383d6c9
("PCI: Provide Kconfig option for lockless config space accessors").

> All x86 PCI configuration space accessors either have their own
> serialization or can operate completely lockless, so disable the global
> lock in the generic PCI configuration space accessors.

Also copied and rephrased from the above-mentioned commit.

The question is, why did the commit only replace raw_spin_lock()
with pci_lock_config() in the in-kernel PCI accessors, but not in
the user space accessors?  Is it safe to replace it there as well?

Why is performance of the user space accessors important?
Perhaps because of vfio?

That's the information I'm missing in the commit message.

Thanks,

Lukas
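
For context, the switch between a real lock and a no-op that the cited commit
introduced for the in-kernel accessors can be modeled in user space roughly as
below. This is a sketch under assumptions, not a copy of drivers/pci/access.c:
every model_* name is hypothetical, MODEL_PCI_LOCKLESS_CONFIG stands in for
the Kconfig option, and a pthread mutex stands in for the kernel's raw
spinlock taken with interrupts disabled.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Counts how often the modeled global lock was really taken. */
static int model_lock_acquisitions;

/* Define MODEL_PCI_LOCKLESS_CONFIG to mimic the lockless Kconfig choice. */
#ifdef MODEL_PCI_LOCKLESS_CONFIG
# define model_lock_config(f)	do { (void)(f); } while (0)
# define model_unlock_config(f)	do { (void)(f); } while (0)
#else
/* Stand-in for the global pci_lock. */
static pthread_mutex_t model_pci_lock = PTHREAD_MUTEX_INITIALIZER;
# define model_lock_config(f)					\
	do {							\
		(void)(f);					\
		pthread_mutex_lock(&model_pci_lock);		\
		model_lock_acquisitions++;			\
	} while (0)
# define model_unlock_config(f)					\
	do {							\
		(void)(f);					\
		pthread_mutex_unlock(&model_pci_lock);		\
	} while (0)
#endif

/* Pretend config space: a 256-byte array with a made-up vendor ID. */
static unsigned char model_cfg_space[256] = { 0x86, 0x80 };

int model_lock_count(void)
{
	return model_lock_acquisitions;
}

/* Returns 0 on success, -1 if pos is outside the modeled config space. */
int model_read_config_byte(int pos, unsigned char *val)
{
	unsigned long flags = 0;

	assert(val != NULL);
	if (pos < 0 || pos >= (int)sizeof(model_cfg_space))
		return -1;
	model_lock_config(flags);	/* no-op in the lockless build */
	*val = model_cfg_space[pos];
	model_unlock_config(flags);
	return 0;
}
```

Built without MODEL_PCI_LOCKLESS_CONFIG, every read takes the global mutex
once; built with it, the accessor body runs with no serialization at all,
which is only acceptable if nothing else depends on that lock.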

Re: [PATCH] PCI: Using lockless config space accessors based on Kconfig option
Posted by Thomas Gleixner 7 months, 1 week ago
On Wed, May 07 2025 at 09:53, Lukas Wunner wrote:
> On Wed, May 07, 2025 at 03:30:28PM +0800, Zijiang Huang wrote:
> The question is, why did the commit only replace raw_spin_lock()
> with pci_lock_config() in the in-kernel PCI accessors, but not in
> the user space accessors?  Is it safe to replace it there as well?

See comment above pci_cfg_access_lock() ...
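
For readers without the source at hand, the pointer is presumably to the rule
that user-space config accesses must sleep while a driver has blocked config
access on the device. A user-space model of that protocol is sketched below.
It is an illustration under assumptions, not kernel code: all model_* names
are hypothetical, and a pthread mutex plus condition variable stand in for
pci_lock and the kernel wait queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the global lock and the wait queue. */
static pthread_mutex_t model_pci_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t model_cfg_wait = PTHREAD_COND_INITIALIZER;

struct model_dev {
	bool block_cfg_access;	/* set while a driver owns config space */
	uint32_t cfg[64];	/* pretend config space, dword-indexed */
};

/* Caller holds model_pci_lock; sleeps until access is unblocked. */
static void model_wait_cfg(struct model_dev *dev)
{
	while (dev->block_cfg_access)
		pthread_cond_wait(&model_cfg_wait, &model_pci_lock);
}

/* Driver side: returns false if access is already blocked. */
bool model_cfg_access_trylock(struct model_dev *dev)
{
	bool locked = true;

	pthread_mutex_lock(&model_pci_lock);
	if (dev->block_cfg_access)
		locked = false;
	else
		dev->block_cfg_access = true;
	pthread_mutex_unlock(&model_pci_lock);
	return locked;
}

/* Driver side: unblock access and wake any sleeping user-space reader. */
void model_cfg_access_unlock(struct model_dev *dev)
{
	pthread_mutex_lock(&model_pci_lock);
	assert(dev->block_cfg_access);
	dev->block_cfg_access = false;
	pthread_mutex_unlock(&model_pci_lock);
	pthread_cond_broadcast(&model_cfg_wait);
}

/* User side: the flag check and the access happen under the same lock. */
int model_user_read_config_dword(struct model_dev *dev, int pos, uint32_t *val)
{
	if (pos < 0 || pos >= 256 || (pos & 3))
		return -1;
	pthread_mutex_lock(&model_pci_lock);
	model_wait_cfg(dev);
	*val = dev->cfg[pos / 4];
	pthread_mutex_unlock(&model_pci_lock);
	return 0;
}
```

In this model, turning the lock and unlock in model_user_read_config_dword()
into no-ops would let a reader pass the block_cfg_access check and then touch
config space after a driver has set the flag, which is the kind of race the
user-space accessors appear to rely on the global lock to prevent.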
Re: [PATCH] PCI: Using lockless config space accessors based on
Posted by Zijiang Huang 7 months, 1 week ago
Hi Lukas,

I think it's safe to make this change for user-space accessors as well,
since user-space only reads from proc files.

> Why is performance of the user space accessors important?
> Perhaps because of vfio?

During stability testing on large-scale machines (384+ CPUs), we always
observed that heavy concurrent user-space access to PCI config space triggers
kernel softlockups.

Reproduction method: stress-ng --pci 384

Thanks,

keray
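
To make the user-space access path concrete: a config-space file such as
/sys/bus/pci/devices/<BDF>/config can be read with plain file I/O, and each
such read ends up in the pci_user_read_config_*() helpers touched by the
patch. The helper below is a minimal sketch; read_cfg_u16() is a hypothetical
name, and whether stress-ng's PCI stressor exercises exactly this path is an
assumption here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read a little-endian 16-bit register at byte offset pos from a
 * config-space file. Returns 0 on success, -1 on error or short read.
 */
int read_cfg_u16(const char *path, off_t pos, uint16_t *val)
{
	unsigned char buf[2];
	ssize_t n;
	int fd;

	assert(path && val);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = pread(fd, buf, sizeof(buf), pos);
	close(fd);
	if (n != (ssize_t)sizeof(buf))
		return -1;
	/* PCI config space is little-endian: low byte first. */
	*val = (uint16_t)(buf[0] | (buf[1] << 8));
	return 0;
}
```

Calling read_cfg_u16(path, 0, &vendor) on a real device's config file would
return its vendor ID; running many such readers in parallel is one way heavy
concurrent user-space access to the global lock can arise.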

Re: [PATCH] PCI: Using lockless config space accessors based on
Posted by Thomas Gleixner 7 months, 1 week ago
On Wed, May 07 2025 at 17:04, Zijiang Huang wrote:
> I think it's safe to make this change for user-space accessors as well,
> since user-space only reads from proc files.

Again. See pci_cfg_access_lock()

>> Why is performance of the user space accessors important?
>> Perhaps because of vfio?
>
> During stability testing on large-scale machines (384+ CPUs), we always
> observed that heavy concurrent user-space access to PCI config space triggers
> kernel softlockups.
>
> Reproduction method: stress-ng --pci 384

This is not really interesting, as stress-ng is not a real-world
workload.

What's the actual real-world use case which uses those interfaces so
that the lock becomes an issue?

Thanks,

        tglx