[PATCH v3 2/2] PCI: AtomicOps: Fix logic in enable function

Gerd Bayer posted 2 patches 1 month ago
There is a newer version of this series
Posted by Gerd Bayer 1 month ago
Move the check for root port requirements past the loop within
pci_enable_atomic_ops_to_root() that checks on potential switch
(up- and downstream) ports.

Inside the loop traversing the PCI tree upwards, prepend the switch case
to validate the routing capability on any port with a fallthrough-case
that does the additional check for Atomic Ops not being blocked on
upstream ports.

Do not enable Atomic Op Requests if nothing can be learned about how the
device is attached - e.g. if it is on an "isolated" bus, as in s390.

Reported-by: Alexander Schmidt <alexs@linux.ibm.com>
Cc: stable@vger.kernel.org
Fixes: 430a23689dea ("PCI: Add pci_enable_atomic_ops_to_root()")
Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
---
 drivers/pci/pci.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index cc8abe6b1d07661488895876dbbcf8aaeadf4a17..23db6ad5f310ed009a9b2ca4933c7498e0d22b85 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3677,7 +3677,7 @@ void pci_acs_init(struct pci_dev *dev)
 int pci_enable_atomic_ops_to_root(struct pci_dev *dev, u32 cap_mask)
 {
 	struct pci_bus *bus = dev->bus;
-	struct pci_dev *bridge;
+	struct pci_dev *bridge = NULL;
 	u32 cap, ctl2;
 
 	/*
@@ -3715,29 +3715,27 @@ int pci_enable_atomic_ops_to_root(struct pci_dev *dev, u32 cap_mask)
 		switch (pci_pcie_type(bridge)) {
 		/* Ensure switch ports support AtomicOp routing */
 		case PCI_EXP_TYPE_UPSTREAM:
-		case PCI_EXP_TYPE_DOWNSTREAM:
-			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
-				return -EINVAL;
-			break;
-
-		/* Ensure root port supports all the sizes we care about */
-		case PCI_EXP_TYPE_ROOT_PORT:
-			if ((cap & cap_mask) != cap_mask)
-				return -EINVAL;
-			break;
-		}
-
-		/* Ensure upstream ports don't block AtomicOps on egress */
-		if (pci_pcie_type(bridge) == PCI_EXP_TYPE_UPSTREAM) {
+			/* Upstream ports must not block AtomicOps on egress */
 			pcie_capability_read_dword(bridge, PCI_EXP_DEVCTL2,
 						   &ctl2);
 			if (ctl2 & PCI_EXP_DEVCTL2_ATOMIC_EGRESS_BLOCK)
 				return -EINVAL;
+			fallthrough;
+		/* All switch ports need to route AtomicOps */
+		case PCI_EXP_TYPE_DOWNSTREAM:
+			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
+				return -EINVAL;
+			break;
 		}
-
 		bus = bus->parent;
 	}
 
+	/* Finally, last bridge must be root port and support requested sizes */
+	if ((!bridge) ||
+	    (pci_pcie_type(bridge) != PCI_EXP_TYPE_ROOT_PORT) ||
+	    ((cap & cap_mask) != cap_mask))
+		return -EINVAL;
+
 	pcie_capability_set_word(dev, PCI_EXP_DEVCTL2,
 				 PCI_EXP_DEVCTL2_ATOMIC_REQ);
 	return 0;

-- 
2.51.0
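
For readers following the control flow, the post-patch walk can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel code: struct model_bridge, model_atomic_ops_to_root() and the MODEL_* names are hypothetical, the constants are assumed to mirror include/uapi/linux/pci_regs.h, and the checks in the part of the function that the diff elides are not modelled.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; values assumed to mirror pci_regs.h */
enum model_port_type {
	MODEL_TYPE_ROOT_PORT  = 0x4,
	MODEL_TYPE_UPSTREAM   = 0x5,
	MODEL_TYPE_DOWNSTREAM = 0x6,
};

#define MODEL_DEVCAP2_ATOMIC_ROUTE        0x00000040u
#define MODEL_DEVCAP2_ATOMIC_COMP32       0x00000080u
#define MODEL_DEVCAP2_ATOMIC_COMP64       0x00000100u
#define MODEL_DEVCTL2_ATOMIC_EGRESS_BLOCK 0x0080u

struct model_bridge {
	enum model_port_type type;
	uint32_t devcap2;	/* Device Capabilities 2 */
	uint32_t devctl2;	/* Device Control 2 */
};

/*
 * path[0] is the bridge directly above the device, path[n - 1] the topmost
 * bridge visible to the OS; n == 0 models an "isolated" bus.
 */
static int model_atomic_ops_to_root(const struct model_bridge *path, size_t n,
				    uint32_t cap_mask)
{
	const struct model_bridge *bridge = NULL;
	uint32_t cap = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		bridge = &path[i];
		cap = bridge->devcap2;

		switch (bridge->type) {
		case MODEL_TYPE_UPSTREAM:
			/* Upstream ports must not block AtomicOps on egress */
			if (bridge->devctl2 & MODEL_DEVCTL2_ATOMIC_EGRESS_BLOCK)
				return -EINVAL;
			/* fall through */
		case MODEL_TYPE_DOWNSTREAM:
			/* All switch ports need to route AtomicOps */
			if (!(cap & MODEL_DEVCAP2_ATOMIC_ROUTE))
				return -EINVAL;
			break;
		default:
			break;
		}
	}

	/* Last bridge must be a root port that supports the requested sizes */
	if (!bridge || bridge->type != MODEL_TYPE_ROOT_PORT ||
	    (cap & cap_mask) != cap_mask)
		return -EINVAL;

	return 0;	/* the kernel would set the requester enable bit here */
}
```

With an empty path the model returns -EINVAL, which corresponds to the "isolated" bus case from the commit message; a path that ends in anything other than a root port is rejected the same way.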
Re: [PATCH v3 2/2] PCI: AtomicOps: Fix logic in enable function
Posted by Bjorn Helgaas 4 weeks, 1 day ago
On Fri, Mar 06, 2026 at 06:13:59PM +0100, Gerd Bayer wrote:
> Move the check for root port requirements past the loop within
> pci_enable_atomic_ops_to_root() that checks on potential switch
> (up- and downstream) ports.
> 
> Inside the loop traversing the PCI tree upwards, prepend the switch case
> to validate the routing capability on any port with a fallthrough-case
> that does the additional check for Atomic Ops not being blocked on
> upstream ports.

Thanks for looking at this.  I think this makes good sense, and I'd
like to:

  - Hoist the problem description up here.  IIUC we enable AtomicOps on
    s390 when we shouldn't, which presumably leads to some problem.  I
    think the same could happen anywhere we don't have a Root Port,
    e.g., jailhouse, loongarch, maybe some VMM guests?

  - Reduce or remove the text above, which is basically C code
    translated to English, and move it down after the problem
    description, so we can state the problem and symptom, followed by
    the solution.

I think the core is (as you say below) that if there's no Root Port,
we previously allowed endpoints to use AtomicOps even in cases where
we don't know if the recipient supports them.

That *sounds* bad, and if you actually saw some kind of corruption as
a result, that would make this very compelling.
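
To make the no-Root-Port case concrete, here is a rough userspace model of the pre-patch loop, reconstructed from the removed lines of the diff. The names (struct old_bridge, old_atomic_ops_to_root(), OLD_*) are hypothetical, the constants are assumed to mirror include/uapi/linux/pci_regs.h, and anything the function does outside the quoted hunks is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; values assumed to mirror pci_regs.h */
enum old_port_type {
	OLD_TYPE_ROOT_PORT  = 0x4,
	OLD_TYPE_UPSTREAM   = 0x5,
	OLD_TYPE_DOWNSTREAM = 0x6,
};

#define OLD_DEVCAP2_ATOMIC_ROUTE        0x00000040u
#define OLD_DEVCAP2_ATOMIC_COMP64       0x00000100u
#define OLD_DEVCTL2_ATOMIC_EGRESS_BLOCK 0x0080u

struct old_bridge {
	enum old_port_type type;
	uint32_t devcap2;
	uint32_t devctl2;
};

/* path[0] is the bridge above the device; n == 0 means no bridge is visible */
static int old_atomic_ops_to_root(const struct old_bridge *path, size_t n,
				  uint32_t cap_mask)
{
	size_t i;

	for (i = 0; i < n; i++) {
		uint32_t cap = path[i].devcap2;

		switch (path[i].type) {
		case OLD_TYPE_UPSTREAM:
		case OLD_TYPE_DOWNSTREAM:
			if (!(cap & OLD_DEVCAP2_ATOMIC_ROUTE))
				return -EINVAL;
			break;
		case OLD_TYPE_ROOT_PORT:
			/* Sizes are only checked if a root port is ever seen */
			if ((cap & cap_mask) != cap_mask)
				return -EINVAL;
			break;
		}

		if (path[i].type == OLD_TYPE_UPSTREAM &&
		    (path[i].devctl2 & OLD_DEVCTL2_ATOMIC_EGRESS_BLOCK))
			return -EINVAL;
	}

	/* Nothing after the loop requires that a root port was found */
	return 0;
}
```

In this model an empty path, or a path that stops at a switch port, never reaches the size check and still returns 0, which is the "enabled without knowing the completer" situation described above.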

> Do not enable Atomic Op Requests if nothing can be learned about how the
> device is attached - e.g. if it is on an "isolated" bus, as in s390.
> 
> Reported-by: Alexander Schmidt <alexs@linux.ibm.com>

If there's any public report of the problem, include the URL here.

> Cc: stable@vger.kernel.org
> Fixes: 430a23689dea ("PCI: Add pci_enable_atomic_ops_to_root()")
> Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> ---
>  drivers/pci/pci.c | 30 ++++++++++++++----------------
>  1 file changed, 14 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index cc8abe6b1d07661488895876dbbcf8aaeadf4a17..23db6ad5f310ed009a9b2ca4933c7498e0d22b85 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -3677,7 +3677,7 @@ void pci_acs_init(struct pci_dev *dev)
>  int pci_enable_atomic_ops_to_root(struct pci_dev *dev, u32 cap_mask)
>  {
>  	struct pci_bus *bus = dev->bus;
> -	struct pci_dev *bridge;
> +	struct pci_dev *bridge = NULL;
>  	u32 cap, ctl2;
>  
>  	/*
> @@ -3715,29 +3715,27 @@ int pci_enable_atomic_ops_to_root(struct pci_dev *dev, u32 cap_mask)

Since we're looking at this, I think we should update the spec
references in this function (in a separate patch).  

  * Per PCIe r5.0, sec 9.3.5.10, the AtomicOp Requester Enable bit
  * in Device Control 2 is reserved in VFs and the PF value applies
  * to all associated VFs.

It looks like the AtomicOp Requester Enable part of PCIe r5.0, sec
9.3.5.10, was incorporated into the Device Control 2 Register
description in PCIe r7.0, sec 7.5.3.16.

  * Per PCIe r4.0, sec 6.15, endpoints and root ports may be
  * AtomicOp requesters.  For now, we only support endpoints as
  * requesters and root ports as completers.  No endpoints as
  * completers, and no peer-to-peer.

This looks like PCIe r7.0, sec 6.15.  Same section as r4.0, but we
should at least make both of these refer to the same spec revision.

>  		switch (pci_pcie_type(bridge)) {
>  		/* Ensure switch ports support AtomicOp routing */
>  		case PCI_EXP_TYPE_UPSTREAM:
> -		case PCI_EXP_TYPE_DOWNSTREAM:
> -			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
> -				return -EINVAL;
> -			break;
> -
> -		/* Ensure root port supports all the sizes we care about */
> -		case PCI_EXP_TYPE_ROOT_PORT:
> -			if ((cap & cap_mask) != cap_mask)
> -				return -EINVAL;
> -			break;
> -		}
> -
> -		/* Ensure upstream ports don't block AtomicOps on egress */
> -		if (pci_pcie_type(bridge) == PCI_EXP_TYPE_UPSTREAM) {
> +			/* Upstream ports must not block AtomicOps on egress */
>  			pcie_capability_read_dword(bridge, PCI_EXP_DEVCTL2,
>  						   &ctl2);
>  			if (ctl2 & PCI_EXP_DEVCTL2_ATOMIC_EGRESS_BLOCK)
>  				return -EINVAL;
> +			fallthrough;
> +		/* All switch ports need to route AtomicOps */
> +		case PCI_EXP_TYPE_DOWNSTREAM:
> +			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
> +				return -EINVAL;
> +			break;
>  		}
> -
>  		bus = bus->parent;
>  	}
>  
> +	/* Finally, last bridge must be root port and support requested sizes */
> +	if ((!bridge) ||
> +	    (pci_pcie_type(bridge) != PCI_EXP_TYPE_ROOT_PORT) ||
> +	    ((cap & cap_mask) != cap_mask))
> +		return -EINVAL;
> +
>  	pcie_capability_set_word(dev, PCI_EXP_DEVCTL2,
>  				 PCI_EXP_DEVCTL2_ATOMIC_REQ);
>  	return 0;
> 
> -- 
> 2.51.0
>
Re: [PATCH v3 2/2] PCI: AtomicOps: Fix logic in enable function
Posted by Gerd Bayer 4 weeks, 1 day ago
On Tue, 2026-03-10 at 16:52 -0500, Bjorn Helgaas wrote:
> On Fri, Mar 06, 2026 at 06:13:59PM +0100, Gerd Bayer wrote:
> > Move the check for root port requirements past the loop within
> > pci_enable_atomic_ops_to_root() that checks on potential switch
> > (up- and downstream) ports.
> > 
> > Inside the loop traversing the PCI tree upwards, prepend the switch case
> > to validate the routing capability on any port with a fallthrough-case
> > that does the additional check for Atomic Ops not being blocked on
> > upstream ports.
> 
> Thanks for looking at this.  I think this makes good sense, and I'd
> like to:
> 
>   - Hoist the problem description up here.  IIUC we enable AtomicOps on
>     s390 when we shouldn't, which presumably leads to some problem.  I
>     think the same could happen anywhere we don't have a Root Port,
>     e.g., jailhouse, loongarch, maybe some VMM guests?

A few things need to align in order to observe the bug:
- an architecture/configuration without Root Port knowledge
- a PCIe device with AtomicOps support
- a device driver that requests AtomicOps enablement on the device

Unfortunately, I don't have access to any other combination that may
fail this way. However, I do have access to an x86 system to verify
that this does not cause an (immediate) regression.


>   - Reduce or remove the text above, which is basically C code
>     translated to English, and move it down after the problem
>     description, so we can state the problem and symptom, followed by
>     the solution.

Makes sense: I'll focus on the actual issue in the commit message here
and spin off a new series with patch 1.

> I think the core is (as you say below) that if there's no Root Port,
> we previously allowed endpoints to use AtomicOps even in cases where
> we don't know if the recipient supports them.
> 
> That *sounds* bad, and if you actually saw some kind of corruption as
> a result, that would make this very compelling.

So far, we have not seen any real functional fallout on s390 due to
this bug. Our current use cases for Mellanox/Nvidia's ConnectX adapters
do not seem to cause the adapter to make use of PCIe AtomicOps.
However, driver init does succeed in enabling AtomicOp Requests, as can
be observed with lspci.
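
For anyone who wants to check the same state from a raw Device Control 2 value instead of reading lspci output, a minimal sketch follows. The helper names and DEMO_* macros are hypothetical; the bit positions are assumed to match PCI_EXP_DEVCTL2_ATOMIC_REQ and PCI_EXP_DEVCTL2_ATOMIC_EGRESS_BLOCK in include/uapi/linux/pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to match the Device Control 2 bits in pci_regs.h */
#define DEMO_DEVCTL2_ATOMIC_REQ          0x0040u
#define DEMO_DEVCTL2_ATOMIC_EGRESS_BLOCK 0x0080u

/* True if the AtomicOp Requester Enable bit is set in a DevCtl2 value */
static bool demo_atomic_req_enabled(uint16_t devctl2)
{
	return (devctl2 & DEMO_DEVCTL2_ATOMIC_REQ) != 0;
}

/* True if the AtomicOp Egress Blocking bit is set in a DevCtl2 value */
static bool demo_atomic_egress_blocked(uint16_t devctl2)
{
	return (devctl2 & DEMO_DEVCTL2_ATOMIC_EGRESS_BLOCK) != 0;
}
```

On an affected system the first helper would be expected to return true for the endpoint's DevCtl2 value after driver init, even though no root port was ever examined.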

> > Do not enable Atomic Op Requests if nothing can be learned about how the
> > device is attached - e.g. if it is on an "isolated" bus, as in s390.
> > 
> > Reported-by: Alexander Schmidt <alexs@linux.ibm.com>
> 
> If there's any public report of the problem, include the URL here.

I can only offer excerpts of `lspci` output.

> > Cc: stable@vger.kernel.org
> > Fixes: 430a23689dea ("PCI: Add pci_enable_atomic_ops_to_root()")
> > Signed-off-by: Gerd Bayer <gbayer@linux.ibm.com>
> > ---
> >  drivers/pci/pci.c | 30 ++++++++++++++----------------
> >  1 file changed, 14 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index cc8abe6b1d07661488895876dbbcf8aaeadf4a17..23db6ad5f310ed009a9b2ca4933c7498e0d22b85 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -3677,7 +3677,7 @@ void pci_acs_init(struct pci_dev *dev)
> >  int pci_enable_atomic_ops_to_root(struct pci_dev *dev, u32 cap_mask)
> >  {
> >  	struct pci_bus *bus = dev->bus;
> > -	struct pci_dev *bridge;
> > +	struct pci_dev *bridge = NULL;
> >  	u32 cap, ctl2;
> >  
> >  	/*
> > @@ -3715,29 +3715,27 @@ int pci_enable_atomic_ops_to_root(struct pci_dev *dev, u32 cap_mask)
> 
> Since we're looking at this, I think we should update the spec
> references in this function (in a separate patch).  
> 
>   * Per PCIe r5.0, sec 9.3.5.10, the AtomicOp Requester Enable bit
>   * in Device Control 2 is reserved in VFs and the PF value applies
>   * to all associated VFs.
> 
> It looks like the AtomicOp Requester Enable part of PCIe r5.0, sec
> 9.3.5.10, was incorporated into the Device Control 2 Register
> description in PCIe r7.0, sec 7.5.3.16.
> 
>   * Per PCIe r4.0, sec 6.15, endpoints and root ports may be
>   * AtomicOp requesters.  For now, we only support endpoints as
>   * requesters and root ports as completers.  No endpoints as
>   * completers, and no peer-to-peer.
> 
> This looks like PCIe r7.0, sec 6.15.  Same section as r4.0, but we
> should at least make both of these refer to the same spec revision.
> 

Fair request... Will clean up in a separate patch.

> >  		switch (pci_pcie_type(bridge)) {
> >  		/* Ensure switch ports support AtomicOp routing */
> >  		case PCI_EXP_TYPE_UPSTREAM:
> > -		case PCI_EXP_TYPE_DOWNSTREAM:
> > -			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
> > -				return -EINVAL;
> > -			break;
> > -
> > -		/* Ensure root port supports all the sizes we care about */
> > -		case PCI_EXP_TYPE_ROOT_PORT:
> > -			if ((cap & cap_mask) != cap_mask)
> > -				return -EINVAL;
> > -			break;
> > -		}
> > -
> > -		/* Ensure upstream ports don't block AtomicOps on egress */
> > -		if (pci_pcie_type(bridge) == PCI_EXP_TYPE_UPSTREAM) {
> > +			/* Upstream ports must not block AtomicOps on egress */
> >  			pcie_capability_read_dword(bridge, PCI_EXP_DEVCTL2,
> >  						   &ctl2);
> >  			if (ctl2 & PCI_EXP_DEVCTL2_ATOMIC_EGRESS_BLOCK)
> >  				return -EINVAL;
> > +			fallthrough;
> > +		/* All switch ports need to route AtomicOps */
> > +		case PCI_EXP_TYPE_DOWNSTREAM:
> > +			if (!(cap & PCI_EXP_DEVCAP2_ATOMIC_ROUTE))
> > +				return -EINVAL;
> > +			break;
> >  		}
> > -
> >  		bus = bus->parent;
> >  	}
> >  
> > +	/* Finally, last bridge must be root port and support requested sizes */
> > +	if ((!bridge) ||
> > +	    (pci_pcie_type(bridge) != PCI_EXP_TYPE_ROOT_PORT) ||
> > +	    ((cap & cap_mask) != cap_mask))
> > +		return -EINVAL;
> > +
> >  	pcie_capability_set_word(dev, PCI_EXP_DEVCTL2,
> >  				 PCI_EXP_DEVCTL2_ATOMIC_REQ);
> >  	return 0;
> > 
> > -- 
> > 2.51.0
> > 

Thank you,
Gerd