[PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining

Maciej W. Rozycki posted 1 patch 17 hours ago
drivers/pci/quirks.c |   50 ++++++++++++++++++--------------------------------
1 file changed, 18 insertions(+), 32 deletions(-)
[PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
Posted by Maciej W. Rozycki 17 hours ago
Discard Vendor:Device ID matching in the PCIe failed link retraining 
quirk and ignore the link status for the removal of the 2.5GT/s speed 
clamp, whether applied by the quirk itself or the firmware earlier on.  
Revert to the original target link speed if this final link retraining 
has failed.

This is so that link training noise in hot-plug scenarios does not make 
a link remain clamped to the 2.5GT/s speed where an event race has led 
the quirk to apply the speed clamp for one device, only to leave it in 
place for a subsequent device to be plugged in.

Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: <stable@vger.kernel.org> # v6.5+
---
 drivers/pci/quirks.c |   50 ++++++++++++++++++--------------------------------
 1 file changed, 18 insertions(+), 32 deletions(-)

linux-pcie-failed-link-retrain-unclamp-always.diff
Index: linux-macro/drivers/pci/quirks.c
===================================================================
--- linux-macro.orig/drivers/pci/quirks.c
+++ linux-macro/drivers/pci/quirks.c
@@ -79,11 +79,10 @@ static bool pcie_lbms_seen(struct pci_de
  * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
  * request a retrain and check the result.
  *
- * If this turns out successful and we know by the Vendor:Device ID it is
- * safe to do so, then lift the restriction, letting the devices negotiate
- * a higher speed.  Also check for a similar 2.5GT/s speed restriction the
- * firmware may have already arranged and lift it with ports that already
- * report their data link being up.
+ * If this turns out successful, or where a 2.5GT/s speed restriction has
+ * been previously arranged by the firmware and the port reports its link
+ * already being up, lift the restriction, in a hope it is safe to do so,
+ * letting the devices negotiate a higher speed.
  *
  * Otherwise revert the speed to the original setting and request a retrain
  * again to remove any residual state, ignoring the result as it's supposed
@@ -94,52 +93,39 @@ static bool pcie_lbms_seen(struct pci_de
  */
 int pcie_failed_link_retrain(struct pci_dev *dev)
 {
-	static const struct pci_device_id ids[] = {
-		{ PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
-		{}
-	};
-	u16 lnksta, lnkctl2;
+	u16 lnksta, lnkctl2, oldlnkctl2;
 	int ret = -ENOTTY;
+	u32 lnkcap;
 
 	if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
 	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
 		return ret;
 
 	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &oldlnkctl2);
 	if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) {
-		u16 oldlnkctl2;
-
 		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
 
-		pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &oldlnkctl2);
 		ret = pcie_set_target_speed(dev, PCIE_SPEED_2_5GT, false);
-		if (ret) {
-			pci_info(dev, "retraining failed\n");
-			pcie_set_target_speed(dev, PCIE_LNKCTL2_TLS2SPEED(oldlnkctl2),
-					      true);
-			return ret;
-		}
-
-		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+		if (ret)
+			goto err;
 	}
 
 	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
-
-	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
-	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
-	    pci_match_id(ids, dev)) {
-		u32 lnkcap;
-
+	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+	if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
+	    (lnkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_2_5GB) {
 		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
-		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
 		ret = pcie_set_target_speed(dev, PCIE_LNKCAP_SLS2SPEED(lnkcap), false);
-		if (ret) {
-			pci_info(dev, "retraining failed\n");
-			return ret;
-		}
+		if (ret)
+			goto err;
 	}
 
 	return ret;
+err:
+	pci_info(dev, "retraining failed\n");
+	pcie_set_target_speed(dev, PCIE_LNKCTL2_TLS2SPEED(oldlnkctl2), true);
+	return ret;
 }
 
 static ktime_t fixup_debug_start(struct pci_dev *dev,
Re: [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
Posted by Ilpo Järvinen 11 hours ago
On Mon, 1 Dec 2025, Maciej W. Rozycki wrote:

> Discard Vendor:Device ID matching in the PCIe failed link retraining 
> quirk and ignore the link status for the removal of the 2.5GT/s speed 
> clamp, whether applied by the quirk itself or the firmware earlier on.  
> Revert to the original target link speed if this final link retraining 
> has failed.
> 
> This is so that link training noise in hot-plug scenarios does not make 
> a link remain clamped to the 2.5GT/s speed where an event race has led 
> the quirk to apply the speed clamp for one device, only to leave it in 
> place for a subsequent device to be plugged in.
> 
> Fixes: a89c82249c37 ("PCI: Work around PCIe link training failures")
> Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
> Cc: <stable@vger.kernel.org> # v6.5+
> ---
>  drivers/pci/quirks.c |   50 ++++++++++++++++++--------------------------------
>  1 file changed, 18 insertions(+), 32 deletions(-)
> 
> linux-pcie-failed-link-retrain-unclamp-always.diff
> Index: linux-macro/drivers/pci/quirks.c
> ===================================================================
> --- linux-macro.orig/drivers/pci/quirks.c
> +++ linux-macro/drivers/pci/quirks.c
> @@ -79,11 +79,10 @@ static bool pcie_lbms_seen(struct pci_de
>   * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
>   * request a retrain and check the result.
>   *
> - * If this turns out successful and we know by the Vendor:Device ID it is
> - * safe to do so, then lift the restriction, letting the devices negotiate
> - * a higher speed.  Also check for a similar 2.5GT/s speed restriction the
> - * firmware may have already arranged and lift it with ports that already
> - * report their data link being up.
> + * If this turns out successful, or where a 2.5GT/s speed restriction has
> + * been previously arranged by the firmware and the port reports its link
> + * already being up, lift the restriction, in a hope it is safe to do so,
> + * letting the devices negotiate a higher speed.
>   *
>   * Otherwise revert the speed to the original setting and request a retrain
>   * again to remove any residual state, ignoring the result as it's supposed
> @@ -94,52 +93,39 @@ static bool pcie_lbms_seen(struct pci_de
>   */
>  int pcie_failed_link_retrain(struct pci_dev *dev)
>  {
> -	static const struct pci_device_id ids[] = {
> -		{ PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
> -		{}
> -	};
> -	u16 lnksta, lnkctl2;
> +	u16 lnksta, lnkctl2, oldlnkctl2;
>  	int ret = -ENOTTY;
> +	u32 lnkcap;
>  
>  	if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
>  	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
>  		return ret;
>  
>  	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> +	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &oldlnkctl2);
>  	if (!(lnksta & PCI_EXP_LNKSTA_DLLLA) && pcie_lbms_seen(dev, lnksta)) {
> -		u16 oldlnkctl2;
> -
>  		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
>  
> -		pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &oldlnkctl2);
>  		ret = pcie_set_target_speed(dev, PCIE_SPEED_2_5GT, false);
> -		if (ret) {
> -			pci_info(dev, "retraining failed\n");
> -			pcie_set_target_speed(dev, PCIE_LNKCTL2_TLS2SPEED(oldlnkctl2),
> -					      true);
> -			return ret;
> -		}
> -
> -		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
> +		if (ret)
> +			goto err;
>  	}
>  
>  	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
> -
> -	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
> -	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
> -	    pci_match_id(ids, dev)) {
> -		u32 lnkcap;
> -
> +	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> +	if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
> +	    (lnkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_2_5GB) {

I'm trying to recall, if there was some particular reason why 
->supported_speeds couldn't be used in this function. It would avoid the 
need to read LinkCap at all.

>  		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
> -		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
>  		ret = pcie_set_target_speed(dev, PCIE_LNKCAP_SLS2SPEED(lnkcap), false);
> -		if (ret) {
> -			pci_info(dev, "retraining failed\n");
> -			return ret;
> -		}
> +		if (ret)
> +			goto err;
>  	}
>  
>  	return ret;

return 0;

> +err:
> +	pci_info(dev, "retraining failed\n");
> +	pcie_set_target_speed(dev, PCIE_LNKCTL2_TLS2SPEED(oldlnkctl2), true);
> +	return ret;
>  }
>  
>  static ktime_t fixup_debug_start(struct pci_dev *dev,
> 

-- 
 i.
Re: [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
Posted by Maciej W. Rozycki 7 hours ago
On Mon, 1 Dec 2025, Ilpo Järvinen wrote:

> > +	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > +	if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
> > +	    (lnkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_2_5GB) {
> 
> I'm trying to recall, if there was some particular reason why 
> ->supported_speeds couldn't be used in this function. It would avoid the 
> need to read LinkCap at all.

 Thanks for the hint.  There's probably none and it's just me missing some 
of the zillion bits and pieces.  I'll wait a couple of days for any other 
people to chime in and respin with this update included if everyone is 
otherwise happy to proceed with this update.

> > +		if (ret)
> > +			goto err;
> >  	}
> >  
> >  	return ret;
> 
> return 0;

 It can still return -ENOTTY if neither of the two latter conditionals 
matched, meaning the quirk was not applicable after all.  ISTR you had 
issues with the structure of this code before; I am not sure if it can 
be made any better in a reasonable way.  It is not a failure per se, so 
the newly-added common error path does not apply.  This is the case for: 
"Return an error if retraining was not needed[...]" from the introductory 
comment.

 Shall I add a comment above the return statement referring to this?

  Maciej
Re: [PATCH] PCI: Always lift 2.5GT/s restriction in PCIe failed link retraining
Posted by Ilpo Järvinen 4 hours ago
On Mon, 1 Dec 2025, Maciej W. Rozycki wrote:

> On Mon, 1 Dec 2025, Ilpo Järvinen wrote:
> 
> > > +	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
> > > +	if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
> > > +	    (lnkcap & PCI_EXP_LNKCAP_SLS) != PCI_EXP_LNKCAP_SLS_2_5GB) {
> > 
> > I'm trying to recall, if there was some particular reason why 
> > ->supported_speeds couldn't be used in this function. It would avoid the 
> > need to read LinkCap at all.
> 
>  Thanks for the hint.  There's probably none and it's just me missing some 
> of the zillion bits and pieces.  I'll wait a couple of days for any other 
> people to chime in and respin with this update included if everyone is 
> otherwise happy to proceed with this update.
> 
> > > +		if (ret)
> > > +			goto err;
> > >  	}
> > >  
> > >  	return ret;
> > 
> > return 0;
> 
>  It can still return -ENOTTY if neither of the two latter conditionals 
> matched, meaning the quirk was not applicable after all.  ISTR you had 
> issues with the structure of this code before; I am not sure if it can 
> be made any better in a reasonable way.  It is not a failure per se, so 
> the newly-added common error path does not apply.  This is the case for: 
> "Return an error if retraining was not needed[...]" from the introductory 
> comment.
> 
>  Shall I add a comment above the return statement referring to this?

I think it's fine as is, I just didn't review with enough context to 
notice what it was initialized to (the usual thing when adding a 
rollback path is to forget to change the normal path to return 0, thus 
"auto commenting" it without checking enough, I'm sorry about that).

-- 
 i.