[PATCH v2] PCI: Mask Replay Timer Timeout for Realtek RTS525A

Posted by Max Lee 17 hours ago
The Realtek RTS525A PCI-Express SD card reader (10ec:525a) generates
excessive Correctable Error (Replay Timer Timeout) AER events during
PCIe link initialization. On systems where firmware enables AER
reporting (CERptEn+), this causes an AER storm of ~240K error events
within 11 seconds of boot, overwhelming the kernel error handler and
blocking shutdown/reboot.

The root cause is a transient link training instability inherent to this
device. Even on BIOS versions that suppress reporting, the Correctable
Error Status register (CESta in lspci) still shows Timeout+ set.

Unlike the GL9750/GL9755 fixup, which only masks the parent root port,
the RTS525A also needs its endpoint Correctable Error Mask bit 12
(PCI_ERR_COR_REP_TIMER) masked when the endpoint exposes AER, so it does
not send ERR_COR messages upstream. Also mask the parent root port to
cover root-port reporting of link errors caused by the endpoint.

Signed-off-by: Max Lee <max.lee@canonical.com>
---
Changes in v2:
  - Mask the parent root port even when the endpoint lacks AER capability.
  - Remove the early return before parent root port masking.
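
Illustration of the mask update described above, as a minimal userspace
sketch rather than kernel code. It assumes the bit value of
PCI_ERR_COR_REP_TIMER from include/uapi/linux/pci_regs.h (bit 12); the
helper names mask_replay_timer() and replay_timer_set() are hypothetical
and exist only for this example.

```c
#include <stdint.h>

/* Bit 12 of the AER Correctable Error Status/Mask registers,
 * mirroring the value in include/uapi/linux/pci_regs.h. */
#define PCI_ERR_COR_REP_TIMER 0x00001000u

/*
 * Hypothetical helper: return the Correctable Error Mask value with
 * Replay Timer Timeout masked. All other bits are left unchanged,
 * which is the same read-modify-write the quirk performs.
 */
static uint32_t mask_replay_timer(uint32_t cor_mask)
{
	return cor_mask | PCI_ERR_COR_REP_TIMER;
}

/*
 * Hypothetical helper: nonzero if a Correctable Error Status value has
 * Replay Timer Timeout set (shown by lspci as "Timeout+").
 */
static int replay_timer_set(uint32_t cor_status)
{
	return (cor_status & PCI_ERR_COR_REP_TIMER) != 0;
}
```

For example, a mask value of 0x00002000 becomes 0x00003000 after
mask_replay_timer(), and applying it again leaves the value unchanged, so
running the fixup on both the endpoint and its parent is harmless.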

 drivers/pci/quirks.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index caaed1a01dc0..6597536a4c70 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6380,4 +6380,26 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
+
+static void pci_mask_replay_timer_timeout_on_endpoint(struct pci_dev *pdev)
+{
+	u32 val;
+
+	if (pdev->aer_cap) {
+		pci_info(pdev, "mask Replay Timer Timeout on endpoint due to hardware defect\n");
+
+		pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_COR_MASK, &val);
+		val |= PCI_ERR_COR_REP_TIMER;
+		pci_write_config_dword(pdev, pdev->aer_cap + PCI_ERR_COR_MASK, val);
+	}
+
+	/*
+	 * Also mask the parent root port. Do this even if the endpoint lacks
+	 * AER capability because the root port may still report link errors
+	 * caused by the endpoint.
+	 */
+	pci_mask_replay_timer_timeout(pdev);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_REALTEK, 0x525a,
+			pci_mask_replay_timer_timeout_on_endpoint);
 #endif
-- 
2.43.0