[PATCH] PCI: Mask Replay Timer Timeout for Realtek RTS525A

Max Lee posted 1 patch 1 week, 4 days ago
drivers/pci/quirks.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
Posted by Max Lee 1 week, 4 days ago
The Realtek RTS525A PCI-Express SD card reader (10ec:525a) generates
excessive Correctable Error (Replay Timer Timeout) AER events during
PCIe link initialization. On systems where firmware enables AER
reporting (CERptEn+), this causes an AER storm of ~240K error events
within 11 seconds of boot, overwhelming the kernel's error handler and
blocking shutdown/reboot.

The root cause is a transient link-training instability inherent to this
device: even on BIOS versions that suppress reporting, the Correctable
Error Status register (CESta) still shows Timeout+ set.

Unlike the GL9750/GL9755 fixup (which only masks the parent root port),
the RTS525A additionally requires masking the endpoint's own Correctable
Error Mask register bit 12 (PCI_ERR_COR_REP_TIMER) to prevent it from
sending ERR_COR messages upstream. Call pci_mask_replay_timer_timeout()
to mask the parent root port as well.

Signed-off-by: Max Lee <max.lee@canonical.com>
---
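For readers less familiar with the AER correctable mask, the effect of
the endpoint write can be illustrated with a small userspace model. This
is only a sketch, not kernel code: the helper names mask_replay_timer()
and sends_err_cor() are hypothetical, the constant mirrors
PCI_ERR_COR_REP_TIMER from include/uapi/linux/pci_regs.h, and the
reporting rule is deliberately simplified to "an unmasked status bit
with reporting enabled produces ERR_COR".

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 12 of the AER Correctable Error Status/Mask registers. */
#define PCI_ERR_COR_REP_TIMER 0x00001000u

/*
 * Hypothetical helper: the read-modify-write the quirk performs on the
 * Correctable Error Mask register, preserving all other mask bits.
 */
static inline uint32_t mask_replay_timer(uint32_t cor_mask)
{
	return cor_mask | PCI_ERR_COR_REP_TIMER;
}

/*
 * Hypothetical, simplified model: an ERR_COR message is sent only when
 * correctable error reporting is enabled and at least one logged error
 * is not masked. A masked error can still set its status bit.
 */
static inline bool sends_err_cor(uint32_t cor_status, uint32_t cor_mask,
				 bool reporting_enabled)
{
	return reporting_enabled && (cor_status & ~cor_mask) != 0;
}
```

With the mask bit set, Timeout+ can still appear in the status register,
but in this model it no longer results in an ERR_COR message; other
correctable errors are unaffected.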
 drivers/pci/quirks.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index caaed1a01dc0..072d1456daad 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -6380,4 +6380,23 @@ static void pci_mask_replay_timer_timeout(struct pci_dev *pdev)
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9750, pci_mask_replay_timer_timeout);
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_GLI, 0x9755, pci_mask_replay_timer_timeout);
+
+static void pci_mask_replay_timer_timeout_on_endpoint(struct pci_dev *pdev)
+{
+	u32 val;
+
+	if (!pdev->aer_cap)
+		return;
+
+	pci_info(pdev, "mask Replay Timer Timeout on endpoint due to hardware defect\n");
+
+	pci_read_config_dword(pdev, pdev->aer_cap + PCI_ERR_COR_MASK, &val);
+	val |= PCI_ERR_COR_REP_TIMER;
+	pci_write_config_dword(pdev, pdev->aer_cap + PCI_ERR_COR_MASK, val);
+
+	/* Also mask the parent root port */
+	pci_mask_replay_timer_timeout(pdev);
+}
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_REALTEK, 0x525a,
+			 pci_mask_replay_timer_timeout_on_endpoint);
 #endif
-- 
2.43.0