[PATCH 1/3] MIPS: DEC: Rate-limit memory errors for ECC systems

Maciej W. Rozycki posted 3 patches 4 days, 20 hours ago
Prevent the system from becoming unusable due to a flood of memory error
messages on DECstation and DECsystem models that use ECC, that is KN02,
KN03 and KN05 systems.  Gradual oxidation of memory module contacts
seems to commonly cause memory errors to develop over time.  ECC
corrects them, so the affected system could continue operating normally
until the contacts have been cleaned, but with no limit on the messages
the system spends all its time producing them, which prevents it from
being used.

Rate-limiting removes this load from the system and lets it operate
normally, e.g.:

Bus error interrupt: CPU memory read ECC error at 0x139cfb04
  ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU partial memory write ECC error at 0x138c1f5c
  ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU partial memory write ECC error at 0x138c1f6c
  ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x139cff64
  ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af00c
  ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af044
  ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af0cc
  ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af0cc
  ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af0e4
  ECC syndrome 0x54 -- corrected single bit error at data bit D3
Bus error interrupt: CPU memory read ECC error at 0x136af104
  ECC syndrome 0x54 -- corrected single bit error at data bit D3
dec_ecc_be_backend: 34455 callbacks suppressed

Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
---
 arch/mips/dec/ecc-berr.c |   16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)
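
For readers unfamiliar with the rate limiter, here is a minimal userspace
sketch of the behaviour the change relies on.  It is a simplified model
and not the kernel implementation: rl_state, rl_allow and rl_simulate are
hypothetical names, time is counted in abstract ticks, and the suppression
message omits the function name the kernel prints.  The kernel defaults
used by the patch are an interval of 5 * HZ and a burst of 10, which
matches the ten reports followed by a suppression count in the sample log.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical userspace model of a rate-limit state; not kernel code. */
struct rl_state {
	long interval;	/* window length, in abstract ticks */
	int burst;	/* events allowed per window */
	int printed;	/* events allowed so far in this window */
	long missed;	/* events suppressed so far in this window */
	long begin;	/* tick at which this window started */
	int started;	/* nonzero once the first event has been seen */
};

/* Return 1 if the caller may print, 0 if the event is suppressed. */
static int rl_allow(struct rl_state *rs, long now)
{
	if (!rs->started) {
		rs->begin = now;
		rs->started = 1;
	}

	/* Window expired: report what was dropped and start a new one. */
	if (now - rs->begin >= rs->interval) {
		if (rs->missed)
			printf("%ld callbacks suppressed\n", rs->missed);
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}

	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}
	rs->missed++;
	return 0;
}

/* Feed one event per tick and count how many are let through. */
static long rl_simulate(long interval, int burst, long ticks)
{
	struct rl_state rs = { .interval = interval, .burst = burst };
	long allowed = 0;

	assert(interval > 0 && burst >= 0);
	for (long t = 0; t < ticks; t++)
		allowed += rl_allow(&rs, t);
	return allowed;
}
```

In this model a caller would guard each error report with
"if (rl_allow(&rs, now))", in the same way the patch guards the pair of
printk() calls with a single __ratelimit(&rs) check, so that a report's
header line and its ECC syndrome line are either both printed or both
suppressed.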

linux-mips-dec-berr-ratelimit-ecc.diff
Index: linux-macro/arch/mips/dec/ecc-berr.c
===================================================================
--- linux-macro.orig/arch/mips/dec/ecc-berr.c
+++ linux-macro/arch/mips/dec/ecc-berr.c
@@ -5,12 +5,13 @@
  *	5000/240 (KN03), 5000/260 (KN05) and DECsystem 5900 (KN03),
  *	5900/260 (KN05) systems.
  *
- *	Copyright (c) 2003, 2005  Maciej W. Rozycki
+ *	Copyright (c) 2003, 2005, 2026  Maciej W. Rozycki
  */
 
 #include <linux/init.h>
 #include <linux/interrupt.h>
 #include <linux/kernel.h>
+#include <linux/ratelimit.h>
 #include <linux/sched.h>
 #include <linux/types.h>
 
@@ -51,6 +52,10 @@ static int dec_ecc_be_backend(struct pt_
 	static const char overstr[] = "overrun";
 	static const char eccstr[] = "ECC error";
 
+	static DEFINE_RATELIMIT_STATE(rs,
+				      DEFAULT_RATELIMIT_INTERVAL,
+				      DEFAULT_RATELIMIT_BURST);
+
 	const char *kind, *agent, *cycle, *event;
 	const char *status = "", *xbit = "", *fmt = "";
 	unsigned long address;
@@ -70,7 +75,7 @@ static int dec_ecc_be_backend(struct pt_
 
 	if (!(erraddr & KN0X_EAR_VALID)) {
 		/* No idea what happened. */
-		printk(KERN_ALERT "Unidentified bus error %s\n", kind);
+		pr_alert_ratelimited("Unidentified bus error %s\n", kind);
 		return action;
 	}
 
@@ -180,12 +185,13 @@ static int dec_ecc_be_backend(struct pt_
 		}
 	}
 
-	if (action != MIPS_BE_FIXUP)
+	if (action != MIPS_BE_FIXUP && __ratelimit(&rs)) {
 		printk(KERN_ALERT "Bus error %s: %s %s %s at %#010lx\n",
 			kind, agent, cycle, event, address);
 
-	if (action != MIPS_BE_FIXUP && erraddr & KN0X_EAR_ECCERR)
-		printk(fmt, "  ECC syndrome ", syn, status, xbit, i);
+		if (erraddr & KN0X_EAR_ECCERR)
+			printk(fmt, "  ECC syndrome ", syn, status, xbit, i);
+	}
 
 	return action;
 }