Hi Ramses,
> From: Ramses <ramses@well-founded.dev>
> [...]
>
> Thanks for your reply!
>
> I recompiled the kernel with that option enabled, and attached the dmesg
> output to this email. Let me know if I can do anything else to help debug this.
Thanks for helping debug the issue and taking the useful dmesg log.
From the dmesg log, the ECC error log register of this SoC contained the
invalid error value ~0, resulting in a flood of invalid error reports in polling mode.
@Ramses & @John,
Can you please apply the attached fix patch and see whether it fixes the issue?
Thanks!
-Qiuxu
From fa843550da0589a1fae2fa7713767bbc98b3a02c Mon Sep 17 00:00:00 2001
From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Date: Mon, 10 Feb 2025 15:17:37 +0800
Subject: [PATCH 1/1] EDAC/igen6: Fix the flood of invalid error reports
The ECC_ERROR_LOG register of certain SoCs may contain the invalid value
~0, which results in a flood of invalid error reports in polling mode.
Fix the flood of invalid error reports by skipping the invalid ECC error
log value ~0.
Fixes: e14232afa944 ("EDAC/igen6: Add polling support")
Reported-by: Ramses <ramses@well-founded.dev>
Closes: https://lore.kernel.org/all/OISL8Rv--F-9@well-founded.dev/
Reported-by: John <therealgraysky@proton.me>
Closes: https://lore.kernel.org/all/p5YcxOE6M3Ncxpn2-Ia_wCt61EM4LwIiN3LroQvT_-G2jMrFDSOW5k2A9D8UUzD2toGpQBN1eI0sL5dSKnkO8iteZegLoQEj-DwQaMhGx4A=@proton.me/
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
drivers/edac/igen6_edac.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/drivers/edac/igen6_edac.c b/drivers/edac/igen6_edac.c
index fdf3a84fe698..595908af9e5c 100644
--- a/drivers/edac/igen6_edac.c
+++ b/drivers/edac/igen6_edac.c
@@ -785,13 +785,22 @@ static u64 ecclog_read_and_clear(struct igen6_imc *imc)
 {
 	u64 ecclog = readq(imc->window + ECC_ERROR_LOG_OFFSET);
 
-	if (ecclog & (ECC_ERROR_LOG_CE | ECC_ERROR_LOG_UE)) {
-		/* Clear CE/UE bits by writing 1s */
-		writeq(ecclog, imc->window + ECC_ERROR_LOG_OFFSET);
-		return ecclog;
-	}
+	/*
+	 * Quirk: The ECC_ERROR_LOG register of certain SoCs may contain
+	 * the invalid value ~0. This will result in a flood of invalid
+	 * error reports in polling mode. Skip it.
+	 */
+	if (ecclog == ~0)
+		return 0;
 
-	return 0;
+	/* Neither a CE nor a UE. Skip it.*/
+	if (!(ecclog & (ECC_ERROR_LOG_CE | ECC_ERROR_LOG_UE)))
+		return 0;
+
+	/* Clear CE/UE bits by writing 1s */
+	writeq(ecclog, imc->window + ECC_ERROR_LOG_OFFSET);
+
+	return ecclog;
 }
 
 static void errsts_clear(struct igen6_imc *imc)
--
2.17.1
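
For readers following the thread, the effect of the patch can be reproduced in userspace with a small simulation. This is only a sketch, not the driver code: `struct fake_ecclog`, the `stuck` flag, the `SKETCH_LOG_*` bit positions, and `count_reports()` are hypothetical names invented for illustration, and the assumption that the affected register keeps reading ~0 even after the write-1-to-clear is a simplification made so that the flood can be shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit positions for this sketch only. The real driver
 * defines ECC_ERROR_LOG_CE / ECC_ERROR_LOG_UE in igen6_edac.c. */
#define SKETCH_LOG_CE (UINT64_C(1) << 62)
#define SKETCH_LOG_UE (UINT64_C(1) << 63)

/* Stand-in for the memory-mapped ECC_ERROR_LOG register. */
struct fake_ecclog {
	uint64_t value; /* current register contents */
	int stuck;      /* assumption: nonzero means writes have no effect */
};

static uint64_t fake_read(const struct fake_ecclog *reg)
{
	return reg->value;
}

/* Simplified write-1-to-clear: every bit written as 1 is cleared. */
static void fake_write1_to_clear(struct fake_ecclog *reg, uint64_t bits)
{
	if (!reg->stuck)
		reg->value &= ~bits;
}

/* Logic before the patch: any value with CE or UE set is reported. */
static uint64_t read_and_clear_old(struct fake_ecclog *reg)
{
	uint64_t ecclog = fake_read(reg);

	if (ecclog & (SKETCH_LOG_CE | SKETCH_LOG_UE)) {
		fake_write1_to_clear(reg, ecclog);
		return ecclog;
	}
	return 0;
}

/* Logic after the patch: the all-ones value is skipped first. */
static uint64_t read_and_clear_fixed(struct fake_ecclog *reg)
{
	uint64_t ecclog = fake_read(reg);

	if (ecclog == UINT64_MAX) /* same test as "ecclog == ~0" on a u64 */
		return 0;
	if (!(ecclog & (SKETCH_LOG_CE | SKETCH_LOG_UE)))
		return 0;

	fake_write1_to_clear(reg, ecclog);
	return ecclog;
}

typedef uint64_t (*read_fn)(struct fake_ecclog *);

/* Poll the register `polls` times and count nonzero (reported) results. */
static unsigned count_reports(read_fn fn, struct fake_ecclog *reg, unsigned polls)
{
	unsigned reports = 0;

	assert(fn != NULL && reg != NULL);
	for (unsigned i = 0; i < polls; i++)
		if (fn(reg) != 0)
			reports++;
	return reports;
}
```

With a register that always reads ~0, `count_reports(read_and_clear_old, ...)` yields one report per poll, whereas `read_and_clear_fixed` yields none; a register holding a genuine CE entry is still reported once and then cleared. Note that the patch's comparison `ecclog == ~0` is well defined for a `u64`: `~0` is the `int` value -1, which the usual arithmetic conversions turn into the all-ones 64-bit value before comparing.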
Feb 10, 2025, 09:05 by qiuxu.zhuo@intel.com:

> Hi Ramses,
>
>> From: Ramses <ramses@well-founded.dev>
>> [...]
>>
>> Thanks for your reply!
>>
>> I recompiled the kernel with that option enabled, and attached the dmesg
>> output to this email. Let me know if I can do anything else to help debug this.
>
> Thanks for helping debug the issue and taking the useful dmesg log.
> From the dmesg log, the ECC error log register of this SoC contained the
> invalid error value ~0, resulting in a flood of invalid error reports in polling mode.
>
> @Ramses & @John,
> Can you please apply the attached fix patch and see whether it fixes the issue?
> Thanks!
>
> -Qiuxu

I just booted into a kernel with that patch applied and I'm not getting the errors anymore, so that seems to fix the issue for me indeed!

Thanks a bunch!
Ramses
Hi Ramses,

> From: Ramses <ramses@well-founded.dev>
> [...]
>
> I just booted into a kernel with that patch applied and I'm not getting the
> errors anymore, so that seems to fix the issue for me indeed!

Thank you for your testing.
I'm waiting a bit for John's test result (if he has the chance to test it).
After that, I'll post the fix patch to the EDAC mailing list.

-Qiuxu
On Monday, February 10th, 2025 at 8:55 PM, Zhuo, Qiuxu <qiuxu.zhuo@intel.com> wrote:

> I'm waiting a bit for John's test result (if he has the chance to test it).
> After that, I'll post the fix patch to the EDAC mailing list.

Confirmed, your patch fixed the flood experienced with 6.13.2-arch1-1 on my Beelink EQ12 (N100). Many thanks.