[PATCH] x86/mce/amd: Filter bogus L3 deferred errors on CZN A0

Yazen Ghannam posted 1 patch 1 month, 2 weeks ago
Posted by Yazen Ghannam 1 month, 2 weeks ago
A user has observed multiple L3 cache deferred error logs after the
recent kernel rework of deferred error handling. [1]

Upon inspection, the errors were determined to be bogus due to
inconsistent status values. The user also verified that the bogus
MCA_DESTAT values are present on the system even with an older kernel.
[2] The errors seem to be garbage values present in the MCA_DESTAT of
some L3 cache banks. They were implicitly ignored before the recent
kernel rework because they do not generate a deferred error interrupt.

A later revision of the rework patch was merged for v6.19. This
naturally filtered out most of the bogus error logs. However, a few
signatures still remain. [3]

Add the remaining bogus signatures to the MCE filter function. Minimize
the scope of the filter to the reported CPU family/model/stepping so
that similar issues are not implicitly masked on other systems.

Fixes: 7cb735d7c0cb ("x86/mce: Unify AMD DFR handler with MCA Polling")
Reported-by: Bert Karwatzki <spasswolf@web.de>
Closes: https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/20250915010010.3547-1-spasswolf@web.de                # [1]
Link: https://lore.kernel.org/6e1eda7dd55f6fa30405edf7b0f75695cf55b237.camel@web.de # [2]
Link: https://lore.kernel.org/21ba47fa8893b33b94370c2a42e5084cf0d2e975.camel@web.de # [3]
---
 arch/x86/kernel/cpu/mce/amd.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index da13c1e37f87..7a94492aa50f 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -604,6 +604,18 @@ bool amd_filter_mce(struct mce *m)
 	enum smca_bank_types bank_type = smca_get_bank_type(m->extcpu, m->bank);
 	struct cpuinfo_x86 *c = &boot_cpu_data;
 
+	/*
+	 * Bogus L3 cache deferred errors on Cezanne A0.
+	 *
+	 * Case #1: PCC bit set. This is not valid for deferred errors.
+	 * Case #2: XEC 29. This is not a valid error code.
+	 */
+	if (c->x86 == 0x19 && c->x86_model == 0x50 && c->x86_stepping == 0x0 &&
+	    bank_type == SMCA_L3_CACHE && (m->status & MCI_STATUS_DEFERRED)) {
+		if ((m->status & MCI_STATUS_PCC) || XEC(m->status, 0x3f) == 29)
+			return true;
+	}
+
 	/* See Family 17h Models 10h-2Fh Erratum #1114. */
 	if (c->x86 == 0x17 &&
 	    c->x86_model >= 0x10 && c->x86_model <= 0x2F &&
-- 
2.53.0
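
For readers who want to try the filter condition outside the kernel, here is a minimal user-space sketch of the check the patch adds. It is an illustration, not the kernel code: `struct czn_ctx` and `is_bogus_l3_deferred()` are hypothetical names standing in for `boot_cpu_data` and the SMCA bank type lookup. The bit definitions are assumed to match the kernel's `MCI_STATUS_PCC` (bit 57), `MCI_STATUS_DEFERRED` (bit 44), and `XEC()` (bits 21:16 with a 0x3f mask).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror the kernel's MCA status bit definitions. */
#define MCI_STATUS_PCC       (1ULL << 57)  /* processor context corrupt */
#define MCI_STATUS_DEFERRED  (1ULL << 44)  /* deferred error */
/* Extended error code: bits [21:16] of the status when mask is 0x3f. */
#define XEC(x, mask)         (((x) >> 16) & (mask))

/* Hypothetical stand-in for boot_cpu_data plus the SMCA bank type. */
struct czn_ctx {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
	bool is_l3_bank;	/* true when the bank type is SMCA_L3_CACHE */
};

/* Returns true when a status value matches one of the two bogus signatures. */
static bool is_bogus_l3_deferred(const struct czn_ctx *c, uint64_t status)
{
	/* Limit the filter to Family 19h, Model 50h, Stepping 0 (Cezanne A0). */
	if (c->family != 0x19 || c->model != 0x50 || c->stepping != 0x0)
		return false;

	/* Only deferred errors from L3 cache banks are candidates. */
	if (!c->is_l3_bank || !(status & MCI_STATUS_DEFERRED))
		return false;

	/* Case #1: PCC set on a deferred error. Case #2: XEC 29. */
	return (status & MCI_STATUS_PCC) || XEC(status, 0x3f) == 29;
}
```

A status value with the deferred bit plus either PCC or an extended error code of 29 is filtered. The same value on any other family, model, stepping, or bank type is left alone, which is the narrow scope the commit message asks for.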
Re: [PATCH] x86/mce/amd: Filter bogus L3 deferred errors on CZN A0
Posted by Mario Limonciello 1 month, 2 weeks ago
On 2/28/26 8:08 AM, Yazen Ghannam wrote:
[...]
> ---
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>

Re: [PATCH] x86/mce/amd: Filter bogus L3 deferred errors on CZN A0
Posted by Yazen Ghannam 1 month, 2 weeks ago
On Mon, Mar 02, 2026 at 10:16:31AM -0600, Mario Limonciello wrote:
[...]
> > ---
> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>

Thanks Mario.

Bert,

There are other options to prevent these from being reported, and
they don't require a kernel patch.

"ignore_ce" parameter: disables the MCA polling timer.

"dont_log_ce" parameter: keeps the MCA polling timer but does not
report errors that don't have a usable address.

You can set either of these through sysfs or on the kernel command line.

Thanks,
Yazen
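
As a rough illustration of the sysfs route mentioned above, the sketch below writes a value to an attribute file from C. The helper name `write_attr()` is hypothetical. The attribute path in the usage note is an assumption based on the machinecheck sysfs layout and should be checked on the target system. Writing to it normally requires root.

```c
#include <stdio.h>

/*
 * Write a short string to a sysfs-style attribute file.
 * Returns 0 on success and -1 if the file cannot be opened or written.
 */
static int write_attr(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");
	int ok;

	if (!f)
		return -1;

	ok = fputs(value, f) >= 0;

	/* sysfs reports store errors when the buffer is flushed on close. */
	if (fclose(f) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Under those assumptions, a call such as write_attr("/sys/devices/system/machinecheck/machinecheck0/dont_log_ce", "1") would enable the option at runtime. The kernel command line equivalents are expected to be mce=dont_log_ce and mce=ignore_ce.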