[PATCH v5 13/20] x86/mce: Unify AMD THR handler with MCA Polling

Yazen Ghannam posted 20 patches 1 month, 1 week ago
There is a newer version of this series
[PATCH v5 13/20] x86/mce: Unify AMD THR handler with MCA Polling
Posted by Yazen Ghannam 1 month, 1 week ago
AMD systems optionally support an MCA thresholding interrupt. The
interrupt should be used as another signal to trigger MCA polling. This
is similar to how the Intel Corrected Machine Check interrupt (CMCI) is
handled.

AMD MCA thresholding is managed using the MCA_MISC registers within an
MCA bank. The OS will need to modify the hardware error count field in
order to reset the threshold limit and rearm the interrupt. Management
of the MCA_MISC register should be done as a follow up to the basic MCA
polling flow. It should not be the main focus of the interrupt handler.

Furthermore, future systems will have the ability to send an MCA
thresholding interrupt to the OS even when the OS does not manage the
feature, i.e. MCA_MISC registers are Read-as-Zero/Locked.

Call the common MCA polling function when handling the MCA thresholding
interrupt. This will allow the OS to find any valid errors whether or
not the MCA thresholding feature is OS-managed. Also, this allows the
common MCA polling options and kernel parameters to apply to AMD
systems.

Add a callback to the MCA polling function to check and reset any
threshold blocks that have reached their threshold limit.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250624-wip-mca-updates-v4-16-236dd74f645f@amd.com
    
    v4->v5:
    * No change.
    
    v3->v4:
    * No change.
    
    v2->v3:
    * Add tags from Qiuxu and Tony.
    
    v1->v2:
    * Start collecting per-CPU items in a struct.
    * Keep and use mce_flags.amd_threshold.

 arch/x86/kernel/cpu/mce/amd.c      | 49 ++++++++++++++++----------------------
 arch/x86/kernel/cpu/mce/core.c     |  3 +++
 arch/x86/kernel/cpu/mce/internal.h |  2 ++
 3 files changed, 26 insertions(+), 28 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 42f5c115395b..63d8b12fe30f 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -54,6 +54,12 @@
 
 static bool thresholding_irq_en;
 
+struct mce_amd_cpu_data {
+	mce_banks_t     thr_intr_banks;
+};
+
+static DEFINE_PER_CPU_READ_MOSTLY(struct mce_amd_cpu_data, mce_amd_data);
+
 static const char * const th_names[] = {
 	"load_store",
 	"insn_fetch",
@@ -560,6 +566,7 @@ prepare_threshold_block(unsigned int bank, unsigned int block, u32 addr,
 	if (!b.interrupt_capable)
 		goto done;
 
+	__set_bit(bank, this_cpu_ptr(&mce_amd_data)->thr_intr_banks);
 	b.interrupt_enable = 1;
 
 	if (!mce_flags.smca) {
@@ -900,12 +907,7 @@ static void amd_deferred_error_interrupt(void)
 		log_error_deferred(bank);
 }
 
-static void log_error_thresholding(unsigned int bank, u64 misc)
-{
-	_log_error_deferred(bank, misc);
-}
-
-static void log_and_reset_block(struct threshold_block *block)
+static void reset_block(struct threshold_block *block)
 {
 	struct thresh_restart tr;
 	u32 low = 0, high = 0;
@@ -919,23 +921,14 @@ static void log_and_reset_block(struct threshold_block *block)
 	if (!(high & MASK_OVERFLOW_HI))
 		return;
 
-	/* Log the MCE which caused the threshold event. */
-	log_error_thresholding(block->bank, ((u64)high << 32) | low);
-
-	/* Reset threshold block after logging error. */
 	memset(&tr, 0, sizeof(tr));
 	tr.b = block;
 	threshold_restart_block(&tr);
 }
 
-/*
- * Threshold interrupt handler will service THRESHOLD_APIC_VECTOR. The interrupt
- * goes off when error_count reaches threshold_limit.
- */
-static void amd_threshold_interrupt(void)
+void amd_reset_thr_limit(unsigned int bank)
 {
-	struct threshold_bank **bp = this_cpu_read(threshold_banks), *thr_bank;
-	unsigned int bank, cpu = smp_processor_id();
+	struct threshold_bank **bp = this_cpu_read(threshold_banks);
 	struct threshold_block *block, *tmp;
 
 	/*
@@ -943,20 +936,20 @@ static void amd_threshold_interrupt(void)
 	 * handler is installed at boot time, but on a hotplug event the
 	 * interrupt might fire before the data has been initialized.
 	 */
-	if (!bp)
+	if (!bp || !bp[bank])
 		return;
 
-	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
-		if (!(per_cpu(bank_map, cpu) & BIT_ULL(bank)))
-			continue;
-
-		thr_bank = bp[bank];
-		if (!thr_bank)
-			continue;
+	list_for_each_entry_safe(block, tmp, &bp[bank]->miscj, miscj)
+		reset_block(block);
+}
 
-		list_for_each_entry_safe(block, tmp, &thr_bank->miscj, miscj)
-			log_and_reset_block(block);
-	}
+/*
+ * Threshold interrupt handler will service THRESHOLD_APIC_VECTOR. The interrupt
+ * goes off when error_count reaches threshold_limit.
+ */
+static void amd_threshold_interrupt(void)
+{
+	machine_check_poll(MCP_TIMESTAMP, &this_cpu_ptr(&mce_amd_data)->thr_intr_banks);
 }
 
 /*
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index b3593a370bc9..e7a9a175bf49 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -831,6 +831,9 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 			mce_log(&err);
 
 clear_it:
+		if (mce_flags.amd_threshold)
+			amd_reset_thr_limit(i);
+
 		/*
 		 * Clear state for this bank.
 		 */
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 6cb2995f0ec1..e25ad0c005d5 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -269,6 +269,7 @@ void mce_threshold_create_device(unsigned int cpu);
 void mce_threshold_remove_device(unsigned int cpu);
 extern bool amd_filter_mce(struct mce *m);
 bool amd_mce_usable_address(struct mce *m);
+void amd_reset_thr_limit(unsigned int bank);
 
 /*
  * If MCA_CONFIG[McaLsbInStatusSupported] is set, extract ErrAddr in bits
@@ -300,6 +301,7 @@ static inline void mce_threshold_create_device(unsigned int cpu)	{ }
 static inline void mce_threshold_remove_device(unsigned int cpu)	{ }
 static inline bool amd_filter_mce(struct mce *m) { return false; }
 static inline bool amd_mce_usable_address(struct mce *m) { return false; }
+static inline void amd_reset_thr_limit(unsigned int bank) { }
 static inline void smca_extract_err_addr(struct mce *m) { }
 static inline void smca_bsp_init(void) { }
 #endif

-- 
2.51.0
Re: [PATCH v5 13/20] x86/mce: Unify AMD THR handler with MCA Polling
Posted by Borislav Petkov 1 month ago
On Mon, Aug 25, 2025 at 05:33:10PM +0000, Yazen Ghannam wrote:
> +/*
> + * Threshold interrupt handler will service THRESHOLD_APIC_VECTOR. The interrupt
> + * goes off when error_count reaches threshold_limit.
> + */
> +static void amd_threshold_interrupt(void)
> +{
> +	machine_check_poll(MCP_TIMESTAMP, &this_cpu_ptr(&mce_amd_data)->thr_intr_banks);
>  }

So the thresholding interrupt will fire.

It'll call machine_check_poll().

That thing will do something and eventually call back into amd.c again:

                if (mce_flags.amd_threshold)
                        amd_reset_thr_limit(i);

Why the back'n'forth?

Why not:

static void amd_threshold_interrupt(void)
{
	machine_check_poll(MCP_TIMESTAMP, &this_cpu_ptr(&mce_amd_data)->thr_intr_banks);
	amd_reset_thr_limit();
}

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v5 13/20] x86/mce: Unify AMD THR handler with MCA Polling
Posted by Yazen Ghannam 1 month ago
On Tue, Sep 02, 2025 at 01:10:52PM +0200, Borislav Petkov wrote:
> On Mon, Aug 25, 2025 at 05:33:10PM +0000, Yazen Ghannam wrote:
> > +/*
> > + * Threshold interrupt handler will service THRESHOLD_APIC_VECTOR. The interrupt
> > + * goes off when error_count reaches threshold_limit.
> > + */
> > +static void amd_threshold_interrupt(void)
> > +{
> > +	machine_check_poll(MCP_TIMESTAMP, &this_cpu_ptr(&mce_amd_data)->thr_intr_banks);
> >  }
> 
> So the thresholding interrupt will fire.
> 
> It'll call machine_check_poll().
> 
> That thing will do something and eventually call back into amd.c again:
> 
>                 if (mce_flags.amd_threshold)
>                         amd_reset_thr_limit(i);

This resets only a bank with a valid error.

Also, it resets the limit *before* clearing MCi_STATUS which should be
the last step.

> 
> Why the back'n'forth?
> 
> Why not:
> 
> static void amd_threshold_interrupt(void)
> {
> 	machine_check_poll(MCP_TIMESTAMP, &this_cpu_ptr(&mce_amd_data)->thr_intr_banks);
> 	amd_reset_thr_limit();

This means we'd need to do another loop through the banks. Their
MCi_STATUS registers would be cleared. So they could log another error
before the limit is reset.

Overall, the goal is to loop through the banks one time and log/reset
banks as we go through them.

Thanks,
Yazen
Re: [PATCH v5 13/20] x86/mce: Unify AMD THR handler with MCA Polling
Posted by Borislav Petkov 1 month ago
On Tue, Sep 02, 2025 at 09:37:13AM -0400, Yazen Ghannam wrote:
> This means we'd need to do another loop through the banks. Their
> MCi_STATUS registers would be cleared. So they could log another error
> before the limit is reset.
> 
> Overall, the goal is to loop through the banks one time and log/reset
> banks as we go through them.

Is there anything special about keeping this looping once? I might've missed
the reason if there were any particular one...

In any case, it sounds to me like you want a wrapper called

	clear_bank(i)

which executes at the end of machine_check_poll() and hides in there all
the possible MCA banks that need to be touched when done with the bank.

It'll still call back'n'forth through the code but at least all will be nicely
abstracted and concentrated.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v5 13/20] x86/mce: Unify AMD THR handler with MCA Polling
Posted by Yazen Ghannam 1 month ago
On Tue, Sep 02, 2025 at 07:04:38PM +0200, Borislav Petkov wrote:
> On Tue, Sep 02, 2025 at 09:37:13AM -0400, Yazen Ghannam wrote:
> > This means we'd need to do another loop through the banks. Their
> > MCi_STATUS registers would be cleared. So they could log another error
> > before the limit is reset.
> > 
> > Overall, the goal is to loop through the banks one time and log/reset
> > banks as we go through them.
> 
> Is there anything special about keeping this looping once? I might've missed
> the reason if there were any particular one...
> 

Mostly for code reuse and so that user settings for polling will apply
to AMD systems.

Also, MCi_STATUS should be cleared as the last step. So it'd be more
efficient to do any logging/clearing/resetting of an MCA bank all
together.

> In any case, it sounds to me like you want a wrapper called
> 
> 	clear_bank(i)
> 
> which executes at the end of machine_check_poll() and hides in there all
> the possible MCA banks that need to be touched when done with the bank.
> 
> It'll still call back'n'forth through the code but at least all will be nicely
> abstracted and concentrated.
> 

Right, I had a similar idea earlier:
https://lore.kernel.org/all/20240523155641.2805411-7-yazen.ghannam@amd.com/

The callback function still referenced "threshold limit" so it wasn't
totally abstracted.

I can go back to this idea and make it more abstracted like you suggest.

Thanks,
Yazen
Re: [PATCH v5 13/20] x86/mce: Unify AMD THR handler with MCA Polling
Posted by Borislav Petkov 1 month ago
On Tue, Sep 02, 2025 at 01:25:43PM -0400, Yazen Ghannam wrote:
> Also, MCi_STATUS should be cleared as the last step. So it'd be more
> efficient to do any logging/clearing/resetting of an MCA bank all together.

Makes sense.

> I can go back to this idea and make it more abstracted like you suggest.

Yap, that would make the code flow more clear, at least, instead of spreading
all those different MSR clearing calls at the end of _poll().

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette