[v1] x86/mce: Restore MCA polling interval halving

[PATCH] x86/mce: Restore MCA polling interval halving

Posted by Borislav Petkov 2 months, 1 week ago

Ok,

finally. :-\

Pls run it to make sure it DTRT for you too.

Thx.

---
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Date: Mon, 16 Mar 2026 16:12:00 +0100
Subject: [PATCH] x86/mce: Restore MCA polling interval halving

RongQing reported that the MCA polling interval doesn't halve when an
error gets logged. It was traced down to the commit in Fixes: because:

mce_timer_fn()
|-> mce_poll_banks()
|-> machine_check_poll()
|-> mce_log()

which will queue the work and return.

Now, back in mce_timer_fn():

        /*
         * Alert userspace if needed. If we logged an MCE, reduce the polling
         * interval, otherwise increase the polling interval.
         */
        if (mce_notify_irq())

<--- here we haven't run the notifier chain yet, so mce_need_notify is
not set yet, so this won't hit and we won't halve the interval iv.

Now the notifier chain runs. mce_early_notifier() sets the bit, does
mce_notify_irq(), that clears the bit and then the notifier chain
a little later logs the error.

So this is a silly timing issue.

But, that's all unnecessary.

All that needs to happen here is that the "should we notify of a logged
MCE" question which mce_notify_irq() asks should simply be a question to
the MCE gen pool: "Are you empty?"

And that then turns into a simple yes or no answer and it all
JustWorks(tm).

So do that.

Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
Reported-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20260112082747.2842-1-lirongqing@baidu.com
---
 arch/x86/kernel/cpu/mce/core.c | 7 +------
 1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 8dd424ac5de8..d18db7d8d237 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -90,7 +90,6 @@ struct mca_config mca_cfg __read_mostly = {
 };
 
 static DEFINE_PER_CPU(struct mce_hw_err, hw_errs_seen);
-static unsigned long mce_need_notify;
 
 /*
  * MCA banks polled by the period polling timer for corrected events.
@@ -595,7 +594,7 @@ static bool mce_notify_irq(void)
 	/* Not more than two messages every minute */
 	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
 
-	if (test_and_clear_bit(0, &mce_need_notify)) {
+	if (!mce_gen_pool_empty()) {
 		mce_work_trigger();
 
 		if (__ratelimit(&ratelimit))
@@ -618,10 +617,6 @@ static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
 	/* Emit the trace record: */
 	trace_mce_record(err);
 
-	set_bit(0, &mce_need_notify);
-
-	mce_notify_irq();
-
 	return NOTIFY_DONE;
 }
 
-- 
2.51.0


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

RE: [PATCH] x86/mce: Restore MCA polling interval halving

Posted by Zhuo, Qiuxu 2 months, 1 week ago

Hi Boris,

> From: Borislav Petkov <bp@alien8.de>
> Sent: Tuesday, April 7, 2026 6:49 AM
> To: Li,Rongqing(ACG CCN) <lirongqing@baidu.com>
> Cc: Luck, Tony <tony.luck@intel.com>; Nikolay Borisov
> <nik.borisov@suse.com>; Thomas Gleixner <tglx@kernel.org>; Ingo Molnar
> <mingo@redhat.com>; Dave Hansen <dave.hansen@linux.intel.com>;
> x86@kernel.org; H . Peter Anvin <hpa@zytor.com>; Yazen Ghannam
> <yazen.ghannam@amd.com>; Zhuo, Qiuxu <qiuxu.zhuo@intel.com>;
> Avadhut Naik <avadhut.naik@amd.com>; linux-kernel@vger.kernel.org;
> linux-edac@vger.kernel.org
> Subject: [PATCH] x86/mce: Restore MCA polling interval halving
> 
> Ok,
> 
> finally. :-\
> 
> Pls run it to make sure it DTRT for you too.
> 
> Thx.
> 
> ---
> From: "Borislav Petkov (AMD)" <bp@alien8.de>
> Date: Mon, 16 Mar 2026 16:12:00 +0100
> Subject: [PATCH] x86/mce: Restore MCA polling interval halving
> 
> RongQing reported that the MCA polling interval doesn't halve when an error
> gets logged. It was traced down to the commit in Fixes: because:
> 
> mce_timer_fn()
> |-> mce_poll_banks()
> |-> machine_check_poll()
> |-> mce_log()
> 
> which will queue the work and return.
> 
> Now, back in mce_timer_fn():
> 
>         /*
>          * Alert userspace if needed. If we logged an MCE, reduce the polling
>          * interval, otherwise increase the polling interval.
>          */
>         if (mce_notify_irq())
> 
> <--- here we haven't run the notifier chain yet, so mce_need_notify is not set
> yet, so this won't hit and we won't halve the interval iv.
> 
> Now the notifier chain runs. mce_early_notifier() sets the bit, does
> mce_notify_irq(), that clears the bit and then the notifier chain a little later
> logs the error.
> 
> So this is a silly timing issue.
> 
> But, that's all unnecessary.
> 
> All it needs to happen here is, the "should we notify of a logged MCE"
> mce_notify_irq() asks, should be simply a question to the mce gen pool:
> "Are you empty?"
> 
> And that then turns into a simple yes or no answer and it all JustWorks(tm).
> 
> So do that.
> 
> Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
> Reported-by: Li RongQing <lirongqing@baidu.com>
> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> Link: https://lore.kernel.org/r/20260112082747.2842-1-lirongqing@baidu.com
> ---
>  arch/x86/kernel/cpu/mce/core.c | 7 +------
>  1 file changed, 1 insertion(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 8dd424ac5de8..d18db7d8d237 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -90,7 +90,6 @@ struct mca_config mca_cfg __read_mostly = {
>  };
> 
>  static DEFINE_PER_CPU(struct mce_hw_err, hw_errs_seen);
> -static unsigned long mce_need_notify;
> 
>  /*
>   * MCA banks polled by the period polling timer for corrected events.
> @@ -595,7 +594,7 @@ static bool mce_notify_irq(void)
>  	/* Not more than two messages every minute */
>  	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
> 
> -	if (test_and_clear_bit(0, &mce_need_notify)) {
> +	if (!mce_gen_pool_empty()) {
>  		mce_work_trigger();
> 
>  		if (__ratelimit(&ratelimit))
> @@ -618,10 +617,6 @@ static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
>  	/* Emit the trace record: */
>  	trace_mce_record(err);
> 
> -	set_bit(0, &mce_need_notify);
> -
> -	mce_notify_irq();
> -

I injected a correctable error with the CMCI interrupt enabled on an Intel testing machine, 
and this mce_early_notifier() was invoked.  But the following code in mce_notify_irq() is now
never executed, and I didn't see the error log message "Machine check events logged".

    ...
    mce_work_trigger();

    if (__ratelimit(&ratelimit))
        pr_info(HW_ERR "Machine check events logged\n");

    return true;
    ...

Thanks!
Qiuxu

Re: [PATCH] x86/mce: Restore MCA polling interval halving

Posted by Borislav Petkov 2 months ago

On Tue, Apr 07, 2026 at 03:04:04PM +0000, Zhuo, Qiuxu wrote:
> I injected a correctable error with the CMCI interrupt enabled on an Intel testing machine, 
> and this mce_early_notifier() was invoked.  But the following code in mce_notify_irq() is now
> never executed, and I didn't see the error log message "Machine check events logged".

You did disable the CEC, right?

In any case, let's have a look:

When we log an MCE, we do:

mce_log			# add it to the genpool and run the works
 -> mce_irq_work
  -> mce_schedule_work
   -> ..
    -> mce_gen_pool_process 	# this'll send it down the notifier chain
     -> x86_mce_decoder_chain
      -> mce_early_notifier 	# that guy sees it here and issues the trace record

Now, mce_notify_irq() would do mce_work_trigger() and issue the printk
- dunno, I guess we still want our printk and probably should add it back
- but the first one - the work triggering - that's mcelog. It is using that
usermode helper gunk, dunno if you guys still need it.

Because mcelog does register to the decoder chain so it'll get to see the MCE
eventually. So that part is fine.

The only question is the usermode helper gunk...

Tony?

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

RE: [PATCH] x86/mce: Restore MCA polling interval halving

Posted by Zhuo, Qiuxu 2 months ago

Hi Boris,

> From: Borislav Petkov <bp@alien8.de>
> [...]
> Subject: Re: [PATCH] x86/mce: Restore MCA polling interval halving
> 
> On Tue, Apr 07, 2026 at 03:04:04PM +0000, Zhuo, Qiuxu wrote:
> > I injected a correctable error with the CMCI interrupt enabled on an
> > Intel testing machine, and this mce_early_notifier() was invoked.  But
> > the following code in mce_notify_irq() is now never executed, and I didn't
> > see the error log message "Machine check events logged".
> 
> You did disable the CEC, right?

Yes, I disabled the RAS_CEC.

> In any case, let's have a look:
> 
> When we log an MCE, we do:
> 
> mce_log			# add it to the genpool and run the works
>  -> mce_irq_work
>   -> mce_schedule_work
>    -> ..
>     -> mce_gen_pool_process 	# this'll send it down the notifier chain
>      -> x86_mce_decoder_chain
>       -> mce_early_notifier 	# that guy sees it here and issues the trace record
> 
> Now, mce_notify_irq() would do mce_work_trigger() and issue the printk
> - dunno, I guess we still want our printk and probably should add it back

This printk is quite useful for checking whether an MCE error event occurs.
Even though it's rate-limited, I'd appreciate keeping it.

> - but the first one - the work triggering - that's mcelog. It is using that
> usermode helper gunk, dunno if you guys still need it.
>

I'm not sure whether any distros still use this user-mode helper to start mcelog or 
mcelog-like tools. @Luck, Tony, could you comment? 😊

> Because mcelog does register to the decoder chain so it'll get to see the MCE
> eventually. So that part is fine.
> 
> The only question is the usermode helper gunk...
> 
> Tony?

Re: [PATCH] x86/mce: Restore MCA polling interval halving

Posted by Borislav Petkov 2 months ago

On Wed, Apr 15, 2026 at 01:39:06PM +0000, Zhuo, Qiuxu wrote:
> I'm not sure whether any distros still use this user-mode helper to start mcelog or 
> mcelog-like tools. @Luck, Tony, could you comment? 😊

Old distros probably but they don't update to the newest kernel...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH] x86/mce: Restore MCA polling interval halving

Posted by Luck, Tony 2 months ago

On Tue, Apr 14, 2026 at 11:18:03PM +0200, Borislav Petkov wrote:
> On Tue, Apr 07, 2026 at 03:04:04PM +0000, Zhuo, Qiuxu wrote:
> > I injected a correctable error with the CMCI interrupt enabled on an Intel testing machine, 
> > and this mce_early_notifier() was invoked.  But the following code in mce_notify_irq() is now
> > never executed, and I didn't see the error log message "Machine check events logged".
> 
> You did disable the CEC, right?
> 
> In any case, let's have a look:
> 
> When we log an MCE, we do:
> 
> mce_log			# add it to the genpool and run the works
>  -> mce_irq_work
>   -> mce_schedule_work
>    -> ..
>     -> mce_gen_pool_process 	# this'll send it down the notifier chain
>      -> x86_mce_decoder_chain
>       -> mce_early_notifier 	# that guy sees it here and issues the trace record
> 
> Now, mce_notify_irq() would do mce_work_trigger() and issue the printk
> - dunno, I guess we still want our printk and probably should add it back
> - but the first one - the work triggering - that's mcelog. It is using that
> usermode helper gunk, dunno if you guys still need it.
> 
> Because mcelog does register to the decoder chain so it'll get to see the MCE
> eventually. So that part is fine.
> 
> The only question is the usermode helper gunk...
> 
> Tony?

Ran my own test. RAS_CEC disabled. Booted with mce=no_cmci and injected a
corrected error every twenty seconds. Added pr_info() to mce_timer_fn()
to say which CPUs were doubling or halving their interval.

Results:

I did see some "Machine check events logged" console messages.

The debug messages are "interesting". Polling timers on CPUs aren't
synchronized, so I got random bursts of debug messages where some
CPUs found an error and halved their interval, while others didn't
see an error and doubled their interval. The machine check banks for
memory corrected errors are socket scoped, so when an error is logged
whichever CPU on the socket polls next will find the error.

Both mcelog and EDAC were invoked on the mce decode chain and logged
errors OK.

When I stopped injecting, all the CPUs doubled back up to maximum
polling interval.

Summary: This is working as well as can be expected given the shared
scope of the machine check banks. If Linux were to understand the
scope of machine check banks it might designate a single CPU in
that scope to do the polling. But Intel doesn't make it easy to derive
the scope. In any case, the common case is CMCI enabled.

-Tony

Re: [PATCH] x86/mce: Restore MCA polling interval halving

Posted by Borislav Petkov 2 months ago

On Tue, Apr 14, 2026 at 03:22:23PM -0700, Luck, Tony wrote:
> Ran my own test. RAS_CEC disabled. Booted with mce=no_cmci injected a
> corrected error every twenty seconds. Added pr_info() to mce_timer_fn()
> to say which CPUs were doubling or halving interval.

Right, we still need some sort of a feedback that we've logged an error.

> Results:
> 
> I did see some "Machine check events logged" console messages.

Right, mce_timer_fn().

Not sure that is the right place tho. We want to issue that printk the moment
we log an MCE, perhaps in the early notifier or so, where mce_notify_irq()
was.

> The debug messages are "interesting". Polling timers on CPUs aren't
> synchronized, so I got random bursts of debug messages where some
> CPUs found an error and halved their interval, while others didn't
> see an error and doubled their interval. The machine check banks for
> memory corrected errors are socket scoped, so when an error is logged
> whichever CPU on the socket polls next will find the error.
> 
> Both mcelog and EDAC were invoked on the mce decode chain and logged
> errors OK.
> 
> When I stopped injecting, all the CPUs doubled back up to maximum
> polling interval.
> 
> Summary: This is working as well as can be expected given the shared
> scope of the machine check banks. If Linux were to understand the
> scope of machine check banks it might designate a single CPU in
> that scope to do the polling. But Intel doesn't make it easy to derive
> the scope. In any case, the common case is CMCI enabled.

Thanks for the testing - much appreciated.

One aspect remained unanswered: 

mce_notify_irq -> mce_work_trigger -> schedule_work(&mce_trigger_work); ->
mce_do_trigger -> 

	call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);

Is that thing still used?

If so, what is the use case? Is that mce_helper perchance the userspace
mcelog tool which the kernel calls here on an MCE?

Or?

Do we need that still?

If not, ripping that out would be a nice, additional simplification.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

RE: [PATCH] x86/mce: Restore MCA polling interval halving

Posted by Luck, Tony 2 months ago

> One aspect remained unanswered:
>
> mce_notify_irq -> mce_work_trigger -> schedule_work(&mce_trigger_work); ->
> mce_do_trigger ->
>
>       call_usermodehelper(mce_helper, mce_helper_argv, NULL, UMH_NO_WAIT);
>
> Is that thing still used?
>
> If so, what is the use case? Is per-chance that mce_helper the userspace
> mcelog tool which the kernel calls here on a MCE?
>
> Or?
>
> Do we need that still?
>
> If not, ripping that out would be a nice, additional simplification.

It is documented here: https://github.com/andikleen/mcelog?tab=readme-ov-file#readme

Three ways to invoke mcelog (cron job, trigger, daemon). That doc describes
trigger as "a newer method", but git blame shows that is part of the original version of the
README.md from 2016. So "new" some time more than 10 years ago.

That doc recommends daemon mode. Also, there's a geologically slow trend moving
from mcelog to rasdaemon.

Maybe we can drop it, but with the caveat that we'd need to revert if anyone ever
complains about breakage to user interface (to avoid the wrath of Torvalds).

I could update the mcelog document to say trigger mode is supported in kernel
versions <= 7.0

-Tony

Re: [PATCH] x86/mce: Restore MCA polling interval halving

Posted by Borislav Petkov 2 months ago

On Wed, Apr 15, 2026 at 07:53:58PM +0000, Luck, Tony wrote:
> It is documented here: https://github.com/andikleen/mcelog?tab=readme-ov-file#readme
> 
> Three ways to invoke mcelog (cron job, trigger, daemon). That doc describes
> trigger as "a newer method", but git blame shows that is part of the original version of the
> README.md from 2016. So "new" some time more than 10 years ago.
> 
> That doc recommends daemon mode. Also, there's a geologically slow trend moving
> from mcelog to rasdaemon.
> 
> Maybe we can drop it, but with the caveat that we'd need to revert if anyone ever
> complains about breakage to user interface (to avoid the wrath of Torvalds).
> 
> I could update the mcelog document to say trigger mode supported in kernel version
> <= 7.0

Or simply deprecate it...

I mean, the savings are nothing earth-shattering, so it's not worth much effort
here; pls do whatever's the easiest.

Lemme think about how to restructure this patch of mine...

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH] x86/mce: Restore MCA polling interval halving

Posted by Borislav Petkov 1 month, 4 weeks ago

On Wed, Apr 15, 2026 at 10:02:03PM +0200, Borislav Petkov wrote:
> Lemme think about how to restructure this patch of mine...

Ok, totally untested. This is only to show the idea. I've basically gone and
distributed the functionality where it fits best: the pr_info logging at
mce_log time and the work trigger in the notifier. It ended up like below.

I'll run it but I'd let you folks check it first, whether I've missed an angle
conceptually.

Thx.

---
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 8dd424ac5de8..f3a793e3a6c8 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -90,7 +90,6 @@ struct mca_config mca_cfg __read_mostly = {
 };
 
 static DEFINE_PER_CPU(struct mce_hw_err, hw_errs_seen);
-static unsigned long mce_need_notify;
 
 /*
  * MCA banks polled by the period polling timer for corrected events.
@@ -152,8 +151,10 @@ EXPORT_PER_CPU_SYMBOL_GPL(injectm);
 
 void mce_log(struct mce_hw_err *err)
 {
-	if (mce_gen_pool_add(err))
+	if (mce_gen_pool_add(err)) {
+		pr_info(HW_ERR "Machine check events logged\n");
 		irq_work_queue(&mce_irq_work);
+	}
 }
 EXPORT_SYMBOL_GPL(mce_log);
 
@@ -585,28 +586,6 @@ bool mce_is_correctable(struct mce *m)
 }
 EXPORT_SYMBOL_GPL(mce_is_correctable);
 
-/*
- * Notify the user(s) about new machine check events.
- * Can be called from interrupt context, but not from machine check/NMI
- * context.
- */
-static bool mce_notify_irq(void)
-{
-	/* Not more than two messages every minute */
-	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
-
-	if (test_and_clear_bit(0, &mce_need_notify)) {
-		mce_work_trigger();
-
-		if (__ratelimit(&ratelimit))
-			pr_info(HW_ERR "Machine check events logged\n");
-
-		return true;
-	}
-
-	return false;
-}
-
 static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
 			      void *data)
 {
@@ -618,9 +597,7 @@ static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
 	/* Emit the trace record: */
 	trace_mce_record(err);
 
-	set_bit(0, &mce_need_notify);
-
-	mce_notify_irq();
+	mce_work_trigger();
 
 	return NOTIFY_DONE;
 }
@@ -1804,7 +1781,7 @@ static void mce_timer_fn(struct timer_list *t)
 	 * Alert userspace if needed. If we logged an MCE, reduce the polling
 	 * interval, otherwise increase the polling interval.
 	 */
-	if (mce_notify_irq())
+	if (!mce_gen_pool_empty())
 		iv = max(iv / 2, (unsigned long) HZ/100);
 	else
 		iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

RE: [PATCH] x86/mce: Restore MCA polling interval halving

Posted by Zhuo, Qiuxu 1 month, 3 weeks ago

Hi Boris,

> From: Borislav Petkov <bp@alien8.de>
> [...]
> 
> On Wed, Apr 15, 2026 at 10:02:03PM +0200, Borislav Petkov wrote:
> > Lemme think about how to restructure this patch of mine...
> 
> Ok, totally untested. This is only to show the idea. I've basically gone and
> distributed the functionality where it fits best: the pr_info logging at mce_log
> time and the work trigger in the notifier. It ended up like below.
> 
> I'll run it but I'd let you folks check it first, whether I've missed an angle
> conceptually.
> 

1. Test precondition:
     - Added debug messages [1] on top of Boris' patch.
     - RAS_CEC was disabled.
     - A correctable error was injected every 10 seconds.

2. Tested with CMCI interrupts enabled:
   - The message "Machine check events logged" was printed each time a correctable error was injected.
   - EDAC and mcelog in the decode chain were notified as expected.

    So, this part tested OK.

3. Tested in polling mode (boot with "mce=no_cmci"):
   - A CPU’s timer interval was halved after calling mce_log(), or when !mce_gen_pool_empty() was true during polling [2].
   - A CPU’s timer interval was doubled when mce_gen_pool_empty() was true during polling [2].

    This part tested OK, but please see comments below about mce_gen_pool_empty() check in mce_timer_fn().
   
[1]
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index f3a793e3a6c8..927dcdb15ff4 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -152,7 +152,7 @@ EXPORT_PER_CPU_SYMBOL_GPL(injectm);
 void mce_log(struct mce_hw_err *err)
 {
        if (mce_gen_pool_add(err)) {
-               pr_info(HW_ERR "Machine check events logged\n");
+               pr_info(HW_ERR "Machine check events logged by CPU %d\n", smp_processor_id());
                irq_work_queue(&mce_irq_work);
        }
 }
@@ -1781,10 +1781,13 @@ static void mce_timer_fn(struct timer_list *t)
         * Alert userspace if needed. If we logged an MCE, reduce the polling
         * interval, otherwise increase the polling interval.
         */
-       if (!mce_gen_pool_empty())
+       if (!mce_gen_pool_empty()) {
                iv = max(iv / 2, (unsigned long) HZ/100);
-       else
+               pr_info("!mce_gen_pool_empty() - CPU %d halves timer interval %ums\n", smp_processor_id(), jiffies_to_msecs(iv));
+       } else {
                iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));
+               pr_info(" mce_gen_pool_empty() - CPU %d doubles timer interval %ums\n", smp_processor_id(), jiffies_to_msecs(iv));
+       }

        if (mce_get_storm_mode()) {
                __start_timer(t, HZ);

[2] See example of 'CPU 82':

dmesg | grep -E 'Machine check events logged|CPU 82' | grep -v "EDAC"

[  323.797260] mce: [Hardware Error]: Machine check events logged by CPU 82
[  323.804985] mce: [Hardware Error]: Machine check events logged by CPU 82
[  323.812618] mce: [Hardware Error]: Machine check events logged by CPU 82
[  323.820237] mce: [Hardware Error]: Machine check events logged by CPU 82
[  323.827868] mce: !mce_gen_pool_empty() - CPU 82 halves timer interval 150000ms
[  323.827970] mce: [Hardware Error]: Machine check events logged by CPU 147
[  487.635781] mce: [Hardware Error]: Machine check events logged by CPU 219
[  487.652751] mce: [Hardware Error]: Machine check events logged by CPU 219
[  487.660571] mce: [Hardware Error]: Machine check events logged by CPU 219
[  487.668386] mce: [Hardware Error]: Machine check events logged by CPU 219
[  487.676195] mce: [Hardware Error]: Machine check events logged by CPU 219
[  487.684874] mce: !mce_gen_pool_empty() - CPU 82 halves timer interval 75000ms
[  563.411184] mce: [Hardware Error]: Machine check events logged by CPU 88
[  563.427845] mce: [Hardware Error]: Machine check events logged by CPU 88
[  563.435553] mce: [Hardware Error]: Machine check events logged by CPU 88
[  563.444290] mce: !mce_gen_pool_empty() - CPU 82 halves timer interval 37500ms
[  602.322784] mce: [Hardware Error]: Machine check events logged by CPU 241
[  602.331355] mce: [Hardware Error]: Machine check events logged by CPU 241
[  602.339264] mce: !mce_gen_pool_empty() - CPU 82 halves timer interval 18748ms
[  622.802721] mce: [Hardware Error]: Machine check events logged by CPU 82
[  622.811199] mce: [Hardware Error]: Machine check events logged by CPU 82
[  622.818948] mce: !mce_gen_pool_empty() - CPU 82 halves timer interval 9372ms
[  632.018480] mce: [Hardware Error]: Machine check events logged by CPU 273
[  632.026526] mce: [Hardware Error]: Machine check events logged by CPU 185
[  632.275383] mce:  mce_gen_pool_empty() - CPU 82 doubles timer interval 18744ms
[  647.122282] mce: [Hardware Error]: Machine check events logged by CPU 273
[  651.475854] mce:  mce_gen_pool_empty() - CPU 82 doubles timer interval 37488ms
[  661.970112] mce: [Hardware Error]: Machine check events logged by CPU 273
[  677.073945] mce: [Hardware Error]: Machine check events logged by CPU 273
[  682.193878] mce: [Hardware Error]: Machine check events logged by CPU 273
[  690.386214] mce:  mce_gen_pool_empty() - CPU 82 doubles timer interval 74976ms
[  692.433727] mce: [Hardware Error]: Machine check events logged by CPU 225
[  712.913487] mce: [Hardware Error]: Machine check events logged by CPU 113
[  717.009440] mce: [Hardware Error]: Machine check events logged by CPU 232
[  743.632392] mce: [Hardware Error]: Machine check events logged by CPU 113
[  743.640869] mce: [Hardware Error]: Machine check events logged by CPU 113
[  757.967947] mce: [Hardware Error]: Machine check events logged by CPU 273
[  766.160445] mce:  mce_gen_pool_empty() - CPU 82 doubles timer interval 149952ms
[  768.207807] mce: [Hardware Error]: Machine check events logged by CPU 253
[  792.783453] mce: [Hardware Error]: Machine check events logged by CPU 234
[  807.119257] mce: [Hardware Error]: Machine check events logged by CPU 253
[  817.359155] mce: [Hardware Error]: Machine check events logged by CPU 273
[  831.695030] mce: [Hardware Error]: Machine check events logged by CPU 234
[  852.174749] mce: [Hardware Error]: Machine check events logged by CPU 232
[  861.646550] mce: [Hardware Error]: Machine check events logged by CPU 232
[  866.510500] mce: [Hardware Error]: Machine check events logged by CPU 234
[  884.430286] mce: [Hardware Error]: Machine check events logged by CPU 234
[  899.534081] mce: [Hardware Error]: Machine check events logged by CPU 234
[  904.654067] mce: [Hardware Error]: Machine check events logged by CPU 234
[  922.573822] mce: [Hardware Error]: Machine check events logged by CPU 234
[  929.998246] mce: [Hardware Error]: Machine check events logged by CPU 261
[  930.000003] mce:  mce_gen_pool_empty() - CPU 82 doubles timer interval 299904ms
[  944.333567] mce: [Hardware Error]: Machine check events logged by CPU 232
[  952.525529] mce: [Hardware Error]: Machine check events logged by CPU 273
[  964.813362] mce: [Hardware Error]: Machine check events logged by CPU 232
[  979.149459] mce: [Hardware Error]: Machine check events logged by CPU 225
[  991.436910] mce: [Hardware Error]: Machine check events logged by CPU 273
[ 1005.772632] mce: [Hardware Error]: Machine check events logged by CPU 261
[ 1028.300439] mce: [Hardware Error]: Machine check events logged by CPU 233
[ 1028.300522] mce: [Hardware Error]: Machine check events logged by CPU 233
[ 1044.683195] mce: [Hardware Error]: Machine check events logged by CPU 261
[ 1054.922669] mce: [Hardware Error]: Machine check events logged by CPU 225
[ 1065.162622] mce: [Hardware Error]: Machine check events logged by CPU 261


> Thx.
> 
> ---
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 8dd424ac5de8..f3a793e3a6c8 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -90,7 +90,6 @@ struct mca_config mca_cfg __read_mostly = {
>  };
> 
>  static DEFINE_PER_CPU(struct mce_hw_err, hw_errs_seen);
> -static unsigned long mce_need_notify;
> 
>  /*
>   * MCA banks polled by the period polling timer for corrected events.
> @@ -152,8 +151,10 @@ EXPORT_PER_CPU_SYMBOL_GPL(injectm);
> 
>  void mce_log(struct mce_hw_err *err)
>  {
> -	if (mce_gen_pool_add(err))
> +	if (mce_gen_pool_add(err)) {
> +		pr_info(HW_ERR "Machine check events logged\n");
>  		irq_work_queue(&mce_irq_work);
> +	}
>  }
>  EXPORT_SYMBOL_GPL(mce_log);
> 
> @@ -585,28 +586,6 @@ bool mce_is_correctable(struct mce *m)
>  }
>  EXPORT_SYMBOL_GPL(mce_is_correctable);
> 
> -/*
> - * Notify the user(s) about new machine check events.
> - * Can be called from interrupt context, but not from machine check/NMI
> - * context.
> - */
> -static bool mce_notify_irq(void)
> -{
> -	/* Not more than two messages every minute */
> -	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
> -
> -	if (test_and_clear_bit(0, &mce_need_notify)) {
> -		mce_work_trigger();
> -
> -		if (__ratelimit(&ratelimit))
> -			pr_info(HW_ERR "Machine check events logged\n");
> -
> -		return true;
> -	}
> -
> -	return false;
> -}
> -
>  static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
>  			      void *data)
>  {
> @@ -618,9 +597,7 @@ static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
>  	/* Emit the trace record: */
>  	trace_mce_record(err);
> 
> -	set_bit(0, &mce_need_notify);
> -
> -	mce_notify_irq();
> +	mce_work_trigger();
> 
>  	return NOTIFY_DONE;
>  }
> @@ -1804,7 +1781,7 @@ static void mce_timer_fn(struct timer_list *t)
>  	 * Alert userspace if needed. If we logged an MCE, reduce the polling
>  	 * interval, otherwise increase the polling interval.
>  	 */
> -	if (mce_notify_irq())
> +	if (!mce_gen_pool_empty())

mce_timer_fn()
  machine_check_poll()
        mce_log()
          irq_work_queue(&mce_irq_work)
            ...
              mce_irq_work_cb()
                mce_schedule_work()
                  schedule_work(&mce_work)
                    ...
                      mce_gen_pool_process() // [3] worker thread concurrently running on any CPU handles MCE logs.

  mce_gen_pool_empty() // [4]

It seems there is a race between [3] and [4].
Although my testing did not observe this race, it's possible
that mce_timer_fn() (in softirq) completes fast
enough that it always finishes before [3] (in worker thread) is scheduled to run.

>  		iv = max(iv / 2, (unsigned long) HZ/100);
>  	else
>  		iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));
> 

[...]

Thanks!
- Qiuxu

Re: [PATCH] x86/mce: Restore MCA polling interval halving

Posted by Borislav Petkov 1 month, 3 weeks ago

Hi Qiuxu,

On Mon, Apr 20, 2026 at 02:14:52PM +0000, Zhuo, Qiuxu wrote:
> 1. Test precondition:
>      - Added debug messages [1] on top of Boris' patch.
>      - RAS_CEC was disabled.
>      - A correctable error was injected every 10 seconds.
> 
> 2. Tested with CMCI interrupts enabled:
>    - The message "Machine check events logged" was printed each time a correctable error was injected.
>    - EDAC and mcelog in the decode chain were notified as expected.
> 
>     So, this part tested OK.
> 
> 3. Tested in polling mode (boot with "mce=no_cmci"):
>    - A CPU’s timer interval was halved after calling mce_log(), or when !mce_gen_pool_empty() was true during polling [2].
>    - A CPU’s timer interval was doubled when mce_gen_pool_empty() was true during polling [2].
> 
>     This part tested OK, but please see comments below about mce_gen_pool_empty() check in mce_timer_fn().

Thanks for testing.

> mce_timer_fn()
>   machine_check_poll()
>         mce_log()
>           irq_work_queue(&mce_irq_work)
>             ...
>               mce_irq_work_cb()
>                 mce_schedule_work()
>                   schedule_work(&mce_work)
>                     ...
>                       mce_gen_pool_process() // [3] worker thread concurrently running on any CPU handles MCE logs.
> 
>   mce_gen_pool_empty() // [4]
> 
> It seems there is a race between [3] and [4].
> Although my testing did not observe this race, it's possible 
> that mce_timer_fn() (in softirq) completes fast 
> enough that it always finishes before [1] (in worker thread) is scheduled to run.

Does this and the next message in the thread explain the situation?

https://lore.kernel.org/r/20260207115142.GBaYcnTp7maUDVv3Nc@fat_crate.local

Bottom line: I don't think this was ever meant to be anything but a rough and
simple method to catch too many errors being logged and halve the polling
interval.

IOW, even if the above race happens, in the abundance of too many errors, it
would pick up and start halving eventually.

Right?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

RE: [PATCH] x86/mce: Restore MCA polling interval halving

Posted by Zhuo, Qiuxu 1 month, 3 weeks ago

Hi Boris,

> From: Borislav Petkov <bp@alien8.de>
> [...]
> > mce_timer_fn()
> >   machine_check_poll()
> >         mce_log()
> >           irq_work_queue(&mce_irq_work)
> >             ...
> >               mce_irq_work_cb()
> >                 mce_schedule_work()
> >                   schedule_work(&mce_work)
> >                     ...
> > >                       mce_gen_pool_process() // [3] worker thread concurrently running on any CPU handles MCE logs.
> >
> >   mce_gen_pool_empty() // [4]
> >
> > It seems there is a race between [3] and [4].
> > Although my testing did not observe this race, it's possible that
> > mce_timer_fn() (in softirq) completes fast enough that it always
> > finishes before [1] (in worker thread) is scheduled to run.
> 
> Does this and the next message in the thread explain the situation?

Yes. Thanks for pointing out this link.

> https://lore.kernel.org/r/20260207115142.GBaYcnTp7maUDVv3Nc@fat_crate.local
> 
> Bottom line: I don't think this was ever meant to be anything but a rough and
> simple method to catch too many errors being logged and halve the polling
> interval.

Agree.

> IOW, even if the above race happens, in the abundance of too many errors, it
> would pick up and start halving eventually.
> 
> Right?
>

Yes, I think so. 

So, to me, the current rough and simple method for catching frequent error cases is a good
trade-off between accuracy and complexity.
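
To make that trade-off concrete, here is a rough userspace model, not kernel
code. It assumes an error is logged on every timer tick and that on every
lose_every-th tick the worker drains the pool before the emptiness check
runs, so that tick doubles the interval instead of halving it. MODEL_FLOOR,
MODEL_CAP, model_step() and model_run() are names made up for this sketch,
and the numbers assume HZ=250 and a 300 second upper bound.

```c
#include <stdbool.h>

#define MODEL_FLOOR 2UL      /* assumed: HZ/100 with HZ=250 */
#define MODEL_CAP   75000UL  /* assumed: 300 seconds at HZ=250 */

/* One tick: halve when the pool looks non-empty, otherwise double. */
static unsigned long model_step(unsigned long iv, bool pool_empty)
{
	if (!pool_empty) {
		unsigned long half = iv / 2;

		return half > MODEL_FLOOR ? half : MODEL_FLOOR;
	}

	return iv * 2 < MODEL_CAP ? iv * 2 : MODEL_CAP;
}

/*
 * Run the model for a number of ticks with an error logged on each one.
 * On every lose_every-th tick the worker is assumed to have drained the
 * pool before the check, so the check sees an empty pool.
 * lose_every == 0 means the worker never gets there first.
 */
static unsigned long model_run(unsigned long iv, unsigned int ticks,
			       unsigned int lose_every)
{
	for (unsigned int t = 1; t <= ticks; t++) {
		bool drained_first = lose_every && (t % lose_every == 0);

		iv = model_step(iv, drained_first);
	}

	return iv;
}
```

In this model, model_run(75000, 60, 3) still ends within a factor of two of
the floor even though one check in three sees an empty pool. Only when every
single check is lost (lose_every == 1) does the interval stay at the cap,
which is the accuracy being given up for the simpler check. The regular loss
pattern is artificial and only meant to illustrate the trend.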

  Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> 
  Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> 

Thanks!
-Qiuxu

Re: [PATCH] x86/mce: Restore MCA polling interval halving

Posted by Borislav Petkov 1 month, 3 weeks ago

On Tue, Apr 21, 2026 at 03:49:06PM +0000, Zhuo, Qiuxu wrote:
> Yes, I think so. 
> 
> So, to me, the current rough and simple method for catching frequent error cases is a good
> trade-off between accuracy and complexity.
> 
>   Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> 
>   Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> 

Thanks!

I have this now - I will hammer on it some more before I queue it.

---
Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Mon Mar 16 16:12:00 2026 +0100

    x86/mce: Restore MCA polling interval halving
    
    RongQing reported that the MCA polling interval doesn't halve when an
    error gets logged. It was traced down to the commit in Fixes:, because:
    
      mce_timer_fn()
      |-> mce_poll_banks()
      |-> machine_check_poll()
      |-> mce_log()
    
    which will queue the work and return.
    
    Now, back in mce_timer_fn():
    
            /*
             * Alert userspace if needed. If we logged an MCE, reduce the polling
             * interval, otherwise increase the polling interval.
             */
            if (mce_notify_irq())
    
    <--- here we haven't run the notifier chain yet so mce_need_notify is
    not set yet so this won't hit and we won't halve the interval iv.
    
    Now the notifier chain runs. mce_early_notifier() sets the bit, does
    mce_notify_irq(), that clears the bit and then the notifier chain
    a little later logs the error.
    
    So this is a silly timing issue.
    
    But, that's all unnecessary.
    
    All that needs to happen here is that the "should we notify of a logged
    MCE" question mce_notify_irq() asks should simply be a question to the
    mce gen pool: "Are you empty?"
    
    And that then turns into a simple yes or no answer and it all
    JustWorks(tm).
    
    So do that and also distribute the functionality where it belongs:
     - Print that MCE events have been logged in mce_log()
     - Trigger the mcelog tool specific work in the first notifier
    
    As a result, mce_notify_irq() can go now.
    
    Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
    Reported-by: Li RongQing <lirongqing@baidu.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
    Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
    Link: https://lore.kernel.org/r/20260112082747.2842-1-lirongqing@baidu.com
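
The ordering problem described above can be modelled in a few lines of
single-threaded userspace C. This is only a sketch: struct mce_model and the
model_*() helpers are hypothetical stand-ins for the gen pool, bit 0 of
mce_need_notify and the deferred notifier chain, not the kernel's own code.

```c
#include <stdbool.h>

struct mce_model {
	int  pool_records;  /* records waiting in the gen pool */
	bool need_notify;   /* stands in for bit 0 of mce_need_notify */
};

/* mce_log(): the record lands in the pool, the notifier chain runs later. */
static void model_log(struct mce_model *m)
{
	m->pool_records++;
}

/* Old check: test-and-clear of the flag, like mce_notify_irq() did. */
static bool model_old_check(struct mce_model *m)
{
	bool was_set = m->need_notify;

	m->need_notify = false;
	return was_set;
}

/* New check: is there anything in the pool? */
static bool model_new_check(const struct mce_model *m)
{
	return m->pool_records > 0;
}

/*
 * Deferred work: the early notifier sets the flag and consumes it right
 * away, then the rest of the chain processes and frees the records.
 */
static void model_run_notifiers(struct mce_model *m)
{
	m->need_notify = true;
	(void)model_old_check(m);
	m->pool_records = 0;
}
```

Calling model_log() and then both checks, as the timer function would before
the deferred work runs, gives false for the old check and true for the new
one. After model_run_notifiers() the flag has already been consumed, so in
this model the timer path never sees it set.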

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 8dd424ac5de8..f3a793e3a6c8 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -90,7 +90,6 @@ struct mca_config mca_cfg __read_mostly = {
 };
 
 static DEFINE_PER_CPU(struct mce_hw_err, hw_errs_seen);
-static unsigned long mce_need_notify;
 
 /*
  * MCA banks polled by the period polling timer for corrected events.
@@ -152,8 +151,10 @@ EXPORT_PER_CPU_SYMBOL_GPL(injectm);
 
 void mce_log(struct mce_hw_err *err)
 {
-	if (mce_gen_pool_add(err))
+	if (mce_gen_pool_add(err)) {
+		pr_info(HW_ERR "Machine check events logged\n");
 		irq_work_queue(&mce_irq_work);
+	}
 }
 EXPORT_SYMBOL_GPL(mce_log);
 
@@ -585,28 +586,6 @@ bool mce_is_correctable(struct mce *m)
 }
 EXPORT_SYMBOL_GPL(mce_is_correctable);
 
-/*
- * Notify the user(s) about new machine check events.
- * Can be called from interrupt context, but not from machine check/NMI
- * context.
- */
-static bool mce_notify_irq(void)
-{
-	/* Not more than two messages every minute */
-	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
-
-	if (test_and_clear_bit(0, &mce_need_notify)) {
-		mce_work_trigger();
-
-		if (__ratelimit(&ratelimit))
-			pr_info(HW_ERR "Machine check events logged\n");
-
-		return true;
-	}
-
-	return false;
-}
-
 static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
 			      void *data)
 {
@@ -618,9 +597,7 @@ static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
 	/* Emit the trace record: */
 	trace_mce_record(err);
 
-	set_bit(0, &mce_need_notify);
-
-	mce_notify_irq();
+	mce_work_trigger();
 
 	return NOTIFY_DONE;
 }
@@ -1804,7 +1781,7 @@ static void mce_timer_fn(struct timer_list *t)
 	 * Alert userspace if needed. If we logged an MCE, reduce the polling
 	 * interval, otherwise increase the polling interval.
 	 */
-	if (mce_notify_irq())
+	if (!mce_gen_pool_empty())
 		iv = max(iv / 2, (unsigned long) HZ/100);
 	else
 		iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));
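
For reference, the clamping in the hunk above can be tried out in userspace
with the minimal sketch below. MODEL_HZ and MODEL_CHECK_INTERVAL are assumed
values picked for illustration, model_next_interval() is a made-up name, and
round_jiffies_relative() is left out, so the upper bound here is not rounded
the way the kernel would round it.

```c
#include <stdbool.h>

/* Assumed for illustration: the kernel's HZ is a build-time setting. */
#define MODEL_HZ             250UL
/* Assumed upper bound in seconds, standing in for check_interval. */
#define MODEL_CHECK_INTERVAL 300UL

static unsigned long min_ul(unsigned long a, unsigned long b)
{
	return a < b ? a : b;
}

static unsigned long max_ul(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/*
 * One timer tick of the model: halve the interval (down to HZ/100) when
 * the pool still holds records, otherwise double it (up to the cap).
 */
static unsigned long model_next_interval(unsigned long iv, bool pool_empty)
{
	if (!pool_empty)
		return max_ul(iv / 2, MODEL_HZ / 100);

	return min_ul(iv * 2, MODEL_CHECK_INTERVAL * MODEL_HZ);
}
```

With these assumed values the interval never drops below 2 jiffies and never
grows past 75000 jiffies, whatever sequence of empty and non-empty checks it
sees.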


-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

Re: [PATCH] x86/mce: Restore MCA polling interval halving

Posted by Nikolay Borisov 2 months, 1 week ago


On 7.04.26 at 1:49, Borislav Petkov wrote:
> Ok,
> 
> finally. :-\
> 
> Pls run it to make sure it DTRT for you too.
> 
> Thx.
> 
> ---
> From: "Borislav Petkov (AMD)" <bp@alien8.de>
> Date: Mon, 16 Mar 2026 16:12:00 +0100
> Subject: [PATCH] x86/mce: Restore MCA polling interval halving
> 
> RongQing reported that the MCA polling interval doesn't halve when an
> error gets logged. It was traced down to the commit in Fixes: because:
> 
> mce_timer_fn()
> |-> mce_poll_banks()
> |-> machine_check_poll()
> |-> mce_log()
> 
> which will queue the work and return.
> 
> Now, back in mce_timer_fn():
> 
>          /*
>           * Alert userspace if needed. If we logged an MCE, reduce the polling
>           * interval, otherwise increase the polling interval.
>           */
>          if (mce_notify_irq())
> 
> <--- here we haven't ran the notifier chain yet so mce_need_notify is
> not set yet so this won't hit and we won't halve the interval iv.
> 
> Now the notifier chain runs. mce_early_notifier() sets the bit, does
> mce_notify_irq(), that clears the bit and then the notifier chain
> a little later logs the error.
> 
> So this is a silly timing issue.
> 
> But, that's all unnecessary.
> 
> All it needs to happen here is, the "should we notify of a logged MCE"
> mce_notify_irq() asks, should be simply a question to the mce gen pool:
> "Are you empty?"
> 
> And that then turns into a simple yes or no answer and it all
> JustWorks(tm).
> 
> So do that.
> 
> Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
> Reported-by: Li RongQing <lirongqing@baidu.com>
> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> Link: https://lore.kernel.org/r/20260112082747.2842-1-lirongqing@baidu.com


Much cleaner and simpler,

Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>