RE: [PATCH] x86/mce: Restore MCA polling interval halving

Posted by Zhuo, Qiuxu 1 month, 3 weeks ago
Hi Boris,

> From: Borislav Petkov <bp@alien8.de>
> [...]
> 
> On Wed, Apr 15, 2026 at 10:02:03PM +0200, Borislav Petkov wrote:
> > Lemme think about how to restructure this patch of mine...
> 
> Ok, totally untested. This is only to show the idea. I've basically gone and
> distributed the functionality where it fits best: the pr_info logging at mce_log
> time and the work trigger in the notifier. It ended up like below.
> 
> I'll run it but I'd let you folks check it first, whether I've missed an angle
> conceptually.
> 

1. Test precondition:
   - Added debug messages [1] on top of Boris' patch.
   - RAS_CEC was disabled.
   - A correctable error was injected every 10 seconds.

2. Tested with CMCI interrupts enabled:
   - The message "Machine check events logged" was printed each time a correctable error was injected.
   - EDAC and mcelog in the decode chain were notified as expected.

    So, this part tested OK.

3. Tested in polling mode (boot with "mce=no_cmci"):
   - A CPU’s timer interval was halved after calling mce_log(), or when !mce_gen_pool_empty() was true during polling [2].
   - A CPU’s timer interval was doubled when mce_gen_pool_empty() was true during polling [2].

    This part tested OK, but please see my comments below about the mce_gen_pool_empty() check in mce_timer_fn().

[1]
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index f3a793e3a6c8..927dcdb15ff4 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -152,7 +152,7 @@ EXPORT_PER_CPU_SYMBOL_GPL(injectm);
 void mce_log(struct mce_hw_err *err)
 {
        if (mce_gen_pool_add(err)) {
-               pr_info(HW_ERR "Machine check events logged\n");
+               pr_info(HW_ERR "Machine check events logged by CPU %d\n", smp_processor_id());
                irq_work_queue(&mce_irq_work);
        }
 }
@@ -1781,10 +1781,13 @@ static void mce_timer_fn(struct timer_list *t)
         * Alert userspace if needed. If we logged an MCE, reduce the polling
         * interval, otherwise increase the polling interval.
         */
-       if (!mce_gen_pool_empty())
+       if (!mce_gen_pool_empty()) {
                iv = max(iv / 2, (unsigned long) HZ/100);
-       else
+               pr_info("!mce_gen_pool_empty() - CPU %d halves timer interval %ums\n", smp_processor_id(), jiffies_to_msecs(iv));
+       } else {
                iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));
+               pr_info(" mce_gen_pool_empty() - CPU %d doubles timer interval %ums\n", smp_processor_id(), jiffies_to_msecs(iv));
+       }

        if (mce_get_storm_mode()) {
                __start_timer(t, HZ);

[2] See the example of CPU 82:

dmesg | grep -E 'Machine check events logged|CPU 82' | grep -v "EDAC"

[  323.797260] mce: [Hardware Error]: Machine check events logged by CPU 82
[  323.804985] mce: [Hardware Error]: Machine check events logged by CPU 82
[  323.812618] mce: [Hardware Error]: Machine check events logged by CPU 82
[  323.820237] mce: [Hardware Error]: Machine check events logged by CPU 82
[  323.827868] mce: !mce_gen_pool_empty() - CPU 82 halves timer interval 150000ms
[  323.827970] mce: [Hardware Error]: Machine check events logged by CPU 147
[  487.635781] mce: [Hardware Error]: Machine check events logged by CPU 219
[  487.652751] mce: [Hardware Error]: Machine check events logged by CPU 219
[  487.660571] mce: [Hardware Error]: Machine check events logged by CPU 219
[  487.668386] mce: [Hardware Error]: Machine check events logged by CPU 219
[  487.676195] mce: [Hardware Error]: Machine check events logged by CPU 219
[  487.684874] mce: !mce_gen_pool_empty() - CPU 82 halves timer interval 75000ms
[  563.411184] mce: [Hardware Error]: Machine check events logged by CPU 88
[  563.427845] mce: [Hardware Error]: Machine check events logged by CPU 88
[  563.435553] mce: [Hardware Error]: Machine check events logged by CPU 88
[  563.444290] mce: !mce_gen_pool_empty() - CPU 82 halves timer interval 37500ms
[  602.322784] mce: [Hardware Error]: Machine check events logged by CPU 241
[  602.331355] mce: [Hardware Error]: Machine check events logged by CPU 241
[  602.339264] mce: !mce_gen_pool_empty() - CPU 82 halves timer interval 18748ms
[  622.802721] mce: [Hardware Error]: Machine check events logged by CPU 82
[  622.811199] mce: [Hardware Error]: Machine check events logged by CPU 82
[  622.818948] mce: !mce_gen_pool_empty() - CPU 82 halves timer interval 9372ms
[  632.018480] mce: [Hardware Error]: Machine check events logged by CPU 273
[  632.026526] mce: [Hardware Error]: Machine check events logged by CPU 185
[  632.275383] mce:  mce_gen_pool_empty() - CPU 82 doubles timer interval 18744ms
[  647.122282] mce: [Hardware Error]: Machine check events logged by CPU 273
[  651.475854] mce:  mce_gen_pool_empty() - CPU 82 doubles timer interval 37488ms
[  661.970112] mce: [Hardware Error]: Machine check events logged by CPU 273
[  677.073945] mce: [Hardware Error]: Machine check events logged by CPU 273
[  682.193878] mce: [Hardware Error]: Machine check events logged by CPU 273
[  690.386214] mce:  mce_gen_pool_empty() - CPU 82 doubles timer interval 74976ms
[  692.433727] mce: [Hardware Error]: Machine check events logged by CPU 225
[  712.913487] mce: [Hardware Error]: Machine check events logged by CPU 113
[  717.009440] mce: [Hardware Error]: Machine check events logged by CPU 232
[  743.632392] mce: [Hardware Error]: Machine check events logged by CPU 113
[  743.640869] mce: [Hardware Error]: Machine check events logged by CPU 113
[  757.967947] mce: [Hardware Error]: Machine check events logged by CPU 273
[  766.160445] mce:  mce_gen_pool_empty() - CPU 82 doubles timer interval 149952ms
[  768.207807] mce: [Hardware Error]: Machine check events logged by CPU 253
[  792.783453] mce: [Hardware Error]: Machine check events logged by CPU 234
[  807.119257] mce: [Hardware Error]: Machine check events logged by CPU 253
[  817.359155] mce: [Hardware Error]: Machine check events logged by CPU 273
[  831.695030] mce: [Hardware Error]: Machine check events logged by CPU 234
[  852.174749] mce: [Hardware Error]: Machine check events logged by CPU 232
[  861.646550] mce: [Hardware Error]: Machine check events logged by CPU 232
[  866.510500] mce: [Hardware Error]: Machine check events logged by CPU 234
[  884.430286] mce: [Hardware Error]: Machine check events logged by CPU 234
[  899.534081] mce: [Hardware Error]: Machine check events logged by CPU 234
[  904.654067] mce: [Hardware Error]: Machine check events logged by CPU 234
[  922.573822] mce: [Hardware Error]: Machine check events logged by CPU 234
[  929.998246] mce: [Hardware Error]: Machine check events logged by CPU 261
[  930.000003] mce:  mce_gen_pool_empty() - CPU 82 doubles timer interval 299904ms
[  944.333567] mce: [Hardware Error]: Machine check events logged by CPU 232
[  952.525529] mce: [Hardware Error]: Machine check events logged by CPU 273
[  964.813362] mce: [Hardware Error]: Machine check events logged by CPU 232
[  979.149459] mce: [Hardware Error]: Machine check events logged by CPU 225
[  991.436910] mce: [Hardware Error]: Machine check events logged by CPU 273
[ 1005.772632] mce: [Hardware Error]: Machine check events logged by CPU 261
[ 1028.300439] mce: [Hardware Error]: Machine check events logged by CPU 233
[ 1028.300522] mce: [Hardware Error]: Machine check events logged by CPU 233
[ 1044.683195] mce: [Hardware Error]: Machine check events logged by CPU 261
[ 1054.922669] mce: [Hardware Error]: Machine check events logged by CPU 225
[ 1065.162622] mce: [Hardware Error]: Machine check events logged by CPU 261
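For reference, the interval values logged for CPU 82 can be reproduced with a small user-space model of the iv update in mce_timer_fn(). This is a sketch under assumptions, not kernel code: HZ=250 is assumed because it is the common HZ value that yields these exact millisecond figures (the test kernel's HZ is not stated), check_interval is assumed to be 300 s, round_jiffies_relative() is left out of the cap, and all sim_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed for this sketch: HZ=250 and check_interval=300 s (not stated in the report). */
#define SIM_HZ             250UL
#define SIM_CHECK_INTERVAL 300UL

/* jiffies -> milliseconds; exact for HZ=250 (4 ms per jiffy). */
static unsigned long sim_jiffies_to_msecs(unsigned long j)
{
	return j * 1000UL / SIM_HZ;
}

/*
 * Models the iv update in mce_timer_fn(): halve (but not below HZ/100
 * jiffies) when the gen pool is non-empty, otherwise double (but not above
 * check_interval * HZ). round_jiffies_relative() is omitted from the cap.
 */
static unsigned long sim_next_iv(unsigned long iv, bool pool_nonempty)
{
	unsigned long lo = SIM_HZ / 100;
	unsigned long hi = SIM_CHECK_INTERVAL * SIM_HZ;

	assert(iv >= lo && iv <= hi);	/* iv always stays within [lo, hi] */

	if (pool_nonempty)
		return iv / 2 > lo ? iv / 2 : lo;
	return iv * 2 < hi ? iv * 2 : hi;
}

/*
 * Replays CPU 82's pattern from the log: five polls that find the pool
 * non-empty, then five that find it empty, starting from the cap.
 */
static void sim_replay_cpu82(unsigned long out_ms[10])
{
	unsigned long iv = SIM_CHECK_INTERVAL * SIM_HZ;

	for (int i = 0; i < 10; i++) {
		iv = sim_next_iv(iv, i < 5);
		out_ms[i] = sim_jiffies_to_msecs(iv);
		printf("poll %d: %s, iv = %lums\n", i,
		       i < 5 ? "halved" : "doubled", out_ms[i]);
	}
}
```

Under these assumptions the halving steps land on 18748 ms and 9372 ms rather than 18750 ms and 9375 ms because the interval is halved in whole jiffies (4687 and 2343 jiffies); doubling carries that truncation back up, which is why the last logged value is 299904 ms rather than 300000 ms.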


> Thx.
> 
> ---
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 8dd424ac5de8..f3a793e3a6c8 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -90,7 +90,6 @@ struct mca_config mca_cfg __read_mostly = {
>  };
> 
>  static DEFINE_PER_CPU(struct mce_hw_err, hw_errs_seen);
> -static unsigned long mce_need_notify;
> 
>  /*
>   * MCA banks polled by the period polling timer for corrected events.
> @@ -152,8 +151,10 @@ EXPORT_PER_CPU_SYMBOL_GPL(injectm);
> 
>  void mce_log(struct mce_hw_err *err)
>  {
> -	if (mce_gen_pool_add(err))
> +	if (mce_gen_pool_add(err)) {
> +		pr_info(HW_ERR "Machine check events logged\n");
>  		irq_work_queue(&mce_irq_work);
> +	}
>  }
>  EXPORT_SYMBOL_GPL(mce_log);
> 
> @@ -585,28 +586,6 @@ bool mce_is_correctable(struct mce *m)
>  }
>  EXPORT_SYMBOL_GPL(mce_is_correctable);
> 
> -/*
> - * Notify the user(s) about new machine check events.
> - * Can be called from interrupt context, but not from machine check/NMI
> - * context.
> - */
> -static bool mce_notify_irq(void)
> -{
> -	/* Not more than two messages every minute */
> -	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
> -
> -	if (test_and_clear_bit(0, &mce_need_notify)) {
> -		mce_work_trigger();
> -
> -		if (__ratelimit(&ratelimit))
> -			pr_info(HW_ERR "Machine check events logged\n");
> -
> -		return true;
> -	}
> -
> -	return false;
> -}
> -
>  static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
>  			      void *data)
>  {
> @@ -618,9 +597,7 @@ static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
>  	/* Emit the trace record: */
>  	trace_mce_record(err);
> 
> -	set_bit(0, &mce_need_notify);
> -
> -	mce_notify_irq();
> +	mce_work_trigger();
> 
>  	return NOTIFY_DONE;
>  }
> @@ -1804,7 +1781,7 @@ static void mce_timer_fn(struct timer_list *t)
>  	 * Alert userspace if needed. If we logged an MCE, reduce the polling
>  	 * interval, otherwise increase the polling interval.
>  	 */
> -	if (mce_notify_irq())
> +	if (!mce_gen_pool_empty())

mce_timer_fn()
  machine_check_poll()
        mce_log()
          irq_work_queue(&mce_irq_work)
            ...
              mce_irq_work_cb()
                mce_schedule_work()
                  schedule_work(&mce_work)
                    ...
                      mce_gen_pool_process() // [3] worker thread concurrently running on any CPU handles MCE logs.

  mce_gen_pool_empty() // [4]

It seems there is a race between [3] and [4]: if the worker drains the pool
before [4] runs, mce_timer_fn() sees an empty pool and doubles the interval
instead of halving it. My testing did not hit this race, possibly because
mce_timer_fn() (in softirq context) always finishes before [3] (in the
worker thread) is scheduled to run.
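The two orderings of [3] and [4] can be modelled deterministically in user space. This is a simplified sketch, not kernel code: the gen pool is reduced to a hypothetical record counter, the sim_* helpers are stand-ins for mce_log(), mce_gen_pool_process() and mce_gen_pool_empty(), and the fact that the pool is shared by all CPUs is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the MCE gen pool: only a record count. */
struct sim_pool {
	int records;
};

/* Like mce_log(): queue one record (processing is deferred). */
static void sim_pool_add(struct sim_pool *p)
{
	p->records++;
}

/* Like the worker's mce_gen_pool_process(), step [3]: drain the pool. */
static void sim_pool_process(struct sim_pool *p)
{
	p->records = 0;
}

/* Like mce_gen_pool_empty(), the check at step [4]. */
static bool sim_pool_empty(const struct sim_pool *p)
{
	return p->records == 0;
}

/*
 * One poll that logs an error. worker_first picks the interleaving:
 *   false: [4] runs before [3] (what the test observed),
 *   true:  [3] drains the pool before [4] (the race).
 * Returns true if mce_timer_fn() would halve the interval.
 */
static bool sim_poll_halves(bool worker_first)
{
	struct sim_pool pool = { 0 };
	bool halve;

	sim_pool_add(&pool);			/* machine_check_poll() -> mce_log() */
	if (worker_first)
		sim_pool_process(&pool);	/* [3] wins the race */
	halve = !sim_pool_empty(&pool);		/* [4] in mce_timer_fn() */
	if (!worker_first)
		sim_pool_process(&pool);	/* [3] runs afterwards */
	return halve;
}
```

In the losing interleaving the error is still logged and processed; only the interval adjustment for that one poll goes the wrong way.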

>  		iv = max(iv / 2, (unsigned long) HZ/100);
>  	else
>  		iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));
> 

[...]

Thanks!
- Qiuxu
Re: [PATCH] x86/mce: Restore MCA polling interval halving
Posted by Borislav Petkov 1 month, 3 weeks ago
Hi Qiuxu,

On Mon, Apr 20, 2026 at 02:14:52PM +0000, Zhuo, Qiuxu wrote:
> 1. Test precondition:
>      - Added debug messages [1] on top of Boris' patch.
>      - RAS_CEC was disabled.
>      - A correctable error was injected every 10 seconds.
> 
> 2. Tested with CMCI interrupts enabled:
>    - The message "Machine check events logged" was printed each time a correctable error was injected.
>    - EDAC and mcelog in the decode chain were notified as expected.
> 
>     So, this part tested OK.
> 
> 3. Tested in polling mode (boot with "mce=no_cmci"):
>    - A CPU’s timer interval was halved after calling mce_log(), or when !mce_gen_pool_empty() was true during polling [2].
>    - A CPU’s timer interval was doubled when mce_gen_pool_empty() was true during polling [2].
> 
>     This part tested OK, but please see comments below about mce_gen_pool_empty() check in mce_timer_fn().

Thanks for testing.

> mce_timer_fn()
>   machine_check_poll()
>         mce_log()
>           irq_work_queue(&mce_irq_work)
>             ...
>               mce_irq_work_cb()
>                 mce_schedule_work()
>                   schedule_work(&mce_work)
>                     ...
>                       mce_gen_pool_process() // [3] worker thread concurrently running on any CPU handles MCE logs.
> 
>   mce_gen_pool_empty() // [4]
> 
> It seems there is a race between [3] and [4].
> Although my testing did not observe this race, it's possible 
> that mce_timer_fn() (in softirq) completes fast 
> enough that it always finishes before [3] (in worker thread) is scheduled to run.

Does this and the next message in the thread explain the situation?

https://lore.kernel.org/r/20260207115142.GBaYcnTp7maUDVv3Nc@fat_crate.local

Bottom line: I don't think this was ever meant to be anything but a rough and
simple method to catch too many errors being logged and halve the polling
interval.

IOW, even if the above race happens, in the abundance of too many errors, it
would pick up and start halving eventually.

Right?
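A rough way to see why occasional losses don't matter under an error storm: in the user-space sketch below, an error is logged on every poll, but every lose_every-th poll loses the race and sees an empty pool. Assumptions: HZ=250, check_interval=300 s, round_jiffies_relative() omitted, and the sim_* names and the fixed loss pattern are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed for this sketch: HZ=250, check_interval=300 s. */
#define SIM_HZ             250UL
#define SIM_CHECK_INTERVAL 300UL

/* Same halve/double rule as mce_timer_fn(), without round_jiffies_relative(). */
static unsigned long sim_next_iv(unsigned long iv, bool pool_nonempty)
{
	unsigned long lo = SIM_HZ / 100;
	unsigned long hi = SIM_CHECK_INTERVAL * SIM_HZ;

	if (pool_nonempty)
		return iv / 2 > lo ? iv / 2 : lo;
	return iv * 2 < hi ? iv * 2 : hi;
}

/*
 * Every poll logs an error, but every lose_every-th poll loses the race,
 * sees an empty pool and doubles. Returns iv (jiffies) after 'polls' polls.
 */
static unsigned long sim_storm(int polls, int lose_every)
{
	unsigned long iv = SIM_CHECK_INTERVAL * SIM_HZ;

	assert(lose_every > 0);
	for (int i = 1; i <= polls; i++)
		iv = sim_next_iv(iv, i % lose_every != 0);
	return iv;
}
```

With lose_every = 3 the interval falls from 75000 to 72 jiffies after 30 polls; with lose_every = 2 halving and doubling cancel out and it stays at the cap. So, in this simplified model, the heuristic still converges as long as fewer than half of the polls lose the race.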

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
RE: [PATCH] x86/mce: Restore MCA polling interval halving
Posted by Zhuo, Qiuxu 1 month, 3 weeks ago
Hi Boris,

> From: Borislav Petkov <bp@alien8.de>
> [...]
> > mce_timer_fn()
> >   machine_check_poll()
> >         mce_log()
> >           irq_work_queue(&mce_irq_work)
> >             ...
> >               mce_irq_work_cb()
> >                 mce_schedule_work()
> >                   schedule_work(&mce_work)
> >                     ...
> >                       mce_gen_pool_process() // [3] worker thread concurrently running on any CPU handles MCE logs.
> >
> >   mce_gen_pool_empty() // [4]
> >
> > It seems there is a race between [3] and [4].
> > Although my testing did not observe this race, it's possible that
> > mce_timer_fn() (in softirq) completes fast enough that it always
> > finishes before [3] (in worker thread) is scheduled to run.
> 
> Does this and the next message in the thread explain the situation?

Yes. Thanks for pointing out this link.

> https://lore.kernel.org/r/20260207115142.GBaYcnTp7maUDVv3Nc@fat_crate.local
> 
> Bottom line: I don't think this was ever meant to be anything but a rough and
> simple method to catch too many errors being logged and halve the polling
> interval.

Agree.

> IOW, even if the above race happens, in the abundance of too many errors, it
> would pick up and start halving eventually.
> 
> Right?
>

Yes, I think so. 

So, to me, the current rough and simple method for catching frequent error cases is a good
trade-off between accuracy and complexity.

  Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> 
  Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> 

Thanks!
-Qiuxu
Re: [PATCH] x86/mce: Restore MCA polling interval halving
Posted by Borislav Petkov 1 month, 3 weeks ago
On Tue, Apr 21, 2026 at 03:49:06PM +0000, Zhuo, Qiuxu wrote:
> Yes, I think so. 
> 
> So, to me, the current rough and simple method for catching frequent error cases is a good
> trade-off between accuracy and complexity.
> 
>   Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> 
>   Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> 

Thanks!

I have this now - I will hammer on it some more before I queue it.

---
Author: Borislav Petkov (AMD) <bp@alien8.de>
Date:   Mon Mar 16 16:12:00 2026 +0100

    x86/mce: Restore MCA polling interval halving
    
    RongQing reported that the MCA polling interval doesn't halve when an
    error gets logged. It was traced down to the commit in Fixes:, because:
    
      mce_timer_fn()
      |-> mce_poll_banks()
      |-> machine_check_poll()
      |-> mce_log()
    
    which will queue the work and return.
    
    Now, back in mce_timer_fn():
    
            /*
             * Alert userspace if needed. If we logged an MCE, reduce the polling
             * interval, otherwise increase the polling interval.
             */
            if (mce_notify_irq())
    
    <--- here we haven't run the notifier chain yet, so mce_need_notify is
    not set yet, this won't hit, and we won't halve the interval iv.
    
    Now the notifier chain runs. mce_early_notifier() sets the bit and calls
    mce_notify_irq(), which clears the bit again, and a little later the
    notifier chain logs the error.
    
    So this is a silly timing issue.
    
    But, that's all unnecessary.
    
    All that needs to happen here is for the question mce_notify_irq() asks,
    "should we notify about a logged MCE?", to simply become a question to the
    MCE gen pool: "Are you empty?"
    
    And that then turns into a simple yes or no answer and it all
    JustWorks(tm).
    
    So do that and also distribute the functionality where it belongs:
     - Print that MCE events have been logged in mce_log()
     - Trigger the mcelog tool specific work in the first notifier
    
    As a result, mce_notify_irq() can go now.
    
    Fixes: 011d82611172 ("RAS: Add a Corrected Errors Collector")
    Reported-by: Li RongQing <lirongqing@baidu.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
    Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
    Link: https://lore.kernel.org/r/20260112082747.2842-1-lirongqing@baidu.com

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 8dd424ac5de8..f3a793e3a6c8 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -90,7 +90,6 @@ struct mca_config mca_cfg __read_mostly = {
 };
 
 static DEFINE_PER_CPU(struct mce_hw_err, hw_errs_seen);
-static unsigned long mce_need_notify;
 
 /*
  * MCA banks polled by the period polling timer for corrected events.
@@ -152,8 +151,10 @@ EXPORT_PER_CPU_SYMBOL_GPL(injectm);
 
 void mce_log(struct mce_hw_err *err)
 {
-	if (mce_gen_pool_add(err))
+	if (mce_gen_pool_add(err)) {
+		pr_info(HW_ERR "Machine check events logged\n");
 		irq_work_queue(&mce_irq_work);
+	}
 }
 EXPORT_SYMBOL_GPL(mce_log);
 
@@ -585,28 +586,6 @@ bool mce_is_correctable(struct mce *m)
 }
 EXPORT_SYMBOL_GPL(mce_is_correctable);
 
-/*
- * Notify the user(s) about new machine check events.
- * Can be called from interrupt context, but not from machine check/NMI
- * context.
- */
-static bool mce_notify_irq(void)
-{
-	/* Not more than two messages every minute */
-	static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
-
-	if (test_and_clear_bit(0, &mce_need_notify)) {
-		mce_work_trigger();
-
-		if (__ratelimit(&ratelimit))
-			pr_info(HW_ERR "Machine check events logged\n");
-
-		return true;
-	}
-
-	return false;
-}
-
 static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
 			      void *data)
 {
@@ -618,9 +597,7 @@ static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
 	/* Emit the trace record: */
 	trace_mce_record(err);
 
-	set_bit(0, &mce_need_notify);
-
-	mce_notify_irq();
+	mce_work_trigger();
 
 	return NOTIFY_DONE;
 }
@@ -1804,7 +1781,7 @@ static void mce_timer_fn(struct timer_list *t)
 	 * Alert userspace if needed. If we logged an MCE, reduce the polling
 	 * interval, otherwise increase the polling interval.
 	 */
-	if (mce_notify_irq())
+	if (!mce_gen_pool_empty())
 		iv = max(iv / 2, (unsigned long) HZ/100);
 	else
 		iv = min(iv * 2, round_jiffies_relative(check_interval * HZ));
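The timing issue described in the commit message can be modelled in user space by comparing the old flag test with the new pool check. This is a simplified sketch, not kernel code: struct sim_state and the sim_* helpers are hypothetical, the deferred irq_work/workqueue/notifier processing is collapsed into one function that runs after the check, and the old ratelimited printk is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state involved. */
struct sim_state {
	bool need_notify;	/* old scheme: bit 0 of mce_need_notify */
	int pool_records;	/* records waiting in the MCE gen pool */
};

/* Old mce_notify_irq(): test_and_clear_bit(0, &mce_need_notify). */
static bool sim_old_notify_irq(struct sim_state *s)
{
	bool was_set = s->need_notify;

	s->need_notify = false;
	return was_set;
}

/*
 * Deferred processing, which in this model runs only after the timer has
 * made its decision: the pool is drained, then mce_early_notifier() sets
 * the bit and immediately clears it again via mce_notify_irq().
 */
static void sim_deferred_processing(struct sim_state *s)
{
	s->pool_records = 0;			/* records taken off the pool */
	s->need_notify = true;			/* set_bit(0, &mce_need_notify) */
	(void)sim_old_notify_irq(s);		/* mce_notify_irq() clears it */
}

/* One poll that logs an error; returns true if iv would be halved. */
static bool sim_scheme_halves(bool new_scheme)
{
	struct sim_state s = { .need_notify = false, .pool_records = 0 };
	bool halve;

	s.pool_records++;			/* mce_log(): record queued */
	if (new_scheme)
		halve = s.pool_records != 0;	/* !mce_gen_pool_empty() */
	else
		halve = sim_old_notify_irq(&s);	/* flag not set yet */
	sim_deferred_processing(&s);
	return halve;
}
```

In this model the old check reads the flag before the notifier chain has set it, so it never reports the logged error and iv never halves; the new check sees the queued record. The drain-before-check race raised earlier in the thread is deliberately not modelled here.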


-- 
Regards/Gruss,
    Boris.
