[PATCH v3 17/17] x86/mce: Restore poll settings after storm subsides

Yazen Ghannam posted 17 patches 8 months, 1 week ago
There is a newer version of this series
[PATCH v3 17/17] x86/mce: Restore poll settings after storm subsides
Posted by Yazen Ghannam 8 months, 1 week ago
Users can disable MCA polling by setting the "ignore_ce" parameter or by
setting "check_interval=0". This tells the kernel to *not* start the MCE
timer on a CPU.

During a CMCI storm, the MCE timer will be started with a fixed
interval. After the storm subsides, the timer's next interval is set to
check_interval.

This disregards the user's input through "ignore_ce" and
"check_interval". Furthermore, if "check_interval=0", then the new timer
will run faster than expected.

Create a new helper to check these conditions and use it when a CMCI
storm ends.

Fixes: 7eae17c4add5 ("x86/mce: Add per-bank CMCI storm mitigation")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
---

Notes:
    v2->v3:
    * New in v3.

 arch/x86/kernel/cpu/mce/core.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0a2a97681266..131015f5eadc 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1806,6 +1806,11 @@ static void mc_poll_banks_default(void)
 
 void (*mc_poll_banks)(void) = mc_poll_banks_default;
 
+static bool should_enable_timer(unsigned long iv)
+{
+	return !mca_cfg.ignore_ce && iv;
+}
+
 static void mce_timer_fn(struct timer_list *t)
 {
 	struct timer_list *cpu_t = this_cpu_ptr(&mce_timer);
@@ -1829,7 +1834,7 @@ static void mce_timer_fn(struct timer_list *t)
 
 	if (mce_get_storm_mode()) {
 		__start_timer(t, HZ);
-	} else {
+	} else if (should_enable_timer(iv)) {
 		__this_cpu_write(mce_next_interval, iv);
 		__start_timer(t, iv);
 	}
@@ -2142,7 +2147,7 @@ static void mce_start_timer(struct timer_list *t)
 {
 	unsigned long iv = check_interval * HZ;
 
-	if (mca_cfg.ignore_ce || !iv)
+	if (!should_enable_timer(iv))
 		return;
 
 	this_cpu_write(mce_next_interval, iv);

-- 
2.49.0
Re: [PATCH v3 17/17] x86/mce: Restore poll settings after storm subsides
Posted by Borislav Petkov 7 months, 1 week ago
On Tue, Apr 15, 2025 at 02:55:12PM +0000, Yazen Ghannam wrote:
> Users can disable MCA polling by setting the "ignore_ce" parameter or by
> setting "check_interval=0". This tells the kernel to *not* start the MCE
> timer on a CPU.
> 
> During a CMCI storm, the MCE timer will be started with a fixed
> interval.

Why?

If a user doesn't want CEs, why are we even bothering with CMCI storms?

Might as well disable the storm handling code altogether...

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v3 17/17] x86/mce: Restore poll settings after storm subsides
Posted by Yazen Ghannam 7 months, 1 week ago
On Mon, May 12, 2025 at 09:46:16AM +0200, Borislav Petkov wrote:
> On Tue, Apr 15, 2025 at 02:55:12PM +0000, Yazen Ghannam wrote:
> > Users can disable MCA polling by setting the "ignore_ce" parameter or by
> > setting "check_interval=0". This tells the kernel to *not* start the MCE
> > timer on a CPU.
> > 
> > During a CMCI storm, the MCE timer will be started with a fixed
> > interval.
> 
> Why?
> 
> If a user doesn't want CEs, why are we even bothering with CMCI storms?
> 
> Might as well disable the storm handling code altogether...
> 

The use case is "disable MCA polling". I just gave two examples of how
this can be done.

We can focus on "check_interval=0". The user wants to disable MCA
polling and rely only on interrupts. They still want to see the CEs.

If an interrupt storm happens, the kernel will switch to polling with
our MCE timer.

When the storm ends, the kernel should go back to how things were before
the storm. If there was a timer before, then go back to the old
interval. If there was *not* a timer before, then delete/remove the
timer.

Thanks,
Yazen
Re: [PATCH v3 17/17] x86/mce: Restore poll settings after storm subsides
Posted by Borislav Petkov 7 months, 1 week ago
On Mon, May 12, 2025 at 11:43:15AM -0400, Yazen Ghannam wrote:
> The use case is "disable MCA polling". I just gave two examples of how
> this can be done.

Our documentation says:

                ignore_ce
                        disable features for corrected errors, e.g.
                        polling timer and CMCI.  All events reported as 
                        corrected are not cleared by OS and remained in its
                        error banks.

                        Usually this disablement is not recommended, however
                        if there is an agent checking/clearing corrected
                        errors (e.g. BIOS or hardware monitoring 
                        applications), conflicting with OS's error handling,
                        and you cannot deactivate the agent, then this option
                        will be a help.

it basically disables all: polling *and* CMCI.

So why do we even bother with storms?

> We can focus on "check_interval=0". The user wants to disable MCA
> polling and rely only on interrupts. They still want to see the CEs.

Is that a use case we support?

Where is that documented?

I can see why someone would want to avoid the recurrent polling but I'm not
sure we explicitly say that somewhere in the text...

> When the storm ends, the kernel should go back to how things were before
> the storm. If there was a timer before, then go back to the old
> interval. If there was *not* a timer before, then delete/remove the
> timer.

That I agree with.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v3 17/17] x86/mce: Restore poll settings after storm subsides
Posted by Yazen Ghannam 7 months, 1 week ago
On Tue, May 13, 2025 at 07:55:43PM +0200, Borislav Petkov wrote:
> On Mon, May 12, 2025 at 11:43:15AM -0400, Yazen Ghannam wrote:
> > The use case is "disable MCA polling". I just gave two examples of how
> > this can be done.
> 
> Our documentation says:
> 
>                 ignore_ce
>                         disable features for corrected errors, e.g.
>                         polling timer and CMCI.  All events reported as 
>                         corrected are not cleared by OS and remained in its
>                         error banks.
> 
>                         Usually this disablement is not recommended, however
>                         if there is an agent checking/clearing corrected
>                         errors (e.g. BIOS or hardware monitoring 
>                         applications), conflicting with OS's error handling,
>                         and you cannot deactivate the agent, then this option
>                         will be a help.
> 
> it basically disables all: polling *and* CMCI.
> 
> So why do we even bother with storms?
> 

I think I see your point. The AMD MCA Thresholding init flow doesn't
check "ignore_ce". We should fix that to have more feature parity
between vendors.

Another item for the to do list. :)

> > We can focus on "check_interval=0". The user wants to disable MCA
> > polling and rely only on interrupts. They still want to see the CEs.
> 
> Is that a use case we support?
> 
> Where is that documented?
> 
> I can see why someone would want to avoid the recurrent polling but I'm not
> sure we explicitly say that somewhere in the text...
> 

Right, good point. This may come down to another vendor difference.

On Intel, the set of banks for polling and for CMCI are mutually
exclusive. So polling can be effectively disabled if all banks support
CMCI.

On AMD, polling and interrupt are independent. We still poll all banks
even if they are interrupt-capable. I think we discussed this in a
previous revision of this set.

So I think we should document this use case. And also consider changing
the polling/interrupt behavior on AMD. Even if this use case is
documented, it still requires user intervention which we can avoid if we
change the kernel behavior.

> > When the storm ends, the kernel should go back to how things were before
> > the storm. If there was a timer before, then go back to the old
> > interval. If there was *not* a timer before, then delete/remove the
> > timer.
> 
> That I agree with.
> 

Okay.

Thanks,
Yazen
RE: [PATCH v3 17/17] x86/mce: Restore poll settings after storm subsides
Posted by Luck, Tony 7 months, 1 week ago
> On AMD, polling and interrupt are independent. We still poll all banks
> even if they are interrupt-capable. I think we discussed this in a
> previous revision of this set.

Can you race and double report the same error if a polling interval
and interrupt happen together?

Disabling polling for interrupt capable banks happened before I
started looking at this code. But I assumed it was to avoid double
report.

-Tony
Re: [PATCH v3 17/17] x86/mce: Restore poll settings after storm subsides
Posted by Yazen Ghannam 7 months, 1 week ago
On Tue, May 13, 2025 at 10:07:13PM +0000, Luck, Tony wrote:
> > On AMD, polling and interrupt are independent. We still poll all banks
> > even if they are interrupt-capable. I think we discussed this in a
> > previous revision of this set.
> 
> Can you race and double report the same error if a polling interval
> and interrupt happen together?
> 

Maybe, but probably very unlikely.

On AMD, MCA bank management is always 'local', i.e. per-CPU.

If a CPU is in the polling function, can it be preempted by an interrupt
(not MCE)?

> Disabling polling for interrupt capable banks happened before I
> started looking at this code. But I assumed it was to avoid double
> report.
> 

Ah okay. I assumed it was a performance thing too. But maybe that's just
a nice side effect.

Thanks,
Yazen
Re: [PATCH v3 17/17] x86/mce: Restore poll settings after storm subsides
Posted by Borislav Petkov 7 months, 1 week ago
On Wed, May 14, 2025 at 10:34:16AM -0400, Yazen Ghannam wrote:
> On AMD, MCA bank management is always 'local', i.e. per-CPU.
> 
> If a CPU is in the polling function, can it be preempted by an interrupt
> (not MCE)?

Well, ofc. We're polling with interrupts enabled.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v3 17/17] x86/mce: Restore poll settings after storm subsides
Posted by Yazen Ghannam 7 months, 1 week ago
On Thu, May 15, 2025 at 02:37:08PM +0200, Borislav Petkov wrote:
> On Wed, May 14, 2025 at 10:34:16AM -0400, Yazen Ghannam wrote:
> > On AMD, MCA bank management is always 'local', i.e. per-CPU.
> > 
> > If a CPU is in the polling function, can it be preempted by an interrupt
> > (not MCE)?
> 
> Well, ofc. We're polling with interrupts enabled.
> 

Right.

The polling function is called from a timer. I expect the timer is
checked during a timer tick or during rescheduling.

Even though these events are interrupt-driven, it doesn't make sense to
stay in interrupt context the whole time. I think this is where my
thoughts were.

So there's a slight change of double counting errors if the polling
function is interrupted between reading MCA_STATUS in a bank and
clearing it.

Thanks,
Yazen
Re: [PATCH v3 17/17] x86/mce: Restore poll settings after storm subsides
Posted by Luck, Tony 7 months, 1 week ago
On Mon, May 12, 2025 at 11:43:15AM -0400, Yazen Ghannam wrote:
> On Mon, May 12, 2025 at 09:46:16AM +0200, Borislav Petkov wrote:
> > On Tue, Apr 15, 2025 at 02:55:12PM +0000, Yazen Ghannam wrote:
> > > Users can disable MCA polling by setting the "ignore_ce" parameter or by
> > > setting "check_interval=0". This tells the kernel to *not* start the MCE
> > > timer on a CPU.
> > > 
> > > During a CMCI storm, the MCE timer will be started with a fixed
> > > interval.

I think you just need some more words at the start of this second
paragraph to avoid confusion when reading together with the previous
one.

Perhaps:

If the user did not disable CMCI, then storms can occur. When these
happen the MCE timer will be started ...

-Tony
Re: [PATCH v3 17/17] x86/mce: Restore poll settings after storm subsides
Posted by Yazen Ghannam 7 months, 1 week ago
On Mon, May 12, 2025 at 08:53:08AM -0700, Luck, Tony wrote:
> On Mon, May 12, 2025 at 11:43:15AM -0400, Yazen Ghannam wrote:
> > On Mon, May 12, 2025 at 09:46:16AM +0200, Borislav Petkov wrote:
> > > On Tue, Apr 15, 2025 at 02:55:12PM +0000, Yazen Ghannam wrote:
> > > > Users can disable MCA polling by setting the "ignore_ce" parameter or by
> > > > setting "check_interval=0". This tells the kernel to *not* start the MCE
> > > > timer on a CPU.
> > > > 
> > > > During a CMCI storm, the MCE timer will be started with a fixed
> > > > interval.
> 
> I think you just need some more words at the start of this second
> paragraph to avoid confusion when reading together with the previous
> one.
> 
> Perhaps:
> 
> If the user did not disable CMCI, then storms can occur. When these
> happen the MCE timer will be started ...
> 

Yes, good point. I'll update the commit message.

Thanks,
Yazen