Users can disable MCA polling by setting the "ignore_ce" parameter or by
setting "check_interval=0". This tells the kernel to *not* start the MCE
timer on a CPU.
During a CMCI storm, the MCE timer will be started with a fixed
interval. After the storm subsides, the timer's next interval is set to
check_interval.
This disregards the user's input through "ignore_ce" and
"check_interval". Furthermore, if "check_interval=0", then the new timer
will run faster than expected.
Create a new helper to check these conditions and use it when a CMCI
storm ends.
Fixes: 7eae17c4add5 ("x86/mce: Add per-bank CMCI storm mitigation")
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Cc: stable@vger.kernel.org
---
Notes:
v2->v3:
* New in v3.
arch/x86/kernel/cpu/mce/core.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0a2a97681266..131015f5eadc 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1806,6 +1806,11 @@ static void mc_poll_banks_default(void)
void (*mc_poll_banks)(void) = mc_poll_banks_default;
+static bool should_enable_timer(unsigned long iv)
+{
+ return !mca_cfg.ignore_ce && iv;
+}
+
static void mce_timer_fn(struct timer_list *t)
{
struct timer_list *cpu_t = this_cpu_ptr(&mce_timer);
@@ -1829,7 +1834,7 @@ static void mce_timer_fn(struct timer_list *t)
if (mce_get_storm_mode()) {
__start_timer(t, HZ);
- } else {
+ } else if (should_enable_timer(iv)) {
__this_cpu_write(mce_next_interval, iv);
__start_timer(t, iv);
}
@@ -2142,7 +2147,7 @@ static void mce_start_timer(struct timer_list *t)
{
unsigned long iv = check_interval * HZ;
- if (mca_cfg.ignore_ce || !iv)
+ if (!should_enable_timer(iv))
return;
this_cpu_write(mce_next_interval, iv);
--
2.49.0
On Tue, Apr 15, 2025 at 02:55:12PM +0000, Yazen Ghannam wrote:
> Users can disable MCA polling by setting the "ignore_ce" parameter or by
> setting "check_interval=0". This tells the kernel to *not* start the MCE
> timer on a CPU.
>
> During a CMCI storm, the MCE timer will be started with a fixed
> interval.
Why?
If a user doesn't want CEs, why are we even bothering with CMCI storms?
Might as well disable the storm handling code altogether...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Mon, May 12, 2025 at 09:46:16AM +0200, Borislav Petkov wrote: > On Tue, Apr 15, 2025 at 02:55:12PM +0000, Yazen Ghannam wrote: > > Users can disable MCA polling by setting the "ignore_ce" parameter or by > > setting "check_interval=0". This tells the kernel to *not* start the MCE > > timer on a CPU. > > > > During a CMCI storm, the MCE timer will be started with a fixed > > interval. > > Why? > > If a user doesn't want CEs, why are we even bothering with CMCI storms? > > Might as well disable the storm handling code altogether... > The use case is "disable MCA polling". I just gave two examples of how this can be done. We can focus on "check_interval=0". The user wants to disable MCA polling and rely only on interrupts. They still want to see the CEs. If an interrupt storm happens, the kernel will switch to polling with our MCE timer. When the storm ends, the kernel should go back to how things were before the storm. If there was a timer before, then go back to the old interval. If there was *not* a timer before, then delete/remove the timer. Thanks, Yazen
On Mon, May 12, 2025 at 11:43:15AM -0400, Yazen Ghannam wrote:
> The use case is "disable MCA polling". I just gave two examples of how
> this can be done.
Our documentation says:
ignore_ce
disable features for corrected errors, e.g.
polling timer and CMCI. All events reported as
corrected are not cleared by OS and remained in its
error banks.
Usually this disablement is not recommended, however
if there is an agent checking/clearing corrected
errors (e.g. BIOS or hardware monitoring
applications), conflicting with OS's error handling,
and you cannot deactivate the agent, then this option
will be a help.
it basically disables all: polling *and* CMCI.
So why do we even bother with storms?
> We can focus on "check_interval=0". The user wants to disable MCA
> polling and rely only on interrupts. They still want to see the CEs.
Is that a use case we support?
Where is that documented?
I can see why someone would want to avoid the recurrent polling but I'm not
sure we explicitly say that somewhere in the text...
> When the storm ends, the kernel should go back to how things were before
> the storm. If there was a timer before, then go back to the old
> interval. If there was *not* a timer before, then delete/remove the
> timer.
That I agree with.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Tue, May 13, 2025 at 07:55:43PM +0200, Borislav Petkov wrote: > On Mon, May 12, 2025 at 11:43:15AM -0400, Yazen Ghannam wrote: > > The use case is "disable MCA polling". I just gave two examples of how > > this can be done. > > Our documentation says: > > ignore_ce > disable features for corrected errors, e.g. > polling timer and CMCI. All events reported as > corrected are not cleared by OS and remained in its > error banks. > > Usually this disablement is not recommended, however > if there is an agent checking/clearing corrected > errors (e.g. BIOS or hardware monitoring > applications), conflicting with OS's error handling, > and you cannot deactivate the agent, then this option > will be a help. > > it basically disables all: polling *and* CMCI. > > So why do we even bother with storms? > I think I see your point. The AMD MCA Thresholding init flow doesn't check "ignore_ce". We should fix that to have more feature parity between vendors. Another item for the to do list. :) > > We can focus on "check_interval=0". The user wants to disable MCA > > polling and rely only on interrupts. They still want to see the CEs. > > Is that a use case we support? > > Where is that documented? > > I can see why someone would want to avoid the recurrent polling but I'm not > sure we explicitly say that somewhere in the text... > Right, good point. This may come down to another vendor difference. On Intel, the set of banks for polling and for CMCI are mutually exclusive. So polling can be effectively disabled if all banks support CMCI. On AMD, polling and interrupt are independent. We still poll all banks even if they are interrupt-capable. I think we discussed this in a previous revision of this set. So I think we should document this use case. And also consider changing the polling/interrupt behavior on AMD. Even if this use case is documented, it still requires user intervention which we can avoid if we change the kernel behavior. > > When the storm ends, the kernel should go back to how things were before > > the storm. If there was a timer before, then go back to the old > > interval. If there was *not* a timer before, then delete/remove the > > timer. > > That I agree with. > Okay. Thanks, Yazen
> On AMD, polling and interrupt are independent. We still poll all banks > even if they are interrupt-capable. I think we discussed this in a > previous revision of this set. Can you race and double report the same error if a polling interval and interrupt happen together? Disabling polling for interrupt capable banks happened before I started looking at this code. But I assumed it was to avoid double report. -Tony
On Tue, May 13, 2025 at 10:07:13PM +0000, Luck, Tony wrote: > > On AMD, polling and interrupt are independent. We still poll all banks > > even if they are interrupt-capable. I think we discussed this in a > > previous revision of this set. > > Can you race and double report the same error if a polling interval > and interrupt happen together? > Maybe, but probably very unlikely. On AMD, MCA bank management is always 'local', i.e. per-CPU. If a CPU is in the polling function, can it be preempted by an interrupt (not MCE)? > Disabling polling for interrupt capable banks happened before I > started looking at this code. But I assumed it was to avoid double > report. > Ah okay. I assumed it was a performance thing too. But maybe that's just a nice side effect. Thanks, Yazen
On Wed, May 14, 2025 at 10:34:16AM -0400, Yazen Ghannam wrote:
> On AMD, MCA bank management is always 'local', i.e. per-CPU.
>
> If a CPU is in the polling function, can it be preempted by an interrupt
> (not MCE)?
Well, ofc. We're polling with interrupts enabled.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, May 15, 2025 at 02:37:08PM +0200, Borislav Petkov wrote: > On Wed, May 14, 2025 at 10:34:16AM -0400, Yazen Ghannam wrote: > > On AMD, MCA bank management is always 'local', i.e. per-CPU. > > > > If a CPU is in the polling function, can it be preempted by an interrupt > > (not MCE)? > > Well, ofc. We're polling with interrupts enabled. > Right. The polling function is called from a timer. I expect the timer is checked during a timer tick or during rescheduling. Even though these events are interrupt-driven, it doesn't make sense to stay in interrupt context the whole time. I think this is where my thoughts were. So there's a slight change of double counting errors if the polling function is interrupted between reading MCA_STATUS in a bank and clearing it. Thanks, Yazen
On Mon, May 12, 2025 at 11:43:15AM -0400, Yazen Ghannam wrote: > On Mon, May 12, 2025 at 09:46:16AM +0200, Borislav Petkov wrote: > > On Tue, Apr 15, 2025 at 02:55:12PM +0000, Yazen Ghannam wrote: > > > Users can disable MCA polling by setting the "ignore_ce" parameter or by > > > setting "check_interval=0". This tells the kernel to *not* start the MCE > > > timer on a CPU. > > > > > > During a CMCI storm, the MCE timer will be started with a fixed > > > interval. I think you just need some more words at the start of this second paragraph to avoid confusion when reading together with the previous one. Perhaps: If the user did not disable CMCI, then storms can occur. When these happen the MCE timer will be started ... -Tony
On Mon, May 12, 2025 at 08:53:08AM -0700, Luck, Tony wrote: > On Mon, May 12, 2025 at 11:43:15AM -0400, Yazen Ghannam wrote: > > On Mon, May 12, 2025 at 09:46:16AM +0200, Borislav Petkov wrote: > > > On Tue, Apr 15, 2025 at 02:55:12PM +0000, Yazen Ghannam wrote: > > > > Users can disable MCA polling by setting the "ignore_ce" parameter or by > > > > setting "check_interval=0". This tells the kernel to *not* start the MCE > > > > timer on a CPU. > > > > > > > > During a CMCI storm, the MCE timer will be started with a fixed > > > > interval. > > I think you just need some more words at the start of this second > paragraph to avoid confusion when reading together with the previous > one. > > Perhaps: > > If the user did not disable CMCI, then storms can occur. When these > happen the MCE timer will be started ... > Yes, good point. I'll update the commit message. Thanks, Yazen
© 2016 - 2025 Red Hat, Inc.