[v2] x86/mce: Do away with unnecessary context quirks

[PATCH v2] x86/mce: Do away with unnecessary context quirks

Posted by Yazen Ghannam 1 month, 2 weeks ago

Both Intel and AMD have quirks related to recovery in the Instruction
Fetch Units. The common issue is that MCG_STATUS[RIPV] and
MCG_STATUS[EIPV] are set to '0', so Linux does not save the CS and IP
registers. The severity grading functions later see CS=0 and therefore
assume that the #MC was taken in kernel context. This leads to a
kernel panic even if the #MC was recoverable and occurred in user context.

RIPV is "restart IP valid", which means program execution can restart at
the IP on the stack. This is a general indicator of whether system
software should try to return to the executing process. The exact IP
value is not needed by MCE handling.

EIPV is "error IP valid", which means the IP on the stack is directly
associated with the error. This is a specific indicator that the saved
IP is exactly where the #MC was taken. System software can share this
for debugging and/or try to take further recovery actions based on the
nature of the code represented by the IP.

Neither of these refers to the CS register, which is used to determine
the execution context privilege level.

It is not clear why CS and IP are tied together in the Linux handling.
This could be a carryover from 32-bit execution, where "IP" is the
combination of "CS:IP". But it is not apparent whether this "IP=CS:IP"
association, as it applies to MCE handling, is a Linux assumption or is
explicitly noted in x86 documentation when describing RIPV/EIPV.

It is clear that, in the affected use cases, the processor context is
valid in general, and the only variable is the validity of the IP, which
is explicitly indicated by RIPV/EIPV. An invalid CPU context is
represented by the MCA_STATUS[PCC] "Processor Context Corrupt" bit.

Avoid the need for these context quirks by refactoring the Linux MCE
handling code to treat the CS and IP registers independently.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---
Link:
https://lore.kernel.org/r/20250813154455.162489-1-yazen.ghannam@amd.com

v1->v2:
* Minimize changes to only code related to context quirks.

 arch/x86/kernel/cpu/mce/core.c     | 83 +++++-------------------------
 arch/x86/kernel/cpu/mce/internal.h |  8 +--
 2 files changed, 13 insertions(+), 78 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4da4eab56c81..a26534a914ec 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -470,22 +470,23 @@ static noinstr void mce_gather_info(struct mce_hw_err *err, struct pt_regs *regs
 	m = &err->m;
 	m->mcgstatus = mce_rdmsrq(MSR_IA32_MCG_STATUS);
 	if (regs) {
+		m->cs = regs->cs;
+
+		/*
+		 * When in VM86 mode make the cs look like ring 3
+		 * always. This is a lie, but it's better than passing
+		 * the additional vm86 bit around everywhere.
+		 */
+		if (v8086_mode(regs))
+			m->cs |= 3;
+
 		/*
 		 * Get the address of the instruction at the time of
 		 * the machine check error.
 		 */
-		if (m->mcgstatus & (MCG_STATUS_RIPV|MCG_STATUS_EIPV)) {
+		if (m->mcgstatus & (MCG_STATUS_RIPV | MCG_STATUS_EIPV))
 			m->ip = regs->ip;
-			m->cs = regs->cs;
-
-			/*
-			 * When in VM86 mode make the cs look like ring 3
-			 * always. This is a lie, but it's better than passing
-			 * the additional vm86 bit around everywhere.
-			 */
-			if (v8086_mode(regs))
-				m->cs |= 3;
-		}
+
 		/* Use accurate RIP reporting if available. */
 		if (mca_cfg.rip_msr)
 			m->ip = mce_rdmsrq(mca_cfg.rip_msr);
@@ -841,35 +842,6 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 }
 EXPORT_SYMBOL_GPL(machine_check_poll);
 
-/*
- * During IFU recovery Sandy Bridge -EP4S processors set the RIPV and
- * EIPV bits in MCG_STATUS to zero on the affected logical processor (SDM
- * Vol 3B Table 15-20). But this confuses both the code that determines
- * whether the machine check occurred in kernel or user mode, and also
- * the severity assessment code. Pretend that EIPV was set, and take the
- * ip/cs values from the pt_regs that mce_gather_info() ignored earlier.
- */
-static __always_inline void
-quirk_sandybridge_ifu(int bank, struct mce *m, struct pt_regs *regs)
-{
-	if (bank != 0)
-		return;
-	if ((m->mcgstatus & (MCG_STATUS_EIPV|MCG_STATUS_RIPV)) != 0)
-		return;
-	if ((m->status & (MCI_STATUS_OVER|MCI_STATUS_UC|
-		          MCI_STATUS_EN|MCI_STATUS_MISCV|MCI_STATUS_ADDRV|
-			  MCI_STATUS_PCC|MCI_STATUS_S|MCI_STATUS_AR|
-			  MCACOD)) !=
-			 (MCI_STATUS_UC|MCI_STATUS_EN|
-			  MCI_STATUS_MISCV|MCI_STATUS_ADDRV|MCI_STATUS_S|
-			  MCI_STATUS_AR|MCACOD_INSTR))
-		return;
-
-	m->mcgstatus |= MCG_STATUS_EIPV;
-	m->ip = regs->ip;
-	m->cs = regs->cs;
-}
-
 /*
  * Disable fast string copy and return from the MCE handler upon the first SRAR
  * MCE on bank 1 due to a CPU erratum on Intel Skylake/Cascade Lake/Cooper Lake
@@ -923,26 +895,6 @@ static noinstr bool quirk_skylake_repmov(void)
 	return false;
 }
 
-/*
- * Some Zen-based Instruction Fetch Units set EIPV=RIPV=0 on poison consumption
- * errors. This means mce_gather_info() will not save the "ip" and "cs" registers.
- *
- * However, the context is still valid, so save the "cs" register for later use.
- *
- * The "ip" register is truly unknown, so don't save it or fixup EIPV/RIPV.
- *
- * The Instruction Fetch Unit is at MCA bank 1 for all affected systems.
- */
-static __always_inline void quirk_zen_ifu(int bank, struct mce *m, struct pt_regs *regs)
-{
-	if (bank != 1)
-		return;
-	if (!(m->status & MCI_STATUS_POISON))
-		return;
-
-	m->cs = regs->cs;
-}
-
 /*
  * Do a quick check if any of the events requires a panic.
  * This decides if we keep the events around or clear them.
@@ -960,11 +912,6 @@ static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, un
 			continue;
 
 		arch___set_bit(i, validp);
-		if (mce_flags.snb_ifu_quirk)
-			quirk_sandybridge_ifu(i, m, regs);
-
-		if (mce_flags.zen_ifu_quirk)
-			quirk_zen_ifu(i, m, regs);
 
 		m->bank = i;
 		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
@@ -1950,9 +1897,6 @@ static void apply_quirks_amd(struct cpuinfo_x86 *c)
 	 */
 	if (c->x86 == 0x15 && c->x86_model <= 0xf)
 		mce_flags.overflow_recov = 1;
-
-	if (c->x86 >= 0x17 && c->x86 <= 0x1A)
-		mce_flags.zen_ifu_quirk = 1;
 }
 
 static void apply_quirks_intel(struct cpuinfo_x86 *c)
@@ -1988,9 +1932,6 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c)
 	if (c->x86_vfm < INTEL_CORE_YONAH && mca_cfg.bootlog < 0)
 		mca_cfg.bootlog = 0;
 
-	if (c->x86_vfm == INTEL_SANDYBRIDGE_X)
-		mce_flags.snb_ifu_quirk = 1;
-
 	/*
 	 * Skylake, Cascacde Lake and Cooper Lake require a quirk on
 	 * rep movs.
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index b5ba598e54cb..59a94daa31ad 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -211,9 +211,6 @@ struct mce_vendor_flags {
 	 */
 	smca			: 1,
 
-	/* Zen IFU quirk */
-	zen_ifu_quirk		: 1,
-
 	/* AMD-style error thresholding banks present. */
 	amd_threshold		: 1,
 
@@ -223,13 +220,10 @@ struct mce_vendor_flags {
 	/* Centaur Winchip C6-style MCA */
 	winchip			: 1,
 
-	/* SandyBridge IFU quirk */
-	snb_ifu_quirk		: 1,
-
 	/* Skylake, Cascade Lake, Cooper Lake REP;MOVS* quirk */
 	skx_repmov_quirk	: 1,
 
-	__reserved_0		: 55;
+	__reserved_0		: 57;
 };
 
 extern struct mce_vendor_flags mce_flags;

base-commit: 0cc53520e68bea7fb80fdc6bdf8d226d1b6a98d9
-- 
2.50.1

Re: [PATCH v2] x86/mce: Do away with unnecessary context quirks

Posted by Luck, Tony 1 month, 2 weeks ago

On Thu, Aug 14, 2025 at 11:48:09AM -0400, Yazen Ghannam wrote:
> -/*
> - * During IFU recovery Sandy Bridge -EP4S processors set the RIPV and
> - * EIPV bits in MCG_STATUS to zero on the affected logical processor (SDM
> - * Vol 3B Table 15-20). But this confuses both the code that determines
> - * whether the machine check occurred in kernel or user mode, and also
> - * the severity assessment code. Pretend that EIPV was set, and take the
> - * ip/cs values from the pt_regs that mce_gather_info() ignored earlier.
> - */
> -static __always_inline void
> -quirk_sandybridge_ifu(int bank, struct mce *m, struct pt_regs *regs)
> -{
> -	if (bank != 0)
> -		return;
> -	if ((m->mcgstatus & (MCG_STATUS_EIPV|MCG_STATUS_RIPV)) != 0)
> -		return;
> -	if ((m->status & (MCI_STATUS_OVER|MCI_STATUS_UC|
> -		          MCI_STATUS_EN|MCI_STATUS_MISCV|MCI_STATUS_ADDRV|
> -			  MCI_STATUS_PCC|MCI_STATUS_S|MCI_STATUS_AR|
> -			  MCACOD)) !=
> -			 (MCI_STATUS_UC|MCI_STATUS_EN|
> -			  MCI_STATUS_MISCV|MCI_STATUS_ADDRV|MCI_STATUS_S|
> -			  MCI_STATUS_AR|MCACOD_INSTR))
> -		return;
> -
> -	m->mcgstatus |= MCG_STATUS_EIPV;

I don't think this part of the Sandybridge quirk is covered in your
new code. Without EIPV set, the Intel severity table driven code will
fail to note this as recoverable.

-Tony

Re: [PATCH v2] x86/mce: Do away with unnecessary context quirks

Posted by Yazen Ghannam 1 month, 2 weeks ago

On Thu, Aug 14, 2025 at 09:54:54AM -0700, Luck, Tony wrote:
> On Thu, Aug 14, 2025 at 11:48:09AM -0400, Yazen Ghannam wrote:
> > -	m->mcgstatus |= MCG_STATUS_EIPV;
> 
> I don't think this part of the Sandybridge quirk is covered in your
> new code. Without EIPV set, the Intel severity table driven code will
> fail to note this as recoverable.
> 

Which severity do you mean? EIPV does not need to be set in any of them.

Unless you mean this check:

	if (!mc_recoverable(m->mcgstatus))
		return IN_KERNEL;

This would never give the "IN_KERNEL_RECOV" context.

And this is the only case affected:
"Action required: data load in error recoverable area of kernel"

But that is for "Data": MCACOD_DATA

And the quirk is for "Instructions": MCACOD_INSTR

So I *think* we're covered.

I got the impression that setting EIPV in the quirk was to fake our way
through getting the CS register. It seemed to me that it wasn't directly
needed for severity grading in the quirky case.

If we unconditionally get the CS register, then we no longer need to
fake EIPV. At least, that is my understanding.

Thanks,
Yazen

Re: [PATCH v2] x86/mce: Do away with unnecessary context quirks

Posted by Luck, Tony 1 month, 2 weeks ago

On Thu, Aug 14, 2025 at 03:30:56PM -0400, Yazen Ghannam wrote:
> Which severity do you mean? EIPV does not need to be set in any of them.
> 
> Unless you mean this check:
> 
> 	if (!mc_recoverable(m->mcgstatus))
> 		return IN_KERNEL;
> 
> This would never give the "IN_KERNEL_RECOV" context.
> 
> And this is the only case affected:
> "Action required: data load in error recoverable area of kernel"
> 
> But that is for "Data": MCACOD_DATA
> 
> And the quirk is for "Instructions": MCACOD_INSTR
> 
> So I *think* we're covered.
> 
> I got the impression that setting EIPV in the quirk was to fake our way
> through getting the CS register. It seemed to me that it wasn't directly
> needed for severity grading in the quirky case.
> 
> If we unconditionally get the CS register, then we no longer need to
> fake EIPV. At least, that is my understanding.
 
Yazen,

It's horribly subtle. On Sandybridge, machine check bank 0 is shared by
the two hyperthreads on a core, and machine checks are always broadcast.

For instruction poison consumption both threads on the core see the
machine check and the same IA32_MC0_STATUS value.

IA32_MCG_STATUS is per-thread.

The thread that tried to consume the poison sees: RIPV=0, EIPV=0, MCIP=1

The innocent bystander thread sees: RIPV=1, EIPV=0, MCIP=1

The innocent bystander matches this entry in the severity table:

        MCESEV(
                KEEP, "Action required but unaffected thread is continuable",
                SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR, MCI_UC_SAR|MCI_ADDR),
                MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, MCG_STATUS_RIPV)
                ),

I need the consuming thread to match this one:

        MCESEV(
                AR, "Action required: instruction fetch error in a user process",
                SER, MASK(MCI_STATUS_OVER|MCI_UC_SAR|MCI_ADDR|MCACOD, MCI_UC_SAR|MCI_ADDR|MCACOD_INSTR),
                USER
                ),


But the first-match nature of the table means that this rule hits
(because neither RIPV nor EIPV is set):

        /* Neither return not error IP -- no chance to recover -> PANIC */
        MCESEV(
                PANIC, "Neither restart nor error IP",
                EXCP, MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0)
                ),

-Tony

Re: [PATCH v2] x86/mce: Do away with unnecessary context quirks

Posted by Yazen Ghannam 1 month, 2 weeks ago

On Thu, Aug 14, 2025 at 12:52:19PM -0700, Luck, Tony wrote:
> But the first-match nature of the table means that this rule hits
> (because neither RIPV nor EIPV is set):
> 
>         /* Neither return not error IP -- no chance to recover -> PANIC */
>         MCESEV(
>                 PANIC, "Neither restart nor error IP",
>                 EXCP, MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0)
>                 ),
> 

Thanks Tony. I see what you mean.

Do we really need this rule? It is essentially the same as the following
rule:

        MCESEV(
                PANIC, "In kernel and no restart IP",
                EXCP, KERNEL, MCGMASK(MCG_STATUS_RIPV, 0)
                ),

...since we assume "KERNEL" context if RIPV|EIPV are clear after
checking the CS register.

The message is not as explicit though. 

I did have an earlier idea that we introduce an "UNKNOWN" context for
the !pt_regs case.

We could add the "UNKNOWN" context to the "Neither restart nor error IP"
rule. That way it'll be skipped if we have a "USER" context and then it
should match the one you want.

Also, I just saw this in the Intel SDM:

"For the P6 family processors, if the EIPV flag in the MCG_STATUS MSR is
set, the saved contents of CS and EIP registers are directly associated
with the error that caused the machine-check exception to be generated;
if the flag is clear, the saved instruction pointer may not be associated
with the error (see Section 17.3.1.2, “IA32_MCG_STATUS MSR”)."

But I can't tell whether this is true only for P6 or for all processors,
because the CS register isn't referenced again in connection with EIPV.

Thanks,
Yazen

Re: [PATCH v2] x86/mce: Do away with unnecessary context quirks

Posted by Luck, Tony 1 month, 2 weeks ago

On Thu, Aug 14, 2025 at 05:07:30PM -0400, Yazen Ghannam wrote:
> On Thu, Aug 14, 2025 at 12:52:19PM -0700, Luck, Tony wrote:
> > But the first-match nature of the table means that this rule hits
> > (because neither RIPV nor EIPV is set):
> > 
> >         /* Neither return not error IP -- no chance to recover -> PANIC */
> >         MCESEV(
> >                 PANIC, "Neither restart nor error IP",
> >                 EXCP, MCGMASK(MCG_STATUS_RIPV|MCG_STATUS_EIPV, 0)
> >                 ),
> > 
> 
> Thanks Tony. I see what you mean.
> 
> Do we really need this rule? It is essentially the same as the following
> rule:
> 
>         MCESEV(
>                 PANIC, "In kernel and no restart IP",
>                 EXCP, KERNEL, MCGMASK(MCG_STATUS_RIPV, 0)
>                 ),
> 
> ...since we assume "KERNEL" context if RIPV|EIPV are clear after
> checking the CS register.

I'm not sure this could ever happen. But if it did, I think I'd like
to see that message.
> 
> The message is not as explicit though. 
> 
> I did have an earlier idea that we introduce an "UNKNOWN" context for
> the !pt_regs case.
> 
> We could add the "UNKNOWN" context to the "Neither restart nor error IP"
> rule. That way it'll be skipped if we have a "USER" context and then it
> should match the one you want.

I don't want to do that anywhere except that Sandybridge instruction
fetch case (which wasn't classified as an erratum, because the h/w
guys chose to set RIPV==0 and EIPV==0 ... but it was a poor choice.)

> Also, I just saw this in the Intel SDM:
> 
> "For the P6 family processors, if the EIPV flag in the MCG_STATUS MSR is
> set, the saved contents of CS and EIP registers are directly associated
> with the error that caused the machine-check exception to be generated;
> if the flag is clear, the saved instruction pointer may not be associated
> with the error (see Section 17.3.1.2, “IA32_MCG_STATUS MSR”)."
> 
> But I can't tell whether this is true only for P6 or for all processors,
> because the CS register isn't referenced again in connection with EIPV.

Should probably have said "P6 and newer". The intent of EIPV is to
indicate that this machine check is because of something that happened
on the current CPU. (Remember, this bit was defined when all #MC on Intel
were broadcast, so knowing which CPU(s) were involved, and which had
just been pulled into the #MC handler by the broadcast, was very
important.)

-Tony

Re: [PATCH v2] x86/mce: Do away with unnecessary context quirks

Posted by Yazen Ghannam 1 month, 2 weeks ago

On Thu, Aug 14, 2025 at 03:17:21PM -0700, Luck, Tony wrote:
> Should probably have said "P6 and newer". The intent of EIPV is to
> indicate that this machine check is because of something that happened
> on the current CPU. (Remember, this bit was defined when all #MC on Intel
> were broadcast, so knowing which CPU(s) were involved, and which had
> just been pulled into the #MC handler by the broadcast, was very
> important.)
> 

Okay, fair enough. It seems like these quirks should stay. Thanks for
the discussion. It really helped me better understand these quirks and
their history.

Thanks,
Yazen

RE: [PATCH v2] x86/mce: Do away with unnecessary context quirks

Posted by Luck, Tony 1 month, 2 weeks ago

> Okay, fair enough. It seems like these quirks should stay. Thanks for
> the discussion. It really helped me better understand these quirks and
> their history.

Yazen,

Maybe someday we should say "enough" and clean out code that supports ancient history.
That Sandybridge CPU was launched in 2012, discontinued in 2015, and is now out of its
service window (meaning no new microcode updates). With only 8 cores and 20MB
L3 cache, I expect many laptops now run rings around it.

Linux has a history of supporting systems long after manufacturers have moved on,
but it does eventually drop old stuff. Maybe today isn't the right time. But perhaps
2033, when those old systems reach USA legal drinking age :-)

-Tony