[PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement

Qiuxu Zhuo posted 10 patches 1 month, 1 week ago
There is a newer version of this series
[PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Qiuxu Zhuo 1 month, 1 week ago
Convert the multiple if() statements used for vendor differentiation
into a switch() statement for better readability.

As a bonus, the size of new generated text is reduced by 16 bytes.

  $ size core.o.*
     text	   data	    bss	    dec	    hex	filename
    21364	   4181	   3776	  29321	   7289	core.o.old
    21348	   4181	   3776	  29305	   7279	core.o.new

No functional changes intended.

Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 725c1d6fb1e5..40672fe0991a 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1892,7 +1892,8 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 	}
 
 	/* This should be disabled by the BIOS, but isn't always */
-	if (c->x86_vendor == X86_VENDOR_AMD) {
+	switch (c->x86_vendor) {
+	case X86_VENDOR_AMD:
 		if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) {
 			/*
 			 * disable GART TBL walk error reporting, which
@@ -1925,9 +1926,9 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 		if (c->x86 >= 0x17 && c->x86 <= 0x1A)
 			mce_flags.zen_ifu_quirk = 1;
 
-	}
+		break;
 
-	if (c->x86_vendor == X86_VENDOR_INTEL) {
+	case X86_VENDOR_INTEL:
 		/*
 		 * SDM documents that on family 6 bank 0 should not be written
 		 * because it aliases to another special BIOS controlled
@@ -1964,9 +1965,10 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 		 */
 		if (c->x86_vfm == INTEL_SKYLAKE_X)
 			mce_flags.skx_repmov_quirk = 1;
-	}
 
-	if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
+		break;
+
+	case X86_VENDOR_ZHAOXIN:
 		/*
 		 * All newer Zhaoxin CPUs support MCE broadcasting. Enable
 		 * synchronization with a one second timeout.
@@ -1975,6 +1977,8 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 			if (cfg->monarch_timeout < 0)
 				cfg->monarch_timeout = USEC_PER_SEC;
 		}
+
+		break;
 	}
 
 	if (cfg->monarch_timeout < 0)
-- 
2.17.1
Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Sohil Mehta 1 month, 1 week ago
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index 725c1d6fb1e5..40672fe0991a 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1892,7 +1892,8 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
>  	}
>  
>  	/* This should be disabled by the BIOS, but isn't always */

This comment is specific to the AMD and placing it before the switch
makes it seem generic to the entire switch statement. It should probably
be moved inside the AMD case just above the disable GART TLB check.

> -	if (c->x86_vendor == X86_VENDOR_AMD) {
> +	switch (c->x86_vendor) {
> +	case X86_VENDOR_AMD:
>  		if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) {
>  			/*
>  			 * disable GART TBL walk error reporting, which
> @@ -1925,9 +1926,9 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
>  		if (c->x86 >= 0x17 && c->x86 <= 0x1A)
>  			mce_flags.zen_ifu_quirk = 1;
>  
> -	}
> +		break;
>  


Also, why not include the unknown vendor check (right above) inside the
switch case as well?

if (c->x86_vendor == X86_VENDOR_UNKNOWN) {
	pr_info("unknown CPU type - not enabling MCE support\n");
	return -EOPNOTSUPP;
}

This seems to follow the same pattern as others and can be the first
case inside the switch.
Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Tony Luck 1 month, 1 week ago
On Fri, Oct 18, 2024 at 12:44:00PM -0700, Sohil Mehta wrote:
> > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> > index 725c1d6fb1e5..40672fe0991a 100644
> > --- a/arch/x86/kernel/cpu/mce/core.c
> > +++ b/arch/x86/kernel/cpu/mce/core.c
> > @@ -1892,7 +1892,8 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
> >  	}
> >  
> >  	/* This should be disabled by the BIOS, but isn't always */
> 
> This comment is specific to the AMD and placing it before the switch
> makes it seem generic to the entire switch statement. It should probably
> be moved inside the AMD case just above the disable GART TLB check.
> 
> > -	if (c->x86_vendor == X86_VENDOR_AMD) {
> > +	switch (c->x86_vendor) {
> > +	case X86_VENDOR_AMD:
> >  		if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) {
> >  			/*
> >  			 * disable GART TBL walk error reporting, which
> > @@ -1925,9 +1926,9 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
> >  		if (c->x86 >= 0x17 && c->x86 <= 0x1A)
> >  			mce_flags.zen_ifu_quirk = 1;
> >  
> > -	}
> > +		break;
> >  
> 
> 
> Also, why not include the unknown vendor check (right above) inside the
> switch case as well?
> 
> if (c->x86_vendor == X86_VENDOR_UNKNOWN) {
> 	pr_info("unknown CPU type - not enabling MCE support\n");
> 	return -EOPNOTSUPP;
> }
> 
> This seems to follow the same pattern as others and can be the first
> case inside the switch.

The vendor specific bits are large enough to warrant their own
static functions (as we do elsewhere in this file).

How about this (only compile-tested) patch?

-Tony


From 967d8637ac90823f28f4612cbbac305c421b4853 Mon Sep 17 00:00:00 2001
From: Tony Luck <tony.luck@intel.com>
Date: Fri, 18 Oct 2024 13:01:02 -0700
Subject: [PATCH] x86/mce: Break up __mcheck_cpu_apply_quirks()

Split each vendor specific part into its own helper function.

Signed-off-by: Tony Luck <tony.luck@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 172 ++++++++++++++++++---------------
 1 file changed, 96 insertions(+), 76 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2a938f429c4d..f51fb393d369 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1880,101 +1880,121 @@ static void __mcheck_cpu_check_banks(void)
 	}
 }
 
-/* Add per CPU specific workarounds here */
-static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
+static void apply_quirks_amd(struct cpuinfo_x86 *c)
 {
 	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
 	struct mca_config *cfg = &mca_cfg;
 
-	if (c->x86_vendor == X86_VENDOR_UNKNOWN) {
-		pr_info("unknown CPU type - not enabling MCE support\n");
-		return -EOPNOTSUPP;
-	}
-
 	/* This should be disabled by the BIOS, but isn't always */
-	if (c->x86_vendor == X86_VENDOR_AMD) {
-		if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) {
-			/*
-			 * disable GART TBL walk error reporting, which
-			 * trips off incorrectly with the IOMMU & 3ware
-			 * & Cerberus:
-			 */
-			clear_bit(10, (unsigned long *)&mce_banks[4].ctl);
-		}
-		if (c->x86 < 0x11 && cfg->bootlog < 0) {
-			/*
-			 * Lots of broken BIOS around that don't clear them
-			 * by default and leave crap in there. Don't log:
-			 */
-			cfg->bootlog = 0;
-		}
+	if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) {
 		/*
-		 * Various K7s with broken bank 0 around. Always disable
-		 * by default.
+		 * disable GART TBL walk error reporting, which
+		 * trips off incorrectly with the IOMMU & 3ware
+		 * & Cerberus:
 		 */
-		if (c->x86 == 6 && this_cpu_read(mce_num_banks) > 0)
-			mce_banks[0].ctl = 0;
-
+		clear_bit(10, (unsigned long *)&mce_banks[4].ctl);
+	}
+	if (c->x86 < 0x11 && cfg->bootlog < 0) {
 		/*
-		 * overflow_recov is supported for F15h Models 00h-0fh
-		 * even though we don't have a CPUID bit for it.
+		 * Lots of broken BIOS around that don't clear them
+		 * by default and leave crap in there. Don't log:
 		 */
-		if (c->x86 == 0x15 && c->x86_model <= 0xf)
-			mce_flags.overflow_recov = 1;
+		cfg->bootlog = 0;
+	}
+	/*
+	 * Various K7s with broken bank 0 around. Always disable
+	 * by default.
+	 */
+	if (c->x86 == 6 && this_cpu_read(mce_num_banks) > 0)
+		mce_banks[0].ctl = 0;
 
-		if (c->x86 >= 0x17 && c->x86 <= 0x1A)
-			mce_flags.zen_ifu_quirk = 1;
+	/*
+	 * overflow_recov is supported for F15h Models 00h-0fh
+	 * even though we don't have a CPUID bit for it.
+	 */
+	if (c->x86 == 0x15 && c->x86_model <= 0xf)
+		mce_flags.overflow_recov = 1;
 
-	}
+	if (c->x86 >= 0x17 && c->x86 <= 0x1A)
+		mce_flags.zen_ifu_quirk = 1;
+}
 
-	if (c->x86_vendor == X86_VENDOR_INTEL) {
-		/*
-		 * SDM documents that on family 6 bank 0 should not be written
-		 * because it aliases to another special BIOS controlled
-		 * register.
-		 * But it's not aliased anymore on model 0x1a+
-		 * Don't ignore bank 0 completely because there could be a
-		 * valid event later, merely don't write CTL0.
-		 */
+static void apply_quirks_intel(struct cpuinfo_x86 *c)
+{
+	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
+	struct mca_config *cfg = &mca_cfg;
 
-		if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0)
-			mce_banks[0].init = false;
+	/*
+	 * SDM documents that on family 6 bank 0 should not be written
+	 * because it aliases to another special BIOS controlled
+	 * register.
+	 * But it's not aliased anymore on model 0x1a+
+	 * Don't ignore bank 0 completely because there could be a
+	 * valid event later, merely don't write CTL0.
+	 */
 
-		/*
-		 * All newer Intel systems support MCE broadcasting. Enable
-		 * synchronization with a one second timeout.
-		 */
-		if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
-			cfg->monarch_timeout < 0)
-			cfg->monarch_timeout = USEC_PER_SEC;
+	if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0)
+		mce_banks[0].init = false;
 
-		/*
-		 * There are also broken BIOSes on some Pentium M and
-		 * earlier systems:
-		 */
-		if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0)
-			cfg->bootlog = 0;
+	/*
+	 * All newer Intel systems support MCE broadcasting. Enable
+	 * synchronization with a one second timeout.
+	 */
+	if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
+	    cfg->monarch_timeout < 0)
+		cfg->monarch_timeout = USEC_PER_SEC;
 
-		if (c->x86_vfm == INTEL_SANDYBRIDGE_X)
-			mce_flags.snb_ifu_quirk = 1;
+	/*
+	 * There are also broken BIOSes on some Pentium M and
+	 * earlier systems:
+	 */
+	if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0)
+		cfg->bootlog = 0;
 
-		/*
-		 * Skylake, Cascacde Lake and Cooper Lake require a quirk on
-		 * rep movs.
-		 */
-		if (c->x86_vfm == INTEL_SKYLAKE_X)
-			mce_flags.skx_repmov_quirk = 1;
+	if (c->x86_vfm == INTEL_SANDYBRIDGE_X)
+		mce_flags.snb_ifu_quirk = 1;
+
+	/*
+	 * Skylake, Cascacde Lake and Cooper Lake require a quirk on
+	 * rep movs.
+	 */
+	if (c->x86_vfm == INTEL_SKYLAKE_X)
+		mce_flags.skx_repmov_quirk = 1;
+}
+
+static void apply_quirks_zhaoxin(struct cpuinfo_x86 *c)
+{
+	struct mca_config *cfg = &mca_cfg;
+
+	/*
+	 * All newer Zhaoxin CPUs support MCE broadcasting. Enable
+	 * synchronization with a one second timeout.
+	 */
+	if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
+		if (cfg->monarch_timeout < 0)
+			cfg->monarch_timeout = USEC_PER_SEC;
 	}
+}
 
-	if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
-		/*
-		 * All newer Zhaoxin CPUs support MCE broadcasting. Enable
-		 * synchronization with a one second timeout.
-		 */
-		if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
-			if (cfg->monarch_timeout < 0)
-				cfg->monarch_timeout = USEC_PER_SEC;
-		}
+/* Add per CPU specific workarounds here */
+static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
+{
+	struct mca_config *cfg = &mca_cfg;
+
+	switch (c->x86_vendor) {
+	case X86_VENDOR_UNKNOWN:
+		pr_info("unknown CPU type - not enabling MCE support\n");
+		return -EOPNOTSUPP;
+
+	case X86_VENDOR_AMD:
+		apply_quirks_amd(c);
+		break;
+	case X86_VENDOR_INTEL:
+		apply_quirks_intel(c);
+		break;
+	case X86_VENDOR_ZHAOXIN:
+		apply_quirks_zhaoxin(c);
+		break;
 	}
 
 	if (cfg->monarch_timeout < 0)
-- 
2.47.0

>
RE: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Zhuo, Qiuxu 1 month, 1 week ago
> From: Luck, Tony <tony.luck@intel.com>
> [...]
> On Fri, Oct 18, 2024 at 12:44:00PM -0700, Sohil Mehta wrote:
> > > diff --git a/arch/x86/kernel/cpu/mce/core.c
> > > b/arch/x86/kernel/cpu/mce/core.c index 725c1d6fb1e5..40672fe0991a
> > > 100644
> > > --- a/arch/x86/kernel/cpu/mce/core.c
> > > +++ b/arch/x86/kernel/cpu/mce/core.c
> > > @@ -1892,7 +1892,8 @@ static int __mcheck_cpu_apply_quirks(struct
> cpuinfo_x86 *c)
> > >  	}
> > >
> > >  	/* This should be disabled by the BIOS, but isn't always */
> >
> > This comment is specific to the AMD and placing it before the switch
> > makes it seem generic to the entire switch statement. It should
> > probably be moved inside the AMD case just above the disable GART TLB
> check.
> >
> > > -	if (c->x86_vendor == X86_VENDOR_AMD) {
> > > +	switch (c->x86_vendor) {
> > > +	case X86_VENDOR_AMD:
> > >  		if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) {
> > >  			/*
> > >  			 * disable GART TBL walk error reporting, which @@ -
> 1925,9
> > > +1926,9 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
> > >  		if (c->x86 >= 0x17 && c->x86 <= 0x1A)
> > >  			mce_flags.zen_ifu_quirk = 1;
> > >
> > > -	}
> > > +		break;
> > >
> >
> >
> > Also, why not include the unknown vendor check (right above) inside
> > the switch case as well?
> >
> > if (c->x86_vendor == X86_VENDOR_UNKNOWN) {
> > 	pr_info("unknown CPU type - not enabling MCE support\n");
> > 	return -EOPNOTSUPP;
> > }
> >
> > This seems to follow the same pattern as others and can be the first
> > case inside the switch.
> 
> The vendor specific bits are large enough to warrant their own static functions
> (as we do elsewhere in this file).
> 
> How about this (only compile-tested) patch?
> 

LGTM.
This makes the code more structured and readable. 
I'll replace mine with this new patch in my next version after basic testing.
Thanks, Tony & Sohil.

-Qiuxu
Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Sohil Mehta 1 month, 1 week ago
On 10/18/2024 1:14 PM, Tony Luck wrote:

> The vendor specific bits are large enough to warrant their own
> static functions (as we do elsewhere in this file).
> 
> How about this (only compile-tested) patch?
> 

Yeah, it does make it easier to read. Can some of the hardcoded numbers
be changed to vfm macros as well?
RE: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Zhuo, Qiuxu 1 month, 1 week ago
> From: Mehta, Sohil <sohil.mehta@intel.com>
> [...]
> Subject: Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into
> a switch() statement
> 
> On 10/18/2024 1:14 PM, Tony Luck wrote:
> 
> > The vendor specific bits are large enough to warrant their own static
> > functions (as we do elsewhere in this file).
> >
> > How about this (only compile-tested) patch?
> >
> 
> Yeah, it does make it easier to read. Can some of the hardcoded numbers be
> changed to vfm macros as well?

Convert some hard-coded numbers to VFM macros on top of Tony's patch as shown below. 
Please review it for any errors or omissions.

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index f51fb393d369..3b83efa1ca65 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1924,6 +1924,10 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c)
        struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
        struct mca_config *cfg = &mca_cfg;

+       /* Older CPUs don't need quirks. */
+       if (c->x86 < 6)
+               return;
+
        /*
         * SDM documents that on family 6 bank 0 should not be written
         * because it aliases to another special BIOS controlled
@@ -1932,23 +1936,21 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c)
         * Don't ignore bank 0 completely because there could be a
         * valid event later, merely don't write CTL0.
         */
-
-       if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0)
+       if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0)
                mce_banks[0].init = false;

        /*
         * All newer Intel systems support MCE broadcasting. Enable
         * synchronization with a one second timeout.
         */
-       if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
-           cfg->monarch_timeout < 0)
+       if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout < 0)
                cfg->monarch_timeout = USEC_PER_SEC;

        /*
         * There are also broken BIOSes on some Pentium M and
         * earlier systems:
         */
-       if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0)
+       if (c->x86_vfm < INTEL_CORE_YONAH && cfg->bootlog < 0)
                cfg->bootlog = 0;

        if (c->x86_vfm == INTEL_SANDYBRIDGE_X)

Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Sohil Mehta 1 month ago
On 10/18/2024 10:46 PM, Zhuo, Qiuxu wrote:
> Convert some hard-coded numbers to VFM macros on top of Tony's patch as shown below. 
> Please review it for any errors or omissions.
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index f51fb393d369..3b83efa1ca65 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -1924,6 +1924,10 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c)
>         struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
>         struct mca_config *cfg = &mca_cfg;
> 
> +       /* Older CPUs don't need quirks. */
> +       if (c->x86 < 6)
> +               return;
> +

As Dave mentioned, change this to make the use of vfm consistent in the
entire function and probably update the comment as well to make it explicit:

	/* Older CPUs (prior to family 6) don't need quirks */
	if (c->x86_vfm < INTEL_PENTIUM_PRO)
		return;


>         /*
>          * SDM documents that on family 6 bank 0 should not be written
>          * because it aliases to another special BIOS controlled
> @@ -1932,23 +1936,21 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c)
>          * Don't ignore bank 0 completely because there could be a
>          * valid event later, merely don't write CTL0.
>          */
> -
> -       if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0)
> +       if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0)
>                 mce_banks[0].init = false;
> 
>         /*
>          * All newer Intel systems support MCE broadcasting. Enable
>          * synchronization with a one second timeout.
>          */
> -       if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
> -           cfg->monarch_timeout < 0)
> +       if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout < 0)
>                 cfg->monarch_timeout = USEC_PER_SEC;
> 

Instead of keeping this open-ended we could tweak this a bit as follows:

if (!(c->x86_vfm < INTEL_CORE_YONAH)) && cfg->monarch_timeout < 0)
	cfg->monarch_timeout = USEC_PER_SEC;

Essentially the same: if (new_cpu) vs if (!old_cpu)
Don't have a strong preference. Will leave it to you and Tony.

>         /*
>          * There are also broken BIOSes on some Pentium M and
>          * earlier systems:
>          */
> -       if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0)
> +       if (c->x86_vfm < INTEL_CORE_YONAH && cfg->bootlog < 0)
>                 cfg->bootlog = 0;
> 
>         if (c->x86_vfm == INTEL_SANDYBRIDGE_X)
> 

All the other checks look fine to me.
RE: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Zhuo, Qiuxu 1 month ago
> From: Mehta, Sohil <sohil.mehta@intel.com>
> [...]
> As Dave mentioned, change this to make the use of vfm consistent in the
> entire function and probably update the comment as well to make it explicit:
> 
> 	/* Older CPUs (prior to family 6) don't need quirks */

Yes, the improved comment is better.

> 	if (c->x86_vfm < INTEL_PENTIUM_PRO)
> 		return;
> 
> [...]
> > -       if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
> > -           cfg->monarch_timeout < 0)
> > +       if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout <
> > + 0)
> >                 cfg->monarch_timeout = USEC_PER_SEC;
> >
> 
> Instead of keeping this open-ended we could tweak this a bit as follows:
> 
> if (!(c->x86_vfm < INTEL_CORE_YONAH)) && cfg->monarch_timeout < 0)
> 	cfg->monarch_timeout = USEC_PER_SEC;
> 
> Essentially the same: if (new_cpu) vs if (!old_cpu) Don't have a strong
> preference. Will leave it to you and Tony.
>

I prefer the single, straightforward '>=' operation over the '<' and then '!' two operations.
 
- Qiuxu
RE: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Luck, Tony 1 month ago
>         /*
>          * All newer Intel systems support MCE broadcasting. Enable
>          * synchronization with a one second timeout.
>          */
> -       if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
> -           cfg->monarch_timeout < 0)
> +       if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout < 0)
>                 cfg->monarch_timeout = USEC_PER_SEC;

This change is correct. But the old code makes it more explicit that
CPUs in families > 6 take this action. As the author of the VFM changes
it's clear to me, maybe less so to others?

But maybe its OK.  The comment does help a lot. Anyone else have thoughts on this?

-Tony
Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Sohil Mehta 1 month ago
On 10/21/2024 9:06 AM, Luck, Tony wrote:
>>         /*
>>          * All newer Intel systems support MCE broadcasting. Enable
>>          * synchronization with a one second timeout.
>>          */
>> -       if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
>> -           cfg->monarch_timeout < 0)
>> +       if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout < 0)
>>                 cfg->monarch_timeout = USEC_PER_SEC;
> 
> This change is correct. But the old code makes it more explicit that
> CPUs in families > 6 take this action. As the author of the VFM changes
> it's clear to me, maybe less so to others?
> 
> But maybe its OK.  The comment does help a lot. Anyone else have thoughts on this?
> 

I am not very familiar with the intricacies of the VFM checks. I did
take me a few minutes to figure out why the modified code is correct.

But looking at the prior or the later checks, I see the '<' operator
used directly on platform names. So, the new check seems inline with
that i.e. in this case, any model or family after the said platform
supports MCE broadcasting.

>         /*
>          * There are also broken BIOSes on some Pentium M and
>          * earlier systems:
>          */
> -       if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0)
> +       if (c->x86_vfm < INTEL_CORE_YONAH && cfg->bootlog < 0)
>                 cfg->bootlog = 0;
RE: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Zhuo, Qiuxu 1 month ago
> From: Mehta, Sohil <sohil.mehta@intel.com>
> [...]
> On 10/21/2024 9:06 AM, Luck, Tony wrote:
> [...]
> > This change is correct. But the old code makes it more explicit that
> > CPUs in families > 6 take this action. As the author of the VFM
> > changes it's clear to me, maybe less so to others?
> >
> > But maybe its OK.  The comment does help a lot. Anyone else have thoughts
> on this?
> >
> 
> I am not very familiar with the intricacies of the VFM checks. I did take me a
> few minutes to figure out why the modified code is correct.

OK. So, back to your original question below, what is your answer to it now? :-)

    "Can some of the hardcoded numbers be changed to vfm macros as well?"


Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Sohil Mehta 1 month ago
On 10/22/2024 7:08 PM, Zhuo, Qiuxu wrote:
> OK. So, back to your original question below, what is your answer to it now? :-)
> 
>     "Can some of the hardcoded numbers be changed to vfm macros as well?"
> 

Even though it takes a tiny bit of reading to understand the VFM macros,
the pros significantly outweigh the cons. I still feel we should go
ahead and make the changes.

I have responded with minor comments to your patch.

Sohil
RE: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Zhuo, Qiuxu 1 month ago
> From: Mehta, Sohil <sohil.mehta@intel.com>
> [...]
> On 10/22/2024 7:08 PM, Zhuo, Qiuxu wrote:
> > OK. So, back to your original question below, what is your answer to
> > it now? :-)
> >
> >     "Can some of the hardcoded numbers be changed to vfm macros as
> well?"
> >
> 
> Even though it takes a tiny bit of reading to understand the VFM macros, the
> pros significantly outweigh the cons. I still feel we should go ahead and make
> the changes.

Thanks for letting me know your thoughts.

To me, the VFM-based checks can make the code more compact.
So, if nobody else objects, I'll include this VFM-based check version in the next version.

- Qiuxu
RE: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Luck, Tony 1 month ago
> I am not very familiar with the intricacies of the VFM checks. I did
> take me a few minutes to figure out why the modified code is correct.

Hence my concern :-)

> But looking at the prior or the later checks, I see the '<' operator
> used directly on platform names. So, the new check seems inline with
> that i.e. in this case, any model or family after the said platform
> supports MCE broadcasting.

Intel model number allocation policies aren't necessarily sequential.
So range checks need to be used with caution. They should be safe
enough when done to simplify code that checks very old models.

Range checks across families may be even more problematic. Again
these old checks that assume all future families will not reintroduce
quirks from 20-year-old CPUs should be safe (I hope!!).


But, spoiler alert, Intel is planning to begin use of two families in
parallel. Family 19 (first model Diamond Rapids) is already in
<asm/intel-family.h>). But there are going to be some CPUs
in family 18 too. I'll be surprised if there are any use cases for
ranges that span between families 18 and 19.

-Tony
Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Sohil Mehta 1 month ago
On 10/21/2024 10:51 AM, Luck, Tony wrote:
>> But looking at the prior or the later checks, I see the '<' operator
>> used directly on platform names. So, the new check seems inline with
>> that i.e. in this case, any model or family after the said platform
>> supports MCE broadcasting.
> 
> Intel model number allocation policies aren't necessarily sequential.

Model numbers are assumed to be sequential at least within family 6.

if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
	cfg->monarch_timeout < 0)

There is another check in early_init_intel() which does this:

	if (c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xd)) {
		rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);


> So range checks need to be used with caution. They should be safe
> enough when done to simplify code that checks very old models.
> 

...

> Range checks across families may be even more problematic. 

I agree. Maybe we should avoid range checks across families altogether
forward (>) or backward (<).

For example, does the following change from Qiuxu, unintentionally
become applicable to Quark CPUs with family -> 5?


>         /*
>          * There are also broken BIOSes on some Pentium M and
>          * earlier systems:
>          */
> -       if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0)
> +       if (c->x86_vfm < INTEL_CORE_YONAH && cfg->bootlog < 0)
>                 cfg->bootlog = 0;
RE: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Luck, Tony 1 month ago
>> Intel model number allocation policies aren't necessarily sequential.
>
> Model numbers are assumed to be sequential at least within family 6.

Assumption can only be applied retroactively to simpler times.  Looking
at the timelines and model numbers for pure-Atom, pure-Core, Hybrid,
and Xeon, they are somewhat jumbled.

> For example, does the following change from Qiuxu, unintentionally
> become applicable to Quark CPUs with family -> 5?

Qiuxu starts the function with:

+       /* Older CPUs don't need quirks. */
+       if (c->x86 < 6)
+               return;

So Quark leaves the function early.

-Tony

	
Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Sohil Mehta 1 month ago
On 10/21/2024 11:40 AM, Luck, Tony wrote:
>>> Intel model number allocation policies aren't necessarily sequential.
>>
>> Model numbers are assumed to be sequential at least within family 6.
> 
> Assumption can only be applied retroactively to simpler times.  Looking
> at the timelines and model numbers for pure-Atom, pure-Core, Hybrid,
> and Xeon, they are somewhat jumbled.
> 

Agreed. Using range checks within a family with extreme care and
avoiding cross-family ones seems like the saner thing to do.

Maybe everything in the future is enumerated and VFM checks would not be
needed :)

Trying to understand more, I have more questions than answers. With the
introduction of Family 0x19, do we need to reevaluate some of the
existing model checks?

early_init_intel():
if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
    (c->x86 == 0x6 && c->x86_model >= 0x0e))
	set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);

It seems "constant_tsc" wouldn't show on Diamond rapids. Do we need it to?

> Qiuxu starts the function with:
> 
> +       /* Older CPUs don't need quirks. */
> +       if (c->x86 < 6)
> +               return;
> 
> So Quark leaves the function early.
> 

Ah! My bad, I missed that.

> -Tony
> 
>
Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Dave Hansen 1 month ago
On 10/21/24 15:57, Sohil Mehta wrote:
> On 10/21/2024 11:40 AM, Luck, Tony wrote:
>>>> Intel model number allocation policies aren't necessarily sequential.
>>> Model numbers are assumed to be sequential at least within family 6.
>> Assumption can only be applied retroactively to simpler times.  Looking
>> at the timelines and model numbers for pure-Atom, pure-Core, Hybrid,
>> and Xeon, they are somewhat jumbled.
>>
> Agreed. Using range checks within a family with extreme care and
> avoiding cross-family ones seems like the saner thing to do.
> 
> Maybe everything in the future is enumerated and VFM checks would not be
> needed 🙂
> 
> Trying to understand more, I have more questions than answers. With the
> introduction of Family 0x19, do we need to reevaluate some of the
> existing model checks?
> 
> early_init_intel():
> if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
>     (c->x86 == 0x6 && c->x86_model >= 0x0e))
> 	set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
> 
> It seems "constant_tsc" wouldn't show on Diamond rapids. Do we need it to?

We only have a handful of these and they're mostly for early family 6
things.  I bet there's less than half a dozen.

Let's just list them in one of our match structures:

const u32 NOT_SUPPORTED = UINT_MAX; // or another special, invalid VFM
const u32 ALL_SUPPORTED = 0;

static const struct x86_cpu_id table[] __initconst = {
	X86_MATCH_FAM(INTEL,   3, NOT_SUPPORTED),
	X86_MATCH_FAM(INTEL,   4, NOT_SUPPORTED),
	X86_MATCH_FAM(INTEL,   5, NOT_SUPPORTED),
	X86_MATCH_FAM(INTEL,   6, INTEL_CORE_YONAH),
	X86_MATCH_FAM(INTEL, 0xf, INTEL_P4_WHATEVER),
	X86_MATCH_VEN(INTEL,      ALL_SUPPORTED),
};

... and then use it like this:

	m = x86_match_cpu(table);
	// Non-Intel lands here:
	if (!m)
		return false;

	if (VFM_MODEL(c->x86_vfm) >= m->driver_data)
		return true;

	return false;
Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Sohil Mehta 1 month ago
On 10/21/2024 5:17 PM, Dave Hansen wrote:

> We only have a handful of these and they're mostly for early family 6
> things.  I bet there's less than half a dozen.
> 

You are right. There don't seem to be many unbounded model checks for
Intel family 6. I could only find 3.

early_init_intel() -> constant_tsc - Tony found out that it is harmless
since it got it's own enumeration later on.

should_io_be_busy() and acpi_processor_power_init_bm_check() also seem
to be for older platforms and probably no longer applicable. I'll reach
out to the power folks to confirm.

Maybe if we just add an upper bound to these checks then we don't to
worry about carrying them forward with the newer family 6 models and
upcoming family 19 models.
RE: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Luck, Tony 1 month ago
> Trying to understand more, I have more questions than answers. With the
> introduction of Family 0x19, do we need to reevaluate some of the
> existing model checks?

Diamond Rapids is in Family 19, not 0x19.  I was unsure in <asm/intel-family.h>
to use decimal or hex for family (since only 5 & 6 are used there, and they are
same in both bases). I picked decimal to avoid 0x prefixes everywhere.

> early_init_intel():
> if ((c->x86 == 0xf && c->x86_model >= 0x03) ||
>     (c->x86 == 0x6 && c->x86_model >= 0x0e))
>       set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
>
> It seems "constant_tsc" wouldn't show on Diamond rapids. Do we need it to?

This looks to be checking for Pentium IV Prescott or newer in
family 0xf, or Yonah or newer in family 6.

You are right that it won't catch the new families. But it might
not matter if this later block sets the feature bit.

        /*
         * c->x86_power is 8000_0007 edx. Bit 8 is TSC runs at constant rate
         * with P/T states and does not stop in deep C-states.
         *
         * It is also reliable across cores and sockets. (but not across
         * cabinets - we turn it off in that case explicitly.)
         */
        if (c->x86_power & (1 << 8)) {
                set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
                set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
        }

It appears that constant TSC started out model specific, but later
got a proper enumeration bit in CPUID.

-Tony


Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Sohil Mehta 1 month ago
On 10/21/2024 4:31 PM, Luck, Tony wrote:

> Diamond Rapids is in Family 19, not 0x19.  I was unsure in <asm/intel-family.h>
> to use decimal or hex for family (since only 5 & 6 are used there, and they are
> same in both bases). I picked decimal to avoid 0x prefixes everywhere.
> 

Got it. In intel-family.h it probably doesn't matter. But across x86 the
family checks seem to be a mix of decimal and hexadecimal  with maybe
more inclination towards hex.


> It appears that constant TSC started out model specific, but later
> got a proper enumeration bit in CPUID.
> 

Ah! Makes sense.

> -Tony
> 
>
Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Dave Hansen 1 month ago
On 10/21/24 09:06, Luck, Tony wrote:
>>         /*
>>          * All newer Intel systems support MCE broadcasting. Enable
>>          * synchronization with a one second timeout.
>>          */
>> -       if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
>> -           cfg->monarch_timeout < 0)
>> +       if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout < 0)
>>                 cfg->monarch_timeout = USEC_PER_SEC;
> This change is correct. But the old code makes it more explicit that
> CPUs in families > 6 take this action. As the author of the VFM changes
> it's clear to me, maybe less so to others?
> 
> But maybe its OK.  The comment does help a lot. Anyone else have thoughts on this?

It certainly is a bit subtle.

To me, the earlier check would be even better if it were:

-	if (c->x86 < 6)
+	if (c->x86_vfm < INTEL_PENTIUM_PRO)
		return;

That at least makes it more clear that it's a range of models and avoids
having a ->x86 check mixed with a ->x86_vfm one.
RE: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Zhuo, Qiuxu 1 month ago
> From: Hansen, Dave <dave.hansen@intel.com>
> [...]
> On 10/21/24 09:06, Luck, Tony wrote:
> >>         /*
> >>          * All newer Intel systems support MCE broadcasting. Enable
> >>          * synchronization with a one second timeout.
> >>          */
> >> -       if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
> >> -           cfg->monarch_timeout < 0)
> >> +       if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout <
> >> + 0)
> >>                 cfg->monarch_timeout = USEC_PER_SEC;
> > This change is correct. But the old code makes it more explicit that
> > CPUs in families > 6 take this action. As the author of the VFM
> > changes it's clear to me, maybe less so to others?
> >
> > But maybe its OK.  The comment does help a lot. Anyone else have thoughts
> on this?
> 
> It certainly is a bit subtle.
> 
> To me, the earlier check would be even better if it were:
> 
> -	if (c->x86 < 6)

Thanks, Dave. 
OK, I'll update it in the next version. 
Apart from this, I'll also add a comment below, as suggested by Sohil, to make it explicit that it's for prior to family 6.

   /* Older CPUs (prior to family 6) don't need quirks */
> +	if (c->x86_vfm < INTEL_PENTIUM_PRO)
> 		return;
> 
> That at least makes it more clear that it's a range of models and avoids having
> a ->x86 check mixed with a ->x86_vfm one.
RE: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Zhuo, Qiuxu 1 month ago
> From: Zhuo, Qiuxu <qiuxu.zhuo@intel.com>
> [...]
> Subject: RE: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into
> a switch() statement
> 
> > From: Hansen, Dave <dave.hansen@intel.com> [...] On 10/21/24 09:06,
> [...]
> >
> > It certainly is a bit subtle.
> >
> > To me, the earlier check would be even better if it were:
> >
> > -	if (c->x86 < 6)
> 
> Thanks, Dave.
> OK, I'll update it in the next version.
> Apart from this, I'll also add a comment below, as suggested by Sohil, to make
> it explicit that it's for prior to family 6.
> 
>    /* Older CPUs (prior to family 6) don't need quirks */
> > +	if (c->x86_vfm < INTEL_PENTIUM_PRO)
> > 		return;
> >
> > That at least makes it more clear that it's a range of models and
> > avoids having a ->x86 check mixed with a ->x86_vfm one.

Hi @Luck, Tony, @Hansen, Dave, @Mehta, Sohil,

Thanks for your time discussing the VFM-based checks. 

I made the patch on top of Tony's [1] based on what we've discussed.
I'd like to add the tags from you to the following patch.
Please let me know if these tags are OK with you. Thanks!

[1] https://lore.kernel.org/all/ZxLBwO4HkkJG4WYn@agluck-desk3.sc.intel.com/

From 6e88743f0619a902c6e6f985b9fc93669098b4af Mon Sep 17 00:00:00 2001
From: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Date: Mon, 21 Oct 2024 10:42:23 +0800
Subject: [PATCH v3 07/10] x86/mce: Convert family/model mixed checks to
 VFM-based checks

Convert family/model mixed checks to VFM-based checks to make
the code more compact.

Suggested-by: Sohil Mehta <sohil.mehta@intel.com>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index bb8b1000fa0a..936804a5a0b9 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1924,6 +1924,10 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c)
        struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
        struct mca_config *cfg = &mca_cfg;

+       /* Older CPUs (prior to family 6) don't need quirks. */
+       if (c->x86_vfm < INTEL_PENTIUM_PRO)
+               return;
+
        /*
         * SDM documents that on family 6 bank 0 should not be written
         * because it aliases to another special BIOS controlled
@@ -1932,22 +1936,21 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c)
         * Don't ignore bank 0 completely because there could be a
         * valid event later, merely don't write CTL0.
         */
-       if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0)
+       if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0)
                mce_banks[0].init = false;

        /*
         * All newer Intel systems support MCE broadcasting. Enable
         * synchronization with a one second timeout.
         */
-       if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
-           cfg->monarch_timeout < 0)
+       if (c->x86_vfm >= INTEL_CORE_YONAH && cfg->monarch_timeout < 0)
                cfg->monarch_timeout = USEC_PER_SEC;

        /*
         * There are also broken BIOSes on some Pentium M and
         * earlier systems:
         */
-       if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0)
+       if (c->x86_vfm < INTEL_CORE_YONAH && cfg->bootlog < 0)
                cfg->bootlog = 0;

        if (c->x86_vfm == INTEL_SANDYBRIDGE_X)
--
2.17.1
RE: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Luck, Tony 1 month ago
-       if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0)
+       if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0)
                mce_banks[0].init = false;

Updated code now matches for families before 6 (486, Pentium). 486 would never get
to this code. But I think from the comments about machine check bank 0 being magic
that Pentium had some rudimentary support.

Should this be:
	if (c->x86_vfm >= INTEL_PENTIUM_PRO && c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0)
to avoid this semantic change?

-       if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0)
+       if (c->x86_vfm < INTEL_CORE_YONAH && cfg->bootlog < 0)
                cfg->bootlog = 0;

Same as above. New code matches for families 4 and 5.

-Tony
Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Sohil Mehta 1 month ago
On 10/24/2024 9:42 AM, Luck, Tony wrote:
> -       if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0)
> +       if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0)
>                 mce_banks[0].init = false;
> 
> Updated code now matches for families before 6 (486, Pentium). 486 would never get
> to this code. But I think from the comments about machine check bank 0 being magic
> that Pentium had some rudimentary support.
> 

As you mentioned it yourself (the last time I was concerned about family
5), the following check should cover this scenario?

> @@ -1924,6 +1924,10 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c)
>         struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
>         struct mca_config *cfg = &mca_cfg;
> 
> +       /* Older CPUs (prior to family 6) don't need quirks. */
> +       if (c->x86_vfm < INTEL_PENTIUM_PRO)
> +               return;
> +


Sohil
Re: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Dave Hansen 1 month ago
On 10/24/24 14:31, Sohil Mehta wrote:
> @@ -1924,6 +1924,10 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c)
>         struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
>         struct mca_config *cfg = &mca_cfg;
> 
> +       /* Older CPUs (prior to family 6) don't need quirks. */
> +       if (c->x86_vfm < INTEL_PENTIUM_PRO)
> +               return;

In case anyone grumbles, this is one of those expressions that's not
perfect, but I think it's quite good enough in practice.

It wouldn't work if we ever (for instance) moved 'vendor' to be less
significant bits than 'model'.  Or on a CPU that claimed to be family=6
but model=0.  But Intel never (as far as I know) sold a CPU like that,
so it's probably only possible in a VM where these checks are rather
worthless anyway.

In short, I think >=INTEL_PENTIUM_PRO is a great check that can mean
"family 6 or later" almost anywhere.
RE: [PATCH v2 06/10] x86/mce: Convert multiple if () statements into a switch() statement
Posted by Luck, Tony 1 month ago
> > -       if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0)
> > +       if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0)
> >                 mce_banks[0].init = false;
> >
> > Updated code now matches for families before 6 (486, Pentium). 486 would never get
> > to this code. But I think from the comments about machine check bank 0 being magic
> > that Pentium had some rudimentary support.
> >
>
> As you mentioned it yourself (the last time I was concerned about family
> 5), the following check should cover this scenario?
>
> > @@ -1924,6 +1924,10 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c)
> >         struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
> >         struct mca_config *cfg = &mca_cfg;
> >
> > +       /* Older CPUs (prior to family 6) don't need quirks. */
> > +       if (c->x86_vfm < INTEL_PENTIUM_PRO)
> > +               return;

So I did. I was right about it too.

Sorry for the noise. This patch looks good.

-Tony