[PATCH v2 06/16] x86/mce: Remove __mcheck_cpu_init_early()

Yazen Ghannam posted 16 patches 10 months, 1 week ago
There is a newer version of this series
[PATCH v2 06/16] x86/mce: Remove __mcheck_cpu_init_early()
Posted by Yazen Ghannam 10 months, 1 week ago
The __mcheck_cpu_init_early() function was introduced so that some
vendor-specific features are detected before the first MCA polling event
done in __mcheck_cpu_init_generic().

Currently, __mcheck_cpu_init_early() is only used on AMD-based systems and
additional code will be needed to support various system configurations.

However, the current and future vendor-specific code should be done during
vendor init. This keeps all the vendor code in a common location and
simplifies the generic init flow.

Move all the __mcheck_cpu_init_early() code into mce_amd_feature_init().
Also, move __mcheck_cpu_init_generic() after
__mcheck_cpu_init_prepare_banks() so that MCA is enabled after the first
MCA polling event.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20221206173607.1185907-3-yazen.ghannam@amd.com
    
    v1->v2:
    * New in v2, but based on old patch (see link).
    * Changed cpu_has() to cpu_feature_enabled().

 arch/x86/kernel/cpu/mce/amd.c  |  4 ++++
 arch/x86/kernel/cpu/mce/core.c | 20 +++-----------------
 2 files changed, 7 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index df8984aac1c6..302a310d0630 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -656,6 +656,10 @@ void mce_amd_feature_init(struct cpuinfo_x86 *c)
 	u32 low = 0, high = 0, address = 0;
 	int offset = -1;
 
+	mce_flags.overflow_recov = cpu_feature_enabled(X86_FEATURE_OVERFLOW_RECOV);
+	mce_flags.succor	 = cpu_feature_enabled(X86_FEATURE_SUCCOR);
+	mce_flags.smca		 = cpu_feature_enabled(X86_FEATURE_SMCA);
+	mce_flags.amd_threshold	 = 1;
 
 	for (bank = 0; bank < this_cpu_read(mce_num_banks); ++bank) {
 		if (mce_flags.smca)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index d85bd861ecca..7fbf1c8291b8 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2029,19 +2029,6 @@ static bool __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c)
 	return false;
 }
 
-/*
- * Init basic CPU features needed for early decoding of MCEs.
- */
-static void __mcheck_cpu_init_early(struct cpuinfo_x86 *c)
-{
-	if (c->x86_vendor == X86_VENDOR_AMD || c->x86_vendor == X86_VENDOR_HYGON) {
-		mce_flags.overflow_recov = !!cpu_has(c, X86_FEATURE_OVERFLOW_RECOV);
-		mce_flags.succor	 = !!cpu_has(c, X86_FEATURE_SUCCOR);
-		mce_flags.smca		 = !!cpu_has(c, X86_FEATURE_SMCA);
-		mce_flags.amd_threshold	 = 1;
-	}
-}
-
 static void mce_centaur_feature_init(struct cpuinfo_x86 *c)
 {
 	struct mca_config *cfg = &mca_cfg;
@@ -2281,10 +2268,9 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
 
 	mca_cfg.initialized = 1;
 
-	__mcheck_cpu_init_early(c);
-	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_vendor(c);
 	__mcheck_cpu_init_prepare_banks();
+	__mcheck_cpu_init_generic();
 	__mcheck_cpu_setup_timer();
 }
 
@@ -2450,9 +2436,9 @@ static void mce_syscore_shutdown(void)
  */
 static void mce_syscore_resume(void)
 {
-	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_vendor(raw_cpu_ptr(&cpu_info));
 	__mcheck_cpu_init_prepare_banks();
+	__mcheck_cpu_init_generic();
 }
 
 static struct syscore_ops mce_syscore_ops = {
@@ -2469,8 +2455,8 @@ static void mce_cpu_restart(void *data)
 {
 	if (!mce_available(raw_cpu_ptr(&cpu_info)))
 		return;
-	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_prepare_banks();
+	__mcheck_cpu_init_generic();
 	__mcheck_cpu_init_timer();
 }
 

-- 
2.43.0
Re: [PATCH v2 06/16] x86/mce: Remove __mcheck_cpu_init_early()
Posted by Borislav Petkov 9 months, 3 weeks ago
On Thu, Feb 13, 2025 at 04:45:55PM +0000, Yazen Ghannam wrote:
> Also, move __mcheck_cpu_init_generic() after
> __mcheck_cpu_init_prepare_banks() so that MCA is enabled after the first
> MCA polling event.

The reason being?

Precaution?

It was this way since forever, why are you moving it now? Any particular
reason?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v2 06/16] x86/mce: Remove __mcheck_cpu_init_early()
Posted by Yazen Ghannam 9 months, 3 weeks ago
On Thu, Feb 27, 2025 at 04:25:00PM +0100, Borislav Petkov wrote:
> On Thu, Feb 13, 2025 at 04:45:55PM +0000, Yazen Ghannam wrote:
> > Also, move __mcheck_cpu_init_generic() after
> > __mcheck_cpu_init_prepare_banks() so that MCA is enabled after the first
> > MCA polling event.
> 
> The reason being?
> 
> Precaution?
> 
> It was this way since forever, why are you moving it now? Any particular
> reason?
> 

1) To read/clear old errors before turning on MCA. The updated
__mcheck_cpu_init_prepare_banks() function does this for the MCi_CTL
registers. This patch does this for the MCG_CTL register too.

2) To ensure that vendor-specific setup is finished beforehand also.

Thanks,
Yazen
Re: [PATCH v2 06/16] x86/mce: Remove __mcheck_cpu_init_early()
Posted by Borislav Petkov 9 months, 3 weeks ago
On February 27, 2025 5:31:48 PM GMT+01:00, Yazen Ghannam <yazen.ghannam@amd.com> wrote:
>On Thu, Feb 27, 2025 at 04:25:00PM +0100, Borislav Petkov wrote:
>> On Thu, Feb 13, 2025 at 04:45:55PM +0000, Yazen Ghannam wrote:
>> > Also, move __mcheck_cpu_init_generic() after
>> > __mcheck_cpu_init_prepare_banks() so that MCA is enabled after the first
>> > MCA polling event.
>> 
>> The reason being?
>> 
>> Precaution?
>> 
>> It was this way since forever, why are you moving it now? Any particular
>> reason?
>> 
>
>1) To read/clear old errors before turning on MCA. The updated
>__mcheck_cpu_init_prepare_banks() function does this for the MCi_CTL
>registers. This patch does this for the MCG_CTL register too.
>
>2) To ensure that vendor-specific setup is finished beforehand also.

That doesn't answer my question. All of the above gets done even without shuffling the order...


-- 
Sent from a small device: formatting sucks and brevity is inevitable.
Re: [PATCH v2 06/16] x86/mce: Remove __mcheck_cpu_init_early()
Posted by Yazen Ghannam 9 months, 3 weeks ago
On Thu, Feb 27, 2025 at 08:33:19PM +0100, Borislav Petkov wrote:
> On February 27, 2025 5:31:48 PM GMT+01:00, Yazen Ghannam <yazen.ghannam@amd.com> wrote:
> >On Thu, Feb 27, 2025 at 04:25:00PM +0100, Borislav Petkov wrote:
> >> On Thu, Feb 13, 2025 at 04:45:55PM +0000, Yazen Ghannam wrote:
> >> > Also, move __mcheck_cpu_init_generic() after
> >> > __mcheck_cpu_init_prepare_banks() so that MCA is enabled after the first
> >> > MCA polling event.
> >> 
> >> The reason being?
> >> 
> >> Precaution?
> >> 
> >> It was this way since forever, why are you moving it now? Any particular
> >> reason?
> >> 
> >
> >1) To read/clear old errors before turning on MCA. The updated
> >__mcheck_cpu_init_prepare_banks() function does this for the MCi_CTL
> >registers. This patch does this for the MCG_CTL register too.
> >
> >2) To ensure that vendor-specific setup is finished beforehand also.
> 
> That doesn't answer my question. All of the above gets done even without shuffling the order...
> 
> 

MCA banks can start logging errors once MCG_CTL is set. The AMD docs say
"The operating system must initialize the MCA_CONFIG registers prior to
initialization of the MCA_CTL registers."

"The MCA_CTL registers must be initialized prior to enabling the error
reporting banks in MCG_CTL".

However, the Intel docs "Machine-Check Initialization Pseudocode" say
MCG_CTL first then MCi_CTL.

But both agree that CR4.MCE should be set last.

We have an old thread on the topic that led to this patch.
https://lore.kernel.org/all/YqJHwXkg3Ny9fI3s@yaz-fattaah/

And it seemed okay at the time.
https://lore.kernel.org/all/YrnTMmwl5TrHwT9J@zn.tnic/

I don't think anything much has changed since then, so I included the
old patch again in this set.

Thanks,
Yazen
Re: [PATCH v2 06/16] x86/mce: Remove __mcheck_cpu_init_early()
Posted by Borislav Petkov 9 months, 3 weeks ago
On February 27, 2025 8:59:33 PM GMT+01:00, Yazen Ghannam <yazen.ghannam@amd.com> wrote:
>On Thu, Feb 27, 2025 at 08:33:19PM +0100, Borislav Petkov wrote:
>> On February 27, 2025 5:31:48 PM GMT+01:00, Yazen Ghannam <yazen.ghannam@amd.com> wrote:
>> >On Thu, Feb 27, 2025 at 04:25:00PM +0100, Borislav Petkov wrote:
>> >> On Thu, Feb 13, 2025 at 04:45:55PM +0000, Yazen Ghannam wrote:
>> >> > Also, move __mcheck_cpu_init_generic() after
>> >> > __mcheck_cpu_init_prepare_banks() so that MCA is enabled after the first
>> >> > MCA polling event.
>> >> 
>> >> The reason being?
>> >> 
>> >> Precaution?
>> >> 
>> >> It was this way since forever, why are you moving it now? Any particular
>> >> reason?
>> >> 
>> >
>> >1) To read/clear old errors before turning on MCA. The updated
>> >__mcheck_cpu_init_prepare_banks() function does this for the MCi_CTL
>> >registers. This patch does this for the MCG_CTL register too.
>> >
>> >2) To ensure that vendor-specific setup is finished beforehand also.
>> 
>> That doesn't answer my question. All of the above gets done even without shuffling the order...
>> 
>> 
>
>MCA banks can start logging errors once MCG_CTL is set. The AMD docs say
>"The operating system must initialize the MCA_CONFIG registers prior to
>initialization of the MCA_CTL registers."
>
>"The MCA_CTL registers must be initialized prior to enabling the error
>reporting banks in MCG_CTL".
>
>However, the Intel docs "Machine-Check Initialization Pseudocode" say
>MCG_CTL first then MCi_CTL.
>
>But both agree that CR4.MCE should be set last.
>
>We have an old thread on the topic that led to this patch.
>https://lore.kernel.org/all/YqJHwXkg3Ny9fI3s@yaz-fattaah/
>
>And it seemed okay at the time.
>https://lore.kernel.org/all/YrnTMmwl5TrHwT9J@zn.tnic/
>
>I don't think anything much has changed since then, so I included the
>old patch again in this set.
>
>Thanks,
>Yazen

This is exactly what needs to be in the commit message - why is the change being done. 
-- 
Sent from a small device: formatting sucks and brevity is inevitable.
Re: [PATCH v2 06/16] x86/mce: Remove __mcheck_cpu_init_early()
Posted by Yazen Ghannam 9 months, 3 weeks ago
On Thu, Feb 27, 2025 at 09:48:01PM +0100, Borislav Petkov wrote:
> On February 27, 2025 8:59:33 PM GMT+01:00, Yazen Ghannam <yazen.ghannam@amd.com> wrote:
> >On Thu, Feb 27, 2025 at 08:33:19PM +0100, Borislav Petkov wrote:
> >> On February 27, 2025 5:31:48 PM GMT+01:00, Yazen Ghannam <yazen.ghannam@amd.com> wrote:
> >> >On Thu, Feb 27, 2025 at 04:25:00PM +0100, Borislav Petkov wrote:
> >> >> On Thu, Feb 13, 2025 at 04:45:55PM +0000, Yazen Ghannam wrote:
> >> >> > Also, move __mcheck_cpu_init_generic() after
> >> >> > __mcheck_cpu_init_prepare_banks() so that MCA is enabled after the first
> >> >> > MCA polling event.
> >> >> 
> >> >> The reason being?
> >> >> 
> >> >> Precaution?
> >> >> 
> >> >> It was this way since forever, why are you moving it now? Any particular
> >> >> reason?
> >> >> 
> >> >
> >> >1) To read/clear old errors before turning on MCA. The updated
> >> >__mcheck_cpu_init_prepare_banks() function does this for the MCi_CTL
> >> >registers. This patch does this for the MCG_CTL register too.
> >> >
> >> >2) To ensure that vendor-specific setup is finished beforehand also.
> >> 
> >> That doesn't answer my question. All of the above gets done even without shuffling the order...
> >> 
> >> 
> >
> >MCA banks can start logging errors once MCG_CTL is set. The AMD docs say
> >"The operating system must initialize the MCA_CONFIG registers prior to
> >initialization of the MCA_CTL registers."
> >
> >"The MCA_CTL registers must be initialized prior to enabling the error
> >reporting banks in MCG_CTL".
> >
> >However, the Intel docs "Machine-Check Initialization Pseudocode" say
> >MCG_CTL first then MCi_CTL.
> >
> >But both agree that CR4.MCE should be set last.
> >
> >We have an old thread on the topic that led to this patch.
> >https://lore.kernel.org/all/YqJHwXkg3Ny9fI3s@yaz-fattaah/
> >
> >And it seemed okay at the time.
> >https://lore.kernel.org/all/YrnTMmwl5TrHwT9J@zn.tnic/
> >
> >I don't think anything much has changed since then, so I included the
> >old patch again in this set.
> >
> >Thanks,
> >Yazen
> 
> This is exactly what needs to be in the commit message - why is the change being done. 
> -- 

Sure thing. I'll update the commit message.

Thanks,
Yazen
RE: [PATCH v2 06/16] x86/mce: Remove __mcheck_cpu_init_early()
Posted by Zhuo, Qiuxu 10 months ago
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> Sent: Friday, February 14, 2025 12:46 AM
> To: x86@kernel.org; Luck, Tony <tony.luck@intel.com>
> Cc: linux-kernel@vger.kernel.org; linux-edac@vger.kernel.org;
> Smita.KoralahalliChannabasappa@amd.com; Yazen Ghannam
> <yazen.ghannam@amd.com>
> Subject: [PATCH v2 06/16] x86/mce: Remove __mcheck_cpu_init_early()
> 
> The __mcheck_cpu_init_early() function was introduced so that some
> vendor-specific features are detected before the first MCA polling event done
> in __mcheck_cpu_init_generic().
> 
> Currently, __mcheck_cpu_init_early() is only used on AMD-based systems
> and additional code will be needed to support various system configurations.
> 
> However, the current and future vendor-specific code should be done during
> vendor init. This keeps all the vendor code in a common location and
> simplifies the generic init flow.
> 
> Move all the __mcheck_cpu_init_early() code into mce_amd_feature_init().
> Also, move __mcheck_cpu_init_generic() after
> __mcheck_cpu_init_prepare_banks() so that MCA is enabled after the first
> MCA polling event.

Maybe providing a bit more information about the first MCA polling event, 
as shown below, would be clearer:

   "... so that MCA is enabled after the first MCA polling event for the machine 
    checks left over from the previous reset."

> Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>

Other than that,

    Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
  
Re: [PATCH v2 06/16] x86/mce: Remove __mcheck_cpu_init_early()
Posted by Yazen Ghannam 10 months ago
On Tue, Feb 18, 2025 at 03:00:12AM +0000, Zhuo, Qiuxu wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> > Sent: Friday, February 14, 2025 12:46 AM
> > To: x86@kernel.org; Luck, Tony <tony.luck@intel.com>
> > Cc: linux-kernel@vger.kernel.org; linux-edac@vger.kernel.org;
> > Smita.KoralahalliChannabasappa@amd.com; Yazen Ghannam
> > <yazen.ghannam@amd.com>
> > Subject: [PATCH v2 06/16] x86/mce: Remove __mcheck_cpu_init_early()
> > 
> > The __mcheck_cpu_init_early() function was introduced so that some
> > vendor-specific features are detected before the first MCA polling event done
> > in __mcheck_cpu_init_generic().
> > 
> > Currently, __mcheck_cpu_init_early() is only used on AMD-based systems
> > and additional code will be needed to support various system configurations.
> > 
> > However, the current and future vendor-specific code should be done during
> > vendor init. This keeps all the vendor code in a common location and
> > simplifies the generic init flow.
> > 
> > Move all the __mcheck_cpu_init_early() code into mce_amd_feature_init().
> > Also, move __mcheck_cpu_init_generic() after
> > __mcheck_cpu_init_prepare_banks() so that MCA is enabled after the first
> > MCA polling event.
> 
> Maybe providing a bit more information about the first MCA polling event, 
> as shown below, would be clearer:
> 
>    "... so that MCA is enabled after the first MCA polling event for the machine 
>     checks left over from the previous reset."
> 

Sure thing.

> > Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
> 
> Other than that,
> 
>     Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
>   

Thanks,
Yazen