[PATCH v2 15/16] x86/mce/amd: Support SMCA Corrected Error Interrupt

Yazen Ghannam posted 16 patches 10 months, 1 week ago
There is a newer version of this series
[PATCH v2 15/16] x86/mce/amd: Support SMCA Corrected Error Interrupt
Posted by Yazen Ghannam 10 months, 1 week ago
AMD systems optionally support MCA thresholding which provides the
ability for hardware to send an interrupt when a set error threshold is
reached. This feature counts errors of all severities, but it is
commonly used to report correctable errors with an interrupt rather than
polling.

Scalable MCA systems allow the Platform to take control of this feature.
In this case, the OS will not see the feature configuration and control
bits in the MCA_MISC* registers. The OS will not receive the MCA
thresholding interrupt, and it will need to poll for correctable errors.

A "corrected error interrupt" will be available on Scalable MCA systems.
This will be used in the same configuration where the Platform controls
MCA thresholding. However, the Platform will now be able to send the
MCA thresholding interrupt to the OS.

Check for the feature bit in the MCA_CONFIG register and confirm that
the MCA thresholding interrupt handler is already enabled. If successful,
set the feature enable bit in the MCA_CONFIG register to indicate to the
Platform that the OS is ready for the interrupt.

Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20240523155641.2805411-10-yazen.ghannam@amd.com
    
    v1->v2:
    * Use new per-CPU struct.

 arch/x86/kernel/cpu/mce/amd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 75a48f484815..404e0c38f9d8 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -306,6 +306,11 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 			high |= BIT(5);
 		}
 
+		if ((low & BIT(10)) && data->thr_intr_en) {
+			__set_bit(bank, data->thr_intr_banks);
+			high |= BIT(8);
+		}
+
 		this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(low & BIT(8));
 
 		wrmsr(smca_config, low, high);

-- 
2.43.0
RE: [PATCH v2 15/16] x86/mce/amd: Support SMCA Corrected Error Interrupt
Posted by Zhuo, Qiuxu 10 months ago
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> [...]
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -306,6 +306,11 @@ static void smca_configure(unsigned int bank,
> unsigned int cpu)
>  			high |= BIT(5);
>  		}
> 

Do you think it would be better to have some comments on the thresholding 
interrupt here (as the bits 10/8 look like magic numbers), similar to the 
comments above for the deferred interrupt?

> +		if ((low & BIT(10)) && data->thr_intr_en) {
> +			__set_bit(bank, data->thr_intr_banks);
> +			high |= BIT(8);
> +		}
> +
> [...]
Re: [PATCH v2 15/16] x86/mce/amd: Support SMCA Corrected Error Interrupt
Posted by Yazen Ghannam 10 months ago
On Tue, Feb 18, 2025 at 01:27:04PM +0000, Zhuo, Qiuxu wrote:
> > From: Yazen Ghannam <yazen.ghannam@amd.com>
> > [...]
> > --- a/arch/x86/kernel/cpu/mce/amd.c
> > +++ b/arch/x86/kernel/cpu/mce/amd.c
> > @@ -306,6 +306,11 @@ static void smca_configure(unsigned int bank,
> > unsigned int cpu)
> >  			high |= BIT(5);
> >  		}
> > 
> 
> Do you think it would be better to have some comments on the thresholding 
> interrupt here (as the bits 10/8 look like magic numbers), similar to the 
> comments above for the deferred interrupt?
> 
> > +		if ((low & BIT(10)) && data->thr_intr_en) {
> > +			__set_bit(bank, data->thr_intr_banks);
> > +			high |= BIT(8);
> > +		}
> > +
> > [...]

Yes, I'd like to convert these to defines. I left that idea, for now, to
simplify the changes.

Thanks,
Yazen
Re: [PATCH v2 15/16] x86/mce/amd: Support SMCA Corrected Error Interrupt
Posted by Luck, Tony 10 months, 1 week ago
On Thu, Feb 13, 2025 at 04:46:04PM +0000, Yazen Ghannam wrote:
> @@ -306,6 +306,11 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
>  			high |= BIT(5);
>  		}
>  
> +		if ((low & BIT(10)) && data->thr_intr_en) {
> +			__set_bit(bank, data->thr_intr_banks);
> +			high |= BIT(8);

Maybe someday add some #define names for these BITs?

> +		}
> +
>  		this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(low & BIT(8));
>  
>  		wrmsr(smca_config, low, high);

-Tony
Re: [PATCH v2 15/16] x86/mce/amd: Support SMCA Corrected Error Interrupt
Posted by Yazen Ghannam 10 months ago
On Thu, Feb 13, 2025 at 02:34:59PM -0800, Luck, Tony wrote:
> On Thu, Feb 13, 2025 at 04:46:04PM +0000, Yazen Ghannam wrote:
> > @@ -306,6 +306,11 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
> >  			high |= BIT(5);
> >  		}
> >  
> > +		if ((low & BIT(10)) && data->thr_intr_en) {
> > +			__set_bit(bank, data->thr_intr_banks);
> > +			high |= BIT(8);
> 
> Maybe someday add some #define names for these BITs?
> 

Yep, I have something like that:
https://lore.kernel.org/r/20240404151359.47970-6-yazen.ghannam@amd.com
https://lore.kernel.org/r/20240404151359.47970-13-yazen.ghannam@amd.com

These sets keep growing, so I'm trying to do better about breaking
things up and prioritizing. :)

Thanks,
Yazen