[PATCH v3 15/17] x86/mce/amd: Support SMCA Corrected Error Interrupt

Yazen Ghannam posted 17 patches 8 months, 1 week ago
There is a newer version of this series
Posted by Yazen Ghannam 8 months, 1 week ago
AMD systems optionally support MCA thresholding, which provides the
ability for hardware to send an interrupt when a set error threshold is
reached. This feature counts errors of all severities, but it is
commonly used to report correctable errors with an interrupt rather than
polling.

Scalable MCA systems allow the Platform to take control of this feature.
In this case, the OS will not see the feature configuration and control
bits in the MCA_MISC* registers. The OS will not receive the MCA
thresholding interrupt, and it will need to poll for correctable errors.

A "corrected error interrupt" will be available on Scalable MCA systems.
This will be used in the same configuration where the Platform controls
MCA thresholding. However, the Platform will now be able to send the
MCA thresholding interrupt to the OS.

Check for the feature bit in the MCA_CONFIG register and confirm that
the MCA thresholding interrupt handler is already enabled. If successful,
set the feature enable bit in the MCA_CONFIG register to indicate to the
Platform that the OS is ready for the interrupt.

Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
---

Notes:
    Link:
    https://lore.kernel.org/r/20250213-wip-mca-updates-v2-15-3636547fe05f@amd.com
    
    v2->v3:
    * Add tags from Tony.
    
    v1->v2:
    * Use new per-CPU struct.

 arch/x86/kernel/cpu/mce/amd.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 9e226bdbdc40..d76a64c47a6d 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -306,6 +306,11 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 			high |= BIT(5);
 		}
 
+		if ((low & BIT(10)) && data->thr_intr_en) {
+			__set_bit(bank, data->thr_intr_banks);
+			high |= BIT(8);
+		}
+
 		this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(low & BIT(8));
 
 		wrmsr(smca_config, low, high);

-- 
2.49.0
Re: [PATCH v3 15/17] x86/mce/amd: Support SMCA Corrected Error Interrupt
Posted by Borislav Petkov 7 months, 1 week ago
On Tue, Apr 15, 2025 at 02:55:10PM +0000, Yazen Ghannam wrote:
> @@ -306,6 +306,11 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
>  			high |= BIT(5);
>  		}

Yeah, the above statements explain in comments what they do so that we don't
have to define the bits but use them straight "naked" with the BIT macro.
I think you'd need to put something along the lines of that text...

> Check for the feature bit in the MCA_CONFIG register and confirm that
> the MCA thresholding interrupt handler is already enabled. If successful,
> set the feature enable bit in the MCA_CONFIG register to indicate to the
> Platform that the OS is ready for the interrupt.

... here.

<---

> +		if ((low & BIT(10)) && data->thr_intr_en) {
> +			__set_bit(bank, data->thr_intr_banks);
> +			high |= BIT(8);
> +		}
> +
>  		this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(low & BIT(8));
>  
>  		wrmsr(smca_config, low, high);
> 
> -- 
> 2.49.0
> 

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v3 15/17] x86/mce/amd: Support SMCA Corrected Error Interrupt
Posted by Yazen Ghannam 7 months, 1 week ago
On Fri, May 09, 2025 at 09:37:21PM +0200, Borislav Petkov wrote:
> On Tue, Apr 15, 2025 at 02:55:10PM +0000, Yazen Ghannam wrote:
> > @@ -306,6 +306,11 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
> >  			high |= BIT(5);
> >  		}
> 
> Yeah, the above statements explain in comments what they do so that we don't
> have to define the bits but use them straight "naked" with the BIT macro.
> I think you'd need to put something along the lines of that text...
> 
> > Check for the feature bit in the MCA_CONFIG register and confirm that
> > the MCA thresholding interrupt handler is already enabled. If successful,
> > set the feature enable bit in the MCA_CONFIG register to indicate to the
> > Platform that the OS is ready for the interrupt.
> 
> ... here.
> 
> <---
> 

Okay, will do.

Thanks,
Yazen
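
For readers following the thread, the check described in the commit
message can be modelled in user space roughly as below. This is only a
sketch, not the kernel code: struct thr_data and cei_configure() are
hypothetical stand-ins for the per-CPU data and for the hunk in
smca_configure(), and the bitmap is narrowed to a single 64-bit word on
the assumption that bank < 64. The bit positions are the ones used in
the patch; the comments show the kind of explanation the review asks
for next to the "naked" BIT() uses.

```c
#include <stdbool.h>
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* Hypothetical stand-in for the per-CPU data referenced in the patch. */
struct thr_data {
	bool     thr_intr_en;    /* thresholding interrupt handler is enabled */
	uint64_t thr_intr_banks; /* one bit per bank that uses the interrupt */
};

/*
 * low and high model the two 32-bit halves of MCA_CONFIG as returned by
 * rdmsr(). The function returns the value to write back as the high half.
 */
static uint32_t cei_configure(unsigned int bank, uint32_t low, uint32_t high,
			      struct thr_data *data)
{
	/*
	 * Bit 10 of the low half is the feature bit for the corrected error
	 * interrupt. If it is set and the thresholding interrupt handler is
	 * already enabled, record the bank and set bit 8 of the high half
	 * (bit 40 of the full register) to tell the Platform that the OS is
	 * ready for the interrupt.
	 */
	if ((low & BIT(10)) && data->thr_intr_en) {
		data->thr_intr_banks |= UINT64_C(1) << bank;
		high |= BIT(8);
	}

	return high;
}
```

If either condition fails, the high half is returned unchanged and the
bank is not recorded, which matches the commit message: the OS then has
to keep polling that bank for correctable errors.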