Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
valid System Physical Address (SPA) is present in MCA_ADDR.

PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
if it indicates validity of SPA in MCA_ADDR.

PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
SPA or if it is implementation specific.

Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address.

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
Changes in v2:
1. Modified commit message per feedback received.
2. Refactored and simplified per feedback received.
Links:
v1: https://lore.kernel.org/all/20250729204801.1044100-1-avadhut.naik@amd.com/
---
arch/x86/include/asm/mce.h | 2 ++
arch/x86/kernel/cpu/mce/amd.c | 15 ++++++++++++---
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 31e3cb550fb3..09d04cedf175 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -48,6 +48,7 @@
/* AMD-specific bits */
#define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */
+#define MCI_STATUS_PADDRV BIT_ULL(54) /* Valid System Physical Address */
#define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */
#define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */
#define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
@@ -62,6 +63,7 @@
*/
#define MCI_CONFIG_MCAX 0x1
#define MCI_CONFIG_FRUTEXT BIT_ULL(9)
+#define MCI_CONFIG_PADDRV BIT_ULL(11)
#define MCI_IPID_MCATYPE 0xFFFF0000
#define MCI_IPID_HWID 0xFFF
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index d6906442f49b..e92829a423b1 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m)
}
/*
- * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
- * a system physical address. Therefore, individual cases need to be detected.
- * Future cases and checks will be added as needed.
+ * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
+ * system physical address. On other systems, individual cases still need to be
+ * detected. Future cases will be added as needed.
*
* 1) General case
* a) Assume address is not usable.
@@ -764,11 +764,15 @@ bool amd_mce_is_memory_error(struct mce *m)
* a) Reported in legacy bank 4 with extended error code (XEC) 8.
* b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
* this bit should not be checked.
+ * 4) MCI_STATUS_PADDRV is set
+ * a) Will provide a valid system physical address.
*
* NOTE: SMCA UMC memory errors fall into case #1.
*/
bool amd_mce_usable_address(struct mce *m)
{
+ u64 smca_config;
+
/* Check special northbridge case 3) first. */
if (!mce_flags.smca) {
if (legacy_mce_is_memory_error(m))
@@ -777,6 +781,11 @@ bool amd_mce_usable_address(struct mce *m)
return false;
}
+ rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
+
+ if (smca_config & MCI_CONFIG_PADDRV)
+ return m->status & MCI_STATUS_PADDRV;
+
/* Check poison bit for all other bank types. */
if (m->status & MCI_STATUS_POISON)
return true;
base-commit: 0292ef418ce08aad597fc0bba65b6dbb841808ba
--
2.43.0
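
For readers following the thread, the check this patch adds can be sketched in user-space C. This is a minimal sketch, not the kernel function: `struct mce_sketch` and `spa_usable()` are hypothetical names, the bit positions are taken from the commit message, and the fallback is reduced to the poison check visible in the diff context. The legacy (non-SMCA) path and anything after the poison check are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT_ULL(n)		(1ULL << (n))
#define MCI_STATUS_PADDRV	BIT_ULL(54)	/* MCA_STATUS[PhysAddrV] */
#define MCI_STATUS_POISON	BIT_ULL(43)	/* MCA_STATUS[Poison] */
#define MCI_CONFIG_PADDRV	BIT_ULL(11)	/* MCA_CONFIG[PhysAddrValidSupported] */

/* Hypothetical stand-in for the two register values the check looks at. */
struct mce_sketch {
	uint64_t status;	/* MCA_STATUS of the logged error */
	uint64_t config;	/* MCA_CONFIG of the reporting bank */
};

/*
 * If the bank advertises PhysAddrValidSupported, PhysAddrV alone decides
 * whether MCA_ADDR holds a system physical address. Otherwise bit 54 is
 * reserved and the older heuristic (here: only the poison bit) applies.
 */
static bool spa_usable(const struct mce_sketch *m)
{
	assert(m);

	if (m->config & MCI_CONFIG_PADDRV)
		return (m->status & MCI_STATUS_PADDRV) != 0;

	return (m->status & MCI_STATUS_POISON) != 0;
}
```

Note that when the support bit is set, a poison error without PhysAddrV is reported as not usable, which is the "use and prefer" behaviour the commit message describes.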
On Wed, Oct 15, 2025 at 05:22:25PM +0000, Avadhut Naik wrote:
> @@ -777,6 +781,11 @@ bool amd_mce_usable_address(struct mce *m)
> return false;
> }
>
> + rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
We have struct smca_bank and per-CPU smca_banks array.
MCI_CONFIG_PADDRV looks like a property of the bank which is static and
doesn't change willy-nilly. So instead of doing the silly MSR read on every
error, you can cache the fact that the bank supports MCI_STATUS_PADDRV and
query that and save us 100+ unnecessary cycles every time...
No?
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
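
The caching idea in the reply above can be illustrated outside the kernel. This is a rough sketch under stated assumptions: `fake_config[]` and `read_config()` are hypothetical stand-ins for the hardware register and the MSR read, `msr_reads` exists only to make the cost visible, and the flat array replaces the per-CPU `smca_banks` storage.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_BANKS	32
#define CONFIG_PADDRV	(1ULL << 11)	/* MCA_CONFIG[PhysAddrValidSupported] */

/* Hypothetical stand-in for the per-bank MCA_CONFIG registers. */
static uint64_t fake_config[NR_BANKS];
/* Counts simulated MSR reads so the saving can be observed. */
static unsigned int msr_reads;

static uint64_t read_config(unsigned int bank)
{
	assert(bank < NR_BANKS);
	msr_reads++;
	return fake_config[bank];
}

/* Cached copy of the static bank property, filled once at configure time. */
static bool paddrv_cached[NR_BANKS];

static void configure_bank(unsigned int bank)
{
	paddrv_cached[bank] = (read_config(bank) & CONFIG_PADDRV) != 0;
}

/* Error path: consults the cache only, no register access. */
static bool bank_has_paddrv(unsigned int bank)
{
	assert(bank < NR_BANKS);
	return paddrv_cached[bank];
}
```

Because the support bit does not change after initialization, every later query is a plain memory load, and the read count stays at one per configured bank regardless of how many errors are handled.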
Do you mean something like this?
---
arch/x86/include/asm/mce.h | 2 ++
arch/x86/kernel/cpu/mce/amd.c | 15 ++++++++++++---
2 files changed, 14 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 1cfbfff0be3f..2d98886de09a 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -48,6 +48,7 @@
/* AMD-specific bits */
#define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */
+#define MCI_STATUS_PADDRV BIT_ULL(54) /* Valid System Physical Address */
#define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */
#define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */
#define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
@@ -62,6 +63,7 @@
*/
#define MCI_CONFIG_MCAX 0x1
#define MCI_CONFIG_FRUTEXT BIT_ULL(9)
+#define MCI_CONFIG_PADDRV BIT_ULL(11)
#define MCI_IPID_MCATYPE 0xFFFF0000
#define MCI_IPID_HWID 0xFFF
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 53385e6aa230..c6be2f520476 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -87,6 +87,7 @@ struct smca_bank {
const struct smca_hwid *hwid;
u32 id; /* Value of MCA_IPID[InstanceId]. */
u8 sysfs_id; /* Value used for sysfs name. */
+ bool paddrv_support; /* Physical Address Valid bit in MCA_CONFIG */
};
static DEFINE_PER_CPU_READ_MOSTLY(struct smca_bank[MAX_NR_BANKS], smca_banks);
@@ -327,6 +328,9 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(low & BIT(8));
+ if (low & MCI_CONFIG_PADDRV)
+ this_cpu_ptr(smca_banks)[bank].paddrv_support = true;
+
wrmsr(smca_config, low, high);
}
@@ -790,9 +794,9 @@ bool amd_mce_is_memory_error(struct mce *m)
}
/*
- * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
- * a system physical address. Therefore, individual cases need to be detected.
- * Future cases and checks will be added as needed.
+ * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
+ * system physical address. On other systems, individual cases still need to be
+ * detected. Future cases will be added as needed.
*
* 1) General case
* a) Assume address is not usable.
@@ -806,6 +810,8 @@ bool amd_mce_is_memory_error(struct mce *m)
* a) Reported in legacy bank 4 with extended error code (XEC) 8.
* b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
* this bit should not be checked.
+ * 4) MCI_STATUS_PADDRV is set
+ * a) Will provide a valid system physical address.
*
* NOTE: SMCA UMC memory errors fall into case #1.
*/
@@ -819,6 +825,9 @@ bool amd_mce_usable_address(struct mce *m)
return false;
}
+ if (this_cpu_ptr(smca_banks)[m->bank].paddrv_support)
+ return m->status & MCI_STATUS_PADDRV;
+
/* Check poison bit for all other bank types. */
if (m->status & MCI_STATUS_POISON)
return true;
base-commit: 438be5bb46f4be6e78cef7c3400f20d77f03c734
--
2.43.0
On Fri, Nov 14, 2025 at 08:20:20PM +0000, Avadhut Naik wrote:
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 53385e6aa230..c6be2f520476 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -87,6 +87,7 @@ struct smca_bank {
> const struct smca_hwid *hwid;
> u32 id; /* Value of MCA_IPID[InstanceId]. */
> u8 sysfs_id; /* Value used for sysfs name. */
> + bool paddrv_support; /* Physical Address Valid bit in MCA_CONFIG */
u64 paddrv : 1,
__reserved : 63;
Otherwise, yes.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
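
One possible reading of the bitfield suggestion above, as a user-space sketch. The struct name, the `rsvd` field name (the reply uses `__reserved`, which is avoided here because double-underscore identifiers are reserved in user-space C), and the setter are all hypothetical. A `uint64_t` bitfield is accepted by GCC and Clang but is implementation-defined in ISO C.

```c
#include <assert.h>
#include <stdint.h>

#define CONFIG_PADDRV	(1ULL << 11)	/* MCA_CONFIG[PhysAddrValidSupported] */

/* Hypothetical mirror of struct smca_bank with the flag as a 1-bit field. */
struct smca_bank_sketch {
	uint32_t id;		/* Value of MCA_IPID[InstanceId]. */
	uint8_t  sysfs_id;	/* Value used for sysfs name. */
	uint64_t paddrv : 1,	/* Bank supports MCA_STATUS[PhysAddrV]. */
		 rsvd   : 63;	/* Room for future per-bank flags. */
};

/*
 * Normalize to 0/1 before storing. Assigning the masked value directly
 * would keep only its lowest bit, and bit 11 would be truncated to 0.
 */
static void bank_cache_paddrv(struct smca_bank_sketch *b, uint64_t config)
{
	assert(b);
	b->paddrv = !!(config & CONFIG_PADDRV);
}
```

Compared with a `bool` member, the bitfield leaves 63 named spare bits in the same word, so later per-bank flags would not grow the structure.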
On Wed, Oct 15, 2025 at 05:22:25PM +0000, Avadhut Naik wrote:
> Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
> bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
> valid System Physical Address (SPA) is present in MCA_ADDR.
>
> PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
> indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
> if it indicates validity of SPA in MCA_ADDR.
>
> PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
> SPA or if it is implementation specific.
>
> Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address.
>
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> ---

Minor nit:
The $SUBJECT could be simpler, like "Use new physical address valid
bit". And leave the details and proper field names for the commit
message.

But in any case, looks good to me.

Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>

Thanks,
Yazen
On 10/17/2025 09:31, Yazen Ghannam wrote:
> On Wed, Oct 15, 2025 at 05:22:25PM +0000, Avadhut Naik wrote:
>> Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
>> bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
>> valid System Physical Address (SPA) is present in MCA_ADDR.
>>
>> PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
>> indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
>> if it indicates validity of SPA in MCA_ADDR.
>>
>> PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
>> SPA or if it is implementation specific.
>>
>> Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address.
>>
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> ---
>
> Minor nit:
> The $SUBJECT could be simpler, like "Use new physical address valid
> bit". And leave the details and proper field names for the commit
> message.
>
> But in any case, looks good to me.
>
> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
>

Thanks Yazen!

Hi Boris,

If you don't have any further feedback on this, do you want me to send
another version with the subject changed per Yazen's recommendation?
Or would you take care of it at your end?

> Thanks,
> Yazen

--
Thanks,
Avadhut Naik