arch/x86/include/asm/mce.h | 2 ++ arch/x86/kernel/cpu/mce/amd.c | 16 +++++++++++++--- 2 files changed, 15 insertions(+), 3 deletions(-)
Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
valid System Physical Address (SPA) is present in MCA_ADDR.
PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
if it indicates validity of SPA in MCA_ADDR.
PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
SPA or if it is implementation specific.
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
arch/x86/include/asm/mce.h | 2 ++
arch/x86/kernel/cpu/mce/amd.c | 16 +++++++++++++---
2 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 6c77c03139f7..387cf250525f 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -48,6 +48,7 @@
/* AMD-specific bits */
#define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */
+#define MCI_STATUS_PADDRVAL BIT_ULL(54) /* Valid System Physical Address */
#define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */
#define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */
#define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */
@@ -62,6 +63,7 @@
*/
#define MCI_CONFIG_MCAX 0x1
#define MCI_CONFIG_FRUTEXT BIT_ULL(9)
+#define MCI_CONFIG_PAVALID BIT_ULL(11)
#define MCI_IPID_MCATYPE 0xFFFF0000
#define MCI_IPID_HWID 0xFFF
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 5c4eb28c3ac9..6ac222aec28d 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m)
}
/*
- * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
- * a system physical address. Therefore, individual cases need to be detected.
- * Future cases and checks will be added as needed.
+ * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
+ * system physical address. Individual cases though, need to be detected for
+ * other systems. Future cases will be added as needed.
*
* 1) General case
* a) Assume address is not usable.
@@ -764,11 +764,21 @@ bool amd_mce_is_memory_error(struct mce *m)
* a) Reported in legacy bank 4 with extended error code (XEC) 8.
* b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
* this bit should not be checked.
+ * 4) MCI_STATUS_PADDRVAL is set
+ * a)Will provide a valid system physical address.
*
* NOTE: SMCA UMC memory errors fall into case #1.
*/
bool amd_mce_usable_address(struct mce *m)
{
+ u64 smca_config;
+
+ rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
+ if (smca_config & MCI_CONFIG_PAVALID) {
+ if(m->status & MCI_STATUS_PADDRVAL)
+ return true;
+ return false;
+ }
/* Check special northbridge case 3) first. */
if (!mce_flags.smca) {
if (legacy_mce_is_memory_error(m))
base-commit: d69139008b6dcd9c18483e956f61d187b0c214a2
--
2.43.0
On Tue, Jul 29, 2025 at 08:46:57PM +0000, Avadhut Naik wrote: > Starting with Zen6, AMD's Scalable MCA systems will incorporate two new > bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a > valid System Physical Address (SPA) is present in MCA_ADDR. > > PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural > indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or > if it indicates validity of SPA in MCA_ADDR. > > PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid > SPA or if it is implementation specific. The commit message is missing an imperative statement. You can describe how and why these bits are helpful. For example: "Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address." or something like this. > > Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> > --- > arch/x86/include/asm/mce.h | 2 ++ > arch/x86/kernel/cpu/mce/amd.c | 16 +++++++++++++--- > 2 files changed, 15 insertions(+), 3 deletions(-) > > diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h > index 6c77c03139f7..387cf250525f 100644 > --- a/arch/x86/include/asm/mce.h > +++ b/arch/x86/include/asm/mce.h > @@ -48,6 +48,7 @@ > > /* AMD-specific bits */ > #define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */ > +#define MCI_STATUS_PADDRVAL BIT_ULL(54) /* Valid System Physical Address */ > #define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */ > #define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */ > #define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */ > @@ -62,6 +63,7 @@ > */ > #define MCI_CONFIG_MCAX 0x1 > #define MCI_CONFIG_FRUTEXT BIT_ULL(9) > +#define MCI_CONFIG_PAVALID BIT_ULL(11) > #define MCI_IPID_MCATYPE 0xFFFF0000 > #define MCI_IPID_HWID 0xFFF > > diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c > index 5c4eb28c3ac9..6ac222aec28d 100644 > --- a/arch/x86/kernel/cpu/mce/amd.c > +++ b/arch/x86/kernel/cpu/mce/amd.c > @@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m) > } > > /* > - * AMD systems do not have an explicit indicator that the value in MCA_ADDR is > - * a system physical address. Therefore, individual cases need to be detected. > - * Future cases and checks will be added as needed. > + * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a > + * system physical address. Individual cases though, need to be detected for > + * other systems. Future cases will be added as needed. > * > * 1) General case > * a) Assume address is not usable. > @@ -764,11 +764,21 @@ bool amd_mce_is_memory_error(struct mce *m) > * a) Reported in legacy bank 4 with extended error code (XEC) 8. > * b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore, > * this bit should not be checked. > + * 4) MCI_STATUS_PADDRVAL is set > + * a)Will provide a valid system physical address. Space after "a)". > * > * NOTE: SMCA UMC memory errors fall into case #1. > */ > bool amd_mce_usable_address(struct mce *m) > { > + u64 smca_config; > + > + rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config); Newline here. > + if (smca_config & MCI_CONFIG_PAVALID) { > + if(m->status & MCI_STATUS_PADDRVAL) > + return true; > + return false; > + } Newline here. Also, the entire hunk above should go after the !SMCA case below and before the other SMCA cases. Furthermore, the hunk can be simplified to this: if (smca_config & MCI_CONFIG_PAVALID) return m->status & MCI_STATUS_PADDRVAL; Also also, the bit names are uncannily similar. I think they should be the same (except for the prefix) or clearly different. MCI_CONFIG_PADDRV/MCI_STATUS_PADDRV MCI_CONFIG_PHYS_ADDRV_SUPP/MCI_STATUS_PADDRVAL > /* Check special northbridge case 3) first. */ > if (!mce_flags.smca) { > if (legacy_mce_is_memory_error(m)) > > base-commit: d69139008b6dcd9c18483e956f61d187b0c214a2 > -- We should revisit saving MCA_CONFIG during logging. We end up reading it multiple times during error processing. I realize this came up before in a previous patch set. IIRC, the point of contention was exposing MCA_CONFIG to userspace. We don't necessarily have to do that. We can just use it in the kernel data structures. Thanks, Yazen
Hi, On 8/6/2025 07:52, Yazen Ghannam wrote: > On Tue, Jul 29, 2025 at 08:46:57PM +0000, Avadhut Naik wrote: >> Starting with Zen6, AMD's Scalable MCA systems will incorporate two new >> bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a >> valid System Physical Address (SPA) is present in MCA_ADDR. >> >> PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural >> indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or >> if it indicates validity of SPA in MCA_ADDR. >> >> PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid >> SPA or if it is implementation specific. > > The commit message is missing an imperative statement. You can describe > how and why these bits are helpful. > > For example: "Use and prefer MCA_STATUS[PhysAddrV] when checking for a > usable address." or something like this. > Will add it. >> >> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com> >> --- >> arch/x86/include/asm/mce.h | 2 ++ >> arch/x86/kernel/cpu/mce/amd.c | 16 +++++++++++++--- >> 2 files changed, 15 insertions(+), 3 deletions(-) >> >> diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h >> index 6c77c03139f7..387cf250525f 100644 >> --- a/arch/x86/include/asm/mce.h >> +++ b/arch/x86/include/asm/mce.h >> @@ -48,6 +48,7 @@ >> >> /* AMD-specific bits */ >> #define MCI_STATUS_TCC BIT_ULL(55) /* Task context corrupt */ >> +#define MCI_STATUS_PADDRVAL BIT_ULL(54) /* Valid System Physical Address */ >> #define MCI_STATUS_SYNDV BIT_ULL(53) /* synd reg. valid */ >> #define MCI_STATUS_DEFERRED BIT_ULL(44) /* uncorrected error, deferred exception */ >> #define MCI_STATUS_POISON BIT_ULL(43) /* access poisonous data */ >> @@ -62,6 +63,7 @@ >> */ >> #define MCI_CONFIG_MCAX 0x1 >> #define MCI_CONFIG_FRUTEXT BIT_ULL(9) >> +#define MCI_CONFIG_PAVALID BIT_ULL(11) >> #define MCI_IPID_MCATYPE 0xFFFF0000 >> #define MCI_IPID_HWID 0xFFF >> >> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c >> index 5c4eb28c3ac9..6ac222aec28d 100644 >> --- a/arch/x86/kernel/cpu/mce/amd.c >> +++ b/arch/x86/kernel/cpu/mce/amd.c >> @@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m) >> } >> >> /* >> - * AMD systems do not have an explicit indicator that the value in MCA_ADDR is >> - * a system physical address. Therefore, individual cases need to be detected. >> - * Future cases and checks will be added as needed. >> + * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a >> + * system physical address. Individual cases though, need to be detected for >> + * other systems. Future cases will be added as needed. >> * >> * 1) General case >> * a) Assume address is not usable. >> @@ -764,11 +764,21 @@ bool amd_mce_is_memory_error(struct mce *m) >> * a) Reported in legacy bank 4 with extended error code (XEC) 8. >> * b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore, >> * this bit should not be checked. >> + * 4) MCI_STATUS_PADDRVAL is set >> + * a)Will provide a valid system physical address. > > Space after "a)". > Noted. >> * >> * NOTE: SMCA UMC memory errors fall into case #1. >> */ >> bool amd_mce_usable_address(struct mce *m) >> { >> + u64 smca_config; >> + >> + rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config); > > Newline here. > Noted. >> + if (smca_config & MCI_CONFIG_PAVALID) { >> + if(m->status & MCI_STATUS_PADDRVAL) >> + return true; >> + return false; >> + } > > Newline here. > Noted. > Also, the entire hunk above should go after the !SMCA case below and > before the other SMCA cases. > > Furthermore, the hunk can be simplified to this: > > if (smca_config & MCI_CONFIG_PAVALID) > return m->status & MCI_STATUS_PADDRVAL; > Ack for both. > Also also, the bit names are uncannily similar. I think they should be > the same (except for the prefix) or clearly different. > > MCI_CONFIG_PADDRV/MCI_STATUS_PADDRV > > MCI_CONFIG_PHYS_ADDRV_SUPP/MCI_STATUS_PADDRVAL > Will change the name per your suggestions. >> /* Check special northbridge case 3) first. */ >> if (!mce_flags.smca) { >> if (legacy_mce_is_memory_error(m)) >> >> base-commit: d69139008b6dcd9c18483e956f61d187b0c214a2 >> -- > > We should revisit saving MCA_CONFIG during logging. We end up reading it > multiple times during error processing. > > I realize this came up before in a previous patch set. IIRC, the point > of contention was exposing MCA_CONFIG to userspace. We don't necessarily > have to do that. We can just use it in the kernel data structures. > I agree. If there is consensus, I can work on getting a patch out. Maybe make this a set with the first patch caching MCA_CONFIG and this being the second patch. > Thanks, > Yazen -- Thanks, Avadhut Naik
© 2016 - 2025 Red Hat, Inc.