[PATCH v2] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits

Avadhut Naik posted 1 patch 2 months ago
There is a newer version of this series
arch/x86/include/asm/mce.h    |  2 ++
arch/x86/kernel/cpu/mce/amd.c | 15 ++++++++++++---
2 files changed, 14 insertions(+), 3 deletions(-)
[PATCH v2] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
Posted by Avadhut Naik 2 months ago
Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
valid System Physical Address (SPA) is present in MCA_ADDR.

PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
if it indicates validity of SPA in MCA_ADDR.

PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
SPA or if it is implementation specific.

Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address.

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
Changes in v2:
1. Modified commit message per feedback received.
2. Refactored and simplified per feedback received.

Links:
v1: https://lore.kernel.org/all/20250729204801.1044100-1-avadhut.naik@amd.com/
---
 arch/x86/include/asm/mce.h    |  2 ++
 arch/x86/kernel/cpu/mce/amd.c | 15 ++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 31e3cb550fb3..09d04cedf175 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -48,6 +48,7 @@
 
 /* AMD-specific bits */
 #define MCI_STATUS_TCC		BIT_ULL(55)  /* Task context corrupt */
+#define MCI_STATUS_PADDRV	BIT_ULL(54)  /* Valid System Physical Address */
 #define MCI_STATUS_SYNDV	BIT_ULL(53)  /* synd reg. valid */
 #define MCI_STATUS_DEFERRED	BIT_ULL(44)  /* uncorrected error, deferred exception */
 #define MCI_STATUS_POISON	BIT_ULL(43)  /* access poisonous data */
@@ -62,6 +63,7 @@
  */
 #define MCI_CONFIG_MCAX		0x1
 #define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
+#define MCI_CONFIG_PADDRV	BIT_ULL(11)
 #define MCI_IPID_MCATYPE	0xFFFF0000
 #define MCI_IPID_HWID		0xFFF
 
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index d6906442f49b..e92829a423b1 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -748,9 +748,9 @@ bool amd_mce_is_memory_error(struct mce *m)
 }
 
 /*
- * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
- * a system physical address. Therefore, individual cases need to be detected.
- * Future cases and checks will be added as needed.
+ * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
+ * system physical address. Individual cases though, need to be detected for
+ * other systems. Future cases will be added as needed.
  *
  * 1) General case
  *	a) Assume address is not usable.
@@ -764,11 +764,15 @@ bool amd_mce_is_memory_error(struct mce *m)
  *	a) Reported in legacy bank 4 with extended error code (XEC) 8.
  *	b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
  *	   this bit should not be checked.
+ * 4) MCI_STATUS_PADDRVAL is set
+ *	a) Will provide a valid system physical address.
  *
  * NOTE: SMCA UMC memory errors fall into case #1.
  */
 bool amd_mce_usable_address(struct mce *m)
 {
+	u64 smca_config;
+
 	/* Check special northbridge case 3) first. */
 	if (!mce_flags.smca) {
 		if (legacy_mce_is_memory_error(m))
@@ -777,6 +781,11 @@ bool amd_mce_usable_address(struct mce *m)
 			return false;
 	}
 
+	rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);
+
+	if (smca_config & MCI_CONFIG_PADDRV)
+		return m->status & MCI_STATUS_PADDRV;
+
 	/* Check poison bit for all other bank types. */
 	if (m->status & MCI_STATUS_POISON)
 		return true;

base-commit: 0292ef418ce08aad597fc0bba65b6dbb841808ba
-- 
2.43.0
Re: [PATCH v2] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
Posted by Borislav Petkov 1 month ago
On Wed, Oct 15, 2025 at 05:22:25PM +0000, Avadhut Naik wrote:
> @@ -777,6 +781,11 @@ bool amd_mce_usable_address(struct mce *m)
>  			return false;
>  	}
>  
> +	rdmsrl(MSR_AMD64_SMCA_MCx_CONFIG(m->bank), smca_config);

We have struct smca_bank and per-CPU smca_banks array.

MCI_CONFIG_PADDRV looks like a property of the bank which is static and
doesn't change willy-nilly. So instead of doing the silly MSR read on every
error, you can cache the fact that the bank supports MCI_STATUS_PADDRV and
query that and save us 100+ unnecessary cycles every time...

No?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
[PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
Posted by Avadhut Naik 1 month ago
Do you mean something like this?
---
 arch/x86/include/asm/mce.h    |  2 ++
 arch/x86/kernel/cpu/mce/amd.c | 15 ++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 1cfbfff0be3f..2d98886de09a 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -48,6 +48,7 @@
 
 /* AMD-specific bits */
 #define MCI_STATUS_TCC		BIT_ULL(55)  /* Task context corrupt */
+#define MCI_STATUS_PADDRV	BIT_ULL(54)  /* Valid System Physical Address */
 #define MCI_STATUS_SYNDV	BIT_ULL(53)  /* synd reg. valid */
 #define MCI_STATUS_DEFERRED	BIT_ULL(44)  /* uncorrected error, deferred exception */
 #define MCI_STATUS_POISON	BIT_ULL(43)  /* access poisonous data */
@@ -62,6 +63,7 @@
  */
 #define MCI_CONFIG_MCAX		0x1
 #define MCI_CONFIG_FRUTEXT	BIT_ULL(9)
+#define MCI_CONFIG_PADDRV	BIT_ULL(11)
 #define MCI_IPID_MCATYPE	0xFFFF0000
 #define MCI_IPID_HWID		0xFFF
 
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 53385e6aa230..c6be2f520476 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -87,6 +87,7 @@ struct smca_bank {
 	const struct smca_hwid *hwid;
 	u32 id;			/* Value of MCA_IPID[InstanceId]. */
 	u8 sysfs_id;		/* Value used for sysfs name. */
+	bool paddrv_support;	/* Physical Address Valid bit in MCA_CONFIG */
 };
 
 static DEFINE_PER_CPU_READ_MOSTLY(struct smca_bank[MAX_NR_BANKS], smca_banks);
@@ -327,6 +328,9 @@ static void smca_configure(unsigned int bank, unsigned int cpu)
 
 		this_cpu_ptr(mce_banks_array)[bank].lsb_in_status = !!(low & BIT(8));
 
+		if (low & MCI_CONFIG_PADDRV)
+			this_cpu_ptr(smca_banks)[bank].paddrv_support = true;
+
 		wrmsr(smca_config, low, high);
 	}
 
@@ -790,9 +794,9 @@ bool amd_mce_is_memory_error(struct mce *m)
 }
 
 /*
- * AMD systems do not have an explicit indicator that the value in MCA_ADDR is
- * a system physical address. Therefore, individual cases need to be detected.
- * Future cases and checks will be added as needed.
+ * Some AMD systems have an explicit indicator that the value in MCA_ADDR is a
+ * system physical address. Individual cases though, need to be detected for
+ * other systems. Future cases will be added as needed.
  *
  * 1) General case
  *	a) Assume address is not usable.
@@ -806,6 +810,8 @@ bool amd_mce_is_memory_error(struct mce *m)
  *	a) Reported in legacy bank 4 with extended error code (XEC) 8.
  *	b) MCA_STATUS[43] is *not* defined as poison in legacy bank 4. Therefore,
  *	   this bit should not be checked.
+ * 4) MCI_STATUS_PADDRVAL is set
+ *	a) Will provide a valid system physical address.
  *
  * NOTE: SMCA UMC memory errors fall into case #1.
  */
@@ -819,6 +825,9 @@ bool amd_mce_usable_address(struct mce *m)
 			return false;
 	}
 
+	if (this_cpu_ptr(smca_banks)[m->bank].paddrv_support)
+		return m->status & MCI_STATUS_PADDRV;
+
 	/* Check poison bit for all other bank types. */
 	if (m->status & MCI_STATUS_POISON)
 		return true;

base-commit: 438be5bb46f4be6e78cef7c3400f20d77f03c734
-- 
2.43.0
Re: [PATCH] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
Posted by Borislav Petkov 1 month ago
On Fri, Nov 14, 2025 at 08:20:20PM +0000, Avadhut Naik wrote:
> diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
> index 53385e6aa230..c6be2f520476 100644
> --- a/arch/x86/kernel/cpu/mce/amd.c
> +++ b/arch/x86/kernel/cpu/mce/amd.c
> @@ -87,6 +87,7 @@ struct smca_bank {
>  	const struct smca_hwid *hwid;
>  	u32 id;			/* Value of MCA_IPID[InstanceId]. */
>  	u8 sysfs_id;		/* Value used for sysfs name. */
> +	bool paddrv_support;	/* Physical Address Valid bit in MCA_CONFIG */

	u64 paddrv	: 1,
	    __reserved	: 63;

Otherwise, yes.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCH v2] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
Posted by Yazen Ghannam 2 months ago
On Wed, Oct 15, 2025 at 05:22:25PM +0000, Avadhut Naik wrote:
> Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
> bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
> valid System Physical Address (SPA) is present in MCA_ADDR.
> 
> PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
> indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
> if it indicates validity of SPA in MCA_ADDR.
> 
> PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
> SPA or if it is implementation specific.
> 
> Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address.
> 
> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
> ---

Minor nit:
The $SUBJECT could be simpler, like "Use new physical address valid
bit". And leave the details and proper field names for the commit
message.

But in any case, looks good to me.

Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>

Thanks,
Yazen
Re: [PATCH v2] x86/mce: Add support for PHYSADDRV and PHYSADDRVALIDSUPPORTED bits
Posted by Naik, Avadhut 2 months ago

On 10/17/2025 09:31, Yazen Ghannam wrote:
> On Wed, Oct 15, 2025 at 05:22:25PM +0000, Avadhut Naik wrote:
>> Starting with Zen6, AMD's Scalable MCA systems will incorporate two new
>> bits in MCA_STATUS and MCA_CONFIG MSRs. These bits will indicate if a
>> valid System Physical Address (SPA) is present in MCA_ADDR.
>>
>> PhysAddrValidSupported bit (MCA_CONFIG[11]) serves as the architectural
>> indicator and states if PhysAddrV bit (MCA_STATUS[54]) is Reserved or
>> if it indicates validity of SPA in MCA_ADDR.
>>
>> PhysAddrV bit (MCA_STATUS[54]) advertises if MCA_ADDR contains valid
>> SPA or if it is implementation specific.
>>
>> Use and prefer MCA_STATUS[PhysAddrV] when checking for a usable address.
>>
>> Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
>> ---
> 
> Minor nit:
> The $SUBJECT could be simpler, like "Use new physical address valid
> bit". And leave the details and proper field names for the commit
> message.
> 
> But in any case, looks good to me.
> 
> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com>
> 
Thanks Yazen!

Hi Boris,

If you don't have any further feedback on this, do you want me to send
another version with the subject changed per Yazen's recommendation?
Or would you take care of it at your end?

> Thanks,
> Yazen

-- 
Thanks,
Avadhut Naik