[PATCH v6 3/5] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers

Avadhut Naik posted 5 patches 1 year, 3 months ago
There is a newer version of this series
[PATCH v6 3/5] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
Posted by Avadhut Naik 1 year, 3 months ago
Starting with Zen4, AMD's Scalable MCA systems incorporate two new
registers: MCA_SYND1 and MCA_SYND2.

These registers will include supplemental error information in addition
to the existing MCA_SYND register. The data within these registers is
considered valid if MCA_STATUS[SyndV] is set.

Userspace error decoding tools like the rasdaemon gather related hardware
error information through the tracepoints. As such, these two registers
should be exported through the mce_record tracepoint so that tools like
rasdaemon can parse them and output the supplemental error information
like FRU Text contained in them.

[Yazen: Drop Yazen's Co-developed-by tag and moved SoB tag.]

Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
Signed-off-by: Yazen Ghannam <yazen.ghannam@amd.com>
Signed-off-by: Avadhut Naik <avadhut.naik@amd.com>
---
Changes in v2:
[1] https://lore.kernel.org/linux-edac/20240521125434.1555845-1-yazen.ghannam@amd.com/
[2] https://lore.kernel.org/linux-edac/20240523155641.2805411-1-yazen.ghannam@amd.com/

1. Drop dependencies on sets [1] and [2] above and rebase on top of
tip/master.

Changes in v3:
1. Move wrapper changes required in mce_read_aux() and mce_no_way_out()
from this patch to the first patch.
2. Add comments to explain the new wrapper's purpose.
3. Modify commit message per feedback received.
4. Fix SoB chain to properly reflect the patch path.

Changes in v4:
1. Rebase on top of tip/master to avoid merge conflicts.

Changes in v5:
1. Remove "len" field since the length of a dynamic array can be fetched
from its metadata.
2. Substitute __print_array() with __print_dynamic_array().

Changes in v6:
1. Rebase on top of tip/master.
2. Use the newly introduced to_mce_hw_err macro in amd_decode_mce().
---
 arch/x86/include/asm/mce.h      | 22 ++++++++++++++++++++++
 arch/x86/include/uapi/asm/mce.h |  3 ++-
 arch/x86/kernel/cpu/mce/amd.c   |  5 ++++-
 arch/x86/kernel/cpu/mce/core.c  |  9 ++++++++-
 drivers/edac/mce_amd.c          |  8 ++++++--
 include/trace/events/mce.h      |  7 +++++--
 6 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 4e45f45673a3..c2466b20fe79 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -122,6 +122,9 @@
 #define MSR_AMD64_SMCA_MC0_DESTAT	0xc0002008
 #define MSR_AMD64_SMCA_MC0_DEADDR	0xc0002009
 #define MSR_AMD64_SMCA_MC0_MISC1	0xc000200a
+/* Registers MISC2 to MISC4 are at offsets B to D. */
+#define MSR_AMD64_SMCA_MC0_SYND1	0xc000200e
+#define MSR_AMD64_SMCA_MC0_SYND2	0xc000200f
 #define MSR_AMD64_SMCA_MCx_CTL(x)	(MSR_AMD64_SMCA_MC0_CTL + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_STATUS(x)	(MSR_AMD64_SMCA_MC0_STATUS + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_ADDR(x)	(MSR_AMD64_SMCA_MC0_ADDR + 0x10*(x))
@@ -132,6 +135,8 @@
 #define MSR_AMD64_SMCA_MCx_DESTAT(x)	(MSR_AMD64_SMCA_MC0_DESTAT + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_DEADDR(x)	(MSR_AMD64_SMCA_MC0_DEADDR + 0x10*(x))
 #define MSR_AMD64_SMCA_MCx_MISCy(x, y)	((MSR_AMD64_SMCA_MC0_MISC1 + y) + (0x10*(x)))
+#define MSR_AMD64_SMCA_MCx_SYND1(x)	(MSR_AMD64_SMCA_MC0_SYND1 + 0x10*(x))
+#define MSR_AMD64_SMCA_MCx_SYND2(x)	(MSR_AMD64_SMCA_MC0_SYND2 + 0x10*(x))
 
 #define XEC(x, mask)			(((x) >> 16) & mask)
 
@@ -190,9 +195,26 @@ enum mce_notifier_prios {
 /**
  * struct mce_hw_err - Hardware Error Record.
  * @m:		Machine Check record.
+ * @vendor:	Vendor-specific error information.
+ *
+ * Vendor-specific fields should not be added to struct mce.
+ * Instead, vendors should export their vendor-specific data
+ * through their structure in the vendor union below.
+ *
+ * AMD's vendor data is parsed by error decoding tools for
+ * supplemental error information. Thus, current offsets of
+ * existing fields must be maintained.
+ * Only add new fields at the end of AMD's vendor structure.
  */
 struct mce_hw_err {
 	struct mce m;
+
+	union vendor_info {
+		struct {
+			u64 synd1;		/* MCA_SYND1 MSR */
+			u64 synd2;		/* MCA_SYND2 MSR */
+		} amd;
+	} vendor;
 };
 
 #define	to_mce_hw_err(mce) container_of(mce, struct mce_hw_err, m)
diff --git a/arch/x86/include/uapi/asm/mce.h b/arch/x86/include/uapi/asm/mce.h
index db9adc081c5a..cb6b48a7c22b 100644
--- a/arch/x86/include/uapi/asm/mce.h
+++ b/arch/x86/include/uapi/asm/mce.h
@@ -8,7 +8,8 @@
 /*
  * Fields are zero when not available. Also, this struct is shared with
  * userspace mcelog and thus must keep existing fields at current offsets.
- * Only add new fields to the end of the structure
+ * Only add new, shared fields to the end of the structure.
+ * Do not add vendor-specific fields.
  */
 struct mce {
 	__u64 status;		/* Bank's MCi_STATUS MSR */
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 5b4d266500b2..6ca80fff1fea 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -797,8 +797,11 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 	if (mce_flags.smca) {
 		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
 
-		if (m->status & MCI_STATUS_SYNDV)
+		if (m->status & MCI_STATUS_SYNDV) {
 			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
+			rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(bank), err.vendor.amd.synd1);
+			rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(bank), err.vendor.amd.synd2);
+		}
 	}
 
 	mce_log(&err);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index d3ac1a738b2e..b7f4a49a1053 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -202,6 +202,10 @@ static void __print_mce(struct mce_hw_err *err)
 	if (mce_flags.smca) {
 		if (m->synd)
 			pr_cont("SYND %llx ", m->synd);
+		if (err->vendor.amd.synd1)
+			pr_cont("SYND1 %llx ", err->vendor.amd.synd1);
+		if (err->vendor.amd.synd2)
+			pr_cont("SYND2 %llx ", err->vendor.amd.synd2);
 		if (m->ipid)
 			pr_cont("IPID %llx ", m->ipid);
 	}
@@ -678,8 +682,11 @@ static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 	if (mce_flags.smca) {
 		m->ipid = mce_rdmsrl(MSR_AMD64_SMCA_MCx_IPID(i));
 
-		if (m->status & MCI_STATUS_SYNDV)
+		if (m->status & MCI_STATUS_SYNDV) {
 			m->synd = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND(i));
+			err->vendor.amd.synd1 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND1(i));
+			err->vendor.amd.synd2 = mce_rdmsrl(MSR_AMD64_SMCA_MCx_SYND2(i));
+		}
 	}
 }
 
diff --git a/drivers/edac/mce_amd.c b/drivers/edac/mce_amd.c
index 8130c3dc64da..194d9fd47d20 100644
--- a/drivers/edac/mce_amd.c
+++ b/drivers/edac/mce_amd.c
@@ -793,6 +793,7 @@ static int
 amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 {
 	struct mce *m = (struct mce *)data;
+	struct mce_hw_err *err = to_mce_hw_err(m);
 	unsigned int fam = x86_family(m->cpuid);
 	int ecc;
 
@@ -850,8 +851,11 @@ amd_decode_mce(struct notifier_block *nb, unsigned long val, void *data)
 	if (boot_cpu_has(X86_FEATURE_SMCA)) {
 		pr_emerg(HW_ERR "IPID: 0x%016llx", m->ipid);
 
-		if (m->status & MCI_STATUS_SYNDV)
-			pr_cont(", Syndrome: 0x%016llx", m->synd);
+		if (m->status & MCI_STATUS_SYNDV) {
+			pr_cont(", Syndrome: 0x%016llx\n", m->synd);
+			pr_emerg(HW_ERR "Syndrome1: 0x%016llx, Syndrome2: 0x%016llx",
+				 err->vendor.amd.synd1, err->vendor.amd.synd2);
+		}
 
 		pr_cont("\n");
 
diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index 65aba1afcd07..ca6fa01791f2 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -43,6 +43,7 @@ TRACE_EVENT(mce_record,
 		__field(	u8,		bank		)
 		__field(	u8,		cpuvendor	)
 		__field(	u32,		microcode	)
+		__dynamic_array(u8, v_data, sizeof(err->vendor))
 	),
 
 	TP_fast_assign(
@@ -65,9 +66,10 @@ TRACE_EVENT(mce_record,
 		__entry->bank		= err->m.bank;
 		__entry->cpuvendor	= err->m.cpuvendor;
 		__entry->microcode	= err->m.microcode;
+		memcpy(__get_dynamic_array(v_data), &err->vendor, sizeof(err->vendor));
 	),
 
-	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR: %016Lx, MISC: %016Lx, SYND: %016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x",
+	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016llx, IPID: %016llx, ADDR: %016llx, MISC: %016llx, SYND: %016llx, RIP: %02x:<%016llx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x, vendor data: %s",
 		__entry->cpu,
 		__entry->mcgcap, __entry->mcgstatus,
 		__entry->bank, __entry->status,
@@ -83,7 +85,8 @@ TRACE_EVENT(mce_record,
 		__entry->walltime,
 		__entry->socketid,
 		__entry->apicid,
-		__entry->microcode)
+		__entry->microcode,
+		__print_dynamic_array(v_data, 8))
 );
 
 #endif /* _TRACE_MCE_H */
-- 
2.43.0
RE: [PATCH v6 3/5] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
Posted by Zhuo, Qiuxu 1 year, 3 months ago
> From: Avadhut Naik <avadhut.naik@amd.com>
> [...]
> --- a/include/trace/events/mce.h
> +++ b/include/trace/events/mce.h
> @@ -43,6 +43,7 @@ TRACE_EVENT(mce_record,
>  		__field(	u8,		bank		)
>  		__field(	u8,		cpuvendor	)
>  		__field(	u32,		microcode	)
> +		__dynamic_array(u8, v_data, sizeof(err->vendor))
>  	),
> 
>  	TP_fast_assign(
> @@ -65,9 +66,10 @@ TRACE_EVENT(mce_record,
>  		__entry->bank		= err->m.bank;
>  		__entry->cpuvendor	= err->m.cpuvendor;
>  		__entry->microcode	= err->m.microcode;
> +		memcpy(__get_dynamic_array(v_data), &err->vendor,
> +sizeof(err->vendor));
>  	),
> 
> -	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx,
> ADDR: %016Lx, MISC: %016Lx, SYND: %016Lx, RIP: %02x:<%016Lx>, TSC: %llx,
> PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x,
> microcode: %x",
> +	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016llx, IPID: %016llx,
> +ADDR: %016llx, MISC: %016llx, SYND: %016llx, RIP: %02x:<%016llx>, TSC:
> +%llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC:
> +%x, microcode: %x, vendor data: %s",
>  		__entry->cpu,
>  		__entry->mcgcap, __entry->mcgstatus,
>  		__entry->bank, __entry->status,
> @@ -83,7 +85,8 @@ TRACE_EVENT(mce_record,
>  		__entry->walltime,
>  		__entry->socketid,
>  		__entry->apicid,
> -		__entry->microcode)
> +		__entry->microcode,
> +		__print_dynamic_array(v_data, 8))

What is the 2nd parameter '8' about? 

The 2nd parameter is about the element size. 
The element type is 'u8', as defined above. 
Therefore:

    __print_dynamic_array(v_data, sizeof(u8)))
    
-Qiuxu

[...]
[PATCH v6 3/5] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
Posted by Naik, Avadhut 1 year, 3 months ago

On 10/17/2024 02:09, Zhuo, Qiuxu wrote:
>> From: Avadhut Naik <avadhut.naik@amd.com>
>> [...]
>> --- a/include/trace/events/mce.h
>> +++ b/include/trace/events/mce.h
>> @@ -43,6 +43,7 @@ TRACE_EVENT(mce_record,
>>  		__field(	u8,		bank		)
>>  		__field(	u8,		cpuvendor	)
>>  		__field(	u32,		microcode	)
>> +		__dynamic_array(u8, v_data, sizeof(err->vendor))
>>  	),
>>
>>  	TP_fast_assign(
>> @@ -65,9 +66,10 @@ TRACE_EVENT(mce_record,
>>  		__entry->bank		= err->m.bank;
>>  		__entry->cpuvendor	= err->m.cpuvendor;
>>  		__entry->microcode	= err->m.microcode;
>> +		memcpy(__get_dynamic_array(v_data), &err->vendor,
>> +sizeof(err->vendor));
>>  	),
>>
>> -	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx,
>> ADDR: %016Lx, MISC: %016Lx, SYND: %016Lx, RIP: %02x:<%016Lx>, TSC: %llx,
>> PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x,
>> microcode: %x",
>> +	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016llx, IPID: %016llx,
>> +ADDR: %016llx, MISC: %016llx, SYND: %016llx, RIP: %02x:<%016llx>, TSC:
>> +%llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC:
>> +%x, microcode: %x, vendor data: %s",
>>  		__entry->cpu,
>>  		__entry->mcgcap, __entry->mcgstatus,
>>  		__entry->bank, __entry->status,
>> @@ -83,7 +85,8 @@ TRACE_EVENT(mce_record,
>>  		__entry->walltime,
>>  		__entry->socketid,
>>  		__entry->apicid,
>> -		__entry->microcode)
>> +		__entry->microcode,
>> +		__print_dynamic_array(v_data, 8))
> 
> What is the 2nd parameter '8' about? 
> 
> The 2nd parameter is about the element size. 
> The element type is 'u8', as defined above. 
> Therefore:
> 
>     __print_dynamic_array(v_data, sizeof(u8)))
>     
> -Qiuxu
> 
IIUC, the second parameter above determines how the dynamic
array is parsed and logged. The value of 8 means that the
array will be traversed with a u64 pointer i.e. data within
the array will be logged by the tracepoint in chunks of 8
bytes. Something like below:

vendor data: {0x3a726f7461636f4c,0x30434d305020,0x27000003fd}

This seems convenient since, AFAIK, MCA registers on x86-64
are of 8 bytes.

If we use sizeof(u8) (which equates to 1) above, then u8
pointer will be used for traversing the dynamic array and each
byte within the registers will be logged individually.
Something like below.

vendor data: {0x4c,0x6f,0x63,0x61,0x74,0x6f,0x72,0x3a,0x20,0x50,0x30,0x4d,0x43,0x30,0x0,0x0,0xfd,0x3,0x0,0x0,0x27,0x0,0x0,0x0}

Combined with endianness of the processor, this seems somewhat
inconvenient to decipher. Would you agree?

> [...]

-- 
Thanks,
Avadhut Naik
RE: [PATCH v6 3/5] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
Posted by Zhuo, Qiuxu 1 year, 3 months ago
> From: Naik, Avadhut <avadnaik@amd.com>
> [...]
> >     __print_dynamic_array(v_data, sizeof(u8)))
> >
> > -Qiuxu
> >
> IIUC, the second parameter above determines how the dynamic array is
> parsed and logged. The value of 8 means that the array will be traversed with
> a u64 pointer i.e. data within the array will be logged by the tracepoint in
> chunks of 8 bytes. Something like below:
> 
> vendor data: {0x3a726f7461636f4c,0x30434d305020,0x27000003fd}
> 
> This seems convenient since, AFAIK, MCA registers on x86-64 are of 8 bytes.

So, this is based on the assumption that all vendor data fields are of the u64 type, which may 
NOT be true for other x86 vendors in the future. In case there is some non-u64 vendor data in 
the future, the parser would need to break the u64 tracing data into u8 data and then composite 
the split u8 data into other types like u16 or u32, which seems even inconvenient.

IMHO: Printing the u8 tracing data and leaving the vendor-specific tool to parse the u8 data is a
more flexible and balanced approach.

Maybe Boris and Tony could provide more comments here.

> If we use sizeof(u8) (which equates to 1) above, then u8 pointer will be used
> for traversing the dynamic array and each byte within the registers will be
> logged individually.
> Something like below.
> 
> vendor data:
> {0x4c,0x6f,0x63,0x61,0x74,0x6f,0x72,0x3a,0x20,0x50,0x30,0x4d,0x43,0x30,0x0,
> 0x0,0xfd,0x3,0x0,0x0,0x27,0x0,0x0,0x0}
> Combined with endianness of the processor, this seems somewhat
> inconvenient to decipher. Would you agree?

x86 is in little-endian, which is consistent across all x86 processors. 
Endianness should not pose any inconvenience.

Thanks!
-Qiuxu
RE: [PATCH v6 3/5] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
Posted by Luck, Tony 1 year, 3 months ago
> So, this is based on the assumption that all vendor data fields are of the u64 type, which may
> NOT be true for other x86 vendors in the future. In case there is some non-u64 vendor data in
> the future, the parser would need to break the u64 tracing data into u8 data and then composite
> the split u8 data into other types like u16 or u32, which seems even inconvenient.
>
> IMHO: Printing the u8 tracing data and leaving the vendor-specific tool to parse the u8 data is a
> more flexible and balanced approach.
>
> Maybe Boris and Tony could provide more comments here.

I agree. It's hard to predict the future, but there seems to be no guarantee that
vendor specific fields will always be "u64" sized. Perhaps the MSR that data
is picked from has only a few bits defined so we elect to save those bits in
some smaller data type.

-Tony
Re: [PATCH v6 3/5] x86/mce, EDAC/mce_amd: Add support for new MCA_SYND{1,2} registers
Posted by Naik, Avadhut 1 year, 3 months ago

On 10/18/2024 10:28, Luck, Tony wrote:
>> So, this is based on the assumption that all vendor data fields are of the u64 type, which may
>> NOT be true for other x86 vendors in the future. In case there is some non-u64 vendor data in
>> the future, the parser would need to break the u64 tracing data into u8 data and then composite
>> the split u8 data into other types like u16 or u32, which seems even inconvenient.
>>
>> IMHO: Printing the u8 tracing data and leaving the vendor-specific tool to parse the u8 data is a
>> more flexible and balanced approach.
>>
>> Maybe Boris and Tony could provide more comments here.
> 
> I agree. It's hard to predict the future, but there seems to be no guarantee that
> vendor specific fields will always be "u64" sized. Perhaps the MSR that data
> is picked from has only a few bits defined so we elect to save those bits in
> some smaller data type.
> 
> -Tony
Okay, sounds good! As recommended, will change that parameter to sizeof(u8).

-- 
Thanks,
Avadhut Naik