[PATCH v1 1/4] x86/mce: Collect error message for severities below MCE_PANIC_SEVERITY

Shuai Xue posted 4 patches 10 months, 1 week ago
Posted by Shuai Xue 10 months, 1 week ago
Currently, mce_no_way_out() only collects an error message (and reads the
bank's auxiliary registers) when the error severity reaches
`MCE_PANIC_SEVERITY`. To improve diagnostics, also collect the message
when the severity is below `MCE_PANIC_SEVERITY`, keeping the message from
the most severe bank.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 arch/x86/kernel/cpu/mce/core.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0dc00c9894c7..2919a077cd66 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -925,11 +925,12 @@ static __always_inline void quirk_zen_ifu(int bank, struct mce *m, struct pt_reg
  * Do a quick check if any of the events requires a panic.
  * This decides if we keep the events around or clear them.
  */
-static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, unsigned long *validp,
-					  struct pt_regs *regs)
+static __always_inline bool mce_no_way_out(struct mce_hw_err *err, char **msg,
+					   unsigned long *validp,
+					   struct pt_regs *regs)
 {
 	struct mce *m = &err->m;
-	char *tmp = *msg;
+	char *tmp = *msg, cur_sev = MCE_NO_SEVERITY, sev;
 	int i;
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
@@ -945,13 +946,17 @@ static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, un
 			quirk_zen_ifu(i, m, regs);
 
 		m->bank = i;
-		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
+		sev = mce_severity(m, regs, &tmp, true);
+		if (sev >= cur_sev) {
 			mce_read_aux(err, i);
 			*msg = tmp;
-			return 1;
+			cur_sev = sev;
 		}
+
+		if (cur_sev == MCE_PANIC_SEVERITY)
+			return true;
 	}
-	return 0;
+	return false;
 }
 
 /*
-- 
2.39.3
RE: [PATCH v1 1/4] x86/mce: Collect error message for severities below MCE_PANIC_SEVERITY
Posted by Luck, Tony 10 months, 1 week ago
> +	char *tmp = *msg, cur_sev = MCE_NO_SEVERITY, sev;

Should cur_sev and sev be of type "int" (since that's the type returned by mce_severity())?

It doesn't matter today since the list of return values does fit into "char", but it is setting up
to fail if that should ever change.

-Tony
Re: [PATCH v1 1/4] x86/mce: Collect error message for severities below MCE_PANIC_SEVERITY
Posted by Shuai Xue 10 months, 1 week ago

On 2025/2/12 00:51, Luck, Tony wrote:
>> +	char *tmp = *msg, cur_sev = MCE_NO_SEVERITY, sev;
> 
> Should cur_sev and sev be of type "int" (since that's the type returned by mce_severity())?
> 
> It doesn't matter today since the list of return values does fit into "char", but it is setting up
> to fail if that should ever change.
> 
> -Tony

You are right. I had only considered that the 'sev' field of struct
severity is an unsigned char.

Will fix it in the next version.

Thanks.
Shuai