[PATCH v2 0/3] x86/mce: Add Zhaoxin MCE support

Tony W Wang-oc posted 3 patches 2 months, 2 weeks ago
There is a newer version of this series
arch/x86/Kconfig                    |  8 ++++
arch/x86/kernel/cpu/mce/Makefile    |  2 +-
arch/x86/kernel/cpu/mce/core.c      | 70 +++++++++--------------------
arch/x86/kernel/cpu/mce/intel.c     |  8 ++--
arch/x86/kernel/cpu/mce/internal.h  | 14 +++++-
arch/x86/kernel/cpu/mce/threshold.c |  4 ++
arch/x86/kernel/cpu/mce/zhaoxin.c   | 53 ++++++++++++++++++++++
7 files changed, 104 insertions(+), 55 deletions(-)
create mode 100644 arch/x86/kernel/cpu/mce/zhaoxin.c
[PATCH v2 0/3] x86/mce: Add Zhaoxin MCE support
Posted by Tony W Wang-oc 2 months, 2 weeks ago
From: Lyle Li <LyleLi@zhaoxin.com>

Zhaoxin consists of two vendors, X86_VENDOR_ZHAOXIN and
X86_VENDOR_CENTAUR, so add the centaur vendor to support
Zhaoxin MCA in mce/core.c and mce/intel.c.

For the sake of code standardization, add zhaoxin.c to
override the Zhaoxin MCA code.

Zhaoxin CPUs support CMCI compatible with Intel, because
Zhaoxin's UCR error is not reported through CMCI, and in
order to be compatible with intel's CMCI code, so add Zhaoxin
CMCI storm toggle to support the new CMCI storm switching
in mce/intel.c, mce/zhaoxin.c, mce/threshold.c, and mce/internal.h.

v1->v2:
 - Fix multiple definition of "mce_zhaoxin_feature_init" (patch 2/3)
 - Fix multiple definition of "mce_zhaoxin_feature_clear" (patch 2/3)
 - Fix multiple definition of "mce_zhaoxin_handle_storm" (patch 3/3)

Lyle Li (3):
  x86/mce: Add centaur vendor to support Zhaoxin MCA
  x86/mce: Add zhaoxin.c to support Zhaoxin MCA
  x86/mce: Add CMCI storm switching support for Zhaoxin

 arch/x86/Kconfig                    |  8 ++++
 arch/x86/kernel/cpu/mce/Makefile    |  2 +-
 arch/x86/kernel/cpu/mce/core.c      | 70 +++++++++--------------------
 arch/x86/kernel/cpu/mce/intel.c     |  8 ++--
 arch/x86/kernel/cpu/mce/internal.h  | 14 +++++-
 arch/x86/kernel/cpu/mce/threshold.c |  4 ++
 arch/x86/kernel/cpu/mce/zhaoxin.c   | 53 ++++++++++++++++++++++
 7 files changed, 104 insertions(+), 55 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/mce/zhaoxin.c

-- 
2.34.1
[PATCH v3 0/3] x86/mce: Add Zhaoxin MCE support
Posted by Tony W Wang-oc 2 months, 1 week ago
From: Lyle Li <LyleLi@zhaoxin.com>

Zhaoxin consists of two vendors, X86_VENDOR_ZHAOXIN and
X86_VENDOR_CENTAUR, so add the centaur vendor to support
Zhaoxin MCA in mce/core.c and mce/intel.c.

For the sake of code standardization, add zhaoxin.c to
override the Zhaoxin MCA code.

Zhaoxin CPUs support CMCI compatible with Intel, because
Zhaoxin's UCR error is not reported through CMCI, and in
order to be compatible with intel's CMCI code, so add Zhaoxin
CMCI storm toggle to support the new CMCI storm switching
in mce/intel.c, mce/zhaoxin.c, mce/threshold.c, and mce/internal.h.

v2->v3:
 - Consolidate the MCA code for Zhaoxin and Centaur regarding the
   broadcast MCE configuration (patch 1/3, 2/3)

v1->v2:
 - Fix multiple definition of "mce_zhaoxin_feature_init" (patch 2/3)
 - Fix multiple definition of "mce_zhaoxin_feature_clear" (patch 2/3)
 - Fix multiple definition of "mce_zhaoxin_handle_storm" (patch 3/3)

Lyle Li (3):
  x86/mce: Add centaur vendor to support Zhaoxin MCA
  x86/mce: Add zhaoxin.c to support Zhaoxin MCA
  x86/mce: Add CMCI storm switching support for Zhaoxin

 arch/x86/Kconfig                    |  8 +++
 arch/x86/kernel/cpu/mce/Makefile    |  2 +-
 arch/x86/kernel/cpu/mce/core.c      | 69 ++++--------------------
 arch/x86/kernel/cpu/mce/intel.c     |  8 +--
 arch/x86/kernel/cpu/mce/internal.h  | 14 ++++-
 arch/x86/kernel/cpu/mce/threshold.c |  4 ++
 arch/x86/kernel/cpu/mce/zhaoxin.c   | 83 +++++++++++++++++++++++++++++
 7 files changed, 122 insertions(+), 66 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/mce/zhaoxin.c

-- 
2.34.1
[PATCH v4 0/4] x86/mce: Add Zhaoxin MCE support and remove
Posted by Tony W Wang-oc 2 months, 1 week ago
From: Lyle Li <LyleLi@zhaoxin.com>

Zhaoxin consists of two vendors, X86_VENDOR_ZHAOXIN and
X86_VENDOR_CENTAUR, so add the centaur vendor to support
Zhaoxin MCA in mce/core.c and mce/intel.c.

Since all major vendors do not disable MCA_CTL after initialization,
functions that disable error reporting should be removed in mce/core.c.

For the sake of code standardization, add zhaoxin.c to
override the Zhaoxin MCA code.

Zhaoxin CPUs support CMCI compatible with Intel, because
Zhaoxin's UCR error is not reported through CMCI, and in
order to be compatible with intel's CMCI code, so add Zhaoxin
CMCI storm toggle to support the new CMCI storm switching
in mce/intel.c, mce/zhaoxin.c, mce/threshold.c, and mce/internal.h.

v3->v4:
 - Remove functions that disable error reporting (patch 2/4)

v2->v3:
 - Consolidate the MCA code for Zhaoxin and Centaur regarding the
   broadcast MCE configuration (patch 3/4)

v1->v2:
 - Fix multiple definition of "mce_zhaoxin_feature_init" (patch 3/4)
 - Fix multiple definition of "mce_zhaoxin_feature_clear" (patch 3/4)
 - Fix multiple definition of "mce_zhaoxin_handle_storm" (patch 4/4)

Lyle Li (4):
  x86/mce: Add centaur vendor to support Zhaoxin MCA
  x86/mce: Remove functions that disable error reporting
  x86/mce: Add zhaoxin.c to support Zhaoxin MCA
  x86/mce: Add CMCI storm switching support for Zhaoxin

 arch/x86/Kconfig                    |   8 ++
 arch/x86/kernel/cpu/mce/Makefile    |   2 +-
 arch/x86/kernel/cpu/mce/core.c      | 116 ++--------------------------
 arch/x86/kernel/cpu/mce/intel.c     |   8 +-
 arch/x86/kernel/cpu/mce/internal.h  |  14 +++-
 arch/x86/kernel/cpu/mce/threshold.c |   4 +
 arch/x86/kernel/cpu/mce/zhaoxin.c   |  83 ++++++++++++++++++++
 7 files changed, 120 insertions(+), 115 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/mce/zhaoxin.c

-- 
2.34.1
[PATCH v5 0/4] x86/mce: Add Zhaoxin MCE support and remove functions that disable error reporting
Posted by Tony W Wang-oc 1 month, 2 weeks ago
From: Lyle Li <LyleLi@zhaoxin.com>

Zhaoxin consists of two vendors, X86_VENDOR_ZHAOXIN and
X86_VENDOR_CENTAUR. Add the missing Centaur vendor to
support Zhaoxin MCA.

Since all major vendors do not disable MCA_CTL after initialization,
remove the functions that disable error reporting.

For the sake of code standardization, add zhaoxin.c to
override the Zhaoxin MCA code.

Zhaoxin CPUs support CMCI which is compatible with Intel, but
their UCR errors are not reported through CMCI like Intel's. To
be compatible with intel's CMCI code, add Zhaoxin's specific
CMCI storm toggle.

v4->v5:
 - Simplify the commit message (patch 1/4)
 - Modify the commit message (patch 2/4)
 - Simplify the code comments and modify the compilation order in the
   Makefile (patch 3/4)
 - Simplify the commit message and modify the order of header file
   imports (patch 4/4)

v3->v4:
 - Remove functions that disable error reporting (patch 2/4)

v2->v3:
 - Consolidate the MCA code for Zhaoxin and Centaur regarding the
   broadcast MCE configuration (patch 3/4)

v1->v2:
 - Fix multiple definition of "mce_zhaoxin_feature_init" (patch 3/4)
 - Fix multiple definition of "mce_zhaoxin_feature_clear" (patch 3/4)
 - Fix multiple definition of "mce_zhaoxin_handle_storm" (patch 4/4)

Lyle Li (4):
  x86/mce: Add Centaur vendor to support Zhaoxin MCA
  x86/mce: Remove functions that disable error reporting
  x86/mce: Add zhaoxin.c to support Zhaoxin MCA
  x86/mce: Add CMCI storm switching support for Zhaoxin

 arch/x86/Kconfig                    |   8 ++
 arch/x86/kernel/cpu/mce/Makefile    |   1 +
 arch/x86/kernel/cpu/mce/core.c      | 116 ++--------------------------
 arch/x86/kernel/cpu/mce/intel.c     |   8 +-
 arch/x86/kernel/cpu/mce/internal.h  |  14 ++++
 arch/x86/kernel/cpu/mce/threshold.c |   4 +
 arch/x86/kernel/cpu/mce/zhaoxin.c   |  82 ++++++++++++++++++++
 7 files changed, 120 insertions(+), 113 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/mce/zhaoxin.c

-- 
2.34.1
[PATCH v5 1/4] x86/mce: Add Centaur vendor to support Zhaoxin MCA
Posted by Tony W Wang-oc 1 month, 2 weeks ago
From: Lyle Li <LyleLi@zhaoxin.com>

Zhaoxin consists of two vendors, X86_VENDOR_ZHAOXIN and
X86_VENDOR_CENTAUR. Add the missing Centaur vendor to
support Zhaoxin MCA.

Signed-off-by: Lyle Li <LyleLi@zhaoxin.com>
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c  | 9 +++++++--
 arch/x86/kernel/cpu/mce/intel.c | 3 ++-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2a938f429c4d..1b647869c320 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -508,6 +508,7 @@ bool mce_usable_address(struct mce *m)
 
 	case X86_VENDOR_INTEL:
 	case X86_VENDOR_ZHAOXIN:
+	case X86_VENDOR_CENTAUR:
 		return intel_mce_usable_address(m);
 
 	default:
@@ -525,6 +526,7 @@ bool mce_is_memory_error(struct mce *m)
 
 	case X86_VENDOR_INTEL:
 	case X86_VENDOR_ZHAOXIN:
+	case X86_VENDOR_CENTAUR:
 		/*
 		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
 		 *
@@ -1254,7 +1256,8 @@ static noinstr bool mce_check_crashing_cpu(void)
 
 		mcgstatus = __rdmsr(MSR_IA32_MCG_STATUS);
 
-		if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN) {
+		if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN ||
+		    boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR) {
 			if (mcgstatus & MCG_STATUS_LMCES)
 				return false;
 		}
@@ -1528,7 +1531,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * on Intel, Zhaoxin only.
 	 */
 	if (m.cpuvendor == X86_VENDOR_INTEL ||
-	    m.cpuvendor == X86_VENDOR_ZHAOXIN)
+	    m.cpuvendor == X86_VENDOR_ZHAOXIN ||
+	    m.cpuvendor == X86_VENDOR_CENTAUR)
 		lmce = m.mcgstatus & MCG_STATUS_LMCES;
 
 	/*
@@ -2099,6 +2103,7 @@ static void __mcheck_cpu_clear_vendor(struct cpuinfo_x86 *c)
 		break;
 
 	case X86_VENDOR_ZHAOXIN:
+	case X86_VENDOR_CENTAUR:
 		mce_zhaoxin_feature_clear(c);
 		break;
 
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index f6103e6bf69a..b7e67f4f7edd 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -88,7 +88,8 @@ static int cmci_supported(int *banks)
 	 * makes sure none of the backdoors are entered otherwise.
 	 */
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN)
+	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN &&
+	    boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR)
 		return 0;
 
 	if (!boot_cpu_has(X86_FEATURE_APIC) || lapic_get_maxlvt() < 6)
-- 
2.34.1
[PATCH v5 2/4] x86/mce: Remove functions that disable error reporting
Posted by Tony W Wang-oc 1 month, 2 weeks ago
From: Lyle Li <LyleLi@zhaoxin.com>

Since all major vendors do not disable MCA_CTL after initialization,
remove the functions that disable error reporting.

Signed-off-by: Lyle Li <LyleLi@zhaoxin.com>
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
 arch/x86/kernel/cpu/mce/core.c | 50 ----------------------------------
 1 file changed, 50 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 1b647869c320..f71b33b96b5b 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2383,53 +2383,6 @@ int __init mcheck_init(void)
  * mce_syscore: PM support
  */
 
-/*
- * Disable machine checks on suspend and shutdown. We can't really handle
- * them later.
- */
-static void mce_disable_error_reporting(void)
-{
-	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
-	int i;
-
-	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
-		struct mce_bank *b = &mce_banks[i];
-
-		if (b->init)
-			wrmsrl(mca_msr_reg(i, MCA_CTL), 0);
-	}
-	return;
-}
-
-static void vendor_disable_error_reporting(void)
-{
-	/*
-	 * Don't clear on Intel or AMD or Hygon or Zhaoxin CPUs. Some of these
-	 * MSRs are socket-wide. Disabling them for just a single offlined CPU
-	 * is bad, since it will inhibit reporting for all shared resources on
-	 * the socket like the last level cache (LLC), the integrated memory
-	 * controller (iMC), etc.
-	 */
-	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
-	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ||
-	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
-	    boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN)
-		return;
-
-	mce_disable_error_reporting();
-}
-
-static int mce_syscore_suspend(void)
-{
-	vendor_disable_error_reporting();
-	return 0;
-}
-
-static void mce_syscore_shutdown(void)
-{
-	vendor_disable_error_reporting();
-}
-
 /*
  * On resume clear all MCE state. Don't want to see leftovers from the BIOS.
  * Only one CPU is active at this time, the others get re-added later using
@@ -2443,8 +2396,6 @@ static void mce_syscore_resume(void)
 }
 
 static struct syscore_ops mce_syscore_ops = {
-	.suspend	= mce_syscore_suspend,
-	.shutdown	= mce_syscore_shutdown,
 	.resume		= mce_syscore_resume,
 };
 
@@ -2729,7 +2680,6 @@ static void mce_disable_cpu(void)
 	if (!cpuhp_tasks_frozen)
 		cmci_clear();
 
-	vendor_disable_error_reporting();
 }
 
 static void mce_reenable_cpu(void)
-- 
2.34.1
[PATCH v5 3/4] x86/mce: Add zhaoxin.c to support Zhaoxin MCA
Posted by Tony W Wang-oc 1 month, 2 weeks ago
From: Lyle Li <LyleLi@zhaoxin.com>

For the sake of code standardization, add zhaoxin.c to
override the Zhaoxin MCA code.

Signed-off-by: Lyle Li <LyleLi@zhaoxin.com>
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
 arch/x86/Kconfig                   |  8 ++++
 arch/x86/kernel/cpu/mce/Makefile   |  1 +
 arch/x86/kernel/cpu/mce/core.c     | 57 --------------------------
 arch/x86/kernel/cpu/mce/internal.h |  7 ++++
 arch/x86/kernel/cpu/mce/zhaoxin.c  | 64 ++++++++++++++++++++++++++++++
 5 files changed, 80 insertions(+), 57 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/mce/zhaoxin.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1ea18662942c..a7993835f460 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1183,6 +1183,14 @@ config X86_MCE_INTEL
 	  Additional support for intel specific MCE features such as
 	  the thermal monitor.
 
+config X86_MCE_ZHAOXIN
+	def_bool y
+	prompt "Zhaoxin MCE features"
+	depends on X86_MCE_INTEL
+	help
+	  Additional support for Zhaoxin specific MCE features such as
+	  the corrected machine check interrupt.
+
 config X86_MCE_AMD
 	def_bool y
 	prompt "AMD MCE features"
diff --git a/arch/x86/kernel/cpu/mce/Makefile b/arch/x86/kernel/cpu/mce/Makefile
index 015856abdbb1..7b52c8f2f08e 100644
--- a/arch/x86/kernel/cpu/mce/Makefile
+++ b/arch/x86/kernel/cpu/mce/Makefile
@@ -4,6 +4,7 @@ obj-y				=  core.o severity.o genpool.o
 obj-$(CONFIG_X86_ANCIENT_MCE)	+= winchip.o p5.o
 obj-$(CONFIG_X86_MCE_INTEL)	+= intel.o
 obj-$(CONFIG_X86_MCE_AMD)	+= amd.o
+obj-$(CONFIG_X86_MCE_ZHAOXIN)   += zhaoxin.o
 obj-$(CONFIG_X86_MCE_THRESHOLD) += threshold.o
 
 mce-inject-y			:= inject.o
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index f71b33b96b5b..a1684b73b349 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1970,17 +1970,6 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 			mce_flags.skx_repmov_quirk = 1;
 	}
 
-	if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
-		/*
-		 * All newer Zhaoxin CPUs support MCE broadcasting. Enable
-		 * synchronization with a one second timeout.
-		 */
-		if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
-			if (cfg->monarch_timeout < 0)
-				cfg->monarch_timeout = USEC_PER_SEC;
-		}
-	}
-
 	if (cfg->monarch_timeout < 0)
 		cfg->monarch_timeout = 0;
 	if (cfg->bootlog != 0)
@@ -2023,49 +2012,6 @@ static void __mcheck_cpu_init_early(struct cpuinfo_x86 *c)
 	}
 }
 
-static void mce_centaur_feature_init(struct cpuinfo_x86 *c)
-{
-	struct mca_config *cfg = &mca_cfg;
-
-	 /*
-	  * All newer Centaur CPUs support MCE broadcasting. Enable
-	  * synchronization with a one second timeout.
-	  */
-	if ((c->x86 == 6 && c->x86_model == 0xf && c->x86_stepping >= 0xe) ||
-	     c->x86 > 6) {
-		if (cfg->monarch_timeout < 0)
-			cfg->monarch_timeout = USEC_PER_SEC;
-	}
-}
-
-static void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c)
-{
-	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
-
-	/*
-	 * These CPUs have MCA bank 8 which reports only one error type called
-	 * SVAD (System View Address Decoder). The reporting of that error is
-	 * controlled by IA32_MC8.CTL.0.
-	 *
-	 * If enabled, prefetching on these CPUs will cause SVAD MCE when
-	 * virtual machines start and result in a system  panic. Always disable
-	 * bank 8 SVAD error by default.
-	 */
-	if ((c->x86 == 7 && c->x86_model == 0x1b) ||
-	    (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
-		if (this_cpu_read(mce_num_banks) > 8)
-			mce_banks[8].ctl = 0;
-	}
-
-	intel_init_cmci();
-	intel_init_lmce();
-}
-
-static void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c)
-{
-	intel_clear_lmce();
-}
-
 static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 {
 	switch (c->x86_vendor) {
@@ -2083,9 +2029,6 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 		break;
 
 	case X86_VENDOR_CENTAUR:
-		mce_centaur_feature_init(c);
-		break;
-
 	case X86_VENDOR_ZHAOXIN:
 		mce_zhaoxin_feature_init(c);
 		break;
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 43c7f3b71df5..fb9d8b5b3b75 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -336,4 +336,11 @@ static __always_inline u32 mca_msr_reg(int bank, enum mca_msr reg)
 }
 
 extern void (*mc_poll_banks)(void);
+#ifdef CONFIG_X86_MCE_ZHAOXIN
+void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c);
+void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c);
+#else
+static inline void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c) { }
+static inline void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c) { }
+#endif
 #endif /* __X86_MCE_INTERNAL_H__ */
diff --git a/arch/x86/kernel/cpu/mce/zhaoxin.c b/arch/x86/kernel/cpu/mce/zhaoxin.c
new file mode 100644
index 000000000000..6fdef2d24f31
--- /dev/null
+++ b/arch/x86/kernel/cpu/mce/zhaoxin.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Zhaoxin specific MCE features
+ * Author: Lyle Li
+ */
+#include <asm/msr.h>
+#include "internal.h"
+
+static void mce_zhaoxin_apply_mce_broadcast(struct cpuinfo_x86 *c)
+{
+	struct mca_config *cfg = &mca_cfg;
+
+	/* Older CPUs do not do MCE broadcast: */
+	if (c->x86 < 6)
+		return;
+
+	/* All newer ones do: */
+	if (c->x86 > 6)
+		goto mce_broadcast;
+
+	/* Family 6 is mixed: */
+	if (c->x86_vendor == X86_VENDOR_CENTAUR) {
+		if (c->x86_model == 0xf && c->x86_stepping >= 0xe)
+			goto mce_broadcast;
+	} else if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
+		if (c->x86_model == 0x19 || c->x86_model == 0x1f)
+			goto mce_broadcast;
+	}
+
+	return;
+
+mce_broadcast:
+	if (cfg->monarch_timeout <= 0)
+		cfg->monarch_timeout = USEC_PER_SEC;
+}
+
+void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c)
+{
+	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
+
+	/*
+	 * These CPUs have MCA bank 8 which reports only one error type called
+	 * SVAD (System View Address Decoder). The reporting of that error is
+	 * controlled by IA32_MC8.CTL.0.
+	 *
+	 * If enabled, prefetching on these CPUs will cause SVAD MCE when
+	 * virtual machines start and result in a system  panic. Always disable
+	 * bank 8 SVAD error by default.
+	 */
+	if ((c->x86 == 7 && c->x86_model == 0x1b) ||
+	    (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
+		if (this_cpu_read(mce_num_banks) > 8)
+			mce_banks[8].ctl = 0;
+	}
+
+	mce_zhaoxin_apply_mce_broadcast(c);
+	intel_init_cmci();
+	intel_init_lmce();
+}
+
+void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c)
+{
+	intel_clear_lmce();
+}
-- 
2.34.1
[PATCH v5 4/4] x86/mce: Add CMCI storm switching support for Zhaoxin
Posted by Tony W Wang-oc 1 month, 2 weeks ago
From: Lyle Li <LyleLi@zhaoxin.com>

Zhaoxin CPUs support CMCI which is compatible with Intel, but
their UCR errors are not reported through CMCI like Intel's. To
be compatible with intel's CMCI code, add Zhaoxin's specific
CMCI storm toggle.

Signed-off-by: Lyle Li <LyleLi@zhaoxin.com>
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
 arch/x86/kernel/cpu/mce/intel.c     |  5 ++---
 arch/x86/kernel/cpu/mce/internal.h  |  7 +++++++
 arch/x86/kernel/cpu/mce/threshold.c |  4 ++++
 arch/x86/kernel/cpu/mce/zhaoxin.c   | 18 ++++++++++++++++++
 4 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index b7e67f4f7edd..aa75e28486c3 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -45,7 +45,7 @@ static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned);
  * cmci_discover_lock protects against parallel discovery attempts
  * which could race against each other.
  */
-static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
+DEFINE_RAW_SPINLOCK(cmci_discover_lock);
 
 /*
  * On systems that do support CMCI but it's disabled, polling for MCEs can
@@ -61,7 +61,7 @@ static DEFINE_SPINLOCK(cmci_poll_lock);
  * MCi_CTL2 threshold for each bank when there is no storm.
  * Default value for each bank may have been set by BIOS.
  */
-static u16 cmci_threshold[MAX_NR_BANKS];
+u16 cmci_threshold[MAX_NR_BANKS];
 
 /*
  * High threshold to limit CMCI rate during storms. Max supported is
@@ -73,7 +73,6 @@ static u16 cmci_threshold[MAX_NR_BANKS];
  * to corrected errors, so keeping CMCI enabled means that uncorrected
  * errors will still be processed in a timely fashion.
  */
-#define CMCI_STORM_THRESHOLD	32749
 
 static int cmci_supported(int *banks)
 {
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index fb9d8b5b3b75..69377664bacf 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -6,6 +6,8 @@
 #define pr_fmt(fmt) "mce: " fmt
 
 #include <linux/device.h>
+#include <linux/spinlock.h>
+
 #include <asm/mce.h>
 
 enum severity_level {
@@ -336,11 +338,16 @@ static __always_inline u32 mca_msr_reg(int bank, enum mca_msr reg)
 }
 
 extern void (*mc_poll_banks)(void);
+#define CMCI_STORM_THRESHOLD    32749
+extern raw_spinlock_t cmci_discover_lock;
+extern u16 cmci_threshold[MAX_NR_BANKS];
 #ifdef CONFIG_X86_MCE_ZHAOXIN
 void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c);
 void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c);
+void mce_zhaoxin_handle_storm(int bank, bool on);
 #else
 static inline void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c) { }
 static inline void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c) { }
+static inline void mce_zhaoxin_handle_storm(int bank, bool on) { }
 #endif
 #endif /* __X86_MCE_INTERNAL_H__ */
diff --git a/arch/x86/kernel/cpu/mce/threshold.c b/arch/x86/kernel/cpu/mce/threshold.c
index 89e31e1e5c9c..200280387f04 100644
--- a/arch/x86/kernel/cpu/mce/threshold.c
+++ b/arch/x86/kernel/cpu/mce/threshold.c
@@ -63,6 +63,10 @@ static void mce_handle_storm(unsigned int bank, bool on)
 	case X86_VENDOR_INTEL:
 		mce_intel_handle_storm(bank, on);
 		break;
+	case X86_VENDOR_ZHAOXIN:
+	case X86_VENDOR_CENTAUR:
+		mce_zhaoxin_handle_storm(bank, on);
+		break;
 	}
 }
 
diff --git a/arch/x86/kernel/cpu/mce/zhaoxin.c b/arch/x86/kernel/cpu/mce/zhaoxin.c
index 6fdef2d24f31..6e38e2b8af20 100644
--- a/arch/x86/kernel/cpu/mce/zhaoxin.c
+++ b/arch/x86/kernel/cpu/mce/zhaoxin.c
@@ -62,3 +62,21 @@ void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c)
 {
 	intel_clear_lmce();
 }
+
+void mce_zhaoxin_handle_storm(int bank, bool on)
+{
+	unsigned long flags;
+	u64 val;
+
+	raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+	rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
+	if (on) {
+		val &= ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK);
+		val |= CMCI_STORM_THRESHOLD;
+	} else {
+		val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
+		val |= (MCI_CTL2_CMCI_EN | cmci_threshold[bank]);
+	}
+	wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
+	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+}
-- 
2.34.1
[PATCH v4 1/4] x86/mce: Add centaur vendor to support Zhaoxin MCA
Posted by Tony W Wang-oc 2 months, 1 week ago
From: Lyle Li <LyleLi@zhaoxin.com>

Zhaoxin consists of two vendors, X86_VENDOR_ZHAOXIN and
X86_VENDOR_CENTAUR, so add the centaur vendor to support
Zhaoxin MCA in mce/core.c and mce/intel.c.

Signed-off-by: Lyle Li <LyleLi@zhaoxin.com>
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
 arch/x86/kernel/cpu/mce/core.c  | 9 +++++++--
 arch/x86/kernel/cpu/mce/intel.c | 3 ++-
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index ad0623b65..4ad6b5083 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -496,6 +496,7 @@ bool mce_usable_address(struct mce *m)
 
 	case X86_VENDOR_INTEL:
 	case X86_VENDOR_ZHAOXIN:
+	case X86_VENDOR_CENTAUR:
 		return intel_mce_usable_address(m);
 
 	default:
@@ -513,6 +514,7 @@ bool mce_is_memory_error(struct mce *m)
 
 	case X86_VENDOR_INTEL:
 	case X86_VENDOR_ZHAOXIN:
+	case X86_VENDOR_CENTAUR:
 		/*
 		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
 		 *
@@ -1247,7 +1249,8 @@ static noinstr bool mce_check_crashing_cpu(void)
 
 		mcgstatus = __rdmsr(MSR_IA32_MCG_STATUS);
 
-		if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN) {
+		if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN ||
+		    boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR) {
 			if (mcgstatus & MCG_STATUS_LMCES)
 				return false;
 		}
@@ -1521,7 +1524,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * on Intel, Zhaoxin only.
 	 */
 	if (m.cpuvendor == X86_VENDOR_INTEL ||
-	    m.cpuvendor == X86_VENDOR_ZHAOXIN)
+	    m.cpuvendor == X86_VENDOR_ZHAOXIN ||
+	    m.cpuvendor == X86_VENDOR_CENTAUR)
 		lmce = m.mcgstatus & MCG_STATUS_LMCES;
 
 	/*
@@ -2092,6 +2096,7 @@ static void __mcheck_cpu_clear_vendor(struct cpuinfo_x86 *c)
 		break;
 
 	case X86_VENDOR_ZHAOXIN:
+	case X86_VENDOR_CENTAUR:
 		mce_zhaoxin_feature_clear(c);
 		break;
 
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index f6103e6bf..b7e67f4f7 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -88,7 +88,8 @@ static int cmci_supported(int *banks)
 	 * makes sure none of the backdoors are entered otherwise.
 	 */
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN)
+	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN &&
+	    boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR)
 		return 0;
 
 	if (!boot_cpu_has(X86_FEATURE_APIC) || lapic_get_maxlvt() < 6)
-- 
2.34.1
RE: [PATCH v4 1/4] x86/mce: Add centaur vendor to support Zhaoxin MCA
Posted by Zhuo, Qiuxu 1 month, 2 weeks ago
> From: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> [...]
> Subject: [PATCH v4 1/4] x86/mce: Add centaur vendor to support Zhaoxin MCA

s/centaur/Centaur/

> 
> From: Lyle Li <LyleLi@zhaoxin.com>
> 
> Zhaoxin consists of two vendors, X86_VENDOR_ZHAOXIN and
> X86_VENDOR_CENTAUR, so add the centaur vendor to support Zhaoxin MCA in
> mce/core.c and mce/intel.c.

s/centaur/Centaur

Simplify the commit message as follows:

   Zhaoxin consists of two vendors, X86_VENDOR_ZHAOXIN and X86_VENDOR_CENTAUR. Add the missing Centaur vendor to support Zhaoxin MCA.

Other than that:

   Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

> Signed-off-by: Lyle Li <LyleLi@zhaoxin.com>
> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> ---
> [...]
[PATCH v4 2/4] x86/mce: Remove functions that disable error reporting
Posted by Tony W Wang-oc 2 months, 1 week ago
From: Lyle Li <LyleLi@zhaoxin.com>

Since all major vendors do not disable MCA_CTL after initialization,
functions that disable error reporting should be removed in mce/core.c.

Signed-off-by: Lyle Li <LyleLi@zhaoxin.com>
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
 arch/x86/kernel/cpu/mce/core.c | 50 ----------------------------------
 1 file changed, 50 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 4ad6b5083..1654133da 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2376,53 +2376,6 @@ int __init mcheck_init(void)
  * mce_syscore: PM support
  */
 
-/*
- * Disable machine checks on suspend and shutdown. We can't really handle
- * them later.
- */
-static void mce_disable_error_reporting(void)
-{
-	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
-	int i;
-
-	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
-		struct mce_bank *b = &mce_banks[i];
-
-		if (b->init)
-			wrmsrl(mca_msr_reg(i, MCA_CTL), 0);
-	}
-	return;
-}
-
-static void vendor_disable_error_reporting(void)
-{
-	/*
-	 * Don't clear on Intel or AMD or Hygon or Zhaoxin CPUs. Some of these
-	 * MSRs are socket-wide. Disabling them for just a single offlined CPU
-	 * is bad, since it will inhibit reporting for all shared resources on
-	 * the socket like the last level cache (LLC), the integrated memory
-	 * controller (iMC), etc.
-	 */
-	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
-	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ||
-	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
-	    boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN)
-		return;
-
-	mce_disable_error_reporting();
-}
-
-static int mce_syscore_suspend(void)
-{
-	vendor_disable_error_reporting();
-	return 0;
-}
-
-static void mce_syscore_shutdown(void)
-{
-	vendor_disable_error_reporting();
-}
-
 /*
  * On resume clear all MCE state. Don't want to see leftovers from the BIOS.
  * Only one CPU is active at this time, the others get re-added later using
@@ -2436,8 +2389,6 @@ static void mce_syscore_resume(void)
 }
 
 static struct syscore_ops mce_syscore_ops = {
-	.suspend	= mce_syscore_suspend,
-	.shutdown	= mce_syscore_shutdown,
 	.resume		= mce_syscore_resume,
 };
 
@@ -2722,7 +2673,6 @@ static void mce_disable_cpu(void)
 	if (!cpuhp_tasks_frozen)
 		cmci_clear();
 
-	vendor_disable_error_reporting();
 }
 
 static void mce_reenable_cpu(void)
-- 
2.34.1
RE: [PATCH v4 2/4] x86/mce: Remove functions that disable error reporting
Posted by Zhuo, Qiuxu 1 month, 2 weeks ago
> From: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> [...]
> Subject: [PATCH v4 2/4] x86/mce: Remove functions that disable error reporting
> 
> From: Lyle Li <LyleLi@zhaoxin.com>
> 
> Since all major vendors do not disable MCA_CTL after initialization, functions
> that disable error reporting should be removed in mce/core.c.

It's obvious from this patch that the functions are removed from mce/core.c. 
IMHO: No need to mention the file name in the commit message.

And please use active voice in the commit message, like this:

    Since all major vendors do not disable MCA_CTL after initialization, remove the functions that disable error reporting.

Other than that:

    Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

> [...]
[PATCH v4 3/4] x86/mce: Add zhaoxin.c to support Zhaoxin MCA
Posted by Tony W Wang-oc 2 months, 1 week ago
From: Lyle Li <LyleLi@zhaoxin.com>

For the sake of code standardization, add zhaoxin.c to
override the Zhaoxin MCA code.

Signed-off-by: Lyle Li <LyleLi@zhaoxin.com>
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
 arch/x86/Kconfig                   |  8 ++++
 arch/x86/kernel/cpu/mce/Makefile   |  2 +-
 arch/x86/kernel/cpu/mce/core.c     | 57 --------------------------
 arch/x86/kernel/cpu/mce/internal.h |  7 ++++
 arch/x86/kernel/cpu/mce/zhaoxin.c  | 65 ++++++++++++++++++++++++++++++
 5 files changed, 81 insertions(+), 58 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/mce/zhaoxin.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1d7122a18..b908cdfb9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1171,6 +1171,14 @@ config X86_MCE_INTEL
 	  Additional support for intel specific MCE features such as
 	  the thermal monitor.
 
+config X86_MCE_ZHAOXIN
+	def_bool y
+	prompt "Zhaoxin MCE features"
+	depends on X86_MCE_INTEL
+	help
+	  Additional support for zhaoxin specific MCE features such as
+	  the corrected machine check interrupt.
+
 config X86_MCE_AMD
 	def_bool y
 	prompt "AMD MCE features"
diff --git a/arch/x86/kernel/cpu/mce/Makefile b/arch/x86/kernel/cpu/mce/Makefile
index 015856abd..2e863e78d 100644
--- a/arch/x86/kernel/cpu/mce/Makefile
+++ b/arch/x86/kernel/cpu/mce/Makefile
@@ -5,7 +5,7 @@ obj-$(CONFIG_X86_ANCIENT_MCE)	+= winchip.o p5.o
 obj-$(CONFIG_X86_MCE_INTEL)	+= intel.o
 obj-$(CONFIG_X86_MCE_AMD)	+= amd.o
 obj-$(CONFIG_X86_MCE_THRESHOLD) += threshold.o
-
+obj-$(CONFIG_X86_MCE_ZHAOXIN)   += zhaoxin.o
 mce-inject-y			:= inject.o
 obj-$(CONFIG_X86_MCE_INJECT)	+= mce-inject.o
 
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 1654133da..37d4b6cd4 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1963,17 +1963,6 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 			mce_flags.skx_repmov_quirk = 1;
 	}
 
-	if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
-		/*
-		 * All newer Zhaoxin CPUs support MCE broadcasting. Enable
-		 * synchronization with a one second timeout.
-		 */
-		if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
-			if (cfg->monarch_timeout < 0)
-				cfg->monarch_timeout = USEC_PER_SEC;
-		}
-	}
-
 	if (cfg->monarch_timeout < 0)
 		cfg->monarch_timeout = 0;
 	if (cfg->bootlog != 0)
@@ -2016,49 +2005,6 @@ static void __mcheck_cpu_init_early(struct cpuinfo_x86 *c)
 	}
 }
 
-static void mce_centaur_feature_init(struct cpuinfo_x86 *c)
-{
-	struct mca_config *cfg = &mca_cfg;
-
-	 /*
-	  * All newer Centaur CPUs support MCE broadcasting. Enable
-	  * synchronization with a one second timeout.
-	  */
-	if ((c->x86 == 6 && c->x86_model == 0xf && c->x86_stepping >= 0xe) ||
-	     c->x86 > 6) {
-		if (cfg->monarch_timeout < 0)
-			cfg->monarch_timeout = USEC_PER_SEC;
-	}
-}
-
-static void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c)
-{
-	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
-
-	/*
-	 * These CPUs have MCA bank 8 which reports only one error type called
-	 * SVAD (System View Address Decoder). The reporting of that error is
-	 * controlled by IA32_MC8.CTL.0.
-	 *
-	 * If enabled, prefetching on these CPUs will cause SVAD MCE when
-	 * virtual machines start and result in a system  panic. Always disable
-	 * bank 8 SVAD error by default.
-	 */
-	if ((c->x86 == 7 && c->x86_model == 0x1b) ||
-	    (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
-		if (this_cpu_read(mce_num_banks) > 8)
-			mce_banks[8].ctl = 0;
-	}
-
-	intel_init_cmci();
-	intel_init_lmce();
-}
-
-static void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c)
-{
-	intel_clear_lmce();
-}
-
 static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 {
 	switch (c->x86_vendor) {
@@ -2076,9 +2022,6 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 		break;
 
 	case X86_VENDOR_CENTAUR:
-		mce_centaur_feature_init(c);
-		break;
-
 	case X86_VENDOR_ZHAOXIN:
 		mce_zhaoxin_feature_init(c);
 		break;
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 01f8f0396..836e56027 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -334,4 +334,11 @@ static __always_inline u32 mca_msr_reg(int bank, enum mca_msr reg)
 }
 
 extern void (*mc_poll_banks)(void);
+#ifdef CONFIG_X86_MCE_ZHAOXIN
+void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c);
+void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c);
+#else
+static inline void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c) { }
+static inline void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c) { }
+#endif
 #endif /* __X86_MCE_INTERNAL_H__ */
diff --git a/arch/x86/kernel/cpu/mce/zhaoxin.c b/arch/x86/kernel/cpu/mce/zhaoxin.c
new file mode 100644
index 000000000..de69c560f
--- /dev/null
+++ b/arch/x86/kernel/cpu/mce/zhaoxin.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Zhaoxin specific MCE features
+ * Author: Lyle Li
+ */
+#include <asm/msr.h>
+#include "internal.h"
+
+static void mce_zhaoxin_apply_mce_broadcast(struct cpuinfo_x86 *c)
+{
+	struct mca_config *cfg = &mca_cfg;
+
+	/* Older CPUs do not do MCE broadcast */
+	if (c->x86 < 6)
+		return;
+	/*
+	 * All newer Zhaoxin and Centaur CPUs support MCE broadcasting. Enable
+	 * synchronization with a one second timeout.
+	 */
+	if (c->x86 > 6)
+		goto mce_broadcast;
+
+	if (c->x86_vendor == X86_VENDOR_CENTAUR) {
+		if (c->x86_model == 0xf && c->x86_stepping >= 0xe)
+			goto mce_broadcast;
+	} else if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
+		if (c->x86_model == 0x19 || c->x86_model == 0x1f)
+			goto mce_broadcast;
+	}
+
+	return;
+
+mce_broadcast:
+	if (cfg->monarch_timeout <= 0)
+		cfg->monarch_timeout = USEC_PER_SEC;
+}
+
+void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c)
+{
+	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
+
+	/*
+	 * These CPUs have MCA bank 8 which reports only one error type called
+	 * SVAD (System View Address Decoder). The reporting of that error is
+	 * controlled by IA32_MC8.CTL.0.
+	 *
+	 * If enabled, prefetching on these CPUs will cause SVAD MCE when
+	 * virtual machines start and result in a system  panic. Always disable
+	 * bank 8 SVAD error by default.
+	 */
+	if ((c->x86 == 7 && c->x86_model == 0x1b) ||
+	    (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
+		if (this_cpu_read(mce_num_banks) > 8)
+			mce_banks[8].ctl = 0;
+	}
+
+	mce_zhaoxin_apply_mce_broadcast(c);
+	intel_init_cmci();
+	intel_init_lmce();
+}
+
+void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c)
+{
+	intel_clear_lmce();
+}
-- 
2.34.1
RE: [PATCH v4 3/4] x86/mce: Add zhaoxin.c to support Zhaoxin MCA
Posted by Zhuo, Qiuxu 1 month, 2 weeks ago
> From: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> [...]
> +config X86_MCE_ZHAOXIN
> +	def_bool y
> +	prompt "Zhaoxin MCE features"
> +	depends on X86_MCE_INTEL
> +	help
> +	  Additional support for zhaoxin specific MCE features such as

s/zhaoxin/Zhaoxin

> +	  the corrected machine check interrupt.
> +
>  config X86_MCE_AMD
>  	def_bool y
>  	prompt "AMD MCE features"
> diff --git a/arch/x86/kernel/cpu/mce/Makefile
> b/arch/x86/kernel/cpu/mce/Makefile
> index 015856abd..2e863e78d 100644
> --- a/arch/x86/kernel/cpu/mce/Makefile
> +++ b/arch/x86/kernel/cpu/mce/Makefile
> @@ -5,7 +5,7 @@ obj-$(CONFIG_X86_ANCIENT_MCE)	+= winchip.o p5.o
>  obj-$(CONFIG_X86_MCE_INTEL)	+= intel.o
>  obj-$(CONFIG_X86_MCE_AMD)	+= amd.o
>  obj-$(CONFIG_X86_MCE_THRESHOLD) += threshold.o
> -
> +obj-$(CONFIG_X86_MCE_ZHAOXIN)   += zhaoxin.o

Move this newly added item just after AMD's, so they're sorted in vendors. 
And keep a blank line here as it was.

>  mce-inject-y			:= inject.o
>  obj-$(CONFIG_X86_MCE_INJECT)	+= mce-inject.o
> 
> [...]
> --- /dev/null
> +++ b/arch/x86/kernel/cpu/mce/zhaoxin.c
> @@ -0,0 +1,65 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Zhaoxin specific MCE features
> + * Author: Lyle Li
> + */
> +#include <asm/msr.h>
> +#include "internal.h"
> +
> +static void mce_zhaoxin_apply_mce_broadcast(struct cpuinfo_x86 *c) {
> +	struct mca_config *cfg = &mca_cfg;
> +
> +	/* Older CPUs do not do MCE broadcast */

s/MCE broadcast/MCE broadcast:/

> +	if (c->x86 < 6)
> +		return;
> +	/*
> +	 * All newer Zhaoxin and Centaur CPUs support MCE broadcasting.
> Enable
> +	 * synchronization with a one second timeout.
> +	 */


Instead of copying and pasting the redundant comments, could you use Dave's concise comments as suggested in:

   https://lore.kernel.org/all/a25f878e-83d9-440a-9741-4cf86db4a716@intel.com/

/* All newer ones do: */
> +	if (c->x86 > 6)
> +		goto mce_broadcast;
> +

/* Family 6 is mixed: */
> +	if (c->x86_vendor == X86_VENDOR_CENTAUR) {
> +		if (c->x86_model == 0xf && c->x86_stepping >= 0xe)
> +			goto mce_broadcast;
> +	} else if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
> +		if (c->x86_model == 0x19 || c->x86_model == 0x1f)
> +			goto mce_broadcast;
> +	}
> +
> +	return;
> [...]
Re: [PATCH v4 3/4] x86/mce: Add zhaoxin.c to support Zhaoxin MCA
Posted by Tony W Wang-oc 1 month, 2 weeks ago

On 2024/10/12 14:41, Zhuo, Qiuxu wrote:
> 
> 
>> From: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>> [...]
>> +config X86_MCE_ZHAOXIN
>> +     def_bool y
>> +     prompt "Zhaoxin MCE features"
>> +     depends on X86_MCE_INTEL
>> +     help
>> +       Additional support for zhaoxin specific MCE features such as
> 
> s/zhaoxin/Zhaoxin
> 
>> +       the corrected machine check interrupt.
>> +
>>   config X86_MCE_AMD
>>        def_bool y
>>        prompt "AMD MCE features"
>> diff --git a/arch/x86/kernel/cpu/mce/Makefile
>> b/arch/x86/kernel/cpu/mce/Makefile
>> index 015856abd..2e863e78d 100644
>> --- a/arch/x86/kernel/cpu/mce/Makefile
>> +++ b/arch/x86/kernel/cpu/mce/Makefile
>> @@ -5,7 +5,7 @@ obj-$(CONFIG_X86_ANCIENT_MCE) += winchip.o p5.o
>>   obj-$(CONFIG_X86_MCE_INTEL)  += intel.o
>>   obj-$(CONFIG_X86_MCE_AMD)    += amd.o
>>   obj-$(CONFIG_X86_MCE_THRESHOLD) += threshold.o
>> -
>> +obj-$(CONFIG_X86_MCE_ZHAOXIN)   += zhaoxin.o
> 
> Move this newly added item just after AMD's, so they're sorted in vendors.
> And keep a blank line here as it was.
> 
>>   mce-inject-y                 := inject.o
>>   obj-$(CONFIG_X86_MCE_INJECT) += mce-inject.o
>>
>> [...]
>> --- /dev/null
>> +++ b/arch/x86/kernel/cpu/mce/zhaoxin.c
>> @@ -0,0 +1,65 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Zhaoxin specific MCE features
>> + * Author: Lyle Li
>> + */
>> +#include <asm/msr.h>
>> +#include "internal.h"
>> +
>> +static void mce_zhaoxin_apply_mce_broadcast(struct cpuinfo_x86 *c) {
>> +     struct mca_config *cfg = &mca_cfg;
>> +
>> +     /* Older CPUs do not do MCE broadcast */
> 
> s/MCE broadcast/MCE broadcast:/
> 
>> +     if (c->x86 < 6)
>> +             return;
>> +     /*
>> +      * All newer Zhaoxin and Centaur CPUs support MCE broadcasting.
>> Enable
>> +      * synchronization with a one second timeout.
>> +      */
> 
> 
> Instead of copying and pasting the redundant comments, could you use Dave's concise comments as suggested in:
> 
>     https://lore.kernel.org/all/a25f878e-83d9-440a-9741-4cf86db4a716@intel.com/
> 
> /* All newer ones do: */
>> +     if (c->x86 > 6)
>> +             goto mce_broadcast;
>> +
> 
> /* Family 6 is mixed: */
>> +     if (c->x86_vendor == X86_VENDOR_CENTAUR) {
>> +             if (c->x86_model == 0xf && c->x86_stepping >= 0xe)
>> +                     goto mce_broadcast;
>> +     } else if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
>> +             if (c->x86_model == 0x19 || c->x86_model == 0x1f)
>> +                     goto mce_broadcast;
>> +     }
>> +
>> +     return;
>> [...]

Thank you for your review.
Should I add the tag for this patch:
     Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

Sincerely
TonyWWang-oc
RE: [PATCH v4 3/4] x86/mce: Add zhaoxin.c to support Zhaoxin MCA
Posted by Zhuo, Qiuxu 1 month, 2 weeks ago
> From: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> [...]
> Subject: Re: [PATCH v4 3/4] x86/mce: Add zhaoxin.c to support Zhaoxin MCA
> >
> >> From: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com> [...]
> >> +config X86_MCE_ZHAOXIN
> >> +     def_bool y
> >> +     prompt "Zhaoxin MCE features"
> >> +     depends on X86_MCE_INTEL
> >> +     help
> >> +       Additional support for zhaoxin specific MCE features such as
> >
> > s/zhaoxin/Zhaoxin
> >
> >> +       the corrected machine check interrupt.
> >> +
> >>   config X86_MCE_AMD
> >>        def_bool y
> >>        prompt "AMD MCE features"
> >> diff --git a/arch/x86/kernel/cpu/mce/Makefile
> >> b/arch/x86/kernel/cpu/mce/Makefile
> >> index 015856abd..2e863e78d 100644
> >> --- a/arch/x86/kernel/cpu/mce/Makefile
> >> +++ b/arch/x86/kernel/cpu/mce/Makefile
> >> @@ -5,7 +5,7 @@ obj-$(CONFIG_X86_ANCIENT_MCE) += winchip.o p5.o
> >>   obj-$(CONFIG_X86_MCE_INTEL)  += intel.o
> >>   obj-$(CONFIG_X86_MCE_AMD)    += amd.o
> >>   obj-$(CONFIG_X86_MCE_THRESHOLD) += threshold.o
> >> -
> >> +obj-$(CONFIG_X86_MCE_ZHAOXIN)   += zhaoxin.o
> >
> > Move this newly added item just after AMD's, so they're sorted in vendors.
> > And keep a blank line here as it was.
> >
> >>   mce-inject-y                 := inject.o
> >>   obj-$(CONFIG_X86_MCE_INJECT) += mce-inject.o
> >>
> >> [...]
> >> --- /dev/null
> >> +++ b/arch/x86/kernel/cpu/mce/zhaoxin.c
> >> @@ -0,0 +1,65 @@
> >> +// SPDX-License-Identifier: GPL-2.0
> >> +/*
> >> + * Zhaoxin specific MCE features
> >> + * Author: Lyle Li
> >> + */
> >> +#include <asm/msr.h>
> >> +#include "internal.h"
> >> +
> >> +static void mce_zhaoxin_apply_mce_broadcast(struct cpuinfo_x86 *c) {
> >> +     struct mca_config *cfg = &mca_cfg;
> >> +
> >> +     /* Older CPUs do not do MCE broadcast */
> >
> > s/MCE broadcast/MCE broadcast:/
> >
> >> +     if (c->x86 < 6)
> >> +             return;
> >> +     /*
> >> +      * All newer Zhaoxin and Centaur CPUs support MCE broadcasting.
> >> Enable
> >> +      * synchronization with a one second timeout.
> >> +      */
> >
> >
> > Instead of copying and pasting the redundant comments, could you use
> Dave's concise comments as suggested in:
> >
> >
> > https://lore.kernel.org/all/a25f878e-83d9-440a-9741-4cf86db4a716@intel
> > .com/
> >
> > /* All newer ones do: */
> >> +     if (c->x86 > 6)
> >> +             goto mce_broadcast;
> >> +
> >
> > /* Family 6 is mixed: */
> >> +     if (c->x86_vendor == X86_VENDOR_CENTAUR) {
> >> +             if (c->x86_model == 0xf && c->x86_stepping >= 0xe)
> >> +                     goto mce_broadcast;
> >> +     } else if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
> >> +             if (c->x86_model == 0x19 || c->x86_model == 0x1f)
> >> +                     goto mce_broadcast;
> >> +     }
> >> +
> >> +     return;
> >> [...]
> 
> Thank you for your review.
> Should I add the tag for this patch:
>      Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

Yes, you can, if the comments above are resolved.

-Qiuxu

[PATCH v4 4/4] x86/mce: Add CMCI storm switching support for Zhaoxin
Posted by Tony W Wang-oc 2 months, 1 week ago
From: Lyle Li <LyleLi@zhaoxin.com>

Zhaoxin CPUs support CMCI compatible with Intel, because
Zhaoxin's UCR error is not reported through CMCI, and in
order to be compatible with intel's CMCI code, so add Zhaoxin
CMCI storm toggle to support the new CMCI storm switching
in mce/intel.c, mce/zhaoxin.c, mce/threshold.c, and mce/internal.h.

Signed-off-by: Lyle Li <LyleLi@zhaoxin.com>
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
 arch/x86/kernel/cpu/mce/intel.c     |  5 ++---
 arch/x86/kernel/cpu/mce/internal.h  |  7 ++++++-
 arch/x86/kernel/cpu/mce/threshold.c |  4 ++++
 arch/x86/kernel/cpu/mce/zhaoxin.c   | 18 ++++++++++++++++++
 4 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index b7e67f4f7..aa75e2848 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -45,7 +45,7 @@ static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned);
  * cmci_discover_lock protects against parallel discovery attempts
  * which could race against each other.
  */
-static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
+DEFINE_RAW_SPINLOCK(cmci_discover_lock);
 
 /*
  * On systems that do support CMCI but it's disabled, polling for MCEs can
@@ -61,7 +61,7 @@ static DEFINE_SPINLOCK(cmci_poll_lock);
  * MCi_CTL2 threshold for each bank when there is no storm.
  * Default value for each bank may have been set by BIOS.
  */
-static u16 cmci_threshold[MAX_NR_BANKS];
+u16 cmci_threshold[MAX_NR_BANKS];
 
 /*
  * High threshold to limit CMCI rate during storms. Max supported is
@@ -73,7 +73,6 @@ static u16 cmci_threshold[MAX_NR_BANKS];
  * to corrected errors, so keeping CMCI enabled means that uncorrected
  * errors will still be processed in a timely fashion.
  */
-#define CMCI_STORM_THRESHOLD	32749
 
 static int cmci_supported(int *banks)
 {
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 836e56027..086b833c5 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -7,7 +7,7 @@
 
 #include <linux/device.h>
 #include <asm/mce.h>
-
+#include <linux/spinlock.h>
 enum severity_level {
 	MCE_NO_SEVERITY,
 	MCE_DEFERRED_SEVERITY,
@@ -334,11 +334,16 @@ static __always_inline u32 mca_msr_reg(int bank, enum mca_msr reg)
 }
 
 extern void (*mc_poll_banks)(void);
+#define CMCI_STORM_THRESHOLD    32749
+extern raw_spinlock_t cmci_discover_lock;
+extern u16 cmci_threshold[MAX_NR_BANKS];
 #ifdef CONFIG_X86_MCE_ZHAOXIN
 void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c);
 void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c);
+void mce_zhaoxin_handle_storm(int bank, bool on);
 #else
 static inline void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c) { }
 static inline void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c) { }
+static inline void mce_zhaoxin_handle_storm(int bank, bool on) { }
 #endif
 #endif /* __X86_MCE_INTERNAL_H__ */
diff --git a/arch/x86/kernel/cpu/mce/threshold.c b/arch/x86/kernel/cpu/mce/threshold.c
index 89e31e1e5..200280387 100644
--- a/arch/x86/kernel/cpu/mce/threshold.c
+++ b/arch/x86/kernel/cpu/mce/threshold.c
@@ -63,6 +63,10 @@ static void mce_handle_storm(unsigned int bank, bool on)
 	case X86_VENDOR_INTEL:
 		mce_intel_handle_storm(bank, on);
 		break;
+	case X86_VENDOR_ZHAOXIN:
+	case X86_VENDOR_CENTAUR:
+		mce_zhaoxin_handle_storm(bank, on);
+		break;
 	}
 }
 
diff --git a/arch/x86/kernel/cpu/mce/zhaoxin.c b/arch/x86/kernel/cpu/mce/zhaoxin.c
index de69c560f..994b8520a 100644
--- a/arch/x86/kernel/cpu/mce/zhaoxin.c
+++ b/arch/x86/kernel/cpu/mce/zhaoxin.c
@@ -63,3 +63,21 @@ void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c)
 {
 	intel_clear_lmce();
 }
+
+void mce_zhaoxin_handle_storm(int bank, bool on)
+{
+	unsigned long flags;
+	u64 val;
+
+	raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+	rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
+	if (on) {
+		val &= ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK);
+		val |= CMCI_STORM_THRESHOLD;
+	} else {
+		val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
+		val |= (MCI_CTL2_CMCI_EN | cmci_threshold[bank]);
+	}
+	wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
+	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+}
-- 
2.34.1
RE: [PATCH v4 4/4] x86/mce: Add CMCI storm switching support for Zhaoxin
Posted by Zhuo, Qiuxu 1 month, 2 weeks ago
> From: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> [...]
> Subject: [PATCH v4 4/4] x86/mce: Add CMCI storm switching support for
> Zhaoxin
> 
> From: Lyle Li <LyleLi@zhaoxin.com>
> 
> Zhaoxin CPUs support CMCI compatible with Intel, because Zhaoxin's UCR error
> is not reported through CMCI, and in order to be compatible with intel's CMCI
> code, so add Zhaoxin CMCI storm toggle to support the new CMCI storm
> switching in mce/intel.c, mce/zhaoxin.c, mce/threshold.c, and mce/internal.h.

Could you tweak and simplify the commit message, like this:

    Zhaoxin CPUs support CMCI which is compatible with Intel, but their UCR errors are
    not reported through CMCI like Intel's. To be compatible with Intel's CMCI code,
    add Zhaoxin's specific CMCI storm toggle.

> [...]
> diff --git a/arch/x86/kernel/cpu/mce/internal.h
> b/arch/x86/kernel/cpu/mce/internal.h
> index 836e56027..086b833c5 100644
> --- a/arch/x86/kernel/cpu/mce/internal.h
> +++ b/arch/x86/kernel/cpu/mce/internal.h
> @@ -7,7 +7,7 @@
> 
>  #include <linux/device.h>
>  #include <asm/mce.h>
> -
> +#include <linux/spinlock.h>

Please sort the header files, like this:

    #include <linux/device.h>
    #include <linux/spinlock.h>

    #include <asm/mce.h>

And keep a blank line here as it was.

>  enum severity_level {
>  	MCE_NO_SEVERITY,
>  	MCE_DEFERRED_SEVERITY,
> @@ -334,11 +334,16 @@ static __always_inline u32 mca_msr_reg(int bank,
> enum mca_msr reg)  }
[...]

Other than that:

    Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Re: [PATCH v4 4/4] x86/mce: Add CMCI storm switching support for Zhaoxin
Posted by Tony W Wang-oc 1 month, 2 weeks ago

On 2024/10/12 15:13, Zhuo, Qiuxu wrote:
> 
> 
> [这封邮件来自外部发件人 谨防风险]
> 
>> From: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>> [...]
>> Subject: [PATCH v4 4/4] x86/mce: Add CMCI storm switching support for
>> Zhaoxin
>>
>> From: Lyle Li <LyleLi@zhaoxin.com>
>>
>> Zhaoxin CPUs support CMCI compatible with Intel, because Zhaoxin's UCR error
>> is not reported through CMCI, and in order to be compatible with intel's CMCI
>> code, so add Zhaoxin CMCI storm toggle to support the new CMCI storm
>> switching in mce/intel.c, mce/zhaoxin.c, mce/threshold.c, and mce/internal.h.
> 
> Could you tweak and simplify the commit message, like this:
> 
>      Zhaoxin CPUs support CMCI which is compatible with Intel, but their UCR errors are
>      not reported through CMCI like Intel's. To be compatible with Intel's CMCI code,
>      add Zhaoxin's specific CMCI storm toggle.
> 
>> [...]
>> diff --git a/arch/x86/kernel/cpu/mce/internal.h
>> b/arch/x86/kernel/cpu/mce/internal.h
>> index 836e56027..086b833c5 100644
>> --- a/arch/x86/kernel/cpu/mce/internal.h
>> +++ b/arch/x86/kernel/cpu/mce/internal.h
>> @@ -7,7 +7,7 @@
>>
>>   #include <linux/device.h>
>>   #include <asm/mce.h>
>> -
>> +#include <linux/spinlock.h>
> 
> Please sort the header files, like this:
> 
>      #include <linux/device.h>
>      #include <linux/spinlock.h>
> 
>      #include <asm/mce.h>
> 
> And keep a blank line here as it was.
> 
>>   enum severity_level {
>>        MCE_NO_SEVERITY,
>>        MCE_DEFERRED_SEVERITY,
>> @@ -334,11 +334,16 @@ static __always_inline u32 mca_msr_reg(int bank,
>> enum mca_msr reg)  }
> [...]
> 
> Other than that:
> 
>      Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>

Thank you for reviewing this patchset. v5 will be resent according to 
your suggestion.

Sincerely
TonyWWang-oc
[PATCH v3 1/3] x86/mce: Add centaur vendor to support Zhaoxin MCA
Posted by Tony W Wang-oc 2 months, 1 week ago
From: Lyle Li <LyleLi@zhaoxin.com>

Zhaoxin consists of two vendors, X86_VENDOR_ZHAOXIN and
X86_VENDOR_CENTAUR, so add the centaur vendor to support
Zhaoxin MCA in mce/core.c and mce/intel.c.

Signed-off-by: Lyle Li <LyleLi@zhaoxin.com>
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
 arch/x86/kernel/cpu/mce/core.c  | 12 +++++++++---
 arch/x86/kernel/cpu/mce/intel.c |  3 ++-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index ad0623b65..7f79d900f 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -496,6 +496,7 @@ bool mce_usable_address(struct mce *m)
 
 	case X86_VENDOR_INTEL:
 	case X86_VENDOR_ZHAOXIN:
+	case X86_VENDOR_CENTAUR:
 		return intel_mce_usable_address(m);
 
 	default:
@@ -513,6 +514,7 @@ bool mce_is_memory_error(struct mce *m)
 
 	case X86_VENDOR_INTEL:
 	case X86_VENDOR_ZHAOXIN:
+	case X86_VENDOR_CENTAUR:
 		/*
 		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
 		 *
@@ -1247,7 +1249,8 @@ static noinstr bool mce_check_crashing_cpu(void)
 
 		mcgstatus = __rdmsr(MSR_IA32_MCG_STATUS);
 
-		if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN) {
+		if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN ||
+		    boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR) {
 			if (mcgstatus & MCG_STATUS_LMCES)
 				return false;
 		}
@@ -1521,7 +1524,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * on Intel, Zhaoxin only.
 	 */
 	if (m.cpuvendor == X86_VENDOR_INTEL ||
-	    m.cpuvendor == X86_VENDOR_ZHAOXIN)
+	    m.cpuvendor == X86_VENDOR_ZHAOXIN ||
+	    m.cpuvendor == X86_VENDOR_CENTAUR)
 		lmce = m.mcgstatus & MCG_STATUS_LMCES;
 
 	/*
@@ -2092,6 +2096,7 @@ static void __mcheck_cpu_clear_vendor(struct cpuinfo_x86 *c)
 		break;
 
 	case X86_VENDOR_ZHAOXIN:
+	case X86_VENDOR_CENTAUR:
 		mce_zhaoxin_feature_clear(c);
 		break;
 
@@ -2401,7 +2406,8 @@ static void vendor_disable_error_reporting(void)
 	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
 	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ||
 	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
-	    boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN)
+	    boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN ||
+	    boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR)
 		return;
 
 	mce_disable_error_reporting();
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index f6103e6bf..b7e67f4f7 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -88,7 +88,8 @@ static int cmci_supported(int *banks)
 	 * makes sure none of the backdoors are entered otherwise.
 	 */
 	if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
-	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN)
+	    boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN &&
+	    boot_cpu_data.x86_vendor != X86_VENDOR_CENTAUR)
 		return 0;
 
 	if (!boot_cpu_has(X86_FEATURE_APIC) || lapic_get_maxlvt() < 6)
-- 
2.34.1
Re: [PATCH v3 1/3] x86/mce: Add centaur vendor to support Zhaoxin MCA
Posted by Yazen Ghannam 2 months, 1 week ago
On Wed, Sep 18, 2024 at 01:54:34PM +0800, Tony W Wang-oc wrote:
> From: Lyle Li <LyleLi@zhaoxin.com>
> 
> Zhaoxin consists of two vendors, X86_VENDOR_ZHAOXIN and
> X86_VENDOR_CENTAUR, so add the centaur vendor to support
> Zhaoxin MCA in mce/core.c and mce/intel.c.
> 
> Signed-off-by: Lyle Li <LyleLi@zhaoxin.com>
> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> ---
>  arch/x86/kernel/cpu/mce/core.c  | 12 +++++++++---
>  arch/x86/kernel/cpu/mce/intel.c |  3 ++-
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
> index ad0623b65..7f79d900f 100644
> --- a/arch/x86/kernel/cpu/mce/core.c
> +++ b/arch/x86/kernel/cpu/mce/core.c
> @@ -496,6 +496,7 @@ bool mce_usable_address(struct mce *m)
>  
>  	case X86_VENDOR_INTEL:
>  	case X86_VENDOR_ZHAOXIN:
> +	case X86_VENDOR_CENTAUR:
>  		return intel_mce_usable_address(m);
>  
>  	default:
> @@ -513,6 +514,7 @@ bool mce_is_memory_error(struct mce *m)
>  
>  	case X86_VENDOR_INTEL:
>  	case X86_VENDOR_ZHAOXIN:
> +	case X86_VENDOR_CENTAUR:
>  		/*
>  		 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
>  		 *
> @@ -1247,7 +1249,8 @@ static noinstr bool mce_check_crashing_cpu(void)
>  
>  		mcgstatus = __rdmsr(MSR_IA32_MCG_STATUS);
>  
> -		if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN) {
> +		if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN ||
> +		    boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR) {
>  			if (mcgstatus & MCG_STATUS_LMCES)
>  				return false;
>  		}
> @@ -1521,7 +1524,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
>  	 * on Intel, Zhaoxin only.
>  	 */
>  	if (m.cpuvendor == X86_VENDOR_INTEL ||
> -	    m.cpuvendor == X86_VENDOR_ZHAOXIN)
> +	    m.cpuvendor == X86_VENDOR_ZHAOXIN ||
> +	    m.cpuvendor == X86_VENDOR_CENTAUR)
>  		lmce = m.mcgstatus & MCG_STATUS_LMCES;
>  
>  	/*
> @@ -2092,6 +2096,7 @@ static void __mcheck_cpu_clear_vendor(struct cpuinfo_x86 *c)
>  		break;
>  
>  	case X86_VENDOR_ZHAOXIN:
> +	case X86_VENDOR_CENTAUR:
>  		mce_zhaoxin_feature_clear(c);
>  		break;
>  
> @@ -2401,7 +2406,8 @@ static void vendor_disable_error_reporting(void)
>  	if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
>  	    boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ||
>  	    boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
> -	    boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN)
> +	    boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN ||
> +	    boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR)
>  		return;
>  
>  	mce_disable_error_reporting();

At this point, should we even do this? It seems all major vendors want
to *not* disable MCA_CTL after init.

This, and related functions, can be deleted. Unless there's a compelling
reason to keep them.

Thanks,
Yazen
Re: [PATCH v3 1/3] x86/mce: Add centaur vendor to support Zhaoxin MCA
Posted by Tony W Wang-oc 2 months, 1 week ago

On 2024/9/19 21:55, Yazen Ghannam wrote:
> 
> On Wed, Sep 18, 2024 at 01:54:34PM +0800, Tony W Wang-oc wrote:
>> From: Lyle Li <LyleLi@zhaoxin.com>
>>
>> Zhaoxin consists of two vendors, X86_VENDOR_ZHAOXIN and
>> X86_VENDOR_CENTAUR, so add the centaur vendor to support
>> Zhaoxin MCA in mce/core.c and mce/intel.c.
>>
>> Signed-off-by: Lyle Li <LyleLi@zhaoxin.com>
>> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>> ---
>>   arch/x86/kernel/cpu/mce/core.c  | 12 +++++++++---
>>   arch/x86/kernel/cpu/mce/intel.c |  3 ++-
>>   2 files changed, 11 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
>> index ad0623b65..7f79d900f 100644
>> --- a/arch/x86/kernel/cpu/mce/core.c
>> +++ b/arch/x86/kernel/cpu/mce/core.c
>> @@ -496,6 +496,7 @@ bool mce_usable_address(struct mce *m)
>>
>>        case X86_VENDOR_INTEL:
>>        case X86_VENDOR_ZHAOXIN:
>> +     case X86_VENDOR_CENTAUR:
>>                return intel_mce_usable_address(m);
>>
>>        default:
>> @@ -513,6 +514,7 @@ bool mce_is_memory_error(struct mce *m)
>>
>>        case X86_VENDOR_INTEL:
>>        case X86_VENDOR_ZHAOXIN:
>> +     case X86_VENDOR_CENTAUR:
>>                /*
>>                 * Intel SDM Volume 3B - 15.9.2 Compound Error Codes
>>                 *
>> @@ -1247,7 +1249,8 @@ static noinstr bool mce_check_crashing_cpu(void)
>>
>>                mcgstatus = __rdmsr(MSR_IA32_MCG_STATUS);
>>
>> -             if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN) {
>> +             if (boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN ||
>> +                 boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR) {
>>                        if (mcgstatus & MCG_STATUS_LMCES)
>>                                return false;
>>                }
>> @@ -1521,7 +1524,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
>>         * on Intel, Zhaoxin only.
>>         */
>>        if (m.cpuvendor == X86_VENDOR_INTEL ||
>> -         m.cpuvendor == X86_VENDOR_ZHAOXIN)
>> +         m.cpuvendor == X86_VENDOR_ZHAOXIN ||
>> +         m.cpuvendor == X86_VENDOR_CENTAUR)
>>                lmce = m.mcgstatus & MCG_STATUS_LMCES;
>>
>>        /*
>> @@ -2092,6 +2096,7 @@ static void __mcheck_cpu_clear_vendor(struct cpuinfo_x86 *c)
>>                break;
>>
>>        case X86_VENDOR_ZHAOXIN:
>> +     case X86_VENDOR_CENTAUR:
>>                mce_zhaoxin_feature_clear(c);
>>                break;
>>
>> @@ -2401,7 +2406,8 @@ static void vendor_disable_error_reporting(void)
>>        if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL ||
>>            boot_cpu_data.x86_vendor == X86_VENDOR_HYGON ||
>>            boot_cpu_data.x86_vendor == X86_VENDOR_AMD ||
>> -         boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN)
>> +         boot_cpu_data.x86_vendor == X86_VENDOR_ZHAOXIN ||
>> +         boot_cpu_data.x86_vendor == X86_VENDOR_CENTAUR)
>>                return;
>>
>>        mce_disable_error_reporting();
> 
> At this point, should we even do this? It seems all major vendors want
> to *not* disable MCA_CTL after init.
> 
> This, and related functions, can be deleted. Unless there's a compelling
> reason to keep them.
> 

 From the current code implementation, your suggestion is reasonable.
Will delete the related functions in the next version.

Sincerely
TonyWWang-oc
[PATCH v3 2/3] x86/mce: Add zhaoxin.c to support Zhaoxin MCA
Posted by Tony W Wang-oc 2 months, 1 week ago
From: Lyle Li <LyleLi@zhaoxin.com>

For the sake of code standardization, add zhaoxin.c to
override the Zhaoxin MCA code.

Signed-off-by: Lyle Li <LyleLi@zhaoxin.com>
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
 arch/x86/Kconfig                   |  8 ++++
 arch/x86/kernel/cpu/mce/Makefile   |  2 +-
 arch/x86/kernel/cpu/mce/core.c     | 57 --------------------------
 arch/x86/kernel/cpu/mce/internal.h |  7 ++++
 arch/x86/kernel/cpu/mce/zhaoxin.c  | 65 ++++++++++++++++++++++++++++++
 5 files changed, 81 insertions(+), 58 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/mce/zhaoxin.c

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1d7122a18..b908cdfb9 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1171,6 +1171,14 @@ config X86_MCE_INTEL
 	  Additional support for intel specific MCE features such as
 	  the thermal monitor.
 
+config X86_MCE_ZHAOXIN
+	def_bool y
+	prompt "Zhaoxin MCE features"
+	depends on X86_MCE_INTEL
+	help
+	  Additional support for zhaoxin specific MCE features such as
+	  the corrected machine check interrupt.
+
 config X86_MCE_AMD
 	def_bool y
 	prompt "AMD MCE features"
diff --git a/arch/x86/kernel/cpu/mce/Makefile b/arch/x86/kernel/cpu/mce/Makefile
index 015856abd..2e863e78d 100644
--- a/arch/x86/kernel/cpu/mce/Makefile
+++ b/arch/x86/kernel/cpu/mce/Makefile
@@ -5,7 +5,7 @@ obj-$(CONFIG_X86_ANCIENT_MCE)	+= winchip.o p5.o
 obj-$(CONFIG_X86_MCE_INTEL)	+= intel.o
 obj-$(CONFIG_X86_MCE_AMD)	+= amd.o
 obj-$(CONFIG_X86_MCE_THRESHOLD) += threshold.o
-
+obj-$(CONFIG_X86_MCE_ZHAOXIN)   += zhaoxin.o
 mce-inject-y			:= inject.o
 obj-$(CONFIG_X86_MCE_INJECT)	+= mce-inject.o
 
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 7f79d900f..5315c86b6 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1963,17 +1963,6 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
 			mce_flags.skx_repmov_quirk = 1;
 	}
 
-	if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
-		/*
-		 * All newer Zhaoxin CPUs support MCE broadcasting. Enable
-		 * synchronization with a one second timeout.
-		 */
-		if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
-			if (cfg->monarch_timeout < 0)
-				cfg->monarch_timeout = USEC_PER_SEC;
-		}
-	}
-
 	if (cfg->monarch_timeout < 0)
 		cfg->monarch_timeout = 0;
 	if (cfg->bootlog != 0)
@@ -2016,49 +2005,6 @@ static void __mcheck_cpu_init_early(struct cpuinfo_x86 *c)
 	}
 }
 
-static void mce_centaur_feature_init(struct cpuinfo_x86 *c)
-{
-	struct mca_config *cfg = &mca_cfg;
-
-	 /*
-	  * All newer Centaur CPUs support MCE broadcasting. Enable
-	  * synchronization with a one second timeout.
-	  */
-	if ((c->x86 == 6 && c->x86_model == 0xf && c->x86_stepping >= 0xe) ||
-	     c->x86 > 6) {
-		if (cfg->monarch_timeout < 0)
-			cfg->monarch_timeout = USEC_PER_SEC;
-	}
-}
-
-static void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c)
-{
-	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
-
-	/*
-	 * These CPUs have MCA bank 8 which reports only one error type called
-	 * SVAD (System View Address Decoder). The reporting of that error is
-	 * controlled by IA32_MC8.CTL.0.
-	 *
-	 * If enabled, prefetching on these CPUs will cause SVAD MCE when
-	 * virtual machines start and result in a system  panic. Always disable
-	 * bank 8 SVAD error by default.
-	 */
-	if ((c->x86 == 7 && c->x86_model == 0x1b) ||
-	    (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
-		if (this_cpu_read(mce_num_banks) > 8)
-			mce_banks[8].ctl = 0;
-	}
-
-	intel_init_cmci();
-	intel_init_lmce();
-}
-
-static void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c)
-{
-	intel_clear_lmce();
-}
-
 static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 {
 	switch (c->x86_vendor) {
@@ -2076,9 +2022,6 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
 		break;
 
 	case X86_VENDOR_CENTAUR:
-		mce_centaur_feature_init(c);
-		break;
-
 	case X86_VENDOR_ZHAOXIN:
 		mce_zhaoxin_feature_init(c);
 		break;
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 01f8f0396..836e56027 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -334,4 +334,11 @@ static __always_inline u32 mca_msr_reg(int bank, enum mca_msr reg)
 }
 
 extern void (*mc_poll_banks)(void);
+#ifdef CONFIG_X86_MCE_ZHAOXIN
+void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c);
+void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c);
+#else
+static inline void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c) { }
+static inline void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c) { }
+#endif
 #endif /* __X86_MCE_INTERNAL_H__ */
diff --git a/arch/x86/kernel/cpu/mce/zhaoxin.c b/arch/x86/kernel/cpu/mce/zhaoxin.c
new file mode 100644
index 000000000..de69c560f
--- /dev/null
+++ b/arch/x86/kernel/cpu/mce/zhaoxin.c
@@ -0,0 +1,65 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Zhaoxin specific MCE features
+ * Author: Lyle Li
+ */
+#include <asm/msr.h>
+#include "internal.h"
+
+static void mce_zhaoxin_apply_mce_broadcast(struct cpuinfo_x86 *c)
+{
+	struct mca_config *cfg = &mca_cfg;
+
+	/* Older CPUs do not do MCE broadcast */
+	if (c->x86 < 6)
+		return;
+	/*
+	 * All newer Zhaoxin and Centaur CPUs support MCE broadcasting. Enable
+	 * synchronization with a one second timeout.
+	 */
+	if (c->x86 > 6)
+		goto mce_broadcast;
+
+	if (c->x86_vendor == X86_VENDOR_CENTAUR) {
+		if (c->x86_model == 0xf && c->x86_stepping >= 0xe)
+			goto mce_broadcast;
+	} else if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
+		if (c->x86_model == 0x19 || c->x86_model == 0x1f)
+			goto mce_broadcast;
+	}
+
+	return;
+
+mce_broadcast:
+	if (cfg->monarch_timeout <= 0)
+		cfg->monarch_timeout = USEC_PER_SEC;
+}
+
+void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c)
+{
+	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
+
+	/*
+	 * These CPUs have MCA bank 8 which reports only one error type called
+	 * SVAD (System View Address Decoder). The reporting of that error is
+	 * controlled by IA32_MC8.CTL.0.
+	 *
+	 * If enabled, prefetching on these CPUs will cause SVAD MCE when
+	 * virtual machines start and result in a system  panic. Always disable
+	 * bank 8 SVAD error by default.
+	 */
+	if ((c->x86 == 7 && c->x86_model == 0x1b) ||
+	    (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
+		if (this_cpu_read(mce_num_banks) > 8)
+			mce_banks[8].ctl = 0;
+	}
+
+	mce_zhaoxin_apply_mce_broadcast(c);
+	intel_init_cmci();
+	intel_init_lmce();
+}
+
+void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c)
+{
+	intel_clear_lmce();
+}
-- 
2.34.1
[PATCH v3 3/3] x86/mce: Add CMCI storm switching support for Zhaoxin
Posted by Tony W Wang-oc 2 months, 1 week ago
From: Lyle Li <LyleLi@zhaoxin.com>

Zhaoxin CPUs support CMCI compatible with Intel, because
Zhaoxin's UCR error is not reported through CMCI, and in
order to be compatible with intel's CMCI code, so add Zhaoxin
CMCI storm toggle to support the new CMCI storm switching
in mce/intel.c, mce/zhaoxin.c, mce/threshold.c, and mce/internal.h.

Signed-off-by: Lyle Li <LyleLi@zhaoxin.com>
Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
---
 arch/x86/kernel/cpu/mce/intel.c     |  5 ++---
 arch/x86/kernel/cpu/mce/internal.h  |  7 ++++++-
 arch/x86/kernel/cpu/mce/threshold.c |  4 ++++
 arch/x86/kernel/cpu/mce/zhaoxin.c   | 18 ++++++++++++++++++
 4 files changed, 30 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index b7e67f4f7..aa75e2848 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -45,7 +45,7 @@ static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned);
  * cmci_discover_lock protects against parallel discovery attempts
  * which could race against each other.
  */
-static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
+DEFINE_RAW_SPINLOCK(cmci_discover_lock);
 
 /*
  * On systems that do support CMCI but it's disabled, polling for MCEs can
@@ -61,7 +61,7 @@ static DEFINE_SPINLOCK(cmci_poll_lock);
  * MCi_CTL2 threshold for each bank when there is no storm.
  * Default value for each bank may have been set by BIOS.
  */
-static u16 cmci_threshold[MAX_NR_BANKS];
+u16 cmci_threshold[MAX_NR_BANKS];
 
 /*
  * High threshold to limit CMCI rate during storms. Max supported is
@@ -73,7 +73,6 @@ static u16 cmci_threshold[MAX_NR_BANKS];
  * to corrected errors, so keeping CMCI enabled means that uncorrected
  * errors will still be processed in a timely fashion.
  */
-#define CMCI_STORM_THRESHOLD	32749
 
 static int cmci_supported(int *banks)
 {
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 836e56027..086b833c5 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -7,7 +7,7 @@
 
 #include <linux/device.h>
 #include <asm/mce.h>
-
+#include <linux/spinlock.h>
 enum severity_level {
 	MCE_NO_SEVERITY,
 	MCE_DEFERRED_SEVERITY,
@@ -334,11 +334,16 @@ static __always_inline u32 mca_msr_reg(int bank, enum mca_msr reg)
 }
 
 extern void (*mc_poll_banks)(void);
+#define CMCI_STORM_THRESHOLD    32749
+extern raw_spinlock_t cmci_discover_lock;
+extern u16 cmci_threshold[MAX_NR_BANKS];
 #ifdef CONFIG_X86_MCE_ZHAOXIN
 void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c);
 void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c);
+void mce_zhaoxin_handle_storm(int bank, bool on);
 #else
 static inline void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c) { }
 static inline void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c) { }
+static inline void mce_zhaoxin_handle_storm(int bank, bool on) { }
 #endif
 #endif /* __X86_MCE_INTERNAL_H__ */
diff --git a/arch/x86/kernel/cpu/mce/threshold.c b/arch/x86/kernel/cpu/mce/threshold.c
index 89e31e1e5..200280387 100644
--- a/arch/x86/kernel/cpu/mce/threshold.c
+++ b/arch/x86/kernel/cpu/mce/threshold.c
@@ -63,6 +63,10 @@ static void mce_handle_storm(unsigned int bank, bool on)
 	case X86_VENDOR_INTEL:
 		mce_intel_handle_storm(bank, on);
 		break;
+	case X86_VENDOR_ZHAOXIN:
+	case X86_VENDOR_CENTAUR:
+		mce_zhaoxin_handle_storm(bank, on);
+		break;
 	}
 }
 
diff --git a/arch/x86/kernel/cpu/mce/zhaoxin.c b/arch/x86/kernel/cpu/mce/zhaoxin.c
index de69c560f..994b8520a 100644
--- a/arch/x86/kernel/cpu/mce/zhaoxin.c
+++ b/arch/x86/kernel/cpu/mce/zhaoxin.c
@@ -63,3 +63,21 @@ void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c)
 {
 	intel_clear_lmce();
 }
+
+void mce_zhaoxin_handle_storm(int bank, bool on)
+{
+	unsigned long flags;
+	u64 val;
+
+	raw_spin_lock_irqsave(&cmci_discover_lock, flags);
+	rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
+	if (on) {
+		val &= ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK);
+		val |= CMCI_STORM_THRESHOLD;
+	} else {
+		val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
+		val |= (MCI_CTL2_CMCI_EN | cmci_threshold[bank]);
+	}
+	wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
+	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
+}
-- 
2.34.1
RE: [PATCH v3 3/3] x86/mce: Add CMCI storm switching support for Zhaoxin
Posted by Zhuo, Qiuxu 2 months, 1 week ago
> From: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> [...]
> Subject: [PATCH v3 3/3] x86/mce: Add CMCI storm switching support for
> [...]
> --- a/arch/x86/kernel/cpu/mce/zhaoxin.c
> +++ b/arch/x86/kernel/cpu/mce/zhaoxin.c
> @@ -63,3 +63,21 @@ void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c)
> {
>  	intel_clear_lmce();
>  }
> +
> +void mce_zhaoxin_handle_storm(int bank, bool on) {
> +	unsigned long flags;
> +	u64 val;
> +
> +	raw_spin_lock_irqsave(&cmci_discover_lock, flags);
> +	rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
> +	if (on) {
> +		val &= ~(MCI_CTL2_CMCI_EN |
> MCI_CTL2_CMCI_THRESHOLD_MASK);
> +		val |= CMCI_STORM_THRESHOLD;
> +	} else {
> +		val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
> +		val |= (MCI_CTL2_CMCI_EN | cmci_threshold[bank]);
> +	}
> +	wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
> +	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags); }

Are there any reasons or comments why it needs to disable/enable the CMCI interrupt 
here during a CMCI storm on/off? If not, then reuse mce_intel_handle_storm() to avoid 
duplicating the code.

-Qiuxu
Re: [PATCH v3 3/3] x86/mce: Add CMCI storm switching support for Zhaoxin
Posted by Tony W Wang-oc 2 months, 1 week ago

On 2024/9/20 17:17, Zhuo, Qiuxu wrote:
> 
>> From: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
>> [...]
>> Subject: [PATCH v3 3/3] x86/mce: Add CMCI storm switching support for
>> [...]
>> --- a/arch/x86/kernel/cpu/mce/zhaoxin.c
>> +++ b/arch/x86/kernel/cpu/mce/zhaoxin.c
>> @@ -63,3 +63,21 @@ void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c)
>> {
>>        intel_clear_lmce();
>>   }
>> +
>> +void mce_zhaoxin_handle_storm(int bank, bool on) {
>> +     unsigned long flags;
>> +     u64 val;
>> +
>> +     raw_spin_lock_irqsave(&cmci_discover_lock, flags);
>> +     rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
>> +     if (on) {
>> +             val &= ~(MCI_CTL2_CMCI_EN |
>> MCI_CTL2_CMCI_THRESHOLD_MASK);
>> +             val |= CMCI_STORM_THRESHOLD;
>> +     } else {
>> +             val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
>> +             val |= (MCI_CTL2_CMCI_EN | cmci_threshold[bank]);
>> +     }
>> +     wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
>> +     raw_spin_unlock_irqrestore(&cmci_discover_lock, flags); }
> 
> Are there any reasons or comments why it needs to disable/enable the CMCI interrupt
> here during a CMCI storm on/off? If not, then reuse mce_intel_handle_storm() to avoid
> duplicating the code.
> 

As explained in another email.
The reason is actually mentioned in the cover letter: "because Zhaoxin's 
UCR error is not reported through CMCI", and we want to disable CMCI 
interrupt when CMCI storm happened.

Sincerely
TonyWWang-oc
RE: [PATCH v3 3/3] x86/mce: Add CMCI storm switching support for Zhaoxin
Posted by Zhuo, Qiuxu 2 months, 1 week ago
> From: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> [...]
> >> --- a/arch/x86/kernel/cpu/mce/zhaoxin.c
> >> +++ b/arch/x86/kernel/cpu/mce/zhaoxin.c
> >> @@ -63,3 +63,21 @@ void mce_zhaoxin_feature_clear(struct cpuinfo_x86
> >> *c) {
> >>        intel_clear_lmce();
> >>   }
> >> +
> >> +void mce_zhaoxin_handle_storm(int bank, bool on) {
> >> +     unsigned long flags;
> >> +     u64 val;
> >> +
> >> +     raw_spin_lock_irqsave(&cmci_discover_lock, flags);
> >> +     rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
> >> +     if (on) {
> >> +             val &= ~(MCI_CTL2_CMCI_EN |
> >> MCI_CTL2_CMCI_THRESHOLD_MASK);
> >> +             val |= CMCI_STORM_THRESHOLD;
> >> +     } else {
> >> +             val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
> >> +             val |= (MCI_CTL2_CMCI_EN | cmci_threshold[bank]);
> >> +     }
> >> +     wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
> >> +     raw_spin_unlock_irqrestore(&cmci_discover_lock, flags); }
> >
> > Are there any reasons or comments why it needs to disable/enable the
> > CMCI interrupt here during a CMCI storm on/off? If not, then reuse
> > mce_intel_handle_storm() to avoid duplicating the code.
> >
> 
> As explained in another email.
> The reason is actually mentioned in the cover letter: "because Zhaoxin's UCR
> error is not reported through CMCI", and we want to disable CMCI interrupt
> when CMCI storm happened.

So, this is just you want to disable CMCI when a CMCI storm happens. 
This doesn't explain much to me.
What's the problem if not disable CMCI when a CMCI storm happens?

Re: [PATCH v3 3/3] x86/mce: Add CMCI storm switching support for Zhaoxin
Posted by Yazen Ghannam 2 months, 1 week ago
On Fri, Sep 20, 2024 at 11:44:59AM +0000, Zhuo, Qiuxu wrote:
> > From: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> > [...]
> > >> --- a/arch/x86/kernel/cpu/mce/zhaoxin.c
> > >> +++ b/arch/x86/kernel/cpu/mce/zhaoxin.c
> > >> @@ -63,3 +63,21 @@ void mce_zhaoxin_feature_clear(struct cpuinfo_x86
> > >> *c) {
> > >>        intel_clear_lmce();
> > >>   }
> > >> +
> > >> +void mce_zhaoxin_handle_storm(int bank, bool on) {
> > >> +     unsigned long flags;
> > >> +     u64 val;
> > >> +
> > >> +     raw_spin_lock_irqsave(&cmci_discover_lock, flags);
> > >> +     rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
> > >> +     if (on) {
> > >> +             val &= ~(MCI_CTL2_CMCI_EN |
> > >> MCI_CTL2_CMCI_THRESHOLD_MASK);
> > >> +             val |= CMCI_STORM_THRESHOLD;
> > >> +     } else {
> > >> +             val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
> > >> +             val |= (MCI_CTL2_CMCI_EN | cmci_threshold[bank]);
> > >> +     }
> > >> +     wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
> > >> +     raw_spin_unlock_irqrestore(&cmci_discover_lock, flags); }
> > >
> > > Are there any reasons or comments why it needs to disable/enable the
> > > CMCI interrupt here during a CMCI storm on/off? If not, then reuse
> > > mce_intel_handle_storm() to avoid duplicating the code.
> > >
> > 
> > As explained in another email.
> > The reason is actually mentioned in the cover letter: "because Zhaoxin's UCR
> > error is not reported through CMCI", and we want to disable CMCI interrupt
> > when CMCI storm happened.
> 
> So, this is just you want to disable CMCI when a CMCI storm happens. 
> This doesn't explain much to me.
> What's the problem if not disable CMCI when a CMCI storm happens?
>

A more direct way to handle an interrupt storm is to turn off the
interrupt. The proposed AMD solution would also do this.

Reprogramming the threshold to a high value does not 100% guarantee that
a storm will be mitigated. But this is a necessary trade-off given that
the CMCI is used to report other error types on Intel systems.

Thanks,
Yazen
RE: [PATCH v3 3/3] x86/mce: Add CMCI storm switching support for Zhaoxin
Posted by Zhuo, Qiuxu 2 months, 1 week ago
> From: Yazen Ghannam <yazen.ghannam@amd.com>
> Sent: Friday, September 20, 2024 9:36 PM
> [...]
> > So, this is just you want to disable CMCI when a CMCI storm happens.
> > This doesn't explain much to me.
> > What's the problem if not disable CMCI when a CMCI storm happens?
> >
> 
> A more direct way to handle an interrupt storm is to turn off the interrupt. The
> proposed AMD solution would also do this.
> 
> Reprogramming the threshold to a high value does not 100% guarantee that a
> storm will be mitigated. But this is a necessary trade-off given that the CMCI is
> used to report other error types on Intel systems.

Thanks for your comments. 
Re: [PATCH v3 3/3] x86/mce: Add CMCI storm switching support for Zhaoxin
Posted by Yazen Ghannam 2 months, 1 week ago
On Wed, Sep 18, 2024 at 01:54:36PM +0800, Tony W Wang-oc wrote:
> From: Lyle Li <LyleLi@zhaoxin.com>
> 
> Zhaoxin CPUs support CMCI compatible with Intel, because
> Zhaoxin's UCR error is not reported through CMCI, and in
> order to be compatible with intel's CMCI code, so add Zhaoxin
> CMCI storm toggle to support the new CMCI storm switching
> in mce/intel.c, mce/zhaoxin.c, mce/threshold.c, and mce/internal.h.
> 
> Signed-off-by: Lyle Li <LyleLi@zhaoxin.com>
> Signed-off-by: Tony W Wang-oc <TonyWWang-oc@zhaoxin.com>
> ---
>  arch/x86/kernel/cpu/mce/intel.c     |  5 ++---
>  arch/x86/kernel/cpu/mce/internal.h  |  7 ++++++-
>  arch/x86/kernel/cpu/mce/threshold.c |  4 ++++
>  arch/x86/kernel/cpu/mce/zhaoxin.c   | 18 ++++++++++++++++++
>  4 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
> index b7e67f4f7..aa75e2848 100644
> --- a/arch/x86/kernel/cpu/mce/intel.c
> +++ b/arch/x86/kernel/cpu/mce/intel.c
> @@ -45,7 +45,7 @@ static DEFINE_PER_CPU(mce_banks_t, mce_banks_owned);
>   * cmci_discover_lock protects against parallel discovery attempts
>   * which could race against each other.
>   */
> -static DEFINE_RAW_SPINLOCK(cmci_discover_lock);
> +DEFINE_RAW_SPINLOCK(cmci_discover_lock);
>  
>  /*
>   * On systems that do support CMCI but it's disabled, polling for MCEs can
> @@ -61,7 +61,7 @@ static DEFINE_SPINLOCK(cmci_poll_lock);
>   * MCi_CTL2 threshold for each bank when there is no storm.
>   * Default value for each bank may have been set by BIOS.
>   */
> -static u16 cmci_threshold[MAX_NR_BANKS];
> +u16 cmci_threshold[MAX_NR_BANKS];
>  
>  /*
>   * High threshold to limit CMCI rate during storms. Max supported is
> @@ -73,7 +73,6 @@ static u16 cmci_threshold[MAX_NR_BANKS];
>   * to corrected errors, so keeping CMCI enabled means that uncorrected
>   * errors will still be processed in a timely fashion.
>   */
> -#define CMCI_STORM_THRESHOLD	32749
>  
>  static int cmci_supported(int *banks)
>  {
> diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
> index 836e56027..086b833c5 100644
> --- a/arch/x86/kernel/cpu/mce/internal.h
> +++ b/arch/x86/kernel/cpu/mce/internal.h
> @@ -7,7 +7,7 @@
>  
>  #include <linux/device.h>
>  #include <asm/mce.h>
> -
> +#include <linux/spinlock.h>
>  enum severity_level {
>  	MCE_NO_SEVERITY,
>  	MCE_DEFERRED_SEVERITY,
> @@ -334,11 +334,16 @@ static __always_inline u32 mca_msr_reg(int bank, enum mca_msr reg)
>  }
>  
>  extern void (*mc_poll_banks)(void);
> +#define CMCI_STORM_THRESHOLD    32749
> +extern raw_spinlock_t cmci_discover_lock;
> +extern u16 cmci_threshold[MAX_NR_BANKS];
>  #ifdef CONFIG_X86_MCE_ZHAOXIN
>  void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c);
>  void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c);
> +void mce_zhaoxin_handle_storm(int bank, bool on);
>  #else
>  static inline void mce_zhaoxin_feature_init(struct cpuinfo_x86 *c) { }
>  static inline void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c) { }
> +static inline void mce_zhaoxin_handle_storm(int bank, bool on) { }
>  #endif
>  #endif /* __X86_MCE_INTERNAL_H__ */
> diff --git a/arch/x86/kernel/cpu/mce/threshold.c b/arch/x86/kernel/cpu/mce/threshold.c
> index 89e31e1e5..200280387 100644
> --- a/arch/x86/kernel/cpu/mce/threshold.c
> +++ b/arch/x86/kernel/cpu/mce/threshold.c
> @@ -63,6 +63,10 @@ static void mce_handle_storm(unsigned int bank, bool on)
>  	case X86_VENDOR_INTEL:
>  		mce_intel_handle_storm(bank, on);
>  		break;
> +	case X86_VENDOR_ZHAOXIN:
> +	case X86_VENDOR_CENTAUR:
> +		mce_zhaoxin_handle_storm(bank, on);
> +		break;
>  	}
>  }
>  
> diff --git a/arch/x86/kernel/cpu/mce/zhaoxin.c b/arch/x86/kernel/cpu/mce/zhaoxin.c
> index de69c560f..994b8520a 100644
> --- a/arch/x86/kernel/cpu/mce/zhaoxin.c
> +++ b/arch/x86/kernel/cpu/mce/zhaoxin.c
> @@ -63,3 +63,21 @@ void mce_zhaoxin_feature_clear(struct cpuinfo_x86 *c)
>  {
>  	intel_clear_lmce();
>  }
> +
> +void mce_zhaoxin_handle_storm(int bank, bool on)
> +{
> +	unsigned long flags;
> +	u64 val;
> +
> +	raw_spin_lock_irqsave(&cmci_discover_lock, flags);
> +	rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
> +	if (on) {
> +		val &= ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK);
> +		val |= CMCI_STORM_THRESHOLD;
> +	} else {
> +		val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
> +		val |= (MCI_CTL2_CMCI_EN | cmci_threshold[bank]);
> +	}
> +	wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
> +	raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
> +}

Why does this need to be different than mce_intel_handle_storm()?

Thanks,
Yazen
Re: [PATCH v3 3/3] x86/mce: Add CMCI storm switching support for Zhaoxin
Posted by Tony W Wang-oc 2 months, 1 week ago

On 2024/9/19 22:06, Yazen Ghannam wrote:
> 
>> +void mce_zhaoxin_handle_storm(int bank, bool on)
>> +{
>> +     unsigned long flags;
>> +     u64 val;
>> +
>> +     raw_spin_lock_irqsave(&cmci_discover_lock, flags);
>> +     rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
>> +     if (on) {
>> +             val &= ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK);
>> +             val |= CMCI_STORM_THRESHOLD;
>> +     } else {
>> +             val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
>> +             val |= (MCI_CTL2_CMCI_EN | cmci_threshold[bank]);
>> +     }
>> +     wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
>> +     raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
>> +}
> 
> Why does this need to be different than mce_intel_handle_storm()?
> 

The reason is actually mentioned in the cover letter: "because Zhaoxin's 
UCR error is not reported through CMCI", and we want to disable CMCI 
interrupt when CMCI storm happened.

Sincerely
TonyWWang-oc
Re: [PATCH v3 3/3] x86/mce: Add CMCI storm switching support for Zhaoxin
Posted by Yazen Ghannam 2 months, 1 week ago
On Fri, Sep 20, 2024 at 06:42:15PM +0800, Tony W Wang-oc wrote:
> 
> 
> On 2024/9/19 22:06, Yazen Ghannam wrote:
> > 
> > > +void mce_zhaoxin_handle_storm(int bank, bool on)
> > > +{
> > > +     unsigned long flags;
> > > +     u64 val;
> > > +
> > > +     raw_spin_lock_irqsave(&cmci_discover_lock, flags);
> > > +     rdmsrl(MSR_IA32_MCx_CTL2(bank), val);
> > > +     if (on) {
> > > +             val &= ~(MCI_CTL2_CMCI_EN | MCI_CTL2_CMCI_THRESHOLD_MASK);
> > > +             val |= CMCI_STORM_THRESHOLD;
> > > +     } else {
> > > +             val &= ~MCI_CTL2_CMCI_THRESHOLD_MASK;
> > > +             val |= (MCI_CTL2_CMCI_EN | cmci_threshold[bank]);
> > > +     }
> > > +     wrmsrl(MSR_IA32_MCx_CTL2(bank), val);
> > > +     raw_spin_unlock_irqrestore(&cmci_discover_lock, flags);
> > > +}
> > 
> > Why does this need to be different than mce_intel_handle_storm()?
> > 
> 
> The reason is actually mentioned in the cover letter: "because Zhaoxin's UCR
> error is not reported through CMCI", and we want to disable CMCI interrupt
> when CMCI storm happened.
>

I see, so you want to disable the interrupt entirely rather than
reprogramming to a high value. Makes sense.

Thanks,
Yazen