arch/x86/include/asm/mce.h | 4 +- arch/x86/kernel/cpu/mce/amd.c | 18 +-- arch/x86/kernel/cpu/mce/core.c | 230 +++++++++++++++------------ arch/x86/kernel/cpu/mce/dev-mcelog.c | 11 +- arch/x86/kernel/cpu/mce/genpool.c | 9 +- arch/x86/kernel/cpu/mce/inject.c | 2 +- arch/x86/kernel/cpu/mce/intel.c | 11 +- arch/x86/kernel/cpu/mce/threshold.c | 2 +- 8 files changed, 149 insertions(+), 138 deletions(-)
Clean up some x86/mce code as below. No functional changes intended. - Simplify some code. - Remove some unnecessary code. - Improve readability for some code. - Convert some family/model mixed checks to VFM-based checks. - Fix some typos. Pass the following basic tests: - Tested on an Intel Sapphire Rapids server. - Compile test. - System boot test. - Correctable/uncorrectable memory errors can be notified via CMCI/MCE interrupts. - Correctable/uncorrectable memory errors can be dispatched to the mcelog daemon and the EDAC driver. Changes in v3: - Collect "Reviewed-by:" from Nikolay & Sohil. - Drop the "x86/mce: Remove the redundant zeroing assignments" patch. - 0003: Rename mce_notify_irq() to mce_notify_user(). - 0005: Move the 'int ret' variable along with the other int variables. - 0006: New patch. Break up __mcheck_cpu_apply_quirks(). - 0007: New patch. Convert family/model mixed checks to VFM-based checks. - 0009: Remove the variable names from the commit message. - 0010: Remove the detail typos from the commit message. Changes in v2: - Collect "Reviewed-by:" from Tony. - 0009: Add the missing variable names to the commit message. This series is based on v6.12-rc4. Thanks Tony, Dave, Sohil and Nikolay for your review and discussion on this series. Qiuxu Zhuo (9): x86/mce/dev-mcelog: Use xchg() to get and clear the flags x86/mce/intel: Use MCG_BANKCNT_MASK instead of 0xff x86/mce: Make several functions return bool and rename a function x86/mce/threshold: Remove the redundant this_cpu_dec_return() x86/mce/genpool: Make mce_gen_pool_create() return explicit error codes x86/mce: Convert family/model mixed checks to VFM-based checks x86/mce: Remove the unnecessary {} x86/mce/amd: Remove unnecessary NULL pointer initializations x86/mce: Fix typos Tony Luck (1): x86/mce: Break up __mcheck_cpu_apply_quirks() arch/x86/include/asm/mce.h | 4 +- arch/x86/kernel/cpu/mce/amd.c | 18 +-- arch/x86/kernel/cpu/mce/core.c | 230 +++++++++++++++------------ arch/x86/kernel/cpu/mce/dev-mcelog.c | 11 +- arch/x86/kernel/cpu/mce/genpool.c | 9 +- arch/x86/kernel/cpu/mce/inject.c | 2 +- arch/x86/kernel/cpu/mce/intel.c | 11 +- arch/x86/kernel/cpu/mce/threshold.c | 2 +- 8 files changed, 149 insertions(+), 138 deletions(-) base-commit: 42f7652d3eb527d03665b09edac47f85fb600924 -- 2.17.1
Clean up some x86/mce code as below. No functional changes intended. - Simplify some code. - Remove some unnecessary code. - Improve readability for some code. - Convert some family/model mixed checks to VFM-based checks. - Fix some typos. Pass the following basic tests: - Tested on an Intel Sapphire Rapids server. - Compile test. - System boot test. - Correctable/uncorrectable memory errors can be notified via CMCI/MCE interrupts. - Correctable/uncorrectable memory errors can be dispatched to the mcelog daemon and the EDAC driver. Changes in v4: - Drop the first two patches as they've landed in the TIP ras/core branch. - Drop "Make mce_gen_pool_create() return explicit error codes" patch. - 0001: Don't rename mce_notify_irq(). - 0003: New patch. Make four functions return bool. - 0004: Add necessary blank lines and directly use 'mca_cfg'. Changes in v3: - Collect "Reviewed-by:" from Nikolay & Sohil. - Drop the "x86/mce: Remove the redundant zeroing assignments" patch. - 0003: Rename mce_notify_irq() to mce_notify_user(). - 0005: Move the 'int ret' variable along with the other int variables. - 0006: New patch. Break up __mcheck_cpu_apply_quirks(). - 0007: New patch. Convert family/model mixed checks to VFM-based checks. - 0009: Remove the variables' names from the commit message. - 0010: Remove the detail typos from the commit message. Changes in v2: - Collect "Reviewed-by:" from Tony. - Update the commit message of patch 9 to include the names of all variables that don't need NULL pointer initializations. This series is based on the TIP branch ras/core with top commit 612c2addff36. Thanks Thomas, Boris, Tony, Dave, Sohil and Nikolay for your review and discussion on this series. Qiuxu Zhuo (7): x86/mce: Make several functions return bool x86/mce/threshold: Remove the redundant this_cpu_dec_return() x86/mce: Make four functions return bool x86/mce: Convert family/model mixed checks to VFM-based checks x86/mce: Remove the unnecessary {} x86/mce/amd: Remove unnecessary NULL pointer initializations x86/mce: Fix typos Tony Luck (1): x86/mce: Break up __mcheck_cpu_apply_quirks() arch/x86/include/asm/mce.h | 4 +- arch/x86/kernel/cpu/mce/amd.c | 18 +-- arch/x86/kernel/cpu/mce/core.c | 234 +++++++++++++++------------- arch/x86/kernel/cpu/mce/genpool.c | 29 ++-- arch/x86/kernel/cpu/mce/intel.c | 9 +- arch/x86/kernel/cpu/mce/internal.h | 4 +- arch/x86/kernel/cpu/mce/threshold.c | 2 +- 7 files changed, 158 insertions(+), 142 deletions(-) base-commit: 612c2addff367ee461dc99ffca2bc786f105d2ec -- 2.17.1
Make several functions that return 0 or 1 return a boolean value for
better readability.
No functional changes are intended.
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
Changes in v4:
- Don't rename mce_notify_irq() (Boris).
Changes in v3:
- Collect "Reviewed-by:" from Nikolay & Sohil.
- Rename mce_notify_irq() to mce_notify_user() (Sohil).
Changes in v2:
- Collect "Reviewed-by:" from Tony.
arch/x86/include/asm/mce.h | 4 ++--
arch/x86/kernel/cpu/mce/amd.c | 10 +++++-----
arch/x86/kernel/cpu/mce/core.c | 22 +++++++++++-----------
arch/x86/kernel/cpu/mce/intel.c | 9 +++++----
4 files changed, 23 insertions(+), 22 deletions(-)
diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 4543cf2eb5e8..ea9ca7689f6b 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -276,7 +276,7 @@ static inline void cmci_rediscover(void) {}
static inline void cmci_recheck(void) {}
#endif
-int mce_available(struct cpuinfo_x86 *c);
+bool mce_available(struct cpuinfo_x86 *c);
bool mce_is_memory_error(struct mce *m);
bool mce_is_correctable(struct mce *m);
bool mce_usable_address(struct mce *m);
@@ -296,7 +296,7 @@ enum mcp_flags {
void machine_check_poll(enum mcp_flags flags, mce_banks_t *b);
-int mce_notify_irq(void);
+bool mce_notify_irq(void);
DECLARE_PER_CPU(struct mce, injectm);
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 6ca80fff1fea..018874b554cb 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -381,7 +381,7 @@ static bool lvt_interrupt_supported(unsigned int bank, u32 msr_high_bits)
return msr_high_bits & BIT(28);
}
-static int lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi)
+static bool lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi)
{
int msr = (hi & MASK_LVTOFF_HI) >> 20;
@@ -389,7 +389,7 @@ static int lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi)
pr_err(FW_BUG "cpu %d, failed to setup threshold interrupt "
"for bank %d, block %d (MSR%08X=0x%x%08x)\n", b->cpu,
b->bank, b->block, b->address, hi, lo);
- return 0;
+ return false;
}
if (apic != msr) {
@@ -399,15 +399,15 @@ static int lvt_off_valid(struct threshold_block *b, int apic, u32 lo, u32 hi)
* was set is reserved. Return early here:
*/
if (mce_flags.smca)
- return 0;
+ return false;
pr_err(FW_BUG "cpu %d, invalid threshold interrupt offset %d "
"for bank %d, block %d (MSR%08X=0x%x%08x)\n",
b->cpu, apic, b->bank, b->block, b->address, hi, lo);
- return 0;
+ return false;
}
- return 1;
+ return true;
};
/* Reprogram MCx_MISC MSR behind this threshold bank. */
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 7fb5556a0b53..167965bd2ac0 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -492,10 +492,10 @@ static noinstr void mce_gather_info(struct mce_hw_err *err, struct pt_regs *regs
}
}
-int mce_available(struct cpuinfo_x86 *c)
+bool mce_available(struct cpuinfo_x86 *c)
{
if (mca_cfg.disabled)
- return 0;
+ return false;
return cpu_has(c, X86_FEATURE_MCE) && cpu_has(c, X86_FEATURE_MCA);
}
@@ -1778,7 +1778,7 @@ static void mce_timer_delete_all(void)
* Can be called from interrupt context, but not from machine check/NMI
* context.
*/
-int mce_notify_irq(void)
+bool mce_notify_irq(void)
{
/* Not more than two messages every minute */
static DEFINE_RATELIMIT_STATE(ratelimit, 60*HZ, 2);
@@ -1789,9 +1789,9 @@ int mce_notify_irq(void)
if (__ratelimit(&ratelimit))
pr_info(HW_ERR "Machine check events logged\n");
- return 1;
+ return true;
}
- return 0;
+ return false;
}
EXPORT_SYMBOL_GPL(mce_notify_irq);
@@ -2015,25 +2015,25 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
return 0;
}
-static int __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c)
+static bool __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c)
{
if (c->x86 != 5)
- return 0;
+ return false;
switch (c->x86_vendor) {
case X86_VENDOR_INTEL:
intel_p5_mcheck_init(c);
mce_flags.p5 = 1;
- return 1;
+ return true;
case X86_VENDOR_CENTAUR:
winchip_mcheck_init(c);
mce_flags.winchip = 1;
- return 1;
+ return true;
default:
- return 0;
+ return false;
}
- return 0;
+ return false;
}
/*
diff --git a/arch/x86/kernel/cpu/mce/intel.c b/arch/x86/kernel/cpu/mce/intel.c
index b3cd2c61b11d..f863df0ff42c 100644
--- a/arch/x86/kernel/cpu/mce/intel.c
+++ b/arch/x86/kernel/cpu/mce/intel.c
@@ -75,12 +75,12 @@ static u16 cmci_threshold[MAX_NR_BANKS];
*/
#define CMCI_STORM_THRESHOLD 32749
-static int cmci_supported(int *banks)
+static bool cmci_supported(int *banks)
{
u64 cap;
if (mca_cfg.cmci_disabled || mca_cfg.ignore_ce)
- return 0;
+ return false;
/*
* Vendor check is not strictly needed, but the initial
@@ -89,10 +89,11 @@ static int cmci_supported(int *banks)
*/
if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL &&
boot_cpu_data.x86_vendor != X86_VENDOR_ZHAOXIN)
- return 0;
+ return false;
if (!boot_cpu_has(X86_FEATURE_APIC) || lapic_get_maxlvt() < 6)
- return 0;
+ return false;
+
rdmsrl(MSR_IA32_MCG_CAP, cap);
*banks = min_t(unsigned, MAX_NR_BANKS, cap & MCG_BANKCNT_MASK);
return !!(cap & MCG_CMCI_P);
--
2.17.1
The 'storm' variable points to this_cpu_ptr(&storm_desc). Access the
'stormy_bank_count' field through the 'storm' to avoid calling
this_cpu_*() on the same per-CPU variable twice.
This minor optimization reduces the text size by 16 bytes.
$ size threshold.o.*
text data bss dec hex filename
1395 1664 0 3059 bf3 threshold.o.old
1379 1664 0 3043 be3 threshold.o.new
No functional changes intended.
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
Changes in v4:
- No changes.
Changes in v3:
- Collect "Reviewed-by:" from Nikolay & Sohil.
Changes in v2:
- Collect "Reviewed-by:" from Tony.
arch/x86/kernel/cpu/mce/threshold.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/mce/threshold.c b/arch/x86/kernel/cpu/mce/threshold.c
index 89e31e1e5c9c..f4a007616468 100644
--- a/arch/x86/kernel/cpu/mce/threshold.c
+++ b/arch/x86/kernel/cpu/mce/threshold.c
@@ -90,7 +90,7 @@ void cmci_storm_end(unsigned int bank)
storm->banks[bank].in_storm_mode = false;
/* If no banks left in storm mode, stop polling. */
- if (!this_cpu_dec_return(storm_desc.stormy_bank_count))
+ if (!--storm->stormy_bank_count)
mce_timer_kick(false);
}
--
2.17.1
Make those functions whose callers only care about success or failure
return a boolean value for better readability. Also, update the call
sites accordingly.
No functional changes.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
Changes in v4:
- New patch.
arch/x86/kernel/cpu/mce/core.c | 12 ++++++------
arch/x86/kernel/cpu/mce/genpool.c | 29 ++++++++++++++---------------
arch/x86/kernel/cpu/mce/internal.h | 4 ++--
3 files changed, 22 insertions(+), 23 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 167965bd2ac0..ce6fe5e20805 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -151,7 +151,7 @@ EXPORT_PER_CPU_SYMBOL_GPL(injectm);
void mce_log(struct mce_hw_err *err)
{
- if (!mce_gen_pool_add(err))
+ if (mce_gen_pool_add(err))
irq_work_queue(&mce_irq_work);
}
EXPORT_SYMBOL_GPL(mce_log);
@@ -1911,14 +1911,14 @@ static void __mcheck_cpu_check_banks(void)
}
/* Add per CPU specific workarounds here */
-static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
+static bool __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
{
struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
struct mca_config *cfg = &mca_cfg;
if (c->x86_vendor == X86_VENDOR_UNKNOWN) {
pr_info("unknown CPU type - not enabling MCE support\n");
- return -EOPNOTSUPP;
+ return false;
}
/* This should be disabled by the BIOS, but isn't always */
@@ -2012,7 +2012,7 @@ static int __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
if (cfg->bootlog != 0)
cfg->panic_timeout = 30;
- return 0;
+ return true;
}
static bool __mcheck_cpu_ancient_init(struct cpuinfo_x86 *c)
@@ -2279,12 +2279,12 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c)
__mcheck_cpu_cap_init();
- if (__mcheck_cpu_apply_quirks(c) < 0) {
+ if (!__mcheck_cpu_apply_quirks(c)) {
mca_cfg.disabled = 1;
return;
}
- if (mce_gen_pool_init()) {
+ if (!mce_gen_pool_init()) {
mca_cfg.disabled = 1;
pr_emerg("Couldn't allocate MCE records pool!\n");
return;
diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
index d0be6dda0c14..3ca9c007a666 100644
--- a/arch/x86/kernel/cpu/mce/genpool.c
+++ b/arch/x86/kernel/cpu/mce/genpool.c
@@ -94,64 +94,63 @@ bool mce_gen_pool_empty(void)
return llist_empty(&mce_event_llist);
}
-int mce_gen_pool_add(struct mce_hw_err *err)
+bool mce_gen_pool_add(struct mce_hw_err *err)
{
struct mce_evt_llist *node;
if (filter_mce(&err->m))
- return -EINVAL;
+ return false;
if (!mce_evt_pool)
- return -EINVAL;
+ return false;
node = (void *)gen_pool_alloc(mce_evt_pool, sizeof(*node));
if (!node) {
pr_warn_ratelimited("MCE records pool full!\n");
- return -ENOMEM;
+ return false;
}
memcpy(&node->err, err, sizeof(*err));
llist_add(&node->llnode, &mce_event_llist);
- return 0;
+ return true;
}
-static int mce_gen_pool_create(void)
+static bool mce_gen_pool_create(void)
{
int mce_numrecords, mce_poolsz, order;
struct gen_pool *gpool;
- int ret = -ENOMEM;
void *mce_pool;
order = order_base_2(sizeof(struct mce_evt_llist));
gpool = gen_pool_create(order, -1);
if (!gpool)
- return ret;
+ return false;
mce_numrecords = max(MCE_MIN_ENTRIES, num_possible_cpus() * MCE_PER_CPU);
mce_poolsz = mce_numrecords * (1 << order);
mce_pool = kmalloc(mce_poolsz, GFP_KERNEL);
if (!mce_pool) {
gen_pool_destroy(gpool);
- return ret;
+ return false;
}
- ret = gen_pool_add(gpool, (unsigned long)mce_pool, mce_poolsz, -1);
- if (ret) {
+
+ if (gen_pool_add(gpool, (unsigned long)mce_pool, mce_poolsz, -1)) {
gen_pool_destroy(gpool);
kfree(mce_pool);
- return ret;
+ return false;
}
mce_evt_pool = gpool;
- return ret;
+ return true;
}
-int mce_gen_pool_init(void)
+bool mce_gen_pool_init(void)
{
/* Just init mce_gen_pool once. */
if (mce_evt_pool)
- return 0;
+ return true;
return mce_gen_pool_create();
}
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 84f810598231..95a504ece43e 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -31,8 +31,8 @@ struct mce_evt_llist {
void mce_gen_pool_process(struct work_struct *__unused);
bool mce_gen_pool_empty(void);
-int mce_gen_pool_add(struct mce_hw_err *err);
-int mce_gen_pool_init(void);
+bool mce_gen_pool_add(struct mce_hw_err *err);
+bool mce_gen_pool_init(void);
struct llist_node *mce_gen_pool_prepare_records(void);
int mce_severity(struct mce *a, struct pt_regs *regs, char **msg, bool is_excp);
--
2.17.1
On 11/10/2024 10:04 PM, Qiuxu Zhuo wrote: > Make those functions whose callers only care about success or failure > return a boolean value for better readability. Also, update the call > sites accordingly. > > No functional changes. > > Suggested-by: Thomas Gleixner <tglx@linutronix.de> > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > --- Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> > Changes in v4: > - New patch. > > arch/x86/kernel/cpu/mce/core.c | 12 ++++++------ > arch/x86/kernel/cpu/mce/genpool.c | 29 ++++++++++++++--------------- > arch/x86/kernel/cpu/mce/internal.h | 4 ++-- > 3 files changed, 22 insertions(+), 23 deletions(-) > > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c > index 167965bd2ac0..ce6fe5e20805 100644 > --- a/arch/x86/kernel/cpu/mce/core.c > +++ b/arch/x86/kernel/cpu/mce/core.c > @@ -151,7 +151,7 @@ EXPORT_PER_CPU_SYMBOL_GPL(injectm); > > void mce_log(struct mce_hw_err *err) > { > - if (!mce_gen_pool_add(err)) > + if (mce_gen_pool_add(err)) > irq_work_queue(&mce_irq_work); > } The polarities of all the return values have been flipped. They all look correct but took me a few minutes to figure out. Might be useful to make a mention in the commit message if you end up doing another version (Not needed otherwise). > EXPORT_SYMBOL_GPL(mce_log);
From: Tony Luck <tony.luck@intel.com>
Split each vendor specific part into its own helper function.
Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
Changes in v4:
- Add necessary blank lines in apply_quirks_amd() (Yazen).
- Use 'mca_cfg' instead of 'cfg' in apply_quirks_*(). (Yazen).
Changes in v3:
- New patch.
arch/x86/kernel/cpu/mce/core.c | 192 ++++++++++++++++++---------------
1 file changed, 104 insertions(+), 88 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index ce6fe5e20805..3855ec2ed0e0 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1910,101 +1910,117 @@ static void __mcheck_cpu_check_banks(void)
}
}
+static void apply_quirks_amd(struct cpuinfo_x86 *c)
+{
+ struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
+
+ /* This should be disabled by the BIOS, but isn't always */
+ if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) {
+ /*
+ * disable GART TBL walk error reporting, which
+ * trips off incorrectly with the IOMMU & 3ware
+ * & Cerberus:
+ */
+ clear_bit(10, (unsigned long *)&mce_banks[4].ctl);
+ }
+
+ if (c->x86 < 0x11 && mca_cfg.bootlog < 0) {
+ /*
+ * Lots of broken BIOS around that don't clear them
+ * by default and leave crap in there. Don't log:
+ */
+ mca_cfg.bootlog = 0;
+ }
+
+ /*
+ * Various K7s with broken bank 0 around. Always disable
+ * by default.
+ */
+ if (c->x86 == 6 && this_cpu_read(mce_num_banks) > 0)
+ mce_banks[0].ctl = 0;
+
+ /*
+ * overflow_recov is supported for F15h Models 00h-0fh
+ * even though we don't have a CPUID bit for it.
+ */
+ if (c->x86 == 0x15 && c->x86_model <= 0xf)
+ mce_flags.overflow_recov = 1;
+
+ if (c->x86 >= 0x17 && c->x86 <= 0x1A)
+ mce_flags.zen_ifu_quirk = 1;
+}
+
+static void apply_quirks_intel(struct cpuinfo_x86 *c)
+{
+ struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
+
+ /*
+ * SDM documents that on family 6 bank 0 should not be written
+ * because it aliases to another special BIOS controlled
+ * register.
+ * But it's not aliased anymore on model 0x1a+
+ * Don't ignore bank 0 completely because there could be a
+ * valid event later, merely don't write CTL0.
+ */
+ if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0)
+ mce_banks[0].init = false;
+
+ /*
+ * All newer Intel systems support MCE broadcasting. Enable
+ * synchronization with a one second timeout.
+ */
+ if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
+ mca_cfg.monarch_timeout < 0)
+ mca_cfg.monarch_timeout = USEC_PER_SEC;
+
+ /*
+ * There are also broken BIOSes on some Pentium M and
+ * earlier systems:
+ */
+ if (c->x86 == 6 && c->x86_model <= 13 && mca_cfg.bootlog < 0)
+ mca_cfg.bootlog = 0;
+
+ if (c->x86_vfm == INTEL_SANDYBRIDGE_X)
+ mce_flags.snb_ifu_quirk = 1;
+
+ /*
+ * Skylake, Cascacde Lake and Cooper Lake require a quirk on
+ * rep movs.
+ */
+ if (c->x86_vfm == INTEL_SKYLAKE_X)
+ mce_flags.skx_repmov_quirk = 1;
+}
+
+static void apply_quirks_zhaoxin(struct cpuinfo_x86 *c)
+{
+ /*
+ * All newer Zhaoxin CPUs support MCE broadcasting. Enable
+ * synchronization with a one second timeout.
+ */
+ if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
+ if (mca_cfg.monarch_timeout < 0)
+ mca_cfg.monarch_timeout = USEC_PER_SEC;
+ }
+}
+
/* Add per CPU specific workarounds here */
static bool __mcheck_cpu_apply_quirks(struct cpuinfo_x86 *c)
{
- struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
struct mca_config *cfg = &mca_cfg;
- if (c->x86_vendor == X86_VENDOR_UNKNOWN) {
+ switch (c->x86_vendor) {
+ case X86_VENDOR_UNKNOWN:
pr_info("unknown CPU type - not enabling MCE support\n");
return false;
- }
-
- /* This should be disabled by the BIOS, but isn't always */
- if (c->x86_vendor == X86_VENDOR_AMD) {
- if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) {
- /*
- * disable GART TBL walk error reporting, which
- * trips off incorrectly with the IOMMU & 3ware
- * & Cerberus:
- */
- clear_bit(10, (unsigned long *)&mce_banks[4].ctl);
- }
- if (c->x86 < 0x11 && cfg->bootlog < 0) {
- /*
- * Lots of broken BIOS around that don't clear them
- * by default and leave crap in there. Don't log:
- */
- cfg->bootlog = 0;
- }
- /*
- * Various K7s with broken bank 0 around. Always disable
- * by default.
- */
- if (c->x86 == 6 && this_cpu_read(mce_num_banks) > 0)
- mce_banks[0].ctl = 0;
-
- /*
- * overflow_recov is supported for F15h Models 00h-0fh
- * even though we don't have a CPUID bit for it.
- */
- if (c->x86 == 0x15 && c->x86_model <= 0xf)
- mce_flags.overflow_recov = 1;
-
- if (c->x86 >= 0x17 && c->x86 <= 0x1A)
- mce_flags.zen_ifu_quirk = 1;
-
- }
-
- if (c->x86_vendor == X86_VENDOR_INTEL) {
- /*
- * SDM documents that on family 6 bank 0 should not be written
- * because it aliases to another special BIOS controlled
- * register.
- * But it's not aliased anymore on model 0x1a+
- * Don't ignore bank 0 completely because there could be a
- * valid event later, merely don't write CTL0.
- */
-
- if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0)
- mce_banks[0].init = false;
-
- /*
- * All newer Intel systems support MCE broadcasting. Enable
- * synchronization with a one second timeout.
- */
- if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
- cfg->monarch_timeout < 0)
- cfg->monarch_timeout = USEC_PER_SEC;
-
- /*
- * There are also broken BIOSes on some Pentium M and
- * earlier systems:
- */
- if (c->x86 == 6 && c->x86_model <= 13 && cfg->bootlog < 0)
- cfg->bootlog = 0;
-
- if (c->x86_vfm == INTEL_SANDYBRIDGE_X)
- mce_flags.snb_ifu_quirk = 1;
-
- /*
- * Skylake, Cascacde Lake and Cooper Lake require a quirk on
- * rep movs.
- */
- if (c->x86_vfm == INTEL_SKYLAKE_X)
- mce_flags.skx_repmov_quirk = 1;
- }
-
- if (c->x86_vendor == X86_VENDOR_ZHAOXIN) {
- /*
- * All newer Zhaoxin CPUs support MCE broadcasting. Enable
- * synchronization with a one second timeout.
- */
- if (c->x86 > 6 || (c->x86_model == 0x19 || c->x86_model == 0x1f)) {
- if (cfg->monarch_timeout < 0)
- cfg->monarch_timeout = USEC_PER_SEC;
- }
+ case X86_VENDOR_AMD:
+ apply_quirks_amd(c);
+ break;
+ case X86_VENDOR_INTEL:
+ apply_quirks_intel(c);
+ break;
+ case X86_VENDOR_ZHAOXIN:
+ apply_quirks_zhaoxin(c);
+ break;
}
if (cfg->monarch_timeout < 0)
--
2.17.1
On 11/10/2024 10:04 PM, Qiuxu Zhuo wrote: > From: Tony Luck <tony.luck@intel.com> > > Split each vendor specific part into its own helper function. > > Tested-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > Signed-off-by: Tony Luck <tony.luck@intel.com> > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > --- Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Convert family/model mixed checks to VFM-based checks to make
the code more compact.
Suggested-by: Sohil Mehta <sohil.mehta@intel.com>
Suggested-by: Dave Hansen <dave.hansen@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
Changes in v4:
- No changes but rebased.
Changes in v3:
- Newly added.
arch/x86/kernel/cpu/mce/core.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 3855ec2ed0e0..d288cc7390f6 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1954,6 +1954,10 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c)
{
struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
+ /* Older CPUs (prior to family 6) don't need quirks. */
+ if (c->x86_vfm < INTEL_PENTIUM_PRO)
+ return;
+
/*
* SDM documents that on family 6 bank 0 should not be written
* because it aliases to another special BIOS controlled
@@ -1962,22 +1966,21 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c)
* Don't ignore bank 0 completely because there could be a
* valid event later, merely don't write CTL0.
*/
- if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0)
+ if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0)
mce_banks[0].init = false;
/*
* All newer Intel systems support MCE broadcasting. Enable
* synchronization with a one second timeout.
*/
- if ((c->x86 > 6 || (c->x86 == 6 && c->x86_model >= 0xe)) &&
- mca_cfg.monarch_timeout < 0)
+ if (c->x86_vfm >= INTEL_CORE_YONAH && mca_cfg.monarch_timeout < 0)
mca_cfg.monarch_timeout = USEC_PER_SEC;
/*
* There are also broken BIOSes on some Pentium M and
* earlier systems:
*/
- if (c->x86 == 6 && c->x86_model <= 13 && mca_cfg.bootlog < 0)
+ if (c->x86_vfm < INTEL_CORE_YONAH && mca_cfg.bootlog < 0)
mca_cfg.bootlog = 0;
if (c->x86_vfm == INTEL_SANDYBRIDGE_X)
--
2.17.1
On Mon, Nov 11, 2024 at 02:04:25PM +0800, Qiuxu Zhuo wrote: > Convert family/model mixed checks to VFM-based checks to make > the code more compact. > > Suggested-by: Sohil Mehta <sohil.mehta@intel.com> > Suggested-by: Dave Hansen <dave.hansen@intel.com> > Reviewed-by: Tony Luck <tony.luck@intel.com> > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > --- > Changes in v4: > - No changes but rebased. > > Changes in v3: > - Newly added. > > arch/x86/kernel/cpu/mce/core.c | 11 +++++++---- > 1 file changed, 7 insertions(+), 4 deletions(-) > > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c > index 3855ec2ed0e0..d288cc7390f6 100644 > --- a/arch/x86/kernel/cpu/mce/core.c > +++ b/arch/x86/kernel/cpu/mce/core.c > @@ -1954,6 +1954,10 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c) > { > struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array); > > + /* Older CPUs (prior to family 6) don't need quirks. */ > + if (c->x86_vfm < INTEL_PENTIUM_PRO) > + return; > + Is it possible for pre-"family 6" to get here? Family 5 is "ancient" which has its own MCE init path. And I assume anything older doesn't support MCE/MCA. Is this correct? > /* > * SDM documents that on family 6 bank 0 should not be written > * because it aliases to another special BIOS controlled > @@ -1962,22 +1966,21 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c) > * Don't ignore bank 0 completely because there could be a > * valid event later, merely don't write CTL0. > */ > - if (c->x86 == 6 && c->x86_model < 0x1A && this_cpu_read(mce_num_banks) > 0) > + if (c->x86_vfm < INTEL_NEHALEM_EP && this_cpu_read(mce_num_banks) > 0) The "> 0" is not needed, since mce_num_banks is unsigned int. Otherwise, looks good. Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Thanks, Yazen
Hi Yazen, > From: Yazen Ghannam <yazen.ghannam@amd.com> > [...] > > @@ -1954,6 +1954,10 @@ static void apply_quirks_intel(struct > > cpuinfo_x86 *c) { > > struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array); > > > > + /* Older CPUs (prior to family 6) don't need quirks. */ > > + if (c->x86_vfm < INTEL_PENTIUM_PRO) > > + return; > > + > > Is it possible for pre-"family 6" to get here? > > Family 5 is "ancient" which has its own MCE init path. And I assume anything > older doesn't support MCE/MCA. Is this correct? Yes, there is an early return in __mcheck_cpu_ancient_init() for Family 5. However, this code explicitly indicates that "prior to families 6 don't need quirks" and addresses concerns like: https://lore.kernel.org/all/dcfdba92-7004-413d-8011-12771636d11f@intel.com/ > > /* > > * SDM documents that on family 6 bank 0 should not be written > > * because it aliases to another special BIOS controlled @@ -1962,22 > > +1966,21 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c) > > * Don't ignore bank 0 completely because there could be a > > * valid event later, merely don't write CTL0. > > */ > > - if (c->x86 == 6 && c->x86_model < 0x1A && > this_cpu_read(mce_num_banks) > 0) > > + if (c->x86_vfm < INTEL_NEHALEM_EP && > this_cpu_read(mce_num_banks) > > > +0) > > The "> 0" is not needed, since mce_num_banks is unsigned int. I don't get your point here. But it needs to check for the case where mce_num_banks == 0. > Otherwise, looks good. > > Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Thanks! -Qiuxu
On Wed, Nov 13, 2024 at 12:10:31PM +0000, Zhuo, Qiuxu wrote: > Hi Yazen, > > > From: Yazen Ghannam <yazen.ghannam@amd.com> > > [...] > > > @@ -1954,6 +1954,10 @@ static void apply_quirks_intel(struct > > > cpuinfo_x86 *c) { > > > struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array); > > > > > > + /* Older CPUs (prior to family 6) don't need quirks. */ > > > + if (c->x86_vfm < INTEL_PENTIUM_PRO) > > > + return; > > > + > > > > Is it possible for pre-"family 6" to get here? > > > > Family 5 is "ancient" which has its own MCE init path. And I assume anything > > older doesn't support MCE/MCA. Is this correct? > > Yes, there is an early return in __mcheck_cpu_ancient_init() for Family 5. > However, this code explicitly indicates that "prior to families 6 don't need quirks" > and addresses concerns like: > > https://lore.kernel.org/all/dcfdba92-7004-413d-8011-12771636d11f@intel.com/ > Right, but my point is that this check would never be executed, since the older systems would not get here during init. So this seems like dead code. > > > /* > > > * SDM documents that on family 6 bank 0 should not be written > > > * because it aliases to another special BIOS controlled @@ -1962,22 > > > +1966,21 @@ static void apply_quirks_intel(struct cpuinfo_x86 *c) > > > * Don't ignore bank 0 completely because there could be a > > > * valid event later, merely don't write CTL0. > > > */ > > > - if (c->x86 == 6 && c->x86_model < 0x1A && > > this_cpu_read(mce_num_banks) > 0) > > > + if (c->x86_vfm < INTEL_NEHALEM_EP && > > this_cpu_read(mce_num_banks) > > > > +0) > > > > The "> 0" is not needed, since mce_num_banks is unsigned int. > > I don't get your point here. > But it needs to check for the case where mce_num_banks == 0. > The check is "mce_num_banks > 0", and mce_num_banks is an unsigned int. Therefore, the check is reduced to "mce_num_banks != 0". In this case, you can just do "if (mce_num_banks)" to the same effect. Thanks, Yazen
Hi Yazen, > From: Yazen Ghannam <yazen.ghannam@amd.com> > [...] > > > > + if (c->x86_vfm < INTEL_NEHALEM_EP && > > > this_cpu_read(mce_num_banks) > > > > > +0) > > > > > > The "> 0" is not needed, since mce_num_banks is unsigned int. > > > > I don't get your point here. > > But it needs to check for the case where mce_num_banks == 0. > > > > The check is "mce_num_banks > 0", and mce_num_banks is an unsigned int. > Therefore, the check is reduced to "mce_num_banks != 0". In this case, you > can just do "if (mce_num_banks)" to the same effect. I got you. OK, if nobody else objects, I'll update it in the next version. [ Hope others won't blame this as over-optimization. ] -Qiuxu
On 11/10/2024 10:04 PM, Qiuxu Zhuo wrote: > Convert family/model mixed checks to VFM-based checks to make > the code more compact. > > Suggested-by: Sohil Mehta <sohil.mehta@intel.com> > Suggested-by: Dave Hansen <dave.hansen@intel.com> > Reviewed-by: Tony Luck <tony.luck@intel.com> > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > --- Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Remove the unnecessary {} from the case statement.
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
Changes in v4:
- No changes.
Changes in v3:
- Collect "Reviewed-by:" from Nikolay & Sohil.
Changes in v2:
- Collect "Reviewed-by:" from Tony.
arch/x86/kernel/cpu/mce/core.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index d288cc7390f6..0f0c6e9d9183 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -2118,10 +2118,9 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c)
mce_intel_feature_init(c);
break;
- case X86_VENDOR_AMD: {
+ case X86_VENDOR_AMD:
mce_amd_feature_init(c);
break;
- }
case X86_VENDOR_HYGON:
mce_hygon_feature_init(c);
--
2.17.1
On Mon, Nov 11, 2024 at 02:04:26PM +0800, Qiuxu Zhuo wrote: > Remove the unnecessary {} from the case statement. > > Reviewed-by: Tony Luck <tony.luck@intel.com> > Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> > Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> But please see note below. > --- > Changes in v4: > - No changes. > > Changes in v3: > - Collect "Reviewed-by:" from Nikolay & Sohil. > > Changes in v2: > - Collect "Reviewed-by:" from Tony. > > arch/x86/kernel/cpu/mce/core.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c > index d288cc7390f6..0f0c6e9d9183 100644 > --- a/arch/x86/kernel/cpu/mce/core.c > +++ b/arch/x86/kernel/cpu/mce/core.c > @@ -2118,10 +2118,9 @@ static void __mcheck_cpu_init_vendor(struct cpuinfo_x86 *c) > mce_intel_feature_init(c); > break; > > - case X86_VENDOR_AMD: { > + case X86_VENDOR_AMD: > mce_amd_feature_init(c); > break; > - } > > case X86_VENDOR_HYGON: > mce_hygon_feature_init(c); > -- I think this could be a bit more substantive if you also combine the AMD and HYGON cases. And remove mce_hygon_feature_init() which just calls mce_amd_feature_init() anyway. Thanks, Yazen
Hi Yazen, > From: Yazen Ghannam <yazen.ghannam@amd.com> > [...] > > Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Thanks! > But please see note below. > [...] > > - case X86_VENDOR_AMD: { > > + case X86_VENDOR_AMD: > > mce_amd_feature_init(c); > > break; > > - } > > > > case X86_VENDOR_HYGON: > > mce_hygon_feature_init(c); > > -- > > I think this could be a bit more substantive if you also combine the AMD and > HYGON cases. And remove mce_hygon_feature_init() which just calls > mce_amd_feature_init() anyway. How about a separate patch for this? If It's OK for you, I'll follow up on it after the current patch series has settled. -Qiuxu
On Wed, Nov 13, 2024 at 01:32:08PM +0000, Zhuo, Qiuxu wrote: > How about a separate patch for this? Can you drop this micro-change per patch? Just do a single patch here which fixes up everything mentioned during review in that area and be done with it. -- Regards/Gruss, Boris. https://people.kernel.org/tglx/notes-about-netiquette
Hi Boris, > From: Borislav Petkov <bp@alien8.de> > [...] > > Can you drop this micro-change per patch? Just do a single patch here which > fixes up everything mentioned during review in that area and be done with it. OK. I'll drop this one and replace it with Yazen's suggestion in next version. -Qiuxu
On Wed, Nov 13, 2024 at 01:32:08PM +0000, Zhuo, Qiuxu wrote: > Hi Yazen, > > > From: Yazen Ghannam <yazen.ghannam@amd.com> > > [...] > > > > Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> > > Thanks! > > > But please see note below. > > [...] > > > - case X86_VENDOR_AMD: { > > > + case X86_VENDOR_AMD: > > > mce_amd_feature_init(c); > > > break; > > > - } > > > > > > case X86_VENDOR_HYGON: > > > mce_hygon_feature_init(c); > > > -- > > > > I think this could be a bit more substantive if you also combine the AMD and > > HYGON cases. And remove mce_hygon_feature_init() which just calls > > mce_amd_feature_init() anyway. > > How about a separate patch for this? > If It's OK for you, I'll follow up on it after the current patch series has settled. > Sure thing. No problem. Thanks, Yazen
Remove unnecessary NULL pointer initializations from variables that
are already initialized before use.
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
Changes in v4:
- No changes.
Changes in v3:
- Collect "Reviewed-by:" Nikolay & Sohil.
- Remove the variables' names from the commit message (Sohil).
arch/x86/kernel/cpu/mce/amd.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 018874b554cb..c79a82912d38 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -921,8 +921,8 @@ static void log_and_reset_block(struct threshold_block *block)
*/
static void amd_threshold_interrupt(void)
{
- struct threshold_block *first_block = NULL, *block = NULL, *tmp = NULL;
struct threshold_bank **bp = this_cpu_read(threshold_banks);
+ struct threshold_block *first_block, *block, *tmp;
unsigned int bank, cpu = smp_processor_id();
/*
@@ -1201,8 +1201,7 @@ static int allocate_threshold_blocks(unsigned int cpu, struct threshold_bank *tb
static int __threshold_add_blocks(struct threshold_bank *b)
{
struct list_head *head = &b->blocks->miscj;
- struct threshold_block *pos = NULL;
- struct threshold_block *tmp = NULL;
+ struct threshold_block *pos, *tmp;
int err = 0;
err = kobject_add(&b->blocks->kobj, b->kobj, b->blocks->kobj.name);
@@ -1312,8 +1311,7 @@ static void deallocate_threshold_blocks(struct threshold_bank *bank)
static void __threshold_remove_blocks(struct threshold_bank *b)
{
- struct threshold_block *pos = NULL;
- struct threshold_block *tmp = NULL;
+ struct threshold_block *pos, *tmp;
kobject_put(b->kobj);
--
2.17.1
On Mon, Nov 11, 2024 at 02:04:27PM +0800, Qiuxu Zhuo wrote: > Remove unnecessary NULL pointer initializations from variables that > are already initialized before use. > > Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> > Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > --- Reviewed-by: Yazen Ghannam <yazen.ghannam@amd.com> Thanks, Yazen
Fix typos in comments.
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Nikolay Borisov <nik.borisov@suse.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
---
Changes in v4:
- No changes.
Changes in v3:
- Collect "Reviewed-by:" from Nikolay & Sohil.
- Remove the detail typos from the commit message (Sohil).
Changes in v2:
- Collect "Reviewed-by:" from Tony.
arch/x86/kernel/cpu/mce/core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 0f0c6e9d9183..6e194ccffc7c 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -1144,7 +1144,7 @@ static noinstr int mce_start(int *no_way_out)
} else {
/*
* Subject: Now start the scanning loop one by one in
- * the original callin order.
+ * the original calling order.
* This way when there are any shared banks it will be
* only seen by one CPU before cleared, avoiding duplicates.
*/
@@ -1917,7 +1917,7 @@ static void apply_quirks_amd(struct cpuinfo_x86 *c)
/* This should be disabled by the BIOS, but isn't always */
if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) {
/*
- * disable GART TBL walk error reporting, which
+ * disable GART TLB walk error reporting, which
* trips off incorrectly with the IOMMU & 3ware
* & Cerberus:
*/
--
2.17.1
On Mon, Nov 11, 2024 at 02:04:28PM +0800, Qiuxu Zhuo wrote: > Fix typos in comments. > > Reviewed-by: Tony Luck <tony.luck@intel.com> > Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> > Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > --- > Changes in v4: > - No changes. > > Changes in v3: > - Collect "Reviewed-by:" from Nikolay & Sohil. > - Remove the detail typos from the commit message (Sohil). > > Changes in v2: > - Collect "Reviewed-by:" from Tony. > > arch/x86/kernel/cpu/mce/core.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c > index 0f0c6e9d9183..6e194ccffc7c 100644 > --- a/arch/x86/kernel/cpu/mce/core.c > +++ b/arch/x86/kernel/cpu/mce/core.c > @@ -1144,7 +1144,7 @@ static noinstr int mce_start(int *no_way_out) > } else { > /* > * Subject: Now start the scanning loop one by one in > - * the original callin order. > + * the original calling order. I don't think this is a typo. It seems to refer to the mce_callin variable/idea. For example, each CPU "calls in" when ready. This is independent of when each CPU is "called" to do something. CPUs are called in this order 0, 1, 2. CPUs "call in" in this order 1, 0, 2. When a CPU is called can be different from when it responds. Maybe I'm reading too much into this. :/ > * This way when there are any shared banks it will be > * only seen by one CPU before cleared, avoiding duplicates. > */ > @@ -1917,7 +1917,7 @@ static void apply_quirks_amd(struct cpuinfo_x86 *c) > /* This should be disabled by the BIOS, but isn't always */ > if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) { > /* > - * disable GART TBL walk error reporting, which > + * disable GART TLB walk error reporting, which This also is not a typo. TBL -> table From old AMD K8 BKDG document: 10 GartTblWkEn GART Table Walk Error Reporting Enable R/W 0 Thanks, Yazen
Hi Yazen, > From: Yazen Ghannam <yazen.ghannam@amd.com> > [...] > > @@ -1144,7 +1144,7 @@ static noinstr int mce_start(int *no_way_out) > > } else { > > /* > > * Subject: Now start the scanning loop one by one in > > - * the original callin order. > > + * the original calling order. > > I don't think this is a typo. It seems to refer to the mce_callin variable/idea. > > For example, each CPU "calls in" when ready. This is independent of when > each CPU is "called" to do something. > > CPUs are called in this order 0, 1, 2. > CPUs "call in" in this order 1, 0, 2. > > When a CPU is called can be different from when it responds. > > Maybe I'm reading too much into this. :/ Too finicky to me :( But I appreciate you sharing your reading and thoughts. 😊 -Qiuxu
On 11/12/2024 7:38 AM, Yazen Ghannam wrote: > >> * This way when there are any shared banks it will be >> * only seen by one CPU before cleared, avoiding duplicates. >> */ >> @@ -1917,7 +1917,7 @@ static void apply_quirks_amd(struct cpuinfo_x86 *c) >> /* This should be disabled by the BIOS, but isn't always */ >> if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) { >> /* >> - * disable GART TBL walk error reporting, which >> + * disable GART TLB walk error reporting, which > > This also is not a typo. TBL -> table > > From old AMD K8 BKDG document: > 10 GartTblWkEn GART Table Walk Error Reporting Enable R/W 0 > There is another comment in init_amd_gh() that seems to be related to the same thing and similarly worded. That seems to refer to TLB instead of TBL(table). /* * Disable GART TLB Walk Errors on Fam10h. We do this here because this * is always needed when GART is enabled, even in a kernel which has no * MCE support built in. BIOS should disable GartTlbWlk Errors already. * If it doesn't, we do it here as suggested by the BKDG. * * Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=33012 */ msr_set_bit(MSR_AMD64_MCx_MASK(4), 10);
On Tue, Nov 12, 2024 at 02:35:32PM -0800, Sohil Mehta wrote: > On 11/12/2024 7:38 AM, Yazen Ghannam wrote: > > > >> * This way when there are any shared banks it will be > >> * only seen by one CPU before cleared, avoiding duplicates. > >> */ > >> @@ -1917,7 +1917,7 @@ static void apply_quirks_amd(struct cpuinfo_x86 *c) > >> /* This should be disabled by the BIOS, but isn't always */ > >> if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) { > >> /* > >> - * disable GART TBL walk error reporting, which > >> + * disable GART TLB walk error reporting, which > > > > This also is not a typo. TBL -> table > > > > From old AMD K8 BKDG document: > > 10 GartTblWkEn GART Table Walk Error Reporting Enable R/W 0 > > > > There is another comment in init_amd_gh() that seems to be related to > the same thing and similarly worded. That seems to refer to TLB instead > of TBL(table). > > /* > * Disable GART TLB Walk Errors on Fam10h. We do this here because this > * is always needed when GART is enabled, even in a kernel which has no > * MCE support built in. BIOS should disable GartTlbWlk Errors already. > * If it doesn't, we do it here as suggested by the BKDG. > * > * Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=33012 > */ > msr_set_bit(MSR_AMD64_MCx_MASK(4), 10); > > Now I think *that* is a typo, since it doesn't match the documentation. :) Thanks, Yazen
> From: Yazen Ghannam <yazen.ghannam@amd.com> > [...] > > >> - * disable GART TBL walk error reporting, which > > >> + * disable GART TLB walk error reporting, which > > > > > > This also is not a typo. TBL -> table > > > > > > From old AMD K8 BKDG document: > > > 10 GartTblWkEn GART Table Walk Error Reporting Enable R/W 0 > > > > > > > There is another comment in init_amd_gh() that seems to be related to > > the same thing and similarly worded. That seems to refer to TLB > > instead of TBL(table). > > > > /* > > * Disable GART TLB Walk Errors on Fam10h. We do this here because > > this > > * is always needed when GART is enabled, even in a kernel which has > > no > > * MCE support built in. BIOS should disable GartTlbWlk Errors already. > > * If it doesn't, we do it here as suggested by the BKDG. > > * > > * Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=33012 > > */ Thanks Sohil for this reference. > > msr_set_bit(MSR_AMD64_MCx_MASK(4), 10); > > > > > > Now I think *that* is a typo, since it doesn't match the documentation. > :) As Yazen has the documentation that says TBL (table) is the intended meaning. I'll drop this patch in next version. -Qiuxu
On 11/10/2024 10:04 PM, Qiuxu Zhuo wrote: > Fix typos in comments. > > Reviewed-by: Tony Luck <tony.luck@intel.com> > Reviewed-by: Nikolay Borisov <nik.borisov@suse.com> > Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> > Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> > --- It might have been fine to merge this with patch 6/8. They both touch the same file and the changes are mostly superficial. Probably consider it if you do another revision but not needed otherwise. > arch/x86/kernel/cpu/mce/core.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c > index 0f0c6e9d9183..6e194ccffc7c 100644 > --- a/arch/x86/kernel/cpu/mce/core.c > +++ b/arch/x86/kernel/cpu/mce/core.c > @@ -1144,7 +1144,7 @@ static noinstr int mce_start(int *no_way_out) > } else { > /* > * Subject: Now start the scanning loop one by one in > - * the original callin order. > + * the original calling order. > * This way when there are any shared banks it will be > * only seen by one CPU before cleared, avoiding duplicates. > */ > @@ -1917,7 +1917,7 @@ static void apply_quirks_amd(struct cpuinfo_x86 *c) > /* This should be disabled by the BIOS, but isn't always */ > if (c->x86 == 15 && this_cpu_read(mce_num_banks) > 4) { > /* > - * disable GART TBL walk error reporting, which > + * disable GART TLB walk error reporting, which > * trips off incorrectly with the IOMMU & 3ware > * & Cerberus: > */
© 2016 - 2024 Red Hat, Inc.