From nobody Sun Nov 24 20:34:58 2024
Date: Fri, 01 Nov 2024 11:41:06 -0000
From: "tip-bot2 for Avadhut Naik"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: ras/core] x86/mce: Add wrapper for struct mce to export vendor specific info
Cc: "Borislav Petkov (AMD)", Avadhut Naik, Yazen Ghannam, Qiuxu Zhuo, x86@kernel.org, linux-kernel@vger.kernel.org
In-Reply-To: <20241022194158.110073-2-avadhut.naik@amd.com>
References: <20241022194158.110073-2-avadhut.naik@amd.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Message-ID: <173046126668.3137.12340046489566276647.tip-bot2@tip-bot2>
Content-Type: text/plain; charset="utf-8"

The following commit has been merged into the ras/core branch of tip:

Commit-ID:     750fd23926f1507cc826b5a4fdd4bfc7283e7723
Gitweb:        https://git.kernel.org/tip/750fd23926f1507cc826b5a4fdd4bfc7283e7723
Author:        Avadhut Naik
AuthorDate:    Tue, 22 Oct 2024 19:36:27
Committer:     Borislav Petkov (AMD)
CommitterDate: Wed, 30 Oct 2024 17:18:59 +01:00

x86/mce: Add wrapper for struct mce to export vendor specific info

Currently, exporting new additional machine check error information
involves adding new fields for the same at the end of the struct mce.
This additional information can then be consumed through mcelog or
tracepoint.

However, as new MSRs are being added (and will be added in the future)
by CPU vendors on their newer CPUs with additional machine check error
information to be exported, the size of struct mce will balloon on some
CPUs, unnecessarily, since those fields are vendor-specific. Moreover,
different CPU vendors may export the additional information in varying
sizes.

The problem particularly intensifies since struct mce is exposed to
userspace as part of UAPI. Its bloating through vendor-specific data
should be avoided to limit the information being sent out to userspace.

Add a new structure mce_hw_err to wrap the existing struct mce. The same
will prevent its ballooning since vendor-specific data, if any, can now be
exported through a union within the wrapper structure and through
__dynamic_array in mce_record tracepoint.

Furthermore, new internal kernel fields can be added to the wrapper
struct without impacting the user space API.

[ bp: Restore reverse x-mas tree order of function vars declarations. ]

Suggested-by: Borislav Petkov (AMD)
Signed-off-by: Avadhut Naik
Signed-off-by: Yazen Ghannam
Signed-off-by: Borislav Petkov (AMD)
Reviewed-by: Qiuxu Zhuo
Link: https://lore.kernel.org/r/20241022194158.110073-2-avadhut.naik@amd.com
---
 arch/x86/include/asm/mce.h         |  14 +-
 arch/x86/kernel/cpu/mce/amd.c      |  27 ++--
 arch/x86/kernel/cpu/mce/apei.c     |  45 +++---
 arch/x86/kernel/cpu/mce/core.c     | 207 +++++++++++++++-------------
 arch/x86/kernel/cpu/mce/genpool.c  |  18 +-
 arch/x86/kernel/cpu/mce/inject.c   |   6 +-
 arch/x86/kernel/cpu/mce/internal.h |   4 +-
 include/trace/events/mce.h         |  42 +++---
 8 files changed, 202 insertions(+), 161 deletions(-)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index 3b99701..4e45f45 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -187,6 +187,16 @@ enum mce_notifier_prios {
 	MCE_PRIO_HIGHEST = MCE_PRIO_CEC
 };
 
+/**
+ * struct mce_hw_err - Hardware Error Record.
+ * @m: Machine Check record.
+ */
+struct mce_hw_err {
+	struct mce m;
+};
+
+#define to_mce_hw_err(mce) container_of(mce, struct mce_hw_err, m)
+
 struct notifier_block;
 extern void mce_register_decode_chain(struct notifier_block *nb);
 extern void mce_unregister_decode_chain(struct notifier_block *nb);
@@ -221,8 +231,8 @@ static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
 					     u64 lapic_id) { return -EINVAL; }
 #endif
 
-void mce_prep_record(struct mce *m);
-void mce_log(struct mce *m);
+void mce_prep_record(struct mce_hw_err *err);
+void mce_log(struct mce_hw_err *err);
 DECLARE_PER_CPU(struct device *, mce_device);
 
 /* Maximum number of MCA banks per CPU. */
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c
index 14bf8c2..5b4d266 100644
--- a/arch/x86/kernel/cpu/mce/amd.c
+++ b/arch/x86/kernel/cpu/mce/amd.c
@@ -778,29 +778,30 @@ bool amd_mce_usable_address(struct mce *m)
 
 static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc)
 {
-	struct mce m;
+	struct mce_hw_err err;
+	struct mce *m = &err.m;
 
-	mce_prep_record(&m);
+	mce_prep_record(&err);
 
-	m.status = status;
-	m.misc = misc;
-	m.bank = bank;
-	m.tsc = rdtsc();
+	m->status = status;
+	m->misc = misc;
+	m->bank = bank;
+	m->tsc = rdtsc();
 
-	if (m.status & MCI_STATUS_ADDRV) {
-		m.addr = addr;
+	if (m->status & MCI_STATUS_ADDRV) {
+		m->addr = addr;
 
-		smca_extract_err_addr(&m);
+		smca_extract_err_addr(m);
 	}
 
 	if (mce_flags.smca) {
-		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m.ipid);
+		rdmsrl(MSR_AMD64_SMCA_MCx_IPID(bank), m->ipid);
 
-		if (m.status & MCI_STATUS_SYNDV)
-			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m.synd);
+		if (m->status & MCI_STATUS_SYNDV)
+			rdmsrl(MSR_AMD64_SMCA_MCx_SYND(bank), m->synd);
 	}
 
-	mce_log(&m);
+	mce_log(&err);
 }
 
 DEFINE_IDTENTRY_SYSVEC(sysvec_deferred_error)
diff --git a/arch/x86/kernel/cpu/mce/apei.c b/arch/x86/kernel/cpu/mce/apei.c
index 3885fe0..7f582b4 100644
--- a/arch/x86/kernel/cpu/mce/apei.c
+++ b/arch/x86/kernel/cpu/mce/apei.c
@@ -28,7 +28,8 @@
 
 void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 {
-	struct mce m;
+	struct mce_hw_err err;
+	struct mce *m;
 	int lsb;
 
 	if (!(mem_err->validation_bits & CPER_MEM_VALID_PA))
@@ -44,22 +45,23 @@ void apei_mce_report_mem_error(int severity, struct cper_sec_mem_err *mem_err)
 	else
 		lsb = PAGE_SHIFT;
 
-	mce_prep_record(&m);
-	m.bank = -1;
+	mce_prep_record(&err);
+	m = &err.m;
+	m->bank = -1;
 	/* Fake a memory read error with unknown channel */
-	m.status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f;
-	m.misc = (MCI_MISC_ADDR_PHYS << 6) | lsb;
+	m->status = MCI_STATUS_VAL | MCI_STATUS_EN | MCI_STATUS_ADDRV | MCI_STATUS_MISCV | 0x9f;
+	m->misc = (MCI_MISC_ADDR_PHYS << 6) | lsb;
 
 	if (severity >= GHES_SEV_RECOVERABLE)
-		m.status |= MCI_STATUS_UC;
+		m->status |= MCI_STATUS_UC;
 
 	if (severity >= GHES_SEV_PANIC) {
-		m.status |= MCI_STATUS_PCC;
-		m.tsc = rdtsc();
+		m->status |= MCI_STATUS_PCC;
+		m->tsc = rdtsc();
 	}
 
-	m.addr = mem_err->physical_addr;
-	mce_log(&m);
+	m->addr = mem_err->physical_addr;
+	mce_log(&err);
 }
 EXPORT_SYMBOL_GPL(apei_mce_report_mem_error);
 
@@ -67,8 +69,9 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 {
 	const u64 *i_mce = ((const u64 *) (ctx_info + 1));
 	bool apicid_found = false;
+	struct mce_hw_err err;
 	unsigned int cpu;
-	struct mce m;
+	struct mce *m;
 
 	if (!boot_cpu_has(X86_FEATURE_SMCA))
 		return -EINVAL;
@@ -108,18 +111,20 @@ int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info, u64 lapic_id)
 	if (!apicid_found)
 		return -EINVAL;
 
-	mce_prep_record_common(&m);
-	mce_prep_record_per_cpu(cpu, &m);
+	m = &err.m;
+	memset(&err, 0, sizeof(struct mce_hw_err));
+	mce_prep_record_common(m);
+	mce_prep_record_per_cpu(cpu, m);
 
-	m.bank = (ctx_info->msr_addr >> 4) & 0xFF;
-	m.status = *i_mce;
-	m.addr = *(i_mce + 1);
-	m.misc = *(i_mce + 2);
+	m->bank = (ctx_info->msr_addr >> 4) & 0xFF;
+	m->status = *i_mce;
+	m->addr = *(i_mce + 1);
+	m->misc = *(i_mce + 2);
 	/* Skipping MCA_CONFIG */
-	m.ipid = *(i_mce + 4);
-	m.synd = *(i_mce + 5);
+	m->ipid = *(i_mce + 4);
+	m->synd = *(i_mce + 5);
 
-	mce_log(&m);
+	mce_log(&err);
 
 	return 0;
 }
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2a938f4..28e28b6 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -88,7 +88,7 @@ struct mca_config mca_cfg __read_mostly = {
 	.monarch_timeout = -1
 };
 
-static DEFINE_PER_CPU(struct mce, mces_seen);
+static DEFINE_PER_CPU(struct mce_hw_err, hw_errs_seen);
 static unsigned long mce_need_notify;
 
 /*
@@ -119,8 +119,6 @@ BLOCKING_NOTIFIER_HEAD(x86_mce_decoder_chain);
 
 void mce_prep_record_common(struct mce *m)
 {
-	memset(m, 0, sizeof(struct mce));
-
 	m->cpuid = cpuid_eax(1);
 	m->cpuvendor = boot_cpu_data.x86_vendor;
 	m->mcgcap = __rdmsr(MSR_IA32_MCG_CAP);
@@ -138,9 +136,12 @@ void mce_prep_record_per_cpu(unsigned int cpu, struct mce *m)
 	m->socketid = topology_physical_package_id(cpu);
 }
 
-/* Do initial initialization of a struct mce */
-void mce_prep_record(struct mce *m)
+/* Do initial initialization of struct mce_hw_err */
+void mce_prep_record(struct mce_hw_err *err)
 {
+	struct mce *m = &err->m;
+
+	memset(err, 0, sizeof(struct mce_hw_err));
 	mce_prep_record_common(m);
 	mce_prep_record_per_cpu(smp_processor_id(), m);
 }
@@ -148,9 +149,9 @@ void mce_prep_record(struct mce *m)
 DEFINE_PER_CPU(struct mce, injectm);
 EXPORT_PER_CPU_SYMBOL_GPL(injectm);
 
-void mce_log(struct mce *m)
+void mce_log(struct mce_hw_err *err)
 {
-	if (!mce_gen_pool_add(m))
+	if (!mce_gen_pool_add(err))
 		irq_work_queue(&mce_irq_work);
 }
 EXPORT_SYMBOL_GPL(mce_log);
@@ -171,8 +172,10 @@ void mce_unregister_decode_chain(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(mce_unregister_decode_chain);
 
-static void __print_mce(struct mce *m)
+static void __print_mce(struct mce_hw_err *err)
 {
+	struct mce *m = &err->m;
+
 	pr_emerg(HW_ERR "CPU %d: Machine Check%s: %Lx Bank %d: %016Lx\n",
 		 m->extcpu,
 		 (m->mcgstatus & MCG_STATUS_MCIP ? " Exception" : ""),
@@ -214,9 +217,11 @@ static void __print_mce(struct mce *m)
 		m->microcode);
 }
 
-static void print_mce(struct mce *m)
+static void print_mce(struct mce_hw_err *err)
 {
-	__print_mce(m);
+	struct mce *m = &err->m;
+
+	__print_mce(err);
 
 	if (m->cpuvendor != X86_VENDOR_AMD && m->cpuvendor != X86_VENDOR_HYGON)
 		pr_emerg_ratelimited(HW_ERR "Run the above through 'mcelog --ascii'\n");
@@ -251,7 +256,7 @@ static const char *mce_dump_aux_info(struct mce *m)
 	return NULL;
 }
 
-static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
+static noinstr void mce_panic(const char *msg, struct mce_hw_err *final, char *exp)
 {
 	struct llist_node *pending;
 	struct mce_evt_llist *l;
@@ -282,20 +287,22 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	pending = mce_gen_pool_prepare_records();
 	/* First print corrected ones that are still unlogged */
 	llist_for_each_entry(l, pending, llnode) {
-		struct mce *m = &l->mce;
+		struct mce_hw_err *err = &l->err;
+		struct mce *m = &err->m;
 		if (!(m->status & MCI_STATUS_UC)) {
-			print_mce(m);
+			print_mce(err);
 			if (!apei_err)
 				apei_err = apei_write_mce(m);
 		}
 	}
 	/* Now print uncorrected but with the final one last */
 	llist_for_each_entry(l, pending, llnode) {
-		struct mce *m = &l->mce;
+		struct mce_hw_err *err = &l->err;
+		struct mce *m = &err->m;
 		if (!(m->status & MCI_STATUS_UC))
 			continue;
-		if (!final || mce_cmp(m, final)) {
-			print_mce(m);
+		if (!final || mce_cmp(m, &final->m)) {
+			print_mce(err);
 			if (!apei_err)
 				apei_err = apei_write_mce(m);
 		}
@@ -303,12 +310,12 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	if (final) {
 		print_mce(final);
 		if (!apei_err)
-			apei_err = apei_write_mce(final);
+			apei_err = apei_write_mce(&final->m);
 	}
 	if (exp)
 		pr_emerg(HW_ERR "Machine check: %s\n", exp);
 
-	memmsg = mce_dump_aux_info(final);
+	memmsg = mce_dump_aux_info(&final->m);
 	if (memmsg)
 		pr_emerg(HW_ERR "Machine check: %s\n", memmsg);
 
@@ -323,9 +330,9 @@ static noinstr void mce_panic(const char *msg, struct mce *final, char *exp)
 	 * panic.
 	 */
 	if (kexec_crash_loaded()) {
-		if (final && (final->status & MCI_STATUS_ADDRV)) {
+		if (final && (final->m.status & MCI_STATUS_ADDRV)) {
 			struct page *p;
-			p = pfn_to_online_page(final->addr >> PAGE_SHIFT);
+			p = pfn_to_online_page(final->m.addr >> PAGE_SHIFT);
 			if (p)
 				SetPageHWPoison(p);
 		}
@@ -445,16 +452,18 @@ static noinstr void mce_wrmsrl(u32 msr, u64 v)
  * check into our "mce" struct so that we can use it later to assess
  * the severity of the problem as we read per-bank specific details.
  */
-static noinstr void mce_gather_info(struct mce *m, struct pt_regs *regs)
+static noinstr void mce_gather_info(struct mce_hw_err *err, struct pt_regs *regs)
 {
+	struct mce *m;
 	/*
 	 * Enable instrumentation around mce_prep_record() which calls external
 	 * facilities.
 	 */
 	instrumentation_begin();
-	mce_prep_record(m);
+	mce_prep_record(err);
 	instrumentation_end();
 
+	m = &err->m;
 	m->mcgstatus = mce_rdmsrl(MSR_IA32_MCG_STATUS);
 	if (regs) {
 		/*
@@ -574,13 +583,13 @@ EXPORT_SYMBOL_GPL(mce_is_correctable);
 static int mce_early_notifier(struct notifier_block *nb, unsigned long val,
 			      void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce_hw_err *err = to_mce_hw_err(data);
 
-	if (!m)
+	if (!err)
 		return NOTIFY_DONE;
 
 	/* Emit the trace record: */
-	trace_mce_record(m);
+	trace_mce_record(err);
 
 	set_bit(0, &mce_need_notify);
 
@@ -624,13 +633,13 @@ static struct notifier_block mce_uc_nb = {
 static int mce_default_notifier(struct notifier_block *nb, unsigned long val,
 				void *data)
 {
-	struct mce *m = (struct mce *)data;
+	struct mce_hw_err *err = to_mce_hw_err(data);
 
-	if (!m)
+	if (!err)
 		return NOTIFY_DONE;
 
-	if (mca_cfg.print_all || !m->kflags)
-		__print_mce(m);
+	if (mca_cfg.print_all || !(err->m.kflags))
+		__print_mce(err);
 
 	return NOTIFY_DONE;
 }
@@ -644,8 +653,10 @@ static struct notifier_block mce_default_nb = {
 /*
  * Read ADDR and MISC registers.
  */
-static noinstr void mce_read_aux(struct mce *m, int i)
+static noinstr void mce_read_aux(struct mce_hw_err *err, int i)
 {
+	struct mce *m = &err->m;
+
 	if (m->status & MCI_STATUS_MISCV)
 		m->misc = mce_rdmsrl(mca_msr_reg(i, MCA_MISC));
 
@@ -692,26 +703,28 @@ DEFINE_PER_CPU(unsigned, mce_poll_count);
 void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 {
 	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
-	struct mce m;
+	struct mce_hw_err err;
+	struct mce *m;
 	int i;
 
 	this_cpu_inc(mce_poll_count);
 
-	mce_gather_info(&m, NULL);
+	mce_gather_info(&err, NULL);
+	m = &err.m;
 
 	if (flags & MCP_TIMESTAMP)
-		m.tsc = rdtsc();
+		m->tsc = rdtsc();
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
 		if (!mce_banks[i].ctl || !test_bit(i, *b))
 			continue;
 
-		m.misc = 0;
-		m.addr = 0;
-		m.bank = i;
+		m->misc = 0;
+		m->addr = 0;
+		m->bank = i;
 
 		barrier();
-		m.status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS));
+		m->status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS));
 
 		/*
 		 * Update storm tracking here, before checking for the
@@ -721,17 +734,17 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		 * storm status.
 		 */
 		if (!mca_cfg.cmci_disabled)
-			mce_track_storm(&m);
+			mce_track_storm(m);
 
 		/* If this entry is not valid, ignore it */
-		if (!(m.status & MCI_STATUS_VAL))
+		if (!(m->status & MCI_STATUS_VAL))
 			continue;
 
 		/*
 		 * If we are logging everything (at CPU online) or this
 		 * is a corrected error, then we must log it.
 		 */
-		if ((flags & MCP_UC) || !(m.status & MCI_STATUS_UC))
+		if ((flags & MCP_UC) || !(m->status & MCI_STATUS_UC))
 			goto log_it;
 
 		/*
@@ -741,20 +754,20 @@ void machine_check_poll(enum mcp_flags flags, mce_banks_t *b)
 		 * everything else.
 		 */
 		if (!mca_cfg.ser) {
-			if (m.status & MCI_STATUS_UC)
+			if (m->status & MCI_STATUS_UC)
 				continue;
 			goto log_it;
 		}
 
 		/* Log "not enabled" (speculative) errors */
-		if (!(m.status & MCI_STATUS_EN))
+		if (!(m->status & MCI_STATUS_EN))
 			goto log_it;
 
 		/*
 		 * Log UCNA (SDM: 15.6.3 "UCR Error Classification")
 		 * UC == 1 && PCC == 0 && S == 0
 		 */
-		if (!(m.status & MCI_STATUS_PCC) && !(m.status & MCI_STATUS_S))
+		if (!(m->status & MCI_STATUS_PCC) && !(m->status & MCI_STATUS_S))
 			goto log_it;
 
 		/*
@@ -768,20 +781,20 @@ log_it:
 		if (flags & MCP_DONTLOG)
 			goto clear_it;
 
-		mce_read_aux(&m, i);
-		m.severity = mce_severity(&m, NULL, NULL, false);
+		mce_read_aux(&err, i);
+		m->severity = mce_severity(m, NULL, NULL, false);
 		/*
 		 * Don't get the IP here because it's unlikely to
 		 * have anything to do with the actual error location.
 		 */
 
-		if (mca_cfg.dont_log_ce && !mce_usable_address(&m))
+		if (mca_cfg.dont_log_ce && !mce_usable_address(m))
 			goto clear_it;
 
 		if (flags & MCP_QUEUE_LOG)
-			mce_gen_pool_add(&m);
+			mce_gen_pool_add(&err);
 		else
-			mce_log(&m);
+			mce_log(&err);
 
 clear_it:
 		/*
@@ -905,9 +918,10 @@ static __always_inline void quirk_zen_ifu(int bank, struct mce *m, struct pt_reg
  * Do a quick check if any of the events requires a panic.
  * This decides if we keep the events around or clear them.
  */
-static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp,
+static __always_inline int mce_no_way_out(struct mce_hw_err *err, char **msg, unsigned long *validp,
 					  struct pt_regs *regs)
 {
+	struct mce *m = &err->m;
 	char *tmp = *msg;
 	int i;
 
@@ -925,7 +939,7 @@ static __always_inline int mce_no_way_out(struct mce *m, char **msg, unsigned lo
 
 		m->bank = i;
 		if (mce_severity(m, regs, &tmp, true) >= MCE_PANIC_SEVERITY) {
-			mce_read_aux(m, i);
+			mce_read_aux(err, i);
 			*msg = tmp;
 			return 1;
 		}
@@ -1016,10 +1030,11 @@ out:
  */
 static void mce_reign(void)
 {
-	int cpu;
+	struct mce_hw_err *err = NULL;
 	struct mce *m = NULL;
 	int global_worst = 0;
 	char *msg = NULL;
+	int cpu;
 
 	/*
 	 * This CPU is the Monarch and the other CPUs have run
@@ -1027,11 +1042,13 @@ static void mce_reign(void)
 	 * Grade the severity of the errors of all the CPUs.
 	 */
 	for_each_possible_cpu(cpu) {
-		struct mce *mtmp = &per_cpu(mces_seen, cpu);
+		struct mce_hw_err *etmp = &per_cpu(hw_errs_seen, cpu);
+		struct mce *mtmp = &etmp->m;
 
 		if (mtmp->severity > global_worst) {
 			global_worst = mtmp->severity;
-			m = &per_cpu(mces_seen, cpu);
+			err = &per_cpu(hw_errs_seen, cpu);
+			m = &err->m;
 		}
 	}
 
@@ -1043,7 +1060,7 @@ static void mce_reign(void)
 	if (m && global_worst >= MCE_PANIC_SEVERITY) {
 		/* call mce_severity() to get "msg" for panic */
 		mce_severity(m, NULL, &msg, true);
-		mce_panic("Fatal machine check", m, msg);
+		mce_panic("Fatal machine check", err, msg);
 	}
 
 	/*
@@ -1060,11 +1077,11 @@ static void mce_reign(void)
 		mce_panic("Fatal machine check from unknown source", NULL, NULL);
 
 	/*
-	 * Now clear all the mces_seen so that they don't reappear on
+	 * Now clear all the hw_errs_seen so that they don't reappear on
 	 * the next mce.
 	 */
 	for_each_possible_cpu(cpu)
-		memset(&per_cpu(mces_seen, cpu), 0, sizeof(struct mce));
+		memset(&per_cpu(hw_errs_seen, cpu), 0, sizeof(struct mce_hw_err));
 }
 
 static atomic_t global_nwo;
@@ -1268,13 +1285,14 @@ static noinstr bool mce_check_crashing_cpu(void)
 }
 
 static __always_inline int
-__mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *final,
-		unsigned long *toclear, unsigned long *valid_banks, int no_way_out,
-		int *worst)
+__mc_scan_banks(struct mce_hw_err *err, struct pt_regs *regs,
+		struct mce_hw_err *final, unsigned long *toclear,
+		unsigned long *valid_banks, int no_way_out, int *worst)
 {
 	struct mce_bank *mce_banks = this_cpu_ptr(mce_banks_array);
 	struct mca_config *cfg = &mca_cfg;
 	int severity, i, taint = 0;
+	struct mce *m = &err->m;
 
 	for (i = 0; i < this_cpu_read(mce_num_banks); i++) {
 		arch___clear_bit(i, toclear);
@@ -1319,7 +1337,7 @@ __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *final,
 		if (severity == MCE_NO_SEVERITY)
 			continue;
 
-		mce_read_aux(m, i);
+		mce_read_aux(err, i);
 
 		/* assuming valid severity level != 0 */
 		m->severity = severity;
@@ -1329,17 +1347,17 @@ __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *final,
 		 * done in #MC context, where instrumentation is disabled.
 		 */
 		instrumentation_begin();
-		mce_log(m);
+		mce_log(err);
 		instrumentation_end();
 
 		if (severity > *worst) {
-			*final = *m;
+			*final = *err;
 			*worst = severity;
 		}
 	}
 
 	/* mce_clear_state will clear *final, save locally for use later */
-	*m = *final;
+	*err = *final;
 
 	return taint;
 }
@@ -1399,9 +1417,10 @@ static void kill_me_never(struct callback_head *cb)
 	set_mce_nospec(pfn);
 }
 
-static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callback_head *))
+static void queue_task_work(struct mce_hw_err *err, char *msg, void (*func)(struct callback_head *))
 {
 	int count = ++current->mce_count;
+	struct mce *m = &err->m;
 
 	/* First call, save all the details */
 	if (count == 1) {
@@ -1414,11 +1433,12 @@ static void queue_task_work(struct mce *m, char *msg, void (*func)(struct callba
 
 	/* Ten is likely overkill. Don't expect more than two faults before task_work() */
 	if (count > 10)
-		mce_panic("Too many consecutive machine checks while accessing user data", m, msg);
+		mce_panic("Too many consecutive machine checks while accessing user data",
+			  err, msg);
 
 	/* Second or later call, make sure page address matches the one from first call */
 	if (count > 1 && (current->mce_addr >> PAGE_SHIFT) != (m->addr >> PAGE_SHIFT))
-		mce_panic("Consecutive machine checks to different user pages", m, msg);
+		mce_panic("Consecutive machine checks to different user pages", err, msg);
 
 	/* Do not call task_work_add() more than once */
 	if (count > 1)
@@ -1467,8 +1487,10 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	int worst = 0, order, no_way_out, kill_current_task, lmce, taint = 0;
 	DECLARE_BITMAP(valid_banks, MAX_NR_BANKS) = { 0 };
 	DECLARE_BITMAP(toclear, MAX_NR_BANKS) = { 0 };
-	struct mce m, *final;
+	struct mce_hw_err *final;
+	struct mce_hw_err err;
 	char *msg = NULL;
+	struct mce *m;
 
 	if (unlikely(mce_flags.p5))
 		return pentium_machine_check(regs);
@@ -1506,13 +1528,14 @@ noinstr void do_machine_check(struct pt_regs *regs)
 
 	this_cpu_inc(mce_exception_count);
 
-	mce_gather_info(&m, regs);
-	m.tsc = rdtsc();
+	mce_gather_info(&err, regs);
+	m = &err.m;
+	m->tsc = rdtsc();
 
-	final = this_cpu_ptr(&mces_seen);
-	*final = m;
+	final = this_cpu_ptr(&hw_errs_seen);
+	*final = err;
 
-	no_way_out = mce_no_way_out(&m, &msg, valid_banks, regs);
+	no_way_out = mce_no_way_out(&err, &msg, valid_banks, regs);
 
 	barrier();
 
@@ -1521,15 +1544,15 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 * Assume the worst for now, but if we find the
 	 * severity is MCE_AR_SEVERITY we have other options.
 	 */
-	if (!(m.mcgstatus & MCG_STATUS_RIPV))
+	if (!(m->mcgstatus & MCG_STATUS_RIPV))
 		kill_current_task = 1;
 	/*
 	 * Check if this MCE is signaled to only this logical processor,
 	 * on Intel, Zhaoxin only.
 	 */
-	if (m.cpuvendor == X86_VENDOR_INTEL ||
-	    m.cpuvendor == X86_VENDOR_ZHAOXIN)
-		lmce = m.mcgstatus & MCG_STATUS_LMCES;
+	if (m->cpuvendor == X86_VENDOR_INTEL ||
+	    m->cpuvendor == X86_VENDOR_ZHAOXIN)
+		lmce = m->mcgstatus & MCG_STATUS_LMCES;
 
 	/*
 	 * Local machine check may already know that we have to panic.
@@ -1540,12 +1563,12 @@ noinstr void do_machine_check(struct pt_regs *regs)
 	 */
 	if (lmce) {
 		if (no_way_out)
-			mce_panic("Fatal local machine check", &m, msg);
+			mce_panic("Fatal local machine check", &err, msg);
 	} else {
 		order = mce_start(&no_way_out);
 	}
 
-	taint = __mc_scan_banks(&m, regs, final, toclear, valid_banks, no_way_out, &worst);
+	taint = __mc_scan_banks(&err, regs, final, toclear, valid_banks, no_way_out, &worst);
 
 	if (!no_way_out)
 		mce_clear_state(toclear);
@@ -1560,7 +1583,7 @@ noinstr void do_machine_check(struct pt_regs *regs)
 				no_way_out = worst >= MCE_PANIC_SEVERITY;
 
 			if (no_way_out)
-				mce_panic("Fatal machine check on current CPU", &m, msg);
+				mce_panic("Fatal machine check on current CPU", &err, msg);
 		}
 	} else {
 		/*
@@ -1572,8 +1595,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		 * make sure we have the right "msg".
 		 */
 		if (worst >= MCE_PANIC_SEVERITY) {
-			mce_severity(&m, regs, &msg, true);
-			mce_panic("Local fatal machine check!", &m, msg);
+			mce_severity(m, regs, &msg, true);
+			mce_panic("Local fatal machine check!", &err, msg);
 		}
 	}
 
@@ -1591,16 +1614,16 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		goto out;
 
 	/* Fault was in user mode and we need to take some action */
-	if ((m.cs & 3) == 3) {
+	if ((m->cs & 3) == 3) {
 		/* If this triggers there is no way to recover. Die hard. */
 		BUG_ON(!on_thread_stack() || !user_mode(regs));
 
-		if (!mce_usable_address(&m))
-			queue_task_work(&m, msg, kill_me_now);
+		if (!mce_usable_address(m))
+			queue_task_work(&err, msg, kill_me_now);
 		else
-			queue_task_work(&m, msg, kill_me_maybe);
+			queue_task_work(&err, msg, kill_me_maybe);
 
-	} else if (m.mcgstatus & MCG_STATUS_SEAM_NR) {
+	} else if (m->mcgstatus & MCG_STATUS_SEAM_NR) {
 		/*
 		 * Saved RIP on stack makes it look like the machine check
 		 * was taken in the kernel on the instruction following
@@ -1612,8 +1635,8 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		 * not occur there. Mark the page as poisoned so it won't
 		 * be added to free list when the guest is terminated.
 		 */
-		if (mce_usable_address(&m)) {
-			struct page *p = pfn_to_online_page(m.addr >> PAGE_SHIFT);
+		if (mce_usable_address(m)) {
+			struct page *p = pfn_to_online_page(m->addr >> PAGE_SHIFT);
 
 			if (p)
 				SetPageHWPoison(p);
@@ -1628,13 +1651,13 @@ noinstr void do_machine_check(struct pt_regs *regs)
 		 * corresponding exception handler which would do that is the
 		 * proper one.
 		 */
-		if (m.kflags & MCE_IN_KERNEL_RECOV) {
+		if (m->kflags & MCE_IN_KERNEL_RECOV) {
 			if (!fixup_exception(regs, X86_TRAP_MC, 0, 0))
-				mce_panic("Failed kernel mode recovery", &m, msg);
+				mce_panic("Failed kernel mode recovery", &err, msg);
 		}
 
-		if (m.kflags & MCE_IN_KERNEL_COPYIN)
-			queue_task_work(&m, msg, kill_me_never);
+		if (m->kflags & MCE_IN_KERNEL_COPYIN)
+			queue_task_work(&err, msg, kill_me_never);
 	}
 
 out:
diff --git a/arch/x86/kernel/cpu/mce/genpool.c b/arch/x86/kernel/cpu/mce/genpool.c
index 4284749..d0be6dd 100644
--- a/arch/x86/kernel/cpu/mce/genpool.c
+++ b/arch/x86/kernel/cpu/mce/genpool.c
@@ -31,15 +31,15 @@ static LLIST_HEAD(mce_event_llist);
  */
 static bool is_duplicate_mce_record(struct mce_evt_llist *t, struct mce_evt_llist *l)
 {
+	struct mce_hw_err *err1, *err2;
 	struct mce_evt_llist *node;
-	struct mce *m1, *m2;
 
-	m1 = &t->mce;
+	err1 = &t->err;
 
 	llist_for_each_entry(node, &l->llnode, llnode) {
-		m2 = &node->mce;
+		err2 = &node->err;
 
-		if (!mce_cmp(m1, m2))
+		if (!mce_cmp(&err1->m, &err2->m))
 			return true;
 	}
 	return false;
@@ -73,8 +73,8 @@ struct llist_node *mce_gen_pool_prepare_records(void)
 
 void mce_gen_pool_process(struct work_struct *__unused)
 {
-	struct llist_node *head;
 	struct mce_evt_llist *node, *tmp;
+	struct llist_node *head;
 	struct mce *mce;
 
 	head = llist_del_all(&mce_event_llist);
@@ -83,7 +83,7 @@ void mce_gen_pool_process(struct work_struct *__unused)
 
 	head = llist_reverse_order(head);
 	llist_for_each_entry_safe(node, tmp, head, llnode) {
-		mce = &node->mce;
+		mce = &node->err.m;
 		blocking_notifier_call_chain(&x86_mce_decoder_chain, 0, mce);
 		gen_pool_free(mce_evt_pool, (unsigned long)node, sizeof(*node));
 	}
@@ -94,11 +94,11 @@ bool mce_gen_pool_empty(void)
 	return llist_empty(&mce_event_llist);
 }
 
-int mce_gen_pool_add(struct mce *mce)
+int mce_gen_pool_add(struct mce_hw_err *err)
 {
 	struct mce_evt_llist *node;
 
-	if (filter_mce(mce))
+	if (filter_mce(&err->m))
 		return -EINVAL;
 
 	if (!mce_evt_pool)
@@ -110,7 +110,7 @@ int mce_gen_pool_add(struct mce *mce)
 		return -ENOMEM;
 	}
 
-	memcpy(&node->mce, mce, sizeof(*mce));
+	memcpy(&node->err, err, sizeof(*err));
 	llist_add(&node->llnode, &mce_event_llist);
 
 	return 0;
diff --git a/arch/x86/kernel/cpu/mce/inject.c b/arch/x86/kernel/cpu/mce/inject.c
index 49ed342..313fe68 100644
--- a/arch/x86/kernel/cpu/mce/inject.c
+++ b/arch/x86/kernel/cpu/mce/inject.c
@@ -502,8 +502,9 @@ static void prepare_msrs(void *info)
 
 static void do_inject(void)
 {
-	u64 mcg_status = 0;
 	unsigned int cpu = i_mce.extcpu;
+	struct mce_hw_err err;
+	u64 mcg_status = 0;
 	u8 b = i_mce.bank;
 
 	i_mce.tsc = rdtsc_ordered();
@@ -517,7 +518,8 @@ static void do_inject(void)
 		i_mce.status |= MCI_STATUS_SYNDV;
 
 	if (inj_type == SW_INJ) {
-		mce_log(&i_mce);
+		err.m = i_mce;
+		mce_log(&err);
 		return;
 	}
 
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h
index 43c7f3b..84f8105 100644
--- a/arch/x86/kernel/cpu/mce/internal.h
+++ b/arch/x86/kernel/cpu/mce/internal.h
@@ -26,12 +26,12 @@ extern struct blocking_notifier_head x86_mce_decoder_chain;
 
 struct mce_evt_llist {
 	struct llist_node llnode;
-	struct mce mce;
+	struct mce_hw_err err;
 };
 
 void mce_gen_pool_process(struct work_struct *__unused);
 bool mce_gen_pool_empty(void);
-int mce_gen_pool_add(struct mce *mce);
+int mce_gen_pool_add(struct mce_hw_err *err);
 int mce_gen_pool_init(void);
 struct llist_node *mce_gen_pool_prepare_records(void);
 
diff --git a/include/trace/events/mce.h b/include/trace/events/mce.h
index f0f7b3c..65aba1a 100644
--- a/include/trace/events/mce.h
+++ b/include/trace/events/mce.h
@@ -19,9 +19,9 @@
 
 TRACE_EVENT(mce_record,
 
-	TP_PROTO(struct mce *m),
+	TP_PROTO(struct mce_hw_err *err),
 
-	TP_ARGS(m),
+	TP_ARGS(err),
 
 	TP_STRUCT__entry(
 		__field( u64, mcgcap )
@@ -46,25 +46,25 @@ TRACE_EVENT(mce_record,
 	),
 
 	TP_fast_assign(
-		__entry->mcgcap = m->mcgcap;
-		__entry->mcgstatus = m->mcgstatus;
-		__entry->status = m->status;
-		__entry->addr = m->addr;
-		__entry->misc = m->misc;
-		__entry->synd = m->synd;
-		__entry->ipid = m->ipid;
-		__entry->ip = m->ip;
-		__entry->tsc = m->tsc;
-		__entry->ppin = m->ppin;
-		__entry->walltime = m->time;
-		__entry->cpu = m->extcpu;
-		__entry->cpuid = m->cpuid;
-		__entry->apicid = m->apicid;
-		__entry->socketid = m->socketid;
-		__entry->cs = m->cs;
-		__entry->bank = m->bank;
-		__entry->cpuvendor = m->cpuvendor;
-		__entry->microcode = m->microcode;
+		__entry->mcgcap = err->m.mcgcap;
+		__entry->mcgstatus = err->m.mcgstatus;
+		__entry->status = err->m.status;
+		__entry->addr = err->m.addr;
+		__entry->misc = err->m.misc;
+		__entry->synd = err->m.synd;
+		__entry->ipid = err->m.ipid;
+		__entry->ip = err->m.ip;
+		__entry->tsc = err->m.tsc;
+		__entry->ppin = err->m.ppin;
+		__entry->walltime = err->m.time;
+		__entry->cpu = err->m.extcpu;
+		__entry->cpuid = err->m.cpuid;
+		__entry->apicid = err->m.apicid;
+		__entry->socketid = err->m.socketid;
+		__entry->cs = err->m.cs;
+		__entry->bank = err->m.bank;
+		__entry->cpuvendor = err->m.cpuvendor;
+		__entry->microcode = err->m.microcode;
 	),
 
 	TP_printk("CPU: %d, MCGc/s: %llx/%llx, MC%d: %016Lx, IPID: %016Lx, ADDR: %016Lx, MISC: %016Lx, SYND: %016Lx, RIP: %02x:<%016Lx>, TSC: %llx, PPIN: %llx, vendor: %u, CPUID: %x, time: %llu, socket: %u, APIC: %x, microcode: %x",