From: Ashok Raj
To: Borislav Petkov, Thomas Gleixner
Cc: Tony Luck, Dave Hansen, "LKML Mailing List", X86-kernel, Andy Lutomirski, Tom Lendacky, "Jacon Jun Pan", Ashok Raj
Subject: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature before saving microcode
Date: Wed, 17 Aug 2022 05:11:23 +0000
Message-Id: <20220817051127.3323755-2-ashok.raj@intel.com>
In-Reply-To: <20220817051127.3323755-1-ashok.raj@intel.com>
References: <20220817051127.3323755-1-ashok.raj@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

When save_microcode_patch() is looking to replace an existing microcode
in the cache, the current code *only* checks the CPU sig/pf in the main
header. Microcode can carry additional sig/pf combinations in the
extended signature table, which are completely missed today.

For example, the current patch is a multi-stepping patch and the new
incoming patch is a specific patch just for this CPU's stepping.

  patch1:
    fms3          <--- header FMS
    ...
    ext_sig:
      fms1
      fms2

  patch2: new
    fms2          <--- header FMS

The current code takes only fms3 and checks it against patch2's fms2.

  saved_patch.header.fms3 != new_patch.header.fms2

so save_microcode_patch() saves patch2 at the end of the list instead of
replacing patch1 with patch2.

There is no functional, user-observable issue, since find_patch() skips
patch versions that are <= current_patch and will land on patch2
properly.

Nevertheless, this ends up storing every patch, including ones that are
not required. The kernel only needs to store the latest patch.
Otherwise it is a memory leak: memory that sits in the kernel and is
never used.
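To make the matching rule concrete, here is a minimal user-space sketch. The types struct sig_pf and struct patch_model, the helper names and the FMS values are hypothetical stand-ins chosen for illustration; only the comparison rule (signatures equal, and platform flags either both zero or intersecting) mirrors the kernel's cpu_signatures_match(). It contrasts the old lookup (new patch checked against the saved header's signature) with the new one (saved patch, including its extended table, checked against the running CPU's signature).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one signature/platform-flags pair. */
struct sig_pf {
	unsigned int sig;
	unsigned int pf;
};

/* Hypothetical model of a patch: main header plus extended signature table. */
struct patch_model {
	struct sig_pf hdr;
	const struct sig_pf *ext;
	size_t ext_count;
};

/* Signatures must be equal; platform flags must both be 0 or intersect. */
static bool sig_pf_match(struct sig_pf s, unsigned int sig, unsigned int pf)
{
	if (s.sig != sig)
		return false;
	if (!s.pf && !pf)
		return true;
	return (s.pf & pf) != 0;
}

/* Does the patch cover (sig, pf) via its header or its extended table? */
static bool patch_matches(const struct patch_model *p, unsigned int sig,
			  unsigned int pf)
{
	if (sig_pf_match(p->hdr, sig, pf))
		return true;
	for (size_t i = 0; i < p->ext_count; i++)
		if (sig_pf_match(p->ext[i], sig, pf))
			return true;
	return false;
}

/* Made-up example values standing in for fms1/fms2/fms3 above. */
#define FMS1 0x50654u
#define FMS2 0x50655u
#define FMS3 0x50657u
#define PF   0x01u

static const struct sig_pf patch1_ext[] = { { FMS1, PF }, { FMS2, PF } };
static const struct patch_model patch1 = { { FMS3, PF }, patch1_ext, 2 };
static const struct patch_model patch2 = { { FMS2, PF }, NULL, 0 };

/* Old check: does the incoming patch match the saved patch's header sig? */
static bool old_should_replace(const struct patch_model *saved,
			       const struct patch_model *incoming)
{
	return patch_matches(incoming, saved->hdr.sig, saved->hdr.pf);
}

/* New check: does the saved patch cover the running CPU's signature? */
static bool new_should_replace(const struct patch_model *saved,
			       unsigned int cpu_sig, unsigned int cpu_pf)
{
	return patch_matches(saved, cpu_sig, cpu_pf);
}
```

On a CPU whose signature is fms2, old_should_replace(&patch1, &patch2) is false, so patch2 would be appended, while new_should_replace(&patch1, FMS2, PF) is true because fms2 sits in patch1's extended table. In the real code a match only makes patch1 the replacement candidate; the revision comparison that follows still decides whether it is replaced.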
Cc: stable@vger.kernel.org
Fixes: fe055896c040 ("x86/microcode: Merge the early microcode loader")
Tested-by: William Xie
Reported-by: William Xie
Signed-off-by: Ashok Raj
---
 arch/x86/kernel/cpu/microcode/intel.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index 025c8f0cd948..c4b11e2fbe33 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -114,10 +114,18 @@ static void save_microcode_patch(struct ucode_cpu_info *uci, void *data, unsigne
 
 	list_for_each_entry_safe(iter, tmp, &microcode_cache, plist) {
 		mc_saved_hdr = (struct microcode_header_intel *)iter->data;
-		sig = mc_saved_hdr->sig;
-		pf = mc_saved_hdr->pf;
 
-		if (find_matching_signature(data, sig, pf)) {
+		sig = uci->cpu_sig.sig;
+		pf = uci->cpu_sig.pf;
+
+		/*
+		 * Compare the current CPU's signature with the ones in the
+		 * cache to identify the right candidate to replace. At any
+		 * given time, we should have no more than one valid patch
+		 * file for a given CPU fms+pf in the cache list.
+		 */
+
+		if (find_matching_signature(iter->data, sig, pf)) {
 			prev_found = true;
 
 			if (mc_hdr->rev <= mc_saved_hdr->rev)
-- 
2.32.0

From: Ashok Raj
To: Borislav Petkov, Thomas Gleixner
Cc: Tony Luck, Dave Hansen, "LKML Mailing List", X86-kernel, Andy Lutomirski, Tom Lendacky, "Jacon Jun Pan", Ashok Raj
Subject: [PATCH v3 2/5] x86/microcode/intel: Allow a late-load only if a min rev is specified
Date: Wed, 17 Aug 2022 05:11:24 +0000
Message-Id: <20220817051127.3323755-3-ashok.raj@intel.com>
In-Reply-To: <20220817051127.3323755-1-ashok.raj@intel.com>
References: <20220817051127.3323755-1-ashok.raj@intel.com>

In general, users don't have the necessary information to determine
whether a late-load of a new microcode version has removed any feature
(MSR, CPUID, etc.) between what is currently loaded and this new
microcode. To address this issue, Intel has repurposed a previously
reserved field in the file header as a "minimum required version"
field. Microcode updates should only be applied if the current
microcode version is equal to, or greater than, this minimum required
version.

https://lore.kernel.org/linux-kernel/alpine.DEB.2.21.1909062237580.1902@nanos.tec.linutronix.de/

Thomas made some suggestions on how metadata in the microcode file
could provide Linux with information to decide if the new microcode is
a suitable candidate for late-load. But even the "simpler" option #1
requires a lot of metadata and corresponding kernel code to parse it.

The proposal here is an even simpler option. The criterion for a
microcode to be a viable late-load candidate is that no CPUID or
OS-visible MSR features are removed with respect to an earlier version
of the microcode.
Pseudocode for late-load is as follows:

  if header.min_req_id == 0
      This is an old-format microcode; block late-load.
  else if current_ucode_version < header.min_req_id
      The current version is too old; block late-load of this microcode.
  else
      OK to proceed with late-load.

Any microcode that removes a feature will set min_req_id to its own
revision. Since a late-load is only attempted with a revision newer
than the one currently running, the running revision is then always
below min_req_id, so such a microcode is never accepted for
late-loading.

The enforcement is not in hardware; it is limited to the kernel loader
enforcing the requirement. Early loading of microcode does not need to
enforce this requirement, since the new features are only evaluated
after early loading in the boot process.

Test cases covered:

1. With the new kernel, attempting to late-load an old-format microcode
   with min_rev=0 should be blocked by the kernel.

   [ 210.541802] microcode: Header MUST specify min version for late-load

2. New microcode with a non-zero min_rev in the header, where the
   specified min_rev is greater than what is currently loaded in the
   CPU, should be blocked by the kernel.

   [ 245.139828] microcode: Current revision 0x8f685300 is too old to update, must be at 0xaa000050 version or higher

3. New microcode with a min_rev < the currently loaded revision should
   be allowed to load.

4. Building an initrd with microcode that has min_rev=0, or a min_rev
   greater than the currently loaded revision, should still permit
   early loading of the microcode from the initrd.
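The decision above can be written as a small pure function. This is a hedged sketch: enum late_load_verdict and check_late_load() are hypothetical names, not kernel API; in the patch itself the equivalent checks sit inside microcode_sanity_check(), which reads the running revision through intel_cpu_collect_info().

```c
#include <assert.h>

/* Hypothetical result codes for the late-load gate. */
enum late_load_verdict {
	LATE_LOAD_OK,
	LATE_LOAD_OLD_FORMAT,   /* header has min_req_id == 0 */
	LATE_LOAD_REV_TOO_OLD,  /* running revision is below min_req_id */
};

/*
 * cur_rev:    microcode revision currently running on the CPU
 * min_req_id: "minimum required version" taken from the microcode header
 */
static enum late_load_verdict check_late_load(unsigned int cur_rev,
					      unsigned int min_req_id)
{
	if (min_req_id == 0)
		return LATE_LOAD_OLD_FORMAT;
	if (cur_rev < min_req_id)
		return LATE_LOAD_REV_TOO_OLD;
	return LATE_LOAD_OK;
}
```

With the revisions from test case 2, check_late_load(0x8f685300, 0xaa000050) yields LATE_LOAD_REV_TOO_OLD. An equal revision passes, matching "equal to, or greater than". A feature-removing update that sets min_req_id to its own revision (say a made-up 0xaa000060) is rejected for every older running revision.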
Tested-by: William Xie
Reviewed-by: Tony Luck
Signed-off-by: Ashok Raj
---
 arch/x86/include/asm/microcode_intel.h |  4 +++-
 arch/x86/kernel/cpu/microcode/intel.c  | 20 ++++++++++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/microcode_intel.h b/arch/x86/include/asm/microcode_intel.h
index 4c92cea7e4b5..16b8715e0984 100644
--- a/arch/x86/include/asm/microcode_intel.h
+++ b/arch/x86/include/asm/microcode_intel.h
@@ -14,7 +14,9 @@ struct microcode_header_intel {
 	unsigned int	pf;
 	unsigned int	datasize;
 	unsigned int	totalsize;
-	unsigned int	reserved[3];
+	unsigned int	reserved1;
+	unsigned int	min_req_id;
+	unsigned int	reserved3;
 };
 
 struct microcode_intel {
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/microcode/intel.c
index c4b11e2fbe33..1eb202ec2302 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -178,6 +178,7 @@ static int microcode_sanity_check(void *mc, int print_err)
 	struct extended_sigtable *ext_header = NULL;
 	u32 sum, orig_sum, ext_sigcount = 0, i;
 	struct extended_signature *ext_sig;
+	struct ucode_cpu_info uci;
 
 	total_size = get_totalsize(mc_header);
 	data_size = get_datasize(mc_header);
@@ -248,6 +249,25 @@ static int microcode_sanity_check(void *mc, int print_err)
 		return -EINVAL;
 	}
 
+	/*
+	 * Enforce for late-load that min_req_id is specified in the header.
+	 * Otherwise it's an old-format microcode; reject it.
+	 */
+	if (print_err) {
+		if (!mc_header->min_req_id) {
+			pr_warn("Header MUST specify min version for late-load\n");
+			return -EINVAL;
+		}
+
+		intel_cpu_collect_info(&uci);
+		if (uci.cpu_sig.rev < mc_header->min_req_id) {
+			pr_warn("Current revision 0x%x is too old to update, "
+				"must be at 0x%x version or higher\n",
+				uci.cpu_sig.rev, mc_header->min_req_id);
+			return -EINVAL;
+		}
+	}
+
 	if (!ext_table_size)
 		return 0;
 
-- 
2.32.0

From: Ashok Raj
To: Borislav Petkov, Thomas Gleixner
Cc: Tony Luck, Dave Hansen, "LKML Mailing List", X86-kernel, Andy Lutomirski, Tom Lendacky, "Jacon Jun Pan", Ashok Raj
Subject: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during microcode update
Date: Wed, 17 Aug 2022 05:11:25 +0000
Message-Id: <20220817051127.3323755-4-ashok.raj@intel.com>
In-Reply-To: <20220817051127.3323755-1-ashok.raj@intel.com>
References: <20220817051127.3323755-1-ashok.raj@intel.com>

When a microcode update is in progress, several instructions and MSRs
can be patched by the update. While the update is in progress, touching
any of the resources being patched could lead to unpredictable results.
If thread0 is doing the update and thread1 happens to take an MCE, the
handler might read an MSR that is being patched.

To get predictable behavior and avoid this scenario, we set MCIP in all
threads. Since MCEs can't be nested, the hardware will automatically
promote an MCE that arrives while MCIP is set to a shutdown condition.
After the update is completed, the MCIP flag is cleared.

The system would go down anyway: such an MCE is either a fatal error,
or a recoverable error in kernel space, which is treated as
unrecoverable.
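The mechanism hinges on the machine-check-in-progress flag in IA32_MCG_STATUS. Per the Intel SDM, bit 0 is RIPV, bit 1 is EIPV and bit 2 is MCIP (the kernel spells these MCG_STATUS_RIPV, MCG_STATUS_EIPV and MCG_STATUS_MCIP), and a second machine check while MCIP is set sends the processor into shutdown. The user-space decoder below uses a hypothetical helper name and simply shows which value sets MCIP; note that the raw value 0x1 is RIPV, not MCIP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_STATUS bit layout (Intel SDM, Vol. 3B, machine-check architecture). */
#define MCG_STATUS_RIPV (1ULL << 0) /* restart IP valid */
#define MCG_STATUS_EIPV (1ULL << 1) /* error IP valid */
#define MCG_STATUS_MCIP (1ULL << 2) /* machine check in progress */

/*
 * Hypothetical helper: with this MCG_STATUS value, would a new #MC be
 * promoted to shutdown? It is when MCIP is already set.
 */
static bool mce_would_shutdown(uint64_t mcg_status)
{
	return (mcg_status & MCG_STATUS_MCIP) != 0;
}
```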
Signed-off-by: Ashok Raj
---
 arch/x86/include/asm/mce.h           |  4 ++++
 arch/x86/kernel/cpu/mce/core.c       |  9 +++++++++
 arch/x86/kernel/cpu/microcode/core.c | 11 +++++++++++
 3 files changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index cc73061e7255..2aef6120e23f 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -207,12 +207,16 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c);
 void mcheck_cpu_clear(struct cpuinfo_x86 *c);
 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
 			       u64 lapic_id);
+extern void mce_set_mcip(void);
+extern void mce_clear_mcip(void);
 #else
 static inline int mcheck_init(void) { return 0; }
 static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
 static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
 static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
 					     u64 lapic_id) { return -EINVAL; }
+static inline void mce_set_mcip(void) {}
+static inline void mce_clear_mcip(void) {}
 #endif
 
 void mce_setup(struct mce *m);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2c8ec5c71712..72b49d95bb3b 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -402,6 +402,15 @@ static noinstr void mce_wrmsrl(u32 msr, u64 v)
 		     : : "c" (msr), "a"(low), "d" (high) : "memory");
 }
 
+void mce_set_mcip(void)
+{
+	mce_wrmsrl(MSR_IA32_MCG_STATUS, MCG_STATUS_MCIP);
+}
+
+void mce_clear_mcip(void)
+{
+	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0x0);
+}
 /*
  * Collect all global (w.r.t. this processor) status about this machine
  * check into our "mce" struct so that we can use it later to assess
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index ad57e0e4d674..d24e1c754c27 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include
 
 #define DRIVER_VERSION	"2.2"
 
@@ -450,6 +451,14 @@ static int __reload_late(void *info)
 	if (__wait_for_cpus(&late_cpus_in, NSEC_PER_SEC))
 		return -1;
 
+	/*
+	 * Its dangerous to let MCE while microcode update is in progress.
+	 * Its extremely rare and even if happens they are fatal errors.
+	 * But reading patched areas before the update is complete can be
+	 * leading to unpredictable results. Setting MCIP will guarantee
+	 * the platform is taken to reset predictively.
+	 */
+	mce_set_mcip();
 	/*
 	 * On an SMT system, it suffices to load the microcode on one sibling of
 	 * the core because the microcode engine is shared between the threads.
@@ -457,6 +466,7 @@ static int __reload_late(void *info)
 	 * loading attempts happen on multiple threads of an SMT core. See
 	 * below.
 	 */
+
 	if (cpumask_first(topology_sibling_cpumask(cpu)) == cpu)
 		apply_microcode_local(&err);
 	else
@@ -473,6 +483,7 @@ static int __reload_late(void *info)
 	if (__wait_for_cpus(&late_cpus_out, NSEC_PER_SEC))
 		panic("Timeout during microcode update!\n");
 
+	mce_clear_mcip();
 	/*
 	 * At least one thread has completed update on each core.
 	 * For others, simply call the update to make sure the
-- 
2.32.0

From: Ashok Raj
To: Borislav Petkov, Thomas Gleixner
Cc: Tony Luck, Dave Hansen, "LKML Mailing List", X86-kernel, Andy Lutomirski, Tom Lendacky, "Jacon Jun Pan", Ashok Raj, Jacob Pan
Subject: [PATCH v3 4/5] x86/x2apic: Support x2apic self IPI with NMI_VECTOR
Date: Wed, 17 Aug 2022 05:11:26 +0000
Message-Id: <20220817051127.3323755-5-ashok.raj@intel.com>
In-Reply-To: <20220817051127.3323755-1-ashok.raj@intel.com>
References: <20220817051127.3323755-1-ashok.raj@intel.com>

The x2APIC architecture introduced a dedicated register for sending a
self-IPI. Though highly optimized for performance, its semantics limit
the delivery mode to fixed mode. NMI_VECTOR is not supported, which
creates inconsistent behavior between x2APIC and the other APIC modes.

This patch adds support for sending NMI_VECTOR as an x2APIC self-IPI by
falling back to the slower ICR method.
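The fallback is needed because only the ICR carries a delivery-mode field. The user-space sketch below models the routing decision and the resulting ICR value, assuming the kernel's constants NMI_VECTOR (0x02), APIC_DM_FIXED (0x000) and APIC_DM_NMI (0x400), and the x2APIC rule that the destination APIC ID occupies bits 63:32 of the 64-bit ICR. The function names are hypothetical; in the kernel the low ICR bits are built by __prepare_ICR() before the ICR MSR is written.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as defined by the kernel's irq_vectors.h / apicdef.h. */
#define NMI_VECTOR    0x02
#define APIC_DM_FIXED 0x00000u
#define APIC_DM_NMI   0x00400u

enum self_ipi_path {
	SELF_IPI_REGISTER, /* x2APIC SELF IPI register: fixed delivery only */
	SELF_IPI_VIA_ICR,  /* general ICR path: any delivery mode */
};

/* Hypothetical: the path x2apic_send_IPI_self() takes after this patch. */
static enum self_ipi_path choose_self_ipi_path(int vector)
{
	return vector == NMI_VECTOR ? SELF_IPI_VIA_ICR : SELF_IPI_REGISTER;
}

/*
 * Hypothetical: 64-bit x2APIC ICR value for a physical-destination IPI.
 * For NMI the delivery mode is NMI and the vector field stays 0.
 */
static uint64_t x2apic_icr_value(uint32_t dest_apicid, int vector)
{
	uint32_t low = (vector == NMI_VECTOR) ? APIC_DM_NMI
					      : (APIC_DM_FIXED | (uint32_t)vector);

	return ((uint64_t)dest_apicid << 32) | low;
}
```

A self-NMI to APIC ID 5 thus becomes an ICR write of (5 << 32) | 0x400, something the SELF IPI register, which only takes a vector, cannot express.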
Suggested-by: Ashok Raj
Signed-off-by: Jacob Pan
Signed-off-by: Ashok Raj
---
 arch/x86/kernel/apic/x2apic_phys.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index 6bde05a86b4e..cf187f1906b2 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -149,7 +149,11 @@ int x2apic_phys_pkg_id(int initial_apicid, int index_msb)
 
 void x2apic_send_IPI_self(int vector)
 {
-	apic_write(APIC_SELF_IPI, vector);
+	if (unlikely(vector == NMI_VECTOR))
+		apic->send_IPI_mask(cpumask_of(smp_processor_id()),
+				    NMI_VECTOR);
+	else
+		apic_write(APIC_SELF_IPI, vector);
 }
 
 static struct apic apic_x2apic_phys __ro_after_init = {
-- 
2.32.0

From: Ashok Raj
To: Borislav Petkov, Thomas Gleixner
Cc: Tony Luck, Dave Hansen, "LKML Mailing List", X86-kernel, Andy Lutomirski, Tom Lendacky, "Jacon Jun Pan", Ashok Raj
Subject: [PATCH v3 5/5] x86/microcode: Place siblings in NMI loop while update in progress
Date: Wed, 17 Aug 2022 05:11:27 +0000
Message-Id: <20220817051127.3323755-6-ashok.raj@intel.com>
In-Reply-To: <20220817051127.3323755-1-ashok.raj@intel.com>
References: <20220817051127.3323755-1-ashok.raj@intel.com>

Microcode updates need a guarantee that the thread sibling that is
waiting for the update to finish on the primary thread of the core will
not execute any instructions until the update is complete. This is
required to guarantee that no MSR or instruction that is being patched
is executed before the update is complete.

Before the stop_machine() rendezvous, an NMI handler is registered.
If an NMI were to happen while the microcode update is not complete,
the secondary thread will spin until the ucode update state is cleared.

A couple of choices were discussed:

1. Rendezvous inside the NMI handler, and also perform the update from
   within the handler. This seemed too risky and might cause instability
   with the races that we would need to solve. This would be a difficult
   choice.

   1.a Since the primary thread of every core is performing a wrmsr for
       the update, once the wrmsr has started, it can't be interrupted.
       Hence it is not required to NMI the primary thread of the core.
       Only the secondary thread needs to be parked in NMI before the
       update begins.
       Suggested by Andy Cooper.

2. Thomas (tglx) suggested that we could look into masking all the
   LVT-originating NMIs, such as LINT1, the perf control LVT entries and
   such. Since we are in the rendezvous loop, we don't need to worry
   about any NMI IPIs generated by the OS.

   The one we don't have any control over is the ACPI mechanism of
   sending notifications to the kernel for Firmware First Processing
   (FFM). Apparently there is a PCH register that the BIOS writes from
   SMI to generate such an interrupt (ACPI GHES).

3. This is a simpler option. The OS registers an NMI handler and doesn't
   do any NMI rendezvous dance. But if an NMI were to happen, we check
   whether any of the CPU's thread siblings have an update in progress.
   Only those CPUs would take an NMI. The thread performing the wrmsr()
   will only take an NMI after the completion of the wrmsr 0x79 flow.

   [ Lutomirski thinks this is weak: the code that runs between taking
     the interrupt and reaching the registered callback handler might
     be exposed. ]

Option 1.a seems to be the best candidate.
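Option 1.a boils down to splitting each core's threads into one primary, which performs the update, and secondaries, which get parked in NMI. A minimal user-space sketch of that bookkeeping follows. It assumes a flat core_of[] array as a hypothetical stand-in for topology_sibling_cpumask(), and struct park_plan and plan_siblings() are illustrative names; as in the patch, the primary is the lowest-numbered CPU of its core, and its thread count is what it later compares the sibling call-in count against (num_core_cpus - 1).

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CPUS 64

/*
 * Hypothetical plan, loosely mirroring all_sibling_mask and
 * core_rendez.num_core_cpus from the patch.
 */
struct park_plan {
	bool park_in_nmi[MAX_CPUS];     /* secondaries to hold in NMI */
	int  threads_in_core[MAX_CPUS]; /* nonzero only at a primary's index */
};

/* core_of[cpu] is the physical core ID of each online CPU 0..ncpus-1. */
static void plan_siblings(const int *core_of, int ncpus, struct park_plan *plan)
{
	assert(ncpus <= MAX_CPUS);

	for (int cpu = 0; cpu < ncpus; cpu++) {
		int primary = -1, count = 0;

		/* The lowest-numbered CPU on the same core is the primary. */
		for (int other = 0; other < ncpus; other++) {
			if (core_of[other] != core_of[cpu])
				continue;
			if (primary < 0)
				primary = other;
			count++;
		}
		plan->park_in_nmi[cpu] = (cpu != primary);
		plan->threads_in_core[cpu] = (cpu == primary) ? count : 0;
	}
}
```

For example, with core_of = {0, 1, 0, 1} (CPUs 0 and 2 share core 0, CPUs 1 and 3 share core 1), CPUs 2 and 3 are parked and each primary waits for exactly one sibling to call in before applying the update.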
The algorithm is something like this:

After stop_machine(), all threads are executing __reload_late().

nmi_callback()
{
	if (!in_ucode_update)
		return NMI_DONE;
	if (cpu not in sibling_mask)
		return NMI_DONE;
	update sibling reached NMI for primary to continue;
	while (cpu in sibling_mask)
		wait;
	return NMI_HANDLED;
}

__reload_late()
{
	entry_rendezvous(&late_cpus_in);
	set_mcip();
	if (this_cpu is first_cpu in the core) {
		wait for siblings to drop in NMI;
		apply_microcode();
	} else {
		send self_ipi(NMI_VECTOR);
		goto wait_for_siblings;
	}

wait_for_siblings:
	exit_rendezvous(&late_cpus_out);
	clear_mcip();
}

reload_late()
{
	register_nmi_handler();
	prepare_mask of all sibling cpus();
	update state = ucode in progress;
	stop_machine();
	unregister_nmi_handler();
}

Signed-off-by: Ashok Raj
---
 arch/x86/kernel/cpu/microcode/core.c | 218 ++++++++++++++++++++++++++-
 1 file changed, 211 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index d24e1c754c27..fd3b8ce2c82a 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -39,7 +39,9 @@
 #include
 #include
 #include
+#include
 #include
+#include
 
 #define DRIVER_VERSION	"2.2"
 
@@ -411,6 +413,13 @@ static int check_online_cpus(void)
 
 static atomic_t late_cpus_in;
 static atomic_t late_cpus_out;
+static atomic_t nmi_cpus;	// number of CPUs that enter NMI
+static atomic_t nmi_timeouts;	// number of siblings that timeout
+static atomic_t nmi_siblings;	// Number of siblings that enter NMI
+static atomic_t in_ucode_update;// Are we in microcode update?
+static atomic_t nmi_exit;	// Siblings that exit NMI
+
+static struct cpumask all_sibling_mask;
 
 static int __wait_for_cpus(atomic_t *t, long long timeout)
 {
@@ -433,6 +442,104 @@ static int __wait_for_cpus(atomic_t *t, long long timeout)
 	return 0;
 }
 
+struct core_rendez {
+	int num_core_cpus;
+	atomic_t callin;
+	atomic_t core_done;
+};
+
+static DEFINE_PER_CPU(struct core_rendez, core_sync);
+
+static int __wait_for_update(atomic_t *t, long long timeout)
+{
+	while (!atomic_read(t)) {
+		if (timeout < SPINUNIT)
+			return 1;
+
+		cpu_relax();
+		ndelay(SPINUNIT);
+		timeout -= SPINUNIT;
+		touch_nmi_watchdog();
+	}
+	return 0;
+}
+
+static int ucode_nmi_cb(unsigned int val, struct pt_regs *regs)
+{
+	int ret, first_cpu, cpu = smp_processor_id();
+	struct core_rendez *rendez;
+
+	atomic_inc(&nmi_cpus);
+	if (!atomic_read(&in_ucode_update))
+		return NMI_DONE;
+
+	if (!cpumask_test_cpu(cpu, &all_sibling_mask))
+		return NMI_DONE;
+
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
+	rendez = &per_cpu(core_sync, first_cpu);
+
+	/*
+	 * If the primary has marked the update complete, we don't need to be
+	 * here in the NMI handler.
+	 */
+	if (atomic_read(&rendez->core_done))
+		return NMI_DONE;
+
+	atomic_inc(&nmi_siblings);
+	pr_debug("Sibling CPU %d made it into NMI handler\n", cpu);
+	/*
+	 * The primary thread waits for all siblings to check in to the NMI
+	 * handler before performing the microcode update.
+	 */
+
+	atomic_inc(&rendez->callin);
+	ret = __wait_for_update(&rendez->core_done, NSEC_PER_SEC);
+	if (ret) {
+		atomic_inc(&nmi_timeouts);
+		pr_debug("Sibling CPU %d timed out\n", cpu);
+	}
+	/*
+	 * Once the primary signals that the update is complete, we are free
+	 * to get out of the NMI jail.
+	 */
+	if (atomic_read(&rendez->core_done)) {
+		pr_debug("Sibling CPU %d breaking from NMI\n", cpu);
+		atomic_inc(&nmi_exit);
+	}
+
+	return NMI_HANDLED;
+}
+
+/*
+ * Primary thread clears the cpumask to release the siblings from the NMI
+ * jail.
+ */
+
+static void clear_nmi_cpus(void)
+{
+	int first_cpu, wait_cpu, cpu = smp_processor_id();
+
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
+	for_each_cpu(wait_cpu, topology_sibling_cpumask(cpu)) {
+		if (wait_cpu == first_cpu)
+			continue;
+		cpumask_clear_cpu(wait_cpu, &all_sibling_mask);
+	}
+}
+
+static int __wait_for_siblings(struct core_rendez *rendez, long long timeout)
+{
+	int num_sibs = rendez->num_core_cpus - 1;
+	atomic_t *t = &rendez->callin;
+
+	while (atomic_read(t) < num_sibs) {
+		cpu_relax();
+		touch_nmi_watchdog();
+	}
+	return 0;
+}
+
 /*
  * Returns:
  * < 0 - on error
@@ -440,17 +547,20 @@ static int __wait_for_cpus(atomic_t *t, long long timeout)
  */
 static int __reload_late(void *info)
 {
-	int cpu = smp_processor_id();
+	int first_cpu, cpu = smp_processor_id();
 	enum ucode_state err;
 	int ret = 0;
 
 	/*
 	 * Wait for all CPUs to arrive. A load will not be attempted unless all
 	 * CPUs show up.
-	 * */
+	 */
 	if (__wait_for_cpus(&late_cpus_in, NSEC_PER_SEC))
 		return -1;
 
+	if (cpumask_first(cpu_online_mask) == cpu)
+		pr_debug("__reload_late: Entry Sync Done\n");
+
 	/*
 	 * Its dangerous to let MCE while microcode update is in progress.
 	 * Its extremely rare and even if happens they are fatal errors.
@@ -459,6 +569,7 @@ static int __reload_late(void *info)
 	 * the platform is taken to reset predictively.
 	 */
 	mce_set_mcip();
+
 	/*
 	 * On an SMT system, it suffices to load the microcode on one sibling of
 	 * the core because the microcode engine is shared between the threads.
@@ -466,13 +577,35 @@ static int __reload_late(void *info)
 	 * loading attempts happen on multiple threads of an SMT core. See
 	 * below.
 	 */
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
 
-	if (cpumask_first(topology_sibling_cpumask(cpu)) == cpu)
+	/*
+	 * Set the CPUs that we should hold in NMI until the primary has
+	 * completed the microcode update.
+	 */
+	if (first_cpu == cpu) {
+		struct core_rendez *pcpu_core = &per_cpu(core_sync, cpu);
+
+		/*
+		 * Wait for all siblings to enter
+		 * NMI before performing the update.
+		 */
+		ret = __wait_for_siblings(pcpu_core, NSEC_PER_SEC);
+		if (ret) {
+			pr_err("CPU %d core lead timeout waiting for"
+			       " siblings\n", cpu);
+			ret = -1;
+		}
+		pr_debug("Primary CPU %d proceeding with update\n", cpu);
 		apply_microcode_local(&err);
-	else
+		atomic_set(&pcpu_core->core_done, 1);
+		clear_nmi_cpus();
+	} else {
+		apic->send_IPI_self(NMI_VECTOR);
 		goto wait_for_siblings;
+	}
 
-	if (err >= UCODE_NFOUND) {
+	if (ret || err >= UCODE_NFOUND) {
 		if (err == UCODE_ERROR)
 			pr_warn("Error reloading microcode on CPU %d\n", cpu);
 
@@ -483,6 +616,9 @@ static int __reload_late(void *info)
 	if (__wait_for_cpus(&late_cpus_out, NSEC_PER_SEC))
 		panic("Timeout during microcode update!\n");
 
+	if (cpumask_first(cpu_online_mask) == cpu)
+		pr_debug("__reload_late: Exit Sync Done\n");
+
 	mce_clear_mcip();
 	/*
 	 * At least one thread has completed update on each core.
@@ -496,26 +632,94 @@ static int __reload_late(void *info)
 	return ret;
 }
 
+static void set_nmi_cpus(int cpu)
+{
+	int first_cpu, wait_cpu;
+	struct core_rendez *pcpu_core = &per_cpu(core_sync, cpu);
+
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
+	for_each_cpu(wait_cpu, topology_sibling_cpumask(cpu)) {
+		if (wait_cpu == first_cpu) {
+			pcpu_core->num_core_cpus =
+				cpumask_weight(topology_sibling_cpumask(wait_cpu));
+			continue;
+		}
+		cpumask_set_cpu(wait_cpu, &all_sibling_mask);
+	}
+}
+
+static void prepare_siblings(void)
+{
+	int cpu;
+
+	for_each_cpu(cpu, cpu_online_mask) {
+		set_nmi_cpus(cpu);
+	}
+}
+
 /*
  * Reload microcode late on all CPUs. Wait for a sec until they
  * all gather together.
  */
 static int microcode_reload_late(void)
 {
-	int ret;
+	int ret = 0;
 
 	pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
 	pr_err("You should switch to early loading, if possible.\n");
 
+	/*
+	 * Used for late_load entry and exit rendezvous
+	 */
 	atomic_set(&late_cpus_in, 0);
 	atomic_set(&late_cpus_out, 0);
 
+	/*
+	 * in_ucode_update: Global state while in ucode update
+	 * nmi_cpus: Count of CPUs entering NMI while ucode in progress
+	 * nmi_siblings: Count of siblings that enter NMI
+	 * nmi_timeouts: Count of siblings that fail to see mask clear
+	 */
+	atomic_set(&in_ucode_update, 0);
+	atomic_set(&nmi_cpus, 0);
+	atomic_set(&nmi_timeouts, 0);
+	atomic_set(&nmi_siblings, 0);
+
+	cpumask_clear(&all_sibling_mask);
+
+	ret = register_nmi_handler(NMI_LOCAL, ucode_nmi_cb, NMI_FLAG_FIRST,
+				   "ucode_nmi");
+	if (ret) {
+		pr_err("Unable to register NMI handler\n");
+		goto done;
+	}
+
+	/*
+	 * Prepare everything for sibling threads to drop into NMI while
+	 * the update is in progress.
+	 */
+	prepare_siblings();
+	atomic_set(&in_ucode_update, 1);
+#if 0
+	apic->send_IPI_mask(&all_sibling_mask, NMI_VECTOR);
+	pr_debug("Sent NMI broadcast to all sibling cpus\n");
+#endif
 	ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
 	if (ret == 0)
 		microcode_check();
 
-	pr_info("Reload completed, microcode revision: 0x%x\n", boot_cpu_data.microcode);
+	unregister_nmi_handler(NMI_LOCAL, "ucode_nmi");
+
+	pr_debug("Total CPUs that entered NMI ... %d\n",
+		 atomic_read(&nmi_cpus));
+	pr_debug("Total siblings that entered NMI ... %d\n",
+		 atomic_read(&nmi_siblings));
+	pr_debug("Total siblings timed out ... %d\n",
+		 atomic_read(&nmi_timeouts));
+	pr_info("Reload completed, microcode revision: 0x%x\n",
+		boot_cpu_data.microcode);
 
+done:
 	return ret;
 }
 
-- 
2.32.0
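The split performed by `set_nmi_cpus()` and `prepare_siblings()` can be modelled outside the kernel. In each core, the first CPU in the sibling mask acts as primary and records the core's thread count, and every other thread is added to `all_sibling_mask`. The sketch below is a hypothetical userspace approximation, not kernel code. It makes these assumptions: CPUs are bits of a `uint64_t` (at most 64), `sibling_mask[cpu]` stands in for `topology_sibling_cpumask(cpu)`, and the GCC/Clang builtins `__builtin_ctzll()` and `__builtin_popcountll()` play the roles of `cpumask_first()` and `cpumask_weight()`. The names `core_primary()` and `build_sibling_mask()` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MAX_CPUS 64	/* hypothetical limit: one bit per CPU */

/* Lowest-numbered thread of a core is its primary, as with cpumask_first(). */
static int core_primary(uint64_t core_mask)
{
	assert(core_mask != 0);
	return __builtin_ctzll(core_mask);
}

/*
 * For every online CPU, look up its core's sibling mask, record the thread
 * count for the core's primary (cf. num_core_cpus) and add every
 * non-primary thread to the returned "hold in NMI" mask (cf. all_sibling_mask).
 */
static uint64_t build_sibling_mask(const uint64_t sibling_mask[MODEL_MAX_CPUS],
				   uint64_t online,
				   int num_core_cpus[MODEL_MAX_CPUS])
{
	uint64_t wait = 0;

	for (int cpu = 0; cpu < MODEL_MAX_CPUS; cpu++) {
		if (!(online & (1ULL << cpu)))
			continue;

		uint64_t core = sibling_mask[cpu];
		int primary = core_primary(core);

		num_core_cpus[primary] = __builtin_popcountll(core);
		wait |= core & ~(1ULL << primary);
	}
	return wait;
}
```

As an example, take a four-CPU system whose cores are {0,2} and {1,3} (sibling masks `0x5` and `0xA`). For this system the function returns `0xC`, meaning CPUs 2 and 3 would be held in NMI, and it records two threads each for primaries 0 and 1.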
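The per-core handshake can also be sketched with POSIX threads and C11 atomics. In that handshake, each sibling increments `callin` and then stays parked until the primary sets `core_done`, while the primary waits for `num_core_cpus - 1` check-ins before updating. Everything below is a hypothetical userspace model: `struct core_rendez_model`, `run_core()` and the other helpers are illustrative names, threads stand in for hardware siblings, and a busy-wait stands in for being held in the NMI handler. One deliberate difference from the patch: `__wait_for_siblings()` accepts a `timeout` argument but never consults it, so its caller's `if (ret)` branch cannot trigger. This sketch instead bounds both waits with `CLOCK_MONOTONIC` deadlines and skips the update if a sibling fails to check in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <time.h>

/* Hypothetical userspace stand-in for struct core_rendez. */
struct core_rendez_model {
	int num_core_cpus;
	atomic_int callin;	/* siblings that have checked in */
	atomic_int core_done;	/* set by the primary to release siblings */
	atomic_int updates;	/* counts "microcode loads" for checking */
};

static long long now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Spin until *v >= target or timeout_ns elapses: 0 on success, -1 on timeout. */
static int wait_at_least(atomic_int *v, int target, long long timeout_ns)
{
	long long deadline = now_ns() + timeout_ns;

	while (atomic_load(v) < target) {
		if (now_ns() > deadline)
			return -1;
	}
	return 0;
}

/* Sibling: check in, then stay parked until the primary signals completion. */
static void *sibling_thread(void *arg)
{
	struct core_rendez_model *r = arg;

	atomic_fetch_add(&r->callin, 1);
	wait_at_least(&r->core_done, 1, 1000000000LL);
	return NULL;
}

/* Primary: wait for every sibling to check in, "update", then release them. */
static int primary_update(struct core_rendez_model *r, long long timeout_ns)
{
	int ret = wait_at_least(&r->callin, r->num_core_cpus - 1, timeout_ns);

	if (ret == 0)
		atomic_fetch_add(&r->updates, 1);	/* apply_microcode_local() stand-in */
	atomic_store(&r->core_done, 1);			/* release siblings on both paths */
	return ret;
}

/* Run one core with nthreads hardware threads; returns the primary's result. */
static int run_core(int nthreads, struct core_rendez_model *r)
{
	pthread_t tid[8];

	assert(nthreads >= 1 && nthreads <= 8);
	r->num_core_cpus = nthreads;
	atomic_init(&r->callin, 0);
	atomic_init(&r->core_done, 0);
	atomic_init(&r->updates, 0);

	for (int i = 0; i < nthreads - 1; i++)
		pthread_create(&tid[i], NULL, sibling_thread, r);
	int ret = primary_update(r, 1000000000LL);
	for (int i = 0; i < nthreads - 1; i++)
		pthread_join(tid[i], NULL);
	return ret;
}
```

Build with `-pthread`. Because the primary always stores `core_done`, parked siblings are released whether or not the check-in completed in time. The patch itself gets the same release guarantee by setting `core_done` unconditionally after `apply_microcode_local()`.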