From nobody Sat Apr 11 02:18:39 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 7CC2DC25B08
	for <linux-kernel@archiver.kernel.org>; Wed, 17 Aug 2022 05:12:08 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S229915AbiHQFMG (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 17 Aug 2022 01:12:06 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49422 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S230131AbiHQFMB (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 17 Aug 2022 01:12:01 -0400
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 3184E6DFAB
        for <linux-kernel@vger.kernel.org>;
 Tue, 16 Aug 2022 22:12:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1660713121; x=1692249121;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=i7pkxwVgmKuiNrs1UqKFz8MUN/cxAwQqnppeopXvGQs=;
  b=Jcgoegr2t1gt34QyOAwox/VX255Zz1pD4xfpdIiXJZZ6L0P3zqsUm0Id
   YAVupj1xecGPeN7Jzw61BDKbGx/JiKvWn77DAiJnZ4gSaFCDAGh8I6nPT
   aMCTxoswa0Uyy+9EdysOM3kFlPpNmusWC0F4fZWhoq50B5+QS3vhMrWqO
   P2SeSlzvgJzJs3NmUQDewlnGXCmG2jwhuBGRrBxB2a1Xg4kxY2CCdKd0K
   2MsiRNiyHk/LxCJfWs58b0dkutpkINj1cWH/xQ0LczOE15faTfel56+nb
   fgdzYZbxcyQUz108fNnH6ab3Dl8VtGqOV4OdzSOnq8U+OO2ENNujFZj3W
   Q==;
X-IronPort-AV: E=McAfee;i="6400,9594,10441"; a="289972495"
X-IronPort-AV: E=Sophos;i="5.93,242,1654585200";
   d="scan'208";a="289972495"
Received: from orsmga003.jf.intel.com ([10.7.209.27])
  by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 16 Aug 2022 22:12:00 -0700
X-IronPort-AV: E=Sophos;i="5.93,242,1654585200";
   d="scan'208";a="557976684"
Received: from araj-dh-work.jf.intel.com ([10.165.157.158])
  by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 16 Aug 2022 22:11:59 -0700
From: Ashok Raj <ashok.raj@intel.com>
To: Borislav Petkov <bp@alien8.de>,
        Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>,
        Dave Hansen <dave.hansen@intel.com>,
        "LKML Mailing List" <linux-kernel@vger.kernel.org>,
        X86-kernel <x86@kernel.org>,
        Andy Lutomirski <luto@amacapital.net>,
        Tom Lendacky <thomas.lendacky@amd.com>,
        "Jacon Jun Pan" <jacob.jun.pan@intel.com>,
        Ashok Raj <ashok.raj@intel.com>
Subject: [PATCH v3 1/5] x86/microcode/intel: Check against CPU signature
 before saving microcode
Date: Wed, 17 Aug 2022 05:11:23 +0000
Message-Id: <20220817051127.3323755-2-ashok.raj@intel.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220817051127.3323755-1-ashok.raj@intel.com>
References: <20220817051127.3323755-1-ashok.raj@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

When save_microcode_patch() looks for an existing microcode patch in the
cache to replace, the current code checks *only* the CPU sig/pf in the
main header. Microcode can carry additional sig/pf combinations in the
extended signature table, which is completely missed today.

For example, the current patch is a multi-stepping patch and the new
incoming patch is a specific patch just for this CPU's stepping.

patch1:
fms3 <--- header FMS
...
ext_sig:
fms1
fms2

patch2: new
fms2 <--- header FMS

The current code takes only fms3 from patch1's header and compares it with
patch2's fms2.

saved_patch.header.fms3 !=3D new_patch.header.fms2, so
save_microcode_patch() appends patch2 to the end of the list instead of
replacing patch1 with patch2.

There is no user-observable functional issue, since find_patch() skips
patch versions that are <=3D current_patch and lands on patch2 properly.

Nevertheless, the cache ends up storing patches that are no longer
required. The kernel only needs to store the latest patch. Otherwise it's
a memory leak: the stale patch sits in the kernel and is never used.
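
As an illustration only, the matching rule described above can be sketched
in plain C. The struct and function names below (sig_pf, patch,
sig_pf_match, patch_matches_cpu) are hypothetical simplifications, not the
kernel's actual definitions. The sketch assumes a patch matches a CPU when
either its main header or any extended signature entry matches the CPU's
sig/pf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel's microcode structures. */
struct sig_pf {
	uint32_t sig;	/* family/model/stepping */
	uint32_t pf;	/* platform flags bitmask */
};

struct patch {
	struct sig_pf hdr;		/* signature in the main header */
	const struct sig_pf *ext;	/* extended signature table, may be NULL */
	size_t ext_count;		/* number of entries in ext */
	uint32_t rev;
};

/* Signatures must be equal; platform flags must both be zero or overlap. */
static bool sig_pf_match(const struct sig_pf *s, uint32_t sig, uint32_t pf)
{
	if (s->sig != sig)
		return false;
	return (s->pf == 0 && pf == 0) || (s->pf & pf) != 0;
}

/* Check the main header first, then every extended signature entry. */
static bool patch_matches_cpu(const struct patch *p, uint32_t sig, uint32_t pf)
{
	assert(p != NULL);

	if (sig_pf_match(&p->hdr, sig, pf))
		return true;

	for (size_t i = 0; i < p->ext_count; i++)
		if (sig_pf_match(&p->ext[i], sig, pf))
			return true;

	return false;
}
```

With patch1 from the example (header fms3, extended table fms1 and fms2),
a header-only comparison against a CPU at fms2 fails, while
patch_matches_cpu() finds the entry in the extended table.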

Cc: stable@vger.kernel.org
Fixes: fe055896c040 ("x86/microcode: Merge the early microcode loader")
Tested-by: William Xie <william.xie@intel.com>
Reported-by: William Xie <william.xie@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/intel.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/mi=
crocode/intel.c
index 025c8f0cd948..c4b11e2fbe33 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -114,10 +114,18 @@ static void save_microcode_patch(struct ucode_cpu_inf=
o *uci, void *data, unsigne
=20
 	list_for_each_entry_safe(iter, tmp, &microcode_cache, plist) {
 		mc_saved_hdr =3D (struct microcode_header_intel *)iter->data;
-		sig	     =3D mc_saved_hdr->sig;
-		pf	     =3D mc_saved_hdr->pf;
=20
-		if (find_matching_signature(data, sig, pf)) {
+		sig =3D uci->cpu_sig.sig;
+		pf  =3D uci->cpu_sig.pf;
+
+		/*
+		 * Compare the current CPU's signature with the ones in the
+		 * cache to identify the right candidate to replace. At any
+		 * given time, we should have no more than one valid patch
+		 * file for a given CPU fms+pf in the cache list.
+		 */
+
+		if (find_matching_signature(iter->data, sig, pf)) {
 			prev_found =3D true;
=20
 			if (mc_hdr->rev <=3D mc_saved_hdr->rev)
--=20
2.32.0
From nobody Sat Apr 11 02:18:39 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id ADBBCC25B08
	for <linux-kernel@archiver.kernel.org>; Wed, 17 Aug 2022 05:12:11 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232554AbiHQFMJ (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 17 Aug 2022 01:12:09 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49434 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S231761AbiHQFMC (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 17 Aug 2022 01:12:02 -0400
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E237A6DFB7
        for <linux-kernel@vger.kernel.org>;
 Tue, 16 Aug 2022 22:12:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1660713121; x=1692249121;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=EQ+YUi710ASUR/UiftqoykJfpdPYLLzMNph5Lwb1jZw=;
  b=JL30gLUThHkagKdW4VQ+rt7SlfDbxRTlr67vBIW+u0k7cd1El2Qd1G3J
   /hL/OQPZl3WMXXF0UaT2DRIWNF7ReRMt8e4DTpyRZ+Dq/g3TGWlmjMlxx
   OdqulrmDENkpn0pP38hb7k8PW0w+eQ4W5CbQjiGU8HG4RKmxbBnFuDSb7
   yCI5hdokEgrwmllPWy0/eDLDGg3yizZKFcnIvNChc5nl4niRL7Lm52BiA
   C7HddWScPAsPP/gC3m1YK8xH76RGne/YMYB2ZdhIIj1Y/zchrHX3NOfOF
   GsU+Hs6XXPqZdAsn3D5qe6kbti5GXL6Uk3JzVxUkoiPFImZvupLZIAztg
   g==;
X-IronPort-AV: E=McAfee;i="6400,9594,10441"; a="289972497"
X-IronPort-AV: E=Sophos;i="5.93,242,1654585200";
   d="scan'208";a="289972497"
Received: from orsmga003.jf.intel.com ([10.7.209.27])
  by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 16 Aug 2022 22:12:00 -0700
X-IronPort-AV: E=Sophos;i="5.93,242,1654585200";
   d="scan'208";a="557976687"
Received: from araj-dh-work.jf.intel.com ([10.165.157.158])
  by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 16 Aug 2022 22:11:59 -0700
From: Ashok Raj <ashok.raj@intel.com>
To: Borislav Petkov <bp@alien8.de>,
        Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>,
        Dave Hansen <dave.hansen@intel.com>,
        "LKML Mailing List" <linux-kernel@vger.kernel.org>,
        X86-kernel <x86@kernel.org>,
        Andy Lutomirski <luto@amacapital.net>,
        Tom Lendacky <thomas.lendacky@amd.com>,
        "Jacon Jun Pan" <jacob.jun.pan@intel.com>,
        Ashok Raj <ashok.raj@intel.com>
Subject: [PATCH v3 2/5] x86/microcode/intel: Allow a late-load only if a min
 rev is specified
Date: Wed, 17 Aug 2022 05:11:24 +0000
Message-Id: <20220817051127.3323755-3-ashok.raj@intel.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220817051127.3323755-1-ashok.raj@intel.com>
References: <20220817051127.3323755-1-ashok.raj@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

In general, users don't have the necessary information to determine
whether late-loading a new microcode version removes any feature (MSR,
CPUID, etc.) relative to what is currently loaded. To address this issue,
Intel has added a "minimum required version" field in a previously
reserved field of the file header. Microcode updates should only be
applied if the current microcode version is equal to, or greater than,
this minimum required version.

https://lore.kernel.org/linux-kernel/alpine.DEB.2.21.1909062237580.1902@nan=
os.tec.linutronix.de/

Thomas made some suggestions on how metadata in the microcode file could
provide Linux with information to decide if the new microcode is a suitable
candidate for late-load. But even the "simpler" option #1 requires a lot of
metadata and corresponding kernel code to parse it.

The proposal here is an even simpler option. The criterion for a microcode
to be a viable late-load candidate is that no CPUID or OS-visible MSR
features are removed with respect to an earlier version of the microcode.

Pseudocode for late-load is as follows:

if header.min_required_id =3D=3D 0
	This is old format microcode, block late-load
else if current_ucode_version < header.min_required_id
	Current version is too old, block late-load of this microcode.
else
	OK to proceed with late-load.

Any microcode that removes a feature will set its minimum required version
to its own revision. Since the running revision is then always lower than
the minimum, the loader blocks such a microcode from late-loading.

The enforcement is not in hardware; it is limited to the kernel loader.
Early loading of microcode does not need to enforce this requirement,
since CPU features are only evaluated after early loading in the boot
process.
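
The pseudocode above can be written as a small standalone C function. This
is only a sketch: the enum and the function name late_load_check() are
hypothetical and do not appear in the patch, and the parameter names are
chosen for illustration.

```c
#include <stdint.h>

/* Hypothetical verdicts for the late-load decision. */
enum late_load_verdict {
	LATE_LOAD_OK,		/* safe to proceed with late-load */
	LATE_LOAD_NO_MIN_REV,	/* old-format header, minimum revision is 0 */
	LATE_LOAD_TOO_OLD,	/* running revision is below the minimum */
};

/*
 * current_rev:      revision currently running on the CPU
 * min_required_rev: "minimum required version" from the new file's header
 */
static enum late_load_verdict
late_load_check(uint32_t current_rev, uint32_t min_required_rev)
{
	if (min_required_rev == 0)
		return LATE_LOAD_NO_MIN_REV;

	if (current_rev < min_required_rev)
		return LATE_LOAD_TOO_OLD;

	return LATE_LOAD_OK;
}
```

For instance, late_load_check(0x8f685300, 0xaa000050) yields
LATE_LOAD_TOO_OLD, which corresponds to test case 2 below.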


Test cases covered:

1. With the new kernel, an attempt to late-load an older-format microcode
   with min_rev=3D0 should be blocked by the kernel.

   [  210.541802] microcode: Header MUST specify min version for late-load

2. New microcode with a non-zero min_rev in the header, where min_rev is
   greater than the revision currently loaded in the CPU, should be
   blocked by the kernel.

   [  245.139828] microcode: Current revision 0x8f685300 is too old to update,
   must be at 0xaa000050 version or higher

3. New microcode with a min_rev < the currently loaded revision should be
   allowed to load.

4. An initrd built with microcode that has min_rev=3D0, or min_rev > the
   currently loaded revision, should still permit early loading of the
   microcode from the initrd.


Tested-by: William Xie <william.xie@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/include/asm/microcode_intel.h |  4 +++-
 arch/x86/kernel/cpu/microcode/intel.c  | 20 ++++++++++++++++++++
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/microcode_intel.h b/arch/x86/include/asm/=
microcode_intel.h
index 4c92cea7e4b5..16b8715e0984 100644
--- a/arch/x86/include/asm/microcode_intel.h
+++ b/arch/x86/include/asm/microcode_intel.h
@@ -14,7 +14,9 @@ struct microcode_header_intel {
 	unsigned int            pf;
 	unsigned int            datasize;
 	unsigned int            totalsize;
-	unsigned int            reserved[3];
+	unsigned int            reserved1;
+	unsigned int		min_req_id;
+	unsigned int            reserved3;
 };
=20
 struct microcode_intel {
diff --git a/arch/x86/kernel/cpu/microcode/intel.c b/arch/x86/kernel/cpu/mi=
crocode/intel.c
index c4b11e2fbe33..1eb202ec2302 100644
--- a/arch/x86/kernel/cpu/microcode/intel.c
+++ b/arch/x86/kernel/cpu/microcode/intel.c
@@ -178,6 +178,7 @@ static int microcode_sanity_check(void *mc, int print_e=
rr)
 	struct extended_sigtable *ext_header =3D NULL;
 	u32 sum, orig_sum, ext_sigcount =3D 0, i;
 	struct extended_signature *ext_sig;
+	struct ucode_cpu_info uci;
=20
 	total_size =3D get_totalsize(mc_header);
 	data_size =3D get_datasize(mc_header);
@@ -248,6 +249,25 @@ static int microcode_sanity_check(void *mc, int print_=
err)
 		return -EINVAL;
 	}
=20
+	/*
+	 * Enforce for late-load that min_req_id is specified in the header.
+	 * Otherwise it's an old format microcode, reject it.
+	 */
+	if (print_err) {
+		if (!mc_header->min_req_id) {
+			pr_warn("Header MUST specify min version for late-load\n");
+			return -EINVAL;
+		}
+
+		intel_cpu_collect_info(&uci);
+		if (uci.cpu_sig.rev < mc_header->min_req_id) {
+			pr_warn("Current revision 0x%x is too old to update, "
+				"must be at 0x%x version or higher\n",
+				uci.cpu_sig.rev, mc_header->min_req_id);
+			return -EINVAL;
+		}
+	}
+
 	if (!ext_table_size)
 		return 0;
=20
--=20
2.32.0
From nobody Sat Apr 11 02:18:39 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id E202FC25B08
	for <linux-kernel@archiver.kernel.org>; Wed, 17 Aug 2022 05:12:14 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232608AbiHQFMN (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 17 Aug 2022 01:12:13 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49432 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S232070AbiHQFMC (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 17 Aug 2022 01:12:02 -0400
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F2B106E2C1
        for <linux-kernel@vger.kernel.org>;
 Tue, 16 Aug 2022 22:12:01 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1660713121; x=1692249121;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=KR/HBP1wOvQFpjD2YET+iCx+bjZqywBPEsVL8qmACU8=;
  b=eBGBDaCDeb7hqD5rGKsyDKfJBN83oq5QDis/hnj6jMZbB2aad32FYmx5
   MATTinIoT6G0xd2sgUyGWFHHDYXTs57MbHwRHyVDYQ9zTnqtPdWlwXffT
   Q5ul3rTHevGvFR//ECUpseB6mDvsxFMziWZKrHZHVSj+d3+qMWvZ/JGPA
   2nD4QaQU7e5ok+Np3qtPSD0KDkeLahitiuWyV11pjlVp0XbVQQHVpB6h4
   c8FiVMhaIHf09+qFr1yVrAlTLwCVkRv03abXWATlrJOJQ1Pikw2iN52Q2
   uBiQOKiKKY9NrG9sBXDX9HgIH0PoceWShueopleDNJ+3qYbxhKPYqLBL9
   g==;
X-IronPort-AV: E=McAfee;i="6400,9594,10441"; a="289972499"
X-IronPort-AV: E=Sophos;i="5.93,242,1654585200";
   d="scan'208";a="289972499"
Received: from orsmga003.jf.intel.com ([10.7.209.27])
  by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 16 Aug 2022 22:12:00 -0700
X-IronPort-AV: E=Sophos;i="5.93,242,1654585200";
   d="scan'208";a="557976690"
Received: from araj-dh-work.jf.intel.com ([10.165.157.158])
  by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 16 Aug 2022 22:11:59 -0700
From: Ashok Raj <ashok.raj@intel.com>
To: Borislav Petkov <bp@alien8.de>,
        Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>,
        Dave Hansen <dave.hansen@intel.com>,
        "LKML Mailing List" <linux-kernel@vger.kernel.org>,
        X86-kernel <x86@kernel.org>,
        Andy Lutomirski <luto@amacapital.net>,
        Tom Lendacky <thomas.lendacky@amd.com>,
        "Jacon Jun Pan" <jacob.jun.pan@intel.com>,
        Ashok Raj <ashok.raj@intel.com>
Subject: [PATCH v3 3/5] x86/microcode: Avoid any chance of MCE's during
 microcode update
Date: Wed, 17 Aug 2022 05:11:25 +0000
Message-Id: <20220817051127.3323755-4-ashok.raj@intel.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220817051127.3323755-1-ashok.raj@intel.com>
References: <20220817051127.3323755-1-ashok.raj@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

When a microcode update is in progress, several instructions and MSRs can
be patched by the update. While the update is in progress, touching any of
the resources being patched could produce unpredictable results. If
thread0 is doing the update and thread1 happens to get an MCE, the handler
might read an MSR that's being patched.

To get predictable behavior, set MCG_STATUS.MCIP on all threads before the
update. Since MCEs can't be nested, hardware automatically promotes an MCE
that arrives while MCIP is set to a shutdown condition.

After the update is completed, the MCIP flag is cleared. The system was
going to shut down anyway: the MCE could be a fatal error, and even
recoverable errors in kernel space are treated as unrecoverable.
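
The intended effect can be illustrated with a small userspace model. This
is a sketch only and touches no real MSR: struct thread_model and the
model_*() helpers are hypothetical. It assumes the architectural layout in
which MCIP is bit 2 of IA32_MCG_STATUS, and that an MCE arriving while MCIP
is set puts the processor into shutdown.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCG_STATUS_MCIP	(1ULL << 2)	/* machine check in progress */

/* Hypothetical per-thread model; the real state lives in IA32_MCG_STATUS. */
struct thread_model {
	uint64_t mcg_status;
	bool shutdown;		/* thread went to shutdown */
	bool handler_ran;	/* #MC handler was entered */
};

static void model_set_mcip(struct thread_model *t)
{
	t->mcg_status |= MCG_STATUS_MCIP;
}

static void model_clear_mcip(struct thread_model *t)
{
	t->mcg_status &= ~MCG_STATUS_MCIP;
}

/* Deliver a machine check to the modeled thread. */
static void model_deliver_mce(struct thread_model *t)
{
	if (t->mcg_status & MCG_STATUS_MCIP) {
		/* MCEs cannot nest: with MCIP already set, shut down. */
		t->shutdown = true;
		return;
	}

	t->mcg_status |= MCG_STATUS_MCIP;	/* set on delivery */
	t->handler_ran = true;			/* handler would read MSRs here */
}
```

Note that the model clears only the MCIP bit, whereas the helper in this
patch writes the whole register.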

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/include/asm/mce.h           |  4 ++++
 arch/x86/kernel/cpu/mce/core.c       |  9 +++++++++
 arch/x86/kernel/cpu/microcode/core.c | 11 +++++++++++
 3 files changed, 24 insertions(+)

diff --git a/arch/x86/include/asm/mce.h b/arch/x86/include/asm/mce.h
index cc73061e7255..2aef6120e23f 100644
--- a/arch/x86/include/asm/mce.h
+++ b/arch/x86/include/asm/mce.h
@@ -207,12 +207,16 @@ void mcheck_cpu_init(struct cpuinfo_x86 *c);
 void mcheck_cpu_clear(struct cpuinfo_x86 *c);
 int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_info,
 			       u64 lapic_id);
+extern void mce_set_mcip(void);
+extern void mce_clear_mcip(void);
 #else
 static inline int mcheck_init(void) { return 0; }
 static inline void mcheck_cpu_init(struct cpuinfo_x86 *c) {}
 static inline void mcheck_cpu_clear(struct cpuinfo_x86 *c) {}
 static inline int apei_smca_report_x86_error(struct cper_ia_proc_ctx *ctx_=
info,
 					     u64 lapic_id) { return -EINVAL; }
+static inline void mce_set_mcip(void) {}
+static inline void mce_clear_mcip(void) {}
 #endif
=20
 void mce_setup(struct mce *m);
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c
index 2c8ec5c71712..72b49d95bb3b 100644
--- a/arch/x86/kernel/cpu/mce/core.c
+++ b/arch/x86/kernel/cpu/mce/core.c
@@ -402,6 +402,15 @@ static noinstr void mce_wrmsrl(u32 msr, u64 v)
 		     : : "c" (msr), "a"(low), "d" (high) : "memory");
 }
=20
+void mce_set_mcip(void)
+{
+	mce_wrmsrl(MSR_IA32_MCG_STATUS, MCG_STATUS_MCIP);
+}
+
+void mce_clear_mcip(void)
+{
+	mce_wrmsrl(MSR_IA32_MCG_STATUS, 0x0);
+}
 /*
  * Collect all global (w.r.t. this processor) status about this machine
  * check into our "mce" struct so that we can use it later to assess
diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/mic=
rocode/core.c
index ad57e0e4d674..d24e1c754c27 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -39,6 +39,7 @@
 #include <asm/processor.h>
 #include <asm/cmdline.h>
 #include <asm/setup.h>
+#include <asm/mce.h>
=20
 #define DRIVER_VERSION	"2.2"
=20
@@ -450,6 +451,14 @@ static int __reload_late(void *info)
 	if (__wait_for_cpus(&late_cpus_in, NSEC_PER_SEC))
 		return -1;
=20
+	/*
+	 * It's dangerous to take an MCE while a microcode update is in
+	 * progress. MCEs are extremely rare, and when they do happen they
+	 * are fatal errors. But reading patched areas before the update is
+	 * complete can lead to unpredictable results. Setting MCIP
+	 * guarantees that the platform is taken to reset predictably.
+	 */
+	mce_set_mcip();
 	/*
 	 * On an SMT system, it suffices to load the microcode on one sibling of
 	 * the core because the microcode engine is shared between the threads.
@@ -457,6 +466,7 @@ static int __reload_late(void *info)
 	 * loading attempts happen on multiple threads of an SMT core. See
 	 * below.
 	 */
+
 	if (cpumask_first(topology_sibling_cpumask(cpu)) =3D=3D cpu)
 		apply_microcode_local(&err);
 	else
@@ -473,6 +483,7 @@ static int __reload_late(void *info)
 	if (__wait_for_cpus(&late_cpus_out, NSEC_PER_SEC))
 		panic("Timeout during microcode update!\n");
=20
+	mce_clear_mcip();
 	/*
 	 * At least one thread has completed update on each core.
 	 * For others, simply call the update to make sure the
--=20
2.32.0
From nobody Sat Apr 11 02:18:39 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 2063CC25B08
	for <linux-kernel@archiver.kernel.org>; Wed, 17 Aug 2022 05:12:19 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230282AbiHQFMR (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 17 Aug 2022 01:12:17 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49444 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S232403AbiHQFMD (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 17 Aug 2022 01:12:03 -0400
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8C4756DFA5
        for <linux-kernel@vger.kernel.org>;
 Tue, 16 Aug 2022 22:12:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1660713122; x=1692249122;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=D80ulUAhbmrV7L9/yOfZt2Tb37WVpYS3nBYfSyUXRLU=;
  b=AWdNW9ruEInt7Z2b2+PExU05ZuJelHRN2YsPF2qwr3FnAfWT3Wzo6jTT
   y6dvKlERFY/BJEHQwYumovZ0aiLn3KXDo6muNubT3Dyy+WyDe2rL6I3UR
   C9Nuh7AOhATVV9voZi8AqniHhGbFU0HwkYA337FaOJRq/Lh2jaJUsln/7
   Je9adQ4Afjtamd5oWGAIkNL4P1WwO3nfuRg2d/exRZxxydbUAwglMBIz4
   GUr60tlf/8icTDDTU7It524jNsvo6/LIDo7WzcG1IRIt4HXgxBkjzLihx
   ZLKlGau3tIzh8dU76gmrdmp437eS+B+ldr3VeV0xoKWZs9v5ZbFld/U4Q
   w==;
X-IronPort-AV: E=McAfee;i="6400,9594,10441"; a="289972502"
X-IronPort-AV: E=Sophos;i="5.93,242,1654585200";
   d="scan'208";a="289972502"
Received: from orsmga003.jf.intel.com ([10.7.209.27])
  by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 16 Aug 2022 22:12:00 -0700
X-IronPort-AV: E=Sophos;i="5.93,242,1654585200";
   d="scan'208";a="557976693"
Received: from araj-dh-work.jf.intel.com ([10.165.157.158])
  by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 16 Aug 2022 22:11:59 -0700
From: Ashok Raj <ashok.raj@intel.com>
To: Borislav Petkov <bp@alien8.de>,
        Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>,
        Dave Hansen <dave.hansen@intel.com>,
        "LKML Mailing List" <linux-kernel@vger.kernel.org>,
        X86-kernel <x86@kernel.org>,
        Andy Lutomirski <luto@amacapital.net>,
        Tom Lendacky <thomas.lendacky@amd.com>,
        "Jacon Jun Pan" <jacob.jun.pan@intel.com>,
        Ashok Raj <ashok.raj@intel.com>,
        Jacob Pan <jacob.jun.pan@linux.intel.com>
Subject: [PATCH v3 4/5] x86/x2apic: Support x2apic self IPI with NMI_VECTOR
Date: Wed, 17 Aug 2022 05:11:26 +0000
Message-Id: <20220817051127.3323755-5-ashok.raj@intel.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220817051127.3323755-1-ashok.raj@intel.com>
References: <20220817051127.3323755-1-ashok.raj@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

The X2APIC architecture introduced a dedicated register for sending
self-IPIs. Though highly optimized for performance, its semantics limit the
delivery mode to fixed mode. The NMI vector is not supported, which creates
inconsistent behavior between X2APIC and the other APIC drivers.

Add support for NMI_VECTOR in the X2APIC self-IPI path by falling back to
the slower ICR method.

Suggested-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/apic/x2apic_phys.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2ap=
ic_phys.c
index 6bde05a86b4e..cf187f1906b2 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -149,7 +149,11 @@ int x2apic_phys_pkg_id(int initial_apicid, int index_m=
sb)
=20
 void x2apic_send_IPI_self(int vector)
 {
-	apic_write(APIC_SELF_IPI, vector);
+	if (unlikely(vector =3D=3D NMI_VECTOR))
+		apic->send_IPI_mask(cpumask_of(smp_processor_id()),
+				    NMI_VECTOR);
+	else
+		apic_write(APIC_SELF_IPI, vector);
 }
=20
 static struct apic apic_x2apic_phys __ro_after_init =3D {
--=20
2.32.0
From nobody Sat Apr 11 02:18:39 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 18155C25B08
	for <linux-kernel@archiver.kernel.org>; Wed, 17 Aug 2022 05:12:23 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S232148AbiHQFMV (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 17 Aug 2022 01:12:21 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:49446 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229996AbiHQFMD (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 17 Aug 2022 01:12:03 -0400
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9D3926DFAB
        for <linux-kernel@vger.kernel.org>;
 Tue, 16 Aug 2022 22:12:02 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1660713122; x=1692249122;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=L1VBHnDIQOcr6nKQc2FyQ1mkFBCT7m3GiUej9mOimyU=;
  b=ivUTQjNsLbEPEbwigLJwUNzQR6NaJcJ++CT+PtZl6M+986AgdFk/RPTa
   siKk6k9jAFUjuumUFumV6AQsBrc1Zth1XhCyo6eb6hqPY2JzegmnjIgX9
   ImT/TzNa+ddiYaXtXRWz9M1FWYIf61EWGtKgkLb8jZKyUuijAwMHnxACo
   V5zphdjxfGkyyiNnV7d3NuUGLmOfcTqXfkMEuN4gGHk+qGWsnEYNelnam
   Yot5NOTU4+4joPTboW6CUZ7bIBuuMLkn4T7v7idUiJNKCjjlaSLO04zuH
   WOmW1Xq92roTLae4WnlxqEs7k4Acflo0hSsar7+h8gCTWpLE/6IOLS2QY
   w==;
X-IronPort-AV: E=McAfee;i="6400,9594,10441"; a="289972507"
X-IronPort-AV: E=Sophos;i="5.93,242,1654585200";
   d="scan'208";a="289972507"
Received: from orsmga003.jf.intel.com ([10.7.209.27])
  by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 16 Aug 2022 22:12:01 -0700
X-IronPort-AV: E=Sophos;i="5.93,242,1654585200";
   d="scan'208";a="557976696"
Received: from araj-dh-work.jf.intel.com ([10.165.157.158])
  by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 16 Aug 2022 22:11:59 -0700
From: Ashok Raj <ashok.raj@intel.com>
To: Borislav Petkov <bp@alien8.de>,
        Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>,
        Dave Hansen <dave.hansen@intel.com>,
        "LKML Mailing List" <linux-kernel@vger.kernel.org>,
        X86-kernel <x86@kernel.org>,
        Andy Lutomirski <luto@amacapital.net>,
        Tom Lendacky <thomas.lendacky@amd.com>,
        "Jacon Jun Pan" <jacob.jun.pan@intel.com>,
        Ashok Raj <ashok.raj@intel.com>
Subject: [PATCH v3 5/5] x86/microcode: Place siblings in NMI loop while update
 in progress
Date: Wed, 17 Aug 2022 05:11:27 +0000
Message-Id: <20220817051127.3323755-6-ashok.raj@intel.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20220817051127.3323755-1-ashok.raj@intel.com>
References: <20220817051127.3323755-1-ashok.raj@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Microcode updates need a guarantee that the thread sibling that is waiting
for the update to finish on the primary thread will not execute any
instructions until the update is complete. This is required to guarantee
that any MSR or instruction that's being patched will not be executed
before the update is complete.

An NMI handler is registered before the stop_machine() rendezvous. If an
NMI arrives while the microcode update is not yet complete, the secondary
thread spins in the handler until the ucode update state is cleared.

The choices discussed are:

1. Rendezvous inside the NMI handler, and also perform the update from
   within the handler. This seemed too risky and might cause instability
   with the races that we would need to solve. This would be a difficult
   choice.

   1.a Since the primary thread of every core performs a wrmsr for the
       update, the wrmsr can't be interrupted once it has started. Hence
       it's not required to NMI the primary thread of the core. Only the
       secondary thread needs to be parked in NMI before the update
       begins. Suggested by Andy Cooper.
2. Thomas (tglx) suggested that we could look into masking all the LVT
   originating NMIs, such as LINT1, perf control LVT entries and such.
   Since we are in the rendezvous loop, we don't need to worry about any
   NMI IPIs generated by the OS.

   The one we don't have any control over is the ACPI mechanism of sending
   notifications to the kernel for Firmware First Processing (FFM).
   Apparently there is a PCH register that the BIOS, in SMI, writes to
   generate such an interrupt (ACPI GHES).
3. This is a simpler option. The OS registers an NMI handler and doesn't do
   any NMI rendezvous dance. But if an NMI were to happen, we check if any
   of the CPU's thread siblings have an update in progress. Only those CPUs
   would take an NMI. The thread performing the wrmsr() will only take an
   NMI after the completion of the wrmsr 0x79 flow.

   [ Lutomirski thinks this is weak, and that the path from taking the
   interrupt to the registered callback handler might be exposed. ]

   Seems like 1.a is the best candidate.

The algorithm is something like this:

After stop_machine() all threads are executing __reload_late()

nmi_callback()
{
	if (!in_ucode_update)
		return NMI_DONE;
	if (cpu not in sibling_mask)
		return NMI_DONE;
	update sibling reached NMI for primary to continue

	while (cpu in sibling_mask)
		wait;
	return NMI_HANDLED;
}

__reload_late()
{

	entry_rendezvous(&late_cpus_in);
	set_mcip()
	if (this_cpu is first_cpu in the core)
		wait for siblings to drop in NMI
		apply_microcode()
	else {
		send self_ipi(NMI_VECTOR);
		goto wait_for_siblings;
	}

wait_for_siblings:
	exit_rendezvous(&late_cpus_out);
	clear_mcip
}

reload_late()
{
	register_nmi_handler()
	prepare_mask of all sibling cpus()
	update state =3D ucode in progress;
	stop_machine();
	unregister_nmi_handler();
}
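
The ordering that the algorithm relies on can be modeled in userspace with
C11 atomics. This is a single-stepped sketch, not the patch's
implementation: struct core_model and the three helpers are hypothetical
names, loosely inspired by the callin/core_done fields of core_rendez. It
only shows that the primary applies the update after every sibling has
parked, and that siblings leave only after the update is done.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-core state shared by the primary and its siblings. */
struct core_model {
	int num_siblings;	/* secondary threads that must park */
	atomic_int callin;	/* siblings that have reached the NMI handler */
	atomic_int core_done;	/* set by the primary once the update is applied */
};

/* Sibling: entered the NMI handler, tell the primary it may continue. */
static void sibling_enter_nmi(struct core_model *c)
{
	atomic_fetch_add(&c->callin, 1);
}

/* Sibling: one poll of the spin loop; true once it may leave the NMI. */
static bool sibling_may_exit(struct core_model *c)
{
	return atomic_load(&c->core_done) != 0;
}

/* Primary: apply the update only after every sibling is parked in NMI. */
static bool primary_try_update(struct core_model *c)
{
	if (atomic_load(&c->callin) < c->num_siblings)
		return false;	/* keep waiting for siblings to drop into NMI */

	/* apply_microcode() would run here */
	atomic_store(&c->core_done, 1);
	return true;
}
```

In a real rendezvous the primary would call primary_try_update() in a
bounded spin loop with a timeout, and each sibling would spin on
sibling_may_exit() inside its NMI handler.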

Signed-off-by: Ashok Raj <ashok.raj@intel.com>
---
 arch/x86/kernel/cpu/microcode/core.c | 218 ++++++++++++++++++++++++++-
 1 file changed, 211 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/microcode/core.c b/arch/x86/kernel/cpu/microcode/core.c
index d24e1c754c27..fd3b8ce2c82a 100644
--- a/arch/x86/kernel/cpu/microcode/core.c
+++ b/arch/x86/kernel/cpu/microcode/core.c
@@ -39,7 +39,9 @@
 #include <asm/processor.h>
 #include <asm/cmdline.h>
 #include <asm/setup.h>
+#include <asm/apic.h>
 #include <asm/mce.h>
+#include <asm/nmi.h>
 
 #define DRIVER_VERSION	"2.2"
 
@@ -411,6 +413,13 @@ static int check_online_cpus(void)
 
 static atomic_t late_cpus_in;
 static atomic_t late_cpus_out;
+static atomic_t nmi_cpus;	// Number of CPUs that enter NMI
+static atomic_t nmi_timeouts;   // Number of siblings that time out
+static atomic_t nmi_siblings;   // Number of siblings that enter NMI
+static atomic_t in_ucode_update;// Are we in microcode update?
+static atomic_t nmi_exit;       // Siblings that exit NMI
+
+static struct cpumask all_sibling_mask;
 
 static int __wait_for_cpus(atomic_t *t, long long timeout)
 {
@@ -433,6 +442,104 @@ static int __wait_for_cpus(atomic_t *t, long long timeout)
 	return 0;
 }
 
+struct core_rendez {
+	int num_core_cpus;
+	atomic_t callin;
+	atomic_t core_done;
+};
+
+static DEFINE_PER_CPU(struct core_rendez, core_sync);
+
+static int __wait_for_update(atomic_t *t, long long timeout)
+{
+	while (!atomic_read(t)) {
+		if (timeout < SPINUNIT)
+			return 1;
+
+		cpu_relax();
+		ndelay(SPINUNIT);
+		timeout -= SPINUNIT;
+		touch_nmi_watchdog();
+	}
+	return 0;
+}
+
+static int ucode_nmi_cb(unsigned int val, struct pt_regs *regs)
+{
+	int ret, first_cpu, cpu = smp_processor_id();
+	struct core_rendez *rendez;
+
+	atomic_inc(&nmi_cpus);
+	if (!atomic_read(&in_ucode_update))
+		return NMI_DONE;
+
+	if (!cpumask_test_cpu(cpu, &all_sibling_mask))
+		return NMI_DONE;
+
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
+	rendez = &per_cpu(core_sync, first_cpu);
+
+	/*
+	 * If the primary has already marked the update complete, we don't
+	 * need to be here in the NMI handler.
+	 */
+	if (atomic_read(&rendez->core_done))
+		return NMI_DONE;
+
+	atomic_inc(&nmi_siblings);
+	pr_debug("Sibling CPU %d made into NMI handler\n", cpu);
+	/*
+	 * The primary thread waits for all siblings to check in to the NMI
+	 * handler before performing the microcode update.
+	 */
+
+	atomic_inc(&rendez->callin);
+	ret = __wait_for_update(&rendez->core_done, NSEC_PER_SEC);
+	if (ret) {
+		atomic_inc(&nmi_timeouts);
+		pr_debug("Sibling CPU %d timed out\n", cpu);
+	}
+	/*
+	 * Once primary signals update is complete, we are free to get out
+	 * of the NMI jail
+	 */
+	if (atomic_read(&rendez->core_done)) {
+		pr_debug("Sibling CPU %d breaking from NMI\n", cpu);
+		atomic_inc(&nmi_exit);
+	}
+
+	return NMI_HANDLED;
+}
+
+/*
+ * Primary thread clears the cpumask to release the siblings from the NMI
+ * jail
+ */
+
+static void clear_nmi_cpus(void)
+{
+	int first_cpu, wait_cpu, cpu = smp_processor_id();
+
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
+	for_each_cpu(wait_cpu, topology_sibling_cpumask(cpu)) {
+		if (wait_cpu == first_cpu)
+			continue;
+		cpumask_clear_cpu(wait_cpu, &all_sibling_mask);
+	}
+}
+
+static int __wait_for_siblings(struct core_rendez *rendez, long long timeout)
+{
+	int num_sibs = rendez->num_core_cpus - 1;
+	atomic_t *t = &rendez->callin;
+
+	while (atomic_read(t) < num_sibs) {
+		cpu_relax();
+		touch_nmi_watchdog();
+	}
+	return 0;
+}
+
 /*
  * Returns:
  * < 0 - on error
@@ -440,17 +547,20 @@ static int __wait_for_cpus(atomic_t *t, long long timeout)
  */
 static int __reload_late(void *info)
 {
-	int cpu = smp_processor_id();
+	int first_cpu, cpu = smp_processor_id();
 	enum ucode_state err;
 	int ret = 0;
 
 	/*
 	 * Wait for all CPUs to arrive. A load will not be attempted unless all
 	 * CPUs show up.
-	 * */
+	 */
 	if (__wait_for_cpus(&late_cpus_in, NSEC_PER_SEC))
 		return -1;
 
+	if (cpumask_first(cpu_online_mask) == cpu)
+		pr_debug("__reload_late: Entry Sync Done\n");
+
 	/*
 	 * Its dangerous to let MCE while microcode update is in progress.
 	 * Its extremely rare and even if happens they are fatal errors.
@@ -459,6 +569,7 @@ static int __reload_late(void *info)
 	 * the platform is taken to reset predictively.
 	 */
 	mce_set_mcip();
+
 	/*
 	 * On an SMT system, it suffices to load the microcode on one sibling of
 	 * the core because the microcode engine is shared between the threads.
@@ -466,13 +577,35 @@ static int __reload_late(void *info)
 	 * loading attempts happen on multiple threads of an SMT core. See
 	 * below.
 	 */
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
 
-	if (cpumask_first(topology_sibling_cpumask(cpu)) == cpu)
+	/*
+	 * Set the CPUs that we should hold in NMI until the primary has
+	 * completed the microcode update.
+	 */
+	if (first_cpu == cpu) {
+		struct core_rendez *pcpu_core = &per_cpu(core_sync, cpu);
+
+		/*
+		 * Wait for all siblings to enter
+		 * NMI before performing the update
+		 */
+		ret = __wait_for_siblings(pcpu_core, NSEC_PER_SEC);
+		if (ret) {
+			pr_err("CPU %d core lead timeout waiting for"
+			       " siblings\n", cpu);
+			ret = -1;
+		}
+		pr_debug("Primary CPU %d proceeding with update\n", cpu);
 		apply_microcode_local(&err);
-	else
+		atomic_set(&pcpu_core->core_done, 1);
+		clear_nmi_cpus();
+	} else {
+		apic->send_IPI_self(NMI_VECTOR);
 		goto wait_for_siblings;
+	}
 
-	if (err >= UCODE_NFOUND) {
+	if (ret || err >= UCODE_NFOUND) {
 		if (err == UCODE_ERROR)
 			pr_warn("Error reloading microcode on CPU %d\n", cpu);
 
@@ -483,6 +616,9 @@ static int __reload_late(void *info)
 	if (__wait_for_cpus(&late_cpus_out, NSEC_PER_SEC))
 		panic("Timeout during microcode update!\n");
 
+	if (cpumask_first(cpu_online_mask) == cpu)
+		pr_debug("__reload_late: Exit Sync Done\n");
+
 	mce_clear_mcip();
 	/*
 	 * At least one thread has completed update on each core.
@@ -496,26 +632,94 @@ static int __reload_late(void *info)
 	return ret;
 }
 
+static void set_nmi_cpus(int cpu)
+{
+	int first_cpu, wait_cpu;
+	struct core_rendez *pcpu_core = &per_cpu(core_sync, cpu);
+
+	first_cpu = cpumask_first(topology_sibling_cpumask(cpu));
+	for_each_cpu(wait_cpu, topology_sibling_cpumask(cpu)) {
+		if (wait_cpu == first_cpu) {
+			pcpu_core->num_core_cpus =
+					cpumask_weight(topology_sibling_cpumask(wait_cpu));
+			continue;
+		}
+		cpumask_set_cpu(wait_cpu, &all_sibling_mask);
+	}
+}
+
+static void prepare_siblings(void)
+{
+	int cpu;
+
+	for_each_cpu(cpu, cpu_online_mask) {
+		set_nmi_cpus(cpu);
+	}
+}
+
 /*
  * Reload microcode late on all CPUs. Wait for a sec until they
  * all gather together.
  */
 static int microcode_reload_late(void)
 {
-	int ret;
+	int ret = 0;
 
 	pr_err("Attempting late microcode loading - it is dangerous and taints the kernel.\n");
 	pr_err("You should switch to early loading, if possible.\n");
 
+	/*
+	 * Used for late_load entry and exit rendezvous
+	 */
 	atomic_set(&late_cpus_in,  0);
 	atomic_set(&late_cpus_out, 0);
 
+	/*
+	 * in_ucode_update: Global state while in ucode update
+	 * nmi_cpus: Count of CPUs entering NMI while ucode in progress
+	 * nmi_siblings: Count of siblings that enter NMI
+	 * nmi_timeouts: Count of siblings that fail to see mask clear
+	 */
+	atomic_set(&in_ucode_update,0);
+	atomic_set(&nmi_cpus, 0);
+	atomic_set(&nmi_timeouts, 0);
+	atomic_set(&nmi_siblings, 0);
+
+	cpumask_clear(&all_sibling_mask);
+
+	ret = register_nmi_handler(NMI_LOCAL, ucode_nmi_cb, NMI_FLAG_FIRST,
+				   "ucode_nmi");
+	if (ret) {
+		pr_err("Unable to register NMI handler\n");
+		goto done;
+	}
+
+	/*
+	 * Prepare everything for siblings threads to drop into NMI while
+	 * the update is in progress.
+	 */
+	prepare_siblings();
+	atomic_set(&in_ucode_update, 1);
+#if 0
+	apic->send_IPI_mask(&all_sibling_mask, NMI_VECTOR);
+	pr_debug("Sent NMI broadcast to all sibling cpus\n");
+#endif
 	ret = stop_machine_cpuslocked(__reload_late, NULL, cpu_online_mask);
 	if (ret == 0)
 		microcode_check();
 
-	pr_info("Reload completed, microcode revision: 0x%x\n", boot_cpu_data.microcode);
+	unregister_nmi_handler(NMI_LOCAL, "ucode_nmi");
+
+	pr_debug("Total CPUs that entered NMI     ... %d\n",
+		 atomic_read(&nmi_cpus));
+	pr_debug("Total siblings that entered NMI ... %d\n",
+		 atomic_read(&nmi_siblings));
+	pr_debug("Total siblings timed out        ... %d\n",
+		 atomic_read(&nmi_timeouts));
+	pr_info("Reload completed, microcode revision: 0x%x\n",
+	        boot_cpu_data.microcode);
 
+done:
 	return ret;
 }
 
-- 
2.32.0