From: "Chang S. Bae"
To: linux-kernel@vger.kernel.org
Cc: x86@kernel.org, tglx@kernel.org, mingo@redhat.com, bp@alien8.de,
    dave.hansen@linux.intel.com, peterz@infradead.org, david.kaplan@amd.com,
    chang.seok.bae@intel.com
Subject: [PATCH v2 04/11] stop_machine: Add NMI-based execution path
Date: Tue, 31 Mar 2026 01:42:42 +0000
Message-ID: <20260331014251.86353-5-chang.seok.bae@intel.com>
In-Reply-To: <20260331014251.86353-1-chang.seok.bae@intel.com>
References: <20260331014251.86353-1-chang.seok.bae@intel.com>

Currently multi_cpu_stop() executes the stop function in the
MULTI_STOP_RUN state. But NMIs can still preempt the execution.
For use cases requiring stricter isolation from them, provide an option
to run the stop function from NMI context.

The NMI stop function must then be built without instrumentation, as
exceptions may re-enable NMIs on return via IRET. Annotate the NMI
handler entirely with noinstr.

However, objtool cannot reliably determine whether the indirect call
target is located in a noinstr section, so it may emit false positives.
To avoid this, temporarily lift the instrumentation restriction around
the indirect call site and document the intention.

The x86 microcode loader is currently the primary user for this. Other
architectures are not expected to use it, so add a build option to make
it opt-in.

Originally-by: David Kaplan
Suggested-by: Borislav Petkov
Suggested-by: Thomas Gleixner
Signed-off-by: Chang S. Bae
Link: https://lore.kernel.org/lkml/20260129121729.GRaXtP2aeWkQKegxC2@fat_crate.local
Link: https://lore.kernel.org/lkml/87wm0zl8p2.ffs@tglx
---
V1 -> V2: Multiple changes
* Fix racy return value read [**]
* Add Kconfig option (Thomas)
* Add cpumask to track NMI delivery (Boris)
* Rework implementation. Consolidate under ifdef

[**]: Observed delay in IPI/NMI delivery when testing error return paths
---
 arch/Kconfig                 |  3 ++
 include/linux/stop_machine.h | 16 ++++++
 kernel/stop_machine.c        | 96 +++++++++++++++++++++++++++++++++++-
 3 files changed, 114 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 102ddbd4298e..f84fd528aae7 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -1841,4 +1841,7 @@ config ARCH_WANTS_PRE_LINK_VMLINUX
 config ARCH_HAS_CPU_ATTACK_VECTORS
 	bool
 
+config STOP_MACHINE_NMI
+	bool
+
 endmenu
diff --git a/include/linux/stop_machine.h b/include/linux/stop_machine.h
index 2f986555113a..9424d363ab38 100644
--- a/include/linux/stop_machine.h
+++ b/include/linux/stop_machine.h
@@ -19,6 +19,12 @@
  */
 typedef int (*cpu_stop_fn_t)(void *arg);
 
+/*
+ * Stop function variant runnable from NMI context. This makes the
+ * noinstr requirement explicit at the type level.
+ */
+typedef int (*cpu_stop_nmisafe_fn_t)(void *arg);
+
 #ifdef CONFIG_SMP
 
 struct cpu_stop_work {
@@ -189,4 +195,14 @@ stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data,
 }
 
 #endif /* CONFIG_SMP || CONFIG_HOTPLUG_CPU */
+
+#ifdef CONFIG_STOP_MACHINE_NMI
+
+void arch_send_self_nmi(void);
+bool noinstr stop_machine_nmi_handler(void);
+
+#else
+static inline bool stop_machine_nmi_handler(void) { return false; }
+#endif /* CONFIG_STOP_MACHINE_NMI */
+
 #endif /* _LINUX_STOP_MACHINE */
diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
index 092c65c002ff..45ea62f1b2b5 100644
--- a/kernel/stop_machine.c
+++ b/kernel/stop_machine.c
@@ -22,6 +22,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * Structure to determine completion condition and record errors. May
@@ -174,6 +175,15 @@ struct multi_stop_data {
 
 	enum multi_stop_state state;
 	atomic_t thread_ack;
+
+#ifdef CONFIG_STOP_MACHINE_NMI
+	/* Used in the NMI stop_machine variant */
+	bool use_nmi;
+	/* A separate function type to highlight noinstr requirement */
+	cpu_stop_nmisafe_fn_t nmisafe_fn;
+	/* cpumask of CPUs on which to raise an NMI */
+	cpumask_var_t nmi_cpus;
+#endif
 };
 
 static void set_state(struct multi_stop_data *msdata,
@@ -197,6 +207,8 @@ notrace void __weak stop_machine_yield(const struct cpumask *cpumask)
 	cpu_relax();
 }
 
+static int multi_stop_run(struct multi_stop_data *msdata);
+
 /* This is the cpu_stop function which stops the CPU. */
 static int multi_cpu_stop(void *data)
 {
@@ -235,7 +247,7 @@ static int multi_cpu_stop(void *data)
 				break;
 			case MULTI_STOP_RUN:
 				if (is_active)
-					err = msdata->fn(msdata->data);
+					err = multi_stop_run(msdata);
 				break;
 			default:
 				break;
@@ -712,3 +724,85 @@ int stop_machine_from_inactive_cpu(cpu_stop_fn_t fn, void *data,
 	mutex_unlock(&stop_cpus_mutex);
 	return ret | done.ret;
 }
+
+#ifdef CONFIG_STOP_MACHINE_NMI
+
+struct nmi_stop {
+	struct multi_stop_data *data;
+	int ret;
+	bool done;
+};
+
+static DEFINE_PER_CPU(struct nmi_stop, nmi_stop);
+
+/*
+ * Instrumentation may trigger nested exceptions such as #INT3, #DB,
+ * or #PF. IRET from those would re-enable NMIs, which opposes the goal
+ * of this NMI stop-machine facility.
+ */
+bool noinstr stop_machine_nmi_handler(void)
+{
+	struct multi_stop_data *msdata = raw_cpu_read(nmi_stop.data);
+	unsigned int cpu = smp_processor_id();
+	int ret;
+
+	if (!msdata || !cpumask_test_and_clear_cpu(cpu, msdata->nmi_cpus))
+		return false;
+
+	/*
+	 * The indirect call to @nmisafe_fn() is indistinguishable at
+	 * post-compilation. Temporarily enabling instrumentation avoids
+	 * objtool false positives.
+	 */
+	instrumentation_begin();
+	ret = msdata->nmisafe_fn(msdata->data);
+	instrumentation_end();
+
+	raw_cpu_write(nmi_stop.ret, ret);
+	raw_cpu_write(nmi_stop.done, true);
+	raw_cpu_write(nmi_stop.data, NULL);
+
+	return true;
+}
+
+static bool wait_for_nmi_handler(void)
+{
+	/* Conservative timeout */
+	unsigned long timeout = USEC_PER_SEC;
+
+	while (!this_cpu_read(nmi_stop.done) && timeout--)
+		udelay(1);
+
+	return this_cpu_read(nmi_stop.done);
+}
+
+static int nmi_stop_run(struct multi_stop_data *msdata)
+{
+	/*
+	 * Save per-CPU state accessible from NMI context and raise a
+	 * self-NMI to execute the stop function from the NMI handler
+	 */
+	this_cpu_write(nmi_stop.data, msdata);
+	this_cpu_write(nmi_stop.done, false);
+	arch_send_self_nmi();
+
+	/* Ensure the handler went through before reading the result */
+	if (!wait_for_nmi_handler())
+		return -ETIMEDOUT;
+
+	return this_cpu_read(nmi_stop.ret);
+}
+
+static int multi_stop_run(struct multi_stop_data *msdata)
+{
+	return msdata->use_nmi ? nmi_stop_run(msdata) : msdata->fn(msdata->data);
+}
+
+#else
+
+static int multi_stop_run(struct multi_stop_data *msdata)
+{
+	return msdata->fn(msdata->data);
+}
+
+#endif /* CONFIG_STOP_MACHINE_NMI */
-- 
2.51.0
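
Illustration only, not part of the patch: the publish / self-NMI / wait
/ read-result handshake in nmi_stop_run() and stop_machine_nmi_handler()
can be sketched in userspace, with a POSIX signal standing in for the
NMI. Everything named here (struct stop_req, fake_nmi_handler(),
stop_run_via_signal(), add_one()) is hypothetical, and a signal is only
a loose analogy: it has none of the NMI masking or noinstr properties
the patch is about. In a single-threaded program raise() runs the
handler before it returns, so the bounded wait below is expected to
fall through without spinning.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for cpu_stop_nmisafe_fn_t. */
typedef int (*stop_fn_t)(void *arg);

/* Hypothetical stand-in for the fields of multi_stop_data used here. */
struct stop_req {
	stop_fn_t fn;
	void *arg;
};

/* Stand-ins for the per-CPU nmi_stop.data / .ret / .done slots. */
static struct stop_req *volatile pending;
static volatile sig_atomic_t result;
static volatile sig_atomic_t done;

/* Plays the role of stop_machine_nmi_handler(): consume the request once. */
static void fake_nmi_handler(int sig)
{
	struct stop_req *req = pending;

	(void)sig;
	if (!req)
		return;		/* nothing was published, not ours */

	result = req->fn(req->arg);
	done = 1;
	pending = NULL;
}

/* Plays the role of nmi_stop_run(): publish, self-signal, wait, read. */
static int stop_run_via_signal(stop_fn_t fn, void *arg)
{
	struct stop_req req = { .fn = fn, .arg = arg };
	long spins = 1000000;

	assert(fn != NULL);
	if (signal(SIGUSR1, fake_nmi_handler) == SIG_ERR)
		return -errno;

	pending = &req;
	done = 0;
	raise(SIGUSR1);		/* stand-in for arch_send_self_nmi() */

	/* Bounded wait, in the spirit of wait_for_nmi_handler(). */
	while (!done && spins-- > 0)
		;

	if (!done) {
		/* Sketch-only choice: req lives on this stack frame. */
		pending = NULL;
		return -ETIMEDOUT;
	}

	return result;
}

/* Example stop function: increments the int it is given. */
static int add_one(void *arg)
{
	int *v = arg;

	return ++*v;
}
```

Calling stop_run_via_signal(add_one, &v) runs add_one() from the signal
handler and hands its return value back to the caller, which is the same
shape as multi_stop_run() taking the use_nmi branch. The result is read
only after the done flag is observed, mirroring the ordering in
nmi_stop_run().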