From nobody Mon Jun  8 14:37:30 2026
Message-Id: <20260528151338.617843-2-zhouchuyi@bytedance.com>
In-Reply-To: <20260528151338.617843-1-zhouchuyi@bytedance.com>
Mime-Version: 1.0
References: <20260528151338.617843-1-zhouchuyi@bytedance.com>
Content-Transfer-Encoding: quoted-printable
X-Original-From: Chuyi Zhou <zhouchuyi@bytedance.com>
Cc: <linux-kernel@vger.kernel.org>, "Chuyi Zhou" <zhouchuyi@bytedance.com>
From: "Chuyi Zhou" <zhouchuyi@bytedance.com>
Date: Thu, 28 May 2026 23:13:27 +0800
X-Mailer: git-send-email 2.20.1
To: <tglx@kernel.org>, <mingo@redhat.com>, <luto@kernel.org>,
	<peterz@infradead.org>, <paulmck@kernel.org>, <muchun.song@linux.dev>,
	<bp@alien8.de>, <dave.hansen@linux.intel.com>, <pbonzini@redhat.com>,
	<bigeasy@linutronix.de>, <clrkwllms@kernel.org>, <rostedt@goodmis.org>,
	<nadav.amit@gmail.com>
Subject: [PATCH v6 01/12] smp: Disable preemption explicitly in
 __csd_lock_wait
Content-Type: text/plain; charset="utf-8"

Later patches will enable preemption before csd_lock_wait(), which could
break csdlock_debug: time slices consumed by other tasks on the same CPU
could then be counted as CSD-lock wait time between the
ktime_get_mono_fast_ns() calls. Disable preemption explicitly in
__csd_lock_wait() to prevent this. This is a preparation for the next
patches.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
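A note on the mechanism: guard(preempt)() is a scope-based guard, so
preemption is disabled where the guard is declared and re-enabled
automatically whenever __csd_lock_wait() leaves that scope, including the
plain return out of the wait loop. The userspace sketch below models this
with the GCC/Clang cleanup attribute, the compiler feature that the
kernel's guard infrastructure in include/linux/cleanup.h builds on. Every
fake_* name is a hypothetical stand-in rather than a kernel API, and the
counter only imitates preempt_count.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-CPU preempt_count. */
static int fake_preempt_count;

static void fake_preempt_disable(void)
{
	fake_preempt_count++;
}

static void fake_preempt_enable(void)
{
	assert(fake_preempt_count > 0);
	fake_preempt_count--;
}

/* Runs automatically when the guard variable goes out of scope. */
static void fake_preempt_guard_exit(int *unused)
{
	(void)unused;
	fake_preempt_enable();
}

/* Rough analogue of guard(preempt)(): disable now, re-enable at scope exit. */
#define FAKE_GUARD_PREEMPT()						\
	int fake_guard __attribute__((cleanup(fake_preempt_guard_exit),	\
				      unused)) =			\
		(fake_preempt_disable(), 0)

/*
 * Models the shape of __csd_lock_wait(): the loop exits through a plain
 * return, and the cleanup handler still re-enables "preemption".
 * Returns the nesting depth observed inside the guarded scope.
 */
static int fake_csd_lock_wait(int spins_until_unlock)
{
	FAKE_GUARD_PREEMPT();
	int depth_inside = fake_preempt_count;

	for (int spins = 0;; spins++) {
		if (spins >= spins_until_unlock)
			return depth_inside;
	}
}
```

Called with the count at 0, fake_csd_lock_wait(3) returns 1 (the depth
seen inside the guarded scope) and leaves the count at 0 again, without
any explicit enable call on the return path.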
 kernel/smp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index a0bb56bd8dda..b58975480e11 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -323,6 +323,8 @@ static void __csd_lock_wait(call_single_data_t *csd)
 	int bug_id = 0;
 	u64 ts0, ts1;
 
+	guard(preempt)();
+
 	ts1 = ts0 = ktime_get_mono_fast_ns();
 	for (;;) {
 		if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id, &nmessages))
-- 
2.20.1
From nobody Mon Jun  8 14:37:30 2026
To: <tglx@kernel.org>, <mingo@redhat.com>, <luto@kernel.org>,
	<peterz@infradead.org>, <paulmck@kernel.org>, <muchun.song@linux.dev>,
	<bp@alien8.de>, <dave.hansen@linux.intel.com>, <pbonzini@redhat.com>,
	<bigeasy@linutronix.de>, <clrkwllms@kernel.org>, <rostedt@goodmis.org>,
	<nadav.amit@gmail.com>
Cc: <linux-kernel@vger.kernel.org>, "Chuyi Zhou" <zhouchuyi@bytedance.com>
Date: Thu, 28 May 2026 23:13:28 +0800
Message-Id: <20260528151338.617843-3-zhouchuyi@bytedance.com>
X-Original-From: Chuyi Zhou <zhouchuyi@bytedance.com>
In-Reply-To: <20260528151338.617843-1-zhouchuyi@bytedance.com>
Mime-Version: 1.0
X-Mailer: git-send-email 2.20.1
From: "Chuyi Zhou" <zhouchuyi@bytedance.com>
Subject: [PATCH v6 02/12] smp: Enable preemption early in
 smp_call_function_single
References: <20260528151338.617843-1-zhouchuyi@bytedance.com>
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Currently, smp_call_function_single() disables preemption mainly for the
following reasons:

- To protect the per-cpu csd_data from concurrent modification by other
tasks on the current CPU in the !wait case. In the wait case,
synchronization is not a concern because an on-stack csd is used.

- To prevent the remote online CPU from being offlined. Specifically, we
want to ensure that no new IPIs are queued after smpcfd_dying_cpu() has
finished.

Disabling preemption for the entire call is unnecessary; in particular,
the csd_lock_wait() part does not require preemption protection. This
patch re-enables preemption before csd_lock_wait() to shrink the
preemption-disabled critical section.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
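The reasoning for the wait case can be illustrated with a userspace
sketch, assuming a pthread stands in for the remote CPU and a C11 atomic
flag stands in for the CSD lock bit, with a release store on the remote
side and an acquire load in the waiter (roughly the pairing used between
csd_unlock() and csd_lock_wait()). Because the csd lives on the caller's
stack, only the caller and the "remote" side ever touch it, so the final
spin does not need preemption disabled. struct fake_csd and the fake_*
functions are hypothetical simplifications, not call_single_data_t.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical, heavily simplified stand-in for call_single_data_t. */
struct fake_csd {
	atomic_int locked;		/* 1 while the request is in flight */
	void (*func)(void *info);
	void *info;
};

/* The "remote CPU": run the callback, then drop the lock with release. */
static void *fake_remote_cpu(void *arg)
{
	struct fake_csd *csd = arg;

	csd->func(csd->info);
	atomic_store_explicit(&csd->locked, 0, memory_order_release);
	return NULL;
}

static void fake_set_answer(void *info)
{
	*(int *)info = 42;
}

/*
 * Models the wait == true path. The csd is on this stack frame, so no
 * other task on the local CPU can reach it; only the remote side can.
 */
static int fake_call_single_wait(void)
{
	int answer = 0;
	struct fake_csd csd = { .func = fake_set_answer, .info = &answer };
	pthread_t remote;

	atomic_init(&csd.locked, 1);
	if (pthread_create(&remote, NULL, fake_remote_cpu, &csd) != 0)
		return -1;

	/*
	 * The patched smp_call_function_single() calls put_cpu() at this
	 * point, so the wait below runs with preemption enabled.
	 */
	while (atomic_load_explicit(&csd.locked, memory_order_acquire))
		;	/* csd_lock_wait() analogue */

	pthread_join(remote, NULL);
	return answer;
}
```

Once the acquire load observes the cleared flag, the callback's write to
answer is visible, so fake_call_single_wait() returns 42.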
 kernel/smp.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index b58975480e11..292eefadddbc 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -700,11 +700,16 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 
 	err = generic_exec_single(cpu, csd);
 
+	/*
+	 * @csd is stack-allocated when @wait is true. No concurrent access
+	 * except from the IPI completion path, so we can re-enable preemption
+	 * early to reduce latency.
+	 */
+	put_cpu();
+
 	if (wait)
 		csd_lock_wait(csd);
 
-	put_cpu();
-
 	return err;
 }
 EXPORT_SYMBOL(smp_call_function_single);
-- 
2.20.1
From nobody Mon Jun  8 14:37:30 2026
From: "Chuyi Zhou" <zhouchuyi@bytedance.com>
In-Reply-To: <20260528151338.617843-1-zhouchuyi@bytedance.com>
Content-Transfer-Encoding: quoted-printable
To: <tglx@kernel.org>, <mingo@redhat.com>, <luto@kernel.org>,
	<peterz@infradead.org>, <paulmck@kernel.org>, <muchun.song@linux.dev>,
	<bp@alien8.de>, <dave.hansen@linux.intel.com>, <pbonzini@redhat.com>,
	<bigeasy@linutronix.de>, <clrkwllms@kernel.org>, <rostedt@goodmis.org>,
	<nadav.amit@gmail.com>
Subject: [PATCH v6 03/12] smp: Refactor remote CPU selection in
 smp_call_function_any()
Mime-Version: 1.0
X-Mailer: git-send-email 2.20.1
Date: Thu, 28 May 2026 23:13:29 +0800
References: <20260528151338.617843-1-zhouchuyi@bytedance.com>
Cc: <linux-kernel@vger.kernel.org>, "Chuyi Zhou" <zhouchuyi@bytedance.com>
Message-Id: <20260528151338.617843-4-zhouchuyi@bytedance.com>
X-Original-From: Chuyi Zhou <zhouchuyi@bytedance.com>
Content-Type: text/plain; charset="utf-8"

Currently, smp_call_function_any() disables preemption across the entire
process of picking a target CPU, enqueueing the IPI, and synchronously
waiting for the remote CPU. Since smp_call_function_single() has already
been optimized to re-enable preemption before the synchronous
csd_lock_wait(), callers of smp_call_function_any() should also benefit
from this optimization to reduce the preemption-disabled critical section.

A naive approach would be to simply remove get_cpu() and put_cpu() from
smp_call_function_any(), leaving the preemption disablement entirely to
smp_call_function_single(). However, doing so opens a dangerous
preemption window between picking the remote CPU (e.g., via
sched_numa_find_nth_cpu()) and dispatching the IPI inside
smp_call_function_single(). If the selected remote CPU is fully offlined
during this window, smp_call_function_single() will fail its
cpu_online() check and return -ENXIO directly to the caller, violating
the guarantee to execute on *any* online CPU in the mask.

To enable this optimization safely, this patch refactors
smp_call_function_any() and smp_call_function_single(): the remote CPU
selection moves into a common helper, __smp_call_function_single(), so
that the entire selection and IPI dispatch process stays within a single
preemption-disabled region.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
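The selection order that __smp_call_function_single() now performs inside
its get_cpu()/put_cpu() region can be sketched in userspace as follows.
The topology, the distance values and the linear scan are assumptions made
for illustration: sched_numa_find_nth_cpu() walks the scheduler's NUMA
hop masks rather than a flat loop, so its tie-breaking may differ. The
point is only the order: prefer the current CPU if it is in @mask,
otherwise take the nearest CPU in @mask.

```c
#include <assert.h>
#include <limits.h>

#define FAKE_NR_CPUS 8

/* Hypothetical topology: CPUs 0-3 on node 0, CPUs 4-7 on node 1. */
static int fake_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < FAKE_NR_CPUS);
	return cpu / 4;
}

/* Hypothetical SLIT-style distances: 10 for local, 20 for remote. */
static int fake_node_distance(int a, int b)
{
	return a == b ? 10 : 20;
}

/*
 * Selection order only: this CPU if it is in @mask, else the nearest CPU
 * in @mask (lowest CPU number wins ties). Returns FAKE_NR_CPUS when @mask
 * is empty, similar to a ">= nr_cpu_ids" result.
 */
static int fake_pick_cpu(unsigned int mask, int this_cpu)
{
	int home = fake_cpu_to_node(this_cpu);
	int best = FAKE_NR_CPUS;
	int best_dist = INT_MAX;

	if (mask & (1u << this_cpu))
		return this_cpu;

	for (int cpu = 0; cpu < FAKE_NR_CPUS; cpu++) {
		int dist;

		if (!(mask & (1u << cpu)))
			continue;
		dist = fake_node_distance(home, fake_cpu_to_node(cpu));
		if (dist < best_dist) {
			best_dist = dist;
			best = cpu;
		}
	}
	return best;
}
```

In the patch, the equivalent of this choice and the subsequent IPI
dispatch happen without an intervening preemption point, so the chosen
CPU cannot be fully offlined between the two steps.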
 kernel/smp.c | 48 ++++++++++++++++++++++++++----------------------
 1 file changed, 26 insertions(+), 22 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 292eefadddbc..9e9dab3b0d51 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -641,17 +641,8 @@ void flush_smp_call_function_queue(void)
 	local_irq_restore(flags);
 }
 
-/**
- * smp_call_function_single - Run a function on a specific CPU
- * @cpu: Specific target CPU for this function.
- * @func: The function to run. This must be fast and non-blocking.
- * @info: An arbitrary pointer to pass to the function.
- * @wait: If true, wait until function has completed on other CPUs.
- *
- * Returns: %0 on success, else a negative status code.
- */
-int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
-			     int wait)
+static int __smp_call_function_single(int cpu, smp_call_func_t func,
+			void *info, const struct cpumask *mask, int wait)
 {
 	call_single_data_t *csd;
 	call_single_data_t csd_stack = {
@@ -668,6 +659,14 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 	 */
 	this_cpu = get_cpu();
 
+	if (mask) {
+		/* Try for same CPU (cheapest) */
+		if (!cpumask_test_cpu(this_cpu, mask))
+			cpu = sched_numa_find_nth_cpu(mask, 0, cpu_to_node(this_cpu));
+		else
+			cpu = this_cpu;
+	}
+
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
@@ -712,6 +711,21 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 
 	return err;
 }
+
+/**
+ * smp_call_function_single - Run a function on a specific CPU
+ * @cpu: Specific target CPU for this function.
+ * @func: The function to run. This must be fast and non-blocking.
+ * @info: An arbitrary pointer to pass to the function.
+ * @wait: If true, wait until function has completed on other CPUs.
+ *
+ * Returns: %0 on success, else a negative status code.
+ */
+int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
+			     int wait)
+{
+	return __smp_call_function_single(cpu, func, info, NULL, wait);
+}
 EXPORT_SYMBOL(smp_call_function_single);
 
 /**
@@ -776,17 +790,7 @@ EXPORT_SYMBOL_GPL(smp_call_function_single_async);
 int smp_call_function_any(const struct cpumask *mask,
 			  smp_call_func_t func, void *info, int wait)
 {
-	unsigned int cpu;
-	int ret;
-
-	/* Try for same CPU (cheapest) */
-	cpu = get_cpu();
-	if (!cpumask_test_cpu(cpu, mask))
-		cpu = sched_numa_find_nth_cpu(mask, 0, cpu_to_node(cpu));
-
-	ret = smp_call_function_single(cpu, func, info, wait);
-	put_cpu();
-	return ret;
+	return __smp_call_function_single(-1, func, info, mask, wait);
 }
 EXPORT_SYMBOL_GPL(smp_call_function_any);
 
-- 
2.20.1
From nobody Mon Jun  8 14:37:30 2026
To: <tglx@kernel.org>, <mingo@redhat.com>, <luto@kernel.org>,
	<peterz@infradead.org>, <paulmck@kernel.org>, <muchun.song@linux.dev>,
	<bp@alien8.de>, <dave.hansen@linux.intel.com>, <pbonzini@redhat.com>,
	<bigeasy@linutronix.de>, <clrkwllms@kernel.org>, <rostedt@goodmis.org>,
	<nadav.amit@gmail.com>
From: "Chuyi Zhou" <zhouchuyi@bytedance.com>
Subject: [PATCH v6 04/12] smp: Use task-local IPI cpumask in
 smp_call_function_many_cond()
Mime-Version: 1.0
In-Reply-To: <20260528151338.617843-1-zhouchuyi@bytedance.com>
References: <20260528151338.617843-1-zhouchuyi@bytedance.com>
X-Original-From: Chuyi Zhou <zhouchuyi@bytedance.com>
Content-Transfer-Encoding: quoted-printable
Cc: <linux-kernel@vger.kernel.org>, "Chuyi Zhou" <zhouchuyi@bytedance.com>
Date: Thu, 28 May 2026 23:13:30 +0800
Message-Id: <20260528151338.617843-5-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
Content-Type: text/plain; charset="utf-8"

This patch allocates a task-local IPI cpumask during thread creation and
uses it in place of the per-CPU cfd->cpumask in
smp_call_function_many_cond(). A later patch enables preemption during
csd_lock_wait(); with a task-local mask, other tasks that run on the
current CPU in the meantime cannot modify the mask concurrently. When
cpumask_size() is no larger than sizeof(unsigned long), the cpumask is
stashed in the pointer field itself to avoid an extra memory allocation.

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
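The storage trick can be sketched in userspace, assuming a plain bool in
place of the ipi_mask_inlined static key and malloc()/free() in place of
kmalloc()/kfree(). When the mask fits in an unsigned long, the union's
value member is the bitmap itself; otherwise the pointer member refers to
a separate allocation. struct fake_task and the fake_* helpers are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the new task_struct fields. */
struct fake_task {
	union {
		unsigned long *ipi_mask_ptr;	/* separately allocated bitmap */
		unsigned long ipi_mask_val;	/* bitmap stored in place */
	};
};

/* Stand-in for the ipi_mask_inlined static key, decided once at boot. */
static bool fake_mask_inlined;

static void fake_setup(size_t mask_bytes)
{
	assert(mask_bytes > 0);
	fake_mask_inlined = mask_bytes <= sizeof(unsigned long);
}

static int fake_mask_alloc(struct fake_task *t, size_t mask_bytes)
{
	if (fake_mask_inlined)
		return 0;	/* the union itself is the storage */
	t->ipi_mask_ptr = malloc(mask_bytes);
	return t->ipi_mask_ptr ? 0 : -1;
}

static void fake_mask_free(struct fake_task *t)
{
	if (!fake_mask_inlined)
		free(t->ipi_mask_ptr);
}

/* Models smp_task_ipi_mask(): choose the storage according to the key. */
static unsigned long *fake_task_mask(struct fake_task *t)
{
	if (fake_mask_inlined)
		return &t->ipi_mask_val;
	return t->ipi_mask_ptr;
}
```

Deciding the layout once at boot keeps the per-call check down to a
single branch, which is why the patch can use a static key for it.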
 include/linux/sched.h |  6 ++++
 include/linux/smp.h   | 15 ++++++++++
 kernel/fork.c         |  9 +++++-
 kernel/smp.c          | 66 +++++++++++++++++++++++++++++++++++++++----
 4 files changed, 89 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 368c7b4d7cb5..bb2c53279412 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1348,6 +1348,12 @@ struct task_struct {
 	struct list_head		perf_event_list;
 	struct perf_ctx_data __rcu	*perf_ctx_data;
 #endif
+#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPTION)
+	union {
+		cpumask_t                       *ipi_mask_ptr;
+		unsigned long			ipi_mask_val;
+	};
+#endif
 #ifdef CONFIG_DEBUG_PREEMPT
 	unsigned long			preempt_disable_ip;
 #endif
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 6925d15ccaa7..e05af439abe4 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -167,6 +167,11 @@ void smp_call_function_many(const struct cpumask *mask,
 int smp_call_function_any(const struct cpumask *mask,
 			  smp_call_func_t func, void *info, int wait);
 
+#ifdef CONFIG_PREEMPTION
+int smp_task_ipi_mask_alloc(struct task_struct *task);
+void smp_task_ipi_mask_free(struct task_struct *task);
+#endif
+
 void kick_all_cpus_sync(void);
 void wake_up_all_idle_cpus(void);
 bool cpus_peek_for_pending_ipi(const struct cpumask *mask);
@@ -310,4 +315,14 @@ bool csd_lock_is_stuck(void);
 static inline bool csd_lock_is_stuck(void) { return false; }
 #endif
 
+#if !defined(CONFIG_SMP) || !defined(CONFIG_PREEMPTION)
+static inline int smp_task_ipi_mask_alloc(struct task_struct *task)
+{
+	return 0;
+}
+static inline void smp_task_ipi_mask_free(struct task_struct *task)
+{
+}
+#endif
+
 #endif /* __LINUX_SMP_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index 5f3fdfdb14c7..bf485c51c447 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -535,6 +535,7 @@ void free_task(struct task_struct *tsk)
 #endif
 	release_user_cpus_ptr(tsk);
 	scs_release(tsk);
+	smp_task_ipi_mask_free(tsk);
 
 #ifndef CONFIG_THREAD_INFO_IN_TASK
 	/*
@@ -932,10 +933,14 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 #endif
 	account_kernel_stack(tsk, 1);
 
-	err = scs_prepare(tsk, node);
+	err = smp_task_ipi_mask_alloc(tsk);
 	if (err)
 		goto free_stack;
 
+	err = scs_prepare(tsk, node);
+	if (err)
+		goto free_ipi_mask;
+
 #ifdef CONFIG_SECCOMP
 	/*
 	 * We must handle setting up seccomp filters once we're under
@@ -1006,6 +1011,8 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 #endif
 	return tsk;
 
+free_ipi_mask:
+	smp_task_ipi_mask_free(tsk);
 free_stack:
 	exit_task_stack_account(tsk);
 	free_thread_stack(tsk);
diff --git a/kernel/smp.c b/kernel/smp.c
index 9e9dab3b0d51..8f8a9ee2ad11 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -16,6 +16,7 @@
 #include <linux/init.h>
 #include <linux/interrupt.h>
 #include <linux/gfp.h>
+#include <linux/slab.h>
 #include <linux/smp.h>
 #include <linux/cpu.h>
 #include <linux/sched.h>
@@ -794,6 +795,49 @@ int smp_call_function_any(const struct cpumask *mask,
 }
 EXPORT_SYMBOL_GPL(smp_call_function_any);
 
+static DEFINE_STATIC_KEY_FALSE(ipi_mask_inlined);
+
+#ifdef CONFIG_PREEMPTION
+
+int smp_task_ipi_mask_alloc(struct task_struct *task)
+{
+	if (static_branch_unlikely(&ipi_mask_inlined))
+		return 0;
+
+	task->ipi_mask_ptr = kmalloc(cpumask_size(), GFP_KERNEL);
+	if (!task->ipi_mask_ptr)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void smp_task_ipi_mask_free(struct task_struct *task)
+{
+	if (static_branch_unlikely(&ipi_mask_inlined))
+		return;
+
+	kfree(task->ipi_mask_ptr);
+}
+
+static cpumask_t *smp_task_ipi_mask(struct task_struct *cur)
+{
+	/*
+	 * If cpumask_size() is smaller than or equal to the pointer
+	 * size, it stashes the cpumask in the pointer itself to
+	 * avoid extra memory allocations.
+	 */
+	if (static_branch_unlikely(&ipi_mask_inlined))
+		return (cpumask_t *)&cur->ipi_mask_val;
+
+	return cur->ipi_mask_ptr;
+}
+#else
+static cpumask_t *smp_task_ipi_mask(struct task_struct *cur)
+{
+	return NULL;
+}
+#endif
+
 /*
  * Flags to be used as scf_flags argument of smp_call_function_many_cond().
  *
@@ -811,11 +855,19 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	int cpu, last_cpu, this_cpu = smp_processor_id();
 	struct call_function_data *cfd;
 	bool wait = scf_flags & SCF_WAIT;
+	struct cpumask *cpumask, *task_mask;
 	int nr_cpus = 0;
 	bool run_remote = false;
 
 	lockdep_assert_preemption_disabled();
 
+	task_mask = smp_task_ipi_mask(current);
+	cfd = this_cpu_ptr(&cfd_data);
+	if (task_mask)
+		cpumask = task_mask;
+	else
+		cpumask = cfd->cpumask;
+
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
@@ -836,16 +888,15 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 
 	/* Check if we need remote execution, i.e., any CPU excluding this one. */
 	if (cpumask_any_and_but(mask, cpu_online_mask, this_cpu) < nr_cpu_ids) {
-		cfd = this_cpu_ptr(&cfd_data);
-		cpumask_and(cfd->cpumask, mask, cpu_online_mask);
-		__cpumask_clear_cpu(this_cpu, cfd->cpumask);
+		cpumask_and(cpumask, mask, cpu_online_mask);
+		__cpumask_clear_cpu(this_cpu, cpumask);
 
 		cpumask_clear(cfd->cpumask_ipi);
-		for_each_cpu(cpu, cfd->cpumask) {
+		for_each_cpu(cpu, cpumask) {
 			call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu);
 
 			if (cond_func && !cond_func(cpu, info)) {
-				__cpumask_clear_cpu(cpu, cfd->cpumask);
+				__cpumask_clear_cpu(cpu, cpumask);
 				continue;
 			}
 
@@ -896,7 +947,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	}
 
 	if (run_remote && wait) {
-		for_each_cpu(cpu, cfd->cpumask) {
+		for_each_cpu(cpu, cpumask) {
 			call_single_data_t *csd;
 
 			csd = per_cpu_ptr(cfd->csd, cpu);
@@ -1010,6 +1061,9 @@ EXPORT_SYMBOL(nr_cpu_ids);
 void __init setup_nr_cpu_ids(void)
 {
 	set_nr_cpu_ids(find_last_bit(cpumask_bits(cpu_possible_mask), NR_CPUS) + 1);
+
+	if (IS_ENABLED(CONFIG_PREEMPTION) && cpumask_size() <= sizeof(unsigned long))
+		static_branch_enable(&ipi_mask_inlined);
 }
 
 /* Called by boot processor to activate the rest. */
-- 
2.20.1
From nobody Mon Jun  8 14:37:30 2026
Cc: <linux-kernel@vger.kernel.org>, "Chuyi Zhou" <zhouchuyi@bytedance.com>
Message-Id: <20260528151338.617843-6-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
References: <20260528151338.617843-1-zhouchuyi@bytedance.com>
In-Reply-To: <20260528151338.617843-1-zhouchuyi@bytedance.com>
To: <tglx@kernel.org>, <mingo@redhat.com>, <luto@kernel.org>,
	<peterz@infradead.org>, <paulmck@kernel.org>, <muchun.song@linux.dev>,
	<bp@alien8.de>, <dave.hansen@linux.intel.com>, <pbonzini@redhat.com>,
	<bigeasy@linutronix.de>, <clrkwllms@kernel.org>, <rostedt@goodmis.org>,
	<nadav.amit@gmail.com>
From: "Chuyi Zhou" <zhouchuyi@bytedance.com>
Subject: [PATCH v6 05/12] smp: Alloc percpu csd data in smpcfd_prepare_cpu()
 only once
Date: Thu, 28 May 2026 23:13:31 +0800
Content-Transfer-Encoding: quoted-printable
Mime-Version: 1.0
X-Original-From: Chuyi Zhou <zhouchuyi@bytedance.com>
Content-Type: text/plain; charset="utf-8"

A later patch will enable preemption during csd_lock_wait() in
smp_call_function_many_cond(), which may then access cfd->csd data that
has already been freed in smpcfd_dead_cpu().

One way to fix this is to protect the csd data with RCU and wait for all
read-side critical sections to exit before freeing the memory in
smpcfd_dead_cpu(), but this could delay CPU shutdown. This patch takes a
simpler approach: allocate the percpu csd only once, on the CPU-up path
(smpcfd_prepare_cpu()), and never free it in smpcfd_dead_cpu().
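
To make the lifecycle concrete, here is a small userspace sketch (not
kernel code) of the allocate-once, never-free scheme. struct cfd_model
and the model_* functions are hypothetical stand-ins for struct
call_function_data, smpcfd_prepare_cpu() and smpcfd_dead_cpu(), with
calloc() standing in for alloc_percpu(). A pointer taken before the CPU
goes offline stays valid across repeated offline/online cycles, and the
same allocation is reused each time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for struct call_function_data: only the csd
 * pointer is modelled; calloc() stands in for alloc_percpu().
 */
struct cfd_model {
	int *csd;
};

/* Models smpcfd_prepare_cpu(): allocate the csd only if it does not exist yet. */
static int model_prepare_cpu(struct cfd_model *cfd)
{
	if (!cfd->csd)
		cfd->csd = calloc(1, sizeof(*cfd->csd));
	return cfd->csd ? 0 : -1;
}

/* Models smpcfd_dead_cpu(): the csd is deliberately left allocated. */
static void model_dead_cpu(struct cfd_model *cfd)
{
	(void)cfd;
}

/*
 * Runs several offline/online cycles while a "preempted waiter" keeps a
 * pointer to the csd. Returns true if that pointer stayed usable and the
 * same allocation was reused on every cycle.
 */
static bool model_hotplug_cycles(int cycles)
{
	struct cfd_model cfd = { NULL };
	int *stale;
	bool same = true;

	if (model_prepare_cpu(&cfd))
		return false;
	stale = cfd.csd;	/* pointer held across the offline window */

	for (int i = 0; i < cycles; i++) {
		model_dead_cpu(&cfd);
		*stale = i;	/* safe: the memory was never freed */
		if (model_prepare_cpu(&cfd))
			return false;
		same = same && (cfd.csd == stale);
	}
	free(cfd.csd);	/* only the model cleans up at the end */
	return same;
}
```

model_hotplug_cycles(3) is expected to return true. The cost of this
design is that each CPU's csd allocation is kept even while that CPU is
offline.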

Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Acked-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/smp.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 8f8a9ee2ad11..9ef136bacda0 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -64,7 +64,15 @@ int smpcfd_prepare_cpu(unsigned int cpu)
 		free_cpumask_var(cfd->cpumask);
 		return -ENOMEM;
 	}
-	cfd->csd = alloc_percpu(call_single_data_t);
+
+	/*
+	 * The percpu csd is allocated only once and never freed.
+	 * This ensures that smp_call_function_many_cond() can safely
+	 * access the csd of an offlined CPU if it gets preempted
+	 * during csd_lock_wait().
+	 */
+	if (!cfd->csd)
+		cfd->csd = alloc_percpu(call_single_data_t);
 	if (!cfd->csd) {
 		free_cpumask_var(cfd->cpumask);
 		free_cpumask_var(cfd->cpumask_ipi);
@@ -80,7 +88,6 @@ int smpcfd_dead_cpu(unsigned int cpu)
 
 	free_cpumask_var(cfd->cpumask);
 	free_cpumask_var(cfd->cpumask_ipi);
-	free_percpu(cfd->csd);
 	return 0;
 }
 
-- 
2.20.1
From nobody Mon Jun  8 14:37:30 2026
Subject: [PATCH v6 06/12] smp: Enable preemption early in
 smp_call_function_many_cond
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
To: <tglx@kernel.org>, <mingo@redhat.com>, <luto@kernel.org>,
	<peterz@infradead.org>, <paulmck@kernel.org>, <muchun.song@linux.dev>,
	<bp@alien8.de>, <dave.hansen@linux.intel.com>, <pbonzini@redhat.com>,
	<bigeasy@linutronix.de>, <clrkwllms@kernel.org>, <rostedt@goodmis.org>,
	<nadav.amit@gmail.com>
From: "Chuyi Zhou" <zhouchuyi@bytedance.com>
In-Reply-To: <20260528151338.617843-1-zhouchuyi@bytedance.com>
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260528151338.617843-7-zhouchuyi@bytedance.com>
X-Original-From: Chuyi Zhou <zhouchuyi@bytedance.com>
References: <20260528151338.617843-1-zhouchuyi@bytedance.com>
Cc: <linux-kernel@vger.kernel.org>, "Chuyi Zhou" <zhouchuyi@bytedance.com>
Date: Thu, 28 May 2026 23:13:32 +0800
X-Mailer: git-send-email 2.20.1
Content-Type: text/plain; charset="utf-8"

Preemption was disabled for the whole of smp_call_function_many_cond()
primarily for the following reasons:

- To prevent the remote online CPU from going offline. Specifically, we
want to ensure that no new csds are queued after smpcfd_dying_cpu() has
finished. Therefore, preemption must be disabled until all necessary IPIs
are sent.

- To prevent the current CPU from going offline. If the task were
migrated to another CPU and then called csd_lock_wait(), it could hit a
use-after-free, because smpcfd_dead_cpu() frees the csds while the
original CPU goes offline.

- To protect the per-cpu cfd_data from concurrent modification by other
tasks on the current CPU. cfd_data contains cpumasks and per-cpu csds.
Before enqueueing a csd, we block in csd_lock() to ensure the previous
async csd->func() has completed, and then initialize csd->func and
csd->info. After sending the IPI, we spin-wait for the remote CPU to call
csd_unlock(). The csd_lock mechanism therefore already guarantees csd
serialization: if preemption occurs during csd_lock_wait(), other
concurrent smp_call_function_many_cond() calls simply block until the
previous csd->func() completes:

task A                    task B

csd->func = func_a
send ipis

                preempted by B
               --------------->
                        csd_lock(csd); // block until last
                                       // func_a finished

                        csd->func = func_b;
                        csd->info = info;
                            ...
                        send ipis

                switch back to A
                <---------------

csd_lock_wait(csd); // block until the remote CPUs finish func_*
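
The interleaving above can be modelled in userspace. In this sketch (an
illustrative model, not kernel code), struct model_csd,
model_csd_lock(), model_csd_lock_wait() and the model_remote_cpu()
thread are hypothetical stand-ins for call_single_data_t, csd_lock(),
csd_lock_wait() and the remote CPU's IPI handler. Tasks A and B run in
the same thread, mirroring the fact that they share one CPU, while a
separate pthread plays the remote CPU that runs csd->func() and then
clears the lock bit with a release store, as csd_unlock() does.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

#define MODEL_CSD_FLAG_LOCK 1u

/* Hypothetical userspace stand-in for call_single_data_t. */
struct model_csd {
	atomic_uint flags;
	void (*func)(void *info);
	void *info;
};

static atomic_int model_seq;	/* global completion order */
static int seq_a, seq_b;	/* position at which func_a / func_b ran */
static int b_saw_a_done;	/* did task B observe func_a as finished? */

static void model_func_a(void *info) { (void)info; seq_a = atomic_fetch_add(&model_seq, 1) + 1; }
static void model_func_b(void *info) { (void)info; seq_b = atomic_fetch_add(&model_seq, 1) + 1; }

/* Like csd_lock_wait(): spin with acquire loads until the lock bit clears. */
static void model_csd_lock_wait(struct model_csd *csd)
{
	while (atomic_load_explicit(&csd->flags, memory_order_acquire) &
	       MODEL_CSD_FLAG_LOCK)
		;
}

/* Like csd_lock(): wait for the previous user, then take the csd. */
static void model_csd_lock(struct model_csd *csd)
{
	model_csd_lock_wait(csd);
	atomic_store(&csd->flags, MODEL_CSD_FLAG_LOCK);
}

/* Plays the remote CPU: run the callback, then release like csd_unlock(). */
static void *model_remote_cpu(void *arg)
{
	struct model_csd *csd = arg;
	struct timespec delay = { 0, 20 * 1000 * 1000 };	/* 20 ms: a slow remote CPU */

	nanosleep(&delay, NULL);
	csd->func(csd->info);
	atomic_store_explicit(&csd->flags, 0, memory_order_release);
	return NULL;
}

/* Runs the task A / task B interleaving from the diagram on one "CPU". */
static void model_preempt_scenario(void)
{
	struct model_csd csd;
	pthread_t ipi_a, ipi_b;

	atomic_init(&csd.flags, 0);

	/* Task A: lock the csd, fill it in and "send the IPI". */
	model_csd_lock(&csd);
	csd.func = model_func_a;
	csd.info = NULL;
	pthread_create(&ipi_a, NULL, model_remote_cpu, &csd);

	/* Task A is preempted by task B, which reuses the same csd. */
	model_csd_lock(&csd);	/* blocks until func_a has finished */
	b_saw_a_done = (seq_a != 0);
	csd.func = model_func_b;
	pthread_create(&ipi_b, NULL, model_remote_cpu, &csd);

	/* Switch back to task A: wait until the remote side releases the csd. */
	model_csd_lock_wait(&csd);

	pthread_join(ipi_a, NULL);
	pthread_join(ipi_b, NULL);
}
```

After model_preempt_scenario() returns, seq_a should be 1, seq_b 2 and
b_saw_a_done 1: task B cannot reuse the csd until func_a has completed,
regardless of where task A was preempted.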

Previous patches replaced the per-cpu cfd->cpumask with a task-local
cpumask, and made the percpu csd allocated only once and never freed, so
the csd can always be accessed safely. Preemption can now be enabled
before csd_lock_wait(), which makes the potentially long-running
csd_lock_wait() preemptible and migratable.
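
The placement of put_cpu() in the diff below can be summarized with a
simplified model of the preemption counter. model_get_cpu(),
model_put_cpu() and the have_task_mask parameter are hypothetical; the
sketch assumes the caller itself runs with preemption enabled (the
counter starts at 0) and only tracks whether the csd_lock_wait() loop
runs preemptible.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the preemption counter touched by get_cpu()/put_cpu(). */
static int model_preempt_count;

static void model_get_cpu(void) { model_preempt_count++; }
static void model_put_cpu(void) { model_preempt_count--; }

/*
 * Mirrors the control flow of the patched smp_call_function_many_cond().
 * have_task_mask models task_mask != NULL (per-task cpumask in use).
 * Returns true if the csd_lock_wait() loop would run with preemption enabled.
 */
static bool model_call_many(bool have_task_mask, bool run_remote, bool wait)
{
	bool wait_preemptible = false;

	model_get_cpu();
	/* ... pick CPUs, queue csds and send IPIs, all non-preemptible ... */

	if (have_task_mask)
		model_put_cpu();	/* per-task mask: the wait may be preempted */

	if (run_remote && wait)
		wait_preemptible = (model_preempt_count == 0);

	if (!have_task_mask)
		model_put_cpu();	/* shared cfd->cpumask: stay pinned through the wait */

	return wait_preemptible;
}
```

With a per-task mask, model_call_many(true, true, true) returns true;
without one it returns false. In both cases the counter returns to 0.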

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/smp.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 9ef136bacda0..5cb09a84263b 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -859,15 +859,14 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 					unsigned int scf_flags,
 					smp_cond_func_t cond_func)
 {
-	int cpu, last_cpu, this_cpu = smp_processor_id();
+	int cpu, last_cpu, this_cpu;
 	struct call_function_data *cfd;
 	bool wait = scf_flags & SCF_WAIT;
 	struct cpumask *cpumask, *task_mask;
 	int nr_cpus = 0;
 	bool run_remote = false;
 
-	lockdep_assert_preemption_disabled();
-
+	this_cpu = get_cpu();
 	task_mask =3D smp_task_ipi_mask(current);
 	cfd =3D this_cpu_ptr(&cfd_data);
 	if (task_mask)
@@ -953,6 +952,17 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		local_irq_restore(flags);
 	}
 
+	/*
+	 * Waiting for completion can take time, especially with many CPUs.
+	 * On a PREEMPTIBLE kernel a per-task cpumask is used to track CPUs
+	 * with pending IPI requests. This allows preemption to be enabled
+	 * while waiting. On a !PREEMPTIBLE kernel the cpumask is shared and
+	 * the call must block until completion to avoid modifications by
+	 * another caller on this CPU.
+	 */
+	if (task_mask)
+		put_cpu();
+
 	if (run_remote && wait) {
 		for_each_cpu(cpu, cpumask) {
 			call_single_data_t *csd;
@@ -961,6 +971,9 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 			csd_lock_wait(csd);
 		}
 	}
+
+	if (!task_mask)
+		put_cpu();
 }
 
 /**
@@ -972,8 +985,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
  *        on other CPUs.
  *
  * You must not call this function with disabled interrupts or from a
- * hardware interrupt handler or from a bottom half handler. Preemption
- * must be disabled when calling this function.
+ * hardware interrupt handler or from a bottom half handler.
  *
  * @func is not called on the local CPU even if @mask contains it.  Consider
  * using on_each_cpu_cond_mask() instead if this is not desirable.
-- 
2.20.1
From nobody Mon Jun  8 14:37:30 2026
Content-Transfer-Encoding: quoted-printable
To: <tglx@kernel.org>, <mingo@redhat.com>, <luto@kernel.org>,
	<peterz@infradead.org>, <paulmck@kernel.org>, <muchun.song@linux.dev>,
	<bp@alien8.de>, <dave.hansen@linux.intel.com>, <pbonzini@redhat.com>,
	<bigeasy@linutronix.de>, <clrkwllms@kernel.org>, <rostedt@goodmis.org>,
	<nadav.amit@gmail.com>
Date: Thu, 28 May 2026 23:13:33 +0800
X-Original-From: Chuyi Zhou <zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
From: "Chuyi Zhou" <zhouchuyi@bytedance.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
Message-Id: <20260528151338.617843-8-zhouchuyi@bytedance.com>
Cc: <linux-kernel@vger.kernel.org>, "Chuyi Zhou" <zhouchuyi@bytedance.com>
Subject: [PATCH v6 07/12] smp: Remove preempt_disable from smp_call_function
In-Reply-To: <20260528151338.617843-1-zhouchuyi@bytedance.com>
References: <20260528151338.617843-1-zhouchuyi@bytedance.com>
Content-Type: text/plain; charset="utf-8"

Now that smp_call_function_many_cond() handles preemption internally,
smp_call_function() does not need to disable preemption explicitly.
Remove preempt_{enable, disable} from smp_call_function().
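
For reference, a sketch of how the callers translate their wait argument
into scf_flags. The MODEL_SCF_* values are assumed to mirror SCF_WAIT and
SCF_RUN_LOCAL in kernel/smp.c, and the helper names are hypothetical.
The ternary maps any nonzero int wait to exactly SCF_WAIT and never sets
SCF_RUN_LOCAL, consistent with smp_call_function() running func only on
the other CPUs, while on_each_cpu_cond_mask() additionally requests the
local call.

```c
#include <assert.h>
#include <stdbool.h>

/* Values assumed to mirror SCF_WAIT / SCF_RUN_LOCAL in kernel/smp.c. */
#define MODEL_SCF_WAIT      (1U << 0)
#define MODEL_SCF_RUN_LOCAL (1U << 1)

/* smp_call_function() after this patch: other CPUs only. */
static unsigned int model_smp_call_function_flags(int wait)
{
	return wait ? MODEL_SCF_WAIT : 0;
}

/* on_each_cpu_cond_mask(): also runs func on the local CPU. */
static unsigned int model_on_each_cpu_flags(bool wait)
{
	unsigned int scf_flags = MODEL_SCF_RUN_LOCAL;

	if (wait)
		scf_flags |= MODEL_SCF_WAIT;
	return scf_flags;
}
```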

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/smp.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 5cb09a84263b..b1061fbdaa68 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -1012,9 +1012,8 @@ EXPORT_SYMBOL(smp_call_function_many);
  */
 void smp_call_function(smp_call_func_t func, void *info, int wait)
 {
-	preempt_disable();
-	smp_call_function_many(cpu_online_mask, func, info, wait);
-	preempt_enable();
+	smp_call_function_many_cond(cpu_online_mask, func, info,
+			wait ? SCF_WAIT : 0, NULL);
 }
 EXPORT_SYMBOL(smp_call_function);
 
-- 
2.20.1
From nobody Mon Jun  8 14:37:30 2026
Subject: [PATCH v6 08/12] smp: Remove preempt_disable from
 on_each_cpu_cond_mask
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Message-Id: <20260528151338.617843-9-zhouchuyi@bytedance.com>
References: <20260528151338.617843-1-zhouchuyi@bytedance.com>
Cc: <linux-kernel@vger.kernel.org>, "Chuyi Zhou" <zhouchuyi@bytedance.com>
In-Reply-To: <20260528151338.617843-1-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
To: <tglx@kernel.org>, <mingo@redhat.com>, <luto@kernel.org>,
	<peterz@infradead.org>, <paulmck@kernel.org>, <muchun.song@linux.dev>,
	<bp@alien8.de>, <dave.hansen@linux.intel.com>, <pbonzini@redhat.com>,
	<bigeasy@linutronix.de>, <clrkwllms@kernel.org>, <rostedt@goodmis.org>,
	<nadav.amit@gmail.com>
From: "Chuyi Zhou" <zhouchuyi@bytedance.com>
Date: Thu, 28 May 2026 23:13:34 +0800
X-Original-From: Chuyi Zhou <zhouchuyi@bytedance.com>
Content-Type: text/plain; charset="utf-8"

Now that smp_call_function_many_cond() handles preemption internally,
on_each_cpu_cond_mask() does not need to disable preemption explicitly.
Remove preempt_{enable, disable} from on_each_cpu_cond_mask().

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/smp.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index b1061fbdaa68..15799f842746 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -1136,9 +1136,7 @@ void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
 	if (wait)
 		scf_flags |= SCF_WAIT;
 
-	preempt_disable();
 	smp_call_function_many_cond(mask, func, info, scf_flags, cond_func);
-	preempt_enable();
 }
 EXPORT_SYMBOL(on_each_cpu_cond_mask);
 
-- 
2.20.1
From nobody Mon Jun  8 14:37:30 2026
Date: Thu, 28 May 2026 23:13:35 +0800
Message-Id: <20260528151338.617843-10-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
References: <20260528151338.617843-1-zhouchuyi@bytedance.com>
From: "Chuyi Zhou" <zhouchuyi@bytedance.com>
In-Reply-To: <20260528151338.617843-1-zhouchuyi@bytedance.com>
To: <tglx@kernel.org>, <mingo@redhat.com>, <luto@kernel.org>,
	<peterz@infradead.org>, <paulmck@kernel.org>, <muchun.song@linux.dev>,
	<bp@alien8.de>, <dave.hansen@linux.intel.com>, <pbonzini@redhat.com>,
	<bigeasy@linutronix.de>, <clrkwllms@kernel.org>, <rostedt@goodmis.org>,
	<nadav.amit@gmail.com>
Subject: [PATCH v6 09/12] scftorture: Remove preempt_disable in
 scftorture_invoke_one
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
X-Original-From: Chuyi Zhou <zhouchuyi@bytedance.com>
Content-Transfer-Encoding: quoted-printable
Cc: <linux-kernel@vger.kernel.org>, "Chuyi Zhou" <zhouchuyi@bytedance.com>
Content-Type: text/plain; charset="utf-8"

The previous patches made the smp_call*() functions handle preemption
internally, so the explicit preempt_disable() surrounding these calls is
no longer necessary. Furthermore, keeping the external preempt_disable()
would prevent scftorture from exercising the newly narrowed internal
preemption-disabled regions during IPI dispatch. This patch removes the
preempt_{enable, disable} pairs in scftorture_invoke_one().

Removing this preemption protection could expose a race condition with
CPU hotplug when use_cpus_read_lock is false. Specifically, for
multi-cast operations (SCF_PRIM_MANY or SCF_PRIM_ALL), if only 1 CPU is
online, smp_call_function_many() correctly skips sending IPIs and leaves
scfc_out as false. Without preemption disabled, a CPU hotplug thread
could preempt the test thread, bring a second CPU online, and increment
num_online_cpus(). When the test thread resumes, the validation check
would see num_online_cpus() > 1 and falsely trigger the memory-ordering
warning, leaking the scfcp structure.

To avoid this potential false positive, restrict the num_online_cpus() > 1
condition to only apply when use_cpus_read_lock is true, ensuring the CPU
count remains stable during evaluation.
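
The effect of the changed WARN_ON_ONCE() condition can be checked with a
plain predicate. The helpers below are hypothetical and only reproduce
the boolean logic from the diff, with online_cpus standing for the value
num_online_cpus() returns at check time. Note that is_single also covers
SCF_PRIM_SINGLE_RPC, whereas the old condition tested only
SCF_PRIM_SINGLE.

```c
#include <assert.h>
#include <stdbool.h>

/* Old warning condition (before this patch); names are illustrative. */
static bool model_old_warns(int online_cpus, bool prim_single, bool scfc_out)
{
	return (online_cpus > 1 || prim_single) && !scfc_out;
}

/* New condition: only trust the CPU count while cpus_read_lock() is held. */
static bool model_new_warns(bool use_cpus_read_lock, int online_cpus,
			    bool is_single, bool scfc_out)
{
	return ((use_cpus_read_lock && online_cpus > 1) || is_single) &&
	       !scfc_out;
}
```

In the race described above (multi-cast, one CPU online when the IPIs
were skipped, a second CPU online by the time of the check,
use_cpus_read_lock false), model_old_warns(2, false, false) returns true
while model_new_warns(false, 2, false, false) returns false. Single-CPU
primitives still warn on a missing scfc_out either way.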

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
 kernel/scftorture.c | 13 ++++---------
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/kernel/scftorture.c b/kernel/scftorture.c
index 327c315f411c..2082f9b44370 100644
--- a/kernel/scftorture.c
+++ b/kernel/scftorture.c
@@ -348,6 +348,8 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 	int ret = 0;
 	struct scf_check *scfcp = NULL;
 	struct scf_selector *scfsp = scf_sel_rand(trsp);
+	bool is_single = (scfsp->scfs_prim == SCF_PRIM_SINGLE ||
+			  scfsp->scfs_prim == SCF_PRIM_SINGLE_RPC);
 
 	if (scfsp->scfs_prim == SCF_PRIM_SINGLE || scfsp->scfs_wait) {
 		scfcp = kmalloc_obj(*scfcp, GFP_ATOMIC);
@@ -364,8 +366,6 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 	}
 	if (use_cpus_read_lock)
 		cpus_read_lock();
-	else
-		preempt_disable();
 	switch (scfsp->scfs_prim) {
 	case SCF_PRIM_RESCHED:
 		if (IS_BUILTIN(CONFIG_SCF_TORTURE_TEST)) {
@@ -411,13 +411,10 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 		if (!ret) {
 			if (use_cpus_read_lock)
 				cpus_read_unlock();
-			else
-				preempt_enable();
+
 			wait_for_completion(&scfcp->scfc_completion);
 			if (use_cpus_read_lock)
 				cpus_read_lock();
-			else
-				preempt_disable();
 		} else {
 			scfp->n_single_rpc_ofl++;
 			scf_add_to_free_list(scfcp);
@@ -452,7 +449,7 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 			scfcp->scfc_out = true;
 	}
 	if (scfcp && scfsp->scfs_wait) {
-		if (WARN_ON_ONCE((num_online_cpus() > 1 || scfsp->scfs_prim == SCF_PRIM_SINGLE) &&
+		if (WARN_ON_ONCE(((use_cpus_read_lock && num_online_cpus() > 1) || is_single) &&
 				 !scfcp->scfc_out)) {
 			pr_warn("%s: Memory-ordering failure, scfs_prim: %d.\n", __func__, scfsp->scfs_prim);
 			atomic_inc(&n_mb_out_errs); // Leak rather than trash!
@@ -463,8 +460,6 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 	}
 	if (use_cpus_read_lock)
 		cpus_read_unlock();
-	else
-		preempt_enable();
 	if (allocfail)
 		schedule_timeout_idle((1 + longwait) * HZ);  // Let no-wait handlers complete.
 	else if (!(torture_random(trsp) & 0xfff))
-- 
2.20.1
From nobody Mon Jun  8 14:37:30 2026
Message-Id: <20260528151338.617843-11-zhouchuyi@bytedance.com>
X-Original-From: Chuyi Zhou <zhouchuyi@bytedance.com>
To: <tglx@kernel.org>, <mingo@redhat.com>, <luto@kernel.org>,
	<peterz@infradead.org>, <paulmck@kernel.org>, <muchun.song@linux.dev>,
	<bp@alien8.de>, <dave.hansen@linux.intel.com>, <pbonzini@redhat.com>,
	<bigeasy@linutronix.de>, <clrkwllms@kernel.org>, <rostedt@goodmis.org>,
	<nadav.amit@gmail.com>
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <20260528151338.617843-1-zhouchuyi@bytedance.com>
References: <20260528151338.617843-1-zhouchuyi@bytedance.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
From: "Chuyi Zhou" <zhouchuyi@bytedance.com>
Cc: <linux-kernel@vger.kernel.org>, "Chuyi Zhou" <zhouchuyi@bytedance.com>
Date: Thu, 28 May 2026 23:13:36 +0800
X-Mailer: git-send-email 2.20.1
Subject: [PATCH v6 10/12] x86/mm: Move flush_tlb_info back to the stack
Content-Type: text/plain; charset="utf-8"

Commit 3db6d5a5ecaf ("x86/mm/tlb: Remove 'struct flush_tlb_info' from the
stack") converted flush_tlb_info from a stack variable to a per-CPU
variable. This brought a performance improvement of around 3% in an
extreme test. However, it also required all flush_tlb* operations to keep
preemption disabled throughout, to prevent concurrent modification of
flush_tlb_info. flush_tlb* needs to send IPIs to remote CPUs and
synchronously wait for all of them to complete their local TLB flushes.
This can take tens of milliseconds when remote CPUs have interrupts
disabled or when there is a large number of remote CPUs.

To improve kernel real-time behaviour, this patch moves flush_tlb_info
back to the stack and aligns it to SMP_CACHE_BYTES. Since SMP_CACHE_BYTES
may be large in certain configurations, the alignment is capped at 64
bytes. This prepares for enabling preemption during TLB flushes in the
next patch.
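
A minimal sketch of an on-stack descriptor with a capped alignment,
assuming a configuration where SMP_CACHE_BYTES is 128.
MODEL_SMP_CACHE_BYTES, MODEL_FLUSH_ALIGN and struct
model_flush_tlb_info are hypothetical names, and the struct's fields are
illustrative rather than the real layout of struct flush_tlb_info. C11
alignas() stands in for whatever alignment attribute the actual patch
uses.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Hypothetical configuration with a large cache line size. */
#define MODEL_SMP_CACHE_BYTES 128
/* Align to the cache line size, but never to more than 64 bytes. */
#define MODEL_FLUSH_ALIGN \
	(MODEL_SMP_CACHE_BYTES > 64 ? 64 : MODEL_SMP_CACHE_BYTES)

/* Stand-in for struct flush_tlb_info; the fields are illustrative only. */
struct model_flush_tlb_info {
	void *mm;
	unsigned long start, end;
	uint64_t new_tlb_gen;
	unsigned int initiating_cpu;
	uint8_t stride_shift;
	uint8_t freed_tables;
};

/* Builds the descriptor on the stack and returns its address modulo the alignment. */
static uintptr_t model_stack_info_misalignment(unsigned long start,
					       unsigned long end)
{
	alignas(MODEL_FLUSH_ALIGN) struct model_flush_tlb_info info = {
		.start = start,
		.end = end,
	};

	return (uintptr_t)&info % MODEL_FLUSH_ALIGN;
}
```

model_stack_info_misalignment() should return 0, i.e. each caller gets
its own private, 64-byte-aligned descriptor instead of sharing a per-CPU
one.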

To evaluate the performance impact of this patch, the following program
was used to reproduce the microbenchmark mentioned in commit 3db6d5a5ecaf
("x86/mm/tlb: Remove 'struct flush_tlb_info' from the stack"). The test
environment is an Ice Lake system (Intel(R) Xeon(R) Platinum 8336C) with
128 CPUs and 2 NUMA nodes. During the test, the threads were bound to
specific CPUs, and both pti and mitigations were disabled:

    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>
    #include <sys/mman.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define NUM_OPS 1000000
    #define NUM_THREADS 3
    #define NUM_RUNS 5
    #define PAGE_SIZE 4096

    /* Set by main() to tell the background threads to exit. */
    volatile int stop_threads = 0;

    /*
     * Spin in userspace so that this mm stays active on other CPUs;
     * each madvise(MADV_DONTNEED) below then needs a remote TLB flush.
     */
    void *busy_wait_thread(void *arg) {
        (void)arg;
        while (!stop_threads) {
            __asm__ volatile ("nop");
        }
        return NULL;
    }

    /* Wall-clock time in microseconds. */
    long long get_usec(void) {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec * 1000000LL + tv.tv_usec;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];
        char *addr;
        int i, r;

        addr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (addr == MAP_FAILED) {
            perror("mmap");
            exit(1);
        }

        for (i = 0; i < NUM_THREADS; i++) {
            if (pthread_create(&threads[i], NULL, busy_wait_thread, NULL))
                exit(1);
        }

        printf("Running benchmark: %d runs, %d ops each, %d background threads\n",
               NUM_RUNS, NUM_OPS, NUM_THREADS);

        for (r = 0; r < NUM_RUNS; r++) {
            long long start, end;

            start = get_usec();
            for (i = 0; i < NUM_OPS; i++) {
                /* Fault the page in, then zap it, forcing a TLB flush. */
                addr[0] = 1;
                if (madvise(addr, PAGE_SIZE, MADV_DONTNEED)) {
                    perror("madvise");
                    exit(1);
                }
            }
            end = get_usec();
            double duration = (double)(end - start);
            double avg_lat = duration / NUM_OPS;
            printf("Run %d: Total time %.2f us, Avg latency %.4f us/op\n",
                   r + 1, duration, avg_lat);
        }
        stop_threads = 1;
        for (i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);
        munmap(addr, PAGE_SIZE);
        return 0;
    }

                   base    on-stack-aligned  on-stack-not-aligned
                   ------  ----------------  --------------------
avg (usec/op)      2.5278  2.5261            2.5508
stddev             0.0007  0.0027            0.0023

The benchmark results show that the average latency difference between
the baseline (base) and the properly aligned stack variable
(on-stack-aligned), 0.0017 usec/op, is within the standard deviation of
the on-stack-aligned runs (0.0027). This suggests the difference is
measurement noise and that moving flush_tlb_info back to a properly
aligned stack variable causes no measurable regression compared with the
per-CPU implementation. The unaligned version (on-stack-not-aligned) is
about 0.9% slower. This shows that real-time behavior can be improved
without sacrificing throughput.
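The comparison above is plain arithmetic on the table values; a small C sketch makes it explicit. The helper names are hypothetical, and comparing a mean difference against a single standard deviation is a rough heuristic rather than a formal significance test.

```c
#include <assert.h>

/* Averages and one stddev copied from the table above, in usec/op. */
static const double base_avg = 2.5278;
static const double aligned_avg = 2.5261;
static const double aligned_stddev = 0.0027;
static const double unaligned_avg = 2.5508;

static double abs_d(double v)
{
	return v < 0 ? -v : v;
}

/* Relative change of avg versus the per-CPU baseline, as a fraction. */
static double rel_change(double avg)
{
	return (avg - base_avg) / base_avg;
}

/* Nonzero if |avg - base_avg| is smaller than the given stddev. */
static int within_stddev(double avg, double stddev)
{
	return abs_d(avg - base_avg) < stddev;
}
```

For on-stack-not-aligned, rel_change() gives roughly +0.0091 (the ~0.9% drop); for on-stack-aligned it gives roughly -0.0007.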

Suggested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Suggested-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/x86/include/asm/tlbflush.h |  8 +++-
 arch/x86/mm/tlb.c               | 72 +++++++++------------------------
 2 files changed, 27 insertions(+), 53 deletions(-)

diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 0545fe75c3fa..f4e4505d4ece 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -211,6 +211,12 @@ extern u16 invlpgb_count_max;
 
 extern void initialize_tlbstate_and_flush(void);
 
+#if SMP_CACHE_BYTES > 64
+#define FLUSH_TLB_INFO_ALIGN 64
+#else
+#define FLUSH_TLB_INFO_ALIGN SMP_CACHE_BYTES
+#endif
+
 /*
  * TLB flushing:
  *
@@ -249,7 +255,7 @@ struct flush_tlb_info {
 	u8			stride_shift;
 	u8			freed_tables;
 	u8			trim_cpumask;
-};
+} __aligned(FLUSH_TLB_INFO_ALIGN);
 
 void flush_tlb_local(void);
 void flush_tlb_one_user(unsigned long addr);
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index af43d177087e..cfc3a72477f5 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1373,28 +1373,12 @@ void flush_tlb_multi(const struct cpumask *cpumask,
  */
 unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
 
-static DEFINE_PER_CPU_SHARED_ALIGNED(struct flush_tlb_info, flush_tlb_info);
-
-#ifdef CONFIG_DEBUG_VM
-static DEFINE_PER_CPU(unsigned int, flush_tlb_info_idx);
-#endif
-
-static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
-			unsigned long start, unsigned long end,
-			unsigned int stride_shift, bool freed_tables,
-			u64 new_tlb_gen)
+static void get_flush_tlb_info(struct flush_tlb_info *info,
+			       struct mm_struct *mm,
+			       unsigned long start, unsigned long end,
+			       unsigned int stride_shift, bool freed_tables,
+			       u64 new_tlb_gen)
 {
-	struct flush_tlb_info *info = this_cpu_ptr(&flush_tlb_info);
-
-#ifdef CONFIG_DEBUG_VM
-	/*
-	 * Ensure that the following code is non-reentrant and flush_tlb_info
-	 * is not overwritten. This means no TLB flushing is initiated by
-	 * interrupt handlers and machine-check exception handlers.
-	 */
-	BUG_ON(this_cpu_inc_return(flush_tlb_info_idx) != 1);
-#endif
-
 	/*
 	 * If the number of flushes is so large that a full flush
 	 * would be faster, do a full flush.
@@ -1412,32 +1396,22 @@ static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
 	info->new_tlb_gen	= new_tlb_gen;
 	info->initiating_cpu	= smp_processor_id();
 	info->trim_cpumask	= 0;
-
-	return info;
-}
-
-static void put_flush_tlb_info(void)
-{
-#ifdef CONFIG_DEBUG_VM
-	/* Complete reentrancy prevention checks */
-	barrier();
-	this_cpu_dec(flush_tlb_info_idx);
-#endif
 }
 
 void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 				unsigned long end, unsigned int stride_shift,
 				bool freed_tables)
 {
-	struct flush_tlb_info *info;
+	struct flush_tlb_info _info;
+	struct flush_tlb_info *info = &_info;
 	int cpu = get_cpu();
 	u64 new_tlb_gen;
 
 	/* This is also a barrier that synchronizes with switch_mm(). */
 	new_tlb_gen = inc_mm_tlb_gen(mm);
 
-	info = get_flush_tlb_info(mm, start, end, stride_shift, freed_tables,
-				  new_tlb_gen);
+	get_flush_tlb_info(&_info, mm, start, end, stride_shift, freed_tables,
+			   new_tlb_gen);
 
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only
@@ -1457,7 +1431,6 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 		local_irq_enable();
 	}
 
-	put_flush_tlb_info();
 	put_cpu();
 	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
 }
@@ -1527,19 +1500,16 @@ static void kernel_tlb_flush_range(struct flush_tlb_info *info)
 
 void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
-	struct flush_tlb_info *info;
+	struct flush_tlb_info info;
 
 	guard(preempt)();
+	get_flush_tlb_info(&info, NULL, start, end, PAGE_SHIFT, false,
+			   TLB_GENERATION_INVALID);
 
-	info = get_flush_tlb_info(NULL, start, end, PAGE_SHIFT, false,
-				  TLB_GENERATION_INVALID);
-
-	if (info->end == TLB_FLUSH_ALL)
-		kernel_tlb_flush_all(info);
+	if (info.end == TLB_FLUSH_ALL)
+		kernel_tlb_flush_all(&info);
 	else
-		kernel_tlb_flush_range(info);
-
-	put_flush_tlb_info();
+		kernel_tlb_flush_range(&info);
 }
 
 /*
@@ -1707,12 +1677,11 @@ EXPORT_SYMBOL_FOR_KVM(__flush_tlb_all);
 
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 {
-	struct flush_tlb_info *info;
+	struct flush_tlb_info info;
 
 	int cpu = get_cpu();
-
-	info = get_flush_tlb_info(NULL, 0, TLB_FLUSH_ALL, 0, false,
-				  TLB_GENERATION_INVALID);
+	get_flush_tlb_info(&info, NULL, 0, TLB_FLUSH_ALL, 0, false,
+			   TLB_GENERATION_INVALID);
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only
 	 * a local TLB flush is needed. Optimize this use-case by calling
@@ -1722,17 +1691,16 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 		invlpgb_flush_all_nonglobals();
 		batch->unmapped_pages = false;
 	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
-		flush_tlb_multi(&batch->cpumask, info);
+		flush_tlb_multi(&batch->cpumask, &info);
 	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
-		flush_tlb_func(info);
+		flush_tlb_func(&info);
 		local_irq_enable();
 	}
 
 	cpumask_clear(&batch->cpumask);
 
-	put_flush_tlb_info();
 	put_cpu();
 }
 
-- 
2.20.1
From nobody Mon Jun  8 14:37:30 2026
References: <20260528151338.617843-1-zhouchuyi@bytedance.com>
In-Reply-To: <20260528151338.617843-1-zhouchuyi@bytedance.com>
From: "Chuyi Zhou" <zhouchuyi@bytedance.com>
Subject: [PATCH v6 11/12] x86/mm: Enable preemption during
 native_flush_tlb_multi
Date: Thu, 28 May 2026 23:13:37 +0800
Message-Id: <20260528151338.617843-12-zhouchuyi@bytedance.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
Content-Transfer-Encoding: quoted-printable
To: <tglx@kernel.org>, <mingo@redhat.com>, <luto@kernel.org>,
	<peterz@infradead.org>, <paulmck@kernel.org>, <muchun.song@linux.dev>,
	<bp@alien8.de>, <dave.hansen@linux.intel.com>, <pbonzini@redhat.com>,
	<bigeasy@linutronix.de>, <clrkwllms@kernel.org>, <rostedt@goodmis.org>,
	<nadav.amit@gmail.com>
Cc: <linux-kernel@vger.kernel.org>, "Chuyi Zhou" <zhouchuyi@bytedance.com>
X-Original-From: Chuyi Zhou <zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
Content-Type: text/plain; charset="utf-8"

native_flush_tlb_multi() may be called frequently by flush_tlb_mm_range()
and arch_tlbbatch_flush() in production environments. When pages are
reclaimed or a process exits, native_flush_tlb_multi() sends IPIs to
remote CPUs and waits for all of them to complete their local TLB
flushes. The overall latency may reach tens of milliseconds when there
are many remote CPUs or for other reasons (such as interrupts being
disabled on a remote CPU). Because flush_tlb_mm_range() and
arch_tlbbatch_flush() keep preemption disabled throughout, this can
increase scheduling latency for other threads on the current CPU.

The previous patch converted flush_tlb_info from a per-CPU variable to an
on-stack variable. In addition, it is no longer necessary to explicitly
disable preemption before calling smp_call*(), since those functions
handle preemption internally. It is therefore now safe to enable
preemption during native_flush_tlb_multi(). kvm_flush_tlb_multi() still
uses the per-CPU __pv_cpu_mask, so it now disables preemption itself.
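The control-flow change in flush_tlb_mm_range() and arch_tlbbatch_flush() can be modelled in userspace. This is a toy sketch under stated assumptions: preempt_count_sim, get_cpu_sim() and put_cpu_sim() are hypothetical counters standing in for get_cpu()/put_cpu(), and the broadcast (global ASID) branch is omitted. It checks that the remote path runs with preemption enabled, that the local path stays pinned, and that every path drops the pin exactly once.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model; every name here is hypothetical, not a kernel API. */
static int preempt_count_sim;			/* > 0: preemption disabled */
static int count_seen_by_remote_flush = -1;
static int count_seen_by_local_flush = -1;

static void get_cpu_sim(void) { preempt_count_sim++; }
static void put_cpu_sim(void) { preempt_count_sim--; }

/* Stand-in for flush_tlb_multi(): records the state it ran with. */
static void remote_flush_sim(void)
{
	count_seen_by_remote_flush = preempt_count_sim;
}

/* Stand-in for the local flush_tlb_func() path. */
static void local_flush_sim(void)
{
	count_seen_by_local_flush = preempt_count_sim;
}

/*
 * Mirrors the shape of the patched function: the remote path drops the
 * CPU pin before the (possibly slow) IPI round trip and jumps past the
 * common put_cpu_sim(), so each path calls it exactly once.
 */
static void flush_range_sim(bool others_in_mask, bool this_mm_loaded)
{
	get_cpu_sim();
	if (others_in_mask) {
		put_cpu_sim();
		remote_flush_sim();
		goto invalidate;
	} else if (this_mm_loaded) {
		local_flush_sim();
	}
	put_cpu_sim();
invalidate:
	;	/* the mmu notifier call would follow here */
}
```

The goto keeps a single exit sequence while letting only the remote branch release the CPU early; the local branch still needs the pin because it flushes the current CPU's TLB.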

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/x86/kernel/kvm.c | 4 +++-
 arch/x86/mm/tlb.c     | 9 +++++++--
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 29226d112029..d540f54f4d16 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -662,8 +662,10 @@ static void kvm_flush_tlb_multi(const struct cpumask *cpumask,
 	u8 state;
 	int cpu;
 	struct kvm_steal_time *src;
-	struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
+	struct cpumask *flushmask;
 
+	guard(preempt)();
+	flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
 	cpumask_copy(flushmask, cpumask);
 	/*
 	 * We have to call flush only on online vCPUs. And
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index cfc3a72477f5..58c6f3d2f993 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1421,9 +1421,11 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	if (mm_global_asid(mm)) {
 		broadcast_tlb_flush(info);
 	} else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
+		put_cpu();
 		info->trim_cpumask = should_trim_cpumask(mm);
 		flush_tlb_multi(mm_cpumask(mm), info);
 		consider_global_asid(mm);
+		goto invalidate;
 	} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
@@ -1432,6 +1434,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	}
 
 	put_cpu();
+invalidate:
 	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
 }
 
@@ -1691,7 +1694,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 		invlpgb_flush_all_nonglobals();
 		batch->unmapped_pages = false;
 	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
+		put_cpu();
 		flush_tlb_multi(&batch->cpumask, &info);
+		goto clear;
 	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
@@ -1699,9 +1704,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 		local_irq_enable();
 	}
 
-	cpumask_clear(&batch->cpumask);
-
 	put_cpu();
+clear:
+	cpumask_clear(&batch->cpumask);
 }
 
 /*
-- 
2.20.1
From nobody Mon Jun  8 14:37:30 2026
X-Mailer: git-send-email 2.20.1
To: <tglx@kernel.org>, <mingo@redhat.com>, <luto@kernel.org>,
	<peterz@infradead.org>, <paulmck@kernel.org>, <muchun.song@linux.dev>,
	<bp@alien8.de>, <dave.hansen@linux.intel.com>, <pbonzini@redhat.com>,
	<bigeasy@linutronix.de>, <clrkwllms@kernel.org>, <rostedt@goodmis.org>,
	<nadav.amit@gmail.com>
Date: Thu, 28 May 2026 23:13:38 +0800
In-Reply-To: <20260528151338.617843-1-zhouchuyi@bytedance.com>
Subject: [PATCH v6 12/12] x86/mm: Enable preemption during
 flush_tlb_kernel_range
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
Mime-Version: 1.0
References: <20260528151338.617843-1-zhouchuyi@bytedance.com>
X-Original-From: Chuyi Zhou <zhouchuyi@bytedance.com>
Cc: <linux-kernel@vger.kernel.org>, "Chuyi Zhou" <zhouchuyi@bytedance.com>
From: "Chuyi Zhou" <zhouchuyi@bytedance.com>
Message-Id: <20260528151338.617843-13-zhouchuyi@bytedance.com>
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

flush_tlb_kernel_range() is invoked when kernel memory mappings change.
On x86 platforms without INVLPGB, it must send IPIs to every online CPU
and synchronously wait for them to complete do_kernel_range_flush(). This
can be time-consuming when there are many CPUs or for other reasons (such
as interrupts being disabled on a remote CPU). Because
flush_tlb_kernel_range() always disables preemption, this can increase
the scheduling latency of other tasks on the current CPU.

The previous patch converted flush_tlb_info from a per-CPU variable to an
on-stack variable. In addition, it is no longer necessary to explicitly
disable preemption before calling smp_call*(), since those functions
handle preemption internally. It is therefore now safe to enable
preemption during flush_tlb_kernel_range(). The INVLPGB path,
invlpgb_kernel_range_flush(), now disables preemption itself. Also use
raw_smp_processor_id() in get_flush_tlb_info(), which may now run with
preemption enabled, to avoid warnings from check_preemption_disabled().
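The switch to raw_smp_processor_id() can be illustrated with a toy model in which a checked accessor counts a warning whenever it is called while preemptible. This is a sketch under simplifying assumptions, not the kernel's implementation: all names are hypothetical, and the real check_preemption_disabled() exempts several contexts (for example when interrupts are disabled) that this model ignores.

```c
#include <assert.h>

/* Toy model, not kernel code; all names are hypothetical. */
static int preempt_count_sim;		/* > 0: preemption disabled */
static int warnings_sim;		/* number of debug-check complaints */
static int running_cpu_sim = 3;		/* CPU the task is currently on */

/* Like a debug-checked smp_processor_id(): complains when preemptible. */
static int checked_cpu_id(void)
{
	if (preempt_count_sim == 0)
		warnings_sim++;
	return running_cpu_sim;
}

/*
 * Like raw_smp_processor_id(): no check. If the caller is preemptible,
 * the value is only a snapshot, since the task may later migrate.
 */
static int raw_cpu_id(void)
{
	return running_cpu_sim;
}
```

Both accessors return the same CPU number; they differ only in whether calling from a preemptible context is reported.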

Signed-off-by: Chuyi Zhou <zhouchuyi@bytedance.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
---
 arch/x86/mm/tlb.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 58c6f3d2f993..c37cc9845abc 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1394,7 +1394,7 @@ static void get_flush_tlb_info(struct flush_tlb_info *info,
 	info->stride_shift	= stride_shift;
 	info->freed_tables	= freed_tables;
 	info->new_tlb_gen	= new_tlb_gen;
-	info->initiating_cpu	= smp_processor_id();
+	info->initiating_cpu	= raw_smp_processor_id();
 	info->trim_cpumask	= 0;
 }
 
@@ -1461,6 +1461,8 @@ static void invlpgb_kernel_range_flush(struct flush_tlb_info *info)
 {
 	unsigned long addr, nr;
 
+	guard(preempt)();
+
 	for (addr = info->start; addr < info->end; addr += nr << PAGE_SHIFT) {
 		nr = (info->end - addr) >> PAGE_SHIFT;
 
@@ -1505,7 +1507,6 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
 	struct flush_tlb_info info;
 
-	guard(preempt)();
 	get_flush_tlb_info(&info, NULL, start, end, PAGE_SHIFT, false,
 			   TLB_GENERATION_INVALID);
 
-- 
2.20.1