From nobody Wed Apr 1 10:02:08 2026 Received: from va-1-112.ptr.blmpb.com (va-1-112.ptr.blmpb.com [209.127.230.112]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 964813D811D for ; Tue, 31 Mar 2026 11:31:51 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.112 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774956713; cv=none; b=B/sxfaJ/oRETbx0j9+jj4CW0ojRniqDkU3fwjWST4DJw8IsQO8zR7oEimxFwal9wkrF9Ue+OByao43veUc1wncvAo6+fGzPJGFHnBPPy0Kg7DLakkCDgIDQE4vDqs4yBwV7NmlxaTmY+VBZqrnxXGHnjiHGe02WxYA0ZKyJQr4A= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774956713; c=relaxed/simple; bh=Yr7hPbFJqPhwhA4yLJOLlIwqQ4YC5o5sOhMGOPQF0y0=; h=Mime-Version:In-Reply-To:Cc:Subject:To:Message-Id:From:Date: References:Content-Type; b=oAi5y+mIwDu1ej6rTld5A3i5HuW9cUz8cmMnuZZbPBQGTrIW+F8IXo9doxzKZM5dMZHT3z1GA1PQvGu/taONVDPNXskTLSXH2pR8F+AixWmHzdTbMuFf7PzB/mVfZtuInAUtwTqeXPEC979iB0XrNE8EwtpD5qcHlLe3eRHCDzg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=Imb3sLOj; arc=none smtp.client-ip=209.127.230.112 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="Imb3sLOj" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1774956700; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; 
bh=4qJqU4VfR4Arq9FDA14N+IzDSZhq+TH2zib4qVKbcC0=; b=Imb3sLOjrMXQAP1ujDsCQo7BP9fxLANCwYjlCwB2451DwEgmVigiT1kkPJC8aIMs5myObj iYUNlPajpPMrgzPQBRue6ZZZ/IGBt6V074eCih+eZr69unBIg395GHR7V4lIIW2c0EIsC5 0Ki3pYO/1ymPXWVC8KPrkCEROfjOPqYqPiKQRuR8lh9VSV8YMaBsq+zMr9PPB6oJgedbII sAgl3JwBfSIMOd+k3CTaoZkN6+CBLhGzKWiatRESr8Jq5HZ0Sw5C+xA93lVJYZXtts0OxY Z4RUcVbGq1aFiV0baPIy3YjN3bl6mq5fcWtOekVNCJieybMur+t8U6Q+gXYyxA==
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
Mime-Version: 1.0
In-Reply-To: <20260331113103.2197007-1-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
X-Original-From: Chuyi Zhou
Cc: , "Chuyi Zhou"
Subject: [PATCH v4 01/12] smp: Disable preemption explicitly in __csd_lock_wait
Content-Transfer-Encoding: quoted-printable
To: , , , , , , , , , , , ,
Message-Id: <20260331113103.2197007-2-zhouchuyi@bytedance.com>
X-Lms-Return-Path:
From: "Chuyi Zhou"
Date: Tue, 31 Mar 2026 19:30:52 +0800
References: <20260331113103.2197007-1-zhouchuyi@bytedance.com>
Content-Type: text/plain; charset="utf-8"

Later patches will enable preemption before csd_lock_wait(), which could
break csdlock_debug: if the waiting task is preempted, the time slices of
other tasks running on the CPU would be accounted between the
ktime_get_mono_fast_ns() calls.

Disable preemption explicitly in __csd_lock_wait(). This is a preparation
for the following patches.
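
For illustration only, here is a userspace sketch of the scope-based
pattern that guard(preempt)() follows: the "disable" happens where the
guard is declared and the matching "enable" runs automatically on every
exit path. It uses the GCC/Clang cleanup variable attribute. All names in
it (fake_preempt_count, FAKE_GUARD_PREEMPT() and so on) are hypothetical
stand-ins, not kernel symbols, and the counter only models the preempt
count.

```c
#include <assert.h>

/* Hypothetical stand-in for the preempt count (not a kernel symbol). */
static int fake_preempt_count;

static void fake_preempt_disable(void) { fake_preempt_count++; }
static void fake_preempt_enable(void)  { fake_preempt_count--; }

/* Cleanup handler: runs when the guard variable goes out of scope. */
static void fake_guard_exit(int *unused)
{
	(void)unused;
	fake_preempt_enable();
}

/* Hypothetical macro with the same shape as guard(preempt)(). */
#define FAKE_GUARD_PREEMPT() \
	int __attribute__((cleanup(fake_guard_exit), unused)) fake_guard = \
		(fake_preempt_disable(), 0)

/* Returns the count observed while the guard is held. */
static int wait_with_guard(int spins)
{
	FAKE_GUARD_PREEMPT();

	int seen = fake_preempt_count;

	for (int i = 0; i < spins; i++) {
		if (i == 3)
			return seen;	/* early return still drops the guard */
	}
	return seen;
}
```

Calling wait_with_guard() with any spin count leaves fake_preempt_count
balanced afterwards, which is the property the one-line guard in the patch
relies on.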

Signed-off-by: Chuyi Zhou
Acked-by: Muchun Song
Reviewed-by: Steven Rostedt (Google)
---
 kernel/smp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index f349960f79ca..fc1f7a964616 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -323,6 +323,8 @@ static void __csd_lock_wait(call_single_data_t *csd)
 	int bug_id = 0;
 	u64 ts0, ts1;
 
+	guard(preempt)();
+
 	ts1 = ts0 = ktime_get_mono_fast_ns();
 	for (;;) {
 		if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id, &nmessages))
-- 
2.20.1

From nobody Wed Apr 1 10:02:08 2026
From: "Chuyi Zhou"
Date: Tue, 31 Mar 2026 19:30:53 +0800
Subject: [PATCH v4 02/12] smp: Enable preemption early in smp_call_function_single
Message-Id: <20260331113103.2197007-3-zhouchuyi@bytedance.com>
In-Reply-To: <20260331113103.2197007-1-zhouchuyi@bytedance.com>
References: <20260331113103.2197007-1-zhouchuyi@bytedance.com>

smp_call_function_single() currently disables preemption mainly for the
following reasons:

- To protect the per-cpu csd_data from concurrent modification by other
  tasks on the current CPU in the !wait case. In the wait case,
  synchronization is not a concern because an on-stack csd is used.

- To prevent the remote online CPU from being offlined. Specifically, we
  want to ensure that no new IPIs are queued after smpcfd_dying_cpu() has
  finished.

Disabling preemption for the entire execution is unnecessary; in
particular, the csd_lock_wait() part does not require preemption
protection. This patch enables preemption before csd_lock_wait() to
shorten the preemption-disabled critical section.

Signed-off-by: Chuyi Zhou
Reviewed-by: Muchun Song
Reviewed-by: Steven Rostedt (Google)
---
 kernel/smp.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index fc1f7a964616..b603d4229f95 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -685,11 +685,16 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 
 	err = generic_exec_single(cpu, csd);
 
+	/*
+	 * @csd is stack-allocated when @wait is true. No concurrent access
+	 * except from the IPI completion path, so we can re-enable preemption
+	 * early to reduce latency.
+	 */
+	put_cpu();
+
 	if (wait)
 		csd_lock_wait(csd);
 
-	put_cpu();
-
 	return err;
 }
 EXPORT_SYMBOL(smp_call_function_single);
-- 
2.20.1

From nobody Wed Apr 1 10:02:08 2026
From: "Chuyi Zhou"
Date: Tue, 31 Mar 2026 19:30:54 +0800
Subject: [PATCH v4 03/12] smp: Remove get_cpu from smp_call_function_any
Message-Id: <20260331113103.2197007-4-zhouchuyi@bytedance.com>
In-Reply-To: <20260331113103.2197007-1-zhouchuyi@bytedance.com>
References: <20260331113103.2197007-1-zhouchuyi@bytedance.com>

smp_call_function_single() now enables preemption before csd_lock_wait()
to shorten the critical section. To let callers of
smp_call_function_any() benefit from this optimization as well, stop
holding get_cpu()/put_cpu() across the call when a remote CPU is
selected. The pair is kept only when the current CPU is the target, to
prevent migration to another CPU after it has been selected.

Signed-off-by: Chuyi Zhou
Reviewed-by: Muchun Song
---
 kernel/smp.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index b603d4229f95..80daf9dd4a25 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -761,16 +761,26 @@ EXPORT_SYMBOL_GPL(smp_call_function_single_async);
 int smp_call_function_any(const struct cpumask *mask,
 			  smp_call_func_t func, void *info, int wait)
 {
+	bool local = true;
 	unsigned int cpu;
 	int ret;
 
-	/* Try for same CPU (cheapest) */
+	/*
+	 * Prevent migration to another CPU after selecting the current CPU
+	 * as the target.
+	 */
 	cpu = get_cpu();
-	if (!cpumask_test_cpu(cpu, mask))
+
+	/* Try for same CPU (cheapest) */
+	if (!cpumask_test_cpu(cpu, mask)) {
 		cpu = sched_numa_find_nth_cpu(mask, 0, cpu_to_node(cpu));
+		local = false;
+		put_cpu();
+	}
 
 	ret = smp_call_function_single(cpu, func, info, wait);
-	put_cpu();
+	if (local)
+		put_cpu();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(smp_call_function_any);
-- 
2.20.1

From nobody Wed Apr 1 10:02:08 2026
From: "Chuyi Zhou"
Date: Tue, 31 Mar 2026 19:30:55 +0800
Subject: [PATCH v4 04/12] smp: Use task-local IPI cpumask in smp_call_function_many_cond()
Message-Id: <20260331113103.2197007-5-zhouchuyi@bytedance.com>
In-Reply-To: <20260331113103.2197007-1-zhouchuyi@bytedance.com>
References: <20260331113103.2197007-1-zhouchuyi@bytedance.com>

This patch allocates a task-local IPI cpumask during thread creation and
uses it in place of the percpu cfd cpumask in
smp_call_function_many_cond(). A later patch will enable preemption
during csd_lock_wait(); with the task-local cpumask, the mask is no
longer exposed to concurrent access by other tasks on the current CPU,
as cfd->cpumask would be.

When cpumask_size() is smaller than or equal to the pointer size, the
cpumask is stashed in the pointer itself to avoid an extra memory
allocation.
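
The pointer-stashing idea can be sketched in plain userspace C as follows.
This is only an illustration under stated assumptions: struct fake_task,
fake_setup() and the other fake_* names are hypothetical, a plain bool
stands in for the static key, and unlike the patch this sketch zeroes the
inline word on allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the slot added to struct task_struct. */
struct fake_task {
	union {
		unsigned long *mask_ptr;	/* heap-allocated mask */
		unsigned long mask_val;		/* mask stored inline */
	};
};

/* Decided once at "boot": does the mask fit in a pointer-sized word? */
static bool mask_inlined;
static size_t mask_bytes;

static void fake_setup(size_t bytes)
{
	mask_bytes = bytes;
	mask_inlined = bytes <= sizeof(unsigned long);
}

static int fake_mask_alloc(struct fake_task *t)
{
	if (mask_inlined) {
		t->mask_val = 0;	/* nothing to allocate */
		return 0;
	}
	t->mask_ptr = calloc(1, mask_bytes);
	return t->mask_ptr ? 0 : -1;
}

static void fake_mask_free(struct fake_task *t)
{
	if (!mask_inlined)
		free(t->mask_ptr);
}

/* Callers always get a pointer, whichever storage is in use. */
static unsigned long *fake_mask(struct fake_task *t)
{
	return mask_inlined ? &t->mask_val : t->mask_ptr;
}
```

Because fake_mask() hides the storage choice, users of the mask do not
need to know whether an allocation happened, which mirrors how
smp_task_ipi_mask() is used by its caller in the diff.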

Signed-off-by: Chuyi Zhou
---
 include/linux/sched.h |  6 +++++
 include/linux/smp.h   | 20 +++++++++++++++
 kernel/fork.c         |  9 ++++++-
 kernel/smp.c          | 59 ++++++++++++++++++++++++++++++++++++++-----
 4 files changed, 87 insertions(+), 7 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 5a5d3dbc9cdf..6daab67caacc 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1346,6 +1346,12 @@ struct task_struct {
 	struct list_head perf_event_list;
 	struct perf_ctx_data __rcu *perf_ctx_data;
 #endif
+#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPTION)
+	union {
+		cpumask_t *ipi_mask_ptr;
+		unsigned long ipi_mask_val;
+	};
+#endif
 #ifdef CONFIG_DEBUG_PREEMPT
 	unsigned long preempt_disable_ip;
 #endif
diff --git a/include/linux/smp.h b/include/linux/smp.h
index 1ebd88026119..c7b8cc82ad3c 100644
--- a/include/linux/smp.h
+++ b/include/linux/smp.h
@@ -167,6 +167,12 @@ void smp_call_function_many(const struct cpumask *mask,
 int smp_call_function_any(const struct cpumask *mask,
 			  smp_call_func_t func, void *info, int wait);
 
+#ifdef CONFIG_PREEMPTION
+int smp_task_ipi_mask_alloc(struct task_struct *task);
+void smp_task_ipi_mask_free(struct task_struct *task);
+cpumask_t *smp_task_ipi_mask(struct task_struct *cur);
+#endif
+
 void kick_all_cpus_sync(void);
 void wake_up_all_idle_cpus(void);
 bool cpus_peek_for_pending_ipi(const struct cpumask *mask);
@@ -306,4 +312,18 @@ bool csd_lock_is_stuck(void);
 static inline bool csd_lock_is_stuck(void) { return false; }
 #endif
 
+#if !defined(CONFIG_SMP) || !defined(CONFIG_PREEMPTION)
+static inline int smp_task_ipi_mask_alloc(struct task_struct *task)
+{
+	return 0;
+}
+static inline void smp_task_ipi_mask_free(struct task_struct *task)
+{
+}
+static inline cpumask_t *smp_task_ipi_mask(struct task_struct *cur)
+{
+	return NULL;
+}
+#endif
+
 #endif /* __LINUX_SMP_H */
diff --git a/kernel/fork.c b/kernel/fork.c
index bc2bf58b93b6..7082eb1c02c1 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -533,6 +533,7 @@ void free_task(struct task_struct *tsk)
 #endif
 	release_user_cpus_ptr(tsk);
 	scs_release(tsk);
+	smp_task_ipi_mask_free(tsk);
 
 #ifndef CONFIG_THREAD_INFO_IN_TASK
 	/*
@@ -930,10 +931,14 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 #endif
 	account_kernel_stack(tsk, 1);
 
-	err = scs_prepare(tsk, node);
+	err = smp_task_ipi_mask_alloc(tsk);
 	if (err)
 		goto free_stack;
 
+	err = scs_prepare(tsk, node);
+	if (err)
+		goto free_ipi_mask;
+
 #ifdef CONFIG_SECCOMP
 	/*
 	 * We must handle setting up seccomp filters once we're under
@@ -1004,6 +1009,8 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
 #endif
 	return tsk;
 
+free_ipi_mask:
+	smp_task_ipi_mask_free(tsk);
 free_stack:
 	exit_task_stack_account(tsk);
 	free_thread_stack(tsk);
diff --git a/kernel/smp.c b/kernel/smp.c
index 80daf9dd4a25..446e3f80007e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -785,6 +785,44 @@ int smp_call_function_any(const struct cpumask *mask,
 }
 EXPORT_SYMBOL_GPL(smp_call_function_any);
 
+static DEFINE_STATIC_KEY_FALSE(ipi_mask_inlined);
+
+#ifdef CONFIG_PREEMPTION
+
+int smp_task_ipi_mask_alloc(struct task_struct *task)
+{
+	if (static_branch_unlikely(&ipi_mask_inlined))
+		return 0;
+
+	task->ipi_mask_ptr = kmalloc(cpumask_size(), GFP_KERNEL);
+	if (!task->ipi_mask_ptr)
+		return -ENOMEM;
+
+	return 0;
+}
+
+void smp_task_ipi_mask_free(struct task_struct *task)
+{
+	if (static_branch_unlikely(&ipi_mask_inlined))
+		return;
+
+	kfree(task->ipi_mask_ptr);
+}
+
+cpumask_t *smp_task_ipi_mask(struct task_struct *cur)
+{
+	/*
+	 * If cpumask_size() is smaller than or equal to the pointer
+	 * size, it stashes the cpumask in the pointer itself to
+	 * avoid extra memory allocations.
+	 */
+	if (static_branch_unlikely(&ipi_mask_inlined))
+		return (cpumask_t *)&cur->ipi_mask_val;
+
+	return cur->ipi_mask_ptr;
+}
+#endif
+
 /*
  * Flags to be used as scf_flags argument of smp_call_function_many_cond().
  *
@@ -802,11 +840,18 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	int cpu, last_cpu, this_cpu = smp_processor_id();
 	struct call_function_data *cfd;
 	bool wait = scf_flags & SCF_WAIT;
+	struct cpumask *cpumask, *task_mask;
+	bool preemptible_wait;
 	int nr_cpus = 0;
 	bool run_remote = false;
 
 	lockdep_assert_preemption_disabled();
 
+	task_mask = smp_task_ipi_mask(current);
+	preemptible_wait = task_mask && preemptible();
+	cfd = this_cpu_ptr(&cfd_data);
+	cpumask = preemptible_wait ? task_mask : cfd->cpumask;
+
 	/*
 	 * Can deadlock when called with interrupts disabled.
 	 * We allow cpu's that are not yet online though, as no one else can
@@ -827,16 +872,15 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 
 	/* Check if we need remote execution, i.e., any CPU excluding this one. */
 	if (cpumask_any_and_but(mask, cpu_online_mask, this_cpu) < nr_cpu_ids) {
-		cfd = this_cpu_ptr(&cfd_data);
-		cpumask_and(cfd->cpumask, mask, cpu_online_mask);
-		__cpumask_clear_cpu(this_cpu, cfd->cpumask);
+		cpumask_and(cpumask, mask, cpu_online_mask);
+		__cpumask_clear_cpu(this_cpu, cpumask);
 
 		cpumask_clear(cfd->cpumask_ipi);
-		for_each_cpu(cpu, cfd->cpumask) {
+		for_each_cpu(cpu, cpumask) {
 			call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu);
 
 			if (cond_func && !cond_func(cpu, info)) {
-				__cpumask_clear_cpu(cpu, cfd->cpumask);
+				__cpumask_clear_cpu(cpu, cpumask);
 				continue;
 			}
 
@@ -887,7 +931,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	}
 
 	if (run_remote && wait) {
-		for_each_cpu(cpu, cfd->cpumask) {
+		for_each_cpu(cpu, cpumask) {
 			call_single_data_t *csd;
 
 			csd = per_cpu_ptr(cfd->csd, cpu);
@@ -1003,6 +1047,9 @@ EXPORT_SYMBOL(nr_cpu_ids);
 void __init setup_nr_cpu_ids(void)
 {
 	set_nr_cpu_ids(find_last_bit(cpumask_bits(cpu_possible_mask), NR_CPUS) + 1);
+
+	if (IS_ENABLED(CONFIG_PREEMPTION) && cpumask_size() <= sizeof(unsigned long))
+		static_branch_enable(&ipi_mask_inlined);
 }
 
 /* Called by boot processor to activate the rest. */
-- 
2.20.1

From nobody Wed Apr 1 10:02:08 2026
From: "Chuyi Zhou"
Date: Tue, 31 Mar 2026 19:30:56 +0800
Subject: [PATCH v4 05/12] smp: Alloc percpu csd data in smpcfd_prepare_cpu() only once
Message-Id: <20260331113103.2197007-6-zhouchuyi@bytedance.com>
In-Reply-To: <20260331113103.2197007-1-zhouchuyi@bytedance.com>
References: <20260331113103.2197007-1-zhouchuyi@bytedance.com>

A later patch will enable preemption during csd_lock_wait() in
smp_call_function_many_cond(), which may lead to accessing cfd->csd data
that has already been freed in smpcfd_dead_cpu().

One way to fix this is to protect the csd data with RCU and wait for all
read-side critical sections to exit before freeing the memory in
smpcfd_dead_cpu(), but this could delay CPU shutdown. This patch takes a
simpler approach: allocate the percpu csd only once, on the CPU bring-up
path (smpcfd_prepare_cpu()), and skip freeing the csd memory in
smpcfd_dead_cpu().
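
The allocate-once approach can be sketched in userspace C as below. This
is an illustration only: struct fake_cfd, fake_prepare_cpu() and
fake_dead_cpu() are hypothetical stand-ins for the hotplug callbacks, and
plain calloc()/free() replace the kernel allocators.

```c
#include <assert.h>
#include <stdlib.h>

#define FAKE_NR_CPUS 4

/* Hypothetical stand-in for the per-CPU call_function_data. */
struct fake_cfd {
	int *scratch;	/* re-allocated on every bring-up (like the cpumasks) */
	int *csd;	/* allocated once, kept across offline/online cycles */
};

static struct fake_cfd fake_cfd_data[FAKE_NR_CPUS];

static int fake_prepare_cpu(unsigned int cpu)
{
	struct fake_cfd *cfd = &fake_cfd_data[cpu];

	cfd->scratch = calloc(FAKE_NR_CPUS, sizeof(int));
	if (!cfd->scratch)
		return -1;

	/* Only the first bring-up allocates; later ones reuse the buffer. */
	if (!cfd->csd)
		cfd->csd = calloc(FAKE_NR_CPUS, sizeof(int));
	if (!cfd->csd) {
		free(cfd->scratch);
		cfd->scratch = NULL;
		return -1;
	}
	return 0;
}

static int fake_dead_cpu(unsigned int cpu)
{
	struct fake_cfd *cfd = &fake_cfd_data[cpu];

	free(cfd->scratch);
	cfd->scratch = NULL;
	/* cfd->csd is deliberately kept: a late waiter may still read it. */
	return 0;
}
```

The cost of this design is that the csd memory of a CPU that never comes
back online stays allocated, in exchange for not having to wait for
readers during CPU shutdown.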

Suggested-by: Sebastian Andrzej Siewior
Signed-off-by: Chuyi Zhou
---
 kernel/smp.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 446e3f80007e..2a33877dd812 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -63,7 +63,15 @@ int smpcfd_prepare_cpu(unsigned int cpu)
 		free_cpumask_var(cfd->cpumask);
 		return -ENOMEM;
 	}
-	cfd->csd = alloc_percpu(call_single_data_t);
+
+	/*
+	 * The percpu csd is allocated only once and never freed.
+	 * This ensures that smp_call_function_many_cond() can safely
+	 * access the csd of an offlined CPU if it gets preempted
+	 * during csd_lock_wait().
+	 */
+	if (!cfd->csd)
+		cfd->csd = alloc_percpu(call_single_data_t);
 	if (!cfd->csd) {
 		free_cpumask_var(cfd->cpumask);
 		free_cpumask_var(cfd->cpumask_ipi);
@@ -79,7 +87,6 @@ int smpcfd_dead_cpu(unsigned int cpu)
 
 	free_cpumask_var(cfd->cpumask);
 	free_cpumask_var(cfd->cpumask_ipi);
-	free_percpu(cfd->csd);
 	return 0;
 }
 
-- 
2.20.1

From nobody Wed Apr 1 10:02:08 2026
From: "Chuyi Zhou"
Date: Tue, 31 Mar 2026 19:30:57 +0800
Subject: [PATCH v4 06/12] smp: Enable preemption early in smp_call_function_many_cond
Message-Id: <20260331113103.2197007-7-zhouchuyi@bytedance.com>
In-Reply-To: <20260331113103.2197007-1-zhouchuyi@bytedance.com>
References: <20260331113103.2197007-1-zhouchuyi@bytedance.com>

Preemption has been disabled for the whole of
smp_call_function_many_cond() primarily for the following reasons:

- To prevent the remote online CPU from going offline. Specifically, we
  want to ensure that no new csds are queued after smpcfd_dying_cpu() has
  finished. Therefore, preemption must be disabled until all necessary
  IPIs are sent.

- To prevent the current CPU from going offline. Being migrated to
  another CPU and then calling csd_lock_wait() may cause a use-after-free
  because of smpcfd_dead_cpu() running while the current CPU is offlined.

- To protect the per-cpu cfd_data from concurrent modification by other
  tasks on the current CPU. cfd_data contains cpumasks and per-cpu csds.

Before enqueueing a csd, we block on csd_lock() to ensure the previous
async csd->func() has completed, and then initialize csd->func and
csd->info. After sending the IPI, we spin-wait for the remote CPU to call
csd_unlock(). The csd_lock mechanism therefore already guarantees csd
serialization. If preemption occurs during csd_lock_wait(), other
concurrent smp_call_function_many_cond() calls will simply block until
the previous csd->func() completes:

task A                        task B

csd->func = func_a
send ipis

          preempted by B
         --------------->
                              csd_lock(csd); // block until last
                                             // func_a finished

                              csd->func = func_b;
                              csd->info = info;
                              ...
                              send ipis

          switch back to A
         <---------------

csd_lock_wait(csd); // block until remote finish func_*

Previous patches replaced the per-cpu cfd->cpumask with a task-local
cpumask, and made the percpu csd be allocated only once and never freed,
so that the csd can always be accessed safely. Now we can enable
preemption before csd_lock_wait(), which makes the potentially
unpredictable csd_lock_wait() preemptible and migratable.
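
The serialization argument above can be modelled with a small
single-threaded userspace sketch. It is an illustration only: struct
fake_csd and the fake_* helpers are hypothetical, the lock is a C11
atomic flag rather than the kernel's csd flags, and a non-blocking
try-lock is used where the kernel would spin, so that the ordering can be
checked deterministically.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*fake_func_t)(void *info);

/* Hypothetical stand-in for a per-CPU csd slot. */
struct fake_csd {
	atomic_bool locked;	/* models the csd lock flag */
	fake_func_t func;
	void *info;
};

/* Non-blocking stand-in for csd_lock(): true if the slot was free. */
static bool fake_csd_trylock(struct fake_csd *csd)
{
	bool expected = false;

	return atomic_compare_exchange_strong(&csd->locked, &expected, true);
}

/* Sender side: claim the slot first, only then fill in func/info. */
static bool fake_queue(struct fake_csd *csd, fake_func_t func, void *info)
{
	if (!fake_csd_trylock(csd))
		return false;	/* the kernel would spin here instead */
	csd->func = func;
	csd->info = info;
	return true;
}

/* "Remote CPU" side: run the callback, then release the slot. */
static void fake_remote_run(struct fake_csd *csd)
{
	csd->func(csd->info);
	atomic_store(&csd->locked, false);
}

/* Two sample callbacks standing in for func_a and func_b. */
static void fake_add(void *info)    { *(int *)info += 1; }
static void fake_double(void *info) { *(int *)info *= 2; }
```

In this model, a second fake_queue() on a slot that is still locked fails
and leaves func/info untouched, so task B cannot overwrite task A's
request before the remote side has run it.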

Signed-off-by: Chuyi Zhou
---
 kernel/smp.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 2a33877dd812..4ddb1ec1e43e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -844,7 +844,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 					unsigned int scf_flags,
 					smp_cond_func_t cond_func)
 {
-	int cpu, last_cpu, this_cpu = smp_processor_id();
+	int cpu, last_cpu, this_cpu;
 	struct call_function_data *cfd;
 	bool wait = scf_flags & SCF_WAIT;
 	struct cpumask *cpumask, *task_mask;
@@ -852,10 +852,10 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	int nr_cpus = 0;
 	bool run_remote = false;
 
-	lockdep_assert_preemption_disabled();
-
 	task_mask = smp_task_ipi_mask(current);
 	preemptible_wait = task_mask && preemptible();
+
+	this_cpu = get_cpu();
 	cfd = this_cpu_ptr(&cfd_data);
 	cpumask = preemptible_wait ? task_mask : cfd->cpumask;
 
@@ -937,6 +937,19 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		local_irq_restore(flags);
 	}
 
+	/*
+	 * We may block in csd_lock_wait() for a significant amount of time,
+	 * especially when interrupts are disabled or with a large number of
+	 * remote CPUs. Try to enable preemption before csd_lock_wait().
+	 *
+	 * Use the cpumask_stack instead of cfd->cpumask to avoid concurrency
+	 * modification from tasks on the same cpu. If preemption occurs during
+	 * csd_lock_wait, other concurrent smp_call_function_many_cond() calls
+	 * will simply block until the previous csd->func() completes.
+	 */
+	if (preemptible_wait)
+		put_cpu();
+
 	if (run_remote && wait) {
 		for_each_cpu(cpu, cpumask) {
 			call_single_data_t *csd;
@@ -945,6 +958,9 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 			csd_lock_wait(csd);
 		}
 	}
+
+	if (!preemptible_wait)
+		put_cpu();
 }
 
 /**
-- 
2.20.1

From nobody Wed Apr 1 10:02:08 2026
Received: from lf-1-131.ptr.blmpb.com (lf-1-131.ptr.blmpb.com [103.149.242.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 46B053DC4CF for ; Tue, 31 Mar 2026 11:34:28 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=103.149.242.131
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774956871; cv=none; b=J7dvLDGmInfGxVHkjrTFPSdoRBrE8T46FgE0U6gsyzrr+2LIDDEIZYXD9cm3OJZLQcYq7tdLqiPOdF9yF1ixap3NKOK83vg5en0GTW7IWtl/3NtATlnSEnBC0kB0BdD59xEZtJteH390Az3ku7O9wYYvsPzX/j/nUibDLb6F9XI=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774956871; c=relaxed/simple; bh=nm7x+iIZFVBhl7GqtPAn/49VgtSnzaofyxXgrQlkx2s=; h=References:Subject:Date:In-Reply-To:To:Cc:Mime-Version: Content-Type:From:Message-Id; b=MySz6ww1NRWCem5JosNSoGsw10iHqPhJH3o0R9cmMJo9tILQo5fcv/+LIaqLL6+gl7nrbPU6Ha7SEwxfhAlN6vPm7oqPkJG5Te+aetA17LOXLcuYPVlb94uL1hVebAtAHuv1F3XWPLRkgxYTfTfiL/qKn219KkH30Ai7mAhLd2k=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=Ozpe6WZG; arc=none smtp.client-ip=103.149.242.131
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com
header.i=@bytedance.com header.b="Ozpe6WZG" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1774956781; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=vKnsPKGYkLl1GrWNoYcVaiglQIwu2WZFC7HWdwidkRo=; b=Ozpe6WZGsoy4Zs+FNmpFblGiXQcX3LNxVqVR9DlaLl+LwYtuL9pBvHXpGqG9U8gUrngAJr aU8l5J6BCQfr6XoXELDUkKxAjisNtCj0rcYJu4mqsw21hkudoEZDRcEVI48gFD0V0R/mAs 4SwDkl2OQDcpsBahLvb7YZQGxaiQe+HjxWiP1xfXHQ5ZFAB+BKQLksc7VGhM3jSsLSV0Xk DcVRz9A5CW8kNgyVzIWO4eSdcFNklQOvMG3Xc/g9uRGUe9c9rD02huryc1MvOKk5bTYCVa HRoQzbGA2mym/wDWUyK2vpwSq0Cr/Cs/PXN5DquKGjjOhDcDTMljLgXE4FB5Xw== References: <20260331113103.2197007-1-zhouchuyi@bytedance.com> Subject: [PATCH v4 07/12] smp: Remove preempt_disable from smp_call_function Date: Tue, 31 Mar 2026 19:30:58 +0800 X-Mailer: git-send-email 2.20.1 In-Reply-To: <20260331113103.2197007-1-zhouchuyi@bytedance.com> X-Original-From: Chuyi Zhou To: , , , , , , , , , , , , Cc: , "Chuyi Zhou" Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 X-Lms-Return-Path: Content-Transfer-Encoding: quoted-printable From: "Chuyi Zhou" Message-Id: <20260331113103.2197007-8-zhouchuyi@bytedance.com> Content-Type: text/plain; charset="utf-8" Now smp_call_function_many_cond() internally handles the preemption logic, so smp_call_function() does not need to explicitly disable preemption. Remove preempt_{enable, disable} from smp_call_function(). 
Signed-off-by: Chuyi Zhou Reviewed-by: Muchun Song --- kernel/smp.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/kernel/smp.c b/kernel/smp.c index 4ddb1ec1e43e..9b658362aa02 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -1002,9 +1002,8 @@ EXPORT_SYMBOL(smp_call_function_many); */ void smp_call_function(smp_call_func_t func, void *info, int wait) { - preempt_disable(); - smp_call_function_many(cpu_online_mask, func, info, wait); - preempt_enable(); + smp_call_function_many_cond(cpu_online_mask, func, info, + wait ? SCF_WAIT : 0, NULL); } EXPORT_SYMBOL(smp_call_function); =20 --=20 2.20.1 From nobody Wed Apr 1 10:02:08 2026 Received: from va-1-114.ptr.blmpb.com (va-1-114.ptr.blmpb.com [209.127.230.114]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 68B3531715D for ; Tue, 31 Mar 2026 11:33:21 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.114 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774956802; cv=none; b=s8/zPNghfUiJzsPfNWfxkEB1Ty5Ybk5ilJwUb8Yolfyw5mT6HI2r1E0gnFOMS+qCNgWothn9fvDF/JYyZCLfumQH7rm2qE0RL4eThEmK8QYBLzR7EuyYE7lwl4sVlR9BOQacooSICBqG3xTBN4AIjDOGKsjNXLyuTSEV5LoX1t4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774956802; c=relaxed/simple; bh=pLywiVwJokTie6hZcJ5aOULb5E/sPYP6I2OWMqYPrNs=; h=To:In-Reply-To:Subject:Cc:Date:Mime-Version:Content-Type:From: Message-Id:References; b=T8T39iiwxO5twtHy+TaCZmMlz4dLd6wd5T5FIV2dxxrULFDi4D3K/cFcQJqQu/T50VSRqAzfinHu1wwZJ6tM7i+7FIeFZW4sR0l1Zf+Aj2fTqSPIUuTVng8j4iSMmGp6etdT41Tf3CsmYS/9/nZgyL8P9CipeWwzGJBp5CC2Fls= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=L53wYe7L; arc=none 
smtp.client-ip=209.127.230.114 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="L53wYe7L" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1774956794; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=c3hq5Me5QkTTcfCV851A2Rw5g4tVVkTFN9csw5ssKNk=; b=L53wYe7Lq/wcz/D58RQTisak4bXyNf7WZ/x3Ai/dX4xMY8pmIz4rN9yPwFVAapT0KobJdB xqPsPkK5iRDxjxK+FWmf0QgPFVJUfBkBnoJALLbwB++h3Dmt90i/f2FeWjjwtQmjmrNVZh JDCYeBdYrjZPxSBDMiTrWTXq1gncfbf88joICWOfdpnynpKk3LYBW9KXTRJVOsd5bDwuzc qTtcCpmLdLc3RWT7yUGxi1SswzoyMBoGZ8/yXzQPg4YgPhLVkB1J6BybkjxFUWS0Bt6Hs3 MteLfFuIayOXDpXxnF5a/UKXzWqAjwy8A7jxju1gX3L2i7v4/+xCkKGP2TVGdg== To: , , , , , , , , , , , , In-Reply-To: <20260331113103.2197007-1-zhouchuyi@bytedance.com> Content-Transfer-Encoding: quoted-printable Subject: [PATCH v4 08/12] smp: Remove preempt_disable from on_each_cpu_cond_mask X-Lms-Return-Path: X-Original-From: Chuyi Zhou Cc: , "Chuyi Zhou" Date: Tue, 31 Mar 2026 19:30:59 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 From: "Chuyi Zhou" Message-Id: <20260331113103.2197007-9-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 References: <20260331113103.2197007-1-zhouchuyi@bytedance.com> Content-Type: text/plain; charset="utf-8" Now smp_call_function_many_cond() internally handles the preemption logic, so on_each_cpu_cond_mask does not need to explicitly disable preemption. Remove preempt_{enable, disable} from on_each_cpu_cond_mask(). 
Signed-off-by: Chuyi Zhou Reviewed-by: Muchun Song --- kernel/smp.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/kernel/smp.c b/kernel/smp.c index 9b658362aa02..8a1c26312d12 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -1125,9 +1125,7 @@ void on_each_cpu_cond_mask(smp_cond_func_t cond_func,= smp_call_func_t func, if (wait) scf_flags |=3D SCF_WAIT; =20 - preempt_disable(); smp_call_function_many_cond(mask, func, info, scf_flags, cond_func); - preempt_enable(); } EXPORT_SYMBOL(on_each_cpu_cond_mask); =20 --=20 2.20.1 From nobody Wed Apr 1 10:02:08 2026 Received: from va-1-115.ptr.blmpb.com (va-1-115.ptr.blmpb.com [209.127.230.115]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 368183DC4C4 for ; Tue, 31 Mar 2026 11:33:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.115 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774956813; cv=none; b=WHnv3MbuOTYS4cIImfDz0jlyqPVF/Hc0693GZEVEfbRfX9Zo0Y+B4XnMzmiSUXiymkcZl2NG29yiVOkIO/fCyseRnzToNozgDD0dbxImQJP6he8facla0oF0vtABzCzG0fdHdJ89BG1l2ChXKeFOBwalQUnOcNa30R5CcRuhdCo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774956813; c=relaxed/simple; bh=Qa9IpuSjSfpTtmS0nJfwKeG6oQYGbdwsouDwUnE2vi8=; h=From:Message-Id:Content-Type:Cc:Subject:Date:To:In-Reply-To: References:Mime-Version; b=hDfM9WdjMRFiT17FkH4yhxTSCZfxhWiUW+R9l2Kvkr+oV0VdIINU4i9Z4S4WTzZCeBagUBGedw9vJj+ctfxc7O7iP/AL4T4SVUDxKpdZAfSFeeUgDc/W+A4f0vbjP7WVpGATTNrcYbsHbkRLE8HeUI15aU4TzHY0pJi4LkIIsLg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=I15t4j8a; arc=none smtp.client-ip=209.127.230.115 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass 
(p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="I15t4j8a" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1774956807; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=0usi0H2e8tYtT63WQwB8R/fMy+WKcYDOp5mE7Fo7UFU=; b=I15t4j8aNagbcVS7iY9/upwhEa7ghk66vSxrNVy5ECGHqGVnB7PT7rKvS1gi2EwPac2Efx A2e+azkmQHA+b5+gOEIs8L81qggby6J6QJo6Fteho4BtNapcU04ZbtdkwnsRLdtcfGlC2T +aqydjftxW8OXSD75D0wVIX5pY4cNo6l+U/BTxKn6BYYgJvDCV5NjCiUhejoXjcTy2pzSX rMBA6u3mlHFPgo6VOZ5NEGM0ppuik4ykRTQSUo0MGLOFS+V7HcRyXlhcPKvBKonlZjQSD3 IVUx+tJNGZF6fVqqKQm3qcaw/DKxnnXvWQX3dqmwVo7Wg5m224PK9BF+l0zAmw== From: "Chuyi Zhou" Message-Id: <20260331113103.2197007-10-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 Cc: , "Chuyi Zhou" Subject: [PATCH v4 09/12] scftorture: Remove preempt_disable in scftorture_invoke_one Date: Tue, 31 Mar 2026 19:31:00 +0800 X-Original-From: Chuyi Zhou X-Lms-Return-Path: To: , , , , , , , , , , , , In-Reply-To: <20260331113103.2197007-1-zhouchuyi@bytedance.com> References: <20260331113103.2197007-1-zhouchuyi@bytedance.com> Content-Transfer-Encoding: quoted-printable Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Previous patches make smp_call*() handle preemption logic internally. Now the preempt_disable() by most callers becomes unnecessary and can therefore be removed. Remove preempt_{enable, disable} in scftorture_invoke_one(). 
Signed-off-by: Chuyi Zhou --- kernel/scftorture.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/kernel/scftorture.c b/kernel/scftorture.c index 327c315f411c..b87215e40be5 100644 --- a/kernel/scftorture.c +++ b/kernel/scftorture.c @@ -364,8 +364,6 @@ static void scftorture_invoke_one(struct scf_statistics= *scfp, struct torture_ra } if (use_cpus_read_lock) cpus_read_lock(); - else - preempt_disable(); switch (scfsp->scfs_prim) { case SCF_PRIM_RESCHED: if (IS_BUILTIN(CONFIG_SCF_TORTURE_TEST)) { @@ -411,13 +409,10 @@ static void scftorture_invoke_one(struct scf_statisti= cs *scfp, struct torture_ra if (!ret) { if (use_cpus_read_lock) cpus_read_unlock(); - else - preempt_enable(); + wait_for_completion(&scfcp->scfc_completion); if (use_cpus_read_lock) cpus_read_lock(); - else - preempt_disable(); } else { scfp->n_single_rpc_ofl++; scf_add_to_free_list(scfcp); @@ -463,8 +458,6 @@ static void scftorture_invoke_one(struct scf_statistics= *scfp, struct torture_ra } if (use_cpus_read_lock) cpus_read_unlock(); - else - preempt_enable(); if (allocfail) schedule_timeout_idle((1 + longwait) * HZ); // Let no-wait handlers com= plete. 
else if (!(torture_random(trsp) & 0xfff)) --=20 2.20.1 From nobody Wed Apr 1 10:02:08 2026 Received: from va-1-111.ptr.blmpb.com (va-1-111.ptr.blmpb.com [209.127.230.111]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 690CE31715D for ; Tue, 31 Mar 2026 11:33:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.111 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774956828; cv=none; b=YgSfS3OPZs794eiu8JffD33IKHacQ6KLyxLA1jvWXlP9xU+VAS1Z4r5G4RqR1FjnH4V/ClD2Hf5MhAYRN9iG9JBqXTUBLvw1uMkUTQSaB60CgDgvDLAMnGc5uUg83KUpHMBb9Z8JPBdj9EJTiiKCxQL8FfAQ+hoRC8uM+SNcuvU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774956828; c=relaxed/simple; bh=yKMOFqHHPiGM6QdNGInGyQVsW7G2+Z6vzitvihpbaCE=; h=Content-Type:To:Mime-Version:Cc:Subject:Date:Message-Id: In-Reply-To:From:References; b=SHvbrk1sC5Y/4fI7coYX0xsdAm9EoYtSK7xfolyRGWzWn5DzJku/0xUHuHg2wvqhqhywySfQm+3lLyQpU7WxRrBBglOXNxW8vvzvufRMC3+jSF7d7snF6/EIc57w3MEAoLgoiCNGJ0+qiZTYLe//LdDDfgYMEVlv07L8506/uuc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=V86u5r5r; arc=none smtp.client-ip=209.127.230.111 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="V86u5r5r" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1774956821; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: 
mime-version:in-reply-to:message-id; bh=aUK3r6pXI/HwsoVBOCdW4m6bVqpLkwnx6XaS+jnRFrM=; b=V86u5r5rsQ5EyzWloNq1A2z5D/fUrSPe+zw/xy8HJWylLkdNGjmmaJ2NYMd41NsKs1iOQh kyGdjT8gin3/Izo6RU34AFoGSC1hathirJP6hrYpV3RLjpvkYE+YuMr5dX97bwQE/cCe+a 6LYFWiSwilHnL/YSkWBnpGVgHjPxQpY1FhtBV6eN8kwoU9an2qkxboV7BtqolOck/s6pwL j8fCcI9LgQP9BlILqkutUCjkw8GIpF8Oyea0zMrcOEDBcXTyzSiZ709PGiI6MHM/9/G5g+ 3+hMDhxRsc1bE+Y0TxURLPwPRePXAT6RTtzmKXSlipzt0Uv72b7fU5Z430gqUg== To: , , , , , , , , , , , , Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Cc: , "Chuyi Zhou" Subject: [PATCH v4 10/12] x86/mm: Move flush_tlb_info back to the stack Date: Tue, 31 Mar 2026 19:31:01 +0800 Message-Id: <20260331113103.2197007-11-zhouchuyi@bytedance.com> In-Reply-To: <20260331113103.2197007-1-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 X-Lms-Return-Path: X-Original-From: Chuyi Zhou From: "Chuyi Zhou" References: <20260331113103.2197007-1-zhouchuyi@bytedance.com> Content-Type: text/plain; charset="utf-8" Commit 3db6d5a5ecaf ("x86/mm/tlb: Remove 'struct flush_tlb_info' from the stack") converted flush_tlb_info from stack variable to per-CPU variable. This brought about a performance improvement of around 3% in extreme test. However, it also required that all flush_tlb* operations keep preemption disabled entirely to prevent concurrent modifications of flush_tlb_info. flush_tlb* needs to send IPIs to remote CPUs and synchronously wait for all remote CPUs to complete their local TLB flushes. The process could take tens of milliseconds when interrupts are disabled or with a large number of remote CPUs. From the perspective of improving kernel real-time performance, this patch reverts flush_tlb_info back to stack variables and align it with SMP_CACHE_BYTES. In certain configurations, SMP_CACHE_BYTES may be large, so the alignment size is limited to 64. 
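
The capped alignment described above can be sketched in userland as follows.
This is an illustration under stated assumptions rather than the kernel code:
MODEL_SMP_CACHE_BYTES is a made-up stand-in for SMP_CACHE_BYTES (set to 128
here to exercise the cap), struct model_flush_info is a trimmed hypothetical
version of struct flush_tlb_info, and the GCC/Clang aligned attribute is used
in place of the kernel's __aligned() macro.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* Assumption: stand-in for the kernel's SMP_CACHE_BYTES. */
#define MODEL_SMP_CACHE_BYTES 128

/* Cap the alignment at 64 bytes so large cache-line configs do not bloat the stack. */
#if MODEL_SMP_CACHE_BYTES > 64
#define MODEL_INFO_ALIGN 64
#else
#define MODEL_INFO_ALIGN MODEL_SMP_CACHE_BYTES
#endif

/* Hypothetical, trimmed-down stand-in for struct flush_tlb_info. */
struct model_flush_info {
	unsigned long start;
	unsigned long end;
	uint64_t new_tlb_gen;
	unsigned int initiating_cpu;
	uint8_t stride_shift;
} __attribute__((aligned(MODEL_INFO_ALIGN)));

/* Returns nonzero when an on-stack instance starts on the capped boundary. */
static int model_stack_info_is_aligned(void)
{
	struct model_flush_info info = { 0 };

	return ((uintptr_t)&info % MODEL_INFO_ALIGN) == 0;
}
```

Because the attribute is attached to the type, every on-stack instance gets
the capped alignment without any per-call-site annotation.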
This is a preparation for enabling preemption during TLB flush in the next
patch.

To evaluate the performance impact of this patch, use the following program
to reproduce the microbenchmark mentioned in commit 3db6d5a5ecaf
("x86/mm/tlb: Remove 'struct flush_tlb_info' from the stack"). The test
environment is an Ice Lake system (Intel(R) Xeon(R) Platinum 8336C) with
128 CPUs and 2 NUMA nodes. During the test, the threads were bound to
specific CPUs, and both pti and mitigations were disabled:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <unistd.h>

#define NUM_OPS 1000000
#define NUM_THREADS 3
#define NUM_RUNS 5
#define PAGE_SIZE 4096

volatile int stop_threads =3D 0;

void *busy_wait_thread(void *arg) {
    while (!stop_threads) {
        __asm__ volatile ("nop");
    }
    return NULL;
}

long long get_usec() {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return tv.tv_sec * 1000000LL + tv.tv_usec;
}

int main() {
    pthread_t threads[NUM_THREADS];
    char *addr;
    int i, r;

    addr =3D mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (addr =3D=3D MAP_FAILED) {
        perror("mmap");
        exit(1);
    }

    for (i =3D 0; i < NUM_THREADS; i++) {
        if (pthread_create(&threads[i], NULL, busy_wait_thread, NULL))
            exit(1);
    }

    printf("Running benchmark: %d runs, %d ops each, %d background\n"
           "threads\n", NUM_RUNS, NUM_OPS, NUM_THREADS);

    for (r =3D 0; r < NUM_RUNS; r++) {
        long long start, end;

        start =3D get_usec();
        for (i =3D 0; i < NUM_OPS; i++) {
            addr[0] =3D 1;
            if (madvise(addr, PAGE_SIZE, MADV_DONTNEED)) {
                perror("madvise");
                exit(1);
            }
        }
        end =3D get_usec();

        double duration =3D (double)(end - start);
        double avg_lat =3D duration / NUM_OPS;
        printf("Run %d: Total time %.2f us, Avg latency %.4f us/op\n",
               r + 1, duration, avg_lat);
    }

    stop_threads =3D 1;
    for (i =3D 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    munmap(addr, PAGE_SIZE);
    return 0;
}

                 base      on-stack-aligned   on-stack-not-aligned
                 ----      ----------------   --------------------
avg (usec/op)    2.5278    2.5261             2.5508
stddev           0.0007    0.0027             0.0023

The benchmark
results show that the average latency difference between the baseline (base) and the properly aligned stack variable (on-stack-aligned) is within the standard deviation (stddev). This indicates that the variations are caused by testing noise, and reverting to a stack variable with proper alignment causes no performance regression compared to the per-CPU implementation. The unaligned version (on-stack-not-aligned) shows a minor performance drop. This demonstrates that we can improve the real-time performance without sacrificing performance. Suggested-by: Sebastian Andrzej Siewior Suggested-by: Nadav Amit Signed-off-by: Chuyi Zhou --- arch/x86/include/asm/tlbflush.h | 8 +++- arch/x86/mm/tlb.c | 72 +++++++++------------------------ 2 files changed, 27 insertions(+), 53 deletions(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflus= h.h index 0545fe75c3fa..f4e4505d4ece 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -211,6 +211,12 @@ extern u16 invlpgb_count_max; =20 extern void initialize_tlbstate_and_flush(void); =20 +#if SMP_CACHE_BYTES > 64 +#define FLUSH_TLB_INFO_ALIGN 64 +#else +#define FLUSH_TLB_INFO_ALIGN SMP_CACHE_BYTES +#endif + /* * TLB flushing: * @@ -249,7 +255,7 @@ struct flush_tlb_info { u8 stride_shift; u8 freed_tables; u8 trim_cpumask; -}; +} __aligned(FLUSH_TLB_INFO_ALIGN); =20 void flush_tlb_local(void); void flush_tlb_one_user(unsigned long addr); diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index af43d177087e..cfc3a72477f5 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1373,28 +1373,12 @@ void flush_tlb_multi(const struct cpumask *cpumask, */ unsigned long tlb_single_page_flush_ceiling __read_mostly =3D 33; =20 -static DEFINE_PER_CPU_SHARED_ALIGNED(struct flush_tlb_info, flush_tlb_info= ); - -#ifdef CONFIG_DEBUG_VM -static DEFINE_PER_CPU(unsigned int, flush_tlb_info_idx); -#endif - -static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm, - unsigned long 
start, unsigned long end, - unsigned int stride_shift, bool freed_tables, - u64 new_tlb_gen) +static void get_flush_tlb_info(struct flush_tlb_info *info, + struct mm_struct *mm, + unsigned long start, unsigned long end, + unsigned int stride_shift, bool freed_tables, + u64 new_tlb_gen) { - struct flush_tlb_info *info =3D this_cpu_ptr(&flush_tlb_info); - -#ifdef CONFIG_DEBUG_VM - /* - * Ensure that the following code is non-reentrant and flush_tlb_info - * is not overwritten. This means no TLB flushing is initiated by - * interrupt handlers and machine-check exception handlers. - */ - BUG_ON(this_cpu_inc_return(flush_tlb_info_idx) !=3D 1); -#endif - /* * If the number of flushes is so large that a full flush * would be faster, do a full flush. @@ -1412,32 +1396,22 @@ static struct flush_tlb_info *get_flush_tlb_info(st= ruct mm_struct *mm, info->new_tlb_gen =3D new_tlb_gen; info->initiating_cpu =3D smp_processor_id(); info->trim_cpumask =3D 0; - - return info; -} - -static void put_flush_tlb_info(void) -{ -#ifdef CONFIG_DEBUG_VM - /* Complete reentrancy prevention checks */ - barrier(); - this_cpu_dec(flush_tlb_info_idx); -#endif } =20 void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start, unsigned long end, unsigned int stride_shift, bool freed_tables) { - struct flush_tlb_info *info; + struct flush_tlb_info _info; + struct flush_tlb_info *info =3D &_info; int cpu =3D get_cpu(); u64 new_tlb_gen; =20 /* This is also a barrier that synchronizes with switch_mm(). 
*/ new_tlb_gen =3D inc_mm_tlb_gen(mm); =20 - info =3D get_flush_tlb_info(mm, start, end, stride_shift, freed_tables, - new_tlb_gen); + get_flush_tlb_info(&_info, mm, start, end, stride_shift, freed_tables, + new_tlb_gen); =20 /* * flush_tlb_multi() is not optimized for the common case in which only @@ -1457,7 +1431,6 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigne= d long start, local_irq_enable(); } =20 - put_flush_tlb_info(); put_cpu(); mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end); } @@ -1527,19 +1500,16 @@ static void kernel_tlb_flush_range(struct flush_tlb= _info *info) =20 void flush_tlb_kernel_range(unsigned long start, unsigned long end) { - struct flush_tlb_info *info; + struct flush_tlb_info info; =20 guard(preempt)(); + get_flush_tlb_info(&info, NULL, start, end, PAGE_SHIFT, false, + TLB_GENERATION_INVALID); =20 - info =3D get_flush_tlb_info(NULL, start, end, PAGE_SHIFT, false, - TLB_GENERATION_INVALID); - - if (info->end =3D=3D TLB_FLUSH_ALL) - kernel_tlb_flush_all(info); + if (info.end =3D=3D TLB_FLUSH_ALL) + kernel_tlb_flush_all(&info); else - kernel_tlb_flush_range(info); - - put_flush_tlb_info(); + kernel_tlb_flush_range(&info); } =20 /* @@ -1707,12 +1677,11 @@ EXPORT_SYMBOL_FOR_KVM(__flush_tlb_all); =20 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) { - struct flush_tlb_info *info; + struct flush_tlb_info info; =20 int cpu =3D get_cpu(); - - info =3D get_flush_tlb_info(NULL, 0, TLB_FLUSH_ALL, 0, false, - TLB_GENERATION_INVALID); + get_flush_tlb_info(&info, NULL, 0, TLB_FLUSH_ALL, 0, false, + TLB_GENERATION_INVALID); /* * flush_tlb_multi() is not optimized for the common case in which only * a local TLB flush is needed. 
Optimize this use-case by calling @@ -1722,17 +1691,16 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap= _batch *batch) invlpgb_flush_all_nonglobals(); batch->unmapped_pages =3D false; } else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) { - flush_tlb_multi(&batch->cpumask, info); + flush_tlb_multi(&batch->cpumask, &info); } else if (cpumask_test_cpu(cpu, &batch->cpumask)) { lockdep_assert_irqs_enabled(); local_irq_disable(); - flush_tlb_func(info); + flush_tlb_func(&info); local_irq_enable(); } =20 cpumask_clear(&batch->cpumask); =20 - put_flush_tlb_info(); put_cpu(); } =20 --=20 2.20.1 From nobody Wed Apr 1 10:02:08 2026 Received: from va-1-113.ptr.blmpb.com (va-1-113.ptr.blmpb.com [209.127.230.113]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1DEDD3A5E61 for ; Tue, 31 Mar 2026 11:33:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.113 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774956840; cv=none; b=u3GTgHc9J5Q6PuvjD9p2PqJSM/cI3n3n76Xn5xWeRnGEE6NMQGKaxL2Qbn5rW6AFyESQNWAiKQwqUA81JSJNslp1IPtHSo7v9vPOd6l5GLJfXYm647Yopd7EoaFdJghrHvvm4VaJ/6inMjGET2HEB4f2Q/R9oQS06PBhuuLj15s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774956840; c=relaxed/simple; bh=qSQTKmFatG7y7DFIJ0d6KtLoocQeO0QaRbWFAqrfiOM=; h=Cc:Subject:To:Message-Id:References:From:In-Reply-To:Content-Type: Date:Mime-Version; b=YIpYIp57n7FfbKJhps/a6hEx1Dnj55ZvMkgf+VVYTjPHDOmHvglQkXoF1RhU1/B8vsaFtOaNRAZXKBHUtbNyZnh5GNR4al4ytLiNI0ylT6VdqFSRRf5TSpvjrS18FazFF8rjMUbSWErRqdbJmsyHcWBK3cOPsqg5MN3oEXuRlcs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=K2R0bU6g; arc=none 
smtp.client-ip=209.127.230.113 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="K2R0bU6g" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1774956834; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=hJBc7tJV1GDZ0XkoFVNF8pqXPsHEQjYxsWQJKGn58QQ=; b=K2R0bU6g2V9rjFV2TZvTyjCp6UiEUxSGtKLtG2Bv/IKrX5tkDgQSzN4TOlpQYTVo7rb1Ss bpa2PsCYl0OdOvphbXvHZcyLtQ8tG2aXQ1iPGzZ/s1wNgzTQB5qyIVwFtCaOBbWzEeV8HN KDbEbmFNFOwZWoAE6U23oQyGb9cJNO5ZIrWp2QEWSpuBkwfjDkAkyuu17eAtsFP0s9nUl3 5ri60WOuoiP1mNq7YjQ17fZldjRFYncp5CWIkPpzpngQJKCjpoU9aH5nQUOUtRIL3QTXxH 8jNK/gOB/3Ivpl9rJ5Mk4iY09nibdFLH9M+aTJmS7w2pIu5XMeRl/QehB5TLbQ== Cc: , "Chuyi Zhou" Subject: [PATCH v4 11/12] x86/mm: Enable preemption during native_flush_tlb_multi To: , , , , , , , , , , , , Message-Id: <20260331113103.2197007-12-zhouchuyi@bytedance.com> X-Lms-Return-Path: X-Mailer: git-send-email 2.20.1 Content-Transfer-Encoding: quoted-printable References: <20260331113103.2197007-1-zhouchuyi@bytedance.com> From: "Chuyi Zhou" In-Reply-To: <20260331113103.2197007-1-zhouchuyi@bytedance.com> Date: Tue, 31 Mar 2026 19:31:02 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 X-Original-From: Chuyi Zhou Content-Type: text/plain; charset="utf-8" native_flush_tlb_multi() may be frequently called by flush_tlb_mm_range() and arch_tlbbatch_flush() in production environments. When pages are reclaimed or process exit, native_flush_tlb_multi() sends IPIs to remote CPUs and waits for all remote CPUs to complete their local TLB flushes. 
The overall latency may reach tens of milliseconds due to a large number of
remote CPUs and other factors (such as interrupts being disabled).
flush_tlb_mm_range() and arch_tlbbatch_flush() always disable preemption,
which may increase scheduling latency for other threads on the current CPU.

A previous patch converted flush_tlb_info from a per-cpu variable to an
on-stack variable. Additionally, it's no longer necessary to explicitly
disable preemption before calling smp_call*() since they internally handle
the preemption logic. Now it's safe to enable preemption during
native_flush_tlb_multi().

Signed-off-by: Chuyi Zhou --- arch/x86/kernel/kvm.c | 4 +++- arch/x86/mm/tlb.c | 9 +++++++-- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c index 3bc062363814..4f7f4c1149b9 100644 --- a/arch/x86/kernel/kvm.c +++ b/arch/x86/kernel/kvm.c @@ -668,8 +668,10 @@ static void kvm_flush_tlb_multi(const struct cpumask *= cpumask, u8 state; int cpu; struct kvm_steal_time *src; - struct cpumask *flushmask =3D this_cpu_cpumask_var_ptr(__pv_cpu_mask); + struct cpumask *flushmask; =20 + guard(preempt)(); + flushmask =3D this_cpu_cpumask_var_ptr(__pv_cpu_mask); cpumask_copy(flushmask, cpumask); /* * We have to call flush only on online vCPUs.
And diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index cfc3a72477f5..58c6f3d2f993 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1421,9 +1421,11 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsign= ed long start, if (mm_global_asid(mm)) { broadcast_tlb_flush(info); } else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) { + put_cpu(); info->trim_cpumask =3D should_trim_cpumask(mm); flush_tlb_multi(mm_cpumask(mm), info); consider_global_asid(mm); + goto invalidate; } else if (mm =3D=3D this_cpu_read(cpu_tlbstate.loaded_mm)) { lockdep_assert_irqs_enabled(); local_irq_disable(); @@ -1432,6 +1434,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigne= d long start, } =20 put_cpu(); +invalidate: mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end); } =20 @@ -1691,7 +1694,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_b= atch *batch) invlpgb_flush_all_nonglobals(); batch->unmapped_pages =3D false; } else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) { + put_cpu(); flush_tlb_multi(&batch->cpumask, &info); + goto clear; } else if (cpumask_test_cpu(cpu, &batch->cpumask)) { lockdep_assert_irqs_enabled(); local_irq_disable(); @@ -1699,9 +1704,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_b= atch *batch) local_irq_enable(); } =20 - cpumask_clear(&batch->cpumask); - put_cpu(); +clear: + cpumask_clear(&batch->cpumask); } =20 /* --=20 2.20.1 From nobody Wed Apr 1 10:02:08 2026 Received: from va-1-115.ptr.blmpb.com (va-1-115.ptr.blmpb.com [209.127.230.115]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3133F31715D for ; Tue, 31 Mar 2026 11:34:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.115 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774956852; cv=none; 
b=DvcYv/MoxtMgFDNs4LaF+7smRj8NnBxYz5lcgp1z63e+Sif3By5YsmWR3Cvt+AmFxe0h77MsPYNUk/xE1RTPh2YEfnf8ObWvDbMtlI4eaQLqy2P6vK8IWoxsZULuyLysip6IQ5rAQd2vS2Hk1YD7pNqj6WM9mKn05+o1DDHTFfA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774956852; c=relaxed/simple; bh=SnTlhugGdX0bTE/87HbhO6BksKz0DZkq52gEATDYEPY=; h=To:Date:In-Reply-To:Content-Type:Subject:Cc:From:Message-Id: References:Mime-Version; b=MLMhXL5DWYxX6pD6b7bqx2A+s/kD5btVVbVMu4Sgcy63qYkf3P9O6xdIjC+ssGhacLk4z38VtvQCfadiN+T5AZwGhdApv+HFWD0ucFvM1wsa/shS2jxOSQxnKD5DZm0ia5DVQoxnCdWcTZLc4Keur+Cb0crGT80gHNuCW7sX4QI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=XTYe2BqR; arc=none smtp.client-ip=209.127.230.115 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="XTYe2BqR" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1774956847; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=3e681DmNzN1rxc8H3IUAJMYq13Sjy9zXysxvUnJSINE=; b=XTYe2BqR5Abak6QdZDJC5L6FCEcZGahcXrhU/eSmxFRH8oInibDb2CUc6YGrFjnWNW3Q9I /A7426sexxV0dB6O+a60O18xGazGNCsURvy6pNFL/kDxE9ZRc6g8kzabQ1X7tH/FA03/bj CCgRP572dub3JEs2FIEEcmI4WD/W45gMFH7vgDOgevBvqDFpICxhIbQb7O+h6udvvqjTJn J3IPhczKjPJpYAVkvx5XjOgtJD5mgFwYnldncYK6xMw7SlGjHtC4QYytdhIIpS7KysJe7M q2rwqb9wj825cMPqGB74bPo2tbQW2v05pR6bSSoBkLgQJjpBvcNWzQdjkcitZw== To: , , , , , , , , , , , , Date: Tue, 31 Mar 2026 19:31:03 +0800 Content-Transfer-Encoding: quoted-printable In-Reply-To: 
<20260331113103.2197007-1-zhouchuyi@bytedance.com> Subject: [PATCH v4 12/12] x86/mm: Enable preemption during flush_tlb_kernel_range X-Mailer: git-send-email 2.20.1 Cc: , "Chuyi Zhou" From: "Chuyi Zhou" Message-Id: <20260331113103.2197007-13-zhouchuyi@bytedance.com> References: <20260331113103.2197007-1-zhouchuyi@bytedance.com> X-Lms-Return-Path: Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 X-Original-From: Chuyi Zhou Content-Type: text/plain; charset="utf-8" flush_tlb_kernel_range() is invoked when a kernel memory mapping changes. On x86 platforms without the INVLPGB feature enabled, we need to send IPIs to every online CPU and synchronously wait for them to complete do_kernel_range_flush(). This process can be time-consuming due to factors such as a large number of CPUs or other issues (like interrupts being disabled). Because flush_tlb_kernel_range() always disables preemption, this can hurt the scheduling latency of other tasks on the current CPU. A previous patch converted flush_tlb_info from a per-cpu variable to an on-stack variable. Additionally, it is no longer necessary to explicitly disable preemption before calling smp_call*(), since those functions handle the preemption logic internally. It is now safe to enable preemption during flush_tlb_kernel_range(). Also, use raw_smp_processor_id() in get_flush_tlb_info() to avoid warnings from check_preemption_disabled().
Signed-off-by: Chuyi Zhou --- arch/x86/mm/tlb.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 58c6f3d2f993..c37cc9845abc 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1394,7 +1394,7 @@ static void get_flush_tlb_info(struct flush_tlb_info = *info, info->stride_shift =3D stride_shift; info->freed_tables =3D freed_tables; info->new_tlb_gen =3D new_tlb_gen; - info->initiating_cpu =3D smp_processor_id(); + info->initiating_cpu =3D raw_smp_processor_id(); info->trim_cpumask =3D 0; } =20 @@ -1461,6 +1461,8 @@ static void invlpgb_kernel_range_flush(struct flush_t= lb_info *info) { unsigned long addr, nr; =20 + guard(preempt)(); + for (addr =3D info->start; addr < info->end; addr +=3D nr << PAGE_SHIFT) { nr =3D (info->end - addr) >> PAGE_SHIFT; =20 @@ -1505,7 +1507,6 @@ void flush_tlb_kernel_range(unsigned long start, unsi= gned long end) { struct flush_tlb_info info; =20 - guard(preempt)(); get_flush_tlb_info(&info, NULL, start, end, PAGE_SHIFT, false, TLB_GENERATION_INVALID); =20 --=20 2.20.1
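Illustration only, not part of the patch: the effect of moving guard(preempt)() from flush_tlb_kernel_range() into invlpgb_kernel_range_flush() can be sketched in userspace. Everything prefixed mock_ below is hypothetical and stands in for kernel state. The guard is modelled with the GCC/Clang cleanup attribute, on the assumption that this is close enough to how scope-based guards behave: preemption is disabled at the declaration and re-enabled when the enclosing scope exits.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's preempt count. */
static int mock_preempt_count;
/* Preempt count sampled while the mock "waits for IPIs". */
static int mock_count_during_ipi_wait;
/* Preempt count sampled inside the mock INVLPGB loop. */
static int mock_count_in_invlpgb;

/* Cleanup handler: runs when the guard variable goes out of scope. */
static void mock_preempt_enable(int *unused)
{
	(void)unused;
	assert(mock_preempt_count > 0);
	mock_preempt_count--;
}

/* Scope-based guard: disable now, re-enable at the end of the scope. */
#define mock_guard_preempt() \
	int mock_guard __attribute__((cleanup(mock_preempt_enable), unused)) = \
		(mock_preempt_count++, 0)

static void mock_wait_for_ipis(void)
{
	mock_count_during_ipi_wait = mock_preempt_count;
}

static void mock_invlpgb_loop(void)
{
	mock_count_in_invlpgb = mock_preempt_count;
}

/* Before: the guard sits at function scope and covers the IPI wait too. */
static void mock_flush_old(bool has_invlpgb)
{
	mock_guard_preempt();

	if (has_invlpgb)
		mock_invlpgb_loop();
	else
		mock_wait_for_ipis();
}

/* After: only the INVLPGB helper holds the guard. */
static void mock_invlpgb_flush_new(void)
{
	mock_guard_preempt();

	mock_invlpgb_loop();
}

static void mock_flush_new(bool has_invlpgb)
{
	if (has_invlpgb)
		mock_invlpgb_flush_new();
	else
		mock_wait_for_ipis();
}
```

Calling mock_flush_old(false) samples a non-zero count during the simulated IPI wait, whereas mock_flush_new(false) samples zero. With has_invlpgb set, the new layout still holds the guard inside the INVLPGB loop. In both layouts the count returns to zero once the function returns.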