From nobody Mon Apr 6 19:45:50 2026
Received: from va-1-113.ptr.blmpb.com (va-1-113.ptr.blmpb.com [209.127.230.113])
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3E89A2EA16A;
	Wed, 18 Mar 2026 04:57:47 +0000 (UTC)
From: "Chuyi Zhou"
Date: Wed, 18 Mar 2026 12:56:27 +0800
Subject: [PATCH v3 01/12] smp: Disable preemption explicitly in __csd_lock_wait
Message-Id: <20260318045638.1572777-2-zhouchuyi@bytedance.com>
In-Reply-To: <20260318045638.1572777-1-zhouchuyi@bytedance.com>
References: <20260318045638.1572777-1-zhouchuyi@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Later patches will enable preemption before csd_lock_wait(), which
could break csdlock_debug: if the waiter is preempted, the slices of
other tasks on the CPU may be accounted between the two
ktime_get_mono_fast_ns() calls and inflate the measured wait time.

Disable preemption explicitly in __csd_lock_wait(). This is a
preparation for the following patches.
Signed-off-by: Chuyi Zhou
Acked-by: Muchun Song
Reviewed-by: Steven Rostedt (Google)
---
 kernel/smp.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index f349960f79ca..fc1f7a964616 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -323,6 +323,8 @@ static void __csd_lock_wait(call_single_data_t *csd)
 	int bug_id = 0;
 	u64 ts0, ts1;
 
+	guard(preempt)();
+
 	ts1 = ts0 = ktime_get_mono_fast_ns();
 	for (;;) {
 		if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id, &nmessages))
-- 
2.20.1

From nobody Mon Apr 6 19:45:50 2026
Received: from va-1-114.ptr.blmpb.com (va-1-114.ptr.blmpb.com [209.127.230.114])
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id E37722EA16A;
	Wed, 18 Mar 2026 04:57:59 +0000 (UTC)
From: "Chuyi Zhou"
Date: Wed, 18 Mar 2026 12:56:28 +0800
Subject: [PATCH v3 02/12] smp: Enable preemption early in smp_call_function_single
Message-Id: <20260318045638.1572777-3-zhouchuyi@bytedance.com>
In-Reply-To: <20260318045638.1572777-1-zhouchuyi@bytedance.com>
References: <20260318045638.1572777-1-zhouchuyi@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Now smp_call_function_single() disables preemption mainly for the
following reasons:

- To protect the per-cpu csd_data from concurrent modification by other
  tasks on the current CPU in the !wait case. For the wait case,
  synchronization is not a concern, as an on-stack csd is used.

- To prevent the remote online CPU from being offlined. Specifically,
  we want to ensure that no new IPIs are queued after smpcfd_dying_cpu()
  has finished.
Disabling preemption for the entire execution is unnecessary; in
particular, the csd_lock_wait() part does not require preemption
protection. This patch enables preemption before csd_lock_wait() to
shrink the preemption-disabled critical section.

Signed-off-by: Chuyi Zhou
Reviewed-by: Muchun Song
Reviewed-by: Steven Rostedt (Google)
---
 kernel/smp.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index fc1f7a964616..b603d4229f95 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -685,11 +685,16 @@ int smp_call_function_single(int cpu, smp_call_func_t func, void *info,
 
 	err = generic_exec_single(cpu, csd);
 
+	/*
+	 * @csd is stack-allocated when @wait is true. No concurrent access
+	 * except from the IPI completion path, so we can re-enable preemption
+	 * early to reduce latency.
+	 */
+	put_cpu();
+
 	if (wait)
 		csd_lock_wait(csd);
 
-	put_cpu();
-
 	return err;
 }
 EXPORT_SYMBOL(smp_call_function_single);
-- 
2.20.1

From nobody Mon Apr 6 19:45:50 2026
Received: from va-1-111.ptr.blmpb.com (va-1-111.ptr.blmpb.com [209.127.230.111])
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id B32E82FD1A1;
	Wed, 18 Mar 2026 04:58:11 +0000 (UTC)
From: "Chuyi Zhou"
Date: Wed, 18 Mar 2026 12:56:29 +0800
Subject: [PATCH v3 03/12] smp: Remove get_cpu from smp_call_function_any
Message-Id: <20260318045638.1572777-4-zhouchuyi@bytedance.com>
In-Reply-To: <20260318045638.1572777-1-zhouchuyi@bytedance.com>
References: <20260318045638.1572777-1-zhouchuyi@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Now smp_call_function_single() enables preemption before
csd_lock_wait() to reduce the critical section. To let callers of
smp_call_function_any() benefit from this optimization as well, narrow
the get_cpu()/put_cpu() section in smp_call_function_any() so that,
for a remote target, the CPU reference is dropped before the call and
the subsequent wait.

Signed-off-by: Chuyi Zhou
Reviewed-by: Muchun Song
---
 kernel/smp.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index b603d4229f95..80daf9dd4a25 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -761,16 +761,26 @@ EXPORT_SYMBOL_GPL(smp_call_function_single_async);
 int smp_call_function_any(const struct cpumask *mask,
 			  smp_call_func_t func, void *info, int wait)
 {
+	bool local = true;
 	unsigned int cpu;
 	int ret;
 
-	/* Try for same CPU (cheapest) */
+	/*
+	 * Prevent migration to another CPU after selecting the current CPU
+	 * as the target.
+	 */
 	cpu = get_cpu();
-	if (!cpumask_test_cpu(cpu, mask))
+
+	/* Try for same CPU (cheapest) */
+	if (!cpumask_test_cpu(cpu, mask)) {
 		cpu = sched_numa_find_nth_cpu(mask, 0, cpu_to_node(cpu));
+		local = false;
+		put_cpu();
+	}
 
 	ret = smp_call_function_single(cpu, func, info, wait);
-	put_cpu();
+	if (local)
+		put_cpu();
 	return ret;
 }
 EXPORT_SYMBOL_GPL(smp_call_function_any);
-- 
2.20.1

From nobody Mon Apr 6 19:45:50 2026
Received: from va-1-114.ptr.blmpb.com (va-1-114.ptr.blmpb.com [209.127.230.114])
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 19DA82FDC2C;
	Wed, 18 Mar 2026 04:58:23 +0000 (UTC)
From: "Chuyi Zhou"
Message-Id: <20260318045638.1572777-5-zhouchuyi@bytedance.com>
References: <20260318045638.1572777-1-zhouchuyi@bytedance.com>
Subject: [PATCH v3 04/12] smp: Use on-stack cpumask in smp_call_function_many_cond
Date: Wed, 18 Mar 2026 12:56:30 +0800
In-Reply-To: <20260318045638.1572777-1-zhouchuyi@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

This patch uses an on-stack cpumask to replace the percpu cfd cpumask
in smp_call_function_many_cond(). Note that when both
CONFIG_CPUMASK_OFFSTACK and PREEMPT_RT are enabled, allocating in a
preempt-disabled section would break RT; therefore, only do this when
CONFIG_CPUMASK_OFFSTACK=n.

This is a preparation for enabling preemption during csd_lock_wait()
in smp_call_function_many_cond().

Signed-off-by: Chuyi Zhou
Reviewed-by: Muchun Song
---
 kernel/smp.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 80daf9dd4a25..9728ba55944d 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -799,14 +799,25 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 					unsigned int scf_flags,
 					smp_cond_func_t cond_func)
 {
+	bool preemptible_wait = !IS_ENABLED(CONFIG_CPUMASK_OFFSTACK);
 	int cpu, last_cpu, this_cpu = smp_processor_id();
 	struct call_function_data *cfd;
 	bool wait = scf_flags & SCF_WAIT;
+	cpumask_var_t cpumask_stack;
+	struct cpumask *cpumask;
 	int nr_cpus = 0;
 	bool run_remote = false;
 
 	lockdep_assert_preemption_disabled();
 
+	cfd = this_cpu_ptr(&cfd_data);
+	cpumask = cfd->cpumask;
+
+	if (preemptible_wait) {
+		BUILD_BUG_ON(!alloc_cpumask_var(&cpumask_stack, GFP_ATOMIC));
+		cpumask = cpumask_stack;
+	}
+
 	/*
 	 * Can deadlock when called with interrupts disabled.
	 * We allow cpu's that are not yet online though, as no one else can

@@ -827,16 +838,15 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 
 	/* Check if we need remote execution, i.e., any CPU excluding this one. */
 	if (cpumask_any_and_but(mask, cpu_online_mask, this_cpu) < nr_cpu_ids) {
-		cfd = this_cpu_ptr(&cfd_data);
-		cpumask_and(cfd->cpumask, mask, cpu_online_mask);
-		__cpumask_clear_cpu(this_cpu, cfd->cpumask);
+		cpumask_and(cpumask, mask, cpu_online_mask);
+		__cpumask_clear_cpu(this_cpu, cpumask);
 
 		cpumask_clear(cfd->cpumask_ipi);
-		for_each_cpu(cpu, cfd->cpumask) {
+		for_each_cpu(cpu, cpumask) {
 			call_single_data_t *csd = per_cpu_ptr(cfd->csd, cpu);
 
 			if (cond_func && !cond_func(cpu, info)) {
-				__cpumask_clear_cpu(cpu, cfd->cpumask);
+				__cpumask_clear_cpu(cpu, cpumask);
 				continue;
 			}
 
@@ -887,13 +897,16 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	}
 
 	if (run_remote && wait) {
-		for_each_cpu(cpu, cfd->cpumask) {
+		for_each_cpu(cpu, cpumask) {
 			call_single_data_t *csd;
 
 			csd = per_cpu_ptr(cfd->csd, cpu);
 			csd_lock_wait(csd);
 		}
 	}
+
+	if (preemptible_wait)
+		free_cpumask_var(cpumask_stack);
 }
 
 /**
-- 
2.20.1

From nobody Mon Apr 6 19:45:50 2026
Received: from sg-1-104.ptr.blmpb.com (sg-1-104.ptr.blmpb.com [118.26.132.104])
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 030812FB08C;
	Wed, 18 Mar 2026 04:59:20 +0000 (UTC)
Subject: [PATCH v3 05/12] smp: Free call_function_data via RCU in smpcfd_dead_cpu
Date: Wed, 18 Mar 2026 12:56:31 +0800
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
From: "Chuyi Zhou"
Message-Id: <20260318045638.1572777-6-zhouchuyi@bytedance.com>
References:
 <20260318045638.1572777-1-zhouchuyi@bytedance.com>
In-Reply-To: <20260318045638.1572777-1-zhouchuyi@bytedance.com>
Content-Type: text/plain; charset="utf-8"

Use rcu_read_lock() to protect the csd in smp_call_function_many_cond()
and wait for all read-side critical sections to exit before releasing
the percpu csd data. This is a preparation for enabling preemption
during csd_lock_wait(); it prevents accessing cfd->csd data that
smpcfd_dead_cpu() has already freed.

Signed-off-by: Chuyi Zhou
---
 kernel/smp.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/smp.c b/kernel/smp.c
index 9728ba55944d..32c293d8be0e 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -77,6 +77,7 @@ int smpcfd_dead_cpu(unsigned int cpu)
 {
 	struct call_function_data *cfd = &per_cpu(cfd_data, cpu);
 
+	synchronize_rcu();
 	free_cpumask_var(cfd->cpumask);
 	free_cpumask_var(cfd->cpumask_ipi);
 	free_percpu(cfd->csd);
@@ -810,6 +811,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 
 	lockdep_assert_preemption_disabled();
 
+	rcu_read_lock();
 	cfd = this_cpu_ptr(&cfd_data);
 	cpumask = cfd->cpumask;
 
@@ -905,6 +907,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		}
 	}
 
+	rcu_read_unlock();
 	if (preemptible_wait)
 		free_cpumask_var(cpumask_stack);
 }
-- 
2.20.1

From nobody Mon Apr 6 19:45:50 2026
Received: from va-1-114.ptr.blmpb.com (va-1-114.ptr.blmpb.com [209.127.230.114])
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id D92E32D97AA;
	Wed, 18 Mar 2026 04:58:47 +0000 (UTC)
Date: Wed, 18 Mar 2026 12:56:32 +0800
From: "Chuyi Zhou"
X-Mailing-List: linux-kernel@vger.kernel.org
Subject: [PATCH v3
 06/12] smp: Enable preemption early in smp_call_function_many_cond
Message-Id: <20260318045638.1572777-7-zhouchuyi@bytedance.com>
In-Reply-To: <20260318045638.1572777-1-zhouchuyi@bytedance.com>
References: <20260318045638.1572777-1-zhouchuyi@bytedance.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Now smp_call_function_many_cond() disables preemption mainly for the
following reasons:

- To prevent the remote online CPU from going offline. Specifically,
  we want to ensure that no new csds are queued after smpcfd_dying_cpu()
  has finished. Therefore, preemption must be disabled until all
  necessary IPIs are sent.

- To prevent migration to another CPU, which also implicitly prevents
  the current CPU from going offline (since stop_machine requires
  preempting the current task to execute offline callbacks).

- To protect the per-cpu cfd_data from concurrent modification by other
  smp_call_*() calls on the current CPU. cfd_data contains cpumasks and
  per-cpu csds. Before enqueueing a csd, we block on csd_lock() to
  ensure the previous async csd->func() has completed, and then
  initialize csd->func and csd->info.

After sending the IPI, we spin-wait for the remote CPU to call
csd_unlock(). Actually, the csd_lock mechanism already guarantees csd
serialization. If preemption occurs during csd_lock_wait(), other
concurrent smp_call_function_many_cond() calls will simply block until
the previous csd->func() completes:

  task A                            task B
  csd->func = func_a
  send ipis
  preempted by B    ------------>
                                    csd_lock(csd); // block until last
                                                   // func_a finished
                                    csd->func = func_b;
                                    csd->info = info;
                                    ...
                                    send ipis
  switch back to A  <------------
  csd_lock_wait(csd); // block until remote finishes func_*

This patch enables preemption before csd_lock_wait(), which makes the
potentially long csd_lock_wait() preemptible and migratable. Note that
being migrated to another CPU and then calling csd_lock_wait() may
cause a use-after-free due to smpcfd_dead_cpu() running while the
original CPU goes offline. The previous patch used RCU to synchronize
csd_lock_wait() with smpcfd_dead_cpu() and prevent this UAF.

Signed-off-by: Chuyi Zhou
---
 kernel/smp.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 32c293d8be0e..18e7e4a8f1b6 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -801,7 +801,7 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 					smp_cond_func_t cond_func)
 {
 	bool preemptible_wait = !IS_ENABLED(CONFIG_CPUMASK_OFFSTACK);
-	int cpu, last_cpu, this_cpu = smp_processor_id();
+	int cpu, last_cpu, this_cpu;
 	struct call_function_data *cfd;
 	bool wait = scf_flags & SCF_WAIT;
 	cpumask_var_t cpumask_stack;
@@ -809,9 +809,9 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 	int nr_cpus = 0;
 	bool run_remote = false;
 
-	lockdep_assert_preemption_disabled();
-
 	rcu_read_lock();
+	this_cpu = get_cpu();
+
 	cfd = this_cpu_ptr(&cfd_data);
 	cpumask = cfd->cpumask;
 
@@ -898,6 +898,19 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		local_irq_restore(flags);
 	}
 
+	/*
+	 * We may block in csd_lock_wait() for a significant amount of time,
+	 * especially when interrupts are disabled or with a large number of
+	 * remote CPUs. Try to enable preemption before csd_lock_wait().
+	 *
+	 * Use the cpumask_stack instead of cfd->cpumask to avoid concurrent
+	 * modification from tasks on the same cpu.
+	 * If preemption occurs during csd_lock_wait, other concurrent
+	 * smp_call_function_many_cond() calls will simply block until the
+	 * previous csd->func() completes.
+	 */
+	if (preemptible_wait)
+		put_cpu();
+
 	if (run_remote && wait) {
 		for_each_cpu(cpu, cpumask) {
 			call_single_data_t *csd;
 
@@ -907,9 +920,11 @@ static void smp_call_function_many_cond(const struct cpumask *mask,
 		}
 	}
 
-	rcu_read_unlock();
-	if (preemptible_wait)
+	if (!preemptible_wait)
+		put_cpu();
+	else
 		free_cpumask_var(cpumask_stack);
+	rcu_read_unlock();
 }
 
 /**
-- 
2.20.1

From nobody Mon Apr 6 19:45:50 2026
Received: from va-1-114.ptr.blmpb.com (va-1-114.ptr.blmpb.com [209.127.230.114])
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 32ABA2FB08C;
	Wed, 18 Mar 2026 04:58:59 +0000 (UTC)
From: "Chuyi Zhou"
Date: Wed, 18 Mar 2026 12:56:33 +0800
Subject: [PATCH v3 07/12] smp: Remove preempt_disable from smp_call_function
Message-Id: <20260318045638.1572777-8-zhouchuyi@bytedance.com>
In-Reply-To: <20260318045638.1572777-1-zhouchuyi@bytedance.com>
References: <20260318045638.1572777-1-zhouchuyi@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Now smp_call_function_many_cond() internally handles the preemption
logic, so smp_call_function() does not need to explicitly disable
preemption. Remove the preempt_disable()/preempt_enable() pair from
smp_call_function().
Signed-off-by: Chuyi Zhou
Reviewed-by: Muchun Song
---
 kernel/smp.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index 18e7e4a8f1b6..f9c0028968ef 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -966,9 +966,8 @@ EXPORT_SYMBOL(smp_call_function_many);
  */
 void smp_call_function(smp_call_func_t func, void *info, int wait)
 {
-	preempt_disable();
-	smp_call_function_many(cpu_online_mask, func, info, wait);
-	preempt_enable();
+	smp_call_function_many_cond(cpu_online_mask, func, info,
+				    wait ? SCF_WAIT : 0, NULL);
 }
 EXPORT_SYMBOL(smp_call_function);
 
-- 
2.20.1

From nobody Mon Apr 6 19:45:50 2026
From: "Chuyi Zhou"
Subject: [PATCH v3 08/12] smp: Remove preempt_disable from on_each_cpu_cond_mask
Date: Wed, 18 Mar 2026 12:56:34 +0800
Message-Id: <20260318045638.1572777-9-zhouchuyi@bytedance.com>
In-Reply-To: <20260318045638.1572777-1-zhouchuyi@bytedance.com>

Now smp_call_function_many_cond() internally handles the preemption
logic, so on_each_cpu_cond_mask() does not need to explicitly disable
preemption. Remove the preempt_disable()/preempt_enable() pair from
on_each_cpu_cond_mask().
Signed-off-by: Chuyi Zhou
Reviewed-by: Muchun Song
---
 kernel/smp.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/smp.c b/kernel/smp.c
index f9c0028968ef..47c3b057f57f 100644
--- a/kernel/smp.c
+++ b/kernel/smp.c
@@ -1086,9 +1086,7 @@ void on_each_cpu_cond_mask(smp_cond_func_t cond_func, smp_call_func_t func,
 	if (wait)
 		scf_flags |= SCF_WAIT;
 
-	preempt_disable();
 	smp_call_function_many_cond(mask, func, info, scf_flags, cond_func);
-	preempt_enable();
 }
 EXPORT_SYMBOL(on_each_cpu_cond_mask);
 
-- 
2.20.1

From nobody Mon Apr 6 19:45:50 2026
From: "Chuyi Zhou"
Subject: [PATCH v3 09/12] scftorture: Remove preempt_disable in scftorture_invoke_one
Date: Wed, 18 Mar 2026 12:56:35 +0800
Message-Id: <20260318045638.1572777-10-zhouchuyi@bytedance.com>
In-Reply-To: <20260318045638.1572777-1-zhouchuyi@bytedance.com>

Previous patches make smp_call*() handle the preemption logic
internally. Now the preempt_disable() in most callers becomes
unnecessary and can therefore be removed. Remove the
preempt_disable()/preempt_enable() pairs in scftorture_invoke_one().
Signed-off-by: Chuyi Zhou
---
 kernel/scftorture.c | 9 +--------
 1 file changed, 1 insertion(+), 8 deletions(-)

diff --git a/kernel/scftorture.c b/kernel/scftorture.c
index 327c315f411c..b87215e40be5 100644
--- a/kernel/scftorture.c
+++ b/kernel/scftorture.c
@@ -364,8 +364,6 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 	}
 	if (use_cpus_read_lock)
 		cpus_read_lock();
-	else
-		preempt_disable();
 	switch (scfsp->scfs_prim) {
 	case SCF_PRIM_RESCHED:
 		if (IS_BUILTIN(CONFIG_SCF_TORTURE_TEST)) {
@@ -411,13 +409,10 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 		if (!ret) {
 			if (use_cpus_read_lock)
 				cpus_read_unlock();
-			else
-				preempt_enable();
+
 			wait_for_completion(&scfcp->scfc_completion);
 			if (use_cpus_read_lock)
 				cpus_read_lock();
-			else
-				preempt_disable();
 		} else {
 			scfp->n_single_rpc_ofl++;
 			scf_add_to_free_list(scfcp);
@@ -463,8 +458,6 @@ static void scftorture_invoke_one(struct scf_statistics *scfp, struct torture_ra
 	}
 	if (use_cpus_read_lock)
 		cpus_read_unlock();
-	else
-		preempt_enable();
 	if (allocfail)
 		schedule_timeout_idle((1 + longwait) * HZ); // Let no-wait handlers complete.
 	else if (!(torture_random(trsp) & 0xfff))
-- 
2.20.1

From nobody Mon Apr 6 19:45:50 2026
From: "Chuyi Zhou"
Subject: [PATCH v3 10/12] x86/mm: Move flush_tlb_info back to the stack
Date: Wed, 18 Mar 2026 12:56:36 +0800
Message-Id: <20260318045638.1572777-11-zhouchuyi@bytedance.com>
In-Reply-To: <20260318045638.1572777-1-zhouchuyi@bytedance.com>

Commit 3db6d5a5ecaf ("x86/mm/tlb: Remove 'struct flush_tlb_info' from
the stack") converted flush_tlb_info from a stack variable to a per-CPU
variable. This brought a performance improvement of around 3% in an
extreme test. However, it also required that all flush_tlb* operations
keep preemption disabled for their entire duration to prevent concurrent
modification of flush_tlb_info.

flush_tlb* needs to send IPIs to remote CPUs and synchronously wait for
all remote CPUs to complete their local TLB flushes. This process could
take tens of milliseconds when interrupts are disabled or when the
number of remote CPUs is large. To improve kernel real-time behavior,
this patch reverts flush_tlb_info back to a stack variable. This is a
preparation for enabling preemption during TLB flushes in the next
patch.
To evaluate the performance impact of this patch, use the following
program to reproduce the microbenchmark mentioned in commit 3db6d5a5ecaf
("x86/mm/tlb: Remove 'struct flush_tlb_info' from the stack"). The test
environment is an Ice Lake system (Intel(R) Xeon(R) Platinum 8336C) with
128 CPUs and 2 NUMA nodes:

#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/time.h>
#include <unistd.h>

#define NUM_OPS 1000000
#define NUM_THREADS 3
#define NUM_RUNS 5
#define PAGE_SIZE 4096

volatile int stop_threads = 0;

void *busy_wait_thread(void *arg)
{
	while (!stop_threads) {
		__asm__ volatile ("nop");
	}
	return NULL;
}

long long get_usec()
{
	struct timeval tv;

	gettimeofday(&tv, NULL);
	return tv.tv_sec * 1000000LL + tv.tv_usec;
}

int main()
{
	pthread_t threads[NUM_THREADS];
	char *addr;
	int i, r;

	addr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (addr == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}
	for (i = 0; i < NUM_THREADS; i++) {
		if (pthread_create(&threads[i], NULL, busy_wait_thread, NULL))
			exit(1);
	}
	printf("Running benchmark: %d runs, %d ops each, %d background threads\n",
	       NUM_RUNS, NUM_OPS, NUM_THREADS);
	for (r = 0; r < NUM_RUNS; r++) {
		long long start, end;

		start = get_usec();
		for (i = 0; i < NUM_OPS; i++) {
			addr[0] = 1;
			if (madvise(addr, PAGE_SIZE, MADV_DONTNEED)) {
				perror("madvise");
				exit(1);
			}
		}
		end = get_usec();
		double duration = (double)(end - start);
		double avg_lat = duration / NUM_OPS;
		printf("Run %d: Total time %.2f us, Avg latency %.4f us/op\n",
		       r + 1, duration, avg_lat);
	}
	stop_threads = 1;
	for (i = 0; i < NUM_THREADS; i++)
		pthread_join(threads[i], NULL);
	munmap(addr, PAGE_SIZE);
	return 0;
}

Using the per-CPU flush_tlb_info showed only a very marginal performance
advantage, approximately 1%.
                  base      on-stack
                  ----      ---------
 avg (usec/op)    5.9362    5.9956 (+1%)
 stddev           0.0240    0.0096

And for the mmtest/stress-ng-madvise test, which randomly calls madvise
on pages within a mmap range and triggers a large number of
high-frequency TLB flushes, no significant performance regression was
observed.

                              baseline              on-stack
 Amean bops-madvise-1      13.64 (  0.00%)     13.56 (  0.59%)
 Amean bops-madvise-2      27.32 (  0.00%)     27.26 (  0.24%)
 Amean bops-madvise-4      53.35 (  0.00%)     53.54 ( -0.35%)
 Amean bops-madvise-8     103.09 (  0.00%)    103.30 ( -0.20%)
 Amean bops-madvise-16    191.88 (  0.00%)    191.75 (  0.07%)
 Amean bops-madvise-32    287.98 (  0.00%)    291.01 * -1.05%*
 Amean bops-madvise-64    365.84 (  0.00%)    368.09 * -0.61%*
 Amean bops-madvise-128   422.72 (  0.00%)    423.47 ( -0.18%)
 Amean bops-madvise-256   435.61 (  0.00%)    435.63 ( -0.01%)

Signed-off-by: Chuyi Zhou
---
 arch/x86/mm/tlb.c | 124 ++++++++++++++++++----------------------------
 1 file changed, 49 insertions(+), 75 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index af43d177087e..4704200de3f0 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1373,71 +1373,30 @@ void flush_tlb_multi(const struct cpumask *cpumask,
  */
 unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
 
-static DEFINE_PER_CPU_SHARED_ALIGNED(struct flush_tlb_info, flush_tlb_info);
-
-#ifdef CONFIG_DEBUG_VM
-static DEFINE_PER_CPU(unsigned int, flush_tlb_info_idx);
-#endif
-
-static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
-			unsigned long start, unsigned long end,
-			unsigned int stride_shift, bool freed_tables,
-			u64 new_tlb_gen)
+void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
+			unsigned long end, unsigned int stride_shift,
+			bool freed_tables)
 {
-	struct flush_tlb_info *info = this_cpu_ptr(&flush_tlb_info);
+	int cpu = get_cpu();
 
-#ifdef CONFIG_DEBUG_VM
-	/*
-	 * Ensure that the following code is non-reentrant and flush_tlb_info
-	 * is not overwritten.
-	 * This means no TLB flushing is initiated by
-	 * interrupt handlers and machine-check exception handlers.
-	 */
-	BUG_ON(this_cpu_inc_return(flush_tlb_info_idx) != 1);
-#endif
+	struct flush_tlb_info info = {
+		.mm		= mm,
+		.stride_shift	= stride_shift,
+		.freed_tables	= freed_tables,
+		.trim_cpumask	= 0,
+		.initiating_cpu	= cpu,
+	};
 
-	/*
-	 * If the number of flushes is so large that a full flush
-	 * would be faster, do a full flush.
-	 */
 	if ((end - start) >> stride_shift > tlb_single_page_flush_ceiling) {
 		start = 0;
 		end = TLB_FLUSH_ALL;
 	}
 
-	info->start = start;
-	info->end = end;
-	info->mm = mm;
-	info->stride_shift = stride_shift;
-	info->freed_tables = freed_tables;
-	info->new_tlb_gen = new_tlb_gen;
-	info->initiating_cpu = smp_processor_id();
-	info->trim_cpumask = 0;
-
-	return info;
-}
-
-static void put_flush_tlb_info(void)
-{
-#ifdef CONFIG_DEBUG_VM
-	/* Complete reentrancy prevention checks */
-	barrier();
-	this_cpu_dec(flush_tlb_info_idx);
-#endif
-}
-
-void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
-			unsigned long end, unsigned int stride_shift,
-			bool freed_tables)
-{
-	struct flush_tlb_info *info;
-	int cpu = get_cpu();
-	u64 new_tlb_gen;
-	/* This is also a barrier that synchronizes with switch_mm(). */
-	new_tlb_gen = inc_mm_tlb_gen(mm);
+	info.new_tlb_gen = inc_mm_tlb_gen(mm);
 
-	info = get_flush_tlb_info(mm, start, end, stride_shift, freed_tables,
-				  new_tlb_gen);
+	info.start = start;
+	info.end = end;
 
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only
@@ -1445,19 +1404,18 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	 * flush_tlb_func_local() directly in this case.
 	 */
 	if (mm_global_asid(mm)) {
-		broadcast_tlb_flush(info);
+		broadcast_tlb_flush(&info);
 	} else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
-		info->trim_cpumask = should_trim_cpumask(mm);
-		flush_tlb_multi(mm_cpumask(mm), info);
+		info.trim_cpumask = should_trim_cpumask(mm);
+		flush_tlb_multi(mm_cpumask(mm), &info);
 		consider_global_asid(mm);
 	} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
-		flush_tlb_func(info);
+		flush_tlb_func(&info);
 		local_irq_enable();
 	}
 
-	put_flush_tlb_info();
 	put_cpu();
 	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
 }
@@ -1527,19 +1485,29 @@ static void kernel_tlb_flush_range(struct flush_tlb_info *info)
 
 void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
-	struct flush_tlb_info *info;
+	struct flush_tlb_info info = {
+		.mm		= NULL,
+		.stride_shift	= PAGE_SHIFT,
+		.freed_tables	= false,
+		.trim_cpumask	= 0,
+		.new_tlb_gen	= TLB_GENERATION_INVALID
+	};
 
 	guard(preempt)();
 
-	info = get_flush_tlb_info(NULL, start, end, PAGE_SHIFT, false,
-				  TLB_GENERATION_INVALID);
+	if ((end - start) >> PAGE_SHIFT > tlb_single_page_flush_ceiling) {
+		start = 0;
+		end = TLB_FLUSH_ALL;
+	}
 
-	if (info->end == TLB_FLUSH_ALL)
-		kernel_tlb_flush_all(info);
-	else
-		kernel_tlb_flush_range(info);
+	info.initiating_cpu = smp_processor_id(),
+	info.start = start;
+	info.end = end;
 
-	put_flush_tlb_info();
+	if (info.end == TLB_FLUSH_ALL)
+		kernel_tlb_flush_all(&info);
+	else
+		kernel_tlb_flush_range(&info);
 }
 
 /*
@@ -1707,12 +1675,19 @@ EXPORT_SYMBOL_FOR_KVM(__flush_tlb_all);
 
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 {
-	struct flush_tlb_info *info;
-
 	int cpu = get_cpu();
 
-	info = get_flush_tlb_info(NULL, 0, TLB_FLUSH_ALL, 0, false,
-				  TLB_GENERATION_INVALID);
+	struct flush_tlb_info info = {
+		.start		= 0,
+		.end		= TLB_FLUSH_ALL,
+		.mm		= NULL,
+		.stride_shift
+				= 0,
+		.freed_tables	= false,
+		.new_tlb_gen	= TLB_GENERATION_INVALID,
+		.initiating_cpu	= cpu,
+		.trim_cpumask	= 0,
+	};
+
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only
 	 * a local TLB flush is needed. Optimize this use-case by calling
@@ -1722,17 +1697,16 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 		invlpgb_flush_all_nonglobals();
 		batch->unmapped_pages = false;
 	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
-		flush_tlb_multi(&batch->cpumask, info);
+		flush_tlb_multi(&batch->cpumask, &info);
 	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
-		flush_tlb_func(info);
+		flush_tlb_func(&info);
 		local_irq_enable();
 	}
 
 	cpumask_clear(&batch->cpumask);
 
-	put_flush_tlb_info();
 	put_cpu();
 }
 
-- 
2.20.1

From nobody Mon Apr 6 19:45:50 2026
From: "Chuyi Zhou"
Subject: [PATCH v3 11/12] x86/mm: Enable preemption during native_flush_tlb_multi
Date: Wed, 18 Mar 2026 12:56:37 +0800
Message-Id: <20260318045638.1572777-12-zhouchuyi@bytedance.com>
In-Reply-To: <20260318045638.1572777-1-zhouchuyi@bytedance.com>

native_flush_tlb_multi() may be frequently called by
flush_tlb_mm_range() and arch_tlbbatch_flush() in
production environments. When pages are reclaimed or a process exits,
native_flush_tlb_multi() sends IPIs to remote CPUs and waits for all
remote CPUs to complete their local TLB flushes. The overall latency may
reach tens of milliseconds due to a large number of remote CPUs and
other factors (such as interrupts being disabled). flush_tlb_mm_range()
and arch_tlbbatch_flush() always disable preemption during this wait,
which may increase scheduling latency for other threads on the current
CPU.

The previous patch converted flush_tlb_info from a per-CPU variable back
to an on-stack variable. Additionally, it is no longer necessary to
explicitly disable preemption before calling smp_call*(), since those
functions handle the preemption logic internally. Now it is safe to
enable preemption during native_flush_tlb_multi().

Signed-off-by: Chuyi Zhou
---
 arch/x86/kernel/kvm.c | 4 +++-
 arch/x86/mm/tlb.c     | 9 +++++++--
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 3bc062363814..4f7f4c1149b9 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -668,8 +668,10 @@ static void kvm_flush_tlb_multi(const struct cpumask *cpumask,
 	u8 state;
 	int cpu;
 	struct kvm_steal_time *src;
-	struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
+	struct cpumask *flushmask;
 
+	guard(preempt)();
+	flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
 	cpumask_copy(flushmask, cpumask);
 	/*
 	 * We have to call flush only on online vCPUs.
 	 * And

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 4704200de3f0..73500376d185 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1406,9 +1406,11 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	if (mm_global_asid(mm)) {
 		broadcast_tlb_flush(&info);
 	} else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
+		put_cpu();
 		info.trim_cpumask = should_trim_cpumask(mm);
 		flush_tlb_multi(mm_cpumask(mm), &info);
 		consider_global_asid(mm);
+		goto invalidate;
 	} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
@@ -1417,6 +1419,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	}
 
 	put_cpu();
+invalidate:
 	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
 }
 
@@ -1697,7 +1700,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 		invlpgb_flush_all_nonglobals();
 		batch->unmapped_pages = false;
 	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
+		put_cpu();
 		flush_tlb_multi(&batch->cpumask, &info);
+		goto clear;
 	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
@@ -1705,9 +1710,9 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 		local_irq_enable();
 	}
 
-	cpumask_clear(&batch->cpumask);
-
 	put_cpu();
+clear:
+	cpumask_clear(&batch->cpumask);
 }
 
 /*
-- 
2.20.1

From nobody Mon Apr 6 19:45:50 2026
From: "Chuyi Zhou"
Subject: [PATCH v3 12/12] x86/mm: Enable preemption during flush_tlb_kernel_range
Date: Wed, 18 Mar 2026 12:56:38 +0800
Message-Id: <20260318045638.1572777-13-zhouchuyi@bytedance.com>
In-Reply-To: <20260318045638.1572777-1-zhouchuyi@bytedance.com>

flush_tlb_kernel_range() is invoked when a kernel memory mapping
changes. On x86 platforms without the INVLPGB feature enabled, we need
to send IPIs to every online CPU and synchronously wait for them to
complete do_kernel_range_flush(). This process can be time-consuming due
to factors such as a large number of CPUs or interrupts being disabled
on some of them. flush_tlb_kernel_range() always disables preemption
during this wait, which may affect the scheduling latency of other tasks
on the current CPU.

The previous patch converted flush_tlb_info from a per-CPU variable back
to an on-stack variable. Additionally, it is no longer necessary to
explicitly disable preemption before calling smp_call*(), since those
functions handle the preemption logic internally. Now it is safe to
enable preemption during flush_tlb_kernel_range().
Signed-off-by: Chuyi Zhou
---
 arch/x86/mm/tlb.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 73500376d185..b89949d4fb31 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1446,6 +1446,8 @@ static void invlpgb_kernel_range_flush(struct flush_tlb_info *info)
 {
 	unsigned long addr, nr;
 
+	guard(preempt)();
+
 	for (addr = info->start; addr < info->end; addr += nr << PAGE_SHIFT) {
 		nr = (info->end - addr) >> PAGE_SHIFT;
 
@@ -1496,14 +1498,12 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 		.new_tlb_gen	= TLB_GENERATION_INVALID
 	};
 
-	guard(preempt)();
-
 	if ((end - start) >> PAGE_SHIFT > tlb_single_page_flush_ceiling) {
 		start = 0;
 		end = TLB_FLUSH_ALL;
 	}
 
-	info.initiating_cpu = smp_processor_id(),
+	info.initiating_cpu = raw_smp_processor_id(),
 	info.start = start;
 	info.end = end;
 
-- 
2.20.1