From nobody Wed Jun 17 02:52:25 2026 Received: from va-1-111.ptr.blmpb.com (va-1-111.ptr.blmpb.com [209.127.230.111]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 191DF42982C for ; Tue, 16 Jun 2026 11:12:38 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.111 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608361; cv=none; b=OkKnsy1NbpcxITBiI72Fx9+4J9OZJCOFs0vcCzxAwiiBUDm740jVK9ORg3/UkXXPvkDrGvlALSQaoM30z6gzbDrb/eLiuU4ttAeg5ofkSl/+RdHQN+7BPVLgbj0tw48zaAl+K00mXuco9nOlDSFi2ldwj+2MzMW3oJzZV2b+7BA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608361; c=relaxed/simple; bh=Rg0SRjge25r54MuX+5e7uJUHi8h+jHWxgQTJZMA3Ybw=; h=References:Mime-Version:In-Reply-To:Message-Id:From:Subject:Date: Content-Type:To:Cc; b=XBIta4/hWXeFfhk323N7Gkxp6OhmVofj+nxlZKJLkrP6vmvnk+bw8JCeI97NvV6YpuhoWkE9vqBwVK6LEddbpzhBJ08abqXBGilNuJDKyKx5ydSV/ZkCcALegUAjzCWthM+jKhqYKT9ekmrnBhYF/DAr8/d9g6+mCu3B9C0q2jM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=X18AaafF; arc=none smtp.client-ip=209.127.230.111 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="X18AaafF" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1781608348; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; 
bh=0CuqQYjiGxOq8T4U5Tl4ROE+RqReBxMtjXyeROSsYsk=; b=X18AaafFELgY36HI8gTHEoHsgYxQYfA16Ed3p5SZx9g4F0EBQrP3AYZoiXmio8/kfgG6WS gc3noRgcqFmH008wVqyjqPB+XSwbn7zxdZm1BlAzgFPyDqTGBZATiIlmp7NcnsFiD2Bqtv +NZsQj7GFmM2Kqfwan4UCsDZUuM4K/Zha00q2GeqGYwU/tz6qoEgwpaYaW7y/7T+PHv3t0 D6SezlUhbJvjWgGqkuGi+QimGQ3OkJJQKMQCZNAkpLJNfSjVYqTe9iQ3hi402B1LV0/hb5 5E5GVZaTWhZrfz6K09BXLneckao7l/rSNqD+NuIJP+c4+v/dkHuoPID1ELtAWg== X-Lms-Return-Path: Content-Transfer-Encoding: quoted-printable References: <20260616111127.966468-1-zhouchuyi@bytedance.com> X-Original-From: Chuyi Zhou Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 In-Reply-To: <20260616111127.966468-1-zhouchuyi@bytedance.com> Message-Id: <20260616111127.966468-2-zhouchuyi@bytedance.com> From: "Chuyi Zhou" Subject: [PATCH v8 01/14] smp: Disable preemption explicitly in __csd_lock_wait() Date: Tue, 16 Jun 2026 19:11:14 +0800 X-Mailer: git-send-email 2.20.1 To: , , , , , , , , , , , , , Cc: , "Chuyi Zhou" Content-Type: text/plain; charset="utf-8"

Later patches will enable preemption before csd_lock_wait(), which could
break csdlock_debug: if the waiting task is preempted, the time slices of
other tasks on the CPU may be accounted between the
ktime_get_mono_fast_ns() calls. Disable preemption explicitly in
__csd_lock_wait() to avoid that.

This is preparation for the following patches.

Signed-off-by: Chuyi Zhou
Acked-by: Muchun Song
Reviewed-by: Steven Rostedt (Google)
Reviewed-by: Sebastian Andrzej Siewior
Tested-by: Paul E.
McKenney --- kernel/smp.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/smp.c b/kernel/smp.c index a0bb56bd8dda..b58975480e11 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -323,6 +323,8 @@ static void __csd_lock_wait(call_single_data_t *csd) int bug_id =3D 0; u64 ts0, ts1; =20 + guard(preempt)(); + ts1 =3D ts0 =3D ktime_get_mono_fast_ns(); for (;;) { if (csd_lock_wait_toolong(csd, ts0, &ts1, &bug_id, &nmessages)) --=20 2.20.1 From nobody Wed Jun 17 02:52:25 2026 Received: from va-1-112.ptr.blmpb.com (va-1-112.ptr.blmpb.com [209.127.230.112]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C89CE43C057 for ; Tue, 16 Jun 2026 11:12:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.112 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608392; cv=none; b=TJqqhkkomZi1y4j4ylAEVp5LGb1JGhAxl6wVHEXg+ouRWZVXsedctE9R73DzDwhES2GrgetPqTBmVx/kU6gl2Tsz8Rg1ZpRP1qncKDGZYdmUKQDc3yJ3WvotWNmMgM6qh2zz8AuBhltu637foKDACp3oaTlEd9mnVh+SAUAKPAc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608392; c=relaxed/simple; bh=1+/jf0lnbM/ox43aPM6n6bVNq3l3OoNlvzi1wsAOzMk=; h=To:Subject:Message-Id:Content-Type:Cc:In-Reply-To:From: Mime-Version:Date:References; b=tvpqXN/W0M/fiTXfZPRn//5BjjLX3gyRNDX/GNcvt+xwO+hJjBOkNTJS9RSINXHNay2a/xiDKMJfaeHrIKnUGcwxgFTCC/h+VfwSW28FomVUl2SIKcvuzqVa3QU2oVSd1Kd29ciJ2MojRrnb1EpQqT0HYuK/dlqV+aF7zDk7U/c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=G8AFQK0I; arc=none smtp.client-ip=209.127.230.112 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: 
smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="G8AFQK0I" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1781608369; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=IQWvTs8KMqCd06PqYrP8SQSRn9LOACFGe0wZnkGdl2E=; b=G8AFQK0IszFnn7Gt9tiAiD0ly6JjyoXb1taAYLxTchVfJMN1mbVAWwTtwSUc3JP0ScxomI G9IQX0MMM9+WFp67HQRWvcYueg4E/txBAIVO1GRW9vtSGyLyodgbj1pX2k+uoQHxDZqtt3 GQqr7O3b+zO6NS1JLI4kME8pjnzYlbvrQV/MYNCLkXgxC1IUesUlgwRd+iyOdtez9zAh5l FHVZfx3oHfOC4FHtAo0J5yUTI6FpD1TgVRKNPfTP8+D/7h5dLPMmxLtQOhMW1uNXrDsYqz N7i1E8GHjvBOCN5OwVvQKC6eY0Rf2GIGOKGbcAAViRP1snEKxNFcepNg4wUZkw== To: , , , , , , , , , , , , , Subject: [PATCH v8 02/14] smp: Enable preemption early in smp_call_function_single() Message-Id: <20260616111127.966468-3-zhouchuyi@bytedance.com> Cc: , "Chuyi Zhou" In-Reply-To: <20260616111127.966468-1-zhouchuyi@bytedance.com> From: "Chuyi Zhou" Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Date: Tue, 16 Jun 2026 19:11:15 +0800 X-Lms-Return-Path: Content-Transfer-Encoding: quoted-printable X-Mailer: git-send-email 2.20.1 References: <20260616111127.966468-1-zhouchuyi@bytedance.com> X-Original-From: Chuyi Zhou Content-Type: text/plain; charset="utf-8" Now smp_call_function_single() disables preemption mainly for the following reasons: - To protect the per-cpu csd_data from concurrent modification by other tasks on the current CPU in the !wait case. For the wait case, synchronization is not a concern as on-stack csd is used. - To prevent the remote online CPU from being offlined. Specifically, we want to ensure that no new IPIs are queued after smpcfd_dying_cpu() has finished. 
Disabling preemption for the entire execution is unnecessary. In
particular, the csd_lock_wait() part does not require preemption
protection. This patch re-enables preemption before csd_lock_wait() to
shorten the preemption-disabled critical section.

Signed-off-by: Chuyi Zhou
Reviewed-by: Muchun Song
Reviewed-by: Steven Rostedt (Google)
Reviewed-by: Sebastian Andrzej Siewior
Tested-by: Paul E. McKenney
--- kernel/smp.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/kernel/smp.c b/kernel/smp.c index b58975480e11..292eefadddbc 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -700,11 +700,16 @@ int smp_call_function_single(int cpu, smp_call_func_t= func, void *info, =20 err =3D generic_exec_single(cpu, csd); =20 + /* + * @csd is stack-allocated when @wait is true. No concurrent access + * except from the IPI completion path, so we can re-enable preemption + * early to reduce latency. + */ + put_cpu(); + if (wait) csd_lock_wait(csd); =20 - put_cpu(); - return err; } EXPORT_SYMBOL(smp_call_function_single); --=20 2.20.1 From nobody Wed Jun 17 02:52:25 2026 Received: from va-1-114.ptr.blmpb.com (va-1-114.ptr.blmpb.com [209.127.230.114]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9D4183BB105 for ; Tue, 16 Jun 2026 11:15:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.114 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608511; cv=none; b=i0VhdXi9z7V3sjBy3CTKazboA2ycSnW81nAieiBXV8xvE4n7oP57mxW+LRpkPZErx2QU9vHiIHgGsC/fbzmhoMxVSydRJR84sPdmseDfKyYPNbJ+cA1o3Z85t+2bbqxqOnQULcCzaT/4wgC+ZHj0WY/GVjL1paD8G5RDllaAwic= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608511; c=relaxed/simple; bh=DaoCTKAEmmS1xb24C8REU2CZ3tyO1voA8knm3HDbnMY=; h=Date:Message-Id:In-Reply-To:To:Cc:From:References:Content-Type: Subject:Mime-Version; 
b=GcEGc01feZAfAkccOmSbgVhNuoIkToAvzo7MD/N5iGIiHf9wC6UZqV7qM/ZM+aHYohcFKewJyigZmBuesXkrXtPrkM3+c2s+eBpxrQOBTnNdYar7ryGv28slItPTzduNaWFrv+6A9t9h16Dp5jzGDsgRhzkZo9xyHmiWpq5npXQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=mGQuDfKG; arc=none smtp.client-ip=209.127.230.114 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="mGQuDfKG" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1781608385; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=wePV7J6zUH0vCzblJqncGcHwvRMzxtg0nNGmKrGLCi8=; b=mGQuDfKGmNXZFswwwr5QaT604jm0ICNPE01pd8wzheUfDW8Lfo+0O6KrmFiRgh+PJlVUcj 3fJrRsoq+VErnDKQNNL5tawPwNZuuiOfwtG3EX0xtcRclH824dYcKSeZ/i/3JANYcXkYpH vd4EskFbZM9/RUKuR9ji3m/wf9iXMGZFXNoDRCjU/kuXtVFLikE7lO5j6NYL1tHD5kVPdd 8fZTAs1rlxSQNqKufCQMuG0oKj4/WCRmhugSaOo7QujTEFnGzBfWMjpt4TN6pPehe+JMYe JfTP4QWHHotb1/vM4pbs/580qn5m80/vwC098IF8AwibqK1sgxfv0mdJkQSwQw== Date: Tue, 16 Jun 2026 19:11:16 +0800 Message-Id: <20260616111127.966468-4-zhouchuyi@bytedance.com> Content-Transfer-Encoding: quoted-printable In-Reply-To: <20260616111127.966468-1-zhouchuyi@bytedance.com> To: , , , , , , , , , , , , , Cc: , "Chuyi Zhou" From: "Chuyi Zhou" References: <20260616111127.966468-1-zhouchuyi@bytedance.com> X-Original-From: Chuyi Zhou X-Mailer: git-send-email 2.20.1 Subject: [PATCH v8 03/14] smp: Refactor remote CPU selection in smp_call_function_any() Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: 
List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 X-Lms-Return-Path: Content-Type: text/plain; charset="utf-8"

Currently, smp_call_function_any() disables preemption across the entire
process of picking a target CPU, enqueueing the IPI, and synchronously
waiting for the remote CPU. Since smp_call_function_single() has already
been optimized to re-enable preemption before the synchronous
csd_lock_wait(), callers of smp_call_function_any() should also benefit
from this optimization to reduce the preemption-disabled critical section.

A naive approach would be to simply remove get_cpu() and put_cpu() from
smp_call_function_any(), leaving the preemption disabling entirely to
smp_call_function_single(). However, doing so opens a dangerous preemption
window between picking the remote CPU (e.g., via
sched_numa_find_nth_cpu()) and dispatching the IPI inside
smp_call_function_single(). If the selected remote CPU is fully offlined
during this window, smp_call_function_single() will fail its cpu_online()
check and return -ENXIO directly to the caller, violating the guarantee to
execute on *any* online CPU in the mask.

To enable this optimization safely, refactor smp_call_function_any() and
smp_call_function_single(): move the remote CPU selection into a common
helper, __smp_call_function_single(), so that the entire selection and IPI
dispatch process stays within a single preemption-disabled region.

Signed-off-by: Chuyi Zhou
Reviewed-by: Sebastian Andrzej Siewior
Tested-by: Paul E. McKenney
--- kernel/smp.c | 48 ++++++++++++++++++++++++++---------------------- 1 file changed, 26 insertions(+), 22 deletions(-) diff --git a/kernel/smp.c b/kernel/smp.c index 292eefadddbc..9e9dab3b0d51 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -641,17 +641,8 @@ void flush_smp_call_function_queue(void) local_irq_restore(flags); } =20 -/** - * smp_call_function_single - Run a function on a specific CPU - * @cpu: Specific target CPU for this function.
- * @func: The function to run. This must be fast and non-blocking. - * @info: An arbitrary pointer to pass to the function. - * @wait: If true, wait until function has completed on other CPUs. - * - * Returns: %0 on success, else a negative status code. - */ -int smp_call_function_single(int cpu, smp_call_func_t func, void *info, - int wait) +static int __smp_call_function_single(int cpu, smp_call_func_t func, + void *info, const struct cpumask *mask, int wait) { call_single_data_t *csd; call_single_data_t csd_stack =3D { @@ -668,6 +659,14 @@ int smp_call_function_single(int cpu, smp_call_func_t = func, void *info, */ this_cpu =3D get_cpu(); =20 + if (mask) { + /* Try for same CPU (cheapest) */ + if (!cpumask_test_cpu(this_cpu, mask)) + cpu =3D sched_numa_find_nth_cpu(mask, 0, cpu_to_node(this_cpu)); + else + cpu =3D this_cpu; + } + /* * Can deadlock when called with interrupts disabled. * We allow cpu's that are not yet online though, as no one else can @@ -712,6 +711,21 @@ int smp_call_function_single(int cpu, smp_call_func_t = func, void *info, =20 return err; } + +/** + * smp_call_function_single - Run a function on a specific CPU + * @cpu: Specific target CPU for this function. + * @func: The function to run. This must be fast and non-blocking. + * @info: An arbitrary pointer to pass to the function. + * @wait: If true, wait until function has completed on other CPUs. + * + * Returns: %0 on success, else a negative status code. 
+ */ +int smp_call_function_single(int cpu, smp_call_func_t func, void *info, + int wait) +{ + return __smp_call_function_single(cpu, func, info, NULL, wait); +} EXPORT_SYMBOL(smp_call_function_single); =20 /** @@ -776,17 +790,7 @@ EXPORT_SYMBOL_GPL(smp_call_function_single_async); int smp_call_function_any(const struct cpumask *mask, smp_call_func_t func, void *info, int wait) { - unsigned int cpu; - int ret; - - /* Try for same CPU (cheapest) */ - cpu =3D get_cpu(); - if (!cpumask_test_cpu(cpu, mask)) - cpu =3D sched_numa_find_nth_cpu(mask, 0, cpu_to_node(cpu)); - - ret =3D smp_call_function_single(cpu, func, info, wait); - put_cpu(); - return ret; + return __smp_call_function_single(-1, func, info, mask, wait); } EXPORT_SYMBOL_GPL(smp_call_function_any); =20 --=20 2.20.1 From nobody Wed Jun 17 02:52:25 2026 Received: from va-1-113.ptr.blmpb.com (va-1-113.ptr.blmpb.com [209.127.230.113]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 64B88436344 for ; Tue, 16 Jun 2026 11:13:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.113 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608409; cv=none; b=Q314AKhv0cOF8QT4Bez2kSMOjW6NN1GY1Gb792H7mSRvVNSZ9A51vA95euVGshdYVsNTUm/akY4LqVBdJlzhAZatKYtEhO+hypFXM5CmHXQzUFXAPptlwBua8+HQuqCSwXUe+cF5aUsxDkTCfpgMsE6LmXRqoNBb7Z9pni5Lm3w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608409; c=relaxed/simple; bh=XQTkOFY96VxR6Fp/1Yh+/8uhD0S4Q/6rgi7kmACRNoE=; h=Date:References:Cc:Subject:Message-Id:Mime-Version:From: Content-Type:To:In-Reply-To; b=tw7AuSwDRtFfIeE6HpMtayjygzYIMkBGA+Hk3r+PrSVucWmY4n7ITH+96fdUFr+5hmH8UNVcpNPsrtODK/iNOmsknUTYglUI/ssGAa2nq4fvcsnOEFciu1/BsxKQGAbSwd34i/Y39bpgnbUyEwVqiLOwiWx+aNsQo1uuDl854K0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) 
header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=pLrh+Oup; arc=none smtp.client-ip=209.127.230.113 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="pLrh+Oup" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1781608403; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=xWP2euNy/iozxWMwiQ0p6EryNIapTq0UdR8owm37jAk=; b=pLrh+OupsGYpjHUziDzYonCYZOivqg/fpcOoRkLZGeqi3gFpBH+zY8tpk3LmtpoH3+WzSp iBUHYOmvJGpxD6GF8IH0SuA3bCHQg5TMXgu+bnFEkfjq3NPNg345pIxxiGv0v5MfuaJV0b lUyusVmeldCYc3o7LNRAPH0+gIjn32tx6UcWQDEH/R19iotGSM8K/2IPghAlohFi6OSjzB ZzMrsP7VRmS8RLnpPhbzFMEqTKOoD7z6Eikc27sZa/JsFTFZcwNuxVwVNpKu7kcOcAVgiI w94X4BWpFVbmfWM7Uac/OzO+iR3a6SehDHYd0ShP5Ybn1fn2jwuw559JMUTTbQ== Date: Tue, 16 Jun 2026 19:11:17 +0800 References: <20260616111127.966468-1-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 X-Lms-Return-Path: Cc: , "Chuyi Zhou" Subject: [PATCH v8 04/14] smp: Use task-local IPI cpumask in smp_call_function_many_cond() X-Original-From: Chuyi Zhou Message-Id: <20260616111127.966468-5-zhouchuyi@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 From: "Chuyi Zhou" Content-Transfer-Encoding: quoted-printable To: , , , , , , , , , , , , , In-Reply-To: <20260616111127.966468-1-zhouchuyi@bytedance.com> Content-Type: text/plain; charset="utf-8" This patch prepares the task-local IPI cpumask during thread creation, and uses the local cpumask to replace the percpu cfd cpumask in 
smp_call_function_many_cond(). A later patch enables preemption during
csd_lock_wait(), and the task-local mask avoids concurrent access to
cfd->cpumask by other tasks on the current CPU.

When cpumask_size() is smaller than or equal to the pointer size, the
cpumask is stashed in the pointer itself to avoid an extra memory
allocation.

Signed-off-by: Chuyi Zhou
Reviewed-by: Sebastian Andrzej Siewior
Tested-by: Paul E. McKenney
--- include/linux/sched.h | 6 ++++ include/linux/smp.h | 11 ++++++++ kernel/fork.c | 9 +++++- kernel/smp.c | 66 +++++++++++++++++++++++++++++++++++++++---- 4 files changed, 85 insertions(+), 7 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 35e6183ef615..c76c4c6c6b19 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1364,6 +1364,12 @@ struct task_struct { struct list_head perf_event_list; struct perf_ctx_data __rcu *perf_ctx_data; #endif +#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPTION) + union { + cpumask_t *ipi_mask_ptr; + unsigned long ipi_mask_val; + }; +#endif #ifdef CONFIG_DEBUG_PREEMPT unsigned long preempt_disable_ip; #endif diff --git a/include/linux/smp.h b/include/linux/smp.h index 6925d15ccaa7..15da884114cb 100644 --- a/include/linux/smp.h +++ b/include/linux/smp.h @@ -239,6 +239,17 @@ static inline int get_boot_cpu_id(void) =20 #endif /* !SMP */ =20 +#if defined(CONFIG_PREEMPTION) && defined(CONFIG_SMP) +int smp_task_ipi_mask_alloc(struct task_struct *task); +void smp_task_ipi_mask_free(struct task_struct *task); +#else +static inline int smp_task_ipi_mask_alloc(struct task_struct *task) +{ + return 0; +} +static inline void smp_task_ipi_mask_free(struct task_struct *task) { } +#endif + /* * raw_smp_processor_id() - get the current (unstable) CPU id * diff --git a/kernel/fork.c b/kernel/fork.c index 6fcca1db0af3..37f8343a3b74 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -535,6 +535,7 @@ void free_task(struct task_struct *tsk) #endif release_user_cpus_ptr(tsk); 
scs_release(tsk); + smp_task_ipi_mask_free(tsk); =20 #ifndef CONFIG_THREAD_INFO_IN_TASK /* @@ -933,10 +934,14 @@ static struct task_struct *dup_task_struct(struct tas= k_struct *orig, int node) #endif account_kernel_stack(tsk, 1); =20 - err =3D scs_prepare(tsk, node); + err =3D smp_task_ipi_mask_alloc(tsk); if (err) goto free_stack; =20 + err =3D scs_prepare(tsk, node); + if (err) + goto free_ipi_mask; + #ifdef CONFIG_SECCOMP /* * We must handle setting up seccomp filters once we're under @@ -1007,6 +1012,8 @@ static struct task_struct *dup_task_struct(struct tas= k_struct *orig, int node) #endif return tsk; =20 +free_ipi_mask: + smp_task_ipi_mask_free(tsk); free_stack: exit_task_stack_account(tsk); free_thread_stack(tsk); diff --git a/kernel/smp.c b/kernel/smp.c index 9e9dab3b0d51..8f8a9ee2ad11 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include #include @@ -794,6 +795,49 @@ int smp_call_function_any(const struct cpumask *mask, } EXPORT_SYMBOL_GPL(smp_call_function_any); =20 +static DEFINE_STATIC_KEY_FALSE(ipi_mask_inlined); + +#ifdef CONFIG_PREEMPTION + +int smp_task_ipi_mask_alloc(struct task_struct *task) +{ + if (static_branch_unlikely(&ipi_mask_inlined)) + return 0; + + task->ipi_mask_ptr =3D kmalloc(cpumask_size(), GFP_KERNEL); + if (!task->ipi_mask_ptr) + return -ENOMEM; + + return 0; +} + +void smp_task_ipi_mask_free(struct task_struct *task) +{ + if (static_branch_unlikely(&ipi_mask_inlined)) + return; + + kfree(task->ipi_mask_ptr); +} + +static cpumask_t *smp_task_ipi_mask(struct task_struct *cur) +{ + /* + * If cpumask_size() is smaller than or equal to the pointer + * size, it stashes the cpumask in the pointer itself to + * avoid extra memory allocations. 
+ */ + if (static_branch_unlikely(&ipi_mask_inlined)) + return (cpumask_t *)&cur->ipi_mask_val; + + return cur->ipi_mask_ptr; +} +#else +static cpumask_t *smp_task_ipi_mask(struct task_struct *cur) +{ + return NULL; +} +#endif + /* * Flags to be used as scf_flags argument of smp_call_function_many_cond(). * @@ -811,11 +855,19 @@ static void smp_call_function_many_cond(const struct = cpumask *mask, int cpu, last_cpu, this_cpu =3D smp_processor_id(); struct call_function_data *cfd; bool wait =3D scf_flags & SCF_WAIT; + struct cpumask *cpumask, *task_mask; int nr_cpus =3D 0; bool run_remote =3D false; =20 lockdep_assert_preemption_disabled(); =20 + task_mask =3D smp_task_ipi_mask(current); + cfd =3D this_cpu_ptr(&cfd_data); + if (task_mask) + cpumask =3D task_mask; + else + cpumask =3D cfd->cpumask; + /* * Can deadlock when called with interrupts disabled. * We allow cpu's that are not yet online though, as no one else can @@ -836,16 +888,15 @@ static void smp_call_function_many_cond(const struct = cpumask *mask, =20 /* Check if we need remote execution, i.e., any CPU excluding this one. 
*/ if (cpumask_any_and_but(mask, cpu_online_mask, this_cpu) < nr_cpu_ids) { - cfd =3D this_cpu_ptr(&cfd_data); - cpumask_and(cfd->cpumask, mask, cpu_online_mask); - __cpumask_clear_cpu(this_cpu, cfd->cpumask); + cpumask_and(cpumask, mask, cpu_online_mask); + __cpumask_clear_cpu(this_cpu, cpumask); =20 cpumask_clear(cfd->cpumask_ipi); - for_each_cpu(cpu, cfd->cpumask) { + for_each_cpu(cpu, cpumask) { call_single_data_t *csd =3D per_cpu_ptr(cfd->csd, cpu); =20 if (cond_func && !cond_func(cpu, info)) { - __cpumask_clear_cpu(cpu, cfd->cpumask); + __cpumask_clear_cpu(cpu, cpumask); continue; } =20 @@ -896,7 +947,7 @@ static void smp_call_function_many_cond(const struct cp= umask *mask, } =20 if (run_remote && wait) { - for_each_cpu(cpu, cfd->cpumask) { + for_each_cpu(cpu, cpumask) { call_single_data_t *csd; =20 csd =3D per_cpu_ptr(cfd->csd, cpu); @@ -1010,6 +1061,9 @@ EXPORT_SYMBOL(nr_cpu_ids); void __init setup_nr_cpu_ids(void) { set_nr_cpu_ids(find_last_bit(cpumask_bits(cpu_possible_mask), NR_CPUS) + = 1); + + if (IS_ENABLED(CONFIG_PREEMPTION) && cpumask_size() <=3D sizeof(unsigned = long)) + static_branch_enable(&ipi_mask_inlined); } =20 /* Called by boot processor to activate the rest. 
*/ --=20 2.20.1 From nobody Wed Jun 17 02:52:25 2026 Received: from va-1-115.ptr.blmpb.com (va-1-115.ptr.blmpb.com [209.127.230.115]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DA89D43C056 for ; Tue, 16 Jun 2026 11:13:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.115 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608434; cv=none; b=X17efmvs8oYWUt1Z946wtUAbQh1+GsWA/x4L6Mrkn69jSH9NettiYU4+s568ZcFKLfHRVNnam4UgN9pqp/kf4KZnzai+1vp6NHs9OwE90bebAUVtcpJ44zJp9MB9I+GwnyECnZvWvgQzYwMJGtDSPerAk8JQGAYJWcGoy4oblss= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608434; c=relaxed/simple; bh=T2xs9v4QNzc+YxYskBzD2NeNfY0oZrIUShTHyOtgYn0=; h=Date:Content-Type:Subject:Message-Id:Mime-Version:To:Cc:From: References:In-Reply-To; b=swhUUQ8YfRSe1eRk/Vzv2mBAZnLt+h2epLQOkBcjGRnd9GIp0068pzy1XZTT4qAZBeIVsGmZgL92hJok5Jg/hWTPjAycDojCvUR7PMmZRy2tN4qRBg2CeBvmMvVspc5qfqCmUYsXTx+GenditpMSIXseUwJDGxy2HCJoqbRU5BQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=edSiK075; arc=none smtp.client-ip=209.127.230.115 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="edSiK075" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1781608422; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; 
bh=pO46IL8QWb9ZB8R4f6WGAlcAMqyemLMdaVa3NlmJ4ec=; b=edSiK075K9KfQtYM86akK9fPP0CVSOcDbdXhVbAOuUwl6oBUIKlgPlSxE2xkxatT4xvJzc bt80/AM8LcHxHustoDsHD476L9TgvdVAODxmgXRj8hOiobvaQd4TUyeCc4RpiSu7WwcWmv TyoyCrP+sBOVzuxGvtSRaO4T0t6jhpxMBnf8XwsZ6V+89eFn7foZGsLyUjjmC9T90x3wA/ d9BNqxicb4+N1qWIf/kKsCrAFBy6GMMyX+H8eA4V7lU7arMr3ZNGDLagOJk9EsRTFHp1oT TWFf2mIr/c8wkCnkK5RyGi1DxZOWmVopo3rWQijDsMYtFLIo4mlaym7xdnR0yw== Date: Tue, 16 Jun 2026 19:11:18 +0800 X-Lms-Return-Path: X-Mailer: git-send-email 2.20.1 Subject: [PATCH v8 05/14] smp: Alloc percpu csd data in smpcfd_prepare_cpu() only once Message-Id: <20260616111127.966468-6-zhouchuyi@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 X-Original-From: Chuyi Zhou To: , , , , , , , , , , , , , Cc: , "Chuyi Zhou" From: "Chuyi Zhou" References: <20260616111127.966468-1-zhouchuyi@bytedance.com> In-Reply-To: <20260616111127.966468-1-zhouchuyi@bytedance.com> Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8"

A later patch enables preemption during csd_lock_wait() in
smp_call_function_many_cond(), which may lead to accesses to cfd->csd data
that has already been freed in smpcfd_dead_cpu().

One way to fix this is to protect the csd data with RCU and wait for all
read-side critical sections to exit before freeing the memory in
smpcfd_dead_cpu(), but that could delay CPU shutdown. This patch takes a
simpler approach: allocate the percpu csd only once, on the CPU bring-up
path, and skip freeing the csd memory in smpcfd_dead_cpu().

Suggested-by: Sebastian Andrzej Siewior
Signed-off-by: Chuyi Zhou
Acked-by: Muchun Song
Reviewed-by: Sebastian Andrzej Siewior
Tested-by: Paul E.
McKenney --- kernel/smp.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/kernel/smp.c b/kernel/smp.c index 8f8a9ee2ad11..9ef136bacda0 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -64,7 +64,15 @@ int smpcfd_prepare_cpu(unsigned int cpu) free_cpumask_var(cfd->cpumask); return -ENOMEM; } - cfd->csd =3D alloc_percpu(call_single_data_t); + + /* + * The percpu csd is allocated only once and never freed. + * This ensures that smp_call_function_many_cond() can safely + * access the csd of an offlined CPU if it gets preempted + * during csd_lock_wait(). + */ + if (!cfd->csd) + cfd->csd =3D alloc_percpu(call_single_data_t); if (!cfd->csd) { free_cpumask_var(cfd->cpumask); free_cpumask_var(cfd->cpumask_ipi); @@ -80,7 +88,6 @@ int smpcfd_dead_cpu(unsigned int cpu) =20 free_cpumask_var(cfd->cpumask); free_cpumask_var(cfd->cpumask_ipi); - free_percpu(cfd->csd); return 0; } =20 --=20 2.20.1 From nobody Wed Jun 17 02:52:25 2026 Received: from va-1-113.ptr.blmpb.com (va-1-113.ptr.blmpb.com [209.127.230.113]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E09153B841D for ; Tue, 16 Jun 2026 11:14:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.113 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608447; cv=none; b=uMJWRPIxzrljMJuLMq6C6uKVazeTVMm+7WYwiUS/UgmMjdDL2+yJu53iXR+zatgqIqqo1/j5fMZO0pVAkB/6wmDW+4UrfxGxh/qdTxE3z6jVY56BOM4vRqBsVg1GcVz4u+AoEFNMaHFod+0j5KQtrcL2spnfdHtqQNR61UcfpvA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608447; c=relaxed/simple; bh=uRWkY2wQs8YgnNqUQ36nNxK+pjSBYXc+FXNwn9jsCVA=; h=Date:In-Reply-To:References:From:Cc:To:Content-Type:Message-Id: Mime-Version:Subject; 
b=MwuVwQHXN5n/NiztdFYlfrqk4h4FLiziwd86siBNLN/2MBXrl3xeumnGkrw3f1zTHhIu5WqnGWUdUQy0i2wV0HKOVBDqSetSHJ2cabi6YURYAUimW6qrhBxXaFjmwH7BE005Q2sF5CQu3uvKhNX2JexwbyJtdoA3umuD6UMMQe4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=VMOcyfSa; arc=none smtp.client-ip=209.127.230.113 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="VMOcyfSa" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1781608441; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=CtCQ19MYHoDYIqTakdxf2422YXxfy5+A3fzTiz/hZlo=; b=VMOcyfSafrBUJsUnlCjviVkaBAWtodXc0F598YE2e8PM1u2SpDuHA6JahJaBFko7dhistr n9f7jIyfUggY0k2W9y8kcQ2CisIcWDnwxygNs+zYuilWoLxz9NjGuMFY39MPMGsfHDnV24 i/TIO/K8cvIbdyRpZP+dICWQgh5ZaLxjn5zUK/sYYqLEvqi2NIr9+DMDdspFLHc5HO61Yt NViIdljGznLyH2Ls42qCe0Sfs05tjBYHyrxNYMOzDP7/FMi4OekLBBoKExMgWC/y3xneOW lMWB/duyNCFUmcO49K1oLzWzvz9nQdJ9RHbXy8K7ZQPElBaFKbNdE/xd5XlLdA== Date: Tue, 16 Jun 2026 19:11:19 +0800 In-Reply-To: <20260616111127.966468-1-zhouchuyi@bytedance.com> References: <20260616111127.966468-1-zhouchuyi@bytedance.com> From: "Chuyi Zhou" Cc: , "Chuyi Zhou" X-Original-From: Chuyi Zhou X-Mailer: git-send-email 2.20.1 To: , , , , , , , , , , , , , Message-Id: <20260616111127.966468-7-zhouchuyi@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-Lms-Return-Path: Subject: [PATCH v8 
06/14] smp: Enable preemption early in smp_call_function_many_cond()
Content-Type: text/plain; charset="utf-8"

Preemption was disabled for the whole of smp_call_function_many_cond()
primarily for the following reasons:

- To prevent the remote online CPU from going offline. Specifically, we
  want to ensure that no new csds are queued after smpcfd_dying_cpu() has
  finished. Therefore, preemption must be disabled until all necessary
  IPIs are sent.

- To prevent the current CPU from going offline. If the task is migrated
  to another CPU and then calls csd_lock_wait(), it may hit a
  use-after-free (UAF), because smpcfd_dead_cpu() frees the csd while the
  original CPU goes offline.

- To protect the per-cpu cfd_data from concurrent modification by other
  tasks on the current CPU. cfd_data contains cpumasks and per-cpu csds.
  Before enqueueing a csd, we block on csd_lock() to ensure the previous
  async csd->func() has completed, and then initialize csd->func and
  csd->info. After sending the IPI, we spin-wait for the remote CPU to
  call csd_unlock().

The csd_lock mechanism already guarantees csd serialization. If
preemption occurs during csd_lock_wait(), other concurrent
smp_call_function_many_cond() calls will simply block until the previous
csd->func() completes:

task A                          task B

csd->func =3D func_a
send ipis

            preempted by B
           --------------->
                                csd_lock(csd); // block until last
                                               // func_a finished

                                csd->func =3D func_b;
                                csd->info =3D info;
                                ...
                                send ipis

            switch back to A
           <---------------

csd_lock_wait(csd); // block until remote finish func_*

Previous patches replaced the per-cpu cfd->cpumask with a task-local
cpumask, and the percpu csd is now allocated only once and never freed,
so the csd can always be accessed safely. We can therefore enable
preemption before csd_lock_wait(), which makes the potentially
unpredictable csd_lock_wait() preemptible and migratable.

Signed-off-by: Chuyi Zhou
Reviewed-by: Sebastian Andrzej Siewior
Tested-by: Paul E.
McKenney --- kernel/smp.c | 22 +++++++++++++++++----- 1 file changed, 17 insertions(+), 5 deletions(-) diff --git a/kernel/smp.c b/kernel/smp.c index 9ef136bacda0..390e6526574c 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -859,15 +859,14 @@ static void smp_call_function_many_cond(const struct = cpumask *mask, unsigned int scf_flags, smp_cond_func_t cond_func) { - int cpu, last_cpu, this_cpu =3D smp_processor_id(); + int cpu, last_cpu, this_cpu; struct call_function_data *cfd; bool wait =3D scf_flags & SCF_WAIT; struct cpumask *cpumask, *task_mask; int nr_cpus =3D 0; bool run_remote =3D false; =20 - lockdep_assert_preemption_disabled(); - + this_cpu =3D get_cpu(); task_mask =3D smp_task_ipi_mask(current); cfd =3D this_cpu_ptr(&cfd_data); if (task_mask) @@ -953,6 +952,17 @@ static void smp_call_function_many_cond(const struct c= pumask *mask, local_irq_restore(flags); } =20 + /* + * Waiting for completion can take time, especially with many CPUs. + * On a PREEMPT kernel a per-task cpumask is used to track CPUs with + * pending IPI requests. This allows preemption to be enabled before + * waiting. On a !PREEMPT kernel the cpumask is shared and the call + * must block until completion to avoid modifications by another caller + * on this CPU. + */ + if (task_mask) + put_cpu(); + if (run_remote && wait) { for_each_cpu(cpu, cpumask) { call_single_data_t *csd; @@ -961,6 +971,9 @@ static void smp_call_function_many_cond(const struct cp= umask *mask, csd_lock_wait(csd); } } + + if (!task_mask) + put_cpu(); } =20 /** @@ -972,8 +985,7 @@ static void smp_call_function_many_cond(const struct cp= umask *mask, * on other CPUs. * * You must not call this function with disabled interrupts or from a - * hardware interrupt handler or from a bottom half handler. Preemption - * must be disabled when calling this function. + * hardware interrupt handler or from a bottom half handler. * * @func is not called on the local CPU even if @mask contains it. 
Consid= er * using on_each_cpu_cond_mask() instead if this is not desirable. --=20 2.20.1 From nobody Wed Jun 17 02:52:25 2026 Received: from va-1-111.ptr.blmpb.com (va-1-111.ptr.blmpb.com [209.127.230.111]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 147CD3AA1B5 for ; Tue, 16 Jun 2026 11:14:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.111 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608465; cv=none; b=oDjyq05kXge7iDmJ5Q8h0Tkwjsqxhh2tRKHzdpreS7fQuC/73yJSwQC7x3CYyI/SsHOrSuftb9sEvZv718BtdpCxN6hNI3eWnkoqDmWpjLDQZSldPAKOkKu5T+CEeeRNpZvQCIfOT8INq6pI8trNz+16d7vDl61hsAQqdcPstNY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608465; c=relaxed/simple; bh=AemR9iwk7VLpx+DxdESEd9Bpa8Mxx+NmvQ/u7MWkZ+4=; h=Subject:Message-Id:In-Reply-To:From:Content-Type:To:Cc:References: Date:Mime-Version; b=H7gULZUmIYugoXPa3Amb6XdnTH3dIH7J0XAN99OAqB0X5EtVWFyJdYAPciji2EFzu7slQPmMU+PYNlX2znWMpHzS3h7RQxejVhQdY9YKNXPjgqKBFL/BwJP286mQzYBwoZt3OVvodyFmxGeXrijzIYTTno2b47YxwiJBH/Y0jJE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=nrg0gObV; arc=none smtp.client-ip=209.127.230.111 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="nrg0gObV" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1781608459; h=from:subject: 
mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=vTZJvD8hE4khuR+UlhrXRdjUapZ6kr9QUGV9VwCRf70=; b=nrg0gObV2zl7Qv5q+hc8kKpFJkx3TzTpIHE+X4noSgFTPGzOBlHgK5kGrv4klFH9vSfWqg x9zOwTer9Eennzop3RqQqhRrO2tlhqrIgITNBzyRSJ9QZ3N/+bXoBfUiqex6hIpcZw4c5I fXRk1uQrvCQDjRzjF63JbhmQr5QVkz4TGeFNJsTaXXelJuwcANcLXxM+Tc/rgdUnu28cq2 MKTlgMvuEdE5upTyct3VtHJTw0UNCvsP+W3XdjdQlUZBINEftytJiKZzIDA7BtsCmtdna5 oR4cHa80rWCpg45EovvnjCj9OWiAfKLFlzgiVmm06OVc4YD48tpM+eUG2IeGRQ== Subject: [PATCH v8 07/14] smp: Remove preempt_disable() from smp_call_function() Message-Id: <20260616111127.966468-8-zhouchuyi@bytedance.com> X-Lms-Return-Path: X-Original-From: Chuyi Zhou In-Reply-To: <20260616111127.966468-1-zhouchuyi@bytedance.com> Content-Transfer-Encoding: quoted-printable From: "Chuyi Zhou" To: , , , , , , , , , , , , , Cc: , "Chuyi Zhou" X-Mailer: git-send-email 2.20.1 References: <20260616111127.966468-1-zhouchuyi@bytedance.com> Date: Tue, 16 Jun 2026 19:11:20 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Content-Type: text/plain; charset="utf-8" Now smp_call_function_many_cond() internally handles the preemption logic, so smp_call_function() does not need to explicitly disable preemption. Remove preempt_{enable, disable} from smp_call_function(). Signed-off-by: Chuyi Zhou Reviewed-by: Muchun Song Reviewed-by: Sebastian Andrzej Siewior Tested-by: Paul E. McKenney --- kernel/smp.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/kernel/smp.c b/kernel/smp.c index 390e6526574c..096d857dc3a5 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -1012,9 +1012,8 @@ EXPORT_SYMBOL(smp_call_function_many); */ void smp_call_function(smp_call_func_t func, void *info, int wait) { - preempt_disable(); - smp_call_function_many(cpu_online_mask, func, info, wait); - preempt_enable(); + smp_call_function_many_cond(cpu_online_mask, func, info, + wait ? 
SCF_WAIT : 0, NULL); } EXPORT_SYMBOL(smp_call_function); =20 --=20 2.20.1 From nobody Wed Jun 17 02:52:25 2026 Received: from va-1-113.ptr.blmpb.com (va-1-113.ptr.blmpb.com [209.127.230.113]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 98C573AA1B5 for ; Tue, 16 Jun 2026 11:14:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.113 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608481; cv=none; b=Ey1T5A4+oL8NnxetY0mH/8WVCY0RDk2HeMc6oz1IdQFJYz3Ws9hku0ZXAPDxPJ46wdl9QpmOj81erMi51GfSL9Vrkk5ZeK9HGEzwbThV9xH4ZnqGvGj0gIHiAmx0rGklYXlZRjpBOuNny9gZTSUJ47SUeMIOdbfg1CvZZF/hkC0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608481; c=relaxed/simple; bh=0P122Fn2mUj74BKQlVzqos7y8K6F9Rz84hoFPo6v9F4=; h=To:Message-Id:Subject:Date:Mime-Version:From:In-Reply-To: References:Cc:Content-Type; b=ZOUutX3dxwjf+mXDyNiofdXr9FvI9KsWGhV62AFBi8g45LvgrLpgEAB1FKddX9OdfPPnTaSGz+zuAyrKFDts2LOll+86QQPFeHEEMdGXihSN5j/gVexwWWGRmpEJxc0bGLd9SVOVbFRaxUtaUB84VnStMjPPFaLdiUWTTP89Vx4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=PC5PUnH4; arc=none smtp.client-ip=209.127.230.113 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="PC5PUnH4" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1781608477; h=from:subject: 
mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=9UYt8ywRbTQvOGXzJfKVSiQceNMM8+++Ffei3z22Wn4=; b=PC5PUnH4NjYoVJC+S0pk5xv6nyQ4VXV7afbFw5x0PPnLcS6uPpUT7z2DjjU5j1ZqL8ogjt /2YMJ2xN3Y5SjwAapYsIE4E2MSuzclmLF3E7fY3bVE+yKFLRhdxCBIOSTaWcDOe2vMW3Us l4FirpLHF/y5kvc0dnSttLn1UeieaJilVhXRdHrBgSdMA47Iet2fvl1e5E/DzguMXvgdaD mgqGE6HKiY0ISuwe2LmeutRV505IK85JKbXECn3ftSoNHZsrngVpzcIKO+rm7Ky5xFWzax tcIu2LwUU/wuwSUa+xYAZA+BjB2YiayIroI/Hw3LUKsXTPYtIDVJ96hGU63EgQ== To: , , , , , , , , , , , , , Message-Id: <20260616111127.966468-9-zhouchuyi@bytedance.com> X-Lms-Return-Path: Subject: [PATCH v8 08/14] smp: Remove preempt_disable() from on_each_cpu_cond_mask() Date: Tue, 16 Jun 2026 19:11:21 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 X-Original-From: Chuyi Zhou From: "Chuyi Zhou" X-Mailer: git-send-email 2.20.1 In-Reply-To: <20260616111127.966468-1-zhouchuyi@bytedance.com> References: <20260616111127.966468-1-zhouchuyi@bytedance.com> Cc: , "Chuyi Zhou" Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Now smp_call_function_many_cond() internally handles the preemption logic, so on_each_cpu_cond_mask does not need to explicitly disable preemption. Remove preempt_{enable, disable} from on_each_cpu_cond_mask(). Signed-off-by: Chuyi Zhou Reviewed-by: Muchun Song Reviewed-by: Sebastian Andrzej Siewior Tested-by: Paul E. 
McKenney --- kernel/smp.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/kernel/smp.c b/kernel/smp.c index 096d857dc3a5..0595e0043a23 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -1136,9 +1136,7 @@ void on_each_cpu_cond_mask(smp_cond_func_t cond_func,= smp_call_func_t func, if (wait) scf_flags |=3D SCF_WAIT; =20 - preempt_disable(); smp_call_function_many_cond(mask, func, info, scf_flags, cond_func); - preempt_enable(); } EXPORT_SYMBOL(on_each_cpu_cond_mask); =20 --=20 2.20.1 From nobody Wed Jun 17 02:52:25 2026 Received: from va-1-114.ptr.blmpb.com (va-1-114.ptr.blmpb.com [209.127.230.114]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E6A43436374 for ; Tue, 16 Jun 2026 11:15:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.114 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608506; cv=none; b=rJiYylJf/QEB+D5plh5weG1CJhsUPLxtVUvpk/JLEHm5Cn7l7JHZJ9g6Ibj/qikAhgum23+4sv4N7TnlGGqp7LfEZs0Wwj8GCup2dVhcYyjH0gdvWprrCVpbuORlLGnCrQAWSPiYMQG1S+K0OUwmtpJdTGu15LmTBQzkumBPBJg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608506; c=relaxed/simple; bh=UTSUUiWJQtN66NoyNijoYhg+u0HAIyMdPCLucB2WvoY=; h=Message-Id:References:In-Reply-To:Content-Type:Cc:Date: Mime-Version:To:From:Subject; b=R6pyOlWswVoWXOf79rjF+S8QkF9429ha7bvPXUXTXb10BMYIbyZW8iQCirlpuySGfc5s/tCq41nZ+Xo9cJzWXwrnAs/C1YQh1Q4hifnH0yTfpnPSs0c+98JeacUjyFpFmUXh0xqvX8DkDl2EDVi4M06gwfunV1+Q8s2yCIiDAT0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=Aaa4iu+i; arc=none smtp.client-ip=209.127.230.114 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) 
header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="Aaa4iu+i" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1781608493; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=uifNtWWqCdcLbeQJ1BoOjbBYSTB4/ehdbPbI/vJscJc=; b=Aaa4iu+iST+pFgiBpmoUsr34ofH+CaIaJoCB5OgjEqShSRIW5JqRNkrRHJQEXSbUVB3qHZ 6djdOdE5Nk6R0p+iGak4tqc8gVeeF97qmbRn9bt2LV8/hHU9W1ttnMjVjtEcV4ikHamF1X 3AAHShF9KM4DdY+CSlKHcMP6SyEQKvqQLhwWRrk9ER5TZqUwyUEdW9gqzabJbamVNviK1j ZW4S7zkCCBRTwGMVjH/nndh4XFV15qv4sq2Xl9EzlbVwgITTIMRvtT5Em2ReQdnmr3KpHw aTDi77TgN7uB9/HkWKu+pzlikrP7QxPZz1m2yySqoxVlK7oEwW6SsoIqdN8Zew== Message-Id: <20260616111127.966468-10-zhouchuyi@bytedance.com> Content-Transfer-Encoding: quoted-printable References: <20260616111127.966468-1-zhouchuyi@bytedance.com> In-Reply-To: <20260616111127.966468-1-zhouchuyi@bytedance.com> Cc: , "Chuyi Zhou" Date: Tue, 16 Jun 2026 19:11:22 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 X-Mailer: git-send-email 2.20.1 To: , , , , , , , , , , , , , From: "Chuyi Zhou" Subject: [PATCH v8 09/14] scftorture: Remove preempt_disable() in scftorture_invoke_one() X-Lms-Return-Path: X-Original-From: Chuyi Zhou Content-Type: text/plain; charset="utf-8" Previous patches make smp_call*() functions handle preemption logic internally. Thus, the explicit preempt_disable() surrounding these calls becomes unnecessary. Furthermore, keeping the external preempt_disable() would prevent scftorture from exercising the newly narrowed internal preemption-disabled regions during IPI dispatch. This patch removes the preempt_{enable, disable} pairs in scftorture_invoke_one(). 
Removing this preemption protection could expose a race condition with CPU hotplug when use_cpus_read_lock is false. Specifically, for multi-cast operations (SCF_PRIM_MANY or SCF_PRIM_ALL), if only 1 CPU is online, smp_call_function_many() correctly skips sending IPIs and leaves scfc_out as false. Without preemption disabled, a CPU hotplug thread could preempt the test thread, bring a second CPU online, and increment num_online_cpus(). When the test thread resumes, the validation check would see num_online_cpus() > 1 and falsely trigger the memory-ordering warning, leaking the scfcp structure. To avoid this potential false positive, restrict the num_online_cpus() > 1 condition to only apply when use_cpus_read_lock is true, ensuring the CPU count remains stable during evaluation. Signed-off-by: Chuyi Zhou Reviewed-by: Sebastian Andrzej Siewior Tested-by: Paul E. McKenney --- kernel/scftorture.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-) diff --git a/kernel/scftorture.c b/kernel/scftorture.c index 327c315f411c..2082f9b44370 100644 --- a/kernel/scftorture.c +++ b/kernel/scftorture.c @@ -348,6 +348,8 @@ static void scftorture_invoke_one(struct scf_statistics= *scfp, struct torture_ra int ret =3D 0; struct scf_check *scfcp =3D NULL; struct scf_selector *scfsp =3D scf_sel_rand(trsp); + bool is_single =3D (scfsp->scfs_prim =3D=3D SCF_PRIM_SINGLE || + scfsp->scfs_prim =3D=3D SCF_PRIM_SINGLE_RPC); =20 if (scfsp->scfs_prim =3D=3D SCF_PRIM_SINGLE || scfsp->scfs_wait) { scfcp =3D kmalloc_obj(*scfcp, GFP_ATOMIC); @@ -364,8 +366,6 @@ static void scftorture_invoke_one(struct scf_statistics= *scfp, struct torture_ra } if (use_cpus_read_lock) cpus_read_lock(); - else - preempt_disable(); switch (scfsp->scfs_prim) { case SCF_PRIM_RESCHED: if (IS_BUILTIN(CONFIG_SCF_TORTURE_TEST)) { @@ -411,13 +411,10 @@ static void scftorture_invoke_one(struct scf_statisti= cs *scfp, struct torture_ra if (!ret) { if (use_cpus_read_lock) cpus_read_unlock(); - else - 
preempt_enable(); + wait_for_completion(&scfcp->scfc_completion); if (use_cpus_read_lock) cpus_read_lock(); - else - preempt_disable(); } else { scfp->n_single_rpc_ofl++; scf_add_to_free_list(scfcp); @@ -452,7 +449,7 @@ static void scftorture_invoke_one(struct scf_statistics= *scfp, struct torture_ra scfcp->scfc_out =3D true; } if (scfcp && scfsp->scfs_wait) { - if (WARN_ON_ONCE((num_online_cpus() > 1 || scfsp->scfs_prim =3D=3D SCF_P= RIM_SINGLE) && + if (WARN_ON_ONCE(((use_cpus_read_lock && num_online_cpus() > 1) || is_si= ngle) && !scfcp->scfc_out)) { pr_warn("%s: Memory-ordering failure, scfs_prim: %d.\n", __func__, scfs= p->scfs_prim); atomic_inc(&n_mb_out_errs); // Leak rather than trash! @@ -463,8 +460,6 @@ static void scftorture_invoke_one(struct scf_statistics= *scfp, struct torture_ra } if (use_cpus_read_lock) cpus_read_unlock(); - else - preempt_enable(); if (allocfail) schedule_timeout_idle((1 + longwait) * HZ); // Let no-wait handlers com= plete. else if (!(torture_random(trsp) & 0xfff)) --=20 2.20.1 From nobody Wed Jun 17 02:52:25 2026 Received: from va-1-114.ptr.blmpb.com (va-1-114.ptr.blmpb.com [209.127.230.114]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 226283BB10F for ; Tue, 16 Jun 2026 11:15:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.114 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608514; cv=none; b=HBP2jFwlfyzbbSXT0WJogzdCxZUyzTMdULLRK1zbkm/F0Bl4AA8rbqpYq9QACmAqkuXL+w6HDhLMDZx0Bo2vbhu/1szG2V0zlbjBkMnHVRq5YIMXZi36lx9MkKgyrrVP3Hj+EUlx+ESULmD2LJwdODCeGJC0hfkT1fLxIvkjBW8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608514; c=relaxed/simple; bh=aQEn2iHu8uWayUekGTF8NcS+dbUVABWhsg8gTFgTJ6A=; h=In-Reply-To:Date:From:Cc:Content-Type:Mime-Version:Subject: Message-Id:References:To; 
b=sE27syvDXgEjtjWRUMN8Sk7zSZqYs90eGPii3HI8DcZvf5U8Hhjn6odsBtqN+BdWQTArRhvvCMCp4yAdI1hPaUh0VDP5GQuRAacOf8kX89mzQ96/gHg+Mgz1slMhg4Mqd6dqZmMNwwf1deWa8R05v/dOqhQ6K1rI9wQjpiSPAh4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=OL1godpt; arc=none smtp.client-ip=209.127.230.114 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="OL1godpt" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1781608508; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=0oRoK9tryy0kwmhMlEOPbcFOGHYaFPNq3wPUzdycWW0=; b=OL1godptiv/mvkjY/fopMNy8sKUHWW+eZczBuNp+7FbZIG6oeYTNOI2Djj4i4PoGAaqGFU ijLXOuBl+O0C7vuvIkZvDyIqNr+rYDnzyp3fgLrbfqWSP+mw6lJSBVbC4JNiVVztE885PO rlUVRyB799SiiFrqX92E4Zg73h7po3gq50G2H8d78HtdkYPzRadwVjxgF1tWMPfx5La9fi MDsvjIn+d7Ug6+MYrVG4cRa9YU2DZGVqIwWuDkjER9P1ga+qCKFZKdgrJsaMyyTNpF0mHh puQFahdIlm2/ClCeL6FMBSt9yEoUlAUQ4WLmy64vMh7ThPXUhS0M7drlM93OQw== In-Reply-To: <20260616111127.966468-1-zhouchuyi@bytedance.com> X-Original-From: Chuyi Zhou Content-Transfer-Encoding: quoted-printable Date: Tue, 16 Jun 2026 19:11:23 +0800 From: "Chuyi Zhou" Cc: , "Chuyi Zhou" Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 Subject: [PATCH v8 10/14] x86/mm: Factor out flush_tlb_info initialization Message-Id: <20260616111127.966468-11-zhouchuyi@bytedance.com> References: <20260616111127.966468-1-zhouchuyi@bytedance.com> X-Mailer: git-send-email 
2.20.1 X-Lms-Return-Path: To: , , , , , , , , , , , , , Content-Type: text/plain; charset="utf-8" get_flush_tlb_info() currently does two things: it reserves the per-CPU flush_tlb_info storage and initializes the fields that describe the flush. Split the field setup into init_flush_tlb_info(). The per-CPU storage, DEBUG_VM reentrancy check and put_flush_tlb_info() lifetime rules are unchanged. This is a preparatory cleanup for allowing callers to provide their own flush_tlb_info storage. Signed-off-by: Chuyi Zhou Reviewed-by: Sebastian Andrzej Siewior --- arch/x86/mm/tlb.c | 40 +++++++++++++++++++++++++--------------- 1 file changed, 25 insertions(+), 15 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index af43d177087e..c999d5cd3ea8 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1379,22 +1379,12 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct flush_t= lb_info, flush_tlb_info); static DEFINE_PER_CPU(unsigned int, flush_tlb_info_idx); #endif =20 -static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm, - unsigned long start, unsigned long end, - unsigned int stride_shift, bool freed_tables, - u64 new_tlb_gen) +static void init_flush_tlb_info(struct flush_tlb_info *info, + struct mm_struct *mm, + unsigned long start, unsigned long end, + unsigned int stride_shift, bool freed_tables, + u64 new_tlb_gen) { - struct flush_tlb_info *info =3D this_cpu_ptr(&flush_tlb_info); - -#ifdef CONFIG_DEBUG_VM - /* - * Ensure that the following code is non-reentrant and flush_tlb_info - * is not overwritten. This means no TLB flushing is initiated by - * interrupt handlers and machine-check exception handlers. - */ - BUG_ON(this_cpu_inc_return(flush_tlb_info_idx) !=3D 1); -#endif - /* * If the number of flushes is so large that a full flush * would be faster, do a full flush. 
@@ -1412,6 +1402,26 @@ static struct flush_tlb_info *get_flush_tlb_info(str= uct mm_struct *mm, info->new_tlb_gen =3D new_tlb_gen; info->initiating_cpu =3D smp_processor_id(); info->trim_cpumask =3D 0; +} + +static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm, + unsigned long start, unsigned long end, + unsigned int stride_shift, bool freed_tables, + u64 new_tlb_gen) +{ + struct flush_tlb_info *info =3D this_cpu_ptr(&flush_tlb_info); + +#ifdef CONFIG_DEBUG_VM + /* + * Ensure that the following code is non-reentrant and flush_tlb_info + * is not overwritten. This means no TLB flushing is initiated by + * interrupt handlers and machine-check exception handlers. + */ + BUG_ON(this_cpu_inc_return(flush_tlb_info_idx) !=3D 1); +#endif + + init_flush_tlb_info(info, mm, start, end, stride_shift, freed_tables, + new_tlb_gen); =20 return info; } --=20 2.20.1 From nobody Wed Jun 17 02:52:25 2026 Received: from va-1-111.ptr.blmpb.com (va-1-111.ptr.blmpb.com [209.127.230.111]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D658830F7EB for ; Tue, 16 Jun 2026 11:15:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.111 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608530; cv=none; b=p6uWWYQdazQ06Xk4lkY76MyopcSiQcu8985wFx1q5CNjuTbtEKdO+Nllcn+lFnK0m9azOS+od6/ZQh/GcSRvWB7Vfb2xR9EjhLfd0ow2ljIUIKAccebCuBUZZrm/CFtc+RvxL3RPgaLcS8moIsFpnu/znWo9FgOc9OzRyOifvVQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608530; c=relaxed/simple; bh=M2wx0yQ0ISsGa7o0fsbM6+YpnJfP0S4dIuFt+CSey8o=; h=To:Mime-Version:References:Content-Type:Subject:Date:Message-Id: Cc:From:In-Reply-To; b=lYYoOroJdeDWTKAuYV07oFABduQ63iqMM+GeQaKj3aKdaTK087VCqAC9rvakejpjLYmxoWMwABLkVr1YZRi3yyxkN2RpC1i1zgXOjGpGKkyCb+gzahrZWc57jYGD2FcQFeI04/qsiVuzwIE9G7chsK6PsCwzB++1dS5SwriTLpo= 
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=KfKL+ki1; arc=none smtp.client-ip=209.127.230.111 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="KfKL+ki1" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1781608524; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=34+rYr8IwFzuQdzA9eDZ0kHXooZcaGyhtm34n5+nApc=; b=KfKL+ki13XAKYEp2LKTvnVPkmqk+hysIOq+kQLyvxqNt7JocR51lV7RJeMHxbA5ThKpAjc iV6ftBN//PghQZncTIFOJm4lK9dnHEwkNSOLJ/rl3RdADyjIQcry0YS5NA0Mwnzf6rmd8q NH03vjm42Cqf4Jhly4zoVxjbSEuqKjz7uczA4zz9nlNTYG1gI1twrMO9p7g48lrjqHDhda bbZKWmISk/LwsObD5eSlz6iuNgqdqhnk/AuIssoCngOT6UrquERQxFFNKZE6Z2Rr3zTdOt M2YBcsEKqoP/2kgz3oO9Q7fl0nql20ZeljHiqzTvcym0yuXWQtiMIBB2DFIX/g== To: , , , , , , , , , , , , , Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260616111127.966468-1-zhouchuyi@bytedance.com> Subject: [PATCH v8 11/14] x86/mm: Cap flush_tlb_info alignment at 64 bytes Date: Tue, 16 Jun 2026 19:11:24 +0800 Message-Id: <20260616111127.966468-12-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 Content-Transfer-Encoding: quoted-printable X-Lms-Return-Path: Cc: , "Chuyi Zhou" From: "Chuyi Zhou" X-Original-From: Chuyi Zhou In-Reply-To: <20260616111127.966468-1-zhouchuyi@bytedance.com> Content-Type: text/plain; charset="utf-8" A stack allocated flush_tlb_info should keep cacheline alignment to avoid the 
regression that motivated the per-CPU storage, but using SMP_CACHE_BYTES directly can make the stack frame grow excessively on configurations with large cache lines[1]. Add FLUSH_TLB_INFO_ALIGN and cap the type alignment at 64 bytes. The existing per-CPU flush_tlb_info instance remains DEFINE_PER_CPU_SHARED_ALIGNED(), so its per-CPU shared-cacheline alignment is unchanged. The capped type alignment matters once flush_tlb_info is moved back to the stack by the next patch. link[1]: https://lore.kernel.org/all/tip-780e0106d468a2962b16b52fdf42898f26= 39e0a0@git.kernel.org/ Signed-off-by: Chuyi Zhou Reviewed-by: Sebastian Andrzej Siewior --- arch/x86/include/asm/tlbflush.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflus= h.h index 0545fe75c3fa..5889a6c4e956 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -4,6 +4,7 @@ =20 #include #include +#include #include =20 #include @@ -211,6 +212,8 @@ extern u16 invlpgb_count_max; =20 extern void initialize_tlbstate_and_flush(void); =20 +#define FLUSH_TLB_INFO_ALIGN MIN(SMP_CACHE_BYTES, 64) + /* * TLB flushing: * @@ -249,7 +252,7 @@ struct flush_tlb_info { u8 stride_shift; u8 freed_tables; u8 trim_cpumask; -}; +} __aligned(FLUSH_TLB_INFO_ALIGN); =20 void flush_tlb_local(void); void flush_tlb_one_user(unsigned long addr); --=20 2.20.1 From nobody Wed Jun 17 02:52:25 2026 Received: from va-1-113.ptr.blmpb.com (va-1-113.ptr.blmpb.com [209.127.230.113]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8A9473385A1 for ; Tue, 16 Jun 2026 11:15:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.113 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608544; cv=none; 
b=WlkaHAVrf0++7822pW4F3TukpE4SSB36PoHUMmu6txnzGEY0jnlNEIPwqrMIp3PEi9q4NzsXE8jzdRd4VD0DgZJFI2oc4MxjGpUxdhY2s9hol6HiMRkAy2A999DeZ3IvcbuMK2N812RiaZAvSE4qpWTxP5nCOklPar2UP+6zviY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608544; c=relaxed/simple; bh=ppBEnk0jXXX3uVbq+PcOmQbGxjxpM7hilQ8NH2EiDLk=; h=Mime-Version:Content-Type:Date:Message-Id:In-Reply-To:To:Cc: Subject:From:References; b=AHV4IbWPv6Md2AeKqc7w40XrT/KR655AAEiZvkBacz2kIKWJn9Kb/f/p9WIR+fu2H/lPMkoL/7jS7aIhvqPz9c6V1h/Ga/zn79pnNoOipLdYk2HJu4B905/mt3Gbg+sGBoPke6NQb6ZfWJeAf/yaiZo3wQazSZ/r2MSqHUlSoc8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=dEdyj4oM; arc=none smtp.client-ip=209.127.230.113 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="dEdyj4oM" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1781608538; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=iuBr0oGTELap4qROqDlcFlVGDDQ2eNS5KUbDgHnssYg=; b=dEdyj4oMKclEKVcAiLKlrD80Gpxh9WcXrmPphly56Epy6CANDtCP2XcEBqpJB1naKZce7T ZHtA+gi4ywjC91aOF0xz+6pPxGIPNHrF9NOSS8+pGKEHa8zbqQT+J83a7s2RIwZIE1lWxQ OWOO0leeFl6uAH0PNXiiQlTUhwNb2zDgNiPysB6BrXfs0hsAZAB/62Vtf1g1Z167KrSIw/ wMDntmj4oqpr3pAhgNAAjLnJIM7a1uWrWa3byY+TN/GkmsGctcECrtBxzp24b+1iBqt4nI zcGr9RhFv2VXkLQ/OaJsqSBeUyjTR3ioKNjRj8hP1vfY+eYfFm5A4GgyxDiicA== Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 
X-Original-From: Chuyi Zhou
Date: Tue, 16 Jun 2026 19:11:25 +0800
X-Lms-Return-Path:
Message-Id: <20260616111127.966468-13-zhouchuyi@bytedance.com>
In-Reply-To: <20260616111127.966468-1-zhouchuyi@bytedance.com>
X-Mailer: git-send-email 2.20.1
To: , , , , , , , , , , , , ,
Cc: , "Chuyi Zhou"
Subject: [PATCH v8 12/14] x86/mm: Move flush_tlb_info back to the stack
From: "Chuyi Zhou"
Content-Transfer-Encoding: quoted-printable
References: <20260616111127.966468-1-zhouchuyi@bytedance.com>
Content-Type: text/plain; charset="utf-8"

flush_tlb_info benefits from cacheline alignment, but using
cacheline-aligned stack storage directly can grow stack usage too much
on configurations with large SMP_CACHE_BYTES values[1]. That problem
caused commit 515ab7c41306 ("x86/mm: Align TLB invalidation info") to be
reverted.

Commit 3db6d5a5ecaf ("x86/mm/tlb: Remove 'struct flush_tlb_info' from
the stack") moved flush_tlb_info to per-CPU storage, which avoided the
stack growth problem while preserving cacheline alignment. That was a
good fit while the callers kept preemption disabled for the whole flush
operation.

However, a single per-CPU flush_tlb_info also requires all flush_tlb*
operations to keep preemption disabled while the object is in use, so
that it cannot be overwritten by another flush on the same CPU.

flush_tlb* may send IPIs to remote CPUs and synchronously wait for all
remote CPUs to complete their local TLB flushes. That wait can take tens
of milliseconds when interrupts are disabled on a remote CPU or when a
large number of remote CPUs are involved.

The following changes need to shorten the CPU-pinned/preemption-disabled
section around those remote TLB flush waits. Move flush_tlb_info back to
caller-private stack storage so the caller does not have to stay on the
same CPU until the remote flush completes.

The previous patch capped the type alignment at 64 bytes.
This keeps the alignment benefit for stack objects without reintroducing
the old large-cacheline stack usage problem.

To evaluate the performance impact of this patch, use the following
program to reproduce the microbenchmark mentioned in commit 3db6d5a5ecaf
("x86/mm/tlb: Remove 'struct flush_tlb_info' from the stack"). The test
environment is an Ice Lake system (Intel(R) Xeon(R) Platinum 8336C) with
128 CPUs and 2 NUMA nodes. During the test, the threads were bound to
specific CPUs, and both pti and mitigations were disabled:

    #include <stdio.h>
    #include <stdlib.h>
    #include <pthread.h>
    #include <sys/mman.h>
    #include <sys/time.h>

    #define NUM_OPS 1000000
    #define NUM_THREADS 3
    #define NUM_RUNS 5
    #define PAGE_SIZE 4096

    volatile int stop_threads = 0;

    /* Keep other CPUs in mm_cpumask() so each flush needs remote IPIs. */
    void *busy_wait_thread(void *arg) {
        while (!stop_threads) {
            __asm__ volatile ("nop");
        }
        return NULL;
    }

    long long get_usec(void) {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec * 1000000LL + tv.tv_usec;
    }

    int main(void) {
        pthread_t threads[NUM_THREADS];
        char *addr;
        int i, r;

        addr = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (addr == MAP_FAILED) {
            perror("mmap");
            exit(1);
        }

        for (i = 0; i < NUM_THREADS; i++) {
            if (pthread_create(&threads[i], NULL, busy_wait_thread, NULL))
                exit(1);
        }

        printf("Running benchmark: %d runs, %d ops each, "
               "%d background threads\n",
               NUM_RUNS, NUM_OPS, NUM_THREADS);

        for (r = 0; r < NUM_RUNS; r++) {
            long long start, end;

            start = get_usec();
            for (i = 0; i < NUM_OPS; i++) {
                /* Touch the page, then drop it to force a TLB flush. */
                addr[0] = 1;
                if (madvise(addr, PAGE_SIZE, MADV_DONTNEED)) {
                    perror("madvise");
                    exit(1);
                }
            }
            end = get_usec();

            double duration = (double)(end - start);
            double avg_lat = duration / NUM_OPS;
            printf("Run %d: Total time %.2f us, Avg latency %.4f us/op\n",
                   r + 1, duration, avg_lat);
        }

        stop_threads = 1;
        for (i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);
        munmap(addr, PAGE_SIZE);
        return 0;
    }

                   base      on-stack-aligned   on-stack-not-aligned
                   ----      ---------          -----------
    avg (usec/op)  2.5278    2.5261             2.5508
    stddev         0.0007    0.0027             0.0023

The benchmark results show that the average latency difference between
the baseline (base) and the properly aligned stack variable
(on-stack-aligned) is within the standard deviation (stddev). This
indicates that the variations are caused by testing noise, and reverting
to a stack variable with proper alignment causes no performance
regression compared to the per-CPU implementation. The unaligned version
(on-stack-not-aligned) shows a minor performance drop (about 0.9%
relative to base). This demonstrates that we can shorten the
CPU-pinned/preemption-disabled section without sacrificing performance.

With caller-private storage there is no shared per-CPU object to
protect, so remove the DEBUG_VM reentrancy counter as well.

Link[1]: https://lore.kernel.org/all/tip-780e0106d468a2962b16b52fdf42898f2639e0a0@git.kernel.org/
Signed-off-by: Chuyi Zhou
Acked-by: Nadav Amit
Reviewed-by: Sebastian Andrzej Siewior
Tested-by: Paul E. McKenney
---
 arch/x86/mm/tlb.c | 78 +++++++++++------------------------------------
 1 file changed, 18 insertions(+), 60 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index c999d5cd3ea8..0620c001981f 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1373,12 +1373,6 @@ void flush_tlb_multi(const struct cpumask *cpumask,
  */
 unsigned long tlb_single_page_flush_ceiling __read_mostly = 33;
 
-static DEFINE_PER_CPU_SHARED_ALIGNED(struct flush_tlb_info, flush_tlb_info);
-
-#ifdef CONFIG_DEBUG_VM
-static DEFINE_PER_CPU(unsigned int, flush_tlb_info_idx);
-#endif
-
 static void init_flush_tlb_info(struct flush_tlb_info *info,
 				struct mm_struct *mm,
 				unsigned long start, unsigned long end,
@@ -1404,50 +1398,19 @@ static void init_flush_tlb_info(struct flush_tlb_info *info,
 	info->trim_cpumask = 0;
 }
 
-static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm,
-			unsigned long start, unsigned long end,
-			unsigned int stride_shift, bool freed_tables,
-			u64 new_tlb_gen)
-{
-	struct flush_tlb_info *info = this_cpu_ptr(&flush_tlb_info);
-
-#ifdef CONFIG_DEBUG_VM
-	/*
-	 * Ensure that the following code is non-reentrant and flush_tlb_info
-	 * is not overwritten. This means no TLB flushing is initiated by
-	 * interrupt handlers and machine-check exception handlers.
-	 */
-	BUG_ON(this_cpu_inc_return(flush_tlb_info_idx) != 1);
-#endif
-
-	init_flush_tlb_info(info, mm, start, end, stride_shift, freed_tables,
-			    new_tlb_gen);
-
-	return info;
-}
-
-static void put_flush_tlb_info(void)
-{
-#ifdef CONFIG_DEBUG_VM
-	/* Complete reentrancy prevention checks */
-	barrier();
-	this_cpu_dec(flush_tlb_info_idx);
-#endif
-}
-
 void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 				unsigned long end, unsigned int stride_shift,
 				bool freed_tables)
 {
-	struct flush_tlb_info *info;
+	struct flush_tlb_info info;
 	int cpu = get_cpu();
 	u64 new_tlb_gen;
 
 	/* This is also a barrier that synchronizes with switch_mm(). */
 	new_tlb_gen = inc_mm_tlb_gen(mm);
 
-	info = get_flush_tlb_info(mm, start, end, stride_shift, freed_tables,
-				  new_tlb_gen);
+	init_flush_tlb_info(&info, mm, start, end, stride_shift, freed_tables,
+			    new_tlb_gen);
 
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only
@@ -1455,19 +1418,18 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	 * flush_tlb_func_local() directly in this case.
 	 */
 	if (mm_global_asid(mm)) {
-		broadcast_tlb_flush(info);
+		broadcast_tlb_flush(&info);
 	} else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
-		info->trim_cpumask = should_trim_cpumask(mm);
-		flush_tlb_multi(mm_cpumask(mm), info);
+		info.trim_cpumask = should_trim_cpumask(mm);
+		flush_tlb_multi(mm_cpumask(mm), &info);
 		consider_global_asid(mm);
 	} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
-		flush_tlb_func(info);
+		flush_tlb_func(&info);
 		local_irq_enable();
 	}
 
-	put_flush_tlb_info();
 	put_cpu();
 	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
 }
@@ -1537,19 +1499,16 @@ static void kernel_tlb_flush_range(struct flush_tlb_info *info)
 
 void flush_tlb_kernel_range(unsigned long start, unsigned long end)
 {
-	struct flush_tlb_info *info;
+	struct flush_tlb_info info;
 
 	guard(preempt)();
+	init_flush_tlb_info(&info, NULL, start, end, PAGE_SHIFT, false,
+			    TLB_GENERATION_INVALID);
 
-	info = get_flush_tlb_info(NULL, start, end, PAGE_SHIFT, false,
-				  TLB_GENERATION_INVALID);
-
-	if (info->end == TLB_FLUSH_ALL)
-		kernel_tlb_flush_all(info);
+	if (info.end == TLB_FLUSH_ALL)
+		kernel_tlb_flush_all(&info);
 	else
-		kernel_tlb_flush_range(info);
-
-	put_flush_tlb_info();
+		kernel_tlb_flush_range(&info);
 }
 
 /*
@@ -1717,12 +1676,12 @@ EXPORT_SYMBOL_FOR_KVM(__flush_tlb_all);
 
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 {
-	struct flush_tlb_info *info;
+	struct flush_tlb_info info;
 
 	int cpu = get_cpu();
 
-	info = get_flush_tlb_info(NULL, 0, TLB_FLUSH_ALL, 0, false,
-				  TLB_GENERATION_INVALID);
+	init_flush_tlb_info(&info, NULL, 0, TLB_FLUSH_ALL, 0, false,
+			    TLB_GENERATION_INVALID);
 	/*
 	 * flush_tlb_multi() is not optimized for the common case in which only
 	 * a local TLB flush is needed.
Optimize this use-case by calling @@ -1732,17 +1691,16 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap= _batch *batch) invlpgb_flush_all_nonglobals(); batch->unmapped_pages =3D false; } else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) { - flush_tlb_multi(&batch->cpumask, info); + flush_tlb_multi(&batch->cpumask, &info); } else if (cpumask_test_cpu(cpu, &batch->cpumask)) { lockdep_assert_irqs_enabled(); local_irq_disable(); - flush_tlb_func(info); + flush_tlb_func(&info); local_irq_enable(); } =20 cpumask_clear(&batch->cpumask); =20 - put_flush_tlb_info(); put_cpu(); } =20 --=20 2.20.1 From nobody Wed Jun 17 02:52:25 2026 Received: from va-1-112.ptr.blmpb.com (va-1-112.ptr.blmpb.com [209.127.230.112]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 738E235CB6A for ; Tue, 16 Jun 2026 11:15:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.112 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608560; cv=none; b=t0DD1u/mdltDy1rATnEW2O/W4BzlPAmJKwzhdX+nDy19e2dijueDzjlI8BQ8ZaurqiTqaFB+7Bim8zRkwaqLq2K/cMqrpTZ0pGYaEYcbx3afKSTsX+4SMjjn18IQAEAB+9FqTbkk9Mu1WnFEU/EyjIk5C2o81la92xCbPmN7sps= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608560; c=relaxed/simple; bh=/ZIk1PhvsGTQl4yIwWXnYIJ4uKwn1gsGeIJS017923o=; h=Message-Id:Mime-Version:References:Content-Type:Subject: In-Reply-To:Cc:To:Date:From; b=OhxPN2ES+jF+mQzk8PZ0j0cpfS9evMoDcfGVu6kCYWOXcG2A3N+c/qlC2bOTyma9JP/KebTLf30yQZ0srJhM0JN63Piwz4k8GEx+OCORqRswDuHAuSD98z9dQfoQI0ME96MTrc/NiU4VubpR/ebF4t6yiXX89f1mt1PinUZplxM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=R7vD1/Lk; arc=none 
smtp.client-ip=209.127.230.112 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="R7vD1/Lk" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1781608552; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; bh=fxjFgs3jmgVbxBEjqFS+NGlWNchb9XQmZzLBfuKu/K4=; b=R7vD1/LkAbUDR5uB3tZp1ZV7saXUY7KaYv4iXgyKrquNA432QR6ce76oYTFxKnGxLV/RcQ iPWJP7xV0JCgA6R2LCNOHBNOlejNva4S3dPtbFL5IDrjSgpxmgx8VAhKzZBb9uGf9LsZ4l 6rEzDrIZdE2kWliAnfynjWBXMaz3OixLbKGcsJODsm+f3FQdES8Y4sLTonD+IQKdUlnzHk wl6Uz8F5Shb1mkXuMzF/+f693v1d9WjG7jTrBmPksDOspov4/r9hbIfNneKCp8yR09WgWO Gebl/kL/1xKGBL4LDphsx5qD5yqzWw76kA9Z67qrxrHkqzzNxH4IEz3mdWZbRw== Message-Id: <20260616111127.966468-14-zhouchuyi@bytedance.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260616111127.966468-1-zhouchuyi@bytedance.com> Content-Transfer-Encoding: quoted-printable X-Original-From: Chuyi Zhou Subject: [PATCH v8 13/14] x86/kvm: Disable preemption in kvm_flush_tlb_multi() In-Reply-To: <20260616111127.966468-1-zhouchuyi@bytedance.com> X-Mailer: git-send-email 2.20.1 Cc: , "Chuyi Zhou" To: , , , , , , , , , , , , , Date: Tue, 16 Jun 2026 19:11:26 +0800 X-Lms-Return-Path: From: "Chuyi Zhou" Content-Type: text/plain; charset="utf-8" kvm_flush_tlb_multi() is installed as an x86 PV TLB flush backend, so flush_tlb_multi() can reach it through pv_ops when running as a KVM guest. kvm_flush_tlb_multi() uses the per-CPU scratch cpumask __pv_cpu_mask. 
That buffer must remain tied to the current CPU until the mask has been
copied, filtered, and consumed by native_flush_tlb_multi(). Today the
x86/mm callers enter flush_tlb_multi() while pinned to a CPU, but a
subsequent x86/mm change will drop that caller-side CPU pinning before
issuing the remote TLB flush so the caller can be preempted while
waiting for remote CPUs.

Make the KVM backend protect its own per-CPU scratch cpumask by
disabling preemption locally. This is harmless with the current callers,
where the preemption disable is nested, and makes the KVM pv_ops
dependency explicit before changing the x86/mm call sites.

Signed-off-by: Chuyi Zhou
Reviewed-by: Sebastian Andrzej Siewior
---
 arch/x86/kernel/kvm.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 29226d112029..d540f54f4d16 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -662,8 +662,10 @@ static void kvm_flush_tlb_multi(const struct cpumask *cpumask,
 	u8 state;
 	int cpu;
 	struct kvm_steal_time *src;
-	struct cpumask *flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
+	struct cpumask *flushmask;
 
+	guard(preempt)();
+	flushmask = this_cpu_cpumask_var_ptr(__pv_cpu_mask);
 	cpumask_copy(flushmask, cpumask);
 	/*
 	 * We have to call flush only on online vCPUs.
And --=20 2.20.1 From nobody Wed Jun 17 02:52:25 2026 Received: from va-1-113.ptr.blmpb.com (va-1-113.ptr.blmpb.com [209.127.230.113]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 87FED199FB0 for ; Tue, 16 Jun 2026 11:16:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.127.230.113 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608574; cv=none; b=omDshX+ZdqabzpbByIkUeaA5BjwJ81ox6ZZzGe5+IOkl1s2uoQA57FHtRiLA3Tbjs6XXRwG2PjTGAK+2HwJFn9VXB6KKsPTY3rBeHd/zoBoP/DIXMz0IYn88yZqX5Oi339OlCeR2L/eS5574COoTAImt0eY+aNzn1elH3Na+oXo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1781608574; c=relaxed/simple; bh=qTIzXR97oND0CA9T0lkKK8Bkfk9tD760gZAlv6SaGJY=; h=Subject:Message-Id:Content-Type:To:Cc:Date:Mime-Version: In-Reply-To:From:References; b=cSPOfgGKND/kKX40oEg4xXyEd8ZB3oHvQDkjD5WmMZDH5gGJJM+35dMQCbxSZp9S7ybJFe0s8AEA+8KDGJTnuHr5xhm26OEUyUNBhytDC2VsCCdcIETjmHRk72rVUG2JhGUfkoFjC1Yd6aMnHPgSpXgMiBQoERR5uO2p5ZXhAJ4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com; spf=pass smtp.mailfrom=bytedance.com; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b=W6wCTIwk; arc=none smtp.client-ip=209.127.230.113 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=quarantine dis=none) header.from=bytedance.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=bytedance.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=bytedance.com header.i=@bytedance.com header.b="W6wCTIwk" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; s=2212171451; d=bytedance.com; t=1781608567; h=from:subject: mime-version:from:date:message-id:subject:to:cc:reply-to:content-type: mime-version:in-reply-to:message-id; 
bh=YtQo0mziR4Ns1GWTOaIuVmJqeO8khC1OE7ejgmrfeHU=; b=W6wCTIwk3xP5bI7jW87TDPcBrFuwkNFldvISLzIPJ1/lO4XNq+au262nQITcXqJg70XPGm x65KZ7vqrtifLDo5LUfI+7lr76kQm3KGH8jO3uFT2ioDta4qV1BQxfPinlxxVLyRN2siOr z1uGtVoBUsTSwtwM4oVxaCZ1myEfiQAXhvT9GUoxNVB9GjW5OooVQKYTM7QN7jf3c8BeyO bLo5R+oS7lj+HuLWJGupzpZfN+OEI6MHh9NuGL8Que0PZuxhIbntvDxG0TNkG5oJWSDCZN /PxLlCuaQDH97CKghI0mDq8nSpMGMGQEX2TK/+wnobhiKYoL4dlfnofEevpmEA== Subject: [PATCH v8 14/14] x86/mm: Re-enable preemption before flush_tlb_multi() Message-Id: <20260616111127.966468-15-zhouchuyi@bytedance.com> Content-Transfer-Encoding: quoted-printable To: , , , , , , , , , , , , , X-Original-From: Chuyi Zhou X-Mailer: git-send-email 2.20.1 Cc: , "Chuyi Zhou" Date: Tue, 16 Jun 2026 19:11:27 +0800 Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 In-Reply-To: <20260616111127.966468-1-zhouchuyi@bytedance.com> X-Lms-Return-Path: From: "Chuyi Zhou" References: <20260616111127.966468-1-zhouchuyi@bytedance.com> Content-Type: text/plain; charset="utf-8" flush_tlb_mm_range() and arch_tlbbatch_flush() pin the current CPU while they decide whether the flush can be handled locally or must be sent to remote CPUs. The CPU pinning is needed for the current CPU number and for the local TLB flush path, which reads per-CPU TLB state. It is not needed while waiting for a remote TLB flush to complete. After the remote-flush path has been selected, flush_tlb_info is caller-private stack storage, so the caller no longer has to stay on the same CPU to protect a shared per-CPU flush_tlb_info object. flush_tlb_multi() may also route through x86 PV backends. Those backends must protect their own CPU-local scratch state instead of relying on the caller to stay pinned. Hyper-V already does this by disabling interrupts while using hyperv_pcpu_input_arg, and Xen's multicall path brackets its per-CPU multicall buffer with xen_mc_batch()/xen_mc_issue(). 
The previous patch makes the KVM backend do the same for __pv_cpu_mask.

Remote TLB flushes may synchronously wait for many CPUs, and the wait
can take tens of milliseconds when remote CPUs have interrupts disabled
or when many CPUs are involved. Keeping preemption disabled for that
whole wait unnecessarily increases scheduling latency on the initiating
CPU.

Drop the CPU pinning before calling flush_tlb_multi() in the remote
paths of flush_tlb_mm_range() and arch_tlbbatch_flush(). Keep the local
paths inside the pinned section because they still access this CPU's TLB
state.

Signed-off-by: Chuyi Zhou
Reviewed-by: Sebastian Andrzej Siewior
Tested-by: Paul E. McKenney
---
 arch/x86/mm/tlb.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 0620c001981f..3b021930cc69 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -1403,6 +1403,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 				bool freed_tables)
 {
 	struct flush_tlb_info info;
+	bool remote_flush = false;
 	int cpu = get_cpu();
 	u64 new_tlb_gen;
 
@@ -1420,9 +1421,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	if (mm_global_asid(mm)) {
 		broadcast_tlb_flush(&info);
 	} else if (cpumask_any_but(mm_cpumask(mm), cpu) < nr_cpu_ids) {
-		info.trim_cpumask = should_trim_cpumask(mm);
-		flush_tlb_multi(mm_cpumask(mm), &info);
-		consider_global_asid(mm);
+		remote_flush = true;
 	} else if (mm == this_cpu_read(cpu_tlbstate.loaded_mm)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
@@ -1431,6 +1430,13 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start,
 	}
 
 	put_cpu();
+
+	if (remote_flush) {
+		info.trim_cpumask = should_trim_cpumask(mm);
+		flush_tlb_multi(mm_cpumask(mm), &info);
+		consider_global_asid(mm);
+	}
+
 	mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end);
 }
 
@@ -1677,7 +1683,7 @@ EXPORT_SYMBOL_FOR_KVM(__flush_tlb_all);
 void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 {
 	struct flush_tlb_info info;
-
+	bool remote_flush = false;
 	int cpu = get_cpu();
 
 	init_flush_tlb_info(&info, NULL, 0, TLB_FLUSH_ALL, 0, false,
@@ -1691,7 +1697,7 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 		invlpgb_flush_all_nonglobals();
 		batch->unmapped_pages = false;
 	} else if (cpumask_any_but(&batch->cpumask, cpu) < nr_cpu_ids) {
-		flush_tlb_multi(&batch->cpumask, &info);
+		remote_flush = true;
 	} else if (cpumask_test_cpu(cpu, &batch->cpumask)) {
 		lockdep_assert_irqs_enabled();
 		local_irq_disable();
@@ -1699,9 +1705,12 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
 		local_irq_enable();
 	}
 
-	cpumask_clear(&batch->cpumask);
-
 	put_cpu();
+
+	if (remote_flush)
+		flush_tlb_multi(&batch->cpumask, &info);
+
+	cpumask_clear(&batch->cpumask);
 }
 
 /*
-- 
2.20.1