From: "Li Zhe"
Subject: [PATCH] klp: use stop machine to check and expedite transition for running tasks
Date: Mon, 2 Feb 2026 17:13:34 +0800
Message-Id: <20260202091334.60881-1-lizhe.67@bytedance.com>
X-Mailer: git-send-email 2.45.2
X-Mailing-List: linux-kernel@vger.kernel.org

In the current KLP transition implementation, a running task is handled
in one of two ways: either by waiting for it to context switch, at which
point an attempt is made to clear the TIF_PATCH_PENDING flag, or by
inspecting its stack once it has yielded the CPU to determine whether
the TIF_PATCH_PENDING flag can be cleared.

However, this approach proves problematic in certain environments.
Consider a scenario where the majority of system CPUs are configured
with nohz_full and isolcpus, each dedicated to a VM with a vCPU pinned
to that physical core and configured with idle=poll within the guest.
Under such conditions, these vCPUs rarely leave the CPU. Combined with
the high core counts typical of modern server platforms, this results
in transition completion times that are not only excessively prolonged
but also highly unpredictable.

This patch resolves this issue by registering a per-CPU callback with
stop_machine (via stop_one_cpu_nowait()). The callback attempts to
transition the associated running task.

In a VM environment configured with 32 CPUs, the live patching
operation completes promptly after the SIGNALS_TIMEOUT period with this
patch applied; without it, the transition almost never completes in the
same scenario.

Co-developed-by: Rui Qi
Signed-off-by: Rui Qi
Signed-off-by: Li Zhe
---
 kernel/livepatch/transition.c | 62 ++++++++++++++++++++++++++++++++---
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 2351a19ac2a9..9c078b9bd755 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include "core.h"
 #include "patch.h"
 #include "transition.h"
@@ -297,6 +298,61 @@ static int klp_check_and_switch_task(struct task_struct *task, void *arg)
 	return 0;
 }
 
+enum klp_stop_work_bit {
+	KLP_STOP_WORK_PENDING_BIT,
+};
+
+struct klp_stop_work_info {
+	struct task_struct *task;
+	unsigned long flag;
+};
+
+static DEFINE_PER_CPU(struct cpu_stop_work, klp_transition_stop_work);
+static DEFINE_PER_CPU(struct klp_stop_work_info, klp_stop_work_info);
+
+static int klp_check_task(struct task_struct *task, void *old_name)
+{
+	if (task == current)
+		return klp_check_and_switch_task(current, old_name);
+	else
+		return task_call_func(task, klp_check_and_switch_task, old_name);
+}
+
+static int klp_transition_stop_work_fn(void *arg)
+{
+	struct klp_stop_work_info *info = (struct klp_stop_work_info *)arg;
+	struct task_struct *task = info->task;
+	const char *old_name;
+
+	clear_bit(KLP_STOP_WORK_PENDING_BIT, &info->flag);
+
+	if (likely(klp_patch_pending(task)))
+		klp_check_task(task, &old_name);
+
+	put_task_struct(task);
+
+	return 0;
+}
+
+static void klp_try_transition_running_task(struct task_struct *task)
+{
+	int cpu = task_cpu(task);
+
+	if (klp_signals_cnt && !(klp_signals_cnt % SIGNALS_TIMEOUT)) {
+		struct klp_stop_work_info *info =
+			per_cpu_ptr(&klp_stop_work_info, cpu);
+
+		if (test_and_set_bit(KLP_STOP_WORK_PENDING_BIT, &info->flag))
+			return;
+
+		info->task = get_task_struct(task);
+		if (!stop_one_cpu_nowait(cpu, klp_transition_stop_work_fn, info,
+					 per_cpu_ptr(&klp_transition_stop_work,
+						     cpu)))
+			put_task_struct(task);
+	}
+}
+
 /*
  * Try to safely switch a task to the target patch state. If it's currently
  * running, or it's sleeping on a to-be-patched or to-be-unpatched function, or
@@ -323,10 +379,7 @@ static bool klp_try_switch_task(struct task_struct *task)
 	 * functions. If all goes well, switch the task to the target patch
 	 * state.
 	 */
-	if (task == current)
-		ret = klp_check_and_switch_task(current, &old_name);
-	else
-		ret = task_call_func(task, klp_check_and_switch_task, &old_name);
+	ret = klp_check_task(task, &old_name);
 
 	switch (ret) {
 	case 0:		/* success */
@@ -335,6 +388,7 @@ static bool klp_try_switch_task(struct task_struct *task)
 	case -EBUSY:	/* klp_check_and_switch_task() */
 		pr_debug("%s: %s:%d is running\n",
 			 __func__, task->comm, task->pid);
+		klp_try_transition_running_task(task);
 		break;
 	case -EINVAL:	/* klp_check_and_switch_task() */
 		pr_debug("%s: %s:%d has an unreliable stack\n",
-- 
2.20.1
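
For readers unfamiliar with the pending-bit guard used in klp_try_transition_running_task(), here is a minimal user-space sketch of the same idea. It is not kernel code and not part of the patch: struct slot, slot_try_queue() and slot_run() are hypothetical names, and C11 atomics stand in for the kernel's test_and_set_bit() and clear_bit(). It assumes one slot per CPU, so at most one work item per slot is in flight at any time.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit index, mirroring KLP_STOP_WORK_PENDING_BIT in the patch. */
#define SLOT_PENDING_BIT 0UL

/* One slot per CPU in the patch; a single slot is enough for the sketch. */
struct slot {
	atomic_ulong flag;	/* bit 0 set: work is queued and has not run yet */
	int payload;		/* stands in for the task pointer */
};

/* User-space analogue of test_and_set_bit(): set the bit, return its old value. */
static bool slot_test_and_set(struct slot *s)
{
	unsigned long mask = 1UL << SLOT_PENDING_BIT;

	return (atomic_fetch_or(&s->flag, mask) & mask) != 0;
}

/* User-space analogue of clear_bit(). */
static void slot_clear(struct slot *s)
{
	atomic_fetch_and(&s->flag, ~(1UL << SLOT_PENDING_BIT));
}

/*
 * Submitter side: claim the slot before writing the payload.
 * Returns false, leaving the slot untouched, if work is already pending.
 */
static bool slot_try_queue(struct slot *s, int payload)
{
	if (slot_test_and_set(s))
		return false;

	s->payload = payload;
	return true;
}

/*
 * Callback side: read the payload first, then release the slot,
 * in the same order as klp_transition_stop_work_fn() in the patch.
 */
static int slot_run(struct slot *s)
{
	int payload = s->payload;

	slot_clear(s);
	return payload;
}
```

A second slot_try_queue() on the same slot is rejected until slot_run() has released it, which is what keeps a per-CPU work item from being queued twice.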