[PATCH] klp: use stop machine to check and expedite transition for running tasks
Posted by Li Zhe 4 days, 21 hours ago
In the current KLP transition implementation, the strategy for running
tasks relies on waiting for a context switch, at which point an attempt
is made to clear the TIF_PATCH_PENDING flag; alternatively, once the
task has yielded the CPU, its stack can be inspected to determine
whether the flag can be cleared. This approach proves problematic in
certain environments.

Consider a scenario where the majority of the system's CPUs are
configured with nohz_full and isolcpus, each dedicated to a VM vCPU
thread pinned to that physical core, with idle=poll configured inside
the guest. Under such conditions, these vCPU threads rarely leave the
CPU. Combined with the high core counts typical of modern server
platforms, this results in transition completion times that are not
only excessively long but also highly unpredictable.
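
For illustration, such a setup might use something like the following
(the CPU ranges here are hypothetical):

  # host kernel command line: isolate CPUs 1-31 for the vCPU threads
  isolcpus=1-31 nohz_full=1-31

  # guest kernel command line: the idle task polls instead of halting,
  # so an idle vCPU never leaves the host CPU
  idle=poll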

This patch resolves the issue by queuing a callback on the target
CPU's stopper thread via stop_one_cpu_nowait(). The callback attempts
to transition the running task in question. In a VM environment
configured with 32 CPUs, the live patching operation completes promptly
after the SIGNALS_TIMEOUT period with this patch applied; without it,
the transition effectively never completes under the same scenario.

Co-developed-by: Rui Qi <qirui.001@bytedance.com>
Signed-off-by: Rui Qi <qirui.001@bytedance.com>
Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
---
 kernel/livepatch/transition.c | 62 ++++++++++++++++++++++++++++++++---
 1 file changed, 58 insertions(+), 4 deletions(-)

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 2351a19ac2a9..9c078b9bd755 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -10,6 +10,7 @@
 #include <linux/cpu.h>
 #include <linux/stacktrace.h>
 #include <linux/static_call.h>
+#include <linux/stop_machine.h>
 #include "core.h"
 #include "patch.h"
 #include "transition.h"
@@ -297,6 +298,61 @@ static int klp_check_and_switch_task(struct task_struct *task, void *arg)
 	return 0;
 }
 
+enum klp_stop_work_bit {
+	KLP_STOP_WORK_PENDING_BIT,
+};
+
+struct klp_stop_work_info {
+	struct task_struct *task;
+	unsigned long flag;
+};
+
+static DEFINE_PER_CPU(struct cpu_stop_work, klp_transition_stop_work);
+static DEFINE_PER_CPU(struct klp_stop_work_info, klp_stop_work_info);
+
+static int klp_check_task(struct task_struct *task, void *old_name)
+{
+	if (task == current)
+		return klp_check_and_switch_task(current, old_name);
+	else
+		return task_call_func(task, klp_check_and_switch_task, old_name);
+}
+
+static int klp_transition_stop_work_fn(void *arg)
+{
+	struct klp_stop_work_info *info = (struct klp_stop_work_info *)arg;
+	struct task_struct *task = info->task;
+	const char *old_name;
+
+	clear_bit(KLP_STOP_WORK_PENDING_BIT, &info->flag);
+
+	if (likely(klp_patch_pending(task)))
+		klp_check_task(task, &old_name);
+
+	put_task_struct(task);
+
+	return 0;
+}
+
+static void klp_try_transition_running_task(struct task_struct *task)
+{
+	int cpu = task_cpu(task);
+
+	if (klp_signals_cnt && !(klp_signals_cnt % SIGNALS_TIMEOUT)) {
+		struct klp_stop_work_info *info =
+			per_cpu_ptr(&klp_stop_work_info, cpu);
+
+		if (test_and_set_bit(KLP_STOP_WORK_PENDING_BIT, &info->flag))
+			return;
+
+		info->task = get_task_struct(task);
+		if (!stop_one_cpu_nowait(cpu, klp_transition_stop_work_fn, info,
+					 per_cpu_ptr(&klp_transition_stop_work,
+					 cpu)))
+			put_task_struct(task);
+	}
+}
+
 /*
  * Try to safely switch a task to the target patch state.  If it's currently
  * running, or it's sleeping on a to-be-patched or to-be-unpatched function, or
@@ -323,10 +379,7 @@ static bool klp_try_switch_task(struct task_struct *task)
 	 * functions.  If all goes well, switch the task to the target patch
 	 * state.
 	 */
-	if (task == current)
-		ret = klp_check_and_switch_task(current, &old_name);
-	else
-		ret = task_call_func(task, klp_check_and_switch_task, &old_name);
+	ret = klp_check_task(task, &old_name);
 
 	switch (ret) {
 	case 0:		/* success */
@@ -335,6 +388,7 @@ static bool klp_try_switch_task(struct task_struct *task)
 	case -EBUSY:	/* klp_check_and_switch_task() */
 		pr_debug("%s: %s:%d is running\n",
 			 __func__, task->comm, task->pid);
+		klp_try_transition_running_task(task);
 		break;
 	case -EINVAL:	/* klp_check_and_switch_task() */
 		pr_debug("%s: %s:%d has an unreliable stack\n",
-- 
2.20.1
Re: [PATCH] klp: use stop machine to check and expedite transition for running tasks
Posted by Josh Poimboeuf 3 days, 4 hours ago
On Mon, Feb 02, 2026 at 05:13:34PM +0800, Li Zhe wrote:
> In the current KLP transition implementation, the strategy for running
> tasks relies on waiting for a context switch, at which point an attempt
> is made to clear the TIF_PATCH_PENDING flag; alternatively, once the
> task has yielded the CPU, its stack can be inspected to determine
> whether the flag can be cleared. This approach proves problematic in
> certain environments.
> 
> Consider a scenario where the majority of the system's CPUs are
> configured with nohz_full and isolcpus, each dedicated to a VM vCPU
> thread pinned to that physical core, with idle=poll configured inside
> the guest. Under such conditions, these vCPU threads rarely leave the
> CPU. Combined with the high core counts typical of modern server
> platforms, this results in transition completion times that are not
> only excessively long but also highly unpredictable.
> 
> This patch resolves the issue by queuing a callback on the target
> CPU's stopper thread via stop_one_cpu_nowait(). The callback attempts
> to transition the running task in question. In a VM environment
> configured with 32 CPUs, the live patching operation completes promptly
> after the SIGNALS_TIMEOUT period with this patch applied; without it,
> the transition effectively never completes under the same scenario.
> 
> Co-developed-by: Rui Qi <qirui.001@bytedance.com>
> Signed-off-by: Rui Qi <qirui.001@bytedance.com>
> Signed-off-by: Li Zhe <lizhe.67@bytedance.com>

PeterZ, what's your take on this?

I wonder if we could instead do resched_cpu() or something similar to
trigger the call to klp_sched_try_switch() in __schedule()?
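
Something like this (completely untested sketch; resched_cpu() would
probably need to be made available to livepatch):

	/*
	 * Poke the remote CPU so its running task passes through
	 * __schedule(), where klp_sched_try_switch() is called.
	 */
	static void klp_kick_running_task(struct task_struct *task)
	{
		resched_cpu(task_cpu(task));
	}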

> ---
>  kernel/livepatch/transition.c | 62 ++++++++++++++++++++++++++++++++---
>  1 file changed, 58 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index 2351a19ac2a9..9c078b9bd755 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -10,6 +10,7 @@
>  #include <linux/cpu.h>
>  #include <linux/stacktrace.h>
>  #include <linux/static_call.h>
> +#include <linux/stop_machine.h>
>  #include "core.h"
>  #include "patch.h"
>  #include "transition.h"
> @@ -297,6 +298,61 @@ static int klp_check_and_switch_task(struct task_struct *task, void *arg)
>  	return 0;
>  }
>  
> +enum klp_stop_work_bit {
> +	KLP_STOP_WORK_PENDING_BIT,
> +};
> +
> +struct klp_stop_work_info {
> +	struct task_struct *task;
> +	unsigned long flag;
> +};
> +
> +static DEFINE_PER_CPU(struct cpu_stop_work, klp_transition_stop_work);
> +static DEFINE_PER_CPU(struct klp_stop_work_info, klp_stop_work_info);
> +
> +static int klp_check_task(struct task_struct *task, void *old_name)
> +{
> +	if (task == current)
> +		return klp_check_and_switch_task(current, old_name);
> +	else
> +		return task_call_func(task, klp_check_and_switch_task, old_name);
> +}
> +
> +static int klp_transition_stop_work_fn(void *arg)
> +{
> +	struct klp_stop_work_info *info = (struct klp_stop_work_info *)arg;
> +	struct task_struct *task = info->task;
> +	const char *old_name;
> +
> +	clear_bit(KLP_STOP_WORK_PENDING_BIT, &info->flag);
> +
> +	if (likely(klp_patch_pending(task)))
> +		klp_check_task(task, &old_name);
> +
> +	put_task_struct(task);
> +
> +	return 0;
> +}
> +
> +static void klp_try_transition_running_task(struct task_struct *task)
> +{
> +	int cpu = task_cpu(task);
> +
> +	if (klp_signals_cnt && !(klp_signals_cnt % SIGNALS_TIMEOUT)) {
> +		struct klp_stop_work_info *info =
> +			per_cpu_ptr(&klp_stop_work_info, cpu);
> +
> +		if (test_and_set_bit(KLP_STOP_WORK_PENDING_BIT, &info->flag))
> +			return;
> +
> +		info->task = get_task_struct(task);
> +		if (!stop_one_cpu_nowait(cpu, klp_transition_stop_work_fn, info,
> +					 per_cpu_ptr(&klp_transition_stop_work,
> +					 cpu)))
> +			put_task_struct(task);
> +	}
> +}
> +
>  /*
>   * Try to safely switch a task to the target patch state.  If it's currently
>   * running, or it's sleeping on a to-be-patched or to-be-unpatched function, or
> @@ -323,10 +379,7 @@ static bool klp_try_switch_task(struct task_struct *task)
>  	 * functions.  If all goes well, switch the task to the target patch
>  	 * state.
>  	 */
> -	if (task == current)
> -		ret = klp_check_and_switch_task(current, &old_name);
> -	else
> -		ret = task_call_func(task, klp_check_and_switch_task, &old_name);
> +	ret = klp_check_task(task, &old_name);
>  
>  	switch (ret) {
>  	case 0:		/* success */
> @@ -335,6 +388,7 @@ static bool klp_try_switch_task(struct task_struct *task)
>  	case -EBUSY:	/* klp_check_and_switch_task() */
>  		pr_debug("%s: %s:%d is running\n",
>  			 __func__, task->comm, task->pid);
> +		klp_try_transition_running_task(task);
>  		break;
>  	case -EINVAL:	/* klp_check_and_switch_task() */
>  		pr_debug("%s: %s:%d has an unreliable stack\n",
> -- 
> 2.20.1

-- 
Josh
Re: [PATCH] klp: use stop machine to check and expedite transition for running tasks
Posted by Li Zhe 3 days, 4 hours ago
On Tue, 3 Feb 2026 18:20:22 -0800, jpoimboe@kernel.org wrote:
 
> On Mon, Feb 02, 2026 at 05:13:34PM +0800, Li Zhe wrote:
> > In the current KLP transition implementation, the strategy for running
> > tasks relies on waiting for a context switch, at which point an attempt
> > is made to clear the TIF_PATCH_PENDING flag; alternatively, once the
> > task has yielded the CPU, its stack can be inspected to determine
> > whether the flag can be cleared. This approach proves problematic in
> > certain environments.
> > 
> > Consider a scenario where the majority of the system's CPUs are
> > configured with nohz_full and isolcpus, each dedicated to a VM vCPU
> > thread pinned to that physical core, with idle=poll configured inside
> > the guest. Under such conditions, these vCPU threads rarely leave the
> > CPU. Combined with the high core counts typical of modern server
> > platforms, this results in transition completion times that are not
> > only excessively long but also highly unpredictable.
> > 
> > This patch resolves the issue by queuing a callback on the target
> > CPU's stopper thread via stop_one_cpu_nowait(). The callback attempts
> > to transition the running task in question. In a VM environment
> > configured with 32 CPUs, the live patching operation completes promptly
> > after the SIGNALS_TIMEOUT period with this patch applied; without it,
> > the transition effectively never completes under the same scenario.
> > 
> > Co-developed-by: Rui Qi <qirui.001@bytedance.com>
> > Signed-off-by: Rui Qi <qirui.001@bytedance.com>
> > Signed-off-by: Li Zhe <lizhe.67@bytedance.com>
> 
> PeterZ, what's your take on this?
> 
> I wonder if we could instead do resched_cpu() or something similar to
> trigger the call to klp_sched_try_switch() in __schedule()?

klp_sched_try_switch() only invokes __klp_sched_try_switch() after
verifying that the corresponding task has the TASK_FREEZABLE flag
set. I remain uncertain whether this approach adequately resolves
the issue.
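
For reference, the gating check in kernel/sched/core.c looks roughly
like this (paraphrased from a recent kernel):

	static inline void klp_sched_try_switch(struct task_struct *curr)
	{
		if (static_branch_unlikely(&klp_sched_try_switch_key) &&
		    READ_ONCE(curr->__state) & TASK_FREEZABLE)
			__klp_sched_try_switch(curr);
	}

A task that is busy on a CPU is in TASK_RUNNING state rather than a
TASK_FREEZABLE sleep, so a forced trip through __schedule() would
presumably be skipped by this check.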

Thanks,
Zhe