[RFC] sched,livepatch: call stop_one_cpu in klp_check_and_switch_task

Rik van Riel posted 1 patch 4 years ago
Posted by Rik van Riel 4 years ago
After talking with Peter, this seems like a potential approach to fixing
the issue for kernels with PREEMPT both enabled and disabled.

If this looks like a reasonable approach to people, we can run experiments
with this patch on a few thousand systems, and compare the kernel live patch
transition latencies (and number of failures) with those on a kernel without
this patch.

Does this look like an approach that could work?

---8<---
sched,livepatch: call stop_one_cpu in klp_check_and_switch_task

If a running task fails to transition to the new kernel live patch after the
first attempt, use the stopper thread to preempt it during subsequent attempts
at switching to the new kernel live patch.

<INSERT EXPERIMENTAL RESULTS HERE>

Signed-off-by: Rik van Riel <riel@surriel.com>

diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
index 5d03a2ad1066..26e9e5f09822 100644
--- a/kernel/livepatch/transition.c
+++ b/kernel/livepatch/transition.c
@@ -9,6 +9,7 @@
 
 #include <linux/cpu.h>
 #include <linux/stacktrace.h>
+#include <linux/stop_machine.h>
 #include "core.h"
 #include "patch.h"
 #include "transition.h"
@@ -281,6 +282,11 @@ static int klp_check_and_switch_task(struct task_struct *task, void *arg)
 	return 0;
 }
 
+static int kpatch_dummy_fn(void *dummy)
+{
+	return 0;
+}
+
 /*
  * Try to safely switch a task to the target patch state.  If it's currently
  * running, or it's sleeping on a to-be-patched or to-be-unpatched function, or
@@ -315,6 +321,9 @@ static bool klp_try_switch_task(struct task_struct *task)
 	case -EBUSY:	/* klp_check_and_switch_task() */
 		pr_debug("%s: %s:%d is running\n",
 			 __func__, task->comm, task->pid);
+		/* Preempt the task from the second KLP switch attempt. */
+		if (klp_signals_cnt)
+			stop_one_cpu(task_cpu(task), kpatch_dummy_fn, NULL);
 		break;
 	case -EINVAL:	/* klp_check_and_switch_task() */
 		pr_debug("%s: %s:%d has an unreliable stack\n",
Re: [RFC] sched,livepatch: call stop_one_cpu in klp_check_and_switch_task
Posted by Josh Poimboeuf 4 years ago
On Mon, May 09, 2022 at 11:52:27AM -0400, Rik van Riel wrote:
> Does this look like an approach that could work?
> 
> ---8<---
> sched,livepatch: call stop_one_cpu in klp_check_and_switch_task
> 
> If a running task fails to transition to the new kernel live patch after the
> first attempt, use the stopper thread to preempt it during subsequent attempts
> at switching to the new kernel live patch.
> 
> <INSERT EXPERIMENTAL RESULTS HERE>

It would be helpful to add more info about the original problem being
solved, and how this is supposed to fix it.

> +static int kpatch_dummy_fn(void *dummy)

s/kpatch/klp/

> +{
> +	return 0;
> +}
> +
>  /*
>   * Try to safely switch a task to the target patch state.  If it's currently
>   * running, or it's sleeping on a to-be-patched or to-be-unpatched function, or
> @@ -315,6 +321,9 @@ static bool klp_try_switch_task(struct task_struct *task)
>  	case -EBUSY:	/* klp_check_and_switch_task() */
>  		pr_debug("%s: %s:%d is running\n",
>  			 __func__, task->comm, task->pid);
> +		/* Preempt the task from the second KLP switch attempt. */
> +		if (klp_signals_cnt)
> +			stop_one_cpu(task_cpu(task), kpatch_dummy_fn, NULL);

I must be missing something, how is briefly preempting a kthread
supposed to actually transition it?  Won't it likely go back to running
on the CPU before the next periodic klp_transition_work_fn() check?
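
(For context, the periodic check in question rescans every task; the
following is a simplified sketch of the retry loop in
kernel/livepatch/transition.c, with most details elided, not the exact
upstream code:)

```c
/*
 * Simplified sketch of the periodic retry: the transition worker
 * rescans all tasks, and a kthread that has resumed running on a CPU
 * simply fails the scan again with -EBUSY.
 */
static void klp_transition_work_fn(struct work_struct *work)
{
	klp_try_complete_transition();
}

void klp_try_complete_transition(void)
{
	struct task_struct *g, *task;
	bool complete = true;

	for_each_process_thread(g, task)
		if (!klp_try_switch_task(task))
			complete = false;

	if (!complete) {
		/* Reschedule ourselves and try again later. */
		schedule_delayed_work(&klp_transition_work,
				      round_jiffies_relative(HZ));
		return;
	}

	klp_complete_transition();
}
```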

-- 
Josh
Re: [RFC] sched,livepatch: call stop_one_cpu in klp_check_and_switch_task
Posted by Rik van Riel 4 years ago
On Mon, 2022-05-09 at 11:00 -0700, Josh Poimboeuf wrote:
> On Mon, May 09, 2022 at 11:52:27AM -0400, Rik van Riel wrote:
> > Does this look like an approach that could work?
> > 
> > @@ -315,6 +321,9 @@ static bool klp_try_switch_task(struct task_struct *task)
> >         case -EBUSY:    /* klp_check_and_switch_task() */
> >                 pr_debug("%s: %s:%d is running\n",
> >                          __func__, task->comm, task->pid);
> > +               /* Preempt the task from the second KLP switch attempt. */
> > +               if (klp_signals_cnt)
> > +                       stop_one_cpu(task_cpu(task), kpatch_dummy_fn, NULL);
> 
> I must be missing something, how is briefly preempting a kthread
> supposed to actually transition it?  Won't it likely go back to
> running
> on the CPU before the next periodic klp_transition_work_fn() check?
> 
That's the kind of feedback I was hoping for ;)

I looked around the code a little bit, and it seems
that only the idle tasks can transition to another KLP
while they are running?

That makes me wonder how the kworker thread that runs
the klp switching code transitions itself...

Should kernel threads that can use a lot of CPU have
something in their outer loop to transition KLPs,
just like the idle task does?

-- 
All Rights Reversed.
Re: [RFC] sched,livepatch: call stop_one_cpu in klp_check_and_switch_task
Posted by Josh Poimboeuf 4 years ago
On Mon, May 09, 2022 at 03:10:16PM -0400, Rik van Riel wrote:
> On Mon, 2022-05-09 at 11:00 -0700, Josh Poimboeuf wrote:
> > On Mon, May 09, 2022 at 11:52:27AM -0400, Rik van Riel wrote:
> > > Does this look like an approach that could work?
> > > 
> > > @@ -315,6 +321,9 @@ static bool klp_try_switch_task(struct task_struct *task)
> > >         case -EBUSY:    /* klp_check_and_switch_task() */
> > >                 pr_debug("%s: %s:%d is running\n",
> > >                          __func__, task->comm, task->pid);
> > > +               /* Preempt the task from the second KLP switch attempt. */
> > > +               if (klp_signals_cnt)
> > > +                       stop_one_cpu(task_cpu(task), kpatch_dummy_fn, NULL);
> > 
> > I must be missing something, how is briefly preempting a kthread
> > supposed to actually transition it?  Won't it likely go back to
> > running
> > on the CPU before the next periodic klp_transition_work_fn() check?
> > 
> That's the kind of feedback I was hoping for ;)
> 
> I looked around the code a little bit, and it seems
> that only the idle tasks can transition to another KLP
> while they are running?

Yes.

> That makes me wonder how the kworker thread that runs
> the klp switching code transitions itself...

See klp_check_and_switch_task(), in addition to checking blocked tasks,
it also checks the current task.
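
Roughly, the check looks like this (simplified sketch of
kernel/livepatch/transition.c):

```c
/*
 * Simplified sketch: a task that is currently on a CPU can only be
 * switched if it is the caller itself, since the stack of a task
 * running on another CPU cannot be walked reliably.
 */
static int klp_check_and_switch_task(struct task_struct *task, void *arg)
{
	int ret;

	if (task_curr(task) && task != current)
		return -EBUSY;		/* running elsewhere: try again later */

	ret = klp_check_stack(task, arg);	/* reliable stack check */
	if (ret)
		return ret;

	clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
	task->patch_state = klp_target_state;
	return 0;
}
```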

> Should kernel threads that can use a lot of CPU have
> something in their outer loop to transition KLPs,
> just like the idle task does?

Maybe - I suppose this is the first time we've had an issue with
CPU-bound kthreads.  I didn't know that was a thing ;-)

-- 
Josh
Re: [RFC] sched,livepatch: call stop_one_cpu in klp_check_and_switch_task
Posted by Rik van Riel 4 years ago
On Mon, 2022-05-09 at 12:17 -0700, Josh Poimboeuf wrote:
> On Mon, May 09, 2022 at 03:10:16PM -0400, Rik van Riel wrote:
> 
> 
> > Should kernel threads that can use a lot of CPU have
> > something in their outer loop to transition KLPs,
> > just like the idle task does?
> 
> Maybe - I suppose this is the first time we've had an issue with
> CPU-bound kthreads.  I didn't know that was a thing ;-)
> 
Kworkers have as much work as you want them to do, and with
things like btrfs compression that can be quite a bit.

-- 
All Rights Reversed.
Re: [RFC] sched,livepatch: call stop_one_cpu in klp_check_and_switch_task
Posted by Josh Poimboeuf 4 years ago
On Mon, May 09, 2022 at 03:49:52PM -0400, Rik van Riel wrote:
> On Mon, 2022-05-09 at 12:17 -0700, Josh Poimboeuf wrote:
> > On Mon, May 09, 2022 at 03:10:16PM -0400, Rik van Riel wrote:
> > 
> > 
> > > Should kernel threads that can use a lot of CPU have
> > > something in their outer loop to transition KLPs,
> > > just like the idle task does?
> > 
> > Maybe - I suppose this is the first time we've had an issue with
> > CPU-bound kthreads.  I didn't know that was a thing ;-)
> > 
> Kworkers have as much work as you want them to do, and with
> things like btrfs compression that can be quite a bit.

To prevent patching, it would need to be some kind of sustained CPU
activity, rather than a burst.  I guess we haven't seen that show up as
a real-world problem until now.

If you're able to identify which kthreads would be problematic, then
yeah, defining a "transition point" in their outer loops could be an
option.
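
A hedged sketch of what such a transition point could look like in a
CPU-bound kthread's outer loop, reusing the same helpers the idle loop
uses (the work loop and its helper here are hypothetical):

```c
#include <linux/kthread.h>
#include <linux/livepatch.h>

/*
 * Hypothetical CPU-bound kthread: once per iteration, at a point where
 * no to-be-patched function can be on the stack, switch ourselves to
 * the target patch state, just as do_idle() does for the idle tasks.
 */
static int busy_kthread_fn(void *data)
{
	while (!kthread_should_stop()) {
		do_one_unit_of_work(data);	/* hypothetical helper */

		if (unlikely(klp_patch_pending(current)))
			klp_update_patch_state(current);

		cond_resched();
	}
	return 0;
}
```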

We could also look at a more general approach, like stack checking from
an irq handler.  But as Petr alluded to, that would be problematic for
CONFIG_FRAME_POINTER.

We could maybe deprecate frame pointers on x86 for live patching, but I
think other arches would have a similar problem unless they were to do
something like the ORC unwinder.

-- 
Josh
Re: [RFC] sched,livepatch: call stop_one_cpu in klp_check_and_switch_task
Posted by Rik van Riel 4 years ago
On Mon, 2022-05-09 at 13:09 -0700, Josh Poimboeuf wrote:


> To prevent patching, it would need to be some kind of sustained CPU
> activity, rather than a burst.  I guess we haven't seen that show up
> as
> a real-world problem until now.
> 
It's amazing what you see when you have a few million very
busy servers. The problems you think of as "one in a million"
happen all the time :)

> If you're able to identify which kthreads would be problematic, then
> yeah, defining a "transition point" in their outer loops could be an
> option.
> 
I'm in the middle of creating some scripts to gather kpatch
output from all the systems, and then sort and count the
results so we can easily see what the main pain points are.

We'll be back with patches once we have the data in hand :)


-- 
All Rights Reversed.
Re: [RFC] sched,livepatch: call stop_one_cpu in klp_check_and_switch_task
Posted by Song Liu 4 years ago

> On May 9, 2022, at 1:09 PM, Josh Poimboeuf <jpoimboe@kernel.org> wrote:
> 
> On Mon, May 09, 2022 at 03:49:52PM -0400, Rik van Riel wrote:
>> On Mon, 2022-05-09 at 12:17 -0700, Josh Poimboeuf wrote:
>>> On Mon, May 09, 2022 at 03:10:16PM -0400, Rik van Riel wrote:
>>> 
>>> 
>>>> Should kernel threads that can use a lot of CPU have
>>>> something in their outer loop to transition KLPs,
>>>> just like the idle task does?
>>> 
>>> Maybe - I suppose this is the first time we've had an issue with
>>> CPU-bound kthreads.  I didn't know that was a thing ;-)
>>> 
>> Kworkers have as much work as you want them to do, and with
>> things like btrfs compression that can be quite a bit.
> 
> To prevent patching, it would need to be some kind of sustained CPU
> activity, rather than a burst.  I guess we haven't seen that show up as
> a real-world problem until now.

Yes, we see this issue with sustained CPU activity from a kernel
thread, which might be a bug itself. OTOH, the kernel thread does
call cond_resched(), so it is not a deadlock.

> 
> If you're able to identify which kthreads would be problematic, then
> yeah, defining a "transition point" in their outer loops could be an
> option.

cond_resched() feels like a natural “transition point” to me, and it
should solve many different variations of this problem. I agree that
adding something to cond_resched() might be too much overhead for the
system, so we are open to other suggestions.

Thanks,
Song

> 
> We could look also at a more general approach, like stack checking from
> an irq handler.  But as Petr alluded to, that would be problematic for
> CONFIG_FRAME_POINTER.
> 
> We could maybe deprecate frame pointers on x86 for live patching, but I
> think other arches would have a similar problem unless they were to do
> something like the ORC unwinder.

Re: [RFC] sched,livepatch: call stop_one_cpu in klp_check_and_switch_task
Posted by Peter Zijlstra 3 years, 12 months ago
On Tue, May 10, 2022 at 12:32:02AM +0000, Song Liu wrote:
> cond_resched() feels like a natural “transition point” to me, and it

You have to think of cond_resched() as a NOP.
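
(Roughly because, on CONFIG_PREEMPTION kernels, it compiles down to
nothing; the following is a simplified sketch of include/linux/sched.h,
and the details vary by kernel version and CONFIG_PREEMPT_DYNAMIC:)

```c
#ifdef CONFIG_PREEMPTION
/*
 * Fully preemptible kernel: preemption happens wherever it is legal,
 * so the explicit hint is (nearly) empty and cannot be used as a hook.
 */
static inline int _cond_resched(void) { return 0; }
#else
/* Voluntary preemption: this is a real reschedule point. */
int __sched _cond_resched(void)
{
	if (should_resched(0)) {
		preempt_schedule_common();
		return 1;
	}
	return 0;
}
#endif
```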

Re: [RFC] sched,livepatch: call stop_one_cpu in klp_check_and_switch_task
Posted by Song Liu 4 years ago

> On May 9, 2022, at 8:52 AM, Rik van Riel <riel@surriel.com> wrote:
> 
> After talking with Peter, this seems like a potential approach to fixing
> the issue for kernels with PREEMPT both enabled and disabled.
> 
> If this looks like a reasonable approach to people, we can run experiments
> with this patch on a few thousand systems, and compare it with the kernel
> live patch transition latencies (and number of failures) on a kernel without
> that patch.
> 
> Does this look like an approach that could work?

Hi Petr, 

IIUC, this is similar to your proposal. Could you please share your thoughts
on this?

Thanks,
Song


> 
> ---8<---
> sched,livepatch: call stop_one_cpu in klp_check_and_switch_task
> 
> If a running task fails to transition to the new kernel live patch after the
> first attempt, use the stopper thread to preempt it during subsequent attempts
> at switching to the new kernel live patch.
> 
> <INSERT EXPERIMENTAL RESULTS HERE>
> 
> Signed-off-by: Rik van Riel <riel@surriel.com>
> 
> diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> index 5d03a2ad1066..26e9e5f09822 100644
> --- a/kernel/livepatch/transition.c
> +++ b/kernel/livepatch/transition.c
> @@ -9,6 +9,7 @@
> 
> #include <linux/cpu.h>
> #include <linux/stacktrace.h>
> +#include <linux/stop_machine.h>
> #include "core.h"
> #include "patch.h"
> #include "transition.h"
> @@ -281,6 +282,11 @@ static int klp_check_and_switch_task(struct task_struct *task, void *arg)
> 	return 0;
> }
> 
> +static int kpatch_dummy_fn(void *dummy)
> +{
> +	return 0;
> +}
> +
> /*
>  * Try to safely switch a task to the target patch state.  If it's currently
>  * running, or it's sleeping on a to-be-patched or to-be-unpatched function, or
> @@ -315,6 +321,9 @@ static bool klp_try_switch_task(struct task_struct *task)
> 	case -EBUSY:	/* klp_check_and_switch_task() */
> 		pr_debug("%s: %s:%d is running\n",
> 			 __func__, task->comm, task->pid);
> +		/* Preempt the task from the second KLP switch attempt. */
> +		if (klp_signals_cnt)
> +			stop_one_cpu(task_cpu(task), kpatch_dummy_fn, NULL);
> 		break;
> 	case -EINVAL:	/* klp_check_and_switch_task() */
> 		pr_debug("%s: %s:%d has an unreliable stack\n",
>