We've fairly regularly seen livepatches which cannot transition within kpatch's timeout period due to busy vhost worker kthreads. In looking for a solution, the only answer I found was to call klp_update_patch_state() from a safe location. I tried adding this call to vhost_worker(), and it works, but it creates the potential for problems if a livepatch attempts to patch vhost_worker() itself. Without a call to klp_update_patch_state(), fully loaded vhost kthreads can never switch because vhost_worker() will always appear on the stack; with the call, these kthreads can switch but will still be running the old version of vhost_worker().

To avoid this situation I've added a new function, klp_switch_current(), which switches the current task only if its stack does not include any function being patched. This allows kthreads to safely attempt switching themselves if a patch is pending.

There is at least one downside, however. Since there's no way for the kthread to track whether it has already tried to switch for a pending patch, it can end up calling klp_switch_current() repeatedly when it can never be safely switched.

I don't know whether this is the right solution, and I'm happy to try out other suggestions. But in my testing these patches proved effective in consistently switching heavily loaded vhost kthreads almost immediately.

To: Josh Poimboeuf <jpoimboe@kernel.org>
To: Jiri Kosina <jikos@kernel.org>
To: Miroslav Benes <mbenes@suse.cz>
To: Petr Mladek <pmladek@suse.com>
To: Joe Lawrence <joe.lawrence@redhat.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
To: Jason Wang <jasowang@redhat.com>
Cc: live-patching@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org
Cc: virtualization@lists.linux-foundation.org
Cc: netdev@vger.kernel.org
Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org>
---
Seth Forshee (DigitalOcean) (2):
      livepatch: add an interface for safely switching kthreads
      vhost: check for pending livepatches from vhost worker kthreads

 drivers/vhost/vhost.c         |  4 ++++
 include/linux/livepatch.h     |  2 ++
 kernel/livepatch/transition.c | 11 +++++++++++
 3 files changed, 17 insertions(+)
---
base-commit: 5dc4c995db9eb45f6373a956eb1f69460e69e6d4
change-id: 20230120-vhost-klp-switching-ba9a3ae38b8a

Best regards,
--
Seth Forshee (DigitalOcean) <sforshee@kernel.org>
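For reference, here is a minimal sketch of the interface described above, reconstructed from the cover letter text rather than taken from the posted patches; the actual series may differ in detail.

/*
 * Sketch only -- reconstructed from the cover letter, not the real diff.
 * kernel/livepatch/transition.c
 */
void klp_switch_current(void)
{
	/* Nothing to do unless a transition is pending for this task. */
	if (!klp_patch_pending(current))
		return;

	/*
	 * klp_try_switch_task() walks the task's stack and flips its patch
	 * state only if no function being patched is found there, so a
	 * kthread can safely call this on itself from a quiescent point in
	 * its main loop -- even if vhost_worker() itself is being patched.
	 */
	klp_try_switch_task(current);
}

/* Hypothetical call site in vhost_worker()'s main loop: */
	if (need_resched())
		schedule();
	klp_switch_current();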
On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> We've fairly regularly seen livepatches which cannot transition within kpatch's timeout period due to busy vhost worker kthreads.

I have missed this detail. Miroslav told me that we have solved something similar some time ago, see https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/

Honestly, kpatch's timeout of 1 minute looks incredibly low to me. Note that the transition is tried only once per minute. It means that there are "only" 60 attempts.

Just by chance, does it help you to increase the timeout, please?

This low timeout might be useful for testing. But in practice, it does not matter if the transition lasts one hour or even longer. It takes much longer to prepare the livepatch.

Best Regards,
Petr
On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote:
> On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote:
> > We've fairly regularly seen livepatches which cannot transition within kpatch's timeout period due to busy vhost worker kthreads.
>
> I have missed this detail. Miroslav told me that we have solved something similar some time ago, see https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/

Interesting thread. I had thought about something along the lines of the original patch, but there are some ideas in there that I hadn't considered.

> Honestly, kpatch's timeout of 1 minute looks incredibly low to me. Note that the transition is tried only once per minute. It means that there are "only" 60 attempts.
>
> Just by chance, does it help you to increase the timeout, please?

To be honest my test setup reproduces the problem well enough to make KLP wait a significant time due to vhost threads, but it seldom causes it to hit kpatch's timeout.

Our system management software will try to load a patch tens of times in a day, and we've seen real-world cases where patches couldn't load within kpatch's timeout for multiple days. But I don't have such an environment readily accessible for my own testing. I can try to refine my test case and see if I can get it to that point.

> This low timeout might be useful for testing. But in practice, it does not matter if the transition lasts one hour or even longer. It takes much longer to prepare the livepatch.

Agreed. And to be clear, we cope with the fact that patches may take hours or even days to get applied in some cases. The patches I sent are just about improving the only case I've identified which has led to kpatch failing to load a patch for a day or longer.

Thanks,
Seth
On Thu 2023-01-26 15:12:35, Seth Forshee (DigitalOcean) wrote: > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote: > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote: > > > We've fairly regularaly seen liveptches which cannot transition within kpatch's > > > timeout period due to busy vhost worker kthreads. > > > > I have missed this detail. Miroslav told me that we have solved > > something similar some time ago, see > > https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/ > > Interesting thread. I had thought about something along the lines of the > original patch, but there are some ideas in there that I hadn't > considered. Could you please provide some more details about the test system? Is there anything important to make it reproducible? The following aspects come to my mind. It might require: + more workers running on the same system + have a dedicated CPU for the worker + livepatching the function called by work->fn() + running the same work again and again + huge and overloaded system > > Honestly, kpatch's timeout 1 minute looks incredible low to me. Note > > that the transition is tried only once per minute. It means that there > > are "only" 60 attempts. > > > > Just by chance, does it help you to increase the timeout, please? > > To be honest my test setup reproduces the problem well enough to make > KLP wait significant time due to vhost threads, but it seldom causes it > to hit kpatch's timeout. > > Our system management software will try to load a patch tens of times in > a day, and we've seen real-world cases where patches couldn't load > within kpatch's timeout for multiple days. But I don't have such an > environment readily accessible for my own testing. I can try to refine > my test case and see if I can get it to that point. My understanding is that you try to load the patch repeatedly but it always fails after the 1 minute timeout. It means that it always starts from the beginning (no livepatched process). Is there any chance to try it with a longer timeout, for example, one hour? It should increase the chance if there are more problematic kthreads. > > This low timeout might be useful for testing. But in practice, it does > > not matter when the transition is lasting one hour or even longer. > > It takes much longer time to prepare the livepatch. > > Agreed. And to be clear, we cope with the fact that patches may take > hours or even days to get applied in some cases. The patches I sent are > just about improving the only case I've identified which has lead to > kpatch failing to load a patch for a day or longer. If it is acceptable to wait hours or even days then the 1 minute timeout is quite contra-productive. We actually do not use any timeout at all in livepatches provided by SUSE. Best Regards, Petr
On Fri, Jan 27, 2023 at 12:19:03PM +0100, Petr Mladek wrote: > On Thu 2023-01-26 15:12:35, Seth Forshee (DigitalOcean) wrote: > > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote: > > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote: > > > > We've fairly regularaly seen liveptches which cannot transition within kpatch's > > > > timeout period due to busy vhost worker kthreads. > > > > > > I have missed this detail. Miroslav told me that we have solved > > > something similar some time ago, see > > > https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/ > > > > Interesting thread. I had thought about something along the lines of the > > original patch, but there are some ideas in there that I hadn't > > considered. > > Could you please provide some more details about the test system? > Is there anything important to make it reproducible? > > The following aspects come to my mind. It might require: > > + more workers running on the same system > + have a dedicated CPU for the worker > + livepatching the function called by work->fn() > + running the same work again and again > + huge and overloaded system I'm isolating a CPU, starting a KVM guest with a virtio-net device, and setting the affinity of the vhost worker thread to only the isolated CPU. Thus the vhost-worker thread has a dedicated CPU, as you say. (I'll note that in real-world cases the systems have many CPUs, and while the vhost threads aren't each given a dedicated CPU, if the system load is light enough a thread can end up with exlusive use of a CPU). Then all I do is run iperf between the guest and the host with several parallel streams. I seem to be hitting the limits of the guest vCPUs before the vhost thread is fully saturated, as this gets it to about 90% CPU utilization by the vhost thread. > > > Honestly, kpatch's timeout 1 minute looks incredible low to me. Note > > > that the transition is tried only once per minute. It means that there > > > are "only" 60 attempts. > > > > > > Just by chance, does it help you to increase the timeout, please? > > > > To be honest my test setup reproduces the problem well enough to make > > KLP wait significant time due to vhost threads, but it seldom causes it > > to hit kpatch's timeout. > > > > Our system management software will try to load a patch tens of times in > > a day, and we've seen real-world cases where patches couldn't load > > within kpatch's timeout for multiple days. But I don't have such an > > environment readily accessible for my own testing. I can try to refine > > my test case and see if I can get it to that point. > > My understanding is that you try to load the patch repeatedly but > it always fails after the 1 minute timeout. It means that it always > starts from the beginning (no livepatched process). > > Is there any chance to try it with a longer timeout, for example, one > hour? It should increase the chance if there are more problematic kthreads. Yes, I can try it. But I think I already mentioned that we are somewhat limited by our system management software and how livepatch loading is currently implemented there. I'd need to consult with others about how long we could make the timeout, but 1 hour is definitely too long under our current system. > > > This low timeout might be useful for testing. But in practice, it does > > > not matter when the transition is lasting one hour or even longer. > > > It takes much longer time to prepare the livepatch. > > > > Agreed. 
And to be clear, we cope with the fact that patches may take > > hours or even days to get applied in some cases. The patches I sent are > > just about improving the only case I've identified which has lead to > > kpatch failing to load a patch for a day or longer. > > If it is acceptable to wait hours or even days then the 1 minute > timeout is quite contra-productive. We actually do not use any timeout > at all in livepatches provided by SUSE. I agree, though I'd still prefer it didn't take days. Based on this discussion I do plan to look at changing how we load livepatches to make this possible, but it will take some time. Thanks, Seth
On Fri 2023-01-27 08:57:40, Seth Forshee wrote: > On Fri, Jan 27, 2023 at 12:19:03PM +0100, Petr Mladek wrote: > > Could you please provide some more details about the test system? > > Is there anything important to make it reproducible? > > > > The following aspects come to my mind. It might require: > > > > + more workers running on the same system > > + have a dedicated CPU for the worker > > + livepatching the function called by work->fn() > > + running the same work again and again > > + huge and overloaded system > > I'm isolating a CPU, starting a KVM guest with a virtio-net device, and > setting the affinity of the vhost worker thread to only the isolated > CPU. Thus the vhost-worker thread has a dedicated CPU, as you say. (I'll > note that in real-world cases the systems have many CPUs, and while the > vhost threads aren't each given a dedicated CPU, if the system load is > light enough a thread can end up with exlusive use of a CPU). > > Then all I do is run iperf between the guest and the host with several > parallel streams. I seem to be hitting the limits of the guest vCPUs > before the vhost thread is fully saturated, as this gets it to about 90% > CPU utilization by the vhost thread. Thanks for the info! > > > > Honestly, kpatch's timeout 1 minute looks incredible low to me. Note > > > > that the transition is tried only once per minute. It means that there > > > > are "only" 60 attempts. > > > > > > > > Just by chance, does it help you to increase the timeout, please? > > > > > > To be honest my test setup reproduces the problem well enough to make > > > KLP wait significant time due to vhost threads, but it seldom causes it > > > to hit kpatch's timeout. > > > > > > Our system management software will try to load a patch tens of times in > > > a day, and we've seen real-world cases where patches couldn't load > > > within kpatch's timeout for multiple days. But I don't have such an > > > environment readily accessible for my own testing. I can try to refine > > > my test case and see if I can get it to that point. > > > > My understanding is that you try to load the patch repeatedly but > > it always fails after the 1 minute timeout. It means that it always > > starts from the beginning (no livepatched process). > > > > Is there any chance to try it with a longer timeout, for example, one > > hour? It should increase the chance if there are more problematic kthreads. > > Yes, I can try it. But I think I already mentioned that we are somewhat > limited by our system management software and how livepatch loading is > currently implemented there. I'd need to consult with others about how > long we could make the timeout, but 1 hour is definitely too long under > our current system. Another possibility is to do not wait at all. SUSE livepatch packages load the livepatch module, remove not longer used livepatch modules and are done with it. Note that the module is loaded quickly. The transition is finished asynchronously using workqueues. Of course, there is a risk that the transition will never finish. It would prevent loading any newer livepatch. But it might be handled when the the newer livepatch is loaded. It might revert the pending transition, ... Of course, it would be great to make the transition more reliable. It would be nice to add the hook into the scheduler as discussed in another branch of this thread. But it might bring another problems, for example, affect the system performance. Well, it probably can be optimized or ratelimited. 
Anyway, I wanted to say that there is a way to get rid of the timeout completely. Best Regards, Petr
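For context, the kernel side already retries the transition indefinitely; the one-minute limit is purely kpatch userspace policy. The retry loop looks roughly like this (simplified paraphrase of kernel/livepatch/transition.c, not a verbatim quote):

/* Simplified from kernel/livepatch/transition.c */
static void klp_transition_work_fn(struct work_struct *work)
{
	mutex_lock(&klp_mutex);
	if (klp_transition_patch)
		klp_try_complete_transition();
	mutex_unlock(&klp_mutex);
}
static DECLARE_DELAYED_WORK(klp_transition_work, klp_transition_work_fn);

/* ...and at the end of klp_try_complete_transition(): */
	if (!complete) {
		/* Some tasks could not be switched; retry in ~1s, with no
		 * in-kernel deadline. */
		schedule_delayed_work(&klp_transition_work,
				      round_jiffies_relative(HZ));
		return;
	}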
On Thu, Jan 26, 2023 at 03:12:35PM -0600, Seth Forshee (DigitalOcean) wrote: > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote: > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote: > > > We've fairly regularaly seen liveptches which cannot transition within kpatch's > > > timeout period due to busy vhost worker kthreads. > > > > I have missed this detail. Miroslav told me that we have solved > > something similar some time ago, see > > https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/ > > Interesting thread. I had thought about something along the lines of the > original patch, but there are some ideas in there that I hadn't > considered. Here's another idea, have we considered this? Have livepatch set TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is set. Not sure how scheduler folks would feel about that ;-) -- Josh
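A rough illustration of the idea above; the names and placement are hypothetical, and the replies below explain why it is not quite this simple:

/* Hypothetical sketch only -- not an existing hook. Somewhere on the
 * scheduler path, once the task's stack is stable: */
static inline void sched_klp_try_switch(struct task_struct *p)
{
	if (unlikely(test_tsk_thread_flag(p, TIF_PATCH_PENDING)))
		klp_try_switch_task(p);	/* unwind p's stack; switch if safe */
}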
On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote: > On Thu, Jan 26, 2023 at 03:12:35PM -0600, Seth Forshee (DigitalOcean) wrote: > > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote: > > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote: > > > > We've fairly regularaly seen liveptches which cannot transition within kpatch's > > > > timeout period due to busy vhost worker kthreads. > > > > > > I have missed this detail. Miroslav told me that we have solved > > > something similar some time ago, see > > > https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/ > > > > Interesting thread. I had thought about something along the lines of the > > original patch, but there are some ideas in there that I hadn't > > considered. > > Here's another idea, have we considered this? Have livepatch set > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is > set. > > Not sure how scheduler folks would feel about that ;-) So, let me try and page all that back in.... :-) KLP needs to unwind the stack to see if any of the patched functions are active, if not, flip task to new set. Unwinding the stack of a task can be done when: - task is inactive (stable reg and stack) -- provided it stays inactive while unwinding etc.. - task is current (guarantees stack doesn't dip below where we started due to being busy on top etc..) Can NOT be done from interrupt context, because can hit in the middle of setting up stack frames etc.. The issue at hand is that some tasks run for a long time without passing through an explicit check. The thread above tried sticking something in cond_resched() which is a problem for PREEMPT=y since cond_resched() is a no-op. Preempt notifiers were raised, and those would actually be nice, except you can only install a notifier on current and you need some memory allocated per task, which makes it less than ideal. Plus ... ... putting something in finish_task_switch() wouldn't be the end of the world I suppose, but then you still need to force schedule the task -- imagine it being the only runnable task on the CPU, there's nothing going to make it actually switch. Which then leads me to suggest something daft like this.. does that help? diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c index f1b25ec581e0..06746095a724 100644 --- a/kernel/livepatch/transition.c +++ b/kernel/livepatch/transition.c @@ -9,6 +9,7 @@ #include <linux/cpu.h> #include <linux/stacktrace.h> +#include <linux/stop_machine.h> #include "core.h" #include "patch.h" #include "transition.h" @@ -334,6 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task) return !ret; } +static int __stop_try_switch(void *arg) +{ + return klp_try_switch_task(arg) ? 0 : -EBUSY; +} + +static bool klp_try_switch_task_harder(struct task_struct *task) +{ + return !stop_one_cpu(task_cpu(task), __stop_try_switch, task); +} + /* * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set. * Kthreads with TIF_PATCH_PENDING set are woken up.
On Fri, Jan 27, 2023 at 11:37:02AM +0100, Peter Zijlstra wrote: > On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote: > > On Thu, Jan 26, 2023 at 03:12:35PM -0600, Seth Forshee (DigitalOcean) wrote: > > > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote: > > > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote: > > > > > We've fairly regularaly seen liveptches which cannot transition within kpatch's > > > > > timeout period due to busy vhost worker kthreads. > > > > > > > > I have missed this detail. Miroslav told me that we have solved > > > > something similar some time ago, see > > > > https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/ > > > > > > Interesting thread. I had thought about something along the lines of the > > > original patch, but there are some ideas in there that I hadn't > > > considered. > > > > Here's another idea, have we considered this? Have livepatch set > > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then > > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is > > set. > > > > Not sure how scheduler folks would feel about that ;-) > > So, let me try and page all that back in.... :-) > > KLP needs to unwind the stack to see if any of the patched functions are > active, if not, flip task to new set. > > Unwinding the stack of a task can be done when: > > - task is inactive (stable reg and stack) -- provided it stays inactive > while unwinding etc.. > > - task is current (guarantees stack doesn't dip below where we started > due to being busy on top etc..) > > Can NOT be done from interrupt context, because can hit in the middle of > setting up stack frames etc.. > > The issue at hand is that some tasks run for a long time without passing > through an explicit check. > > The thread above tried sticking something in cond_resched() which is a > problem for PREEMPT=y since cond_resched() is a no-op. > > Preempt notifiers were raised, and those would actually be nice, except > you can only install a notifier on current and you need some memory > allocated per task, which makes it less than ideal. Plus ... > > ... putting something in finish_task_switch() wouldn't be the end of the > world I suppose, but then you still need to force schedule the task -- > imagine it being the only runnable task on the CPU, there's nothing > going to make it actually switch. > > Which then leads me to suggest something daft like this.. does that > help? The changes below are working well for me. The context has a read lock on tasklist_lock so it can't sleep, so I'm using stop_one_cpu_nowait() with per-CPU state. Based on Josh's comments it sounds like the klp_have_reliable_stack() check probably isn't quite right, and we might want to add something else for PREEMPT+!ORC. But I wanted to go ahead and see if this approach seems reasonable to everyone. Thanks, Seth diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c index f1b25ec581e0..9f3898f02828 100644 --- a/kernel/livepatch/transition.c +++ b/kernel/livepatch/transition.c @@ -9,6 +9,7 @@ #include <linux/cpu.h> #include <linux/stacktrace.h> +#include <linux/stop_machine.h> #include "core.h" #include "patch.h" #include "transition.h" @@ -334,9 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task) return !ret; } +static int __try_switch_kthread(void *arg) +{ + return klp_try_switch_task(arg) ? 
0 : -EBUSY; +} + +static DEFINE_PER_CPU(struct cpu_stop_work, klp_stop_work); + /* * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set. - * Kthreads with TIF_PATCH_PENDING set are woken up. + * Kthreads with TIF_PATCH_PENDING set are preempted or woken up. */ static void klp_send_signals(void) { @@ -357,11 +365,22 @@ static void klp_send_signals(void) * would be meaningless. It is not serious though. */ if (task->flags & PF_KTHREAD) { - /* - * Wake up a kthread which sleeps interruptedly and - * still has not been migrated. - */ - wake_up_state(task, TASK_INTERRUPTIBLE); + if (task_curr(task) && klp_have_reliable_stack()) { + /* + * kthread is currently running on a CPU; try + * to preempt it. + */ + stop_one_cpu_nowait(task_cpu(task), + __try_switch_kthread, + task, + this_cpu_ptr(&klp_stop_work)); + } else { + /* + * Wake up a kthread which sleeps interruptedly + * and still has not been migrated. + */ + wake_up_state(task, TASK_INTERRUPTIBLE); + } } else { /* * Send fake signal to all non-kthread tasks which are
On Fri, Jan 27, 2023 at 11:37:02AM +0100, Peter Zijlstra wrote: > On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote: > > Here's another idea, have we considered this? Have livepatch set > > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then > > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is > > set. > > > > Not sure how scheduler folks would feel about that ;-) Hmmmm, with preemption I guess the above doesn't work for kthreads calling cond_resched() instead of what vhost_worker() does (explicit need_resched/schedule). > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c > index f1b25ec581e0..06746095a724 100644 > --- a/kernel/livepatch/transition.c > +++ b/kernel/livepatch/transition.c > @@ -9,6 +9,7 @@ > > #include <linux/cpu.h> > #include <linux/stacktrace.h> > +#include <linux/stop_machine.h> > #include "core.h" > #include "patch.h" > #include "transition.h" > @@ -334,6 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task) > return !ret; > } > > +static int __stop_try_switch(void *arg) > +{ > + return klp_try_switch_task(arg) ? 0 : -EBUSY; > +} > + > +static bool klp_try_switch_task_harder(struct task_struct *task) > +{ > + return !stop_one_cpu(task_cpu(task), __stop_try_switch, task); > +} > + > /* > * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set. > * Kthreads with TIF_PATCH_PENDING set are woken up. Doesn't work for PREEMPT+!ORC. Non-ORC reliable unwinders will detect preemption on the stack and automatically report unreliable. -- Josh
On Fri, Jan 27, 2023 at 08:52:38AM -0800, Josh Poimboeuf wrote: > On Fri, Jan 27, 2023 at 11:37:02AM +0100, Peter Zijlstra wrote: > > On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote: > > > Here's another idea, have we considered this? Have livepatch set > > > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then > > > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is > > > set. > > > > > > Not sure how scheduler folks would feel about that ;-) > > Hmmmm, with preemption I guess the above doesn't work for kthreads > calling cond_resched() instead of what vhost_worker() does (explicit > need_resched/schedule). Though I guess we could hook into cond_resched() too if we make it a non-NOP for PREEMPT+LIVEPATCH? -- Josh
On Fri, Jan 27, 2023 at 09:09:48AM -0800, Josh Poimboeuf wrote: > On Fri, Jan 27, 2023 at 08:52:38AM -0800, Josh Poimboeuf wrote: > > On Fri, Jan 27, 2023 at 11:37:02AM +0100, Peter Zijlstra wrote: > > > On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote: > > > > Here's another idea, have we considered this? Have livepatch set > > > > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then > > > > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is > > > > set. > > > > > > > > Not sure how scheduler folks would feel about that ;-) > > > > Hmmmm, with preemption I guess the above doesn't work for kthreads > > calling cond_resched() instead of what vhost_worker() does (explicit > > need_resched/schedule). > > Though I guess we could hook into cond_resched() too if we make it a > non-NOP for PREEMPT+LIVEPATCH? I discussed this idea with Peter on IRC and he didn't immediately shoot it down. It compiles... Thoughts? diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h index 293e29960c6e..937816d0867c 100644 --- a/include/linux/livepatch.h +++ b/include/linux/livepatch.h @@ -14,6 +14,8 @@ #include <linux/completion.h> #include <linux/list.h> +#include <linux/livepatch_sched.h> + #if IS_ENABLED(CONFIG_LIVEPATCH) /* task patch states */ diff --git a/include/linux/livepatch_sched.h b/include/linux/livepatch_sched.h new file mode 100644 index 000000000000..3237bc6a5b01 --- /dev/null +++ b/include/linux/livepatch_sched.h @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _LINUX_LIVEPATCH_SCHED_H_ +#define _LINUX_LIVEPATCH_SCHED_H_ + +#include <linux/static_call_types.h> + +#ifdef CONFIG_LIVEPATCH + +void __klp_sched_try_switch(void); +DECLARE_STATIC_CALL(klp_sched_try_switch, __klp_sched_try_switch); + +static __always_inline void klp_sched_try_switch(void) +{ + //FIXME need static_call_cond_mod() ? + static_call_mod(klp_sched_try_switch)(); +} + +#else /* !CONFIG_LIVEPATCH */ +static inline void klp_sched_try_switch(void) {} +#endif /* CONFIG_LIVEPATCH */ + +#endif /* _LINUX_LIVEPATCH_SCHED_H_ */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 4df2b3e76b30..fbcd3acca25c 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -36,6 +36,7 @@ #include <linux/seqlock.h> #include <linux/kcsan.h> #include <linux/rv.h> +#include <linux/livepatch_sched.h> #include <asm/kmap_size.h> /* task_struct member predeclarations (sorted alphabetically): */ @@ -2074,6 +2075,9 @@ DECLARE_STATIC_CALL(cond_resched, __cond_resched); static __always_inline int _cond_resched(void) { + //FIXME this is a bit redundant with preemption disabled + klp_sched_try_switch(); + return static_call_mod(cond_resched)(); } diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c index f1b25ec581e0..042e34c9389c 100644 --- a/kernel/livepatch/transition.c +++ b/kernel/livepatch/transition.c @@ -9,6 +9,7 @@ #include <linux/cpu.h> #include <linux/stacktrace.h> +#include <linux/static_call.h> #include "core.h" #include "patch.h" #include "transition.h" @@ -24,6 +25,9 @@ static int klp_target_state = KLP_UNDEFINED; static unsigned int klp_signals_cnt; +DEFINE_STATIC_CALL_NULL(klp_sched_try_switch, __klp_sched_try_switch); +EXPORT_STATIC_CALL_TRAMP(klp_sched_try_switch); + /* * This work can be performed periodically to finish patching or unpatching any * "straggler" tasks which failed to transition in the first attempt. 
@@ -76,6 +80,8 @@ static void klp_complete_transition(void) klp_transition_patch->mod->name, klp_target_state == KLP_PATCHED ? "patching" : "unpatching"); + static_call_update(klp_sched_try_switch, NULL); + if (klp_transition_patch->replace && klp_target_state == KLP_PATCHED) { klp_unpatch_replaced_patches(klp_transition_patch); klp_discard_nops(klp_transition_patch); @@ -256,7 +262,8 @@ static int klp_check_stack(struct task_struct *task, const char **oldname) klp_for_each_func(obj, func) { ret = klp_check_stack_func(func, entries, nr_entries); if (ret) { - *oldname = func->old_name; + if (oldname) + *oldname = func->old_name; return -EADDRINUSE; } } @@ -307,7 +314,11 @@ static bool klp_try_switch_task(struct task_struct *task) * functions. If all goes well, switch the task to the target patch * state. */ - ret = task_call_func(task, klp_check_and_switch_task, &old_name); + if (task == current) + ret = klp_check_and_switch_task(current, &old_name); + else + ret = task_call_func(task, klp_check_and_switch_task, &old_name); + switch (ret) { case 0: /* success */ break; @@ -334,6 +345,15 @@ static bool klp_try_switch_task(struct task_struct *task) return !ret; } +void __klp_sched_try_switch(void) +{ + if (likely(!klp_patch_pending(current))) + return; + + //FIXME locking + klp_try_switch_task(current); +} + /* * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set. * Kthreads with TIF_PATCH_PENDING set are woken up. @@ -401,8 +421,10 @@ void klp_try_complete_transition(void) */ read_lock(&tasklist_lock); for_each_process_thread(g, task) - if (!klp_try_switch_task(task)) + if (!klp_try_switch_task(task)) { + set_tsk_need_resched(task); complete = false; + } read_unlock(&tasklist_lock); /* @@ -413,6 +435,7 @@ void klp_try_complete_transition(void) task = idle_task(cpu); if (cpu_online(cpu)) { if (!klp_try_switch_task(task)) { + set_tsk_need_resched(task); complete = false; /* Make idle task go through the main loop. */ wake_up_if_idle(cpu); @@ -492,6 +515,8 @@ void klp_start_transition(void) set_tsk_thread_flag(task, TIF_PATCH_PENDING); } + static_call_update(klp_sched_try_switch, __klp_sched_try_switch); + klp_signals_cnt = 0; } diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 3a0ef2fefbd5..01e32d242ef6 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -6506,6 +6506,8 @@ static void __sched notrace __schedule(unsigned int sched_mode) struct rq *rq; int cpu; + klp_sched_try_switch(); + cpu = smp_processor_id(); rq = cpu_rq(cpu); prev = rq->curr; @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched); static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched); int __sched dynamic_cond_resched(void) { - if (!static_branch_unlikely(&sk_dynamic_cond_resched)) + if (!static_branch_unlikely(&sk_dynamic_cond_resched)) { + klp_sched_try_switch(); return 0; + } return __cond_resched(); } EXPORT_SYMBOL(dynamic_cond_resched); diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c index e9ef66be2870..27ba93930584 100644 --- a/kernel/sched/idle.c +++ b/kernel/sched/idle.c @@ -308,9 +308,6 @@ static void do_idle(void) */ flush_smp_call_function_queue(); schedule_idle(); - - if (unlikely(klp_patch_pending(current))) - klp_update_patch_state(current); } bool cpu_in_idle(unsigned long pc)
On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote: > diff --git a/include/linux/sched.h b/include/linux/sched.h > index 4df2b3e76b30..fbcd3acca25c 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -36,6 +36,7 @@ > #include <linux/seqlock.h> > #include <linux/kcsan.h> > #include <linux/rv.h> > +#include <linux/livepatch_sched.h> > #include <asm/kmap_size.h> > > /* task_struct member predeclarations (sorted alphabetically): */ > @@ -2074,6 +2075,9 @@ DECLARE_STATIC_CALL(cond_resched, __cond_resched); > > static __always_inline int _cond_resched(void) > { > + //FIXME this is a bit redundant with preemption disabled > + klp_sched_try_switch(); > + > return static_call_mod(cond_resched)(); > } Right, I was thinking you'd do something like: static_call_update(cond_resched, klp_cond_resched); With: static int klp_cond_resched(void) { klp_try_switch_task(current); return __cond_resched(); } That would force cond_resched() into doing the transition thing, irrespective of the preemption mode at hand. And then, when KLP be done, re-run sched_dynamic_update() to reset it to whatever it ought to be. > @@ -401,8 +421,10 @@ void klp_try_complete_transition(void) > */ > read_lock(&tasklist_lock); > for_each_process_thread(g, task) > - if (!klp_try_switch_task(task)) > + if (!klp_try_switch_task(task)) { > + set_tsk_need_resched(task); > complete = false; > + } Yeah, no, that's broken -- preemption state live in more than just the TIF bit. > read_unlock(&tasklist_lock); > > /* > @@ -413,6 +435,7 @@ void klp_try_complete_transition(void) > task = idle_task(cpu); > if (cpu_online(cpu)) { > if (!klp_try_switch_task(task)) { > + set_tsk_need_resched(task); > complete = false; > /* Make idle task go through the main loop. */ > wake_up_if_idle(cpu); Idem. Also, I don't see the point of this and the __schedule() hook here: > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > index 3a0ef2fefbd5..01e32d242ef6 100644 > --- a/kernel/sched/core.c > +++ b/kernel/sched/core.c > @@ -6506,6 +6506,8 @@ static void __sched notrace __schedule(unsigned int sched_mode) > struct rq *rq; > int cpu; > > + klp_sched_try_switch(); > + > cpu = smp_processor_id(); > rq = cpu_rq(cpu); > prev = rq->curr; If it schedules, you'll get it with the normal switcheroo, because it'll be inactive some of the time. If it doesn't schedule, it'll run into cond_resched(). > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched); > static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched); > int __sched dynamic_cond_resched(void) > { > - if (!static_branch_unlikely(&sk_dynamic_cond_resched)) > + if (!static_branch_unlikely(&sk_dynamic_cond_resched)) { > + klp_sched_try_switch(); > return 0; > + } > return __cond_resched(); > } > EXPORT_SYMBOL(dynamic_cond_resched); I would make the klp_sched_try_switch() not depend on sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed pass through __schedule(). But you'll probably want to check with Mark here, this all might generate crap code on arm64. Both ways this seems to make KLP 'depend' (or at least work lots better) when PREEMPT_DYNAMIC=y. Do we want a PREEMPT_DYNAMIC=n fallback for _cond_resched() too?
On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote: > Right, I was thinking you'd do something like: > > static_call_update(cond_resched, klp_cond_resched); > > With: > > static int klp_cond_resched(void) > { > klp_try_switch_task(current); > return __cond_resched(); > } Something like this? diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index cbe72bfd2f1f..424c0c939f57 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -363,8 +363,7 @@ static int vhost_worker(void *data) kcov_remote_start_common(dev->kcov_handle); work->fn(work); kcov_remote_stop(); - if (need_resched()) - schedule(); + cond_resched(); } } kthread_unuse_mm(dev->mm); diff --git a/include/linux/livepatch.h b/include/linux/livepatch.h index 293e29960c6e..937816d0867c 100644 --- a/include/linux/livepatch.h +++ b/include/linux/livepatch.h @@ -14,6 +14,8 @@ #include <linux/completion.h> #include <linux/list.h> +#include <linux/livepatch_sched.h> + #if IS_ENABLED(CONFIG_LIVEPATCH) /* task patch states */ diff --git a/include/linux/livepatch_sched.h b/include/linux/livepatch_sched.h new file mode 100644 index 000000000000..3237bc6a5b01 --- /dev/null +++ b/include/linux/livepatch_sched.h @@ -0,0 +1,22 @@ +/* SPDX-License-Identifier: GPL-2.0-or-later */ +#ifndef _LINUX_LIVEPATCH_SCHED_H_ +#define _LINUX_LIVEPATCH_SCHED_H_ + +#include <linux/static_call_types.h> + +#ifdef CONFIG_LIVEPATCH + +void __klp_sched_try_switch(void); +DECLARE_STATIC_CALL(klp_sched_try_switch, __klp_sched_try_switch); + +static __always_inline void klp_sched_try_switch(void) +{ + //FIXME need static_call_cond_mod() ? + static_call_mod(klp_sched_try_switch)(); +} + +#else /* !CONFIG_LIVEPATCH */ +static inline void klp_sched_try_switch(void) {} +#endif /* CONFIG_LIVEPATCH */ + +#endif /* _LINUX_LIVEPATCH_SCHED_H_ */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 4df2b3e76b30..a7acf9ae9b90 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -36,6 +36,7 @@ #include <linux/seqlock.h> #include <linux/kcsan.h> #include <linux/rv.h> +#include <linux/livepatch_sched.h> #include <asm/kmap_size.h> /* task_struct member predeclarations (sorted alphabetically): */ @@ -2077,11 +2078,15 @@ static __always_inline int _cond_resched(void) return static_call_mod(cond_resched)(); } +void sched_dynamic_klp_enable(void); +void sched_dynamic_klp_disable(void); + #elif defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY) extern int dynamic_cond_resched(void); static __always_inline int _cond_resched(void) { + klp_sched_try_switch(); return dynamic_cond_resched(); } @@ -2089,6 +2094,7 @@ static __always_inline int _cond_resched(void) static inline int _cond_resched(void) { + klp_sched_try_switch(); return __cond_resched(); } @@ -2096,7 +2102,10 @@ static inline int _cond_resched(void) #else -static inline int _cond_resched(void) { return 0; } +static inline int _cond_resched(void) { + klp_sched_try_switch(); + return 0; +} #endif /* !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC) */ diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c index f1b25ec581e0..3cc4e0a24dc6 100644 --- a/kernel/livepatch/transition.c +++ b/kernel/livepatch/transition.c @@ -9,6 +9,7 @@ #include <linux/cpu.h> #include <linux/stacktrace.h> +#include <linux/static_call.h> #include "core.h" #include "patch.h" #include "transition.h" @@ -24,6 +25,9 @@ static int klp_target_state = KLP_UNDEFINED; static unsigned int klp_signals_cnt; +DEFINE_STATIC_CALL_NULL(klp_sched_try_switch, 
__klp_sched_try_switch); +EXPORT_STATIC_CALL_TRAMP(klp_sched_try_switch); + /* * This work can be performed periodically to finish patching or unpatching any * "straggler" tasks which failed to transition in the first attempt. @@ -61,6 +65,28 @@ static void klp_synchronize_transition(void) schedule_on_each_cpu(klp_sync); } +/* + * Enable the klp hooks in cond_resched() while livepatching is in progress. + * This helps CPU-bound kthreads get patched. + */ +static void klp_sched_hook_enable(void) +{ +#if defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL) + sched_dynamic_klp_enable(); +#else + static_call_update(klp_sched_try_switch, __klp_sched_try_switch); +#endif +} + +static void klp_sched_hook_disable(void) +{ +#if defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL) + sched_dynamic_klp_disable(); +#else + static_call_update(klp_sched_try_switch, NULL); +#endif +} + /* * The transition to the target patch state is complete. Clean up the data * structures. @@ -76,6 +102,8 @@ static void klp_complete_transition(void) klp_transition_patch->mod->name, klp_target_state == KLP_PATCHED ? "patching" : "unpatching"); + klp_sched_hook_disable(); + if (klp_transition_patch->replace && klp_target_state == KLP_PATCHED) { klp_unpatch_replaced_patches(klp_transition_patch); klp_discard_nops(klp_transition_patch); @@ -307,7 +335,11 @@ static bool klp_try_switch_task(struct task_struct *task) * functions. If all goes well, switch the task to the target patch * state. */ - ret = task_call_func(task, klp_check_and_switch_task, &old_name); + if (task == current) + ret = klp_check_and_switch_task(current, &old_name); + else + ret = task_call_func(task, klp_check_and_switch_task, &old_name); + switch (ret) { case 0: /* success */ break; @@ -334,6 +366,15 @@ static bool klp_try_switch_task(struct task_struct *task) return !ret; } +void __klp_sched_try_switch(void) +{ + if (likely(!klp_patch_pending(current))) + return; + + //FIXME locking + klp_try_switch_task(current); +} + /* * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set. * Kthreads with TIF_PATCH_PENDING set are woken up. @@ -492,6 +533,8 @@ void klp_start_transition(void) set_tsk_thread_flag(task, TIF_PATCH_PENDING); } + klp_sched_hook_enable(); + klp_signals_cnt = 0; } diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 3a0ef2fefbd5..4fbf70b05576 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -8648,13 +8648,16 @@ int sched_dynamic_mode(const char *str) #error "Unsupported PREEMPT_DYNAMIC mechanism" #endif +static bool klp_override; + void sched_dynamic_update(int mode) { /* * Avoid {NONE,VOLUNTARY} -> FULL transitions from ever ending up in * the ZERO state, which is invalid. 
*/ - preempt_dynamic_enable(cond_resched); + if (!klp_override) + preempt_dynamic_enable(cond_resched); preempt_dynamic_enable(might_resched); preempt_dynamic_enable(preempt_schedule); preempt_dynamic_enable(preempt_schedule_notrace); @@ -8662,16 +8665,19 @@ void sched_dynamic_update(int mode) switch (mode) { case preempt_dynamic_none: - preempt_dynamic_enable(cond_resched); + if (!klp_override) + preempt_dynamic_enable(cond_resched); preempt_dynamic_disable(might_resched); preempt_dynamic_disable(preempt_schedule); preempt_dynamic_disable(preempt_schedule_notrace); preempt_dynamic_disable(irqentry_exit_cond_resched); + //FIXME avoid printk for klp restore pr_info("Dynamic Preempt: none\n"); break; case preempt_dynamic_voluntary: - preempt_dynamic_enable(cond_resched); + if (!klp_override) + preempt_dynamic_enable(cond_resched); preempt_dynamic_enable(might_resched); preempt_dynamic_disable(preempt_schedule); preempt_dynamic_disable(preempt_schedule_notrace); @@ -8680,7 +8686,8 @@ void sched_dynamic_update(int mode) break; case preempt_dynamic_full: - preempt_dynamic_disable(cond_resched); + if (!klp_override) + preempt_dynamic_disable(cond_resched); preempt_dynamic_disable(might_resched); preempt_dynamic_enable(preempt_schedule); preempt_dynamic_enable(preempt_schedule_notrace); @@ -8692,6 +8699,28 @@ void sched_dynamic_update(int mode) preempt_dynamic_mode = mode; } +#ifdef CONFIG_HAVE_PREEMPT_DYNAMIC_CALL +static int klp_cond_resched(void) +{ + __klp_sched_try_switch(); + return __cond_resched(); +} + +void sched_dynamic_klp_enable(void) +{ + //FIXME locking + klp_override = true; + static_call_update(cond_resched, klp_cond_resched); +} + +void sched_dynamic_klp_disable(void) +{ + //FIXME locking + klp_override = false; + sched_dynamic_update(preempt_dynamic_mode); +} +#endif /* CONFIG_HAVE_PREEMPT_DYNAMIC_CALL*/ + static int __init setup_preempt_mode(char *str) { int mode = sched_dynamic_mode(str);
On Mon, Jan 30, 2023 at 11:59:30AM -0800, Josh Poimboeuf wrote: > @@ -8662,16 +8665,19 @@ void sched_dynamic_update(int mode) > > switch (mode) { > case preempt_dynamic_none: > - preempt_dynamic_enable(cond_resched); > + if (!klp_override) > + preempt_dynamic_enable(cond_resched); > preempt_dynamic_disable(might_resched); > preempt_dynamic_disable(preempt_schedule); > preempt_dynamic_disable(preempt_schedule_notrace); > preempt_dynamic_disable(irqentry_exit_cond_resched); > + //FIXME avoid printk for klp restore if (mode != preempt_dynamic_mode) > pr_info("Dynamic Preempt: none\n"); > break; > > case preempt_dynamic_voluntary: > - preempt_dynamic_enable(cond_resched); > + if (!klp_override) > + preempt_dynamic_enable(cond_resched); > preempt_dynamic_enable(might_resched); > preempt_dynamic_disable(preempt_schedule); > preempt_dynamic_disable(preempt_schedule_notrace);
On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote: > On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote: > > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched); > > static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched); > > int __sched dynamic_cond_resched(void) > > { > > - if (!static_branch_unlikely(&sk_dynamic_cond_resched)) > > + if (!static_branch_unlikely(&sk_dynamic_cond_resched)) { > > + klp_sched_try_switch(); > > return 0; > > + } > > return __cond_resched(); > > } > > EXPORT_SYMBOL(dynamic_cond_resched); > > I would make the klp_sched_try_switch() not depend on > sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed > pass through __schedule(). > > But you'll probably want to check with Mark here, this all might > generate crap code on arm64. IIUC here klp_sched_try_switch() is a static call, so on arm64 this'll generate at least a load, a conditional branch, and an indirect branch. That's not ideal, but I'd have to benchmark it to find out whether it's a significant overhead relative to the baseline of PREEMPT_DYNAMIC. For arm64 it'd be a bit nicer to have another static key check, and a call to __klp_sched_try_switch(). That way the static key check gets turned into a NOP in the common case, and the call to __klp_sched_try_switch() can be a direct call (potentially a tail-call if we made it return 0). Thanks, Mark.
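Something along these lines would implement the pattern Mark describes; the names here are illustrative, not existing kernel symbols:

/* Illustrative sketch: the fast path is a single NOP'd branch on arm64,
 * the slow path is a direct (potentially tail) call. */
DEFINE_STATIC_KEY_FALSE(klp_sched_try_switch_key);

static __always_inline void klp_sched_try_switch(void)
{
	if (static_branch_unlikely(&klp_sched_try_switch_key))
		__klp_sched_try_switch();
}

/* toggled around a livepatch transition: */
	static_branch_enable(&klp_sched_try_switch_key);
	...
	static_branch_disable(&klp_sched_try_switch_key);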
On Mon, Jan 30, 2023 at 06:36:32PM +0000, Mark Rutland wrote: > On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote: > > On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote: > > > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched); > > > static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched); > > > int __sched dynamic_cond_resched(void) > > > { > > > - if (!static_branch_unlikely(&sk_dynamic_cond_resched)) > > > + if (!static_branch_unlikely(&sk_dynamic_cond_resched)) { > > > + klp_sched_try_switch(); > > > return 0; > > > + } > > > return __cond_resched(); > > > } > > > EXPORT_SYMBOL(dynamic_cond_resched); > > > > I would make the klp_sched_try_switch() not depend on > > sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed > > pass through __schedule(). > > > > But you'll probably want to check with Mark here, this all might > > generate crap code on arm64. > > IIUC here klp_sched_try_switch() is a static call, so on arm64 this'll generate > at least a load, a conditional branch, and an indirect branch. That's not > ideal, but I'd have to benchmark it to find out whether it's a significant > overhead relative to the baseline of PREEMPT_DYNAMIC. > > For arm64 it'd be a bit nicer to have another static key check, and a call to > __klp_sched_try_switch(). That way the static key check gets turned into a NOP > in the common case, and the call to __klp_sched_try_switch() can be a direct > call (potentially a tail-call if we made it return 0). Hm, it might be nice if our out-of-line static call implementation would automatically do a static key check as part of static_call_cond() for NULL-type static calls. But the best answer is probably to just add inline static calls to arm64. Is the lack of objtool the only thing blocking that? Objtool is now modular, so all the controversial CFG reverse engineering is now optional, so it shouldn't be too hard to just enable objtool for static call inlines. -- Josh
On Mon, Jan 30, 2023 at 11:48:23AM -0800, Josh Poimboeuf wrote: > On Mon, Jan 30, 2023 at 06:36:32PM +0000, Mark Rutland wrote: > > On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote: > > > On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote: > > > > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched); > > > > static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched); > > > > int __sched dynamic_cond_resched(void) > > > > { > > > > - if (!static_branch_unlikely(&sk_dynamic_cond_resched)) > > > > + if (!static_branch_unlikely(&sk_dynamic_cond_resched)) { > > > > + klp_sched_try_switch(); > > > > return 0; > > > > + } > > > > return __cond_resched(); > > > > } > > > > EXPORT_SYMBOL(dynamic_cond_resched); > > > > > > I would make the klp_sched_try_switch() not depend on > > > sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed > > > pass through __schedule(). > > > > > > But you'll probably want to check with Mark here, this all might > > > generate crap code on arm64. > > > > IIUC here klp_sched_try_switch() is a static call, so on arm64 this'll generate > > at least a load, a conditional branch, and an indirect branch. That's not > > ideal, but I'd have to benchmark it to find out whether it's a significant > > overhead relative to the baseline of PREEMPT_DYNAMIC. > > > > For arm64 it'd be a bit nicer to have another static key check, and a call to > > __klp_sched_try_switch(). That way the static key check gets turned into a NOP > > in the common case, and the call to __klp_sched_try_switch() can be a direct > > call (potentially a tail-call if we made it return 0). > > Hm, it might be nice if our out-of-line static call implementation would > automatically do a static key check as part of static_call_cond() for > NULL-type static calls. > > But the best answer is probably to just add inline static calls to > arm64. Is the lack of objtool the only thing blocking that? The major issues were branch range limitations (and needing the linker to add PLTs), and painful instruction patching requirements (e.g. the architecture's "CMODX" rules for Concurrent MODification and eXecution of instructions). We went with the static key scheme above because that was what our assembled code generation would devolve to anyway. If we knew each call-site would only call a particular function or skip the call, then we could do better (and would probably need something like objtool to NOP that out at compile time), but since we don't know the callee at build time we can't ensure we have a PLT in range when necessary. > Objtool is now modular, so all the controversial CFG reverse engineering > is now optional, so it shouldn't be too hard to just enable objtool for > static call inlines. Funnily enough, I spent some time yesterday looking at enabling a trivial objtool for arm64 as I wanted some basic ELF rewriting functionality (to manipulate the mcount_loc table). So I'll likely be looking at that soon regardless of static calls. :) Thanks, Mark.
On Tue, Jan 31, 2023 at 10:22:09AM +0000, Mark Rutland wrote: > > Hm, it might be nice if our out-of-line static call implementation would > > automatically do a static key check as part of static_call_cond() for > > NULL-type static calls. > > > > But the best answer is probably to just add inline static calls to > > arm64. Is the lack of objtool the only thing blocking that? > > The major issues were branch range limitations (and needing the linker to add > PLTs), Does the compiler do the right thing (e.g., force PLT) if the branch target is outside the translation unit? I'm wondering if we could for example use objtool to help enforce such rules at the call site. > and painful instruction patching requirements (e.g. the architecture's > "CMODX" rules for Concurrent MODification and eXecution of instructions). We > went with the static key scheme above because that was what our assembled code > generation would devolve to anyway. > > If we knew each call-site would only call a particular function or skip the > call, then we could do better (and would probably need something like objtool > to NOP that out at compile time), but since we don't know the callee at build > time we can't ensure we have a PLT in range when necessary. Unfortunately most static calls have multiple destinations. And most don't have the option of being NULL. -- Josh
On Tue, Jan 31, 2023 at 08:38:32AM -0800, Josh Poimboeuf wrote: > On Tue, Jan 31, 2023 at 10:22:09AM +0000, Mark Rutland wrote: > > > Hm, it might be nice if our out-of-line static call implementation would > > > automatically do a static key check as part of static_call_cond() for > > > NULL-type static calls. > > > > > > But the best answer is probably to just add inline static calls to > > > arm64. Is the lack of objtool the only thing blocking that? > > > > The major issues were branch range limitations (and needing the linker to add > > PLTs), > > Does the compiler do the right thing (e.g., force PLT) if the branch > target is outside the translation unit? I'm wondering if we could for > example use objtool to help enforce such rules at the call site. It's the linker (rather than the compiler) that'll generate the PLT if the caller and callee are out of range at link time. There are a few other issues too (e.g. no guarnatee that the PLT isn't used by multiple distinct callers, CMODX patching requirements), so we'd have to generate a pseudo-PLT ourselves at build time with a patching-friendly code sequence. Ard had a prototype for that: https://lore.kernel.org/linux-arm-kernel/20211105145917.2828911-1-ardb@kernel.org/ ... but that was sufficiently painful that we went with the current static key approach: https://lore.kernel.org/all/20211109172408.49641-1-mark.rutland@arm.com/ > > and painful instruction patching requirements (e.g. the architecture's > > "CMODX" rules for Concurrent MODification and eXecution of instructions). We > > went with the static key scheme above because that was what our assembled code > > generation would devolve to anyway. > > > > If we knew each call-site would only call a particular function or skip the > > call, then we could do better (and would probably need something like objtool > > to NOP that out at compile time), but since we don't know the callee at build > > time we can't ensure we have a PLT in range when necessary. > > Unfortunately most static calls have multiple destinations. Sure, but here we're just enabling/disabling a call, which we could treat differently, or wrap at a different level within the scheduler code. I'm happy to take a look at that. > And most don't have the option of being NULL. Oh, I was under the impression that all could be disabled/skipped, which is what a NULL target implied. Thanks, Mark.
On Wed, Feb 01, 2023 at 11:10:20AM +0000, Mark Rutland wrote: > On Tue, Jan 31, 2023 at 08:38:32AM -0800, Josh Poimboeuf wrote: > > On Tue, Jan 31, 2023 at 10:22:09AM +0000, Mark Rutland wrote: > > > > Hm, it might be nice if our out-of-line static call implementation would > > > > automatically do a static key check as part of static_call_cond() for > > > > NULL-type static calls. > > > > > > > > But the best answer is probably to just add inline static calls to > > > > arm64. Is the lack of objtool the only thing blocking that? > > > > > > The major issues were branch range limitations (and needing the linker to add > > > PLTs), > > > > Does the compiler do the right thing (e.g., force PLT) if the branch > > target is outside the translation unit? I'm wondering if we could for > > example use objtool to help enforce such rules at the call site. > > It's the linker (rather than the compiler) that'll generate the PLT if the > caller and callee are out of range at link time. There are a few other issues > too (e.g. no guarnatee that the PLT isn't used by multiple distinct callers, > CMODX patching requirements), so we'd have to generate a pseudo-PLT ourselves > at build time with a patching-friendly code sequence. Ard had a prototype for > that: > > https://lore.kernel.org/linux-arm-kernel/20211105145917.2828911-1-ardb@kernel.org/ > > ... but that was sufficiently painful that we went with the current static key > approach: > > https://lore.kernel.org/all/20211109172408.49641-1-mark.rutland@arm.com/ Thanks for the background. Was there a reason for putting it out-of-line rather than directly in _cond_resched()? If it were inline then it wouldn't be that much different from the static called version and I wonder if we could simplify by just using the static key for all PREEMPT_DYNAMIC configs. > > > If we knew each call-site would only call a particular function or skip the > > > call, then we could do better (and would probably need something like objtool > > > to NOP that out at compile time), but since we don't know the callee at build > > > time we can't ensure we have a PLT in range when necessary. > > > > Unfortunately most static calls have multiple destinations. > > Sure, but here we're just enabling/disabling a call, which we could treat > differently, or wrap at a different level within the scheduler code. I'm happy > to take a look at that. I can try to emulate what you did for PREEMPT_DYNAMIC. I'll Cc you on my actual patch to come soon-ish. > > And most don't have the option of being NULL. > > Oh, I was under the impression that all could be disabled/skipped, which is > what a NULL target implied. I guess what I was trying to say is that if the target can be NULL, the call site has to use static_call_cond() to not break the !HAVE_STATIC_CALL case. But most call sites use static_call(). -- Josh
On Wed, Feb 01, 2023 at 08:57:27AM -0800, Josh Poimboeuf wrote: > On Wed, Feb 01, 2023 at 11:10:20AM +0000, Mark Rutland wrote: > > On Tue, Jan 31, 2023 at 08:38:32AM -0800, Josh Poimboeuf wrote: > > > On Tue, Jan 31, 2023 at 10:22:09AM +0000, Mark Rutland wrote: > > > > > Hm, it might be nice if our out-of-line static call implementation would > > > > > automatically do a static key check as part of static_call_cond() for > > > > > NULL-type static calls. > > > > > > > > > > But the best answer is probably to just add inline static calls to > > > > > arm64. Is the lack of objtool the only thing blocking that? > > > > > > > > The major issues were branch range limitations (and needing the linker to add > > > > PLTs), > > > > > > Does the compiler do the right thing (e.g., force PLT) if the branch > > > target is outside the translation unit? I'm wondering if we could for > > > example use objtool to help enforce such rules at the call site. > > > > It's the linker (rather than the compiler) that'll generate the PLT if the > > caller and callee are out of range at link time. There are a few other issues > > too (e.g. no guarnatee that the PLT isn't used by multiple distinct callers, > > CMODX patching requirements), so we'd have to generate a pseudo-PLT ourselves > > at build time with a patching-friendly code sequence. Ard had a prototype for > > that: > > > > https://lore.kernel.org/linux-arm-kernel/20211105145917.2828911-1-ardb@kernel.org/ > > > > ... but that was sufficiently painful that we went with the current static key > > approach: > > > > https://lore.kernel.org/all/20211109172408.49641-1-mark.rutland@arm.com/ > > Thanks for the background. > > Was there a reason for putting it out-of-line rather than directly in > _cond_resched()? I think that's mostly a historical accident; I'm not aware of a reason we can't put that directly in _cond_resched(). Since we started from out-of-line static call trampolines, even the out-of-line static key check looked nicer, and I think we just never considered moving the static key check inline. > If it were inline then it wouldn't be that much different from the > static called version and I wonder if we could simplify by just using > the static key for all PREEMPT_DYNAMIC configs. That would be nice! > > > > If we knew each call-site would only call a particular function or skip the > > > > call, then we could do better (and would probably need something like objtool > > > > to NOP that out at compile time), but since we don't know the callee at build > > > > time we can't ensure we have a PLT in range when necessary. > > > > > > Unfortunately most static calls have multiple destinations. > > > > Sure, but here we're just enabling/disabling a call, which we could treat > > differently, or wrap at a different level within the scheduler code. I'm happy > > to take a look at that. > > I can try to emulate what you did for PREEMPT_DYNAMIC. I'll Cc you on > my actual patch to come soon-ish. I look forward to it! :) > > > And most don't have the option of being NULL. > > > > Oh, I was under the impression that all could be disabled/skipped, which is > > what a NULL target implied. > > I guess what I was trying to say is that if the target can be NULL, the > call site has to use static_call_cond() to not break the > !HAVE_STATIC_CALL case. But most call sites use static_call(). Ah, sorry -- I had missed that we had distinct static_call_cond() and static_call() helpers. Thanks, Mark.
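To spell out the distinction for anyone skimming (illustrative foo/foo_impl names, not an in-tree call): a call whose target may be NULL is declared with DEFINE_STATIC_CALL_NULL() and invoked through static_call_cond(), which stays safe on !HAVE_STATIC_CALL architectures, while static_call() assumes the target is always valid.

#include <linux/static_call.h>

static void foo_impl(int cpu)
{
        /* hypothetical target */
}

/* Starts out with a NULL target, i.e. the call is "disabled". */
DEFINE_STATIC_CALL_NULL(foo, foo_impl);

static void caller(void)
{
        /* No-op while the target is NULL; only valid for void returns. */
        static_call_cond(foo)(0);
}

static void enable_foo(void)
{
        static_call_update(foo, foo_impl);
}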
On Mon, Jan 30, 2023 at 11:48 AM Josh Poimboeuf <jpoimboe@kernel.org> wrote: > > On Mon, Jan 30, 2023 at 06:36:32PM +0000, Mark Rutland wrote: > > On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote: > > > On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote: > > > > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched); > > > > static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched); > > > > int __sched dynamic_cond_resched(void) > > > > { > > > > - if (!static_branch_unlikely(&sk_dynamic_cond_resched)) > > > > + if (!static_branch_unlikely(&sk_dynamic_cond_resched)) { > > > > + klp_sched_try_switch(); > > > > return 0; > > > > + } > > > > return __cond_resched(); > > > > } > > > > EXPORT_SYMBOL(dynamic_cond_resched); > > > > > > I would make the klp_sched_try_switch() not depend on > > > sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed > > > pass through __schedule(). > > > > > > But you'll probably want to check with Mark here, this all might > > > generate crap code on arm64. > > > > IIUC here klp_sched_try_switch() is a static call, so on arm64 this'll generate > > at least a load, a conditional branch, and an indirect branch. That's not > > ideal, but I'd have to benchmark it to find out whether it's a significant > > overhead relative to the baseline of PREEMPT_DYNAMIC. > > > > For arm64 it'd be a bit nicer to have another static key check, and a call to > > __klp_sched_try_switch(). That way the static key check gets turned into a NOP > > in the common case, and the call to __klp_sched_try_switch() can be a direct > > call (potentially a tail-call if we made it return 0). > > Hm, it might be nice if our out-of-line static call implementation would > automatically do a static key check as part of static_call_cond() for > NULL-type static calls. > > But the best answer is probably to just add inline static calls to > arm64. Is the lack of objtool the only thing blocking that? > > Objtool is now modular, so all the controversial CFG reverse engineering > is now optional, so it shouldn't be too hard to just enable objtool for > static call inlines. This might be a little off topic, and maybe I missed some threads: How far are we from officially supporting livepatch on arm64? IIUC, stable stack unwinding is the missing piece at the moment? Thanks, Song
On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote: > On Fri, Jan 27, 2023 at 02:11:31PM -0800, Josh Poimboeuf wrote: > > > > diff --git a/include/linux/sched.h b/include/linux/sched.h > > index 4df2b3e76b30..fbcd3acca25c 100644 > > --- a/include/linux/sched.h > > +++ b/include/linux/sched.h > > @@ -36,6 +36,7 @@ > > #include <linux/seqlock.h> > > #include <linux/kcsan.h> > > #include <linux/rv.h> > > +#include <linux/livepatch_sched.h> > > #include <asm/kmap_size.h> > > > > /* task_struct member predeclarations (sorted alphabetically): */ > > @@ -2074,6 +2075,9 @@ DECLARE_STATIC_CALL(cond_resched, __cond_resched); > > > > static __always_inline int _cond_resched(void) > > { > > + //FIXME this is a bit redundant with preemption disabled > > + klp_sched_try_switch(); > > + > > return static_call_mod(cond_resched)(); > > } > > Right, I was thinking you'd do something like: > > static_call_update(cond_resched, klp_cond_resched); > > With: > > static int klp_cond_resched(void) > { > klp_try_switch_task(current); > return __cond_resched(); > } > > That would force cond_resched() into doing the transition thing, > irrespective of the preemption mode at hand. And then, when KLP be done, > re-run sched_dynamic_update() to reset it to whatever it ought to be. Ok, makes sense. > > > @@ -401,8 +421,10 @@ void klp_try_complete_transition(void) > > */ > > read_lock(&tasklist_lock); > > for_each_process_thread(g, task) > > - if (!klp_try_switch_task(task)) > > + if (!klp_try_switch_task(task)) { > > + set_tsk_need_resched(task); > > complete = false; > > + } > > Yeah, no, that's broken -- preemption state live in more than just the > TIF bit. Oops. > > read_unlock(&tasklist_lock); > > > > /* > > @@ -413,6 +435,7 @@ void klp_try_complete_transition(void) > > task = idle_task(cpu); > > if (cpu_online(cpu)) { > > if (!klp_try_switch_task(task)) { > > + set_tsk_need_resched(task); > > complete = false; > > /* Make idle task go through the main loop. */ > > wake_up_if_idle(cpu); > > Idem. > > Also, I don't see the point of this and the __schedule() hook here: The (poorly executed) idea was to catch kthreads which do if (need_resched()) schedule(); but I guess we can just replace those usages with cond_resched()? > > @@ -8500,8 +8502,10 @@ EXPORT_STATIC_CALL_TRAMP(might_resched); > > static DEFINE_STATIC_KEY_FALSE(sk_dynamic_cond_resched); > > int __sched dynamic_cond_resched(void) > > { > > - if (!static_branch_unlikely(&sk_dynamic_cond_resched)) > > + if (!static_branch_unlikely(&sk_dynamic_cond_resched)) { > > + klp_sched_try_switch(); > > return 0; > > + } > > return __cond_resched(); > > } > > EXPORT_SYMBOL(dynamic_cond_resched); > > I would make the klp_sched_try_switch() not depend on > sk_dynamic_cond_resched, because __cond_resched() is not a guaranteed > pass through __schedule(). > > But you'll probably want to check with Mark here, this all might > generate crap code on arm64. > > Both ways this seems to make KLP 'depend' (or at least work lots better) > when PREEMPT_DYNAMIC=y. Do we want a PREEMPT_DYNAMIC=n fallback for > _cond_resched() too? That was the intent but I obviously failed. Let me go rework it a bit. -- Josh
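The kthread pattern Josh mentions looks roughly like the sketch below (schematic only; example_kthread() and do_some_work() are made-up names). An open-coded need_resched()/schedule() pair never passes through cond_resched(), so it would never reach the proposed klp hook, whereas cond_resched() would:

#include <linux/kthread.h>
#include <linux/sched.h>

static void do_some_work(void);        /* hypothetical per-iteration work */

static int example_kthread(void *arg)
{
        while (!kthread_should_stop()) {
                do_some_work();

                /*
                 * Old pattern: if (need_resched()) schedule();
                 * This never goes through cond_resched(), so it misses the
                 * klp_sched_try_switch() hook discussed above.
                 */

                /*
                 * Replacement: yields when needed and also gives livepatch
                 * a chance to transition this kthread at a safe point.
                 */
                cond_resched();
        }

        return 0;
}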
On Mon, Jan 30, 2023 at 01:40:18PM +0100, Peter Zijlstra wrote: > Both ways this seems to make KLP 'depend' (or at least work lots better) > when PREEMPT_DYNAMIC=y. Do we want a PREEMPT_DYNAMIC=n fallback for > _cond_resched() too? I would like this, as we have no other need for PREEMPT_DYNAMIC=y and currently have it disabled. Thanks, Seth
On Fri 2023-01-27 11:37:02, Peter Zijlstra wrote: > On Thu, Jan 26, 2023 at 08:43:55PM -0800, Josh Poimboeuf wrote: > > On Thu, Jan 26, 2023 at 03:12:35PM -0600, Seth Forshee (DigitalOcean) wrote: > > > On Thu, Jan 26, 2023 at 06:03:16PM +0100, Petr Mladek wrote: > > > > On Fri 2023-01-20 16:12:20, Seth Forshee (DigitalOcean) wrote: > > > > > We've fairly regularaly seen liveptches which cannot transition within kpatch's > > > > > timeout period due to busy vhost worker kthreads. > > > > > > > > I have missed this detail. Miroslav told me that we have solved > > > > something similar some time ago, see > > > > https://lore.kernel.org/all/20220507174628.2086373-1-song@kernel.org/ > > > > > > Interesting thread. I had thought about something along the lines of the > > > original patch, but there are some ideas in there that I hadn't > > > considered. > > > > Here's another idea, have we considered this? Have livepatch set > > TIF_NEED_RESCHED on all kthreads to force them into schedule(), and then > > have the scheduler call klp_try_switch_task() if TIF_PATCH_PENDING is > > set. > > > > Not sure how scheduler folks would feel about that ;-) > > So, let me try and page all that back in.... :-) > > KLP needs to unwind the stack to see if any of the patched functions are > active, if not, flip task to new set. > > Unwinding the stack of a task can be done when: > > - task is inactive (stable reg and stack) -- provided it stays inactive > while unwinding etc.. > > - task is current (guarantees stack doesn't dip below where we started > due to being busy on top etc..) > > Can NOT be done from interrupt context, because can hit in the middle of > setting up stack frames etc.. All the above seems correct. > The issue at hand is that some tasks run for a long time without passing > through an explicit check. There might actually be two possibilities why the transition fails too often: 1. The task might be in the running state most of the time. Therefore the backtrace is not reliable most of the time. In this case, some cooperation with the scheduler would really help. We would need to stop the task and check the stack when it is stopped. Something like the patch you proposed. 2. The task might be sleeping but almost always in a livepatched function. Therefore it could not be transitioned. It might be the case with vhost_worker(). The main loop is "tiny". The kthread probably spends most of the time processing a vhost_work. And if the "works" are livepatched... In this case, it would help to call klp_try_switch_task(current) in the main loop in vhost_worker(). It would always succeed when vhost_worker() is not livepatched on its own. Note that even this would not help with kPatch when a single vhost_work might need more than the 1 minute timeout to get processed. > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c > index f1b25ec581e0..06746095a724 100644 > --- a/kernel/livepatch/transition.c > +++ b/kernel/livepatch/transition.c > @@ -9,6 +9,7 @@ > > #include <linux/cpu.h> > #include <linux/stacktrace.h> > +#include <linux/stop_machine.h> > #include "core.h" > #include "patch.h" > #include "transition.h" > @@ -334,6 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task) > return !ret; > } > > +static int __stop_try_switch(void *arg) > +{ > + return klp_try_switch_task(arg) ? 0 : -EBUSY; > +} > + > +static bool klp_try_switch_task_harder(struct task_struct *task) > +{ > + return !stop_one_cpu(task_cpu(task), __stop_try_switch, task); > +} > + > /* > * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set. > * Kthreads with TIF_PATCH_PENDING set are woken up. Nice. I am surprised that it can be implemented so easily. Best Regards, Petr
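One caveat before wiring Peter's helper into klp_try_complete_transition(): stop_one_cpu() sleeps waiting for the stopper thread, so the harder attempt cannot be made under tasklist_lock, where klp_try_switch_task() is currently called. A sketch of a second-chance helper that would have to run after dropping the lock (the function name is hypothetical):

/*
 * Sketch only: retry a task that keeps failing the normal transition by
 * checking its stack from the stopper thread on the CPU it was last seen
 * on, where a running task will have been preempted.  Must be called from
 * sleepable context, i.e. not under tasklist_lock.
 */
static void klp_retry_switch_busy_task(struct task_struct *task)
{
        if (klp_try_switch_task(task))
                return;

        if (!klp_try_switch_task_harder(task))
                pr_debug("%s: %s:%d still blocks the transition\n",
                         __func__, task->comm, task->pid);
}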
On Fri, Jan 27, 2023 at 01:09:02PM +0100, Petr Mladek wrote: > There might actually be two possibilities why the transition fails > too often: > > 1. The task might be in the running state most of the time. Therefore > the backtrace is not reliable most of the time. > > In this case, some cooperation with the scheduler would really > help. We would need to stop the task and check the stack > when it is stopped. Something like the patch you proposed. This is the situation we are encountering. > 2. The task might be sleeping but almost always in a livepatched > function. Therefore it could not be transitioned. > > It might be the case with vhost_worker(). The main loop is "tiny". > The kthread probaly spends most of the time with processing > a vhost_work. And if the "works" are livepatched... > > In this case, it would help to call klp_try_switch_task(current) > in the main loop in vhost_worker(). It would always succeed > when vhost_worker() is not livepatched on its own. > > Note that even this would not help with kPatch when a single > vhost_work might need more than the 1 minute timout to get proceed. > > > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c > > index f1b25ec581e0..06746095a724 100644 > > --- a/kernel/livepatch/transition.c > > +++ b/kernel/livepatch/transition.c > > @@ -9,6 +9,7 @@ > > > > #include <linux/cpu.h> > > #include <linux/stacktrace.h> > > +#include <linux/stop_machine.h> > > #include "core.h" > > #include "patch.h" > > #include "transition.h" > > @@ -334,6 +335,16 @@ static bool klp_try_switch_task(struct task_struct *task) > > return !ret; > > } > > > > +static int __stop_try_switch(void *arg) > > +{ > > + return klp_try_switch_task(arg) ? 0 : -EBUSY; > > +} > > + > > +static bool klp_try_switch_task_harder(struct task_struct *task) > > +{ > > + return !stop_one_cpu(task_cpu(task), __stop_try_switch, task); > > +} > > + > > /* > > * Sends a fake signal to all non-kthread tasks with TIF_PATCH_PENDING set. > > * Kthreads with TIF_PATCH_PENDING set are woken up. > > Nice. I am surprised that it can be implemented so easily. Yes, that's a neat solution. I will give it a try. AIUI this still doesn't help for architectures without a reliable stacktrace though, right? So we probably should only try this for architectures which do have reliable stacktraces. Thanks, Seth
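A sketch of the gating Seth suggests, reusing the klp_have_reliable_stack() helper already in kernel/livepatch/transition.c; without a reliable stacktrace the stack check cannot succeed for a busy task anyway, so skipping the stop-machine round trip avoids pure overhead:

static bool klp_try_switch_task_harder(struct task_struct *task)
{
        /* Only worth a stopper round trip if the stack check can succeed. */
        if (!klp_have_reliable_stack())
                return false;

        return !stop_one_cpu(task_cpu(task), __stop_try_switch, task);
}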
On Fri, Jan 20, 2023 at 04:12:20PM -0600, Seth Forshee (DigitalOcean) wrote: > We've fairly regularaly seen liveptches which cannot transition within kpatch's > timeout period due to busy vhost worker kthreads. In looking for a solution the > only answer I found was to call klp_update_patch_state() from a safe location. > I tried adding this call to vhost_worker(), and it works, but this creates the > potential for problems if a livepatch attempted to patch vhost_worker(). > Without a call to klp_update_patch_state() fully loaded vhost kthreads can > never switch because vhost_worker() will always appear on the stack, but with > the call these kthreads can switch but will still be running the old version of > vhost_worker(). > > To avoid this situation I've added a new function, klp_switch_current(), which > switches the current task only if its stack does not include any function being > patched. This allows kthreads to safely attempt switching themselves if a patch > is pending. There is at least one downside, however. Since there's no way for > the kthread to track whether it has already tried to switch for a pending patch > it can end up calling klp_switch_current() repeatedly when it can never be > safely switched. > > I don't know whether this is the right solution, and I'm happy to try out other > suggestions. But in my testing these patches proved effective in consistently > switching heavily loaded vhost kthreads almost immediately. > > To: Josh Poimboeuf <jpoimboe@kernel.org> > To: Jiri Kosina <jikos@kernel.org> > To: Miroslav Benes <mbenes@suse.cz> > To: Petr Mladek <pmladek@suse.com> > To: Joe Lawrence <joe.lawrence@redhat.com> > To: "Michael S. Tsirkin" <mst@redhat.com> > To: Jason Wang <jasowang@redhat.com> > Cc: live-patching@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Cc: kvm@vger.kernel.org > Cc: virtualization@lists.linux-foundation.org > Cc: netdev@vger.kernel.org > Signed-off-by: Seth Forshee (DigitalOcean) <sforshee@kernel.org> Don't know enough about live patching to judge this. I'll let livepatch maintainers judge this, and merge through the livepatch tree if appropriate. For that: Acked-by: Michael S. Tsirkin <mst@redhat.com> but pls understand this is more a 'looks ok superficially and I don't have better ideas' than 'I have reviewed this thoroughly'. > > --- > Seth Forshee (DigitalOcean) (2): > livepatch: add an interface for safely switching kthreads > vhost: check for pending livepatches from vhost worker kthreads > > drivers/vhost/vhost.c | 4 ++++ > include/linux/livepatch.h | 2 ++ > kernel/livepatch/transition.c | 11 +++++++++++ > 3 files changed, 17 insertions(+) > --- > base-commit: 5dc4c995db9eb45f6373a956eb1f69460e69e6d4 > change-id: 20230120-vhost-klp-switching-ba9a3ae38b8a > > Best regards, > -- > Seth Forshee (DigitalOcean) <sforshee@kernel.org>