arch/x86/kvm/mmu/mmu.c | 7 +--- drivers/vhost/vhost.c | 2 +- include/linux/sched/vhost_task.h | 1 + kernel/vhost_task.c | 62 +++++++++++++++++++++++++++----- 4 files changed, 56 insertions(+), 16 deletions(-)
Michael, Do you want to take this through the vhost tree? It technically fixes a KVM bug, but this obviously touches far more vhost code than KVM code, and the patch that needs to go into 6.17 doesn't touch KVM at all. Fix a bug where KVM attempts to wake a vhost task that has already exited in response to a fatal signal, and tack on a few cleanups to harden against introducing similar bugs in the future. The issue is firmly a KVM problem, but I opted to fix the bug by making vhost_task_wake() safe against an exited task as doing so is far simpler and cleaner than implementing the same functionality in KVM, and I suspect that if there are other users of vhost_tasks in the future, then there's a good chance they will want/expect vhost_task to handle that detail. Note, this only started causing problems when commit 56180dd20c19 ("futex: Use RCU-based per-CPU reference counting instead of rcuref_t") landed, so the explosions are "new" in 6.17, but the bug has existed since KVM switched to vhost_task back in 6.13. v2: - Drop the "safe" postfix variant and make the "default" vhost_task_wake() safe. [Michael]. - Use vhost_task_wake() and __vhost_task_wake() for the public APIs, and vhost_task_wake_up_process() for the local helper. [Michael] - Drag the signalas back from their Spanish holiday. [Sebastian] v1: https://lore.kernel.org/all/20250826004012.3835150-1-seanjc@google.com Sean Christopherson (3): vhost_task: Don't wake KVM x86's recovery thread if vhost task was killed vhost_task: Allow caller to omit handle_sigkill() callback KVM: x86/mmu: Don't register a sigkill callback for NX hugepage recovery tasks arch/x86/kvm/mmu/mmu.c | 7 +--- drivers/vhost/vhost.c | 2 +- include/linux/sched/vhost_task.h | 1 + kernel/vhost_task.c | 62 +++++++++++++++++++++++++++----- 4 files changed, 56 insertions(+), 16 deletions(-) base-commit: 1b237f190eb3d36f52dffe07a40b5eb210280e00 -- 2.51.0.268.g9569e192d0-goog
On Wed, Aug 27, 2025, Sean Christopherson wrote: > Michael, > > Do you want to take this through the vhost tree? It technically fixes a KVM > bug, but this obviously touches far more vhost code than KVM code, and the > patch that needs to go into 6.17 doesn't touch KVM at all. Can this be squeezed into 6.17? I know it's very late in the cycle, and that the KVM bug is pre-existing, but the increased impact of the bug is new in 6.17 and I don't want 6.17 to release without a fix.
On Mon, Sep 15, 2025 at 02:03:00PM -0700, Sean Christopherson wrote: > On Wed, Aug 27, 2025, Sean Christopherson wrote: > > Michael, > > > > Do you want to take this through the vhost tree? It technically fixes a KVM > > bug, but this obviously touches far more vhost code than KVM code, and the > > patch that needs to go into 6.17 doesn't touch KVM at all. > > Can this be squeezed into 6.17? I know it's very late in the cycle, and that the > KVM bug is pre-existing, but the increased impact of the bug is new in 6.17 and I > don't want 6.17 to release without a fix. To clarify you just mean 1/3, yes?
On Mon, Sep 15, 2025, Michael S. Tsirkin wrote: > On Mon, Sep 15, 2025 at 02:03:00PM -0700, Sean Christopherson wrote: > > On Wed, Aug 27, 2025, Sean Christopherson wrote: > > > Michael, > > > > > > Do you want to take this through the vhost tree? It technically fixes a KVM > > > bug, but this obviously touches far more vhost code than KVM code, and the > > > patch that needs to go into 6.17 doesn't touch KVM at all. > > > > Can this be squeezed into 6.17? I know it's very late in the cycle, and that the > > KVM bug is pre-existing, but the increased impact of the bug is new in 6.17 and I > > don't want 6.17 to release without a fix. > > To clarify you just mean 1/3, yes? Correct, 2 and 3 aren't urgent.
On 2025-08-27 12:41:04 [-0700], Sean Christopherson wrote: > Michael, Sean, would the bellow work by chance? It is a quick shot but it looks symmetrical… diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c index bc738fa90c1d6..27107dcc1cbfe 100644 --- a/kernel/vhost_task.c +++ b/kernel/vhost_task.c @@ -100,6 +100,7 @@ void vhost_task_stop(struct vhost_task *vtsk) * freeing it below. */ wait_for_completion(&vtsk->exited); + put_task_struct(vtsk->task); kfree(vtsk); } EXPORT_SYMBOL_GPL(vhost_task_stop); @@ -148,7 +149,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *), return ERR_CAST(tsk); } - vtsk->task = tsk; + vtsk->task = get_task_struct(tsk); return vtsk; } EXPORT_SYMBOL_GPL(vhost_task_create); Sebastian
On Wed, Aug 27, 2025 at 10:10:59PM +0200, Sebastian Andrzej Siewior wrote: > On 2025-08-27 12:41:04 [-0700], Sean Christopherson wrote: > > Michael, > > Sean, > > would the bellow work by chance? It is a quick shot but it looks > symmetrical… > > diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c > index bc738fa90c1d6..27107dcc1cbfe 100644 > --- a/kernel/vhost_task.c > +++ b/kernel/vhost_task.c > @@ -100,6 +100,7 @@ void vhost_task_stop(struct vhost_task *vtsk) > * freeing it below. > */ > wait_for_completion(&vtsk->exited); > + put_task_struct(vtsk->task); > kfree(vtsk); > } > EXPORT_SYMBOL_GPL(vhost_task_stop); > @@ -148,7 +149,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *), > return ERR_CAST(tsk); > } > > - vtsk->task = tsk; > + vtsk->task = get_task_struct(tsk); > return vtsk; > } > EXPORT_SYMBOL_GPL(vhost_task_create); > > Sebastian So how about switching to this approach then? Instead of piling up fixes like we seem to do now ... Sean? -- MST
On 2025-09-18 11:09:05 [-0400], Michael S. Tsirkin wrote: > So how about switching to this approach then? > Instead of piling up fixes like we seem to do now ... > Sean? Since I am in To: here. You want me to resent my diff as a proper patch? Sebastian
On Thu, Sep 18, 2025 at 05:48:26PM +0200, Sebastian Andrzej Siewior wrote: > On 2025-09-18 11:09:05 [-0400], Michael S. Tsirkin wrote: > > So how about switching to this approach then? > > Instead of piling up fixes like we seem to do now ... > > Sean? > > Since I am in To: here. You want me to resent my diff as a proper patch? > > Sebastian Yes please, if Sean can ack it. -- MST
vhost_task_create() creates a task and keeps a reference to its
task_struct. That task may exit early via a signal and its task_struct
will be released.
A pending vhost_task_wake() will then attempt to wake the task and
access a task_struct which is no longer there.
Acquire a reference on the task_struct while creating the thread and
release the reference while the struct vhost_task itself is removed.
If the task exits early due to a signal, then the vhost_task_wake() will
still access a valid task_struct. The wake is safe and will be skipped
in this case.
Fixes: f9010dbdce911 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression")
Reported-by: Sean Christopherson <seanjc@google.com>
Closes: https://lore.kernel.org/all/aKkLEtoDXKxAAWju@google.com/
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
---
kernel/vhost_task.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c
index bc738fa90c1d6..27107dcc1cbfe 100644
--- a/kernel/vhost_task.c
+++ b/kernel/vhost_task.c
@@ -100,6 +100,7 @@ void vhost_task_stop(struct vhost_task *vtsk)
* freeing it below.
*/
wait_for_completion(&vtsk->exited);
+ put_task_struct(vtsk->task);
kfree(vtsk);
}
EXPORT_SYMBOL_GPL(vhost_task_stop);
@@ -148,7 +149,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *),
return ERR_CAST(tsk);
}
- vtsk->task = tsk;
+ vtsk->task = get_task_struct(tsk);
return vtsk;
}
EXPORT_SYMBOL_GPL(vhost_task_create);
--
2.51.0
Subject: that is reference -> that is referenced On Thu, Sep 18, 2025 at 08:11:44PM +0200, Sebastian Andrzej Siewior wrote: > vhost_task_create() creates a task and keeps a reference to its > task_struct. That task may exit early via a signal and its task_struct > will be released. > A pending vhost_task_wake() will then attempt to wake the task and > access a task_struct which is no longer there. > > Acquire a reference on the task_struct while creating the thread and > release the reference while the struct vhost_task itself is removed. > If the task exits early due to a signal, then the vhost_task_wake() will > still access a valid task_struct. The wake is safe and will be skipped > in this case. > > Fixes: f9010dbdce911 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression") > Reported-by: Sean Christopherson <seanjc@google.com> > Closes: https://lore.kernel.org/all/aKkLEtoDXKxAAWju@google.com/ > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- > kernel/vhost_task.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c > index bc738fa90c1d6..27107dcc1cbfe 100644 > --- a/kernel/vhost_task.c > +++ b/kernel/vhost_task.c > @@ -100,6 +100,7 @@ void vhost_task_stop(struct vhost_task *vtsk) > * freeing it below. > */ > wait_for_completion(&vtsk->exited); > + put_task_struct(vtsk->task); > kfree(vtsk); > } > EXPORT_SYMBOL_GPL(vhost_task_stop); > @@ -148,7 +149,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *), > return ERR_CAST(tsk); > } > > - vtsk->task = tsk; > + vtsk->task = get_task_struct(tsk); > return vtsk; > } > EXPORT_SYMBOL_GPL(vhost_task_create); > -- > 2.51.0
On Sun, Sep 21, 2025 at 04:56:20PM -0400, Michael S. Tsirkin wrote: > Subject: that is reference -> that is referenced to note i fixed it for now. just dropped "that is referenced" completely. shorter. > On Thu, Sep 18, 2025 at 08:11:44PM +0200, Sebastian Andrzej Siewior wrote: > > vhost_task_create() creates a task and keeps a reference to its > > task_struct. That task may exit early via a signal and its task_struct > > will be released. > > A pending vhost_task_wake() will then attempt to wake the task and > > access a task_struct which is no longer there. > > > > Acquire a reference on the task_struct while creating the thread and > > release the reference while the struct vhost_task itself is removed. > > If the task exits early due to a signal, then the vhost_task_wake() will > > still access a valid task_struct. The wake is safe and will be skipped > > in this case. > > > > Fixes: f9010dbdce911 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression") > > Reported-by: Sean Christopherson <seanjc@google.com> > > Closes: https://lore.kernel.org/all/aKkLEtoDXKxAAWju@google.com/ > > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > > --- > > kernel/vhost_task.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c > > index bc738fa90c1d6..27107dcc1cbfe 100644 > > --- a/kernel/vhost_task.c > > +++ b/kernel/vhost_task.c > > @@ -100,6 +100,7 @@ void vhost_task_stop(struct vhost_task *vtsk) > > * freeing it below. > > */ > > wait_for_completion(&vtsk->exited); > > + put_task_struct(vtsk->task); > > kfree(vtsk); > > } > > EXPORT_SYMBOL_GPL(vhost_task_stop); > > @@ -148,7 +149,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *), > > return ERR_CAST(tsk); > > } > > > > - vtsk->task = tsk; > > + vtsk->task = get_task_struct(tsk); > > return vtsk; > > } > > EXPORT_SYMBOL_GPL(vhost_task_create); > > -- > > 2.51.0
On Thu, Sep 18, 2025, Sebastian Andrzej Siewior wrote: > vhost_task_create() creates a task and keeps a reference to its > task_struct. That task may exit early via a signal and its task_struct > will be released. > A pending vhost_task_wake() will then attempt to wake the task and > access a task_struct which is no longer there. > > Acquire a reference on the task_struct while creating the thread and > release the reference while the struct vhost_task itself is removed. > If the task exits early due to a signal, then the vhost_task_wake() will > still access a valid task_struct. The wake is safe and will be skipped > in this case. > > Fixes: f9010dbdce911 ("fork, vhost: Use CLONE_THREAD to fix freezer/ps regression") > Reported-by: Sean Christopherson <seanjc@google.com> > Closes: https://lore.kernel.org/all/aKkLEtoDXKxAAWju@google.com/ > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- Tested-by: Sean Christopherson <seanjc@google.com>
On Thu, Sep 18, 2025, Sebastian Andrzej Siewior wrote: > On 2025-09-18 11:09:05 [-0400], Michael S. Tsirkin wrote: > > So how about switching to this approach then? > > Instead of piling up fixes like we seem to do now ... I don't have a strong preference for 6.17, beyond landing a fix of some kind. I think there are three options for 6.17, in order of "least like to break something": 1. Sebastian's get_task_struct() fix 2. This series, without the KILLED sanity check in __vhost_task_wake() 3. This series, with my fixup (with which syzbot was happy) Longer term, I'd still like to land everything though. > > Sean? > > Since I am in To: here. You want me to resent my diff as a proper patch? Ya, I think it makes sense to harden against UAF even if we fix the KVM bug more directly.
On Thu, Sep 18, 2025 at 09:04:07AM -0700, Sean Christopherson wrote: > On Thu, Sep 18, 2025, Sebastian Andrzej Siewior wrote: > > On 2025-09-18 11:09:05 [-0400], Michael S. Tsirkin wrote: > > > So how about switching to this approach then? > > > Instead of piling up fixes like we seem to do now ... > > I don't have a strong preference for 6.17, beyond landing a fix of some kind. > I think there are three options for 6.17, in order of "least like to break > something": > > 1. Sebastian's get_task_struct() fix I am just a bit apprehensive that we don't create a situation where we leak the task struct somehow, given the limited testing time. Can you help me get convinced that risk is 0? > 2. This series, without the KILLED sanity check in __vhost_task_wake() > 3. This series, with my fixup (with which syzbot was happy) > > Longer term, I'd still like to land everything though. No problem with that. > > > Sean? > > > > Since I am in To: here. You want me to resent my diff as a proper patch? > > Ya, I think it makes sense to harden against UAF even if we fix the KVM bug more > directly.
On Thu, Sep 18, 2025, Michael S. Tsirkin wrote: > On Thu, Sep 18, 2025 at 09:04:07AM -0700, Sean Christopherson wrote: > > On Thu, Sep 18, 2025, Sebastian Andrzej Siewior wrote: > > > On 2025-09-18 11:09:05 [-0400], Michael S. Tsirkin wrote: > > > > So how about switching to this approach then? > > > > Instead of piling up fixes like we seem to do now ... > > > > I don't have a strong preference for 6.17, beyond landing a fix of some kind. > > I think there are three options for 6.17, in order of "least like to break > > something": > > > > 1. Sebastian's get_task_struct() fix > > > I am just a bit apprehensive that we don't create a situation > where we leak the task struct somehow, given the limited > testing time. Can you help me get convinced that risk is 0? I doubt it, I share same similar concerns about lack of testing. So I guess thinking about this again, #2 is probably safer since it'd only impact KVM? > > 2. This series, without the KILLED sanity check in __vhost_task_wake() > > 3. This series, with my fixup (with which syzbot was happy)
On Thu, Sep 18, 2025 at 09:52:19AM -0700, Sean Christopherson wrote: > On Thu, Sep 18, 2025, Michael S. Tsirkin wrote: > > On Thu, Sep 18, 2025 at 09:04:07AM -0700, Sean Christopherson wrote: > > > On Thu, Sep 18, 2025, Sebastian Andrzej Siewior wrote: > > > > On 2025-09-18 11:09:05 [-0400], Michael S. Tsirkin wrote: > > > > > So how about switching to this approach then? > > > > > Instead of piling up fixes like we seem to do now ... > > > > > > I don't have a strong preference for 6.17, beyond landing a fix of some kind. > > > I think there are three options for 6.17, in order of "least like to break > > > something": > > > > > > 1. Sebastian's get_task_struct() fix > > > > > > I am just a bit apprehensive that we don't create a situation > > where we leak the task struct somehow, given the limited > > testing time. Can you help me get convinced that risk is 0? > > I doubt it, I share same similar concerns about lack of testing. So I guess > thinking about this again, #2 is probably safer since it'd only impact KVM? I can't say I understand completely how we get that state though? Why did the warning trigger if it's not a UAF? > > > 2. This series, without the KILLED sanity check in __vhost_task_wake() > > > 3. This series, with my fixup (with which syzbot was happy)
On Thu, Sep 18, 2025, Michael S. Tsirkin wrote: > On Thu, Sep 18, 2025 at 09:52:19AM -0700, Sean Christopherson wrote: > > On Thu, Sep 18, 2025, Michael S. Tsirkin wrote: > > > On Thu, Sep 18, 2025 at 09:04:07AM -0700, Sean Christopherson wrote: > > > > On Thu, Sep 18, 2025, Sebastian Andrzej Siewior wrote: > > > > > On 2025-09-18 11:09:05 [-0400], Michael S. Tsirkin wrote: > > > > > > So how about switching to this approach then? > > > > > > Instead of piling up fixes like we seem to do now ... > > > > > > > > I don't have a strong preference for 6.17, beyond landing a fix of some kind. > > > > I think there are three options for 6.17, in order of "least like to break > > > > something": > > > > > > > > 1. Sebastian's get_task_struct() fix > > > > > > > > > I am just a bit apprehensive that we don't create a situation > > > where we leak the task struct somehow, given the limited > > > testing time. Can you help me get convinced that risk is 0? > > > > I doubt it, I share same similar concerns about lack of testing. So I guess > > thinking about this again, #2 is probably safer since it'd only impact KVM? > > I can't say I understand completely how we get that state though? > Why did the warning trigger if it's not a UAF? It's purely a flaw in the sanity check itself due to the ordering in vhost_task_fn(). As is, vhost_task_fn() marks the task KILLED before invoking ->handle_sigkill(), i.e. before vhost_worker_killed() is guaranteed to complete, and thus before worker->killed is set. As a result, vhost can keep waking workers that have KILLED set, but haven't actually exited. That's perfectly fine as UAF won't occur until do_exit() is called, and that won't happen until ->handle_sigkill() completes. > > > > 2. This series, without the KILLED sanity check in __vhost_task_wake() > > > > 3. This series, with my fixup (with which syzbot was happy) >
Tested this series of patches's v2 again with vhost-net regression tests, everything works fine. Tested-by: Lei Yang <leiyang@redhat.com> On Thu, Aug 28, 2025 at 4:11 AM Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > > On 2025-08-27 12:41:04 [-0700], Sean Christopherson wrote: > > Michael, > > Sean, > > would the bellow work by chance? It is a quick shot but it looks > symmetrical… > > diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c > index bc738fa90c1d6..27107dcc1cbfe 100644 > --- a/kernel/vhost_task.c > +++ b/kernel/vhost_task.c > @@ -100,6 +100,7 @@ void vhost_task_stop(struct vhost_task *vtsk) > * freeing it below. > */ > wait_for_completion(&vtsk->exited); > + put_task_struct(vtsk->task); > kfree(vtsk); > } > EXPORT_SYMBOL_GPL(vhost_task_stop); > @@ -148,7 +149,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *), > return ERR_CAST(tsk); > } > > - vtsk->task = tsk; > + vtsk->task = get_task_struct(tsk); > return vtsk; > } > EXPORT_SYMBOL_GPL(vhost_task_create); > > Sebastian >
On Wed, Aug 27, 2025, Sebastian Andrzej Siewior wrote: > On 2025-08-27 12:41:04 [-0700], Sean Christopherson wrote: > > Michael, > > Sean, > > would the bellow work by chance? It is a quick shot but it looks > symmetrical… Gah, sorry, I flagged your earlier mail and then forgot to circle back to it (for whatever reason, I didn't entirely grok what you were suggesting). > diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c > index bc738fa90c1d6..27107dcc1cbfe 100644 > --- a/kernel/vhost_task.c > +++ b/kernel/vhost_task.c > @@ -100,6 +100,7 @@ void vhost_task_stop(struct vhost_task *vtsk) > * freeing it below. > */ > wait_for_completion(&vtsk->exited); > + put_task_struct(vtsk->task); > kfree(vtsk); > } > EXPORT_SYMBOL_GPL(vhost_task_stop); > @@ -148,7 +149,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *), > return ERR_CAST(tsk); > } > > - vtsk->task = tsk; > + vtsk->task = get_task_struct(tsk); > return vtsk; > } > EXPORT_SYMBOL_GPL(vhost_task_create); Nice! This fixes things too. Either solution works for me. Or maybe do both? Attempting to wake a task that vhost_task knows has exited (is exiting?) is a bit gross, but even with that hardening, guarding against UAF is very nice to have too. Tested-by: Sean Christopherson <seanjc@google.com>
On Wed, Aug 27, 2025 at 05:16:35PM -0700, Sean Christopherson wrote: > On Wed, Aug 27, 2025, Sebastian Andrzej Siewior wrote: > > On 2025-08-27 12:41:04 [-0700], Sean Christopherson wrote: > > > Michael, > > > > Sean, > > > > would the bellow work by chance? It is a quick shot but it looks > > symmetrical… > > Gah, sorry, I flagged your earlier mail and then forgot to circle back to it > (for whatever reason, I didn't entirely grok what you were suggesting). > > > diff --git a/kernel/vhost_task.c b/kernel/vhost_task.c > > index bc738fa90c1d6..27107dcc1cbfe 100644 > > --- a/kernel/vhost_task.c > > +++ b/kernel/vhost_task.c > > @@ -100,6 +100,7 @@ void vhost_task_stop(struct vhost_task *vtsk) > > * freeing it below. > > */ > > wait_for_completion(&vtsk->exited); > > + put_task_struct(vtsk->task); > > kfree(vtsk); > > } > > EXPORT_SYMBOL_GPL(vhost_task_stop); > > @@ -148,7 +149,7 @@ struct vhost_task *vhost_task_create(bool (*fn)(void *), > > return ERR_CAST(tsk); > > } > > > > - vtsk->task = tsk; > > + vtsk->task = get_task_struct(tsk); > > return vtsk; > > } > > EXPORT_SYMBOL_GPL(vhost_task_create); > > Nice! This fixes things too. Either solution works for me. Or maybe do both? > Attempting to wake a task that vhost_task knows has exited (is exiting?) is a > bit gross, but even with that hardening, guarding against UAF is very nice to > have too. > > Tested-by: Sean Christopherson <seanjc@google.com> Sure let's do both. -- MST
On 2025-08-27 17:16:35 [-0700], Sean Christopherson wrote: > Nice! This fixes things too. Either solution works for me. Or maybe do both? > Attempting to wake a task that vhost_task knows has exited (is exiting?) is a > bit gross, but even with that hardening, guarding against UAF is very nice to > have too. I don't mind either way. If this is requested I can submit a proper patch. > Tested-by: Sean Christopherson <seanjc@google.com> Sebastian
© 2016 - 2025 Red Hat, Inc.