Now that KVM loads from vcpu_array if and only if the target index is
valid with respect to online_vcpus, i.e. now that it is safe to erase a
not-fully-onlined vCPU entry, revert to storing into vcpu_array before
success is guaranteed.

If xa_store() fails, which _should_ be impossible, then putting the vCPU's
reference to 'struct kvm' results in a refcounting bug as the vCPU fd has
been installed and owns the vCPU's reference.

This was found by inspection, but forcing the xa_store() to fail
confirms the problem:

| Unable to handle kernel paging request at virtual address ffff800080ecd960
| Call trace:
| _raw_spin_lock_irq+0x2c/0x70
| kvm_irqfd_release+0x24/0xa0
| kvm_vm_release+0x1c/0x38
| __fput+0x88/0x2ec
| ____fput+0x10/0x1c
| task_work_run+0xb0/0xd4
| do_exit+0x210/0x854
| do_group_exit+0x70/0x98
| get_signal+0x6b0/0x73c
| do_signal+0xa4/0x11e8
| do_notify_resume+0x60/0x12c
| el0_svc+0x64/0x68
| el0t_64_sync_handler+0x84/0xfc
| el0t_64_sync+0x190/0x194
| Code: b9000909 d503201f 2a1f03e1 52800028 (88e17c08)

Practically speaking, this is a non-issue as xa_store() can't fail, absent
a nasty kernel bug.  But the code is visually jarring and technically
broken.
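
To make the reference ownership explicit, below is an annotated sketch of
the error path being reverted (illustration only, not a verbatim copy of
the kernel code):

	kvm_get_kvm(kvm);
	r = create_vcpu_fd(vcpu);	/* on success, the installed fd owns the reference */
	if (r < 0)
		goto kvm_put_xa_release;	/* fd not installed; putting the reference is safe */

	if (KVM_BUG_ON(xa_store(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, 0), kvm)) {
		r = -EINVAL;
		/*
		 * Broken: the fd *is* installed at this point, so the
		 * kvm_put_kvm_no_destroy() on the error path drops a
		 * reference the fd still owns.  Closing the fd later puts
		 * the same reference again, freeing 'struct kvm' while it
		 * is still in use, as in the splat above.
		 */
		goto kvm_put_xa_release;
	}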

This reverts commit afb2acb2e3a32e4d56f7fbd819769b98ed1b7520.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Michal Luczaj <mhal@rbox.co>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marc Zyngier <maz@kernel.org>
Reported-by: Will Deacon <will@kernel.org>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 virt/kvm/kvm_main.c | 14 +++++---------
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index fca9f74e9544..f081839521ef 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -4283,7 +4283,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 	}
 
 	vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
-	r = xa_reserve(&kvm->vcpu_array, vcpu->vcpu_idx, GFP_KERNEL_ACCOUNT);
+	r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
+	BUG_ON(r == -EBUSY);
 	if (r)
 		goto unlock_vcpu_destroy;
 
@@ -4298,12 +4299,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 	kvm_get_kvm(kvm);
 	r = create_vcpu_fd(vcpu);
 	if (r < 0)
-		goto kvm_put_xa_release;
-
-	if (KVM_BUG_ON(xa_store(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, 0), kvm)) {
-		r = -EINVAL;
-		goto kvm_put_xa_release;
-	}
+		goto kvm_put_xa_erase;
 
 	/*
 	 * Pairs with smp_rmb() in kvm_get_vcpu. Store the vcpu
@@ -4318,10 +4314,10 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
 	kvm_create_vcpu_debugfs(vcpu);
 	return r;
 
-kvm_put_xa_release:
+kvm_put_xa_erase:
 	mutex_unlock(&vcpu->mutex);
 	kvm_put_kvm_no_destroy(kvm);
-	xa_release(&kvm->vcpu_array, vcpu->vcpu_idx);
+	xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx);
 unlock_vcpu_destroy:
 	mutex_unlock(&kvm->lock);
 	kvm_dirty_ring_free(&vcpu->dirty_ring);
--
2.47.0.rc0.187.ge670bccf7e-goog

On 10/9/24 17:04, Sean Christopherson wrote:
> @@ -4298,12 +4299,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, unsigned long id)
>  	kvm_get_kvm(kvm);
>  	r = create_vcpu_fd(vcpu);
>  	if (r < 0)
> -		goto kvm_put_xa_release;
> -
> -	if (KVM_BUG_ON(xa_store(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, 0), kvm)) {
> -		r = -EINVAL;
> -		goto kvm_put_xa_release;
> -	}
> +		goto kvm_put_xa_erase;

I also find it a bit jarring, though, that we have to undo the insertion.
This is a chicken-and-egg situation where you pick one operation B that
will have to undo operation A if it fails.  But what xa_store() is doing
is breaking this deadlock.

The code is a bit longer, sure, but I don't see the point in complicating
the vcpu_array invariants and letting an entry disappear.

The rest of the series is still good, of course.

Paolo
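
Condensing the two orderings under discussion (a sketch, not the actual
kernel code):

	/* This revert: insert first, undo the insertion if fd creation fails. */
	r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
	...
	r = create_vcpu_fd(vcpu);
	if (r < 0)
		xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx);	/* entry disappears */

	/* Being reverted: reserve first, fill the slot only once success is assured. */
	r = xa_reserve(&kvm->vcpu_array, vcpu->vcpu_idx, GFP_KERNEL_ACCOUNT);
	...
	r = create_vcpu_fd(vcpu);
	if (r < 0)
		xa_release(&kvm->vcpu_array, vcpu->vcpu_idx);	/* nothing was ever visible */
	else
		xa_store(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, 0);	/* "can't" fail */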

On Thu, Oct 10, 2024, Paolo Bonzini wrote:
> I also find it a bit jarring, though, that we have to undo the insertion.
> This is a chicken-and-egg situation where you pick one operation B that
> will have to undo operation A if it fails.  But what xa_store() is doing
> is breaking this deadlock.
>
> The code is a bit longer, sure, but I don't see the point in complicating
> the vcpu_array invariants and letting an entry disappear.

But we only need one rule: vcpu_array[x] is valid if and only if 'x' is
less than online_vcpus.  And that rule is necessary regardless of whether
or not vcpu_array[x] is filled before success is guaranteed.

I'm not concerned about the code length, it's that we need to do
_something_ if xa_store() fails.  Yeah, it should never happen, but
knowingly doing nothing feels all kinds of wrong.

I don't like BUG(), because it's obviously very doable to gracefully
handle failure.  And a WARN() is rather pointless, because continuing on
with an invalid entry is all but guaranteed to crash, i.e. is little more
than a deferred BUG() in this case.
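
For reference, the reader-side rule being described looks roughly like
this after the earlier patches in the series (a sketch based on
kvm_get_vcpu() in include/linux/kvm_host.h):

	static inline struct kvm_vcpu *kvm_get_vcpu(struct kvm *kvm, int i)
	{
		int num_vcpus = atomic_read(&kvm->online_vcpus);

		/* vcpu_array[i] is valid if and only if i < online_vcpus. */
		if (i < 0 || i >= num_vcpus)
			return NULL;

		i = array_index_nospec(i, num_vcpus);

		/* Pairs with the write barrier in kvm_vm_ioctl_create_vcpu(). */
		smp_rmb();
		return xa_load(&kvm->vcpu_array, i);
	}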

On 10/10/24 19:48, Sean Christopherson wrote:
> But we only need one rule: vcpu_array[x] is valid if and only if 'x' is
> less than online_vcpus.  And that rule is necessary regardless of whether
> or not vcpu_array[x] is filled before success is guaranteed.

Even if the invariant is explainable, I still find xa_erase() to be
uglier than xa_release(), but maybe it's just me.

The reason I'm not fully convinced by the explanation is that...

> I'm not concerned about the code length, it's that we need to do
> _something_ if xa_store() fails.  Yeah, it should never happen, but
> knowingly doing nothing feels all kinds of wrong.

... it seems to me that this is not just an issue in KVM code; it should
apply to other uses of xa_reserve()/xa_store() as well.  If xa_store()
fails after xa_reserve(), you're pretty much using the xarray API
incorrectly... and then, just make it a BUG()?

I know that BUG() is frowned upon, but if the API causes invalid memory
accesses when used incorrectly, one might as well fail as early as
possible, before the invalid memory access becomes exploitable.

> I don't like BUG(), because it's obviously very doable to gracefully
> handle failure.

Yes, you can, by using a different API.  But the point is that in the
reserve/store case the insert failure becomes a reserve failure, never a
store failure.  Maybe there should be an xa_store_reserved() that BUGs
on failure, I don't know.

Paolo
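
For the record, such a helper might look like this (hypothetical, not an
existing xarray API):

	/*
	 * Store into a slot previously reserved with xa_reserve().  No
	 * allocation is needed to fill a reserved slot, so a failure here
	 * means the API was misused; fail fast instead of returning an
	 * error the caller cannot meaningfully handle.
	 */
	static inline void xa_store_reserved(struct xarray *xa,
					     unsigned long index, void *entry)
	{
		void *old = xa_store(xa, index, entry, 0);

		BUG_ON(xa_is_err(old));
	}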