[PATCH 7/8] KVM: gmem: Avoid race with kvm_gmem_release and mmu notifier

isaku.yamahata@intel.com posted 8 patches 2 years, 4 months ago
Posted by isaku.yamahata@intel.com 2 years, 4 months ago
From: Isaku Yamahata <isaku.yamahata@intel.com>

Take slots_lock around kvm_flush_shadow_all().  On process exit,
kvm_gmem_release() (via fput()) and kvm_mmu_notifier_release() (via
mmput()) can run simultaneously because vhost, i.e. /dev/vhost_{net, vsock},
can delay the mmu_notifier release.  The vhost kernel thread uses
get_task_mm() and mmput() to access process memory, so the final mmput(),
and with it kvm_mmu_notifier_release(), can be deferred until after the
file has been closed.

As a result, kvm_flush_shadow_all() and kvm_gmem_release() can run
concurrently.  With TDX KVM, releasing the HKID in kvm_flush_shadow_all()
can then race with releasing private memory in kvm_gmem_release().  Take
slots_lock in kvm_mmu_notifier_release() to avoid the race.

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
---
 virt/kvm/kvm_main.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 49380cd62367..4855d0b7a859 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -927,9 +927,16 @@ static void kvm_mmu_notifier_release(struct mmu_notifier *mn,
 	struct kvm *kvm = mmu_notifier_to_kvm(mn);
 	int idx;
 
+	/*
+	 * Avoid a race with kvm_gmem_release().
+	 * This function is called via mmu notifier, mmu_release().
+	 * kvm_gmem_release() is called via fput() on process exit.
+	 */
+	mutex_lock(&kvm->slots_lock);
 	idx = srcu_read_lock(&kvm->srcu);
 	kvm_flush_shadow_all(kvm);
 	srcu_read_unlock(&kvm->srcu, idx);
+	mutex_unlock(&kvm->slots_lock);
 }
 
 static const struct mmu_notifier_ops kvm_mmu_notifier_ops = {
-- 
2.25.1
Re: [PATCH 7/8] KVM: gmem: Avoid race with kvm_gmem_release and mmu notifier
Posted by Sean Christopherson 2 years, 4 months ago
On Tue, Aug 15, 2023, isaku.yamahata@intel.com wrote:
> From: Isaku Yamahata <isaku.yamahata@intel.com>
> 
> Add slots_lock around kvm_flush_shadow_all().  kvm_gmem_release() via
> fput() and kvm_mmu_notifier_release() via mmput() can be called
> simultaneously on process exit because vhost, /dev/vhost_{net, vsock}, can
> delay the call to release mmu_notifier, kvm_mmu_notifier_release() by its
> kernel thread.  Vhost uses get_task_mm() and mmput() for the kernel thread
> to access process memory.  mmput() can defer after closing the file.
> 
> kvm_flush_shadow_all() and kvm_gmem_release() can be called simultaneously.

KVM shouldn't reclaim memory on file release, it should instead do that on the
inode being "evicted": https://lore.kernel.org/all/ZLGiEfJZTyl7M8mS@google.com

> With TDX KVM, HKID releasing by kvm_flush_shadow_all() and private memory
> releasing by kvm_gmem_release() can race.  Add slots_lock to
> kvm_mmu_notifier_release().

No, the right answer is to not release the HKID until the VM is destroyed.  gmem
has a reference to its associated kvm instance, and so that will naturally ensure
that all memory encrypted with the HKID is freed before the HKID is released.
kvm_flush_shadow_all() should only tear down page tables, it shouldn't be freeing
guest_memfd memory.

Then patches 6-8 go away.
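
The lifetime ordering described in this reply can be illustrated with a small user-space model.  This is only a sketch under stated assumptions, not KVM code: every name below (toy_vm, toy_gmem, toy_vm_put() and so on) is hypothetical, the refcount is a plain int rather than an atomic, and nothing here is thread-safe.  It only shows the idea that if each guest_memfd instance pins the VM, then releasing the key ID on the final put cannot happen while encrypted pages still exist, regardless of which teardown path runs first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are real KVM APIs. */
struct toy_vm {
	int refcount;            /* creator's reference plus one per gmem instance */
	int encrypted_pages;     /* pages still encrypted with the VM's key ID */
	bool page_tables_present;
	bool hkid_released;      /* set only when the last reference is dropped */
	bool freed_in_order;     /* true if no encrypted pages were left at that point */
};

struct toy_gmem {
	struct toy_vm *vm;       /* reference that keeps the VM alive */
	int pages;
};

static void toy_vm_init(struct toy_vm *vm)
{
	vm->refcount = 1;
	vm->encrypted_pages = 0;
	vm->page_tables_present = true;
	vm->hkid_released = false;
	vm->freed_in_order = false;
}

/* Drop one reference; the key ID is released only on the final put. */
static void toy_vm_put(struct toy_vm *vm)
{
	if (--vm->refcount > 0)
		return;
	vm->freed_in_order = (vm->encrypted_pages == 0);
	vm->hkid_released = true;
}

/* Creating a gmem instance takes a reference on the VM. */
static void toy_gmem_create(struct toy_gmem *g, struct toy_vm *vm, int pages)
{
	vm->refcount++;
	vm->encrypted_pages += pages;
	g->vm = vm;
	g->pages = pages;
}

/* Free the memory first, then drop the reference on the VM. */
static void toy_gmem_destroy(struct toy_gmem *g)
{
	g->vm->encrypted_pages -= g->pages;
	g->pages = 0;
	toy_vm_put(g->vm);
	g->vm = NULL;
}

/* Models a flush that only tears down page tables and frees no memory. */
static void toy_flush_shadow_all(struct toy_vm *vm)
{
	vm->page_tables_present = false;
}
```

In this model, calling toy_flush_shadow_all() and then dropping the creator's reference leaves the key ID in place for as long as a toy_gmem instance exists; the key ID is released only by the put inside toy_gmem_destroy(), after the pages have been subtracted.  With that ordering there is nothing for the two paths to race on, which is why no extra lock appears in the sketch.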