[PATCH 1/2] KVM: Do not reset dirty GFNs in a memslot not enabling dirty tracking

Yan Zhao posted 2 patches 1 year, 1 month ago
There is a newer version of this series
Posted by Yan Zhao 1 year, 1 month ago
Do not allow resetting dirty GFNs belonging to a memslot that does not
enable dirty tracking.

vCPUs' dirty rings are shared between userspace and KVM. After KVM sets
dirtied entries in the dirty rings, userspace is responsible for
harvesting/resetting the dirtied entries and calling the ioctl
KVM_RESET_DIRTY_RINGS to inform KVM to advance the reset_index in the
dirty rings and invoke kvm_arch_mmu_enable_log_dirty_pt_masked() to clear
the SPTEs' dirty bits or perform write protection of GFNs.

Although KVM does not set dirty entries for GFNs in a memslot that does not
enable dirty tracking, it is still possible for userspace to specify that
it has harvested a GFN belonging to such a memslot. When this happens, KVM
will be asked to clear dirty bits or perform write protection for GFNs in a
memslot that does not enable dirty tracking, which is not desired.

For TDX, this unexpected resetting of dirty GFNs could cause inconsistency
between the mirror SPTE and the external SPTE in hardware (e.g., the mirror
SPTE has no write bit while it is writable in the external SPTE in
hardware). When kvm_dirty_log_manual_protect_and_init_set() returns true and
huge pages are enabled in TDX, this could even lead to
kvm_mmu_slot_gfn_write_protect() being called and the external SPTE being
removed.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 virt/kvm/dirty_ring.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index d14ffc7513ee..1ce5352ea596 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -66,7 +66,8 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
 
 	memslot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
 
-	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
+	if (!memslot || (offset + __fls(mask)) >= memslot->npages ||
+	    !kvm_slot_dirty_track_enabled(memslot))
 		return;
 
 	KVM_MMU_LOCK(kvm);
-- 
2.43.2
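
The three conditions in the patched check can be exercised outside the kernel with a small model. The following is a minimal sketch, not the kernel code: `struct model_memslot`, `MODEL_MEM_LOG_DIRTY_PAGES`, `model_fls()` and `model_reset_allowed()` are hypothetical names introduced here, and it assumes that kvm_slot_dirty_track_enabled() amounts to testing a dirty-logging flag on the memslot. `__builtin_clzll` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the memslot's dirty-logging flag. */
#define MODEL_MEM_LOG_DIRTY_PAGES (1u << 0)

/* Hypothetical, heavily reduced model of a memslot. */
struct model_memslot {
	uint64_t npages;
	uint32_t flags;
};

/* Index of the most significant set bit, in the spirit of __fls(). */
static unsigned int model_fls(uint64_t mask)
{
	assert(mask != 0);	/* the result is undefined for an empty mask */
	return 63u - (unsigned int)__builtin_clzll(mask);
}

/*
 * Return true only if a reset request for (offset, mask) should be acted
 * on. Userspace controls the ring contents, so none of the inputs can be
 * trusted.
 */
static bool model_reset_allowed(const struct model_memslot *slot,
				uint64_t offset, uint64_t mask)
{
	/* The slot id supplied by userspace may not name a memslot. */
	if (!slot)
		return false;

	/* The highest GFN covered by the mask must lie inside the slot. */
	if (offset + model_fls(mask) >= slot->npages)
		return false;

	/* Condition added by the patch: the slot must be dirty-logged. */
	if (!(slot->flags & MODEL_MEM_LOG_DIRTY_PAGES))
		return false;

	return true;
}
```

With a 16-page slot, a request at offset 8 with mask 0x80 touches page 15 and passes the bounds check, while mask 0x100 touches page 16 and is rejected. A slot without the logging flag is rejected regardless of offset and mask.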
Re: [PATCH 1/2] KVM: Do not reset dirty GFNs in a memslot not enabling dirty tracking
Posted by Sean Christopherson 1 year, 1 month ago
On Fri, Dec 20, 2024, Yan Zhao wrote:
> @@ -66,7 +66,8 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
>  
>  	memslot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
>  
> -	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
> +	if (!memslot || (offset + __fls(mask)) >= memslot->npages ||
> +	    !kvm_slot_dirty_track_enabled(memslot))

Can you add a comment explaining that it's possible to try to update a memslot
that isn't being dirty-logged if userspace is misbehaving?  And specifically that
userspace can write arbitrary data into the ring.
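
The point about arbitrary ring contents can be illustrated with a userspace-only model of the ring. This is a sketch under stated assumptions: `struct dirty_gfn` and the two flag macros are local mirrors of the uapi dirty-ring entry layout so the example builds without kernel headers, `harvest()` and `forge_harvested_entry()` are hypothetical helpers, and the acquire/release ordering a real harvester needs on `flags` is omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors of the uapi dirty-ring entry flags (assumed values). */
#define DIRTY_GFN_F_DIRTY (1u << 0)
#define DIRTY_GFN_F_RESET (1u << 1)

/* Local mirror of a dirty-ring entry. */
struct dirty_gfn {
	uint32_t flags;
	uint32_t slot;
	uint64_t offset;
};

/*
 * Well-behaved harvesting: walk entries that are marked dirty, mark each
 * one as harvested, and advance the fetch index. Returns the number of
 * entries harvested. The ring size must be a power of two.
 */
static uint32_t harvest(struct dirty_gfn *ring, uint32_t size,
			uint32_t *fetch_index)
{
	uint32_t count = 0;

	assert(size != 0 && (size & (size - 1)) == 0);

	while (count < size) {
		struct dirty_gfn *e = &ring[*fetch_index & (size - 1)];

		if (e->flags != DIRTY_GFN_F_DIRTY)
			break;
		e->flags = DIRTY_GFN_F_RESET;
		(*fetch_index)++;
		count++;
	}
	return count;
}

/*
 * Misbehaving userspace: the ring is ordinary writable memory, so an
 * entry can claim that any slot/offset pair was harvested, including a
 * slot that never had dirty logging enabled.
 */
static void forge_harvested_entry(struct dirty_gfn *e, uint32_t slot,
				  uint64_t offset)
{
	e->slot = slot;
	e->offset = offset;
	e->flags = DIRTY_GFN_F_RESET;
}
```

Nothing in the shared mapping distinguishes a forged entry from one that was produced by KVM and then harvested, which is why the reset path has to validate the slot itself.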
Re: [PATCH 1/2] KVM: Do not reset dirty GFNs in a memslot not enabling dirty tracking
Posted by Yan Zhao 1 year, 1 month ago
On Fri, Dec 20, 2024 at 09:31:35AM -0800, Sean Christopherson wrote:
> Can you add a comment explaining that it's possible to try to update a memslot
> that isn't being dirty-logged if userspace is misbehaving?  And specifically that
> userspace can write arbitrary data into the ring.
Yes, will do. Thanks!