[PATCH v3 1/3] KVM: Do not reset dirty GFNs in a memslot not enabling dirty tracking

Yan Zhao posted 3 patches 1 month, 1 week ago
Posted by Yan Zhao 1 month, 1 week ago
Do not allow resetting dirty GFNs in memslots that do not enable dirty
tracking.

vCPUs' dirty rings are shared between userspace and KVM. After KVM sets
dirtied entries in the dirty rings, userspace is responsible for
harvesting/resetting these entries and calling the ioctl
KVM_RESET_DIRTY_RINGS to inform KVM to advance the reset_index in the dirty
rings and invoke kvm_arch_mmu_enable_log_dirty_pt_masked() to clear the
SPTEs' dirty bits or perform write protection of the GFNs.
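
The userspace half of this protocol can be sketched roughly as follows. This is a simplified model, not code from the patch: the demo_* names are hypothetical stand-ins for struct kvm_dirty_gfn and the KVM_DIRTY_GFN_F_* flags in <linux/kvm.h>, and it leaves out the acquire/release ordering that a real harvester needs when reading and writing the shared flags field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the uapi flags in <linux/kvm.h>. */
#define DEMO_GFN_F_DIRTY (1u << 0)	/* set by KVM when it logs a GFN */
#define DEMO_GFN_F_RESET (1u << 1)	/* set by userspace once harvested */

/* Simplified model of one dirty ring entry. */
struct demo_dirty_gfn {
	uint32_t flags;
	uint32_t slot;		/* address space id and memslot id */
	uint64_t offset;	/* page offset within the memslot */
};

/*
 * Harvest consecutive dirty entries starting at *fetch_index, mark each one
 * as harvested, and return how many were collected. Harvesting stops at the
 * first entry that is not in the dirty state.
 *
 * Assumption: memory ordering is ignored here. A real harvester would use
 * load-acquire/store-release on flags, then issue KVM_RESET_DIRTY_RINGS.
 */
static size_t demo_harvest(struct demo_dirty_gfn *ring, uint32_t ring_size,
			   uint32_t *fetch_index)
{
	size_t harvested = 0;

	assert(ring_size != 0);

	while (harvested < ring_size) {
		struct demo_dirty_gfn *e = &ring[*fetch_index % ring_size];

		if (e->flags != DEMO_GFN_F_DIRTY)
			break;

		/* A real harvester would record (e->slot, e->offset) here. */
		e->flags = DEMO_GFN_F_RESET;
		(*fetch_index)++;
		harvested++;
	}

	return harvested;
}
```

Only after userspace has marked entries this way does KVM_RESET_DIRTY_RINGS make KVM walk them, which is why the contents of those entries cannot be trusted.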

Although KVM does not set dirty entries for GFNs in a memslot that does not
enable dirty tracking, userspace can write arbitrary data into the dirty
ring. This makes it possible for misbehaving userspace to specify that it
has harvested a GFN from such a memslot. When this happens, KVM will be
asked to clear dirty bits or perform write protection for GFNs in a memslot
that does not enable dirty tracking, which is undesirable.
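
As a rough illustration of the validation this calls for, the sketch below models the guard in plain C. It is not the kernel code: struct demo_memslot and the demo_* helpers are hypothetical stand-ins for struct kvm_memory_slot, __fls() and kvm_slot_dirty_track_enabled(), and the real function also needs the MMU lock and the arch callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced model of a memslot. */
struct demo_memslot {
	uint64_t npages;
	bool dirty_tracking;	/* models kvm_slot_dirty_track_enabled() */
};

/* Index of the most significant set bit; mask must be non-zero, as for __fls(). */
static unsigned int demo_fls64(uint64_t mask)
{
	unsigned int bit = 0;

	assert(mask != 0);

	while (mask >>= 1)
		bit++;

	return bit;
}

/*
 * Return true only if a reset request taken from the ring may be acted on:
 * the memslot exists, the highest GFN named by offset and mask lies inside
 * it, and the memslot is being dirty-logged. The three conditions have no
 * side effects, so their order does not change the result.
 */
static bool demo_reset_allowed(const struct demo_memslot *slot,
			       uint64_t offset, uint64_t mask)
{
	if (!slot || (offset + demo_fls64(mask)) >= slot->npages ||
	    !slot->dirty_tracking)
		return false;

	return true;
}
```

With a 16-page memslot, for example, a request with offset 12 and mask 0x8 names page 15 and passes the range check, while mask 0x10 names page 16 and is rejected. Any request against a memslot that is not being dirty-logged is rejected regardless of the range.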

For TDX, this unexpected resetting of dirty GFNs could cause inconsistency
between the mirror SPTE and the external SPTE in hardware (e.g., the mirror
SPTE has no write bit while the external SPTE is writable). When
kvm_dirty_log_manual_protect_and_init_set() is true and huge pages are
enabled in TDX, this could even lead to kvm_mmu_slot_gfn_write_protect()
being called and trigger KVM_BUG_ON() due to permission reduction changes
in the huge mirror SPTEs.

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
---
 virt/kvm/dirty_ring.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
index 02bc6b00d76c..b38b4b7d7667 100644
--- a/virt/kvm/dirty_ring.c
+++ b/virt/kvm/dirty_ring.c
@@ -63,7 +63,13 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
 
 	memslot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
 
-	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
+	/*
+	 * Userspace can write arbitrary data into the dirty ring, making it
+	 * possible for misbehaving userspace to try to reset an out-of-memslot
+	 * GFN or a GFN in a memslot that isn't being dirty-logged.
+	 */
+	if (!memslot || (offset + __fls(mask)) >= memslot->npages ||
+	    !kvm_slot_dirty_track_enabled(memslot))
 		return;
 
 	KVM_MMU_LOCK(kvm);
-- 
2.43.2
Re: [PATCH v3 1/3] KVM: Do not reset dirty GFNs in a memslot not enabling dirty tracking
Posted by Sean Christopherson 1 month, 1 week ago
On Fri, Aug 22, 2025, Yan Zhao wrote:
> Do not allow resetting dirty GFNs in memslots that do not enable dirty
> tracking.
> 
> vCPUs' dirty rings are shared between userspace and KVM. After KVM sets
> dirtied entries in the dirty rings, userspace is responsible for
> harvesting/resetting these entries and calling the ioctl
> KVM_RESET_DIRTY_RINGS to inform KVM to advance the reset_index in the dirty
> rings and invoke kvm_arch_mmu_enable_log_dirty_pt_masked() to clear the
> SPTEs' dirty bits or perform write protection of the GFNs.
> 
> Although KVM does not set dirty entries for GFNs in a memslot that does not
> enable dirty tracking, userspace can write arbitrary data into the dirty
> ring. This makes it possible for misbehaving userspace to specify that it
> has harvested a GFN from such a memslot. When this happens, KVM will be
> asked to clear dirty bits or perform write protection for GFNs in a memslot
> that does not enable dirty tracking, which is undesirable.
> 
> For TDX, this unexpected resetting of dirty GFNs could cause inconsistency
> between the mirror SPTE and the external SPTE in hardware (e.g., the mirror
> SPTE has no write bit while the external SPTE is writable). When
> kvm_dirty_log_manual_protect_and_init_set() is true and huge pages are
> enabled in TDX, this could even lead to kvm_mmu_slot_gfn_write_protect()
> being called and trigger KVM_BUG_ON() due to permission reduction changes
> in the huge mirror SPTEs.
> 

Sounds like this needs a Fixes and Cc: stable?

> Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> ---
>  virt/kvm/dirty_ring.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> index 02bc6b00d76c..b38b4b7d7667 100644
> --- a/virt/kvm/dirty_ring.c
> +++ b/virt/kvm/dirty_ring.c
> @@ -63,7 +63,13 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
>  
>  	memslot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
>  
> -	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
> +	/*
> +	 * Userspace can write arbitrary data into the dirty ring, making it
> +	 * possible for misbehaving userspace to try to reset an out-of-memslot
> +	 * GFN or a GFN in a memslot that isn't being dirty-logged.
> +	 */
> +	if (!memslot || (offset + __fls(mask)) >= memslot->npages ||
> +	    !kvm_slot_dirty_track_enabled(memslot))

Maybe check that dirty tracking is enabled before checking the range?  Purely
because checking whether _any_ gfn can be recorded seems like something that
should happen before checking whether a specific gfn can be recorded.  I.e.

	if (!memslot || !kvm_slot_dirty_track_enabled(memslot) ||
	    (offset + __fls(mask)) >= memslot->npages)
	    
>  		return;
>  
>  	KVM_MMU_LOCK(kvm);
> -- 
> 2.43.2
>
Re: [PATCH v3 1/3] KVM: Do not reset dirty GFNs in a memslot not enabling dirty tracking
Posted by Yan Zhao 1 month, 1 week ago
On Mon, Aug 25, 2025 at 01:42:43PM -0700, Sean Christopherson wrote:
> On Fri, Aug 22, 2025, Yan Zhao wrote:
> > Do not allow resetting dirty GFNs in memslots that do not enable dirty
> > tracking.
> > 
> > vCPUs' dirty rings are shared between userspace and KVM. After KVM sets
> > dirtied entries in the dirty rings, userspace is responsible for
> > harvesting/resetting these entries and calling the ioctl
> > KVM_RESET_DIRTY_RINGS to inform KVM to advance the reset_index in the dirty
> > rings and invoke kvm_arch_mmu_enable_log_dirty_pt_masked() to clear the
> > SPTEs' dirty bits or perform write protection of the GFNs.
> > 
> > Although KVM does not set dirty entries for GFNs in a memslot that does not
> > enable dirty tracking, userspace can write arbitrary data into the dirty
> > ring. This makes it possible for misbehaving userspace to specify that it
> > has harvested a GFN from such a memslot. When this happens, KVM will be
> > asked to clear dirty bits or perform write protection for GFNs in a memslot
> > that does not enable dirty tracking, which is undesirable.
> > 
> > For TDX, this unexpected resetting of dirty GFNs could cause inconsistency
> > between the mirror SPTE and the external SPTE in hardware (e.g., the mirror
> > SPTE has no write bit while the external SPTE is writable). When
> > kvm_dirty_log_manual_protect_and_init_set() is true and huge pages are
> > enabled in TDX, this could even lead to kvm_mmu_slot_gfn_write_protect()
> > being called and trigger KVM_BUG_ON() due to permission reduction changes
> > in the huge mirror SPTEs.
> > 
> 
> Sounds like this needs a Fixes and Cc: stable?
Ok. Will include them in the next version.

> > Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
> > ---
> >  virt/kvm/dirty_ring.c | 8 +++++++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/virt/kvm/dirty_ring.c b/virt/kvm/dirty_ring.c
> > index 02bc6b00d76c..b38b4b7d7667 100644
> > --- a/virt/kvm/dirty_ring.c
> > +++ b/virt/kvm/dirty_ring.c
> > @@ -63,7 +63,13 @@ static void kvm_reset_dirty_gfn(struct kvm *kvm, u32 slot, u64 offset, u64 mask)
> >  
> >  	memslot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
> >  
> > -	if (!memslot || (offset + __fls(mask)) >= memslot->npages)
> > +	/*
> > +	 * Userspace can write arbitrary data into the dirty ring, making it
> > +	 * possible for misbehaving userspace to try to reset an out-of-memslot
> > +	 * GFN or a GFN in a memslot that isn't being dirty-logged.
> > +	 */
> > +	if (!memslot || (offset + __fls(mask)) >= memslot->npages ||
> > +	    !kvm_slot_dirty_track_enabled(memslot))
> 
> Maybe check that dirty tracking is enabled before checking the range?  Purely
> because checking whether _any_ gfn can be recorded seems like something that
> should happen before checking whether a specific gfn can be recorded.  I.e.
> 
> 	if (!memslot || !kvm_slot_dirty_track_enabled(memslot) ||
> 	    (offset + __fls(mask)) >= memslot->npages)
Makes sense.
Thank you!

> >  		return;
> >  
> >  	KVM_MMU_LOCK(kvm);
> > -- 
> > 2.43.2
> >