[PATCH 02/17] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"

Posted by Rick Edgecombe 5 days, 17 hours ago
From: Sean Christopherson <seanjc@google.com>

Pass a pointer to iter->old_spte, not simply its value, when setting an
external SPTE in __tdp_mmu_set_spte_atomic(), so that the iterator's value
will be updated if the cmpxchg64 to freeze the mirror SPTE fails.  The bug
is currently benign as TDX is mutually exclusive with all paths that do
"local retry", e.g. clear_dirty_gfn_range() and wrprot_gfn_range().

Fixes: 77ac7079e66d ("KVM: x86/tdp_mmu: Propagate building mirror page tables")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 7b1102d26f9c..dbaeb80f2b64 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -509,10 +509,10 @@ static void *get_external_spt(gfn_t gfn, u64 new_spte, int level)
 }
 
 static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sptep,
-						 gfn_t gfn, u64 old_spte,
+						 gfn_t gfn, u64 *old_spte,
 						 u64 new_spte, int level)
 {
-	bool was_present = is_shadow_present_pte(old_spte);
+	bool was_present = is_shadow_present_pte(*old_spte);
 	bool is_present = is_shadow_present_pte(new_spte);
 	bool is_leaf = is_present && is_last_spte(new_spte, level);
 	int ret = 0;
@@ -525,7 +525,7 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
 	 * page table has been modified. Use FROZEN_SPTE similar to
 	 * the zapping case.
 	 */
-	if (!try_cmpxchg64(rcu_dereference(sptep), &old_spte, FROZEN_SPTE))
+	if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
 		return -EBUSY;
 
 	/*
@@ -541,7 +541,7 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
 		ret = kvm_x86_call(link_external_spt)(kvm, gfn, level, external_spt);
 	}
 	if (ret)
-		__kvm_tdp_mmu_write_spte(sptep, old_spte);
+		__kvm_tdp_mmu_write_spte(sptep, *old_spte);
 	else
 		__kvm_tdp_mmu_write_spte(sptep, new_spte);
 	return ret;
@@ -670,7 +670,7 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 			return -EBUSY;
 
 		ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
-						iter->old_spte, new_spte, iter->level);
+						&iter->old_spte, new_spte, iter->level);
 		if (ret)
 			return ret;
 	} else {
-- 
2.53.0
Re: [PATCH 02/17] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
Posted by Huang, Kai 2 days, 4 hours ago
On Fri, 2026-03-27 at 13:14 -0700, Rick Edgecombe wrote:
> [...]
> @@ -670,7 +670,7 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
>  			return -EBUSY;
>  
>  		ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
> -						iter->old_spte, new_spte, iter->level);
> +						&iter->old_spte, new_spte, iter->level);
>  		if (ret)
>  			return ret;
>  	} else {

The __tdp_mmu_set_spte_atomic() has a WARN() at the beginning to check that
iter->old_spte isn't a frozen SPTE:

        WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte));

Thinking more, I _think_ this patch could potentially trigger this WARNING,
because set_external_spte_present() will now set iter->old_spte to
FROZEN_SPTE when try_cmpxchg64() fails.

Consider there are 3 vCPUs trying to accept the same GFN, and they all reach
__tdp_mmu_set_spte_atomic() simultaneously.  Assuming vCPU1 does the 

	if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
 		return -EBUSY;

.. successfully in set_external_spte_present(), then vCPU2 will fail on the
try_cmpxchg64(), but this will cause iter->old_spte to be updated to
FROZEN_SPTE.

Then when vCPU3 enters __tdp_mmu_set_spte_atomic(), AFAICT the WARNING will
be triggered because is_frozen_spte(iter->old_spte) will now return true.

Or did I miss anything?

Also, AFAICT this issue doesn't exist in the non-TDX case because there's no
path where tdp_mmu_set_spte_atomic() is called to set new_spte to
FROZEN_SPTE.
Re: [PATCH 02/17] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
Posted by Yan Zhao 2 days, 4 hours ago
On Tue, Mar 31, 2026 at 05:47:29PM +0800, Huang, Kai wrote:
> On Fri, 2026-03-27 at 13:14 -0700, Rick Edgecombe wrote:
> > [...]
> 
> The __tdp_mmu_set_spte_atomic() has a WARN() at the beginning to check that
> iter->old_spte isn't a frozen SPTE:
> 
>         WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte));
> 
> Thinking more, I _think_ this patch could potentially trigger this WARNING,
> because set_external_spte_present() will now set iter->old_spte to
> FROZEN_SPTE when try_cmpxchg64() fails.
> 
> Consider there are 3 vCPUs trying to accept the same GFN, and they all reach
> __tdp_mmu_set_spte_atomic() simultaneously.  Assuming vCPU1 does the 
> 
> 	if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
>  		return -EBUSY;
> 
> .. successfully in set_external_spte_present(), then vCPU2 will fail on the
> try_cmpxchg64(), but this will cause iter->old_spte to be updated to
> FROZEN_SPTE.
> 
> Then when vCPU3 enters __tdp_mmu_set_spte_atomic(), AFAICT the WARNING will
> be triggered because is_frozen_spte(iter->old_spte) will now return true.

The failed caller needs to check "if (is_frozen_spte(iter.old_spte))" before
retrying, as in kvm_tdp_mmu_map()?
 
> Or did I miss anything?
> 
> Also, AFAICT this issue doesn't exist in the non-TDX case because there's no
> path where tdp_mmu_set_spte_atomic() is called to set new_spte to
> FROZEN_SPTE.
Re: [PATCH 02/17] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
Posted by Huang, Kai 2 days, 4 hours ago
> > 
> > The __tdp_mmu_set_spte_atomic() has a WARN() at the beginning to check that
> > iter->old_spte isn't a frozen SPTE:
> > 
> >         WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte));
> > 
> > Thinking more, I _think_ this patch could potentially trigger this WARNING,
> > because set_external_spte_present() will now set iter->old_spte to
> > FROZEN_SPTE when try_cmpxchg64() fails.
> > 
> > Consider there are 3 vCPUs trying to accept the same GFN, and they all reach
> > __tdp_mmu_set_spte_atomic() simultaneously.  Assuming vCPU1 does the 
> > 
> > 	if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
> >  		return -EBUSY;
> > 
> > .. successfully in set_external_spte_present(), then vCPU2 will fail on the
> > try_cmpxchg64(), but this will cause iter->old_spte to be updated to
> > FROZEN_SPTE.
> > 
> > Then when vCPU3 enters __tdp_mmu_set_spte_atomic(), AFAICT the WARNING will
> > be triggered because is_frozen_spte(iter->old_spte) will now return true.
> 
> The failed caller needs to check "if (is_frozen_spte(iter.old_spte))" before
> retrying, as in kvm_tdp_mmu_map()?

It's possible the vCPU3 is already about to go into
__tdp_mmu_set_spte_atomic() when iter.old_spte becomes FROZEN_SPTE.
Re: [PATCH 02/17] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
Posted by Yan Zhao 2 days, 4 hours ago
On Tue, Mar 31, 2026 at 05:59:54PM +0800, Huang, Kai wrote:
> > > [...]
> > 
> > The failed caller needs to check "if (is_frozen_spte(iter.old_spte))" before
> > retrying, as in kvm_tdp_mmu_map()?
> 
> It's possible the vCPU3 is already about to go into
> __tdp_mmu_set_spte_atomic() when iter.old_spte becomes FROZEN_SPTE.
Hmm, different vCPUs' &iter shouldn't point to the same memory, since iter is
a local variable.
Re: [PATCH 02/17] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
Posted by Huang, Kai 2 days, 3 hours ago
On Tue, 2026-03-31 at 17:22 +0800, Yan Zhao wrote:
> On Tue, Mar 31, 2026 at 05:59:54PM +0800, Huang, Kai wrote:
> > > > [...]
> > > 
> > > The failed caller needs to check "if (is_frozen_spte(iter.old_spte))" before
> > > retrying, as in kvm_tdp_mmu_map()?
> > 
> > It's possible the vCPU3 is already about to go into
> > __tdp_mmu_set_spte_atomic() when iter.old_spte becomes FROZEN_SPTE.
> Hmm, different vCPUs' &iter shouldn't point to the same memory, since iter is
> a local variable.

Ah right.  I missed that :-)