[RFC PATCH v5 02/45] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"

Sean Christopherson posted 45 patches 1 week, 3 days ago
[RFC PATCH v5 02/45] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
Posted by Sean Christopherson 1 week, 3 days ago
Pass a pointer to iter->old_spte, not simply its value, when setting an
external SPTE in __tdp_mmu_set_spte_atomic(), so that the iterator's value
will be updated if the cmpxchg64 to freeze the mirror SPTE fails.  The bug
is currently benign as TDX is mutualy exclusive with all paths that do
"local" retry", e.g. clear_dirty_gfn_range() and wrprot_gfn_range().

Fixes: 77ac7079e66d ("KVM: x86/tdp_mmu: Propagate building mirror page tables")
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 9c26038f6b77..0feda295859a 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -509,10 +509,10 @@ static void *get_external_spt(gfn_t gfn, u64 new_spte, int level)
 }
 
 static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sptep,
-						 gfn_t gfn, u64 old_spte,
+						 gfn_t gfn, u64 *old_spte,
 						 u64 new_spte, int level)
 {
-	bool was_present = is_shadow_present_pte(old_spte);
+	bool was_present = is_shadow_present_pte(*old_spte);
 	bool is_present = is_shadow_present_pte(new_spte);
 	bool is_leaf = is_present && is_last_spte(new_spte, level);
 	int ret = 0;
@@ -525,7 +525,7 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
 	 * page table has been modified. Use FROZEN_SPTE similar to
 	 * the zapping case.
 	 */
-	if (!try_cmpxchg64(rcu_dereference(sptep), &old_spte, FROZEN_SPTE))
+	if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
 		return -EBUSY;
 
 	/*
@@ -541,7 +541,7 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
 		ret = kvm_x86_call(link_external_spt)(kvm, gfn, level, external_spt);
 	}
 	if (ret)
-		__kvm_tdp_mmu_write_spte(sptep, old_spte);
+		__kvm_tdp_mmu_write_spte(sptep, *old_spte);
 	else
 		__kvm_tdp_mmu_write_spte(sptep, new_spte);
 	return ret;
@@ -670,7 +670,7 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 			return -EBUSY;
 
 		ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
-						iter->old_spte, new_spte, iter->level);
+						&iter->old_spte, new_spte, iter->level);
 		if (ret)
 			return ret;
 	} else {
-- 
2.53.0.rc1.217.geba53bf80e-goog
Re: [RFC PATCH v5 02/45] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
Posted by Huang, Kai 5 days, 5 hours ago
On Wed, 2026-01-28 at 17:14 -0800, Sean Christopherson wrote:
> Pass a pointer to iter->old_spte, not simply its value, when setting an
> external SPTE in __tdp_mmu_set_spte_atomic(), so that the iterator's value
> will be updated if the cmpxchg64 to freeze the mirror SPTE fails.  The bug
> is currently benign as TDX is mutualy exclusive with all paths that do
> "local" retry", e.g. clear_dirty_gfn_range() and wrprot_gfn_range().
> 
> Fixes: 77ac7079e66d ("KVM: x86/tdp_mmu: Propagate building mirror page tables")
> Signed-off-by: Sean Christopherson <seanjc@google.com>

Reviewed-by: Kai Huang <kai.huang@intel.com>

Btw, do we need to cc stable?
Re: [RFC PATCH v5 02/45] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
Posted by Sean Christopherson 4 days, 20 hours ago
On Tue, Feb 03, 2026, Kai Huang wrote:
> On Wed, 2026-01-28 at 17:14 -0800, Sean Christopherson wrote:
> > Pass a pointer to iter->old_spte, not simply its value, when setting an
> > external SPTE in __tdp_mmu_set_spte_atomic(), so that the iterator's value
> > will be updated if the cmpxchg64 to freeze the mirror SPTE fails.  The bug
> > is currently benign as TDX is mutualy exclusive with all paths that do
> > "local" retry", e.g. clear_dirty_gfn_range() and wrprot_gfn_range().
> > 
> > Fixes: 77ac7079e66d ("KVM: x86/tdp_mmu: Propagate building mirror page tables")
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> 
> Reviewed-by: Kai Huang <kai.huang@intel.com>
> 
> Btw, do we need to cc stable?

Probably not?  The bug is benign until dirty logging comes along, and if someone
backports that support (if it ever manifests) to an older kernel, it's firmly
that person's responsibility to pick up dependencies like this.
Re: [RFC PATCH v5 02/45] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
Posted by Huang, Kai 4 days, 18 hours ago
On Tue, 2026-02-03 at 12:06 -0800, Sean Christopherson wrote:
> On Tue, Feb 03, 2026, Kai Huang wrote:
> > On Wed, 2026-01-28 at 17:14 -0800, Sean Christopherson wrote:
> > > Pass a pointer to iter->old_spte, not simply its value, when setting an
> > > external SPTE in __tdp_mmu_set_spte_atomic(), so that the iterator's value
> > > will be updated if the cmpxchg64 to freeze the mirror SPTE fails.  The bug
> > > is currently benign as TDX is mutualy exclusive with all paths that do
> > > "local" retry", e.g. clear_dirty_gfn_range() and wrprot_gfn_range().
> > > 
> > > Fixes: 77ac7079e66d ("KVM: x86/tdp_mmu: Propagate building mirror page tables")
> > > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > 
> > Reviewed-by: Kai Huang <kai.huang@intel.com>
> > 
> > Btw, do we need to cc stable?
> 
> Probably not?  The bug is benign until dirty logging comes along, and if someone
> backports that support (if it ever manifests) to an older kernel, it's firmly
> that person's responsibility to pick up dependencies like this.

Makes sense. :-)
Re: [RFC PATCH v5 02/45] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
Posted by Yan Zhao 5 days, 7 hours ago
On Wed, Jan 28, 2026 at 05:14:34PM -0800, Sean Christopherson wrote:
> Pass a pointer to iter->old_spte, not simply its value, when setting an
> external SPTE in __tdp_mmu_set_spte_atomic(), so that the iterator's value
> will be updated if the cmpxchg64 to freeze the mirror SPTE fails.  The bug
> is currently benign as TDX is mutualy exclusive with all paths that do
> "local" retry", e.g. clear_dirty_gfn_range() and wrprot_gfn_range().
>
> Fixes: 77ac7079e66d ("KVM: x86/tdp_mmu: Propagate building mirror page tables")
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>  arch/x86/kvm/mmu/tdp_mmu.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 9c26038f6b77..0feda295859a 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -509,10 +509,10 @@ static void *get_external_spt(gfn_t gfn, u64 new_spte, int level)
>  }
>  
>  static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sptep,
> -						 gfn_t gfn, u64 old_spte,
> +						 gfn_t gfn, u64 *old_spte,
>  						 u64 new_spte, int level)
>  {
> -	bool was_present = is_shadow_present_pte(old_spte);
> +	bool was_present = is_shadow_present_pte(*old_spte);
>  	bool is_present = is_shadow_present_pte(new_spte);
>  	bool is_leaf = is_present && is_last_spte(new_spte, level);
>  	int ret = 0;
> @@ -525,7 +525,7 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
>  	 * page table has been modified. Use FROZEN_SPTE similar to
>  	 * the zapping case.
>  	 */
> -	if (!try_cmpxchg64(rcu_dereference(sptep), &old_spte, FROZEN_SPTE))
> +	if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
>  		return -EBUSY;
>  
>  	/*
> @@ -541,7 +541,7 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
>  		ret = kvm_x86_call(link_external_spt)(kvm, gfn, level, external_spt);
>  	}
>  	if (ret)
> -		__kvm_tdp_mmu_write_spte(sptep, old_spte);
> +		__kvm_tdp_mmu_write_spte(sptep, *old_spte);
Do we need to add a comment explaining that when the above try_cmpxchg64()
succeeds, the value of *old_spte is unmodified?

>  	else
>  		__kvm_tdp_mmu_write_spte(sptep, new_spte);
>  	return ret;
> @@ -670,7 +670,7 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
>  			return -EBUSY;
>  
>  		ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
> -						iter->old_spte, new_spte, iter->level);
> +						&iter->old_spte, new_spte, iter->level);
>  		if (ret)
>  			return ret;
>  	} else {
> -- 
> 2.53.0.rc1.217.geba53bf80e-goog
>
Re: [RFC PATCH v5 02/45] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
Posted by Edgecombe, Rick P 1 week, 2 days ago
On Wed, 2026-01-28 at 17:14 -0800, Sean Christopherson wrote:
> Pass a pointer to iter->old_spte, not simply its value, when setting an
> external SPTE in __tdp_mmu_set_spte_atomic(), so that the iterator's value
> will be updated if the cmpxchg64 to freeze the mirror SPTE fails.
> 

Might be being dense here, but is the bug that if cmpxchg64 *succeeds* and
set_external_spte() fails? Then old_spte is not updated and the local retry will
expect the wrong old_spte.

>   The bug
> is currently benign as TDX is mutualy exclusive with all paths that do
> "local" retry", e.g. clear_dirty_gfn_range() and wrprot_gfn_range().


Re: [RFC PATCH v5 02/45] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
Posted by Sean Christopherson 1 week, 2 days ago
On Thu, Jan 29, 2026, Rick P Edgecombe wrote:
> On Wed, 2026-01-28 at 17:14 -0800, Sean Christopherson wrote:
> > Pass a pointer to iter->old_spte, not simply its value, when setting an
> > external SPTE in __tdp_mmu_set_spte_atomic(), so that the iterator's value
> > will be updated if the cmpxchg64 to freeze the mirror SPTE fails.
> > 
> 
> Might be being dense here, but is the bug that if cmpxchg64 *succeeds* and
> set_external_spte() fails? Then old_spte is not updated and the local retry will
> expect the wrong old_spte.

No, the bug is if the cmpxchg64 fails.  On failure, the current mismatching value
is stored in the "old" param.  KVM relies on the iter->old_spte holding the
current value when restarting an operation without re-reading the SPTE from memory.

E.g. in __tdp_mmu_zap_root(), if tdp_mmu_set_spte_atomic() fails, iter->old_spte
*must* hold the current in-memroy value, otherwise the loop will hang because it
will re-attempt cmpxchg64 using the stale iter->old_spte.

static void __tdp_mmu_zap_root(struct kvm *kvm, struct kvm_mmu_page *root,
			       bool shared, int zap_level)
{
	struct tdp_iter iter;

	for_each_tdp_pte_min_level_all(iter, root, zap_level) {
retry:
		if (tdp_mmu_iter_cond_resched(kvm, &iter, false, shared))
			continue;

		if (!is_shadow_present_pte(iter.old_spte))
			continue;

		if (iter.level > zap_level)
			continue;

		if (!shared)
			tdp_mmu_iter_set_spte(kvm, &iter, SHADOW_NONPRESENT_VALUE);
		else if (tdp_mmu_set_spte_atomic(kvm, &iter, SHADOW_NONPRESENT_VALUE))
			goto retry;
	}
}

> >   The bug
> > is currently benign as TDX is mutualy exclusive with all paths that do
> > "local" retry", e.g. clear_dirty_gfn_range() and wrprot_gfn_range().
> 
> 
Re: [RFC PATCH v5 02/45] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
Posted by Edgecombe, Rick P 1 week, 2 days ago
On Thu, 2026-01-29 at 14:23 -0800, Sean Christopherson wrote:
> No, the bug is if the cmpxchg64 fails.  On failure, the current mismatching value
> is stored in the "old" param.  KVM relies on the iter->old_spte holding the
> current value when restarting an operation without re-reading the SPTE from memory.

Ah, I see. Sorry. Just went and refreshed up on the difference between
cmpxchg64() and try_cmpxchg64(). I see now that the log is accurate since it
refers to the behavior of the instruction, but specifying try_cmpxchg64() might
be a little clearer since cmpxchg() doesn't automatically update the 'old'
passed in. In either case:

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>