From: Sean Christopherson <seanjc@google.com>
Move the freeing of external page tables into the reclaim operation that
lives in TDX code.
The TDP MMU supports traversing the TDP without holding locks. Page
tables needs to be freed via RCU to prevent walking one that gets
freed.
While none of these lockless walk operations actually happen for the
mirror EPT, the TDP MMU none-the-less frees the mirror EPT page tables in
the same way, and because it’s a handy place to plug it in, the external
page tables as well.
However, the external page tables definitely can’t be walked once they are
reclaimed from the TDX module. The TDX module releases the page for the
host VMM to use, so this RCU-time free is unnecessary for external page
tables.
So move the free_page() call to TDX code. Create an
tdp_mmu_free_unused_sp() to allow for freeing external page tables that
have never left the TDP MMU code (i.e. don’t need freed in a special way.
Link: https://lore.kernel.org/kvm/aYpjNrtGmogNzqwT@google.com/
Not-yet-Signed-off-by: Sean Christopherson <seanjc@google.com>
[Based on a diff by Sean, added log]
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
arch/x86/kvm/mmu/tdp_mmu.c | 16 +++++++++++-----
arch/x86/kvm/vmx/tdx.c | 11 ++++++++++-
2 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 575033cc7fe4..18e11c1c7631 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -53,13 +53,18 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
rcu_barrier();
}
-static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
+static void __tdp_mmu_free_sp(struct kvm_mmu_page *sp)
{
- free_page((unsigned long)sp->external_spt);
free_page((unsigned long)sp->spt);
kmem_cache_free(mmu_page_header_cache, sp);
}
+static void tdp_mmu_free_unused_sp(struct kvm_mmu_page *sp)
+{
+ free_page((unsigned long)sp->external_spt);
+ __tdp_mmu_free_sp(sp);
+}
+
/*
* This is called through call_rcu in order to free TDP page table memory
* safely with respect to other kernel threads that may be operating on
@@ -73,7 +78,8 @@ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
struct kvm_mmu_page *sp = container_of(head, struct kvm_mmu_page,
rcu_head);
- tdp_mmu_free_sp(sp);
+ WARN_ON_ONCE(sp->external_spt);
+ __tdp_mmu_free_sp(sp);
}
void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
@@ -1261,7 +1267,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
* failed, e.g. because a different task modified the SPTE.
*/
if (r) {
- tdp_mmu_free_sp(sp);
+ tdp_mmu_free_unused_sp(sp);
goto retry;
}
@@ -1571,7 +1577,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
* installs its own sp in place of the last sp we tried to split.
*/
if (sp)
- tdp_mmu_free_sp(sp);
+ tdp_mmu_free_unused_sp(sp);
return 0;
}
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d064b40a6b31..1346e891ca94 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1782,7 +1782,16 @@ static void tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
*/
if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) ||
tdx_reclaim_page(virt_to_page(sp->external_spt)))
- sp->external_spt = NULL;
+ goto out;
+
+ /*
+ * Immediately free the S-EPT page as the TDX subsystem doesn't support
+ * freeing pages from RCU callbacks, and more importantly because
+ * TDH.PHYMEM.PAGE.RECLAIM ensures there are no outstanding readers.
+ */
+ free_page((unsigned long)sp->external_spt);
+out:
+ sp->external_spt = NULL;
}
static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
--
2.53.0
> > So move the free_page() call to TDX code. Create an > tdp_mmu_free_unused_sp() to allow for freeing external page tables that > have never left the TDP MMU code (i.e. don’t need freed in a special way. > Missing ')' in the last sentence. Also, don't need to be freed ?
On Fri, Mar 27, 2026 at 01:14:21PM -0700, Rick Edgecombe wrote:
> From: Sean Christopherson <seanjc@google.com>
>
> Move the freeing of external page tables into the reclaim operation that
> lives in TDX code.
>
> The TDP MMU supports traversing the TDP without holding locks. Page
> tables needs to be freed via RCU to prevent walking one that gets
> freed.
>
> While none of these lockless walk operations actually happen for the
> mirror EPT, the TDP MMU none-the-less frees the mirror EPT page tables in
> the same way, and because it’s a handy place to plug it in, the external
> page tables as well.
>
> However, the external page tables definitely can’t be walked once they are
> reclaimed from the TDX module. The TDX module releases the page for the
> host VMM to use, so this RCU-time free is unnecessary for external page
> tables.
>
> So move the free_page() call to TDX code. Create an
> tdp_mmu_free_unused_sp() to allow for freeing external page tables that
> have never left the TDP MMU code (i.e. don’t need freed in a special way.
>
> Link: https://lore.kernel.org/kvm/aYpjNrtGmogNzqwT@google.com/
> Not-yet-Signed-off-by: Sean Christopherson <seanjc@google.com>
> [Based on a diff by Sean, added log]
> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
> ---
> arch/x86/kvm/mmu/tdp_mmu.c | 16 +++++++++++-----
> arch/x86/kvm/vmx/tdx.c | 11 ++++++++++-
> 2 files changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> index 575033cc7fe4..18e11c1c7631 100644
> --- a/arch/x86/kvm/mmu/tdp_mmu.c
> +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> @@ -53,13 +53,18 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
> rcu_barrier();
> }
>
> -static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
> +static void __tdp_mmu_free_sp(struct kvm_mmu_page *sp)
> {
> - free_page((unsigned long)sp->external_spt);
> free_page((unsigned long)sp->spt);
> kmem_cache_free(mmu_page_header_cache, sp);
> }
>
> +static void tdp_mmu_free_unused_sp(struct kvm_mmu_page *sp)
> +{
> + free_page((unsigned long)sp->external_spt);
> + __tdp_mmu_free_sp(sp);
> +}
> +
> /*
> * This is called through call_rcu in order to free TDP page table memory
> * safely with respect to other kernel threads that may be operating on
> @@ -73,7 +78,8 @@ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
> struct kvm_mmu_page *sp = container_of(head, struct kvm_mmu_page,
> rcu_head);
>
> - tdp_mmu_free_sp(sp);
> + WARN_ON_ONCE(sp->external_spt);
> + __tdp_mmu_free_sp(sp);
> }
>
> void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
> @@ -1261,7 +1267,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
> * failed, e.g. because a different task modified the SPTE.
> */
> if (r) {
> - tdp_mmu_free_sp(sp);
> + tdp_mmu_free_unused_sp(sp);
> goto retry;
> }
>
> @@ -1571,7 +1577,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
> * installs its own sp in place of the last sp we tried to split.
> */
> if (sp)
> - tdp_mmu_free_sp(sp);
> + tdp_mmu_free_unused_sp(sp);
>
> return 0;
> }
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index d064b40a6b31..1346e891ca94 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1782,7 +1782,16 @@ static void tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
> */
> if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) ||
> tdx_reclaim_page(virt_to_page(sp->external_spt)))
> - sp->external_spt = NULL;
> + goto out;
> +
> + /*
> + * Immediately free the S-EPT page as the TDX subsystem doesn't support
> + * freeing pages from RCU callbacks, and more importantly because
> + * TDH.PHYMEM.PAGE.RECLAIM ensures there are no outstanding readers.
> + */
> + free_page((unsigned long)sp->external_spt);
Since sp->external_spt is allocated in kvm_mmu_alloc_external_spt() in a similar
way as sp->spt in tdp_mmu_alloc_sp(), why bother making freeing of
sp->external_spt special? i.e., what's the downside of freeing it together with
sp->spt in tdp_mmu_free_sp() ?
> +out:
> + sp->external_spt = NULL;
> }
>
> static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
> --
> 2.53.0
>
On Mon, 2026-03-30 at 15:49 +0800, Yan Zhao wrote: > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > > index d064b40a6b31..1346e891ca94 100644 > > --- a/arch/x86/kvm/vmx/tdx.c > > +++ b/arch/x86/kvm/vmx/tdx.c > > @@ -1782,7 +1782,16 @@ static void tdx_sept_free_private_spt(struct kvm > > *kvm, gfn_t gfn, > > */ > > if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) || > > tdx_reclaim_page(virt_to_page(sp->external_spt))) > > - sp->external_spt = NULL; > > + goto out; > > + > > + /* > > + * Immediately free the S-EPT page as the TDX subsystem doesn't > > support > > + * freeing pages from RCU callbacks, and more importantly because > > + * TDH.PHYMEM.PAGE.RECLAIM ensures there are no outstanding > > readers. > > + */ > > + free_page((unsigned long)sp->external_spt); > Since sp->external_spt is allocated in kvm_mmu_alloc_external_spt() in a > similar > way as sp->spt in tdp_mmu_alloc_sp(), why bother making freeing of > sp->external_spt special? i.e., what's the downside of freeing it together > with > sp->spt in tdp_mmu_free_sp() ? I agree it's questionable as a cleanup. The change originally came from your comment here actually: https://lore.kernel.org/kvm/aYW5CbUvZrLogsWF@yzhao56-desk.sh.intel.com/ So for a more concrete DPAMT reason. That said I agree the reasoning as a stand alone cleanup is pretty weak. I tried to put a brave face on it. We have to think if we should try to include it in the initial set or include it with the later DPAMT parts. I thought to try at least to do it earlier and leave it at the end. It also kind of helps to see more of the overhaul together. Thoughts?
On Thu, Apr 02, 2026 at 08:17:19AM +0800, Edgecombe, Rick P wrote:
> On Mon, 2026-03-30 at 15:49 +0800, Yan Zhao wrote:
> > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > index d064b40a6b31..1346e891ca94 100644
> > > --- a/arch/x86/kvm/vmx/tdx.c
> > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > @@ -1782,7 +1782,16 @@ static void tdx_sept_free_private_spt(struct kvm
> > > *kvm, gfn_t gfn,
> > > */
> > > if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) ||
> > > tdx_reclaim_page(virt_to_page(sp->external_spt)))
> > > - sp->external_spt = NULL;
> > > + goto out;
> > > +
> > > + /*
> > > + * Immediately free the S-EPT page as the TDX subsystem doesn't
> > > support
> > > + * freeing pages from RCU callbacks, and more importantly because
> > > + * TDH.PHYMEM.PAGE.RECLAIM ensures there are no outstanding
> > > readers.
> > > + */
> > > + free_page((unsigned long)sp->external_spt);
> > Since sp->external_spt is allocated in kvm_mmu_alloc_external_spt() in a
> > similar
> > way as sp->spt in tdp_mmu_alloc_sp(), why bother making freeing of
> > sp->external_spt special? i.e., what's the downside of freeing it together
> > with
> > sp->spt in tdp_mmu_free_sp() ?
>
> I agree it's questionable as a cleanup. The change originally came from your
> comment here actually:
> https://lore.kernel.org/kvm/aYW5CbUvZrLogsWF@yzhao56-desk.sh.intel.com/
>
> So for a more concrete DPAMT reason. That said I agree the reasoning as a stand
> alone cleanup is pretty weak. I tried to put a brave face on it.
>
> We have to think if we should try to include it in the initial set or include it
> with the later DPAMT parts. I thought to try at least to do it earlier and leave
> it at the end. It also kind of helps to see more of the overhaul together.
> Thoughts?
However, even with DPAMT, sp->external_sp is also allocated via
kvm_mmu_alloc_external_spt().
static inline void kvm_mmu_alloc_external_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
{
/*
* external_spt is allocated for TDX module to hold private EPT mappings,
* TDX module will initialize the page by itself.
* Therefore, KVM does not need to initialize or access external_spt.
* KVM only interacts with sp->spt for private EPT operations.
*/
sp->external_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_external_spt_cache);
}
i.e., sp->external_spt is allocated with no difference from sp->spt.
So, I'm not sure why we have to free it differently.
What's the benefit of keeping the memory for a little longer than freeing it
immediately after TDX RECLAIM?
Or are you planning to allocate sp->external_spt via a TDX hook?
On Thu, Apr 02, 2026 at 10:16:07AM +0800, Yan Zhao wrote:
> On Thu, Apr 02, 2026 at 08:17:19AM +0800, Edgecombe, Rick P wrote:
> > On Mon, 2026-03-30 at 15:49 +0800, Yan Zhao wrote:
> > > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > > > index d064b40a6b31..1346e891ca94 100644
> > > > --- a/arch/x86/kvm/vmx/tdx.c
> > > > +++ b/arch/x86/kvm/vmx/tdx.c
> > > > @@ -1782,7 +1782,16 @@ static void tdx_sept_free_private_spt(struct kvm
> > > > *kvm, gfn_t gfn,
> > > > */
> > > > if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) ||
> > > > tdx_reclaim_page(virt_to_page(sp->external_spt)))
> > > > - sp->external_spt = NULL;
> > > > + goto out;
> > > > +
> > > > + /*
> > > > + * Immediately free the S-EPT page as the TDX subsystem doesn't
> > > > support
> > > > + * freeing pages from RCU callbacks, and more importantly because
> > > > + * TDH.PHYMEM.PAGE.RECLAIM ensures there are no outstanding
> > > > readers.
> > > > + */
> > > > + free_page((unsigned long)sp->external_spt);
> > > Since sp->external_spt is allocated in kvm_mmu_alloc_external_spt() in a
> > > similar
> > > way as sp->spt in tdp_mmu_alloc_sp(), why bother making freeing of
> > > sp->external_spt special? i.e., what's the downside of freeing it together
> > > with
> > > sp->spt in tdp_mmu_free_sp() ?
> >
> > I agree it's questionable as a cleanup. The change originally came from your
> > comment here actually:
> > https://lore.kernel.org/kvm/aYW5CbUvZrLogsWF@yzhao56-desk.sh.intel.com/
> >
> > So for a more concrete DPAMT reason. That said I agree the reasoning as a stand
> > alone cleanup is pretty weak. I tried to put a brave face on it.
> >
> > We have to think if we should try to include it in the initial set or include it
> > with the later DPAMT parts. I thought to try at least to do it earlier and leave
> > it at the end. It also kind of helps to see more of the overhaul together.
> > Thoughts?
> However, even with DPAMT, sp->external_sp is also allocated via
> kvm_mmu_alloc_external_spt().
>
> static inline void kvm_mmu_alloc_external_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
> {
> /*
> * external_spt is allocated for TDX module to hold private EPT mappings,
> * TDX module will initialize the page by itself.
> * Therefore, KVM does not need to initialize or access external_spt.
> * KVM only interacts with sp->spt for private EPT operations.
> */
> sp->external_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_external_spt_cache);
> }
>
> i.e., sp->external_spt is allocated with no difference from sp->spt.
> So, I'm not sure why we have to free it differently.
> What's the benefit of keeping the memory for a little longer than freeing it
Typo: what's the downside
> immediately after TDX RECLAIM?
>
> Or are you planning to allocate sp->external_spt via a TDX hook?
>
© 2016 - 2026 Red Hat, Inc.