[RFC, PATCH 11/12] KVM: TDX: Reclaim PAMT memory

Kirill A. Shutemov posted 12 patches 9 months, 1 week ago
There is a newer version of this series
Posted by Kirill A. Shutemov 9 months, 1 week ago
The PAMT memory holds metadata for TDX-protected memory. With Dynamic
PAMT, PAMT_4K is allocated on demand. The kernel supplies the TDX module
with a few pages that cover 2M of host physical memory.

PAMT memory can be reclaimed when the last user is gone. It can happen
in a few code paths:

- On TDH.PHYMEM.PAGE.RECLAIM in tdx_reclaim_td_control_pages() and
  tdx_reclaim_page().

- On TDH.MEM.PAGE.REMOVE in tdx_sept_drop_private_spte().

- In tdx_sept_zap_private_spte() for pages that were in the queue to be
  added with TDH.MEM.PAGE.ADD, but it never happened due to an error.

Add tdx_pamt_put() in these code paths.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kvm/vmx/tdx.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 0f06ae7ff6b9..352f7b41f611 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -487,8 +487,11 @@ static int tdx_reclaim_page(struct page *page)
 	int r;
 
 	r = __tdx_reclaim_page(page);
-	if (!r)
+	if (!r) {
 		tdx_clear_page(page);
+		tdx_pamt_put(page);
+	}
+
 	return r;
 }
 
@@ -737,6 +740,7 @@ static void tdx_reclaim_td_control_pages(struct kvm *kvm)
 		return;
 	}
 	tdx_clear_page(kvm_tdx->td.tdr_page);
+	tdx_pamt_put(kvm_tdx->td.tdr_page);
 
 	__free_page(kvm_tdx->td.tdr_page);
 	kvm_tdx->td.tdr_page = NULL;
@@ -1768,6 +1772,7 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
 		return -EIO;
 	}
 	tdx_clear_page(page);
+	tdx_pamt_put(page);
 	tdx_unpin(kvm, page);
 	return 0;
 }
@@ -1848,6 +1853,7 @@ static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn,
 	if (tdx_is_sept_zap_err_due_to_premap(kvm_tdx, err, entry, level) &&
 	    !KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm)) {
 		atomic64_dec(&kvm_tdx->nr_premapped);
+		tdx_pamt_put(page);
 		tdx_unpin(kvm, page);
 		return 0;
 	}
-- 
2.47.2
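
For readers following the thread, the "last user is gone" rule from the commit message can be modelled in plain userspace C. This is only a sketch under stated assumptions: one counter per 2M range, with the backing handed over on the first get and taken back on the last put. The names model_pamt_get(), model_pamt_put(), pamt_refcount[] and NR_2M_RANGES are hypothetical and are not the kernel's tdx_pamt_get()/tdx_pamt_put() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT   12
#define PMD_SHIFT    21
#define NR_2M_RANGES 8	/* hypothetical: size of the toy address space */

/* Hypothetical model state: one user count per 2M range. */
static unsigned int pamt_refcount[NR_2M_RANGES];
static bool pamt_backed[NR_2M_RANGES];

/* Map a 4K page frame number to the index of its 2M range. */
static size_t range_of(unsigned long pfn)
{
	return pfn >> (PMD_SHIFT - PAGE_SHIFT);
}

/* The first user of a 2M range provides the PAMT_4K backing. */
static void model_pamt_get(unsigned long pfn)
{
	size_t idx = range_of(pfn);

	assert(idx < NR_2M_RANGES);
	if (pamt_refcount[idx]++ == 0)
		pamt_backed[idx] = true;
}

/* The last user of a 2M range lets the backing be reclaimed. */
static void model_pamt_put(unsigned long pfn)
{
	size_t idx = range_of(pfn);

	assert(idx < NR_2M_RANGES);
	assert(pamt_refcount[idx] > 0);
	if (--pamt_refcount[idx] == 0)
		pamt_backed[idx] = false;
}
```

In this model, two pages in the same 2M range share one backing, and the backing only goes away after both have been put.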
Re: [RFC, PATCH 11/12] KVM: TDX: Reclaim PAMT memory
Posted by Huang, Kai 9 months ago

On 3/05/2025 1:08 am, Kirill A. Shutemov wrote:
> The PAMT memory holds metadata for TDX-protected memory. With Dynamic
> PAMT, PAMT_4K is allocated on demand. The kernel supplies the TDX module
> with a few pages that cover 2M of host physical memory.
> 
> PAMT memory can be reclaimed when the last user is gone. It can happen
> in a few code paths:
> 
> - On TDH.PHYMEM.PAGE.RECLAIM in tdx_reclaim_td_control_pages() and
>    tdx_reclaim_page().
> 
> - On TDH.MEM.PAGE.REMOVE in tdx_sept_drop_private_spte().
> 
> - In tdx_sept_zap_private_spte() for pages that were in the queue to be
>    added with TDH.MEM.PAGE.ADD, but it never happened due to an error.
> 
> Add tdx_pamt_put() in these code paths.

IMHO, instead of explicitly hooking tdx_pamt_put() to various places, we 
should just do tdx_free_page() for the pages that were allocated by 
tdx_alloc_page() (i.e., control pages, SEPT pages).

That means, IMHO, we should do PAMT allocation/free when we actually 
*allocate* and *free* the target TDX private page(s).  I.e., we should:

- For TDX private pages with normal kernel allocation (control pages,
  SEPT pages, etc.), we use tdx_alloc_page() and tdx_free_page().
- For TDX private pages in page cache, i.e., guest_memfd, since we
  cannot use tdx_{alloc|free}_page(), we hook guest_memfd code to call
  tdx_pamt_{get|put}().

(I wish there were a way to unify the above two as well, but I don't have a
simple way to do that.)

I believe this can help simplify the code.

So, ...

> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>   arch/x86/kvm/vmx/tdx.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 0f06ae7ff6b9..352f7b41f611 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -487,8 +487,11 @@ static int tdx_reclaim_page(struct page *page)
>   	int r;
>   
>   	r = __tdx_reclaim_page(page);
> -	if (!r)
> +	if (!r) {
>   		tdx_clear_page(page);
> +		tdx_pamt_put(page);
> +	}
> +
>   	return r;
>   }
>   

... I think this change should be removed, and ...

[...]

> +	tdx_pamt_put(kvm_tdx->td.tdr_page);
>   
>   	__free_page(kvm_tdx->td.tdr_page);

... The above two should be just:

	tdx_free_page(kvm_tdx->td.tdr_page);

and ...

>   	kvm_tdx->td.tdr_page = NULL;
> @@ -1768,6 +1772,7 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
>   		return -EIO;
>   	}
>   	tdx_clear_page(page);
> +	tdx_pamt_put(page);
>   	tdx_unpin(kvm, page);
>   	return 0;
>   }
> @@ -1848,6 +1853,7 @@ static int tdx_sept_zap_private_spte(struct kvm *kvm, gfn_t gfn,
>   	if (tdx_is_sept_zap_err_due_to_premap(kvm_tdx, err, entry, level) &&
>   	    !KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm)) {
>   		atomic64_dec(&kvm_tdx->nr_premapped);
> +		tdx_pamt_put(page);
>   		tdx_unpin(kvm, page);
>   		return 0;
>   	}

... the above should be removed too.

For PAMT associated with sp->external_spt, we can call tdx_pamt_put() 
when we free sp->external_spt.

For PAMT associated with TDX memory in guest_memfd, we can have a
guest_memfd-specific a_ops->invalidate_folio() with a hook that is the
opposite of kvm_gmem_prepare_folio() and does tdx_pamt_put().  That
should cover all the cases, right?

Or is there anything I missed?
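
The pairing proposed in this reply, where the PAMT reference is tied to page allocation and free instead of to each reclaim path, can be sketched as a userspace model. Everything here is an assumption made for illustration: stub_pamt_get(), stub_pamt_put(), model_tdx_alloc_page() and model_tdx_free_page() are hypothetical stand-ins, malloc() replaces the kernel page allocator, and the real tdx_alloc_page()/tdx_free_page() may look different.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_PAGE_SIZE 4096

/* Hypothetical stand-in for the PAMT user count behind tdx_pamt_{get|put}(). */
static int pamt_users;

static int stub_pamt_get(void *page)
{
	(void)page;
	pamt_users++;
	return 0;	/* the model never fails; a real get may return an error */
}

static void stub_pamt_put(void *page)
{
	(void)page;
	assert(pamt_users > 0);
	pamt_users--;
}

/* Allocate a page and take its PAMT reference in one step. */
static void *model_tdx_alloc_page(void)
{
	void *page = malloc(MODEL_PAGE_SIZE);

	if (!page)
		return NULL;

	if (stub_pamt_get(page)) {
		free(page);
		return NULL;
	}

	return page;
}

/* Drop the PAMT reference and free the page in one step. */
static void model_tdx_free_page(void *page)
{
	if (!page)
		return;

	stub_pamt_put(page);
	free(page);
}
```

With wrappers of this shape, a caller that frees a control page or SEPT page through the wrapper cannot forget the matching put, which is the simplification argued for above.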
Re: [RFC, PATCH 11/12] KVM: TDX: Reclaim PAMT memory
Posted by Vishal Annapurve 9 months ago
On Tue, May 13, 2025 at 6:12 PM Huang, Kai <kai.huang@intel.com> wrote:
>
>
>
> On 3/05/2025 1:08 am, Kirill A. Shutemov wrote:
> > The PAMT memory holds metadata for TDX-protected memory. With Dynamic
> > PAMT, PAMT_4K is allocated on demand. The kernel supplies the TDX module
> > with a few pages that cover 2M of host physical memory.
> >
> > PAMT memory can be reclaimed when the last user is gone. It can happen
> > in a few code paths:
> >
> > - On TDH.PHYMEM.PAGE.RECLAIM in tdx_reclaim_td_control_pages() and
> >    tdx_reclaim_page().
> >
> > - On TDH.MEM.PAGE.REMOVE in tdx_sept_drop_private_spte().
> >
> > - In tdx_sept_zap_private_spte() for pages that were in the queue to be
> >    added with TDH.MEM.PAGE.ADD, but it never happened due to an error.
> >
> > Add tdx_pamt_put() in these code paths.
>
> IMHO, instead of explicitly hooking tdx_pamt_put() to various places, we
> should just do tdx_free_page() for the pages that were allocated by
> tdx_alloc_page() (i.e., control pages, SEPT pages).
>
> That means, IMHO, we should do PAMT allocation/free when we actually
> *allocate* and *free* the target TDX private page(s).  I.e., we should:

I think it's important to ensure that PAMT pages are *only* allocated
for a 2M range if it's getting mapped in EPT at 4K granularity.
Physical memory allocation order can be different from the EPT mapping
granularity.
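
The constraint raised in this reply can be shown with a small userspace model: a 2M range only needs PAMT_4K backing while at least one 4K mapping exists in it, regardless of how the physical memory was allocated. The types and helpers below (model_level, model_range, model_map(), model_unmap()) are hypothetical and only illustrate the accounting rule, not KVM's actual mapping code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mapping granularities for the model. */
enum model_level { MODEL_LEVEL_4K, MODEL_LEVEL_2M };

/* One 2M range of host physical memory in the model. */
struct model_range {
	unsigned int pamt_4k_users;	/* number of live 4K mappings */
};

static bool model_range_has_pamt_4k(const struct model_range *r)
{
	return r->pamt_4k_users > 0;
}

/* Only a 4K mapping takes a PAMT_4K reference; a 2M mapping takes none. */
static void model_map(struct model_range *r, enum model_level level)
{
	if (level == MODEL_LEVEL_4K)
		r->pamt_4k_users++;
}

static void model_unmap(struct model_range *r, enum model_level level)
{
	if (level != MODEL_LEVEL_4K)
		return;

	assert(r->pamt_4k_users > 0);
	r->pamt_4k_users--;
}
```

The point of keying the count on the mapping level is that the same 2M physical allocation costs no PAMT_4K memory when it is mapped as a single 2M entry.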
Re: [RFC, PATCH 11/12] KVM: TDX: Reclaim PAMT memory
Posted by Huang, Kai 8 months, 3 weeks ago
On Wed, 2025-05-14 at 08:21 -0700, Vishal Annapurve wrote:
> On Tue, May 13, 2025 at 6:12 PM Huang, Kai <kai.huang@intel.com> wrote:
> > 
> > 
> > 
> > On 3/05/2025 1:08 am, Kirill A. Shutemov wrote:
> > > The PAMT memory holds metadata for TDX-protected memory. With Dynamic
> > > PAMT, PAMT_4K is allocated on demand. The kernel supplies the TDX module
> > > with a few pages that cover 2M of host physical memory.
> > > 
> > > PAMT memory can be reclaimed when the last user is gone. It can happen
> > > in a few code paths:
> > > 
> > > - On TDH.PHYMEM.PAGE.RECLAIM in tdx_reclaim_td_control_pages() and
> > >    tdx_reclaim_page().
> > > 
> > > - On TDH.MEM.PAGE.REMOVE in tdx_sept_drop_private_spte().
> > > 
> > > - In tdx_sept_zap_private_spte() for pages that were in the queue to be
> > >    added with TDH.MEM.PAGE.ADD, but it never happened due to an error.
> > > 
> > > Add tdx_pamt_put() in these code paths.
> > 
> > IMHO, instead of explicitly hooking tdx_pamt_put() to various places, we
> > should just do tdx_free_page() for the pages that were allocated by
> > tdx_alloc_page() (i.e., control pages, SEPT pages).
> > 
> > That means, IMHO, we should do PAMT allocation/free when we actually
> > *allocate* and *free* the target TDX private page(s).  I.e., we should:
> 
> I think it's important to ensure that PAMT pages are *only* allocated
> for a 2M range if it's getting mapped in EPT at 4K granularity.
> Physical memory allocation order can be different from the EPT mapping
> granularity.

Agreed.  Thanks.

I still think all control pages and secure EPT pages can just use
tdx_{alloc|free}_page() though (because we always allocate and use them at
4K granularity).