[v4] TDX: Enable Dynamic PAMT

[PATCH v4 00/16] TDX: Enable Dynamic PAMT

Posted by Rick Edgecombe 2 months, 2 weeks ago

Hi, 

This is 4th revision of Dynamic PAMT, which is a new feature that reduces 
the memory use of TDX. For background information, see the v3 
coverletter[0]. V4 mostly consists of incorporating the feedback from v3. 
Notably what it *doesn’t* change is the solution for pre-allocating DPAMT
backing pages. (more info below in “Changes”)

I think this series isn’t quite ready to ask for merging just yet. I’d 
appreciate another round of review especially looking for any issues in the
refcount allocation/mapping and the pamt get/put lock-refcount dance. And
hopefully collect some RBs.

Sean/Paolo, we have mostly banished this all to TDX code except for “KVM: 
TDX: Add x86 ops for external spt cache” patch. That and “x86/virt/tdx: 
Add helpers to allow for pre-allocating pages” are probably the remaining 
possibly controversial parts of your domain. If you only have a short time 
to spend at this point, I’d point you at those two patches.

Since most of the changes are in arch/x86, I’d think this feature could be
a candidate for eventually merging through tip with Sean or Paolo’s ack.
But it is currently based on kvm-x86/next in order to build on top of the 
post-populate cleanup series. Next time I’ll probably target tip if people 
think that is a good way forward.

Changes
=======
There were two good suggestions around the pre-allocated pages solution
last time, but neither ended up working out:

1. Dave suggested to use mempool_t instead of the linked list based 
structure, in order to not re-invent the wheel. This turned out to not 
quite fit. The problems were that there wasn’t really a “topup” mechanism, 
or an atomic fallback (which matches the kvm cache behavior). This results 
in very similar code being built around mempool that was built around the 
linked list. It was an overall harder to follow solution for not much code 
savings. 

I strongly considered going back to Kiryl’s original solution which passed 
a callback function pointer for allocating DPAMT pages, and an opaque
void * that the callback could use to find the kvm_mmu_memory_cache. I
thought that readability issues of passing the opaque void * between
subsystems outweighed the small code duplication in the simple, familiar
patterned linked list-based code. So I ended up leaving it. 

2. Kai suggested (but later retracted the idea) that since the external 
page table cache was moved to TDX code, it could simply install DPAMT 
pages for the cache at topup time. Then the installation of DPAMT backing 
for S-EPT page tables could be done outside of the mmu_lock. It could also 
be argued that it makes the design simpler in a way, because the external
page table cache acts like it did before. Anything in there could be simply
used. 

At the time my argument against this was that whether a huge page would be 
installed (and thus, whether DPAMT backing was needed) for the guest 
private memory would not be known until later, so early install solution
would need special late handling for TDX huge pages. After some internal
discussions I at looked how we could simplify the series by punting on TDX
huge pages needs. 

But it turns out that this other design was actually more complex and had 
more LOC than the previous solution. So it was dropped, and again, I went 
back to the original solution. 


I’m really starting to think that, while the overall solution here isn’t 
the most elegant, we might not have much more to squeeze from it. So 
design-wise, I think we should think about calling it done. 

Testing
=======
Based on kvm-x86/next (4531ff85d925). Testing was the usual, except I also 
tested with TDX modules that don't support DPAMT, and with the two 
optimization patches removed: “Improve PAMT refcounters allocation for 
sparse memory” and “x86/virt/tdx: Optimize tdx_alloc/free_page() helpers”.

[0] https://lore.kernel.org/kvm/20250918232224.2202592-1-rick.p.edgecombe@intel.com/ 


Kirill A. Shutemov (13):
  x86/tdx: Move all TDX error defines into <asm/shared/tdx_errno.h>
  x86/tdx: Add helpers to check return status codes
  x86/virt/tdx: Allocate page bitmap for Dynamic PAMT
  x86/virt/tdx: Allocate reference counters for PAMT memory
  x86/virt/tdx: Improve PAMT refcounts allocation for sparse memory
  x86/virt/tdx: Add tdx_alloc/free_page() helpers
  x86/virt/tdx: Optimize tdx_alloc/free_page() helpers
  KVM: TDX: Allocate PAMT memory for TD control structures
  KVM: TDX: Allocate PAMT memory for vCPU control structures
  KVM: TDX: Handle PAMT allocation in fault path
  KVM: TDX: Reclaim PAMT memory
  x86/virt/tdx: Enable Dynamic PAMT
  Documentation/x86: Add documentation for TDX's Dynamic PAMT

Rick Edgecombe (3):
  x86/virt/tdx: Simplify tdmr_get_pamt_sz()
  KVM: TDX: Add x86 ops for external spt cache
  x86/virt/tdx: Add helpers to allow for pre-allocating pages

 Documentation/arch/x86/tdx.rst              |  21 +
 arch/x86/coco/tdx/tdx.c                     |  10 +-
 arch/x86/include/asm/kvm-x86-ops.h          |   3 +
 arch/x86/include/asm/kvm_host.h             |  14 +-
 arch/x86/include/asm/shared/tdx.h           |   8 +
 arch/x86/include/asm/shared/tdx_errno.h     | 104 ++++
 arch/x86/include/asm/tdx.h                  |  78 ++-
 arch/x86/include/asm/tdx_global_metadata.h  |   1 +
 arch/x86/kvm/mmu/mmu.c                      |   6 +-
 arch/x86/kvm/mmu/mmu_internal.h             |   2 +-
 arch/x86/kvm/vmx/tdx.c                      | 160 ++++--
 arch/x86/kvm/vmx/tdx.h                      |   3 +-
 arch/x86/kvm/vmx/tdx_errno.h                |  40 --
 arch/x86/virt/vmx/tdx/tdx.c                 | 587 +++++++++++++++++---
 arch/x86/virt/vmx/tdx/tdx.h                 |   5 +-
 arch/x86/virt/vmx/tdx/tdx_global_metadata.c |   7 +
 16 files changed, 854 insertions(+), 195 deletions(-)
 create mode 100644 arch/x86/include/asm/shared/tdx_errno.h
 delete mode 100644 arch/x86/kvm/vmx/tdx_errno.h

-- 
2.51.2

Re: [PATCH v4 00/16] TDX: Enable Dynamic PAMT

Posted by Sagi Shahar 2 months, 2 weeks ago

On Thu, Nov 20, 2025 at 6:51 PM Rick Edgecombe
<rick.p.edgecombe@intel.com> wrote:
>
> Hi,
>
> This is 4th revision of Dynamic PAMT, which is a new feature that reduces
> the memory use of TDX. For background information, see the v3
> coverletter[0]. V4 mostly consists of incorporating the feedback from v3.
> Notably what it *doesn’t* change is the solution for pre-allocating DPAMT
> backing pages. (more info below in “Changes”)
>
> I think this series isn’t quite ready to ask for merging just yet. I’d
> appreciate another round of review especially looking for any issues in the
> refcount allocation/mapping and the pamt get/put lock-refcount dance. And
> hopefully collect some RBs.
>
> Sean/Paolo, we have mostly banished this all to TDX code except for “KVM:
> TDX: Add x86 ops for external spt cache” patch. That and “x86/virt/tdx:
> Add helpers to allow for pre-allocating pages” are probably the remaining
> possibly controversial parts of your domain. If you only have a short time
> to spend at this point, I’d point you at those two patches.
>
> Since most of the changes are in arch/x86, I’d think this feature could be
> a candidate for eventually merging through tip with Sean or Paolo’s ack.
> But it is currently based on kvm-x86/next in order to build on top of the
> post-populate cleanup series. Next time I’ll probably target tip if people
> think that is a good way forward.
>
> Changes
> =======
> There were two good suggestions around the pre-allocated pages solution
> last time, but neither ended up working out:
>
> 1. Dave suggested to use mempool_t instead of the linked list based
> structure, in order to not re-invent the wheel. This turned out to not
> quite fit. The problems were that there wasn’t really a “topup” mechanism,
> or an atomic fallback (which matches the kvm cache behavior). This results
> in very similar code being built around mempool that was built around the
> linked list. It was an overall harder to follow solution for not much code
> savings.
>
> I strongly considered going back to Kiryl’s original solution which passed
> a callback function pointer for allocating DPAMT pages, and an opaque
> void * that the callback could use to find the kvm_mmu_memory_cache. I
> thought that readability issues of passing the opaque void * between
> subsystems outweighed the small code duplication in the simple, familiar
> patterned linked list-based code. So I ended up leaving it.
>
> 2. Kai suggested (but later retracted the idea) that since the external
> page table cache was moved to TDX code, it could simply install DPAMT
> pages for the cache at topup time. Then the installation of DPAMT backing
> for S-EPT page tables could be done outside of the mmu_lock. It could also
> be argued that it makes the design simpler in a way, because the external
> page table cache acts like it did before. Anything in there could be simply
> used.
>
> At the time my argument against this was that whether a huge page would be
> installed (and thus, whether DPAMT backing was needed) for the guest
> private memory would not be known until later, so early install solution
> would need special late handling for TDX huge pages. After some internal
> discussions I at looked how we could simplify the series by punting on TDX
> huge pages needs.
>
> But it turns out that this other design was actually more complex and had
> more LOC than the previous solution. So it was dropped, and again, I went
> back to the original solution.
>
>
> I’m really starting to think that, while the overall solution here isn’t
> the most elegant, we might not have much more to squeeze from it. So
> design-wise, I think we should think about calling it done.
>
> Testing
> =======
> Based on kvm-x86/next (4531ff85d925). Testing was the usual, except I also
> tested with TDX modules that don't support DPAMT, and with the two
> optimization patches removed: “Improve PAMT refcounters allocation for
> sparse memory” and “x86/virt/tdx: Optimize tdx_alloc/free_page() helpers”.
>
> [0] https://lore.kernel.org/kvm/20250918232224.2202592-1-rick.p.edgecombe@intel.com/
>
>
> Kirill A. Shutemov (13):
>   x86/tdx: Move all TDX error defines into <asm/shared/tdx_errno.h>
>   x86/tdx: Add helpers to check return status codes
>   x86/virt/tdx: Allocate page bitmap for Dynamic PAMT
>   x86/virt/tdx: Allocate reference counters for PAMT memory
>   x86/virt/tdx: Improve PAMT refcounts allocation for sparse memory
>   x86/virt/tdx: Add tdx_alloc/free_page() helpers
>   x86/virt/tdx: Optimize tdx_alloc/free_page() helpers
>   KVM: TDX: Allocate PAMT memory for TD control structures
>   KVM: TDX: Allocate PAMT memory for vCPU control structures
>   KVM: TDX: Handle PAMT allocation in fault path
>   KVM: TDX: Reclaim PAMT memory
>   x86/virt/tdx: Enable Dynamic PAMT
>   Documentation/x86: Add documentation for TDX's Dynamic PAMT
>
> Rick Edgecombe (3):
>   x86/virt/tdx: Simplify tdmr_get_pamt_sz()
>   KVM: TDX: Add x86 ops for external spt cache
>   x86/virt/tdx: Add helpers to allow for pre-allocating pages
>
>  Documentation/arch/x86/tdx.rst              |  21 +
>  arch/x86/coco/tdx/tdx.c                     |  10 +-
>  arch/x86/include/asm/kvm-x86-ops.h          |   3 +
>  arch/x86/include/asm/kvm_host.h             |  14 +-
>  arch/x86/include/asm/shared/tdx.h           |   8 +
>  arch/x86/include/asm/shared/tdx_errno.h     | 104 ++++
>  arch/x86/include/asm/tdx.h                  |  78 ++-
>  arch/x86/include/asm/tdx_global_metadata.h  |   1 +
>  arch/x86/kvm/mmu/mmu.c                      |   6 +-
>  arch/x86/kvm/mmu/mmu_internal.h             |   2 +-
>  arch/x86/kvm/vmx/tdx.c                      | 160 ++++--
>  arch/x86/kvm/vmx/tdx.h                      |   3 +-
>  arch/x86/kvm/vmx/tdx_errno.h                |  40 --
>  arch/x86/virt/vmx/tdx/tdx.c                 | 587 +++++++++++++++++---
>  arch/x86/virt/vmx/tdx/tdx.h                 |   5 +-
>  arch/x86/virt/vmx/tdx/tdx_global_metadata.c |   7 +
>  16 files changed, 854 insertions(+), 195 deletions(-)
>  create mode 100644 arch/x86/include/asm/shared/tdx_errno.h
>  delete mode 100644 arch/x86/kvm/vmx/tdx_errno.h
>
> --
> 2.51.2
>
>

Tested-by: Sagi Shahar <sagis@google.com>

Re: [PATCH v4 00/16] TDX: Enable Dynamic PAMT

Posted by Vishal Annapurve 2 months, 2 weeks ago

On Thu, Nov 20, 2025 at 4:51 PM Rick Edgecombe
<rick.p.edgecombe@intel.com> wrote:
>
> Hi,
>
> This is 4th revision of Dynamic PAMT, which is a new feature that reduces
> the memory use of TDX. For background information, see the v3
> coverletter[0]. V4 mostly consists of incorporating the feedback from v3.
> Notably what it *doesn’t* change is the solution for pre-allocating DPAMT
> backing pages. (more info below in “Changes”)
>
> I think this series isn’t quite ready to ask for merging just yet. I’d
> appreciate another round of review especially looking for any issues in the
> refcount allocation/mapping and the pamt get/put lock-refcount dance. And
> hopefully collect some RBs.
>
> Sean/Paolo, we have mostly banished this all to TDX code except for “KVM:
> TDX: Add x86 ops for external spt cache” patch. That and “x86/virt/tdx:
> Add helpers to allow for pre-allocating pages” are probably the remaining
> possibly controversial parts of your domain. If you only have a short time
> to spend at this point, I’d point you at those two patches.
>
> Since most of the changes are in arch/x86, I’d think this feature could be
> a candidate for eventually merging through tip with Sean or Paolo’s ack.
> But it is currently based on kvm-x86/next in order to build on top of the
> post-populate cleanup series. Next time I’ll probably target tip if people
> think that is a good way forward.
>
> Changes
> =======
> There were two good suggestions around the pre-allocated pages solution
> last time, but neither ended up working out:
>
> 1. Dave suggested to use mempool_t instead of the linked list based
> structure, in order to not re-invent the wheel. This turned out to not
> quite fit. The problems were that there wasn’t really a “topup” mechanism,
> or an atomic fallback (which matches the kvm cache behavior). This results
> in very similar code being built around mempool that was built around the
> linked list. It was an overall harder to follow solution for not much code
> savings.
>
> I strongly considered going back to Kiryl’s original solution which passed
> a callback function pointer for allocating DPAMT pages, and an opaque
> void * that the callback could use to find the kvm_mmu_memory_cache. I
> thought that readability issues of passing the opaque void * between
> subsystems outweighed the small code duplication in the simple, familiar
> patterned linked list-based code. So I ended up leaving it.
>
> 2. Kai suggested (but later retracted the idea) that since the external
> page table cache was moved to TDX code, it could simply install DPAMT
> pages for the cache at topup time. Then the installation of DPAMT backing
> for S-EPT page tables could be done outside of the mmu_lock. It could also
> be argued that it makes the design simpler in a way, because the external
> page table cache acts like it did before. Anything in there could be simply
> used.
>
> At the time my argument against this was that whether a huge page would be
> installed (and thus, whether DPAMT backing was needed) for the guest
> private memory would not be known until later, so early install solution
> would need special late handling for TDX huge pages. After some internal
> discussions I at looked how we could simplify the series by punting on TDX
> huge pages needs.
>
> But it turns out that this other design was actually more complex and had
> more LOC than the previous solution. So it was dropped, and again, I went
> back to the original solution.
>
>
> I’m really starting to think that, while the overall solution here isn’t
> the most elegant, we might not have much more to squeeze from it. So
> design-wise, I think we should think about calling it done.
>
> Testing
> =======
> Based on kvm-x86/next (4531ff85d925). Testing was the usual, except I also
> tested with TDX modules that don't support DPAMT, and with the two
> optimization patches removed: “Improve PAMT refcounters allocation for
> sparse memory” and “x86/virt/tdx: Optimize tdx_alloc/free_page() helpers”.
>
> [0] https://lore.kernel.org/kvm/20250918232224.2202592-1-rick.p.edgecombe@intel.com/
>
>
> Kirill A. Shutemov (13):
>   x86/tdx: Move all TDX error defines into <asm/shared/tdx_errno.h>
>   x86/tdx: Add helpers to check return status codes
>   x86/virt/tdx: Allocate page bitmap for Dynamic PAMT
>   x86/virt/tdx: Allocate reference counters for PAMT memory
>   x86/virt/tdx: Improve PAMT refcounts allocation for sparse memory
>   x86/virt/tdx: Add tdx_alloc/free_page() helpers
>   x86/virt/tdx: Optimize tdx_alloc/free_page() helpers
>   KVM: TDX: Allocate PAMT memory for TD control structures
>   KVM: TDX: Allocate PAMT memory for vCPU control structures
>   KVM: TDX: Handle PAMT allocation in fault path
>   KVM: TDX: Reclaim PAMT memory
>   x86/virt/tdx: Enable Dynamic PAMT
>   Documentation/x86: Add documentation for TDX's Dynamic PAMT
>
> Rick Edgecombe (3):
>   x86/virt/tdx: Simplify tdmr_get_pamt_sz()
>   KVM: TDX: Add x86 ops for external spt cache
>   x86/virt/tdx: Add helpers to allow for pre-allocating pages
>
>  Documentation/arch/x86/tdx.rst              |  21 +
>  arch/x86/coco/tdx/tdx.c                     |  10 +-
>  arch/x86/include/asm/kvm-x86-ops.h          |   3 +
>  arch/x86/include/asm/kvm_host.h             |  14 +-
>  arch/x86/include/asm/shared/tdx.h           |   8 +
>  arch/x86/include/asm/shared/tdx_errno.h     | 104 ++++
>  arch/x86/include/asm/tdx.h                  |  78 ++-
>  arch/x86/include/asm/tdx_global_metadata.h  |   1 +
>  arch/x86/kvm/mmu/mmu.c                      |   6 +-
>  arch/x86/kvm/mmu/mmu_internal.h             |   2 +-
>  arch/x86/kvm/vmx/tdx.c                      | 160 ++++--
>  arch/x86/kvm/vmx/tdx.h                      |   3 +-
>  arch/x86/kvm/vmx/tdx_errno.h                |  40 --
>  arch/x86/virt/vmx/tdx/tdx.c                 | 587 +++++++++++++++++---
>  arch/x86/virt/vmx/tdx/tdx.h                 |   5 +-
>  arch/x86/virt/vmx/tdx/tdx_global_metadata.c |   7 +
>  16 files changed, 854 insertions(+), 195 deletions(-)
>  create mode 100644 arch/x86/include/asm/shared/tdx_errno.h
>  delete mode 100644 arch/x86/kvm/vmx/tdx_errno.h
>
> --
> 2.51.2
>

acked-by: Vishal Annapurve <vannapurve@google.com>.