Documentation/arch/x86/tdx.rst | 21 + arch/x86/coco/tdx/tdx.c | 10 +- arch/x86/include/asm/kvm-x86-ops.h | 3 + arch/x86/include/asm/kvm_host.h | 14 +- arch/x86/include/asm/shared/tdx.h | 8 + arch/x86/include/asm/shared/tdx_errno.h | 104 ++++ arch/x86/include/asm/tdx.h | 78 ++- arch/x86/include/asm/tdx_global_metadata.h | 1 + arch/x86/kvm/mmu/mmu.c | 6 +- arch/x86/kvm/mmu/mmu_internal.h | 2 +- arch/x86/kvm/vmx/tdx.c | 160 ++++-- arch/x86/kvm/vmx/tdx.h | 3 +- arch/x86/kvm/vmx/tdx_errno.h | 40 -- arch/x86/virt/vmx/tdx/tdx.c | 587 +++++++++++++++++--- arch/x86/virt/vmx/tdx/tdx.h | 5 +- arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 7 + 16 files changed, 854 insertions(+), 195 deletions(-) create mode 100644 arch/x86/include/asm/shared/tdx_errno.h delete mode 100644 arch/x86/kvm/vmx/tdx_errno.h
Hi, This is 4th revision of Dynamic PAMT, which is a new feature that reduces the memory use of TDX. For background information, see the v3 coverletter[0]. V4 mostly consists of incorporating the feedback from v3. Notably what it *doesn’t* change is the solution for pre-allocating DPAMT backing pages. (more info below in “Changes”) I think this series isn’t quite ready to ask for merging just yet. I’d appreciate another round of review especially looking for any issues in the refcount allocation/mapping and the pamt get/put lock-refcount dance. And hopefully collect some RBs. Sean/Paolo, we have mostly banished this all to TDX code except for “KVM: TDX: Add x86 ops for external spt cache” patch. That and “x86/virt/tdx: Add helpers to allow for pre-allocating pages” are probably the remaining possibly controversial parts of your domain. If you only have a short time to spend at this point, I’d point you at those two patches. Since most of the changes are in arch/x86, I’d think this feature could be a candidate for eventually merging through tip with Sean or Paolo’s ack. But it is currently based on kvm-x86/next in order to build on top of the post-populate cleanup series. Next time I’ll probably target tip if people think that is a good way forward. Changes ======= There were two good suggestions around the pre-allocated pages solution last time, but neither ended up working out: 1. Dave suggested to use mempool_t instead of the linked list based structure, in order to not re-invent the wheel. This turned out to not quite fit. The problems were that there wasn’t really a “topup” mechanism, or an atomic fallback (which matches the kvm cache behavior). This results in very similar code being built around mempool that was built around the linked list. It was an overall harder to follow solution for not much code savings. I strongly considered going back to Kiryl’s original solution which passed a callback function pointer for allocating DPAMT pages, and an opaque void * that the callback could use to find the kvm_mmu_memory_cache. I thought that readability issues of passing the opaque void * between subsystems outweighed the small code duplication in the simple, familiar patterned linked list-based code. So I ended up leaving it. 2. Kai suggested (but later retracted the idea) that since the external page table cache was moved to TDX code, it could simply install DPAMT pages for the cache at topup time. Then the installation of DPAMT backing for S-EPT page tables could be done outside of the mmu_lock. It could also be argued that it makes the design simpler in a way, because the external page table cache acts like it did before. Anything in there could be simply used. At the time my argument against this was that whether a huge page would be installed (and thus, whether DPAMT backing was needed) for the guest private memory would not be known until later, so early install solution would need special late handling for TDX huge pages. After some internal discussions I at looked how we could simplify the series by punting on TDX huge pages needs. But it turns out that this other design was actually more complex and had more LOC than the previous solution. So it was dropped, and again, I went back to the original solution. I’m really starting to think that, while the overall solution here isn’t the most elegant, we might not have much more to squeeze from it. So design-wise, I think we should think about calling it done. Testing ======= Based on kvm-x86/next (4531ff85d925). Testing was the usual, except I also tested with TDX modules that don't support DPAMT, and with the two optimization patches removed: “Improve PAMT refcounters allocation for sparse memory” and “x86/virt/tdx: Optimize tdx_alloc/free_page() helpers”. [0] https://lore.kernel.org/kvm/20250918232224.2202592-1-rick.p.edgecombe@intel.com/ Kirill A. Shutemov (13): x86/tdx: Move all TDX error defines into <asm/shared/tdx_errno.h> x86/tdx: Add helpers to check return status codes x86/virt/tdx: Allocate page bitmap for Dynamic PAMT x86/virt/tdx: Allocate reference counters for PAMT memory x86/virt/tdx: Improve PAMT refcounts allocation for sparse memory x86/virt/tdx: Add tdx_alloc/free_page() helpers x86/virt/tdx: Optimize tdx_alloc/free_page() helpers KVM: TDX: Allocate PAMT memory for TD control structures KVM: TDX: Allocate PAMT memory for vCPU control structures KVM: TDX: Handle PAMT allocation in fault path KVM: TDX: Reclaim PAMT memory x86/virt/tdx: Enable Dynamic PAMT Documentation/x86: Add documentation for TDX's Dynamic PAMT Rick Edgecombe (3): x86/virt/tdx: Simplify tdmr_get_pamt_sz() KVM: TDX: Add x86 ops for external spt cache x86/virt/tdx: Add helpers to allow for pre-allocating pages Documentation/arch/x86/tdx.rst | 21 + arch/x86/coco/tdx/tdx.c | 10 +- arch/x86/include/asm/kvm-x86-ops.h | 3 + arch/x86/include/asm/kvm_host.h | 14 +- arch/x86/include/asm/shared/tdx.h | 8 + arch/x86/include/asm/shared/tdx_errno.h | 104 ++++ arch/x86/include/asm/tdx.h | 78 ++- arch/x86/include/asm/tdx_global_metadata.h | 1 + arch/x86/kvm/mmu/mmu.c | 6 +- arch/x86/kvm/mmu/mmu_internal.h | 2 +- arch/x86/kvm/vmx/tdx.c | 160 ++++-- arch/x86/kvm/vmx/tdx.h | 3 +- arch/x86/kvm/vmx/tdx_errno.h | 40 -- arch/x86/virt/vmx/tdx/tdx.c | 587 +++++++++++++++++--- arch/x86/virt/vmx/tdx/tdx.h | 5 +- arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 7 + 16 files changed, 854 insertions(+), 195 deletions(-) create mode 100644 arch/x86/include/asm/shared/tdx_errno.h delete mode 100644 arch/x86/kvm/vmx/tdx_errno.h -- 2.51.2
On Thu, Nov 20, 2025 at 6:51 PM Rick Edgecombe <rick.p.edgecombe@intel.com> wrote: > > Hi, > > This is 4th revision of Dynamic PAMT, which is a new feature that reduces > the memory use of TDX. For background information, see the v3 > coverletter[0]. V4 mostly consists of incorporating the feedback from v3. > Notably what it *doesn’t* change is the solution for pre-allocating DPAMT > backing pages. (more info below in “Changes”) > > I think this series isn’t quite ready to ask for merging just yet. I’d > appreciate another round of review especially looking for any issues in the > refcount allocation/mapping and the pamt get/put lock-refcount dance. And > hopefully collect some RBs. > > Sean/Paolo, we have mostly banished this all to TDX code except for “KVM: > TDX: Add x86 ops for external spt cache” patch. That and “x86/virt/tdx: > Add helpers to allow for pre-allocating pages” are probably the remaining > possibly controversial parts of your domain. If you only have a short time > to spend at this point, I’d point you at those two patches. > > Since most of the changes are in arch/x86, I’d think this feature could be > a candidate for eventually merging through tip with Sean or Paolo’s ack. > But it is currently based on kvm-x86/next in order to build on top of the > post-populate cleanup series. Next time I’ll probably target tip if people > think that is a good way forward. > > Changes > ======= > There were two good suggestions around the pre-allocated pages solution > last time, but neither ended up working out: > > 1. Dave suggested to use mempool_t instead of the linked list based > structure, in order to not re-invent the wheel. This turned out to not > quite fit. The problems were that there wasn’t really a “topup” mechanism, > or an atomic fallback (which matches the kvm cache behavior). This results > in very similar code being built around mempool that was built around the > linked list. It was an overall harder to follow solution for not much code > savings. > > I strongly considered going back to Kiryl’s original solution which passed > a callback function pointer for allocating DPAMT pages, and an opaque > void * that the callback could use to find the kvm_mmu_memory_cache. I > thought that readability issues of passing the opaque void * between > subsystems outweighed the small code duplication in the simple, familiar > patterned linked list-based code. So I ended up leaving it. > > 2. Kai suggested (but later retracted the idea) that since the external > page table cache was moved to TDX code, it could simply install DPAMT > pages for the cache at topup time. Then the installation of DPAMT backing > for S-EPT page tables could be done outside of the mmu_lock. It could also > be argued that it makes the design simpler in a way, because the external > page table cache acts like it did before. Anything in there could be simply > used. > > At the time my argument against this was that whether a huge page would be > installed (and thus, whether DPAMT backing was needed) for the guest > private memory would not be known until later, so early install solution > would need special late handling for TDX huge pages. After some internal > discussions I at looked how we could simplify the series by punting on TDX > huge pages needs. > > But it turns out that this other design was actually more complex and had > more LOC than the previous solution. So it was dropped, and again, I went > back to the original solution. > > > I’m really starting to think that, while the overall solution here isn’t > the most elegant, we might not have much more to squeeze from it. So > design-wise, I think we should think about calling it done. > > Testing > ======= > Based on kvm-x86/next (4531ff85d925). Testing was the usual, except I also > tested with TDX modules that don't support DPAMT, and with the two > optimization patches removed: “Improve PAMT refcounters allocation for > sparse memory” and “x86/virt/tdx: Optimize tdx_alloc/free_page() helpers”. > > [0] https://lore.kernel.org/kvm/20250918232224.2202592-1-rick.p.edgecombe@intel.com/ > > > Kirill A. Shutemov (13): > x86/tdx: Move all TDX error defines into <asm/shared/tdx_errno.h> > x86/tdx: Add helpers to check return status codes > x86/virt/tdx: Allocate page bitmap for Dynamic PAMT > x86/virt/tdx: Allocate reference counters for PAMT memory > x86/virt/tdx: Improve PAMT refcounts allocation for sparse memory > x86/virt/tdx: Add tdx_alloc/free_page() helpers > x86/virt/tdx: Optimize tdx_alloc/free_page() helpers > KVM: TDX: Allocate PAMT memory for TD control structures > KVM: TDX: Allocate PAMT memory for vCPU control structures > KVM: TDX: Handle PAMT allocation in fault path > KVM: TDX: Reclaim PAMT memory > x86/virt/tdx: Enable Dynamic PAMT > Documentation/x86: Add documentation for TDX's Dynamic PAMT > > Rick Edgecombe (3): > x86/virt/tdx: Simplify tdmr_get_pamt_sz() > KVM: TDX: Add x86 ops for external spt cache > x86/virt/tdx: Add helpers to allow for pre-allocating pages > > Documentation/arch/x86/tdx.rst | 21 + > arch/x86/coco/tdx/tdx.c | 10 +- > arch/x86/include/asm/kvm-x86-ops.h | 3 + > arch/x86/include/asm/kvm_host.h | 14 +- > arch/x86/include/asm/shared/tdx.h | 8 + > arch/x86/include/asm/shared/tdx_errno.h | 104 ++++ > arch/x86/include/asm/tdx.h | 78 ++- > arch/x86/include/asm/tdx_global_metadata.h | 1 + > arch/x86/kvm/mmu/mmu.c | 6 +- > arch/x86/kvm/mmu/mmu_internal.h | 2 +- > arch/x86/kvm/vmx/tdx.c | 160 ++++-- > arch/x86/kvm/vmx/tdx.h | 3 +- > arch/x86/kvm/vmx/tdx_errno.h | 40 -- > arch/x86/virt/vmx/tdx/tdx.c | 587 +++++++++++++++++--- > arch/x86/virt/vmx/tdx/tdx.h | 5 +- > arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 7 + > 16 files changed, 854 insertions(+), 195 deletions(-) > create mode 100644 arch/x86/include/asm/shared/tdx_errno.h > delete mode 100644 arch/x86/kvm/vmx/tdx_errno.h > > -- > 2.51.2 > > Tested-by: Sagi Shahar <sagis@google.com>
On Thu, Nov 20, 2025 at 4:51 PM Rick Edgecombe <rick.p.edgecombe@intel.com> wrote: > > Hi, > > This is 4th revision of Dynamic PAMT, which is a new feature that reduces > the memory use of TDX. For background information, see the v3 > coverletter[0]. V4 mostly consists of incorporating the feedback from v3. > Notably what it *doesn’t* change is the solution for pre-allocating DPAMT > backing pages. (more info below in “Changes”) > > I think this series isn’t quite ready to ask for merging just yet. I’d > appreciate another round of review especially looking for any issues in the > refcount allocation/mapping and the pamt get/put lock-refcount dance. And > hopefully collect some RBs. > > Sean/Paolo, we have mostly banished this all to TDX code except for “KVM: > TDX: Add x86 ops for external spt cache” patch. That and “x86/virt/tdx: > Add helpers to allow for pre-allocating pages” are probably the remaining > possibly controversial parts of your domain. If you only have a short time > to spend at this point, I’d point you at those two patches. > > Since most of the changes are in arch/x86, I’d think this feature could be > a candidate for eventually merging through tip with Sean or Paolo’s ack. > But it is currently based on kvm-x86/next in order to build on top of the > post-populate cleanup series. Next time I’ll probably target tip if people > think that is a good way forward. > > Changes > ======= > There were two good suggestions around the pre-allocated pages solution > last time, but neither ended up working out: > > 1. Dave suggested to use mempool_t instead of the linked list based > structure, in order to not re-invent the wheel. This turned out to not > quite fit. The problems were that there wasn’t really a “topup” mechanism, > or an atomic fallback (which matches the kvm cache behavior). This results > in very similar code being built around mempool that was built around the > linked list. It was an overall harder to follow solution for not much code > savings. > > I strongly considered going back to Kiryl’s original solution which passed > a callback function pointer for allocating DPAMT pages, and an opaque > void * that the callback could use to find the kvm_mmu_memory_cache. I > thought that readability issues of passing the opaque void * between > subsystems outweighed the small code duplication in the simple, familiar > patterned linked list-based code. So I ended up leaving it. > > 2. Kai suggested (but later retracted the idea) that since the external > page table cache was moved to TDX code, it could simply install DPAMT > pages for the cache at topup time. Then the installation of DPAMT backing > for S-EPT page tables could be done outside of the mmu_lock. It could also > be argued that it makes the design simpler in a way, because the external > page table cache acts like it did before. Anything in there could be simply > used. > > At the time my argument against this was that whether a huge page would be > installed (and thus, whether DPAMT backing was needed) for the guest > private memory would not be known until later, so early install solution > would need special late handling for TDX huge pages. After some internal > discussions I at looked how we could simplify the series by punting on TDX > huge pages needs. > > But it turns out that this other design was actually more complex and had > more LOC than the previous solution. So it was dropped, and again, I went > back to the original solution. > > > I’m really starting to think that, while the overall solution here isn’t > the most elegant, we might not have much more to squeeze from it. So > design-wise, I think we should think about calling it done. > > Testing > ======= > Based on kvm-x86/next (4531ff85d925). Testing was the usual, except I also > tested with TDX modules that don't support DPAMT, and with the two > optimization patches removed: “Improve PAMT refcounters allocation for > sparse memory” and “x86/virt/tdx: Optimize tdx_alloc/free_page() helpers”. > > [0] https://lore.kernel.org/kvm/20250918232224.2202592-1-rick.p.edgecombe@intel.com/ > > > Kirill A. Shutemov (13): > x86/tdx: Move all TDX error defines into <asm/shared/tdx_errno.h> > x86/tdx: Add helpers to check return status codes > x86/virt/tdx: Allocate page bitmap for Dynamic PAMT > x86/virt/tdx: Allocate reference counters for PAMT memory > x86/virt/tdx: Improve PAMT refcounts allocation for sparse memory > x86/virt/tdx: Add tdx_alloc/free_page() helpers > x86/virt/tdx: Optimize tdx_alloc/free_page() helpers > KVM: TDX: Allocate PAMT memory for TD control structures > KVM: TDX: Allocate PAMT memory for vCPU control structures > KVM: TDX: Handle PAMT allocation in fault path > KVM: TDX: Reclaim PAMT memory > x86/virt/tdx: Enable Dynamic PAMT > Documentation/x86: Add documentation for TDX's Dynamic PAMT > > Rick Edgecombe (3): > x86/virt/tdx: Simplify tdmr_get_pamt_sz() > KVM: TDX: Add x86 ops for external spt cache > x86/virt/tdx: Add helpers to allow for pre-allocating pages > > Documentation/arch/x86/tdx.rst | 21 + > arch/x86/coco/tdx/tdx.c | 10 +- > arch/x86/include/asm/kvm-x86-ops.h | 3 + > arch/x86/include/asm/kvm_host.h | 14 +- > arch/x86/include/asm/shared/tdx.h | 8 + > arch/x86/include/asm/shared/tdx_errno.h | 104 ++++ > arch/x86/include/asm/tdx.h | 78 ++- > arch/x86/include/asm/tdx_global_metadata.h | 1 + > arch/x86/kvm/mmu/mmu.c | 6 +- > arch/x86/kvm/mmu/mmu_internal.h | 2 +- > arch/x86/kvm/vmx/tdx.c | 160 ++++-- > arch/x86/kvm/vmx/tdx.h | 3 +- > arch/x86/kvm/vmx/tdx_errno.h | 40 -- > arch/x86/virt/vmx/tdx/tdx.c | 587 +++++++++++++++++--- > arch/x86/virt/vmx/tdx/tdx.h | 5 +- > arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 7 + > 16 files changed, 854 insertions(+), 195 deletions(-) > create mode 100644 arch/x86/include/asm/shared/tdx_errno.h > delete mode 100644 arch/x86/kvm/vmx/tdx_errno.h > > -- > 2.51.2 > acked-by: Vishal Annapurve <vannapurve@google.com>.
© 2016 - 2025 Red Hat, Inc.