From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
The PAMT memory holds metadata for TDX-protected memory. With Dynamic
PAMT, the 4KB-granularity part of the PAMT is allocated on demand. The
kernel supplies the TDX module with a page pair that covers 2MB of host
physical memory.

The kernel must provide this page pair before using pages from the range
for TDX. If this is not done, any SEAMCALL that attempts to use the memory
will fail.

Allocate reference counters for every 2MB range to track PAMT memory usage.
This is necessary to accurately determine when PAMT memory needs to be
allocated and when it can be freed.

This allocation will currently consume 2MB for every 1TB of address
space from 0 to max_pfn (highest pfn of RAM). The allocation size will
depend on how the RAM is physically laid out. In the worst case, where
the entire 52-bit address space is covered, this would be 8GB. The DPAMT
refcount allocations could then hypothetically exceed the savings from
Dynamic PAMT, which is 4GB per TB. This is unlikely in practice.

However, future changes will reduce this refcount overhead to make DPAMT
always a net win.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
[Add feedback, update log]
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---
v3:
- Split out lazily populate optimization to next patch (Dave)
- Add comment around pamt_refcounts (Dave)
- Improve log
---
arch/x86/virt/vmx/tdx/tdx.c | 47 ++++++++++++++++++++++++++++++++++++-
1 file changed, 46 insertions(+), 1 deletion(-)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 4e4aa8927550..0ce4181ca352 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -29,6 +29,7 @@
#include <linux/acpi.h>
#include <linux/suspend.h>
#include <linux/idr.h>
+#include <linux/vmalloc.h>
#include <asm/page.h>
#include <asm/special_insns.h>
#include <asm/msr-index.h>
@@ -50,6 +51,16 @@ static DEFINE_PER_CPU(bool, tdx_lp_initialized);
static struct tdmr_info_list tdx_tdmr_list;
+/*
+ * On a machine with Dynamic PAMT, the kernel maintains a reference counter
+ * for every 2M range. The counter indicates how many users there are for
+ * the PAMT memory of the 2M range.
+ *
+ * The kernel allocates PAMT memory when the first user arrives and
+ * frees it when the last user has left.
+ */
+static atomic_t *pamt_refcounts;
+
static enum tdx_module_status_t tdx_module_status;
static DEFINE_MUTEX(tdx_module_lock);
@@ -183,6 +194,34 @@ int tdx_cpu_enable(void)
}
EXPORT_SYMBOL_GPL(tdx_cpu_enable);
+/*
+ * Allocate PAMT reference counters for all physical memory.
+ *
+ * It consumes 2MiB for every 1TiB of physical memory.
+ */
+static int init_pamt_metadata(void)
+{
+ size_t size = max_pfn / PTRS_PER_PTE * sizeof(*pamt_refcounts);
+
+ if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
+ return 0;
+
+ pamt_refcounts = vmalloc(size);
+ if (!pamt_refcounts)
+ return -ENOMEM;
+
+ return 0;
+}
+
+static void free_pamt_metadata(void)
+{
+ if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
+ return;
+
+ vfree(pamt_refcounts);
+ pamt_refcounts = NULL;
+}
+
/*
* Add a memory region as a TDX memory block. The caller must make sure
* all memory regions are added in address ascending order and don't
@@ -1074,10 +1113,14 @@ static int init_tdx_module(void)
*/
get_online_mems();
- ret = build_tdx_memlist(&tdx_memlist);
+ ret = init_pamt_metadata();
if (ret)
goto out_put_tdxmem;
+ ret = build_tdx_memlist(&tdx_memlist);
+ if (ret)
+ goto err_free_pamt_metadata;
+
/* Allocate enough space for constructing TDMRs */
ret = alloc_tdmr_list(&tdx_tdmr_list, &tdx_sysinfo.tdmr);
if (ret)
@@ -1135,6 +1178,8 @@ static int init_tdx_module(void)
free_tdmr_list(&tdx_tdmr_list);
err_free_tdxmem:
free_tdx_memlist(&tdx_memlist);
+err_free_pamt_metadata:
+ free_pamt_metadata();
goto out_put_tdxmem;
}
--
2.51.0
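
For context on how the new pamt_refcounts array is meant to be consumed,
below is a rough sketch of the get/put pattern implied by the comment above
pamt_refcounts: allocate the PAMT page pair for a 2M range when the first
user arrives, free it when the last user leaves. The helpers and their names
are illustrative only; they are not part of this patch, and the locking and
the actual SEAMCALL wrappers are omitted:

	/* Illustrative sketch only -- not part of this patch. */
	static atomic_t *tdx_find_pamt_refcount(unsigned long hpa)
	{
		/* One refcount per 2M range of host physical memory. */
		return &pamt_refcounts[hpa / PMD_SIZE];
	}

	static int tdx_pamt_get(struct page *page)
	{
		atomic_t *ref = tdx_find_pamt_refcount(page_to_phys(page));

		if (atomic_inc_return(ref) == 1) {
			/*
			 * First user of this 2M range: allocate the 4KB PAMT
			 * page pair and hand it to the TDX module (SEAMCALL
			 * wrapper omitted) before TDX touches the range.
			 */
		}
		return 0;
	}

	static void tdx_pamt_put(struct page *page)
	{
		atomic_t *ref = tdx_find_pamt_refcount(page_to_phys(page));

		if (atomic_dec_and_test(ref)) {
			/*
			 * Last user gone: reclaim the PAMT page pair from the
			 * TDX module and free it (SEAMCALL wrapper omitted).
			 */
		}
	}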
On 9/19/2025 7:22 AM, Rick Edgecombe wrote:
[...]
> +/*
> + * Allocate PAMT reference counters for all physical memory.
> + *
> + * It consumes 2MiB for every 1TiB of physical memory.
> + */
> +static int init_pamt_metadata(void)
> +{
> +	size_t size = max_pfn / PTRS_PER_PTE * sizeof(*pamt_refcounts);

Is there guarantee that max_pfn is PTRS_PER_PTE aligned?
If not, it should be rounded up.

> +
> +	if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
> +		return 0;
> +
> +	pamt_refcounts = vmalloc(size);
> +	if (!pamt_refcounts)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
[...]
On Tue, 2025-09-23 at 15:45 +0800, Binbin Wu wrote:
> > +/*
> > + * Allocate PAMT reference counters for all physical memory.
> > + *
> > + * It consumes 2MiB for every 1TiB of physical memory.
> > + */
> > +static int init_pamt_metadata(void)
> > +{
> > +	size_t size = max_pfn / PTRS_PER_PTE * sizeof(*pamt_refcounts);
>
> Is there guarantee that max_pfn is PTRS_PER_PTE aligned?
> If not, it should be rounded up.

Vmalloc() should handle it?
On 9/29/25 10:41, Edgecombe, Rick P wrote:
> On Tue, 2025-09-23 at 15:45 +0800, Binbin Wu wrote:
>>> +/*
>>> + * Allocate PAMT reference counters for all physical memory.
>>> + *
>>> + * It consumes 2MiB for every 1TiB of physical memory.
>>> + */
>>> +static int init_pamt_metadata(void)
>>> +{
>>> +	size_t size = max_pfn / PTRS_PER_PTE * sizeof(*pamt_refcounts);
>> Is there guarantee that max_pfn is PTRS_PER_PTE aligned?
>> If not, it should be rounded up.
> Vmalloc() should handle it?

vmalloc() will, for instance, round up to 2 pages if you ask for 4097
bytes in 'size'. But that's not the problem. The 'size' calculation
itself is the problem.

You need exactly 2 MiB for every 1 TiB of memory, so let's say we have:

	max_pfn = 1<<28

(where 28 == 40 - PAGE_SHIFT) then size would be *exactly* 1<<21 (2 MiB).
Right?

But what if:

	max_pfn = (1<<28) + 1

Then size needs to be one more page. Right? But what would the code do?
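
Concretely, with 4KB pages (PTRS_PER_PTE == 512) and sizeof(atomic_t) == 4
on x86, the current expression truncates away the partial last 2M range. A
small illustration of the arithmetic described above (values assumed for
illustration, not taken from the thread):

	/* max_pfn = 1 << 28: exactly 1 TiB of RAM */
	size = (1UL << 28) / 512 * 4;		/* 0x200000 = 2 MiB, correct */

	/* max_pfn = (1 << 28) + 1: one page past 1 TiB */
	size = ((1UL << 28) + 1) / 512 * 4;	/* still 2 MiB: integer division
						 * drops the partial 2M range,
						 * so its refcount is missing */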
On Mon, 2025-09-29 at 11:08 -0700, Dave Hansen wrote:
> On 9/29/25 10:41, Edgecombe, Rick P wrote:
> > On Tue, 2025-09-23 at 15:45 +0800, Binbin Wu wrote:
> > > > +/*
> > > > + * Allocate PAMT reference counters for all physical memory.
> > > > + *
> > > > + * It consumes 2MiB for every 1TiB of physical memory.
> > > > + */
> > > > +static int init_pamt_metadata(void)
> > > > +{
> > > > +	size_t size = max_pfn / PTRS_PER_PTE * sizeof(*pamt_refcounts);
> > > Is there guarantee that max_pfn is PTRS_PER_PTE aligned?
> > > If not, it should be rounded up.
> > Vmalloc() should handle it?
>
> vmalloc() will, for instance, round up to 2 pages if you ask for 4097
> bytes in 'size'. But that's not the problem. The 'size' calculation
> itself is the problem.
>
> You need exactly 2 MiB for every 1 TiB of memory, so let's say we have:
>
> 	max_pfn = 1<<28
>
> (where 28 == 40 - PAGE_SHIFT) then size would be *exactly* 1<<21 (2 MiB).
> Right?
>
> But what if:
>
> 	max_pfn = (1<<28) + 1
>
> Then size needs to be one more page. Right? But what would the code do?

Doh, right. There is an additional issue. A later patch tweaks it to be:

+	size = max_pfn / PTRS_PER_PTE * sizeof(*pamt_refcounts);
+	size = round_up(size, PAGE_SIZE);

Perhaps an attempt to fix up the issue by Kirill? It should be fixed like
Binbin suggests, maybe:

+	size = DIV_ROUND_UP(max_pfn, PTRS_PER_PTE) * sizeof(*pamt_refcounts);

Thanks, and sorry for not giving the comment the proper attention the
first time around.
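
Putting the suggested fix together, init_pamt_metadata() would look roughly
like the below. This is a sketch assuming only the DIV_ROUND_UP change
suggested above, with the rest of the function unchanged from the patch:

	static int init_pamt_metadata(void)
	{
		/*
		 * Round up so a partial 2M range at the end of RAM still
		 * gets a reference counter.
		 */
		size_t size = DIV_ROUND_UP(max_pfn, PTRS_PER_PTE) *
			      sizeof(*pamt_refcounts);

		if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
			return 0;

		pamt_refcounts = vmalloc(size);
		if (!pamt_refcounts)
			return -ENOMEM;

		return 0;
	}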