[PATCH v2 03/31] x86/virt/tdx: Add tdx_page_array helpers for new TDX Module objects

Xu Yilun posted 31 patches 5 days, 23 hours ago
Add a struct tdx_page_array definition for the new TDX Module object
types - HPA_ARRAY_T and HPA_LIST_INFO. They are used as input/output
parameters in newly defined SEAMCALLs. Also define some helpers to
allocate, set up and free a tdx_page_array.

HPA_ARRAY_T and HPA_LIST_INFO are similar in most aspects. They both
represent a list of pages for the TDX Module to access. There are
several use cases for these two structures:

 - As SEAMCALL inputs. The pages are claimed by the TDX Module as
   control pages. Control pages are private pages the TDX Module uses
   to hold its internal control structures or private data. TDR, TDCS,
   TDVPR... are existing control pages, just not added via
   tdx_page_array.
 - As SEAMCALL outputs. The pages were TDX Module control pages and are
   now released.
 - As SEAMCALL inputs. The pages are just temporary buffers for
   exchanging data blobs within one SEAMCALL. The TDX Module does not
   hold them for a long time.

Both structures need a 'root page' which contains a list of HPAs. They
collapse the HPA of the root page and the number of valid HPAs into a
64-bit raw value used as a SEAMCALL parameter. The root page is always
just a medium for passing data pages; the TDX Module never keeps the
root page.

A main difference is that HPA_ARRAY_T requires a singleton mode when it
contains just one functional page (page0). In this mode the root page
is not needed and the HPA field of the raw value points directly at
page0. In this patch, however, a root page is always allocated to keep
the kAPIs user friendly.

Another small difference is that HPA_LIST_INFO contains a 'first entry'
field which may be filled by the TDX Module. This simplifies the host
by allowing the same structure to be passed when re-invoking an
interrupted SEAMCALL. There is no need for the host to touch this
field.

Typical usages of the tdx_page_array:

1. Add control pages:
 - struct tdx_page_array *array = tdx_page_array_create(nr_pages);
 - seamcall(TDH_XXX_CREATE, array, ...);

2. Release control pages:
 - seamcall(TDH_XXX_DELETE, array, &nr_released, &released_hpa);
 - tdx_page_array_ctrl_release(array, nr_released, released_hpa);

3. Exchange data blobs:
 - struct tdx_page_array *array = tdx_page_array_create(nr_pages);
 - seamcall(TDH_XXX, array, ...);
 - Read data from array.
 - tdx_page_array_free(array);

4. Note the root page contains 512 HPAs at most; if more pages are
   required, the tdx_page_array must be re-populated:

 - struct tdx_page_array *array = tdx_page_array_alloc(nr_pages);
 - for each 512-page bulk
   - tdx_page_array_populate(array, offset);
   - seamcall(TDH_XXX_ADD, array, ...);

In case 2, SEAMCALLs output the released page array in the form of
HPA_ARRAY_T or HPA_LIST_INFO. Use tdx_page_array_ctrl_release() to
check that the output pages match the original input pages. If they do
not, the TDX Module is buggy. In that case the safer way is to leak the
control pages by calling tdx_page_array_ctrl_leak().

The users of tdx_page_array will come in following patches.

Co-developed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
---
 arch/x86/include/asm/tdx.h  |  37 +++++
 arch/x86/virt/vmx/tdx/tdx.c | 299 ++++++++++++++++++++++++++++++++++++
 2 files changed, 336 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 65c4da396450..9173a432b312 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -139,6 +139,43 @@ void tdx_guest_keyid_free(unsigned int keyid);
 
 void tdx_quirk_reset_page(struct page *page);
 
+/**
+ * struct tdx_page_array - Represents a list of pages for TDX Module access
+ * @nr_pages: Total number of data pages in the collection
+ * @pages: Array of data page pointers containing all the data
+ *
+ * @offset: Internal: The starting index in @pages, positions the currently
+ *	    populated page window in @root.
+ * @nents: Internal: Number of valid HPAs for the page window in @root
+ * @root: Internal: A single 4KB page holding the 8-byte HPAs of the page
+ *	  window. The page window max size is constrained by the root page,
+ *	  which is 512 HPAs.
+ *
+ * This structure abstracts several TDX Module defined object types, e.g.,
+ * HPA_ARRAY_T and HPA_LIST_INFO. Typically they all use a "root page" as the
+ * medium to exchange a list of data pages between host and TDX Module. This
+ * structure serves as a unified parameter type for SEAMCALL wrappers, where
+ * these hardware object types are needed.
+ */
+struct tdx_page_array {
+	/* public: */
+	unsigned int nr_pages;
+	struct page **pages;
+
+	/* private: */
+	unsigned int offset;
+	unsigned int nents;
+	u64 *root;
+};
+
+void tdx_page_array_free(struct tdx_page_array *array);
+DEFINE_FREE(tdx_page_array_free, struct tdx_page_array *, if (_T) tdx_page_array_free(_T))
+struct tdx_page_array *tdx_page_array_create(unsigned int nr_pages);
+void tdx_page_array_ctrl_leak(struct tdx_page_array *array);
+int tdx_page_array_ctrl_release(struct tdx_page_array *array,
+				unsigned int nr_released,
+				u64 released_hpa);
+
 struct tdx_td {
 	/* TD root structure: */
 	struct page *tdr_page;
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 8b8e165a2001..a3021e7e2490 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -30,6 +30,7 @@
 #include <linux/suspend.h>
 #include <linux/idr.h>
 #include <linux/kvm_types.h>
+#include <linux/bitfield.h>
 #include <asm/page.h>
 #include <asm/special_insns.h>
 #include <asm/msr-index.h>
@@ -258,6 +259,304 @@ static int build_tdx_memlist(struct list_head *tmb_list)
 	return ret;
 }
 
+#define TDX_PAGE_ARRAY_MAX_NENTS	(PAGE_SIZE / sizeof(u64))
+
+static int tdx_page_array_populate(struct tdx_page_array *array,
+				   unsigned int offset)
+{
+	u64 *entries;
+	int i;
+
+	if (offset >= array->nr_pages)
+		return 0;
+
+	array->offset = offset;
+	array->nents = umin(array->nr_pages - offset,
+			    TDX_PAGE_ARRAY_MAX_NENTS);
+
+	entries = array->root;
+	for (i = 0; i < array->nents; i++)
+		entries[i] = page_to_phys(array->pages[offset + i]);
+
+	return array->nents;
+}
+
+static void tdx_free_pages_bulk(unsigned int nr_pages, struct page **pages)
+{
+	int i;
+
+	for (i = 0; i < nr_pages; i++)
+		__free_page(pages[i]);
+}
+
+static int tdx_alloc_pages_bulk(unsigned int nr_pages, struct page **pages)
+{
+	unsigned int filled, done = 0;
+
+	do {
+		filled = alloc_pages_bulk(GFP_KERNEL, nr_pages - done,
+					  pages + done);
+		if (!filled) {
+			tdx_free_pages_bulk(done, pages);
+			return -ENOMEM;
+		}
+
+		done += filled;
+	} while (done != nr_pages);
+
+	return 0;
+}
+
+/**
+ * tdx_page_array_free() - Free all memory for a tdx_page_array
+ * @array: The tdx_page_array to be freed.
+ *
+ * Free all associated pages and the container itself.
+ */
+void tdx_page_array_free(struct tdx_page_array *array)
+{
+	if (!array)
+		return;
+
+	tdx_free_pages_bulk(array->nr_pages, array->pages);
+	kfree(array->pages);
+	kfree(array->root);
+	kfree(array);
+}
+EXPORT_SYMBOL_GPL(tdx_page_array_free);
+
+static struct tdx_page_array *
+tdx_page_array_alloc(unsigned int nr_pages)
+{
+	struct tdx_page_array *array = NULL;
+	struct page **pages = NULL;
+	u64 *root = NULL;
+	int ret;
+
+	if (!nr_pages)
+		return NULL;
+
+	array = kzalloc_obj(*array);
+	if (!array)
+		goto out_free;
+
+	root = kzalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!root)
+		goto out_free;
+
+	pages = kcalloc(nr_pages, sizeof(*pages), GFP_KERNEL);
+	if (!pages)
+		goto out_free;
+
+	ret = tdx_alloc_pages_bulk(nr_pages, pages);
+	if (ret)
+		goto out_free;
+
+	array->nr_pages = nr_pages;
+	array->pages = pages;
+	array->root = root;
+
+	return array;
+
+out_free:
+	kfree(pages);
+	kfree(root);
+	kfree(array);
+
+	return NULL;
+}
+
+/**
+ * tdx_page_array_create() - Create a small tdx_page_array (up to 512 pages)
+ * @nr_pages: Number of pages to allocate (must be <= 512).
+ *
+ * Allocate and populate a tdx_page_array in a single step. This is intended
+ * for small collections that fit within a single root page. The allocated
+ * pages are all order-0 pages. This is the most common use case for a list of
+ * TDX control pages.
+ *
+ * If more pages are required, use tdx_page_array_alloc() and
+ * tdx_page_array_populate() to build tdx_page_array chunk by chunk.
+ *
+ * Return: Fully populated tdx_page_array or NULL on failure.
+ */
+struct tdx_page_array *tdx_page_array_create(unsigned int nr_pages)
+{
+	struct tdx_page_array *array;
+	int populated;
+
+	if (nr_pages > TDX_PAGE_ARRAY_MAX_NENTS)
+		return NULL;
+
+	array = tdx_page_array_alloc(nr_pages);
+	if (!array)
+		return NULL;
+
+	populated = tdx_page_array_populate(array, 0);
+	if (populated != nr_pages)
+		goto out_free;
+
+	return array;
+
+out_free:
+	tdx_page_array_free(array);
+	return NULL;
+}
+EXPORT_SYMBOL_GPL(tdx_page_array_create);
+
+/**
+ * tdx_page_array_ctrl_leak() - Leak data pages and free the container
+ * @array: The tdx_page_array to be leaked.
+ *
+ * Call this function when failed to reclaim the control pages. Free the root
+ * page and the holding structures, but orphan the data pages, to prevent the
+ * host from re-allocating and accessing memory that the hardware may still
+ * consider private.
+ */
+void tdx_page_array_ctrl_leak(struct tdx_page_array *array)
+{
+	if (!array)
+		return;
+
+	kfree(array->pages);
+	kfree(array->root);
+	kfree(array);
+}
+EXPORT_SYMBOL_GPL(tdx_page_array_ctrl_leak);
+
+static bool tdx_page_array_validate_release(struct tdx_page_array *array,
+					    unsigned int offset,
+					    unsigned int nr_released,
+					    u64 released_hpa)
+{
+	unsigned int nents;
+
+	if (offset >= array->nr_pages)
+		return false;
+
+	nents = umin(array->nr_pages - offset, TDX_PAGE_ARRAY_MAX_NENTS);
+
+	if (nents != nr_released) {
+		pr_err("%s nr_released [%d] doesn't match page array nents [%d]\n",
+		       __func__, nr_released, nents);
+		return false;
+	}
+
+	/*
+	 * Unfortunately TDX has multiple page allocation protocols, check the
+	 * "singleton" case required for HPA_ARRAY_T.
+	 */
+	if (page_to_phys(array->pages[0]) == released_hpa &&
+	    array->nr_pages == 1)
+		return true;
+
+	/* Then check the "non-singleton" case */
+	if (virt_to_phys(array->root) == released_hpa) {
+		u64 *entries = array->root;
+		int i;
+
+		for (i = 0; i < nents; i++) {
+			struct page *page = array->pages[offset + i];
+			u64 val = page_to_phys(page);
+
+			if (val != entries[i]) {
+				pr_err("%s entry[%d] [0x%llx] doesn't match page hpa [0x%llx]\n",
+				       __func__, i, entries[i], val);
+				return false;
+			}
+		}
+
+		return true;
+	}
+
+	pr_err("%s failed to validate, released_hpa [0x%llx], root page hpa [0x%llx], page0 hpa [%#llx], number pages %u\n",
+	       __func__, released_hpa, virt_to_phys(array->root),
+	       page_to_phys(array->pages[0]), array->nr_pages);
+
+	return false;
+}
+
+/**
+ * tdx_page_array_ctrl_release() - Verify and release TDX control pages
+ * @array: The tdx_page_array used to originally create control pages.
+ * @nr_released: Number of HPAs the TDX Module reported as released.
+ * @released_hpa: The HPA list the TDX Module reported as released.
+ *
+ * TDX Module can at most release 512 control pages, so this function only
+ * accepts small tdx_page_array (up to 512 pages), usually created by
+ * tdx_page_array_create().
+ *
+ * Return: 0 on success, -errno on page release protocol error.
+ */
+int tdx_page_array_ctrl_release(struct tdx_page_array *array,
+				unsigned int nr_released,
+				u64 released_hpa)
+{
+	int i;
+
+	/*
+	 * The only case where ->nr_pages is allowed to be >
+	 * TDX_PAGE_ARRAY_MAX_NENTS is a case where those pages are never
+	 * expected to be released by this function.
+	 */
+	if (WARN_ON(array->nr_pages > TDX_PAGE_ARRAY_MAX_NENTS))
+		return -EINVAL;
+
+	if (WARN_ONCE(!tdx_page_array_validate_release(array, 0, nr_released,
+						       released_hpa),
+		      "page release protocol error, consider reboot and replace TDX Module.\n"))
+		return -EFAULT;
+
+	for (i = 0; i < array->nr_pages; i++) {
+		u64 r;
+
+		r = tdh_phymem_page_wbinvd_hkid(tdx_global_keyid,
+						array->pages[i]);
+		if (WARN_ON(r))
+			return -EFAULT;
+	}
+
+	tdx_page_array_free(array);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(tdx_page_array_ctrl_release);
+
+#define HPA_LIST_INFO_FIRST_ENTRY	GENMASK_U64(11, 3)
+#define HPA_LIST_INFO_PFN		GENMASK_U64(51, 12)
+#define HPA_LIST_INFO_LAST_ENTRY	GENMASK_U64(63, 55)
+
+static u64 __maybe_unused hpa_list_info_assign_raw(struct tdx_page_array *array)
+{
+	return FIELD_PREP(HPA_LIST_INFO_FIRST_ENTRY, 0) |
+	       FIELD_PREP(HPA_LIST_INFO_PFN,
+			  PFN_DOWN(virt_to_phys(array->root))) |
+	       FIELD_PREP(HPA_LIST_INFO_LAST_ENTRY, array->nents - 1);
+}
+
+#define HPA_ARRAY_T_PFN		GENMASK_U64(51, 12)
+#define HPA_ARRAY_T_SIZE	GENMASK_U64(63, 55)
+
+static u64 __maybe_unused hpa_array_t_assign_raw(struct tdx_page_array *array)
+{
+	unsigned long pfn;
+
+	if (array->nents == 1)
+		pfn = page_to_pfn(array->pages[array->offset]);
+	else
+		pfn = PFN_DOWN(virt_to_phys(array->root));
+
+	return FIELD_PREP(HPA_ARRAY_T_PFN, pfn) |
+	       FIELD_PREP(HPA_ARRAY_T_SIZE, array->nents - 1);
+}
+
+static u64 __maybe_unused hpa_array_t_release_raw(struct tdx_page_array *array)
+{
+	if (array->nents == 1)
+		return 0;
+
+	return virt_to_phys(array->root);
+}
+
 static int read_sys_metadata_field(u64 field_id, u64 *data)
 {
 	struct tdx_module_args args = {};
-- 
2.25.1
Re: [PATCH v2 03/31] x86/virt/tdx: Add tdx_page_array helpers for new TDX Module objects
Posted by Nikolay Borisov 3 days, 2 hours ago

On 27.03.26 г. 18:01 ч., Xu Yilun wrote:
<snip>

> +/**
> + * tdx_page_array_ctrl_leak() - Leak data pages and free the container
> + * @array: The tdx_page_array to be leaked.
> + *
> + * Call this function when failed to reclaim the control pages. Free the root
> + * page and the holding structures, but orphan the data pages, to prevent the
> + * host from re-allocating and accessing memory that the hardware may still
> + * consider private.
> + */
> +void tdx_page_array_ctrl_leak(struct tdx_page_array *array)
> +{
> +	if (!array)
> +		return;
> +
> +	kfree(array->pages);
> +	kfree(array->root);
> +	kfree(array);
> +}
> +EXPORT_SYMBOL_GPL(tdx_page_array_ctrl_leak);

This instantly raises a red flag: by design, the API has the ability to 
simply leak memory. Under what conditions might this be required? Can't 
we do something to gracefully handle the case when pages cannot be freed 
instantly, e.g. queued freeing or some such? Simply leaking them is a 
big NO.

> +
> +static bool tdx_page_array_validate_release(struct tdx_page_array *array,
> +					    unsigned int offset,

This function is only ever called with an offset of 0. If the argument 
is intended to be used later, I'd rather see it added in an explicit 
patch with a rationale for why it's needed.

> +					    unsigned int nr_released,
> +					    u64 released_hpa)
> +{
> +	unsigned int nents;
> +
> +	if (offset >= array->nr_pages)
> +		return false;
> +
> +	nents = umin(array->nr_pages - offset, TDX_PAGE_ARRAY_MAX_NENTS);
> +
> +	if (nents != nr_released) {
> +		pr_err("%s nr_released [%d] doesn't match page array nents [%d]\n",
> +		       __func__, nr_released, nents);
> +		return false;
> +	}
> +
> +	/*
> +	 * Unfortunately TDX has multiple page allocation protocols, check the
> +	 * "singleton" case required for HPA_ARRAY_T.
> +	 */
> +	if (page_to_phys(array->pages[0]) == released_hpa &&
> +	    array->nr_pages == 1)
> +		return true;
> +
> +	/* Then check the "non-singleton" case */
> +	if (virt_to_phys(array->root) == released_hpa) {
> +		u64 *entries = array->root;
> +		int i;
> +
> +		for (i = 0; i < nents; i++) {
> +			struct page *page = array->pages[offset + i];
> +			u64 val = page_to_phys(page);
> +
> +			if (val != entries[i]) {
> +				pr_err("%s entry[%d] [0x%llx] doesn't match page hpa [0x%llx]\n",
> +				       __func__, i, entries[i], val);
> +				return false;
> +			}
> +		}
> +
> +		return true;
> +	}
> +
> +	pr_err("%s failed to validate, released_hpa [0x%llx], root page hpa [0x%llx], page0 hpa [%#llx], number pages %u\n",
> +	       __func__, released_hpa, virt_to_phys(array->root),
> +	       page_to_phys(array->pages[0]), array->nr_pages);
> +
> +	return false;
> +}


<snip>
Re: [PATCH v2 03/31] x86/virt/tdx: Add tdx_page_array helpers for new TDX Module objects
Posted by Xu Yilun 2 days, 2 hours ago
On Mon, Mar 30, 2026 at 04:31:43PM +0300, Nikolay Borisov wrote:
> 
> 
> On 27.03.26 г. 18:01 ч., Xu Yilun wrote:
> <snip>
> 
> > +/**
> > + * tdx_page_array_ctrl_leak() - Leak data pages and free the container
> > + * @array: The tdx_page_array to be leaked.
> > + *
> > + * Call this function when failed to reclaim the control pages. Free the root
> > + * page and the holding structures, but orphan the data pages, to prevent the
> > + * host from re-allocating and accessing memory that the hardware may still
> > + * consider private.
> > + */
> > +void tdx_page_array_ctrl_leak(struct tdx_page_array *array)
> > +{
> > +	if (!array)
> > +		return;
> > +
> > +	kfree(array->pages);
> > +	kfree(array->root);
> > +	kfree(array);
> > +}
> > +EXPORT_SYMBOL_GPL(tdx_page_array_ctrl_leak);
> 
> This instantly raises a red flag if by design an API has the ability to

OK. Given the discussion in this thread, I'm inclined to remove this leak API.

> simply leak memory. Under what conditions this might be required, can't we
> do something to gracefully handle the case when pages cannot be freed
> instantly, i.e queued freeing or some such ? Simply leaking them is a big
> NO.

It was intended to be called when the kernel fails to reclaim pages from
the secure firmware, maybe because of a firmware bug. In that case the
kernel has no idea what to do. Leaking is a last resort here; don't
expect things to still work.

Re: [PATCH v2 03/31] x86/virt/tdx: Add tdx_page_array helpers for new TDX Module objects
Posted by Edgecombe, Rick P 5 days, 14 hours ago
Hi,

In general I'm struggling to understand the design decisions. It seems a very
specific design and quite a bit of code to manage an array of pages. Questions
below.

On Sat, 2026-03-28 at 00:01 +0800, Xu Yilun wrote:
> Add struct tdx_page_array definition for new TDX Module object
> types - HPA_ARRAY_T and HPA_LIST_INFO. 

This is unfortunate. I see you agree in the comments.

> 
> They are used as input/output
> parameters in newly defined SEAMCALLs. Also define some helpers to
> allocate, setup and free tdx_page_array.
> 
> HPA_ARRAY_T and HPA_LIST_INFO are similar in most aspects. They both
> represent a list of pages for TDX Module accessing. There are several
> use cases for these 2 structures:
> 
>  - As SEAMCALL inputs. They are claimed by TDX Module as control pages.
>    Control pages are private pages for TDX Module to hold its internal
>    control structures or private data. TDR, TDCS, TDVPR... are existing
>    control pages, just not added via tdx_page_array.
>  - As SEAMCALL outputs. They were TDX Module control pages and now are
>    released.
>  - As SEAMCALL inputs. They are just temporary buffers for exchanging
>    data blobs in one SEAMCALL. TDX Module will not hold them for long
>    time.

This is kind of verbose for what it seems to be trying to say. It's just that
these types can be input or output params. The TDX module could hold on to the
pages for a long time, or just transiently. For that latter part I think you are
trying to say sometimes they need flushing and sometimes they don't?

> 
> The 2 structures both need a 'root page' which contains a list of HPAs.
> They collapse the HPA of the root page and the number of valid HPAs
> into a 64 bit raw value for SEAMCALL parameters. The root page is
> always a medium for passing data pages, TDX Module never keeps the
> root page.
> 
> A main difference is HPA_ARRAY_T requires singleton mode when
> containing just 1 functional page (page0). In this mode the root page is
> not needed and the HPA field of the raw value directly points to the
> page0. But in this patch, root page is always allocated for user
> friendly kAPIs.

"Singleton mode"? What is it? If it's just the case of not needing the
populate loop, it probably deserves more explanation. I'm not sure, but
the populate loop seems to drive a lot of the struct design?

> 
> Another small difference is HPA_LIST_INFO contains a 'first entry' field
> which could be filled by TDX Module. This simplifies host by providing
> the same structure when re-invoke the interrupted SEAMCALL. No need for
> host to touch this field.

Not clear what this is talking about. But I'm starting to wonder if we
should be so bold as to claim that the differences between the types
really simplify the host.

> 
> Typical usages of the tdx_page_array:
> 
> 1. Add control pages:
>  - struct tdx_page_array *array = tdx_page_array_create(nr_pages);
>  - seamcall(TDH_XXX_CREATE, array, ...);
> 
> 2. Release control pages:
>  - seamcall(TDX_XXX_DELETE, array, &nr_released, &released_hpa);
>  - tdx_page_array_ctrl_release(array, nr_released, released_hpa);

So release is mostly needed because of the need to do the wbinvd seamcall? And
unlike tdx_page_array_free() it returns an error in case that fails. Or other
sanity checking. But all the callers do the same thing on error, call
tdx_page_array_ctrl_leak().

Just wondering if we could simplify it somehow. There are two helpers and the
caller has to know which one to call based on SEAMCALL specifics. What if the
seamcall wrapper set a bit in the page array while passing it out. The bit would
specify to the helper if it needs to do wbinvd or not. Then the wrappers could
encapsulate the type of free needed and not rely on the caller to know. And we
only need to have one function for it instead of two.


BTW, do we expect errors from the tdh_phymem_page_wbinvd_hkid() calls here? How
could the BUSY happen? If we don't think it can happen in normal runtime, we
could just warn and skip the special leak logic. In KVM side there is a place
where we can't really handle it for the wbinvd calls. And one where we can. If
we need a ton of code to handle a bug somewhere (on kernel side or TDX module),
it seems too defensive to me. At least it's not in sync with the rest of TDX.

Especially the quite large tdx_page_array_validate_release() logic should need a
justification that there is something very tricky that needs all this checking.

But maybe you can explain what the special risk is.

> 
> 3. Exchange data blobs:
>  - struct tdx_page_array *array = tdx_page_array_create(nr_pages);
>  - seamcall(TDX_XXX, array, ...);
>  - Read data from array.
> 
> 
> 4. Note the root page contains 512 HPAs at most, if more pages are
>    required, re-populate the tdx_page_array is needed.
> 
>  - struct tdx_page_array *array = tdx_page_array_alloc(nr_pages);
>  - for each 512-page bulk
>    - tdx_page_array_populate(array, offset);
>    - seamcall(TDH_XXX_ADD, array, ...);
> 
> In case 2, SEAMCALLs output the released page array in the form of
> HPA_ARRAY_T or PAGE_LIST_INFO. Use tdx_page_array_ctrl_release() to
> check if the output pages match the original input pages. If failed,
> TDX Module is buggy. In this case the safer way is to leak the
> control pages, call tdx_page_array_ctrl_leak().
> 
> The usage of tdx_page_array will be in following patches.
> 
> Co-developed-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
> ---
>  arch/x86/include/asm/tdx.h  |  37 +++++
>  arch/x86/virt/vmx/tdx/tdx.c | 299 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 336 insertions(+)
> 
> diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
> index 65c4da396450..9173a432b312 100644
> --- a/arch/x86/include/asm/tdx.h
> +++ b/arch/x86/include/asm/tdx.h
> @@ -139,6 +139,43 @@ void tdx_guest_keyid_free(unsigned int keyid);
>  
>  void tdx_quirk_reset_page(struct page *page);
>  
> +/**
> + * struct tdx_page_array - Represents a list of pages for TDX Module access
> + * @nr_pages: Total number of data pages in the collection
> + * @pages: Array of data page pointers containing all the data
> + *
> + * @offset: Internal: The starting index in @pages, positions the currently
> + *	    populated page window in @root.
> + * @nents: Internal: Number of valid HPAs for the page window in @root
> + * @root: Internal: A single 4KB page holding the 8-byte HPAs of the page
> + *	  window. The page window max size is constrained by the root page,
> + *	  which is 512 HPAs.
> + *
> + * This structure abstracts several TDX Module defined object types, e.g.,
> + * HPA_ARRAY_T and HPA_LIST_INFO. Typically they all use a "root page" as the
> + * medium to exchange a list of data pages between host and TDX Module. This
> + * structure serves as a unified parameter type for SEAMCALL wrappers, where
> + * these hardware object types are needed.
> + */
> +struct tdx_page_array {
> +	/* public: */
> +	unsigned int nr_pages;
> +	struct page **pages;
> +
> +	/* private: */
> +	unsigned int offset;
> +	unsigned int nents;
> +	u64 *root;

pages is going to be an array of struct pointers, and root is a single page of
PA's that gets re-used to copy and pass the PA's to the TDX module. Why do we
need both? Like just keep an array of PA's that would be the same size as the
struct page array. And not need the populate loop? 

Pausing for now. Still looking through the callers and it's the end of the day.

Re: [PATCH v2 03/31] x86/virt/tdx: Add tdx_page_array helpers for new TDX Module objects
Posted by Xu Yilun 2 days, 23 hours ago
> > Typical usages of the tdx_page_array:
> > 
> > 1. Add control pages:
> >  - struct tdx_page_array *array = tdx_page_array_create(nr_pages);
> >  - seamcall(TDH_XXX_CREATE, array, ...);
> > 
> > 2. Release control pages:
> >  - seamcall(TDX_XXX_DELETE, array, &nr_released, &released_hpa);
> >  - tdx_page_array_ctrl_release(array, nr_released, released_hpa);
> 
> So release is mostly needed because of the need to do the wbinvd seamcall? And
> unlike tdx_page_array_free() it returns an error in case that fails. Or other
> sanity checking. But all the callers do the same thing on error, call
> tdx_page_array_ctrl_leak().
> 
> Just wondering if we could simplify it somehow. There are two helpers and the
> caller has to know which one to call based on SEAMCALL specifics. What if the
> seamcall wrapper set a bit in the page array while passing it out. The bit would
> specify to the helper if it needs to do wbinvd or not. Then the wrappers could
> encapsulate the type of free needed and not rely on the caller to know. And we
> only need to have one function for it instead of two.

I like the idea; I can give it a try.

But we may need more than a bit to finish the release. On release, some
SEAMCALLs return the released HPA list and the host checks that it is
sane, otherwise leaks. We need to also record this info in the struct.

> 
> 
> BTW, do we expect errors from the tdh_phymem_page_wbinvd_hkid() calls here? How

No, it can't happen in normal runtime.

> could the BUSY happen? If we don't think it can happen in normal runtime, we
> could just warn and skip the special leak logic. In KVM side there is a place
> where we can't really handle it for the wbinvd calls. And one where we can. If

But we do leak control pages when wbinvd fails, or even when
tdh_phymem_page_reclaim() fails. So we need the leak logic anyway,
don't we? Isn't it a little insane to return a private page to the
kernel after failing to reclaim it?

> we need a ton of code to handle a bug somewhere (on kernel side or TDX module),
> it seems too defensive to me. At least it's not in sync with the rest of TDX.
> 
> Especially the quite large tdx_page_array_validate_release() logic should need a
> justification that there is something very tricky that needs all this checking.
> 
> But maybe you can explain what the special risk is.

I don't see a special risk; actually I have never even seen the release
fail once.

But we do check the return value of tdh_phymem_page_reclaim() when
releasing a single ctrl page. Here we just also check when releasing a
list of ctrl pages. The check naturally becomes more complex, e.g.,
whether the released number matches, whether every HPA in the list
matches ...

...

> > +struct tdx_page_array {
> > +	/* public: */
> > +	unsigned int nr_pages;
> > +	struct page **pages;
> > +
> > +	/* private: */
> > +	unsigned int offset;
> > +	unsigned int nents;
> > +	u64 *root;
> 
> pages is going to be an array of struct pointers, and root is a single page of
> PA's that gets re-used to copy and pass the PA's to the TDX module. Why do we
> need both? Like just keep an array of PA's that would be the same size as the
> struct page array. And not need the populate loop? 

We need the Linux language, struct page *, for alloc and free. We also
need the TDX Module language - a PA list - for SEAMCALLs. So IIUC, the
page-to-PA populating on allocation won't disappear; instead,
PA-to-page populating would appear on free.

Besides, the host may need to vmap and access the (shared) pages.

Thanks,
Yilun
Re: [PATCH v2 03/31] x86/virt/tdx: Add tdx_page_array helpers for new TDX Module objects
Posted by Edgecombe, Rick P 2 days, 15 hours ago
On Mon, 2026-03-30 at 23:47 +0800, Xu Yilun wrote:
> > pages is going to be an array of struct pointers, and root is a single page
> > of
> > PA's that gets re-used to copy and pass the PA's to the TDX module. Why do
> > we
> > need both? Like just keep an array of PA's that would be the same size as
> > the
> > struct page array. And not need the populate loop? 
> 
> We need Linux language, struct page *, for alloc and free. Also need
> TDX Module language - PA list - for SEAMCALLs. So IIUC, the page to PA
> populating won't disappear on allocation, the PA to page populating
> would appear on free.

Not sure what you mean by this.

> 
> Besides, host may need to vmap and access the (shared) pages.

Some code someday may need to convert a PA to another format? Is that it?
Doesn't seem like big problem.

But I'm not sure about this idea yet.
Re: [PATCH v2 03/31] x86/virt/tdx: Add tdx_page_array helpers for new TDX Module objects
Posted by Xu Yilun 2 days, 5 hours ago
On Mon, Mar 30, 2026 at 11:57:11PM +0000, Edgecombe, Rick P wrote:
> On Mon, 2026-03-30 at 23:47 +0800, Xu Yilun wrote:
> > > pages is going to be an array of struct pointers, and root is a single page
> > > of
> > > PA's that gets re-used to copy and pass the PA's to the TDX module. Why do
> > > we
> > > need both? Like just keep an array of PA's that would be the same size as
> > > the
> > > struct page array. And not need the populate loop? 
> > 
> > We need Linux language, struct page *, for alloc and free. Also need
> > TDX Module language - PA list - for SEAMCALLs. So IIUC, the page to PA
> > populating won't disappear on allocation, the PA to page populating
> > would appear on free.
> 
> Not sure what you mean by this.

I mean the host uses struct page * for memory allocation and free. If
we only kept the PA list, we would still need to convert the PAs back
to struct page * in order to free them.

> 
> > 
> > Besides, host may need to vmap and access the (shared) pages.
> 
> Some code someday may need to convert a PA to another format? Is that it?

No, I don't convert a PA to something else. The PA list is only for
SEAMCALLs.

Now we use vm_map_ram(array->pages, ...) to map the shared pages;
that's another reason I want to keep the struct page ** in
struct tdx_page_array.

Anyway, we use struct page ** for kernel memory management in several
cases; keeping the struct page ** avoids PA -> page populating.
> Doesn't seem like big problem.
> 
> But I'm not sure about this idea yet.
Re: [PATCH v2 03/31] x86/virt/tdx: Add tdx_page_array helpers for new TDX Module objects
Posted by Xu Yilun 3 days, 5 hours ago
> On Sat, 2026-03-28 at 00:01 +0800, Xu Yilun wrote:
> > Add struct tdx_page_array definition for new TDX Module object
> > types - HPA_ARRAY_T and HPA_LIST_INFO. 
> 
> This is unfortunate. I see you agree in the comments.

Yes, basically they define the same concept and behave mostly the same,
with some differences...

> 
> > 
> > They are used as input/output
> > parameters in newly defined SEAMCALLs. Also define some helpers to
> > allocate, setup and free tdx_page_array.
> > 
> > HPA_ARRAY_T and HPA_LIST_INFO are similar in most aspects. They both
> > represent a list of pages for TDX Module accessing. There are several
> > use cases for these 2 structures:
> > 
> >  - As SEAMCALL inputs. They are claimed by TDX Module as control pages.
> >    Control pages are private pages for TDX Module to hold its internal
> >    control structures or private data. TDR, TDCS, TDVPR... are existing
> >    control pages, just not added via tdx_page_array.
> >  - As SEAMCALL outputs. They were TDX Module control pages and now are
> >    released.
> >  - As SEAMCALL inputs. They are just temporary buffers for exchanging
> >    data blobs in one SEAMCALL. TDX Module will not hold them for long
> >    time.
> 
> This is kind of verbose for what it seems to be trying to say. It's just that

I assume you feel the explanation of "what is a control page" is off
track. I added it because the term first appears in x86 here (previously
only in KVM TDX), and people asked for its definition:

https://lore.kernel.org/all/cfcfb160-fcd2-4a75-9639-5f7f0894d14b@intel.com/

> these types can be input or output params. The TDX module could hold on to the
> pages for a long time, or just transiently.

Mm.. I'm trying to sum up the kernel API level flow:

For control pages, they are held by the TDX Module for a long time, so the
host inputs the page array and the TDX Module later outputs the page array
back. The host needs to verify the outputs.

For shared pages, the TDX Module's access is transient within one SEAMCALL,
so the array is input only; the TDX Module never needs to output it.

I think the verbosity makes the following pseudo code easier to
understand.

> For that latter part I think you are
> trying to say sometimes they need flushing and sometimes they don't?

Yeah.
control pages => long term => host verifies and releases => flush on release
shared pages => transient => no verify and releases => no flush

Maybe I should mention that the flushing is already covered by the
releasing kAPI.
> 
> > 
> > The 2 structures both need a 'root page' which contains a list of HPAs.
> > They collapse the HPA of the root page and the number of valid HPAs
> > into a 64 bit raw value for SEAMCALL parameters. The root page is
> > always a medium for passing data pages, TDX Module never keeps the
> > root page.
> > 
> > A main difference is HPA_ARRAY_T requires singleton mode when
> > containing just 1 functional page (page0). In this mode the root page is
> > not needed and the HPA field of the raw value directly points to the
> > page0. But in this patch, root page is always allocated for user
> > friendly kAPIs.
> 
> "singleton mode"? What is it? If it's the case of not needing populate loop, it

It is a SEAMCALL level detail of HPA_ARRAY_T. It is literally as
explained above - the HPA field should be filled with page0, not the root
page.

> probably deserves more explanation. I'm not sure, but the populate loop seems to
> drive a lot of the struct design?

The caller is not aware of singleton mode. Actually, I'm trying to make
the tdx_page_array independent of HPA_ARRAY_T or HPA_LIST_INFO details
when allocating/populating; the root page is still populated even when
it's not needed for singleton mode. The differences only show up when
collapsing the struct into the u64 SEAMCALL parameters.

> 
> > 
> > Another small difference is HPA_LIST_INFO contains a 'first entry' field
> > which could be filled by TDX Module. This simplifies host by providing
> > the same structure when re-invoke the interrupted SEAMCALL. No need for
> > host to touch this field.
> 
> Not clear what this is talking about. But I'm starting to wonder if we should be
> so bold to claim that the differences between the types really simplify the
> host. 

I'm talking about another SEAMCALL level detail. Sometimes the TDX Module
gets interrupted in the middle of page array processing and needs an
anchor to resume from where it stopped; the TDX Module records that
anchor in the 'first entry'.

By illustrating these SEAMCALL level differences, I want to explain that
they don't impact the general SW flow and that the kAPI handles them
internally.

Yes, in the POC code we did write dedicated code for each type, but it
ends up with plenty of similar logic on the caller side for root page
manipulation. For now the differences are small, but I think we should
not write a copy for every type; we should stop adding new types.

Please allow me to stop here, will continue later...

Thanks. 
Re: [PATCH v2 03/31] x86/virt/tdx: Add tdx_page_array helpers for new TDX Module objects
Posted by Edgecombe, Rick P 2 days, 16 hours ago
On Mon, 2026-03-30 at 18:25 +0800, Xu Yilun wrote:
> > On Sat, 2026-03-28 at 00:01 +0800, Xu Yilun wrote:
> > > Add struct tdx_page_array definition for new TDX Module object
> > > types - HPA_ARRAY_T and HPA_LIST_INFO. 
> > 
> > This is unfortunate. I see you agree in the comments.
> 
> Yes, basically they define the same concept and behave mostly the same,
> with some differences...
> 
> > 
> > > 
> > > They are used as input/output
> > > parameters in newly defined SEAMCALLs. Also define some helpers to
> > > allocate, setup and free tdx_page_array.
> > > 
> > > HPA_ARRAY_T and HPA_LIST_INFO are similar in most aspects. They both
> > > represent a list of pages for TDX Module accessing. There are several
> > > use cases for these 2 structures:
> > > 
> > >  - As SEAMCALL inputs. They are claimed by TDX Module as control pages.
> > >    Control pages are private pages for TDX Module to hold its internal
> > >    control structures or private data. TDR, TDCS, TDVPR... are existing
> > >    control pages, just not added via tdx_page_array.
> > >  - As SEAMCALL outputs. They were TDX Module control pages and now are
> > >    released.
> > >  - As SEAMCALL inputs. They are just temporary buffers for exchanging
> > >    data blobs in one SEAMCALL. TDX Module will not hold them for long
> > >    time.
> > 
> > This is kind of verbose for what it seems to be trying to say. It's just
> > that
> 
> I assume you feel the explanation of "what is a control page" is off
> track. I added it because the term first appears in x86 here (previously
> only in KVM TDX), and people asked for its definition:
> 
> https://lore.kernel.org/all/cfcfb160-fcd2-4a75-9639-5f7f0894d14b@intel.com/

I meant it more generally.

> 
> > these types can be input or output params. The TDX module could hold on to
> > the
> > pages for a long time, or just transiently.
> 
> Mm.. I'm trying to sum up the kernel API level flow:
> 
> For control pages, they are held by the TDX Module for a long time, so the
> host inputs the page array and the TDX Module later outputs the page array
> back. The host needs to verify the outputs.
> 
> For shared pages, the TDX Module's access is transient within one SEAMCALL,
> so the array is input only; the TDX Module never needs to output it.
> 
> I think the verbosity makes the following pseudo code easier to
> understand.
> 
> > For that latter part I think you are
> > trying to say sometimes they need flushing and sometimes they don't?
> 
> Yeah.
> control pages => long term => host verifies and releases => flush on release
> shared pages => transient => no verify and releases => no flush
> 
> Maybe I should mention that the flushing is already covered by the
> releasing kAPI.

I hear:
 1. Long time vs short time
 2. Accessed as private vs "shared"

I think (2) is the important point, right? Why does (1) matter?

> > 
> > > 
> > > The 2 structures both need a 'root page' which contains a list of HPAs.
> > > They collapse the HPA of the root page and the number of valid HPAs
> > > into a 64 bit raw value for SEAMCALL parameters. The root page is
> > > always a medium for passing data pages, TDX Module never keeps the
> > > root page.
> > > 
> > > A main difference is HPA_ARRAY_T requires singleton mode when
> > > containing just 1 functional page (page0). In this mode the root page is
> > > not needed and the HPA field of the raw value directly points to the
> > > page0. But in this patch, root page is always allocated for user
> > > friendly kAPIs.
> > 
> > "singleton mode"? What is it? If it's the case of not needing populate loop,
> > it
> 
> It is a SEAMCALL level detail of HPA_ARRAY_T. It is literally as
> explained above - the HPA field should be filled with page0, not the root
> page.
> 
> > probably deserves more explanation. I'm not sure, but the populate loop
> > seems to
> > drive a lot of the struct design?
> 
> The caller is not aware of singleton mode. Actually, I'm trying to make
> the tdx_page_array independent of HPA_ARRAY_T or HPA_LIST_INFO details
> when allocating/populating; the root page is still populated even when
> it's not needed for singleton mode. The differences only show up when
> collapsing the struct into the u64 SEAMCALL parameters.

It seems tdx_page_array combines two concepts. An array of pages, and the method
that the pages get handed to the TDX module. What if we broke apart these
concepts?

> 
> > 
> > > 
> > > Another small difference is HPA_LIST_INFO contains a 'first entry' field
> > > which could be filled by TDX Module. This simplifies host by providing
> > > the same structure when re-invoke the interrupted SEAMCALL. No need for
> > > host to touch this field.
> > 
> > Not clear what this is talking about. But I'm starting to wonder if we
> > should be
> > so bold to claim that the differences between the types really simplify the
> > host. 
> 
> I'm talking about another SEAMCALL level detail. Sometimes the TDX Module
> gets interrupted in the middle of page array processing and needs an
> anchor to resume from where it stopped; the TDX Module records that
> anchor in the 'first entry'.
> 
> By illustrating these SEAMCALL level differences, I want to explain that
> they don't impact the general SW flow and that the kAPI handles them
> internally.
> 
> Yes, in the POC code we did write dedicated code for each type, but it
> ends up with plenty of similar logic on the caller side for root page
> manipulation. For now the differences are small, but I think we should
> not write a copy for every type; we should stop adding new types.

Hmm, doesn't it seem like this is quickly becoming complicated though, to
combine all the different types together? And it seems there are more coming
that people want to add to this.

> 
> Please allow me to stop here, will continue later...
> 
> Thanks. 

Re: [PATCH v2 03/31] x86/virt/tdx: Add tdx_page_array helpers for new TDX Module objects
Posted by Tony Lindgren 1 day, 8 hours ago
On Mon, Mar 30, 2026 at 11:25:08PM +0000, Edgecombe, Rick P wrote:
> On Mon, 2026-03-30 at 18:25 +0800, Xu Yilun wrote:
> > The caller is not aware of singleton mode. Actually, I'm trying to make
> > the tdx_page_array independent of HPA_ARRAY_T or HPA_LIST_INFO details
> > when allocating/populating; the root page is still populated even when
> > it's not needed for singleton mode. The differences only show up when
> > collapsing the struct into the u64 SEAMCALL parameters.
> 
> It seems tdx_page_array combines two concepts. An array of pages, and the method
> that the pages get handed to the TDX module. What if we broke apart these
> concepts?

We could add an enum for the LIST_INFO type to initialize the tdx_page_array?

Then the code using the tdx_page_array could initialize the root page based on
the LIST_INFO type for the SEAMCALL.

Regards,

Tony
Re: [PATCH v2 03/31] x86/virt/tdx: Add tdx_page_array helpers for new TDX Module objects
Posted by Tony Lindgren 2 days, 9 hours ago
On Mon, Mar 30, 2026 at 11:25:08PM +0000, Edgecombe, Rick P wrote:
> On Mon, 2026-03-30 at 18:25 +0800, Xu Yilun wrote:
> > Yes, in the POC code we did write dedicated code for each type, but it
> > ends up with plenty of similar logic on the caller side for root page
> > manipulation. For now the differences are small, but I think we should
> > not write a copy for every type; we should stop adding new types.
> 
> Hmm, doesn't it seem like this is quickly becoming complicated though, to
> combine all the different types together? And it seems there are more coming
> that people want to add to this.

Most of the list entry types are the same with just some bits unused.

Regards,

Tony