From nobody Mon Jun 8 22:51:56 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 636C62E1722; Tue, 26 May 2026 02:35:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762926; cv=none; b=Sprp1p70cWmzKA2EdUcK20POCyo+/4qrxzG/KEawSMyF0ck9rUn9Uk3OrSt2ORpGGUxu7uYnN2u7JYKjvLB0DrU7Trr45k0RtSnTK8wj0j7qKp1jeLwYnYmzICOwOIrEr3yEQahZs/Vu5wg0jbWFUkSmDiCjeWKkcZtuZNmI57Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762926; c=relaxed/simple; bh=SEpamUmtYnjcMiPjIUMQus5ffGo4RJ6Mzn/ClsVQS6k=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=FdEKz22TJZLqKvX37r9dgfO+TgmkcFVwgmg6OLJ4vK4lQFf5N3HdThlOkwIJUxaVkyCDMv6rlkg2HBnY2A+ttj4XPlMztgpQLt+gdViip10p8UCRzSa4bmDF6KOyyBojcCm+RK5Hq+kO1P1OwnfBM7dXXap969GTA/qq5CVweVo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=gqeFl+wz; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="gqeFl+wz" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1779762924; x=1811298924; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=SEpamUmtYnjcMiPjIUMQus5ffGo4RJ6Mzn/ClsVQS6k=; 
b=gqeFl+wzXJw6pAWN1Wo0riYuLMXR9h9CkLSJBk2X8ftYt9xslaXyDlCX Ew57moTJldiqpWBbWHzLl+ZDDb9A3tbFXCAcnTSbuN4tN0iYJ+wmjfyIF dLv40yvzZje82VIFgcsA1VnWImJTI2CkisfehAoF8FA9pQrwQ1POGu4AA KFWBdi15vmiBFCgKYsjP0/kWS20jeoJN6dX8eRZve5twGJWMjXFM06sCD O4yEfGEAgh6BiQ6A2ZoMIC4KMeGZBOLR80N45Chf/pBt/l2I2JVe4/CW5 f7vckg2+HLqhE5vhu8WDgQqMEMU/nci2HOcRhYNwUWNsfUNwfTRpleO/7 g==; X-CSE-ConnectionGUID: btvjyVpdQnuq7XHtIY5wGw== X-CSE-MsgGUID: zYFR1I4XRiiHGnsZ/n7KYA== X-IronPort-AV: E=McAfee;i="6800,10657,11797"; a="91677776" X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="91677776" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:23 -0700 X-CSE-ConnectionGUID: PNVu3DbYSY6pP/X2tya8PQ== X-CSE-MsgGUID: lN5ATs/uQeuvO7f/Uku0Yw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="279878286" Received: from rpedgeco-desk.jf.intel.com ([10.88.27.139]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:24 -0700 From: Rick Edgecombe To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com, kas@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, mingo@redhat.com, nik.borisov@suse.com, pbonzini@redhat.com, seanjc@google.com, tglx@kernel.org, vannapurve@google.com, x86@kernel.org, chao.gao@intel.com, yan.y.zhao@intel.com, kai.huang@intel.com Cc: rick.p.edgecombe@intel.com, Binbin Wu Subject: [PATCH v6 01/11] x86/virt/tdx: Simplify tdmr_get_pamt_sz() Date: Mon, 25 May 2026 19:35:05 -0700 Message-ID: <20260526023515.288829-2-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260526023515.288829-1-rick.p.edgecombe@intel.com> References: <20260526023515.288829-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable 
Content-Type: text/plain; charset="utf-8"

For each memory region that the TDX module might use (called TDMR),
three separate traditional PAMT allocations are needed, one for each
supported page size (1GB, 2MB, 4KB). These store information on each
page in the TDMR. In Linux, they are allocated out of one physically
contiguous block, in order to more efficiently use some internal TDX
module bookkeeping resources. So some simple math is needed to break the
single large allocation into three smaller allocations, one for each
page size.

There are some commonalities in the math needed to calculate the base
and size for each smaller allocation, and so an effort was made to share
logic across the three. Unfortunately, doing this turned out to be
unnaturally tortured, with a loop iterating over the three page sizes,
only to call into a function with a case statement for each page size.
In the future, Dynamic PAMT will add more logic that is special to the
4KB page size, making the benefit of the math sharing even more
questionable.

Three is not a very high number, so get rid of the loop and just
duplicate the small calculation three times. In doing so, set up for
future Dynamic PAMT changes.

Since the loop that iterates over it is gone, further simplify the code
by dropping the array of intermediate size and base storage. Just store
the values to their final locations. Accept the small complication of
relying on tdmr->pamt_4k_base still being zero in the error path, so
that tdmr_do_pamt_func() will not try to operate on the TDMR struct when
attempting to free it.

Assisted-by: GitHub Copilot:claude-opus-4-6 Claude:claude-opus-4-7
Reviewed-by: Binbin Wu
Signed-off-by: Rick Edgecombe
Reviewed-by: Kiryl Shutsemau (Meta)
Reviewed-by: Tony Lindgren
---
v6:
 - Drop {} by moving a comment (Binbin)
 - Log tweaks

v4:
 - Just refer to global var instead of passing pamt_entry_size around
   (Xiaoyao)
 - Remove setting pamt_4k_base to zero, because it already is zero.
Adjust the comment appropriately (Kai) v3: - New patch --- arch/x86/virt/vmx/tdx/tdx.c | 93 ++++++++++++------------------------- 1 file changed, 29 insertions(+), 64 deletions(-) diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 967482ae3c801..487f389f52f4b 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -516,31 +516,21 @@ static __init int fill_out_tdmrs(struct list_head *tm= b_list, * Calculate PAMT size given a TDMR and a page size. The returned * PAMT size is always aligned up to 4K page boundary. */ -static __init unsigned long tdmr_get_pamt_sz(struct tdmr_info *tdmr, int p= gsz, - u16 pamt_entry_size) +static __init unsigned long tdmr_get_pamt_sz(struct tdmr_info *tdmr, int p= gsz) { unsigned long pamt_sz, nr_pamt_entries; + const int tdx_pg_size_shift[] =3D { PAGE_SHIFT, PMD_SHIFT, PUD_SHIFT }; + const u16 pamt_entry_size[TDX_PS_NR] =3D { + tdx_sysinfo.tdmr.pamt_4k_entry_size, + tdx_sysinfo.tdmr.pamt_2m_entry_size, + tdx_sysinfo.tdmr.pamt_1g_entry_size, + }; =20 - switch (pgsz) { - case TDX_PS_4K: - nr_pamt_entries =3D tdmr->size >> PAGE_SHIFT; - break; - case TDX_PS_2M: - nr_pamt_entries =3D tdmr->size >> PMD_SHIFT; - break; - case TDX_PS_1G: - nr_pamt_entries =3D tdmr->size >> PUD_SHIFT; - break; - default: - WARN_ON_ONCE(1); - return 0; - } + nr_pamt_entries =3D tdmr->size >> tdx_pg_size_shift[pgsz]; + pamt_sz =3D nr_pamt_entries * pamt_entry_size[pgsz]; =20 - pamt_sz =3D nr_pamt_entries * pamt_entry_size; /* TDX requires PAMT size must be 4K aligned */ - pamt_sz =3D ALIGN(pamt_sz, PAGE_SIZE); - - return pamt_sz; + return PAGE_ALIGN(pamt_sz); } =20 /* @@ -578,28 +568,21 @@ static __init int tdmr_get_nid(struct tdmr_info *tdmr= , struct list_head *tmb_lis * within @tdmr, and set up PAMTs for @tdmr. 
*/ static __init int tdmr_set_up_pamt(struct tdmr_info *tdmr, - struct list_head *tmb_list, - u16 pamt_entry_size[]) + struct list_head *tmb_list) { - unsigned long pamt_base[TDX_PS_NR]; - unsigned long pamt_size[TDX_PS_NR]; - unsigned long tdmr_pamt_base; unsigned long tdmr_pamt_size; struct page *pamt; - int pgsz, nid; - + int nid; nid =3D tdmr_get_nid(tdmr, tmb_list); =20 /* * Calculate the PAMT size for each TDX supported page size * and the total PAMT size. */ - tdmr_pamt_size =3D 0; - for (pgsz =3D TDX_PS_4K; pgsz < TDX_PS_NR; pgsz++) { - pamt_size[pgsz] =3D tdmr_get_pamt_sz(tdmr, pgsz, - pamt_entry_size[pgsz]); - tdmr_pamt_size +=3D pamt_size[pgsz]; - } + tdmr->pamt_4k_size =3D tdmr_get_pamt_sz(tdmr, TDX_PS_4K); + tdmr->pamt_2m_size =3D tdmr_get_pamt_sz(tdmr, TDX_PS_2M); + tdmr->pamt_1g_size =3D tdmr_get_pamt_sz(tdmr, TDX_PS_1G); + tdmr_pamt_size =3D tdmr->pamt_4k_size + tdmr->pamt_2m_size + tdmr->pamt_1= g_size; =20 /* * Allocate one chunk of physically contiguous memory for all @@ -607,26 +590,18 @@ static __init int tdmr_set_up_pamt(struct tdmr_info *= tdmr, * in overlapped TDMRs. */ pamt =3D alloc_contig_pages(tdmr_pamt_size >> PAGE_SHIFT, GFP_KERNEL, - nid, &node_online_map); + nid, &node_online_map); + + /* + * tdmr->pamt_4k_base is still zero so the error + * path of the caller will skip freeing the pamt. + */ if (!pamt) return -ENOMEM; =20 - /* - * Break the contiguous allocation back up into the - * individual PAMTs for each page size. 
- */ - tdmr_pamt_base =3D page_to_pfn(pamt) << PAGE_SHIFT; - for (pgsz =3D TDX_PS_4K; pgsz < TDX_PS_NR; pgsz++) { - pamt_base[pgsz] =3D tdmr_pamt_base; - tdmr_pamt_base +=3D pamt_size[pgsz]; - } - - tdmr->pamt_4k_base =3D pamt_base[TDX_PS_4K]; - tdmr->pamt_4k_size =3D pamt_size[TDX_PS_4K]; - tdmr->pamt_2m_base =3D pamt_base[TDX_PS_2M]; - tdmr->pamt_2m_size =3D pamt_size[TDX_PS_2M]; - tdmr->pamt_1g_base =3D pamt_base[TDX_PS_1G]; - tdmr->pamt_1g_size =3D pamt_size[TDX_PS_1G]; + tdmr->pamt_4k_base =3D page_to_phys(pamt); + tdmr->pamt_2m_base =3D tdmr->pamt_4k_base + tdmr->pamt_4k_size; + tdmr->pamt_1g_base =3D tdmr->pamt_2m_base + tdmr->pamt_2m_size; =20 return 0; } @@ -657,10 +632,7 @@ static __init void tdmr_do_pamt_func(struct tdmr_info = *tdmr, tdmr_get_pamt(tdmr, &pamt_base, &pamt_size); =20 /* Do nothing if PAMT hasn't been allocated for this TDMR */ - if (!pamt_size) - return; - - if (WARN_ON_ONCE(!pamt_base)) + if (!pamt_base) return; =20 pamt_func(pamt_base, pamt_size); @@ -686,14 +658,12 @@ static __init void tdmrs_free_pamt_all(struct tdmr_in= fo_list *tdmr_list) =20 /* Allocate and set up PAMTs for all TDMRs */ static __init int tdmrs_set_up_pamt_all(struct tdmr_info_list *tdmr_list, - struct list_head *tmb_list, - u16 pamt_entry_size[]) + struct list_head *tmb_list) { int i, ret =3D 0; =20 for (i =3D 0; i < tdmr_list->nr_consumed_tdmrs; i++) { - ret =3D tdmr_set_up_pamt(tdmr_entry(tdmr_list, i), tmb_list, - pamt_entry_size); + ret =3D tdmr_set_up_pamt(tdmr_entry(tdmr_list, i), tmb_list); if (ret) goto err; } @@ -970,18 +940,13 @@ static __init int construct_tdmrs(struct list_head *t= mb_list, struct tdmr_info_list *tdmr_list, struct tdx_sys_info_tdmr *sysinfo_tdmr) { - u16 pamt_entry_size[TDX_PS_NR] =3D { - sysinfo_tdmr->pamt_4k_entry_size, - sysinfo_tdmr->pamt_2m_entry_size, - sysinfo_tdmr->pamt_1g_entry_size, - }; int ret; =20 ret =3D fill_out_tdmrs(tmb_list, tdmr_list); if (ret) return ret; =20 - ret =3D tdmrs_set_up_pamt_all(tdmr_list, tmb_list, 
pamt_entry_size); + ret =3D tdmrs_set_up_pamt_all(tdmr_list, tmb_list); if (ret) return ret; =20 --=20 2.54.0
From nobody Mon Jun 8 22:51:56 2026
From: Rick Edgecombe
To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com, kas@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, mingo@redhat.com, nik.borisov@suse.com, pbonzini@redhat.com, seanjc@google.com, tglx@kernel.org, vannapurve@google.com, x86@kernel.org, chao.gao@intel.com, yan.y.zhao@intel.com, kai.huang@intel.com
Cc: rick.p.edgecombe@intel.com, "Kirill A.
Shutemov" , Binbin Wu
Subject: [PATCH v6 02/11] x86/virt/tdx: Allocate page bitmap for Dynamic PAMT
Date: Mon, 25 May 2026 19:35:06 -0700
Message-ID: <20260526023515.288829-3-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526023515.288829-1-rick.p.edgecombe@intel.com>
References: <20260526023515.288829-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: "Kirill A. Shutemov"

The TDX Physical Address Metadata Table (PAMT) holds data about the
physical memory used by TDX, and must be allocated by the kernel during
TDX module initialization.

The exact size of the required PAMT memory is determined by the TDX
module and may vary between TDX module versions. Currently it is
approximately 0.4% of the system memory. This is a significant
commitment, especially if it is not known upfront whether the machine
will run any TDX guests.

Each memory region that the TDX module might use needs three separate
PAMT allocations, one for each supported page size (1GB, 2MB, 4KB).

The TDX module supports a new feature designed to reduce PAMT overhead
called Dynamic PAMT. At a high level, Dynamic PAMT still has the 1GB and
2MB levels allocated on TDX module initialization, but the 4KB level is
allocated dynamically during runtime. However, in the details, Dynamic
PAMT still needs some smaller per 4KB page scoped data (currently it is
1 bit per page). The TDX module exposes the number of bits as a piece of
metadata separate from the 4KB static allocation for regular PAMT.
Although the size is enumerated differently, it is handed to the TDX
module in the same way the 4KB page size PAMT allocation is for regular,
non-dynamic PAMT.

Begin to implement Dynamic PAMT in the kernel by reading the
bits-per-page needed for Dynamic PAMT.
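As a rough illustration of the sizing described above, the following
userspace sketch compares the static 4KB-level PAMT size with the
Dynamic PAMT bitmap size for one TDMR. The sketch_* names and SKETCH_*
constants are hypothetical, and the 16-byte entry size and 1GB TDMR
used in the note below are assumptions for illustration only; the real
values are enumerated by the TDX module.

```c
#include <stdint.h>

/* Assumed 4KB base page size, mirroring PAGE_SHIFT on x86. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1ULL << SKETCH_PAGE_SHIFT)
#define SKETCH_BITS_PER_BYTE	8

/* Round x up to the next multiple of the page size. */
static uint64_t sketch_page_align(uint64_t x)
{
	return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Static 4KB-level PAMT: one entry of entry_size bytes per 4KB page. */
static uint64_t sketch_pamt_4k_sz(uint64_t tdmr_size, uint16_t entry_size)
{
	uint64_t nr_entries = tdmr_size >> SKETCH_PAGE_SHIFT;

	return sketch_page_align(nr_entries * entry_size);
}

/* Dynamic PAMT bitmap: bits_per_entry bits per 4KB page. */
static uint64_t sketch_pamt_bitmap_sz(uint64_t tdmr_size, uint8_t bits_per_entry)
{
	uint64_t nr_entries = tdmr_size >> SKETCH_PAGE_SHIFT;
	uint64_t bits = nr_entries * bits_per_entry;
	uint64_t bytes = (bits + SKETCH_BITS_PER_BYTE - 1) / SKETCH_BITS_PER_BYTE;

	return sketch_page_align(bytes);
}
```

Under these assumptions, a 1GB TDMR would need 4MB of static 4KB-level
PAMT (262144 entries of 16 bytes), but only 32KB of bitmap at 1 bit per
page.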
Calculate the size needed for the bitmap, and use it instead of the 4KB
size determined for normal PAMT, in the case of Dynamic PAMT.

Unlike the existing metadata reading code, this code is not generated by
a script. So adjust the comment to be more generic. Also, start to adopt
a more normal kernel code style without the ternary statements and the
assignments inside if conditionals that the auto-generated code has.

Assisted-by: Sashiko:claude-opus-4-6
Reviewed-by: Binbin Wu
Signed-off-by: Kirill A. Shutemov
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Reviewed-by: Tony Lindgren
---
v6:
 - Improve comment (Binbin)
 - Log tweaks
 - Mark tdmr_get_pamt_bitmap_sz() __init in response to upstream changes
 - Switch to more normal kernel code style, even though it differs from
   the existing auto-generated code.
---
 arch/x86/include/asm/tdx.h | 5 +++++ arch/x86/include/asm/tdx_global_metadata.h | 3 +++ arch/x86/virt/vmx/tdx/tdx.c | 19 ++++++++++++++++++- arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 21 ++++++++++++++++++++- 4 files changed, 46 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 503f9a3f46d61..82dc27aecf297 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -149,6 +149,11 @@ static __always_inline u64 sc_retry(sc_func_t func, u6= 4 fn, const char *tdx_dump_mce_info(struct mce *m); const struct tdx_sys_info *tdx_get_sysinfo(void); =20 +static inline bool tdx_supports_dynamic_pamt(const struct tdx_sys_info *sy= sinfo) +{ + return false; /* To be enabled when kernel is ready */ +} + int tdx_guest_keyid_alloc(void); u32 tdx_get_nr_guest_keyids(void); void tdx_guest_keyid_free(unsigned int keyid); diff --git a/arch/x86/include/asm/tdx_global_metadata.h b/arch/x86/include/= asm/tdx_global_metadata.h index 40689c8dc67eb..88040ddb51af4 100644 --- a/arch/x86/include/asm/tdx_global_metadata.h +++ b/arch/x86/include/asm/tdx_global_metadata.h @@ -21,6 +21,9 @@ struct tdx_sys_info_tdmr { u16 
pamt_4k_entry_size; u16 pamt_2m_entry_size; u16 pamt_1g_entry_size; + + /* Optional metadata, if Dynamic PAMT is supported */ + u8 pamt_page_bitmap_entry_bits; }; =20 struct tdx_sys_info_td_ctrl { diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 487f389f52f4b..9ebd192cb5c17 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -512,6 +512,18 @@ static __init int fill_out_tdmrs(struct list_head *tmb= _list, return 0; } =20 +static __init unsigned long tdmr_get_pamt_bitmap_sz(struct tdmr_info *tdmr) +{ + unsigned long pamt_sz, nr_pamt_entries; + int bits_per_entry; + + bits_per_entry =3D tdx_sysinfo.tdmr.pamt_page_bitmap_entry_bits; + nr_pamt_entries =3D tdmr->size >> PAGE_SHIFT; + pamt_sz =3D DIV_ROUND_UP(nr_pamt_entries * bits_per_entry, BITS_PER_BYTE); + + return PAGE_ALIGN(pamt_sz); +} + /* * Calculate PAMT size given a TDMR and a page size. The returned * PAMT size is always aligned up to 4K page boundary. @@ -579,7 +591,12 @@ static __init int tdmr_set_up_pamt(struct tdmr_info *t= dmr, * Calculate the PAMT size for each TDX supported page size * and the total PAMT size. 
*/ - tdmr->pamt_4k_size =3D tdmr_get_pamt_sz(tdmr, TDX_PS_4K); + if (tdx_supports_dynamic_pamt(&tdx_sysinfo)) { + /* With Dynamic PAMT, PAMT_4K is replaced with a bitmap */ + tdmr->pamt_4k_size =3D tdmr_get_pamt_bitmap_sz(tdmr); + } else { + tdmr->pamt_4k_size =3D tdmr_get_pamt_sz(tdmr, TDX_PS_4K); + } tdmr->pamt_2m_size =3D tdmr_get_pamt_sz(tdmr, TDX_PS_2M); tdmr->pamt_1g_size =3D tdmr_get_pamt_sz(tdmr, TDX_PS_1G); tdmr_pamt_size =3D tdmr->pamt_4k_size + tdmr->pamt_2m_size + tdmr->pamt_1= g_size; diff --git a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c b/arch/x86/virt/vm= x/tdx/tdx_global_metadata.c index c7db393a9cfb1..7e8e913463be1 100644 --- a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c +++ b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 /* - * Automatically generated functions to read TDX global metadata. + * Functions to read TDX global metadata. * * This file doesn't compile on its own as it lacks of inclusion * of SEAMCALL wrapper primitive which reads global metadata. @@ -33,6 +33,18 @@ static __init int get_tdx_sys_info_features(struct tdx_s= ys_info_features *sysinf return ret; } =20 +static __init int get_tdx_sys_info_tdmr_dpamt(struct tdx_sys_info_tdmr *sy= sinfo_tdmr) +{ + int ret; + u64 val; + + ret =3D read_sys_metadata_field(0x9100000100000013, &val); + if (!ret) + sysinfo_tdmr->pamt_page_bitmap_entry_bits =3D val; + + return ret; +} + static __init int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr *sysinfo_= tdmr) { int ret =3D 0; @@ -116,5 +128,12 @@ static __init int get_tdx_sys_info(struct tdx_sys_info= *sysinfo) ret =3D ret ?: get_tdx_sys_info_td_ctrl(&sysinfo->td_ctrl); ret =3D ret ?: get_tdx_sys_info_td_conf(&sysinfo->td_conf); =20 + /* + * Don't treat a module that doesn't support Dynamic PAMT + * as a failure. Only read the metadata optionally. 
+ */ + if (!ret && tdx_supports_dynamic_pamt(sysinfo)) + ret =3D get_tdx_sys_info_tdmr_dpamt(&sysinfo->tdmr); + return ret; } --=20 2.54.0
From nobody Mon Jun 8 22:51:56 2026
From: Rick Edgecombe
To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com, kas@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, mingo@redhat.com, nik.borisov@suse.com, pbonzini@redhat.com, seanjc@google.com, tglx@kernel.org, vannapurve@google.com, x86@kernel.org, chao.gao@intel.com, yan.y.zhao@intel.com, kai.huang@intel.com
Cc: rick.p.edgecombe@intel.com, "Kirill A.
Shutemov"
Subject: [PATCH v6 03/11] x86/virt/tdx: Add tdx_alloc/free_control_page() helpers
Date: Mon, 25 May 2026 19:35:07 -0700
Message-ID: <20260526023515.288829-4-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526023515.288829-1-rick.p.edgecombe@intel.com>
References: <20260526023515.288829-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: "Kirill A. Shutemov"

Add helpers to use when allocating or preparing pages that are handed to
the TDX-Module for use as control/S-EPT pages, and thus need Dynamic
PAMT adjustments.

The TDX module tracks some state for each page of physical memory that
it might use. It calls this state the PAMT. It includes separate state
for each page size a physical page could be utilized at within the TDX
module (1GB, 2MB, 4KB). In Dynamic PAMT, only the 4KB page size state is
allocated dynamically. So for pages that TDX will use as 2MB physically
contiguous pages, Dynamic PAMT backing is not needed.

KVM will need to hand pages to the TDX module that it will use at 4KB
granularity. So these pages will need Dynamic PAMT backing added before
they are used by the TDX module, and removed afterwards.

Add tdx_alloc_control_page() and tdx_free_control_page() to handle both
page allocation and Dynamic PAMT installation. Make them behave like
normal alloc/free functions where allocation can fail in the case of no
memory, but free (with any necessary Dynamic PAMT release) always
succeeds. Do this so they can support the existing TDX flows that
require teardowns to succeed.

Also create tdx_pamt_get/put() to handle installing Dynamic PAMT 4KB
backing for pages that are already allocated (such as KVM's use of S-EPT
page tables or guest private memory).
Have them take a pfn instead of a struct page, as future changes will
want to use these helpers for guest pages which are tracked by PFN.

Don't CLFLUSH the Dynamic PAMT pages handed to the TDX module, as is
done for some other SEAMCALLs, as the TDX docs specify that this is only
needed on "TD private memory or TD control structure page".

Since these allocations will be easily user-triggerable, account the
memory.

Leave logic to handle concurrency issues for future changes.

Assisted-by: GitHub Copilot:claude-opus-4-6 Claude:claude-opus-4-7 Sashiko:claude-opus-4-6
Signed-off-by: Kirill A. Shutemov
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Reviewed-by: Binbin Wu
Reviewed-by: Tony Lindgren
---
v6:
The major change was to split out the concurrency stuff into a future
patch. It makes it easier to explain in the log. This one is the basic
functionality. The next patch adds the simple version of the concurrency
handling and explains why. The other major change was to get rid of the
dynamically sized DPAMT backing support, which was not based on a formal
spec.
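The fixed-size backing allocation can be sketched in userspace as
follows. This is an illustrative analogue rather than the kernel code:
malloc()/free() stand in for alloc_page()/__free_page(), and the
sketch_* names and SKETCH_* constants are hypothetical. It shows the
unwind loop freeing only the entries that were successfully allocated
before the failure.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumed fixed number of backing pages per 2MB region. */
#define SKETCH_DPAMT_PAGE_CNT		2
#define SKETCH_BACKING_PAGE_SIZE	4096

/* Allocate every slot of the fixed-size array, or none of them. */
static int sketch_alloc_pamt_array(void **pages)
{
	int i, j;

	for (i = 0; i < SKETCH_DPAMT_PAGE_CNT; i++) {
		pages[i] = malloc(SKETCH_BACKING_PAGE_SIZE);
		if (!pages[i])
			goto err;
	}

	return 0;
err:
	/* Unwind with a separate index: free only slots 0..i-1. */
	for (j = 0; j < i; j++) {
		free(pages[j]);
		pages[j] = NULL;
	}
	return -ENOMEM;
}

/* Release every slot of a fully allocated array. */
static void sketch_free_pamt_array(void **pages)
{
	for (int i = 0; i < SKETCH_DPAMT_PAGE_CNT; i++) {
		free(pages[i]);
		pages[i] = NULL;
	}
}
```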
Details:
 - Split out concurrency stuff into next patch because the log was too
   long
 - Switch to fixed size pamt page arrays (Nikolay)
 - Rename tdx_alloc_page()/tdx_free_page() to
   tdx_alloc_control_page()/tdx_free_control_page() to reflect
   control/S-EPT purpose (Sean)
 - Take gfp from the caller in tdx_alloc_control_page() (Sean)
 - Narrow external API: make tdx_pamt_get()/tdx_pamt_put() static and
   export only tdx_alloc_control_page()/tdx_free_control_page() (note:
   dropped inline helpers since the discussion on Sean's series resulted
   in them not being needed)
 - Switch EXPORT_SYMBOL_GPL to EXPORT_SYMBOL_FOR_KVM (Sean)
 - Use WARN_ON_ONCE() instead of pr_err() for TDX module failures (Sean)
 - Fold alloc_pamt_array()/free_pamt_array() helpers back in and fix the
   error-unwind index bug (dpamt_pages[i] -> [j])
 - Adjustments after struct page->pfn
 - Adjustments from dropping error helper patches
 - Make the free error paths more normal
 - Drop gfp_t arg in tdx_alloc_control_page(). In the Sean mega v5, it
   was really needed because the kvm_mmu_memory_cache had a gfp_t it
   needed something to do with. But this was still weird because that
   version didn't handle allocating the DPAMT pages as the gfp_t. And in
   the end all the callers pass GFP_KERNEL_ACCOUNT. So just drop the arg.
- Log tweaks --- arch/x86/include/asm/tdx.h | 7 ++ arch/x86/virt/vmx/tdx/tdx.c | 159 ++++++++++++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 2 + 3 files changed, 168 insertions(+) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 82dc27aecf297..74e75db5728c7 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -37,6 +37,7 @@ =20 #include #include +#include #include =20 /* @@ -160,6 +161,12 @@ void tdx_guest_keyid_free(unsigned int keyid); =20 void tdx_quirk_reset_paddr(unsigned long base, unsigned long size); =20 +/* Number PAMT pages to be provided to TDX module per 2MB region of PA */ +#define TDX_DPAMT_ENTRY_PAGE_CNT 2 + +struct page *tdx_alloc_control_page(void); +void tdx_free_control_page(struct page *page); + struct tdx_td { /* TD root structure: */ struct page *tdr_page; diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 9ebd192cb5c17..9e0812d87ab06 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1919,6 +1919,165 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, kvm_pfn_t= pfn) } EXPORT_SYMBOL_FOR_KVM(tdh_phymem_page_wbinvd_hkid); =20 +static int alloc_pamt_array(struct page **pamt_pages) +{ + int i, j; + + for (i =3D 0; i < TDX_DPAMT_ENTRY_PAGE_CNT; i++) { + pamt_pages[i] =3D alloc_page(GFP_KERNEL_ACCOUNT); + if (!pamt_pages[i]) + goto err; + } + + return 0; +err: + for (j =3D 0; j < i; j++) + __free_page(pamt_pages[j]); + return -ENOMEM; +} + +static void free_pamt_array(struct page **pamt_pages) +{ + for (int i =3D 0; i < TDX_DPAMT_ENTRY_PAGE_CNT; i++) { + /* + * Reset pages unconditionally to cover cases + * where they were passed to the TDX module. + */ + tdx_quirk_reset_paddr(page_to_phys(pamt_pages[i]), PAGE_SIZE); + + __free_page(pamt_pages[i]); + } +} + +/* + * Calculate the arg needed for operating on the DPAMT backing for + * a given 4KB page. 
+ */ +static u64 pamt_2mb_arg(kvm_pfn_t pfn) +{ + unsigned long hpa_2mb =3D ALIGN_DOWN(pfn << PAGE_SHIFT, PMD_SIZE); + + return hpa_2mb | TDX_PS_2M; +} + +/* Add PAMT backing for the given page. */ +static u64 tdh_phymem_pamt_add(kvm_pfn_t pfn, struct page **pamt_pages) +{ + struct tdx_module_args args =3D { + .rcx =3D pamt_2mb_arg(pfn), + .rdx =3D page_to_phys(pamt_pages[0]), + .r8 =3D page_to_phys(pamt_pages[1]), + }; + + return seamcall(TDH_PHYMEM_PAMT_ADD, &args); +} + +/* Remove PAMT backing for the given page. */ +static u64 tdh_phymem_pamt_remove(kvm_pfn_t pfn, struct page **pamt_pages) +{ + struct tdx_module_args args =3D { + .rcx =3D pamt_2mb_arg(pfn), + }; + u64 ret; + + ret =3D seamcall_ret(TDH_PHYMEM_PAMT_REMOVE, &args); + if (ret) + return ret; + + /* Copy PAMT pages out of the struct per the TDX ABI */ + pamt_pages[0] =3D phys_to_page(args.rdx); + pamt_pages[1] =3D phys_to_page(args.r8); + + return 0; +} + +/* Allocate PAMT memory for the given page */ +static int tdx_pamt_get(kvm_pfn_t pfn) +{ + struct page *pamt_pages[TDX_DPAMT_ENTRY_PAGE_CNT]; + u64 tdx_status; + int ret; + + if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) + return 0; + + ret =3D alloc_pamt_array(pamt_pages); + if (ret) + return ret; + + tdx_status =3D tdh_phymem_pamt_add(pfn, pamt_pages); + if (tdx_status !=3D TDX_SUCCESS) { + ret =3D -EIO; + goto out_free; + } + + return 0; +out_free: + free_pamt_array(pamt_pages); + return ret; +} + +/* Free PAMT memory for the given page */ +static void tdx_pamt_put(kvm_pfn_t pfn) +{ + struct page *pamt_pages[TDX_DPAMT_ENTRY_PAGE_CNT] =3D {}; + u64 tdx_status; + + if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) + return; + + tdx_status =3D tdh_phymem_pamt_remove(pfn, pamt_pages); + + /* + * Don't free pamt_pages as it could hold garbage when + * tdh_phymem_pamt_remove() fails. 
Don't panic/BUG_ON(), as + * there is no risk of data corruption, but do yell loudly as + * failure indicates a kernel bug, memory is being leaked, and + * the dangling PAMT entry may cause future operations to fail. + */ + if (WARN_ON_ONCE(tdx_status !=3D TDX_SUCCESS)) + return; + + free_pamt_array(pamt_pages); +} + +/* + * Return a page that can be gifted to the TDX-Module for use as a "contro= l" + * page, i.e. pages that are used for control and S-EPT structures for a g= iven + * TDX guest, and bound to said guest's HKID and thus obtain TDX protectio= ns, + * including PAMT tracking. + */ +struct page *tdx_alloc_control_page(void) +{ + struct page *page; + + page =3D alloc_page(GFP_KERNEL_ACCOUNT); + if (!page) + return NULL; + + if (tdx_pamt_get(page_to_pfn(page))) { + __free_page(page); + return NULL; + } + + return page; +} +EXPORT_SYMBOL_FOR_KVM(tdx_alloc_control_page); + +/* + * Free a page that was gifted to the TDX-Module for use as a control/S-EPT + * page. After this, the page is no longer protected by TDX. 
+ */ +void tdx_free_control_page(struct page *page) +{ + if (!page) + return; + + tdx_pamt_put(page_to_pfn(page)); + __free_page(page); +} +EXPORT_SYMBOL_FOR_KVM(tdx_free_control_page); + #ifdef CONFIG_KEXEC_CORE void tdx_cpu_flush_cache_for_kexec(void) { diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index dde219c823b41..8c39dde347cc2 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -46,6 +46,8 @@ #define TDH_PHYMEM_PAGE_WBINVD 41 #define TDH_VP_WR 43 #define TDH_SYS_CONFIG 45 +#define TDH_PHYMEM_PAMT_ADD 58 +#define TDH_PHYMEM_PAMT_REMOVE 59 =20 /* * SEAMCALL leaf: --=20 2.54.0 From nobody Mon Jun 8 22:51:56 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6F4EB30C151; Tue, 26 May 2026 02:35:27 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762931; cv=none; b=rONF7nETeqOS4FFGMkr1gQF53kJGBHyGObY34xOR5/VWVsKlF2OXZKNjsduoNY0OuLIiVNnqB2Dda4kz1EWDxITfZF34CxpWQc6aIe2qK+XUw0gnRrfSzdLXjZNMIVmUqer95JJyJQxQ7OaGAUfeM6bfa8nRemv4aDnrr3jJmXU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762931; c=relaxed/simple; bh=UNpc8w7zvAFucwvtcKFcJLYzSYmUAsEKDwnfDUz3AMg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QKQv8xNOc7m+WYwAm0HirC/UbvG0sCBUr5COHoTaFiZcBR6r8yRBAaibDRBFXO+NuP/3fpieq++wJrbEKlq29982Rmh3ihFFGDNGtMFEdDNXTxG1Rls1xIaQ/LjwLqlx2CA5NywCZRaB21nB67MrwgxquF/rRW8nQs2KCOAN8Jk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=VpVA0SGN; arc=none smtp.client-ip=198.175.65.13 
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="VpVA0SGN" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1779762927; x=1811298927; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=UNpc8w7zvAFucwvtcKFcJLYzSYmUAsEKDwnfDUz3AMg=; b=VpVA0SGNTAoSeoF4ey//07pHXhXCZ56L/32zdipABCN7u8jmiPa2cVLI /LE7/+9gB4ku9yYkDzBJjOh4wAz+6vzJ60IOzs9VCzwaTIcLbAafwRtb2 s5kQg+bJan0SFbQgdAgBWV4HBLZuaxnk+781XpOzXY3O01OJRXlZls7Xj cb3EA9XRsbE0rPn6rAt+S62USglpaZ1M5zkGo2bOgDViwW5xgkhOVp8wy +FRn6bTf2KvgS6dtRAXEnCcIqKKzK4ELSx6y47vjch4k2cLUTVfiaJaYL VJSh4mujUJNnFdIwYvwsDAIdd5Cc0jxsySDQh2CeCtVlPoaR5IqrRLhnq A==; X-CSE-ConnectionGUID: ai8ScbH0RZ2zi7MfTu9bAw== X-CSE-MsgGUID: YK8EXVqORJ6t+ZiB/R9EIg== X-IronPort-AV: E=McAfee;i="6800,10657,11797"; a="91677802" X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="91677802" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:24 -0700 X-CSE-ConnectionGUID: FzS/pyoeR7WIjAooc/5Jew== X-CSE-MsgGUID: pwcThJ61QCaTFzB3exMhpQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="279878298" Received: from rpedgeco-desk.jf.intel.com ([10.88.27.139]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:24 -0700 From: Rick Edgecombe To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com, kas@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, mingo@redhat.com, nik.borisov@suse.com, pbonzini@redhat.com, seanjc@google.com, tglx@kernel.org, vannapurve@google.com, x86@kernel.org, 
chao.gao@intel.com, yan.y.zhao@intel.com, kai.huang@intel.com Cc: rick.p.edgecombe@intel.com, "Kirill A. Shutemov" Subject: [PATCH v6 04/11] x86/virt/tdx: Allocate ref counts for Dynamic PAMT memory Date: Mon, 25 May 2026 19:35:08 -0700 Message-ID: <20260526023515.288829-5-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260526023515.288829-1-rick.p.edgecombe@intel.com> References: <20260526023515.288829-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Kirill A. Shutemov" The PAMT memory holds metadata for all possible TDX protected memory. Each physical address range is covered by PAMT entries at three levels (1GB, 2MB, 4KB). With Dynamic PAMT, the 4KB range of PAMT is allocated on demand. The kernel supplies the TDX module with page pairs to store the 4KB entries, which cover 2MB of host physical memory. The kernel must provide this page pair before using pages from the range for TDX. If this is not done, SEAMCALLs that give the pages to be protected by the TDX module will fail. Allocate reference counters for every 2MB range to track TDX memory usage. This can be used to handle concurrent get/put callers, in order to accurately determine when the dynamic 4KB level of Dynamic PAMT needs to be allocated and when it can be freed. This allocation will currently consume 2 MB for every 1 TB of address space from 0 to max_pfn. The allocation size will depend on how the RAM is physically laid out. In a worst case scenario where the entire 52-bit address space is covered this would be 8GB. Then the DPAMT refcount allocations could hypothetically cause the savings from Dynamic PAMT to go negative on exotic platforms with sparse, small amounts of memory. 
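The sizing figures above can be checked with a small userspace sketch. It is not kernel code: pamt_refcount_bytes() and the constants are hypothetical stand-ins that assume x86-64 values (512 4KB pages per 2MB range, a 4-byte atomic_t), mirroring the DIV_ROUND_UP(max_pfn, PTRS_PER_PTE) * sizeof(atomic_t) computation in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 values; illustrative only, not taken from kernel headers. */
#define PFNS_PER_2MB	512ULL	/* 4KB pages per 2MB range (PTRS_PER_PTE) */
#define REFCOUNT_BYTES	4ULL	/* sizeof(atomic_t) */

/*
 * Bytes of refcount storage needed to cover PFNs 0..max_pfn, with one
 * counter per 2MB range, rounding up for a partial last range.
 */
static uint64_t pamt_refcount_bytes(uint64_t max_pfn)
{
	uint64_t nr_ranges;

	/* A 52-bit physical address space has at most 2^40 4KB pages. */
	assert(max_pfn <= (1ULL << 40));

	nr_ranges = (max_pfn + PFNS_PER_2MB - 1) / PFNS_PER_2MB;
	return nr_ranges * REFCOUNT_BYTES;
}
```

With these assumptions, 1 TB of address space (2^28 PFNs) needs 2^19 counters, i.e. 2 MB, and the full 52-bit space (2^40 PFNs) needs 2^31 counters, i.e. 8 GB.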
Future changes could reduce this refcount overhead to be only allocating refcounts for physical ranges that contain memory that TDX can use. However, this is left for future work. Assisted-by: Sashiko:claude-opus-4-6 GitHub Copilot:claude-opus-4-6 Sashiko= :claude-opus-4-6 Signed-off-by: Kirill A. Shutemov Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Reviewed-by: Tony Lindgren --- v6: - Remove confusing reference to allocating PAMT memory in pamt_refcounts comment. (Yan) - Rename "metadata" function names that really deal with refcounts, as metadata already has a different meaning in TDX. - Move tdx_find_pamt_refcount() to this patch to aid in reviewability v4: - Log typo (Binbin) - round correctly when computing PAMT refcount size (Binbin) - Zero refcount vmalloc allocation (Note: This got replaced in optimization patch with a zero-ed allocation, but this showed up in testing with the optimization patches removed. Since it's fixed before this code is exercised, it's not a bisectability issue, but fix it anyway.) v3: - Split out lazily populate optimization to next patch (Dave) - Add comment around pamt_refcounts (Dave) - Improve log --- arch/x86/virt/vmx/tdx/tdx.c | 54 ++++++++++++++++++++++++++++++++++++- 1 file changed, 53 insertions(+), 1 deletion(-) diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 9e0812d87ab06..6658a6be6697c 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -30,6 +30,7 @@ #include #include #include +#include #include #include #include @@ -52,6 +53,14 @@ static DEFINE_PER_CPU(bool, tdx_lp_initialized); =20 static struct tdmr_info_list tdx_tdmr_list; =20 +/* + * On a machine with Dynamic PAMT, the kernel maintains a reference counter + * for every 2M range. The counter indicates how many users there are for + * the PAMT memory of the 2M range. The kernel allocates PAMT refcounts at + * initialization. 
+ */ +static atomic_t *pamt_refcounts; + /* All TDX-usable memory regions. Protected by mem_hotplug_lock. */ static LIST_HEAD(tdx_memlist); =20 @@ -254,6 +263,43 @@ static struct syscore tdx_syscore =3D { .ops =3D &tdx_syscore_ops, }; =20 +/* + * Allocate PAMT reference counters for all physical memory. + * + * It consumes 2MiB for every 1TiB of physical memory. + */ +static int init_pamt_refcounts(void) +{ + size_t size =3D DIV_ROUND_UP(max_pfn, PTRS_PER_PTE) * sizeof(*pamt_refcou= nts); + + if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) + return 0; + + pamt_refcounts =3D __vmalloc(size, GFP_KERNEL | __GFP_ZERO); + if (!pamt_refcounts) + return -ENOMEM; + + return 0; +} + +static void free_pamt_refcounts(void) +{ + if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) + return; + + vfree(pamt_refcounts); + pamt_refcounts =3D NULL; +} + +/* Find PAMT refcount for a given physical address */ +static atomic_t * __maybe_unused tdx_find_pamt_refcount(unsigned long pfn) +{ + /* Find which PMD a PFN is in. */ + unsigned long index =3D pfn >> (PMD_SHIFT - PAGE_SHIFT); + + return &pamt_refcounts[index]; +} + /* * Add a memory region as a TDX memory block. 
The caller must make sure * all memory regions are added in address ascending order and don't @@ -1151,10 +1197,14 @@ static __init int init_tdx_module(void) */ get_online_mems(); =20 - ret =3D build_tdx_memlist(&tdx_memlist); + ret =3D init_pamt_refcounts(); if (ret) goto out_put_tdxmem; =20 + ret =3D build_tdx_memlist(&tdx_memlist); + if (ret) + goto err_free_pamt_refcounts; + /* Allocate enough space for constructing TDMRs */ ret =3D alloc_tdmr_list(&tdx_tdmr_list, &tdx_sysinfo.tdmr); if (ret) @@ -1204,6 +1254,8 @@ static __init int init_tdx_module(void) free_tdmr_list(&tdx_tdmr_list); err_free_tdxmem: free_tdx_memlist(&tdx_memlist); +err_free_pamt_refcounts: + free_pamt_refcounts(); goto out_put_tdxmem; } =20 --=20 2.54.0 From nobody Mon Jun 8 22:51:56 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 33D6F30E83A; Tue, 26 May 2026 02:35:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762935; cv=none; b=CSDfJy1AT9EKDov9P3x2sJ97zznZ0wvEiVhbT3ovrgTkFA6BYMJtZzBbkL4oAIH+kbcrfFR0eKwGBm785MmnGDEYiH5/NLevy9D7cDRyIEh0wZr30liHnkw4S0aFp9bbUfMRUj2HTpX3I5VsZSq3bRuNb1fKUQVgwumscpATEQ0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762935; c=relaxed/simple; bh=KfOA9v5revpQ6S9OEYf7TOeMIGkHMH0/KABda7h4XWg=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uHKmhdP+yQ4jG+TQCxFp0q6NPTUXElo/WRp6wM0dbhnFdb1fEVvi1RTauNrc/neNI+xv8jTST2rl08o7EWvdGLJdluLEEFhBwcUv3ICOfLvbqO/nNH3At+t9hYrAkCMQ+HKiOdydKpnBIG1K348BaWiitztLB2qXO8M9eqHvubU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) 
header.d=intel.com header.i=@intel.com header.b=SyVdth+L; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="SyVdth+L" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1779762931; x=1811298931; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KfOA9v5revpQ6S9OEYf7TOeMIGkHMH0/KABda7h4XWg=; b=SyVdth+LrKZSt+6Expgeor/iqNMY9pcrxiypazut4o9+hzB6HXBy0XET JURCx64kKAEqmhU+9XHmskQGPjbTJVcrL0ezHoTSwb0+ucVAk3xwRmTlU 52Y66TJN8wJRt6loSR0dcCImvAtHFS52w67YRPUwKWAnYtvWcoIFpy5xc 8RzQF9Zsb/tbjkRiOyo0f8nO91Ido1f5vNGKonpdxY93i0wwzW3tE2I0y 9Fu6Zhpvnz4ahdsF5ZMwptQCmlZ5/N0QziGkjZsJCyu0waSBO5GiOL8j4 36aMIuIEB+jVWOzoVIbLzcoeD/QDj0KTfwzCdfUty95+UlbQrSRNmH54S g==; X-CSE-ConnectionGUID: oYDCcmohSLSrnuN/iZXyLw== X-CSE-MsgGUID: c+JPMJRoS7a5Ef2/2mQi7A== X-IronPort-AV: E=McAfee;i="6800,10657,11797"; a="91677813" X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="91677813" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:24 -0700 X-CSE-ConnectionGUID: 3StM9WuITROPZGtD6Gu2xQ== X-CSE-MsgGUID: hOWx9LOoTyas3tLOdjWQ9g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="279878302" Received: from rpedgeco-desk.jf.intel.com ([10.88.27.139]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:24 -0700 From: Rick Edgecombe To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com, kas@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, mingo@redhat.com, nik.borisov@suse.com, 
pbonzini@redhat.com, seanjc@google.com, tglx@kernel.org, vannapurve@google.com, x86@kernel.org, chao.gao@intel.com, yan.y.zhao@intel.com, kai.huang@intel.com Cc: rick.p.edgecombe@intel.com, "Kirill A. Shutemov" Subject: [PATCH v6 05/11] x86/virt/tdx: Handle concurrent callers in tdx_pamt_get/put() Date: Mon, 25 May 2026 19:35:09 -0700 Message-ID: <20260526023515.288829-6-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260526023515.288829-1-rick.p.edgecombe@intel.com> References: <20260526023515.288829-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Kirill A. Shutemov" tdx_pamt_get()/tdx_pamt_put() unconditionally add or remove Dynamic PAMT backing for the 2MB region covering the passed pfn. However, multiple callers can concurrently operate on 4KB pages that fall within the same 2MB region. When this happens only one Dynamic PAMT page pair needs to be installed to cover the 2MB range. And when one page is freed, the Dynamic PAMT backing cannot be freed until all pages in the range are no longer in use. Make the helpers handle these races internally. Use the per-2MB refcounts from previous changes to track how many 4KB pages are in use within each region. Gate the actual Dynamic PAMT add and remove on refcount transitions (0->1 and 1->0). Serialize the refcount check and SEAMCALL with a global spinlock so the read-decide-act sequence is atomic. This also avoids TDX module BUSY errors, as Dynamic PAMT add and remove SEAMCALLs take internal TDX module locks at 2MB granularity, so simultaneous attempts on the same region would conflict. The lock is global and heavyweight. Use simple conditional logic to keep correctness obvious. This will be optimized in a later change.
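The gating described above can be modeled in userspace roughly as follows. This is a simplified sketch, not the patch itself: region_get(), region_put(), model_lock() and the arrays are hypothetical names, the spinlock is a C11 atomic_flag standing in for pamt_lock, and setting backing_installed[] marks where the PAMT.ADD and PAMT.REMOVE SEAMCALLs would be issued. SEAMCALL failure handling is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>

#define NR_REGIONS	4	/* hypothetical number of 2MB regions modeled */

static atomic_flag region_lock = ATOMIC_FLAG_INIT;	/* stand-in for pamt_lock */
static int region_refcount[NR_REGIONS];			/* stand-in for pamt_refcounts */
static int backing_installed[NR_REGIONS];		/* 1 while backing is "added" */

static void model_lock(void)
{
	/* Spin until the flag was previously clear, i.e. the lock is ours. */
	while (atomic_flag_test_and_set_explicit(&region_lock,
						 memory_order_acquire))
		;
}

static void model_unlock(void)
{
	atomic_flag_clear_explicit(&region_lock, memory_order_release);
}

/* Map a 4KB pfn to the index of its covering 2MB region (512 pfns each). */
static unsigned long region_index(unsigned long pfn)
{
	return pfn >> 9;
}

static void region_get(unsigned long pfn)
{
	unsigned long idx = region_index(pfn);

	assert(idx < NR_REGIONS);

	model_lock();
	/* Only the 0->1 transition installs backing (PAMT.ADD in the patch). */
	if (region_refcount[idx] == 0)
		backing_installed[idx] = 1;
	region_refcount[idx]++;
	model_unlock();
}

static void region_put(unsigned long pfn)
{
	unsigned long idx = region_index(pfn);

	assert(idx < NR_REGIONS);

	model_lock();
	assert(region_refcount[idx] > 0);
	/* Only the 1->0 transition removes backing (PAMT.REMOVE in the patch). */
	if (--region_refcount[idx] == 0)
		backing_installed[idx] = 0;
	model_unlock();
}
```

Because the check and the update both happen under the lock, two callers touching pfns 0 and 511 share one backing install, and the backing goes away only when the second of them drops its reference.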
Assisted-by: GitHub Copilot:claude-opus-4-6 Claude:claude-opus-4-7 Signed-off-by: Kirill A. Shutemov Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Reviewed-by: Tony Lindgren --- v6: - Split from "x86/virt/tdx: Add tdx_alloc/free_control_page() helpers" - Return 0 instead of ret to be clearer (Binbin) - Clarify log (Nikolay) - Justify why the patch is not optimized in response to comments by (Nikolay) - Move tdx_find_pamt_refcount() to facilitate patch re-order - Adjustments from dropping error helper patches - Log tweaks --- arch/x86/virt/vmx/tdx/tdx.c | 72 ++++++++++++++++++++++++++++--------- 1 file changed, 56 insertions(+), 16 deletions(-) diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 6658a6be6697c..50333eb96efa6 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -2043,10 +2043,14 @@ static u64 tdh_phymem_pamt_remove(kvm_pfn_t pfn, st= ruct page **pamt_pages) return 0; } =20 -/* Allocate PAMT memory for the given page */ +/* Serializes adding/removing PAMT memory */ +static DEFINE_SPINLOCK(pamt_lock); + +/* Bump PAMT refcount for the given page and allocate PAMT memory if neede= d */ static int tdx_pamt_get(kvm_pfn_t pfn) { struct page *pamt_pages[TDX_DPAMT_ENTRY_PAGE_CNT]; + atomic_t *pamt_refcount; u64 tdx_status; int ret; =20 @@ -2057,10 +2061,26 @@ static int tdx_pamt_get(kvm_pfn_t pfn) if (ret) return ret; =20 - tdx_status =3D tdh_phymem_pamt_add(pfn, pamt_pages); - if (tdx_status !=3D TDX_SUCCESS) { - ret =3D -EIO; - goto out_free; + pamt_refcount =3D tdx_find_pamt_refcount(pfn); + + scoped_guard(spinlock, &pamt_lock) { + /* + * If the pamt page is already added (i.e. refcount >=3D 1), + * then just increment the refcount. + */ + if (atomic_read(pamt_refcount)) { + atomic_inc(pamt_refcount); + goto out_free; + } + + /* Try to add the pamt page and take the refcount 0->1.
*/ + tdx_status =3D tdh_phymem_pamt_add(pfn, pamt_pages); + if (WARN_ON_ONCE(tdx_status !=3D TDX_SUCCESS)) { + ret =3D -EIO; + goto out_free; + } + + atomic_set(pamt_refcount, 1); } =20 return 0; @@ -2069,26 +2089,46 @@ static int tdx_pamt_get(kvm_pfn_t pfn) return ret; } =20 -/* Free PAMT memory for the given page */ +/* + * Drop PAMT refcount for the given page and free PAMT memory if it is no + * longer needed. + */ static void tdx_pamt_put(kvm_pfn_t pfn) { struct page *pamt_pages[TDX_DPAMT_ENTRY_PAGE_CNT] =3D {}; + atomic_t *pamt_refcount; u64 tdx_status; =20 if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) return; =20 - tdx_status =3D tdh_phymem_pamt_remove(pfn, pamt_pages); + pamt_refcount =3D tdx_find_pamt_refcount(pfn); =20 - /* - * Don't free pamt_pages as it could hold garbage when - * tdh_phymem_pamt_remove() fails. Don't panic/BUG_ON(), as - * there is no risk of data corruption, but do yell loudly as - * failure indicates a kernel bug, memory is being leaked, and - * the dangling PAMT entry may cause future operations to fail. - */ - if (WARN_ON_ONCE(tdx_status !=3D TDX_SUCCESS)) - return; + scoped_guard(spinlock, &pamt_lock) { + /* + * If the there are more than 1 references on the pamt page, + * don't remove it yet. Just decrement the refcount. + */ + if (atomic_read(pamt_refcount) > 1) { + atomic_dec(pamt_refcount); + return; + } + + /* Try to remove the pamt page and take the refcount 1->0. */ + tdx_status =3D tdh_phymem_pamt_remove(pfn, pamt_pages); + + /* + * Don't free pamt_pages as it could hold garbage when + * tdh_phymem_pamt_remove() fails. Don't panic/BUG_ON(), as + * there is no risk of data corruption, but do yell loudly as + * failure indicates a kernel bug, memory is being leaked, and + * the dangling PAMT entry may cause future operations to fail. 
+ */ + if (WARN_ON_ONCE(tdx_status !=3D TDX_SUCCESS)) + return; + + atomic_set(pamt_refcount, 0); + } =20 free_pamt_array(pamt_pages); } --=20 2.54.0 From nobody Mon Jun 8 22:51:56 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3CC3430EF91; Tue, 26 May 2026 02:35:31 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762935; cv=none; b=Xi6PvoAau2ANmp4RxSzrZ7VUEWIxkYVbefObWRN+lX1hdC3cCdI3gN4isEKImZv3pVjtZX+tWPLJxjU8GETAwpRElNZK7pTDjtByCKsm6R64k5yu7ccvRD6zvXWdJNxXBxeLd3bVDz4hN5r77yCrfUK38nWAzQNZ86NAIBbgksc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762935; c=relaxed/simple; bh=roAtp1tb8HH5eiV2GjzKq9nl3/PGnRumHdTZNS+LIaM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=Em35Rh2p77c+IAME85WYh70mEQn1ANl71e0jjAAwXkLNaKMt7KwNp5P5QhE68qdMTBnLi4J4FVA+DjES6cX1kgp6FIhj99gi5cb5h8bFcOuEDW9VckzrgKKQv0geQ0C5vpiyD7F/9IHtxaR7PEwpyUQ/QO7WLc0ouUHYLvduUkM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=bo4iO5eR; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="bo4iO5eR" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1779762931; x=1811298931; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=roAtp1tb8HH5eiV2GjzKq9nl3/PGnRumHdTZNS+LIaM=; b=bo4iO5eRUVhzdwP+goukSM/fq+j/XyeqsSl9qZ/0mlXTg7e5+xTtZ9x+ nuALd1XZrUY+pYBlErIcYIbRBke7qpLbZyrjO2jLgAASA2DOGkzMZ5tZs Obic+Yjfr1JvcUQYLiunKOjO+rBIFCgGd1OEaMjqYrdNdLBrpIculYsrr ffqtF4tMTUPfDL4Q1Cs63k72LjfVQGWVSXnafgJqpu5TECjJCSogklexm IqQVy87ieR6y/T7VgqdGj3cxugztuwxO4uWlTon0QFez3SXq1k8JZJMN/ TztoU0GPoAJu4uVnTacUG9knMV6hvZnzDs4yX0hZihXTIwvAuujhcrJ/w w==; X-CSE-ConnectionGUID: vbLtjpxqTua9PB8Gj/Mzlw== X-CSE-MsgGUID: ooncipFgT7WICKu458f5Ow== X-IronPort-AV: E=McAfee;i="6800,10657,11797"; a="91677821" X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="91677821" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:24 -0700 X-CSE-ConnectionGUID: XLpXVkV6Qqy1ryRRTsYHTA== X-CSE-MsgGUID: qeATd1qGR++QxJFsCxh8Lw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="279878308" Received: from rpedgeco-desk.jf.intel.com ([10.88.27.139]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:24 -0700 From: Rick Edgecombe To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com, kas@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, mingo@redhat.com, nik.borisov@suse.com, pbonzini@redhat.com, seanjc@google.com, tglx@kernel.org, vannapurve@google.com, x86@kernel.org, chao.gao@intel.com, yan.y.zhao@intel.com, kai.huang@intel.com Cc: rick.p.edgecombe@intel.com, "Kirill A. 
Shutemov" Subject: [PATCH v6 06/11] x86/virt/tdx: Optimize tdx_pamt_get/put() Date: Mon, 25 May 2026 19:35:10 -0700 Message-ID: <20260526023515.288829-7-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260526023515.288829-1-rick.p.edgecombe@intel.com> References: <20260526023515.288829-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: "Kirill A. Shutemov" The Dynamic PAMT get/put helpers use a global spinlock to serialize all refcount updates and SEAMCALL invocations. This gives correct behavior for concurrent callers, but leads to contention. It is especially bad from the KVM side, which is designed to allow faulting in EPT under a shared lock. With the global spinlock, not only is the lock an exclusive one, but it is for all TDs instead of just a single one. But taking the global lock each time is actually unnecessary. Only the 0->1 and 1->0 refcount transitions need the lock (to pair with the SEAMCALLs that actually add and remove the Dynamic PAMT pages). The common case of incrementing or decrementing a non-zero refcount can be done locklessly. So create a fast and slow path. Check the refcount outside the lock and only take it for the slow path (0->1 and 1->0 transitions). On the put side make the refcount adjustment and lock taking atomic so if a 'get' happens between them, it doesn't cause the Dynamic PAMT to be freed incorrectly. On the get side there is no technique for doing the refcount adjustment and lock atomically, so check the refcount again inside the lock. Assisted-by: GitHub Copilot:claude-opus-4-6 Signed-off-by: Kirill A.
Shutemov Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Reviewed-by: Tony Lindgren --- v6: - Fix "tdx_pamt_add()" typo to "tdx_pamt_get()" in lost-race comment - Fix error path bug: set ret =3D -EIO and use WARN_ON_ONCE() instead of pr_err() for unexpected PAMT.ADD failures (Sean) - Use "set the refcount 0->1" wording to match atomic_set() usage - Wrap comments to 80 columns - Switch to atomic_dec_and_lock() and remove handling of races that are no longer needed as a result. Adjust comments as appropriate. (Dave) - Adjustments from dropping error helper patches v4: - Use atomic_set() in the HPA_RANGE_NOT_FREE case (Kiryl) - Log, comment typos (Binbin) - Move PAMT page allocation after refcount check in tdx_pamt_get() to avoid an alloc/free in the common path. v3: - Split out optimization from =E2=80=9Cx86/virt/tdx: Add tdx_alloc/free_pa= ge() helpers=E2=80=9D - Remove edge case handling that I could not find a reason for - Write log --- arch/x86/virt/vmx/tdx/tdx.c | 102 +++++++++++++++++++++--------------- 1 file changed, 61 insertions(+), 41 deletions(-) diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 50333eb96efa6..c41c632a4cdf2 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -2057,32 +2057,50 @@ static int tdx_pamt_get(kvm_pfn_t pfn) if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) return 0; =20 + pamt_refcount =3D tdx_find_pamt_refcount(pfn); + + /* + * If the pamt page is already added (i.e. refcount >=3D 1), + * then just increment the refcount. + */ + if (atomic_inc_not_zero(pamt_refcount)) + return 0; + ret =3D alloc_pamt_array(pamt_pages); if (ret) return ret; =20 - pamt_refcount =3D tdx_find_pamt_refcount(pfn); + spin_lock(&pamt_lock); =20 - scoped_guard(spinlock, &pamt_lock) { - /* - * If the pamt page is already added (i.e. refcount >=3D 1), - * then just increment the refcount. 
- */ - if (atomic_read(pamt_refcount)) { - atomic_inc(pamt_refcount); - goto out_free; - } - - /* Try to add the pamt page and take the refcount 0->1. */ - tdx_status =3D tdh_phymem_pamt_add(pfn, pamt_pages); - if (WARN_ON_ONCE(tdx_status !=3D TDX_SUCCESS)) { - ret =3D -EIO; - goto out_free; - } - - atomic_set(pamt_refcount, 1); + /* + * Unlike tdx_pamt_put() which uses atomic_dec_and_lock() to + * atomically handle the 1->0 transition, the get side has no + * equivalent combined primitive for 0->1. Recheck under the + * lock since another get may have already done the 0->1 + * transition after both saw atomic_inc_not_zero() fail. + */ + if (atomic_read(pamt_refcount)) { + atomic_inc(pamt_refcount); + spin_unlock(&pamt_lock); + goto out_free; } =20 + tdx_status =3D tdh_phymem_pamt_add(pfn, pamt_pages); + if (tdx_status =3D=3D TDX_SUCCESS) { + /* + * The refcount is zero, and this locked path is the + * only way to increase it from 0->1. + */ + atomic_set(pamt_refcount, 1); + } else { + WARN_ON_ONCE(1); + ret =3D -EIO; + spin_unlock(&pamt_lock); + goto out_free; + } + + spin_unlock(&pamt_lock); + return 0; out_free: free_pamt_array(pamt_pages); @@ -2104,32 +2122,34 @@ static void tdx_pamt_put(kvm_pfn_t pfn) =20 pamt_refcount =3D tdx_find_pamt_refcount(pfn); =20 - scoped_guard(spinlock, &pamt_lock) { + /* + * If there is more than 1 reference on the pamt page, don't + * remove it yet. Just decrement the refcount. + */ + if (!atomic_dec_and_lock(pamt_refcount, &pamt_lock)) + return; + + tdx_status =3D tdh_phymem_pamt_remove(pfn, pamt_pages); + + /* + * Don't free pamt_pages as it could hold garbage when + * tdh_phymem_pamt_remove() fails. Don't panic/BUG_ON(), as + * there is no risk of data corruption, but do yell loudly as + * failure indicates a kernel bug, memory is being leaked, and + * the dangling PAMT entry may cause future operations to fail. 
+ */ + if (WARN_ON_ONCE(tdx_status !=3D TDX_SUCCESS)) { /* - * If the there are more than 1 references on the pamt page, - * don't remove it yet. Just decrement the refcount. + * atomic_dec_and_lock() already decremented it to 0, + * but the PAMT entry still exists since REMOVE failed. */ - if (atomic_read(pamt_refcount) > 1) { - atomic_dec(pamt_refcount); - return; - } - - /* Try to remove the pamt page and take the refcount 1->0. */ - tdx_status =3D tdh_phymem_pamt_remove(pfn, pamt_pages); - - /* - * Don't free pamt_pages as it could hold garbage when - * tdh_phymem_pamt_remove() fails. Don't panic/BUG_ON(), as - * there is no risk of data corruption, but do yell loudly as - * failure indicates a kernel bug, memory is being leaked, and - * the dangling PAMT entry may cause future operations to fail. - */ - if (WARN_ON_ONCE(tdx_status !=3D TDX_SUCCESS)) - return; - - atomic_set(pamt_refcount, 0); + atomic_set(pamt_refcount, 1); + spin_unlock(&pamt_lock); + return; } =20 + spin_unlock(&pamt_lock); + free_pamt_array(pamt_pages); } =20 --=20 2.54.0 From nobody Mon Jun 8 22:51:56 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AB3F130E83F; Tue, 26 May 2026 02:35:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762937; cv=none; b=AKRw3fDHLR61d236K8Fpf1d5FPSkZnaszaM5kgMHIs6wzeXtO3busolGS0fKHrtTpHnV5jmoxUxJtRej3eAJQGG2H1D4R/tY6ul/U+4nY6MN0LhQ8ShN48pHGxgvOIMpGKXMUsvngvLGJd0uYMt56s7X0Yy9HT6ZLDvDWKzX3rk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762937; c=relaxed/simple; bh=6OPqLGzID/8yWm6UrSRl+4VbyudTj8S0aIJtkM1Jl78=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; 
b=eKe4/nnRFuwjvgS0QV1jJCUcULuP9TIRJ7u0PzBNroiLBE5cjQkgq7h9ES70O6ZEjgH59jfyBXpvZdFIbXYeVJXeYndv285kQpoTE7m9fC7tolXTkS2yHGZ0fc4cm60i58dKX7X6MBETTxUOnDWxv6OPCaZHnbMkPu3dg5rWELE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=n+9gfAAt; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="n+9gfAAt" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1779762936; x=1811298936; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6OPqLGzID/8yWm6UrSRl+4VbyudTj8S0aIJtkM1Jl78=; b=n+9gfAAtuFNeAEHXjvZHUmiZDRt95J61GqrUYQkFEYsLo8WPp3JiLkZj u0r1A+0AU6qZ1Euuh+m9LB9hdlcgFNWQb3KyJT59zvJRLeEcLWH/1HXD9 xh1OH0SBjKNq3VMNc/i2KxY8ZbN1E3qS7bcmvz9JvqP2DMfLBBTFnambN 4+QLY8vvx9upvlNtmw4JQlKg2Tj6q2M6CQHdOC/trxvoMAncbUnmM1Jm8 hLcsJMq1/5QcfqwFk7lpemP1RscWP443bvQKhBBYQ+WHEsRQOn5NNlzyA XCXSCiolKwrz2e6V92cOwv8XRzPPSWARSlr2p1HYoQiL7ZzsYTlKsssx2 Q==; X-CSE-ConnectionGUID: FZ6gjSCbQy6K1i6wLd1wpg== X-CSE-MsgGUID: YY3XCIaGS9OmRKE61yfwmA== X-IronPort-AV: E=McAfee;i="6800,10657,11797"; a="91677829" X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="91677829" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:24 -0700 X-CSE-ConnectionGUID: n2GKC6uIShyxAra1m/l7jQ== X-CSE-MsgGUID: oV/Ws18ZSHi1aErTacyYIA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="279878313" Received: from rpedgeco-desk.jf.intel.com 
([10.88.27.139]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:24 -0700
From: Rick Edgecombe
To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com, kas@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, mingo@redhat.com, nik.borisov@suse.com, pbonzini@redhat.com, seanjc@google.com, tglx@kernel.org, vannapurve@google.com, x86@kernel.org, chao.gao@intel.com, yan.y.zhao@intel.com, kai.huang@intel.com
Cc: rick.p.edgecombe@intel.com, "Kirill A. Shutemov"
Subject: [PATCH v6 07/11] KVM: TDX: Allocate PAMT memory for TD and vCPU control structures
Date: Mon, 25 May 2026 19:35:11 -0700
Message-ID: <20260526023515.288829-8-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526023515.288829-1-rick.p.edgecombe@intel.com>
References: <20260526023515.288829-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: "Kirill A. Shutemov"

Use control page helpers for allocating and freeing TD control structures, so that these operations can work for Dynamic PAMT.

The TDX module tracks some state for each page of physical memory that it might use. It calls this state the PAMT. It includes separate state for each page size a physical page could be utilized at within the TDX module (1GB, 2MB, 4KB).

In Dynamic PAMT, only the 4KB page size state is allocated dynamically. So the kernel must install PAMT backing for each 4KB page before gifting it to the TDX module, and tear it down after the page is reclaimed.

TD-scoped control pages (TDR, TDCS) and vCPU-scoped control pages (TDVPR, TDCX) are all handed to the TDX module at 4KB page size and are therefore subject to this requirement.
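The install-before-gift and tear-down-after-reclaim pairing described above can be modeled in plain userspace C. This is only a sketch: every model_* name and the region count are hypothetical, and the per-2MB refcounting is assumed from the rest of this series rather than stated in this patch.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGES_PER_2M	512	/* 4KB pages in one 2MB region */
#define MODEL_NR_REGIONS	4	/* arbitrary size for the model */

/* Hypothetical per-region state; not the kernel's data structures. */
static int model_refcount[MODEL_NR_REGIONS];
static int model_backed[MODEL_NR_REGIONS];	/* 1 while backing is installed */

/* Take a reference for a 4KB page; the first user installs the backing. */
static int model_pamt_get(unsigned long pfn)
{
	unsigned long region = pfn / MODEL_PAGES_PER_2M;

	assert(region < MODEL_NR_REGIONS);

	if (model_refcount[region]++ == 0)
		model_backed[region] = 1;

	return 0;
}

/* Drop a reference; the last user tears the backing down again. */
static void model_pamt_put(unsigned long pfn)
{
	unsigned long region = pfn / MODEL_PAGES_PER_2M;

	assert(region < MODEL_NR_REGIONS);
	assert(model_refcount[region] > 0);

	if (--model_refcount[region] == 0)
		model_backed[region] = 0;
}
```

In this model a control page allocation would call model_pamt_get() right after getting the page, and the matching free would call model_pamt_put() just before releasing it, so the two always stay balanced.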
Replace the raw alloc_page()/__free_page() calls for these pages with tdx_alloc/free_control_page(). Switching between special Dynamic PAMT operations or normal page alloc/free operations is handled internally in tdx_alloc/free_control_page(). So don't check for Dynamic PAMT around these calls. Just call them unconditionally. Similarly, drop the NULL checks before freeing, as tdx_free_control_page() handles NULL internally. No functional change intended when Dynamic PAMT is not in use. Assisted-by: GitHub Copilot:claude-opus-4-6 Claude:claude-opus-4-7 Signed-off-by: Kirill A. Shutemov [sean: handle alloc+free+reclaim in one patch] Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson [Rick: enhance log] Signed-off-by: Rick Edgecombe Reviewed-by: Tony Lindgren --- arch/x86/kvm/vmx/tdx.c | 35 ++++++++++++++--------------------- 1 file changed, 14 insertions(+), 21 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 2539107e0ad3d..3e67e2471ffe3 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -362,7 +362,7 @@ static void tdx_reclaim_control_page(struct page *ctrl_= page) if (tdx_reclaim_page(ctrl_page)) return; =20 - __free_page(ctrl_page); + tdx_free_control_page(ctrl_page); } =20 struct tdx_flush_vp_arg { @@ -599,7 +599,7 @@ static void tdx_reclaim_td_control_pages(struct kvm *kv= m) =20 tdx_quirk_reset_paddr(page_to_phys(kvm_tdx->td.tdr_page), PAGE_SIZE); =20 - __free_page(kvm_tdx->td.tdr_page); + tdx_free_control_page(kvm_tdx->td.tdr_page); kvm_tdx->td.tdr_page =3D NULL; } =20 @@ -2444,7 +2444,7 @@ static int __tdx_td_init(struct kvm *kvm, struct td_p= arams *td_params, =20 ret =3D -ENOMEM; =20 - tdr_page =3D alloc_page(GFP_KERNEL_ACCOUNT); + tdr_page =3D tdx_alloc_control_page(); if (!tdr_page) goto free_hkid; =20 @@ -2458,7 +2458,7 @@ static int __tdx_td_init(struct kvm *kvm, struct td_p= arams *td_params, goto free_tdr; =20 for (i =3D 0; i < kvm_tdx->td.tdcs_nr_pages; i++) { - tdcs_pages[i] =3D 
alloc_page(GFP_KERNEL_ACCOUNT); + tdcs_pages[i] =3D tdx_alloc_control_page(); if (!tdcs_pages[i]) goto free_tdcs; } @@ -2576,10 +2576,8 @@ static int __tdx_td_init(struct kvm *kvm, struct td_= params *td_params, teardown: /* Only free pages not yet added, so start at 'i' */ for (; i < kvm_tdx->td.tdcs_nr_pages; i++) { - if (tdcs_pages[i]) { - __free_page(tdcs_pages[i]); - tdcs_pages[i] =3D NULL; - } + tdx_free_control_page(tdcs_pages[i]); + tdcs_pages[i] =3D NULL; } if (!kvm_tdx->td.tdcs_pages) kfree(tdcs_pages); @@ -2594,16 +2592,13 @@ static int __tdx_td_init(struct kvm *kvm, struct td= _params *td_params, free_cpumask_var(packages); =20 free_tdcs: - for (i =3D 0; i < kvm_tdx->td.tdcs_nr_pages; i++) { - if (tdcs_pages[i]) - __free_page(tdcs_pages[i]); - } + for (i =3D 0; i < kvm_tdx->td.tdcs_nr_pages; i++) + tdx_free_control_page(tdcs_pages[i]); kfree(tdcs_pages); kvm_tdx->td.tdcs_pages =3D NULL; =20 free_tdr: - if (tdr_page) - __free_page(tdr_page); + tdx_free_control_page(tdr_page); kvm_tdx->td.tdr_page =3D NULL; =20 free_hkid: @@ -2933,7 +2928,7 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u6= 4 vcpu_rcx) int ret, i; u64 err; =20 - page =3D alloc_page(GFP_KERNEL_ACCOUNT); + page =3D tdx_alloc_control_page(); if (!page) return -ENOMEM; tdx->vp.tdvpr_page =3D page; @@ -2953,7 +2948,7 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u6= 4 vcpu_rcx) } =20 for (i =3D 0; i < kvm_tdx->td.tdcx_nr_pages; i++) { - page =3D alloc_page(GFP_KERNEL_ACCOUNT); + page =3D tdx_alloc_control_page(); if (!page) { ret =3D -ENOMEM; goto free_tdcx; @@ -2975,7 +2970,7 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u6= 4 vcpu_rcx) * method, but the rest are freed here. 
*/ for (; i < kvm_tdx->td.tdcx_nr_pages; i++) { - __free_page(tdx->vp.tdcx_pages[i]); + tdx_free_control_page(tdx->vp.tdcx_pages[i]); tdx->vp.tdcx_pages[i] =3D NULL; } return -EIO; @@ -3003,16 +2998,14 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, = u64 vcpu_rcx) =20 free_tdcx: for (i =3D 0; i < kvm_tdx->td.tdcx_nr_pages; i++) { - if (tdx->vp.tdcx_pages[i]) - __free_page(tdx->vp.tdcx_pages[i]); + tdx_free_control_page(tdx->vp.tdcx_pages[i]); tdx->vp.tdcx_pages[i] =3D NULL; } kfree(tdx->vp.tdcx_pages); tdx->vp.tdcx_pages =3D NULL; =20 free_tdvpr: - if (tdx->vp.tdvpr_page) - __free_page(tdx->vp.tdvpr_page); + tdx_free_control_page(tdx->vp.tdvpr_page); tdx->vp.tdvpr_page =3D NULL; tdx->vp.tdvpr_pa =3D 0; =20 --=20 2.54.0 From nobody Mon Jun 8 22:51:56 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 04ADC30FC26; Tue, 26 May 2026 02:35:32 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762936; cv=none; b=SxskdDG6myH21JKYfCJQ8wQ5+7K+f0KiaghrJRJ0QIOTUlLu4j/5RyrnmCiy/vB1TzvoNKYd/0oO7mLrHslhfuJrZikDPbFNPoIRSG41x+xCE/A9j3f2wchwosTwIlUZlBO5BwgyGhJEtqYl95shxCmHDYSAPVydjxY4li/nV0M= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762936; c=relaxed/simple; bh=OmHDCZZupE7O81xE8LbEYOSeBHx149NCgEwlWGXPDCU=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VPLZrv1NGiv6ulXQZi+5EMLzo35MBTBy3slI9v4TVZp9PoLWF3/WY2dLsv3BnLUJIfYJR/5pSD1cb3+9SyjcsPUbQnWRM7SedAfblsEXhYkmB78frLUWE6exs60rkD1GceW0Ta3ppNOW/G4bMvXceGfmkKixM3wIj8iBMAX3Hug= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) 
header.d=intel.com header.i=@intel.com header.b=LUxd3se5; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="LUxd3se5" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1779762934; x=1811298934; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=OmHDCZZupE7O81xE8LbEYOSeBHx149NCgEwlWGXPDCU=; b=LUxd3se5g8DF1ijQzUFHzYbIn/Cy4Dk53nBEm59m/euMlL2/uf8bx54Z v5GDVUY5nRDQQrj+WjjP90kctb4dQYQ+T8WuAEYbHKyJyaDe47llz6TJu BIkmIHkcAbbbbc9uz9IbQ0tdvBT4ejUFiCynFtNv67XY+knIs8G5hqnGE 0ujkNrrAiRQ7jN7I5d2MBA+M/6v8PJMQIol71qrWW6bAHWyVWJNm8XHCB 262qdGWFlHptt5LlomuVNVAnGmBx8WdVBRi+BmcM+gC8gFPFCBUvkb4zQ BrBTlSapwX8GKBxGDl2mQ107xBI5EXtk5RP5e09IgnRbI67XAj54MWweQ A==; X-CSE-ConnectionGUID: VbDuzsO9R/CAjf+GeAWeBA== X-CSE-MsgGUID: EAF88qP8Tn69TG0mAz5Mxw== X-IronPort-AV: E=McAfee;i="6800,10657,11797"; a="91677838" X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="91677838" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:24 -0700 X-CSE-ConnectionGUID: UbKHeqCuQhWpAr/y+92E7A== X-CSE-MsgGUID: SMYgsFmETsyv3TV3QFMXVw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="279878317" Received: from rpedgeco-desk.jf.intel.com ([10.88.27.139]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:24 -0700 From: Rick Edgecombe To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com, kas@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, mingo@redhat.com, nik.borisov@suse.com, 
pbonzini@redhat.com, seanjc@google.com, tglx@kernel.org, vannapurve@google.com, x86@kernel.org, chao.gao@intel.com, yan.y.zhao@intel.com, kai.huang@intel.com
Cc: rick.p.edgecombe@intel.com
Subject: [PATCH v6 08/11] x86/tdx: Add APIs to support Dynamic PAMT ops from KVM's fault path
Date: Mon, 25 May 2026 19:35:12 -0700
Message-ID: <20260526023515.288829-9-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526023515.288829-1-rick.p.edgecombe@intel.com>
References: <20260526023515.288829-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

When handling an EPT violation, KVM holds a spinlock while manipulating the EPT. Before entering the spinlock it doesn't know how many EPT page tables will need to be installed or whether a huge page will be used. For this reason it allocates a worst case number of page tables that it might need as part of servicing the EPT violation. Under Dynamic PAMT these pre-allocated pages will potentially need to have Dynamic PAMT backing pages installed for them.

KVM already has helpers to manage topping up page caches before taking the MMU lock, but they cannot be passed from KVM to arch/x86 code.

The problem of how and when to install the DPAMT backing pages for the pages given to the TDX module during the fault path has had a lot of design attempts.

- Extracting KVM's MMU caches requires too much inlined code added to headers.
- A few varieties of installing Dynamic PAMT backing when allocating the S-EPT page tables. [0][1]
- Using mempool_t to transfer the pages between KVM and arch/x86 doesn't work because the component is designed more around maintaining a pool of pages, rather than topping up a continually drained cache.

So don't do these as they all had various problems.
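The constraint behind those attempts, that pages may only be allocated before the spinlock is taken and only consumed while it is held, can be sketched in userspace C as a top-up-then-draw cache. All model_* names here are hypothetical; this is not the KVM or arch/x86 code, only an illustration of the pattern.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pre-allocated page. */
struct model_cache_page {
	struct model_cache_page *next;
};

/* Hypothetical cache: a simple stack of pre-allocated pages. */
struct model_cache {
	struct model_cache_page *head;
	int cnt;
};

/* May allocate, so it belongs before the lock is taken. */
static int model_cache_topup(struct model_cache *cache, int want)
{
	assert(want >= 0);

	while (cache->cnt < want) {
		struct model_cache_page *page = malloc(sizeof(*page));

		if (!page)
			return -1;

		page->next = cache->head;
		cache->head = page;
		cache->cnt++;
	}

	return 0;
}

/* Never allocates, so it is usable under the lock; NULL when empty. */
static struct model_cache_page *model_cache_take(struct model_cache *cache)
{
	struct model_cache_page *page = cache->head;

	if (page) {
		cache->head = page->next;
		cache->cnt--;
	}

	return page;
}

/* Release whatever was not consumed. */
static void model_cache_free(struct model_cache *cache)
{
	struct model_cache_page *page;

	while ((page = model_cache_take(cache)))
		free(page);
}
```

Topping up to a worst-case count up front means the draw side can never need the allocator, at the cost of sometimes holding pages that go unused until the next top-up or the final free.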
Instead just create a small simple data structure to use for handing a pre-allocated list of pages between KVM and arch/x86 code. Model this on KVM's existing MMU memory caches. Add a tdx_pamt_cache arg to tdx_pamt_get() so it can draw pages from a cache when needed. Not all DPAMT page installations will happen under spinlock, for example control pages. So have tdx_pamt_get() maintain the existing behavior of allocating from the page allocator when NULL is passed for the struct tdx_pamt_cache arg. This prevents excess allocations for cases where it can be avoided. Export the new helpers for KVM. Assisted-by: GitHub Copilot:claude-opus-4-6 Claude:claude-opus-4-7 Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Rick Edgecombe Link: https://lore.kernel.org/kvm/de05853257e9cc66998101943f78a4b7e6e3d741.= camel@intel.com/ [0] Link: https://lore.kernel.org/kvm/aYprxnSHKHUtk7pt@google.com/ [1] Reviewed-by: Kiryl Shutsemau (Meta) Reviewed-by: Tony Lindgren --- v6: - Filled out log from Sean's series --- arch/x86/include/asm/tdx.h | 17 ++++++++++ arch/x86/virt/vmx/tdx/tdx.c | 65 +++++++++++++++++++++++++++++++++---- 2 files changed, 76 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 74e75db5728c7..191da84bbf2a1 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -155,6 +155,23 @@ static inline bool tdx_supports_dynamic_pamt(const str= uct tdx_sys_info *sysinfo) return false; /* To be enabled when kernel is ready */ } =20 +/* Simple structure for pre-allocating Dynamic PAMT pages outside of locks= . 
*/ +struct tdx_pamt_cache { + struct list_head page_list; + int cnt; +}; + +static inline void tdx_init_pamt_cache(struct tdx_pamt_cache *cache) +{ + INIT_LIST_HEAD(&cache->page_list); + cache->cnt =3D 0; +} + +void tdx_free_pamt_cache(struct tdx_pamt_cache *cache); +int tdx_topup_pamt_cache(struct tdx_pamt_cache *cache, unsigned long npage= s); +int tdx_pamt_get(kvm_pfn_t pfn, struct tdx_pamt_cache *cache); +void tdx_pamt_put(kvm_pfn_t pfn); + int tdx_guest_keyid_alloc(void); u32 tdx_get_nr_guest_keyids(void); void tdx_guest_keyid_free(unsigned int keyid); diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index c41c632a4cdf2..3544794fb092a 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1971,12 +1971,33 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, kvm_pfn_t= pfn) } EXPORT_SYMBOL_FOR_KVM(tdh_phymem_page_wbinvd_hkid); =20 -static int alloc_pamt_array(struct page **pamt_pages) +static struct page *tdx_alloc_page_pamt_cache(struct tdx_pamt_cache *cache) +{ + struct page *page; + + page =3D list_first_entry_or_null(&cache->page_list, struct page, lru); + if (page) { + list_del(&page->lru); + cache->cnt--; + } + + return page; +} + +static struct page *alloc_dpamt_page(struct tdx_pamt_cache *cache) +{ + if (cache) + return tdx_alloc_page_pamt_cache(cache); + + return alloc_page(GFP_KERNEL_ACCOUNT); +} + +static int alloc_pamt_array(struct page **pamt_pages, struct tdx_pamt_cach= e *cache) { int i, j; =20 for (i =3D 0; i < TDX_DPAMT_ENTRY_PAGE_CNT; i++) { - pamt_pages[i] =3D alloc_page(GFP_KERNEL_ACCOUNT); + pamt_pages[i] =3D alloc_dpamt_page(cache); if (!pamt_pages[i]) goto err; } @@ -2047,7 +2068,7 @@ static u64 tdh_phymem_pamt_remove(kvm_pfn_t pfn, stru= ct page **pamt_pages) static DEFINE_SPINLOCK(pamt_lock); =20 /* Bump PAMT refcount for the given page and allocate PAMT memory if neede= d */ -static int tdx_pamt_get(kvm_pfn_t pfn) +int tdx_pamt_get(kvm_pfn_t pfn, struct tdx_pamt_cache *cache) { struct page 
*pamt_pages[TDX_DPAMT_ENTRY_PAGE_CNT]; atomic_t *pamt_refcount; @@ -2066,7 +2087,7 @@ static int tdx_pamt_get(kvm_pfn_t pfn) if (atomic_inc_not_zero(pamt_refcount)) return 0; =20 - ret =3D alloc_pamt_array(pamt_pages); + ret =3D alloc_pamt_array(pamt_pages, cache); if (ret) return ret; =20 @@ -2106,12 +2127,13 @@ static int tdx_pamt_get(kvm_pfn_t pfn) free_pamt_array(pamt_pages); return ret; } +EXPORT_SYMBOL_FOR_KVM(tdx_pamt_get); =20 /* * Drop PAMT refcount for the given page and free PAMT memory if it is no * longer needed. */ -static void tdx_pamt_put(kvm_pfn_t pfn) +void tdx_pamt_put(kvm_pfn_t pfn) { struct page *pamt_pages[TDX_DPAMT_ENTRY_PAGE_CNT] =3D {}; atomic_t *pamt_refcount; @@ -2152,6 +2174,37 @@ static void tdx_pamt_put(kvm_pfn_t pfn) =20 free_pamt_array(pamt_pages); } +EXPORT_SYMBOL_FOR_KVM(tdx_pamt_put); + +void tdx_free_pamt_cache(struct tdx_pamt_cache *cache) +{ + struct page *page; + + while ((page =3D tdx_alloc_page_pamt_cache(cache))) + __free_page(page); +} +EXPORT_SYMBOL_FOR_KVM(tdx_free_pamt_cache); + +int tdx_topup_pamt_cache(struct tdx_pamt_cache *cache, unsigned long npage= s) +{ + if (WARN_ON_ONCE(!tdx_supports_dynamic_pamt(&tdx_sysinfo))) + return 0; + + npages *=3D TDX_DPAMT_ENTRY_PAGE_CNT; + + while (cache->cnt < npages) { + struct page *page =3D alloc_page(GFP_KERNEL_ACCOUNT); + + if (!page) + return -ENOMEM; + + list_add(&page->lru, &cache->page_list); + cache->cnt++; + } + + return 0; +} +EXPORT_SYMBOL_FOR_KVM(tdx_topup_pamt_cache); =20 /* * Return a page that can be gifted to the TDX-Module for use as a "contro= l" @@ -2167,7 +2220,7 @@ struct page *tdx_alloc_control_page(void) if (!page) return NULL; =20 - if (tdx_pamt_get(page_to_pfn(page))) { + if (tdx_pamt_get(page_to_pfn(page), NULL)) { __free_page(page); return NULL; } --=20 2.54.0 From nobody Mon Jun 8 22:51:56 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client 
certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D2B15315D40; Tue, 26 May 2026 02:35:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762938; cv=none; b=FvA+tOyarspoJkqmU55Ozy2oUSyYAyFqvIWNtedrTBFIYaM3x1CcO6ojGmjAQDYhnO2nuKMsDbbCUCRK7sqe5hwWZn0n7hf2SE+sPuK0jJXzmhtrbfiw58vyJudtApdQRnkPaQiPHGDmZTOiEe8lh1bvSxwsSmMOtBH9M3QasyI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762938; c=relaxed/simple; bh=x4M3BHIRpF+bPtXgd/Cy9C6LdFBaMDPFDLmTYYjTwHI=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=gDnGP3LZPUD8PQLqwFVzVqeNiHKauvwl8k+SeuTZfsaIJoPgeWQWAJcgn4Ucq2Tl3qPyCtYnIRpQkwTZq/3j+l5Id/Sf+3jQp2xUqA9JWpRHzmrwkR4TzN3PpA5iE2Qqc8f7UIFaeTp7h8/tJ5PvmxD3eHH3CgJP3oi2YGz0bP0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=F9JPz4Bz; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="F9JPz4Bz" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1779762936; x=1811298936; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=x4M3BHIRpF+bPtXgd/Cy9C6LdFBaMDPFDLmTYYjTwHI=; b=F9JPz4BzWIK/WTANwhybIEvUjsI8Dd3aTwSsawfKEslBX0bukrqMO0Np pmgQb0Maq6p7AR65mwHG0Fw63t9j105O4JglrNFFjgEuBSAqw7bRuPykP IMS1Hnq1QAUuTttjCNSipD40HsJ5qLGySPL6GBwahhvlNa06yCGA9DaMy tT2pgv0ZHwLYL64w+eVhLyqjcofCNxustgGPXJLmJylBQbjCYsWdhm/sA 
ncsHmheVYlKVrJKVeItBZP9ktznpUyzBNN+COohTQLS//hANGLdBq2jcr zwyoxMbBKhAKYkolx9ulcN/MT3SZp06U/J+nVnuvHUJ/gtNkWChJnfOw5 Q==; X-CSE-ConnectionGUID: ExdPvukfTJSI1uKWjMY2Dg== X-CSE-MsgGUID: ksfsbpUWQNu5CzJYffTaJA== X-IronPort-AV: E=McAfee;i="6800,10657,11797"; a="91677846" X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="91677846" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:24 -0700 X-CSE-ConnectionGUID: Oxd/nPBVQXqBWHIJDRbqDA== X-CSE-MsgGUID: YLmp9nw3Swegidwv5nab4A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="279878323" Received: from rpedgeco-desk.jf.intel.com ([10.88.27.139]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:24 -0700 From: Rick Edgecombe To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com, kas@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, mingo@redhat.com, nik.borisov@suse.com, pbonzini@redhat.com, seanjc@google.com, tglx@kernel.org, vannapurve@google.com, x86@kernel.org, chao.gao@intel.com, yan.y.zhao@intel.com, kai.huang@intel.com Cc: rick.p.edgecombe@intel.com, "Kirill A. Shutemov" Subject: [PATCH v6 09/11] KVM: TDX: Get/put PAMT pages when (un)mapping private memory Date: Mon, 25 May 2026 19:35:13 -0700 Message-ID: <20260526023515.288829-10-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.54.0 In-Reply-To: <20260526023515.288829-1-rick.p.edgecombe@intel.com> References: <20260526023515.288829-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: "Kirill A. 
Shutemov"

Add Dynamic PAMT support to KVM's S-EPT MMU by "getting" a PAMT page when adding guest memory (PAGE.ADD or PAGE.AUG), and "putting" the page when removing guest memory (PAGE.REMOVE).

To access the per-vCPU PAMT caches without plumbing @vcpu throughout the TDP MMU, begrudgingly use kvm_get_running_vcpu() to get the vCPU, and bug the VM if KVM attempts to set an S-EPT leaf without an active vCPU. KVM only supports creating _new_ mappings in page (pre)fault paths, all of which require an active vCPU.

The PAMT memory holds metadata for TDX-protected memory. With Dynamic PAMT, PAMT_4K is allocated on demand. The kernel supplies the TDX module with a few pages that cover 2M of host physical memory.

Releases are balanced via tdx_pamt_put(): every control-page free goes through tdx_free_control_page(), and guest data pages are put directly on the successful tdh_mem_page_remove() path and in the tdx_mem_page_add/aug() error path.

Assisted-by: Sashiko:claude-opus-4-6 GitHub Copilot:claude-opus-4-6 Claude:claude-opus-4-7
Signed-off-by: Kirill A.
Shutemov Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Co-developed-by: Rick Edgecombe Signed-off-by: Rick Edgecombe Reviewed-by: Tony Lindgren --- v6: - Don't have topup op take a min param (Yan, Sean) - Make log match style of the rest of the series - Adjustments from dropping error helper patches --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 2 + arch/x86/kvm/mmu/mmu.c | 4 ++ arch/x86/kvm/vmx/tdx.c | 65 ++++++++++++++++++++++++++---- arch/x86/kvm/vmx/tdx.h | 2 + 5 files changed, 66 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 10ccf6ea9d9a2..320f1d30edacc 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -97,6 +97,7 @@ KVM_X86_OP_OPTIONAL_RET0(get_mt_mask) KVM_X86_OP(load_mmu_pgd) KVM_X86_OP_OPTIONAL_RET0(set_external_spte) KVM_X86_OP_OPTIONAL(free_external_spt) +KVM_X86_OP_OPTIONAL_RET0(topup_external_cache) KVM_X86_OP(has_wbinvd_exit) KVM_X86_OP(get_l2_tsc_offset) KVM_X86_OP(get_l2_tsc_multiplier) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 6b28dd387bc61..bfe92e993a212 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1898,6 +1898,8 @@ struct kvm_x86_ops { /* Update external page tables for page table about to be freed. 
*/ void (*free_external_spt)(struct kvm *kvm, struct kvm_mmu_page *sp); =20 + int (*topup_external_cache)(struct kvm_vcpu *vcpu, int min_nr_spts); + =20 bool (*has_wbinvd_exit)(void); =20 diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 892246204435c..2a48fc7fccc11 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -607,6 +607,10 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vc= pu, bool maybe_indirect) PT64_ROOT_MAX_LEVEL); if (r) return r; + + r =3D kvm_x86_call(topup_external_cache)(vcpu, PT64_ROOT_MAX_LEVEL); + if (r) + return r; } r =3D kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache, PT64_ROOT_MAX_LEVEL); diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 3e67e2471ffe3..ee073cacafbec 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -685,6 +685,8 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) if (!irqchip_split(vcpu->kvm)) return -EINVAL; =20 + tdx_init_pamt_cache(&tdx->pamt_cache); + fpstate_set_confidential(&vcpu->arch.guest_fpu); vcpu->arch.apic->guest_apic_protected =3D true; INIT_LIST_HEAD(&tdx->vt.pi_wakeup_list); @@ -870,6 +872,8 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu) struct vcpu_tdx *tdx =3D to_tdx(vcpu); int i; =20 + tdx_free_pamt_cache(&tdx->pamt_cache); + if (vcpu->cpu !=3D -1) { KVM_BUG_ON(tdx->state =3D=3D VCPU_TD_STATE_INITIALIZED, vcpu->kvm); tdx_flush_vp_on_cpu(vcpu); @@ -1611,6 +1615,16 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t r= oot_hpa, int pgd_level) td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa); } =20 +static int tdx_topup_external_pamt_cache(struct kvm_vcpu *vcpu, int min_nr= _spts) +{ + /* + * Don't cover the root SPT, but cover a possible 4KB private + * page in addition to the SPTs. So -1 to exclude the root + * SPT, and +1 for the guest page cancel out. 
+ */ + return tdx_topup_pamt_cache(&to_tdx(vcpu)->pamt_cache, min_nr_spts); +} + static int tdx_mem_page_add(struct kvm *kvm, gfn_t gfn, enum pg_level leve= l, kvm_pfn_t pfn) { @@ -1669,16 +1683,29 @@ static struct page *tdx_spte_to_sept_pt(struct kvm = *kvm, gfn_t gfn, static int tdx_sept_map_nonleaf_spte(struct kvm *kvm, gfn_t gfn, enum pg_level level, u64 new_spte) { + struct kvm_vcpu *vcpu =3D kvm_get_running_vcpu(); + struct vcpu_tdx *tdx =3D to_tdx(vcpu); gpa_t gpa =3D gfn_to_gpa(gfn); u64 err, entry, level_state; struct page *sept_pt; + int ret; + + if (KVM_BUG_ON(!vcpu, kvm)) + return -EIO; =20 sept_pt =3D tdx_spte_to_sept_pt(kvm, gfn, new_spte, level); if (!sept_pt) return -EIO; =20 + ret =3D tdx_pamt_get(page_to_pfn(sept_pt), &tdx->pamt_cache); + if (ret) + return ret; + err =3D tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, level, sept_pt, &entry, &level_state); + if (err) + tdx_pamt_put(page_to_pfn(sept_pt)); + if (unlikely(tdx_operand_busy(err))) return -EBUSY; =20 @@ -1691,8 +1718,14 @@ static int tdx_sept_map_nonleaf_spte(struct kvm *kvm= , gfn_t gfn, static int tdx_sept_map_leaf_spte(struct kvm *kvm, gfn_t gfn, enum pg_leve= l level, u64 new_spte) { + struct kvm_vcpu *vcpu =3D kvm_get_running_vcpu(); struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); kvm_pfn_t pfn =3D spte_to_pfn(new_spte); + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + int ret; + + if (KVM_BUG_ON(!vcpu, kvm)) + return -EIO; =20 /* TODO: handle large pages. */ if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) @@ -1700,6 +1733,10 @@ static int tdx_sept_map_leaf_spte(struct kvm *kvm, g= fn_t gfn, enum pg_level leve =20 WARN_ON_ONCE((new_spte & VMX_EPT_RWX_MASK) !=3D VMX_EPT_RWX_MASK); =20 + ret =3D tdx_pamt_get(pfn, &tdx->pamt_cache); + if (ret) + return ret; + /* * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory() * before kvm_tdx->state. 
Userspace must not be allowed to pre-fault @@ -1712,10 +1749,15 @@ static int tdx_sept_map_leaf_spte(struct kvm *kvm, = gfn_t gfn, enum pg_level leve * If the TD isn't finalized/runnable, then userspace is initializing * the VM image via KVM_TDX_INIT_MEM_REGION; ADD the page to the TD. */ - if (unlikely(kvm_tdx->state !=3D TD_STATE_RUNNABLE)) - return tdx_mem_page_add(kvm, gfn, level, pfn); + if (likely(kvm_tdx->state =3D=3D TD_STATE_RUNNABLE)) + ret =3D tdx_mem_page_aug(kvm, gfn, level, pfn); + else + ret =3D tdx_mem_page_add(kvm, gfn, level, pfn); =20 - return tdx_mem_page_aug(kvm, gfn, level, pfn); + if (ret) + tdx_pamt_put(pfn); + + return ret; } =20 /* @@ -1812,6 +1854,7 @@ static int tdx_sept_remove_leaf_spte(struct kvm *kvm,= gfn_t gfn, return -EIO; =20 tdx_quirk_reset_paddr(PFN_PHYS(pfn), PAGE_SIZE); + tdx_pamt_put(pfn); return 0; } =20 @@ -1855,6 +1898,8 @@ static int tdx_sept_set_private_spte(struct kvm *kvm,= gfn_t gfn, u64 old_spte, */ static void tdx_sept_free_private_spt(struct kvm *kvm, struct kvm_mmu_page= *sp) { + struct page *sept_pt =3D virt_to_page(sp->external_spt); + /* * KVM doesn't (yet) zap page table pages in mirror page table while * TD is active, though guest pages mapped in mirror page table could be @@ -1868,15 +1913,15 @@ static void tdx_sept_free_private_spt(struct kvm *k= vm, struct kvm_mmu_page *sp) * the page to prevent the kernel from accessing the encrypted page. */ if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) || - tdx_reclaim_page(virt_to_page(sp->external_spt))) + tdx_reclaim_page(sept_pt)) goto out; =20 /* - * Immediately free the S-EPT page because RCU-time free is unnecessary - * after TDH.PHYMEM.PAGE.RECLAIM ensures there are no outstanding - * readers. + * Immediately free the S-EPT page as the TDX subsystem doesn't support + * freeing pages from RCU callbacks, and more importantly because + * TDH.PHYMEM.PAGE.RECLAIM ensures there are no outstanding readers. 
*/ - free_page((unsigned long)sp->external_spt); + tdx_free_control_page(sept_pt); out: sp->external_spt =3D NULL; } @@ -3468,6 +3513,10 @@ int __init tdx_hardware_setup(void) =20 vt_x86_ops.set_external_spte =3D tdx_sept_set_private_spte; vt_x86_ops.free_external_spt =3D tdx_sept_free_private_spt; + + if (tdx_supports_dynamic_pamt(tdx_sysinfo)) + vt_x86_ops.topup_external_cache =3D tdx_topup_external_pamt_cache; + vt_x86_ops.protected_apic_has_interrupt =3D tdx_protected_apic_has_interr= upt; return 0; =20 diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index b5cd2ffb303e5..47334a5a74eab 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -73,6 +73,8 @@ struct vcpu_tdx { =20 u64 map_gpa_next; u64 map_gpa_end; + + struct tdx_pamt_cache pamt_cache; }; =20 void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 e= rr); --=20 2.54.0 From nobody Mon Jun 8 22:51:56 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 81D1D3264F2; Tue, 26 May 2026 02:35:37 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762939; cv=none; b=t32eady8i0BPUJKB5+icY7Tto6lelvvZFjPB4vOlJzdmxsHo1mZO9/DnpMNS29LeaCKUUUA7iyFALHbAzFggjBwT7cLyOWnUUefVEDAhevaPkkdCdt/PjzJt+RDR/iW4oXTeGuPcfwTuY5xKKjQMEqjJsbL6pBJMRai0FYB/Opc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779762939; c=relaxed/simple; bh=6WuNs68Db22YKlyZABLkieFbKvfcpx6Rf74QPb8/WiQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mfL76aH+k3OsiKLAiLR83i9aVJeywY8hjhMuGchqk0aFQmmRF/P1O7hah/kDXiTz6uBmlgR0we1i2Eqtr4z5hYRzx6z8U1v2XufjrtRM3lmbhBbwFpPO0wOhEYZ2b6brWOJqioOjYxFbKh0KVbJ6RpesyUGq6MG6PUbBQctpeyg= ARC-Authentication-Results: 
i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=bOFxdF9C; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="bOFxdF9C" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1779762937; x=1811298937; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=6WuNs68Db22YKlyZABLkieFbKvfcpx6Rf74QPb8/WiQ=; b=bOFxdF9CBjg0K/np9IkGL1suJCe2JrhWXf9XFaWcsJGm/KRklVUfoHA+ 8susST3ENzD6svEWoMRU5jxy6Z+kigxGIhEZQFupPNf6zk9KA5NFzQ+54 FXe+MxN4RdWTNMFT8rYfgVwzn0ZarpsLpQKXFHK3vZs7I3MWVJnsj8vfh Lp6iICzu8Zcfb88dAWwJ84pYvoiLwZ+wSMboGAlc2LWgFFCp/I4XApifb 2QAOU2tmvXh9UX3z0Wdw0bGhIVd+cOSVy9XoIOC5Qy66getxyySPA74jK 1oJxggd6yqg6u7XBKHVenJAO6AyM1StCjDtK990hk103rjmo8b6xD7eTu A==; X-CSE-ConnectionGUID: 6r9fYfiHQPKz7Z5ss31sVQ== X-CSE-MsgGUID: RF/dQPOZSGi86LrDk22G3g== X-IronPort-AV: E=McAfee;i="6800,10657,11797"; a="91677854" X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="91677854" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:24 -0700 X-CSE-ConnectionGUID: 2Ngf3jtgRpWTN/eSGePtIw== X-CSE-MsgGUID: kDW/DPGxS6CHeP/Vo9jHSg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,168,1774335600"; d="scan'208";a="279878326" Received: from rpedgeco-desk.jf.intel.com ([10.88.27.139]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 25 May 2026 19:35:24 -0700 From: Rick Edgecombe To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com, kas@kernel.org, 
 kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, mingo@redhat.com, nik.borisov@suse.com,
 pbonzini@redhat.com, seanjc@google.com, tglx@kernel.org,
 vannapurve@google.com, x86@kernel.org, chao.gao@intel.com,
 yan.y.zhao@intel.com, kai.huang@intel.com
Cc: rick.p.edgecombe@intel.com, "Kirill A. Shutemov"
Subject: [PATCH v6 10/11] x86/virt/tdx: Enable Dynamic PAMT
Date: Mon, 25 May 2026 19:35:14 -0700
Message-ID: <20260526023515.288829-11-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526023515.288829-1-rick.p.edgecombe@intel.com>
References: <20260526023515.288829-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: "Kirill A. Shutemov"

The Physical Address Metadata Table (PAMT) holds TDX metadata for
physical memory and must be allocated by the kernel during TDX module
initialization. Dynamic PAMT is a TDX module feature that can reduce this
memory use by allocating part of the PAMT dynamically.

All pieces are in place to enable Dynamic PAMT if it is supported.
Determine if the TDX module supports it by checking the 'features0' bit
exposed by the TDX module.

The TDX module also exposes information about whether the *system* (and
not the module) supports Dynamic PAMT. The TDX module documentation
describes how PAMT works internally. To allow the last level to be
dynamically allocated, it uses a 3-level tree structure, not unlike page
tables. Like page tables, it has a maximum address space that it can
cover. This address space can be covered in 48 bits. If the host physical
address space is larger than this, then the TDX module can't guarantee
the tree will be able to cover the TDX memory.

The TDX module exposes this system support via metadata stating the
minimum number of HKIDs that need to be available in order for Dynamic
PAMT to be usable. The reasoning appears to be that more HKIDs can shrink
the "real" addressable physical address bits enough to make the 48 bit
Dynamic PAMT limit workable on high physical address width HW.

However, the docs also clearly explain the 48 bit limit and how this fits
into the Dynamic PAMT tree constraints. The handy x86_phys_bits value is
already read and adjusted for keyid bits. So just compare that against 48
instead of reading more metadata and burdening the code with the more
tenuous connection to minimum HKID bits.

Signed-off-by: Kirill A. Shutemov
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Reviewed-by: Tony Lindgren
---
v6:
 - After Nikolai pointed out that the TDX docs actually have the Dynamic
   PAMT pages-per-2MB region fixed at 2 instead of variable sized, I
   checked over the docs more closely looking for anything else that
   might have been missed. Spotted this 48 bit physical address bit check
   in the docs, so added it.
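The decision described in the commit message above can be sketched as a
small user-space C fragment. This is only an illustration, not the kernel
code: the two bit positions are copied from the patch, while
supports_dynamic_pamt(), sys_config_r8() and DPAMT_MAX_PHYS_BITS are
hypothetical names, and the plain integer arguments stand in for the
sysinfo structure and boot_cpu_data.x86_phys_bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the patch; all other names are illustrative. */
#define TDX_FEATURES0_DYNAMIC_PAMT	(UINT64_C(1) << 36)
#define TDX_SYS_CONFIG_DYNAMIC_PAMT	(UINT64_C(1) << 16)
#define DPAMT_MAX_PHYS_BITS		48	/* hypothetical name for the limit */

/*
 * Stand-in for tdx_supports_dynamic_pamt(): the module must advertise the
 * feature bit AND the physical address width must fit in 48 bits.
 */
static bool supports_dynamic_pamt(uint64_t features0, unsigned int phys_bits)
{
	return (features0 & TDX_FEATURES0_DYNAMIC_PAMT) &&
	       phys_bits <= DPAMT_MAX_PHYS_BITS;
}

/*
 * Mirrors how config_tdx_module() builds the r8 argument: start from the
 * global keyid and set the Dynamic PAMT bit only when it is supported.
 */
static uint64_t sys_config_r8(uint64_t global_keyid, uint64_t features0,
			      unsigned int phys_bits)
{
	uint64_t r8 = global_keyid;

	if (supports_dynamic_pamt(features0, phys_bits))
		r8 |= TDX_SYS_CONFIG_DYNAMIC_PAMT;

	return r8;
}
```

With the feature bit set and a 46-bit physical address width the check
passes and bit 16 ends up in r8; with a 52-bit width it does not, even
though the module advertises the feature.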
---
 arch/x86/include/asm/tdx.h  | 11 ++++++++++-
 arch/x86/virt/vmx/tdx/tdx.c | 11 +++++++++--
 arch/x86/virt/vmx/tdx/tdx.h |  3 ---
 3 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 191da84bbf2a1..187014686df3e 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -33,6 +33,10 @@
 #define TDX_SUCCESS		0ULL
 #define TDX_RND_NO_ENTROPY	0x8000020300000000ULL
 
+/* Bit definitions of TDX_FEATURES0 metadata field */
+#define TDX_FEATURES0_NO_RBP_MOD	BIT_ULL(18)
+#define TDX_FEATURES0_DYNAMIC_PAMT	BIT_ULL(36)
+
 #ifndef __ASSEMBLER__
 
 #include
@@ -152,7 +156,12 @@ const struct tdx_sys_info *tdx_get_sysinfo(void);
 
 static inline bool tdx_supports_dynamic_pamt(const struct tdx_sys_info *sysinfo)
 {
-	return false; /* To be enabled when kernel is ready */
+	/*
+	 * The TDX Module's internal Dynamic PAMT tree structure can't
+	 * handle physical addresses with more than 48 bits.
+	 */
+	return sysinfo->features.tdx_features0 & TDX_FEATURES0_DYNAMIC_PAMT &&
+	       boot_cpu_data.x86_phys_bits <= 48;
 }
 
 /* Simple structure for pre-allocating Dynamic PAMT pages outside of locks. */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 3544794fb092a..75140511571bf 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1028,8 +1028,9 @@ static __init int construct_tdmrs(struct list_head *tmb_list,
 	return ret;
 }
 
-static __init int config_tdx_module(struct tdmr_info_list *tdmr_list,
-				    u64 global_keyid)
+#define TDX_SYS_CONFIG_DYNAMIC_PAMT	BIT(16)
+
+static __init int config_tdx_module(struct tdmr_info_list *tdmr_list, u64 global_keyid)
 {
 	struct tdx_module_args args = {};
 	u64 *tdmr_pa_array;
@@ -1056,6 +1057,12 @@ static __init int config_tdx_module(struct tdmr_info_list *tdmr_list,
 	args.rcx = __pa(tdmr_pa_array);
 	args.rdx = tdmr_list->nr_consumed_tdmrs;
 	args.r8 = global_keyid;
+
+	if (tdx_supports_dynamic_pamt(&tdx_sysinfo)) {
+		pr_info("Enable Dynamic PAMT\n");
+		args.r8 |= TDX_SYS_CONFIG_DYNAMIC_PAMT;
+	}
+
 	ret = seamcall_prerr(TDH_SYS_CONFIG, &args);
 
 	/* Free the array as it is not required anymore. */
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 8c39dde347cc2..68a68468fbeb6 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -86,9 +86,6 @@ struct tdmr_info {
 	DECLARE_FLEX_ARRAY(struct tdmr_reserved_area, reserved_areas);
 } __packed __aligned(TDMR_INFO_ALIGNMENT);
 
-/* Bit definitions of TDX_FEATURES0 metadata field */
-#define TDX_FEATURES0_NO_RBP_MOD	BIT(18)
-
 /*
  * Do not put any hardware-defined TDX structure representations below
  * this comment!
-- 
2.54.0

From nobody Mon Jun 8 22:51:56 2026
From: Rick Edgecombe
To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com, kas@kernel.org,
 kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, mingo@redhat.com, nik.borisov@suse.com,
 pbonzini@redhat.com, seanjc@google.com, tglx@kernel.org,
 vannapurve@google.com, x86@kernel.org, chao.gao@intel.com,
 yan.y.zhao@intel.com, kai.huang@intel.com
Cc: rick.p.edgecombe@intel.com, "Kirill A. Shutemov"
Subject: [PATCH v6 11/11] Documentation/x86: Add documentation for TDX's Dynamic PAMT
Date: Mon, 25 May 2026 19:35:15 -0700
Message-ID: <20260526023515.288829-12-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.54.0
In-Reply-To: <20260526023515.288829-1-rick.p.edgecombe@intel.com>
References: <20260526023515.288829-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: "Kirill A. Shutemov"

Expand TDX documentation to include information on the Dynamic PAMT
feature. The new section explains PAMT support in the TDX module and how
Dynamic PAMT affects the kernel memory use.

Assisted-by: Sashiko:claude-opus-4-6 GitHub Copilot:claude-opus-4-6 Claude:claude-opus-4-7
Signed-off-by: Kirill A. Shutemov
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
Reviewed-by: Tony Lindgren
---
v6:
 - Add missing word (Binbin)
 - Use "::" instead of ":"
 - Make format of dmesg example accurate

v3:
 - Trim down docs to be about things that user cares about, instead of
   development history and other details like this.
---
 Documentation/arch/x86/tdx.rst | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/Documentation/arch/x86/tdx.rst b/Documentation/arch/x86/tdx.rst
index ff6b110291bc6..ce026a88b6f78 100644
--- a/Documentation/arch/x86/tdx.rst
+++ b/Documentation/arch/x86/tdx.rst
@@ -73,6 +73,28 @@ initialize::
 
   [..] virt/tdx: TDX-Module initialization failed ...
 
+Dynamic PAMT
+------------
+
+PAMT is memory that the TDX module needs to keep data about each page
+(think like struct page). It needs to be handed to the TDX module for its
+exclusive use. For normal PAMT, this is installed when the TDX module
+is first loaded and comes to about 0.4% of system memory.
+
+Dynamic PAMT is a TDX feature that allows VMM to allocate part of the
+PAMT as needed (the parts for tracking 4KB size pages). The other page
+sizes (1GB and 2MB) are still allocated statically at the time of
+TDX module initialization. This reduces the amount of memory that TDX
+uses while TDs are not in use.
+
+When Dynamic PAMT is in use, dmesg shows it like::
+
+  [..] virt/tdx: Enable Dynamic PAMT
+  [..] virt/tdx: 10092 KB allocated for PAMT
+  [..] virt/tdx: TDX-Module initialized
+
+Dynamic PAMT is enabled automatically if supported.
+
 TDX Interaction to Other Kernel Components
 ------------------------------------------
 
-- 
2.54.0
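To put the "about 0.4% of system memory" figure from the documentation
text into numbers, here is a rough back-of-the-envelope helper in C. It
assumes nothing beyond that approximate percentage. The function name,
the macro and the 1 TiB input used below are hypothetical, and the result
is an estimate of the normal (non-dynamic) PAMT only, not what the TDX
module reports in dmesg.

```c
#include <stdint.h>

/* "About 0.4%" from the documentation text, expressed as parts per thousand. */
#define STATIC_PAMT_PER_MILLE	4

/*
 * Rough estimate of the normal PAMT size for a given amount of system
 * memory. Dividing first avoids overflow for large inputs, at the cost
 * of rounding down by less than 4 KB.
 */
static uint64_t static_pamt_estimate_bytes(uint64_t system_bytes)
{
	return system_bytes / 1000 * STATIC_PAMT_PER_MILLE;
}
```

For a host with 1 TiB of memory, static_pamt_estimate_bytes(1ULL << 40)
comes to a little over 4 GiB. Per the documentation above, Dynamic PAMT
defers the part of this that tracks 4KB pages until it is needed.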