From nobody Sun Feb 8 15:15:50 2026
From: "Kirill A. Shutemov"
To: pbonzini@redhat.com, seanjc@google.com, dave.hansen@linux.intel.com
Cc: rick.p.edgecombe@intel.com, isaku.yamahata@intel.com, kai.huang@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, "Kirill A.
Shutemov" Subject: [PATCHv2 01/12] x86/tdx: Consolidate TDX error handling Date: Mon, 9 Jun 2025 22:13:29 +0300 Message-ID: <20250609191340.2051741-2-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.47.2 In-Reply-To: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com> References: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Move all (host, kvm, guest) code related to TDX error handling into . Add inline functions to check errors. Signed-off-by: Kirill A. Shutemov --- arch/x86/coco/tdx/tdx.c | 6 +- arch/x86/include/asm/tdx.h | 21 +------ arch/x86/{kvm/vmx =3D> include/asm}/tdx_errno.h | 60 +++++++++++++++++-- arch/x86/kvm/vmx/tdx.c | 18 ++---- arch/x86/kvm/vmx/tdx.h | 1 - 5 files changed, 63 insertions(+), 43 deletions(-) rename arch/x86/{kvm/vmx =3D> include/asm}/tdx_errno.h (52%) diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index edab6d6049be..6505bfcd2a0d 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -33,10 +33,6 @@ #define VE_GET_PORT_NUM(e) ((e) >> 16) #define VE_IS_IO_STRING(e) ((e) & BIT(4)) =20 -/* TDX Module call error codes */ -#define TDCALL_RETURN_CODE(a) ((a) >> 32) -#define TDCALL_INVALID_OPERAND 0xc0000100 - #define TDREPORT_SUBTYPE_0 0 =20 static atomic_long_t nr_shared; @@ -127,7 +123,7 @@ int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport) =20 ret =3D __tdcall(TDG_MR_REPORT, &args); if (ret) { - if (TDCALL_RETURN_CODE(ret) =3D=3D TDCALL_INVALID_OPERAND) + if (tdx_operand_invalid(ret)) return -EINVAL; return -EIO; } diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 26ffc792e673..9649308bd9c0 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -10,28 +10,9 @@ #include #include #include +#include #include =20 -/* - * SW-defined error codes. - * - * Bits 47:40 =3D=3D 0xFF indicate Reserved status code class that never u= sed by - * TDX module. 
- */ -#define TDX_ERROR _BITUL(63) -#define TDX_NON_RECOVERABLE _BITUL(62) -#define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40)) -#define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000)) - -#define TDX_SEAMCALL_GP (TDX_SW_ERROR | X86_TRAP_GP) -#define TDX_SEAMCALL_UD (TDX_SW_ERROR | X86_TRAP_UD) - -/* - * TDX module SEAMCALL leaf function error codes - */ -#define TDX_SUCCESS 0ULL -#define TDX_RND_NO_ENTROPY 0x8000020300000000ULL - #ifndef __ASSEMBLER__ =20 #include diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/include/asm/tdx_errno.h similarity index 52% rename from arch/x86/kvm/vmx/tdx_errno.h rename to arch/x86/include/asm/tdx_errno.h index 6ff4672c4181..d418934176e2 100644 --- a/arch/x86/kvm/vmx/tdx_errno.h +++ b/arch/x86/include/asm/tdx_errno.h @@ -1,14 +1,13 @@ /* SPDX-License-Identifier: GPL-2.0 */ /* architectural status code for SEAMCALL */ =20 -#ifndef __KVM_X86_TDX_ERRNO_H -#define __KVM_X86_TDX_ERRNO_H - -#define TDX_SEAMCALL_STATUS_MASK 0xFFFFFFFF00000000ULL +#ifndef _X86_TDX_ERRNO_H +#define _X86_TDX_ERRNO_H =20 /* * TDX SEAMCALL Status Codes (returned in RAX) */ +#define TDX_SUCCESS 0ULL #define TDX_NON_RECOVERABLE_VCPU 0x4000000100000000ULL #define TDX_NON_RECOVERABLE_TD 0x4000000200000000ULL #define TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE 0x6000000500000000ULL @@ -17,6 +16,7 @@ #define TDX_OPERAND_INVALID 0xC000010000000000ULL #define TDX_OPERAND_BUSY 0x8000020000000000ULL #define TDX_PREVIOUS_TLB_EPOCH_BUSY 0x8000020100000000ULL +#define TDX_RND_NO_ENTROPY 0x8000020300000000ULL #define TDX_PAGE_METADATA_INCORRECT 0xC000030000000000ULL #define TDX_VCPU_NOT_ASSOCIATED 0x8000070200000000ULL #define TDX_KEY_GENERATION_FAILED 0x8000080000000000ULL @@ -37,4 +37,54 @@ #define TDX_OPERAND_ID_SEPT 0x92 #define TDX_OPERAND_ID_TD_EPOCH 0xa9 =20 -#endif /* __KVM_X86_TDX_ERRNO_H */ +#define TDX_STATUS_MASK 0xFFFFFFFF00000000ULL + +/* + * SW-defined error codes. + * + * Bits 47:40 =3D=3D 0xFF indicate Reserved status code class that never u= sed by + * TDX module. + */ +#define TDX_ERROR _BITULL(63) +#define TDX_NON_RECOVERABLE _BITULL(62) +#define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40)) +#define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000)) + +#define TDX_SEAMCALL_GP (TDX_SW_ERROR | X86_TRAP_GP) +#define TDX_SEAMCALL_UD (TDX_SW_ERROR | X86_TRAP_UD) + +#ifndef __ASSEMBLER__ +#include +#include + +static inline u64 tdx_status(u64 err) +{ + return err & TDX_STATUS_MASK; +} + +static inline bool tdx_sw_error(u64 err) +{ + return (err & TDX_SW_ERROR) =3D=3D TDX_SW_ERROR; +} + +static inline bool tdx_success(u64 err) +{ + return tdx_status(err) =3D=3D TDX_SUCCESS; +} + +static inline bool tdx_rnd_no_entropy(u64 err) +{ + return tdx_status(err) =3D=3D TDX_RND_NO_ENTROPY; +} + +static inline bool tdx_operand_invalid(u64 err) +{ + return tdx_status(err) =3D=3D TDX_OPERAND_INVALID; +} + +static inline bool tdx_operand_busy(u64 err) +{ + return tdx_status(err) =3D=3D TDX_OPERAND_BUSY; +} +#endif /* __ASSEMBLER__ */ +#endif /* _X86_TDX_ERRNO_H */ diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index b952bc673271..7a48bd901536 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -202,12 +202,6 @@ static DEFINE_MUTEX(tdx_lock); =20 static atomic_t nr_configured_hkid; =20 -static bool tdx_operand_busy(u64 err) -{ - return (err & TDX_SEAMCALL_STATUS_MASK) =3D=3D TDX_OPERAND_BUSY; -} - - /* * A per-CPU list of TD vCPUs associated with a given CPU. * Protected by interrupt mask. 
Only manipulated by the CPU owning this pe= r-CPU @@ -895,7 +889,7 @@ static __always_inline u32 tdx_to_vmx_exit_reason(struc= t kvm_vcpu *vcpu) struct vcpu_tdx *tdx =3D to_tdx(vcpu); u32 exit_reason; =20 - switch (tdx->vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) { + switch (tdx_status(tdx->vp_enter_ret)) { case TDX_SUCCESS: case TDX_NON_RECOVERABLE_VCPU: case TDX_NON_RECOVERABLE_TD: @@ -1957,7 +1951,7 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t= fastpath) * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and * TDX_SEAMCALL_VMFAILINVALID. */ - if (unlikely((vp_enter_ret & TDX_SW_ERROR) =3D=3D TDX_SW_ERROR)) { + if (tdx_sw_error(vp_enter_ret)) { KVM_BUG_ON(!kvm_rebooting, vcpu->kvm); goto unhandled_exit; } @@ -1982,7 +1976,7 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t= fastpath) } =20 WARN_ON_ONCE(exit_reason.basic !=3D EXIT_REASON_TRIPLE_FAULT && - (vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) !=3D TDX_SUCCESS); + !tdx_success(vp_enter_ret)); =20 switch (exit_reason.basic) { case EXIT_REASON_TRIPLE_FAULT: @@ -2428,7 +2422,7 @@ static int __tdx_td_init(struct kvm *kvm, struct td_p= arams *td_params, err =3D tdh_mng_create(&kvm_tdx->td, kvm_tdx->hkid); mutex_unlock(&tdx_lock); =20 - if (err =3D=3D TDX_RND_NO_ENTROPY) { + if (tdx_rnd_no_entropy(err)) { ret =3D -EAGAIN; goto free_packages; } @@ -2470,7 +2464,7 @@ static int __tdx_td_init(struct kvm *kvm, struct td_p= arams *td_params, kvm_tdx->td.tdcs_pages =3D tdcs_pages; for (i =3D 0; i < kvm_tdx->td.tdcs_nr_pages; i++) { err =3D tdh_mng_addcx(&kvm_tdx->td, tdcs_pages[i]); - if (err =3D=3D TDX_RND_NO_ENTROPY) { + if (tdx_rnd_no_entropy(err)) { /* Here it's hard to allow userspace to retry. */ ret =3D -EAGAIN; goto teardown; @@ -2483,7 +2477,7 @@ static int __tdx_td_init(struct kvm *kvm, struct td_p= arams *td_params, } =20 err =3D tdh_mng_init(&kvm_tdx->td, __pa(td_params), &rcx); - if ((err & TDX_SEAMCALL_STATUS_MASK) =3D=3D TDX_OPERAND_INVALID) { + if (tdx_operand_invalid(err)) { /* * Because a user gives operands, don't warn. 
* Return a hint to the user because it's sometimes hard for the diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index 51f98443e8a2..dba23f1d21cb 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -3,7 +3,6 @@ #define __KVM_X86_VMX_TDX_H =20 #include "tdx_arch.h" -#include "tdx_errno.h" =20 #ifdef CONFIG_KVM_INTEL_TDX #include "common.h" --=20 2.47.2 From nobody Sun Feb 8 15:15:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.14]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6F37121CC4A; Mon, 9 Jun 2025 19:13:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.14 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496436; cv=none; b=ti590ZBRvl9bkykpXWnM2wU931d+9nBfjGxwXJs+ZSLiBXJlUpaE6xl50y7LtkbL5vw/zZ56BlOD6ssqh631qXzcjysSRl6zfFRzbBYphrA/e5Ntxh7qkXHT0rPtW+cAW4Kn8swZQKHI3GRxyTc6WQ33FmC5sGVb/xwW59lBIL8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496436; c=relaxed/simple; bh=qfB8VMSGySBtQwc24w5kS+61cOYDH+0r3VJ7G+xrbns=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jfbx5zM0yYupd/ZDcNnYC/2anckOcDWy+fTTklGvPXltFtex80vvBwebV9zteHhnvkAjem00UOhQM5tGfOYH6Sm0DW+DSNGuhd1igPg1uy6tJGkIN1YLpW9NMIQNjRs9FH3iGgHmZ/IxESWag+rkme5PzYXcAlXXCthIYAcDDUo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.helo=mgamail.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=cRcGCzSR; arc=none smtp.client-ip=192.198.163.14 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.helo=mgamail.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="cRcGCzSR" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1749496435; x=1781032435; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=qfB8VMSGySBtQwc24w5kS+61cOYDH+0r3VJ7G+xrbns=; b=cRcGCzSR163+wR7LZztxFsKYLdB+9mwLIf+p4IRS/KGWAppcYiZ7cLJx Uls/gvafrKRJPGIXeRfZkT/WGiSRQ++xmFGYEWFtZXLpPKo9QM6E+0Hfk ry/Omr4ljxDTtv1FI8aLmGP0fpu42iXPfQXRPB+BDKclCSJuF8nhdj0mb 063G4Iufvy8k5XHWHz0yV6DnJRbpqEOrhVm0+vxtshJHImYe/5Tyqgxc1 Rcp/mtY8yJferThr03IiLvrc6lTIp2sPgzfE1YZA/0JboHpRKNiiwaXeF tZ7QXpdQmhop8plWB1XENFvaepmWzSer6C9VaB/009HzOpVCajCfu02OQ Q==; X-CSE-ConnectionGUID: 1qzGlIQWRRim4+Q/Fpx9Sg== X-CSE-MsgGUID: hj8FW4tHTe+cY/TXquQPRA== X-IronPort-AV: E=McAfee;i="6800,10657,11459"; a="51681740" X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; d="scan'208";a="51681740" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa108.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Jun 2025 12:13:54 -0700 X-CSE-ConnectionGUID: B8SZt3eeQKKg6gaHNVT4ag== X-CSE-MsgGUID: YlEQp6tfSA22ulyBZtOKGw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; d="scan'208";a="147174157" Received: from black.fi.intel.com ([10.237.72.28]) by orviesa007.jf.intel.com with ESMTP; 09 Jun 2025 12:13:51 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 0F88E290; Mon, 09 Jun 2025 22:13:49 +0300 (EEST) From: "Kirill A. 
Shutemov"
To: pbonzini@redhat.com, seanjc@google.com, dave.hansen@linux.intel.com
Cc: rick.p.edgecombe@intel.com, isaku.yamahata@intel.com, kai.huang@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCHv2 02/12] x86/virt/tdx: Allocate page bitmap for Dynamic PAMT
Date: Mon, 9 Jun 2025 22:13:30 +0300
Message-ID: <20250609191340.2051741-3-kirill.shutemov@linux.intel.com>
In-Reply-To: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com>
References: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com>

The Physical Address Metadata Table (PAMT) holds TDX metadata for
physical memory and must be allocated by the kernel during TDX module
initialization.

The exact size of the required PAMT memory is determined by the TDX
module and may vary between TDX module versions, but currently it is
approximately 0.4% of system memory. This is a significant commitment,
especially if it is not known upfront whether the machine will ever run
TDX guests.

The Dynamic PAMT feature reduces static PAMT allocations. The PAMT_1G
and PAMT_2M levels are still allocated at TDX module initialization, but
the PAMT_4K level is allocated dynamically, reducing static allocations
to approximately 0.004% of system memory.

With Dynamic PAMT, the kernel no longer needs to allocate PAMT_4K at
boot, but it must allocate a page bitmap instead. The TDX module reports
how many bits per page the bitmap needs (currently 1).

Allocate the bitmap if the kernel boots on a machine with Dynamic PAMT.
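As a rough illustration of the bitmap cost (an editorial example, not
part of the original patch, assuming the 1 bit per 4K page the TDX
module currently reports), tdmr_get_pamt_bitmap_sz() works out to the
following for a 1 GiB TDMR:

	nr_pamt_entries = SZ_1G >> PAGE_SHIFT;              /* 262144 pages   */
	pamt_sz = DIV_ROUND_UP(262144 * 1, BITS_PER_BYTE);  /* 32768 bytes    */
	ALIGN(32768, PAGE_SIZE);                            /* 32 KiB per GiB */

That is roughly 0.003% of the covered range, compared with the ~0.4%
quoted above for a fully static PAMT.

Signed-off-by: Kirill A.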
Shutemov --- arch/x86/include/asm/tdx.h | 5 +++++ arch/x86/include/asm/tdx_global_metadata.h | 1 + arch/x86/virt/vmx/tdx/tdx.c | 23 ++++++++++++++++++++- arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 3 +++ 4 files changed, 31 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 9649308bd9c0..583d6fe66821 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -106,6 +106,11 @@ int tdx_enable(void); const char *tdx_dump_mce_info(struct mce *m); const struct tdx_sys_info *tdx_get_sysinfo(void); =20 +static inline bool tdx_supports_dynamic_pamt(const struct tdx_sys_info *sy= sinfo) +{ + return false; /* To be enabled when kernel is ready */ +} + int tdx_guest_keyid_alloc(void); u32 tdx_get_nr_guest_keyids(void); void tdx_guest_keyid_free(unsigned int keyid); diff --git a/arch/x86/include/asm/tdx_global_metadata.h b/arch/x86/include/= asm/tdx_global_metadata.h index 060a2ad744bf..5eb808b23997 100644 --- a/arch/x86/include/asm/tdx_global_metadata.h +++ b/arch/x86/include/asm/tdx_global_metadata.h @@ -15,6 +15,7 @@ struct tdx_sys_info_tdmr { u16 pamt_4k_entry_size; u16 pamt_2m_entry_size; u16 pamt_1g_entry_size; + u8 pamt_page_bitmap_entry_bits; }; =20 struct tdx_sys_info_td_ctrl { diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 2457d13c3f9e..18179eb26eb9 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -470,6 +470,18 @@ static unsigned long tdmr_get_pamt_sz(struct tdmr_info= *tdmr, int pgsz, return pamt_sz; } =20 +static unsigned long tdmr_get_pamt_bitmap_sz(struct tdmr_info *tdmr) +{ + unsigned long pamt_sz, nr_pamt_entries; + int bits_per_entry; + + bits_per_entry =3D tdx_sysinfo.tdmr.pamt_page_bitmap_entry_bits; + nr_pamt_entries =3D tdmr->size >> PAGE_SHIFT; + pamt_sz =3D DIV_ROUND_UP(nr_pamt_entries * bits_per_entry, BITS_PER_BYTE); + + return ALIGN(pamt_sz, PAGE_SIZE); +} + /* * Locate a NUMA node which should hold the allocation of the @tdmr * PAMT. This node will have some memory covered by the TDMR. The @@ -522,7 +534,16 @@ static int tdmr_set_up_pamt(struct tdmr_info *tdmr, * and the total PAMT size. 
*/ tdmr_pamt_size =3D 0; - for (pgsz =3D TDX_PS_4K; pgsz < TDX_PS_NR; pgsz++) { + pgsz =3D TDX_PS_4K; + + /* With Dynamic PAMT, PAMT_4K is replaced with a bitmap */ + if (tdx_supports_dynamic_pamt(&tdx_sysinfo)) { + pamt_size[pgsz] =3D tdmr_get_pamt_bitmap_sz(tdmr); + tdmr_pamt_size +=3D pamt_size[pgsz]; + pgsz++; + } + + for (; pgsz < TDX_PS_NR; pgsz++) { pamt_size[pgsz] =3D tdmr_get_pamt_sz(tdmr, pgsz, pamt_entry_size[pgsz]); tdmr_pamt_size +=3D pamt_size[pgsz]; diff --git a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c b/arch/x86/virt/vm= x/tdx/tdx_global_metadata.c index 13ad2663488b..683925bcc9eb 100644 --- a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c +++ b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c @@ -33,6 +33,9 @@ static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdmr= *sysinfo_tdmr) sysinfo_tdmr->pamt_2m_entry_size =3D val; if (!ret && !(ret =3D read_sys_metadata_field(0x9100000100000012, &val))) sysinfo_tdmr->pamt_1g_entry_size =3D val; + if (!ret && tdx_supports_dynamic_pamt(&tdx_sysinfo) && + !(ret =3D read_sys_metadata_field(0x9100000100000013, &val))) + sysinfo_tdmr->pamt_page_bitmap_entry_bits =3D val; =20 return ret; } --=20 2.47.2 From nobody Sun Feb 8 15:15:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.14]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3342B220F30; Mon, 9 Jun 2025 19:13:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.14 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496441; cv=none; b=rRLepc+sJgemhix/AYcCCWUBw+7K0v6IgyRcAiyGaTwG83Lwv2HsspiO0HJN01r8oKHrvu/4XiT9x/vJu/XbyXtVTbJHk4ZTp0l2ibguZt/q/YPGJ6ILdV3GmbJ0zYIFcUejBlLVHLGjqUNHZRcpg02eMqELLJUfNmLhP0G34LE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496441; c=relaxed/simple; bh=w7pkZDeT2pFDEb3Ky2qV2DXqIV/kyuBmg0NR3kbxln0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=o6w4PXsLR6zmvdao4vzNjdR7Jrykp/8j2zThAcAXaXFBdofHPPKpnb8KNy7yEsSwmWNY0ni7hY+LnycGnpy9zqK56MmHWxJtPCn224VAYXE9hBqfC3+8OY7DjVjNZsS2gjCxyWhNUo6qyrsT6407JiK8JiWiMBDzkyT1Qn70DKU= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.helo=mgamail.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=GYZhT4Dd; arc=none smtp.client-ip=192.198.163.14 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.helo=mgamail.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="GYZhT4Dd" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1749496439; x=1781032439; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=w7pkZDeT2pFDEb3Ky2qV2DXqIV/kyuBmg0NR3kbxln0=; b=GYZhT4Dd1CioqR90hp79OqwxIVnAqUwpdTTKOotbbSjbHeZP3FjlaJHY T+nMA3tMdsw6b0uTxMVqaRr2pcytok4U97ZQlnL1+kDzWGYs/AX2IJG9M Nd8xQ3qgTry0KDQ+m1S1MApAMXx+i4y+gqrVc51RgKS6hYvG8ogFv447Z a8Q9rw2ky/kDZOKGLJQbAIw/vFKx5L293+RGVtE6RsVPdbHzKrv2SIbY4 rK9oL3uVVQi93Rc1HzyP3mpMotrnnSlzrGhqqpixlNB3ARjWWSCUkY1VM EY0yxz8xCeJmEDUzwPtv2oo89jHEDxvtkk3FrG8Xg5P+TYnAhP3CVw21R A==; X-CSE-ConnectionGUID: svclemQ7QuyWGa8CSP5tOw== X-CSE-MsgGUID: 
From: "Kirill A. Shutemov"
To: pbonzini@redhat.com, seanjc@google.com, dave.hansen@linux.intel.com
Cc: rick.p.edgecombe@intel.com, isaku.yamahata@intel.com, kai.huang@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCHv2 03/12] x86/virt/tdx: Allocate reference counters for PAMT memory
Date: Mon, 9 Jun 2025 22:13:31 +0300
Message-ID: <20250609191340.2051741-4-kirill.shutemov@linux.intel.com>
In-Reply-To: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com>
References: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com>

The PAMT memory holds metadata for TDX-protected memory. With Dynamic
PAMT, PAMT_4K is allocated on demand. The kernel supplies the TDX module
with a page pair that covers 2M of host physical memory.

The kernel must provide this page pair before using pages from the range
for TDX. If this is not done, any SEAMCALL that attempts to use the
memory will fail.

Allocate reference counters for every 2M range to track PAMT memory
usage. This is necessary to accurately determine when PAMT memory needs
to be allocated and when it can be freed.

This allocation will consume 2MiB for every 1TiB of physical memory.

Tracking PAMT memory usage on the kernel side duplicates what the TDX
module does. It would be possible to avoid this by lazily allocating
PAMT memory on SEAMCALL failure and freeing it based on hints provided
by the TDX module when the last user of the PAMT memory is no longer
present. However, this approach complicates serialization.

The TDX module takes locks when dealing with PAMT: a shared lock on any
SEAMCALL that takes an explicit HPA and an exclusive lock on PAMT.ADD
and PAMT.REMOVE. Any SEAMCALL that takes an explicit HPA as an operand
may fail if it races with PAMT.ADD/REMOVE.

Since PAMT is a global resource, preventing such failures would require
global locking on the kernel side (per-TD locking is not sufficient), or
retrying on TDX_OPERAND_BUSY. Neither option is ideal, so tracking PAMT
usage on the kernel side seems like a reasonable alternative.
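A quick sanity check of that number (editorial arithmetic, assuming one
4-byte atomic_t per 2M range, as in the code below):

	1 TiB / PMD_SIZE          = 2^40 / 2^21 = 524288 ranges
	524288 * sizeof(atomic_t) = 524288 * 4  = 2 MiB

i.e. 2MiB of reference counters per 1TiB of physical memory.

Signed-off-by: Kirill A.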
Shutemov --- arch/x86/virt/vmx/tdx/tdx.c | 112 +++++++++++++++++++++++++++++++++++- 1 file changed, 111 insertions(+), 1 deletion(-) diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 18179eb26eb9..ad9d7a30989d 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -29,6 +29,7 @@ #include #include #include +#include #include #include #include @@ -50,6 +51,8 @@ static DEFINE_PER_CPU(bool, tdx_lp_initialized); =20 static struct tdmr_info_list tdx_tdmr_list; =20 +static atomic_t *pamt_refcounts; + static enum tdx_module_status_t tdx_module_status; static DEFINE_MUTEX(tdx_module_lock); =20 @@ -182,6 +185,102 @@ int tdx_cpu_enable(void) } EXPORT_SYMBOL_GPL(tdx_cpu_enable); =20 +static atomic_t *tdx_get_pamt_refcount(unsigned long hpa) +{ + return &pamt_refcounts[hpa / PMD_SIZE]; +} + +static int pamt_refcount_populate(pte_t *pte, unsigned long addr, void *da= ta) +{ + unsigned long vaddr; + pte_t entry; + + if (!pte_none(ptep_get(pte))) + return 0; + + vaddr =3D __get_free_page(GFP_KERNEL | __GFP_ZERO); + if (!vaddr) + return -ENOMEM; + + entry =3D pfn_pte(PFN_DOWN(__pa(vaddr)), PAGE_KERNEL); + + spin_lock(&init_mm.page_table_lock); + if (pte_none(ptep_get(pte))) + set_pte_at(&init_mm, addr, pte, entry); + else + free_page(vaddr); + spin_unlock(&init_mm.page_table_lock); + + return 0; +} + +static int pamt_refcount_depopulate(pte_t *pte, unsigned long addr, + void *data) +{ + unsigned long vaddr; + + vaddr =3D (unsigned long)__va(PFN_PHYS(pte_pfn(ptep_get(pte)))); + + spin_lock(&init_mm.page_table_lock); + if (!pte_none(ptep_get(pte))) { + pte_clear(&init_mm, addr, pte); + free_page(vaddr); + } + spin_unlock(&init_mm.page_table_lock); + + return 0; +} + +static int alloc_pamt_refcount(unsigned long start_pfn, unsigned long end_= pfn) +{ + unsigned long start, end; + + start =3D (unsigned long)tdx_get_pamt_refcount(PFN_PHYS(start_pfn)); + end =3D (unsigned long)tdx_get_pamt_refcount(PFN_PHYS(end_pfn + 1)); + start =3D round_down(start, PAGE_SIZE); + end =3D round_up(end, PAGE_SIZE); + + return apply_to_page_range(&init_mm, start, end - start, + pamt_refcount_populate, NULL); +} + +static int init_pamt_metadata(void) +{ + size_t size =3D max_pfn / PTRS_PER_PTE * sizeof(*pamt_refcounts); + struct vm_struct *area; + + if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) + return 0; + + /* + * Reserve vmalloc range for PAMT reference counters. It covers all + * physical address space up to max_pfn. It is going to be populated + * from init_tdmr() only for present memory that available for TDX use. + */ + area =3D get_vm_area(size, VM_IOREMAP); + if (!area) + return -ENOMEM; + + pamt_refcounts =3D area->addr; + return 0; +} + +static void free_pamt_metadata(void) +{ + size_t size =3D max_pfn / PTRS_PER_PTE * sizeof(*pamt_refcounts); + + if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) + return; + + size =3D round_up(size, PAGE_SIZE); + apply_to_existing_page_range(&init_mm, + (unsigned long)pamt_refcounts, + size, pamt_refcount_depopulate, + NULL); + vfree(pamt_refcounts); + pamt_refcounts =3D NULL; +} + /* * Add a memory region as a TDX memory block. 
The caller must make sure * all memory regions are added in address ascending order and don't @@ -248,6 +347,10 @@ static int build_tdx_memlist(struct list_head *tmb_lis= t) ret =3D add_tdx_memblock(tmb_list, start_pfn, end_pfn, nid); if (ret) goto err; + + ret =3D alloc_pamt_refcount(start_pfn, end_pfn); + if (ret) + goto err; } =20 return 0; @@ -1110,10 +1213,15 @@ static int init_tdx_module(void) */ get_online_mems(); =20 - ret =3D build_tdx_memlist(&tdx_memlist); + /* Reserve vmalloc range for PAMT reference counters */ + ret =3D init_pamt_metadata(); if (ret) goto out_put_tdxmem; =20 + ret =3D build_tdx_memlist(&tdx_memlist); + if (ret) + goto err_free_pamt_metadata; + /* Allocate enough space for constructing TDMRs */ ret =3D alloc_tdmr_list(&tdx_tdmr_list, &tdx_sysinfo.tdmr); if (ret) @@ -1171,6 +1279,8 @@ static int init_tdx_module(void) free_tdmr_list(&tdx_tdmr_list); err_free_tdxmem: free_tdx_memlist(&tdx_memlist); +err_free_pamt_metadata: + free_pamt_metadata(); goto out_put_tdxmem; } =20 --=20 2.47.2 From nobody Sun Feb 8 15:15:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.14]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 269CE21E08A; Mon, 9 Jun 2025 19:13:57 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.14 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496439; cv=none; b=TB/l3wsK8d3SN0M8QR6421w9jfKfaXQsDZqie12VkxyoemQhQVj8Rpz6RCl79jMm2R+/rBJX4WlLc5ZemPAnNAtlyDCP8XK/Cs3GkKf04ElzTY9cF7QIQyM7d41m8BqD/vKe9+IIb2uU6JsMn5o2Nnhxvxoob5rMrtR1sgjPMJ0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496439; c=relaxed/simple; bh=8CmXDZNtUgRwbG5heuuxgJssDszuLMnl/EJ2jEgttpc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=TMktaxMB4wABv7ZdcZ23VeFZnYHCM7QeSRjotY62yn5ZoHg073ITuaY2r3+L+cL7qjAjry65ZdFrK63LFwUWhOmKM90XehML2LQ8arS2gZI02LY6tNFRtIrLIBPPanI/2tnajPwRq3twGGbGYkHoh6JUKjRTEB4FzEZ/EGpc+DQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.helo=mgamail.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=gMSr+kFe; arc=none smtp.client-ip=192.198.163.14 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.helo=mgamail.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="gMSr+kFe" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1749496437; x=1781032437; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=8CmXDZNtUgRwbG5heuuxgJssDszuLMnl/EJ2jEgttpc=; b=gMSr+kFePmE9b/KROqrF39Kxd8CIdFK+O2kUQvwJO5/RsMknbVRaiqXz Jg6D/ZIiCVrf3+n9Pb1DgeyD9PoxiR59G4j0q1pp0vWZSwG00vBJeyb48 Km4vvqNHcLIHVfXsxNJoIOpvQPM5evHpB1ylmBQ3xcujh2dW4JX47RkBZ dsThW9XaYNiwnfxlrOlC41AzjfWvnuzcXC8Qz6qWNnAujyqlkEGQrlvrh 0k0vMdWmDtkm9wZ2PDFhoyqj7tX7dLiAp7vPocMawKVA1x0FcVPtR7MkT Xdiaqb8hahDg1a4/gR+XK2HX5kz/b2mYQrIjeSY/+v/FVMq6C6Z9U/vgL Q==; X-CSE-ConnectionGUID: 6RIVesiDSQek+1LlQRf9vw== X-CSE-MsgGUID: Vs+IUYbkQMa35BFClz+HAw== X-IronPort-AV: E=McAfee;i="6800,10657,11459"; a="51681750" X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; 
From: "Kirill A. Shutemov"
To: pbonzini@redhat.com, seanjc@google.com, dave.hansen@linux.intel.com
Cc: rick.p.edgecombe@intel.com, isaku.yamahata@intel.com, kai.huang@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCHv2 04/12] x86/virt/tdx: Add tdx_alloc/free_page() helpers
Date: Mon, 9 Jun 2025 22:13:32 +0300
Message-ID: <20250609191340.2051741-5-kirill.shutemov@linux.intel.com>
In-Reply-To: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com>
References: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com>

The new helpers allocate and free pages that can be used for a TD.
Besides page allocation and freeing, these helpers also take care of
managing PAMT memory when the kernel runs on a platform that supports
Dynamic PAMT.

The tdx_pamt_get()/tdx_pamt_put() helpers handle PAMT allocation and
freeing, and the refcounting behind it. PAMT memory is allocated when
the refcount for a 2M range goes from 0 to 1, and it is freed when the
refcount drops back to zero. These transitions can happen concurrently;
the pamt_lock spinlock serializes them.
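In short, the lifecycle implemented by the code below (an editorial
summary, not part of the original message):

	tdx_alloc_page(): alloc_page() + tdx_pamt_get(page, PG_LEVEL_4K)
	tdx_free_page():  tdx_pamt_put(page, PG_LEVEL_4K) + __free_page(page)

	refcount 0 -> 1:   take pamt_lock, TDH.PHYMEM.PAMT.ADD backing pages
	                   for the 2M range
	refcount n -> n+1: atomic_inc_not_zero(), no SEAMCALL needed
	refcount 1 -> 0:   take pamt_lock, TDH.PHYMEM.PAMT.REMOVE and free
	                   the PAMT pages

Signed-off-by: Kirill A.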
Shutemov --- arch/x86/include/asm/tdx.h | 3 + arch/x86/include/asm/tdx_errno.h | 6 + arch/x86/virt/vmx/tdx/tdx.c | 205 +++++++++++++++++++++++++++++++ arch/x86/virt/vmx/tdx/tdx.h | 2 + 4 files changed, 216 insertions(+) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 583d6fe66821..d9a77147412f 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -115,6 +115,9 @@ int tdx_guest_keyid_alloc(void); u32 tdx_get_nr_guest_keyids(void); void tdx_guest_keyid_free(unsigned int keyid); =20 +struct page *tdx_alloc_page(void); +void tdx_free_page(struct page *page); + struct tdx_td { /* TD root structure: */ struct page *tdr_page; diff --git a/arch/x86/include/asm/tdx_errno.h b/arch/x86/include/asm/tdx_er= rno.h index d418934176e2..0b3332c2d6b2 100644 --- a/arch/x86/include/asm/tdx_errno.h +++ b/arch/x86/include/asm/tdx_errno.h @@ -18,6 +18,7 @@ #define TDX_PREVIOUS_TLB_EPOCH_BUSY 0x8000020100000000ULL #define TDX_RND_NO_ENTROPY 0x8000020300000000ULL #define TDX_PAGE_METADATA_INCORRECT 0xC000030000000000ULL +#define TDX_HPA_RANGE_NOT_FREE 0xC000030400000000ULL #define TDX_VCPU_NOT_ASSOCIATED 0x8000070200000000ULL #define TDX_KEY_GENERATION_FAILED 0x8000080000000000ULL #define TDX_KEY_STATE_INCORRECT 0xC000081100000000ULL @@ -86,5 +87,10 @@ static inline bool tdx_operand_busy(u64 err) { return tdx_status(err) =3D=3D TDX_OPERAND_BUSY; } + +static inline bool tdx_hpa_range_not_free(u64 err) +{ + return tdx_status(err) =3D=3D TDX_HPA_RANGE_NOT_FREE; +} #endif /* __ASSEMBLER__ */ #endif /* _X86_TDX_ERRNO_H */ diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index ad9d7a30989d..c514c60e8c8d 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -2000,3 +2000,208 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct pa= ge *page) return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args); } EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_hkid); + +static int tdx_nr_pamt_pages(void) +{ + if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) + return 0; + + return tdx_sysinfo.tdmr.pamt_4k_entry_size * PTRS_PER_PTE / PAGE_SIZE; +} + +static u64 tdh_phymem_pamt_add(unsigned long hpa, + struct list_head *pamt_pages) +{ + struct tdx_module_args args =3D { + .rcx =3D hpa, + }; + struct page *page; + u64 *p; + + WARN_ON_ONCE(!IS_ALIGNED(hpa & PAGE_MASK, PMD_SIZE)); + + p =3D &args.rdx; + list_for_each_entry(page, pamt_pages, lru) { + *p =3D page_to_phys(page); + p++; + } + + return seamcall(TDH_PHYMEM_PAMT_ADD, &args); +} + +static u64 tdh_phymem_pamt_remove(unsigned long hpa, + struct list_head *pamt_pages) +{ + struct tdx_module_args args =3D { + .rcx =3D hpa, + }; + struct page *page; + u64 *p, ret; + + WARN_ON_ONCE(!IS_ALIGNED(hpa & PAGE_MASK, PMD_SIZE)); + + ret =3D seamcall_ret(TDH_PHYMEM_PAMT_REMOVE, &args); + if (ret) + return ret; + + p =3D &args.rdx; + for (int i =3D 0; i < tdx_nr_pamt_pages(); i++) { + page =3D phys_to_page(*p); + list_add(&page->lru, pamt_pages); + p++; + } + + return ret; +} + +static DEFINE_SPINLOCK(pamt_lock); + +static void tdx_free_pamt_pages(struct list_head *pamt_pages) +{ + struct page *page; + + while ((page =3D list_first_entry_or_null(pamt_pages, struct page, lru)))= { + list_del(&page->lru); + __free_page(page); + } +} + +static int tdx_alloc_pamt_pages(struct list_head *pamt_pages) +{ + for (int i =3D 0; i < tdx_nr_pamt_pages(); i++) { + struct page *page =3D alloc_page(GFP_KERNEL); + if (!page) + goto fail; + list_add(&page->lru, pamt_pages); + } + return 0; +fail: + tdx_free_pamt_pages(pamt_pages); + return 
-ENOMEM; +} + +static int tdx_pamt_add(atomic_t *pamt_refcount, unsigned long hpa, + struct list_head *pamt_pages) +{ + u64 err; + + guard(spinlock)(&pamt_lock); + + hpa =3D ALIGN_DOWN(hpa, PMD_SIZE); + + /* Lost race to other tdx_pamt_add() */ + if (atomic_read(pamt_refcount) !=3D 0) { + atomic_inc(pamt_refcount); + return 1; + } + + err =3D tdh_phymem_pamt_add(hpa | TDX_PS_2M, pamt_pages); + + /* + * tdx_hpa_range_not_free() is true if current task won race + * against tdx_pamt_put(). + */ + if (err && !tdx_hpa_range_not_free(err)) { + pr_err("TDH_PHYMEM_PAMT_ADD failed: %#llx\n", err); + return -EIO; + } + + atomic_set(pamt_refcount, 1); + + if (tdx_hpa_range_not_free(err)) + return 1; + + return 0; +} + +static int tdx_pamt_get(struct page *page, enum pg_level level) +{ + unsigned long hpa =3D page_to_phys(page); + atomic_t *pamt_refcount; + LIST_HEAD(pamt_pages); + int ret; + + if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) + return 0; + + if (level !=3D PG_LEVEL_4K) + return 0; + + pamt_refcount =3D tdx_get_pamt_refcount(hpa); + WARN_ON_ONCE(atomic_read(pamt_refcount) < 0); + + if (atomic_inc_not_zero(pamt_refcount)) + return 0; + + if (tdx_alloc_pamt_pages(&pamt_pages)) + return -ENOMEM; + + ret =3D tdx_pamt_add(pamt_refcount, hpa, &pamt_pages); + if (ret) + tdx_free_pamt_pages(&pamt_pages); + + return ret >=3D 0 ? 0 : ret; +} + +static void tdx_pamt_put(struct page *page, enum pg_level level) +{ + unsigned long hpa =3D page_to_phys(page); + atomic_t *pamt_refcount; + LIST_HEAD(pamt_pages); + u64 err; + + if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) + return; + + if (level !=3D PG_LEVEL_4K) + return; + + hpa =3D ALIGN_DOWN(hpa, PMD_SIZE); + + pamt_refcount =3D tdx_get_pamt_refcount(hpa); + if (!atomic_dec_and_test(pamt_refcount)) + return; + + scoped_guard(spinlock, &pamt_lock) { + /* Lost race against tdx_pamt_add()? 
*/ + if (atomic_read(pamt_refcount) !=3D 0) + return; + + err =3D tdh_phymem_pamt_remove(hpa | TDX_PS_2M, &pamt_pages); + + if (err) { + atomic_inc(pamt_refcount); + pr_err("TDH_PHYMEM_PAMT_REMOVE failed: %#llx\n", err); + return; + } + } + + tdx_free_pamt_pages(&pamt_pages); +} + +struct page *tdx_alloc_page(void) +{ + struct page *page; + + page =3D alloc_page(GFP_KERNEL); + if (!page) + return NULL; + + if (tdx_pamt_get(page, PG_LEVEL_4K)) { + __free_page(page); + return NULL; + } + + return page; +} +EXPORT_SYMBOL_GPL(tdx_alloc_page); + +void tdx_free_page(struct page *page) +{ + if (!page) + return; + + tdx_pamt_put(page, PG_LEVEL_4K); + __free_page(page); +} +EXPORT_SYMBOL_GPL(tdx_free_page); diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index 82bb82be8567..46c4214b79fb 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -46,6 +46,8 @@ #define TDH_PHYMEM_PAGE_WBINVD 41 #define TDH_VP_WR 43 #define TDH_SYS_CONFIG 45 +#define TDH_PHYMEM_PAMT_ADD 58 +#define TDH_PHYMEM_PAMT_REMOVE 59 =20 /* * SEAMCALL leaf: --=20 2.47.2 From nobody Sun Feb 8 15:15:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 79F60220F47; Mon, 9 Jun 2025 19:13:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496441; cv=none; b=sgp9Zb/EJzC9vNYZrLe5MM1ANkShzoHgonALQ1+y3yF0rjpAChCmGPJpSJz45vaLlXegIIZ/JKQCuiUG43SJKM5RFhRE9NFWnjtWFKwLzv1tSQ5rA18QxzNHbOMeEpzjm0MFZxPQonXqZbWmgRjWfZ3AdzWU21SCSUtd5ffk/8I= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496441; c=relaxed/simple; bh=glNGd8+NYynrmWy1sxr86AYIXrmFKY1sbbXyi9StH7s=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=GoszvMG68e7qalU4xUWQtp5uwrhVabYosOdEdH/cjl6QF7GbHC0xGr2zvF9EyT9bwygAgwwzMhA3HhzREKbj/kDUzyL4x5QBbruYiAADmpNafJgHidw+QxWDaIxr5L2GVefrSEDbuxYNySWjshmT+BMhlA1l0XTBv2h1+P4ZsbI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.helo=mgamail.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=MpE1Cek9; arc=none smtp.client-ip=198.175.65.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.helo=mgamail.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="MpE1Cek9" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1749496439; x=1781032439; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=glNGd8+NYynrmWy1sxr86AYIXrmFKY1sbbXyi9StH7s=; b=MpE1Cek9dyXQl4i2gAP1rzFiEi3Plqz1vwgvsTUseIKHyPeFLF1oJfQl b5pd0/Ei/2W/3aOoiLVPlVVQ1VXgtnzW7xMnKNGzb8+q8wNHnxY7GjxIX wWQXCMQa2RKwMeKSm5Z3Gd81Wq+ZDtUs9Rk2jnMWXRDEmBUy3UtatM1yF kDrp6PKQya5OmKUaDoFTtOUOlxXt2oRQAlVuSEhJHZPPaDa5ghB9Hbc2R PG716qvHmKPJ/k9JZT89yAObE+tP8gfPlCu5sBvQAZlr7gnt8ac4kk0Nu wZLYjwK9JjEii01k2Vl9Q0crUsP2yOUhROHp6bGY+UoXcQzL3BBRLh6eO g==; X-CSE-ConnectionGUID: +Oy6WJFuS8eEAeZNHDuzVA== X-CSE-MsgGUID: nh5roFAsQymLDhtPyBOi6Q== X-IronPort-AV: E=McAfee;i="6800,10657,11459"; a="51467262" 
From: "Kirill A. Shutemov"
To: pbonzini@redhat.com, seanjc@google.com, dave.hansen@linux.intel.com
Cc: rick.p.edgecombe@intel.com, isaku.yamahata@intel.com, kai.huang@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCHv2 05/12] KVM: TDX: Allocate PAMT memory in __tdx_td_init()
Date: Mon, 9 Jun 2025 22:13:33 +0300
Message-ID: <20250609191340.2051741-6-kirill.shutemov@linux.intel.com>
In-Reply-To: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com>
References: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com>

Allocate PAMT memory for TDH.MNG.CREATE and TDH.MNG.ADDCX.

PAMT memory associated with pages successfully added to the TD with
TDH.MNG.ADDCX will be removed in tdx_reclaim_page(), called from
tdx_reclaim_control_page().

Signed-off-by: Kirill A.
Shutemov --- arch/x86/kvm/vmx/tdx.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 7a48bd901536..13796b9a4bc5 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -2370,7 +2370,7 @@ static int __tdx_td_init(struct kvm *kvm, struct td_p= arams *td_params, =20 atomic_inc(&nr_configured_hkid); =20 - tdr_page =3D alloc_page(GFP_KERNEL); + tdr_page =3D tdx_alloc_page(); if (!tdr_page) goto free_hkid; =20 @@ -2383,7 +2383,7 @@ static int __tdx_td_init(struct kvm *kvm, struct td_p= arams *td_params, goto free_tdr; =20 for (i =3D 0; i < kvm_tdx->td.tdcs_nr_pages; i++) { - tdcs_pages[i] =3D alloc_page(GFP_KERNEL); + tdcs_pages[i] =3D tdx_alloc_page(); if (!tdcs_pages[i]) goto free_tdcs; } @@ -2504,10 +2504,8 @@ static int __tdx_td_init(struct kvm *kvm, struct td_= params *td_params, teardown: /* Only free pages not yet added, so start at 'i' */ for (; i < kvm_tdx->td.tdcs_nr_pages; i++) { - if (tdcs_pages[i]) { - __free_page(tdcs_pages[i]); - tdcs_pages[i] =3D NULL; - } + tdx_free_page(tdcs_pages[i]); + tdcs_pages[i] =3D NULL; } if (!kvm_tdx->td.tdcs_pages) kfree(tdcs_pages); @@ -2523,15 +2521,13 @@ static int __tdx_td_init(struct kvm *kvm, struct td= _params *td_params, =20 free_tdcs: for (i =3D 0; i < kvm_tdx->td.tdcs_nr_pages; i++) { - if (tdcs_pages[i]) - __free_page(tdcs_pages[i]); + tdx_free_page(tdcs_pages[i]); } kfree(tdcs_pages); kvm_tdx->td.tdcs_pages =3D NULL; =20 free_tdr: - if (tdr_page) - __free_page(tdr_page); + tdx_free_page(tdr_page); kvm_tdx->td.tdr_page =3D 0; =20 free_hkid: --=20 2.47.2 From nobody Sun Feb 8 15:15:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 8B1A922129F; Mon, 9 Jun 2025 19:14:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496442; cv=none; b=h/+rdAg5CDv5ZMVqfiSo99nHyBeqs0lfa2nz7T6ts3r/WbZZ+aDIZMWZZlw8+TeidSDEY+FeciG787nktOhU/KlbbGehWr74U7yJzLM3qbUmQhh659+Mr5RGrWxpHJLHFPgwmyvWODZ0SK9PpBFNOv1woVYWLFAoeXzdbIoAB78= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496442; c=relaxed/simple; bh=dVq8AG2GF42RqiqDTCSC9v7X/gPCjXZk8mRsFAhmeB8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UcVZNR5YFU+aKeSDOVEHYScSNLyoLnXEdYyFwDcoHksnFya+hq8EtEBwarMBTv3YN7aSxmZOq1h+3230qZXXBm7KnkzSTw2HyQx3GPST2pKXoL+Iu59YnDiuQrwCnOYadB0OcqevoBMruxU7sPtdUagU7cLtNijZgWMJYp8Pvqw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.helo=mgamail.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=eYvCXm/i; arc=none smtp.client-ip=198.175.65.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.helo=mgamail.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="eYvCXm/i" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1749496441; x=1781032441; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; 
bh=dVq8AG2GF42RqiqDTCSC9v7X/gPCjXZk8mRsFAhmeB8=; b=eYvCXm/iR6NWIhMo9A/sc5yToFrmz9iDMjVOaSqYemWyJVuZR85obrXd op2wueGn8Cjb7jsp4g1xbcnQy8P0C/HCGmZrFHhUfLgSGpl5wEmMoUzDu ZgY4DIcqxEpMWEg7ZcD5XBAgPJ4B9hvtRYIf2DFHgcgr2qnDc1Ettqq/O 9cHU+U7cCJvRf4whng3O5KI4VFM5K3TxE97rzgs6GqtJ2gZszXIkRDpg+ RgXpo+JhEgdAvqTkhcdBzbkePhHOASpUdQiuKlkhcBnn7WdSVh4QfOBVT XAjDBjY1vgnV3eyua0Eh6bX1ZYh+WaZIp84Cgh2Tpx7GXKMN7EZTzsogs w==; X-CSE-ConnectionGUID: YkSzl6J6SOmcK1+S5VQ5BA== X-CSE-MsgGUID: T8KLbCdXQ0uOVxS8Q1nBVQ== X-IronPort-AV: E=McAfee;i="6800,10657,11459"; a="51467269" X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; d="scan'208";a="51467269" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by orvoesa111.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Jun 2025 12:14:00 -0700 X-CSE-ConnectionGUID: faca+Ut5Sq6LPIhDdQSKQA== X-CSE-MsgGUID: 65UctnRYRdmz/ldkTwnKVA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; d="scan'208";a="147562194" Received: from black.fi.intel.com ([10.237.72.28]) by orviesa008.jf.intel.com with ESMTP; 09 Jun 2025 12:13:56 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 461716AD; Mon, 09 Jun 2025 22:13:49 +0300 (EEST) From: "Kirill A. Shutemov" To: pbonzini@redhat.com, seanjc@google.com, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, isaku.yamahata@intel.com, kai.huang@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 06/12] KVM: TDX: Allocate PAMT memory in tdx_td_vcpu_init() Date: Mon, 9 Jun 2025 22:13:34 +0300 Message-ID: <20250609191340.2051741-7-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.47.2 In-Reply-To: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com> References: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Allocate PAMT memory for TDH.VP.CREATE and TDH.VP.ADDCX. PAMT memory that is associated with pages successfully added to the TD with TDH.VP.ADDCX will be removed in tdx_reclaim_page() on tdx_reclaim_control_page(). Signed-off-by: Kirill A. Shutemov --- arch/x86/kvm/vmx/tdx.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 13796b9a4bc5..36c3c9f8a62c 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -2829,7 +2829,7 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u6= 4 vcpu_rcx) int ret, i; u64 err; =20 - page =3D alloc_page(GFP_KERNEL); + page =3D tdx_alloc_page(); if (!page) return -ENOMEM; tdx->vp.tdvpr_page =3D page; @@ -2842,7 +2842,7 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u6= 4 vcpu_rcx) } =20 for (i =3D 0; i < kvm_tdx->td.tdcx_nr_pages; i++) { - page =3D alloc_page(GFP_KERNEL); + page =3D tdx_alloc_page(); if (!page) { ret =3D -ENOMEM; goto free_tdcx; @@ -2866,7 +2866,7 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u6= 4 vcpu_rcx) * method, but the rest are freed here. 
*/ for (; i < kvm_tdx->td.tdcx_nr_pages; i++) { - __free_page(tdx->vp.tdcx_pages[i]); + tdx_free_page(tdx->vp.tdcx_pages[i]); tdx->vp.tdcx_pages[i] =3D NULL; } return -EIO; @@ -2885,16 +2885,15 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, = u64 vcpu_rcx) =20 free_tdcx: for (i =3D 0; i < kvm_tdx->td.tdcx_nr_pages; i++) { - if (tdx->vp.tdcx_pages[i]) - __free_page(tdx->vp.tdcx_pages[i]); + tdx_free_page(tdx->vp.tdcx_pages[i]); tdx->vp.tdcx_pages[i] =3D NULL; } kfree(tdx->vp.tdcx_pages); tdx->vp.tdcx_pages =3D NULL; =20 free_tdvpr: - if (tdx->vp.tdvpr_page) - __free_page(tdx->vp.tdvpr_page); + tdx_free_page(tdx->vp.tdvpr_page); + tdx->vp.tdvpr_page =3D 0; =20 return ret; --=20 2.47.2 From nobody Sun Feb 8 15:15:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.14]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2427221D594; Mon, 9 Jun 2025 19:14:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.14 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496444; cv=none; b=iYCkGRVjSvtLr2rtAe6eQ0I0xAOsqaxRQzpBRefBc0bxbImP5KNHleXaqo11tyyLAx894+xpcRaFyj35eKrbP52OumVak9BV4oc0Oq4Lmff84gRjsrdOIlHeHPfEw6jRiVG6FpFaJw6XIHSnFoKpgcYDRNb1IoV5Vz7U/mTN1fA= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496444; c=relaxed/simple; bh=aauiztrJkeorvpMvNPHFkfjpUArGlWOFlPdcvLg8G64=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=aGw8F7F1AeodkwQCj+q8f6bK7Mr/dhpfcqiKouc/j5odHzVk2AYkjlhSW/iRYda9clk5o+93OZgo8Xf+c8mzlRy/uRDio0H0NTvoMV1lY4aEkto6BsqdOh83yCgr6YIgghoJWVI04ONT5xOTK+CI50LOjiPRdOGIzXoCdpJBi5w= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.helo=mgamail.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=CS3BF/T4; arc=none smtp.client-ip=192.198.163.14 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.helo=mgamail.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="CS3BF/T4" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1749496442; x=1781032442; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=aauiztrJkeorvpMvNPHFkfjpUArGlWOFlPdcvLg8G64=; b=CS3BF/T4enq+dt6H+1r1nl0tX8w6adZ03VwPJZrayEh6G8tiKUnhs66x ZIkdKp2Wq7x31aVg/oHX7i0QmFVlLqmw/jY5S3hbYyksJwn4bU2Rg8Ac2 2jOzALt4tosG1TKASarJtI55Rn2WvNvW3gbAlurd8pKPgj5C1/vRCmkUF TOe7wnFhDWl2ltAN0R/gZqiwCEr5RMvkfCmo1vI0wIRQLcsN0xj049KdI svifWguTdB3pbybS1FPM1Xw6pEK7M6W+N0rM0vqqJo+0MX8MKccV0tg0P Xz39ESjooTDHSaJDuagoK0o8VrADgCS8MCc6drefcLd0ZMBrZcMbrVDsv g==; X-CSE-ConnectionGUID: 8gcHbl93Rq66r0BN50v7fg== X-CSE-MsgGUID: K5w/QFYwQ5239vjfyikCPA== X-IronPort-AV: E=McAfee;i="6800,10657,11459"; a="51681808" X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; d="scan'208";a="51681808" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa108.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Jun 2025 12:14:00 -0700 X-CSE-ConnectionGUID: vy5boMulSEeNzGdk2TDGRA== X-CSE-MsgGUID: ha8OKAo1SWyFNWZapjfE5A== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; 
From: "Kirill A. Shutemov"
To: pbonzini@redhat.com, seanjc@google.com, dave.hansen@linux.intel.com
Cc: rick.p.edgecombe@intel.com, isaku.yamahata@intel.com, kai.huang@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, "Kirill A. Shutemov"
Subject: [PATCHv2 07/12] KVM: TDX: Preallocate PAMT pages to be used in page fault path
Date: Mon, 9 Jun 2025 22:13:35 +0300
Message-ID: <20250609191340.2051741-8-kirill.shutemov@linux.intel.com>
In-Reply-To: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com>
References: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com>

Preallocate pages to be used in the link_external_spt() and
set_external_spte() paths.

In the worst case, handling a page fault might require
tdx_nr_pamt_pages() pages for each page table level.

Signed-off-by: Kirill A. Shutemov
---
 arch/x86/include/asm/kvm_host.h | 2 ++
 arch/x86/include/asm/tdx.h      | 2 ++
 arch/x86/kvm/mmu/mmu.c          | 7 +++++++
 arch/x86/virt/vmx/tdx/tdx.c     | 3 ++-
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 330cdcbed1a6..02dbbf848182 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -849,6 +849,8 @@ struct kvm_vcpu_arch {
 	 */
 	struct kvm_mmu_memory_cache mmu_external_spt_cache;
 
+	struct kvm_mmu_memory_cache pamt_page_cache;
+
 	/*
 	 * QEMU userspace and the guest each have their own FPU state.
 	 * In vcpu_run, we switch between the user and guest FPU contexts.
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index d9a77147412f..47092eb13eb3 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -115,6 +115,7 @@ int tdx_guest_keyid_alloc(void); u32 tdx_get_nr_guest_keyids(void); void tdx_guest_keyid_free(unsigned int keyid); =20 +int tdx_nr_pamt_pages(void); struct page *tdx_alloc_page(void); void tdx_free_page(struct page *page); =20 @@ -188,6 +189,7 @@ static inline int tdx_enable(void) { return -ENODEV; } static inline u32 tdx_get_nr_guest_keyids(void) { return 0; } static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; } static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NU= LL; } +static inline int tdx_nr_pamt_pages(void) { return 0; } #endif /* CONFIG_INTEL_TDX_HOST */ =20 #endif /* !__ASSEMBLER__ */ diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index cbc84c6abc2e..d99bb27b5b01 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -616,6 +616,12 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vc= pu, bool maybe_indirect) if (r) return r; } + + r =3D kvm_mmu_topup_memory_cache(&vcpu->arch.pamt_page_cache, + tdx_nr_pamt_pages() * PT64_ROOT_MAX_LEVEL); + if (r) + return r; + return kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, PT64_ROOT_MAX_LEVEL); } @@ -626,6 +632,7 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcp= u) kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_external_spt_cache); + kvm_mmu_free_memory_cache(&vcpu->arch.pamt_page_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache); } =20 diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index c514c60e8c8d..4f9eaba4af4a 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -2001,13 +2001,14 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct pa= ge *page) } EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_hkid); =20 -static int tdx_nr_pamt_pages(void) +int tdx_nr_pamt_pages(void) { if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) return 0; =20 return tdx_sysinfo.tdmr.pamt_4k_entry_size * PTRS_PER_PTE / PAGE_SIZE; } +EXPORT_SYMBOL_GPL(tdx_nr_pamt_pages); =20 static u64 tdh_phymem_pamt_add(unsigned long hpa, struct list_head *pamt_pages) --=20 2.47.2 From nobody Sun Feb 8 15:15:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.14]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B376A221262; Mon, 9 Jun 2025 19:13:59 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.14 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496441; cv=none; b=DsFrxKn98SjL15IiZRPbVaq5j+lHFH4qwl8N46jmJcBi9ZZcB2JyoQkPe9rDul9XVHm9mU+WNpCHPZGzj1jQC42/7r7eVS7H1BlddglRzKFnXha9rj5aB5OMg+WwqPzvbwtriHwhkiYaIt41Bp1MT4XebgmaZgwgtsXIrwiY1NI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496441; c=relaxed/simple; bh=1KtN0Q8dMeZGdFmpkljFfijLNOzTCQGangJAVHHPvZw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KQcZice0w7BY0fjI64faiMIJaIItxW88Wrqwoy9VXWaHU8jkS8fsMDdanrtbKfGdn8icCwQY5L18TBfTlTW/wmxGXHmEBkIfLLqD1Rs7+KbQHJo8NFbqAnysklx5/0DyF9PSbe8rmPtmWvTRE0FjqRnk3af5QxqBeXDRpF+9Jc8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass 
(p=none dis=none) header.from=linux.intel.com; spf=none smtp.helo=mgamail.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Q8eb9Qwq; arc=none smtp.client-ip=192.198.163.14 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.helo=mgamail.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Q8eb9Qwq" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1749496440; x=1781032440; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=1KtN0Q8dMeZGdFmpkljFfijLNOzTCQGangJAVHHPvZw=; b=Q8eb9QwqZ/YcAubBOW8MgvbPD44QDbBcrUb54aDgC/ET6DfslFwwET2F KZy2Herac4zAbE/TXYof6FA1DEhSEOsv40Tp31UsOK2wwOrpAm1yepDaY b64Ev2LPxIrcCpBYHw1m5Hlhdpj0PCaecSG0PKU9ecwyIbeF/XETcO+m1 Ax0DlGnkKwKhdidoIeOkRPJIEpEkFotRzwEhpORlNZmgaA+elzlr2ddzl S0m2yeWgV/ADlXtOcM+JKKq+dWrkU7KVcdBocK9QVSuhuatxtxheWW2vp gzEHz99MlU+FKhC0JH8qaQrC4rzQpU7uidYU3ESSpCNpOby/9GaIS5rY0 g==; X-CSE-ConnectionGUID: SkCBoHOkQVeMX7h5oQHtEg== X-CSE-MsgGUID: WySUYMJRT2yy/Ap/VY87SA== X-IronPort-AV: E=McAfee;i="6800,10657,11459"; a="51681780" X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; d="scan'208";a="51681780" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa108.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Jun 2025 12:13:59 -0700 X-CSE-ConnectionGUID: vP8mJcvNSwqMwssTTfOWiQ== X-CSE-MsgGUID: S03nyYFkR2W7J4TU4fWmrw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; d="scan'208";a="147174188" Received: from black.fi.intel.com ([10.237.72.28]) by orviesa007.jf.intel.com with ESMTP; 09 Jun 2025 12:13:56 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 5BD8DA45; Mon, 09 Jun 2025 22:13:49 +0300 (EEST) From: "Kirill A. Shutemov" To: pbonzini@redhat.com, seanjc@google.com, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, isaku.yamahata@intel.com, kai.huang@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 08/12] KVM: TDX: Handle PAMT allocation in fault path Date: Mon, 9 Jun 2025 22:13:36 +0300 Message-ID: <20250609191340.2051741-9-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.47.2 In-Reply-To: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com> References: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" There are two distinct cases when the kernel needs to allocate PAMT memory in the fault path: for SEPT page tables in tdx_sept_link_private_spt= () and for leaf pages in tdx_sept_set_private_spte(). These code paths run in atomic context. Use a pre-allocated per-VCPU pool for memory allocations. Signed-off-by: Kirill A. 
Shutemov --- arch/x86/include/asm/tdx.h | 4 ++++ arch/x86/kvm/vmx/tdx.c | 40 ++++++++++++++++++++++++++++++++----- arch/x86/virt/vmx/tdx/tdx.c | 21 +++++++++++++------ virt/kvm/kvm_main.c | 1 + 4 files changed, 55 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 47092eb13eb3..39f8dd7e0f06 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -116,6 +116,10 @@ u32 tdx_get_nr_guest_keyids(void); void tdx_guest_keyid_free(unsigned int keyid); =20 int tdx_nr_pamt_pages(void); +int tdx_pamt_get(struct page *page, enum pg_level level, + struct page *(alloc)(void *data), void *data); +void tdx_pamt_put(struct page *page, enum pg_level level); + struct page *tdx_alloc_page(void); void tdx_free_page(struct page *page); =20 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 36c3c9f8a62c..bc9bc393f866 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1537,11 +1537,26 @@ static int tdx_mem_page_record_premap_cnt(struct kv= m *kvm, gfn_t gfn, return 0; } =20 +static struct page *tdx_alloc_pamt_page_atomic(void *data) +{ + struct kvm_vcpu *vcpu =3D data; + void *p; + + p =3D kvm_mmu_memory_cache_alloc(&vcpu->arch.pamt_page_cache); + return virt_to_page(p); +} + int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, enum pg_level level, kvm_pfn_t pfn) { + struct kvm_vcpu *vcpu =3D kvm_get_running_vcpu(); struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); struct page *page =3D pfn_to_page(pfn); + int ret; + + ret =3D tdx_pamt_get(page, level, tdx_alloc_pamt_page_atomic, vcpu); + if (ret) + return ret; =20 /* TODO: handle large pages. */ if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) @@ -1562,10 +1577,16 @@ int tdx_sept_set_private_spte(struct kvm *kvm, gfn_= t gfn, * barrier in tdx_td_finalize(). 
*/ smp_rmb(); - if (likely(kvm_tdx->state =3D=3D TD_STATE_RUNNABLE)) - return tdx_mem_page_aug(kvm, gfn, level, page); =20 - return tdx_mem_page_record_premap_cnt(kvm, gfn, level, pfn); + if (likely(kvm_tdx->state =3D=3D TD_STATE_RUNNABLE)) + ret =3D tdx_mem_page_aug(kvm, gfn, level, page); + else + ret =3D tdx_mem_page_record_premap_cnt(kvm, gfn, level, pfn); + + if (ret) + tdx_pamt_put(page, level); + + return ret; } =20 static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn, @@ -1622,17 +1643,26 @@ int tdx_sept_link_private_spt(struct kvm *kvm, gfn_= t gfn, enum pg_level level, void *private_spt) { int tdx_level =3D pg_level_to_tdx_sept_level(level); - gpa_t gpa =3D gfn_to_gpa(gfn); + struct kvm_vcpu *vcpu =3D kvm_get_running_vcpu(); struct page *page =3D virt_to_page(private_spt); + gpa_t gpa =3D gfn_to_gpa(gfn); u64 err, entry, level_state; + int ret; + + ret =3D tdx_pamt_get(page, PG_LEVEL_4K, tdx_alloc_pamt_page_atomic, vcpu); + if (ret) + return ret; =20 err =3D tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, tdx_level, page, &ent= ry, &level_state); - if (unlikely(tdx_operand_busy(err))) + if (unlikely(tdx_operand_busy(err))) { + tdx_pamt_put(page, PG_LEVEL_4K); return -EBUSY; + } =20 if (KVM_BUG_ON(err, kvm)) { pr_tdx_error_2(TDH_MEM_SEPT_ADD, err, entry, level_state); + tdx_pamt_put(page, PG_LEVEL_4K); return -EIO; } =20 diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 4f9eaba4af4a..d4b50b6428fa 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -2067,10 +2067,16 @@ static void tdx_free_pamt_pages(struct list_head *p= amt_pages) } } =20 -static int tdx_alloc_pamt_pages(struct list_head *pamt_pages) +static int tdx_alloc_pamt_pages(struct list_head *pamt_pages, + struct page *(alloc)(void *data), void *data) { for (int i =3D 0; i < tdx_nr_pamt_pages(); i++) { - struct page *page =3D alloc_page(GFP_KERNEL); + struct page *page; + + if (alloc) + page =3D alloc(data); + else + page =3D alloc_page(GFP_KERNEL); if (!page) goto fail; list_add(&page->lru, pamt_pages); @@ -2115,7 +2121,8 @@ static int tdx_pamt_add(atomic_t *pamt_refcount, unsi= gned long hpa, return 0; } =20 -static int tdx_pamt_get(struct page *page, enum pg_level level) +int tdx_pamt_get(struct page *page, enum pg_level level, + struct page *(alloc)(void *data), void *data) { unsigned long hpa =3D page_to_phys(page); atomic_t *pamt_refcount; @@ -2134,7 +2141,7 @@ static int tdx_pamt_get(struct page *page, enum pg_le= vel level) if (atomic_inc_not_zero(pamt_refcount)) return 0; =20 - if (tdx_alloc_pamt_pages(&pamt_pages)) + if (tdx_alloc_pamt_pages(&pamt_pages, alloc, data)) return -ENOMEM; =20 ret =3D tdx_pamt_add(pamt_refcount, hpa, &pamt_pages); @@ -2143,8 +2150,9 @@ static int tdx_pamt_get(struct page *page, enum pg_le= vel level) =20 return ret >=3D 0 ? 
0 : ret; } +EXPORT_SYMBOL_GPL(tdx_pamt_get); =20 -static void tdx_pamt_put(struct page *page, enum pg_level level) +void tdx_pamt_put(struct page *page, enum pg_level level) { unsigned long hpa =3D page_to_phys(page); atomic_t *pamt_refcount; @@ -2179,6 +2187,7 @@ static void tdx_pamt_put(struct page *page, enum pg_l= evel level) =20 tdx_free_pamt_pages(&pamt_pages); } +EXPORT_SYMBOL_GPL(tdx_pamt_put); =20 struct page *tdx_alloc_page(void) { @@ -2188,7 +2197,7 @@ struct page *tdx_alloc_page(void) if (!page) return NULL; =20 - if (tdx_pamt_get(page, PG_LEVEL_4K)) { + if (tdx_pamt_get(page, PG_LEVEL_4K, NULL, NULL)) { __free_page(page); return NULL; } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index eec82775c5bf..6add012532a0 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -436,6 +436,7 @@ void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_= cache *mc) BUG_ON(!p); return p; } +EXPORT_SYMBOL_GPL(kvm_mmu_memory_cache_alloc); #endif =20 static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned= id) --=20 2.47.2 From nobody Sun Feb 8 15:15:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.14]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 407EB22172B; Mon, 9 Jun 2025 19:14:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.14 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496443; cv=none; b=RJeZxOXpTrHyMYcdFcL101GTuDoGdKMDIf/iP3JP/qJCt/3iKsjMUIb7LXTYkccc7rjLTUmeGaMMVDk1jU5SPPZyXp2e/0R80xVj32nf+q+vnTYUaoz+pwSm9nIHJLQo14z4vzf6ldDWvHbV6pDIm8q0HbsjKREX6MtiS6YiGfk= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496443; c=relaxed/simple; bh=u3lmhKal89dtBVTIGgcT29cm7/94nRSPPgh1/WsBV+E=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=pBFIjYeV+yy1lsC5c5MD4Q/CW5s3qlkqq//nuWXaNNo8nPeYcbJ6/vvS+rnktZj5gIbkj33BuLb8cDa2YjSwA3SPI6qyFEvq+HumiE+wZq6Kf3yCXiI/djUr117rrIJNPc10OlkI8iQLwaAAABfWVYBZevW0zZmX5ubnJDcJPVo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.helo=mgamail.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=FB6oOYKt; arc=none smtp.client-ip=192.198.163.14 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.helo=mgamail.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="FB6oOYKt" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1749496441; x=1781032441; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=u3lmhKal89dtBVTIGgcT29cm7/94nRSPPgh1/WsBV+E=; b=FB6oOYKtMxXZzbXCXoXLvPRHZCsiKPNHO80gdqIt8w0bpXMxvIL6gPo6 412elJdrhGJYNlBJ0o6qCz9f9D6qoOp3x4qCyYDIYJBc5REISN4eyD13K 4YGDmiKQwjSTxdevk4NUptPYR1n6/Ixvg3tKy7/me0J1Z6b5dAQMXs9Gh asVMr6m5rnHESqFFRcpdYg+ORw+3ub1hGdzBhB8TX8d6hS8ZUxdcB9gB0 osMZvIkRkIkcQEA/RbJQKPYcLQoecVC5PW0blUXTQog8FFpdJdlG90Iw6 ZO8fzjCQj6KoD6v3N25m1vioL5M0M0GfYn2orfV/7LDA4nRq0wP4RaYka w==; X-CSE-ConnectionGUID: 72rtDwQeSkC/QeusAovSsQ== X-CSE-MsgGUID: WhWEE223RMOra83FaWbpWA== X-IronPort-AV: E=McAfee;i="6800,10657,11459"; a="51681796" X-IronPort-AV: 
E=Sophos;i="6.16,223,1744095600"; d="scan'208";a="51681796" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa108.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Jun 2025 12:14:00 -0700 X-CSE-ConnectionGUID: Wz46xy/GRjWvuiwx4+4Tvw== X-CSE-MsgGUID: pkpBkGgeS6quvNIqDLH6/w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; d="scan'208";a="147174193" Received: from black.fi.intel.com ([10.237.72.28]) by orviesa007.jf.intel.com with ESMTP; 09 Jun 2025 12:13:56 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 66C62286; Mon, 09 Jun 2025 22:13:49 +0300 (EEST) From: "Kirill A. Shutemov" To: pbonzini@redhat.com, seanjc@google.com, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, isaku.yamahata@intel.com, kai.huang@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 09/12] KVM: TDX: Reclaim PAMT memory Date: Mon, 9 Jun 2025 22:13:37 +0300 Message-ID: <20250609191340.2051741-10-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.47.2 In-Reply-To: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com> References: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The PAMT memory holds metadata for TDX-protected memory. With Dynamic PAMT, PAMT_4K is allocated on demand. The kernel supplies the TDX module with a few pages that cover 2M of host physical memory. PAMT memory can be reclaimed when the last user is gone. It can happen in a few code paths: - On TDH.PHYMEM.PAGE.RECLAIM in tdx_reclaim_td_control_pages() and tdx_reclaim_page(). - On TDH.MEM.PAGE.REMOVE in tdx_sept_drop_private_spte(). - In tdx_sept_zap_private_spte() for pages that were in the queue to be added with TDH.MEM.PAGE.ADD, but it never happened due to an error. - In tdx_sept_free_private_spt() for SEPT pages; Add tdx_pamt_put() for memory that comes from guest_memfd and use tdx_free_page() for the rest. Signed-off-by: Kirill A. 
Shutemov --- arch/x86/kvm/vmx/tdx.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index bc9bc393f866..0aed7e73cd6b 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -353,7 +353,7 @@ static void tdx_reclaim_control_page(struct page *ctrl_= page) if (tdx_reclaim_page(ctrl_page)) return; =20 - __free_page(ctrl_page); + tdx_free_page(ctrl_page); } =20 struct tdx_flush_vp_arg { @@ -584,7 +584,7 @@ static void tdx_reclaim_td_control_pages(struct kvm *kv= m) } tdx_clear_page(kvm_tdx->td.tdr_page); =20 - __free_page(kvm_tdx->td.tdr_page); + tdx_free_page(kvm_tdx->td.tdr_page); kvm_tdx->td.tdr_page =3D NULL; } =20 @@ -1635,6 +1635,7 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm= , gfn_t gfn, return -EIO; } tdx_clear_page(page); + tdx_pamt_put(page, level); tdx_unpin(kvm, page); return 0; } @@ -1724,6 +1725,7 @@ static int tdx_sept_zap_private_spte(struct kvm *kvm,= gfn_t gfn, if (tdx_is_sept_zap_err_due_to_premap(kvm_tdx, err, entry, level) && !KVM_BUG_ON(!atomic64_read(&kvm_tdx->nr_premapped), kvm)) { atomic64_dec(&kvm_tdx->nr_premapped); + tdx_pamt_put(page, level); tdx_unpin(kvm, page); return 0; } @@ -1788,6 +1790,8 @@ int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t = gfn, enum pg_level level, void *private_spt) { struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + struct page *page =3D virt_to_page(private_spt); + int ret; =20 /* * free_external_spt() is only called after hkid is freed when TD is @@ -1804,7 +1808,12 @@ int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t= gfn, * The HKID assigned to this TD was already freed and cache was * already flushed. We don't have to flush again. */ - return tdx_reclaim_page(virt_to_page(private_spt)); + ret =3D tdx_reclaim_page(virt_to_page(private_spt)); + if (ret) + return ret; + + tdx_pamt_put(page, PG_LEVEL_4K); + return 0; } =20 int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, --=20 2.47.2 From nobody Sun Feb 8 15:15:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.14]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A76CE2206B2; Mon, 9 Jun 2025 19:14:00 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.14 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496442; cv=none; b=fKt40I3Rh9kgnUNAQ9Qg6UW6184dW5s2dcC14QwCyyeylYv/y19QG0zNWe//cwQ5tdZqfb6Omtv33igEGU4A2NlbVdV3GHz7z5BKJjebhx2f/GFTyV1xdtdjk+7KZCTAsCG38CwrXjmPY6b3jZy3dPJl0PepgjBib3hfPYVYOuE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496442; c=relaxed/simple; bh=taDyQQ0hVqoylp1FoASKqcRdTS2NYvm6e9mKOK0s/aw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=VlRXsk4U9soCxcjbe4X/1WAyzZPQvSXe6HI1QKiU87US9X9JiAP9KbdMiJihBfP3o5/pZN8ybs/E+sSmt5C+V1MPrfYGNDk1x2q134ZlHibTwYwsttIS42BaGi24vClLbosxnSLnuxBKkhNMUhpx9hSQlmxLt+9aBPTrZsL3o+g= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.helo=mgamail.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Scvxj3ZH; arc=none smtp.client-ip=192.198.163.14 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.helo=mgamail.intel.com 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Scvxj3ZH" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1749496441; x=1781032441; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=taDyQQ0hVqoylp1FoASKqcRdTS2NYvm6e9mKOK0s/aw=; b=Scvxj3ZHgsGGnK/2209KadmoZ8aI5knMWv1CAUVqo7Ob2uUYA3RY6N3n CqQUdTjGE4iDKOdHVPgkw+JGBAMYVNQzvWm30iU4/4pgp7bPOGXKIWiOh GFPvo60tdW1ZFzKnx5boLcLXGmaKazYYAG+43fM80TqtqoRWuRLAIikdd Yk+Y/aiYTTX4ir04vvB6OIzJGoXlQbC94N2WkrvIIQA/UAhTRv0g6h++s Ys7duSOxRjTqcqxX/Muh0mFQqaJ/mJ+4FMltbXiAi10CIAsgPTALXrsKn R+VfwADL+tLopZrc0rShr5BMoxNlkxgYIKGcO7/MINx0y7+glBtfUEhXt w==; X-CSE-ConnectionGUID: XyEYEDPaRqmhJZ94fPxjxw== X-CSE-MsgGUID: nc1hgbJUSYWXZ2e/UXefiA== X-IronPort-AV: E=McAfee;i="6800,10657,11459"; a="51681788" X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; d="scan'208";a="51681788" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa108.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Jun 2025 12:14:00 -0700 X-CSE-ConnectionGUID: Z+V09sNBSQqiAKkg0FkqZg== X-CSE-MsgGUID: jHwAvZdhQ9O5TYXQKBTDJA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; d="scan'208";a="147174191" Received: from black.fi.intel.com ([10.237.72.28]) by orviesa007.jf.intel.com with ESMTP; 09 Jun 2025 12:13:56 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 6CD7BA63; Mon, 09 Jun 2025 22:13:49 +0300 (EEST) From: "Kirill A. Shutemov" To: pbonzini@redhat.com, seanjc@google.com, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, isaku.yamahata@intel.com, kai.huang@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 10/12] [NOT-FOR-UPSTREAM] x86/virt/tdx: Account PAMT memory and print it in /proc/meminfo Date: Mon, 9 Jun 2025 22:13:38 +0300 Message-ID: <20250609191340.2051741-11-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.47.2 In-Reply-To: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com> References: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" PAMT memory can add up to a substantial portion of system memory. Account these pages and print them in /proc/meminfo as TDX. When no TDs are running, PAMT memory consumption is supposed to be zero. Signed-off-by: Kirill A. Shutemov --- The patch proved to be extremely useful for catching PAMT memory leaks, but putting this counter in /proc/meminfo is probably overkill. Any suggestion for a better way to expose this counter is welcome.
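For illustration (the value is hypothetical): with the seq_printf() format added below, a host with PAMT memory in use would show a line such as "TDX:     2048 kB" in /proc/meminfo, and the entry goes back to 0 kB once all PAMT memory has been reclaimed.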
--- arch/x86/include/asm/set_memory.h | 3 +++ arch/x86/include/asm/tdx.h | 3 +++ arch/x86/mm/Makefile | 2 ++ arch/x86/mm/meminfo.c | 11 +++++++++++ arch/x86/mm/pat/set_memory.c | 2 +- arch/x86/virt/vmx/tdx/tdx.c | 26 ++++++++++++++++++++++++-- 6 files changed, 44 insertions(+), 3 deletions(-) create mode 100644 arch/x86/mm/meminfo.c diff --git a/arch/x86/include/asm/set_memory.h b/arch/x86/include/asm/set_m= emory.h index 8d9f1c9aaa4c..66b37bff61e5 100644 --- a/arch/x86/include/asm/set_memory.h +++ b/arch/x86/include/asm/set_memory.h @@ -90,6 +90,9 @@ int set_direct_map_default_noflush(struct page *page); int set_direct_map_valid_noflush(struct page *page, unsigned nr, bool vali= d); bool kernel_page_present(struct page *page); =20 +struct seq_file; +void direct_pages_meminfo(struct seq_file *m); + extern int kernel_set_to_readonly; =20 #endif /* _ASM_X86_SET_MEMORY_H */ diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 39f8dd7e0f06..853471e1eda1 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -186,6 +186,8 @@ u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64= level, u64 *ext_err1, u6 u64 tdh_phymem_cache_wb(bool resume); u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td); u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page); + +void tdx_meminfo(struct seq_file *m); #else static inline void tdx_init(void) { } static inline int tdx_cpu_enable(void) { return -ENODEV; } @@ -194,6 +196,7 @@ static inline u32 tdx_get_nr_guest_keyids(void) { retur= n 0; } static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; } static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NU= LL; } static inline int tdx_nr_pamt_pages(void) { return 0; } +static inline void tdx_meminfo(struct seq_file *m) {} #endif /* CONFIG_INTEL_TDX_HOST */ =20 #endif /* !__ASSEMBLER__ */ diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 32035d5be5a0..311d60801871 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -38,6 +38,8 @@ CFLAGS_fault.o :=3D -I $(src)/../include/asm/trace =20 obj-$(CONFIG_X86_32) +=3D pgtable_32.o iomap_32.o =20 +obj-$(CONFIG_PROC_FS) +=3D meminfo.o + obj-$(CONFIG_HUGETLB_PAGE) +=3D hugetlbpage.o obj-$(CONFIG_PTDUMP) +=3D dump_pagetables.o obj-$(CONFIG_PTDUMP_DEBUGFS) +=3D debug_pagetables.o diff --git a/arch/x86/mm/meminfo.c b/arch/x86/mm/meminfo.c new file mode 100644 index 000000000000..7bdb5df014de --- /dev/null +++ b/arch/x86/mm/meminfo.c @@ -0,0 +1,11 @@ +#include +#include + +#include +#include + +void arch_report_meminfo(struct seq_file *m) +{ + direct_pages_meminfo(m); + tdx_meminfo(m); +} diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c index def3d9284254..59432b92e80e 100644 --- a/arch/x86/mm/pat/set_memory.c +++ b/arch/x86/mm/pat/set_memory.c @@ -118,7 +118,7 @@ static void collapse_page_count(int level) direct_pages_count[level - 1] -=3D PTRS_PER_PTE; } =20 -void arch_report_meminfo(struct seq_file *m) +void direct_pages_meminfo(struct seq_file *m) { seq_printf(m, "DirectMap4k: %8lu kB\n", direct_pages_count[PG_LEVEL_4K] << 2); diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index d4b50b6428fa..4dcba7bf4ab9 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -51,6 +51,8 @@ static DEFINE_PER_CPU(bool, tdx_lp_initialized); =20 static struct tdmr_info_list tdx_tdmr_list; =20 +static atomic_long_t tdx_pamt_count =3D ATOMIC_LONG_INIT(0); + static atomic_t *pamt_refcounts; =20 static enum 
tdx_module_status_t tdx_module_status; @@ -2010,6 +2012,19 @@ int tdx_nr_pamt_pages(void) } EXPORT_SYMBOL_GPL(tdx_nr_pamt_pages); =20 +void tdx_meminfo(struct seq_file *m) +{ + unsigned long usage; + + if (!cpu_feature_enabled(X86_FEATURE_TDX_HOST_PLATFORM)) + return; + + usage =3D atomic_long_read(&tdx_pamt_count) * + tdx_nr_pamt_pages() * PAGE_SIZE / SZ_1K; + + seq_printf(m, "TDX: %8lu kB\n", usage); +} + static u64 tdh_phymem_pamt_add(unsigned long hpa, struct list_head *pamt_pages) { @@ -2017,7 +2032,7 @@ static u64 tdh_phymem_pamt_add(unsigned long hpa, .rcx =3D hpa, }; struct page *page; - u64 *p; + u64 *p, ret; =20 WARN_ON_ONCE(!IS_ALIGNED(hpa & PAGE_MASK, PMD_SIZE)); =20 @@ -2027,7 +2042,12 @@ static u64 tdh_phymem_pamt_add(unsigned long hpa, p++; } =20 - return seamcall(TDH_PHYMEM_PAMT_ADD, &args); + ret =3D seamcall(TDH_PHYMEM_PAMT_ADD, &args); + + if (!ret) + atomic_long_inc(&tdx_pamt_count); + + return ret; } =20 static u64 tdh_phymem_pamt_remove(unsigned long hpa, @@ -2045,6 +2065,8 @@ static u64 tdh_phymem_pamt_remove(unsigned long hpa, if (ret) return ret; =20 + atomic_long_dec(&tdx_pamt_count); + p =3D &args.rdx; for (int i =3D 0; i < tdx_nr_pamt_pages(); i++) { page =3D phys_to_page(*p); --=20 2.47.2 From nobody Sun Feb 8 15:15:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 36300221725; Mon, 9 Jun 2025 19:14:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.19 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496443; cv=none; b=DAiIzHiFmXBP/l3V0zt1xUswJ9ovmA+vIjpQRcSjsxN8jjOCJ/swbSbLK2RPgAMcbJ3KDKhAkSI06GqNhYM1esFl/Zub4+vFMYpF0uY0u3p12A4nkHmMg8dD3Hb+YrsW+xo/Q//WtaNxAARQ+3yFGITS0cpDzaVqW8Wt+JOb+W8= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496443; c=relaxed/simple; bh=8rUB5lKZK3FbRxrTpkzxHWu3XBODRxiLmbQSXovzhx0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=heRNX9ADur89VFOL7u/7d5YoiZoba5Ov4PLz+ob4FzllkcxV96GkhM5W8Ats8g6FzuOvNrJPNJt+siy8RSCASlUrMRUJ7erYGvOExUw+Y8IWv4w8NDO15FEY3TOMufdJG/3vx6J/XzXr0edoFzVtDYSlnlmRLf9k2bnAEQKVvPQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.helo=mgamail.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=BRmw9W6K; arc=none smtp.client-ip=198.175.65.19 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.helo=mgamail.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="BRmw9W6K" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1749496441; x=1781032441; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=8rUB5lKZK3FbRxrTpkzxHWu3XBODRxiLmbQSXovzhx0=; b=BRmw9W6K/z+mkJ2ICWHxyuGXs+UNGCf5fSNY/XSRyAS8iSshnOMWkID+ iVcFjcTr8wSo50HyhAffc+DLkPr4A9DXmhYEaeXUVtzD9ojfjGGjr7ddD UFonQwmiJCpAm0EMLSfj+3RuycR6r9hzoemH4zpKFp4pNGLWjOe3bA3A/ ZnBStxT9SZAA4wxKBpGjKfnNgvykdQ75RpNWm1Vr0sB6026UWlW7cwdLw mLB1UZNrD/SLLLwLhw8L0RYOjeY814MPosiLIakV6m7jWSg91y6fSZd58 8hyfFa7AqoIExPk1QEl8yUetMypgS+DF4zWalE73ahKNoG9GU2X4ZytbJ A==; 
X-CSE-ConnectionGUID: D8l5jOZ6TXaTW0WXeVejRA== X-CSE-MsgGUID: Rmg+nucpRUuN7cn+bjhlLw== X-IronPort-AV: E=McAfee;i="6800,10657,11459"; a="51467276" X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; d="scan'208";a="51467276" Received: from orviesa008.jf.intel.com ([10.64.159.148]) by orvoesa111.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Jun 2025 12:14:01 -0700 X-CSE-ConnectionGUID: hAoX+B1/T9O/+n1A8mtWNw== X-CSE-MsgGUID: 7H05x142SKmu7hdMQ2oWJg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; d="scan'208";a="147562196" Received: from black.fi.intel.com ([10.237.72.28]) by orviesa008.jf.intel.com with ESMTP; 09 Jun 2025 12:13:56 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 7D1A8B59; Mon, 09 Jun 2025 22:13:49 +0300 (EEST) From: "Kirill A. Shutemov" To: pbonzini@redhat.com, seanjc@google.com, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, isaku.yamahata@intel.com, kai.huang@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: [PATCHv2 11/12] x86/virt/tdx: Enable Dynamic PAMT Date: Mon, 9 Jun 2025 22:13:39 +0300 Message-ID: <20250609191340.2051741-12-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.47.2 In-Reply-To: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com> References: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" The Physical Address Metadata Table (PAMT) holds TDX metadata for physical memory and must be allocated by the kernel during TDX module initialization. The exact size of the required PAMT memory is determined by the TDX module and may vary between TDX module versions, but currently it is approximately 0.4% of the system memory. This is a significant commitment, especially if it is not known upfront whether the machine will run any TDX guests. The Dynamic PAMT feature reduces static PAMT allocations. PAMT_1G and PAMT_2M levels are still allocated on TDX module initialization, but the PAMT_4K level is allocated dynamically, reducing static allocations to approximately 0.004% of the system memory. All pieces are in place. Enable Dynamic PAMT if it is supported. Signed-off-by: Kirill A. 
Shutemov --- arch/x86/include/asm/tdx.h | 6 +++++- arch/x86/virt/vmx/tdx/tdx.c | 8 ++++++++ arch/x86/virt/vmx/tdx/tdx.h | 3 --- 3 files changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 853471e1eda1..8897c7416309 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -13,6 +13,10 @@ #include #include =20 +/* Bit definitions of TDX_FEATURES0 metadata field */ +#define TDX_FEATURES0_NO_RBP_MOD BIT_ULL(18) +#define TDX_FEATURES0_DYNAMIC_PAMT BIT_ULL(36) + #ifndef __ASSEMBLER__ =20 #include @@ -108,7 +112,7 @@ const struct tdx_sys_info *tdx_get_sysinfo(void); =20 static inline bool tdx_supports_dynamic_pamt(const struct tdx_sys_info *sy= sinfo) { - return false; /* To be enabled when kernel is ready */ + return sysinfo->features.tdx_features0 & TDX_FEATURES0_DYNAMIC_PAMT; } =20 int tdx_guest_keyid_alloc(void); diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 4dcba7bf4ab9..d9f27647424d 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1047,6 +1047,8 @@ static int construct_tdmrs(struct list_head *tmb_list, return ret; } =20 +#define TDX_SYS_CONFIG_DYNAMIC_PAMT BIT(16) + static int config_tdx_module(struct tdmr_info_list *tdmr_list, u64 global_= keyid) { struct tdx_module_args args =3D {}; @@ -1074,6 +1076,12 @@ static int config_tdx_module(struct tdmr_info_list *= tdmr_list, u64 global_keyid) args.rcx =3D __pa(tdmr_pa_array); args.rdx =3D tdmr_list->nr_consumed_tdmrs; args.r8 =3D global_keyid; + + if (tdx_supports_dynamic_pamt(&tdx_sysinfo)) { + pr_info("Enable Dynamic PAMT\n"); + args.r8 |=3D TDX_SYS_CONFIG_DYNAMIC_PAMT; + } + ret =3D seamcall_prerr(TDH_SYS_CONFIG, &args); =20 /* Free the array as it is not required anymore. */ diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index 46c4214b79fb..096c78a1d438 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -86,9 +86,6 @@ struct tdmr_info { DECLARE_FLEX_ARRAY(struct tdmr_reserved_area, reserved_areas); } __packed __aligned(TDMR_INFO_ALIGNMENT); =20 -/* Bit definitions of TDX_FEATURES0 metadata field */ -#define TDX_FEATURES0_NO_RBP_MOD BIT(18) - /* * Do not put any hardware-defined TDX structure representations below * this comment! 
--=20 2.47.2 From nobody Sun Feb 8 15:15:50 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.14]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 09EB72222D5; Mon, 9 Jun 2025 19:14:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.14 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496444; cv=none; b=H4jX0/hSTcjhScVJTolmiPw56o3ZuDgs2hl5ongGdNoUK2Dhod2ooig5tPnXtN+t20ZRWl3fzncr4iGMt635/Rh4R4m/5NWHkIZ2RvDcxK9PZhevFsoQO/y/RctKHRKJU/tvzTZU/enoEieXQKOnAXtrcFe6tty7rLrvGhbj0fQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1749496444; c=relaxed/simple; bh=kX+3ZVR0+aZ1UszWZjARbJWwTNTIBQcWEhqm1x/9MBs=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=UTmxh6Hx20JaZmUqRW0IMJxj+VF3hM2fe8y1o/6/iDjNXrcBeyWOpP/0kI636k+Bp59Yfzdpbk9614Qiz+F7QnZB6/DQBlH9GE+RDHmG9eab9fcwaw3lcDyIyyMb87Ypcf7MlMiM+IZwJnqMx5Ao14hDGQ4v+c7d3asXdMjye1Q= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com; spf=none smtp.helo=mgamail.intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=G2OiRVul; arc=none smtp.client-ip=192.198.163.14 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=linux.intel.com Authentication-Results: smtp.subspace.kernel.org; spf=none smtp.helo=mgamail.intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="G2OiRVul" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1749496443; x=1781032443; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=kX+3ZVR0+aZ1UszWZjARbJWwTNTIBQcWEhqm1x/9MBs=; b=G2OiRVul5YovaOSYABfHCZjX5XJkeR0fgCI9sCtJKaW9aUZ6P4C64+j0 c546WC83NUZznbHIn5kzve3kG076LAeh1zNeVPD5qujtYf06q62E5AFO8 TrYaxk5NvNdQZdBTM0DKblHcYu6AZF02uEV6fmTgche56lyerJK41HatD okK/IqC1BftODCTK9a0BBnn2s3aHsIxI7c/8QknfsL9U1Mzx3Inkv7OuK bLMjgeFroEhonkhFqm3um9K+oUShgPH4Z0uYivT3ofmFVkcBUwcdjNVuG 1TgxRGYKem7sfF7JTFTjCXERgdfaMjnyAIYXHpVe6ZirBuKtawJUX3rcU w==; X-CSE-ConnectionGUID: aNRxnkqRQjGhZa9oaVsMWg== X-CSE-MsgGUID: 6OLddxpWQ6ypJ402xxtBkQ== X-IronPort-AV: E=McAfee;i="6800,10657,11459"; a="51681811" X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; d="scan'208";a="51681811" Received: from orviesa007.jf.intel.com ([10.64.159.147]) by fmvoesa108.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Jun 2025 12:14:00 -0700 X-CSE-ConnectionGUID: xyquTUd2SemM+bGqi0GaHQ== X-CSE-MsgGUID: XiVCkRk/ToefjoqPvYMteQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.16,223,1744095600"; d="scan'208";a="147174198" Received: from black.fi.intel.com ([10.237.72.28]) by orviesa007.jf.intel.com with ESMTP; 09 Jun 2025 12:13:56 -0700 Received: by black.fi.intel.com (Postfix, from userid 1000) id 8310CA65; Mon, 09 Jun 2025 22:13:49 +0300 (EEST) From: "Kirill A. Shutemov" To: pbonzini@redhat.com, seanjc@google.com, dave.hansen@linux.intel.com Cc: rick.p.edgecombe@intel.com, isaku.yamahata@intel.com, kai.huang@intel.com, yan.y.zhao@intel.com, chao.gao@intel.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, kvm@vger.kernel.org, x86@kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, "Kirill A. 
Shutemov" Subject: [PATCHv2 12/12] Documentation/x86: Add documentation for TDX's Dynamic PAMT Date: Mon, 9 Jun 2025 22:13:40 +0300 Message-ID: <20250609191340.2051741-13-kirill.shutemov@linux.intel.com> X-Mailer: git-send-email 2.47.2 In-Reply-To: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com> References: <20250609191340.2051741-1-kirill.shutemov@linux.intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Expand TDX documentation to include information on the Dynamic PAMT feature. The new section explains PAMT support in the TDX module and how it is enabled on the kernel side. Signed-off-by: Kirill A. Shutemov --- Documentation/arch/x86/tdx.rst | 108 +++++++++++++++++++++++++++++++++ 1 file changed, 108 insertions(+) diff --git a/Documentation/arch/x86/tdx.rst b/Documentation/arch/x86/tdx.rst index 719043cd8b46..a1dc50dd6f57 100644 --- a/Documentation/arch/x86/tdx.rst +++ b/Documentation/arch/x86/tdx.rst @@ -99,6 +99,114 @@ initialize:: =20 [..] virt/tdx: module initialization failed ... =20 +Dynamic PAMT +------------ + +Dynamic PAMT support in TDX module +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Dynamic PAMT is a TDX feature that allows VMM to allocate PAMT_4K as +needed. PAMT_1G and PAMT_2M are still allocated statically at the time of +TDX module initialization. At init stage allocation of PAMT_4K is replaced +with PAMT_PAGE_BITMAP which currently requires one bit of memory per 4k. + +VMM is responsible for allocating and freeing PAMT_4K. There's a couple of +new SEAMCALLs for this: TDH.PHYMEM.PAMT.ADD and TDH.PHYMEM.PAMT.REMOVE. +They add/remove PAMT memory in form of page pair. There's no requirement +for these pages to be contiguous. + +Page pair supplied via TDH.PHYMEM.PAMT.ADD will cover specified 2M region. +It allows any 4K from the region to be usable by TDX module. + +With Dynamic PAMT, a number of SEAMCALLs can now fail due to missing PAMT +memory (TDX_MISSING_PAMT_PAGE_PAIR): + + - TDH.MNG.CREATE + - TDH.MNG.ADDCX + - TDH.VP.ADDCX + - TDH.VP.CREATE + - TDH.MEM.PAGE.ADD + - TDH.MEM.PAGE.AUG + - TDH.MEM.PAGE.DEMOTE + - TDH.MEM.PAGE.RELOCATE + +Basically, if you supply memory to a TD, this memory has to backed by PAMT +memory. + +Once no TD uses the 2M range, the PAMT page pair can be reclaimed with +TDH.PHYMEM.PAMT.REMOVE. + +TDX module track PAMT memory usage and can give VMM a hint that PAMT +memory can be removed. Such hint is provided from all SEAMCALLs that +removes memory from TD: + + - TDH.MEM.SEPT.REMOVE + - TDH.MEM.PAGE.REMOVE + - TDH.MEM.PAGE.PROMOTE + - TDH.MEM.PAGE.RELOCATE + - TDH.PHYMEM.PAGE.RECLAIM + +With Dynamic PAMT, TDH.MEM.PAGE.DEMOTE takes PAMT page pair as additional +input to populate PAMT_4K on split. TDH.MEM.PAGE.PROMOTE returns no longer +needed PAMT page pair. + +PAMT memory is global resource and not tied to a specific TD. TDX modules +maintains PAMT memory in a radix tree addressed by physical address. Each +entry in the tree can be locked with shared or exclusive lock. Any +modification of the tree requires exclusive lock. + +Any SEAMCALL that takes explicit HPA as an argument will walk the tree +taking shared lock on entries. It required to make sure that the page +pointed by HPA is of compatible type for the usage. + +TDCALLs don't take PAMT locks as none of the take HPA as an argument. 
+ +Dynamic PAMT enabling in kernel +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The kernel maintains a refcount for every 2M region with two helpers, +tdx_pamt_get() and tdx_pamt_put(). + +The refcount represents the number of users of the PAMT memory in the +region. The kernel calls TDH.PHYMEM.PAMT.ADD on the 0->1 transition and +TDH.PHYMEM.PAMT.REMOVE on the 1->0 transition. + +The function tdx_alloc_page() allocates a new page and ensures that it is +backed by PAMT memory. Pages allocated in this manner are ready to be used +for a TD. The function tdx_free_page() frees the page and releases the +PAMT memory for the 2M region if it is no longer needed. + +PAMT memory gets allocated as part of TD init and VCPU init, on populating +the SEPT tree, and on adding guest memory (both during TD build and via AUG +on accept). Splitting a 2M page into 4K pages also requires PAMT memory. + +PAMT memory is removed on reclaim of control pages and guest memory. + +Populating PAMT memory on fault and on split is tricky as the kernel cannot +allocate memory from the context where it is needed. These code paths use +pre-allocated PAMT memory pools. + +Previous attempt at Dynamic PAMT enabling +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The initial attempt at kernel enabling was quite different. It was built +around lazy PAMT allocation: only trying to add a PAMT page pair if a +SEAMCALL fails due to a missing PAMT and reclaiming it based on hints +provided by the TDX module. + +The motivation was to avoid duplicating on the kernel side the PAMT memory +refcounting that the TDX module already does. + +This approach is inherently more racy as there is no serialization of +PAMT memory add/remove against SEAMCALLs that add/remove memory for a TD. +Such serialization would require global locking, which is not feasible. + +This approach worked, but at some point it became clear that it could not +be robust as long as the kernel avoids TDX_OPERAND_BUSY loops. +TDX_OPERAND_BUSY will occur as a result of the races mentioned above. + +This approach was abandoned in favor of explicit refcounting. + TDX Interaction to Other Kernel Components ------------------------------------------ =20 --=20 2.47.2
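As an illustration of the refcounting scheme described in the documentation above, here is a minimal user-space sketch of the tdx_pamt_get()/tdx_pamt_put() behaviour. It only models the refcount transitions (0->1 standing in for TDH.PHYMEM.PAMT.ADD, 1->0 for TDH.PHYMEM.PAMT.REMOVE); pamt_add(), pamt_remove(), the region indices and the array size are illustrative stand-ins rather than kernel or TDX module API, and the real helpers additionally handle locking, PAMT page allocation and SEAMCALL error paths.

#include <stdio.h>

#define NR_2M_REGIONS 8

/* One refcount per 2M region of host physical memory (simplified). */
static int pamt_refcount[NR_2M_REGIONS];

/* Stand-ins for the TDH.PHYMEM.PAMT.ADD/REMOVE SEAMCALLs. */
static void pamt_add(int region)    { printf("PAMT.ADD    region %d\n", region); }
static void pamt_remove(int region) { printf("PAMT.REMOVE region %d\n", region); }

/* Model of tdx_pamt_get(): the 0->1 transition adds PAMT memory for the region. */
static void pamt_get(int region)
{
	if (pamt_refcount[region]++ == 0)
		pamt_add(region);
}

/* Model of tdx_pamt_put(): the 1->0 transition removes PAMT memory for the region. */
static void pamt_put(int region)
{
	if (--pamt_refcount[region] == 0)
		pamt_remove(region);
}

int main(void)
{
	pamt_get(3);	/* first user of region 3: PAMT.ADD */
	pamt_get(3);	/* second user: refcount only */
	pamt_put(3);	/* one user still left: nothing to remove */
	pamt_put(3);	/* last user gone: PAMT.REMOVE */
	return 0;
}

This is the pattern that lets tdx_alloc_page() and tdx_free_page() transparently back a page with PAMT memory and release that memory once the last page in the 2M region is freed.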