From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:33 -0800
Message-ID: <20260129011517.3545883-2-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 01/45] x86/tdx: Use pg_level in TDX APIs, not the TDX-Module's 0-based level
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Rework the TDX APIs to take the kernel's 1-based pg_level enum, not the
TDX-Module's 0-based level.  The APIs are _kernel_ APIs, not TDX-Module
APIs, and the kernel (and KVM) uses "enum pg_level" literally everywhere.

Using "enum pg_level" eliminates ambiguity when looking at the APIs (it's
NOT clear that "int level" refers to the TDX-Module's level), and will
allow for using existing helpers like page_level_size() when support for
hugepages is added to the S-EPT APIs.

No functional change intended.

Cc: Kai Huang
Cc: Dave Hansen
Cc: Rick Edgecombe
Cc: Yan Zhao
Cc: Vishal Annapurve
Cc: Ackerley Tng
Acked-by: Kiryl Shutsemau
Reviewed-by: Kai Huang
Tested-by: Kai Huang
Reviewed-by: Rick Edgecombe
Tested-by: Rick Edgecombe
Signed-off-by: Sean Christopherson
Acked-by: Dave Hansen
---
 arch/x86/include/asm/tdx.h  | 14 ++++----------
 arch/x86/kvm/vmx/tdx.c      | 11 ++++-------
 arch/x86/virt/vmx/tdx/tdx.c | 26 ++++++++++++++++--------
 3 files changed, 26 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 6b338d7f01b7..bc0d03e70fd6 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -189,19 +189,13 @@ static inline u64 mk_keyed_paddr(u16 hkid, struct page *page)
 	return ret;
 }
 
-static inline int pg_level_to_tdx_sept_level(enum pg_level level)
-{
-	WARN_ON_ONCE(level == PG_LEVEL_NONE);
-	return level - 1;
-}
-
 u64 tdh_vp_enter(struct tdx_vp *vp, struct tdx_module_args *args);
 u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
 u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, struct page *page, struct page *source, u64 *ext_err1, u64 *ext_err2);
-u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2);
+u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, enum pg_level level, struct page *page, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page);
-u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2);
-u64 tdh_mem_range_block(struct tdx_td *td, u64 gpa, int level, u64 *ext_err1, u64 *ext_err2);
+u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, enum pg_level level, struct page *page, u64 *ext_err1, u64 *ext_err2);
+u64 tdh_mem_range_block(struct tdx_td *td, u64 gpa, enum pg_level level, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_mng_key_config(struct tdx_td *td);
 u64 tdh_mng_create(struct tdx_td *td, u16 hkid);
 u64 tdh_vp_create(struct tdx_td *td, struct tdx_vp *vp);
@@ -217,7 +211,7 @@ u64 tdh_vp_rd(struct tdx_vp *vp, u64 field, u64 *data);
 u64 tdh_vp_wr(struct tdx_vp *vp, u64 field, u64 data, u64 mask);
 u64 tdh_phymem_page_reclaim(struct page *page, u64 *tdx_pt, u64 *tdx_owner, u64 *tdx_size);
 u64 tdh_mem_track(struct tdx_td *tdr);
-u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_err1, u64 *ext_err2);
+u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, enum pg_level level, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_phymem_cache_wb(bool resume);
 u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td);
 u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page);
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 5df9d32d2058..561461c9d131 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1648,14 +1648,13 @@ static int tdx_mem_page_add(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
 			    enum pg_level level, kvm_pfn_t pfn)
 {
-	int tdx_level = pg_level_to_tdx_sept_level(level);
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	struct page *page = pfn_to_page(pfn);
 	gpa_t gpa = gfn_to_gpa(gfn);
 	u64 entry, level_state;
 	u64 err;
 
-	err = tdh_mem_page_aug(&kvm_tdx->td, gpa, tdx_level, page, &entry, &level_state);
+	err = tdh_mem_page_aug(&kvm_tdx->td, gpa, level, page, &entry, &level_state);
 	if (unlikely(tdx_operand_busy(err)))
 		return -EBUSY;
 
@@ -1699,12 +1698,11 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
 				     enum pg_level level, void *private_spt)
 {
-	int tdx_level = pg_level_to_tdx_sept_level(level);
 	gpa_t gpa = gfn_to_gpa(gfn);
 	struct page *page = virt_to_page(private_spt);
 	u64 err, entry, level_state;
 
-	err = tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, tdx_level, page, &entry,
+	err = tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, level, page, &entry,
 			       &level_state);
 	if (unlikely(tdx_operand_busy(err)))
 		return -EBUSY;
@@ -1788,7 +1786,6 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 					 enum pg_level level, u64 mirror_spte)
 {
 	struct page *page = pfn_to_page(spte_to_pfn(mirror_spte));
-	int tdx_level = pg_level_to_tdx_sept_level(level);
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	gpa_t gpa = gfn_to_gpa(gfn);
 	u64 err, entry, level_state;
@@ -1808,7 +1805,7 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 		return;
 
 	err = tdh_do_no_vcpus(tdh_mem_range_block, kvm, &kvm_tdx->td, gpa,
-			      tdx_level, &entry, &level_state);
+			      level, &entry, &level_state);
 	if (TDX_BUG_ON_2(err, TDH_MEM_RANGE_BLOCK, entry, level_state, kvm))
 		return;
 
@@ -1824,7 +1821,7 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	 * Race with TDH.VP.ENTER due to (0-step mitigation) and Guest TDCALLs.
 	 */
 	err = tdh_do_no_vcpus(tdh_mem_page_remove, kvm, &kvm_tdx->td, gpa,
-			      tdx_level, &entry, &level_state);
+			      level, &entry, &level_state);
 	if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_REMOVE, entry, level_state, kvm))
 		return;
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 5ce4ebe99774..22c0f832cb37 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1516,6 +1516,12 @@ static void tdx_clflush_page(struct page *page)
 	clflush_cache_range(page_to_virt(page), PAGE_SIZE);
 }
 
+static int pg_level_to_tdx_sept_level(enum pg_level level)
+{
+	WARN_ON_ONCE(level == PG_LEVEL_NONE);
+	return level - 1;
+}
+
 noinstr u64 tdh_vp_enter(struct tdx_vp *td, struct tdx_module_args *args)
 {
 	args->rcx = td->tdvpr_pa;
@@ -1556,10 +1562,11 @@ u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, struct page *page, struct page
 }
 EXPORT_SYMBOL_FOR_KVM(tdh_mem_page_add);
 
-u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2)
+u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, enum pg_level level,
+		     struct page *page, u64 *ext_err1, u64 *ext_err2)
 {
 	struct tdx_module_args args = {
-		.rcx = gpa | level,
+		.rcx = gpa | pg_level_to_tdx_sept_level(level),
 		.rdx = tdx_tdr_pa(td),
 		.r8 = page_to_phys(page),
 	};
@@ -1587,10 +1594,11 @@ u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page)
 }
 EXPORT_SYMBOL_FOR_KVM(tdh_vp_addcx);
 
-u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u64 *ext_err1, u64 *ext_err2)
+u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, enum pg_level level,
+		     struct page *page, u64 *ext_err1, u64 *ext_err2)
 {
 	struct tdx_module_args args = {
-		.rcx = gpa | level,
+		.rcx = gpa | pg_level_to_tdx_sept_level(level),
 		.rdx = tdx_tdr_pa(td),
 		.r8 = page_to_phys(page),
 	};
@@ -1606,10 +1614,11 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *page, u
 }
 EXPORT_SYMBOL_FOR_KVM(tdh_mem_page_aug);
 
-u64 tdh_mem_range_block(struct tdx_td *td, u64 gpa, int level, u64 *ext_err1, u64 *ext_err2)
+u64 tdh_mem_range_block(struct tdx_td *td, u64 gpa, enum pg_level level,
+			u64 *ext_err1, u64 *ext_err2)
 {
 	struct tdx_module_args args = {
-		.rcx = gpa | level,
+		.rcx = gpa | pg_level_to_tdx_sept_level(level),
 		.rdx = tdx_tdr_pa(td),
 	};
 	u64 ret;
@@ -1822,10 +1831,11 @@ u64 tdh_mem_track(struct tdx_td *td)
 }
 EXPORT_SYMBOL_FOR_KVM(tdh_mem_track);
 
-u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_err1, u64 *ext_err2)
+u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, enum pg_level level,
+			u64 *ext_err1, u64 *ext_err2)
 {
 	struct tdx_module_args args = {
-		.rcx = gpa | level,
+		.rcx = gpa | pg_level_to_tdx_sept_level(level),
 		.rdx = tdx_tdr_pa(td),
 	};
 	u64 ret;
-- 
2.53.0.rc1.217.geba53bf80e-goog
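For context, the conversion being hidden away is trivial; a minimal standalone C sketch (toy enum values mirroring the kernel's pgtable_types.h, not kernel code) illustrates the 1-based vs. 0-based off-by-one that motivates carrying the enum type through the APIs:

    #include <assert.h>
    #include <stdio.h>

    /* Toy copy of the kernel's enum; PG_LEVEL_NONE is 0, PG_LEVEL_4K is 1. */
    enum pg_level {
            PG_LEVEL_NONE,
            PG_LEVEL_4K,
            PG_LEVEL_2M,
            PG_LEVEL_1G,
    };

    /* Same conversion the patch moves into tdx.c: kernel 1-based -> TDX 0-based. */
    static int pg_level_to_tdx_sept_level(enum pg_level level)
    {
            assert(level != PG_LEVEL_NONE);
            return level - 1;
    }

    int main(void)
    {
            printf("4K -> %d\n", pg_level_to_tdx_sept_level(PG_LEVEL_4K)); /* 0 */
            printf("2M -> %d\n", pg_level_to_tdx_sept_level(PG_LEVEL_2M)); /* 1 */
            printf("1G -> %d\n", pg_level_to_tdx_sept_level(PG_LEVEL_1G)); /* 2 */
            return 0;
    }

With "int level" in the signatures, either convention could plausibly be in play at a call site; "enum pg_level" makes the kernel convention explicit and confines the subtraction to the SEAMCALL wrappers.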
From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:34 -0800
Message-ID: <20260129011517.3545883-3-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 02/45] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails"
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Pass a pointer to iter->old_spte, not simply its value, when setting an
external SPTE in __tdp_mmu_set_spte_atomic(), so that the iterator's
value will be updated if the cmpxchg64 to freeze the mirror SPTE fails.

The bug is currently benign as TDX is mutually exclusive with all paths
that do "local" retry, e.g. clear_dirty_gfn_range() and
wrprot_gfn_range().

Fixes: 77ac7079e66d ("KVM: x86/tdp_mmu: Propagate building mirror page tables")
Signed-off-by: Sean Christopherson
Reviewed-by: Kai Huang
Reviewed-by: Rick Edgecombe
---
 arch/x86/kvm/mmu/tdp_mmu.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 9c26038f6b77..0feda295859a 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -509,10 +509,10 @@ static void *get_external_spt(gfn_t gfn, u64 new_spte, int level)
 }
 
 static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sptep,
-						  gfn_t gfn, u64 old_spte,
+						  gfn_t gfn, u64 *old_spte,
 						  u64 new_spte, int level)
 {
-	bool was_present = is_shadow_present_pte(old_spte);
+	bool was_present = is_shadow_present_pte(*old_spte);
 	bool is_present = is_shadow_present_pte(new_spte);
 	bool is_leaf = is_present && is_last_spte(new_spte, level);
 	int ret = 0;
@@ -525,7 +525,7 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
 	 * page table has been modified. Use FROZEN_SPTE similar to
 	 * the zapping case.
 	 */
-	if (!try_cmpxchg64(rcu_dereference(sptep), &old_spte, FROZEN_SPTE))
+	if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
 		return -EBUSY;
 
 	/*
@@ -541,7 +541,7 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
 		ret = kvm_x86_call(link_external_spt)(kvm, gfn, level, external_spt);
 	}
 	if (ret)
-		__kvm_tdp_mmu_write_spte(sptep, old_spte);
+		__kvm_tdp_mmu_write_spte(sptep, *old_spte);
 	else
 		__kvm_tdp_mmu_write_spte(sptep, new_spte);
 	return ret;
@@ -670,7 +670,7 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 		return -EBUSY;
 
 	ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
-					iter->old_spte, new_spte, iter->level);
+					&iter->old_spte, new_spte, iter->level);
 	if (ret)
 		return ret;
 } else {
-- 
2.53.0.rc1.217.geba53bf80e-goog
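For context on why the pointer matters: compare-exchange, both in C11 and in the kernel's try_cmpxchg64(), writes the current value back through the "expected" pointer on failure. A minimal userspace sketch (toy values, not kernel code) of the retry-friendly behavior the patch restores:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Userspace stand-in for the kernel's try_cmpxchg64(): on failure, the
     * "expected" value is overwritten with the location's current contents. */
    static bool try_cmpxchg64(_Atomic uint64_t *ptr, uint64_t *expected,
                              uint64_t new_val)
    {
            return atomic_compare_exchange_strong(ptr, expected, new_val);
    }

    int main(void)
    {
            _Atomic uint64_t spte = 0x123;  /* current value in the page table */
            uint64_t old_spte = 0x111;      /* caller's stale snapshot */

            if (!try_cmpxchg64(&spte, &old_spte, 0x456)) {
                    /* Failure refreshed the snapshot: old_spte is now 0x123,
                     * so a local retry operates on fresh data. */
                    printf("stale; refreshed old_spte=0x%lx\n",
                           (unsigned long)old_spte);
            }
            return 0;
    }

Passing iter->old_spte by value meant that side effect was lost, leaving the iterator with a stale snapshot if the freeze attempt raced with another update.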
From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:35 -0800
Message-ID: <20260129011517.3545883-4-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 03/45] KVM: TDX: Account all non-transient page allocations for per-TD structures
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Account all non-transient allocations associated with a single TD (or
its vCPUs), as KVM's ABI is that allocations that are active for the
lifetime of a VM are accounted.  Leave temporary allocations, i.e.
allocations that are freed within a single function/ioctl, unaccounted,
to again align with KVM's existing behavior, e.g. see commit
dd103407ca31 ("KVM: X86: Remove unnecessary GFP_KERNEL_ACCOUNT for
temporary variables").

Fixes: 8d032b683c29 ("KVM: TDX: create/destroy VM structure")
Fixes: a50f673f25e0 ("KVM: TDX: Do TDX specific vcpu initialization")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson
Reviewed-by: Kai Huang
Reviewed-by: Rick Edgecombe
---
 arch/x86/kvm/vmx/tdx.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 561461c9d131..5688c77616e3 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -2397,7 +2397,7 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
 
 	atomic_inc(&nr_configured_hkid);
 
-	tdr_page = alloc_page(GFP_KERNEL);
+	tdr_page = alloc_page(GFP_KERNEL_ACCOUNT);
 	if (!tdr_page)
 		goto free_hkid;
 
@@ -2405,12 +2405,12 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
 	/* TDVPS = TDVPR(4K page) + TDCX(multiple 4K pages), -1 for TDVPR. */
 	kvm_tdx->td.tdcx_nr_pages = tdx_sysinfo->td_ctrl.tdvps_base_size / PAGE_SIZE - 1;
 	tdcs_pages = kcalloc(kvm_tdx->td.tdcs_nr_pages, sizeof(*kvm_tdx->td.tdcs_pages),
-			     GFP_KERNEL);
+			     GFP_KERNEL_ACCOUNT);
 	if (!tdcs_pages)
 		goto free_tdr;
 
 	for (i = 0; i < kvm_tdx->td.tdcs_nr_pages; i++) {
-		tdcs_pages[i] = alloc_page(GFP_KERNEL);
+		tdcs_pages[i] = alloc_page(GFP_KERNEL_ACCOUNT);
 		if (!tdcs_pages[i])
 			goto free_tdcs;
 	}
@@ -2885,7 +2885,7 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
 	int ret, i;
 	u64 err;
 
-	page = alloc_page(GFP_KERNEL);
+	page = alloc_page(GFP_KERNEL_ACCOUNT);
 	if (!page)
 		return -ENOMEM;
 	tdx->vp.tdvpr_page = page;
@@ -2898,14 +2898,14 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u64 vcpu_rcx)
 	tdx->vp.tdvpr_pa = page_to_phys(tdx->vp.tdvpr_page);
 
 	tdx->vp.tdcx_pages = kcalloc(kvm_tdx->td.tdcx_nr_pages, sizeof(*tdx->vp.tdcx_pages),
-				     GFP_KERNEL);
+				     GFP_KERNEL_ACCOUNT);
 	if (!tdx->vp.tdcx_pages) {
 		ret = -ENOMEM;
 		goto free_tdvpr;
 	}
 
 	for (i = 0; i < kvm_tdx->td.tdcx_nr_pages; i++) {
-		page = alloc_page(GFP_KERNEL);
+		page = alloc_page(GFP_KERNEL_ACCOUNT);
 		if (!page) {
 			ret = -ENOMEM;
 			goto free_tdcx;
-- 
2.53.0.rc1.217.geba53bf80e-goog
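For context, a toy userspace model of what GFP_KERNEL_ACCOUNT buys (a hypothetical per-VM charge counter standing in for memcg accounting; this is not the kernel's implementation):

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy model: "accounted" allocations are charged against the VM's
     * cgroup for as long as they live; transient ones are not charged. */
    struct vm {
            long accounted_bytes;   /* stand-in for the memcg charge */
    };

    static void *alloc_page_accounted(struct vm *vm, size_t size)
    {
            void *p = malloc(size);
            if (p)
                    vm->accounted_bytes += size;  /* charged for page lifetime */
            return p;
    }

    int main(void)
    {
            struct vm vm = { 0 };

            void *tdr = alloc_page_accounted(&vm, 4096); /* lives as long as the TD */
            void *tmp = malloc(4096);                    /* freed within one "ioctl" */

            free(tmp);  /* transient: never charged, mirroring plain GFP_KERNEL */
            printf("charged: %ld bytes\n", vm.accounted_bytes);  /* 4096 */
            free(tdr);
            return 0;
    }

The dividing line the patch enforces is exactly the one in the toy: TDR/TDCS/TDVPR/TDCX pages persist for the life of the TD or vCPU, so they get charged; scratch buffers freed before the ioctl returns do not.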
From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:36 -0800
Message-ID: <20260129011517.3545883-5-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 04/45] KVM: x86: Make "external SPTE" ops that can fail RET0 static calls
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Define kvm_x86_ops .link_external_spt(), .set_external_spte(), and
.free_external_spt() as RET0 static calls so that an unexpected call to
a default operation doesn't consume garbage.

Fixes: 77ac7079e66d ("KVM: x86/tdp_mmu: Propagate building mirror page tables")
Fixes: 94faba8999b9 ("KVM: x86/tdp_mmu: Propagate tearing down mirror page tables")
Signed-off-by: Sean Christopherson
Reviewed-by: Kai Huang
Reviewed-by: Rick Edgecombe
---
 arch/x86/include/asm/kvm-x86-ops.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index de709fb5bd76..c18a033bee7e 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -94,9 +94,9 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
 KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
 KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
 KVM_X86_OP(load_mmu_pgd)
-KVM_X86_OP_OPTIONAL(link_external_spt)
-KVM_X86_OP_OPTIONAL(set_external_spte)
-KVM_X86_OP_OPTIONAL(free_external_spt)
+KVM_X86_OP_OPTIONAL_RET0(link_external_spt)
+KVM_X86_OP_OPTIONAL_RET0(set_external_spte)
+KVM_X86_OP_OPTIONAL_RET0(free_external_spt)
 KVM_X86_OP_OPTIONAL(remove_external_spte)
 KVM_X86_OP(has_wbinvd_exit)
 KVM_X86_OP(get_l2_tsc_offset)
-- 
2.53.0.rc1.217.geba53bf80e-goog
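For context, a minimal userspace sketch of the RET0 idea (toy ops table with hypothetical names; the real mechanism is x86 static-call patching, not a function-pointer table): an unimplemented optional op is pointed at a return-0 stub, so callers need neither NULL checks nor fear of a garbage return value:

    #include <stdio.h>

    /* Toy model of KVM_X86_OP_OPTIONAL_RET0: instead of leaving an optional
     * op NULL, the slot is filled with a stub that always returns 0. */
    struct ops {
            int (*set_external_spte)(int gfn, int level);
    };

    static int ret0_set_external_spte(int gfn, int level)
    {
            (void)gfn;
            (void)level;
            return 0;   /* benign default: "nothing to do, success" */
    }

    static struct ops vendor_ops = {
            /* The vendor (e.g. VMX without TDX) left this op unimplemented,
             * so the default RET0 stub is installed in its place. */
            .set_external_spte = ret0_set_external_spte,
    };

    int main(void)
    {
            /* An unexpected call is harmless rather than undefined behavior. */
            printf("ret = %d\n", vendor_ops.set_external_spte(0, 1));
            return 0;
    }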
From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:37 -0800
Message-ID: <20260129011517.3545883-6-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 05/45] KVM: TDX: Drop kvm_x86_ops.link_external_spt(), use .set_external_spte() for all
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Drop the dedicated .link_external_spt() hook for linking non-leaf S-EPT
pages, and instead funnel everything through .set_external_spte().
Using separate hooks doesn't help prevent TDP MMU details from bleeding
into TDX, and vice versa; to the contrary, dedicated callbacks will
result in _more_ pollution when hugepage support is added, e.g. will
require the TDP MMU to know details about the splitting rules for TDX
that aren't all that relevant to the TDP MMU.

Ideally, KVM would provide a single pair of hooks to set S-EPT entries,
one hook for setting SPTEs under write-lock and another for setting
SPTEs under read-lock (e.g. to ensure the entire operation is "atomic",
to allow for failure, etc.).  Sadly, TDX's requirement that all child
S-EPT entries be removed before the parent makes that impractical: the
TDP MMU deliberately prunes non-leaf SPTEs and _then_ processes their
children, thus making it quite important for the TDP MMU to
differentiate between zapping leaf and non-leaf S-EPT entries.  However,
that's the _only_ case that's truly special, and even that case could be
shoehorned into a single hook; it just wouldn't be a net positive.

Signed-off-by: Sean Christopherson
Reviewed-by: Rick Edgecombe
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 -
 arch/x86/include/asm/kvm_host.h    |  3 --
 arch/x86/kvm/mmu/tdp_mmu.c         | 37 +++---------------
 arch/x86/kvm/vmx/tdx.c             | 61 ++++++++++++++++++++----------
 4 files changed, 48 insertions(+), 54 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index c18a033bee7e..57eb1f4832ae 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -94,7 +94,6 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
 KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
 KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
 KVM_X86_OP(load_mmu_pgd)
-KVM_X86_OP_OPTIONAL_RET0(link_external_spt)
 KVM_X86_OP_OPTIONAL_RET0(set_external_spte)
 KVM_X86_OP_OPTIONAL_RET0(free_external_spt)
 KVM_X86_OP_OPTIONAL(remove_external_spte)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e441f270f354..d12ca0f8a348 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1853,9 +1853,6 @@ struct kvm_x86_ops {
 	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			     int root_level);
 
-	/* Update external mapping with page table link. */
-	int (*link_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
-				 void *external_spt);
 	/* Update the external page table from spte getting set. */
 	int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 				 u64 mirror_spte);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 0feda295859a..56ad056e6042 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -495,31 +495,17 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
 
-static void *get_external_spt(gfn_t gfn, u64 new_spte, int level)
-{
-	if (is_shadow_present_pte(new_spte) && !is_last_spte(new_spte, level)) {
-		struct kvm_mmu_page *sp = spte_to_child_sp(new_spte);
-
-		WARN_ON_ONCE(sp->role.level + 1 != level);
-		WARN_ON_ONCE(sp->gfn != gfn);
-		return sp->external_spt;
-	}
-
-	return NULL;
-}
-
 static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sptep,
 						  gfn_t gfn, u64 *old_spte,
 						  u64 new_spte, int level)
 {
-	bool was_present = is_shadow_present_pte(*old_spte);
-	bool is_present = is_shadow_present_pte(new_spte);
-	bool is_leaf = is_present && is_last_spte(new_spte, level);
-	int ret = 0;
-
-	KVM_BUG_ON(was_present, kvm);
+	int ret;
 
 	lockdep_assert_held(&kvm->mmu_lock);
+
+	if (KVM_BUG_ON(is_shadow_present_pte(*old_spte), kvm))
+		return -EIO;
+
 	/*
 	 * We need to lock out other updates to the SPTE until the external
 	 * page table has been modified. Use FROZEN_SPTE similar to
@@ -528,18 +514,7 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
 	if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
 		return -EBUSY;
 
-	/*
-	 * Use different call to either set up middle level
-	 * external page table, or leaf.
-	 */
-	if (is_leaf) {
-		ret = kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte);
-	} else {
-		void *external_spt = get_external_spt(gfn, new_spte, level);
-
-		KVM_BUG_ON(!external_spt, kvm);
-		ret = kvm_x86_call(link_external_spt)(kvm, gfn, level, external_spt);
-	}
+	ret = kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte);
 	if (ret)
 		__kvm_tdp_mmu_write_spte(sptep, *old_spte);
 	else
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 5688c77616e3..30494f9ceb31 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1664,18 +1664,58 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
+static struct page *tdx_spte_to_external_spt(struct kvm *kvm, gfn_t gfn,
+					     u64 new_spte, enum pg_level level)
+{
+	struct kvm_mmu_page *sp = spte_to_child_sp(new_spte);
+
+	if (KVM_BUG_ON(!sp->external_spt, kvm) ||
+	    KVM_BUG_ON(sp->role.level + 1 != level, kvm) ||
+	    KVM_BUG_ON(sp->gfn != gfn, kvm))
+		return NULL;
+
+	return virt_to_page(sp->external_spt);
+}
+
+static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
+				     enum pg_level level, u64 mirror_spte)
+{
+	gpa_t gpa = gfn_to_gpa(gfn);
+	u64 err, entry, level_state;
+	struct page *external_spt;
+
+	external_spt = tdx_spte_to_external_spt(kvm, gfn, mirror_spte, level);
+	if (!external_spt)
+		return -EIO;
+
+	err = tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, level, external_spt,
+			       &entry, &level_state);
+	if (unlikely(tdx_operand_busy(err)))
+		return -EBUSY;
+
+	if (TDX_BUG_ON_2(err, TDH_MEM_SEPT_ADD, entry, level_state, kvm))
+		return -EIO;
+
+	return 0;
+}
+
 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 				     enum pg_level level, u64 mirror_spte)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
 
+	if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm))
+		return -EIO;
+
+	if (!is_last_spte(mirror_spte, level))
+		return tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte);
+
 	/* TODO: handle large pages. */
 	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
 		return -EIO;
 
-	WARN_ON_ONCE(!is_shadow_present_pte(mirror_spte) ||
-		     (mirror_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
+	WARN_ON_ONCE((mirror_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
 
 	/*
 	 * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory()
@@ -1695,23 +1735,7 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 	return tdx_mem_page_aug(kvm, gfn, level, pfn);
 }
 
-static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
-				     enum pg_level level, void *private_spt)
-{
-	gpa_t gpa = gfn_to_gpa(gfn);
-	struct page *page = virt_to_page(private_spt);
-	u64 err, entry, level_state;
 
-	err = tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, level, page, &entry,
-			       &level_state);
-	if (unlikely(tdx_operand_busy(err)))
-		return -EBUSY;
-
-	if (TDX_BUG_ON_2(err, TDH_MEM_SEPT_ADD, entry, level_state, kvm))
-		return -EIO;
-
-	return 0;
-}
 
 /*
  * Ensure shared and private EPTs to be flushed on all vCPUs.
@@ -3592,7 +3616,6 @@ void __init tdx_hardware_setup(void)
 	 */
 	vt_x86_ops.vm_size = max_t(unsigned int, vt_x86_ops.vm_size, sizeof(struct kvm_tdx));
 
-	vt_x86_ops.link_external_spt = tdx_sept_link_private_spt;
 	vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
 	vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
 	vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
-- 
2.53.0.rc1.217.geba53bf80e-goog
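For context, a toy sketch of the single-hook dispatch this patch adopts (hypothetical leaf encoding and function names, not KVM's SPTE format): the vendor hook itself decides whether to link a child page table or map a page, so the TDP MMU no longer has to:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical encoding: bit 7 marks a leaf (terminal) entry, and
     * level-1 entries are always leaves. */
    static bool is_last_spte(uint64_t spte, int level)
    {
            return level == 1 || (spte & (1u << 7));
    }

    static int link_non_leaf(int level)
    {
            printf("SEPT.ADD: link child page table at level %d\n", level);
            return 0;
    }

    static int map_leaf(int level)
    {
            printf("PAGE.AUG: map page at level %d\n", level);
            return 0;
    }

    /* Single hook: internally dispatches on leaf vs. non-leaf. */
    static int set_external_spte(int level, uint64_t mirror_spte)
    {
            if (!is_last_spte(mirror_spte, level))
                    return link_non_leaf(level);
            return map_leaf(level);
    }

    int main(void)
    {
            set_external_spte(2, 0);        /* non-leaf: links a child table */
            set_external_spte(1, 1u << 7);  /* leaf: maps a page */
            return 0;
    }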
From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:38 -0800
Message-ID: <20260129011517.3545883-7-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 06/45] KVM: x86/mmu: Fold set_external_spte_present() into its sole caller
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Fold set_external_spte_present() into __tdp_mmu_set_spte_atomic() in
anticipation of supporting hugepage splitting, at which point other
paths will also set shadow-present external SPTEs.

No functional change intended.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 82 +++++++++++++++++---------------------
 1 file changed, 36 insertions(+), 46 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 56ad056e6042..6fb48b217f5b 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -495,33 +495,6 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
 
-static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sptep,
-						  gfn_t gfn, u64 *old_spte,
-						  u64 new_spte, int level)
-{
-	int ret;
-
-	lockdep_assert_held(&kvm->mmu_lock);
-
-	if (KVM_BUG_ON(is_shadow_present_pte(*old_spte), kvm))
-		return -EIO;
-
-	/*
-	 * We need to lock out other updates to the SPTE until the external
-	 * page table has been modified. Use FROZEN_SPTE similar to
-	 * the zapping case.
-	 */
-	if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
-		return -EBUSY;
-
-	ret = kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte);
-	if (ret)
-		__kvm_tdp_mmu_write_spte(sptep, *old_spte);
-	else
-		__kvm_tdp_mmu_write_spte(sptep, new_spte);
-	return ret;
-}
-
 /**
  * handle_changed_spte - handle bookkeeping associated with an SPTE change
  * @kvm: kvm instance
@@ -626,6 +599,8 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 					  struct tdp_iter *iter,
 					  u64 new_spte)
 {
+	u64 *raw_sptep = rcu_dereference(iter->sptep);
+
 	/*
 	 * The caller is responsible for ensuring the old SPTE is not a FROZEN
 	 * SPTE.  KVM should never attempt to zap or manipulate a FROZEN SPTE,
@@ -638,31 +613,46 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 		int ret;
 
 		/*
-		 * Users of atomic zapping don't operate on mirror roots,
-		 * so don't handle it and bug the VM if it's seen.
+		 * KVM doesn't currently support zapping or splitting mirror
+		 * SPTEs while holding mmu_lock for read.
 		 */
-		if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
+		if (KVM_BUG_ON(is_shadow_present_pte(iter->old_spte), kvm) ||
+		    KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
 			return -EBUSY;
 
-		ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
-						&iter->old_spte, new_spte, iter->level);
+		/*
+		 * Temporarily freeze the SPTE until the external PTE operation
+		 * has completed, e.g. so that concurrent faults don't attempt
+		 * to install a child PTE in the external page table before the
+		 * parent PTE has been written.
+		 */
+		if (!try_cmpxchg64(raw_sptep, &iter->old_spte, FROZEN_SPTE))
+			return -EBUSY;
+
+		/*
+		 * Update the external PTE.  On success, set the mirror SPTE to
+		 * the desired value.  On failure, restore the old SPTE so that
+		 * the SPTE isn't frozen in perpetuity.
+		 */
+		ret = kvm_x86_call(set_external_spte)(kvm, iter->gfn,
+						      iter->level, new_spte);
 		if (ret)
-			return ret;
-	} else {
-		u64 *sptep = rcu_dereference(iter->sptep);
-
-		/*
-		 * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs
-		 * and does not hold the mmu_lock.  On failure, i.e. if a
-		 * different logical CPU modified the SPTE, try_cmpxchg64()
-		 * updates iter->old_spte with the current value, so the caller
-		 * operates on fresh data, e.g. if it retries
-		 * tdp_mmu_set_spte_atomic()
-		 */
-		if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte))
-			return -EBUSY;
+			__kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte);
+		else
+			__kvm_tdp_mmu_write_spte(iter->sptep, new_spte);
+		return ret;
 	}
 
+	/*
+	 * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs and
+	 * does not hold the mmu_lock.  On failure, i.e. if a different logical
+	 * CPU modified the SPTE, try_cmpxchg64() updates iter->old_spte with
+	 * the current value, so the caller operates on fresh data, e.g. if it
+	 * retries tdp_mmu_set_spte_atomic()
+	 */
+	if (!try_cmpxchg64(raw_sptep, &iter->old_spte, new_spte))
+		return -EBUSY;
+
 	return 0;
 }
 
-- 
2.53.0.rc1.217.geba53bf80e-goog
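For context, a minimal userspace model of the freeze/update/publish sequence now open-coded in __tdp_mmu_set_spte_atomic() (toy sentinel value and callback, not kernel code):

    #include <stdatomic.h>
    #include <stdint.h>
    #include <stdio.h>

    #define FROZEN_SPTE ((uint64_t)1)  /* stand-in sentinel value */

    /* Freeze the SPTE so concurrent updaters back off, perform the external
     * update, then either publish the new value or restore the old one. */
    static int set_spte_atomic(_Atomic uint64_t *sptep, uint64_t *old_spte,
                               uint64_t new_spte, int (*external_op)(void))
    {
            if (!atomic_compare_exchange_strong(sptep, old_spte, FROZEN_SPTE))
                    return -1;  /* "-EBUSY": someone else changed the SPTE */

            int ret = external_op();
            /* Never leave the SPTE frozen: restore on failure, publish on
             * success. */
            atomic_store(sptep, ret ? *old_spte : new_spte);
            return ret;
    }

    static int fake_external_update(void)
    {
            return 0;   /* pretend the external (S-EPT) update succeeded */
    }

    int main(void)
    {
            _Atomic uint64_t spte = 0;
            uint64_t old = 0;

            int ret = set_spte_atomic(&spte, &old, 0xabc, fake_external_update);
            printf("ret = %d, spte = 0x%lx\n", ret,
                   (unsigned long)atomic_load(&spte));
            return 0;
    }

The freeze step is what makes the two-part operation (external page table plus mirror SPTE) appear atomic to concurrent faulters, since any racing cmpxchg against the sentinel value fails.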
From nobody Sat Feb 7 17:41:40 2026
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:39 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Message-ID: <20260129011517.3545883-8-seanjc@google.com>
Subject: [RFC PATCH v5 07/45] KVM: x86/mmu: Plumb the SPTE _pointer_ into the TDP MMU's handle_changed_spte()
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Plumb the SPTE pointer into handle_changed_spte() so that removed leaf
mirror entries can be forwarded to TDX in handle_changed_spte(), instead
of effectively requiring callers to manually do so.  Relying on each
caller to invoke .remove_external_spte() is confusing and brittle, e.g.
subtly relies on tdp_mmu_set_spte_atomic() never removing SPTEs.

This will also allow consolidating all S-EPT updates into a single
kvm_x86_ops hook.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 6fb48b217f5b..8743cd020d12 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -320,9 +320,9 @@ void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu, bool mirror)
 	}
 }
 
-static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
-				u64 old_spte, u64 new_spte, int level,
-				bool shared);
+static void handle_changed_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
+				gfn_t gfn, u64 old_spte, u64 new_spte,
+				int level, bool shared);
 
 static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
@@ -471,7 +471,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte,
 							  FROZEN_SPTE, level);
 		}
-		handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
+		handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), sptep, gfn,
 				    old_spte, FROZEN_SPTE, level, shared);
 
 		if (is_mirror_sp(sp)) {
@@ -499,6 +499,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
  * handle_changed_spte - handle bookkeeping associated with an SPTE change
  * @kvm: kvm instance
  * @as_id: the address space of the paging structure the SPTE was a part of
+ * @sptep: pointer to the SPTE
  * @gfn: the base GFN that was mapped by the SPTE
  * @old_spte: The value of the SPTE before the change
  * @new_spte: The value of the SPTE after the change
@@ -511,9 +512,9 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
  * dirty logging updates are handled in common code, not here (see make_spte()
  * and fast_pf_fix_direct_spte()).
 */
-static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
-				u64 old_spte, u64 new_spte, int level,
-				bool shared)
+static void handle_changed_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
+				gfn_t gfn, u64 old_spte, u64 new_spte,
+				int level, bool shared)
 {
 	bool was_present = is_shadow_present_pte(old_spte);
 	bool is_present = is_shadow_present_pte(new_spte);
@@ -685,8 +686,8 @@ static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	if (ret)
 		return ret;
 
-	handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
-			    new_spte, iter->level, true);
+	handle_changed_spte(kvm, iter->as_id, iter->sptep, iter->gfn,
+			    iter->old_spte, new_spte, iter->level, true);
 
 	return 0;
 }
@@ -720,7 +721,7 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
 
 	old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
 
-	handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
+	handle_changed_spte(kvm, as_id, sptep, gfn, old_spte, new_spte, level, false);
 
 	/*
 	 * Users that do non-atomic setting of PTEs don't operate on mirror
-- 
2.53.0.rc1.217.geba53bf80e-goog
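For readers unfamiliar with why the SPTE's value alone doesn't suffice:
KVM recovers the owning kvm_mmu_page from the SPTE's address (sptep_to_sp()
masks the pointer down to a page boundary), so "mirror-ness" is a property
of where an entry lives, not of its bits.  A stand-alone sketch of that
derivation, with toy types in place of the kernel's structures:

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <stdlib.h>

	#define TABLE_BYTES 4096	/* one page-table page: 512 eight-byte entries */

	/* Toy page-table page; in KVM the metadata lives in struct kvm_mmu_page. */
	struct pt_page {
		_Alignas(TABLE_BYTES) uint64_t entries[TABLE_BYTES / 8];
		bool is_mirror;
	};

	/* Mask the entry's address down to the page boundary, a la sptep_to_sp(). */
	static struct pt_page *ptep_to_page(uint64_t *ptep)
	{
		return (struct pt_page *)((uintptr_t)ptep & ~(uintptr_t)(TABLE_BYTES - 1));
	}

	static bool is_mirror_ptep(uint64_t *ptep)
	{
		return ptep_to_page(ptep)->is_mirror;
	}

	int main(void)
	{
		struct pt_page *pt = aligned_alloc(TABLE_BYTES, sizeof(*pt));

		pt->is_mirror = true;
		/* The entry pointer alone is enough to classify the entry. */
		printf("mirror? %d\n", is_mirror_ptep(&pt->entries[42]));
		free(pt);
		return 0;
	}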
From nobody Sat Feb 7 17:41:40 2026
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:40 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Message-ID: <20260129011517.3545883-9-seanjc@google.com>
Subject: [RFC PATCH v5 08/45] KVM: x86/mmu: Propagate mirror SPTE removal to S-EPT in handle_changed_spte()
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Invoke .remove_external_spte() in handle_changed_spte() as appropriate
instead of relying on callers to do the right thing.  Relying on callers
to invoke .remove_external_spte() is confusing and brittle, e.g. subtly
relies on tdp_mmu_set_spte_atomic() never removing SPTEs, and removing an
S-EPT entry in tdp_mmu_set_spte() is bizarre (yeah, the VM is bugged so it
doesn't matter in practice, but it's still weird).

Implementing rules-based logic in a common chokepoint will also make it
easier to reason about the correctness of splitting hugepages when support
for S-EPT hugepages comes along.
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 43 +++++++++++++-------------------------
 1 file changed, 14 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 8743cd020d12..27ac520f2a89 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -359,25 +359,6 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 }
 
-static void remove_external_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
-				 int level)
-{
-	/*
-	 * External (TDX) SPTEs are limited to PG_LEVEL_4K, and external
-	 * PTs are removed in a special order, involving free_external_spt().
-	 * But remove_external_spte() will be called on non-leaf PTEs via
-	 * __tdp_mmu_zap_root(), so avoid the error the former would return
-	 * in this case.
-	 */
-	if (!is_last_spte(old_spte, level))
-		return;
-
-	/* Zapping leaf spte is allowed only when write lock is held. */
-	lockdep_assert_held_write(&kvm->mmu_lock);
-
-	kvm_x86_call(remove_external_spte)(kvm, gfn, level, old_spte);
-}
-
 /**
  * handle_removed_pt() - handle a page table removed from the TDP structure
  *
@@ -473,11 +454,6 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 		}
 		handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), sptep, gfn,
 				    old_spte, FROZEN_SPTE, level, shared);
-
-		if (is_mirror_sp(sp)) {
-			KVM_BUG_ON(shared, kvm);
-			remove_external_spte(kvm, gfn, old_spte, level);
-		}
 	}
 
 	if (is_mirror_sp(sp) &&
@@ -590,10 +566,21 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
 	 * the paging structure.  Note the WARN on the PFN changing without the
 	 * SPTE being converted to a hugepage (leaf) or being zapped.  Shadow
 	 * pages are kernel allocations and should never be migrated.
+	 *
+	 * When removing leaf entries from a mirror, immediately propagate the
+	 * changes to the external page tables.  Note, non-leaf mirror entries
+	 * are handled by handle_removed_pt(), as TDX requires that all leaf
+	 * entries are removed before the owning page table.  Note #2, writes
+	 * to make mirror PTEs shadow-present are propagated to external page
+	 * tables by __tdp_mmu_set_spte_atomic(), as KVM needs to ensure the
+	 * external page table was successfully updated before marking the
+	 * mirror SPTE present.
 	 */
 	if (was_present && !was_leaf &&
 	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
 		handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
+	else if (was_leaf && is_mirror_sptep(sptep) && !is_leaf)
+		kvm_x86_call(remove_external_spte)(kvm, gfn, level, old_spte);
 }
 
 static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
@@ -725,12 +712,10 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
 
 	/*
 	 * Users that do non-atomic setting of PTEs don't operate on mirror
-	 * roots, so don't handle it and bug the VM if it's seen.
+	 * roots.  Bug the VM as this path doesn't propagate such writes to the
+	 * external page tables.
	 */
-	if (is_mirror_sptep(sptep)) {
-		KVM_BUG_ON(is_shadow_present_pte(new_spte), kvm);
-		remove_external_spte(kvm, gfn, old_spte, level);
-	}
+	KVM_BUG_ON(is_mirror_sptep(sptep) && is_shadow_present_pte(new_spte), kvm);
 
 	return old_spte;
 }
-- 
2.53.0.rc1.217.geba53bf80e-goog
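The chokepoint logic above reduces to a small state machine over
(was_leaf, is_leaf, is_mirror).  A compact user-space rendering of those
rules, under the same caveats as the earlier sketches (toy helpers, not
kernel APIs):

	#include <stdbool.h>
	#include <stdio.h>

	/* Toy stand-ins; the kernel derives these from the SPTE value and pointer. */
	struct change {
		bool was_present, was_leaf;
		bool is_present, is_leaf;
		bool is_mirror;
	};

	static void handle_removed_table(void) { puts("recurse into child table"); }
	static void remove_external(void)      { puts("remove external (S-EPT) entry"); }

	/*
	 * Rules from the hunk above: a present non-leaf entry going away (or
	 * becoming a hugepage leaf) hands off to the page-table teardown path,
	 * which removes external entries bottom-up; a leaf being zapped is
	 * forwarded to the external page tables immediately.
	 */
	static void handle_changed(const struct change *c)
	{
		if (c->was_present && !c->was_leaf && (c->is_leaf || !c->is_present))
			handle_removed_table();
		else if (c->was_leaf && !c->is_leaf && c->is_mirror)
			remove_external();
	}

	int main(void)
	{
		struct change zap_leaf = { .was_present = true, .was_leaf = true,
					   .is_present = false, .is_mirror = true };
		handle_changed(&zap_leaf);
		return 0;
	}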
From nobody Sat Feb 7 17:41:40 2026
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:41 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Message-ID: <20260129011517.3545883-10-seanjc@google.com>
Subject: [RFC PATCH v5 09/45] KVM: x86: Rework .free_external_spt() into .reclaim_external_sp()
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Massage .free_external_spt() into .reclaim_external_sp() to free up (pun
intended) "free" for actually freeing memory, and to allow TDX to do more
than just "free" the S-EPT entry.  Specifically, nullify external_spt to
leak the S-EPT page if reclaiming the page fails, as that detail and
implementation choice have no business living in the TDP MMU.

Use "sp" instead of "spt" even though "spt" is arguably more accurate, as
"spte" and "spt" are dangerously close in name, and because the key
parameter is a kvm_mmu_page, not a pointer to an S-EPT page table.

Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 +-
 arch/x86/include/asm/kvm_host.h    |  4 ++--
 arch/x86/kvm/mmu/tdp_mmu.c         | 13 ++-----------
 arch/x86/kvm/vmx/tdx.c             | 27 ++++++++++++---------------
 4 files changed, 17 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 57eb1f4832ae..c17cedc485c9 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -95,8 +95,8 @@ KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
 KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
 KVM_X86_OP(load_mmu_pgd)
 KVM_X86_OP_OPTIONAL_RET0(set_external_spte)
-KVM_X86_OP_OPTIONAL_RET0(free_external_spt)
 KVM_X86_OP_OPTIONAL(remove_external_spte)
+KVM_X86_OP_OPTIONAL(reclaim_external_sp)
 KVM_X86_OP(has_wbinvd_exit)
 KVM_X86_OP(get_l2_tsc_offset)
 KVM_X86_OP(get_l2_tsc_multiplier)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d12ca0f8a348..b35a07ed11fb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1858,8 +1858,8 @@ struct kvm_x86_ops {
 				 u64 mirror_spte);
 
 	/* Update external page tables for page table about to be freed. */
-	int (*free_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
-				 void *external_spt);
+	void (*reclaim_external_sp)(struct kvm *kvm, gfn_t gfn,
+				    struct kvm_mmu_page *sp);
 
 	/* Update external page table from spte getting removed, and flush TLB.
	 */
	void (*remove_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 27ac520f2a89..18764dbc97ea 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -456,17 +456,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 				    old_spte, FROZEN_SPTE, level, shared);
 	}
 
-	if (is_mirror_sp(sp) &&
-	    WARN_ON(kvm_x86_call(free_external_spt)(kvm, base_gfn, sp->role.level,
-						    sp->external_spt))) {
-		/*
-		 * Failed to free page table page in mirror page table and
-		 * there is nothing to do further.
-		 * Intentionally leak the page to prevent the kernel from
-		 * accessing the encrypted page.
-		 */
-		sp->external_spt = NULL;
-	}
+	if (is_mirror_sp(sp))
+		kvm_x86_call(reclaim_external_sp)(kvm, base_gfn, sp);
 
 	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 30494f9ceb31..66bc3ceb5e17 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1783,27 +1783,24 @@ static void tdx_track(struct kvm *kvm)
 	kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE);
 }
 
-static int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
-				     enum pg_level level, void *private_spt)
+static void tdx_sept_reclaim_private_sp(struct kvm *kvm, gfn_t gfn,
+					struct kvm_mmu_page *sp)
 {
-	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
-
 	/*
-	 * free_external_spt() is only called after hkid is freed when TD is
-	 * tearing down.
 	 * KVM doesn't (yet) zap page table pages in mirror page table while
 	 * TD is active, though guest pages mapped in mirror page table could be
 	 * zapped during TD is active, e.g. for shared <-> private conversion
 	 * and slot move/deletion.
+	 *
+	 * In other words, KVM should only free mirror page tables after the
+	 * TD's hkid is freed, when the TD is being torn down.
+	 *
+	 * If the S-EPT PTE can't be removed for any reason, intentionally leak
+	 * the page to prevent the kernel from accessing the encrypted page.
 	 */
-	if (KVM_BUG_ON(is_hkid_assigned(kvm_tdx), kvm))
-		return -EIO;
-
-	/*
-	 * The HKID assigned to this TD was already freed and cache was
-	 * already flushed. We don't have to flush again.
-	 */
-	return tdx_reclaim_page(virt_to_page(private_spt));
+	if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) ||
+	    tdx_reclaim_page(virt_to_page(sp->external_spt)))
+		sp->external_spt = NULL;
 }
 
 static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
@@ -3617,7 +3614,7 @@ void __init tdx_hardware_setup(void)
 	vt_x86_ops.vm_size = max_t(unsigned int, vt_x86_ops.vm_size, sizeof(struct kvm_tdx));
 
 	vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
-	vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
+	vt_x86_ops.reclaim_external_sp = tdx_sept_reclaim_private_sp;
 	vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
 	vt_x86_ops.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt;
 }
-- 
2.53.0.rc1.217.geba53bf80e-goog
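The "leak on failure" choice above is a recurring pattern when memory may
still be encrypted with a now-stale key.  A sketch of the shape of it,
assuming a hypothetical reclaim_page() that can fail; this is not the
kernel's tdx_reclaim_page():

	#include <stdio.h>
	#include <stdlib.h>

	struct page_table {
		void *backing;	/* page that may still be inaccessible to the kernel */
	};

	/* Hypothetical stand-in for a reclaim operation; 0 on success. */
	static int reclaim_page(void *page)
	{
		(void)page;
		return 0;
	}

	/*
	 * If reclaim fails, deliberately drop the only reference instead of
	 * freeing it: a leaked page is a small, bounded loss, whereas freeing
	 * a page the hardware still treats as private risks corrupting its
	 * next user.
	 */
	static void reclaim_or_leak(struct page_table *pt)
	{
		if (reclaim_page(pt->backing)) {
			fprintf(stderr, "reclaim failed, leaking page %p\n", pt->backing);
			pt->backing = NULL;	/* intentional leak */
			return;
		}
		free(pt->backing);
		pt->backing = NULL;
	}

	int main(void)
	{
		struct page_table pt = { .backing = malloc(4096) };

		reclaim_or_leak(&pt);
		return 0;
	}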
From nobody Sat Feb 7 17:41:40 2026
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:42 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Message-ID: <20260129011517.3545883-11-seanjc@google.com>
Subject: [RFC PATCH v5 10/45] x86/tdx: Move all TDX error defines into <asm/shared/tdx_errno.h>
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

From: Kirill A. Shutemov

Today there are two separate locations where TDX error codes are defined:

  arch/x86/include/asm/tdx.h
  arch/x86/kvm/vmx/tdx_errno.h

They overlap, with some codes already defined similarly in both.  Reduce
the duplication, and prepare to introduce helpers for these error codes
in a central place, by unifying them.  Join them at:

  asm/shared/tdx_errno.h

...and update the headers that contained the duplicated definitions to
include the new unified header.

"asm/shared" is used for sharing TDX code between the early compressed
code and the normal kernel code.  While the compressed code for the guest
doesn't use these error code definitions today, it does make the types of
calls that return the values they define.  So place the defines in the
"shared" location so that it can, but leave such cleanups for future
changes.

Opportunistically massage some comments.  Also, adjust _BITUL()->_BITULL()
to address 32-bit build errors after the move.

Signed-off-by: Kirill A. Shutemov
[enhance log]
Signed-off-by: Rick Edgecombe
Tested-by: Sagi Shahar
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/shared/tdx.h             |  1 +
 .../vmx => include/asm/shared}/tdx_errno.h    | 27 +++++++++++++++----
 arch/x86/include/asm/tdx.h                    | 20 --------------
 arch/x86/kvm/vmx/tdx.h                        |  1 -
 4 files changed, 23 insertions(+), 26 deletions(-)
 rename arch/x86/{kvm/vmx => include/asm/shared}/tdx_errno.h (65%)

diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 8bc074c8d7c6..6a1646fc2b2f 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -4,6 +4,7 @@
 
 #include
 #include
+#include
 
 #define TDX_HYPERCALL_STANDARD 0
 
diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/include/asm/shared/tdx_errno.h
similarity index 65%
rename from arch/x86/kvm/vmx/tdx_errno.h
rename to arch/x86/include/asm/shared/tdx_errno.h
index 6ff4672c4181..3aa74f6a6119 100644
--- a/arch/x86/kvm/vmx/tdx_errno.h
+++ b/arch/x86/include/asm/shared/tdx_errno.h
@@ -1,14 +1,16 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-/* architectural status code for SEAMCALL */
+#ifndef _X86_SHARED_TDX_ERRNO_H
+#define _X86_SHARED_TDX_ERRNO_H
 
-#ifndef __KVM_X86_TDX_ERRNO_H
-#define __KVM_X86_TDX_ERRNO_H
+#include
 
+/* Upper 32 bits of the TDX error code encode the status */
 #define TDX_SEAMCALL_STATUS_MASK		0xFFFFFFFF00000000ULL
 
 /*
- * TDX SEAMCALL Status Codes (returned in RAX)
+ * TDX SEAMCALL Status Codes
  */
+#define TDX_SUCCESS				0ULL
 #define TDX_NON_RECOVERABLE_VCPU		0x4000000100000000ULL
 #define TDX_NON_RECOVERABLE_TD			0x4000000200000000ULL
 #define TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE	0x6000000500000000ULL
@@ -17,6 +19,7 @@
 #define TDX_OPERAND_INVALID			0xC000010000000000ULL
 #define TDX_OPERAND_BUSY			0x8000020000000000ULL
 #define TDX_PREVIOUS_TLB_EPOCH_BUSY		0x8000020100000000ULL
+#define TDX_RND_NO_ENTROPY			0x8000020300000000ULL
 #define TDX_PAGE_METADATA_INCORRECT		0xC000030000000000ULL
 #define TDX_VCPU_NOT_ASSOCIATED			0x8000070200000000ULL
 #define TDX_KEY_GENERATION_FAILED		0x8000080000000000ULL
@@ -28,6 +31,20 @@
 #define TDX_EPT_ENTRY_STATE_INCORRECT		0xC0000B0D00000000ULL
 #define TDX_METADATA_FIELD_NOT_READABLE		0xC0000C0200000000ULL
 
+/*
+ * SW-defined error codes.
+ *
+ * Bits 47:40 == 0xFF indicate Reserved status code class that is never
+ * used by the TDX module.
+ */
+#define TDX_ERROR			_BITULL(63)
+#define TDX_NON_RECOVERABLE		_BITULL(62)
+#define TDX_SW_ERROR			(TDX_ERROR | GENMASK_ULL(47, 40))
+#define TDX_SEAMCALL_VMFAILINVALID	(TDX_SW_ERROR | _ULL(0xFFFF0000))
+
+#define TDX_SEAMCALL_GP			(TDX_SW_ERROR | X86_TRAP_GP)
+#define TDX_SEAMCALL_UD			(TDX_SW_ERROR | X86_TRAP_UD)
+
 /*
  * TDX module operand ID, appears in 31:0 part of error code as
  * detail information
@@ -37,4 +54,4 @@
 #define TDX_OPERAND_ID_SEPT		0x92
 #define TDX_OPERAND_ID_TD_EPOCH	0xa9
 
-#endif /* __KVM_X86_TDX_ERRNO_H */
+#endif /* _X86_SHARED_TDX_ERRNO_H */
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index bc0d03e70fd6..c3c574511094 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -12,26 +12,6 @@
 #include
 #include
 
-/*
- * SW-defined error codes.
- *
- * Bits 47:40 == 0xFF indicate Reserved status code class that is never
- * used by the TDX module.
- */
-#define TDX_ERROR			_BITUL(63)
-#define TDX_NON_RECOVERABLE		_BITUL(62)
-#define TDX_SW_ERROR			(TDX_ERROR | GENMASK_ULL(47, 40))
-#define TDX_SEAMCALL_VMFAILINVALID	(TDX_SW_ERROR | _UL(0xFFFF0000))
-
-#define TDX_SEAMCALL_GP			(TDX_SW_ERROR | X86_TRAP_GP)
-#define TDX_SEAMCALL_UD			(TDX_SW_ERROR | X86_TRAP_UD)
-
-/*
- * TDX module SEAMCALL leaf function error codes
- */
-#define TDX_SUCCESS		0ULL
-#define TDX_RND_NO_ENTROPY	0x8000020300000000ULL
-
 #ifndef __ASSEMBLER__
 
 #include
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 45b5183ccb36..ce2720a028ad 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -3,7 +3,6 @@
 #define __KVM_X86_VMX_TDX_H
 
 #include "tdx_arch.h"
-#include "tdx_errno.h"
 
 #ifdef CONFIG_KVM_INTEL_TDX
 #include "common.h"
-- 
2.53.0.rc1.217.geba53bf80e-goog
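Why the _BITUL() -> _BITULL() adjustment matters: _BITUL() expands with a
UL suffix, which is only 32 bits wide on 32-bit targets, so shifting by 62
or 63 is undefined there and the constant cannot hold the high bits.
_BITULL() uses unsigned long long, which is at least 64 bits everywhere.
A quick stand-alone illustration, with plain macros standing in for the
kernel's uapi helpers:

	#include <stdio.h>

	/* Simplified stand-ins for _BITUL()/_BITULL() from uapi const.h. */
	#define MY_BITUL(x)  (1UL << (x))	/* 32 bits wide on ILP32 targets */
	#define MY_BITULL(x) (1ULL << (x))	/* at least 64 bits everywhere */

	int main(void)
	{
		/*
		 * On a 32-bit build, MY_BITUL(63) would shift a 32-bit value
		 * by 63: undefined behavior and a build warning.  The ULL
		 * variant is well-defined on every target.
		 */
		printf("sizeof(1UL)=%zu sizeof(1ULL)=%zu\n", sizeof(1UL), sizeof(1ULL));
		printf("bit 63: %#llx\n", (unsigned long long)MY_BITULL(63));
		return 0;
	}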
From nobody Sat Feb 7 17:41:40 2026
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:43 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Message-ID: <20260129011517.3545883-12-seanjc@google.com>
Subject: [RFC PATCH v5 11/45] x86/tdx: Add helpers to check return status codes
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

From: Kirill A. Shutemov

The TDX error code has a complex structure.  The upper 32 bits encode the
status code (higher level information), while the lower 32 bits provide
clues about the error, such as operand ID, CPUID leaf, MSR index, etc.

In practice, the kernel logic cares mostly about the status code, whereas
the error details are more often dumped to warnings to be used as
debugging breadcrumbs.  This results in a lot of code that masks off the
status code and then checks the resulting value.  Future code to support
Dynamic PAMT will add yet more SEAMCALL error code checking.  To prepare
for this, do some cleanup to reduce the boilerplate error code parsing.

Since the lower bits that contain details are needed both for error
printing and for a few cases where the logical code flow does depend on
them, don't reduce the boilerplate by masking the detail bits inside the
SEAMCALL wrappers and returning only the status code.  Instead, create
some helpers to perform the needed masking and comparisons.  For the
status code based checks, create a macro for generating the helpers based
on the name.  Name the helpers IS_TDX_FOO(), based on the discussion in
the Link.

Many of the checks that consult the error details are only done in a
single place.  It could be argued that there are no code savings from
adding helpers for those checks.  Add helpers for them anyway so that the
checks look consistent when used alongside checks that appear in multiple
places (e.g. sc_retry_prerr()).

Finally, update the code that previously open coded the bit math to use
the helpers.
Link: https://lore.kernel.org/kvm/aJNycTvk1GEWgK_Q@google.com/
Signed-off-by: Kirill A. Shutemov
[Enhance log]
Signed-off-by: Rick Edgecombe
Tested-by: Sagi Shahar
Signed-off-by: Sean Christopherson
---
 arch/x86/coco/tdx/tdx.c                 | 10 +++---
 arch/x86/include/asm/shared/tdx_errno.h | 47 ++++++++++++++++++++++++-
 arch/x86/include/asm/tdx.h              |  2 +-
 arch/x86/kvm/vmx/tdx.c                  | 40 +++++++++------------
 arch/x86/virt/vmx/tdx/tdx.c             |  8 ++---
 5 files changed, 73 insertions(+), 34 deletions(-)

diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 7b2833705d47..167c5b273c40 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -129,9 +129,9 @@ int tdx_mcall_get_report0(u8 *reportdata, u8 *tdreport)
 
 	ret = __tdcall(TDG_MR_REPORT, &args);
 	if (ret) {
-		if (TDCALL_RETURN_CODE(ret) == TDCALL_INVALID_OPERAND)
+		if (IS_TDX_OPERAND_INVALID(ret))
 			return -ENXIO;
-		else if (TDCALL_RETURN_CODE(ret) == TDCALL_OPERAND_BUSY)
+		else if (IS_TDX_OPERAND_BUSY(ret))
 			return -EBUSY;
 		return -EIO;
 	}
@@ -165,9 +165,9 @@ int tdx_mcall_extend_rtmr(u8 index, u8 *data)
 
 	ret = __tdcall(TDG_MR_RTMR_EXTEND, &args);
 	if (ret) {
-		if (TDCALL_RETURN_CODE(ret) == TDCALL_INVALID_OPERAND)
+		if (IS_TDX_OPERAND_INVALID(ret))
 			return -ENXIO;
-		if (TDCALL_RETURN_CODE(ret) == TDCALL_OPERAND_BUSY)
+		if (IS_TDX_OPERAND_BUSY(ret))
 			return -EBUSY;
 		return -EIO;
 	}
@@ -316,7 +316,7 @@ static void reduce_unnecessary_ve(void)
 {
 	u64 err = tdg_vm_wr(TDCS_TD_CTLS, TD_CTLS_REDUCE_VE, TD_CTLS_REDUCE_VE);
 
-	if (err == TDX_SUCCESS)
+	if (IS_TDX_SUCCESS(err))
 		return;
 
 	/*
diff --git a/arch/x86/include/asm/shared/tdx_errno.h b/arch/x86/include/asm/shared/tdx_errno.h
index 3aa74f6a6119..e302aed31b50 100644
--- a/arch/x86/include/asm/shared/tdx_errno.h
+++ b/arch/x86/include/asm/shared/tdx_errno.h
@@ -5,7 +5,7 @@
 #include
 
 /* Upper 32 bits of the TDX error code encode the status */
-#define TDX_SEAMCALL_STATUS_MASK	0xFFFFFFFF00000000ULL
+#define TDX_STATUS_MASK			0xFFFFFFFF00000000ULL
 
 /*
  * TDX SEAMCALL Status Codes
@@ -54,4 +54,49 @@
 #define TDX_OPERAND_ID_SEPT		0x92
 #define TDX_OPERAND_ID_TD_EPOCH	0xa9
 
+#ifndef __ASSEMBLER__
+#include
+#include
+
+static inline u64 TDX_STATUS(u64 err)
+{
+	return err & TDX_STATUS_MASK;
+}
+
+static inline bool IS_TDX_NON_RECOVERABLE(u64 err)
+{
+	return (err & TDX_NON_RECOVERABLE) == TDX_NON_RECOVERABLE;
+}
+
+static inline bool IS_TDX_SEAMCALL_VMFAILINVALID(u64 err)
+{
+	return (err & TDX_SEAMCALL_VMFAILINVALID) ==
+	       TDX_SEAMCALL_VMFAILINVALID;
+}
+
+static inline bool IS_TDX_SEAMCALL_GP(u64 err)
+{
+	return err == TDX_SEAMCALL_GP;
+}
+
+static inline bool IS_TDX_SEAMCALL_UD(u64 err)
+{
+	return err == TDX_SEAMCALL_UD;
+}
+
+#define DEFINE_TDX_ERRNO_HELPER(error)			\
+	static inline bool IS_##error(u64 err)		\
+	{						\
+		return TDX_STATUS(err) == error;	\
+	}
+
+DEFINE_TDX_ERRNO_HELPER(TDX_SUCCESS);
+DEFINE_TDX_ERRNO_HELPER(TDX_RND_NO_ENTROPY);
+DEFINE_TDX_ERRNO_HELPER(TDX_OPERAND_INVALID);
+DEFINE_TDX_ERRNO_HELPER(TDX_OPERAND_BUSY);
+DEFINE_TDX_ERRNO_HELPER(TDX_VCPU_NOT_ASSOCIATED);
+DEFINE_TDX_ERRNO_HELPER(TDX_FLUSHVP_NOT_DONE);
+DEFINE_TDX_ERRNO_HELPER(TDX_SW_ERROR);
+
+#endif /* __ASSEMBLER__ */
 #endif /* _X86_SHARED_TDX_ERRNO_H */
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index c3c574511094..441a26988d3b 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -117,7 +117,7 @@ static __always_inline u64 sc_retry(sc_func_t func, u64 fn,
 		preempt_disable();
		ret = __seamcall_dirty_cache(func, fn, args);
		preempt_enable();
-	} while (ret == TDX_RND_NO_ENTROPY && --retry);
+	} while (IS_TDX_RND_NO_ENTROPY(ret) && --retry);
 
 	return ret;
 }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 66bc3ceb5e17..4ef414ee27b4 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -220,12 +220,6 @@ static DEFINE_MUTEX(tdx_lock);
 
 static atomic_t nr_configured_hkid;
 
-static bool tdx_operand_busy(u64 err)
-{
-	return (err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_BUSY;
-}
-
-
 /*
  * A per-CPU list of TD vCPUs associated with a given CPU.
  * Protected by interrupt mask. Only manipulated by the CPU owning this per-CPU
@@ -312,7 +306,7 @@ static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
 	lockdep_assert_held_write(&kvm->mmu_lock);			\
 									\
 	__err = tdh_func(args);						\
-	if (unlikely(tdx_operand_busy(__err))) {			\
+	if (unlikely(IS_TDX_OPERAND_BUSY(__err))) {			\
 		WRITE_ONCE(__kvm_tdx->wait_for_sept_zap, true);		\
 		kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); \
 									\
@@ -400,7 +394,7 @@ static void tdx_flush_vp(void *_arg)
 	 * migration.  No other thread uses TDVPR in those cases.
 	 */
 	err = tdh_vp_flush(&to_tdx(vcpu)->vp);
-	if (unlikely(err && err != TDX_VCPU_NOT_ASSOCIATED)) {
+	if (unlikely(err && !IS_TDX_VCPU_NOT_ASSOCIATED(err))) {
 		/*
 		 * This function is called in IPI context. Do not use
 		 * printk to avoid console semaphore.
@@ -467,7 +461,7 @@ static void smp_func_do_phymem_cache_wb(void *unused)
 	/*
 	 * TDH.PHYMEM.CACHE.WB flushes caches associated with any TDX private
 	 * KeyID on the package or core.  The TDX module may not finish the
-	 * cache flush but return TDX_INTERRUPTED_RESUMEABLE instead.  The
+	 * kernel should retry it until it returns success w/o rescheduling.
 	 */
 	for (i = TDX_SEAMCALL_RETRIES; i > 0; i--) {
@@ -522,7 +516,7 @@ void tdx_mmu_release_hkid(struct kvm *kvm)
 	 * associations, as all vCPU fds have been released at this stage.
 	 */
 	err = tdh_mng_vpflushdone(&kvm_tdx->td);
-	if (err == TDX_FLUSHVP_NOT_DONE)
+	if (IS_TDX_FLUSHVP_NOT_DONE(err))
 		goto out;
 	if (TDX_BUG_ON(err, TDH_MNG_VPFLUSHDONE, kvm)) {
 		pr_err("tdh_mng_vpflushdone() failed. HKID %d is leaked.\n",
@@ -937,7 +931,7 @@ static __always_inline u32 tdx_to_vmx_exit_reason(struct kvm_vcpu *vcpu)
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
 	u32 exit_reason;
 
-	switch (tdx->vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) {
+	switch (TDX_STATUS(tdx->vp_enter_ret)) {
 	case TDX_SUCCESS:
 	case TDX_NON_RECOVERABLE_VCPU:
 	case TDX_NON_RECOVERABLE_TD:
@@ -1011,7 +1005,7 @@ static fastpath_t tdx_exit_handlers_fastpath(struct kvm_vcpu *vcpu)
 	 * EXIT_FASTPATH_REENTER_GUEST to exit fastpath, otherwise, the
 	 * requester may be blocked endlessly.
 	 */
-	if (unlikely(tdx_operand_busy(vp_enter_ret)))
+	if (unlikely(IS_TDX_OPERAND_BUSY(vp_enter_ret)))
 		return EXIT_FASTPATH_EXIT_HANDLED;
 
 	return EXIT_FASTPATH_NONE;
@@ -1107,7 +1101,7 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 	if (unlikely(tdx->vp_enter_ret == EXIT_REASON_EPT_MISCONFIG))
 		return EXIT_FASTPATH_NONE;
 
-	if (unlikely((tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR))
+	if (unlikely(IS_TDX_SW_ERROR(tdx->vp_enter_ret)))
 		return EXIT_FASTPATH_NONE;
 
 	trace_kvm_exit(vcpu, KVM_ISA_VMX);
@@ -1636,7 +1630,7 @@ static int tdx_mem_page_add(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 
 	err = tdh_mem_page_add(&kvm_tdx->td, gpa, pfn_to_page(pfn),
 			       kvm_tdx->page_add_src, &entry, &level_state);
-	if (unlikely(tdx_operand_busy(err)))
+	if (unlikely(IS_TDX_OPERAND_BUSY(err)))
 		return -EBUSY;
 
 	if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_ADD, entry, level_state, kvm))
@@ -1655,7 +1649,7 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
 	u64 err;
 
 	err = tdh_mem_page_aug(&kvm_tdx->td, gpa, level, page, &entry, &level_state);
-	if (unlikely(tdx_operand_busy(err)))
+	if (unlikely(IS_TDX_OPERAND_BUSY(err)))
 		return -EBUSY;
 
 	if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_AUG, entry, level_state, kvm))
@@ -1690,7 +1684,7 @@ static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
 
 	err = tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, level, external_spt,
 			       &entry, &level_state);
-	if (unlikely(tdx_operand_busy(err)))
+	if (unlikely(IS_TDX_OPERAND_BUSY(err)))
 		return -EBUSY;
 
 	if (TDX_BUG_ON_2(err, TDH_MEM_SEPT_ADD, entry, level_state, kvm))
@@ -2011,7 +2005,7 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 	 * Handle TDX SW errors, including TDX_SEAMCALL_UD, TDX_SEAMCALL_GP and
 	 * TDX_SEAMCALL_VMFAILINVALID.
 	 */
-	if (unlikely((vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR)) {
+	if (unlikely(IS_TDX_SW_ERROR(vp_enter_ret))) {
 		KVM_BUG_ON(!kvm_rebooting, vcpu->kvm);
 		goto unhandled_exit;
 	}
@@ -2022,7 +2016,7 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 	 * not enabled, TDX_NON_RECOVERABLE must be set.
 	 */
 	WARN_ON_ONCE(vcpu->arch.guest_state_protected &&
-		     !(vp_enter_ret & TDX_NON_RECOVERABLE));
+		     !IS_TDX_NON_RECOVERABLE(vp_enter_ret));
 	vcpu->run->exit_reason = KVM_EXIT_FAIL_ENTRY;
 	vcpu->run->fail_entry.hardware_entry_failure_reason = exit_reason.full;
 	vcpu->run->fail_entry.cpu = vcpu->arch.last_vmentry_cpu;
@@ -2036,7 +2030,7 @@ int tdx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t fastpath)
 	}
 
 	WARN_ON_ONCE(exit_reason.basic != EXIT_REASON_TRIPLE_FAULT &&
-		     (vp_enter_ret & TDX_SEAMCALL_STATUS_MASK) != TDX_SUCCESS);
+		     !IS_TDX_SUCCESS(vp_enter_ret));
 
 	switch (exit_reason.basic) {
 	case EXIT_REASON_TRIPLE_FAULT:
@@ -2470,7 +2464,7 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
 	err = tdh_mng_create(&kvm_tdx->td, kvm_tdx->hkid);
 	mutex_unlock(&tdx_lock);
 
-	if (err == TDX_RND_NO_ENTROPY) {
+	if (IS_TDX_RND_NO_ENTROPY(err)) {
 		ret = -EAGAIN;
 		goto free_packages;
 	}
@@ -2511,7 +2505,7 @@ static int __tdx_td_init(struct kvm *kvm, struct td_params *td_params,
 	kvm_tdx->td.tdcs_pages = tdcs_pages;
 	for (i = 0; i < kvm_tdx->td.tdcs_nr_pages; i++) {
 		err = tdh_mng_addcx(&kvm_tdx->td, tdcs_pages[i]);
-		if (err == TDX_RND_NO_ENTROPY) {
+		if (IS_TDX_RND_NO_ENTROPY(err)) {
 			/* Here it's hard to allow userspace to retry. */
			ret = -EAGAIN;
			goto teardown;
		}
	}
 
 	err = tdh_mng_init(&kvm_tdx->td, __pa(td_params), &rcx);
-	if ((err & TDX_SEAMCALL_STATUS_MASK) == TDX_OPERAND_INVALID) {
+	if (IS_TDX_OPERAND_INVALID(err)) {
 		/*
 		 * Because a user gives operands, don't warn.
 		 * Return a hint to the user because it's sometimes hard for the
@@ -2837,7 +2831,7 @@ static int tdx_td_finalize(struct kvm *kvm, struct kvm_tdx_cmd *cmd)
 		return -EINVAL;
 
 	cmd->hw_error = tdh_mr_finalize(&kvm_tdx->td);
-	if (tdx_operand_busy(cmd->hw_error))
+	if (IS_TDX_OPERAND_BUSY(cmd->hw_error))
 		return -EBUSY;
 	if (TDX_BUG_ON(cmd->hw_error, TDH_MR_FINALIZE, kvm))
 		return -EIO;
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 22c0f832cb37..783bf704f2cd 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -82,16 +82,16 @@ static __always_inline int sc_retry_prerr(sc_func_t func,
 {
 	u64 sret = sc_retry(func, fn, args);
 
-	if (sret == TDX_SUCCESS)
+	if (IS_TDX_SUCCESS(sret))
 		return 0;
 
-	if (sret == TDX_SEAMCALL_VMFAILINVALID)
+	if (IS_TDX_SEAMCALL_VMFAILINVALID(sret))
 		return -ENODEV;
 
-	if (sret == TDX_SEAMCALL_GP)
+	if (IS_TDX_SEAMCALL_GP(sret))
 		return -EOPNOTSUPP;
 
-	if (sret == TDX_SEAMCALL_UD)
+	if (IS_TDX_SEAMCALL_UD(sret))
 		return -EACCES;
 
 	err_func(fn, sret, args);
-- 
2.53.0.rc1.217.geba53bf80e-goog
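The helper-generation trick is plain token pasting.  Here is a compilable
user-space reduction of the same scheme, with the constant list abbreviated
to two entries (values as in the header above):

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>

	#define TDX_STATUS_MASK   0xFFFFFFFF00000000ULL
	#define TDX_SUCCESS       0ULL
	#define TDX_OPERAND_BUSY  0x8000020000000000ULL

	static inline uint64_t TDX_STATUS(uint64_t err)
	{
		return err & TDX_STATUS_MASK;
	}

	/* Generate IS_<name>(err) predicates that compare only the status half. */
	#define DEFINE_TDX_ERRNO_HELPER(error)			\
		static inline bool IS_##error(uint64_t err)	\
		{						\
			return TDX_STATUS(err) == error;	\
		}

	DEFINE_TDX_ERRNO_HELPER(TDX_SUCCESS)
	DEFINE_TDX_ERRNO_HELPER(TDX_OPERAND_BUSY)

	int main(void)
	{
		/* The low 32 bits carry details (operand ID etc.) and are ignored. */
		uint64_t err = TDX_OPERAND_BUSY | 0x92;	/* 0x92 = SEPT operand ID */

		printf("busy=%d success=%d\n", IS_TDX_OPERAND_BUSY(err),
		       IS_TDX_SUCCESS(err));
		return 0;
	}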
From nobody Sat Feb 7 17:41:40 2026
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:44 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Message-ID: <20260129011517.3545883-13-seanjc@google.com>
Subject: [RFC PATCH v5 12/45] x86/virt/tdx: Simplify tdmr_get_pamt_sz()
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

From: Rick Edgecombe

For each memory region that the TDX module might use (TDMR), three
separate PAMT allocations are needed, one for each supported page size
(1GB, 2MB, 4KB).  These store information on each page in the TDMR.  In
Linux, they are allocated out of one physically contiguous block, in order
to more efficiently use some internal TDX module bookkeeping resources.
So some simple math is needed to break the single large allocation into
three smaller allocations, one per page size.

There are some commonalities in the math needed to calculate the base and
size for each smaller allocation, and so an effort was made to share logic
across the three.  Unfortunately doing this turned out naturally tortured,
with a loop iterating over the three page sizes, only to call into a
function with a case statement for each page size.  In the future, Dynamic
PAMT will add more logic that is special to the 4KB page size, making the
benefit of the math sharing even more questionable.
Three is not a very high number, so get rid of the loop and just duplicate
the small calculation three times.  In doing so, set up for future Dynamic
PAMT changes and drop a net 33 lines of code.

Since the loop that iterated over the page sizes is gone, further simplify
the code by dropping the array of intermediate size and base storage.
Just store the values to their final locations.  Accept the small
complication of having to clear tdmr->pamt_4k_base in the error path, so
that tdmr_do_pamt_func() will not try to operate on the TDMR struct when
attempting to free it.

Signed-off-by: Rick Edgecombe
Reviewed-by: Binbin Wu
Tested-by: Sagi Shahar
Signed-off-by: Sean Christopherson
---
 arch/x86/virt/vmx/tdx/tdx.c | 93 ++++++++++++-------------------------
 1 file changed, 29 insertions(+), 64 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 783bf704f2cd..0c4c873bff80 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -445,31 +445,21 @@ static int fill_out_tdmrs(struct list_head *tmb_list,
  * Calculate PAMT size given a TDMR and a page size. The returned
  * PAMT size is always aligned up to 4K page boundary.
  */
-static unsigned long tdmr_get_pamt_sz(struct tdmr_info *tdmr, int pgsz,
-				      u16 pamt_entry_size)
+static unsigned long tdmr_get_pamt_sz(struct tdmr_info *tdmr, int pgsz)
 {
 	unsigned long pamt_sz, nr_pamt_entries;
+	const int tdx_pg_size_shift[] = { PAGE_SHIFT, PMD_SHIFT, PUD_SHIFT };
+	const u16 pamt_entry_size[TDX_PS_NR] = {
+		tdx_sysinfo.tdmr.pamt_4k_entry_size,
+		tdx_sysinfo.tdmr.pamt_2m_entry_size,
+		tdx_sysinfo.tdmr.pamt_1g_entry_size,
+	};
 
-	switch (pgsz) {
-	case TDX_PS_4K:
-		nr_pamt_entries = tdmr->size >> PAGE_SHIFT;
-		break;
-	case TDX_PS_2M:
-		nr_pamt_entries = tdmr->size >> PMD_SHIFT;
-		break;
-	case TDX_PS_1G:
-		nr_pamt_entries = tdmr->size >> PUD_SHIFT;
-		break;
-	default:
-		WARN_ON_ONCE(1);
-		return 0;
-	}
+	nr_pamt_entries = tdmr->size >> tdx_pg_size_shift[pgsz];
+	pamt_sz = nr_pamt_entries * pamt_entry_size[pgsz];
 
-	pamt_sz = nr_pamt_entries * pamt_entry_size;
 	/* TDX requires PAMT size must be 4K aligned */
-	pamt_sz = ALIGN(pamt_sz, PAGE_SIZE);
-
-	return pamt_sz;
+	return PAGE_ALIGN(pamt_sz);
 }
 
 /*
@@ -507,28 +497,21 @@ static int tdmr_get_nid(struct tdmr_info *tdmr, struct list_head *tmb_list)
 * within @tdmr, and set up PAMTs for @tdmr.
 */
 static int tdmr_set_up_pamt(struct tdmr_info *tdmr,
-			    struct list_head *tmb_list,
-			    u16 pamt_entry_size[])
+			    struct list_head *tmb_list)
 {
-	unsigned long pamt_base[TDX_PS_NR];
-	unsigned long pamt_size[TDX_PS_NR];
-	unsigned long tdmr_pamt_base;
 	unsigned long tdmr_pamt_size;
 	struct page *pamt;
-	int pgsz, nid;
-
+	int nid;
 	nid = tdmr_get_nid(tdmr, tmb_list);
 
 	/*
 	 * Calculate the PAMT size for each TDX supported page size
 	 * and the total PAMT size.
 	 */
-	tdmr_pamt_size = 0;
-	for (pgsz = TDX_PS_4K; pgsz < TDX_PS_NR; pgsz++) {
-		pamt_size[pgsz] = tdmr_get_pamt_sz(tdmr, pgsz,
-						   pamt_entry_size[pgsz]);
-		tdmr_pamt_size += pamt_size[pgsz];
-	}
+	tdmr->pamt_4k_size = tdmr_get_pamt_sz(tdmr, TDX_PS_4K);
+	tdmr->pamt_2m_size = tdmr_get_pamt_sz(tdmr, TDX_PS_2M);
+	tdmr->pamt_1g_size = tdmr_get_pamt_sz(tdmr, TDX_PS_1G);
+	tdmr_pamt_size = tdmr->pamt_4k_size + tdmr->pamt_2m_size + tdmr->pamt_1g_size;
 
 	/*
 	 * Allocate one chunk of physically contiguous memory for all
 	 * PAMTs. This helps minimize the PAMT's use of reserved areas
 	 * in overlapped TDMRs.
*/ pamt =3D alloc_contig_pages(tdmr_pamt_size >> PAGE_SHIFT, GFP_KERNEL, - nid, &node_online_map); - if (!pamt) + nid, &node_online_map); + if (!pamt) { + /* + * tdmr->pamt_4k_base is zero so the + * error path will skip freeing. + */ return -ENOMEM; - - /* - * Break the contiguous allocation back up into the - * individual PAMTs for each page size. - */ - tdmr_pamt_base =3D page_to_pfn(pamt) << PAGE_SHIFT; - for (pgsz =3D TDX_PS_4K; pgsz < TDX_PS_NR; pgsz++) { - pamt_base[pgsz] =3D tdmr_pamt_base; - tdmr_pamt_base +=3D pamt_size[pgsz]; } =20 - tdmr->pamt_4k_base =3D pamt_base[TDX_PS_4K]; - tdmr->pamt_4k_size =3D pamt_size[TDX_PS_4K]; - tdmr->pamt_2m_base =3D pamt_base[TDX_PS_2M]; - tdmr->pamt_2m_size =3D pamt_size[TDX_PS_2M]; - tdmr->pamt_1g_base =3D pamt_base[TDX_PS_1G]; - tdmr->pamt_1g_size =3D pamt_size[TDX_PS_1G]; + tdmr->pamt_4k_base =3D page_to_phys(pamt); + tdmr->pamt_2m_base =3D tdmr->pamt_4k_base + tdmr->pamt_4k_size; + tdmr->pamt_1g_base =3D tdmr->pamt_2m_base + tdmr->pamt_2m_size; =20 return 0; } @@ -586,10 +561,7 @@ static void tdmr_do_pamt_func(struct tdmr_info *tdmr, tdmr_get_pamt(tdmr, &pamt_base, &pamt_size); =20 /* Do nothing if PAMT hasn't been allocated for this TDMR */ - if (!pamt_size) - return; - - if (WARN_ON_ONCE(!pamt_base)) + if (!pamt_base) return; =20 pamt_func(pamt_base, pamt_size); @@ -615,14 +587,12 @@ static void tdmrs_free_pamt_all(struct tdmr_info_list= *tdmr_list) =20 /* Allocate and set up PAMTs for all TDMRs */ static int tdmrs_set_up_pamt_all(struct tdmr_info_list *tdmr_list, - struct list_head *tmb_list, - u16 pamt_entry_size[]) + struct list_head *tmb_list) { int i, ret =3D 0; =20 for (i =3D 0; i < tdmr_list->nr_consumed_tdmrs; i++) { - ret =3D tdmr_set_up_pamt(tdmr_entry(tdmr_list, i), tmb_list, - pamt_entry_size); + ret =3D tdmr_set_up_pamt(tdmr_entry(tdmr_list, i), tmb_list); if (ret) goto err; } @@ -903,18 +873,13 @@ static int construct_tdmrs(struct list_head *tmb_list, struct tdmr_info_list *tdmr_list, struct tdx_sys_info_tdmr *sysinfo_tdmr) { - u16 pamt_entry_size[TDX_PS_NR] =3D { - sysinfo_tdmr->pamt_4k_entry_size, - sysinfo_tdmr->pamt_2m_entry_size, - sysinfo_tdmr->pamt_1g_entry_size, - }; int ret; =20 ret =3D fill_out_tdmrs(tmb_list, tdmr_list); if (ret) return ret; =20 - ret =3D tdmrs_set_up_pamt_all(tdmr_list, tmb_list, pamt_entry_size); + ret =3D tdmrs_set_up_pamt_all(tdmr_list, tmb_list); if (ret) return ret; =20 --=20 2.53.0.rc1.217.geba53bf80e-goog From nobody Sat Feb 7 17:41:40 2026 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7DDF431C57B for ; Thu, 29 Jan 2026 01:15:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649358; cv=none; b=pxrnD8pT9gktDclfyELuTRIAsYPpnUT0S4XALiQT//zjwKG16/YY4einmT9Mv0TdQ2uFHx2YadxsWfGTqcPurrau3FBzCPPparC0HryygFEJWyrBL3PDHiHGNjtC4XY22h0mcHAYkU/7+vz80eynRVhXXi7dVPVh17JmQ4Mc0i0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649358; c=relaxed/simple; bh=oX2MlvAYx4zmch8f356wAzFlhGp0Ab7gNL2eHuy2Vxo=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; 
b=DeLzWud8A5etIseEhX4OX1crz0uz+W4RZdFzf5vYbysDc3ueRCVMM6wTyV2cpIO1T0pM5M3sdvtZbGFW7uEw1e20zac+XzQyVEi7lPJhA7aGc6frBx0RqHJXwOTHez7+yOsU4V3jb2Ep9Jh8f5f+PPHEoco2K6yswy/PlbUTrGw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=EfcR20IN; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="EfcR20IN" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-35301003062so1067090a91.2 for ; Wed, 28 Jan 2026 17:15:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1769649353; x=1770254153; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=oWQPx27FCugyzao24/tqQ+m62u9m5MfQopgQbQNuj2c=; b=EfcR20INHGl6b5V3cdxngj/R6SNDzNA/VG+pBDsMrlYdEawTii/UPokRW5d4SixfgR RusxoZ/K/fZSnkg2LslZrrsqBkFTCXImRRDYKIOYTw71mKFj7+D+pHVWVBfdc2RBi3jO igQzmcO50Mf9GCFDXEl433ICdhpZ5oD8mUztTSAOyhEnABGw4kmbS0Rf7P061uS9zelL /hmcDOkfHBNAUprXoQxZRx9+zHVAYyR4+fvvE2hv6aDEWRJHs5CGXxj/3X1e+VD6CWhQ cs15jjaokaKpOS8LpnM6BUEHCHm+pa7n3pIJyqLx4vNgNOqCQQO1nzX274Ul0ZU25R7+ Kbbg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769649353; x=1770254153; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=oWQPx27FCugyzao24/tqQ+m62u9m5MfQopgQbQNuj2c=; b=YShvlj++QAO/zhHy94Dd7GUq8KCy9y0JiQjX5HJvYgPB7rV7h7gHkQ4XPXP2/3YJiN ZuQx04Gz0r4MvFX8tEzN191P7f/Bgqy5oJa9smgnIvq3Xq8bYK833j5wOt2LcNRmHD7z jgoImhrwyn89oOeEwSkHJ4LNebNSJcQaRA3I2QoNI8JdAT1zurJuyTDyzJMTBLjnj/D9 df/mErygtjWT7bz8vP1TCJnWEH+WZd4qScU4vG8MwX9qPZkghVCYIF8sesCoiFcJfmAC pKHMwoIWmltA3tuGxiJOLUxV7O0JEcnL3hM1tal1XELC/uufyfFUvpFYb5PNs3qnawBP GDKA== X-Gm-Message-State: AOJu0YzOA7i3Zkrpzfpw5pSuuJrMOhs/ihKifRe9a96jHmGuVCT6gXmZ mQ+WxTVxN1vrwXkgtKDBzFKGSsCUweRu8cfwqIdyG5Dkaeum9LOcaLm5T2dqnrT64NcCtcmNoh/ Gy9qX2A== X-Received: from pjyd6.prod.google.com ([2002:a17:90a:dfc6:b0:353:8d2b:682]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:5183:b0:352:ccae:fe65 with SMTP id 98e67ed59e1d1-353fecba611mr6392881a91.4.1769649352912; Wed, 28 Jan 2026 17:15:52 -0800 (PST) Reply-To: Sean Christopherson Date: Wed, 28 Jan 2026 17:14:45 -0800 In-Reply-To: <20260129011517.3545883-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260129011517.3545883-1-seanjc@google.com> X-Mailer: git-send-email 2.53.0.rc1.217.geba53bf80e-goog Message-ID: <20260129011517.3545883-14-seanjc@google.com> Subject: [RFC PATCH v5 13/45] x86/virt/tdx: Allocate page bitmap for Dynamic PAMT From: Sean Christopherson To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, Kiryl Shutsemau , Sean Christopherson , Paolo Bonzini Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang , Rick Edgecombe , Yan Zhao , Vishal Annapurve , Ackerley Tng , Sagi Shahar , Binbin 
Wu, Xiaoyao Li, Isaku Yamahata
Content-Type: text/plain; charset="utf-8"

From: Kirill A. Shutemov

The Physical Address Metadata Table (PAMT) holds TDX metadata for
physical memory and must be allocated by the kernel during TDX module
initialization.

The exact size of the required PAMT memory is determined by the TDX
module and may vary between TDX module versions. Currently it is
approximately 0.4% of system memory. This is a significant commitment,
especially if it is not known upfront whether the machine will run any
TDX guests.

For normal PAMT, each memory region that the TDX module might use
(TDMR) needs three separate PAMT allocations, one for each supported
page size (1GB, 2MB, 4KB).

At a high level, Dynamic PAMT still has the 1GB and 2MB levels
allocated at TDX module initialization, but the 4KB level allocated
dynamically at TD runtime. In the details, however, the TDX module
still needs some per-4KB-page data. The TDX module exposes how many
bits per page need to be allocated (currently it is 1). The
bits-per-page value can then be used to calculate the size to pass in
place of the 4KB allocations in the TDMR, which the TDX specs call
"PAMT_PAGE_BITMAP".

So in effect, Dynamic PAMT just needs a different (smaller) allocation
for the 4KB-level part of the allocation. Although it is functionally
something different, it is passed in the same way the 4KB page size
PAMT allocation is.

Begin to implement Dynamic PAMT in the kernel by reading the
bits-per-page needed for Dynamic PAMT. Calculate the size needed for
the bitmap, and in the Dynamic PAMT case use it instead of the 4KB size
determined for normal PAMT. In doing so, reduce the static allocations
to approximately 0.004% of system memory, a 100x improvement.

Signed-off-by: Kirill A.
Shutemov [Enhanced log] Signed-off-by: Rick Edgecombe Reviewed-by: Binbin Wu Tested-by: Sagi Shahar Signed-off-by: Sean Christopherson --- arch/x86/include/asm/tdx.h | 5 +++++ arch/x86/include/asm/tdx_global_metadata.h | 1 + arch/x86/virt/vmx/tdx/tdx.c | 19 ++++++++++++++++++- arch/x86/virt/vmx/tdx/tdx_global_metadata.c | 7 +++++++ 4 files changed, 31 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index 441a26988d3b..57d5f07e3735 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -130,6 +130,11 @@ int tdx_enable(void); const char *tdx_dump_mce_info(struct mce *m); const struct tdx_sys_info *tdx_get_sysinfo(void); =20 +static inline bool tdx_supports_dynamic_pamt(const struct tdx_sys_info *sy= sinfo) +{ + return false; /* To be enabled when kernel is ready */ +} + int tdx_guest_keyid_alloc(void); u32 tdx_get_nr_guest_keyids(void); void tdx_guest_keyid_free(unsigned int keyid); diff --git a/arch/x86/include/asm/tdx_global_metadata.h b/arch/x86/include/= asm/tdx_global_metadata.h index 060a2ad744bf..5eb808b23997 100644 --- a/arch/x86/include/asm/tdx_global_metadata.h +++ b/arch/x86/include/asm/tdx_global_metadata.h @@ -15,6 +15,7 @@ struct tdx_sys_info_tdmr { u16 pamt_4k_entry_size; u16 pamt_2m_entry_size; u16 pamt_1g_entry_size; + u8 pamt_page_bitmap_entry_bits; }; =20 struct tdx_sys_info_td_ctrl { diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index 0c4c873bff80..517c6759c3ca 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -441,6 +441,18 @@ static int fill_out_tdmrs(struct list_head *tmb_list, return 0; } =20 +static unsigned long tdmr_get_pamt_bitmap_sz(struct tdmr_info *tdmr) +{ + unsigned long pamt_sz, nr_pamt_entries; + int bits_per_entry; + + bits_per_entry =3D tdx_sysinfo.tdmr.pamt_page_bitmap_entry_bits; + nr_pamt_entries =3D tdmr->size >> PAGE_SHIFT; + pamt_sz =3D DIV_ROUND_UP(nr_pamt_entries * bits_per_entry, BITS_PER_BYTE); + + return PAGE_ALIGN(pamt_sz); +} + /* * Calculate PAMT size given a TDMR and a page size. The returned * PAMT size is always aligned up to 4K page boundary. @@ -508,7 +520,12 @@ static int tdmr_set_up_pamt(struct tdmr_info *tdmr, * Calculate the PAMT size for each TDX supported page size * and the total PAMT size. */ - tdmr->pamt_4k_size =3D tdmr_get_pamt_sz(tdmr, TDX_PS_4K); + if (tdx_supports_dynamic_pamt(&tdx_sysinfo)) { + /* With Dynamic PAMT, PAMT_4K is replaced with a bitmap */ + tdmr->pamt_4k_size =3D tdmr_get_pamt_bitmap_sz(tdmr); + } else { + tdmr->pamt_4k_size =3D tdmr_get_pamt_sz(tdmr, TDX_PS_4K); + } tdmr->pamt_2m_size =3D tdmr_get_pamt_sz(tdmr, TDX_PS_2M); tdmr->pamt_1g_size =3D tdmr_get_pamt_sz(tdmr, TDX_PS_1G); tdmr_pamt_size =3D tdmr->pamt_4k_size + tdmr->pamt_2m_size + tdmr->pamt_1= g_size; diff --git a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c b/arch/x86/virt/vm= x/tdx/tdx_global_metadata.c index 13ad2663488b..00ab0e550636 100644 --- a/arch/x86/virt/vmx/tdx/tdx_global_metadata.c +++ b/arch/x86/virt/vmx/tdx/tdx_global_metadata.c @@ -33,6 +33,13 @@ static int get_tdx_sys_info_tdmr(struct tdx_sys_info_tdm= r *sysinfo_tdmr) sysinfo_tdmr->pamt_2m_entry_size =3D val; if (!ret && !(ret =3D read_sys_metadata_field(0x9100000100000012, &val))) sysinfo_tdmr->pamt_1g_entry_size =3D val; + /* + * Don't fail here if tdx_supports_dynamic_pamt() isn't supported. The + * TDX code can fallback to normal PAMT if it's not supported. 
+ */ + if (!ret && tdx_supports_dynamic_pamt(&tdx_sysinfo) && + !(ret =3D read_sys_metadata_field(0x9100000100000013, &val))) + sysinfo_tdmr->pamt_page_bitmap_entry_bits =3D val; =20 return ret; } --=20 2.53.0.rc1.217.geba53bf80e-goog From nobody Sat Feb 7 17:41:40 2026 Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1909332ED31 for ; Thu, 29 Jan 2026 01:15:55 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649361; cv=none; b=MFanKx5S4NrnNTg4HkDlYvPlGFepjISFdvNhT1ZXtKXkKrmYLHyebTcpZ5wEKbLVx3nPq0Q6MFyLDCK9CJ2bbcAypEmD3Py45FNhDqTbnMVTfCOkcvJQZL8mTj0udNf7o6VjcAqOF8mFRWZOWck+EHdomn/twFzXFI0elyRz2GQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649361; c=relaxed/simple; bh=esb0FgEarAvzd7he3f1yhhOtX7p49PRyk0uclVgGx9Y=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=GVynz/fx+ojgcMieuVNVlBAZ+VCdI+5mO+HSjn/glmEQSkccccIEa2oRQ/LZecqGe3MKAGqNa6+LIAiPluRRGSgPGBYpcTolPaSSimsuk4VF99xs/LsUyjlONvvXPvQM9ib3A4hdS6IfdTJIzGZvW5K4l8ghw+opbIImL2XkUHs= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=XZO2dMZL; arc=none smtp.client-ip=209.85.214.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="XZO2dMZL" Received: by mail-pl1-f201.google.com with SMTP id d9443c01a7336-2a76f2d7744so3326265ad.3 for ; Wed, 28 Jan 2026 17:15:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1769649354; x=1770254154; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=UlFOBu5FArwUjFk9bCGxsyWYTTpURccpKRpfejh6JpM=; b=XZO2dMZLp55RudMI1Rh6X7k3WvSlotfaJEGCYw6RBXr0Sj2sysggcmKVZ2DL0eaicx kah2lVeKPcSfE+9YLS3ayAewbGnIB7zq7BU+LDOQy2UW9bBKlqBqYxb1FdZKdyTlselT 3yOCqJJegmd08bAUBopFX7VXR5ThZQTxvwy+QljuyYd94FmYZH7QlckyD9WSwCyG6+RF nadJQRYEyTED/f41s2n7SXTQ3AucSpsA2a1RnDCpbtu+LBDvsxxEamxxWIvB+F4+WzzV +DqDfRcaWfQ8qhMb2E0THN9FRk4oCPtBcXtTv9sRsnKopEYM0xQevELWXlADvA5pMDIW r4qA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769649354; x=1770254154; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=UlFOBu5FArwUjFk9bCGxsyWYTTpURccpKRpfejh6JpM=; b=wwO+VJ7GdtrUJBS1Z/qE/k/yS6cIfGHcPueBK0UPYGx5lB7WdoQm+KiZga1TlK7IoQ Ii6kDPBImeP/lbgq4ObH5ff3k8UC/V6nRbJIp4Vpy7ntHz3iWvV8CilTINRXDNxjLMbo qIt3zYSgiwE+mWnuk+HVkXl/vFwADehyXlUDiIT6jRZGKDV/8LEjMq4vnFb1oBevzK1H 5gUeKayB2V/HFqF3DZIofxnUsHfdWFtSrKDhc4ZRJJhDUWytHUnrVJ8vi/hSvxMfC8nh oYmVHPzSD/Gl6dwCrvQfjyAOEt3CkFNVhi24tGc7rmYj9UPWemJflKxW9uybq1JoWcja 3Glg== X-Gm-Message-State: AOJu0Yw1Kff56luv1d+Nhk6eImcLyjCyf3S6g+GhXNStjs1LJ+U9HWUP 
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:46 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
References: <20260129011517.3545883-1-seanjc@google.com>
Message-ID: <20260129011517.3545883-15-seanjc@google.com>
Subject: [RFC PATCH v5 14/45] x86/virt/tdx: Allocate reference counters for PAMT memory
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata
Content-Type: text/plain; charset="utf-8"

From: Kirill A. Shutemov

The PAMT memory holds metadata for TDX-protected memory. With Dynamic
PAMT, the 4KB range of PAMT is allocated on demand. The kernel supplies
the TDX module with a page pair that covers 2MB of host physical
memory. The kernel must provide this page pair before using pages from
the range for TDX. If this is not done, any SEAMCALL that attempts to
use the memory will fail.

Allocate reference counters for every 2MB range to track PAMT memory
usage. This is necessary to accurately determine when PAMT memory needs
to be allocated and when it can be freed.

This allocation will currently consume 2MB for every 1TB of address
space from 0 to max_pfn (the highest pfn of RAM), so the allocation
size depends on how the RAM is physically laid out. In a worst-case
scenario where the entire 52-bit address space is covered, this would
be 8GB. In that case the DPAMT refcount allocations could
hypothetically exceed the savings from Dynamic PAMT, which is 4GB per
TB. This is probably unlikely; however, future changes will reduce this
refcount overhead to make DPAMT always a net win.

Signed-off-by: Kirill A. Shutemov
[Add feedback, update log]
Signed-off-by: Rick Edgecombe
Tested-by: Sagi Shahar
Signed-off-by: Sean Christopherson
---
 arch/x86/virt/vmx/tdx/tdx.c | 47 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 46 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 517c6759c3ca..db48bf2ce601 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -51,6 +52,16 @@ static DEFINE_PER_CPU(bool, tdx_lp_initialized);
 
 static struct tdmr_info_list tdx_tdmr_list;
 
+/*
+ * On a machine with Dynamic PAMT, the kernel maintains a reference counter
+ * for every 2M range. The counter indicates how many users there are for
+ * the PAMT memory of the 2M range.
+ *
+ * The kernel allocates PAMT memory when the first user arrives and
+ * frees it when the last user has left.
+ */ +static atomic_t *pamt_refcounts; + static enum tdx_module_status_t tdx_module_status; static DEFINE_MUTEX(tdx_module_lock); =20 @@ -184,6 +195,34 @@ int tdx_cpu_enable(void) } EXPORT_SYMBOL_FOR_KVM(tdx_cpu_enable); =20 +/* + * Allocate PAMT reference counters for all physical memory. + * + * It consumes 2MiB for every 1TiB of physical memory. + */ +static int init_pamt_metadata(void) +{ + size_t size =3D DIV_ROUND_UP(max_pfn, PTRS_PER_PTE) * sizeof(*pamt_refcou= nts); + + if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) + return 0; + + pamt_refcounts =3D __vmalloc(size, GFP_KERNEL | __GFP_ZERO); + if (!pamt_refcounts) + return -ENOMEM; + + return 0; +} + +static void free_pamt_metadata(void) +{ + if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) + return; + + vfree(pamt_refcounts); + pamt_refcounts =3D NULL; +} + /* * Add a memory region as a TDX memory block. The caller must make sure * all memory regions are added in address ascending order and don't @@ -1083,10 +1122,14 @@ static int init_tdx_module(void) */ get_online_mems(); =20 - ret =3D build_tdx_memlist(&tdx_memlist); + ret =3D init_pamt_metadata(); if (ret) goto out_put_tdxmem; =20 + ret =3D build_tdx_memlist(&tdx_memlist); + if (ret) + goto err_free_pamt_metadata; + /* Allocate enough space for constructing TDMRs */ ret =3D alloc_tdmr_list(&tdx_tdmr_list, &tdx_sysinfo.tdmr); if (ret) @@ -1136,6 +1179,8 @@ static int init_tdx_module(void) free_tdmr_list(&tdx_tdmr_list); err_free_tdxmem: free_tdx_memlist(&tdx_memlist); +err_free_pamt_metadata: + free_pamt_metadata(); goto out_put_tdxmem; } =20 --=20 2.53.0.rc1.217.geba53bf80e-goog From nobody Sat Feb 7 17:41:40 2026 Received: from mail-pl1-f201.google.com (mail-pl1-f201.google.com [209.85.214.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D0537211290 for ; Thu, 29 Jan 2026 01:15:56 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.214.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649360; cv=none; b=KjcgB6J0knUn30NPBv5QdnD4KKegZz2D9NX13QgWRaydgE88BfgBUntKRRiEwTZV27JrECVsxedggV7PlTWMg0xgOItm/mpkPbk2C675/qBh4J1WLOEDoAE+c/NA5W5wBgu9zpZlsMpwg8RJPyRA/cTsVEvRtCMiinCXbFcou58= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649360; c=relaxed/simple; bh=6Mm0CxsePKv0UPnTln/3LFfYBmAkOanL4/Zsp57lTRg=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=mLdk4xGkcnutJCC9thJGEWGwQNmbGlwUVek/H/r8TElXUZ0ObIPAAJUNpgIDFesjJUFPp9OlRFGfnUe77LjquQWmSwQ8Oej1VmZfrTq0SYrWJzG7vJWiwvl162/vsTX3x2S/2bTSnxoqF/PT9DDNJmL4AvfLvxvrijLDplOtkmM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=eowRY3tU; arc=none smtp.client-ip=209.85.214.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="eowRY3tU" Received: by mail-pl1-f201.google.com with SMTP id d9443c01a7336-2a7701b6353so3715175ad.3 for ; Wed, 28 Jan 2026 17:15:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; 
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:47 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
References: <20260129011517.3545883-1-seanjc@google.com>
Message-ID: <20260129011517.3545883-16-seanjc@google.com>
Subject: [RFC PATCH v5 15/45] x86/virt/tdx: Improve PAMT refcounts allocation for sparse memory
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata
Content-Type: text/plain; charset="utf-8"

From: Kirill A. Shutemov

init_pamt_metadata() allocates PAMT refcounts for all physical memory
up to max_pfn. That might be suboptimal if the physical memory layout
is discontinuous and has large holes.

The refcount allocation is a vmalloc allocation. This is necessary to
support a large allocation size. The virtually contiguous property also
makes it easy to find a specific 2MB range's refcount, since the array
can simply be indexed.

Since vmalloc mappings support remapping during normal kernel runtime,
switch to an approach that only populates refcount pages for the
vmalloc mapping when there is actually memory for that range. This
means any holes in the physical address space won't consume actual
physical memory.
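As a quick sanity check on the sizing (assuming sizeof(atomic_t) is 4
bytes): 1TiB of physical memory covers 1TiB / 2MiB = 524,288 2MB
ranges, and 524,288 refcounts * 4 bytes = 2MiB, which matches the
"2MiB for every 1TiB" figure in the code comments below.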
The validity of this memory optimization rests on a couple of
assumptions:

1. Physical holes in the RAM layout are commonly large enough for it
   to be worth it.
2. An alternative approach that looks up the refcounts via some more
   layered data structure would complicate the lookups, or at least
   cost more than the complexity of managing the vmalloc mapping.

Signed-off-by: Kirill A. Shutemov
[Add feedback, update log]
Signed-off-by: Rick Edgecombe
Tested-by: Sagi Shahar
Signed-off-by: Sean Christopherson
---
 arch/x86/virt/vmx/tdx/tdx.c | 122 ++++++++++++++++++++++++++++++++++--
 1 file changed, 118 insertions(+), 4 deletions(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index db48bf2ce601..f6e80aba5895 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -195,30 +195,135 @@ int tdx_cpu_enable(void)
 }
 EXPORT_SYMBOL_FOR_KVM(tdx_cpu_enable);
 
+/* Find PAMT refcount for a given physical address */
+static atomic_t *tdx_find_pamt_refcount(unsigned long pfn)
+{
+	/* Find which PMD a PFN is in. */
+	unsigned long index = pfn >> (PMD_SHIFT - PAGE_SHIFT);
+
+	return &pamt_refcounts[index];
+}
+
+/* Map a page into the PAMT refcount vmalloc region */
+static int pamt_refcount_populate(pte_t *pte, unsigned long addr, void *data)
+{
+	struct page *page;
+	pte_t entry;
+
+	page = alloc_page(GFP_KERNEL | __GFP_ZERO);
+	if (!page)
+		return -ENOMEM;
+
+	entry = mk_pte(page, PAGE_KERNEL);
+
+	spin_lock(&init_mm.page_table_lock);
+	/*
+	 * PAMT refcount populations can overlap due to rounding of the
+	 * start/end pfn. Make sure the PAMT range is only populated once.
+	 */
+	if (pte_none(ptep_get(pte)))
+		set_pte_at(&init_mm, addr, pte, entry);
+	else
+		__free_page(page);
+	spin_unlock(&init_mm.page_table_lock);
+
+	return 0;
+}
+
 /*
- * Allocate PAMT reference counters for all physical memory.
+ * Allocate PAMT reference counters for the given PFN range.
  *
  * It consumes 2MiB for every 1TiB of physical memory.
  */
+static int alloc_pamt_refcount(unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long refcount_first, refcount_last;
+	unsigned long mapping_start, mapping_end;
+
+	if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
+		return 0;
+
+	/*
+	 * 'start_pfn' is inclusive and 'end_pfn' is exclusive. Find the
+	 * range of refcounts the pfn range will need.
+	 */
+	refcount_first = (unsigned long)tdx_find_pamt_refcount(start_pfn);
+	refcount_last = (unsigned long)tdx_find_pamt_refcount(end_pfn - 1);
+
+	/*
+	 * Calculate the page aligned range that includes the refcounts. The
+	 * teardown logic needs to handle potentially overlapping refcount
+	 * mappings resulting from the alignments.
+	 */
+	mapping_start = round_down(refcount_first, PAGE_SIZE);
+	mapping_end = round_up(refcount_last + sizeof(*pamt_refcounts), PAGE_SIZE);
+
+	return apply_to_page_range(&init_mm, mapping_start, mapping_end - mapping_start,
+				   pamt_refcount_populate, NULL);
+}
+
+/*
+ * Reserve vmalloc range for PAMT reference counters. It covers all physical
+ * address space up to max_pfn. It is going to be populated from
+ * build_tdx_memlist() only for present memory that is available for TDX use.
+ *
+ * It reserves 2MiB of virtual address space for every 1TiB of physical memory.
+ */ static int init_pamt_metadata(void) { - size_t size =3D DIV_ROUND_UP(max_pfn, PTRS_PER_PTE) * sizeof(*pamt_refcou= nts); + struct vm_struct *area; + size_t size; =20 if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) return 0; =20 - pamt_refcounts =3D __vmalloc(size, GFP_KERNEL | __GFP_ZERO); - if (!pamt_refcounts) + size =3D DIV_ROUND_UP(max_pfn, PTRS_PER_PTE) * sizeof(*pamt_refcounts); + + area =3D get_vm_area(size, VM_SPARSE); + if (!area) return -ENOMEM; =20 + pamt_refcounts =3D area->addr; return 0; } =20 +/* Unmap a page from the PAMT refcount vmalloc region */ +static int pamt_refcount_depopulate(pte_t *pte, unsigned long addr, void *= data) +{ + struct page *page; + pte_t entry; + + spin_lock(&init_mm.page_table_lock); + + entry =3D ptep_get(pte); + /* refcount allocation is sparse, may not be populated */ + if (!pte_none(entry)) { + pte_clear(&init_mm, addr, pte); + page =3D pte_page(entry); + __free_page(page); + } + + spin_unlock(&init_mm.page_table_lock); + + return 0; +} + +/* Unmap all PAMT refcount pages and free vmalloc range */ static void free_pamt_metadata(void) { + size_t size; + if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) return; =20 + size =3D max_pfn / PTRS_PER_PTE * sizeof(*pamt_refcounts); + size =3D round_up(size, PAGE_SIZE); + + apply_to_existing_page_range(&init_mm, + (unsigned long)pamt_refcounts, + size, pamt_refcount_depopulate, + NULL); vfree(pamt_refcounts); pamt_refcounts =3D NULL; } @@ -289,10 +394,19 @@ static int build_tdx_memlist(struct list_head *tmb_li= st) ret =3D add_tdx_memblock(tmb_list, start_pfn, end_pfn, nid); if (ret) goto err; + + /* Allocated PAMT refcountes for the memblock */ + ret =3D alloc_pamt_refcount(start_pfn, end_pfn); + if (ret) + goto err; } =20 return 0; err: + /* + * Only free TDX memory blocks here, PAMT refcount pages + * will be freed in the init_tdx_module() error path. 
+ */ free_tdx_memlist(tmb_list); return ret; } --=20 2.53.0.rc1.217.geba53bf80e-goog From nobody Sat Feb 7 17:41:40 2026 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BAF9232939A for ; Thu, 29 Jan 2026 01:15:58 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649361; cv=none; b=jegLMfUfBtjj6OskDB0PXhYe21R8YD5Qk/D+g6/zFsLdV9Fv3nSMZm/gAoSeW1f0ZJy9Qm9lG63b4iTWDNdGcpSVZEAgHZtCzCq2MJHGa8dGaKPVotlcK+FSYfmVGVWoI9k1USkuiAliv9s3o9FHhegZVF9PThg9Qe6M2X1mDaU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649361; c=relaxed/simple; bh=+qNafrKEDMrDmnSzhO+7B/CncJYata4dM8nSNluXeFE=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=IbIdzgd3zZDfm6OZ971Au3vAjHInzzzW/eoZvw99qwJez6YlmQ8XJNYrY2td55yupfrtGyusAX7ku1+u4d2fDM3ssP1pFFMqMH5DL++ZxGj9I+yI50Aok9HrYAd/1KXQeEPr+zr8T4f+J73AQycUBwqYNIXRbY+EcKi2aOBFJos= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=wv0RMrEO; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="wv0RMrEO" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-352ec74a925so682998a91.2 for ; Wed, 28 Jan 2026 17:15:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1769649358; x=1770254158; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:reply-to:from:to:cc:subject:date :message-id:reply-to; bh=QAsPNO0+ZutAYPgOffflP5U84vjSbbQ0NN/8afdydvg=; b=wv0RMrEOlMYU3OWc/M+iR+p8VFpTq/9WufSQnfKhOb5RDEPfXQuTHWdITYSW9L5z5n spiDHoMiHZFwkAPMk2fGYfIeJdTt6yBEjRoF6D/2B6vCrk1SnassJjAM2eavkR7dRqVK MQXrW0da3tw2iWlDb3t8IZ2Il03SkdzRSxxAVQy/ZSUsv8iqkiuypn/bM0yVIJYkqPsN sBRAFA4pRwwNpJA/gFcMbDZ195G6EPAJLe2/NwiHGP4XQ9zOkD8V4QsIIGW5RjpCAzi+ kaZGcU1tR1jeooYVJln9Cnd3d2d0KrAybLioleE1Oz/yQd82BC3gZquD+Rv2T9V61cyg eVkA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769649358; x=1770254158; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:reply-to:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=QAsPNO0+ZutAYPgOffflP5U84vjSbbQ0NN/8afdydvg=; b=ElxU+q/+DRhvhG+sjOyzO8xmCz1yT5wFn1YZRLYVDN5uLQCk+opB93nY0/ZITs/EEw oKEtKS/HbQowFOqt/kzVtcXHPoUzrEv6tVuVjFxgBzEtjzYfnO+noS4a6+AEwKmu2x+u bU9L4LzgXu9UpZncg/AVMc7Vm9mlOxSDXOjzechYYBdvW1OV3cFAuXDCZT3/ah55JAo8 yDQNiTnnRN6EVNg36tBh/tdVZ8jlMEXe+pgHkNgX0/z7B2OiVE7UZuFv1NBqRVOC0XFB mI5yug2ThWXaiViQdlDXHCzRutn0NbRhsFEa97kj+qzPwI7XY8VuiUn9lnRA2+1WC3rW Zz9g== X-Gm-Message-State: AOJu0YxFbjWwIjiD5pzLw/rqNtePxiQWcy8c7hQmHWFqHjxo6HsXXEIL 1uJCJpP7gl9O9uRQ6EtpGPrsbAzFRt21RyAvvrQnLbooVct+arWEZgKHgrUmaTQpYv7J9rs2eyD dQ9NuVQ== X-Received: from pjbbh4.prod.google.com 
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:48 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
References: <20260129011517.3545883-1-seanjc@google.com>
Message-ID: <20260129011517.3545883-17-seanjc@google.com>
Subject: [RFC PATCH v5 16/45] x86/virt/tdx: Add tdx_alloc/free_control_page() helpers
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata
Content-Type: text/plain; charset="utf-8"

From: Kirill A. Shutemov

Add helpers to use when allocating or preparing pages that are gifted
to the TDX-Module for use as control/S-EPT pages, and thus need DPAMT
backing. Make them handle races internally for the case of multiple
callers trying to operate on the same 2MB range simultaneously.

While the TDX initialization code in arch/x86 uses pages with 2MB
alignment, KVM will need to hand it 4KB pages to use. Under DPAMT,
these pages will need 4KB DPAMT backing. Add tdx_alloc_control_page()
and tdx_free_control_page() to handle both page allocation and DPAMT
installation. Make them behave like normal alloc/free functions, where
allocation can fail in the case of no memory, but free (with any
necessary DPAMT release) always succeeds. Do this so they can support
the existing TDX flows that require cleanups to succeed.

Also create tdx_pamt_get()/tdx_pamt_put() to handle installing DPAMT
4KB backing for pages that are already allocated (such as external
page tables, or S-EPT pages). Allocate the pages as GFP_KERNEL_ACCOUNT
because the allocations are easily user-triggerable.

Since the source of these pages is the page allocator, multiple TDs
could each get 4KB pages that are covered by the same 2MB range. When
this happens, only one page pair needs to be installed to cover the 2MB
range. Similarly, when one page is freed, the DPAMT backing cannot be
freed until all TDX pages in the range are no longer in use. Have the
helpers manage these races internally.

So the requirements are that:

1. Free paths cannot fail (i.e. no TDX module BUSY errors).
2. Allocation paths need to handle finding that DPAMT backing is
   already installed, and should only return an error in the case of
   no memory, not when losing races with others operating on the same
   DPAMT range.
3. Free paths also need to clean up the DPAMT backing when the last
   page in the 2MB range is no longer needed by TDX.

Previous changes allocated refcounts to track how many 4KB pages are in
use by TDX for each 2MB region. Update those refcounts inside the
helpers and use them to decide when to actually install the DPAMT
backing pages.
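For reference, a minimal usage sketch of the new helpers (hypothetical
caller, error handling reduced to the essentials):

	/* Allocates a page and installs DPAMT backing for its 2MB range. */
	unsigned long va = tdx_alloc_control_page(GFP_KERNEL_ACCOUNT);

	if (!va)
		return -ENOMEM;	/* page allocation or PAMT.ADD failed */

	/* ... gift the page to the TDX module, use it ... */

	/* Drops the 2MB range's refcount; frees DPAMT on the last put. */
	tdx_free_control_page(va);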
tdx_pamt_get() needs to guarantee the DPAMT is installed before
returning, so that racing threads don't tell the TDX module to operate
on the page before it's installed. Take a lock while adjusting the
refcount and doing the actual TDH.PHYMEM.PAMT.ADD/REMOVE to make sure
these happen atomically. The lock is heavyweight, but will be optimized
in future changes; do the simple solution before any complex
improvements.

TDH.PHYMEM.PAMT.ADD/REMOVE take exclusive locks at the granularity of
each 2MB range. A simultaneous attempt to operate on the same 2MB
region would result in a BUSY error code returned from the SEAMCALL.
Since the invocations of these SEAMCALLs are behind a lock, they won't
conflict.

Besides the contention between TDH.PHYMEM.PAMT.ADD/REMOVE, many other
SEAMCALLs take the same 2MB granularity locks as shared. This means any
attempt by the TDX module to operate on the page while a
PAMT.ADD/REMOVE is simultaneously in flight will result in a BUSY
error. This should not happen, as the PAMT pages always have to be
installed before giving the pages to the TDX module anyway.

Signed-off-by: Kirill A. Shutemov
[Add feedback, update log]
Signed-off-by: Rick Edgecombe
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/tdx.h  |  24 +++-
 arch/x86/virt/vmx/tdx/tdx.c | 264 ++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |   2 +
 3 files changed, 289 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 57d5f07e3735..fa29be18498c 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -16,6 +16,7 @@
 
 #include
 #include
+#include
 #include
 
 /*
@@ -135,11 +136,32 @@ static inline bool tdx_supports_dynamic_pamt(const struct tdx_sys_info *sysinfo)
 	return false; /* To be enabled when kernel is ready */
 }
 
+void tdx_quirk_reset_page(struct page *page);
+
 int tdx_guest_keyid_alloc(void);
 u32 tdx_get_nr_guest_keyids(void);
 void tdx_guest_keyid_free(unsigned int keyid);
 
-void tdx_quirk_reset_page(struct page *page);
+struct page *__tdx_alloc_control_page(gfp_t gfp);
+void __tdx_free_control_page(struct page *page);
+
+static inline unsigned long tdx_alloc_control_page(gfp_t gfp)
+{
+	struct page *page = __tdx_alloc_control_page(gfp);
+
+	if (!page)
+		return 0;
+
+	return (unsigned long)page_address(page);
+}
+
+static inline void tdx_free_control_page(unsigned long addr)
+{
+	if (!addr)
+		return;
+
+	__tdx_free_control_page(virt_to_page(addr));
+}
 
 struct tdx_td {
 	/* TD root structure: */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index f6e80aba5895..682c8a228b53 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1824,6 +1824,50 @@ u64 tdh_mng_rd(struct tdx_td *td, u64 field, u64 *data)
 }
 EXPORT_SYMBOL_FOR_KVM(tdh_mng_rd);
 
+/* Number of PAMT pages to be provided to the TDX module per 2M region of PA */
+static int tdx_dpamt_entry_pages(void)
+{
+	if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
+		return 0;
+
+	return tdx_sysinfo.tdmr.pamt_4k_entry_size * PTRS_PER_PTE / PAGE_SIZE;
+}
+
+/*
+ * For SEAMCALLs that pass a bundle of pages, the TDX spec treats the registers
+ * like an array, as they are ordered in the struct.  The effective array size
+ * is (obviously) limited by the number of registers, relative to the starting
+ * register.  Fill the register array at a given starting register, with sanity
+ * checks to avoid overflowing the args structure.
+ */
+static void dpamt_copy_regs_array(struct tdx_module_args *args, void *reg,
+				  u64 *pamt_pa_array, bool copy_to_regs)
+{
+	int size = tdx_dpamt_entry_pages() * sizeof(*pamt_pa_array);
+
+	if (WARN_ON_ONCE(reg + size > (void *)args + sizeof(*args)))
+		return;
+
+	/* Copy PAMT page PA's to/from the struct per the TDX ABI. */
+	if (copy_to_regs)
+		memcpy(reg, pamt_pa_array, size);
+	else
+		memcpy(pamt_pa_array, reg, size);
+}
+
+#define dpamt_copy_from_regs(dst, args, reg) \
+	dpamt_copy_regs_array(args, &(args)->reg, dst, false)
+
+#define dpamt_copy_to_regs(args, reg, src) \
+	dpamt_copy_regs_array(args, &(args)->reg, src, true)
+
+/*
+ * When declaring PAMT arrays on the stack, use the maximum theoretical number
+ * of entries that can be squeezed into a SEAMCALL, as stack allocations are
+ * practically free, i.e. any wasted space is a non-issue.
+ */
+#define MAX_NR_DPAMT_ARGS	(sizeof(struct tdx_module_args) / sizeof(u64))
+
 u64 tdh_mr_extend(struct tdx_td *td, u64 gpa, u64 *ext_err1, u64 *ext_err2)
 {
 	struct tdx_module_args args = {
@@ -2020,6 +2064,226 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page)
 }
 EXPORT_SYMBOL_FOR_KVM(tdh_phymem_page_wbinvd_hkid);
 
+static int alloc_pamt_array(u64 *pa_array)
+{
+	struct page *page;
+	int i;
+
+	for (i = 0; i < tdx_dpamt_entry_pages(); i++) {
+		page = alloc_page(GFP_KERNEL_ACCOUNT);
+		if (!page)
+			goto err;
+		pa_array[i] = page_to_phys(page);
+	}
+
+	return 0;
+err:
+	/*
+	 * Zero the rest of the array to help with
+	 * freeing in error paths.
+	 */
+	for (; i < tdx_dpamt_entry_pages(); i++)
+		pa_array[i] = 0;
+	return -ENOMEM;
+}
+
+static void free_pamt_array(u64 *pa_array)
+{
+	for (int i = 0; i < tdx_dpamt_entry_pages(); i++) {
+		if (!pa_array[i])
+			break;
+
+		/*
+		 * Reset pages unconditionally to cover cases
+		 * where they were passed to the TDX module.
+		 */
+		tdx_quirk_reset_paddr(pa_array[i], PAGE_SIZE);
+
+		__free_page(phys_to_page(pa_array[i]));
+	}
+}
+
+/*
+ * Calculate the arg needed for operating on the DPAMT backing for
+ * a given 4KB page.
+ */
+static u64 pamt_2mb_arg(struct page *page)
+{
+	unsigned long hpa_2mb = ALIGN_DOWN(page_to_phys(page), PMD_SIZE);
+
+	return hpa_2mb | TDX_PS_2M;
+}
+
+/*
+ * Add PAMT backing for the given page. Returns the raw TDX module status
+ * so that callers can distinguish success (TDX_SUCCESS) from specific
+ * TDX module error codes.
+ */
+static u64 tdh_phymem_pamt_add(struct page *page, u64 *pamt_pa_array)
+{
+	struct tdx_module_args args = {
+		.rcx = pamt_2mb_arg(page)
+	};
+
+	dpamt_copy_to_regs(&args, rdx, pamt_pa_array);
+
+	return seamcall(TDH_PHYMEM_PAMT_ADD, &args);
+}
+
+/* Remove PAMT backing for the given page.
*/ +static u64 tdh_phymem_pamt_remove(struct page *page, u64 *pamt_pa_array) +{ + struct tdx_module_args args =3D { + .rcx =3D pamt_2mb_arg(page), + }; + u64 ret; + + ret =3D seamcall_ret(TDH_PHYMEM_PAMT_REMOVE, &args); + if (ret) + return ret; + + dpamt_copy_from_regs(pamt_pa_array, &args, rdx); + return 0; +} + +/* Serializes adding/removing PAMT memory */ +static DEFINE_SPINLOCK(pamt_lock); + +/* Bump PAMT refcount for the given page and allocate PAMT memory if neede= d */ +static int tdx_pamt_get(struct page *page) +{ + u64 pamt_pa_array[MAX_NR_DPAMT_ARGS]; + atomic_t *pamt_refcount; + u64 tdx_status; + int ret; + + if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) + return 0; + + ret =3D alloc_pamt_array(pamt_pa_array); + if (ret) + goto out_free; + + pamt_refcount =3D tdx_find_pamt_refcount(page_to_pfn(page)); + + scoped_guard(spinlock, &pamt_lock) { + /* + * If the pamt page is already added (i.e. refcount >=3D 1), + * then just increment the refcount. + */ + if (atomic_read(pamt_refcount)) { + atomic_inc(pamt_refcount); + goto out_free; + } + + /* Try to add the pamt page and take the refcount 0->1. */ + tdx_status =3D tdh_phymem_pamt_add(page, pamt_pa_array); + if (WARN_ON_ONCE(!IS_TDX_SUCCESS(tdx_status))) { + ret =3D -EIO; + goto out_free; + } + + atomic_inc(pamt_refcount); + } + + return 0; + +out_free: + /* + * pamt_pa_array is populated or zeroed up to tdx_dpamt_entry_pages() + * above. free_pamt_array() can handle either case. + */ + free_pamt_array(pamt_pa_array); + return ret; +} + +/* + * Drop PAMT refcount for the given page and free PAMT memory if it is no + * longer needed. + */ +static void tdx_pamt_put(struct page *page) +{ + u64 pamt_pa_array[MAX_NR_DPAMT_ARGS]; + atomic_t *pamt_refcount; + u64 tdx_status; + + if (!tdx_supports_dynamic_pamt(&tdx_sysinfo)) + return; + + pamt_refcount =3D tdx_find_pamt_refcount(page_to_pfn(page)); + + scoped_guard(spinlock, &pamt_lock) { + /* + * If the there are more than 1 references on the pamt page, + * don't remove it yet. Just decrement the refcount. + */ + if (atomic_read(pamt_refcount) > 1) { + atomic_dec(pamt_refcount); + return; + } + + /* Try to remove the pamt page and take the refcount 1->0. */ + tdx_status =3D tdh_phymem_pamt_remove(page, pamt_pa_array); + + /* + * Don't free pamt_pa_array as it could hold garbage when + * tdh_phymem_pamt_remove() fails. Don't panic/BUG_ON(), as + * there is no risk of data corruption, but do yell loudly as + * failure indicates a kernel bug, memory is being leaked, and + * the dangling PAMT entry may cause future operations to fail. + */ + if (WARN_ON_ONCE(!IS_TDX_SUCCESS(tdx_status))) + return; + + atomic_dec(pamt_refcount); + } + + /* + * pamt_pa_array is populated up to tdx_dpamt_entry_pages() by the TDX + * module with pages, or remains zero inited. free_pamt_array() can + * handle either case. Just pass it unconditionally. + */ + free_pamt_array(pamt_pa_array); +} + +/* + * Return a page that can be gifted to the TDX-Module for use as a "contro= l" + * page, i.e. pages that are used for control and S-EPT structures for a g= iven + * TDX guest, and bound to said guest's HKID and thus obtain TDX protectio= ns, + * including PAMT tracking. 
+ */ +struct page *__tdx_alloc_control_page(gfp_t gfp) +{ + struct page *page; + + page =3D alloc_page(gfp); + if (!page) + return NULL; + + if (tdx_pamt_get(page)) { + __free_page(page); + return NULL; + } + + return page; +} +EXPORT_SYMBOL_FOR_KVM(__tdx_alloc_control_page); + +/* + * Free a page that was gifted to the TDX-Module for use as a control/S-EPT + * page. After this, the page is no longer protected by TDX. + */ +void __tdx_free_control_page(struct page *page) +{ + if (!page) + return; + + tdx_pamt_put(page); + __free_page(page); +} +EXPORT_SYMBOL_FOR_KVM(__tdx_free_control_page); + #ifdef CONFIG_KEXEC_CORE void tdx_cpu_flush_cache_for_kexec(void) { diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h index 82bb82be8567..46c4214b79fb 100644 --- a/arch/x86/virt/vmx/tdx/tdx.h +++ b/arch/x86/virt/vmx/tdx/tdx.h @@ -46,6 +46,8 @@ #define TDH_PHYMEM_PAGE_WBINVD 41 #define TDH_VP_WR 43 #define TDH_SYS_CONFIG 45 +#define TDH_PHYMEM_PAMT_ADD 58 +#define TDH_PHYMEM_PAMT_REMOVE 59 =20 /* * SEAMCALL leaf: --=20 2.53.0.rc1.217.geba53bf80e-goog From nobody Sat Feb 7 17:41:40 2026 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D223A340A57 for ; Thu, 29 Jan 2026 01:16:01 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.74 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649364; cv=none; b=l3z98JPVOMBn1t18X31rb0jkoqg8vvSBeTiIxCIJohMMHagRF7LOO/fQO4b/jo4tHcsEDxOaXu9oWbgfGEYOs7/0Qw1wLtk0znxvfVbD0ZzLU8EiW5VJ9MIlxeUfpigBKwDsz2r0vFQ3MAhtMbWcN5tq89h4K+6YcN4DtOQlIDo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649364; c=relaxed/simple; bh=9SrYLBLoBg+Brv1S2Cy63Tr95H28nw4CNLfY9ahurBw=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=o+r42ZyaDIZEUWRHIZLlepHmjrpoudpmA1suqe+Orfy1MFASIlahO//Lp+KeEYyhl/MvBH65Jvqon9y/cP5hkIcVD+WPjlsyliohQibym2VNqOGS3hAUgLajzNNgjHVq8rM/wOUgU9Sn/9AANOkMLZfu1c6PrvsEUXvGlGMwpd4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=r+gj6EBc; arc=none smtp.client-ip=209.85.216.74 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="r+gj6EBc" Received: by mail-pj1-f74.google.com with SMTP id 98e67ed59e1d1-34c48a76e75so341081a91.1 for ; Wed, 28 Jan 2026 17:16:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1769649361; x=1770254161; darn=vger.kernel.org; h=content-transfer-encoding:cc:to:from:subject:message-id:references :mime-version:in-reply-to:date:reply-to:from:to:cc:subject:date :message-id:reply-to; bh=SvybPfEFtNPjwlwiGqMKsAxDfWeIkvf3Jbxr1ERSoXA=; b=r+gj6EBcaWCWPlTtfZPUhk9lI4xMzOaaLh0rtZl0FCyoFtnbLjEst50/YU+gtWsP8u KNSzAcPyOumXIXOS8x2AtKV7TaD61nhL+o46s6YW2nkt62G9ZXcFqLxOBNeKmqL23rpD OXXLKUNOAp64tT7phVstRIU4JN6UWEIyRAwA5LlgwSfn2WpDjdwBUhYo3JPJ3cL0eX/a 
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:49 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
References: <20260129011517.3545883-1-seanjc@google.com>
Message-ID: <20260129011517.3545883-18-seanjc@google.com>
Subject: [RFC PATCH v5 17/45] x86/virt/tdx: Optimize tdx_alloc/free_control_page() helpers
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata
Content-Type: text/plain; charset="utf-8"

From: Kirill A. Shutemov

Optimize the PAMT alloc/free helpers to avoid taking the global lock
when possible.

The recently introduced PAMT alloc/free helpers maintain a refcount to
keep track of when it is ok to reclaim and free a 4KB PAMT page. This
refcount is protected by a global lock in order to guarantee that races
don't result in the PAMT getting freed while another caller requests it
be mapped. But a global lock is a bit heavyweight, especially since the
refcounts can be (and already are) updated atomically.

A simple approach would be to increment/decrement the refcount outside
of the lock before actually adjusting the PAMT, and only adjust the
PAMT if the refcount transitions from/to 0. This would correctly
allocate and free the PAMT page without getting out of sync. But that
leaves a race where a simultaneous caller could see the refcount
already incremented and return before the PAMT is actually mapped.

So treat the refcount 0->1 case as a special case. On add, if the
refcount is zero, *don't* increment the refcount outside the lock (to
1). Always take the lock in that case, and only set the refcount to 1
after the PAMT is actually added. This way simultaneous adders, when
the PAMT is not installed yet, will take the slow lock path.
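Sketched out, the add-side fast path described above looks roughly like
the following (illustrative pseudocode of the logic in the diff below,
not the verbatim code):

	/* Fast path: refcount >= 1 means the PAMT is already installed. */
	if (atomic_inc_not_zero(pamt_refcount))
		return 0;

	/* Slow path: the 0->1 transition only happens under pamt_lock. */
	scoped_guard(spinlock, &pamt_lock) {
		/* recheck the refcount, PAMT.ADD, then set it to 1 */
	}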
On the 1->0 case, it is ok to return from tdx_pamt_put() when the DPAMT
is not actually freed yet, so the basic approach works. Just decrement
the refcount before taking the lock, and only do the lock and removal
of the PAMT when the refcount goes to zero.

There is an asymmetry between tdx_pamt_get() and tdx_pamt_put(), in
that tdx_pamt_put() goes 1->0 outside the lock, but tdx_pamt_get() does
0->1 inside the lock. Because of this, there is a special race where
tdx_pamt_put() could decrement the refcount to zero before the PAMT is
actually removed, and tdx_pamt_get() could try to do a PAMT.ADD when
the page is already mapped. Luckily the TDX module returns a special
error that tells us we hit this case, so handle it specially by looking
for the error code.

The optimization is a little special, so make the code extra commented
and verbose.

Signed-off-by: Kirill A. Shutemov
[Clean up code, update log]
Signed-off-by: Rick Edgecombe
Tested-by: Sagi Shahar
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/shared/tdx_errno.h |  2 +
 arch/x86/virt/vmx/tdx/tdx.c             | 69 +++++++++++++++++++------
 2 files changed, 54 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/shared/tdx_errno.h b/arch/x86/include/asm/shared/tdx_errno.h
index e302aed31b50..acf7197527da 100644
--- a/arch/x86/include/asm/shared/tdx_errno.h
+++ b/arch/x86/include/asm/shared/tdx_errno.h
@@ -21,6 +21,7 @@
 #define TDX_PREVIOUS_TLB_EPOCH_BUSY		0x8000020100000000ULL
 #define TDX_RND_NO_ENTROPY			0x8000020300000000ULL
 #define TDX_PAGE_METADATA_INCORRECT		0xC000030000000000ULL
+#define TDX_HPA_RANGE_NOT_FREE			0xC000030400000000ULL
 #define TDX_VCPU_NOT_ASSOCIATED			0x8000070200000000ULL
 #define TDX_KEY_GENERATION_FAILED		0x8000080000000000ULL
 #define TDX_KEY_STATE_INCORRECT			0xC000081100000000ULL
@@ -94,6 +95,7 @@ DEFINE_TDX_ERRNO_HELPER(TDX_SUCCESS);
 DEFINE_TDX_ERRNO_HELPER(TDX_RND_NO_ENTROPY);
 DEFINE_TDX_ERRNO_HELPER(TDX_OPERAND_INVALID);
 DEFINE_TDX_ERRNO_HELPER(TDX_OPERAND_BUSY);
+DEFINE_TDX_ERRNO_HELPER(TDX_HPA_RANGE_NOT_FREE);
 DEFINE_TDX_ERRNO_HELPER(TDX_VCPU_NOT_ASSOCIATED);
 DEFINE_TDX_ERRNO_HELPER(TDX_FLUSHVP_NOT_DONE);
 DEFINE_TDX_ERRNO_HELPER(TDX_SW_ERROR);
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 682c8a228b53..d333d2790913 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -2161,16 +2161,23 @@ static int tdx_pamt_get(struct page *page)
 	if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
 		return 0;
 
+	pamt_refcount = tdx_find_pamt_refcount(page_to_pfn(page));
+
+	/*
+	 * If the pamt page is already added (i.e. refcount >= 1),
+	 * then just increment the refcount.
+	 */
+	if (atomic_inc_not_zero(pamt_refcount))
+		return 0;
+
 	ret = alloc_pamt_array(pamt_pa_array);
 	if (ret)
 		goto out_free;
 
-	pamt_refcount = tdx_find_pamt_refcount(page_to_pfn(page));
-
 	scoped_guard(spinlock, &pamt_lock) {
 		/*
-		 * If the pamt page is already added (i.e. refcount >= 1),
-		 * then just increment the refcount.
+		 * Lost race to other tdx_pamt_add(). Other task has already allocated
+		 * PAMT memory for the HPA.
 		 */
 		if (atomic_read(pamt_refcount)) {
 			atomic_inc(pamt_refcount);
@@ -2179,12 +2186,30 @@ static int tdx_pamt_get(struct page *page)
 
 		/* Try to add the pamt page and take the refcount 0->1. */
 		tdx_status = tdh_phymem_pamt_add(page, pamt_pa_array);
-		if (WARN_ON_ONCE(!IS_TDX_SUCCESS(tdx_status))) {
+		if (IS_TDX_SUCCESS(tdx_status)) {
+			/*
+			 * The refcount is zero, and this locked path is the only way to
+			 * increase it from 0->1.
If the PAMT.ADD was successful, set it + * to 1 (obviously). + */ + atomic_set(pamt_refcount, 1); + } else if (IS_TDX_HPA_RANGE_NOT_FREE(tdx_status)) { + /* + * Less obviously, another CPU's call to tdx_pamt_put() could have + * decremented the refcount before entering its lock section. + * In this case, the PAMT is not actually removed yet. Luckily the + * TDX module reports this case, so increment the refcount + * 0->1 so that tdx_pamt_put() skips its pending PAMT.REMOVE. + * + * The call didn't need the pages though, so free them. + */ + atomic_set(pamt_refcount, 1); + goto out_free; + } else { + WARN_ON_ONCE(1); ret =3D -EIO; goto out_free; } - - atomic_inc(pamt_refcount); } =20 return 0; @@ -2213,15 +2238,21 @@ static void tdx_pamt_put(struct page *page) =20 pamt_refcount =3D tdx_find_pamt_refcount(page_to_pfn(page)); =20 + /* + * If there is more than one reference on the pamt page, + * don't remove it yet. Just decrement the refcount. + * + * Unlike the paired call in tdx_pamt_get(), decrement the refcount + * outside the lock even if it's the special 0<->1 transition. See + * special logic around HPA_RANGE_NOT_FREE in tdx_pamt_get(). + */ + if (!atomic_dec_and_test(pamt_refcount)) + return; + scoped_guard(spinlock, &pamt_lock) { - /* - * If the there are more than 1 references on the pamt page, - * don't remove it yet. Just decrement the refcount. - */ - if (atomic_read(pamt_refcount) > 1) { - atomic_dec(pamt_refcount); + /* Lost race with tdx_pamt_get(). */ + if (atomic_read(pamt_refcount)) return; - } =20 /* Try to remove the pamt page and take the refcount 1->0. */ tdx_status =3D tdh_phymem_pamt_remove(page, pamt_pa_array); @@ -2233,10 +2264,14 @@ static void tdx_pamt_put(struct page *page) * failure indicates a kernel bug, memory is being leaked, and * the dangling PAMT entry may cause future operations to fail. */ - if (WARN_ON_ONCE(!IS_TDX_SUCCESS(tdx_status))) + if (WARN_ON_ONCE(!IS_TDX_SUCCESS(tdx_status))) { + /* + * Since the refcount was optimistically decremented above + * outside the lock, revert it if there is a failure.
+ */ + atomic_inc(pamt_refcount); return; - - atomic_dec(pamt_refcount); + } } =20 /* --=20 2.53.0.rc1.217.geba53bf80e-goog From nobody Sat Feb 7 17:41:40 2026 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 17CA933AD8A for ; Thu, 29 Jan 2026 01:16:03 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.216.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649366; cv=none; b=p/FXpj+Syfj4eAXe0T+YKvdoz3f7RfrSikvAIzlmUdMyN+OaDawVlJUGISgHWoYBn4oydYMTnvCXlc3vKGsHpKyECOKpX60zEMynzRPqe2gePMmzyfo0Zr1zVOmntnEWlAZ9UXZVhWy9RlfitZ1Rz57r2owjTuhSO7JuOHQbJx0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649366; c=relaxed/simple; bh=AXxQxnhtnt5YWRVt+lGWA6sUB4Tt8qHFbJ97Bkuq2Ck=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=uiMuzDSPESNhue8gL0AKYnpBHXM1x8sd0hZUSuqOKHeBMnN8A5xt8xUnmbpvPdayakB+DcFMElMKOPai48N4j7MLwGqAqvnmlNKcJzH9WJdW3NxYUzEMOJgEZkvPld2ggSTQt3RDmq7FwEf34i9imG66ucNAzI9gqp3j8lcwbZQ= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=0XC4V+9H; arc=none smtp.client-ip=209.85.216.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="0XC4V+9H" Received: by mail-pj1-f73.google.com with SMTP id 98e67ed59e1d1-352de7a89e1so338203a91.1 for ; Wed, 28 Jan 2026 17:16:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1769649363; x=1770254163; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=CQTLYZXbjXhuegQycdxIvwBS6byU7gVEkkc0u846/P4=; b=0XC4V+9HgEwcFw83xyxzkzLqqU42+KuT1CemPHaI6AR9FH3kboNo4bWF2E+DnoI9qn YS4smVIMsi6FDL1UFhjFlcuDM431X9t0iQauv5CojKeovYTglVYa2iu/bS8CiE67QCpG g+XK5xAweSQTz9v8PkkLIUs1F+qmP0U3Xz+Q1vu2Jn6hg50HSYgavr9VeJoAZjok5Czy eaxocl3bfZQY0Jp6krzIG1ehPwnkWdRY71SMwiQ95b5eAB2oMnF9i+qRm5Qa4kBN7vg3 3e1M3ZL6Kjnr4j/soJfCj5zRTmigwIK1/td/Z5wg2tpb3MoqN//HXNlxpNkWluw4AeF6 W1MA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769649363; x=1770254163; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=CQTLYZXbjXhuegQycdxIvwBS6byU7gVEkkc0u846/P4=; b=tQi5ZCOu/++PTsye5nrD3+Q4XJUfPHr9LVW0GFMlsy2Zl3GmG0dNh4QBXhOZBbsdLy PjGuPmfIkYVVB1Y/1RKt72kjdSu84T79dad23mX6u228lA9FRgm8iSzw031wwM7fZNe5 WNpKNxoLZtEM+n+DvETSu4E6WMVWxB1q/sHY0loTrKB8QamgX8BKhXAOXU4tCGCRPT2Y BQXzA0iFE5wzkqWfPtp+7uAVGvXFxXY8pDd4O7QttrhC8ZVqwiE11+NF9OLLSTZYKPZK dJJG2N+uqWrcalHYqOUjyFNGglppKeigvJuAC9PSsqfVKO/vrdGvvNuYP8kQaGjQCTbD cKMQ== X-Gm-Message-State: AOJu0YyqYN9ryF0wFxSb/B28Sgvs8CODYJwvDcyeMH2RTQfinTZ1pJAB t4eXADzHDusEAZqxhjNAnzSNGgYZDMcSlEhmZLHLOmvGqIMZRY5h8XhUoFD6UinF5+Z5lU4sHiT 7i3UirA== X-Received: from pjbmy4.prod.google.com 
([2002:a17:90b:4c84:b0:353:31e9:fe44]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:51cb:b0:340:a5b2:c305 with SMTP id 98e67ed59e1d1-353feccf407mr5767501a91.2.1769649363134; Wed, 28 Jan 2026 17:16:03 -0800 (PST) Reply-To: Sean Christopherson Date: Wed, 28 Jan 2026 17:14:50 -0800 In-Reply-To: <20260129011517.3545883-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260129011517.3545883-1-seanjc@google.com> X-Mailer: git-send-email 2.53.0.rc1.217.geba53bf80e-goog Message-ID: <20260129011517.3545883-19-seanjc@google.com> Subject: [RFC PATCH v5 18/45] KVM: TDX: Allocate PAMT memory for TD and vCPU control structures From: Sean Christopherson To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, Kiryl Shutsemau , Sean Christopherson , Paolo Bonzini Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang , Rick Edgecombe , Yan Zhao , Vishal Annapurve , Ackerley Tng , Sagi Shahar , Binbin Wu , Xiaoyao Li , Isaku Yamahata Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kirill A. Shutemov TDX TD control structures are provided to the TDX module at 4KB page size and require PAMT backing. This means that for Dynamic PAMT they also need to have 4KB backings installed. Use the recently introduced TDX APIs for allocating/freeing control pages, which handle DPAMT maintenance, to allocate/free TD and vCPU pages for TDX guests. Signed-off-by: Kirill A. Shutemov [update log] Signed-off-by: Rick Edgecombe [sean: handle alloc+free+reclaim in one patch] Signed-off-by: Sean Christopherson --- arch/x86/kvm/vmx/tdx.c | 35 ++++++++++++++--------------------- 1 file changed, 14 insertions(+), 21 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 4ef414ee27b4..323aae4300a1 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -360,7 +360,7 @@ static void tdx_reclaim_control_page(struct page *ctrl_= page) if (tdx_reclaim_page(ctrl_page)) return; =20 - __free_page(ctrl_page); + __tdx_free_control_page(ctrl_page); } =20 struct tdx_flush_vp_arg { @@ -597,7 +597,7 @@ static void tdx_reclaim_td_control_pages(struct kvm *kv= m) =20 tdx_quirk_reset_page(kvm_tdx->td.tdr_page); =20 - __free_page(kvm_tdx->td.tdr_page); + __tdx_free_control_page(kvm_tdx->td.tdr_page); kvm_tdx->td.tdr_page =3D NULL; } =20 @@ -2412,7 +2412,7 @@ static int __tdx_td_init(struct kvm *kvm, struct td_p= arams *td_params, =20 atomic_inc(&nr_configured_hkid); =20 - tdr_page =3D alloc_page(GFP_KERNEL_ACCOUNT); + tdr_page =3D __tdx_alloc_control_page(GFP_KERNEL_ACCOUNT); if (!tdr_page) goto free_hkid; =20 @@ -2425,7 +2425,7 @@ static int __tdx_td_init(struct kvm *kvm, struct td_p= arams *td_params, goto free_tdr; =20 for (i =3D 0; i < kvm_tdx->td.tdcs_nr_pages; i++) { - tdcs_pages[i] =3D alloc_page(GFP_KERNEL_ACCOUNT); + tdcs_pages[i] =3D __tdx_alloc_control_page(GFP_KERNEL_ACCOUNT); if (!tdcs_pages[i]) goto free_tdcs; } @@ -2543,10 +2543,8 @@ static int __tdx_td_init(struct kvm *kvm, struct td_= params *td_params, teardown: /* Only free pages not yet added, so start at 'i' */ for (; i < kvm_tdx->td.tdcs_nr_pages; i++) { - if (tdcs_pages[i]) { - __free_page(tdcs_pages[i]); - tdcs_pages[i] =3D NULL; - } + __tdx_free_control_page(tdcs_pages[i]); + tdcs_pages[i] =3D NULL; } if (!kvm_tdx->td.tdcs_pages) kfree(tdcs_pages); @@ -2561,16 +2559,13 @@ static int __tdx_td_init(struct kvm
*kvm, struct td= _params *td_params, free_cpumask_var(packages); =20 free_tdcs: - for (i =3D 0; i < kvm_tdx->td.tdcs_nr_pages; i++) { - if (tdcs_pages[i]) - __free_page(tdcs_pages[i]); - } + for (i =3D 0; i < kvm_tdx->td.tdcs_nr_pages; i++) + __tdx_free_control_page(tdcs_pages[i]); kfree(tdcs_pages); kvm_tdx->td.tdcs_pages =3D NULL; =20 free_tdr: - if (tdr_page) - __free_page(tdr_page); + __tdx_free_control_page(tdr_page); kvm_tdx->td.tdr_page =3D NULL; =20 free_hkid: @@ -2900,7 +2895,7 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u6= 4 vcpu_rcx) int ret, i; u64 err; =20 - page =3D alloc_page(GFP_KERNEL_ACCOUNT); + page =3D __tdx_alloc_control_page(GFP_KERNEL_ACCOUNT); if (!page) return -ENOMEM; tdx->vp.tdvpr_page =3D page; @@ -2920,7 +2915,7 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u6= 4 vcpu_rcx) } =20 for (i =3D 0; i < kvm_tdx->td.tdcx_nr_pages; i++) { - page =3D alloc_page(GFP_KERNEL_ACCOUNT); + page =3D __tdx_alloc_control_page(GFP_KERNEL_ACCOUNT); if (!page) { ret =3D -ENOMEM; goto free_tdcx; @@ -2942,7 +2937,7 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u6= 4 vcpu_rcx) * method, but the rest are freed here. */ for (; i < kvm_tdx->td.tdcx_nr_pages; i++) { - __free_page(tdx->vp.tdcx_pages[i]); + __tdx_free_control_page(tdx->vp.tdcx_pages[i]); tdx->vp.tdcx_pages[i] =3D NULL; } return -EIO; @@ -2970,16 +2965,14 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, = u64 vcpu_rcx) =20 free_tdcx: for (i =3D 0; i < kvm_tdx->td.tdcx_nr_pages; i++) { - if (tdx->vp.tdcx_pages[i]) - __free_page(tdx->vp.tdcx_pages[i]); + __tdx_free_control_page(tdx->vp.tdcx_pages[i]); tdx->vp.tdcx_pages[i] =3D NULL; } kfree(tdx->vp.tdcx_pages); tdx->vp.tdcx_pages =3D NULL; =20 free_tdvpr: - if (tdx->vp.tdvpr_page) - __free_page(tdx->vp.tdvpr_page); + __tdx_free_control_page(tdx->vp.tdvpr_page); tdx->vp.tdvpr_page =3D NULL; tdx->vp.tdvpr_pa =3D 0; =20 --=20 2.53.0.rc1.217.geba53bf80e-goog From nobody Sat Feb 7 17:41:40 2026 Received: from mail-pf1-f201.google.com (mail-pf1-f201.google.com [209.85.210.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id BB75C33B6DB for ; Thu, 29 Jan 2026 01:16:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.210.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649367; cv=none; b=cRmQZ4YJsZxNG4UctkluYSQZIRyIA7x/MjJFzYiKOzBzPURBMMFEow8kPV/a7RMkTRU5yw1UoiYwKu11Da8KhPLxeOIJJxaOjuqoh3a5QEplrgRRmIPEtXLTvEfhUVX6NWnpcwY/WNdWHT+ZoWX49A9RTb11ZEjG6o4B3VI1gck= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649367; c=relaxed/simple; bh=D1RcyFmhh8+6broUyuRXqMJfbz3G6P8KH0L0jZI3CCE=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=n9jTPdhZpSqB7sCfmnlKXFTRuaPO+xd1nxdM6DdeJe1DE8QKvzDsWmcknnFLR/w5Y7c+8QbCazjzshox5ety8R+D6fd3pekJ/1Zkot7+E2SBv05jlNOGt3SJQqsSFIi1FxOugGr7JL/jxh7NUoBy4amYq7B2RzmwCHsdr8tItEM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=yRs+JtwI; arc=none smtp.client-ip=209.85.210.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com 
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="yRs+JtwI" Received: by mail-pf1-f201.google.com with SMTP id d2e1a72fcca58-81c68fef4d4so1144236b3a.2 for ; Wed, 28 Jan 2026 17:16:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1769649365; x=1770254165; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=Ex0+d/EF8QHeKOGyYN3KDqeLevHaBCQA5Sl9ZATFjF0=; b=yRs+JtwI9bAzB0ajJ90k5KMreMdzCCSe2hKvtVkwILGNrW19hediRw2PIVqdIobvd8 5dEqWeyVZNBIYuh2k0+op0ZZ8MWgFSOH70HBuH201QbCfKvR3ENdjvkOB2FIPB67xaUh U9e1WfGKXWo7OnOZlYJ//CyjH6olSaF8oNqo4WrstBs+09vU4Lu3AyvqP/v8kW5bh7Yd xl/fW5gE0j28AjdoTKv7i8ogpiQgMpvocWQT6vYOGWbE4VqXvJ8czDoSNgbE2CTuVDmS 6bDfQ4LD+jFMqOjYxwyT4+RtEKVRMOXGbcXzAFJ2sWtJ6ALs8NmXdeWMtEJXjWhVSitj XlKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769649365; x=1770254165; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Ex0+d/EF8QHeKOGyYN3KDqeLevHaBCQA5Sl9ZATFjF0=; b=XIQwhNFwpKSgiJlyKc7UymTck9YFnVmDUplGpPDRWLnGpCiiDm1XHPuBH7BQC2E/gE ul5p2cjtICAtfmqLSzhEvovDtUIxfp2+KGpqMWtfGYCfMfGT0KcsVuiLNLorpVoJiPZy jJLEwerOU7DY2bAH/A1lvB9PJeC+NFukWW5UabY5YXtOS80xua8erjvvyNYXmu1mOi1Q o9z2ERGYDrMV7fGMS2bFyVGl8BjVl7HxTKF1Iv7KPPwE99Um2tH16tFvr0VEPC9b8hrJ aM+dIiE7b1qG6PuRrs4Yi9NUglveeanR210xJsGEHPe/dZE7rMhqRJ8ZKfA5Hc0ixM4U nfQg== X-Gm-Message-State: AOJu0YweCVB46HiLYUmhm/gPwED+t/bTjhgymBQ9jWmQfA3IxOHJYNLW 58vtGl5S0OgUCKXqHosZgdehjiHpj6dk3P4a9eTgsj6VOlh5nL+SwSZ/9vnW7VLK7uetZJ+ZLKI ta966Yg== X-Received: from pfbif13.prod.google.com ([2002:a05:6a00:8b0d:b0:76b:f0d4:ac71]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a00:2d82:b0:821:80bf:590b with SMTP id d2e1a72fcca58-823692a3747mr6686129b3a.34.1769649365031; Wed, 28 Jan 2026 17:16:05 -0800 (PST) Reply-To: Sean Christopherson Date: Wed, 28 Jan 2026 17:14:51 -0800 In-Reply-To: <20260129011517.3545883-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260129011517.3545883-1-seanjc@google.com> X-Mailer: git-send-email 2.53.0.rc1.217.geba53bf80e-goog Message-ID: <20260129011517.3545883-20-seanjc@google.com> Subject: [RFC PATCH v5 19/45] KVM: Allow owner of kvm_mmu_memory_cache to provide a custom page allocator From: Sean Christopherson To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, Kiryl Shutsemau , Sean Christopherson , Paolo Bonzini Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang , Rick Edgecombe , Yan Zhao , Vishal Annapurve , Ackerley Tng , Sagi Shahar , Binbin Wu , Xiaoyao Li , Isaku Yamahata Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Extend "struct kvm_mmu_memory_cache" to support a custom page allocator so that x86's TDX can update per-page metadata on allocation and free(). Name the allocator page_get() to align with __get_free_page(), e.g. to communicate that it returns an "unsigned long", not a "struct page", and to avoid collisions with macros, e.g. with alloc_page. 
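As a rough usage sketch (hypothetical owner code, not part of this
patch), an owner opts in by setting the new hooks; when page_get and
page_free are left NULL, the cache falls back to __get_free_page() and
free_page():

	/* Hypothetical wrapper that could do per-page metadata work. */
	static unsigned long my_page_get(gfp_t gfp)
	{
		return __get_free_page(gfp);
	}

	static void my_page_free(unsigned long addr)
	{
		free_page(addr);
	}

	static void my_cache_init(struct kvm_mmu_memory_cache *mc)
	{
		mc->page_get = my_page_get;
		mc->page_free = my_page_free;
	}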
Suggested-by: Kai Huang Signed-off-by: Sean Christopherson Reviewed-by: Kai Huang --- include/linux/kvm_types.h | 2 ++ virt/kvm/kvm_main.c | 7 ++++++- 2 files changed, 8 insertions(+), 1 deletion(-) diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index a568d8e6f4e8..87fa9deffdb7 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -112,6 +112,8 @@ struct kvm_mmu_memory_cache { gfp_t gfp_custom; u64 init_value; struct kmem_cache *kmem_cache; + unsigned long (*page_get)(gfp_t gfp); + void (*page_free)(unsigned long addr); int capacity; int nobjs; void **objects; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 571cf0d6ec01..7015edce5bd8 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -356,7 +356,10 @@ static inline void *mmu_memory_cache_alloc_obj(struct = kvm_mmu_memory_cache *mc, if (mc->kmem_cache) return kmem_cache_alloc(mc->kmem_cache, gfp_flags); =20 - page =3D (void *)__get_free_page(gfp_flags); + if (mc->page_get) + page =3D (void *)mc->page_get(gfp_flags); + else + page =3D (void *)__get_free_page(gfp_flags); if (page && mc->init_value) memset64(page, mc->init_value, PAGE_SIZE / sizeof(u64)); return page; @@ -416,6 +419,8 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_ca= che *mc) while (mc->nobjs) { if (mc->kmem_cache) kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]); + else if (mc->page_free) + mc->page_free((unsigned long)mc->objects[--mc->nobjs]); else free_page((unsigned long)mc->objects[--mc->nobjs]); } --=20 2.53.0.rc1.217.geba53bf80e-goog From nobody Sat Feb 7 17:41:40 2026 Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 61FBC313E39 for ; Thu, 29 Jan 2026 01:16:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649369; cv=none; b=Xf1gOe896Clnfa7Wb1FFiE95pW7TkGE6OrEEXiiCRxP+Kuo+UtBeFy8s4q5aWq85YvlPpDh0wztiyswL/6pUq865a4HL+EGKmoQTiKJXp4r9x84q3+5uEREGWMCq6TU8U5CIrrMmmuC6cDqqml97Trw7hnpR7S8lrcAth6xtDhI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649369; c=relaxed/simple; bh=TBS3NOuTU/+p9r5oAffOOIjvCTELnp6C8dm3U+EJwg4=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=qZvS4NMinLISuanwtlugAQKe8P0gVm+qC44ijmQM4pJSvlrl5KvZcF19aMrH1kPNuVa8iZaCYwqsUczUE3p9MOaYDZ+E6c1edGGqSRG86evdFx9rh222NDW8oMPGtxYHqmGkIVlUguj8XV7NftmTsBJGWhQReVI7ONrYnGT2SZc= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=03AoZWhu; arc=none smtp.client-ip=209.85.215.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="03AoZWhu" Received: by mail-pg1-f201.google.com with SMTP id 41be03b00d2f7-c552d1f9eafso748013a12.0 for ; Wed, 28 Jan 2026 17:16:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1769649367; 
x=1770254167; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=8HnFAQrxQ92neYQA/fX8/fYTn2bGdwSiVHNlc5IdWqw=; b=03AoZWhuYRB0gFY+FBCitsqfcVgW10jhnpRK8GHP9Y9WUn75EaoT6q2+3viPq6MQwd q0FKavLqd8u4WAOY5jrPXOJK+U2K1qs2fpQGXXg+A1sOuCGFALsCmxHGpjPG1rKPtrGF iZPbdhZ6wwlOp+BEMuZvgGvbj41t2IoNqa2v6mwQl/LqAOpAYROxkRJS7rG36XHp4nI5 ifjl13e6tuyF1Q3G6yhffzYFLmv6kw8YxTrG1C8y59ti06zmXFRHNBZryfeT2f652RYV +5LpvEXlPA2I97EKE/dM75pJFrGm9DW2taPuYm34j13t9tNeQaBgGdzJvlPoWlf2j0wr VLOQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769649367; x=1770254167; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=8HnFAQrxQ92neYQA/fX8/fYTn2bGdwSiVHNlc5IdWqw=; b=FsdI/bdt5ObZ9fGl5cLb2dTj4ItH2sM3ioRuVtBhs5501r7V9Qpm+Z424cCwidBsK9 +yXKHinuPEvCa0jAZQ9wtw9iFjaVE7fEzpKcZJeFG3NoafT9Zr6AfmAZwyUiRcOzaEW/ vkmm8UQG42A6u9Llak8RA0fPuJ3rrd38qN6rzqJ1s6IINQKiWbUX1pMDKXQuvvdFDqvy moLIOxnLODSUNCobZlIuxawzi0Nr46PVwiUjj3iRIRT2+oApLij7sx9rViabBB3R9eWL ulBVhZ+drvBeUwb0H/NU7M8gMZL0Yo0DM7i1nc658xg0vlJMkHiS8CkFpB2Js28GqNkR 9gaQ== X-Gm-Message-State: AOJu0Yz64k+BfyUi1PzI5wr1n3QPBeUEQuql5sDH+nsWn4Q1WC7odcdK 1EyPVCvmdoUBKBBiCDGW/mB9uzKM+dmOOndJ/7z4fa1qnx4nH+cIvf4EjQrscHlW8mrrG1aKmVu L2Zw35w== X-Received: from pgo32.prod.google.com ([2002:a63:e60:0:b0:c65:be00:c5c9]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a21:6f09:b0:38e:9acd:97d0 with SMTP id adf61e73a8af0-38ec632f1cbmr6748652637.34.1769649366694; Wed, 28 Jan 2026 17:16:06 -0800 (PST) Reply-To: Sean Christopherson Date: Wed, 28 Jan 2026 17:14:52 -0800 In-Reply-To: <20260129011517.3545883-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260129011517.3545883-1-seanjc@google.com> X-Mailer: git-send-email 2.53.0.rc1.217.geba53bf80e-goog Message-ID: <20260129011517.3545883-21-seanjc@google.com> Subject: [RFC PATCH v5 20/45] KVM: x86/mmu: Allocate/free S-EPT pages using tdx_{alloc,free}_control_page() From: Sean Christopherson To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, Kiryl Shutsemau , Sean Christopherson , Paolo Bonzini Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang , Rick Edgecombe , Yan Zhao , Vishal Annapurve , Ackerley Tng , Sagi Shahar , Binbin Wu , Xiaoyao Li , Isaku Yamahata Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Now that kvm_mmu_memory_cache supports custom page allocators, wire up the S-EPT cache to use tdx_{alloc,free}_control_page() (arguably S-EPT pages aren't "control" pages, but they're not guest pages either). Using the TDX APIs will make S-EPT pages naturally play nice with Dynamic PAMT, by virtue of adding/removing PAMT entries when S-EPT pages are allocated and freed, as opposed to when they are added/removed from the S-EPT tree. Inserting PAMT entries on allocation does mean KVM will create unnecessary PAMT entries, e.g. once a vCPU stops faulting in memory, the remaining pages in the MMU cache will go unused. But in practice, odds are very good the containing 2MiB page will have other in-use S-EPT pages, i.e. will create PAMT entries anyway.
And _if_ creating PAMT entries on allocation is problematic for memory consumption, that can be resolved by tweaking KVM's cache size. Suggested-by: Kai Huang Signed-off-by: Sean Christopherson Reviewed-by: Kai Huang --- arch/x86/include/asm/kvm-x86-ops.h | 2 ++ arch/x86/include/asm/kvm_host.h | 18 +++++++++--------- arch/x86/kvm/mmu/mmu.c | 6 ++++-- arch/x86/kvm/mmu/mmu_internal.h | 11 ----------- arch/x86/kvm/mmu/tdp_mmu.c | 5 +++-- arch/x86/kvm/vmx/tdx.c | 13 ++++++++++++- 6 files changed, 30 insertions(+), 25 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index c17cedc485c9..17dddada69fc 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -94,6 +94,8 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr) KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr) KVM_X86_OP_OPTIONAL_RET0(get_mt_mask) KVM_X86_OP(load_mmu_pgd) +KVM_X86_OP_OPTIONAL(alloc_external_sp) +KVM_X86_OP_OPTIONAL(free_external_sp) KVM_X86_OP_OPTIONAL_RET0(set_external_spte) KVM_X86_OP_OPTIONAL(remove_external_spte) KVM_X86_OP_OPTIONAL(reclaim_external_sp) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index b35a07ed11fb..6e84dbc89e79 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -867,10 +867,7 @@ struct kvm_vcpu_arch { struct kvm_mmu_memory_cache mmu_shadow_page_cache; struct kvm_mmu_memory_cache mmu_shadowed_info_cache; struct kvm_mmu_memory_cache mmu_page_header_cache; - /* - * This cache is to allocate external page table. E.g. private EPT used - * by the TDX module. - */ + /* Used to allocate S-EPT pages (gifted to the TDX-Module). */ struct kvm_mmu_memory_cache mmu_external_spt_cache; =20 /* @@ -1853,18 +1850,21 @@ struct kvm_x86_ops { void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); =20 - /* Update the external page table from spte getting set. */ + /* + * Callbacks to allocate and free external page tables, a.k.a. S-EPT, + * and to propagate changes in mirror page tables to the external page + * tables. + */ + unsigned long (*alloc_external_sp)(gfp_t gfp); + void (*free_external_sp)(unsigned long addr); int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level, u64 mirror_spte); - - /* Update external page tables for page table about to be freed. */ void (*reclaim_external_sp)(struct kvm *kvm, gfn_t gfn, struct kvm_mmu_page *sp); - - /* Update external page table from spte getting removed, and flush TLB. 
*/ void (*remove_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level le= vel, u64 mirror_spte); =20 + bool (*has_wbinvd_exit)(void); =20 u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 3911ac9bddfd..9b5a6861e2a4 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6690,11 +6690,13 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) vcpu->arch.mmu_page_header_cache.kmem_cache =3D mmu_page_header_cache; vcpu->arch.mmu_page_header_cache.gfp_zero =3D __GFP_ZERO; =20 - vcpu->arch.mmu_shadow_page_cache.init_value =3D - SHADOW_NONPRESENT_VALUE; + vcpu->arch.mmu_shadow_page_cache.init_value =3D SHADOW_NONPRESENT_VALUE; if (!vcpu->arch.mmu_shadow_page_cache.init_value) vcpu->arch.mmu_shadow_page_cache.gfp_zero =3D __GFP_ZERO; =20 + vcpu->arch.mmu_external_spt_cache.page_get =3D kvm_x86_ops.alloc_external= _sp; + vcpu->arch.mmu_external_spt_cache.page_free =3D kvm_x86_ops.free_external= _sp; + vcpu->arch.mmu =3D &vcpu->arch.root_mmu; vcpu->arch.walk_mmu =3D &vcpu->arch.root_mmu; =20 diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index 73cdcbccc89e..6bb97f660793 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -157,17 +157,6 @@ static inline bool is_mirror_sp(const struct kvm_mmu_p= age *sp) return sp->role.is_mirror; } =20 -static inline void kvm_mmu_alloc_external_spt(struct kvm_vcpu *vcpu, struc= t kvm_mmu_page *sp) -{ - /* - * external_spt is allocated for TDX module to hold private EPT mappings, - * TDX module will initialize the page by itself. - * Therefore, KVM does not need to initialize or access external_spt. - * KVM only interacts with sp->spt for private EPT operations. - */ - sp->external_spt =3D kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_external_= spt_cache); -} - static inline gfn_t kvm_gfn_root_bits(const struct kvm *kvm, const struct = kvm_mmu_page *root) { /* diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 18764dbc97ea..01e3e4f4baa5 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -55,7 +55,8 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) =20 static void tdp_mmu_free_sp(struct kvm_mmu_page *sp) { - free_page((unsigned long)sp->external_spt); + if (sp->external_spt) + kvm_x86_call(free_external_sp)((unsigned long)sp->external_spt); free_page((unsigned long)sp->spt); kmem_cache_free(mmu_page_header_cache, sp); } @@ -1246,7 +1247,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm= _page_fault *fault) sp =3D tdp_mmu_alloc_sp(vcpu); tdp_mmu_init_child_sp(sp, &iter); if (is_mirror_sp(sp)) - kvm_mmu_alloc_external_spt(vcpu, sp); + sp->external_spt =3D kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_externa= l_spt_cache); =20 sp->nx_huge_page_disallowed =3D fault->huge_page_disallowed; =20 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 323aae4300a1..0946eba2de23 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1790,7 +1790,9 @@ static void tdx_sept_reclaim_private_sp(struct kvm *k= vm, gfn_t gfn, * TD's hkid is freed, when the TD is being torn down. * * If the S-EPT PTE can't be removed for any reason, intentionally leak - * the page to prevent the kernel from accessing the encrypted page. + * the page to prevent the kernel from accessing the encrypted page, + * and if Dynamic PAMT is enabled, to avoid inducing a failure on + * removal of the still-used PAMT entry. 
*/ if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) || tdx_reclaim_page(virt_to_page(sp->external_spt))) @@ -3600,6 +3602,15 @@ void __init tdx_hardware_setup(void) */ vt_x86_ops.vm_size =3D max_t(unsigned int, vt_x86_ops.vm_size, sizeof(str= uct kvm_tdx)); =20 + /* + * TDX uses the external_spt cache to allocate S-EPT page table pages, + * which (a) don't need to be initialized by KVM as the TDX-Module will + * initialize the page (using the guest's encryption key), and (b) need + * to use a custom allocator to be compatible with Dynamic PAMT. + */ + vt_x86_ops.alloc_external_sp =3D tdx_alloc_control_page; + vt_x86_ops.free_external_sp =3D tdx_free_control_page; + vt_x86_ops.set_external_spte =3D tdx_sept_set_private_spte; vt_x86_ops.reclaim_external_sp =3D tdx_sept_reclaim_private_sp; vt_x86_ops.remove_external_spte =3D tdx_sept_remove_private_spte; --=20 2.53.0.rc1.217.geba53bf80e-goog From nobody Sat Feb 7 17:41:40 2026 Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 4864F345CA6 for ; Thu, 29 Jan 2026 01:16:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649372; cv=none; b=objcbFUrDD8X3ROCzw/uxFKcoomwPbHeXN2FITtN+OI1X9jS9RaxI/mwRcacEZ1Vy3NaYj6rESJmGHfpo5+p4vYPy+imLx2Li8PEK1Rgjt/wlo+OnOtqlc4ACSCRHrshdTC7cbz5C63hH2stf5T/EpK4lEVb9byg4ACltLY1meg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649372; c=relaxed/simple; bh=xyU67iIG8B0z1lftPU8r+DkLP9n8GV25IDyjReoJtDA=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=hZnVv4ntneadZBfYyRhKlnsriitADqYIn+PYGSGIjn2xC9tLLt+sE2SG+2YV2y8tL99kKJD2dZFYxFLBJThcIcHBTdOTdsPkmFEUc8llQk9YIWUOczDjPiW5STjacG9Rc/RqnofmK7H51UKmvH9PnFETrhb3zTQsHqYc88x40us= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=FxffVin7; arc=none smtp.client-ip=209.85.215.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="FxffVin7" Received: by mail-pg1-f201.google.com with SMTP id 41be03b00d2f7-b62da7602a0so305549a12.2 for ; Wed, 28 Jan 2026 17:16:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1769649369; x=1770254169; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=83UiBV3UrNTMNXOexHb3DCeTDDgbi4QS7S/0ehY+L8M=; b=FxffVin7zPw86OwPlBQrcaRWJCcsyS0f6OOjPg6ViCZJtzbA8WgAMjAckH98lMCivH Mf8ljmvmuFADbVRsgZ5kItBQZakcqJh8enANA7rGCqvXt/TwSWLvqwNY3pjPdPGSyOW7 srq4QBFXSvXxM6pSGJVke7WiKAJ/OP7na5hNFQiIDZ+KKcHeqC/vvFtB44djsR1i+XgG VvkSfik4OPiXiFNxsmTKPCguhSOxzENx/j7l/V0VYLl8EgdX734hilkKfIQ7WwL9a0CP JsfaIKE0MD+cHqQKEJGNq5HnSN2RB2c6cBiYjFbReNuYaV4U4tLsq/ELNbb3+zcr9mmL j/Bg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; 
t=1769649369; x=1770254169; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=83UiBV3UrNTMNXOexHb3DCeTDDgbi4QS7S/0ehY+L8M=; b=Y0GISsRXIvctciiDTu3+SSy8KDf0ggiDnbKt6RAb8ziEZgUZuONpaUPZlBvWRHX8UU vbI1i/n0H4lsfcyxNEFQV3V+JFcjxp/2tpO2OWm3Q/2QXkekvIBE6vH3wn8UrUXPdHWL vzUSYPYvHu8YT6nsTTLj064kELfJv6nYwJOjxGbMRsML1nObOt+yOieQWqlW+Abyrapr 6NJIkCbwrNEcU6gP2WsyjrN+j6KLuwlSUibrq/kWCJwSVN0gNtW46Wgqk0OSAlGuUsVR 4iuQyQgqAG4C4q1RDkNwpEdUrnj1KIa5oKPzWMyJiE0jiOLxG7OOQkVj8XttC2JciKKG hE3w== X-Gm-Message-State: AOJu0Yy/aqEv0BF4b+y6zl3GRnRzszR6qBJL171h4wjWTFTdP+RXvwvp FwnTsRF65KyAMpN/kKPz6EBXVyrul0ngtt+LCYM4MrJCByUvFIMmC6xJyFJu7rbY6tVjqKPZRC2 qpmpUxQ== X-Received: from pjbsy16.prod.google.com ([2002:a17:90b:2d10:b0:34c:2124:a2b0]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a21:820e:b0:38d:fa67:e87f with SMTP id adf61e73a8af0-38ec627b9f1mr7469147637.12.1769649368609; Wed, 28 Jan 2026 17:16:08 -0800 (PST) Reply-To: Sean Christopherson Date: Wed, 28 Jan 2026 17:14:53 -0800 In-Reply-To: <20260129011517.3545883-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260129011517.3545883-1-seanjc@google.com> X-Mailer: git-send-email 2.53.0.rc1.217.geba53bf80e-goog Message-ID: <20260129011517.3545883-22-seanjc@google.com> Subject: [RFC PATCH v5 21/45] x86/tdx: Add APIs to support get/put of DPAMT entries from KVM, under spinlock From: Sean Christopherson To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, Kiryl Shutsemau , Sean Christopherson , Paolo Bonzini Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang , Rick Edgecombe , Yan Zhao , Vishal Annapurve , Ackerley Tng , Sagi Shahar , Binbin Wu , Xiaoyao Li , Isaku Yamahata Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Rick Edgecombe Implement a PAMT "caching" scheme, similar to KVM's pre-allocated cache of MMU assets, along with APIs to allow KVM to pre-allocate PAMT pages before acquiring its mmu_lock spinlock, but wait until S-EPT entries are created to actually update the Dynamic PAMT. Signed-off-by: Rick Edgecombe Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/x86/include/asm/tdx.h | 17 ++++++++++ arch/x86/virt/vmx/tdx/tdx.c | 65 +++++++++++++++++++++++++++++++++---- 2 files changed, 76 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index fa29be18498c..c39e2920d0c3 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -136,6 +136,23 @@ static inline bool tdx_supports_dynamic_pamt(const str= uct tdx_sys_info *sysinfo) return false; /* To be enabled when kernel is ready */ } =20 +/* Simple structure for pre-allocating Dynamic PAMT pages outside of locks= . 
*/ +struct tdx_pamt_cache { + struct list_head page_list; + int cnt; +}; + +static inline void tdx_init_pamt_cache(struct tdx_pamt_cache *cache) +{ + INIT_LIST_HEAD(&cache->page_list); + cache->cnt =3D 0; +} + +void tdx_free_pamt_cache(struct tdx_pamt_cache *cache); +int tdx_topup_pamt_cache(struct tdx_pamt_cache *cache, unsigned long npage= s); +int tdx_pamt_get(struct page *page, struct tdx_pamt_cache *cache); +void tdx_pamt_put(struct page *page); + void tdx_quirk_reset_page(struct page *page); =20 int tdx_guest_keyid_alloc(void); diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index d333d2790913..53b29c827520 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -2064,13 +2064,34 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct pa= ge *page) } EXPORT_SYMBOL_FOR_KVM(tdh_phymem_page_wbinvd_hkid); =20 -static int alloc_pamt_array(u64 *pa_array) +static struct page *tdx_alloc_page_pamt_cache(struct tdx_pamt_cache *cache) +{ + struct page *page; + + page =3D list_first_entry_or_null(&cache->page_list, struct page, lru); + if (page) { + list_del(&page->lru); + cache->cnt--; + } + + return page; +} + +static struct page *alloc_dpamt_page(struct tdx_pamt_cache *cache) +{ + if (cache) + return tdx_alloc_page_pamt_cache(cache); + + return alloc_page(GFP_KERNEL_ACCOUNT); +} + +static int alloc_pamt_array(u64 *pa_array, struct tdx_pamt_cache *cache) { struct page *page; int i; =20 for (i =3D 0; i < tdx_dpamt_entry_pages(); i++) { - page =3D alloc_page(GFP_KERNEL_ACCOUNT); + page =3D alloc_dpamt_page(cache); if (!page) goto err; pa_array[i] =3D page_to_phys(page); @@ -2151,7 +2172,7 @@ static u64 tdh_phymem_pamt_remove(struct page *page, = u64 *pamt_pa_array) static DEFINE_SPINLOCK(pamt_lock); =20 /* Bump PAMT refcount for the given page and allocate PAMT memory if neede= d */ -static int tdx_pamt_get(struct page *page) +int tdx_pamt_get(struct page *page, struct tdx_pamt_cache *cache) { u64 pamt_pa_array[MAX_NR_DPAMT_ARGS]; atomic_t *pamt_refcount; @@ -2170,7 +2191,7 @@ static int tdx_pamt_get(struct page *page) if (atomic_inc_not_zero(pamt_refcount)) return 0; =20 - ret =3D alloc_pamt_array(pamt_pa_array); + ret =3D alloc_pamt_array(pamt_pa_array, cache); if (ret) goto out_free; =20 @@ -2222,12 +2243,13 @@ static int tdx_pamt_get(struct page *page) free_pamt_array(pamt_pa_array); return ret; } +EXPORT_SYMBOL_FOR_KVM(tdx_pamt_get); =20 /* * Drop PAMT refcount for the given page and free PAMT memory if it is no * longer needed. 
*/ -static void tdx_pamt_put(struct page *page) +void tdx_pamt_put(struct page *page) { u64 pamt_pa_array[MAX_NR_DPAMT_ARGS]; atomic_t *pamt_refcount; @@ -2281,6 +2303,37 @@ static void tdx_pamt_put(struct page *page) */ free_pamt_array(pamt_pa_array); } +EXPORT_SYMBOL_FOR_KVM(tdx_pamt_put); + +void tdx_free_pamt_cache(struct tdx_pamt_cache *cache) +{ + struct page *page; + + while ((page =3D tdx_alloc_page_pamt_cache(cache))) + __free_page(page); +} +EXPORT_SYMBOL_FOR_KVM(tdx_free_pamt_cache); + +int tdx_topup_pamt_cache(struct tdx_pamt_cache *cache, unsigned long npage= s) +{ + if (WARN_ON_ONCE(!tdx_supports_dynamic_pamt(&tdx_sysinfo))) + return 0; + + npages *=3D tdx_dpamt_entry_pages(); + + while (cache->cnt < npages) { + struct page *page =3D alloc_page(GFP_KERNEL_ACCOUNT); + + if (!page) + return -ENOMEM; + + list_add(&page->lru, &cache->page_list); + cache->cnt++; + } + + return 0; +} +EXPORT_SYMBOL_FOR_KVM(tdx_topup_pamt_cache); =20 /* * Return a page that can be gifted to the TDX-Module for use as a "contro= l" @@ -2296,7 +2349,7 @@ struct page *__tdx_alloc_control_page(gfp_t gfp) if (!page) return NULL; =20 - if (tdx_pamt_get(page)) { + if (tdx_pamt_get(page, NULL)) { __free_page(page); return NULL; } --=20 2.53.0.rc1.217.geba53bf80e-goog From nobody Sat Feb 7 17:41:40 2026 Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3A36C3469E0 for ; Thu, 29 Jan 2026 01:16:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649372; cv=none; b=ODdRHMDwa3EdA4aBNFm+RZigjdmwspPVkI5q6mBFp9e5ZRZsWUiK+Me3kv4EUepdqActEN/iyBZs/XkmzHDmjJWFC3owCZgw15vNW7GBtgbmvTSfMDfFGn7AinkfUNHoVvwYBGG/sYsiP4zHKIIV9hcqLoUU1Hyh84sEaApGFkg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649372; c=relaxed/simple; bh=SQ5cW++nxEKu5GXoB9MoukK6xkStBoadO/lp4PIOccg=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=Mu1/newXAjVgoDSJMjpnJ4Ydp+Hc6m8R6mueJr+/4ISmjDGeJ/0Zyu7dqLLCVO3qMbwAPYhS1aWQe67dNOGFmg9MrZQjPkIajBivAC4niYbJ1fDT+pBO57oDFyQOo78qGzpJFkaNn+b+Xk+n0VDELjSi3IFFMyTWzm8ayYsYT6s= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=XRcJExwl; arc=none smtp.client-ip=209.85.215.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="XRcJExwl" Received: by mail-pg1-f201.google.com with SMTP id 41be03b00d2f7-c61334ba4d3so225442a12.0 for ; Wed, 28 Jan 2026 17:16:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1769649370; x=1770254170; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=n4V0wAly6JW/34a2CvfLdneki3p7jKUuadg7DAh0+IE=; b=XRcJExwlO6xDxQyKw0mwAAQIce+WXCXhMw5tl9g/1Fuec6bu0j3q0chLhMKAAw/mpE 
Akx+dP7DTe/kPJgm044yu2gCdku9oTs18BUPXVute6y09uLtsemjrwi7MDnsfBPD5hGy GIzw4Glp0sryeiHhZnAYbbcRIwRk59dSbnXFlI0yrhWwgB1kQdtyiLGW/oLiV28xqrXU iS0n/A39cEW/HohyuVsZEg0ZVdJtv3Mzsc68wxFi1BOz0qjWnRKnKKk/pwbxFhaKCbq3 tGDCI4cLw+CDHPlF9i5ZOyE4uKYwl3uFJAYFu3F5fou7wC20mE7ND7fgl37loLC5HeXX +6eg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769649370; x=1770254170; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=n4V0wAly6JW/34a2CvfLdneki3p7jKUuadg7DAh0+IE=; b=ByNcoXsnEv7QtSyndcoiJoCXd4UEzrH8jSCJdesUvp4ZwJMe+Ap8im92+7UOxlIZMJ xL1TtWGE72fu4nMa1Yl9ONPnZmzONLOLhRWXIKVZWzP/GdjV6hObcLzdA+pOjIdZiL5K 5J4JhuMR609cfwckYUvEU5KG34KQbv0PB4E/zKnzgXsXKNgvlpbEql5+eQAuc6RNZScw GazdqZDemY9tP3wC9qrFRGQTShHf5frH53GuV48dQ9cCMBsHagmT27dWBWpMtYIeXjLo RQodfHZ2/uPqKdY2mc/C25oOiqBnmBqnnZubFz/sXGybS27YCEZNmLmew8uds8dBUTLW xghg== X-Gm-Message-State: AOJu0YyGb8Sb+cJmZI+C4OokU0Ex10YKEAijzqa06ytCDRMUQvcg/KQb Q3kWUfvIiTiqf8x3udd3O3vjjqz6txD2AP90CdNmwNxWW9KDzI0Pl/3BErfQxKhlrN0IqiSRE/w 6puzVGw== X-Received: from pjbdy16.prod.google.com ([2002:a17:90b:6d0:b0:34c:2156:9de7]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:5784:b0:340:4abf:391d with SMTP id 98e67ed59e1d1-353fecf6b60mr6484797a91.16.1769649370472; Wed, 28 Jan 2026 17:16:10 -0800 (PST) Reply-To: Sean Christopherson Date: Wed, 28 Jan 2026 17:14:54 -0800 In-Reply-To: <20260129011517.3545883-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260129011517.3545883-1-seanjc@google.com> X-Mailer: git-send-email 2.53.0.rc1.217.geba53bf80e-goog Message-ID: <20260129011517.3545883-23-seanjc@google.com> Subject: [RFC PATCH v5 22/45] KVM: TDX: Get/put PAMT pages when (un)mapping private memory From: Sean Christopherson To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, Kiryl Shutsemau , Sean Christopherson , Paolo Bonzini Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang , Rick Edgecombe , Yan Zhao , Vishal Annapurve , Ackerley Tng , Sagi Shahar , Binbin Wu , Xiaoyao Li , Isaku Yamahata Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kirill A. Shutemov Add Dynamic PAMT support to KVM's S-EPT MMU by "getting" a PAMT page when adding guest memory (PAGE.ADD or PAGE.AUG), and "putting" the page when removing guest memory (PAGE.REMOVE). To access the per-vCPU PAMT caches without plumbing @vcpu throughout the TDP MMU, begrudgingly use kvm_get_running_vcpu() to get the vCPU, and bug the VM if KVM attempts to set an S-EPT entry without an active vCPU. KVM only supports creating _new_ mappings in page (pre)fault paths, all of which require an active vCPU. The PAMT memory holds metadata for TDX-protected memory. With Dynamic PAMT, PAMT_4K is allocated on demand. The kernel supplies the TDX module with a few pages that cover 2M of host physical memory. PAMT memory can be reclaimed when the last user is gone. This can happen in a few code paths: - On TDH.PHYMEM.PAGE.RECLAIM in tdx_reclaim_td_control_pages() and tdx_reclaim_page(). - On TDH.MEM.PAGE.REMOVE in tdx_sept_drop_private_spte(). - In tdx_sept_zap_private_spte() for pages that were in the queue to be added with TDH.MEM.PAGE.ADD, but it never happened due to an error. - In tdx_sept_free_private_spt() for S-EPT pages. Signed-off-by: Kirill A.
Shutemov [Minor log tweak] Signed-off-by: Rick Edgecombe Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson --- arch/x86/include/asm/kvm-x86-ops.h | 1 + arch/x86/include/asm/kvm_host.h | 1 + arch/x86/kvm/mmu/mmu.c | 4 +++ arch/x86/kvm/vmx/tdx.c | 44 ++++++++++++++++++++++++++---- arch/x86/kvm/vmx/tdx.h | 2 ++ 5 files changed, 47 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 17dddada69fc..394dc29483a7 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -99,6 +99,7 @@ KVM_X86_OP_OPTIONAL(free_external_sp) KVM_X86_OP_OPTIONAL_RET0(set_external_spte) KVM_X86_OP_OPTIONAL(remove_external_spte) KVM_X86_OP_OPTIONAL(reclaim_external_sp) +KVM_X86_OP_OPTIONAL_RET0(topup_external_cache) KVM_X86_OP(has_wbinvd_exit) KVM_X86_OP(get_l2_tsc_offset) KVM_X86_OP(get_l2_tsc_multiplier) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 6e84dbc89e79..a6e4ab76b1b2 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1863,6 +1863,7 @@ struct kvm_x86_ops { struct kvm_mmu_page *sp); void (*remove_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level le= vel, u64 mirror_spte); + int (*topup_external_cache)(struct kvm_vcpu *vcpu, int min); =20 =20 bool (*has_wbinvd_exit)(void); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 9b5a6861e2a4..4ecbf216d96f 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -605,6 +605,10 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vc= pu, bool maybe_indirect) PT64_ROOT_MAX_LEVEL); if (r) return r; + + r =3D kvm_x86_call(topup_external_cache)(vcpu, PT64_ROOT_MAX_LEVEL); + if (r) + return r; } r =3D kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache, PT64_ROOT_MAX_LEVEL); diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 0946eba2de23..d74a2547e512 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -683,6 +683,8 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu) if (!irqchip_split(vcpu->kvm)) return -EINVAL; =20 + tdx_init_pamt_cache(&tdx->pamt_cache); + fpstate_set_confidential(&vcpu->arch.guest_fpu); vcpu->arch.apic->guest_apic_protected =3D true; INIT_LIST_HEAD(&tdx->vt.pi_wakeup_list); @@ -868,6 +870,8 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu) struct vcpu_tdx *tdx =3D to_tdx(vcpu); int i; =20 + tdx_free_pamt_cache(&tdx->pamt_cache); + if (vcpu->cpu !=3D -1) { KVM_BUG_ON(tdx->state =3D=3D VCPU_TD_STATE_INITIALIZED, vcpu->kvm); tdx_flush_vp_on_cpu(vcpu); @@ -1615,6 +1619,14 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t r= oot_hpa, int pgd_level) td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa); } =20 +static int tdx_topup_external_pamt_cache(struct kvm_vcpu *vcpu, int min) +{ + if (!tdx_supports_dynamic_pamt(tdx_sysinfo)) + return 0; + + return tdx_topup_pamt_cache(&to_tdx(vcpu)->pamt_cache, min); +} + static int tdx_mem_page_add(struct kvm *kvm, gfn_t gfn, enum pg_level leve= l, kvm_pfn_t pfn) { @@ -1696,8 +1708,15 @@ static int tdx_sept_link_private_spt(struct kvm *kvm= , gfn_t gfn, static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, enum pg_level level, u64 mirror_spte) { + struct kvm_vcpu *vcpu =3D kvm_get_running_vcpu(); struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); kvm_pfn_t pfn =3D spte_to_pfn(mirror_spte); + struct vcpu_tdx *tdx =3D to_tdx(vcpu); + struct page *page =3D pfn_to_page(pfn); + int ret; + + if (KVM_BUG_ON(!vcpu, kvm)) + return -EINVAL; =20 if 
(KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm)) return -EIO; @@ -1711,6 +1730,10 @@ static int tdx_sept_set_private_spte(struct kvm *kvm= , gfn_t gfn, =20 WARN_ON_ONCE((mirror_spte & VMX_EPT_RWX_MASK) !=3D VMX_EPT_RWX_MASK); =20 + ret =3D tdx_pamt_get(page, &tdx->pamt_cache); + if (ret) + return ret; + /* * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory() * before kvm_tdx->state. Userspace must not be allowed to pre-fault @@ -1723,14 +1746,17 @@ static int tdx_sept_set_private_spte(struct kvm *kv= m, gfn_t gfn, * If the TD isn't finalized/runnable, then userspace is initializing * the VM image via KVM_TDX_INIT_MEM_REGION; ADD the page to the TD. */ - if (unlikely(kvm_tdx->state !=3D TD_STATE_RUNNABLE)) - return tdx_mem_page_add(kvm, gfn, level, pfn); + if (likely(kvm_tdx->state =3D=3D TD_STATE_RUNNABLE)) + ret =3D tdx_mem_page_aug(kvm, gfn, level, pfn); + else + ret =3D tdx_mem_page_add(kvm, gfn, level, pfn); =20 - return tdx_mem_page_aug(kvm, gfn, level, pfn); + if (ret) + tdx_pamt_put(page); + + return ret; } =20 - - /* * Ensure shared and private EPTs to be flushed on all vCPUs. * tdh_mem_track() is the only caller that increases TD epoch. An increase= in @@ -1847,6 +1873,7 @@ static void tdx_sept_remove_private_spte(struct kvm *= kvm, gfn_t gfn, return; =20 tdx_quirk_reset_page(page); + tdx_pamt_put(page); } =20 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, @@ -3614,5 +3641,12 @@ void __init tdx_hardware_setup(void) vt_x86_ops.set_external_spte =3D tdx_sept_set_private_spte; vt_x86_ops.reclaim_external_sp =3D tdx_sept_reclaim_private_sp; vt_x86_ops.remove_external_spte =3D tdx_sept_remove_private_spte; + + /* + * FIXME: Wire up the PAMT hook iff DPAMT is supported, once VMXON is + * moved out of KVM and tdx_bringup() is folded into here. 
+ */ + vt_x86_ops.topup_external_cache =3D tdx_topup_external_pamt_cache; + vt_x86_ops.protected_apic_has_interrupt =3D tdx_protected_apic_has_interr= upt; } diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h index ce2720a028ad..f444fc84d93b 100644 --- a/arch/x86/kvm/vmx/tdx.h +++ b/arch/x86/kvm/vmx/tdx.h @@ -73,6 +73,8 @@ struct vcpu_tdx { =20 u64 map_gpa_next; u64 map_gpa_end; + + struct tdx_pamt_cache pamt_cache; }; =20 void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 e= rr); --=20 2.53.0.rc1.217.geba53bf80e-goog From nobody Sat Feb 7 17:41:40 2026 Received: from mail-pg1-f201.google.com (mail-pg1-f201.google.com [209.85.215.201]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D959F346A02 for ; Thu, 29 Jan 2026 01:16:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.201 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649374; cv=none; b=oBYkqADEv9aYJX+rPKiLAbzm4n9dH6jmnmkPrmz68dZ48m+2P+GkydHptQzkbPMaLIrlEfxaJIvG2G7uOzVjQf1ODU6czBXGBoDwLQJehiKXrUiCt2hS7kulOIL4N8tsAXPeTBWKm8DTFC5tG+Ai8LRY/rhgZo6j9IFMWQYwxGM= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649374; c=relaxed/simple; bh=xiOF/ywd/+MHn1BhHUiTZVc2zVcx2UHV6SNWBozy59Q=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=ELOO+J0K2SRuDWHt+AuC/FY+FE7hdZSMCMjTqc4Ytl8va6NWTpAVf0vrcUR8d63SlvTh+uMgVbyc2y4iDvLszkW+b1WB3Tb29tRXzY+662N1lRnOe+SrwoG2BitOn8dp776tdxAGqFC0f0Xf3XRY5maVCiyb6ZQ3vpgtqkVRv7M= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=mgFzpC4w; arc=none smtp.client-ip=209.85.215.201 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="mgFzpC4w" Received: by mail-pg1-f201.google.com with SMTP id 41be03b00d2f7-c5e56644646so1058658a12.0 for ; Wed, 28 Jan 2026 17:16:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1769649372; x=1770254172; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=gE55/NsqfamORgnZHIenHqCeRcUi8ZNblQ/btnBuAf4=; b=mgFzpC4wZ9jbi4Ms4Ari3El4TCf751OER6smDogF3LYURmyqPKSv0UDMAi4IkZo9sN rXJc36o23mbyzvs++1baCHpkER/SsrIhaahW/mXct4qsjUXu8+KwLBQ1kXe/pFfwCCa/ k3tOEe7EaDvLQbWy2NXW5Meh8RDdFxADWH5ZU4waEcvNJ4ggfJ53nvmxhGPl6Lgl2oZ/ XZadZME/dSEzBkO64QKh1ln5IeRIvpS+Sbg/GO3EjrvrTON8tR3dFkkpkTMnYaqlpUC6 0o7MbEJpNZB36An7Urz1GjTco9SbUoXAqGk2ctShde6SyeguxEA/s2wbJVmHOR0SliQz hPsQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769649372; x=1770254172; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=gE55/NsqfamORgnZHIenHqCeRcUi8ZNblQ/btnBuAf4=; b=T5Cm6jwXJG4G1lxUIgGd/98HLepTTso5AfVBSz/il8A4XaA+lKOEuCXwxR/0od2i0P N/MPgbFltNcYPupZcQdQkyL4N2xg/mR+e1u1jcAqnnGYysolTvnXnKdn0630xiSjnaPk 
FJGujs6njPj94ym/A9RP2sTdhbsPRo7C05MOAlc3zNW7fmmRYKMx4aqkCQecp+SMg+u9 4z87zpG99veVCYt+OSo8oDLwwCa+l9MjrgJ04rNlch/yrGYpRD6l9qNfdCq/9QmDJ8Oj TUIHPV5jqfLc4wga8qatsoOGKhPaqWFOEVnpZbUskjzwNNO3jdV4DyUp1feN/yBtNpBD bUUA== X-Gm-Message-State: AOJu0Ywcut/UxDb3dIJWxPfWc0wfe5wr2ZRwgH//Ql8DGqfjPA/6KRwo pqBtvoI/oHPDKJ67kHgUmmK7NA9Q6Dw8yGq0holb9Qz4bB4iFRhCyiXzgK9lB0GZ4Jfw+fRRlEV izq3wZA== X-Received: from pgbdn8.prod.google.com ([2002:a05:6a02:e08:b0:c1c:1a2d:bc85]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a05:6a20:d50c:b0:389:5767:5763 with SMTP id adf61e73a8af0-392cc6b1ab7mr1088866637.24.1769649372269; Wed, 28 Jan 2026 17:16:12 -0800 (PST) Reply-To: Sean Christopherson Date: Wed, 28 Jan 2026 17:14:55 -0800 In-Reply-To: <20260129011517.3545883-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260129011517.3545883-1-seanjc@google.com> X-Mailer: git-send-email 2.53.0.rc1.217.geba53bf80e-goog Message-ID: <20260129011517.3545883-24-seanjc@google.com> Subject: [RFC PATCH v5 23/45] x86/virt/tdx: Enable Dynamic PAMT From: Sean Christopherson To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, Kiryl Shutsemau , Sean Christopherson , Paolo Bonzini Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang , Rick Edgecombe , Yan Zhao , Vishal Annapurve , Ackerley Tng , Sagi Shahar , Binbin Wu , Xiaoyao Li , Isaku Yamahata Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Kirill A. Shutemov The Physical Address Metadata Table (PAMT) holds TDX metadata for physical memory and must be allocated by the kernel during TDX module initialization. The exact size of the required PAMT memory is determined by the TDX module and may vary between TDX module versions, but currently it is approximately 0.4% of the system memory. This is a significant commitment, especially if it is not known upfront whether the machine will run any TDX guests. The Dynamic PAMT feature reduces static PAMT allocations. PAMT_1G and PAMT_2M levels are still allocated on TDX module initialization, but the PAMT_4K level is allocated dynamically, reducing static allocations to approximately 0.004% of the system memory. All pieces are in place. Enable Dynamic PAMT if it is supported. Signed-off-by: Kirill A. Shutemov Signed-off-by: Rick Edgecombe Tested-by: Sagi Shahar Signed-off-by: Sean Christopherson --- arch/x86/include/asm/tdx.h | 6 +++++- arch/x86/virt/vmx/tdx/tdx.c | 8 ++++++++ arch/x86/virt/vmx/tdx/tdx.h | 3 --- 3 files changed, 13 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index c39e2920d0c3..56bdfbce4289 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -12,6 +12,10 @@ #include #include =20 +/* Bit definitions of TDX_FEATURES0 metadata field */ +#define TDX_FEATURES0_NO_RBP_MOD BIT_ULL(18) +#define TDX_FEATURES0_DYNAMIC_PAMT BIT_ULL(36) + #ifndef __ASSEMBLER__ =20 #include @@ -133,7 +137,7 @@ const struct tdx_sys_info *tdx_get_sysinfo(void); =20 static inline bool tdx_supports_dynamic_pamt(const struct tdx_sys_info *sy= sinfo) { - return false; /* To be enabled when kernel is ready */ + return sysinfo->features.tdx_features0 & TDX_FEATURES0_DYNAMIC_PAMT; } =20 /* Simple structure for pre-allocating Dynamic PAMT pages outside of locks= . 
 arch/x86/include/asm/tdx.h  | 6 +++++-
 arch/x86/virt/vmx/tdx/tdx.c | 8 ++++++++
 arch/x86/virt/vmx/tdx/tdx.h | 3 ---
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index c39e2920d0c3..56bdfbce4289 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -12,6 +12,10 @@
 #include
 #include
 
+/* Bit definitions of TDX_FEATURES0 metadata field */
+#define TDX_FEATURES0_NO_RBP_MOD	BIT_ULL(18)
+#define TDX_FEATURES0_DYNAMIC_PAMT	BIT_ULL(36)
+
 #ifndef __ASSEMBLER__
 
 #include
@@ -133,7 +137,7 @@ const struct tdx_sys_info *tdx_get_sysinfo(void);
 
 static inline bool tdx_supports_dynamic_pamt(const struct tdx_sys_info *sysinfo)
 {
-	return false; /* To be enabled when kernel is ready */
+	return sysinfo->features.tdx_features0 & TDX_FEATURES0_DYNAMIC_PAMT;
 }
 
 /* Simple structure for pre-allocating Dynamic PAMT pages outside of locks.
  */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 53b29c827520..90407493bb45 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1068,6 +1068,8 @@ static int construct_tdmrs(struct list_head *tmb_list,
 	return ret;
 }
 
+#define TDX_SYS_CONFIG_DYNAMIC_PAMT	BIT(16)
+
 static int config_tdx_module(struct tdmr_info_list *tdmr_list, u64 global_keyid)
 {
 	struct tdx_module_args args = {};
@@ -1095,6 +1097,12 @@ static int config_tdx_module(struct tdmr_info_list *tdmr_list, u64 global_keyid)
 	args.rcx = __pa(tdmr_pa_array);
 	args.rdx = tdmr_list->nr_consumed_tdmrs;
 	args.r8 = global_keyid;
+
+	if (tdx_supports_dynamic_pamt(&tdx_sysinfo)) {
+		pr_info("Enable Dynamic PAMT\n");
+		args.r8 |= TDX_SYS_CONFIG_DYNAMIC_PAMT;
+	}
+
 	ret = seamcall_prerr(TDH_SYS_CONFIG, &args);
 
 	/* Free the array as it is not required anymore. */
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 46c4214b79fb..096c78a1d438 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -86,9 +86,6 @@ struct tdmr_info {
 	DECLARE_FLEX_ARRAY(struct tdmr_reserved_area, reserved_areas);
 } __packed __aligned(TDMR_INFO_ALIGNMENT);
 
-/* Bit definitions of TDX_FEATURES0 metadata field */
-#define TDX_FEATURES0_NO_RBP_MOD	BIT(18)
-
 /*
  * Do not put any hardware-defined TDX structure representations below
  * this comment!
-- 
2.53.0.rc1.217.geba53bf80e-goog

From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:56 -0800
Message-ID: <20260129011517.3545883-25-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 24/45] Documentation/x86: Add documentation for TDX's Dynamic PAMT
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
    Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org,
    Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng,
    Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

From: Kirill A. Shutemov

Expand the TDX documentation to include information on the Dynamic PAMT
feature. The new section explains PAMT support in the TDX module and
how Dynamic PAMT affects kernel memory use.

Signed-off-by: Kirill A. Shutemov
[Add feedback, update log]
Signed-off-by: Rick Edgecombe
Tested-by: Sagi Shahar
Signed-off-by: Sean Christopherson
---
 Documentation/arch/x86/tdx.rst | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/Documentation/arch/x86/tdx.rst b/Documentation/arch/x86/tdx.rst
index 61670e7df2f7..8d45d31fee29 100644
--- a/Documentation/arch/x86/tdx.rst
+++ b/Documentation/arch/x86/tdx.rst
@@ -99,6 +99,27 @@ initialize::
 
   [..] virt/tdx: module initialization failed ...
 
+Dynamic PAMT
+------------
+
+PAMT is memory that the TDX module needs to keep data about each page
+(similar to a struct page). It needs to be handed to the TDX module for
+its exclusive use.
+For normal PAMT, this is installed when the TDX module is first loaded
+and comes to about 0.4% of system memory.
+
+Dynamic PAMT is a TDX feature that allows the VMM to allocate part of
+the PAMT as needed (the parts that track 4KB pages). The other page
+sizes (1GB and 2MB) are still allocated statically at the time of TDX
+module initialization. This reduces the amount of memory that TDX uses
+while TDs are not in use.
+
+When Dynamic PAMT is in use, dmesg shows it like::
+
+  [..] virt/tdx: Enable Dynamic PAMT
+  [..] virt/tdx: 10092 KB allocated for PAMT
+  [..] virt/tdx: module initialized
+
+Dynamic PAMT is enabled automatically if supported.
+
 TDX Interaction to Other Kernel Components
 ------------------------------------------
 
-- 
2.53.0.rc1.217.geba53bf80e-goog

From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:57 -0800
Message-ID: <20260129011517.3545883-26-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 25/45] *** DO NOT MERGE *** x86/virt/tdx: Don't assume guest memory is backed by struct page
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
    Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org,
    Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng,
    Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Remove the completely unnecessary assumptions that memory mapped into a
TDX guest is backed by refcounted struct page memory. TDH_MEM_PAGE_ADD
and TDH_MEM_PAGE_AUG are glorified writes to PTEs; they have no business
placing requirements on how KVM and guest_memfd manage memory.

Rip out the misguided struct page assumptions/constraints before
hugepage support is added for S-EPT, e.g. so the kernel doesn't pick up
even worse assumptions like "a hugepage must be contained in a single
folio".

TODO (before merge): Replace "u64 pfn" with something type-safe.

Signed-off-by: Sean Christopherson
---
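For illustration, the keyed-address packing performed by the reworked
mk_keyed_paddr() in the diff below can be modeled in a standalone
sketch (PHYS_BITS is assumed to be 46 here; the kernel reads the real
width from boot_cpu_data.x86_phys_bits):

#include <stdint.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define PHYS_BITS	46	/* assumed; not a fixed architectural value */

/* Place the KeyID just above the physical address bits. */
static uint64_t mk_keyed_paddr(uint16_t hkid, uint64_t pfn)
{
	return (pfn << PAGE_SHIFT) | ((uint64_t)hkid << PHYS_BITS);
}

int main(void)
{
	/* pfn 0x12345 with KeyID 3 -> 0x0000c00012345000 */
	printf("%#018llx\n", (unsigned long long)mk_keyed_paddr(3, 0x12345));
	return 0;
}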
 arch/x86/include/asm/tdx.h  | 25 ++++++---------
 arch/x86/kvm/vmx/tdx.c      | 33 ++++++++++---------
 arch/x86/virt/vmx/tdx/tdx.c | 63 +++++++++++++++++++------------------
 3 files changed, 59 insertions(+), 62 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 56bdfbce4289..1f57f7721286 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -154,10 +154,10 @@ static inline void tdx_init_pamt_cache(struct tdx_pamt_cache *cache)
 
 void tdx_free_pamt_cache(struct tdx_pamt_cache *cache);
 int tdx_topup_pamt_cache(struct tdx_pamt_cache *cache, unsigned long npages);
-int tdx_pamt_get(struct page *page, struct tdx_pamt_cache *cache);
-void tdx_pamt_put(struct page *page);
+int tdx_pamt_get(u64 pfn, struct tdx_pamt_cache *cache);
+void tdx_pamt_put(u64 pfn);
 
-void tdx_quirk_reset_page(struct page *page);
+void tdx_quirk_reset_page(u64 pfn);
 
 int tdx_guest_keyid_alloc(void);
 u32 tdx_get_nr_guest_keyids(void);
@@ -206,23 +206,18 @@ struct tdx_vp {
 	struct page **tdcx_pages;
 };
 
-static inline u64 mk_keyed_paddr(u16 hkid, struct page *page)
+static inline u64 mk_keyed_paddr(u16 hkid, u64 pfn)
 {
-	u64 ret;
-
-	ret = page_to_phys(page);
-	/* KeyID bits are just above the physical address bits: */
-	ret |= (u64)hkid << boot_cpu_data.x86_phys_bits;
-
-	return ret;
+	/* KeyID bits are just above the physical address bits. */
+	return PFN_PHYS(pfn) | ((u64)hkid << boot_cpu_data.x86_phys_bits);
 }
 
 u64 tdh_vp_enter(struct tdx_vp *vp, struct tdx_module_args *args);
 u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
-u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, struct page *page, struct page *source, u64 *ext_err1, u64 *ext_err2);
+u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, u64 pfn, struct page *source, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, enum pg_level level, struct page *page, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page);
-u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, enum pg_level level, struct page *page, u64 *ext_err1, u64 *ext_err2);
+u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, enum pg_level level, u64 pfn, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_mem_range_block(struct tdx_td *td, u64 gpa, enum pg_level level, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_mng_key_config(struct tdx_td *td);
 u64 tdh_mng_create(struct tdx_td *td, u16 hkid);
@@ -237,12 +232,12 @@ u64 tdh_mng_init(struct tdx_td *td, u64 td_params, u64 *extended_err);
 u64 tdh_vp_init(struct tdx_vp *vp, u64 initial_rcx, u32 x2apicid);
 u64 tdh_vp_rd(struct tdx_vp *vp, u64 field, u64 *data);
 u64 tdh_vp_wr(struct tdx_vp *vp, u64 field, u64 data, u64 mask);
-u64 tdh_phymem_page_reclaim(struct page *page, u64 *tdx_pt, u64 *tdx_owner, u64 *tdx_size);
+u64 tdh_phymem_page_reclaim(u64 pfn, u64 *tdx_pt, u64 *tdx_owner, u64 *tdx_size);
 u64 tdh_mem_track(struct tdx_td *tdr);
 u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, enum pg_level level, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_phymem_cache_wb(bool resume);
 u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td);
-u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page);
+u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, u64 pfn);
 #else
 static inline void tdx_init(void) { }
 static inline int tdx_cpu_enable(void) { return -ENODEV; }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d74a2547e512..4ac312376ac9 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -318,11 +318,11 @@ static inline void tdx_disassociate_vp(struct kvm_vcpu *vcpu)
 })
 
 /* TDH.PHYMEM.PAGE.RECLAIM is allowed only when destroying the TD. */
-static int __tdx_reclaim_page(struct page *page)
+static int __tdx_reclaim_page(kvm_pfn_t pfn)
 {
 	u64 err, rcx, rdx, r8;
 
-	err = tdh_phymem_page_reclaim(page, &rcx, &rdx, &r8);
+	err = tdh_phymem_page_reclaim(pfn, &rcx, &rdx, &r8);
 
 	/*
 	 * No need to check for TDX_OPERAND_BUSY; all TD pages are freed
@@ -337,11 +337,12 @@ static int __tdx_reclaim_page(struct page *page)
 
 static int tdx_reclaim_page(struct page *page)
 {
+	kvm_pfn_t pfn = page_to_pfn(page);
 	int r;
 
-	r = __tdx_reclaim_page(page);
+	r = __tdx_reclaim_page(pfn);
 	if (!r)
-		tdx_quirk_reset_page(page);
+		tdx_quirk_reset_page(pfn);
 	return r;
 }
 
@@ -583,7 +584,7 @@ static void tdx_reclaim_td_control_pages(struct kvm *kvm)
 	if (!kvm_tdx->td.tdr_page)
 		return;
 
-	if (__tdx_reclaim_page(kvm_tdx->td.tdr_page))
+	if (__tdx_reclaim_page(page_to_pfn(kvm_tdx->td.tdr_page)))
 		return;
 
 	/*
@@ -595,7 +596,7 @@ static void tdx_reclaim_td_control_pages(struct kvm *kvm)
 	if (TDX_BUG_ON(err, TDH_PHYMEM_PAGE_WBINVD, kvm))
 		return;
 
-	tdx_quirk_reset_page(kvm_tdx->td.tdr_page);
+	tdx_quirk_reset_page(page_to_pfn(kvm_tdx->td.tdr_page));
 
 	__tdx_free_control_page(kvm_tdx->td.tdr_page);
 	kvm_tdx->td.tdr_page = NULL;
@@ -1640,8 +1641,8 @@ static int tdx_mem_page_add(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 	    KVM_BUG_ON(!kvm_tdx->page_add_src, kvm))
 		return -EIO;
 
-	err = tdh_mem_page_add(&kvm_tdx->td, gpa, pfn_to_page(pfn),
-			       kvm_tdx->page_add_src, &entry, &level_state);
+	err = tdh_mem_page_add(&kvm_tdx->td, gpa, pfn, kvm_tdx->page_add_src,
+			       &entry, &level_state);
 	if (unlikely(IS_TDX_OPERAND_BUSY(err)))
 		return -EBUSY;
 
@@ -1655,12 +1656,11 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 			    kvm_pfn_t pfn)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
-	struct page *page = pfn_to_page(pfn);
 	gpa_t gpa = gfn_to_gpa(gfn);
 	u64 entry, level_state;
 	u64 err;
 
-	err = tdh_mem_page_aug(&kvm_tdx->td, gpa, level, page, &entry, &level_state);
+	err = tdh_mem_page_aug(&kvm_tdx->td, gpa, level, pfn, &entry, &level_state);
 	if (unlikely(IS_TDX_OPERAND_BUSY(err)))
 		return -EBUSY;
 
@@ -1712,7 +1712,6 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
-	struct page *page = pfn_to_page(pfn);
 	int ret;
 
 	if (KVM_BUG_ON(!vcpu, kvm))
@@ -1730,7 +1729,7 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 
 	WARN_ON_ONCE((mirror_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
 
-	ret = tdx_pamt_get(page, &tdx->pamt_cache);
+	ret = tdx_pamt_get(pfn, &tdx->pamt_cache);
 	if (ret)
 		return ret;
 
@@ -1752,7 +1751,7 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 		ret = tdx_mem_page_add(kvm, gfn, level, pfn);
 
 	if (ret)
-		tdx_pamt_put(page);
+		tdx_pamt_put(pfn);
 
 	return ret;
 }
@@ -1828,8 +1827,8 @@ static void tdx_sept_reclaim_private_sp(struct kvm *kvm, gfn_t gfn,
 static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 					 enum pg_level level, u64 mirror_spte)
 {
-	struct page *page = pfn_to_page(spte_to_pfn(mirror_spte));
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
 	gpa_t gpa = gfn_to_gpa(gfn);
 	u64 err, entry, level_state;
 
@@ -1868,12 +1867,12 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_REMOVE, entry, level_state, kvm))
 		return;
 
-	err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, page);
+	err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, pfn);
 	if (TDX_BUG_ON(err, TDH_PHYMEM_PAGE_WBINVD, kvm))
 		return;
 
-	tdx_quirk_reset_page(page);
-	tdx_pamt_put(page);
+	tdx_quirk_reset_page(pfn);
+	tdx_pamt_put(pfn);
 }
 
 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 90407493bb45..85c31ed9b9d1 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -799,9 +799,9 @@ static void tdx_quirk_reset_paddr(unsigned long base, unsigned long size)
 	mb();
 }
 
-void tdx_quirk_reset_page(struct page *page)
+void tdx_quirk_reset_page(u64 pfn)
 {
-	tdx_quirk_reset_paddr(page_to_phys(page), PAGE_SIZE);
+	tdx_quirk_reset_paddr(PFN_PHYS(pfn), PAGE_SIZE);
 }
 EXPORT_SYMBOL_FOR_KVM(tdx_quirk_reset_page);
 
@@ -1665,6 +1665,11 @@ static void tdx_clflush_page(struct page *page)
 	clflush_cache_range(page_to_virt(page), PAGE_SIZE);
 }
 
+static void tdx_clflush_pfn(u64 pfn)
+{
+	clflush_cache_range(__va(PFN_PHYS(pfn)), PAGE_SIZE);
+}
+
 static int pg_level_to_tdx_sept_level(enum pg_level level)
 {
 	WARN_ON_ONCE(level == PG_LEVEL_NONE);
@@ -1691,17 +1696,17 @@ u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page)
 }
 EXPORT_SYMBOL_FOR_KVM(tdh_mng_addcx);
 
-u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, struct page *page, struct page *source, u64 *ext_err1, u64 *ext_err2)
+u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, u64 pfn, struct page *source, u64 *ext_err1, u64 *ext_err2)
 {
 	struct tdx_module_args args = {
 		.rcx = gpa,
 		.rdx = tdx_tdr_pa(td),
-		.r8 = page_to_phys(page),
+		.r8 = PFN_PHYS(pfn),
 		.r9 = page_to_phys(source),
 	};
 	u64 ret;
 
-	tdx_clflush_page(page);
+	tdx_clflush_pfn(pfn);
 	ret = seamcall_ret(TDH_MEM_PAGE_ADD, &args);
 
 	*ext_err1 = args.rcx;
@@ -1743,17 +1748,17 @@ u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page)
 }
 EXPORT_SYMBOL_FOR_KVM(tdh_vp_addcx);
 
-u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, enum pg_level level,
-		     struct page *page, u64 *ext_err1, u64 *ext_err2)
+u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, enum pg_level level, u64 pfn,
+		     u64 *ext_err1, u64 *ext_err2)
 {
 	struct tdx_module_args args = {
 		.rcx = gpa | pg_level_to_tdx_sept_level(level),
 		.rdx = tdx_tdr_pa(td),
-		.r8 = page_to_phys(page),
+		.r8 = PFN_PHYS(pfn),
 	};
 	u64 ret;
 
-	tdx_clflush_page(page);
+	tdx_clflush_pfn(pfn);
 	ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
 
 	*ext_err1 = args.rcx;
@@ -1997,10 +2002,10 @@ EXPORT_SYMBOL_FOR_KVM(tdh_vp_init);
  * So despite the names, they must be interpreted specially as described by the spec. Return
  * them only for error reporting purposes.
  */
-u64 tdh_phymem_page_reclaim(struct page *page, u64 *tdx_pt, u64 *tdx_owner, u64 *tdx_size)
+u64 tdh_phymem_page_reclaim(u64 pfn, u64 *tdx_pt, u64 *tdx_owner, u64 *tdx_size)
 {
 	struct tdx_module_args args = {
-		.rcx = page_to_phys(page),
+		.rcx = PFN_PHYS(pfn),
 	};
 	u64 ret;
 
@@ -2056,17 +2061,17 @@ u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td)
 {
 	struct tdx_module_args args = {};
 
-	args.rcx = mk_keyed_paddr(tdx_global_keyid, td->tdr_page);
+	args.rcx = mk_keyed_paddr(tdx_global_keyid, page_to_pfn(td->tdr_page));
 
 	return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args);
 }
 EXPORT_SYMBOL_FOR_KVM(tdh_phymem_page_wbinvd_tdr);
 
-u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page)
+u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, u64 pfn)
 {
 	struct tdx_module_args args = {};
 
-	args.rcx = mk_keyed_paddr(hkid, page);
+	args.rcx = mk_keyed_paddr(hkid, pfn);
 
 	return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args);
 }
@@ -2136,11 +2141,9 @@ static void free_pamt_array(u64 *pa_array)
  * Calculate the arg needed for operating on the DPAMT backing for
  * a given 4KB page.
  */
-static u64 pamt_2mb_arg(struct page *page)
+static u64 pamt_2mb_arg(u64 pfn)
 {
-	unsigned long hpa_2mb = ALIGN_DOWN(page_to_phys(page), PMD_SIZE);
-
-	return hpa_2mb | TDX_PS_2M;
+	return ALIGN_DOWN(PFN_PHYS(pfn), PMD_SIZE) | TDX_PS_2M;
 }
 
 /*
@@ -2149,10 +2152,10 @@ static u64 pamt_2mb_arg(struct page *page)
  * error. In the case of TDX module error, the return code is stored
  * in tdx_err.
  */
-static u64 tdh_phymem_pamt_add(struct page *page, u64 *pamt_pa_array)
+static u64 tdh_phymem_pamt_add(u64 pfn, u64 *pamt_pa_array)
 {
 	struct tdx_module_args args = {
-		.rcx = pamt_2mb_arg(page)
+		.rcx = pamt_2mb_arg(pfn)
 	};
 
 	dpamt_copy_to_regs(&args, rdx, pamt_pa_array);
 
@@ -2161,10 +2164,10 @@ static u64 tdh_phymem_pamt_add(struct page *page, u64 *pamt_pa_array)
 
 /* Remove PAMT backing for the given page. */
-static u64 tdh_phymem_pamt_remove(struct page *page, u64 *pamt_pa_array)
+static u64 tdh_phymem_pamt_remove(u64 pfn, u64 *pamt_pa_array)
 {
 	struct tdx_module_args args = {
-		.rcx = pamt_2mb_arg(page),
+		.rcx = pamt_2mb_arg(pfn),
 	};
 	u64 ret;
 
@@ -2180,7 +2183,7 @@ static u64 tdh_phymem_pamt_remove(struct page *page, u64 *pamt_pa_array)
 static DEFINE_SPINLOCK(pamt_lock);
 
 /* Bump PAMT refcount for the given page and allocate PAMT memory if needed */
-int tdx_pamt_get(struct page *page, struct tdx_pamt_cache *cache)
+int tdx_pamt_get(u64 pfn, struct tdx_pamt_cache *cache)
 {
 	u64 pamt_pa_array[MAX_NR_DPAMT_ARGS];
 	atomic_t *pamt_refcount;
@@ -2190,7 +2193,7 @@ int tdx_pamt_get(struct page *page, struct tdx_pamt_cache *cache)
 	if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
 		return 0;
 
-	pamt_refcount = tdx_find_pamt_refcount(page_to_pfn(page));
+	pamt_refcount = tdx_find_pamt_refcount(pfn);
 
 	/*
 	 * If the pamt page is already added (i.e. refcount >= 1),
@@ -2214,7 +2217,7 @@ int tdx_pamt_get(struct page *page, struct tdx_pamt_cache *cache)
 	}
 
 	/* Try to add the pamt page and take the refcount 0->1. */
-	tdx_status = tdh_phymem_pamt_add(page, pamt_pa_array);
+	tdx_status = tdh_phymem_pamt_add(pfn, pamt_pa_array);
 	if (IS_TDX_SUCCESS(tdx_status)) {
 		/*
 		 * The refcount is zero, and this locked path is the only way to
@@ -2257,7 +2260,7 @@ int tdx_pamt_get(struct page *page, struct tdx_pamt_cache *cache)
 
 /*
  * Drop PAMT refcount for the given page and free PAMT memory if it is no
  * longer needed.
  */
-void tdx_pamt_put(struct page *page)
+void tdx_pamt_put(u64 pfn)
 {
 	u64 pamt_pa_array[MAX_NR_DPAMT_ARGS];
 	atomic_t *pamt_refcount;
@@ -2266,7 +2269,7 @@ void tdx_pamt_put(struct page *page)
 	if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
 		return;
 
-	pamt_refcount = tdx_find_pamt_refcount(page_to_pfn(page));
+	pamt_refcount = tdx_find_pamt_refcount(pfn);
 
 	/*
	 * If there are more than one reference on the pamt page,
@@ -2285,7 +2288,7 @@ void tdx_pamt_put(struct page *page)
 		return;
 
 	/* Try to remove the pamt page and take the refcount 1->0. */
-	tdx_status = tdh_phymem_pamt_remove(page, pamt_pa_array);
+	tdx_status = tdh_phymem_pamt_remove(pfn, pamt_pa_array);
 
 	/*
 	 * Don't free pamt_pa_array as it could hold garbage when
@@ -2357,7 +2360,7 @@ struct page *__tdx_alloc_control_page(gfp_t gfp)
 	if (!page)
 		return NULL;
 
-	if (tdx_pamt_get(page, NULL)) {
+	if (tdx_pamt_get(page_to_pfn(page), NULL)) {
 		__free_page(page);
 		return NULL;
 	}
@@ -2375,7 +2378,7 @@ void __tdx_free_control_page(struct page *page)
 	if (!page)
 		return;
 
-	tdx_pamt_put(page);
+	tdx_pamt_put(page_to_pfn(page));
 	__free_page(page);
 }
 EXPORT_SYMBOL_FOR_KVM(__tdx_free_control_page);
-- 
2.53.0.rc1.217.geba53bf80e-goog

From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:58 -0800
Message-ID: <20260129011517.3545883-27-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 26/45] x86/virt/tdx: Enhance tdh_mem_page_aug() to support huge pages
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
    Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org,
    Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng,
    Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

From: Yan Zhao

Enhance the SEAMCALL wrapper tdh_mem_page_aug() to support huge pages.

The SEAMCALL TDH_MEM_PAGE_AUG currently supports adding physical memory
to the S-EPT up to 2MB in size. While keeping the "level" parameter in
the tdh_mem_page_aug() wrapper to allow callers to specify the physical
memory size, introduce the parameters "folio" and "start_idx" to
specify the physical memory starting from the page at "start_idx"
within the "folio". The specified physical memory must be fully
contained within a single folio.

Invoke tdx_clflush_page() for each 4KB segment of the physical memory
being added. tdx_clflush_page() performs CLFLUSH operations
conservatively to prevent dirty cache lines from writing back later and
corrupting TD memory.

Signed-off-by: Xiaoyao Li
Signed-off-by: Isaku Yamahata
Signed-off-by: Yan Zhao
[sean: remove the page+folio assumptions]
Signed-off-by: Sean Christopherson
---
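A standalone sketch of the per-4KB flush count implied by the loop in
the diff below (the helper here is a stand-in that mirrors what
page_level_size(level) / PAGE_SIZE evaluates to for the kernel's
1-based enum pg_level, not the kernel's implementation):

#include <stdio.h>

#define PAGE_SIZE	4096UL

static unsigned long page_level_size(int level)
{
	/* 4K -> 2M -> 1G: each level covers 512x the previous one */
	unsigned long size = PAGE_SIZE;

	while (--level)
		size *= 512;
	return size;
}

int main(void)
{
	for (int level = 1; level <= 3; level++)	/* PG_LEVEL_4K..PG_LEVEL_1G */
		printf("level %d: %lu flushes\n", level,
		       page_level_size(level) / PAGE_SIZE);
	return 0;
}

So augmenting a 2MB mapping issues 512 CLFLUSH passes, one per 4KB
segment, before the single TDH_MEM_PAGE_AUG SEAMCALL.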
 arch/x86/virt/vmx/tdx/tdx.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 85c31ed9b9d1..37776ea56eb7 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1756,9 +1756,13 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, enum pg_level level, u64 pfn,
 		.rdx = tdx_tdr_pa(td),
 		.r8 = PFN_PHYS(pfn),
 	};
+	unsigned long npages = page_level_size(level) / PAGE_SIZE;
+	unsigned long i;
 	u64 ret;
 
-	tdx_clflush_pfn(pfn);
+	for (i = 0; i < npages; i++)
+		tdx_clflush_pfn(pfn + i);
+
 	ret = seamcall_ret(TDH_MEM_PAGE_AUG, &args);
 
 	*ext_err1 = args.rcx;
-- 
2.53.0.rc1.217.geba53bf80e-goog

From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:59 -0800
Message-ID: <20260129011517.3545883-28-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 27/45] x86/virt/tdx: Enhance tdh_phymem_page_wbinvd_hkid() to invalidate huge pages
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
    Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org,
    Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng,
    Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

From: Yan Zhao

After removing a TD's private page, the TDX module does not write back
and invalidate cache lines associated with the page and its keyID
(i.e., the TD's guest keyID).

The SEAMCALL wrapper tdh_phymem_page_wbinvd_hkid() enables the caller
to provide the TD's guest keyID and physical memory address to invoke
the SEAMCALL TDH_PHYMEM_PAGE_WBINVD to perform cache line invalidation.

Enhance the SEAMCALL wrapper tdh_phymem_page_wbinvd_hkid() to support
cache line invalidation for huge pages by introducing the parameters
"folio", "start_idx", and "npages". These parameters specify the
physical memory starting from the page at "start_idx" within a "folio"
and spanning "npages" contiguous PFNs. Return TDX_OPERAND_INVALID if
the specified memory is not entirely contained within a single folio.

Signed-off-by: Xiaoyao Li
Signed-off-by: Isaku Yamahata
Suggested-by: Rick Edgecombe
Signed-off-by: Yan Zhao
Signed-off-by: Sean Christopherson
---
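A standalone sketch of the wrapper's iteration strategy as reworked in
the diff below (the SEAMCALL stub and error value here are fabricated
for illustration): invalidate one 4KB page per SEAMCALL and stop at the
first failure, so the failing status isn't masked by later successes.

#include <stdint.h>
#include <stdio.h>

#define NPAGES_2M	512	/* a 2MB mapping spans 512 4KB pages */

/* Stand-in for one per-4KB-page TDH_PHYMEM_PAGE_WBINVD invocation. */
static uint64_t wbinvd_one_page(uint64_t pfn)
{
	return pfn == 300 ? 0xdead000000000000ULL : 0;	/* fake error */
}

int main(void)
{
	uint64_t err = 0;

	for (uint64_t i = 0; i < NPAGES_2M; i++) {
		err = wbinvd_one_page(i);
		if (err)
			break;	/* surface the first failure to the caller */
	}
	printf("final status: %#llx\n", (unsigned long long)err);
	return 0;
}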
 arch/x86/include/asm/tdx.h  |  2 +-
 arch/x86/kvm/vmx/tdx.c      |  2 +-
 arch/x86/virt/vmx/tdx/tdx.c | 16 ++++++++++++----
 3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 1f57f7721286..8ceaebc6c1a9 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -237,7 +237,7 @@ u64 tdh_mem_track(struct tdx_td *tdr);
 u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, enum pg_level level, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_phymem_cache_wb(bool resume);
 u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td);
-u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, u64 pfn);
+u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, u64 pfn, enum pg_level level);
 #else
 static inline void tdx_init(void) { }
 static inline int tdx_cpu_enable(void) { return -ENODEV; }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 4ac312376ac9..90133e8f5c53 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1867,7 +1867,7 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_REMOVE, entry, level_state, kvm))
 		return;
 
-	err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, pfn);
+	err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, pfn, level);
 	if (TDX_BUG_ON(err, TDH_PHYMEM_PAGE_WBINVD, kvm))
 		return;
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 37776ea56eb7..367df9366d57 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -2071,13 +2071,21 @@ u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td)
 }
 EXPORT_SYMBOL_FOR_KVM(tdh_phymem_page_wbinvd_tdr);
 
-u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, u64 pfn)
+u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, u64 pfn, enum pg_level level)
 {
-	struct tdx_module_args args = {};
+	unsigned long npages = page_level_size(level) / PAGE_SIZE;
+	u64 err;
 
-	args.rcx = mk_keyed_paddr(hkid, pfn);
+	for (unsigned long i = 0; i < npages; i++) {
+		struct tdx_module_args args = {
+			.rcx = mk_keyed_paddr(hkid, pfn + i),
+		};
 
-	return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args);
+		err = seamcall(TDH_PHYMEM_PAGE_WBINVD, &args);
+		if (err)
+			break;
+	}
+	return err;
 }
 EXPORT_SYMBOL_FOR_KVM(tdh_phymem_page_wbinvd_hkid);
 
-- 
2.53.0.rc1.217.geba53bf80e-goog

From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:15:00 -0800
Message-ID: <20260129011517.3545883-29-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 28/45] x86/virt/tdx: Extend "reset page" quirk to support huge pages
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
    Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org,
    Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng,
    Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata
Extend the APIs for "resetting" TDX pages to work around the TDX_PW_MCE
erratum to support huge pages, e.g. so that KVM can pass in the
pfn+level without having to manually calculate the size in multiple
locations.

No functional change intended (because KVM doesn't currently support
anything but level=PG_LEVEL_4K).

Suggested-by: Vishal Annapurve
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/tdx.h  | 7 ++++++-
 arch/x86/kvm/vmx/tdx.c      | 2 +-
 arch/x86/virt/vmx/tdx/tdx.c | 6 +++---
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 8ceaebc6c1a9..e61b0b3cc403 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -157,7 +157,12 @@ int tdx_topup_pamt_cache(struct tdx_pamt_cache *cache, unsigned long npages);
 int tdx_pamt_get(u64 pfn, struct tdx_pamt_cache *cache);
 void tdx_pamt_put(u64 pfn);
 
-void tdx_quirk_reset_page(u64 pfn);
+void __tdx_quirk_reset_page(u64 pfn, enum pg_level level);
+
+static inline void tdx_quirk_reset_page(u64 pfn)
+{
+	__tdx_quirk_reset_page(pfn, PG_LEVEL_4K);
+}
 
 int tdx_guest_keyid_alloc(void);
 u32 tdx_get_nr_guest_keyids(void);
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 90133e8f5c53..aca556923822 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1871,7 +1871,7 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	if (TDX_BUG_ON(err, TDH_PHYMEM_PAGE_WBINVD, kvm))
 		return;
 
-	tdx_quirk_reset_page(pfn);
+	__tdx_quirk_reset_page(pfn, level);
 	tdx_pamt_put(pfn);
 }
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 367df9366d57..411e5feef39f 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -799,11 +799,11 @@ static void tdx_quirk_reset_paddr(unsigned long base, unsigned long size)
 	mb();
 }
 
-void tdx_quirk_reset_page(u64 pfn)
+void __tdx_quirk_reset_page(u64 pfn, enum pg_level level)
 {
-	tdx_quirk_reset_paddr(PFN_PHYS(pfn), PAGE_SIZE);
+	tdx_quirk_reset_paddr(PFN_PHYS(pfn), page_level_size(level));
 }
-EXPORT_SYMBOL_FOR_KVM(tdx_quirk_reset_page);
+EXPORT_SYMBOL_FOR_KVM(__tdx_quirk_reset_page);
 
 static void tdmr_quirk_reset_pamt(struct tdmr_info *tdmr)
 {
-- 
2.53.0.rc1.217.geba53bf80e-goog

From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:15:01 -0800
Message-ID: <20260129011517.3545883-30-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 29/45] x86/virt/tdx: Get/Put DPAMT page pair if and only if mapping size is 4KB
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org,
    Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org,
    Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng,
    Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

From: Kirill A. Shutemov
Elide the guts of getting/putting a Dynamic PAMT entry when the
associated mapping is greater than 4KiB, in which case static PAMT
pages are used and there's no need to (un)install extra PAMT pages.

Signed-off-by: Kirill A. Shutemov
[Yan: Move level checking to callers of tdx_pamt_{get/put}()]
Signed-off-by: Yan Zhao
[sean: move level checking back to tdx_pamt_{get/put}()]
Signed-off-by: Sean Christopherson
---
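A toy model of the new level gate (illustrative only): just 4KB
mappings need dynamically installed PAMT pages, because 2MB/1GB
metadata lives in the PAMT_2M/PAMT_1G allocations made statically at
module initialization.

#include <stdbool.h>
#include <stdio.h>

enum pg_level { PG_LEVEL_NONE, PG_LEVEL_4K, PG_LEVEL_2M, PG_LEVEL_1G };

/* Mirrors the gating in the inline wrappers added below. */
static bool needs_dynamic_pamt(enum pg_level level)
{
	return level == PG_LEVEL_4K;
}

int main(void)
{
	for (enum pg_level l = PG_LEVEL_4K; l <= PG_LEVEL_1G; l++)
		printf("level %d -> %s\n", l,
		       needs_dynamic_pamt(l) ? "DPAMT get/put" : "static PAMT");
	return 0;
}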
 arch/x86/include/asm/tdx.h  | 16 ++++++++++++++--
 arch/x86/kvm/vmx/tdx.c      |  6 +++---
 arch/x86/virt/vmx/tdx/tdx.c | 12 ++++++------
 3 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index e61b0b3cc403..50feea01b066 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -154,8 +154,20 @@ static inline void tdx_init_pamt_cache(struct tdx_pamt_cache *cache)
 
 void tdx_free_pamt_cache(struct tdx_pamt_cache *cache);
 int tdx_topup_pamt_cache(struct tdx_pamt_cache *cache, unsigned long npages);
-int tdx_pamt_get(u64 pfn, struct tdx_pamt_cache *cache);
-void tdx_pamt_put(u64 pfn);
+int __tdx_pamt_get(u64 pfn, struct tdx_pamt_cache *cache);
+void __tdx_pamt_put(u64 pfn);
+
+static inline int tdx_pamt_get(u64 pfn, enum pg_level level,
+			       struct tdx_pamt_cache *cache)
+{
+	return level == PG_LEVEL_4K ? __tdx_pamt_get(pfn, cache) : 0;
+}
+
+static inline void tdx_pamt_put(u64 pfn, enum pg_level level)
+{
+	if (level == PG_LEVEL_4K)
+		__tdx_pamt_put(pfn);
+}
 
 void __tdx_quirk_reset_page(u64 pfn, enum pg_level level);
 
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index aca556923822..bd5d902da303 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1729,7 +1729,7 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 
 	WARN_ON_ONCE((mirror_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
 
-	ret = tdx_pamt_get(pfn, &tdx->pamt_cache);
+	ret = tdx_pamt_get(pfn, level, &tdx->pamt_cache);
 	if (ret)
 		return ret;
 
@@ -1751,7 +1751,7 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 		ret = tdx_mem_page_add(kvm, gfn, level, pfn);
 
 	if (ret)
-		tdx_pamt_put(pfn);
+		tdx_pamt_put(pfn, level);
 
 	return ret;
 }
@@ -1872,7 +1872,7 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 		return;
 
 	__tdx_quirk_reset_page(pfn, level);
-	tdx_pamt_put(pfn);
+	tdx_pamt_put(pfn, level);
 }
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 411e5feef39f..cff325fdec79 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -2195,7 +2195,7 @@ static u64 tdh_phymem_pamt_remove(u64 pfn, u64 *pamt_pa_array)
 static DEFINE_SPINLOCK(pamt_lock);
 
 /* Bump PAMT refcount for the given page and allocate PAMT memory if needed */
-int tdx_pamt_get(u64 pfn, struct tdx_pamt_cache *cache)
+int __tdx_pamt_get(u64 pfn, struct tdx_pamt_cache *cache)
 {
 	u64 pamt_pa_array[MAX_NR_DPAMT_ARGS];
 	atomic_t *pamt_refcount;
@@ -2266,13 +2266,13 @@ int tdx_pamt_get(u64 pfn, struct tdx_pamt_cache *cache)
 	free_pamt_array(pamt_pa_array);
 	return ret;
 }
-EXPORT_SYMBOL_FOR_KVM(tdx_pamt_get);
+EXPORT_SYMBOL_FOR_KVM(__tdx_pamt_get);
 
 /*
  * Drop PAMT refcount for the given page and free PAMT memory if it is no
  * longer needed.
  */
-void tdx_pamt_put(u64 pfn)
+void __tdx_pamt_put(u64 pfn)
 {
 	u64 pamt_pa_array[MAX_NR_DPAMT_ARGS];
 	atomic_t *pamt_refcount;
@@ -2326,7 +2326,7 @@ void tdx_pamt_put(u64 pfn)
 	 */
 	free_pamt_array(pamt_pa_array);
 }
-EXPORT_SYMBOL_FOR_KVM(tdx_pamt_put);
+EXPORT_SYMBOL_FOR_KVM(__tdx_pamt_put);
 
@@ -2372,7 +2372,7 @@ struct page *__tdx_alloc_control_page(gfp_t gfp)
 	if (!page)
 		return NULL;
 
-	if (tdx_pamt_get(page_to_pfn(page), NULL)) {
+	if (__tdx_pamt_get(page_to_pfn(page), NULL)) {
 		__free_page(page);
 		return NULL;
 	}
@@ -2390,7 +2390,7 @@ void __tdx_free_control_page(struct page *page)
 	if (!page)
 		return;
 
-	tdx_pamt_put(page_to_pfn(page));
+	__tdx_pamt_put(page_to_pfn(page));
 	__free_page(page);
 }
 EXPORT_SYMBOL_FOR_KVM(__tdx_free_control_page);
-- 
2.53.0.rc1.217.geba53bf80e-goog

From nobody Sat Feb 7 17:41:40 2026
From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:15:02 -0800
Message-ID: <20260129011517.3545883-31-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 30/45] x86/virt/tdx: Add API to demote a 2MB mapping
 to 512 4KB mappings
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
 kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve,
 Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

From: Xiaoyao Li

Introduce SEAMCALL wrapper tdh_mem_page_demote() to invoke
TDH_MEM_PAGE_DEMOTE, which splits a 2MB or a 1GB S-EPT mapping into 512
4KB or 2MB mappings, respectively.

TDH_MEM_PAGE_DEMOTE walks the S-EPT to locate the huge entry/mapping to
split, and replaces the huge entry with a new S-EPT page table containing
the equivalent 512 smaller mappings.  Parameters "gpa" and "level" specify
the huge mapping to split, and parameter "new_sept_page" specifies the 4KB
page to be added as the S-EPT page.

Conservatively invoke tdx_clflush_page() before adding the new S-EPT page,
to prevent dirty cache lines from being written back later and corrupting
TD memory.

tdh_mem_page_demote() may fail, e.g. due to an S-EPT walk error.  Callers
must check the function's return value, and can retrieve the extended
error info from the output parameters "ext_err1" and "ext_err2".

The TDX module has many internal locks.  To avoid staying in SEAM mode for
too long, SEAMCALLs return a BUSY error code to the kernel instead of
spinning on the locks.  Depending on the specific SEAMCALL, the caller may
need to handle this error in specific ways (e.g., retry).  Therefore,
return the SEAMCALL error code directly to the caller without attempting
to handle it in the core kernel.

Enable tdh_mem_page_demote() only on TDX modules that support the feature
TDX_FEATURES0.ENHANCE_DEMOTE_INTERRUPTIBILITY, which does not return error
TDX_INTERRUPTED_RESTARTABLE on basic TDX (i.e., without TD partition) [2].
This is because error TDX_INTERRUPTED_RESTARTABLE is difficult to handle.
The TDX module provides no guaranteed maximum retry count to ensure
forward progress of the demotion, so interrupt storms could result in a
DoS if the host simply retried endlessly on TDX_INTERRUPTED_RESTARTABLE.
Disabling interrupts before invoking the SEAMCALL also doesn't work,
because NMIs can also trigger TDX_INTERRUPTED_RESTARTABLE.  Therefore, the
tradeoff for basic TDX is to disable the TDX_INTERRUPTED_RESTARTABLE error
entirely, given the reasonable execution time for demotion [1].

Allocate (or dequeue from the cache) PAMT pages when Dynamic PAMT is
enabled, as TDH.MEM.PAGE.DEMOTE takes a DPAMT page pair in R12 and R13 to
store physical memory metadata for the 2MB guest private memory (after a
successful split).  Take care to use seamcall_saved_ret() to handle
registers above R11.  Free the Dynamic PAMT pages if SEAMCALL
TDH_MEM_PAGE_DEMOTE fails, since the guest private memory is then still
mapped at the 2MB level.

Link: https://lore.kernel.org/kvm/99f5585d759328db973403be0713f68e492b492a.camel@intel.com [1]
Link: https://lore.kernel.org/all/fbf04b09f13bc2ce004ac97ee9c1f2c965f44fdf.camel@intel.com [2]
Signed-off-by: Xiaoyao Li
Co-developed-by: Kirill A. Shutemov
Signed-off-by: Kirill A. Shutemov
Co-developed-by: Isaku Yamahata
Signed-off-by: Isaku Yamahata
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
[sean: squash all demote support into a single patch]
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/tdx.h  |  9 +++++++
 arch/x86/virt/vmx/tdx/tdx.c | 54 +++++++++++++++++++++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h |  1 +
 3 files changed, 64 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 50feea01b066..483441de7fe0 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -15,6 +15,7 @@
 /* Bit definitions of TDX_FEATURES0 metadata field */
 #define TDX_FEATURES0_NO_RBP_MOD	BIT_ULL(18)
 #define TDX_FEATURES0_DYNAMIC_PAMT	BIT_ULL(36)
+#define TDX_FEATURES0_ENHANCE_DEMOTE_INTERRUPTIBILITY	BIT_ULL(51)
 
 #ifndef __ASSEMBLER__
 
@@ -140,6 +141,11 @@ static inline bool tdx_supports_dynamic_pamt(const struct tdx_sys_info *sysinfo)
 	return sysinfo->features.tdx_features0 & TDX_FEATURES0_DYNAMIC_PAMT;
 }
 
+static inline bool tdx_supports_demote_nointerrupt(const struct tdx_sys_info *sysinfo)
+{
+	return sysinfo->features.tdx_features0 & TDX_FEATURES0_ENHANCE_DEMOTE_INTERRUPTIBILITY;
+}
+
 /* Simple structure for pre-allocating Dynamic PAMT pages outside of locks. */
 struct tdx_pamt_cache {
 	struct list_head page_list;
@@ -240,6 +246,9 @@ u64 tdh_mng_key_config(struct tdx_td *td);
 u64 tdh_mng_create(struct tdx_td *td, u16 hkid);
 u64 tdh_vp_create(struct tdx_td *td, struct tdx_vp *vp);
 u64 tdh_mng_rd(struct tdx_td *td, u64 field, u64 *data);
+u64 tdh_mem_page_demote(struct tdx_td *td, u64 gpa, enum pg_level level, u64 pfn,
+			struct page *new_sp, struct tdx_pamt_cache *pamt_cache,
+			u64 *ext_err1, u64 *ext_err2);
 u64 tdh_mr_extend(struct tdx_td *td, u64 gpa, u64 *ext_err1, u64 *ext_err2);
 u64 tdh_mr_finalize(struct tdx_td *td);
 u64 tdh_vp_flush(struct tdx_vp *vp);
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index cff325fdec79..823ec092b4e4 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1841,6 +1841,9 @@ u64 tdh_mng_rd(struct tdx_td *td, u64 field, u64 *data)
 }
 EXPORT_SYMBOL_FOR_KVM(tdh_mng_rd);
 
+static int alloc_pamt_array(u64 *pa_array, struct tdx_pamt_cache *cache);
+static void free_pamt_array(u64 *pa_array);
+
 /* Number PAMT pages to be provided to TDX module per 2M region of PA */
 static int tdx_dpamt_entry_pages(void)
 {
@@ -1885,6 +1888,57 @@ static void dpamt_copy_regs_array(struct tdx_module_args *args, void *reg,
 */
 #define MAX_NR_DPAMT_ARGS	(sizeof(struct tdx_module_args) / sizeof(u64))
 
+u64 tdh_mem_page_demote(struct tdx_td *td, u64 gpa, enum pg_level level, u64 pfn,
+			struct page *new_sp, struct tdx_pamt_cache *pamt_cache,
+			u64 *ext_err1, u64 *ext_err2)
+{
+	bool dpamt = tdx_supports_dynamic_pamt(&tdx_sysinfo) && level == PG_LEVEL_2M;
+	u64 pamt_pa_array[MAX_NR_DPAMT_ARGS];
+	struct tdx_module_args args = {
+		.rcx = gpa | pg_level_to_tdx_sept_level(level),
+		.rdx = tdx_tdr_pa(td),
+		.r8 = page_to_phys(new_sp),
+	};
+	u64 ret;
+
+	if (!tdx_supports_demote_nointerrupt(&tdx_sysinfo))
+		return TDX_SW_ERROR;
+
+	if (dpamt) {
+		if (alloc_pamt_array(pamt_pa_array, pamt_cache))
+			return TDX_SW_ERROR;
+
+		dpamt_copy_to_regs(&args, r12, pamt_pa_array);
+	}
+
+	/* Flush the new S-EPT page to be added */
+	tdx_clflush_page(new_sp);
+
+	ret = seamcall_saved_ret(TDH_MEM_PAGE_DEMOTE, &args);
+
+	*ext_err1 = args.rcx;
+	*ext_err2 = args.rdx;
+
+	if (dpamt) {
+		if (ret) {
+			free_pamt_array(pamt_pa_array);
+		} else {
+			/*
+			 * Set the PAMT refcount for the guest private memory,
+			 * i.e. for the hugepage that was just demoted to 512
+			 * smaller pages.
+			 */
+			atomic_t *pamt_refcount;
+
+			pamt_refcount = tdx_find_pamt_refcount(pfn);
+			WARN_ON_ONCE(atomic_cmpxchg_release(pamt_refcount, 0,
+							    PTRS_PER_PMD));
+		}
+	}
+	return ret;
+}
+EXPORT_SYMBOL_FOR_KVM(tdh_mem_page_demote);
+
 u64 tdh_mr_extend(struct tdx_td *td, u64 gpa, u64 *ext_err1, u64 *ext_err2)
 {
 	struct tdx_module_args args = {
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index 096c78a1d438..a6c0fa53ece9 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -24,6 +24,7 @@
 #define TDH_MNG_KEY_CONFIG	8
 #define TDH_MNG_CREATE		9
 #define TDH_MNG_RD		11
+#define TDH_MEM_PAGE_DEMOTE	15
 #define TDH_MR_EXTEND		16
 #define TDH_MR_FINALIZE		17
 #define TDH_VP_FLUSH		18
-- 
2.53.0.rc1.217.geba53bf80e-goog
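[Editor's sketch] The changelog above says the wrapper returns SEAMCALL
errors raw and leaves retry policy to the caller.  A hedged sketch of a
hypothetical caller follows; demote_private_hugepage() is invented for
illustration, and the -EBUSY mapping and TDX_OPERAND_BUSY reference are
an assumed policy, not mandated by this series:

/* Illustrative caller; error policy here is hypothetical. */
static int demote_private_hugepage(struct tdx_td *td, u64 gpa, u64 pfn,
				   struct page *new_sept_page,
				   struct tdx_pamt_cache *pamt_cache)
{
	u64 ext_err1, ext_err2;
	u64 err;

	err = tdh_mem_page_demote(td, gpa, PG_LEVEL_2M, pfn, new_sept_page,
				  pamt_cache, &ext_err1, &ext_err2);
	if (!err)
		return 0;

	/* Feature not supported, or DPAMT page allocation failed. */
	if (err == TDX_SW_ERROR)
		return -EIO;

	/*
	 * The TDX-Module returned an error, e.g. BUSY due to contention on
	 * its internal locks; ext_err1/ext_err2 hold the extended error
	 * info (the S-EPT entry and its level).  Let the caller retry.
	 */
	return -EBUSY;
}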
:reply-to; bh=XitJqErgB2rpX72/d+fDj8zwONic1fb7Um6zwnCWul0=; b=sX0Ohjyyh9DnN4mlPat1WM2i+EtcD0UR41zJiKLVLBOCgpH96xlPJMPb/gEVbEP3z4 9o57ZgG3qeSKhro8KhADynHDDEi97TKs7x1Ncr9MXe2c9sb0cbo4bds6Ld7qRl2F1pP8 QclSKvoJ3zA9kzswaDaDN/d1LtKbd4XAon83rLNFotXI57pju5FHshKGY8wBD/ui6E4c dn67EoJvDzWQy6b0SJYHpUGE3V0wliJ0tZSHHSF/ZH1vocbp/6glAw5Nrkdpdr++Qna1 i3Ec90Oe4YTDJiN077hoaq5l+zl2sKnC+Elrl/WTGyiLcuBbSh4KKHRuHxS+lyGCmLAr thhA== X-Gm-Message-State: AOJu0YxFFTvKikrvjku5t3873rbmjYSmGZOywnjVa+sSDw3FrzG2Tc90 gXep+YdNH1+mN3GLWWP/f6yW+zzQQg8LGxcj9/bU232n87Y/iSZ9stnuc5aC2cw95nH2DwFYVe/ 9GHrK+g== X-Received: from pjbbf7.prod.google.com ([2002:a17:90b:b07:b0:34a:b3a0:78b9]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 2002:a17:90b:538f:b0:34c:6108:bf32 with SMTP id 98e67ed59e1d1-353fedb94b1mr6266892a91.34.1769649386237; Wed, 28 Jan 2026 17:16:26 -0800 (PST) Reply-To: Sean Christopherson Date: Wed, 28 Jan 2026 17:15:03 -0800 In-Reply-To: <20260129011517.3545883-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260129011517.3545883-1-seanjc@google.com> X-Mailer: git-send-email 2.53.0.rc1.217.geba53bf80e-goog Message-ID: <20260129011517.3545883-32-seanjc@google.com> Subject: [RFC PATCH v5 31/45] KVM: x86/mmu: Prevent hugepage promotion for mirror roots in fault path From: Sean Christopherson To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, Kiryl Shutsemau , Sean Christopherson , Paolo Bonzini Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang , Rick Edgecombe , Yan Zhao , Vishal Annapurve , Ackerley Tng , Sagi Shahar , Binbin Wu , Xiaoyao Li , Isaku Yamahata Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Rick Edgecombe Disallow hugepage promotion in the TDP MMU for mirror roots as KVM doesn't currently support promoting S-EPT entries due to the complexity incurred by the TDX-Module's rules for hugepage promotion. - The current TDX-Module requires all 4KB leafs to be either all PENDING or all ACCEPTED before a successful promotion to 2MB. This requirement prevents successful page merging after partially converting a 2MB range from private to shared and then back to private, which is the primary scenario necessitating page promotion. - The TDX-Module effectively requires a break-before-make sequence (to satisfy its TLB flushing rules), i.e. creates a window of time where a different vCPU can encounter faults on a SPTE that KVM is trying to promote to a hugepage. To avoid unexpected BUSY errors, KVM would need to FREEZE the non-leaf SPTE before replacing it with a huge SPTE. Disable hugepage promotion for all map() operations, as supporting page promotion when building the initial image is still non-trivial, and the vast majority of images are ~4MB or less, i.e. the benefit of creating hugepages during TD build time is minimal. 
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
[sean: check root, add comment, rewrite changelog]
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c     |  3 ++-
 arch/x86/kvm/mmu/tdp_mmu.c | 12 +++++++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4ecbf216d96f..45650f70eeab 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3419,7 +3419,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 	    cur_level == fault->goal_level &&
 	    is_shadow_present_pte(spte) &&
 	    !is_large_pte(spte) &&
-	    spte_to_child_sp(spte)->nx_huge_page_disallowed) {
+	    ((spte_to_child_sp(spte)->nx_huge_page_disallowed) ||
+	     is_mirror_sp(spte_to_child_sp(spte)))) {
 		/*
 		 * A small SPTE exists for this pfn, but FNAME(fetch),
 		 * direct_map(), or kvm_tdp_mmu_map() would like to create a
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 01e3e4f4baa5..f8ebdd0c6114 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1222,7 +1222,17 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
 		int r;
 
-		if (fault->nx_huge_page_workaround_enabled)
+		/*
+		 * Don't replace a page table (non-leaf) SPTE with a huge SPTE
+		 * (a.k.a. hugepage promotion) if the NX hugepage workaround is
+		 * enabled, as doing so will cause significant thrashing if one
+		 * or more leaf SPTEs needs to be executable.
+		 *
+		 * Disallow hugepage promotion for mirror roots as KVM doesn't
+		 * (yet) support promoting S-EPT entries while holding mmu_lock
+		 * for read (due to complexity induced by the TDX-Module APIs).
+		 */
+		if (fault->nx_huge_page_workaround_enabled || is_mirror_sp(root))
 			disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
 
 		/*
-- 
2.53.0.rc1.217.geba53bf80e-goog
From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:15:04 -0800
Message-ID: <20260129011517.3545883-33-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 32/45] KVM: x86/mmu: Plumb the old_spte into
 kvm_x86_ops.set_external_spte()
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
 kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve,
 Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Plumb the old SPTE into .set_external_spte() so that the callback can be
used to handle removal and splitting of leaf SPTEs.  Rename mirror_spte
to new_spte to follow the TDP MMU's naming, and to make it more obvious
what value the parameter holds.

Opportunistically tweak the ordering of parameters to match the pattern
of most TDP MMU functions, which do "old, new, level".
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/mmu/tdp_mmu.c      |  4 ++--
 arch/x86/kvm/vmx/tdx.c          | 14 +++++++-------
 3 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a6e4ab76b1b2..67deec8e205e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1857,8 +1857,8 @@ struct kvm_x86_ops {
 	 */
 	unsigned long (*alloc_external_sp)(gfp_t gfp);
 	void (*free_external_sp)(unsigned long addr);
-	int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
-				 u64 mirror_spte);
+	int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, u64 old_spte,
+				 u64 new_spte, enum pg_level level);
 	void (*reclaim_external_sp)(struct kvm *kvm, gfn_t gfn,
 				    struct kvm_mmu_page *sp);
 	void (*remove_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index f8ebdd0c6114..271dd6f875a6 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -614,8 +614,8 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	 * the desired value.  On failure, restore the old SPTE so that
 	 * the SPTE isn't frozen in perpetuity.
 	 */
-	ret = kvm_x86_call(set_external_spte)(kvm, iter->gfn,
-					      iter->level, new_spte);
+	ret = kvm_x86_call(set_external_spte)(kvm, iter->gfn, iter->old_spte,
+					      new_spte, iter->level);
 	if (ret)
 		__kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte);
 	else
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index bd5d902da303..e451acdb0978 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1705,29 +1705,29 @@ static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
-static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
-				     enum pg_level level, u64 mirror_spte)
+static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
+				     u64 new_spte, enum pg_level level)
 {
 	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
-	kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
+	kvm_pfn_t pfn = spte_to_pfn(new_spte);
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
 	int ret;
 
 	if (KVM_BUG_ON(!vcpu, kvm))
 		return -EINVAL;
 
-	if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm))
+	if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
 		return -EIO;
 
-	if (!is_last_spte(mirror_spte, level))
-		return tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte);
+	if (!is_last_spte(new_spte, level))
+		return tdx_sept_link_private_spt(kvm, gfn, level, new_spte);
 
 	/* TODO: handle large pages. */
 	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
 		return -EIO;
 
-	WARN_ON_ONCE((mirror_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
+	WARN_ON_ONCE((new_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
 
 	ret = tdx_pamt_get(pfn, level, &tdx->pamt_cache);
 	if (ret)
-- 
2.53.0.rc1.217.geba53bf80e-goog
From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:15:05 -0800
Message-ID: <20260129011517.3545883-34-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 33/45] KVM: TDX: Hoist tdx_sept_remove_private_spte()
 above set_private_spte()
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
 kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve,
 Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Move tdx_sept_remove_private_spte() (and its tdx_track() helper) above
tdx_sept_set_private_spte() in anticipation of routing all non-atomic
S-EPT writes (with the exception of reclaiming non-leaf pages) through
the "set" API.

No functional change intended.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/vmx/tdx.c | 194 ++++++++++++++++++++---------------------
 1 file changed, 97 insertions(+), 97 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index e451acdb0978..0f3d27699a3d 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1670,6 +1670,52 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
+/*
+ * Ensure shared and private EPTs to be flushed on all vCPUs.
+ * tdh_mem_track() is the only caller that increases TD epoch. An increase in
+ * the TD epoch (e.g., to value "N + 1") is successful only if no vCPUs are
+ * running in guest mode with the value "N - 1".
+ *
+ * A successful execution of tdh_mem_track() ensures that vCPUs can only run in
+ * guest mode with TD epoch value "N" if no TD exit occurs after the TD epoch
+ * being increased to "N + 1".
+ *
+ * Kicking off all vCPUs after that further results in no vCPUs can run in guest
+ * mode with TD epoch value "N", which unblocks the next tdh_mem_track() (e.g.
+ * to increase TD epoch to "N + 2").
+ *
+ * TDX module will flush EPT on the next TD enter and make vCPUs to run in
+ * guest mode with TD epoch value "N + 1".
+ *
+ * kvm_make_all_cpus_request() guarantees all vCPUs are out of guest mode by
+ * waiting empty IPI handler ack_kick().
+ *
+ * No action is required to the vCPUs being kicked off since the kicking off
+ * occurs certainly after TD epoch increment and before the next
+ * tdh_mem_track().
+ */
+static void tdx_track(struct kvm *kvm)
+{
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	u64 err;
+
+	/* If TD isn't finalized, it's before any vcpu running. */
+	if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE))
+		return;
+
+	/*
+	 * The full sequence of TDH.MEM.TRACK and forcing vCPUs out of guest
+	 * mode must be serialized, as TDH.MEM.TRACK will fail if the previous
+	 * tracking epoch hasn't completed.
+	 */
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	err = tdh_do_no_vcpus(tdh_mem_track, kvm, &kvm_tdx->td);
+	TDX_BUG_ON(err, TDH_MEM_TRACK, kvm);
+
+	kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE);
+}
+
 static struct page *tdx_spte_to_external_spt(struct kvm *kvm, gfn_t gfn,
 					     u64 new_spte, enum pg_level level)
 {
@@ -1705,6 +1751,57 @@ static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
+static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
+					 enum pg_level level, u64 mirror_spte)
+{
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
+	gpa_t gpa = gfn_to_gpa(gfn);
+	u64 err, entry, level_state;
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	/*
+	 * HKID is released after all private pages have been removed, and set
+	 * before any might be populated. Warn if zapping is attempted when
+	 * there can't be anything populated in the private EPT.
+	 */
+	if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm))
+		return;
+
+	/* TODO: handle large pages. */
+	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
+		return;
+
+	err = tdh_do_no_vcpus(tdh_mem_range_block, kvm, &kvm_tdx->td, gpa,
+			      level, &entry, &level_state);
+	if (TDX_BUG_ON_2(err, TDH_MEM_RANGE_BLOCK, entry, level_state, kvm))
+		return;
+
+	/*
+	 * TDX requires TLB tracking before dropping private page. Do
+	 * it here, although it is also done later.
+	 */
+	tdx_track(kvm);
+
+	/*
+	 * When zapping private page, write lock is held. So no race condition
+	 * with other vcpu sept operation.
+	 * Race with TDH.VP.ENTER due to (0-step mitigation) and Guest TDCALLs.
+	 */
+	err = tdh_do_no_vcpus(tdh_mem_page_remove, kvm, &kvm_tdx->td, gpa,
+			      level, &entry, &level_state);
+	if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_REMOVE, entry, level_state, kvm))
+		return;
+
+	err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, pfn, level);
+	if (TDX_BUG_ON(err, TDH_PHYMEM_PAGE_WBINVD, kvm))
+		return;
+
+	__tdx_quirk_reset_page(pfn, level);
+	tdx_pamt_put(pfn, level);
+}
+
 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
 				     u64 new_spte, enum pg_level level)
 {
@@ -1756,52 +1853,6 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
 		ret = tdx_mem_page_add(kvm, gfn, level, pfn);
 
 	if (ret)
 		tdx_pamt_put(pfn, level);
 
 	return ret;
 }
 
-/*
- * Ensure shared and private EPTs to be flushed on all vCPUs.
- * tdh_mem_track() is the only caller that increases TD epoch. An increase in
- * the TD epoch (e.g., to value "N + 1") is successful only if no vCPUs are
- * running in guest mode with the value "N - 1".
- *
- * A successful execution of tdh_mem_track() ensures that vCPUs can only run in
- * guest mode with TD epoch value "N" if no TD exit occurs after the TD epoch
- * being increased to "N + 1".
- *
- * Kicking off all vCPUs after that further results in no vCPUs can run in guest
- * mode with TD epoch value "N", which unblocks the next tdh_mem_track() (e.g.
- * to increase TD epoch to "N + 2").
- *
- * TDX module will flush EPT on the next TD enter and make vCPUs to run in
- * guest mode with TD epoch value "N + 1".
- *
- * kvm_make_all_cpus_request() guarantees all vCPUs are out of guest mode by
- * waiting empty IPI handler ack_kick().
- *
- * No action is required to the vCPUs being kicked off since the kicking off
- * occurs certainly after TD epoch increment and before the next
- * tdh_mem_track().
- */
-static void tdx_track(struct kvm *kvm)
-{
-	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
-	u64 err;
-
-	/* If TD isn't finalized, it's before any vcpu running. */
-	if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE))
-		return;
-
-	/*
-	 * The full sequence of TDH.MEM.TRACK and forcing vCPUs out of guest
-	 * mode must be serialized, as TDH.MEM.TRACK will fail if the previous
-	 * tracking epoch hasn't completed.
-	 */
-	lockdep_assert_held_write(&kvm->mmu_lock);
-
-	err = tdh_do_no_vcpus(tdh_mem_track, kvm, &kvm_tdx->td);
-	TDX_BUG_ON(err, TDH_MEM_TRACK, kvm);
-
-	kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE);
-}
-
 static void tdx_sept_reclaim_private_sp(struct kvm *kvm, gfn_t gfn,
 					struct kvm_mmu_page *sp)
 {
@@ -1824,57 +1875,6 @@ static void tdx_sept_reclaim_private_sp(struct kvm *kvm, gfn_t gfn,
 	sp->external_spt = NULL;
 }
 
-static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
-					 enum pg_level level, u64 mirror_spte)
-{
-	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
-	kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
-	gpa_t gpa = gfn_to_gpa(gfn);
-	u64 err, entry, level_state;
-
-	lockdep_assert_held_write(&kvm->mmu_lock);
-
-	/*
-	 * HKID is released after all private pages have been removed, and set
-	 * before any might be populated. Warn if zapping is attempted when
-	 * there can't be anything populated in the private EPT.
-	 */
-	if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm))
-		return;
-
-	/* TODO: handle large pages. */
-	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
-		return;
-
-	err = tdh_do_no_vcpus(tdh_mem_range_block, kvm, &kvm_tdx->td, gpa,
-			      level, &entry, &level_state);
-	if (TDX_BUG_ON_2(err, TDH_MEM_RANGE_BLOCK, entry, level_state, kvm))
-		return;
-
-	/*
-	 * TDX requires TLB tracking before dropping private page. Do
-	 * it here, although it is also done later.
-	 */
-	tdx_track(kvm);
-
-	/*
-	 * When zapping private page, write lock is held. So no race condition
-	 * with other vcpu sept operation.
-	 * Race with TDH.VP.ENTER due to (0-step mitigation) and Guest TDCALLs.
-	 */
-	err = tdh_do_no_vcpus(tdh_mem_page_remove, kvm, &kvm_tdx->td, gpa,
-			      level, &entry, &level_state);
-	if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_REMOVE, entry, level_state, kvm))
-		return;
-
-	err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, pfn, level);
-	if (TDX_BUG_ON(err, TDH_PHYMEM_PAGE_WBINVD, kvm))
-		return;
-
-	__tdx_quirk_reset_page(pfn, level);
-	tdx_pamt_put(pfn, level);
-}
-
 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 			   int trig_mode, int vector)
 {
-- 
2.53.0.rc1.217.geba53bf80e-goog
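[Editor's sketch] The ordering contract that tdx_track()'s block comment
describes is easiest to see as the sequence the removal path follows.  A
condensed paraphrase of the code moved in the patch above, for a 4KB page
(remove_private_4k_page() is an invented name; error handling is reduced
to bare -EIO for brevity):

/* Condensed paraphrase of tdx_sept_remove_private_spte(); not verbatim. */
static int remove_private_4k_page(struct kvm *kvm, struct kvm_tdx *kvm_tdx,
				  gpa_t gpa, kvm_pfn_t pfn)
{
	u64 err, entry, level_state;

	/* 1. BLOCK: no new S-EPT translations can be created for the range. */
	err = tdh_do_no_vcpus(tdh_mem_range_block, kvm, &kvm_tdx->td, gpa,
			      PG_LEVEL_4K, &entry, &level_state);
	if (err)
		return -EIO;

	/*
	 * 2. TRACK + kick: bump the TD epoch and force all vCPUs out of
	 *    guest mode, guaranteeing stale TLB entries are flushed before
	 *    the page is actually removed.
	 */
	tdx_track(kvm);

	/* 3. REMOVE: safe only after the TLB tracking above. */
	err = tdh_do_no_vcpus(tdh_mem_page_remove, kvm, &kvm_tdx->td, gpa,
			      PG_LEVEL_4K, &entry, &level_state);
	if (err)
		return -EIO;

	/* 4. Scrub: WBINVD with the TD's HKID, then reset the page. */
	err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, pfn, PG_LEVEL_4K);
	if (err)
		return -EIO;

	__tdx_quirk_reset_page(pfn, PG_LEVEL_4K);
	tdx_pamt_put(pfn, PG_LEVEL_4K);
	return 0;
}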
From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:15:06 -0800
Message-ID: <20260129011517.3545883-35-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Subject: [RFC PATCH v5 34/45] KVM: TDX: Handle removal of leaf SPTEs in
 .set_private_spte()
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
 kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve,
 Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Drop kvm_x86_ops.remove_external_spte(), and instead handle the removal
of leaf SPTEs in the S-EPT (a.k.a. external root) in .set_private_spte().
This will allow extending tdx_sept_set_private_spte() to support splitting
a huge S-EPT entry without needing yet another kvm_x86_ops hook.

Bug the VM if the callback fails, as redundant KVM_BUG_ON() calls are
benign (the WARN will fire if and only if the VM isn't already bugged)
and handle_changed_spte() is most definitely not prepared to handle
failure.
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 -
 arch/x86/include/asm/kvm_host.h    |  2 --
 arch/x86/kvm/mmu/tdp_mmu.c         | 20 +++++++++++---------
 arch/x86/kvm/vmx/tdx.c             | 21 ++++++++++++---------
 4 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 394dc29483a7..3ca56fe6b951 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -97,7 +97,6 @@ KVM_X86_OP(load_mmu_pgd)
 KVM_X86_OP_OPTIONAL(alloc_external_sp)
 KVM_X86_OP_OPTIONAL(free_external_sp)
 KVM_X86_OP_OPTIONAL_RET0(set_external_spte)
-KVM_X86_OP_OPTIONAL(remove_external_spte)
 KVM_X86_OP_OPTIONAL(reclaim_external_sp)
 KVM_X86_OP_OPTIONAL_RET0(topup_external_cache)
 KVM_X86_OP(has_wbinvd_exit)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 67deec8e205e..385f1cf32d70 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1861,8 +1861,6 @@ struct kvm_x86_ops {
 				 u64 new_spte, enum pg_level level);
 	void (*reclaim_external_sp)(struct kvm *kvm, gfn_t gfn,
 				    struct kvm_mmu_page *sp);
-	void (*remove_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
-				     u64 mirror_spte);
 	int (*topup_external_cache)(struct kvm_vcpu *vcpu, int min);
 
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 271dd6f875a6..d49aecba18d8 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -559,20 +559,22 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
 	 * SPTE being converted to a hugepage (leaf) or being zapped.  Shadow
 	 * pages are kernel allocations and should never be migrated.
 	 *
-	 * When removing leaf entries from a mirror, immediately propagate the
-	 * changes to the external page tables.  Note, non-leaf mirror entries
-	 * are handled by handle_removed_pt(), as TDX requires that all leaf
-	 * entries are removed before the owning page table.  Note #2, writes
-	 * to make mirror PTEs shadow-present are propagated to external page
-	 * tables by __tdp_mmu_set_spte_atomic(), as KVM needs to ensure the
-	 * external page table was successfully updated before marking the
-	 * mirror SPTE present.
+	 * When modifying leaf entries in mirrored page tables, propagate the
+	 * changes to the external SPTE.  Bug the VM on failure, as callers
+	 * aren't prepared to handle errors, e.g. due to lock contention in the
+	 * TDX-Module.  Note, changes to non-leaf mirror SPTEs are handled by
+	 * handle_removed_pt() (the TDX-Module requires that child entries are
+	 * removed before the parent SPTE), and changes to non-present mirror
+	 * SPTEs are handled by __tdp_mmu_set_spte_atomic() (KVM needs to set
+	 * the external SPTE while the mirror SPTE is frozen so that installing
+	 * a new SPTE is effectively an atomic operation).
	 */
 	if (was_present && !was_leaf &&
 	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
 		handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
 	else if (was_leaf && is_mirror_sptep(sptep) && !is_leaf)
-		kvm_x86_call(remove_external_spte)(kvm, gfn, level, old_spte);
+		KVM_BUG_ON(kvm_x86_call(set_external_spte)(kvm, gfn, old_spte,
+							    new_spte, level), kvm);
 }
 
 static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 0f3d27699a3d..9f7789c5f0a7 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1751,11 +1751,11 @@ static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
-static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
-					 enum pg_level level, u64 mirror_spte)
+static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
+					enum pg_level level, u64 old_spte)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
-	kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
+	kvm_pfn_t pfn = spte_to_pfn(old_spte);
 	gpa_t gpa = gfn_to_gpa(gfn);
 	u64 err, entry, level_state;
 
@@ -1767,16 +1767,16 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	 * there can't be anything populated in the private EPT.
 	 */
 	if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm))
-		return;
+		return -EIO;
 
 	/* TODO: handle large pages. */
 	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
-		return;
+		return -EIO;
 
 	err = tdh_do_no_vcpus(tdh_mem_range_block, kvm, &kvm_tdx->td, gpa,
 			      level, &entry, &level_state);
 	if (TDX_BUG_ON_2(err, TDH_MEM_RANGE_BLOCK, entry, level_state, kvm))
-		return;
+		return -EIO;
 
 	/*
 	 * TDX requires TLB tracking before dropping private page. Do
@@ -1792,14 +1792,15 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	err = tdh_do_no_vcpus(tdh_mem_page_remove, kvm, &kvm_tdx->td, gpa,
 			      level, &entry, &level_state);
 	if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_REMOVE, entry, level_state, kvm))
-		return;
+		return -EIO;
 
 	err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, pfn, level);
 	if (TDX_BUG_ON(err, TDH_PHYMEM_PAGE_WBINVD, kvm))
-		return;
+		return -EIO;
 
 	__tdx_quirk_reset_page(pfn, level);
 	tdx_pamt_put(pfn, level);
+	return 0;
 }
 
 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
@@ -1811,6 +1812,9 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
 	int ret;
 
+	if (is_shadow_present_pte(old_spte))
+		return tdx_sept_remove_private_spte(kvm, gfn, level, old_spte);
+
 	if (KVM_BUG_ON(!vcpu, kvm))
 		return -EINVAL;
 
@@ -3639,7 +3643,6 @@ void __init tdx_hardware_setup(void)
 
 	vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
 	vt_x86_ops.reclaim_external_sp = tdx_sept_reclaim_private_sp;
-	vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
 
 	/*
	 * FIXME: Wire up the PAMT hook iff DPAMT is supported, once VMXON is
-- 
2.53.0.rc1.217.geba53bf80e-goog
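[Editor's sketch] With removal folded in, .set_private_spte() is now
effectively a dispatcher keyed on the (old, new) SPTE pair.  A condensed
paraphrase of the dispatch as it stands after this patch (not verbatim;
set_private_spte_dispatch() and map_leaf() are invented names, and the
leaf-mapping tail is elided since it is reworked in the next patch):

/*
 * Paraphrased dispatch:
 *
 *   old present                -> remove the leaf from the S-EPT
 *   old !present, new non-leaf -> link a new S-EPT page table
 *   old !present, new leaf     -> PAGE.ADD / PAGE.AUG the page
 */
static int set_private_spte_dispatch(struct kvm *kvm, gfn_t gfn, u64 old_spte,
				     u64 new_spte, enum pg_level level)
{
	if (is_shadow_present_pte(old_spte))
		return tdx_sept_remove_private_spte(kvm, gfn, level, old_spte);

	if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
		return -EIO;

	if (!is_last_spte(new_spte, level))
		return tdx_sept_link_private_spt(kvm, gfn, level, new_spte);

	return map_leaf(kvm, gfn, new_spte, level);	/* elided */
}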
From nobody Sat Feb 7 17:41:40 2026
From: Sean Christopherson
Date: Wed, 28 Jan 2026 17:15:07 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Message-ID: <20260129011517.3545883-36-seanjc@google.com>
Subject: [RFC PATCH v5 35/45] KVM: TDX: Add helper to handle mapping leaf
 SPTE into S-EPT
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
 kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve,
 Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Add a helper, tdx_sept_map_leaf_spte(), to wrap and isolate PAGE.ADD and
PAGE.AUG operations, and thus complete tdx_sept_set_private_spte()'s
transition into a "dispatch" routine for setting/writing S-EPT entries.

Opportunistically tweak the prototypes for tdx_sept_remove_private_spte()
and tdx_sept_link_private_spt() to align with tdx_sept_set_private_spte()
and tdx_sept_map_leaf_spte().

No functional change intended.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/vmx/tdx.c | 97 ++++++++++++++++++++++--------------------
 1 file changed, 51 insertions(+), 46 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 9f7789c5f0a7..e6ac4aca8114 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1670,6 +1670,50 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
+static int tdx_sept_map_leaf_spte(struct kvm *kvm, gfn_t gfn, u64 new_spte,
+				  enum pg_level level)
+{
+	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	kvm_pfn_t pfn = spte_to_pfn(new_spte);
+	int ret;
+
+	/* TODO: handle large pages. */
+	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
+		return -EIO;
+
+	if (KVM_BUG_ON(!vcpu, kvm))
+		return -EINVAL;
+
+	WARN_ON_ONCE((new_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
+
+	ret = tdx_pamt_get(pfn, level, &to_tdx(vcpu)->pamt_cache);
+	if (ret)
+		return ret;
+
+	/*
+	 * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory()
+	 * before kvm_tdx->state. Userspace must not be allowed to pre-fault
+	 * arbitrary memory until the initial memory image is finalized. Pairs
+	 * with the smp_wmb() in tdx_td_finalize().
+	 */
+	smp_rmb();
+
+	/*
+	 * If the TD isn't finalized/runnable, then userspace is initializing
+	 * the VM image via KVM_TDX_INIT_MEM_REGION; ADD the page to the TD.
+	 */
+	if (likely(kvm_tdx->state == TD_STATE_RUNNABLE))
+		ret = tdx_mem_page_aug(kvm, gfn, level, pfn);
+	else
+		ret = tdx_mem_page_add(kvm, gfn, level, pfn);
+
+	if (ret)
+		tdx_pamt_put(pfn, level);
+
+	return ret;
+}
+
 /*
  * Ensure shared and private EPTs to be flushed on all vCPUs.
  * tdh_mem_track() is the only caller that increases TD epoch. An increase in
@@ -1729,14 +1773,14 @@ static struct page *tdx_spte_to_external_spt(struct kvm *kvm, gfn_t gfn,
 	return virt_to_page(sp->external_spt);
 }
 
-static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
-				     enum pg_level level, u64 mirror_spte)
+static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, u64 new_spte,
+				     enum pg_level level)
 {
 	gpa_t gpa = gfn_to_gpa(gfn);
 	u64 err, entry, level_state;
 	struct page *external_spt;
 
-	external_spt = tdx_spte_to_external_spt(kvm, gfn, mirror_spte, level);
+	external_spt = tdx_spte_to_external_spt(kvm, gfn, new_spte, level);
 	if (!external_spt)
 		return -EIO;
 
@@ -1752,7 +1796,7 @@ static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
 }
 
 static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
-					enum pg_level level, u64 old_spte)
+					u64 old_spte, enum pg_level level)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	kvm_pfn_t pfn = spte_to_pfn(old_spte);
@@ -1806,55 +1850,16 @@ static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
 				     u64 new_spte, enum pg_level level)
 {
-	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
-	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
-	kvm_pfn_t pfn = spte_to_pfn(new_spte);
-	struct vcpu_tdx *tdx = to_tdx(vcpu);
-	int ret;
-
 	if (is_shadow_present_pte(old_spte))
-		return tdx_sept_remove_private_spte(kvm, gfn, level, old_spte);
-
-	if (KVM_BUG_ON(!vcpu, kvm))
-		return -EINVAL;
+		return tdx_sept_remove_private_spte(kvm, gfn, old_spte, level);
 
 	if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
 		return -EIO;
 
 	if (!is_last_spte(new_spte, level))
-		return tdx_sept_link_private_spt(kvm, gfn, level, new_spte);
+		return tdx_sept_link_private_spt(kvm, gfn, new_spte, level);
 
-	/* TODO: handle large pages. */
-	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
-		return -EIO;
-
-	WARN_ON_ONCE((new_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
-
-	ret = tdx_pamt_get(pfn, level, &tdx->pamt_cache);
-	if (ret)
-		return ret;
-
-	/*
-	 * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory()
-	 * before kvm_tdx->state. Userspace must not be allowed to pre-fault
-	 * arbitrary memory until the initial memory image is finalized. Pairs
-	 * with the smp_wmb() in tdx_td_finalize().
-	 */
-	smp_rmb();
-
-	/*
-	 * If the TD isn't finalized/runnable, then userspace is initializing
-	 * the VM image via KVM_TDX_INIT_MEM_REGION; ADD the page to the TD.
- */ - if (likely(kvm_tdx->state =3D=3D TD_STATE_RUNNABLE)) - ret =3D tdx_mem_page_aug(kvm, gfn, level, pfn); - else - ret =3D tdx_mem_page_add(kvm, gfn, level, pfn); - - if (ret) - tdx_pamt_put(pfn, level); - - return ret; + return tdx_sept_map_leaf_spte(kvm, gfn, new_spte, level); } =20 static void tdx_sept_reclaim_private_sp(struct kvm *kvm, gfn_t gfn, --=20 2.53.0.rc1.217.geba53bf80e-goog From nobody Sat Feb 7 17:41:40 2026 Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C461B36CE0D for ; Thu, 29 Jan 2026 01:16:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649398; cv=none; b=p3O8imwGbPg6NkBhcQt7m7fHE+3VCUHoAwgHfoEQvUzKLimc+o/cBBM2QKRHNVADPBtr2JqkybyGNPHdsyFsTWdWWIf/o+gCXauXeuPHmHjCgE8nvIBuRBCVmOWxYYz5yWzifcAXYRUFyzeH9Dyzz6DvchjQ3RR/s9HQTHeYQa4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649398; c=relaxed/simple; bh=ZA7lO6Iqcxv2nIH6jvLo0wih4TGT7nAoCV3mCXKZU8Q=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=ebuD3JM/MxWV0Gg3YP0L80pLKo2puuLKl+zcOfijFqPx1oJ0ZaFlBE+mEc40PzEGJwCSg59xkdmDUCIEvs/5h5An2CHhpRVDBeoUNFc23dlD6BaYU8sjZi0nnS6Ka7u9BIPaGrL2oHnqVoAADUhVPfXaIMW6gfH+SGu4zx+B2/o= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=nhqDFrpt; arc=none smtp.client-ip=209.85.215.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="nhqDFrpt" Received: by mail-pg1-f202.google.com with SMTP id 41be03b00d2f7-c6133262e4eso233408a12.2 for ; Wed, 28 Jan 2026 17:16:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1769649396; x=1770254196; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=yjclvkjmE1zw/GHq/jW5Y/EQUpO3A2fBpqR5d3bbX+8=; b=nhqDFrptFZSlTjl5ZcbC29HI4IjTrLVEH06tZSxjzl9PrBFvmlrkxxiF8cbIl1Im1X eTh72iMiNYC/MAcrzvZ0sCehqpIj94BMP4k/fZIFTr34s+Co8LArZO6xMjr8rNLyEnnH upoyXiPxcvxyHFRi0wUL4ZovSaf2gSOq4ei4mQUvoJkqqlWR1V37Hv8QIMacC9T3VV5b pRSlssFfjakI+Sn6hWWA7EFKuMH4Dq0TbmfSX0uGz3rsuWjH3JyXAFzlZvcaAim1tpLc TsNBN9u0IJue1UJtuJHgDgWf06h9o0BaegCbrPQKAdkii2qkufZyIvoav7vpqFnUBHk2 L17A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769649396; x=1770254196; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=yjclvkjmE1zw/GHq/jW5Y/EQUpO3A2fBpqR5d3bbX+8=; b=a1ldzw07cHSihsVyh7UA4tAZ9QeqSkcrnVlQvKfPs9g8ivgEE/9dPvPVHohk/ooAF9 rHyH6lz4x1o3jplPrRCiU9EUzp2eL0Ewle1OIkcgubOpCcYwJAgzvU1OMfyrBVBgdDCw Hq08BDu7aH6CVQntaUq8FelA3HjiKlxMH7/BHGc5lTBCcsmdqClgZ69rePNR8VO//sC/ TDlRXuK6vxK2Jic+sfv/FQQXVCmbiC7nvWVCuODDymMP7wGOdEtVHIAX5MHlgsy/5do4 
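A note on the memory-barrier pairing in tdx_sept_map_leaf_spte(): the
smp_rmb() pairs with an smp_wmb() in tdx_td_finalize(), which is not shown
in this excerpt. A minimal sketch of the assumed publish/consume ordering
follows; the helper names and exact field accesses are illustrative, not
the actual KVM code.

static void writer_td_finalize(struct kvm_tdx *kvm_tdx)
{
	kvm_tdx->state = TD_STATE_RUNNABLE;
	/* Publish the new TD state before opening the pre-fault gate. */
	smp_wmb();
	kvm_tdx->kvm.arch.pre_fault_allowed = true;
}

static bool reader_should_aug(struct kvm_tdx *kvm_tdx)
{
	/*
	 * The pre-fault path has already observed pre_fault_allowed == true;
	 * ensure kvm_tdx->state is not read earlier than that observation.
	 */
	smp_rmb();
	return kvm_tdx->state == TD_STATE_RUNNABLE;	/* AUG, else ADD */
}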
From nobody Sat Feb 7 17:41:40 2026
Message-ID: <20260129011517.3545883-37-seanjc@google.com>
Date: Wed, 28 Jan 2026 17:15:08 -0800
From: Sean Christopherson
Subject: [RFC PATCH v5 36/45] KVM: TDX: Move S-EPT page demotion TODO to tdx_sept_set_private_spte()

Now that handle_changed_spte() can handle all mirror SPTE updates, move
the TDP MMU's assertion that it doesn't replace a shadow-present mirror
SPTE with another shadow-present SPTE into TDX, in the form of a TODO
that calls out that KVM needs to add support for splitting/demoting
hugepages.

Drop the "!is_leaf" condition so that an unexpected/unsupported update to
a shadow-present S-EPT entry triggers a KVM_BUG_ON(), versus being
silently ignored (well, silent until it causes explosions in the future).

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 9 +--------
 arch/x86/kvm/vmx/tdx.c     | 5 ++++-
 2 files changed, 5 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index d49aecba18d8..3b0da898824a 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -572,7 +572,7 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
 	if (was_present && !was_leaf &&
 	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
 		handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
-	else if (was_leaf && is_mirror_sptep(sptep) && !is_leaf)
+	else if (was_leaf && is_mirror_sptep(sptep))
 		KVM_BUG_ON(kvm_x86_call(set_external_spte)(kvm, gfn, old_spte,
 							   new_spte, level), kvm);
 }
@@ -704,13 +704,6 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
 
 	handle_changed_spte(kvm, as_id, sptep, gfn, old_spte, new_spte, level, false);
 
-	/*
-	 * Users that do non-atomic setting of PTEs don't operate on mirror
-	 * roots. Bug the VM as this path doesn't propagate such writes to the
-	 * external page tables.
-	 */
-	KVM_BUG_ON(is_mirror_sptep(sptep) && is_shadow_present_pte(new_spte), kvm);
-
 	return old_spte;
 }
 
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index e6ac4aca8114..59b7ba36d3d9 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1850,7 +1850,10 @@ static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
 				     u64 new_spte, enum pg_level level)
 {
-	if (is_shadow_present_pte(old_spte))
+	/* TODO: Support replacing a huge SPTE with a non-leaf SPTE (a.k.a. demotion). */
+	if (KVM_BUG_ON(is_shadow_present_pte(old_spte) && is_shadow_present_pte(new_spte), kvm))
+		return -EIO;
+	else if (is_shadow_present_pte(old_spte))
 		return tdx_sept_remove_private_spte(kvm, gfn, old_spte, level);
 
 	if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
-- 
2.53.0.rc1.217.geba53bf80e-goog
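For reference, after this patch tdx_sept_set_private_spte() covers every
(old_spte, new_spte) shadow-present combination. A condensed sketch of the
resulting dispatch, illustrative rather than verbatim kernel code:

enum sept_op { SEPT_REMOVE, SEPT_LINK, SEPT_MAP_LEAF, SEPT_BUG };

static enum sept_op sept_dispatch(bool old_present, bool new_present,
				  bool new_is_leaf)
{
	if (old_present && new_present)
		return SEPT_BUG;	/* replace-in-place, i.e. demotion: TODO */
	if (old_present)
		return SEPT_REMOVE;	/* present -> not-present: zap */
	if (!new_present)
		return SEPT_BUG;	/* not-present -> not-present: nonsensical */
	return new_is_leaf ? SEPT_MAP_LEAF : SEPT_LINK;
}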
From nobody Sat Feb 7 17:41:40 2026
Message-ID: <20260129011517.3545883-38-seanjc@google.com>
Date: Wed, 28 Jan 2026 17:15:09 -0800
From: Sean Christopherson
Subject: [RFC PATCH v5 37/45] KVM: x86/tdp_mmu: Alloc external_spt page for mirror page table splitting

From: Isaku Yamahata

Enhance tdp_mmu_alloc_sp_for_split() to also allocate an external page
table page when splitting a mirror page table.

Signed-off-by: Isaku Yamahata
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
[sean: use kvm_x86_ops.alloc_external_sp()]
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3b0da898824a..4f5b80f0ca03 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1447,7 +1447,7 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm,
 	return spte_set;
 }
 
-static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(void)
+static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct tdp_iter *iter)
 {
 	struct kvm_mmu_page *sp;
 
@@ -1461,6 +1461,15 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(void)
 		return NULL;
 	}
 
+	if (is_mirror_sptep(iter->sptep)) {
+		sp->external_spt = (void *)kvm_x86_call(alloc_external_sp)(GFP_KERNEL_ACCOUNT);
+		if (!sp->external_spt) {
+			free_page((unsigned long)sp->spt);
+			kmem_cache_free(mmu_page_header_cache, sp);
+			return NULL;
+		}
+	}
+
 	return sp;
 }
 
@@ -1540,7 +1549,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
 	else
 		write_unlock(&kvm->mmu_lock);
 
-	sp = tdp_mmu_alloc_sp_for_split();
+	sp = tdp_mmu_alloc_sp_for_split(&iter);
 
 	if (shared)
 		read_lock(&kvm->mmu_lock);
-- 
2.53.0.rc1.217.geba53bf80e-goog
From nobody Sat Feb 7 17:41:40 2026
Message-ID: <20260129011517.3545883-39-seanjc@google.com>
Date: Wed, 28 Jan 2026 17:15:10 -0800
From: Sean Christopherson
Subject: [RFC PATCH v5 38/45] KVM: x86/mmu: Add Dynamic PAMT support in TDP MMU for vCPU-induced page split

Extend the TDP MMU to support vCPU-induced hugepage splits in mirror
roots when Dynamic PAMT is enabled, i.e. top up the PAMT cache when
allocating a new child page table so that, if the split is successful,
there will be a PAMT page waiting to be associated with the new, smaller
mapping.

Note, the allocation is for the guest memory, not the S-EPT page, as the
S-EPT page's PAMT pages are accounted up front by .alloc_external_sp().

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c | 25 ++++++++++++++++---------
 arch/x86/kvm/vmx/tdx.c     |  3 +++
 2 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 4f5b80f0ca03..e32034bfca5a 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1456,21 +1456,28 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct tdp_iter *iter)
 		return NULL;
 
 	sp->spt = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT);
-	if (!sp->spt) {
-		kmem_cache_free(mmu_page_header_cache, sp);
-		return NULL;
-	}
+	if (!sp->spt)
+		goto err_spt;
 
 	if (is_mirror_sptep(iter->sptep)) {
 		sp->external_spt = (void *)kvm_x86_call(alloc_external_sp)(GFP_KERNEL_ACCOUNT);
-		if (!sp->external_spt) {
-			free_page((unsigned long)sp->spt);
-			kmem_cache_free(mmu_page_header_cache, sp);
-			return NULL;
-		}
+		if (!sp->external_spt)
+			goto err_external_spt;
+
+		if (kvm_x86_call(topup_external_cache)(kvm_get_running_vcpu(), 1))
+			goto err_external_split;
 	}
 
 	return sp;
+
+err_external_split:
+	kvm_x86_call(free_external_sp)((unsigned long)sp->external_spt);
+err_external_spt:
+	free_page((unsigned long)sp->spt);
+err_spt:
+	kmem_cache_free(mmu_page_header_cache, sp);
+	return NULL;
 }
 
 /* Note, the caller is responsible for initializing @sp. */
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 59b7ba36d3d9..e90610540a0b 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1625,6 +1625,9 @@ static int tdx_topup_external_pamt_cache(struct kvm_vcpu *vcpu, int min)
 	if (!tdx_supports_dynamic_pamt(tdx_sysinfo))
 		return 0;
 
+	if (WARN_ON_ONCE(!vcpu))
+		return -EIO;
+
 	return tdx_topup_pamt_cache(&to_tdx(vcpu)->pamt_cache, min);
 }
 
-- 
2.53.0.rc1.217.geba53bf80e-goog
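The top-up exists so that the later consume step cannot fail once mmu_lock
is held. The PAMT cache's exact type isn't visible in this excerpt; a
sketch of the general KVM pattern, assuming the generic
kvm_mmu_memory_cache helpers:

/* Fill the cache while sleeping/allocating is still allowed. */
static int prepare_for_split(struct kvm_mmu_memory_cache *cache, int min)
{
	return kvm_mmu_topup_memory_cache(cache, min);	/* before taking mmu_lock */
}

/* Consume an entry in a context that must not fail. */
static void *consume_for_split(struct kvm_mmu_memory_cache *cache)
{
	return kvm_mmu_memory_cache_alloc(cache);	/* after a successful top-up */
}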
From nobody Sat Feb 7 17:41:40 2026
Message-ID: <20260129011517.3545883-40-seanjc@google.com>
Date: Wed, 28 Jan 2026 17:15:11 -0800
From: Sean Christopherson
Subject: [RFC PATCH v5 39/45] KVM: TDX: Add core support for splitting/demoting 2MiB S-EPT to 4KiB

From: Yan Zhao

Add support for splitting, a.k.a. demoting, a 2MiB S-EPT hugepage to its
512 constituent 4KiB pages. As per the TDX-Module rules, first invoke
MEM.RANGE.BLOCK to put the huge S-EPT entry into a splittable state, then
do MEM.TRACK and kick all vCPUs outside of guest mode to flush TLBs, and
finally do MEM.PAGE.DEMOTE to demote/split the huge S-EPT entry.

Assert that mmu_lock is held for write, as the BLOCK => TRACK => DEMOTE
sequence needs to be "atomic" to guarantee success (and because mmu_lock
must be held for write to use tdh_do_no_vcpus()).

Note, even with kvm->mmu_lock held for write, tdh_mem_page_demote() may
contend with tdh_vp_enter() and potentially with the guest's S-EPT entry
operations. Therefore, wrap the call with tdh_do_no_vcpus() to kick other
vCPUs out of the guest and block tdh_vp_enter(), to ensure success.

Signed-off-by: Xiaoyao Li
Signed-off-by: Isaku Yamahata
Signed-off-by: Yan Zhao
[sean: wire up via tdx_sept_link_private_spt(), massage changelog]
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/vmx/tdx.c | 51 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 48 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index e90610540a0b..af63364c8713 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1776,6 +1776,52 @@ static struct page *tdx_spte_to_external_spt(struct kvm *kvm, gfn_t gfn,
 	return virt_to_page(sp->external_spt);
 }
 
+/*
+ * Split a huge mapping into the target level. Currently only supports 2MiB
+ * mappings (KVM doesn't yet support 1GiB mappings for TDX guests).
+ *
+ * Invoke "BLOCK + TRACK + kick off vCPUs (inside tdx_track())" since DEMOTE
+ * does not yet support the NON-BLOCKING-RESIZE feature.  No UNBLOCK is
+ * needed after a successful DEMOTE.
+ *
+ * Under write mmu_lock, kick off all vCPUs (inside tdh_do_no_vcpus()) to ensure
+ * DEMOTE will succeed on the second invocation if the first invocation returns
+ * BUSY.
+ */
+static int tdx_sept_split_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
+				       u64 new_spte, enum pg_level level)
+{
+	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	gpa_t gpa = gfn_to_gpa(gfn);
+	u64 err, entry, level_state;
+	struct page *external_spt;
+
+	lockdep_assert_held_write(&kvm->mmu_lock);
+
+	external_spt = tdx_spte_to_external_spt(kvm, gfn, new_spte, level);
+	if (!external_spt)
+		return -EIO;
+
+	if (KVM_BUG_ON(!vcpu || vcpu->kvm != kvm, kvm))
+		return -EIO;
+
+	err = tdh_do_no_vcpus(tdh_mem_range_block, kvm, &kvm_tdx->td, gpa,
+			      level, &entry, &level_state);
+	if (TDX_BUG_ON_2(err, TDH_MEM_RANGE_BLOCK, entry, level_state, kvm))
+		return -EIO;
+
+	tdx_track(kvm);
+
+	err = tdh_do_no_vcpus(tdh_mem_page_demote, kvm, &kvm_tdx->td, gpa,
+			      level, spte_to_pfn(old_spte), external_spt,
+			      &to_tdx(vcpu)->pamt_cache, &entry, &level_state);
+	if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_DEMOTE, entry, level_state, kvm))
+		return -EIO;
+
+	return 0;
+}
+
 static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, u64 new_spte,
 				     enum pg_level level)
 {
@@ -1853,9 +1899,8 @@ static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
 				     u64 new_spte, enum pg_level level)
 {
-	/* TODO: Support replacing a huge SPTE with a non-leaf SPTE (a.k.a. demotion). */
-	if (KVM_BUG_ON(is_shadow_present_pte(old_spte) && is_shadow_present_pte(new_spte), kvm))
-		return -EIO;
+	if (is_shadow_present_pte(old_spte) && is_shadow_present_pte(new_spte))
+		return tdx_sept_split_private_spte(kvm, gfn, old_spte, new_spte, level);
 	else if (is_shadow_present_pte(old_spte))
 		return tdx_sept_remove_private_spte(kvm, gfn, old_spte, level);
 
-- 
2.53.0.rc1.217.geba53bf80e-goog
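For orientation, the demote flow is a fixed three-step protocol; a
condensed sketch of the ordering, with error handling and the exact
SEAMCALL plumbing elided (see tdx_sept_split_private_spte() above):

/*
 * write_lock(&kvm->mmu_lock);          // no concurrent S-EPT updates
 * 1) BLOCK:  tdh_mem_range_block()     // 2MiB entry becomes splittable
 * 2) TRACK:  tdx_track()               // bump TD epoch, kick vCPUs, flush TLBs
 * 3) DEMOTE: tdh_mem_page_demote()     // install 512 4KiB entries via new S-EPT page
 * write_unlock(&kvm->mmu_lock);        // no UNBLOCK needed after success
 */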
From nobody Sat Feb 7 17:41:40 2026
Message-ID: <20260129011517.3545883-41-seanjc@google.com>
Date: Wed, 28 Jan 2026 17:15:12 -0800
From: Sean Christopherson
Subject: [RFC PATCH v5 40/45] KVM: x86: Introduce hugepage_set_guest_inhibit()

From: Yan Zhao

TDX requires guests to accept S-EPT mappings created by the host KVM. Due
to the current implementation of the TDX module, if a guest accepts a GFN
at a lower level after KVM maps it at a higher level, the TDX module will
emulate an EPT violation VM-Exit to KVM instead of returning a size
mismatch error to the guest.

If KVM fails to perform page splitting in the VM-Exit handler, the guest's
accept operation will be triggered again upon re-entering the guest,
causing a repeated EPT violation VM-Exit.

To facilitate passing the guest's accept level information to the KVM MMU
core, and to prevent the repeated mapping of a GFN at different levels due
to different accept levels specified by different vCPUs, introduce the
interface hugepage_set_guest_inhibit(). This interface records, across all
vCPUs, that mapping at a certain level is inhibited by the guest.

Intentionally don't provide an API to clear KVM_LPAGE_GUEST_INHIBIT_FLAG
for the time being, as detecting that it's ok to (re)install a hugepage is
tricky (and costly if KVM wants to be 100% accurate), and KVM doesn't
currently support hugepage promotion (only direct installation of
hugepages) for S-EPT. As a result, the only scenario where clearing the
flag would likely allow KVM to install a hugepage is when an entire
2MiB / 1GiB range is converted to shared or private. But if the guest is
accepting at 4KiB granularity, odds are good the guest is using the memory
for something "special" and will never convert the entire range to shared
(and/or back to private). Punt that optimization to the future, if it's
ever needed.

Link: https://lore.kernel.org/all/a6ffe23fb97e64109f512fa43e9f6405236ed40a.camel@intel.com [1]
Suggested-by: Rick Edgecombe
Suggested-by: Sean Christopherson
Signed-off-by: Yan Zhao
[sean: explain *why* the flag is never cleared]
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu.h     |  4 ++++
 arch/x86/kvm/mmu/mmu.c | 21 ++++++++++++++++++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 830f46145692..fa6a8daf4b05 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -322,4 +322,8 @@ static inline bool kvm_is_gfn_alias(struct kvm *kvm, gfn_t gfn)
 {
 	return gfn & kvm_gfn_direct_bits(kvm);
 }
+
+void hugepage_set_guest_inhibit(struct kvm_memory_slot *slot, gfn_t gfn, int level);
+bool hugepage_test_guest_inhibit(struct kvm_memory_slot *slot, gfn_t gfn, int level);
+
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 45650f70eeab..c2765bfc8492 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -718,12 +718,14 @@ static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn,
 }
 
 /*
- * The most significant bit in disallow_lpage tracks whether or not memory
- * attributes are mixed, i.e. not identical for all gfns at the current level.
+ * The two most significant bits in disallow_lpage track whether or not memory
+ * attributes are mixed, i.e. not identical for all gfns at the current level,
+ * and whether or not the guest inhibits the current hugepage level at the gfn.
  * The lower order bits are used to refcount other cases where a hugepage is
 * disallowed, e.g. if KVM has shadow a page table at the gfn.
 */
 #define KVM_LPAGE_MIXED_FLAG		BIT(31)
+#define KVM_LPAGE_GUEST_INHIBIT_FLAG	BIT(30)
 
 static void update_gfn_disallow_lpage_count(const struct kvm_memory_slot *slot,
 					    gfn_t gfn, int count)
@@ -736,7 +738,8 @@ static void update_gfn_disallow_lpage_count(const struct kvm_memory_slot *slot,
 
 		old = linfo->disallow_lpage;
 		linfo->disallow_lpage += count;
-		WARN_ON_ONCE((old ^ linfo->disallow_lpage) & KVM_LPAGE_MIXED_FLAG);
+		WARN_ON_ONCE((old ^ linfo->disallow_lpage) &
+			     (KVM_LPAGE_MIXED_FLAG | KVM_LPAGE_GUEST_INHIBIT_FLAG));
 	}
 }
 
@@ -1648,6 +1651,18 @@ static bool __kvm_rmap_zap_gfn_range(struct kvm *kvm,
 				     start, end - 1, can_yield, true, flush);
 }
 
+bool hugepage_test_guest_inhibit(struct kvm_memory_slot *slot, gfn_t gfn, int level)
+{
+	return lpage_info_slot(gfn, slot, level)->disallow_lpage & KVM_LPAGE_GUEST_INHIBIT_FLAG;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(hugepage_test_guest_inhibit);
+
+void hugepage_set_guest_inhibit(struct kvm_memory_slot *slot, gfn_t gfn, int level)
+{
+	lpage_info_slot(gfn, slot, level)->disallow_lpage |= KVM_LPAGE_GUEST_INHIBIT_FLAG;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(hugepage_set_guest_inhibit);
+
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	bool flush = false;
-- 
2.53.0.rc1.217.geba53bf80e-goog
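With this patch, disallow_lpage packs two sticky flags above a refcount. A
minimal sketch of the layout and the invariant that the WARN_ON_ONCE()
guards (names shortened, illustrative):

#define LPAGE_MIXED_FLAG		(1u << 31)	/* attributes mixed at this level */
#define LPAGE_GUEST_INHIBIT_FLAG	(1u << 30)	/* guest accepted at a smaller level */
#define LPAGE_COUNT_MASK		((1u << 30) - 1)	/* other disallow reasons */

/* A hugepage may be installed at this gfn/level only if the whole word is 0. */
static bool lpage_allowed(unsigned int disallow_lpage)
{
	return disallow_lpage == 0;
}

/*
 * update_gfn_disallow_lpage_count() only adjusts the low-order refcount;
 * the WARN_ON_ONCE() asserts the add/sub never bleeds into the flag bits.
 */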
From nobody Sat Feb 7 17:41:40 2026
Message-ID: <20260129011517.3545883-42-seanjc@google.com>
Date: Wed, 28 Jan 2026 17:15:13 -0800
From: Sean Christopherson
Subject: [RFC PATCH v5 41/45] KVM: TDX: Honor the guest's accept level contained in an EPT violation

From: Yan Zhao

TDX requires guests to accept S-EPT mappings created by the host KVM. Due
to the current implementation of the TDX module, if a guest accepts a GFN
at a lower level after KVM maps it at a higher level, the TDX module will
synthesize an EPT Violation VM-Exit to KVM instead of returning a size
mismatch error to the guest. If KVM fails to perform page splitting in the
EPT Violation handler, the guest's ACCEPT operation will be triggered
again upon re-entering the guest, causing a repeated EPT Violation VM-Exit.

To ensure forward progress, honor the guest's accept level if an EPT
Violation VM-Exit contains a guest accept level (the TDX-Module provides
the level when synthesizing a VM-Exit in response to a failed guest
ACCEPT).

 (1) Set the guest inhibit bit in the lpage info to prevent KVM's MMU from
     mapping at a higher level than the guest's accept level.
 (2) Split any existing mapping higher than the guest's accept level.

For now, take mmu_lock for write across the entire operation to keep
things simple. This can/will be revisited when the TDX-Module adds support
for NON-BLOCKING-RESIZE, at which point KVM can split the hugepage without
needing to handle UNBLOCK failure if the DEMOTE fails.

To avoid unnecessarily contending mmu_lock, check if the inhibit flag is
already set before acquiring mmu_lock, e.g. so that vCPUs doing ACCEPT on
a region of memory aren't completely serialized. Note, this relies on
(a) setting the inhibit flag after performing the split, and (b) never
clearing the flag, e.g. to avoid false positives and potentially
triggering the zero-step mitigation.

Note: EPT Violation VM-Exits without the guest's accept level are *never*
caused by the guest's ACCEPT operation, but instead occur if the guest
accesses memory before said memory is accepted. Since KVM can't obtain the
guest accept level info from such EPT Violations (the ACCEPT operation
hasn't occurred yet), KVM may still map at a higher level than the guest's
later ACCEPT level. So, the typical guest/KVM interaction flow is:

- If the guest accesses private memory without first accepting it (like
  non-Linux guests):
  1. Guest accesses private memory.
  2. KVM finds it can map the GFN at 2MB, so it AUGs at 2MB.
  3. Guest accepts the GFN at 4KB.
  4. KVM receives an EPT violation with an eeq_type of ACCEPT + 4KB level.
  5. KVM splits the 2MB mapping.
  6. Guest accepts successfully and accesses the page.

- If the guest first accepts private memory before accessing it (like
  Linux guests):
  1. Guest accepts private memory at 4KB.
  2. KVM receives an EPT violation with an eeq_type of ACCEPT + 4KB level.
  3. KVM AUGs at 4KB.
  4. Guest accepts successfully and accesses the page.

Link: https://lore.kernel.org/all/a6ffe23fb97e64109f512fa43e9f6405236ed40a.camel@intel.com
Suggested-by: Rick Edgecombe
Suggested-by: Sean Christopherson
Signed-off-by: Yan Zhao
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/tdp_mmu.c  | 11 ++++++
 arch/x86/kvm/mmu/tdp_mmu.h  |  2 +
 arch/x86/kvm/vmx/tdx.c      | 76 +++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx_arch.h |  3 ++
 4 files changed, 92 insertions(+)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index e32034bfca5a..0cdc6782e508 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1619,6 +1619,17 @@ void kvm_tdp_mmu_try_split_huge_pages(struct kvm *kvm,
 	}
 }
 
+/* Split huge pages for the current root. */
+int kvm_tdp_mmu_split_huge_pages(struct kvm_vcpu *vcpu, gfn_t start, gfn_t end,
+				 int target_level)
+{
+	struct kvm_mmu_page *root = root_to_sp(vcpu->arch.mmu->root.hpa);
+
+	return tdp_mmu_split_huge_pages_root(vcpu->kvm, root, start, end,
+					     target_level, false);
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_tdp_mmu_split_huge_pages);
+
 static bool tdp_mmu_need_write_protect(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	/*
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index bd62977c9199..cdb0b4ecaa37 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -97,6 +97,8 @@ void kvm_tdp_mmu_try_split_huge_pages(struct kvm *kvm,
 				      const struct kvm_memory_slot *slot,
 				      gfn_t start, gfn_t end,
 				      int target_level, bool shared);
+int kvm_tdp_mmu_split_huge_pages(struct kvm_vcpu *vcpu, gfn_t start, gfn_t end,
+				 int target_level);
 
 static inline void kvm_tdp_mmu_walk_lockless_begin(void)
 {
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index af63364c8713..098954f5e07c 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -13,6 +13,7 @@
 #include "tdx.h"
 #include "vmx.h"
 #include "mmu/spte.h"
+#include "mmu/tdp_mmu.h"
 #include "common.h"
 #include "posted_intr.h"
 #include "irq.h"
@@ -1958,6 +1959,77 @@ static inline bool tdx_is_sept_violation_unexpected_pending(struct kvm_vcpu *vcpu)
 	return !(eq & EPT_VIOLATION_PROT_MASK) && !(eq & EPT_VIOLATION_EXEC_FOR_RING3_LIN);
 }
 
+static bool tdx_is_mismatched_accepted(struct kvm_vcpu *vcpu)
+{
+	return (to_tdx(vcpu)->ext_exit_qualification & TDX_EXT_EXIT_QUAL_TYPE_MASK) ==
+	       TDX_EXT_EXIT_QUAL_TYPE_ACCEPT;
+}
+
+static int tdx_get_ept_violation_level(struct kvm_vcpu *vcpu)
+{
+	u64 ext_exit_qual = to_tdx(vcpu)->ext_exit_qualification;
+
+	return (((ext_exit_qual & TDX_EXT_EXIT_QUAL_INFO_MASK) >>
+		 TDX_EXT_EXIT_QUAL_INFO_SHIFT) & GENMASK(2, 0)) + 1;
+}
+
+/*
+ * An EPT violation can be due either to the guest's ACCEPT operation, or
+ * to the guest accessing memory before accepting it.
+ *
+ * Type TDX_EXT_EXIT_QUAL_TYPE_ACCEPT in the extended exit qualification
+ * identifies the former case, which also contains a valid guest accept
+ * level.
+ *
+ * For the former case, honor the guest's accept level by setting the guest
+ * inhibit bit on levels above the guest accept level, and split the existing
+ * mapping for the faulting GFN if it is mapped at a higher level than the
+ * guest accept level.
+ *
+ * Do nothing for the latter case.  KVM will map the GFN without considering
+ * the guest's accept level (unless the guest inhibit bit is already set).
+ */
+static int tdx_handle_mismatched_accept(struct kvm_vcpu *vcpu, gfn_t gfn)
+{
+	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	struct kvm *kvm = vcpu->kvm;
+	gfn_t start, end;
+	int level, r;
+
+	if (!slot || !tdx_is_mismatched_accepted(vcpu))
+		return 0;
+
+	if (WARN_ON_ONCE(!VALID_PAGE(vcpu->arch.mmu->root.hpa)))
+		return 0;
+
+	level = tdx_get_ept_violation_level(vcpu);
+	if (level > PG_LEVEL_2M)
+		return 0;
+
+	if (hugepage_test_guest_inhibit(slot, gfn, level + 1))
+		return 0;
+
+	guard(write_lock)(&kvm->mmu_lock);
+
+	start = gfn_round_for_level(gfn, level);
+	end = start + KVM_PAGES_PER_HPAGE(level);
+
+	r = kvm_tdp_mmu_split_huge_pages(vcpu, start, end, level);
+	if (r)
+		return r;
+
+	/*
+	 * No TLB flush is required, as the "BLOCK + TRACK + kick off vCPUs"
+	 * sequence required by the TDX-Module includes a TLB flush.
+	 */
+	hugepage_set_guest_inhibit(slot, gfn, level + 1);
+	if (level == PG_LEVEL_4K)
+		hugepage_set_guest_inhibit(slot, gfn, level + 2);
+
+	return 0;
+}
+
 static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 {
 	unsigned long exit_qual;
@@ -1983,6 +2055,10 @@ static int tdx_handle_ept_violation(struct kvm_vcpu *vcpu)
 		 */
 		exit_qual = EPT_VIOLATION_ACC_WRITE;
 
+		ret = tdx_handle_mismatched_accept(vcpu, gpa_to_gfn(gpa));
+		if (ret)
+			return ret;
+
 		/* Only private GPA triggers zero-step mitigation */
 		local_retry = true;
 	} else {
diff --git a/arch/x86/kvm/vmx/tdx_arch.h b/arch/x86/kvm/vmx/tdx_arch.h
index a30e880849e3..af006a73ee05 100644
--- a/arch/x86/kvm/vmx/tdx_arch.h
+++ b/arch/x86/kvm/vmx/tdx_arch.h
@@ -82,7 +82,10 @@ struct tdx_cpuid_value {
 #define TDX_TD_ATTR_PERFMON		BIT_ULL(63)
 
 #define TDX_EXT_EXIT_QUAL_TYPE_MASK	GENMASK(3, 0)
+#define TDX_EXT_EXIT_QUAL_TYPE_ACCEPT	1
 #define TDX_EXT_EXIT_QUAL_TYPE_PENDING_EPT_VIOLATION	6
+#define TDX_EXT_EXIT_QUAL_INFO_MASK	GENMASK(63, 32)
+#define TDX_EXT_EXIT_QUAL_INFO_SHIFT	32
 
 /*
  * TD_PARAMS is provided as an input to TDH_MNG_INIT, the size of which is 1024B.
 */
-- 
2.53.0.rc1.217.geba53bf80e-goog
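A worked example of the decoding, under the bit layout added to tdx_arch.h
(values illustrative): a guest ACCEPT at 4KiB that collides with a 2MiB
S-EPT mapping produces an extended exit qualification with type ACCEPT and
a 0-based TDX level of 0.

static int example_accept_level(void)
{
	u64 ext_exit_qual = 0x1;	/* bits 3:0 = 1 (ACCEPT), bits 34:32 = 0 (4KiB) */
	int level = (((ext_exit_qual & TDX_EXT_EXIT_QUAL_INFO_MASK) >>
		      TDX_EXT_EXIT_QUAL_INFO_SHIFT) & GENMASK(2, 0)) + 1;

	return level;	/* 1 == PG_LEVEL_4K, so the 2MiB mapping must be split */
}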
*/ --=20 2.53.0.rc1.217.geba53bf80e-goog From nobody Sat Feb 7 17:41:40 2026 Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 257B6338585 for ; Thu, 29 Jan 2026 01:16:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649408; cv=none; b=TvVhpRnyGcjO8yzzUnSy3+vxsveLZXfJePku5CVgLt9i4BRktZNZKmpKECRJQ/vNVuCw8zBnCw/dwPfp18IqNZJiVmbqKI4ZVl1ZvHIH7EEQ9eaAp1ZzyrxPXzSy6Qrrvx/J3CmRXF0uoWOQbFtH1xz85a6U4llzqaMlAXb5r8U= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1769649408; c=relaxed/simple; bh=2lbLvW6ijIN279zXFdILobLzEFRXyPlsR+WIhazYkJc=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=TaIeWcwA2ECMWXbP0ZQvJMG1+Xm+zkRNqQ3ZUKYWhM0YUX7oufU4bChccpEggniioDBqDud//jyMUsSc6Jgosc4Uo1cE2A4Tn6rGQUNf1EFdCh/wIqP68AlwuTVn2tQoMcyxIHeM7o1w9aefU9+9MkfgIZbGYFzBdxU7CiBTZPM= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=vmGrprmL; arc=none smtp.client-ip=209.85.215.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="vmGrprmL" Received: by mail-pg1-f202.google.com with SMTP id 41be03b00d2f7-c503d6be76fso977380a12.0 for ; Wed, 28 Jan 2026 17:16:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1769649406; x=1770254206; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=eeY70eVnfoYCtpmYqMWIyko5YIZVo7ravcEk/U6/auE=; b=vmGrprmLuMg8VCbykZI9iDxnUl5RzwhjxeeBSrf5iQSNw0RK6BItj4nf/H+BDiBAyt 1+wlnMESBe4Vrf/p1Hcmtf/VxRf0wfEZQ4YpemMJydS+ybAzrGAQVMXuLdkiWSjrE/ew dKHrqvR2/jHZUzF6Z7z37B6rOzmYTBgCPlpOwDdowAUbRwDbW7IiPzG8Eofd8V6nEhAU UYtWbogRTyKWeJJg4Aqpfc/36xUgjrSMdoVdfQ1HsyVFO6aKNnuuBthzZd5Xs4KjEiZU 9i4I+R6uFdItMxZFbhTmqnefODrHxVWMucIG3ZpAcfJUU4aL0XcQP3cwjTDYqVoUluQt xx0w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769649406; x=1770254206; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=eeY70eVnfoYCtpmYqMWIyko5YIZVo7ravcEk/U6/auE=; b=C1lgV5AH5DUkBANexc5qEoVCR89Nxm3LLAcKCLBdkKJFzL/x8dUmrpMresh7LTOzQv VuvcjZjz5lUNYJX2YOVYL6e7Sxq++1Q9z/tC8o7kE8IZtvQtmBbqIckgtx1XR+CpKQjV xzCbRbtERjvQbQO+awYImZc9rmiA+pD5srTO+vdK9rYxZ/vovjyejl+2SUv/vDdrBzFL QZClyeGfL7bKOqI0gOCnCtHnShNG7QP/Ltbt5mg5fuaNecL/n/q03N02Cd2MYygOtJd0 5znKZ0k7fJOpzRkSBff+c9IfOPsRtHkv/zQ2gTVan2L1euETKgUBMnq7efqBxs161W23 aKUA== X-Gm-Message-State: AOJu0Ywi9Z4X5EmiZpKkjYffaJKRY03FhmJaMYsZC4AHOJhq9DuRbqL+ 7LoG+U3PEJFtnAMVBxHCfitxCB0UKE03QExk2GOZbV2E1o3+I3HsoEdUIqQXYcMF6Y9d3bxmxdR aXUXzqA== X-Received: from pjyu17.prod.google.com ([2002:a17:90a:e011:b0:353:454:939c]) (user=seanjc job=prod-delivery.src-stubby-dispatcher) by 
2002:a05:6a21:62c4:b0:38d:f62a:a9e5 with SMTP id adf61e73a8af0-38ec6248055mr7357147637.14.1769649406445; Wed, 28 Jan 2026 17:16:46 -0800 (PST) Reply-To: Sean Christopherson Date: Wed, 28 Jan 2026 17:15:14 -0800 In-Reply-To: <20260129011517.3545883-1-seanjc@google.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: Mime-Version: 1.0 References: <20260129011517.3545883-1-seanjc@google.com> X-Mailer: git-send-email 2.53.0.rc1.217.geba53bf80e-goog Message-ID: <20260129011517.3545883-43-seanjc@google.com> Subject: [RFC PATCH v5 42/45] KVM: guest_memfd: Add helpers to get start/end gfns give gmem+slot+pgoff From: Sean Christopherson To: Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, Kiryl Shutsemau , Sean Christopherson , Paolo Bonzini Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang , Rick Edgecombe , Yan Zhao , Vishal Annapurve , Ackerley Tng , Sagi Shahar , Binbin Wu , Xiaoyao Li , Isaku Yamahata Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add helpers for getting a gfn given a gmem slot+pgoff, and for getting a gfn given a starting or ending pgoff, i.e. an offset that may be beyond the range of the memslot binding. Providing helpers will avoid duplicate boilerplate code "if" future code also needs to iterate over gfn ranges. No functional change intended. Signed-off-by: Sean Christopherson --- virt/kvm/guest_memfd.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c index 923c51a3a525..51dbb309188f 100644 --- a/virt/kvm/guest_memfd.c +++ b/virt/kvm/guest_memfd.c @@ -59,6 +59,21 @@ static pgoff_t kvm_gmem_get_index(struct kvm_memory_slot= *slot, gfn_t gfn) return gfn - slot->base_gfn + slot->gmem.pgoff; } =20 +static gfn_t kvm_gmem_get_gfn(struct kvm_memory_slot *slot, pgoff_t pgoff) +{ + return slot->base_gfn + pgoff - slot->gmem.pgoff; +} + +static gfn_t kvm_gmem_get_start_gfn(struct kvm_memory_slot *slot, pgoff_t = start) +{ + return kvm_gmem_get_gfn(slot, max(slot->gmem.pgoff, start)); +} + +static gfn_t kvm_gmem_get_end_gfn(struct kvm_memory_slot *slot, pgoff_t en= d) +{ + return kvm_gmem_get_gfn(slot, min(slot->gmem.pgoff + slot->npages, end)); +} + static int __kvm_gmem_prepare_folio(struct kvm *kvm, struct kvm_memory_slo= t *slot, pgoff_t index, struct folio *folio) { @@ -167,11 +182,9 @@ static void __kvm_gmem_invalidate_begin(struct gmem_fi= le *f, pgoff_t start, unsigned long index; =20 xa_for_each_range(&f->bindings, index, slot, start, end - 1) { - pgoff_t pgoff =3D slot->gmem.pgoff; - struct kvm_gfn_range gfn_range =3D { - .start =3D slot->base_gfn + max(pgoff, start) - pgoff, - .end =3D slot->base_gfn + min(pgoff + slot->npages, end) - pgoff, + .start =3D kvm_gmem_get_start_gfn(slot, start), + .end =3D kvm_gmem_get_end_gfn(slot, end), .slot =3D slot, .may_block =3D true, .attr_filter =3D attr_filter, --=20 2.53.0.rc1.217.geba53bf80e-goog From nobody Sat Feb 7 17:41:40 2026 Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id E82EF378D68 for ; Thu, 29 Jan 2026 01:16:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; 
From nobody Sat Feb 7 17:41:40 2026
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:15:15 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
References: <20260129011517.3545883-1-seanjc@google.com>
Message-ID: <20260129011517.3545883-44-seanjc@google.com>
Subject: [RFC PATCH v5 43/45] *** DO NOT MERGE *** KVM: guest_memfd: Add pre-zap arch hook for shared<=>private conversion
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Add a gmem "pre-zap" hook to allow arch code to take action before a
shared<=>private conversion, and just as importantly, to let arch code
reject/fail a conversion, e.g. if the conversion requires new page
tables and KVM hits an OOM situation.

The new hook will be used by TDX to split hugepages as necessary to
avoid overzapping PTEs, which for all intents and purposes corrupts
guest data for TDX VMs (memory is wiped when private PTEs are removed).

TODO: Wire this up to the convert path, not the PUNCH_HOLE path, once
in-place conversion is supported.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/Kconfig       |  1 +
 arch/x86/kvm/mmu/tdp_mmu.c |  8 ++++++
 include/linux/kvm_host.h   |  5 ++++
 virt/kvm/Kconfig           |  4 +++
 virt/kvm/guest_memfd.c     | 50 ++++++++++++++++++++++++++++++++++++--
 5 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index d916bd766c94..5f8d8daf4289 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -138,6 +138,7 @@ config KVM_INTEL_TDX
 	depends on INTEL_TDX_HOST
 	select KVM_GENERIC_MEMORY_ATTRIBUTES
 	select HAVE_KVM_ARCH_GMEM_POPULATE
+	select HAVE_KVM_ARCH_GMEM_CONVERT
 	help
 	  Provides support for launching Intel Trust Domain Extensions (TDX)
 	  confidential VMs on Intel processors.
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 0cdc6782e508..c46ebdacdb50 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1630,6 +1630,14 @@ int kvm_tdp_mmu_split_huge_pages(struct kvm_vcpu *vcpu, gfn_t start, gfn_t end,
 }
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_tdp_mmu_split_huge_pages);
 
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_CONVERT
+int kvm_arch_gmem_convert(struct kvm *kvm, gfn_t start, gfn_t end,
+			  bool to_private)
+{
+	return 0;
+}
+#endif /* CONFIG_HAVE_KVM_ARCH_GMEM_CONVERT */
+
 static bool tdp_mmu_need_write_protect(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	/*
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 782f4d670793..c0bafff274b6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2588,6 +2588,11 @@ long kvm_gmem_populate(struct kvm *kvm, gfn_t gfn, void __user *src, long npages
 			 kvm_gmem_populate_cb post_populate, void *opaque);
 #endif
 
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_CONVERT
+int kvm_arch_gmem_convert(struct kvm *kvm, gfn_t start, gfn_t end,
+			  bool to_private);
+#endif
+
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_INVALIDATE
 void kvm_arch_gmem_invalidate(kvm_pfn_t start, kvm_pfn_t end);
 #endif
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 267c7369c765..05d69eaa50ae 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -125,3 +125,7 @@ config HAVE_KVM_ARCH_GMEM_INVALIDATE
 config HAVE_KVM_ARCH_GMEM_POPULATE
 	bool
 	depends on KVM_GUEST_MEMFD
+
+config HAVE_KVM_ARCH_GMEM_CONVERT
+	bool
+	depends on KVM_GUEST_MEMFD
\ No newline at end of file
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 51dbb309188f..b01f333a5e95 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -164,6 +164,46 @@ static struct folio *kvm_gmem_get_folio(struct inode *inode, pgoff_t index)
 	return folio;
 }
 
+#ifdef CONFIG_HAVE_KVM_ARCH_GMEM_CONVERT
+static int __kvm_gmem_convert(struct gmem_file *f, pgoff_t start, pgoff_t end,
+			      bool to_private)
+{
+	struct kvm_memory_slot *slot;
+	unsigned long index;
+	int r;
+
+	xa_for_each_range(&f->bindings, index, slot, start, end - 1) {
+		r = kvm_arch_gmem_convert(f->kvm,
+					  kvm_gmem_get_start_gfn(slot, start),
+					  kvm_gmem_get_end_gfn(slot, end),
+					  to_private);
+		if (r)
+			return r;
+	}
+	return 0;
+}
+
+static int kvm_gmem_convert(struct inode *inode, pgoff_t start, pgoff_t end,
+			    bool to_private)
+{
+	struct gmem_file *f;
+	int r;
+
+	kvm_gmem_for_each_file(f, inode->i_mapping) {
+		r = __kvm_gmem_convert(f, start, end, to_private);
+		if (r)
+			return r;
+	}
+	return 0;
+}
+#else
+static int kvm_gmem_convert(struct inode *inode, pgoff_t start, pgoff_t end,
+			    bool to_private)
+{
+	return 0;
+}
+#endif
+
 static enum kvm_gfn_range_filter kvm_gmem_get_invalidate_filter(struct inode *inode)
 {
 	if (GMEM_I(inode)->flags & GUEST_MEMFD_FLAG_INIT_SHARED)
@@ -244,6 +284,7 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 {
 	pgoff_t start = offset >> PAGE_SHIFT;
 	pgoff_t end = (offset + len) >> PAGE_SHIFT;
+	int r;
 
 	/*
	 * Bindings must be stable across invalidation to ensure the start+end
@@ -253,13 +294,18 @@ static long kvm_gmem_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 
 	kvm_gmem_invalidate_begin(inode, start, end);
 
-	truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1);
+	/*
+	 * For demonstration purposes, pretend this is a private=>shared conversion.
+	 */
+	r = kvm_gmem_convert(inode, start, end, false);
+	if (!r)
+		truncate_inode_pages_range(inode->i_mapping, offset, offset + len - 1);
 
 	kvm_gmem_invalidate_end(inode, start, end);
 
 	filemap_invalidate_unlock(inode->i_mapping);
 
-	return 0;
+	return r;
 }
 
 static long kvm_gmem_allocate(struct inode *inode, loff_t offset, loff_t len)
-- 
2.53.0.rc1.217.geba53bf80e-goog
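[For context on how the hook slots in: kvm_gmem_convert() runs with the invalidate lock held, after kvm_gmem_invalidate_begin() but before the truncate, so a non-zero return backs out of the PUNCH_HOLE before any folios are removed. A purely hypothetical arch implementation could look like the sketch below; arch_prepare_split() is a made-up placeholder for whatever fallible preparation an architecture needs, and the real consumer is the TDX implementation in the next patch.]

/*
 * Hypothetical sketch of an arch hook (illustrative only, not part of
 * this series).  Do all fallible work up front so that common gmem code
 * can back out of the conversion cleanly on error.
 */
int kvm_arch_gmem_convert(struct kvm *kvm, gfn_t start, gfn_t end,
			  bool to_private)
{
	/*
	 * arch_prepare_split() is a placeholder: e.g. pre-allocate the page
	 * tables needed to split hugepages overlapping [start, end), so that
	 * an -ENOMEM surfaces here, before any PTEs are zapped.
	 */
	return arch_prepare_split(kvm, start, end, to_private);
}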
From nobody Sat Feb 7 17:41:40 2026
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:15:16 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
References: <20260129011517.3545883-1-seanjc@google.com>
Message-ID: <20260129011517.3545883-45-seanjc@google.com>
Subject: [RFC PATCH v5 44/45] KVM: x86/mmu: Add support for splitting S-EPT hugepages on conversion
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Add support for splitting S-EPT hugepages in preparation for converting
a subset of a hugepage to be shared, as KVM must precisely zap/remove
S-EPT entries to avoid clobbering guest memory (the lifetime of guest
private memory is tied to the S-EPT).  I.e. KVM needs to first split a
hugepage so that only the to-be-converted small pages can be zapped.

To avoid unnecessary work, e.g. if only the tail/end page of a massive
region isn't aligned to the conversion, explicitly detect unaligned head
and tail pages relative to the max page size supported by KVM, i.e.
head/tail pages that will undergo partial conversion.

To support splitting an S-EPT hugepage without a vCPU, add a per-VM PAMT
cache, along with a mutex to guard the cache.  Using a mutex, e.g.
versus a spinlock, is important as it allows KVM to allocate memory
*without* dropping the lock, i.e. so that the PAMT cache can be
topped-up as needed without needing to juggle
arch.tdp_mmu_external_cache_lock.

Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h |  8 +++-
 arch/x86/kvm/mmu/mmu.c          |  2 +-
 arch/x86/kvm/mmu/tdp_mmu.c      | 72 +++++++++++++++++++++++++++++++--
 arch/x86/kvm/vmx/tdx.c          | 34 +++++++++++++---
 arch/x86/kvm/vmx/tdx.h          |  2 +
 5 files changed, 107 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 385f1cf32d70..54dea90a53dc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1563,6 +1563,12 @@ struct kvm_arch {
 	 * the code to do so.
 	 */
 	spinlock_t tdp_mmu_pages_lock;
+
+	/*
+	 * Protect the per-VM cache of pre-allocated pages used to populate the
+	 * Dynamic PAMT when splitting S-EPT huge pages without a vCPU.
+	 */
+	struct mutex tdp_mmu_external_cache_lock;
 #endif /* CONFIG_X86_64 */
 
 	/*
@@ -1861,7 +1867,7 @@ struct kvm_x86_ops {
 			       u64 new_spte, enum pg_level level);
 	void (*reclaim_external_sp)(struct kvm *kvm, gfn_t gfn,
 				    struct kvm_mmu_page *sp);
-	int (*topup_external_cache)(struct kvm_vcpu *vcpu, int min);
+	int (*topup_external_cache)(struct kvm *kvm, struct kvm_vcpu *vcpu, int min);
 
 
 	bool (*has_wbinvd_exit)(void);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index c2765bfc8492..62bf6bec2df2 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -606,7 +606,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 		if (r)
 			return r;
 
-		r = kvm_x86_call(topup_external_cache)(vcpu, PT64_ROOT_MAX_LEVEL);
+		r = kvm_x86_call(topup_external_cache)(vcpu->kvm, vcpu, PT64_ROOT_MAX_LEVEL);
 		if (r)
 			return r;
 	}
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index c46ebdacdb50..3181406c5e0b 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1447,7 +1447,8 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm,
 	return spte_set;
 }
 
-static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct tdp_iter *iter)
+static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
+						       struct tdp_iter *iter)
 {
 	struct kvm_mmu_page *sp;
 
@@ -1464,7 +1465,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct tdp_iter *iter)
 		if (!sp->external_spt)
 			goto err_external_spt;
 
-		if (kvm_x86_call(topup_external_cache)(kvm_get_running_vcpu(), 1))
+		if (kvm_x86_call(topup_external_cache)(kvm, kvm_get_running_vcpu(), 1))
 			goto err_external_split;
 	}
 
@@ -1556,7 +1557,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
 	else
 		write_unlock(&kvm->mmu_lock);
 
-	sp = tdp_mmu_alloc_sp_for_split(&iter);
+	sp = tdp_mmu_alloc_sp_for_split(kvm, &iter);
 
 	if (shared)
 		read_lock(&kvm->mmu_lock);
@@ -1631,9 +1632,74 @@ int kvm_tdp_mmu_split_huge_pages(struct kvm_vcpu *vcpu, gfn_t start, gfn_t end,
 EXPORT_SYMBOL_FOR_KVM_INTERNAL(kvm_tdp_mmu_split_huge_pages);
 
 #ifdef CONFIG_HAVE_KVM_ARCH_GMEM_CONVERT
+static int __tdp_mmu_split_mirror_huge_pages(struct kvm *kvm,
+					     struct kvm_mmu_page *root,
+					     gfn_t gfn, int target_level)
+{
+	gfn_t end = gfn + KVM_PAGES_PER_HPAGE(target_level + 1);
+
+	return tdp_mmu_split_huge_pages_root(kvm, root, gfn, end, target_level, false);
+}
+
+static int tdp_mmu_split_mirror_huge_pages(struct kvm *kvm,
+					   struct kvm_mmu_page *root,
+					   gfn_t start, gfn_t end, int level)
+{
+	gfn_t head = gfn_round_for_level(start, level + 1);
+	gfn_t tail = gfn_round_for_level(end, level + 1);
+	int r;
+
+	if (head != start) {
+		r = __tdp_mmu_split_mirror_huge_pages(kvm, root, head, level);
+		if (r)
+			return r;
+	}
+
+	if (tail != end && (head != tail || head == start)) {
+		r = __tdp_mmu_split_mirror_huge_pages(kvm, root, tail, level);
+		if (r)
+			return r;
+	}
+
+	return 0;
+}
+
 int kvm_arch_gmem_convert(struct kvm *kvm, gfn_t start, gfn_t end,
 			  bool to_private)
 {
+	struct kvm_mmu_page *root;
+	int r;
+
+	/*
+	 * When converting from private=>shared, KVM must first split potential
+	 * hugepages, as KVM mustn't overzap private mappings for TDX guests,
+	 * i.e. must zap _exactly_ [start, end).  Split potential hugepages at
+	 * the head and tail of the to-be-converted (and thus zapped) range so
+	 * that KVM doesn't overzap due to dropping a hugepage that doesn't
+	 * fall wholly inside the range.
+	 */
+	if (to_private || !kvm_has_mirrored_tdp(kvm))
+		return 0;
+
+	/*
+	 * Acquire the external cache lock, a.k.a. the Dynamic PAMT lock, to
+	 * protect the per-VM cache of pre-allocated pages used to populate the
+	 * Dynamic PAMT when splitting S-EPT huge pages.
+	 */
+	guard(mutex)(&kvm->arch.tdp_mmu_external_cache_lock);
+
+	guard(write_lock)(&kvm->mmu_lock);
+
+	/*
+	 * TODO: Also split from PG_LEVEL_1G => PG_LEVEL_2M when KVM supports
+	 * 1GiB S-EPT pages.
+	 */
+	__for_each_tdp_mmu_root_yield_safe(kvm, root, 0, KVM_MIRROR_ROOTS) {
+		r = tdp_mmu_split_mirror_huge_pages(kvm, root, start, end, PG_LEVEL_4K);
+		if (r)
+			return r;
+	}
 	return 0;
 }
 #endif /* CONFIG_HAVE_KVM_ARCH_GMEM_CONVERT */
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 098954f5e07c..774d395e5c73 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -607,6 +607,8 @@ void tdx_vm_destroy(struct kvm *kvm)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 
+	tdx_free_pamt_cache(&kvm_tdx->pamt_cache);
+
 	tdx_reclaim_td_control_pages(kvm);
 
 	kvm_tdx->state = TD_STATE_UNINITIALIZED;
@@ -629,6 +631,8 @@ int tdx_vm_init(struct kvm *kvm)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 
+	tdx_init_pamt_cache(&kvm_tdx->pamt_cache);
+
 	kvm->arch.has_protected_state = true;
 	/*
 	 * TDX Module doesn't allow the hypervisor to modify the EOI-bitmap,
@@ -1621,15 +1625,32 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
 	td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa);
 }
 
-static int tdx_topup_external_pamt_cache(struct kvm_vcpu *vcpu, int min)
+static struct tdx_pamt_cache *tdx_get_pamt_cache(struct kvm *kvm,
+						 struct kvm_vcpu *vcpu)
 {
+	if (KVM_BUG_ON(vcpu && vcpu->kvm != kvm, kvm))
+		return NULL;
+
+	if (vcpu)
+		return &to_tdx(vcpu)->pamt_cache;
+
+	lockdep_assert_held(&kvm->arch.tdp_mmu_external_cache_lock);
+	return &to_kvm_tdx(kvm)->pamt_cache;
+}
+
+static int tdx_topup_external_pamt_cache(struct kvm *kvm,
+					 struct kvm_vcpu *vcpu, int min)
+{
+	struct tdx_pamt_cache *pamt_cache;
+
 	if (!tdx_supports_dynamic_pamt(tdx_sysinfo))
 		return 0;
 
-	if (WARN_ON_ONCE(!vcpu))
+	pamt_cache = tdx_get_pamt_cache(kvm, vcpu);
+	if (!pamt_cache)
 		return -EIO;
 
-	return tdx_topup_pamt_cache(&to_tdx(vcpu)->pamt_cache, min);
+	return tdx_topup_pamt_cache(pamt_cache, min);
 }
 
 static int tdx_mem_page_add(struct kvm *kvm, gfn_t gfn, enum pg_level level,
@@ -1792,8 +1813,8 @@ static struct page *tdx_spte_to_external_spt(struct kvm *kvm, gfn_t gfn,
 static int tdx_sept_split_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
 				       u64 new_spte, enum pg_level level)
 {
-	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+	struct tdx_pamt_cache *pamt_cache;
 	gpa_t gpa = gfn_to_gpa(gfn);
 	u64 err, entry, level_state;
 	struct page *external_spt;
@@ -1804,7 +1825,8 @@ static int tdx_sept_split_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
 	if (!external_spt)
 		return -EIO;
 
-	if (KVM_BUG_ON(!vcpu || vcpu->kvm != kvm, kvm))
+	pamt_cache = tdx_get_pamt_cache(kvm, kvm_get_running_vcpu());
+	if (!pamt_cache)
 		return -EIO;
 
 	err = tdh_do_no_vcpus(tdh_mem_range_block, kvm, &kvm_tdx->td, gpa,
@@ -1816,7 +1838,7 @@ static int tdx_sept_split_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
 
 	err = tdh_do_no_vcpus(tdh_mem_page_demote, kvm, &kvm_tdx->td, gpa,
 			      level, spte_to_pfn(old_spte), external_spt,
-			      &to_tdx(vcpu)->pamt_cache, &entry, &level_state);
+			      pamt_cache, &entry, &level_state);
 	if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_DEMOTE, entry, level_state, kvm))
 		return -EIO;
 
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index f444fc84d93b..57d7e70ffe7d 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -48,6 +48,8 @@ struct kvm_tdx {
 	 * Set/unset is protected with kvm->mmu_lock.
 	 */
 	bool wait_for_sept_zap;
+
+	struct tdx_pamt_cache pamt_cache;
 };
 
 /* TDX module vCPU states */
-- 
2.53.0.rc1.217.geba53bf80e-goog
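[The head/tail detection in tdp_mmu_split_mirror_huge_pages() is plain power-of-two rounding and can be sanity-checked outside the kernel. The sketch below is a standalone userspace approximation: round_to_hugepage() stands in for gfn_round_for_level() at the 2MB level (512 4KB pages), and the gfn values in main() are arbitrary.]

#include <stdint.h>
#include <stdio.h>

typedef uint64_t gfn_t;

/* 512 4KB pages per 2MB hugepage, i.e. one level above PG_LEVEL_4K. */
#define PAGES_PER_HUGEPAGE 512ULL

/* Stand-in for gfn_round_for_level(): round down to a hugepage boundary. */
static gfn_t round_to_hugepage(gfn_t gfn)
{
	return gfn & ~(PAGES_PER_HUGEPAGE - 1);
}

int main(void)
{
	/* Convert [start, end); neither endpoint is 2MB-aligned. */
	gfn_t start = 0x10100, end = 0x10300;
	gfn_t head = round_to_hugepage(start);
	gfn_t tail = round_to_hugepage(end);

	/* head != start => the first hugepage straddles 'start': split it. */
	if (head != start)
		printf("split head hugepage at %#llx\n", (unsigned long long)head);

	/*
	 * tail != end => the last hugepage straddles 'end': split it too,
	 * unless it is the same hugepage as the head (already split above).
	 */
	if (tail != end && (head != tail || head == start))
		printf("split tail hugepage at %#llx\n", (unsigned long long)tail);

	return 0;
}

[For the example range, the hugepages at 0x10000 and 0x10200 each straddle a conversion boundary and get split; a hugepage falling wholly inside [start, end) would be left intact and simply zapped.]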
From nobody Sat Feb 7 17:41:40 2026
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:15:17 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
References: <20260129011517.3545883-1-seanjc@google.com>
Message-ID: <20260129011517.3545883-46-seanjc@google.com>
Subject: [RFC PATCH v5 45/45] KVM: TDX: Turn on PG_LEVEL_2M
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

From: Yan Zhao

Turn on PG_LEVEL_2M in tdx_gmem_max_mapping_level() when TDX huge pages
are enabled and the TD is RUNNABLE.  Introduce a module parameter named
"tdx_huge_page" for kvm-intel.ko to enable/disable TDX huge pages.  Turn
TDX huge pages off if the TDX module does not support
TDX_FEATURES0.ENHANCED_DEMOTE_INTERRUPTIBILITY.

Force the page size to 4KB during TD build time to simplify the design,
since:

 - tdh_mem_page_add() only adds private pages at 4KB.
 - The amount of initial memory pages is usually limited (e.g. ~4MB in a
   typical Linux TD).

Update the warnings and KVM_BUG_ON() info to match the conditions when
2MB mappings are permitted.

Signed-off-by: Xiaoyao Li
Signed-off-by: Isaku Yamahata
Signed-off-by: Yan Zhao
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/vmx/tdx.c | 37 ++++++++++++++++++++++++++++++-------
 1 file changed, 30 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 774d395e5c73..8f9b4ad9871f 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -55,6 +55,8 @@
 
 bool enable_tdx __ro_after_init;
 module_param_named(tdx, enable_tdx, bool, 0444);
+static bool __read_mostly enable_tdx_huge_page = true;
+module_param_named(tdx_huge_page, enable_tdx_huge_page, bool, 0444);
 
 #define TDX_SHARED_BIT_PWL_5	gpa_to_gfn(BIT_ULL(51))
 #define TDX_SHARED_BIT_PWL_4	gpa_to_gfn(BIT_ULL(47))
@@ -1703,8 +1705,9 @@ static int tdx_sept_map_leaf_spte(struct kvm *kvm, gfn_t gfn, u64 new_spte,
 	kvm_pfn_t pfn = spte_to_pfn(new_spte);
 	int ret;
 
-	/* TODO: handle large pages. */
-	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
+	/* TODO: Support hugepages when building the initial TD image. */
+	if (KVM_BUG_ON(level != PG_LEVEL_4K &&
+		       to_kvm_tdx(kvm)->state != TD_STATE_RUNNABLE, kvm))
 		return -EIO;
 
 	if (KVM_BUG_ON(!vcpu, kvm))
@@ -1885,10 +1888,6 @@ static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm))
 		return -EIO;
 
-	/* TODO: handle large pages. */
-	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
-		return -EIO;
-
 	err = tdh_do_no_vcpus(tdh_mem_range_block, kvm, &kvm_tdx->td, gpa,
 			      level, &entry, &level_state);
 	if (TDX_BUG_ON_2(err, TDH_MEM_RANGE_BLOCK, entry, level_state, kvm))
@@ -3474,12 +3473,34 @@ int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp)
 	return ret;
 }
 
+/*
+ * For private pages:
+ *
+ * Force KVM to map at 4KB level when !enable_tdx_huge_page (e.g., due to
+ * incompatible TDX module) or before TD state is RUNNABLE.
+ *
+ * Always allow KVM to map at 2MB level in other cases, though KVM may still map
+ * the page at 4KB (i.e., passing in PG_LEVEL_4K to AUG) due to
+ * (1) the backend folio is 4KB,
+ * (2) disallow_lpage restrictions:
+ *     - mixed private/shared pages in the 2MB range
+ *     - level misalignment due to slot base_gfn, slot size, and ugfn
+ *     - guest_inhibit bit set due to guest's 4KB accept level
+ * (3) page merging is disallowed (e.g., when part of a 2MB range has been
+ *     mapped at 4KB level during TD build time).
+ */
 int tdx_gmem_max_mapping_level(struct kvm *kvm, kvm_pfn_t pfn, bool is_private)
 {
 	if (!is_private)
 		return 0;
 
-	return PG_LEVEL_4K;
+	if (!enable_tdx_huge_page)
+		return PG_LEVEL_4K;
+
+	if (unlikely(to_kvm_tdx(kvm)->state != TD_STATE_RUNNABLE))
+		return PG_LEVEL_4K;
+
+	return PG_LEVEL_2M;
 }
 
 static int tdx_online_cpu(unsigned int cpu)
@@ -3665,6 +3686,8 @@ static int __init __tdx_bringup(void)
 	if (misc_cg_set_capacity(MISC_CG_RES_TDX, tdx_get_nr_guest_keyids()))
 		goto get_sysinfo_err;
 
+	if (enable_tdx_huge_page && !tdx_supports_demote_nointerrupt(tdx_sysinfo))
+		enable_tdx_huge_page = false;
 	/*
 	 * Leave hardware virtualization enabled after TDX is enabled
 	 * successfully. TDX CPU hotplug depends on this.
-- 
2.53.0.rc1.217.geba53bf80e-goog
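[A brief usage note for testing, assuming a kernel with this series applied: because the parameter is registered with mode 0444, it can only be set at load time, e.g. via "modprobe kvm_intel tdx=1 tdx_huge_page=1", and read back afterwards through /sys/module/kvm_intel/parameters/tdx_huge_page; it cannot be toggled at runtime. Note also that even when left at its default of true, __tdx_bringup() silently forces it off if the TDX module lacks uninterruptible-demote support per tdx_supports_demote_nointerrupt().]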