From: Sean Christopherson
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:52 -0800
Subject: [RFC PATCH v5 20/45] KVM: x86/mmu: Allocate/free S-EPT pages using tdx_{alloc,free}_control_page()
Message-ID: <20260129011517.3545883-21-seanjc@google.com>
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
References: <20260129011517.3545883-1-seanjc@google.com>
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata

Now that kvm_mmu_memory_cache supports custom page allocators, wire up the
S-EPT cache to use tdx_{alloc,free}_control_page() (arguably S-EPT pages
aren't "control" pages, but they're not guest pages either).  Using the
TDX APIs will make S-EPT pages naturally play nice with Dynamic PAMT, by
virtue of adding/removing PAMT entries when S-EPT pages are allocated and
freed, as opposed to when they are added/removed from the S-EPT tree.

Inserting into the PAMT entries on allocation does mean KVM will create
unnecessary PAMT entries, e.g. once a vCPU stops faulting in memory, the
remaining pages in the MMU cache will go unused.  But in practice, odds
are very good the containing 2MiB page will have other in-use S-EPT pages,
i.e. will create PAMT entries anyways.  And _if_ creating PAMT entries on
allocation is problematic for memory consumption, that can be resolved by
tweaking KVM's cache size.

Suggested-by: Kai Huang
Signed-off-by: Sean Christopherson
Reviewed-by: Kai Huang
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 ++
 arch/x86/include/asm/kvm_host.h    | 18 +++++++++---------
 arch/x86/kvm/mmu/mmu.c             |  6 ++++--
 arch/x86/kvm/mmu/mmu_internal.h    | 11 -----------
 arch/x86/kvm/mmu/tdp_mmu.c         |  5 +++--
 arch/x86/kvm/vmx/tdx.c             | 13 ++++++++++++-
 6 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index c17cedc485c9..17dddada69fc 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -94,6 +94,8 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
 KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
 KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
 KVM_X86_OP(load_mmu_pgd)
+KVM_X86_OP_OPTIONAL(alloc_external_sp)
+KVM_X86_OP_OPTIONAL(free_external_sp)
 KVM_X86_OP_OPTIONAL_RET0(set_external_spte)
 KVM_X86_OP_OPTIONAL(remove_external_spte)
 KVM_X86_OP_OPTIONAL(reclaim_external_sp)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index b35a07ed11fb..6e84dbc89e79 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -867,10 +867,7 @@ struct kvm_vcpu_arch {
 	struct kvm_mmu_memory_cache mmu_shadow_page_cache;
 	struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
 	struct kvm_mmu_memory_cache mmu_page_header_cache;
-	/*
-	 * This cache is to allocate external page table. E.g. private EPT used
-	 * by the TDX module.
-	 */
+	/* Used to allocate S-EPT pages (gifted to the TDX-Module). */
 	struct kvm_mmu_memory_cache mmu_external_spt_cache;
 
 	/*
@@ -1853,18 +1850,21 @@ struct kvm_x86_ops {
 	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			     int root_level);
 
-	/* Update the external page table from spte getting set. */
+	/*
+	 * Callbacks to allocate and free external page tables, a.k.a. S-EPT,
+	 * and to propagate changes in mirror page tables to the external page
+	 * tables.
+	 */
+	unsigned long (*alloc_external_sp)(gfp_t gfp);
+	void (*free_external_sp)(unsigned long addr);
 	int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 				 u64 mirror_spte);
-
-	/* Update external page tables for page table about to be freed. */
 	void (*reclaim_external_sp)(struct kvm *kvm, gfn_t gfn,
 				    struct kvm_mmu_page *sp);
-
-	/* Update external page table from spte getting removed, and flush TLB. */
 	void (*remove_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 				     u64 mirror_spte);
 
+
 	bool (*has_wbinvd_exit)(void);
 
 	u64 (*get_l2_tsc_offset)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 3911ac9bddfd..9b5a6861e2a4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6690,11 +6690,13 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
 	vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;
 
-	vcpu->arch.mmu_shadow_page_cache.init_value =
-		SHADOW_NONPRESENT_VALUE;
+	vcpu->arch.mmu_shadow_page_cache.init_value = SHADOW_NONPRESENT_VALUE;
 	if (!vcpu->arch.mmu_shadow_page_cache.init_value)
 		vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO;
 
+	vcpu->arch.mmu_external_spt_cache.page_get = kvm_x86_ops.alloc_external_sp;
+	vcpu->arch.mmu_external_spt_cache.page_free = kvm_x86_ops.free_external_sp;
+
 	vcpu->arch.mmu = &vcpu->arch.root_mmu;
 	vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 73cdcbccc89e..6bb97f660793 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -157,17 +157,6 @@ static inline bool is_mirror_sp(const struct kvm_mmu_page *sp)
 	return sp->role.is_mirror;
 }
 
-static inline void kvm_mmu_alloc_external_spt(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
-{
-	/*
-	 * external_spt is allocated for TDX module to hold private EPT mappings,
-	 * TDX module will initialize the page by itself.
-	 * Therefore, KVM does not need to initialize or access external_spt.
-	 * KVM only interacts with sp->spt for private EPT operations.
-	 */
-	sp->external_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_external_spt_cache);
-}
-
 static inline gfn_t kvm_gfn_root_bits(const struct kvm *kvm, const struct kvm_mmu_page *root)
 {
 	/*
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 18764dbc97ea..01e3e4f4baa5 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -55,7 +55,8 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 
 static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
 {
-	free_page((unsigned long)sp->external_spt);
+	if (sp->external_spt)
+		kvm_x86_call(free_external_sp)((unsigned long)sp->external_spt);
 	free_page((unsigned long)sp->spt);
 	kmem_cache_free(mmu_page_header_cache, sp);
 }
@@ -1246,7 +1247,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		sp = tdp_mmu_alloc_sp(vcpu);
 		tdp_mmu_init_child_sp(sp, &iter);
 		if (is_mirror_sp(sp))
-			kvm_mmu_alloc_external_spt(vcpu, sp);
+			sp->external_spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_external_spt_cache);
 
 		sp->nx_huge_page_disallowed = fault->huge_page_disallowed;
 
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 323aae4300a1..0946eba2de23 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1790,7 +1790,9 @@ static void tdx_sept_reclaim_private_sp(struct kvm *kvm, gfn_t gfn,
 	 * TD's hkid is freed, when the TD is being torn down.
 	 *
 	 * If the S-EPT PTE can't be removed for any reason, intentionally leak
-	 * the page to prevent the kernel from accessing the encrypted page.
+	 * the page to prevent the kernel from accessing the encrypted page,
+	 * and if Dynamic PAMT is enabled, to avoid inducing a failure on
+	 * removal of the still-used PAMT entry.
 	 */
 	if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) ||
 	    tdx_reclaim_page(virt_to_page(sp->external_spt)))
@@ -3600,6 +3602,15 @@ void __init tdx_hardware_setup(void)
 	 */
 	vt_x86_ops.vm_size = max_t(unsigned int, vt_x86_ops.vm_size, sizeof(struct kvm_tdx));
 
+	/*
+	 * TDX uses the external_spt cache to allocate S-EPT page table pages,
+	 * which (a) don't need to be initialized by KVM as the TDX-Module will
+	 * initialize the page (using the guest's encryption key), and (b) need
+	 * to use a custom allocator to be compatible with Dynamic PAMT.
+	 */
+	vt_x86_ops.alloc_external_sp = tdx_alloc_control_page;
+	vt_x86_ops.free_external_sp = tdx_free_control_page;
+
 	vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
 	vt_x86_ops.reclaim_external_sp = tdx_sept_reclaim_private_sp;
 	vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
-- 
2.53.0.rc1.217.geba53bf80e-goog
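
Illustrative note, not part of the patch: the trade-off described in the
changelog (bookkeeping happens when a page enters the cache, not when it is
linked into the S-EPT tree) can be sketched in userspace C.  Everything
below is a hypothetical stand-in: struct page_cache is not the layout of
kvm_mmu_memory_cache, the callbacks use simplified signatures rather than
the gfp_t/unsigned long ones in the patch, and the tracked_pages counter
merely plays the role of PAMT entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CACHE_CAPACITY	8
#define FAKE_PAGE_SIZE	4096

/* Hypothetical stand-in for a per-vCPU cache with pluggable allocators. */
struct page_cache {
	void *(*page_get)(void);	/* optional custom allocator */
	void (*page_free)(void *page);	/* optional custom free routine */
	int nobjs;
	void *objects[CACHE_CAPACITY];
};

/* Stand-in for PAMT bookkeeping: counts pages the allocator knows about. */
static int tracked_pages;

static void *tracked_page_get(void)
{
	void *page = malloc(FAKE_PAGE_SIZE);

	if (page)
		tracked_pages++;	/* entry created at allocation time */
	return page;
}

static void tracked_page_free(void *page)
{
	if (!page)
		return;
	tracked_pages--;		/* entry removed at free time */
	free(page);
}

/* Fill the cache up to @min objects, preferring the custom allocator. */
static int cache_topup(struct page_cache *mc, int min)
{
	if (min > CACHE_CAPACITY)
		return -1;

	while (mc->nobjs < min) {
		void *page = mc->page_get ? mc->page_get() : malloc(FAKE_PAGE_SIZE);

		if (!page)
			return -1;
		mc->objects[mc->nobjs++] = page;
	}
	return 0;
}

/* Hand out a pre-allocated page; the caller must have topped up first. */
static void *cache_alloc(struct page_cache *mc)
{
	assert(mc->nobjs > 0);
	return mc->objects[--mc->nobjs];
}

/* Release one page through the same allocator family that produced it. */
static void cache_free_page(struct page_cache *mc, void *page)
{
	if (mc->page_free)
		mc->page_free(page);
	else
		free(page);
}

/* Release every page still sitting unused in the cache. */
static void cache_drain(struct page_cache *mc)
{
	while (mc->nobjs > 0)
		cache_free_page(mc, mc->objects[--mc->nobjs]);
}
```

In this sketch, topping up to four pages and consuming one leaves three
pages in the cache that are still counted by tracked_pages, which mirrors
the "unnecessary PAMT entries" case in the changelog; lowering the top-up
minimum is the analogue of tweaking KVM's cache size.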