From nobody Mon Feb 9 00:00:56 2026
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:54 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260129011517.3545883-1-seanjc@google.com>
Message-ID: <20260129011517.3545883-23-seanjc@google.com>
Subject: [RFC PATCH v5 22/45] KVM: TDX: Get/put PAMT pages when (un)mapping private memory
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
	x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev,
	kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao,
	Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu,
	Xiaoyao Li, Isaku Yamahata
Content-Type: text/plain; charset="utf-8"

From: Kirill A. Shutemov

Add Dynamic PAMT support to KVM's S-EPT MMU by "getting" a PAMT page
when adding guest memory (PAGE.ADD or PAGE.AUG), and "putting" the page
when removing guest memory (PAGE.REMOVE).

To access the per-vCPU PAMT caches without plumbing @vcpu throughout the
TDP MMU, begrudgingly use kvm_get_running_vcpu() to get the vCPU, and
bug the VM if KVM attempts to set an S-EPT entry without an active vCPU.
KVM only supports creating _new_ mappings in page (pre)fault paths, all
of which require an active vCPU.

The PAMT memory holds metadata for TDX-protected memory. With Dynamic
PAMT, PAMT_4K is allocated on demand. The kernel supplies the TDX module
with a few pages that cover 2M of host physical memory.

PAMT memory can be reclaimed when the last user is gone. That can happen
in a few code paths:

- On TDH.PHYMEM.PAGE.RECLAIM in tdx_reclaim_td_control_pages() and
  tdx_reclaim_page().

- On TDH.MEM.PAGE.REMOVE in tdx_sept_drop_private_spte().

- In tdx_sept_zap_private_spte() for pages that were in the queue to be
  added with TDH.MEM.PAGE.ADD, but it never happened due to an error.

- In tdx_sept_free_private_spt() for SEPT pages.

Signed-off-by: Kirill A. Shutemov
[Minor log tweak]
Signed-off-by: Rick Edgecombe
Co-developed-by: Sean Christopherson
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/mmu/mmu.c             |  4 +++
 arch/x86/kvm/vmx/tdx.c             | 44 ++++++++++++++++++++++++++----
 arch/x86/kvm/vmx/tdx.h             |  2 ++
 5 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 17dddada69fc..394dc29483a7 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -99,6 +99,7 @@ KVM_X86_OP_OPTIONAL(free_external_sp)
 KVM_X86_OP_OPTIONAL_RET0(set_external_spte)
 KVM_X86_OP_OPTIONAL(remove_external_spte)
 KVM_X86_OP_OPTIONAL(reclaim_external_sp)
+KVM_X86_OP_OPTIONAL_RET0(topup_external_cache)
 KVM_X86_OP(has_wbinvd_exit)
 KVM_X86_OP(get_l2_tsc_offset)
 KVM_X86_OP(get_l2_tsc_multiplier)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6e84dbc89e79..a6e4ab76b1b2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1863,6 +1863,7 @@ struct kvm_x86_ops {
 				       struct kvm_mmu_page *sp);
 	void (*remove_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 				     u64 mirror_spte);
+	int (*topup_external_cache)(struct kvm_vcpu *vcpu, int min);
 
 
 	bool (*has_wbinvd_exit)(void);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9b5a6861e2a4..4ecbf216d96f 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -605,6 +605,10 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
 						   PT64_ROOT_MAX_LEVEL);
 		if (r)
 			return r;
+
+		r = kvm_x86_call(topup_external_cache)(vcpu, PT64_ROOT_MAX_LEVEL);
+		if (r)
+			return r;
 	}
 	r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
 				       PT64_ROOT_MAX_LEVEL);
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 0946eba2de23..d74a2547e512 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -683,6 +683,8 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 	if (!irqchip_split(vcpu->kvm))
 		return -EINVAL;
 
+	tdx_init_pamt_cache(&tdx->pamt_cache);
+
 	fpstate_set_confidential(&vcpu->arch.guest_fpu);
 	vcpu->arch.apic->guest_apic_protected = true;
 	INIT_LIST_HEAD(&tdx->vt.pi_wakeup_list);
@@ -868,6 +870,8 @@ void tdx_vcpu_free(struct kvm_vcpu *vcpu)
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
 	int i;
 
+	tdx_free_pamt_cache(&tdx->pamt_cache);
+
 	if (vcpu->cpu != -1) {
 		KVM_BUG_ON(tdx->state == VCPU_TD_STATE_INITIALIZED, vcpu->kvm);
 		tdx_flush_vp_on_cpu(vcpu);
@@ -1615,6 +1619,14 @@ void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int pgd_level)
 	td_vmcs_write64(to_tdx(vcpu), SHARED_EPT_POINTER, root_hpa);
 }
 
+static int tdx_topup_external_pamt_cache(struct kvm_vcpu *vcpu, int min)
+{
+	if (!tdx_supports_dynamic_pamt(tdx_sysinfo))
+		return 0;
+
+	return tdx_topup_pamt_cache(&to_tdx(vcpu)->pamt_cache, min);
+}
+
 static int tdx_mem_page_add(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 			    kvm_pfn_t pfn)
 {
@@ -1696,8 +1708,15 @@ static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 				     enum pg_level level, u64 mirror_spte)
 {
+	struct kvm_vcpu *vcpu = kvm_get_running_vcpu();
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
+	struct vcpu_tdx *tdx = to_tdx(vcpu);
+	struct page *page = pfn_to_page(pfn);
+	int ret;
+
+	if (KVM_BUG_ON(!vcpu, kvm))
+		return -EINVAL;
 
 	if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm))
 		return -EIO;
@@ -1711,6 +1730,10 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 
 	WARN_ON_ONCE((mirror_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
 
+	ret = tdx_pamt_get(page, &tdx->pamt_cache);
+	if (ret)
+		return ret;
+
 	/*
 	 * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory()
 	 * before kvm_tdx->state. Userspace must not be allowed to pre-fault
@@ -1723,14 +1746,17 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 	 * If the TD isn't finalized/runnable, then userspace is initializing
 	 * the VM image via KVM_TDX_INIT_MEM_REGION; ADD the page to the TD.
 	 */
-	if (unlikely(kvm_tdx->state != TD_STATE_RUNNABLE))
-		return tdx_mem_page_add(kvm, gfn, level, pfn);
+	if (likely(kvm_tdx->state == TD_STATE_RUNNABLE))
+		ret = tdx_mem_page_aug(kvm, gfn, level, pfn);
+	else
+		ret = tdx_mem_page_add(kvm, gfn, level, pfn);
 
-	return tdx_mem_page_aug(kvm, gfn, level, pfn);
+	if (ret)
+		tdx_pamt_put(page);
+
+	return ret;
 }
 
-
-
 /*
  * Ensure shared and private EPTs to be flushed on all vCPUs.
  * tdh_mem_track() is the only caller that increases TD epoch. An increase in
@@ -1847,6 +1873,7 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 		return;
 
 	tdx_quirk_reset_page(page);
+	tdx_pamt_put(page);
 }
 
 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
@@ -3614,5 +3641,12 @@ void __init tdx_hardware_setup(void)
 	vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
 	vt_x86_ops.reclaim_external_sp = tdx_sept_reclaim_private_sp;
 	vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
+
+	/*
+	 * FIXME: Wire up the PAMT hook iff DPAMT is supported, once VMXON is
+	 * moved out of KVM and tdx_bringup() is folded into here.
+	 */
+	vt_x86_ops.topup_external_cache = tdx_topup_external_pamt_cache;
+
 	vt_x86_ops.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt;
 }
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index ce2720a028ad..f444fc84d93b 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -73,6 +73,8 @@ struct vcpu_tdx {
 
 	u64 map_gpa_next;
 	u64 map_gpa_end;
+
+	struct tdx_pamt_cache pamt_cache;
 };
 
 void tdh_vp_rd_failed(struct vcpu_tdx *tdx, char *uclass, u32 field, u64 err);
-- 
2.53.0.rc1.217.geba53bf80e-goog