From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com
Subject: [PATCH v3 01/15] KVM: TDX: Drop kvm_x86_ops.link_external_spt()
Date: Thu, 28 May 2026 16:10:29 +0800
Message-ID: <20260528081029.10220-1-yan.y.zhao@intel.com>
In-Reply-To: <20260528080856.10141-1-yan.y.zhao@intel.com>

From: Sean Christopherson

Drop the dedicated .link_external_spt() for linking S-EPT pages, and instead funnel
everything through .set_external_spte() for mapping S-EPT entries.

Using separate hooks doesn't help prevent TDP MMU details from bleeding
into TDX, and vice versa; to the contrary, dedicated callbacks will result
in _more_ pollution when hugepage support is added, e.g. will require the
TDP MMU to know details about the splitting rules for TDX that aren't all
that relevant to the TDP MMU.

Ideally, KVM would provide a single pair of hooks to set S-EPT entries,
one hook for setting SPTEs under write-lock and another for setting SPTEs
under read-lock (e.g. to ensure the entire operation is "atomic", to allow
for failure, etc.). Sadly, TDX's requirement that all child S-EPT entries
are removed before the parent makes that impractical: the TDP MMU
deliberately prunes non-leaf SPTEs and _then_ processes its children, thus
making it quite important for the TDP MMU to differentiate between zapping
leaf and non-leaf S-EPT entries. However, that's the _only_ case that's
truly special, and even that case could be shoehorned into a single hook;
it just wouldn't be a net positive.

Signed-off-by: Sean Christopherson
[Rick: add in trivial feedback]
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
---
MMU_refactors v3:
- Rebased to kvm-x86-next-2026.05.26.

MMU_refactors v2:
- Moved this patch to the very beginning of the series so that when
  warnings like "KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm)" are
  removed from __tdp_mmu_set_spte_atomic() in TDP MMU in a later patch,
  the atomic zap change can be propagated via the .set_external_spte() op.
  (Yan).

MMU_refactors v1:
- rename external_spt->sept_pt (Rick, Yan)
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 -
 arch/x86/include/asm/kvm_host.h    |  3 --
 arch/x86/kvm/mmu/tdp_mmu.c         | 29 +-------------
 arch/x86/kvm/vmx/tdx.c             | 63 ++++++++++++++++++++----------
 4 files changed, 44 insertions(+), 52 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index b0269325646c..2cb393000ee9 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -96,7 +96,6 @@ KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
 KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
 KVM_X86_OP_OPTIONAL_RET0(tdp_has_smep)
 KVM_X86_OP(load_mmu_pgd)
-KVM_X86_OP_OPTIONAL_RET0(link_external_spt)
 KVM_X86_OP_OPTIONAL_RET0(set_external_spte)
 KVM_X86_OP_OPTIONAL_RET0(free_external_spt)
 KVM_X86_OP_OPTIONAL(remove_external_spte)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d72051150bac..55ec998cc861 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1914,9 +1914,6 @@ struct kvm_x86_ops {
 	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			     int root_level);
 
-	/* Update external mapping with page table link. */
-	int (*link_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
-				void *external_spt);
 	/* Update the external page table from spte getting set. */
 	int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 				 u64 mirror_spte);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index f98afc3422ce..0dc1b0597f8a 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -495,27 +495,12 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
 
-static void *get_external_spt(gfn_t gfn, u64 new_spte, int level)
-{
-	if (is_shadow_present_pte(new_spte) && !is_last_spte(new_spte, level)) {
-		struct kvm_mmu_page *sp = spte_to_child_sp(new_spte);
-
-		WARN_ON_ONCE(sp->role.level + 1 != level);
-		WARN_ON_ONCE(sp->gfn != gfn);
-		return sp->external_spt;
-	}
-
-	return NULL;
-}
-
 static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sptep,
 						 gfn_t gfn, u64 *old_spte,
 						 u64 new_spte, int level)
 {
 	bool was_present = is_shadow_present_pte(*old_spte);
-	bool is_present = is_shadow_present_pte(new_spte);
-	bool is_leaf = is_present && is_last_spte(new_spte, level);
-	int ret = 0;
+	int ret;
 
 	KVM_BUG_ON(was_present, kvm);
 
@@ -528,18 +513,8 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
 	if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
 		return -EBUSY;
 
-	/*
-	 * Use different call to either set up middle level
-	 * external page table, or leaf.
-	 */
-	if (is_leaf) {
-		ret = kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte);
-	} else {
-		void *external_spt = get_external_spt(gfn, new_spte, level);
+	ret = kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte);
 
-		KVM_BUG_ON(!external_spt, kvm);
-		ret = kvm_x86_call(link_external_spt)(kvm, gfn, level, external_spt);
-	}
 	if (ret)
 		__kvm_tdp_mmu_write_spte(sptep, *old_spte);
 	else
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d43948a3bb11..34e7e4833871 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1659,18 +1659,58 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
+static struct page *tdx_spte_to_sept_pt(struct kvm *kvm, gfn_t gfn,
+					u64 new_spte, enum pg_level level)
+{
+	struct kvm_mmu_page *sp = spte_to_child_sp(new_spte);
+
+	if (KVM_BUG_ON(!sp->external_spt, kvm) ||
+	    KVM_BUG_ON(sp->role.level + 1 != level, kvm) ||
+	    KVM_BUG_ON(sp->gfn != gfn, kvm))
+		return NULL;
+
+	return virt_to_page(sp->external_spt);
+}
+
+static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
+				     enum pg_level level, u64 mirror_spte)
+{
+	gpa_t gpa = gfn_to_gpa(gfn);
+	u64 err, entry, level_state;
+	struct page *sept_pt;
+
+	sept_pt = tdx_spte_to_sept_pt(kvm, gfn, mirror_spte, level);
+	if (!sept_pt)
+		return -EIO;
+
+	err = tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, level, sept_pt,
+			       &entry, &level_state);
+	if (unlikely(tdx_operand_busy(err)))
+		return -EBUSY;
+
+	if (TDX_BUG_ON_2(err, TDH_MEM_SEPT_ADD, entry, level_state, kvm))
+		return -EIO;
+
+	return 0;
+}
+
 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 				     enum pg_level level, u64 mirror_spte)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
 
+	if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm))
+		return -EIO;
+
+	if (!is_last_spte(mirror_spte, level))
+		return tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte);
+
 	/* TODO: handle large pages. */
 	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
 		return -EIO;
 
-	WARN_ON_ONCE(!is_shadow_present_pte(mirror_spte) ||
-		     (mirror_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
+	WARN_ON_ONCE((mirror_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
 
 	/*
 	 * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory()
@@ -1690,24 +1730,6 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 	return tdx_mem_page_aug(kvm, gfn, level, pfn);
 }
 
-static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
-				     enum pg_level level, void *private_spt)
-{
-	gpa_t gpa = gfn_to_gpa(gfn);
-	struct page *page = virt_to_page(private_spt);
-	u64 err, entry, level_state;
-
-	err = tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, level, page, &entry,
-			       &level_state);
-	if (unlikely(tdx_operand_busy(err)))
-		return -EBUSY;
-
-	if (TDX_BUG_ON_2(err, TDH_MEM_SEPT_ADD, entry, level_state, kvm))
-		return -EIO;
-
-	return 0;
-}
-
 /*
  * Ensure shared and private EPTs to be flushed on all vCPUs.
  * tdh_mem_track() is the only caller that increases TD epoch. An increase in
@@ -3424,7 +3446,6 @@ int __init tdx_hardware_setup(void)
 
 	vt_x86_ops.vm_size = max_t(unsigned int, vt_x86_ops.vm_size, sizeof(struct kvm_tdx));
 
-	vt_x86_ops.link_external_spt = tdx_sept_link_private_spt;
 	vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
 	vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
 	vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
-- 
2.43.2
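
The single-hook design can be modeled outside the kernel in a few lines of userspace C. This is a sketch under stated assumptions, not kernel code: the bit values SKETCH_PRESENT and SKETCH_LARGE, the sketch_* functions and the enum sketch_op tracer are all hypothetical stand-ins for is_shadow_present_pte(), is_last_spte() and the SEPT.ADD / PAGE.AUG paths. The caller passes only the new SPTE and its level; the backend alone decides whether it is linking a child page table or mapping a leaf.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical SPTE bits for this sketch only; not the real EPT layout. */
#define SKETCH_PRESENT  (1ULL << 0)
#define SKETCH_LARGE    (1ULL << 7)
#define SKETCH_LEVEL_4K 1

/* Records which backend path the dispatcher took, for illustration. */
enum sketch_op { SKETCH_OP_NONE, SKETCH_OP_LINK_TABLE, SKETCH_OP_MAP_LEAF };

static enum sketch_op last_op;

static bool sketch_is_present(uint64_t spte)
{
	return spte & SKETCH_PRESENT;
}

/* Loosely mirrors is_last_spte(): 4K entries are always leaves. */
static bool sketch_is_leaf(uint64_t spte, int level)
{
	return level == SKETCH_LEVEL_4K || (spte & SKETCH_LARGE);
}

/* Stand-in for the SEPT.ADD path: link a child page table. */
static int sketch_map_nonleaf(uint64_t new_spte, int level)
{
	(void)new_spte;
	(void)level;
	last_op = SKETCH_OP_LINK_TABLE;
	return 0;
}

/* Stand-in for the PAGE.AUG path; like the patch, only 4K is handled. */
static int sketch_map_leaf(uint64_t new_spte, int level)
{
	(void)new_spte;
	if (level != SKETCH_LEVEL_4K)
		return -EIO;
	last_op = SKETCH_OP_MAP_LEAF;
	return 0;
}

/* The single hook: callers pass the new SPTE and never pick a path. */
int sketch_dispatch_set_spte(uint64_t new_spte, int level)
{
	last_op = SKETCH_OP_NONE;

	/* Analogous to the KVM_BUG_ON() on a non-present SPTE. */
	if (!sketch_is_present(new_spte))
		return -EIO;

	if (!sketch_is_leaf(new_spte, level))
		return sketch_map_nonleaf(new_spte, level);

	return sketch_map_leaf(new_spte, level);
}

enum sketch_op sketch_last_op(void)
{
	return last_op;
}
```

In this model, adding hugepage handling would only touch sketch_map_leaf(), never the caller, which is the kind of isolation the commit message argues for.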

From: Yan Zhao
Subject: [PATCH v3 02/15] KVM: TDX: Wrap mapping of leaf and non-leaf S-EPT entries into helpers
Date: Thu, 28 May 2026 16:11:06 +0800
Message-ID: <20260528081106.10236-1-yan.y.zhao@intel.com>

From: Sean Christopherson

Add a helper, tdx_sept_map_leaf_spte(), to wrap and isolate PAGE.ADD and
PAGE.AUG operations. Rename tdx_sept_link_private_spt() to
tdx_sept_map_nonleaf_spte() to wrap SEPT.ADD for symmetry. Thus, transition
tdx_sept_set_private_spte() into a "dispatch" routine for setting/writing
S-EPT entries.

No functional change intended.

Signed-off-by: Sean Christopherson
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
---
MMU_refactors v3:
- Rebased to kvm-x86-next-2026.05.26.

MMU_refactors v2:
- Renamed tdx_sept_link_private_spt() to tdx_sept_map_nonleaf_spte(). (Yan)
---
 arch/x86/kvm/vmx/tdx.c | 24 +++++++++++++++---------
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 34e7e4833871..9279f938987b 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1672,7 +1672,7 @@ static struct page *tdx_spte_to_sept_pt(struct kvm *kvm, gfn_t gfn,
 	return virt_to_page(sp->external_spt);
 }
 
-static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
+static int tdx_sept_map_nonleaf_spte(struct kvm *kvm, gfn_t gfn,
 				     enum pg_level level, u64 mirror_spte)
 {
 	gpa_t gpa = gfn_to_gpa(gfn);
@@ -1694,18 +1694,12 @@ static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
-static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
-				     enum pg_level level, u64 mirror_spte)
+static int tdx_sept_map_leaf_spte(struct kvm *kvm, gfn_t gfn, enum pg_level level,
+				  u64 mirror_spte)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
 
-	if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm))
-		return -EIO;
-
-	if (!is_last_spte(mirror_spte, level))
-		return tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte);
-
 	/* TODO: handle large pages. */
 	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
 		return -EIO;
@@ -1730,6 +1724,18 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 	return tdx_mem_page_aug(kvm, gfn, level, pfn);
 }
 
+static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
+				     enum pg_level level, u64 mirror_spte)
+{
+	if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm))
+		return -EIO;
+
+	if (!is_last_spte(mirror_spte, level))
+		return tdx_sept_map_nonleaf_spte(kvm, gfn, level, mirror_spte);
+
+	return tdx_sept_map_leaf_spte(kvm, gfn, level, mirror_spte);
+}
+
 /*
  * Ensure shared and private EPTs to be flushed on all vCPUs.
  * tdh_mem_track() is the only caller that increases TD epoch. An increase in
-- 
2.43.2

From: Yan Zhao
Subject: [PATCH v3 03/15] KVM: x86/mmu: Fold set_external_spte_present() into its sole caller
Date: Thu, 28 May 2026 16:11:17 +0800
Message-ID: <20260528081117.10252-1-yan.y.zhao@intel.com>

From: Sean Christopherson

Fold set_external_spte_present() into __tdp_mmu_set_spte_atomic() in
anticipation of propagating all changes (like atomic zap) triggered by
tdp_mmu_set_spte_atomic() to the external PTEs.

No functional change intended.

Signed-off-by: Sean Christopherson
Signed-off-by: Yan Zhao
---
MMU_refactors v3:
- Rebased to kvm-x86-next-2026.05.26.

MMU_refactors v2:
- Moved to the front of the series and updated the patch log to indicate
  the propagation of changes for atomic zap. (Yan)
---
 arch/x86/kvm/mmu/tdp_mmu.c | 72 ++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 0dc1b0597f8a..128089571e34 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -495,33 +495,6 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
 
-static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sptep,
-						 gfn_t gfn, u64 *old_spte,
-						 u64 new_spte, int level)
-{
-	bool was_present = is_shadow_present_pte(*old_spte);
-	int ret;
-
-	KVM_BUG_ON(was_present, kvm);
-
-	lockdep_assert_held(&kvm->mmu_lock);
-	/*
-	 * We need to lock out other updates to the SPTE until the external
-	 * page table has been modified. Use FROZEN_SPTE similar to
-	 * the zapping case.
-	 */
-	if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
-		return -EBUSY;
-
-	ret = kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte);
-
-	if (ret)
-		__kvm_tdp_mmu_write_spte(sptep, *old_spte);
-	else
-		__kvm_tdp_mmu_write_spte(sptep, new_spte);
-	return ret;
-}
-
 /**
  * handle_changed_spte - handle bookkeeping associated with an SPTE change
  * @kvm: kvm instance
@@ -626,6 +599,8 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 							 struct tdp_iter *iter,
 							 u64 new_spte)
 {
+	u64 *raw_sptep = rcu_dereference(iter->sptep);
+
 	/*
 	 * The caller is responsible for ensuring the old SPTE is not a FROZEN
 	 * SPTE. KVM should never attempt to zap or manipulate a FROZEN SPTE,
@@ -635,8 +610,13 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte));
 
 	if (is_mirror_sptep(iter->sptep) && !is_frozen_spte(new_spte)) {
+		bool was_present = is_shadow_present_pte(iter->old_spte);
 		int ret;
 
+		KVM_BUG_ON(was_present, kvm);
+
+		lockdep_assert_held(&kvm->mmu_lock);
+
 		/*
 		 * Users of atomic zapping don't operate on mirror roots,
 		 * so don't handle it and bug the VM if it's seen.
@@ -644,25 +624,35 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 		if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
 			return -EBUSY;
 
-		ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
-						&iter->old_spte, new_spte, iter->level);
-		if (ret)
-			return ret;
-	} else {
-		u64 *sptep = rcu_dereference(iter->sptep);
-
 		/*
-		 * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs
-		 * and does not hold the mmu_lock. On failure, i.e. if a
-		 * different logical CPU modified the SPTE, try_cmpxchg64()
-		 * updates iter->old_spte with the current value, so the caller
-		 * operates on fresh data, e.g. if it retries
-		 * tdp_mmu_set_spte_atomic()
+		 * We need to lock out other updates to the SPTE until the external
+		 * page table has been modified. Use FROZEN_SPTE similar to
+		 * the zapping case.
 		 */
-		if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte))
+		if (!try_cmpxchg64(raw_sptep, &iter->old_spte, FROZEN_SPTE))
 			return -EBUSY;
+
+		ret = kvm_x86_call(set_external_spte)(kvm, iter->gfn, iter->level,
+						      new_spte);
+
+		if (ret)
+			__kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte);
+		else
+			__kvm_tdp_mmu_write_spte(iter->sptep, new_spte);
+
+		return ret;
 	}
 
+	/*
+	 * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs and
+	 * does not hold the mmu_lock. On failure, i.e. if a different logical
+	 * CPU modified the SPTE, try_cmpxchg64() updates iter->old_spte with
+	 * the current value, so the caller operates on fresh data, e.g. if it
+	 * retries tdp_mmu_set_spte_atomic().
+	 */
+	if (!try_cmpxchg64(raw_sptep, &iter->old_spte, new_spte))
+		return -EBUSY;
+
 	return 0;
 }
 
-- 
2.43.2
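
With set_external_spte_present() folded in, the mirror path of __tdp_mmu_set_spte_atomic() reads as one freeze, propagate, then commit-or-restore sequence. The following is a minimal userspace model of that sequence with C11 <stdatomic.h>, assuming a single 64-bit entry. SKETCH_FROZEN, sketch_external_hook and the sketch_* functions are hypothetical; atomic_compare_exchange_strong() stands in for try_cmpxchg64(), which likewise refreshes the expected value on failure, and atomic_store() stands in for __kvm_tdp_mmu_write_spte().

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical sentinel for this sketch; KVM's real FROZEN_SPTE value is
 * defined elsewhere and is not reproduced here.
 */
#define SKETCH_FROZEN 0x5a0ULL

/* Callback that updates the "external" table; returns 0 or -errno. */
typedef int (*sketch_external_hook)(uint64_t old_spte, uint64_t new_spte);

/*
 * Model of the mirror-SPTE path:
 *  1. atomically swap old -> FROZEN so concurrent writers back off,
 *  2. propagate the change to the external table,
 *  3. publish new on success, or restore old on failure.
 * If the swap loses a race, *old_spte is refreshed with the current value.
 */
int sketch_set_spte_atomic(_Atomic uint64_t *sptep, uint64_t *old_spte,
			   uint64_t new_spte, sketch_external_hook hook)
{
	int ret;

	if (!atomic_compare_exchange_strong(sptep, old_spte, SKETCH_FROZEN))
		return -EBUSY;

	ret = hook(*old_spte, new_spte);

	atomic_store(sptep, ret ? *old_spte : new_spte);
	return ret;
}

/* Example hooks used to exercise both outcomes. */
int sketch_hook_ok(uint64_t old_spte, uint64_t new_spte)
{
	(void)old_spte;
	(void)new_spte;
	return 0;
}

int sketch_hook_busy(uint64_t old_spte, uint64_t new_spte)
{
	(void)old_spte;
	(void)new_spte;
	return -EBUSY;
}
```

Because any other writer's compare-exchange fails against the sentinel while the entry is frozen, the commit step in this model can be a plain atomic store rather than a second compare-exchange.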
v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1779958302; x=1811494302; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=63SKONVqBiu7P5HylKFMiBVzYGqnH5AwvFZJ6uDYUYs=; b=NrqeIjmWDZklLpt3rGFNxV5cD2iJT+i++c0FkbK/U0wR2fLoDdD7vSxd vn8IEd9iMiBKnG+Z/vEyqxM0Ciau6zIkm57sbLhMD7Apx8OA4CNJqOrJs JV4lMAMSfey45S0WLGe1+bt3RQNGC8c7txEaeHwZKa+RVVAKFdC8uHWeD 7yv/dC2/c2lugKZX/LdrjEB7Zdw5FSpf5RxCK0jYmdLwzOAjG0mao+bq8 cVc4U5OfSTtergBUvIgh2Ll4ygOapvNMYKlNp9Tz/2CQkXZ/4p4smsLm8 uyQGKlSuZCBD39v8PYAHiAkLd9AdTuD5gFe0Cbg6dPQcaGz233GKOLk9g g==; X-CSE-ConnectionGUID: Rd8JnvQ9RmG2hz0acwWfnA== X-CSE-MsgGUID: BguWuvmdQYK75dkzKVXrUA== X-IronPort-AV: E=McAfee;i="6800,10657,11799"; a="80765325" X-IronPort-AV: E=Sophos;i="6.24,173,1774335600"; d="scan'208";a="80765325" Received: from orviesa001.jf.intel.com ([10.64.159.141]) by orvoesa111.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 May 2026 01:51:42 -0700 X-CSE-ConnectionGUID: tvZ2y43KTtqKK31r7ptUiA== X-CSE-MsgGUID: 30EYNkYUSKWsIc67S12vfQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.24,173,1774335600"; d="scan'208";a="280605195" Received: from yzhao56-desk.sh.intel.com ([10.239.47.19]) by smtpauth.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 May 2026 01:51:33 -0700 From: Yan Zhao To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com Subject: [PATCH v3 04/15] KVM: x86/mmu: Plumb param "old_spte" into kvm_x86_ops.set_external_spte() Date: Thu, 28 May 2026 16:11:29 +0800 Message-ID: <20260528081129.10268-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20260528080856.10141-1-yan.y.zhao@intel.com> References: <20260528080856.10141-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: 
From: Sean Christopherson

If tdp_mmu_set_spte_atomic() triggers an atomic zap on a mirror SPTE (though currently no paths trigger it), the change is propagated via the set_external_spte() op. Plumb the old SPTE into the set_external_spte() op so that TDX code, rather than the TDP MMU core, can warn if the atomic zap isn't allowed.

Rename mirror_spte to new_spte to follow the TDP MMU's naming and to make it more obvious what value the parameter holds.

Opportunistically tweak the ordering of parameters to match the pattern of most TDP MMU functions, which take "old, new, level".

Signed-off-by: Sean Christopherson
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
---
MMU_refactors v3:
- Rebased to kvm-x86-next-2026.05.26.

MMU_refactors v2:
- Moved this patch to before dropping the warning "KVM_BUG_ON(was_present, kvm)" in __tdp_mmu_set_spte_atomic(), so that TDX's tdx_sept_set_private_spte() can later warn instead if an atomic zap is propagated via the set_external_spte() op (as allowed by tdp_mmu_set_spte_atomic() if it occurs). (Yan)
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/mmu/tdp_mmu.c      |  4 ++--
 arch/x86/kvm/vmx/tdx.c          | 22 +++++++++++-----------
 3 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55ec998cc861..7710471a1c49 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1915,8 +1915,8 @@ struct kvm_x86_ops {
 				 int root_level);
 
 	/* Update the external page table from spte getting set. */
-	int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
-				 u64 mirror_spte);
+	int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, u64 old_spte,
+				 u64 new_spte, enum pg_level level);
 
 	/* Update external page tables for page table about to be freed. */
 	int (*free_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 128089571e34..4e961b331780 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -632,8 +632,8 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 		if (!try_cmpxchg64(raw_sptep, &iter->old_spte, FROZEN_SPTE))
 			return -EBUSY;
 
-		ret = kvm_x86_call(set_external_spte)(kvm, iter->gfn, iter->level,
-						      new_spte);
+		ret = kvm_x86_call(set_external_spte)(kvm, iter->gfn, iter->old_spte,
+						      new_spte, iter->level);
 
 		if (ret)
 			__kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte);
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 9279f938987b..1da637204807 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1673,13 +1673,13 @@ static struct page *tdx_spte_to_sept_pt(struct kvm *kvm, gfn_t gfn,
 }
 
 static int tdx_sept_map_nonleaf_spte(struct kvm *kvm, gfn_t gfn,
-				     enum pg_level level, u64 mirror_spte)
+				     enum pg_level level, u64 new_spte)
 {
 	gpa_t gpa = gfn_to_gpa(gfn);
 	u64 err, entry, level_state;
 	struct page *sept_pt;
 
-	sept_pt = tdx_spte_to_sept_pt(kvm, gfn, mirror_spte, level);
+	sept_pt = tdx_spte_to_sept_pt(kvm, gfn, new_spte, level);
 	if (!sept_pt)
 		return -EIO;
 
@@ -1695,16 +1695,16 @@ static int tdx_sept_map_nonleaf_spte(struct kvm *kvm, gfn_t gfn,
 }
 
 static int tdx_sept_map_leaf_spte(struct kvm *kvm, gfn_t gfn, enum pg_level level,
-				  u64 mirror_spte)
+				  u64 new_spte)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
-	kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
+	kvm_pfn_t pfn = spte_to_pfn(new_spte);
 
 	/* TODO: handle large pages. */
 	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
 		return -EIO;
 
-	WARN_ON_ONCE((mirror_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
+	WARN_ON_ONCE((new_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
 
 	/*
 	 * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory()
@@ -1724,16 +1724,16 @@ static int tdx_sept_map_leaf_spte(struct kvm *kvm, gfn_t gfn, enum pg_level leve
 	return tdx_mem_page_aug(kvm, gfn, level, pfn);
 }
 
-static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
-				     enum pg_level level, u64 mirror_spte)
+static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
+				     u64 new_spte, enum pg_level level)
 {
-	if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm))
+	if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
 		return -EIO;
 
-	if (!is_last_spte(mirror_spte, level))
-		return tdx_sept_map_nonleaf_spte(kvm, gfn, level, mirror_spte);
+	if (!is_last_spte(new_spte, level))
+		return tdx_sept_map_nonleaf_spte(kvm, gfn, level, new_spte);
 
-	return tdx_sept_map_leaf_spte(kvm, gfn, level, mirror_spte);
+	return tdx_sept_map_leaf_spte(kvm, gfn, level, new_spte);
 }
 
 /*
-- 
2.43.2
From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com
Subject: [PATCH v3 05/15] KVM: TDX: Move KVM_BUG_ON()s in __tdp_mmu_set_spte_atomic() to TDX code
Date: Thu, 28 May 2026 16:11:39 +0800
Message-ID: <20260528081139.10284-1-yan.y.zhao@intel.com>
In-Reply-To: <20260528080856.10141-1-yan.y.zhao@intel.com>

From: Rick Edgecombe

Drop some KVM_BUG_ON()s that guard against the TDP MMU attempting to propagate unsupported changes to the external page table through __tdp_mmu_set_spte_atomic(). Have TDX code trigger them instead.

Now that the TDP MMU logically allows propagating atomic zap operations to the external page table through the set_external_spte() op in __tdp_mmu_set_spte_atomic(), TDX code will trigger the KVM_BUG_ON() on an atomic zap request instead. (Note: non-atomic zapping is not propagated via the set_external_spte() op yet.)

Despite the generic naming, the external page table ops are designed completely around TDX. They hook the bare minimum of what is needed and exclude the operations that TDX does not support. To help keep track of which operations each op can handle, warnings and KVM_BUG_ON()s exist in the code.
These warnings and KVM_BUG_ON()s put the burden of understanding which operations should be forwarded to TDX code on TDP MMU developers, who often read the code without TDX context. Future changes will move this domain knowledge into TDX code by funneling the external page table updates through a central update mechanism. In this paradigm, the central update mechanism can encapsulate the special knowledge, but it will not have as much knowledge about which operation is in progress.

Suggested-by: Sean Christopherson
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
---
MMU_refactors v3:
- Rebased to kvm-x86-next-2026.05.26.

MMU_refactors v2:
- Moved this patch after "KVM: TDX: Drop kvm_x86_ops.link_external_spt()" and "KVM: x86/mmu: Plumb param "old_spte" into kvm_x86_ops.set_external_spte()". (Yan)
- Added a replacement KVM_BUG_ON() in TDX for the dropped KVM_BUG_ON(was_present, kvm) in __tdp_mmu_set_spte_atomic(). (Yan)
---
 arch/x86/kvm/mmu/tdp_mmu.c | 10 ----------
 arch/x86/kvm/vmx/tdx.c     |  3 +++
 2 files changed, 3 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 4e961b331780..ed806a4768c2 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -610,20 +610,10 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte));
 
 	if (is_mirror_sptep(iter->sptep) && !is_frozen_spte(new_spte)) {
-		bool was_present = is_shadow_present_pte(iter->old_spte);
 		int ret;
 
-		KVM_BUG_ON(was_present, kvm);
-
 		lockdep_assert_held(&kvm->mmu_lock);
 
-		/*
-		 * Users of atomic zapping don't operate on mirror roots,
-		 * so don't handle it and bug the VM if it's seen.
-		 */
-		if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
-			return -EBUSY;
-
 		/*
 		 * We need to lock out other updates to the SPTE until the external
 		 * page table has been modified. Use FROZEN_SPTE similar to
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 1da637204807..e8eab25f3159 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1727,6 +1727,9 @@ static int tdx_sept_map_leaf_spte(struct kvm *kvm, gfn_t gfn, enum pg_level leve
 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
 				     u64 new_spte, enum pg_level level)
 {
+	if (KVM_BUG_ON(is_shadow_present_pte(old_spte), kvm))
+		return -EIO;
+
 	if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
 		return -EIO;
 
-- 
2.43.2
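The division of labor after this patch (generic code forwards "old, new, level" unconditionally, and the TDX-side op rejects what it cannot handle) can be sketched in userspace C. This is a hedged illustration under a simplified SPTE encoding, not the patch's code: SKETCH_PRESENT_BIT and the sketch_* helpers are hypothetical, and unlike KVM_BUG_ON() the sketch only returns -EIO without marking any VM as bugged.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for is_shadow_present_pte(); the bit chosen here
 * is illustrative only and does not match KVM's SPTE layout.
 */
#define SKETCH_PRESENT_BIT (1ULL << 11)

static int sketch_is_present(uint64_t spte)
{
	return (spte & SKETCH_PRESENT_BIT) != 0;
}

/*
 * Sketch of the TDX-side op: with old_spte now passed in, the callee (not
 * the generic TDP MMU code) rejects the transitions it cannot handle.
 */
static int sketch_set_private_spte(uint64_t gfn, uint64_t old_spte,
				   uint64_t new_spte, int level)
{
	(void)gfn;
	(void)level;

	/* Atomic zap/overwrite of a present entry: KVM_BUG_ON() in the patch. */
	if (sketch_is_present(old_spte))
		return -EIO;

	/* Installing a non-present value: also KVM_BUG_ON() in the patch. */
	if (!sketch_is_present(new_spte))
		return -EIO;

	/* ...the real op would go on to map a leaf or non-leaf S-EPT entry... */
	return 0;
}
```

For example, a not-present-to-present install returns 0, while a present old value (an atomic zap) or a non-present new value returns -EIO.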
From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com
Subject: [PATCH v3 06/15] KVM: TDX: Move lockdep assert in __tdp_mmu_set_spte_atomic() to TDX code
Date: Thu, 28 May 2026 16:11:51 +0800
Message-ID: <20260528081151.10300-1-yan.y.zhao@intel.com>
In-Reply-To: <20260528080856.10141-1-yan.y.zhao@intel.com>

From: Rick Edgecombe

Move the MMU lockdep assert in __tdp_mmu_set_spte_atomic() into the TDX-specific op, because the assert's intent is TDX-specific.

The TDP MMU has many lockdep asserts for various scenarios, and in fact the callchains used for TDX already have a lockdep assert that covers the case in __tdp_mmu_set_spte_atomic(). However, those asserts are for management of the TDP root owned by KVM. The assert in __tdp_mmu_set_spte_atomic() instead supports a scheme to avoid contention in the TDX module during zap operations, which is very TDX-specific.

One option would be to simply remove the assert in __tdp_mmu_set_spte_atomic() and rely on the other ones in the TDP MMU. But those asserts serve a different purpose and sit too far away from the SEAMCALL that needs the lock held. So just move the assert to TDX code.

Suggested-by: Sean Christopherson
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
---
MMU_refactors v3:
- Rebased to kvm-x86-next-2026.05.26.
---
 arch/x86/kvm/mmu/tdp_mmu.c | 2 --
 arch/x86/kvm/vmx/tdx.c     | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index ed806a4768c2..dc455e6e7dc7 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -612,8 +612,6 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	if (is_mirror_sptep(iter->sptep) && !is_frozen_spte(new_spte)) {
 		int ret;
 
-		lockdep_assert_held(&kvm->mmu_lock);
-
 		/*
 		 * We need to lock out other updates to the SPTE until the external
 		 * page table has been modified. Use FROZEN_SPTE similar to
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index e8eab25f3159..4ff5ae31eb66 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1727,6 +1727,8 @@ static int tdx_sept_map_leaf_spte(struct kvm *kvm, gfn_t gfn, enum pg_level leve
 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
 				     u64 new_spte, enum pg_level level)
 {
+	lockdep_assert_held(&kvm->mmu_lock);
+
 	if (KVM_BUG_ON(is_shadow_present_pte(old_spte), kvm))
 		return -EIO;
 
-- 
2.43.2
From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com
Subject: [PATCH v3 07/15] KVM: x86/tdp_mmu: Morph !is_frozen_spte() check into a KVM_MMU_WARN_ON()
Date: Thu, 28 May 2026 16:12:02 +0800
Message-ID: <20260528081202.10316-1-yan.y.zhao@intel.com>
In-Reply-To: <20260528080856.10141-1-yan.y.zhao@intel.com>

From: Rick Edgecombe

Remove the conditional logic in __tdp_mmu_set_spte_atomic() that handles setting a mirror SPTE to frozen, and turn it into a warning that applies to both the mirror and direct cases.

The mirror page table needs to propagate PTE changes to the external page table. This presents a problem for atomic updates, which can't update both page tables at once. So a special value, FROZEN_SPTE, is used as a temporary state during these updates to prevent concurrent operations on the PTE. If the TDP MMU tried to install FROZEN_SPTE as a long-term value, it would confuse these updates.

Likewise, installing FROZEN_SPTE as a long-term value in a direct page table would confuse other threads (e.g., causing another thread performing an atomic zap to wait endlessly for a !FROZEN_SPTE value).
Therefore, add a warning against installing FROZEN_SPTE as a long-term value in __tdp_mmu_set_spte_atomic(), without differentiating between mirror and direct page tables.

Suggested-by: Sean Christopherson
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
---
MMU_refactors v3:
- Rebased to kvm-x86-next-2026.05.26.

MMU_refactors v2:
- Updated the comment for "KVM_MMU_WARN_ON(is_frozen_spte(new_spte))". (Yan)
- Explained why the warning also applies to direct page tables. (Yan)
---
 arch/x86/kvm/mmu/tdp_mmu.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index dc455e6e7dc7..b30e33dea265 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -609,7 +609,10 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	 */
 	WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte));
 
-	if (is_mirror_sptep(iter->sptep) && !is_frozen_spte(new_spte)) {
+	/* Should not set FROZEN_SPTE as a long-term value. */
+	KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
+
+	if (is_mirror_sptep(iter->sptep)) {
 		int ret;
 
 		/*
-- 
2.43.2
From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com
Subject: [PATCH v3 08/15] KVM: x86/mmu: Plumb "sp" _pointer_ into the TDP MMU's handle_changed_spte()
Date: Thu, 28 May 2026 16:12:13 +0800
Message-ID: <20260528081213.10332-1-yan.y.zhao@intel.com>
In-Reply-To: <20260528080856.10141-1-yan.y.zhao@intel.com>
From: Sean Christopherson

Plumb the "sp" pointer into handle_changed_spte() to allow checking is_mirror_sp(sp) in handle_changed_spte() in the next patch. This is preparation for consolidating all S-EPT updates into a single kvm_x86_ops hook.

[Yan: Remove unused "as_id" param in tdp_mmu_set_spte()]

Signed-off-by: Sean Christopherson
Signed-off-by: Yan Zhao
---
MMU_refactors v3:
- Rebased to kvm-x86-next-2026.05.26.

MMU_refactors v2:
- Split out and added back this patch. (The patch was in Sean's original series, and had "SPTE" instead of "sp" in the title.) (Yan)
- Removed unused "as_id" param in tdp_mmu_set_spte(). (Yan)
---
 arch/x86/kvm/mmu/tdp_mmu.c | 35 ++++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 17 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index b30e33dea265..4c68d5e03346 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -320,9 +320,9 @@ void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu, bool mirror)
 	}
 }
 
-static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
-				u64 old_spte, u64 new_spte, int level,
-				bool shared);
+static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
+				gfn_t gfn, u64 old_spte, u64 new_spte,
+				int level, bool shared);
 
 static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
@@ -471,8 +471,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte,
 							  FROZEN_SPTE, level);
 		}
-		handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
-				    old_spte, FROZEN_SPTE, level, shared);
+		handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared);
 
 		if (is_mirror_sp(sp)) {
 			KVM_BUG_ON(shared, kvm);
@@ -498,7 +497,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 /**
  * handle_changed_spte - handle bookkeeping associated with an SPTE change
  * @kvm: kvm instance
- * @as_id: the address space of the paging structure the SPTE was a part of
+ * @sp: the page table in which the SPTE resides
  * @gfn: the base GFN that was mapped by the SPTE
  * @old_spte: The value of the SPTE before the change
  * @new_spte: The value of the SPTE after the change
@@ -511,15 +510,16 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
  * dirty logging updates are handled in common code, not here (see make_spte()
  * and fast_pf_fix_direct_spte()).
  */
-static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
-				u64 old_spte, u64 new_spte, int level,
-				bool shared)
+static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
+				gfn_t gfn, u64 old_spte, u64 new_spte,
+				int level, bool shared)
 {
 	bool was_present = is_shadow_present_pte(old_spte);
 	bool is_present = is_shadow_present_pte(new_spte);
 	bool was_leaf = was_present && is_last_spte(old_spte, level);
 	bool is_leaf = is_present && is_last_spte(new_spte, level);
 	bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
+	int as_id = kvm_mmu_page_as_id(sp);
 
 	WARN_ON_ONCE(level > PT64_ROOT_MAX_LEVEL);
 	WARN_ON_ONCE(level < PG_LEVEL_4K);
@@ -668,6 +668,7 @@ static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm,
 						       struct tdp_iter *iter,
 						       u64 new_spte)
 {
+	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
 	int ret;
 
 	lockdep_assert_held_read(&kvm->mmu_lock);
@@ -676,7 +677,7 @@ static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	if (ret)
 		return ret;
 
-	handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
+	handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte,
 			    new_spte, iter->level, true);
 
 	return 0;
@@ -685,7 +686,6 @@ static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm,
 /*
  * tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkeeping
  * @kvm: KVM instance
- * @as_id: Address space ID, i.e. regular vs. SMM
  * @sptep: Pointer to the SPTE
  * @old_spte: The current value of the SPTE
  * @new_spte: The new value that will be set for the SPTE
@@ -695,9 +695,11 @@ static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm,
  * Returns the old SPTE value, which _may_ be different than @old_spte if the
  * SPTE had voldatile bits.
  */
-static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
-			    u64 old_spte, u64 new_spte, gfn_t gfn, int level)
+static u64 tdp_mmu_set_spte(struct kvm *kvm, tdp_ptep_t sptep, u64 old_spte,
+			    u64 new_spte, gfn_t gfn, int level)
 {
+	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(sptep));
+
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
 	/*
@@ -711,7 +713,7 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
 
 	old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
 
-	handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
+	handle_changed_spte(kvm, sp, gfn, old_spte, new_spte, level, false);
 
 	/*
 	 * Users that do non-atomic setting of PTEs don't operate on mirror
@@ -729,9 +731,8 @@ static inline void tdp_mmu_iter_set_spte(struct kvm *kvm, struct tdp_iter *iter,
 					 u64 new_spte)
 {
 	WARN_ON_ONCE(iter->yielded);
-	iter->old_spte = tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep,
-					  iter->old_spte, new_spte,
-					  iter->gfn, iter->level);
+	iter->old_spte = tdp_mmu_set_spte(kvm, iter->sptep, iter->old_spte,
+					  new_spte, iter->gfn, iter->level);
 }
 
 #define tdp_root_for_each_pte(_iter, _kvm, _root, _start, _end) \
-- 
2.43.2
From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com
Subject: [PATCH v3 09/15] KVM: x86/tdp_mmu: Centrally propagate to-present/atomic zap updates to external PTEs
Date: Thu, 28 May 2026 16:12:25 +0800
Message-ID: <20260528081225.10348-1-yan.y.zhao@intel.com>
In-Reply-To: <20260528080856.10141-1-yan.y.zhao@intel.com>
References: <20260528080856.10141-1-yan.y.zhao@intel.com>

From: Sean Christopherson

Move the propagation of to-present changes and atomic zap changes to external PTEs from __tdp_mmu_set_spte_atomic() to __handle_changed_spte(), which centrally handles SPTE changes.

When a PTE in the mirror page tables is set to present, the update needs to be propagated to the external page tables (the S-EPT, in TDX parlance). Today this is handled by special mirror page table branching in __tdp_mmu_set_spte_atomic(), which is the only place where present PTEs are set for TDX. This keeps things working, but it is a bit hacked on.
The hook for setting present leaf PTEs exists only where TDX happens to need it. For example, TDX does not support any of the operations that use the non-atomic variant, tdp_mmu_set_spte(), to set present PTEs. Because the hook is absent there, the code is very hard to understand from a non-TDX perspective: a reader who doesn't know the TDX specifics could conclude that the external update is missing. Besides being confusing, this also litters the TDP MMU with "external" update callbacks. That is especially unfortunate because there is already a central place to react to TDP updates: handle_changed_spte().

Begin moving toward a model where all mirror page table updates are forwarded to TDX code, so that the TDX-specific logic can live there with a cleaner separation of concerns. Do this by adding a helper, __handle_changed_spte(), and teaching it to return error codes so that it can propagate failures from TDX external page table updates. Make the original handle_changed_spte() a no-fail wrapper around __handle_changed_spte(). The wrapper handles changes that cannot fail: those made under exclusive mmu_lock, or from the no-fail path handle_removed_pt().

(Note: as discussed in [2], teaching __handle_changed_spte() to return errors causes a "wrong" trace order when removing a non-mirror page table, which should be benign.)

Instead of having __tdp_mmu_set_spte_atomic() do the frozen mirror SPTE dance and trigger propagation to external PTEs, make __tdp_mmu_set_spte_atomic() a simple wrapper around try_cmpxchg64(). Hoist the frozen mirror SPTE dance up a level, into tdp_mmu_set_spte_atomic(). The propagation of to-present changes to the external PTEs can then be centralized in __handle_changed_spte().
Note that before tdp_mmu_set_spte_atomic() actually sets a mirror SPTE to new_spte (i.e., while the mirror SPTE is frozen), it has already updated the following to reflect new_spte:
(1) the external PTE, and the mirror SPTE statistics/trace;
(2) the child mirror SPTEs, the child external PTEs, and the child mirror SPTE statistics/trace.

(1) is ok. If updating the external PTE succeeds, the mirror SPTE is set to new_spte shortly afterward; if it fails, the mirror SPTE is restored to its original value. (A minor issue on the success path is that the statistics/trace are updated slightly before the actual change occurs. This should be ok, per the reasoning in [2]: "I doubt that'll be a problem in practice, so I'm inclined to take the simpler code".) (2) is also ok, since handle_removed_pt() is no-fail.

Aging external SPTEs is not yet supported for the mirror page table, so just warn on mirror usage in kvm_tdp_mmu_age_spte() and invoke __tdp_mmu_set_spte_atomic() directly, without the freeze dance. There is no need to warn about installing FROZEN_SPTE as a long-term value in kvm_tdp_mmu_age_spte(). Aging removes the accessed bit, whereas FROZEN_SPTE has the accessed bit set on all x86 platforms, so the two are mutually exclusive.

Since tdp_mmu_set_spte_atomic() can also atomically zap SPTEs (although no path currently triggers an atomic zap on the mirror page table), also use the set_external_spte() op to propagate atomic zaps, and have the TDX code reject that case.

Link: https://lore.kernel.org/lkml/aYYn0nf2cayYu8e7@google.com [1]
Link: https://lore.kernel.org/kvm/aY_Dq3riRLUNFZfr@google.com [2]
Not-yet-Signed-off-by: Sean Christopherson
[Rick: Based on a diff by Sean Christopherson]
Signed-off-by: Rick Edgecombe
[Yan: added atomic zap case]
Signed-off-by: Yan Zhao
---
MMU_refactors v3:
- Rebased to kvm-x86-next-2026.05.26.
- Fixed a missing trace issue reported by sashiko.
  https://lore.kernel.org/kvm/agu+UsizlbUYsjp6@yzhao56-desk.sh.intel.com

MMU_refactors v2:
- Updated comments and patch log (Yan, Binbin).
- Split out "KVM: x86/mmu: Plumb "sp" _pointer_ into the TDP MMU's handle_changed_spte()" as a separate patch (it was in Sean's original series, but with "SPTE" instead of "sp" in the patch title). (Yan)
- Also invoke the set_external_spte() op to propagate changes for atomic zaps of leaf SPTEs. (Yan)
- Kept the "Not-yet-Signed-off-by" from v1:
  https://lore.kernel.org/all/20260327201421.2824383-8-rick.p.edgecombe@intel.com
---
 arch/x86/kvm/mmu/tdp_mmu.c | 126 ++++++++++++++++++++++++-------------
 1 file changed, 81 insertions(+), 45 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 4c68d5e03346..3c3e73ce8da9 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -495,7 +495,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 }
 
 /**
- * handle_changed_spte - handle bookkeeping associated with an SPTE change
+ * __handle_changed_spte - handle bookkeeping associated with an SPTE change
  * @kvm: kvm instance
  * @sp: the page table in which the SPTE resides
  * @gfn: the base GFN that was mapped by the SPTE
@@ -510,9 +510,9 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
  * dirty logging updates are handled in common code, not here (see make_spte()
  * and fast_pf_fix_direct_spte()).
  */
-static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
-				gfn_t gfn, u64 old_spte, u64 new_spte,
-				int level, bool shared)
+static int __handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
+				 gfn_t gfn, u64 old_spte, u64 new_spte,
+				 int level, bool shared)
 {
 	bool was_present = is_shadow_present_pte(old_spte);
 	bool is_present = is_shadow_present_pte(new_spte);
@@ -549,9 +549,7 @@ static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
 	}
 
 	if (old_spte == new_spte)
-		return;
-
-	trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
+		return 0;
 
 	if (is_leaf)
 		check_spte_writable_invariants(new_spte);
@@ -578,29 +576,51 @@ static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
 			       "a temporary frozen SPTE.\n"
 			       "as_id: %d gfn: %llx old_spte: %llx new_spte: %llx level: %d",
 			       as_id, gfn, old_spte, new_spte, level);
-		return;
-	}
 
-	if (is_leaf != was_leaf)
-		kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
+		trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
+		return 0;
+	}
 
 	/*
 	 * Recursively handle child PTs if the change removed a subtree from
 	 * the paging structure. Note the WARN on the PFN changing without the
 	 * SPTE being converted to a hugepage (leaf) or being zapped. Shadow
 	 * pages are kernel allocations and should never be migrated.
+	 *
+	 * For the mirror page table, propagate changes to present or changes of
+	 * leaf SPTEs to !present under shared mmu_lock to the external SPTE via
+	 * set_external_spte() op.
 	 */
 	if (was_present && !was_leaf &&
-	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
+	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) {
 		handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
+	} else if (is_mirror_sp(sp) && (is_present || shared)) {
+		int r;
+
+		r = kvm_x86_call(set_external_spte)(kvm, gfn, old_spte, new_spte, level);
+		if (r)
+			return r;
+	}
+	trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
+
+	if (is_leaf != was_leaf)
+		kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
+
+	return 0;
+}
+
+static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
+				gfn_t gfn, u64 old_spte, u64 new_spte,
+				int level, bool shared)
+{
+	KVM_BUG_ON(__handle_changed_spte(kvm, sp, gfn, old_spte, new_spte,
+					 level, shared), kvm);
 }
 
 static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 							 struct tdp_iter *iter,
 							 u64 new_spte)
 {
-	u64 *raw_sptep = rcu_dereference(iter->sptep);
-
 	/*
 	 * The caller is responsible for ensuring the old SPTE is not a FROZEN
 	 * SPTE. KVM should never attempt to zap or manipulate a FROZEN SPTE,
@@ -609,31 +629,6 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	 */
 	WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte));
 
-	/* Should not set FROZEN_SPTE as a long-term value. */
-	KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
-
-	if (is_mirror_sptep(iter->sptep)) {
-		int ret;
-
-		/*
-		 * We need to lock out other updates to the SPTE until the external
-		 * page table has been modified. Use FROZEN_SPTE similar to
-		 * the zapping case.
-		 */
-		if (!try_cmpxchg64(raw_sptep, &iter->old_spte, FROZEN_SPTE))
-			return -EBUSY;
-
-		ret = kvm_x86_call(set_external_spte)(kvm, iter->gfn, iter->old_spte,
-						      new_spte, iter->level);
-
-		if (ret)
-			__kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte);
-		else
-			__kvm_tdp_mmu_write_spte(iter->sptep, new_spte);
-
-		return ret;
-	}
-
 	/*
 	 * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs and
 	 * does not hold the mmu_lock. On failure, i.e. if a different logical
@@ -641,7 +636,7 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	 * the current value, so the caller operates on fresh data, e.g. if it
 	 * retries tdp_mmu_set_spte_atomic().
 	 */
-	if (!try_cmpxchg64(raw_sptep, &iter->old_spte, new_spte))
+	if (!try_cmpxchg64(rcu_dereference(iter->sptep), &iter->old_spte, new_spte))
 		return -EBUSY;
 
 	return 0;
@@ -673,14 +668,51 @@ static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm,
 
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
-	ret = __tdp_mmu_set_spte_atomic(kvm, iter, new_spte);
+	/* Should not set FROZEN_SPTE as a long-term value. */
+	KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
+
+	/*
+	 * Temporarily freeze the SPTE until the external PTE operation has
+	 * completed, e.g. so that concurrent faults don't attempt to install a
+	 * child PTE in the external page table before the parent PTE has been
+	 * written.
+	 */
+	if (is_mirror_sptep(iter->sptep))
+		ret = __tdp_mmu_set_spte_atomic(kvm, iter, FROZEN_SPTE);
+	else
+		ret = __tdp_mmu_set_spte_atomic(kvm, iter, new_spte);
+
 	if (ret)
 		return ret;
 
-	handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte,
-			    new_spte, iter->level, true);
-
-	return 0;
+	/*
+	 * Handle the change from iter->old_spte to new_spte.
+	 *
+	 * Note: for mirror page table, this means the updates of the external
+	 * PTE, statistics, or updates of child SPTEs, child external PTEs and
+	 * corresponding statistics are performed while the mirror SPTE is in
+	 * frozen state (i.e., before the mirror SPTE is set to new_spte).
+	 */
+	ret = __handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte,
+				    new_spte, iter->level, true);
+	/*
+	 * Unfreeze the mirror SPTE. If updating the external SPTE failed,
+	 * restore the old value so that the mirror SPTE isn't frozen in
+	 * perpetuity, otherwise set the mirror SPTE to the new desired value.
+	 */
+	if (is_mirror_sptep(iter->sptep)) {
+		if (ret)
+			__kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte);
+		else
+			__kvm_tdp_mmu_write_spte(iter->sptep, new_spte);
+	} else {
+		/*
+		 * Bug the VM if handling the change failed, as failure is only
+		 * allowed if KVM couldn't update the external SPTE.
+		 */
+		KVM_BUG_ON(ret, kvm);
+	}
+	return ret;
 }
 
 /*
@@ -1334,6 +1366,10 @@ static void kvm_tdp_mmu_age_spte(struct kvm *kvm, struct tdp_iter *iter)
 {
 	u64 new_spte;
 
+	/* TODO: Add support for aging external SPTEs, if necessary. */
+	if (WARN_ON_ONCE(is_mirror_sptep(iter->sptep)))
+		return;
+
 	if (spte_ad_enabled(iter->old_spte)) {
 		iter->old_spte = tdp_mmu_clear_spte_bits_atomic(iter->sptep,
 								shadow_accessed_mask);
-- 
2.43.2

From nobody Mon Jun 8 16:49:04 2026
From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com
Subject: [PATCH v3 10/15] KVM: x86/mmu: Drop KVM_BUG_ON() on shared lock to zap child external PTEs
Date: Thu, 28 May 2026 16:12:37 +0800
Message-ID: <20260528081237.10364-1-yan.y.zhao@intel.com>
In-Reply-To: <20260528080856.10141-1-yan.y.zhao@intel.com>
References: <20260528080856.10141-1-yan.y.zhao@intel.com>
From: Rick Edgecombe

Drop the KVM_BUG_ON() in the KVM MMU core that precedes zapping child external PTEs. Requiring that PTE zapping be protected by exclusive mmu_lock is a TDX-specific requirement.

There is no need to plumb the shared/exclusive information into the remove_external_spte() op or to move the KVM_BUG_ON() into TDX, because:
- TDX already asserts that mmu_lock is held exclusively.
- The KVM_BUG_ON() is somewhat redundant. If a bug ever caused leaf PTEs in the S-EPT to be zapped under shared mmu_lock, the resulting SEAMCALL failures due to contention would trigger TDX_BUG_ON() in TDX.

Link: https://lore.kernel.org/kvm/aYUarHf3KEwHGuJe@google.com
Suggested-by: Sean Christopherson
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
---
MMU_refactors v3:
- Rebased to kvm-x86-next-2026.05.26.

MMU_refactors v2:
- Updated commit log and title. (Yan)
---
 arch/x86/kvm/mmu/tdp_mmu.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3c3e73ce8da9..3ba7556a8d2f 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -473,10 +473,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 		}
 		handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared);
 
-		if (is_mirror_sp(sp)) {
-			KVM_BUG_ON(shared, kvm);
+		if (is_mirror_sp(sp))
 			remove_external_spte(kvm, gfn, old_spte, level);
-		}
 	}
 
 	if (is_mirror_sp(sp) &&
-- 
2.43.2

From nobody Mon Jun 8 16:49:04 2026
From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com
Subject: [PATCH v3 11/15] KVM: TDX: Hoist tdx_sept_remove_private_spte() above set_private_spte()
Date: Thu, 28 May 2026 16:12:47 +0800
Message-ID: <20260528081247.10380-1-yan.y.zhao@intel.com>
In-Reply-To: <20260528080856.10141-1-yan.y.zhao@intel.com>
References: <20260528080856.10141-1-yan.y.zhao@intel.com>

From: Sean Christopherson

Arrange tdx_sept_remove_private_spte() (and its tdx_track() helper) to be above tdx_sept_set_private_spte(), in anticipation of routing all S-EPT writes (with the exception of reclaiming non-leaf pages) through the "set" API.

No functional change intended.

Signed-off-by: Sean Christopherson
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
---
MMU_refactors v3:
- Rebased to kvm-x86-next-2026.05.26.

MMU_refactors v2:
- Made the patch log description more generic to match the diff. (Kai, Rick)
- Kept tdx_sept_free_private_spt() below tdx_sept_set_private_spte(). (Yan)
---
 arch/x86/kvm/vmx/tdx.c | 80 +++++++++++++++++++++---------------------------------------
 1 file changed, 40 insertions(+), 40 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 4ff5ae31eb66..47bd9e0f0135 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1724,23 +1724,6 @@ static int tdx_sept_map_leaf_spte(struct kvm *kvm, gfn_t gfn, enum pg_level leve
 	return tdx_mem_page_aug(kvm, gfn, level, pfn);
 }
 
-static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
-				     u64 new_spte, enum pg_level level)
-{
-	lockdep_assert_held(&kvm->mmu_lock);
-
-	if (KVM_BUG_ON(is_shadow_present_pte(old_spte), kvm))
-		return -EIO;
-
-	if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
-		return -EIO;
-
-	if (!is_last_spte(new_spte, level))
-		return tdx_sept_map_nonleaf_spte(kvm, gfn, level, new_spte);
-
-	return tdx_sept_map_leaf_spte(kvm, gfn, level, new_spte);
-}
-
 /*
  * Ensure shared and private EPTs to be flushed on all vCPUs.
  * tdh_mem_track() is the only caller that increases TD epoch. An increase in
@@ -1787,29 +1770,6 @@ static void tdx_track(struct kvm *kvm)
 	kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE);
 }
 
-static int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
-				    enum pg_level level, void *private_spt)
-{
-	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
-
-	/*
-	 * free_external_spt() is only called after hkid is freed when TD is
-	 * tearing down.
-	 * KVM doesn't (yet) zap page table pages in mirror page table while
-	 * TD is active, though guest pages mapped in mirror page table could be
-	 * zapped during TD is active, e.g. for shared <-> private conversion
-	 * and slot move/deletion.
-	 */
-	if (KVM_BUG_ON(is_hkid_assigned(kvm_tdx), kvm))
-		return -EIO;
-
-	/*
-	 * The HKID assigned to this TD was already freed and cache was
-	 * already flushed. We don't have to flush again.
-	 */
-	return tdx_reclaim_page(virt_to_page(private_spt));
-}
-
 static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 					 enum pg_level level, u64 mirror_spte)
 {
@@ -1860,6 +1820,46 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	tdx_quirk_reset_paddr(PFN_PHYS(pfn), PAGE_SIZE);
 }
 
+static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
+				     u64 new_spte, enum pg_level level)
+{
+	lockdep_assert_held(&kvm->mmu_lock);
+
+	if (KVM_BUG_ON(is_shadow_present_pte(old_spte), kvm))
+		return -EIO;
+
+	if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
+		return -EIO;
+
+	if (!is_last_spte(new_spte, level))
+		return tdx_sept_map_nonleaf_spte(kvm, gfn, level, new_spte);
+
+	return tdx_sept_map_leaf_spte(kvm, gfn, level, new_spte);
+}
+
+static int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
+				    enum pg_level level, void *private_spt)
+{
+	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
+
+	/*
+	 * free_external_spt() is only called after hkid is freed when TD is
+	 * tearing down.
+	 * KVM doesn't (yet) zap page table pages in mirror page table while
+	 * TD is active, though guest pages mapped in mirror page table could be
+	 * zapped during TD is active, e.g. for shared <-> private conversion
+	 * and slot move/deletion.
+	 */
+	if (KVM_BUG_ON(is_hkid_assigned(kvm_tdx), kvm))
+		return -EIO;
+
+	/*
+	 * The HKID assigned to this TD was already freed and cache was
+	 * already flushed. We don't have to flush again.
+	 */
+	return tdx_reclaim_page(virt_to_page(private_spt));
+}
+
 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 			   int trig_mode, int vector)
 {
-- 
2.43.2

From nobody Mon Jun 8 16:49:04 2026
From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com
Subject: [PATCH v3 12/15] KVM: TDX: Drop kvm_x86_ops.remove_external_spte()
Date: Thu, 28 May 2026 16:12:58 +0800
Message-ID: <20260528081258.10396-1-yan.y.zhao@intel.com>
In-Reply-To: <20260528080856.10141-1-yan.y.zhao@intel.com>
References: <20260528080856.10141-1-yan.y.zhao@intel.com>
From: Sean Christopherson

Drop kvm_x86_ops.remove_external_spte(), and instead handle the removal of leaf SPTEs in the S-EPT (a.k.a. the external page table) in kvm_x86_ops.set_external_spte(). This will also allow extending tdx_sept_set_private_spte() to support splitting a huge S-EPT entry without needing yet another kvm_x86_ops hook.

With this change, all removals of leaf mirror SPTEs are propagated through kvm_x86_ops.set_external_spte():
- When removing leaf mirror SPTEs under shared mmu_lock (a scenario that no path can currently trigger and that TDX does not support), tdx_sept_remove_private_spte() may produce a warning due to lockdep_assert_held_write(). It may also return -EIO and trigger TDX_BUG_ON() due to concurrent BLOCK, TRACK, and REMOVE operations.
- When removing leaf mirror SPTEs under exclusive mmu_lock, no errors are expected. If one does occur, tdx_sept_remove_private_spte() will return -EIO and trigger KVM_BUG_ON(). A redundant KVM_BUG_ON() will then also be triggered in the TDP MMU core's handle_changed_spte(), which is benign (the WARN fires if and only if the VM isn't already bugged).

Signed-off-by: Sean Christopherson
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
---
MMU_refactors v3:
- Rebased to kvm-x86-next-2026.05.26.

MMU_refactors v2:
- Added the expected lock and the valid scenarios to the function comment of tdx_sept_set_private_spte(). (Yan/Rick)
- Updated the patch title (was "Handle removal of leaf SPTEs in .set_private_spte()"), since atomic zaps of leaf SPTEs are already handled in kvm_x86_ops.set_external_spte() before this patch. (Yan)
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 -
 arch/x86/include/asm/kvm_host.h    |  3 ---
 arch/x86/kvm/mmu/tdp_mmu.c         | 37 +++-------------------------
 arch/x86/kvm/vmx/tdx.c             | 39 +++++++++++++++++++++---------
 4 files changed, 31 insertions(+), 49 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 2cb393000ee9..771d991562ca 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -98,7 +98,6 @@ KVM_X86_OP_OPTIONAL_RET0(tdp_has_smep)
 KVM_X86_OP(load_mmu_pgd)
 KVM_X86_OP_OPTIONAL_RET0(set_external_spte)
 KVM_X86_OP_OPTIONAL_RET0(free_external_spt)
-KVM_X86_OP_OPTIONAL(remove_external_spte)
 KVM_X86_OP(has_wbinvd_exit)
 KVM_X86_OP(get_l2_tsc_offset)
 KVM_X86_OP(get_l2_tsc_multiplier)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7710471a1c49..987e005f08eb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1922,9 +1922,6 @@ struct kvm_x86_ops {
 	int (*free_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 				 void *external_spt);
 
-	/* Update external page table from spte getting removed, and flush TLB. */
-	void (*remove_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
-				     u64 mirror_spte);
 
 	bool (*has_wbinvd_exit)(void);
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3ba7556a8d2f..72d323f2d0dc 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -359,25 +359,6 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 }
 
-static void remove_external_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
-				 int level)
-{
-	/*
-	 * External (TDX) SPTEs are limited to PG_LEVEL_4K, and external
-	 * PTs are removed in a special order, involving free_external_spt().
-	 * But remove_external_spte() will be called on non-leaf PTEs via
-	 * __tdp_mmu_zap_root(), so avoid the error the former would return
-	 * in this case.
-	 */
-	if (!is_last_spte(old_spte, level))
-		return;
-
-	/* Zapping leaf spte is allowed only when write lock is held. */
-	lockdep_assert_held_write(&kvm->mmu_lock);
-
-	kvm_x86_call(remove_external_spte)(kvm, gfn, level, old_spte);
-}
-
 /**
  * handle_removed_pt() - handle a page table removed from the TDP structure
  *
@@ -472,9 +453,6 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 							  FROZEN_SPTE, level);
 		}
 		handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared);
-
-		if (is_mirror_sp(sp))
-			remove_external_spte(kvm, gfn, old_spte, level);
 	}
 
 	if (is_mirror_sp(sp) &&
@@ -585,14 +563,14 @@ static int __handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
 	 * SPTE being converted to a hugepage (leaf) or being zapped. Shadow
 	 * pages are kernel allocations and should never be migrated.
 	 *
-	 * For the mirror page table, propagate changes to present or changes of
-	 * leaf SPTEs to !present under shared mmu_lock to the external SPTE via
+	 * For the mirror page table, propagate all changes to the external SPTE
+	 * (except zapping/promotion of non-leaf SPTEs) via the
 	 * set_external_spte() op.
 	 */
 	if (was_present && !was_leaf &&
 	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) {
 		handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
-	} else if (is_mirror_sp(sp) && (is_present || shared)) {
+	} else if (is_mirror_sp(sp)) {
 		int r;
 
 		r = kvm_x86_call(set_external_spte)(kvm, gfn, old_spte, new_spte, level);
@@ -745,15 +723,6 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, tdp_ptep_t sptep, u64 old_spte,
 
 	handle_changed_spte(kvm, sp, gfn, old_spte, new_spte, level, false);
 
-	/*
-	 * Users that do non-atomic setting of PTEs don't operate on mirror
-	 * roots, so don't handle it and bug the VM if it's seen.
-	 */
-	if (is_mirror_sptep(sptep)) {
-		KVM_BUG_ON(is_shadow_present_pte(new_spte), kvm);
-		remove_external_spte(kvm, gfn, old_spte, level);
-	}
-
 	return old_spte;
 }
 
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 47bd9e0f0135..68d9deae3a2d 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1770,11 +1770,11 @@ static void tdx_track(struct kvm *kvm)
 	kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE);
 }
 
-static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
-					 enum pg_level level, u64 mirror_spte)
+static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
+					enum pg_level level, u64 old_spte)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
-	kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
+	kvm_pfn_t pfn = spte_to_pfn(old_spte);
 	gpa_t gpa = gfn_to_gpa(gfn);
 	u64 err, entry, level_state;
 
@@ -1786,16 +1786,16 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	 * there can't be anything populated in the private EPT.
 	 */
 	if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm))
-		return;
+		return -EIO;
 
 	/* TODO: handle large pages. */
 	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
-		return;
+		return -EIO;
 
 	err = tdh_do_no_vcpus(tdh_mem_range_block, kvm, &kvm_tdx->td, gpa,
 			      level, &entry, &level_state);
 	if (TDX_BUG_ON_2(err, TDH_MEM_RANGE_BLOCK, entry, level_state, kvm))
-		return;
+		return -EIO;
 
 	/*
 	 * TDX requires TLB tracking before dropping private page. Do
@@ -1811,22 +1811,40 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	err = tdh_do_no_vcpus(tdh_mem_page_remove, kvm, &kvm_tdx->td, gpa,
 			      level, &entry, &level_state);
 	if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_REMOVE, entry, level_state, kvm))
-		return;
+		return -EIO;
 
 	err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, pfn);
 	if (TDX_BUG_ON(err, TDH_PHYMEM_PAGE_WBINVD, kvm))
-		return;
+		return -EIO;
 
 	tdx_quirk_reset_paddr(PFN_PHYS(pfn), PAGE_SIZE);
+	return 0;
 }
 
+/*
+ * Handle changes for
+ * (1) leaf SPTEs from non-present to present
+ * (2) non-leaf SPTEs from non-present to present
+ * (3) leaf SPTEs from present to non-present
+ *
+ * - (1) and (2) must be under shared mmu_lock. If (1) and (2) are under
+ *   exclusive mmu_lock (currently impossible), contention errors may lead to
+ *   KVM_BUG_ON() in handle_changed_spte(), e.g., due to tdx_mem_page_aug(),
+ *   tdx_mem_page_add(), or tdh_mem_sept_add() contending with tdh_vp_enter()
+ *   due to zero-step mitigation or contending with TDCALLs.
+ * - (3) must be under write mmu_lock. If (3) is under shared mmu_lock
+ *   (currently impossible), warnings will be generated due to
+ *   lockdep_assert_held_write() or TDX_BUG_ON() caused by concurrent BLOCK,
+ *   TRACK, REMOVE.
+ * - Promotion/demotion is not yet supported.
+ */
 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
 				     u64 new_spte, enum pg_level level)
 {
 	lockdep_assert_held(&kvm->mmu_lock);
 
-	if (KVM_BUG_ON(is_shadow_present_pte(old_spte), kvm))
-		return -EIO;
+	if (is_shadow_present_pte(old_spte))
+		return tdx_sept_remove_private_spte(kvm, gfn, level, old_spte);
 
 	if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
 		return -EIO;
@@ -3459,7 +3477,6 @@ int __init tdx_hardware_setup(void)
 
 	vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
 	vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
-	vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
 	vt_x86_ops.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt;
 	return 0;
 
-- 
2.43.2

From nobody Mon Jun 8 16:49:04 2026
From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com,
	kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com
Subject: [PATCH v3 13/15] KVM: TDX: Rename tdx_sept_remove_private_spte() to show it's for leaf SPTEs
Date: Thu, 28 May 2026 16:13:09 +0800
Message-ID: <20260528081309.10412-1-yan.y.zhao@intel.com>
In-Reply-To: <20260528080856.10141-1-yan.y.zhao@intel.com>
References: <20260528080856.10141-1-yan.y.zhao@intel.com>

Rename tdx_sept_remove_private_spte() to tdx_sept_remove_leaf_spte() to
clearly show that this function is for removal of leaf SPTEs.

No functional change intended.

Signed-off-by: Yan Zhao
---
MMU_refactors v3:
- Rebased to kvm-x86-next-2026.05.26.

MMU_refactors v2:
- New patch.
---
 arch/x86/kvm/vmx/tdx.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 68d9deae3a2d..a385cfa56e4c 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1770,8 +1770,8 @@ static void tdx_track(struct kvm *kvm)
 	kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE);
 }
 
-static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
-					enum pg_level level, u64 old_spte)
+static int tdx_sept_remove_leaf_spte(struct kvm *kvm, gfn_t gfn,
+				     enum pg_level level, u64 old_spte)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	kvm_pfn_t pfn = spte_to_pfn(old_spte);
@@ -1844,7 +1844,7 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
 	lockdep_assert_held(&kvm->mmu_lock);
 
 	if (is_shadow_present_pte(old_spte))
-		return tdx_sept_remove_private_spte(kvm, gfn, level, old_spte);
+		return tdx_sept_remove_leaf_spte(kvm, gfn, level, old_spte);
 
 	if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
 		return -EIO;
@@ -2845,7 +2845,7 @@ void tdx_flush_tlb_current(struct kvm_vcpu *vcpu)
 void tdx_flush_tlb_all(struct kvm_vcpu *vcpu)
 {
 	/*
-	 * TDX has called tdx_track() in tdx_sept_remove_private_spte() to
+	 * TDX has called tdx_track() in tdx_sept_remove_leaf_spte() to
 	 * ensure that private EPT will be flushed on the next TD enter. No need
 	 * to call tdx_track() here again even when this callback is a result of
 	 * zapping private EPT.
-- 
2.43.2

From nobody Mon Jun 8 16:49:04 2026
From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com
Subject: [PATCH v3 14/15] KVM: x86: Move error handling inside free_external_spt()
Date: Thu, 28 May 2026 16:13:20 +0800
Message-ID: <20260528081320.10428-1-yan.y.zhao@intel.com>
In-Reply-To: <20260528080856.10141-1-yan.y.zhao@intel.com>
References: <20260528080856.10141-1-yan.y.zhao@intel.com>

From: Sean Christopherson

Move the logic for TDX's specific need to leak pages when reclaim fails
into the free_external_spt() op, so this can be done in TDX-specific code
and not in the generic MMU. Do this by passing in the "sp" instead of the
external page table pointer. This way TDX code can set sp->external_spt to
NULL.

Since the error is now handled internally in TDX code (by triggering
KVM_BUG_ON() or TDX_BUG_ON_3(), which warn and stop the VM on any error),
change the op to return void. This way it also operates like a normal free,
in that success is guaranteed from the caller's perspective.

Opportunistically, drop the unused level and gfn args while adjusting the
sp arg.

[ Rick: Re-wrote log and massaged op name ]
[ Yan: Updated patch log/function comment, dropped unused param in op ]

Signed-off-by: Sean Christopherson
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
---
MMU_refactors v3:
- Rebased to kvm-x86-next-2026.05.26.

MMU_refactors v2:
- Fixed typo in the patch log. (Binbin)
- Dropped unused param gfn. (Binbin)
- Mentioned that failure is not handled silently in the patch log. (Binbin)
- Added expected lock and valid scenarios in function comment of
  tdx_sept_free_private_spt(). (Yan/Rick)
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 +-
 arch/x86/include/asm/kvm_host.h    |  3 +--
 arch/x86/kvm/mmu/tdp_mmu.c         | 13 ++-----------
 arch/x86/kvm/vmx/tdx.c             | 28 ++++++++++++++--------------
 4 files changed, 18 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 771d991562ca..83dc5086138b 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -97,7 +97,7 @@ KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
 KVM_X86_OP_OPTIONAL_RET0(tdp_has_smep)
 KVM_X86_OP(load_mmu_pgd)
 KVM_X86_OP_OPTIONAL_RET0(set_external_spte)
-KVM_X86_OP_OPTIONAL_RET0(free_external_spt)
+KVM_X86_OP_OPTIONAL(free_external_spt)
 KVM_X86_OP(has_wbinvd_exit)
 KVM_X86_OP(get_l2_tsc_offset)
 KVM_X86_OP(get_l2_tsc_multiplier)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 987e005f08eb..e67d0df51843 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1919,8 +1919,7 @@ struct kvm_x86_ops {
 				 u64 new_spte, enum pg_level level);
 
 	/* Update external page tables for page table about to be freed. */
-	int (*free_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
-				 void *external_spt);
+	void (*free_external_spt)(struct kvm *kvm, struct kvm_mmu_page *sp);
 
 
 	bool (*has_wbinvd_exit)(void);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 72d323f2d0dc..74531e4bbee6 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -455,17 +455,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 		handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared);
 	}
 
-	if (is_mirror_sp(sp) &&
-	    WARN_ON(kvm_x86_call(free_external_spt)(kvm, base_gfn, sp->role.level,
-						    sp->external_spt))) {
-		/*
-		 * Failed to free page table page in mirror page table and
-		 * there is nothing to do further.
-		 * Intentionally leak the page to prevent the kernel from
-		 * accessing the encrypted page.
-		 */
-		sp->external_spt = NULL;
-	}
+	if (is_mirror_sp(sp))
+		kvm_x86_call(free_external_spt)(kvm, sp);
 
 	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index a385cfa56e4c..aa0d16b71e00 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1855,27 +1855,27 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
 	return tdx_sept_map_leaf_spte(kvm, gfn, level, new_spte);
 }
 
-static int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
-				     enum pg_level level, void *private_spt)
+/*
+ * Handle changes for non-leaf SPTEs from present to non-present.
+ * Must be under exclusive mmu_lock and cannot fail.
+ */
+static void tdx_sept_free_private_spt(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
-	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
-
 	/*
-	 * free_external_spt() is only called after hkid is freed when TD is
-	 * tearing down.
 	 * KVM doesn't (yet) zap page table pages in mirror page table while
 	 * TD is active, though guest pages mapped in mirror page table could be
 	 * zapped during TD is active, e.g. for shared <-> private conversion
 	 * and slot move/deletion.
+	 *
+	 * In other words, KVM should only free mirror page tables after the
+	 * TD's hkid is freed, when the TD is being torn down.
+	 *
+	 * If the S-EPT PTE can't be removed for any reason, intentionally leak
+	 * the page to prevent the kernel from accessing the encrypted page.
 	 */
-	if (KVM_BUG_ON(is_hkid_assigned(kvm_tdx), kvm))
-		return -EIO;
-
-	/*
-	 * The HKID assigned to this TD was already freed and cache was
-	 * already flushed. We don't have to flush again.
-	 */
-	return tdx_reclaim_page(virt_to_page(private_spt));
+	if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) ||
+	    tdx_reclaim_page(virt_to_page(sp->external_spt)))
+		sp->external_spt = NULL;
 }
 
 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
-- 
2.43.2

From nobody Mon Jun 8 16:49:04 2026
From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com
Subject: [PATCH v3 15/15] KVM: TDX: Move external page table freeing to TDX code
Date: Thu, 28 May 2026 16:13:30 +0800
Message-ID: <20260528081330.10444-1-yan.y.zhao@intel.com>
In-Reply-To: <20260528080856.10141-1-yan.y.zhao@intel.com>
References: <20260528080856.10141-1-yan.y.zhao@intel.com>

From: Sean Christopherson

Move the freeing of external page tables into the reclaim operation that
lives in TDX code.

The TDP MMU supports traversing the TDP without holding locks. Page tables
need to be freed via RCU so that a lockless walker never walks one that has
already been freed.

While none of these lockless walk operations actually happen for the
mirror page table, the TDP MMU nonetheless frees the mirror page table in
the same way, and (because it's a handy place to plug it in) the external
page table as well.

However, the external page table definitely can't be walked once the page
table pages are reclaimed from the TDX module. The TDX module releases the
page for the host VMM to use, so this RCU-time free is unnecessary for the
external page table. So move the free_page() call to TDX code.

Create a tdp_mmu_free_unused_sp() to allow for freeing external page
tables that have never left the TDP MMU code (i.e. don't need to be freed
in a special way).

Link: https://lore.kernel.org/kvm/aYpjNrtGmogNzqwT@google.com
Not-yet-Signed-off-by: Sean Christopherson
[Based on a diff by Sean, added log]
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
---
MMU_refactors v3:
- Rebased to kvm-x86-next-2026.05.26.

MMU_refactors v2:
- Fixed typos in the patch log. (Yan, Kai)
- Still kept "Not-yet-Signed-off-by" tag. Sean, please change it to SoB if
  the patch looks good to you.
- Updated the code comment in tdx_sept_free_private_spt(): invoking
  free_page() to free the S-EPT page in tdx_sept_free_private_spt() is only
  because RCU-time free is unnecessary, not because it can't be performed
  from RCU callbacks. (Yan)
---
 arch/x86/kvm/mmu/tdp_mmu.c | 16 +++++++++++-----
 arch/x86/kvm/vmx/tdx.c     | 11 ++++++++++-
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 74531e4bbee6..5b3041138301 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -53,13 +53,18 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 	rcu_barrier();
 }
 
-static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
+static void __tdp_mmu_free_sp(struct kvm_mmu_page *sp)
 {
-	free_page((unsigned long)sp->external_spt);
 	free_page((unsigned long)sp->spt);
 	kmem_cache_free(mmu_page_header_cache, sp);
 }
 
+static void tdp_mmu_free_unused_sp(struct kvm_mmu_page *sp)
+{
+	free_page((unsigned long)sp->external_spt);
+	__tdp_mmu_free_sp(sp);
+}
+
 /*
  * This is called through call_rcu in order to free TDP page table memory
  * safely with respect to other kernel threads that may be operating on
@@ -73,7 +78,8 @@ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
 	struct kvm_mmu_page *sp = container_of(head, struct kvm_mmu_page,
 					       rcu_head);
 
-	tdp_mmu_free_sp(sp);
+	WARN_ON_ONCE(sp->external_spt);
+	__tdp_mmu_free_sp(sp);
 }
 
 void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
@@ -1268,7 +1274,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 		 * failed, e.g. because a different task modified the SPTE.
 		 */
 		if (r) {
-			tdp_mmu_free_sp(sp);
+			tdp_mmu_free_unused_sp(sp);
 			goto retry;
 		}
 
@@ -1579,7 +1585,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
 	 * installs its own sp in place of the last sp we tried to split.
 	 */
 	if (sp)
-		tdp_mmu_free_sp(sp);
+		tdp_mmu_free_unused_sp(sp);
 
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index aa0d16b71e00..8a64e8d9566a 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1875,7 +1875,16 @@ static void tdx_sept_free_private_spt(struct kvm *kvm, struct kvm_mmu_page *sp)
 	 */
 	if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) ||
 	    tdx_reclaim_page(virt_to_page(sp->external_spt)))
-		sp->external_spt = NULL;
+		goto out;
+
+	/*
+	 * Immediately free the S-EPT page because RCU-time free is unnecessary
+	 * after TDH.PHYMEM.PAGE.RECLAIM ensures there are no outstanding
+	 * readers.
+	 */
+	free_page((unsigned long)sp->external_spt);
+out:
+	sp->external_spt = NULL;
 }
 
 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
-- 
2.43.2
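After patches 14 and 15, ownership of sp->external_spt is split three ways:
the TDX free op either frees the page or deliberately leaks it, but always
clears the pointer; the RCU callback expects the pointer to already be NULL;
and pages that never left the TDP MMU go through tdp_mmu_free_unused_sp().
The userspace C sketch below models that contract under stated assumptions:
malloc()/free() stand in for page allocation, a boolean switch stands in for
the TDX reclaim SEAMCALL, the hkid KVM_BUG_ON() check is omitted, and
WARN_ON_ONCE() is modeled as a counter. All sketch_* names are hypothetical;
leaked pages are parked in a static array only so they stay reachable in
this model.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the two page-table pointers in struct kvm_mmu_page. */
struct sketch_sp {
	void *spt;           /* mirror page table, freed after the RCU grace period */
	void *external_spt;  /* S-EPT page, owned by the TDX side once installed */
};

static bool sketch_reclaim_ok = true;  /* stand-in for reclaim success/failure */
static int sketch_warnings;            /* counts WARN_ON_ONCE()-style hits */
static void *sketch_leaked[8];         /* intentionally leaked pages */
static int sketch_nr_leaked;

static int sketch_reclaim_page(void *page)
{
	(void)page;
	return sketch_reclaim_ok ? 0 : -1;
}

/* Shape of tdx_sept_free_private_spt() after patch 15: cannot fail. */
static void sketch_free_external_spt(struct sketch_sp *sp)
{
	if (sketch_reclaim_page(sp->external_spt)) {
		/* Leak the page rather than hand it back to the allocator. */
		if (sketch_nr_leaked < 8)
			sketch_leaked[sketch_nr_leaked] = sp->external_spt;
		sketch_nr_leaked++;
		goto out;
	}
	/* Reclaimed: no readers remain, so free immediately (no RCU delay). */
	free(sp->external_spt);
out:
	sp->external_spt = NULL;
}

/* Shape of __tdp_mmu_free_sp(): frees only the mirror table and header. */
static void sketch_free_sp_core(struct sketch_sp *sp)
{
	free(sp->spt);
	free(sp);
}

/* Shape of the RCU callback: the external table must already be gone. */
static void sketch_free_sp_rcu_callback(struct sketch_sp *sp)
{
	if (sp->external_spt)
		sketch_warnings++;
	sketch_free_sp_core(sp);
}

/* Shape of tdp_mmu_free_unused_sp(): sp was never installed. */
static void sketch_free_unused_sp(struct sketch_sp *sp)
{
	free(sp->external_spt);
	sketch_free_sp_core(sp);
}

static struct sketch_sp *sketch_alloc_sp(void)
{
	struct sketch_sp *sp = malloc(sizeof(*sp));

	if (!sp)
		return NULL;
	sp->spt = malloc(4096);
	sp->external_spt = malloc(4096);
	return sp;
}
```

The point the model tries to capture is that the RCU path no longer touches
external_spt at all, so a non-NULL pointer there indicates a page that
bypassed the TDX free op.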