From nobody Sat Jun 13 06:01:23 2026
From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com
Subject: [PATCH v2 01/15] KVM: TDX: Drop kvm_x86_ops.link_external_spt()
Date: Sat, 9 May 2026 15:53:57 +0800
Message-ID: <20260509075357.4113-1-yan.y.zhao@intel.com>
X-Mailer: git-send-email 2.43.2
In-Reply-To: <20260509075201.4077-1-yan.y.zhao@intel.com>
References: <20260509075201.4077-1-yan.y.zhao@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Sean Christopherson

Drop the dedicated .link_external_spt() for linking S-EPT pages, and instead funnel everything
through .set_external_spte() for mapping S-EPT entries. Using separate hooks doesn't help prevent TDP MMU details from bleeding into TDX, and vice versa; to the contrary, dedicated callbacks will result in _more_ pollution when hugepage support is added, e.g. will require the TDP MMU to know details about the splitting rules for TDX that aren't all that relevant to the TDP MMU. Ideally, KVM would provide a single pair of hooks to set S-EPT entries, one hook for setting SPTEs under write-lock and another for setting SPTEs under read-lock (e.g. to ensure the entire operation is "atomic", to allow for failure, etc.). Sadly, TDX's requirement that all child S-EPT entries are removed before the parent makes that impractical: the TDP MMU deliberately prunes non-leaf SPTEs and _then_ processes its children, thus making it quite important for the TDP MMU to differentiate between zapping leaf and non-leaf S-EPT entries. However, that's the _only_ case that's truly special, and even that case could be shoehorned into a single hook; it just wouldn't be a net positive. Signed-off-by: Sean Christopherson [Rick: add in trivial feedback] Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao --- MMU_refactors v2: - Moved this patch to the very beginning of the series so that when warnings like "KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm)" are removed from __tdp_mmu_set_spte_atomic() in TDP MMU in a later patch, the atomic zap change can be propagated via the .set_external_spte() op. (Yan). 
MMU_refactors v1: - rename external_spt->sept_pt (Rick, Yan) --- arch/x86/include/asm/kvm-x86-ops.h | 1 - arch/x86/include/asm/kvm_host.h | 3 -- arch/x86/kvm/mmu/tdp_mmu.c | 29 +------------- arch/x86/kvm/vmx/tdx.c | 63 ++++++++++++++++++++---------- 4 files changed, 44 insertions(+), 52 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 31d5c5d58ae6..bced6d938702 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -95,7 +95,6 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr) KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr) KVM_X86_OP_OPTIONAL_RET0(get_mt_mask) KVM_X86_OP(load_mmu_pgd) -KVM_X86_OP_OPTIONAL_RET0(link_external_spt) KVM_X86_OP_OPTIONAL_RET0(set_external_spte) KVM_X86_OP_OPTIONAL_RET0(free_external_spt) KVM_X86_OP_OPTIONAL(remove_external_spte) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index c470e40a00aa..832323c4bc27 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1891,9 +1891,6 @@ struct kvm_x86_ops { void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level); =20 - /* Update external mapping with page table link. */ - int (*link_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level, - void *external_spt); /* Update the external page table from spte getting set. 
*/ int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level, u64 mirror_spte); diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index dbaeb80f2b64..e6c3b739d1fe 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -495,27 +495,12 @@ static void handle_removed_pt(struct kvm *kvm, tdp_pt= ep_t pt, bool shared) call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback); } =20 -static void *get_external_spt(gfn_t gfn, u64 new_spte, int level) -{ - if (is_shadow_present_pte(new_spte) && !is_last_spte(new_spte, level)) { - struct kvm_mmu_page *sp =3D spte_to_child_sp(new_spte); - - WARN_ON_ONCE(sp->role.level + 1 !=3D level); - WARN_ON_ONCE(sp->gfn !=3D gfn); - return sp->external_spt; - } - - return NULL; -} - static int __must_check set_external_spte_present(struct kvm *kvm, tdp_pte= p_t sptep, gfn_t gfn, u64 *old_spte, u64 new_spte, int level) { bool was_present =3D is_shadow_present_pte(*old_spte); - bool is_present =3D is_shadow_present_pte(new_spte); - bool is_leaf =3D is_present && is_last_spte(new_spte, level); - int ret =3D 0; + int ret; =20 KVM_BUG_ON(was_present, kvm); =20 @@ -528,18 +513,8 @@ static int __must_check set_external_spte_present(stru= ct kvm *kvm, tdp_ptep_t sp if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE)) return -EBUSY; =20 - /* - * Use different call to either set up middle level - * external page table, or leaf. 
- */ - if (is_leaf) { - ret =3D kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte); - } else { - void *external_spt =3D get_external_spt(gfn, new_spte, level); + ret =3D kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte); =20 - KVM_BUG_ON(!external_spt, kvm); - ret =3D kvm_x86_call(link_external_spt)(kvm, gfn, level, external_spt); - } if (ret) __kvm_tdp_mmu_write_spte(sptep, *old_spte); else diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 9bd4fd748e2a..48c836ec6063 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1653,18 +1653,58 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t = gfn, return 0; } =20 +static struct page *tdx_spte_to_sept_pt(struct kvm *kvm, gfn_t gfn, + u64 new_spte, enum pg_level level) +{ + struct kvm_mmu_page *sp =3D spte_to_child_sp(new_spte); + + if (KVM_BUG_ON(!sp->external_spt, kvm) || + KVM_BUG_ON(sp->role.level + 1 !=3D level, kvm) || + KVM_BUG_ON(sp->gfn !=3D gfn, kvm)) + return NULL; + + return virt_to_page(sp->external_spt); +} + +static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, u64 mirror_spte) +{ + gpa_t gpa =3D gfn_to_gpa(gfn); + u64 err, entry, level_state; + struct page *sept_pt; + + sept_pt =3D tdx_spte_to_sept_pt(kvm, gfn, mirror_spte, level); + if (!sept_pt) + return -EIO; + + err =3D tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, level, sept_pt, + &entry, &level_state); + if (unlikely(tdx_operand_busy(err))) + return -EBUSY; + + if (TDX_BUG_ON_2(err, TDH_MEM_SEPT_ADD, entry, level_state, kvm)) + return -EIO; + + return 0; +} + static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, enum pg_level level, u64 mirror_spte) { struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); kvm_pfn_t pfn =3D spte_to_pfn(mirror_spte); =20 + if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm)) + return -EIO; + + if (!is_last_spte(mirror_spte, level)) + return tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte); + /* TODO: handle large pages. 
*/ if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) return -EIO; =20 - WARN_ON_ONCE(!is_shadow_present_pte(mirror_spte) || - (mirror_spte & VMX_EPT_RWX_MASK) !=3D VMX_EPT_RWX_MASK); + WARN_ON_ONCE((mirror_spte & VMX_EPT_RWX_MASK) !=3D VMX_EPT_RWX_MASK); =20 /* * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory() @@ -1684,24 +1724,6 @@ static int tdx_sept_set_private_spte(struct kvm *kvm= , gfn_t gfn, return tdx_mem_page_aug(kvm, gfn, level, pfn); } =20 -static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, - enum pg_level level, void *private_spt) -{ - gpa_t gpa =3D gfn_to_gpa(gfn); - struct page *page =3D virt_to_page(private_spt); - u64 err, entry, level_state; - - err =3D tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, level, page, &entry, - &level_state); - if (unlikely(tdx_operand_busy(err))) - return -EBUSY; - - if (TDX_BUG_ON_2(err, TDH_MEM_SEPT_ADD, entry, level_state, kvm)) - return -EIO; - - return 0; -} - /* * Ensure shared and private EPTs to be flushed on all vCPUs. * tdh_mem_track() is the only caller that increases TD epoch. 
An increase= in @@ -3413,7 +3435,6 @@ int __init tdx_hardware_setup(void) =20 vt_x86_ops.vm_size =3D max_t(unsigned int, vt_x86_ops.vm_size, sizeof(str= uct kvm_tdx)); =20 - vt_x86_ops.link_external_spt =3D tdx_sept_link_private_spt; vt_x86_ops.set_external_spte =3D tdx_sept_set_private_spte; vt_x86_ops.free_external_spt =3D tdx_sept_free_private_spt; vt_x86_ops.remove_external_spte =3D tdx_sept_remove_private_spte; --=20 2.43.2

From nobody Sat Jun 13 06:01:23 2026
From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com
Subject: [PATCH v2 02/15] KVM: TDX: Wrap mapping of leaf and non-leaf S-EPT entries into helpers
Date: Sat, 9 May 2026 15:55:00 +0800
Message-ID: <20260509075500.4157-1-yan.y.zhao@intel.com>
X-Mailer: git-send-email 2.43.2
In-Reply-To:
<20260509075201.4077-1-yan.y.zhao@intel.com> References: <20260509075201.4077-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Add a helper, tdx_sept_map_leaf_spte(), to wrap and isolate PAGE.ADD and PAGE.AUG operations. Rename tdx_sept_link_private_spt() to tdx_sept_map_nonleaf_spte() to wrap SEPT.ADD for symmetry. Thus, transition tdx_sept_set_private_spte() into a "dispatch" routine for setting/writing S-EPT entries. No functional change intended. Signed-off-by: Sean Christopherson Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao --- MMU_refactors v2: - Renamed tdx_sept_link_private_spt() to tdx_sept_map_nonleaf_spte(). (Yan) --- arch/x86/kvm/vmx/tdx.c | 24 +++++++++++++++--------- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 48c836ec6063..886e1eac23fa 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1666,7 +1666,7 @@ static struct page *tdx_spte_to_sept_pt(struct kvm *k= vm, gfn_t gfn, return virt_to_page(sp->external_spt); } =20 -static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, +static int tdx_sept_map_nonleaf_spte(struct kvm *kvm, gfn_t gfn, enum pg_level level, u64 mirror_spte) { gpa_t gpa =3D gfn_to_gpa(gfn); @@ -1688,18 +1688,12 @@ static int tdx_sept_link_private_spt(struct kvm *kv= m, gfn_t gfn, return 0; } =20 -static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, - enum pg_level level, u64 mirror_spte) +static int tdx_sept_map_leaf_spte(struct kvm *kvm, gfn_t gfn, enum pg_leve= l level, + u64 mirror_spte) { struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); kvm_pfn_t pfn =3D spte_to_pfn(mirror_spte); =20 - if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm)) - return -EIO; - - if (!is_last_spte(mirror_spte, level)) - return 
tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte); - /* TODO: handle large pages. */ if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) return -EIO; @@ -1724,6 +1718,18 @@ static int tdx_sept_set_private_spte(struct kvm *kvm= , gfn_t gfn, return tdx_mem_page_aug(kvm, gfn, level, pfn); } =20 +static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, u64 mirror_spte) +{ + if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm)) + return -EIO; + + if (!is_last_spte(mirror_spte, level)) + return tdx_sept_map_nonleaf_spte(kvm, gfn, level, mirror_spte); + + return tdx_sept_map_leaf_spte(kvm, gfn, level, mirror_spte); +} + /* * Ensure shared and private EPTs to be flushed on all vCPUs. * tdh_mem_track() is the only caller that increases TD epoch. An increase= in --=20 2.43.2

From nobody Sat Jun 13 06:01:23 2026
From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org,
x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com Subject: [PATCH v2 03/15] KVM: x86/mmu: Fold set_external_spte_present() into its sole caller Date: Sat, 9 May 2026 15:55:19 +0800 Message-ID: <20260509075520.4177-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20260509075201.4077-1-yan.y.zhao@intel.com> References: <20260509075201.4077-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Fold set_external_spte_present() into __tdp_mmu_set_spte_atomic() in anticipation of propagating all changes (like atomic zap) triggered by tdp_mmu_set_spte_atomic() to the external PTEs. No functional change intended. Signed-off-by: Sean Christopherson Signed-off-by: Yan Zhao --- MMU_refactors v2: - Moved to the front of the series and updated the patch log to indicate the propagation of changes for atomic zap. (Yan) --- arch/x86/kvm/mmu/tdp_mmu.c | 72 ++++++++++++++++---------------------- 1 file changed, 31 insertions(+), 41 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index e6c3b739d1fe..aa6b629a9799 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -495,33 +495,6 @@ static void handle_removed_pt(struct kvm *kvm, tdp_pte= p_t pt, bool shared) call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback); } =20 -static int __must_check set_external_spte_present(struct kvm *kvm, tdp_pte= p_t sptep, - gfn_t gfn, u64 *old_spte, - u64 new_spte, int level) -{ - bool was_present =3D is_shadow_present_pte(*old_spte); - int ret; - - KVM_BUG_ON(was_present, kvm); - - lockdep_assert_held(&kvm->mmu_lock); - /* - * We need to lock out other updates to the SPTE until the external - * page table has been modified. 
Use FROZEN_SPTE similar to - * the zapping case. - */ - if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE)) - return -EBUSY; - - ret =3D kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte); - - if (ret) - __kvm_tdp_mmu_write_spte(sptep, *old_spte); - else - __kvm_tdp_mmu_write_spte(sptep, new_spte); - return ret; -} - /** * handle_changed_spte - handle bookkeeping associated with an SPTE change * @kvm: kvm instance @@ -626,6 +599,8 @@ static inline int __must_check __tdp_mmu_set_spte_atomi= c(struct kvm *kvm, struct tdp_iter *iter, u64 new_spte) { + u64 *raw_sptep =3D rcu_dereference(iter->sptep); + /* * The caller is responsible for ensuring the old SPTE is not a FROZEN * SPTE. KVM should never attempt to zap or manipulate a FROZEN SPTE, @@ -635,8 +610,13 @@ static inline int __must_check __tdp_mmu_set_spte_atom= ic(struct kvm *kvm, WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte)); =20 if (is_mirror_sptep(iter->sptep) && !is_frozen_spte(new_spte)) { + bool was_present =3D is_shadow_present_pte(iter->old_spte); int ret; =20 + KVM_BUG_ON(was_present, kvm); + + lockdep_assert_held(&kvm->mmu_lock); + /* * Users of atomic zapping don't operate on mirror roots, * so don't handle it and bug the VM if it's seen. @@ -644,25 +624,35 @@ static inline int __must_check __tdp_mmu_set_spte_ato= mic(struct kvm *kvm, if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm)) return -EBUSY; =20 - ret =3D set_external_spte_present(kvm, iter->sptep, iter->gfn, - &iter->old_spte, new_spte, iter->level); - if (ret) - return ret; - } else { - u64 *sptep =3D rcu_dereference(iter->sptep); - /* - * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs - * and does not hold the mmu_lock. On failure, i.e. if a - * different logical CPU modified the SPTE, try_cmpxchg64() - * updates iter->old_spte with the current value, so the caller - * operates on fresh data, e.g. 
if it retries - * tdp_mmu_set_spte_atomic() + * We need to lock out other updates to the SPTE until the external + * page table has been modified. Use FROZEN_SPTE similar to + * the zapping case. */ - if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte)) + if (!try_cmpxchg64(raw_sptep, &iter->old_spte, FROZEN_SPTE)) return -EBUSY; + + ret =3D kvm_x86_call(set_external_spte)(kvm, iter->gfn, iter->level, + new_spte); + + if (ret) + __kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte); + else + __kvm_tdp_mmu_write_spte(iter->sptep, new_spte); + + return ret; } =20 + /* + * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs and + * does not hold the mmu_lock. On failure, i.e. if a different logical + * CPU modified the SPTE, try_cmpxchg64() updates iter->old_spte with + * the current value, so the caller operates on fresh data, e.g. if it + * retries tdp_mmu_set_spte_atomic(). + */ + if (!try_cmpxchg64(raw_sptep, &iter->old_spte, new_spte)) + return -EBUSY; + return 0; } =20 --=20 2.43.2

From nobody Sat Jun 13 06:01:23 2026
From: Yan Zhao
To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com
Subject: [PATCH v2 04/15] KVM: x86/mmu: Plumb param "old_spte" into kvm_x86_ops.set_external_spte()
Date: Sat, 9 May 2026 15:55:33 +0800
Message-ID: <20260509075533.4193-1-yan.y.zhao@intel.com>
X-Mailer: git-send-email 2.43.2
In-Reply-To: <20260509075201.4077-1-yan.y.zhao@intel.com>
References: <20260509075201.4077-1-yan.y.zhao@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Sean Christopherson

If tdp_mmu_set_spte_atomic() triggers an atomic zap on a mirror SPTE (though currently no paths trigger it), the change is propagated via the set_external_spte() op.

Plumb the old SPTE into the set_external_spte() op, so TDX code rather than TDP MMU core can warn if the atomic zap isn't allowed.

Rename mirror_spte to new_spte to follow the TDP MMU's naming, and to make it more obvious what value the parameter holds. Opportunistically tweak the ordering of parameters to match the pattern of most TDP MMU functions, which do "old, new, level".

Signed-off-by: Sean Christopherson
Signed-off-by: Rick Edgecombe
Signed-off-by: Yan Zhao
---
MMU_refactors v2:
- Moved this patch to before dropping the warning of "KVM_BUG_ON(was_present, kvm)" in __tdp_mmu_set_spte_atomic(). So TDX's tdx_sept_set_private_spte() can later warn instead if atomic zap is propagated via the set_external_spte() op (as allowed by tdp_mmu_set_spte_atomic() if it occurs).
(Yan) --- arch/x86/include/asm/kvm_host.h | 4 ++-- arch/x86/kvm/mmu/tdp_mmu.c | 4 ++-- arch/x86/kvm/vmx/tdx.c | 22 +++++++++++----------- 3 files changed, 15 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 832323c4bc27..9b55973f194c 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1892,8 +1892,8 @@ struct kvm_x86_ops { int root_level); =20 /* Update the external page table from spte getting set. */ - int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level, - u64 mirror_spte); + int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, u64 old_spte, + u64 new_spte, enum pg_level level); =20 /* Update external page tables for page table about to be freed. */ int (*free_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level, diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index aa6b629a9799..ceb27769bcf6 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -632,8 +632,8 @@ static inline int __must_check __tdp_mmu_set_spte_atomi= c(struct kvm *kvm, if (!try_cmpxchg64(raw_sptep, &iter->old_spte, FROZEN_SPTE)) return -EBUSY; =20 - ret =3D kvm_x86_call(set_external_spte)(kvm, iter->gfn, iter->level, - new_spte); + ret =3D kvm_x86_call(set_external_spte)(kvm, iter->gfn, iter->old_spte, + new_spte, iter->level); =20 if (ret) __kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte); diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 886e1eac23fa..219da92fe8ea 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1667,13 +1667,13 @@ static struct page *tdx_spte_to_sept_pt(struct kvm = *kvm, gfn_t gfn, } =20 static int tdx_sept_map_nonleaf_spte(struct kvm *kvm, gfn_t gfn, - enum pg_level level, u64 mirror_spte) + enum pg_level level, u64 new_spte) { gpa_t gpa =3D gfn_to_gpa(gfn); u64 err, entry, level_state; struct page *sept_pt; =20 - sept_pt =3D tdx_spte_to_sept_pt(kvm, gfn, 
mirror_spte, level); + sept_pt =3D tdx_spte_to_sept_pt(kvm, gfn, new_spte, level); if (!sept_pt) return -EIO; =20 @@ -1689,16 +1689,16 @@ static int tdx_sept_map_nonleaf_spte(struct kvm *kv= m, gfn_t gfn, } =20 static int tdx_sept_map_leaf_spte(struct kvm *kvm, gfn_t gfn, enum pg_leve= l level, - u64 mirror_spte) + u64 new_spte) { struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); - kvm_pfn_t pfn =3D spte_to_pfn(mirror_spte); + kvm_pfn_t pfn =3D spte_to_pfn(new_spte); =20 /* TODO: handle large pages. */ if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) return -EIO; =20 - WARN_ON_ONCE((mirror_spte & VMX_EPT_RWX_MASK) !=3D VMX_EPT_RWX_MASK); + WARN_ON_ONCE((new_spte & VMX_EPT_RWX_MASK) !=3D VMX_EPT_RWX_MASK); =20 /* * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory() @@ -1718,16 +1718,16 @@ static int tdx_sept_map_leaf_spte(struct kvm *kvm, = gfn_t gfn, enum pg_level leve return tdx_mem_page_aug(kvm, gfn, level, pfn); } =20 -static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, - enum pg_level level, u64 mirror_spte) +static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_s= pte, + u64 new_spte, enum pg_level level) { - if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm)) + if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm)) return -EIO; =20 - if (!is_last_spte(mirror_spte, level)) - return tdx_sept_map_nonleaf_spte(kvm, gfn, level, mirror_spte); + if (!is_last_spte(new_spte, level)) + return tdx_sept_map_nonleaf_spte(kvm, gfn, level, new_spte); =20 - return tdx_sept_map_leaf_spte(kvm, gfn, level, mirror_spte); + return tdx_sept_map_leaf_spte(kvm, gfn, level, new_spte); } =20 /* --=20 2.43.2 From nobody Sat Jun 13 06:01:23 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D8091345CAB; Sat, 9 May 2026 08:35:39 +0000 (UTC) Authentication-Results: 
smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.12 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315741; cv=none; b=Wr8z3UFdv65bMJpGNgYfvfyDmO10+Anf4GYqqdVzgPotkxHdYdwPlkNbSnOnwaDXKnMfSAk79y/a49LsClfwNj5csSviYEFev3bdlhzpo0sQVGjb8AHM58xP3kq0Gqn3v+7JeKr4nBZMLLs7Sl9TRq4tRSqXQZAa39U1iWl9Ihc= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315741; c=relaxed/simple; bh=1vfdBKn7/QQ7AqLbsv1Df8yInVJjjwRjwAhZQBF6wak=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=uYG3Aw0tg9kH1x6o6sk4N/19Y9Cq4PE3rJcW40KS7HcaOHDF0PS8RM/PNHBq9imfA67VZCW0prTI+DusLQFqQfrg9XYXZ20jOsHkFbxaWt+3VUv+kMHmxWBZ7CAEMhH+peTOwYzWHbZzQJC7wJUDShzOmI3+VwHcPereOpvkNOY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=NIDEldud; arc=none smtp.client-ip=198.175.65.12 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="NIDEldud" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778315740; x=1809851740; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=1vfdBKn7/QQ7AqLbsv1Df8yInVJjjwRjwAhZQBF6wak=; b=NIDEldudww5jldJsleEEAm+S0jqjBh2vfoqdVTYt0P8vpoFKzAudYcwF RpoJAMrnI7FXzOSr43SuHbDErGQkjiFZq6zyZavPLl1fKgycLqV6OEZ04 JwhPS0TJTlwoe/rw6AV1C60YShQYjW0A6rWf77MeVEe3Czr7Aq0HYFG2I YXeALWgT91VVvLsdUrtaWbkoUbK9IGLzqiOQ57FR2bsqyih9mTg9A1h6D hdhrW7tQ/SpisrbXI2VER+9E6ShZgSV82YbxMa+Ow7YjvZ6w1c0uEPjKi oHG7rVqrHpIOQDNNi4obSdif1M/jDT8KYy75MeKghdGC3S7ZYII0xi4H8 Q==; X-CSE-ConnectionGUID: 
RQ4TnyzxSbuDDGmcJXsrVg== X-CSE-MsgGUID: cNoNqPK1RquKfYANlZfPcA== X-IronPort-AV: E=McAfee;i="6800,10657,11780"; a="90748334" X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="90748334" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by orvoesa104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:35:40 -0700 X-CSE-ConnectionGUID: o5cKGYFkT3mn83IrRBPCoA== X-CSE-MsgGUID: K6sCxxJERzyzNCzhkBh+4g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="237086905" Received: from yzhao56-desk.sh.intel.com ([10.239.47.19]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:35:37 -0700 From: Yan Zhao To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com Subject: [PATCH v2 05/15] KVM: TDX: Move KVM_BUG_ON()s in __tdp_mmu_set_spte_atomic() to TDX code Date: Sat, 9 May 2026 15:55:44 +0800 Message-ID: <20260509075544.4210-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20260509075201.4077-1-yan.y.zhao@intel.com> References: <20260509075201.4077-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Rick Edgecombe Drop some KVM_BUG_ON()s that are guarding against TDP MMU attempting to propagate unsupported changes to the external page table through __tdp_mmu_set_spte_atomic(). Have TDX code trigger them instead. Now that the TDP MMU logically allows propagating an atomic zapping operation to the external page table through the set_external_spte() op in __tdp_mmu_set_spte_atomic(), TDX code will trigger the KVM_BUG_ON() on the atomic zapping request instead.
(Note: non-atomic zapping is not yet propagated via the set_external_spte() op.) Despite the generic naming, external page table ops are designed completely around TDX. They hook the bare minimum of what is needed, and exclude the operations that are not supported by TDX. To help keep track of which operations each op can handle, warnings and KVM_BUG_ON()s exist in the code. These warnings and KVM_BUG_ON()s put the burden of understanding which operations should be forwarded to TDX code on TDP MMU developers, who often read the code without TDX context. Future changes will transition the encapsulation of this domain knowledge to TDX code by funneling the external page table updates through a central update mechanism. In this paradigm, the central update mechanism can encapsulate the special knowledge, but will not have as much knowledge about what operation is in progress. Suggested-by: Sean Christopherson Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao --- MMU_refactors v2: - Moved this patch after "KVM: TDX: Drop kvm_x86_ops.link_external_spt()" and "KVM: x86/mmu: Plumb param "old_spte" into kvm_x86_ops.set_external_spte()". (Yan) - Added a replacement KVM_BUG_ON() in TDX for the dropped KVM_BUG_ON(was_present, kvm) in __tdp_mmu_set_spte_atomic(). (Yan)
--- arch/x86/kvm/mmu/tdp_mmu.c | 10 ---------- arch/x86/kvm/vmx/tdx.c | 3 +++ 2 files changed, 3 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index ceb27769bcf6..f55967f8d74a 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -610,20 +610,10 @@ static inline int __must_check __tdp_mmu_set_spte_ato= mic(struct kvm *kvm, WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte)); =20 if (is_mirror_sptep(iter->sptep) && !is_frozen_spte(new_spte)) { - bool was_present =3D is_shadow_present_pte(iter->old_spte); int ret; =20 - KVM_BUG_ON(was_present, kvm); - lockdep_assert_held(&kvm->mmu_lock); =20 - /* - * Users of atomic zapping don't operate on mirror roots, - * so don't handle it and bug the VM if it's seen. - */ - if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm)) - return -EBUSY; - /* * We need to lock out other updates to the SPTE until the external * page table has been modified. Use FROZEN_SPTE similar to diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 219da92fe8ea..0ded336fbf70 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1721,6 +1721,9 @@ static int tdx_sept_map_leaf_spte(struct kvm *kvm, gf= n_t gfn, enum pg_level leve static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_s= pte, u64 new_spte, enum pg_level level) { + if (KVM_BUG_ON(is_shadow_present_pte(old_spte), kvm)) + return -EIO; + if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm)) return -EIO; =20 --=20 2.43.2 From nobody Sat Jun 13 06:01:23 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.12]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0BA6338B7D8; Sat, 9 May 2026 08:35:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.12 ARC-Seal: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1778315754; cv=none; b=Oa45wRX3EUOWa+6Sgbr1znYLA+ynsvWwp+msQQFyYjqihZKFsYmeGu7gITyZlxyLAm8u7cNYAyjk61NerfOPi4TFBndrbStsusR48u0rt63JmkWWdySoSfPdLGQMpFQaBdCOmmE0IVXlxZQVXNLhkq3PO9QNz3ukx68X9h5lIG0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315754; c=relaxed/simple; bh=Ceydaom24DfcEFjSJM4tuodGfAHyBHsKt8PnRO0XXh0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=BbAf1khZPfNgiDIDoTxufT3MJYXcijIoAnKzl8vVy0+vdtw9kU9YsO0RevtXRrJf8KMvfJWOZljzhfmhyNGtnxsQLUp9lntILVZA9FgjSCDhHBa/Dg5OxMExT6KdKPifYCJVuO+sdDHrOzPSMueGl0sRiEWkjEa98lnKmJwQGP0= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ZkF64O9O; arc=none smtp.client-ip=198.175.65.12 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ZkF64O9O" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778315753; x=1809851753; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Ceydaom24DfcEFjSJM4tuodGfAHyBHsKt8PnRO0XXh0=; b=ZkF64O9O4obI+jFowcwZ4scmEimuln9XiVy/vSMU0pcHeFKFLwC+ROTH QOP5v1Yn1Habk3szuS5CRfgLelwaqHS2fHS9cC7QoI+Fly2t1lNOLChH2 k0c2ncp8ks1v8OgvPWM7aytXnU/0OM0HHiI4+mnyP/QOg7tiEApVEugj0 +MHy1m+s+E/Jc0o4GkJ1HSGMFkWn9KWDi1VpgQVbkKWztdiGqjN0k+4+L evuvhaJ5rR8FQQYDsDCQP4NzQr5ghSkuTFgEycIXjDCIkRSCCd67Uwmcl fb8ZGVb40K08yxmEVg5YjSJROsiq/qiKoKmlySgcmfRFWhMb6g/L8zUIJ Q==; X-CSE-ConnectionGUID: ahdRaL+8QBuwcSIeRv2LNg== X-CSE-MsgGUID: 3QuVGUU/SFKBqS4XIdAmnw== X-IronPort-AV: E=McAfee;i="6800,10657,11780"; 
a="90748339" X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="90748339" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by orvoesa104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:35:53 -0700 X-CSE-ConnectionGUID: cdaHKLqsQHGHbMClpWbOpg== X-CSE-MsgGUID: IxykrO9YRG21at9gyUNm4w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="237086908" Received: from yzhao56-desk.sh.intel.com ([10.239.47.19]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:35:49 -0700 From: Yan Zhao To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com Subject: [PATCH v2 06/15] KVM: TDX: Move lockdep assert in __tdp_mmu_set_spte_atomic() to TDX code Date: Sat, 9 May 2026 15:55:57 +0800 Message-ID: <20260509075557.4226-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20260509075201.4077-1-yan.y.zhao@intel.com> References: <20260509075201.4077-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Rick Edgecombe Move the MMU lockdep assert in __tdp_mmu_set_spte_atomic() into the TDX specific op because the assert is TDX specific in intention. The TDP MMU has many lockdep asserts for various scenarios, and in fact the callchains that are used for TDX already have a lockdep assert which covers the case in __tdp_mmu_set_spte_atomic(). However, these asserts are for management of the TDP root owned by KVM. In the __tdp_mmu_set_spte_atomic() assert case, it is helping with a scheme to avoid contention in the TDX module during zap operations. That is very TDX specific. 
One option would be to just remove the assert in __tdp_mmu_set_spte_atomic() and rely on the other ones in the TDP MMU. But that assert is for a different intention, and too far away from the SEAMCALL that needs it. So just move it to TDX code. Suggested-by: Sean Christopherson Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao --- arch/x86/kvm/mmu/tdp_mmu.c | 2 -- arch/x86/kvm/vmx/tdx.c | 2 ++ 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index f55967f8d74a..401bb49a91ee 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -612,8 +612,6 @@ static inline int __must_check __tdp_mmu_set_spte_atomi= c(struct kvm *kvm, if (is_mirror_sptep(iter->sptep) && !is_frozen_spte(new_spte)) { int ret; =20 - lockdep_assert_held(&kvm->mmu_lock); - /* * We need to lock out other updates to the SPTE until the external * page table has been modified. Use FROZEN_SPTE similar to diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 0ded336fbf70..48aa7936a7f7 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1721,6 +1721,8 @@ static int tdx_sept_map_leaf_spte(struct kvm *kvm, gf= n_t gfn, enum pg_level leve static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_s= pte, u64 new_spte, enum pg_level level) { + lockdep_assert_held(&kvm->mmu_lock); + if (KVM_BUG_ON(is_shadow_present_pte(old_spte), kvm)) return -EIO; =20 --=20 2.43.2 From nobody Sat Jun 13 06:01:23 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A164F345CAB; Sat, 9 May 2026 08:36:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315766; cv=none; 
b=ktZUbrjTz081pOEwhDkfDYMLXmbtN0WR9HwAyZzcZWvt5PJfJhBokpXc8FOA0QpRGnSHN1Aew/O7u1G/uALbZq0CiW6/DnG15rPsrCEoLMN0YmYeTgoXHsFWOrZojUwWtrC82EKVvgUGUziJoE5Jn8w88lW3hiV21RZlxtM72ss= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315766; c=relaxed/simple; bh=D1Iad4MZlURrN+EJfCKymz0sx8uZtENNlCbI6OF/EJ4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jPJjChSD+1dwzEyA8n2rVITUnoNMLqmyGsxH4ES91guIo1NIGyytBXVp1PTAW6wOVOMPveZfuBYh+Rsej7f/0yCikbc8yGeJgCyW4HVsMkFXtNZB0SHdhOIPGrerhTghyJwj8QFJyvq1aFTO8skUeKrcHyDt6obuiJnuCB4Sfv4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ktIYdt05; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ktIYdt05" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778315766; x=1809851766; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=D1Iad4MZlURrN+EJfCKymz0sx8uZtENNlCbI6OF/EJ4=; b=ktIYdt05a8RnfEYs9P+D0Ffo+4JN/M5lxK8JkvDCu7VR7tgPmyvP6DND ml7Nhmq4QDRvcndFjSUYGLqFMXOFGLTxyZ/5ZrLKDE3YKhlQFyh0YjmQG iJL8yncqerBbvMcUeU6h0Kc/0n4In84oG0eczVclvUtgQf6jwbs45STXQ U2G1XfqTEy29xhS94woqVlYX3UIUXMuwwJP6Ys73nQfCYdifu/QLNuksN NbBYa233Tjr85sButHYpzF0VA2pDqC+EKlHt2554eTXPPTUga6NDUNZ+l 5N/HIyUGGHlMib5JWgbFhgCElAluF47dC8lCFhqa5q+W4qZ4irUAX3I5G Q==; X-CSE-ConnectionGUID: NCJJ4z5pSuC0Fil/3VZ+vw== X-CSE-MsgGUID: iUuL/zqnRN2+wXFKXuJeRQ== X-IronPort-AV: E=McAfee;i="6800,10657,11780"; a="90388059" X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; 
d="scan'208";a="90388059" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:36:05 -0700 X-CSE-ConnectionGUID: RmJm6zgqTQ2hNZC4RbHbLw== X-CSE-MsgGUID: EbPXwMNuRg+cBCMsPufxmA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="236152980" Received: from yzhao56-desk.sh.intel.com ([10.239.47.19]) by orviesa010-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:36:02 -0700 From: Yan Zhao To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com Subject: [PATCH v2 07/15] KVM: x86/tdp_mmu: Morph !is_frozen_spte() check into a KVM_MMU_WARN_ON() Date: Sat, 9 May 2026 15:56:09 +0800 Message-ID: <20260509075609.4242-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20260509075201.4077-1-yan.y.zhao@intel.com> References: <20260509075201.4077-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Rick Edgecombe Remove the conditional logic for handling the setting of mirror page table to frozen in __tdp_mmu_set_spte_atomic() and add it as a warning for both mirror and direct cases. The mirror page table needs to propagate PTE changes to the external page table. This presents a problem for atomic updates which can't update both page tables at once. So a special value, FROZEN_SPTE, is used as a temporary state during these updates to prevent concurrent operations on the PTE. If the TDP MMU tried to install FROZEN_SPTE as a long-term value, it would confuse these updates. 
On the other hand, it would also confuse other threads if FROZEN_SPTE is installed as a long-term value for direct page tables (e.g., causing another thread working on atomic zap to wait for a !FROZEN_SPTE value endlessly). Therefore, add the warning for installing FROZEN_SPTE as a long-term value in __tdp_mmu_set_spte_atomic() without differentiating whether it's a mirror or direct page table. Suggested-by: Sean Christopherson Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao --- MMU_refactors v2: - Updated the comment for "KVM_MMU_WARN_ON(is_frozen_spte(new_spte))". (Yan) - Explained why the warning also applies to direct page tables. (Yan) --- arch/x86/kvm/mmu/tdp_mmu.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 401bb49a91ee..345fdb0a89fb 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -609,7 +609,10 @@ static inline int __must_check __tdp_mmu_set_spte_atom= ic(struct kvm *kvm, */ WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte)); =20 - if (is_mirror_sptep(iter->sptep) && !is_frozen_spte(new_spte)) { + /* Should not set FROZEN_SPTE as a long-term value. 
*/ + KVM_MMU_WARN_ON(is_frozen_spte(new_spte)); + + if (is_mirror_sptep(iter->sptep)) { int ret; =20 /* --=20 2.43.2 From nobody Sat Jun 13 06:01:23 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6934A355813; Sat, 9 May 2026 08:36:18 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315780; cv=none; b=Nf7dPS9H9rBs9OblKVXVWaL606w6GU2lOh1AvryhxSjnMSpmEmguo7tbl8Umqx5riZCu4B5CVQ00Ra5kH2z80PIsWVKQzZq6MVXLmMj/6tJZkIqZnc2W3NSryVT4/Cv6t/evVpo1+HaSEyGtKBd+KnMvUbZiQHFQt1GTDDN9ySo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315780; c=relaxed/simple; bh=fFmaqxBvd77uI0tOwnNvvIjPugIKjr5wjFR0YDe0fL8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=jABiC4ubXvCLC3Us9r1kW6mmxS0Zfjdf35tlM3xOhS3ROxnbS8ZHjYQjkzGs5FUo3WfXCSQ2CXBg3MnjUPPNW7HuBVPMBLeJlwDWhw7qCRp0qC/ztdYI+nR5iUWIyQl3E5AkzBIYZo/b2uXfN3eAAVh7d2ZDkFOzFot35eS3xMA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=l3i18r4t; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="l3i18r4t" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778315778; x=1809851778; h=from:to:cc:subject:date:message-id:in-reply-to: 
references:mime-version:content-transfer-encoding; bh=fFmaqxBvd77uI0tOwnNvvIjPugIKjr5wjFR0YDe0fL8=; b=l3i18r4tTXpp9WCf6IS3SBGuVsJnfSxoLEOLgtvYsjn7kaeQO7pPULvX hLtlUdtJZ7CRmidf2Rb9gkWD5/sjP/CPFe5Ou3rv19azuOo2zhYnD0Vpx sfqbK4UgibIOCWqapvvK7O2kK0Mh5LffiFlB/SLhh1pY4b1FUhdRSM7bR rCLS8NJgjPCxdc30054VFN6llnazWNfRNN2/EtcdKN4TQ2LsYdW+6Jto7 SEYwE1bpO7tLYWFU/PJF+NcR/1ghImYp2Ese11BhR3zlFzSo/kyGqstfY AoIkDY9vTc3M4gvgFcN/iztTvLVqr9p4aCZvn+QjxeMMPkrGKoRj+fWUg Q==; X-CSE-ConnectionGUID: 3ruuYSUqSxeI0628guawow== X-CSE-MsgGUID: qd4beqtDRIq3fTI1cXcvmg== X-IronPort-AV: E=McAfee;i="6800,10657,11780"; a="90388066" X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="90388066" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:36:18 -0700 X-CSE-ConnectionGUID: ltwKjp1WStK3LdV9K8OgRA== X-CSE-MsgGUID: N0whex82T5633GoCIMyEFA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="236153010" Received: from yzhao56-desk.sh.intel.com ([10.239.47.19]) by orviesa010-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:36:15 -0700 From: Yan Zhao To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com Subject: [PATCH v2 08/15] KVM: x86/mmu: Plumb "sp" _pointer_ into the TDP MMU's handle_changed_spte() Date: Sat, 9 May 2026 15:56:22 +0800 Message-ID: <20260509075622.4258-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20260509075201.4077-1-yan.y.zhao@intel.com> References: <20260509075201.4077-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; 
charset="utf-8" From: Sean Christopherson Plumb the "sp" pointer into handle_changed_spte() to allow checking of is_mirror_sp(sp) in handle_changed_spte() in the next patch. This is a preparation to consolidate all S-EPT updates into a single kvm_x86_ops hook. [Yan: Remove unused "as_id" param in tdp_mmu_set_spte() ] Signed-off-by: Sean Christopherson Signed-off-by: Yan Zhao --- MMU_refactors v2: -Split out and added back this patch. (The patch was in Sean's original series, and had "SPTE" instead of "sp" in title). (Yan) - Remove unused "as_id" param in tdp_mmu_set_spte(). (Yan). --- arch/x86/kvm/mmu/tdp_mmu.c | 35 ++++++++++++++++++----------------- 1 file changed, 18 insertions(+), 17 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 345fdb0a89fb..05dc8bdc1ea5 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -320,9 +320,9 @@ void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu, bool= mirror) } } =20 -static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level, - bool shared); +static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp, + gfn_t gfn, u64 old_spte, u64 new_spte, + int level, bool shared); =20 static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp) { @@ -471,8 +471,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep= _t pt, bool shared) old_spte =3D kvm_tdp_mmu_write_spte(sptep, old_spte, FROZEN_SPTE, level); } - handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn, - old_spte, FROZEN_SPTE, level, shared); + handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared); =20 if (is_mirror_sp(sp)) { KVM_BUG_ON(shared, kvm); @@ -498,7 +497,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep= _t pt, bool shared) /** * handle_changed_spte - handle bookkeeping associated with an SPTE change * @kvm: kvm instance - * @as_id: the address space of the paging structure the SPTE was a part 
of + * @sp: the page table in which the SPTE resides * @gfn: the base GFN that was mapped by the SPTE * @old_spte: The value of the SPTE before the change * @new_spte: The value of the SPTE after the change @@ -511,15 +510,16 @@ static void handle_removed_pt(struct kvm *kvm, tdp_pt= ep_t pt, bool shared) * dirty logging updates are handled in common code, not here (see make_sp= te() * and fast_pf_fix_direct_spte()). */ -static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn, - u64 old_spte, u64 new_spte, int level, - bool shared) +static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp, + gfn_t gfn, u64 old_spte, u64 new_spte, + int level, bool shared) { bool was_present =3D is_shadow_present_pte(old_spte); bool is_present =3D is_shadow_present_pte(new_spte); bool was_leaf =3D was_present && is_last_spte(old_spte, level); bool is_leaf =3D is_present && is_last_spte(new_spte, level); bool pfn_changed =3D spte_to_pfn(old_spte) !=3D spte_to_pfn(new_spte); + int as_id =3D kvm_mmu_page_as_id(sp); =20 WARN_ON_ONCE(level > PT64_ROOT_MAX_LEVEL); WARN_ON_ONCE(level < PG_LEVEL_4K); @@ -668,6 +668,7 @@ static inline int __must_check tdp_mmu_set_spte_atomic(= struct kvm *kvm, struct tdp_iter *iter, u64 new_spte) { + struct kvm_mmu_page *sp =3D sptep_to_sp(rcu_dereference(iter->sptep)); int ret; =20 lockdep_assert_held_read(&kvm->mmu_lock); @@ -676,7 +677,7 @@ static inline int __must_check tdp_mmu_set_spte_atomic(= struct kvm *kvm, if (ret) return ret; =20 - handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte, + handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte, new_spte, iter->level, true); =20 return 0; @@ -685,7 +686,6 @@ static inline int __must_check tdp_mmu_set_spte_atomic(= struct kvm *kvm, /* * tdp_mmu_set_spte - Set a TDP MMU SPTE and handle the associated bookkee= ping * @kvm: KVM instance - * @as_id: Address space ID, i.e. regular vs. 
SMM * @sptep: Pointer to the SPTE * @old_spte: The current value of the SPTE * @new_spte: The new value that will be set for the SPTE @@ -695,9 +695,11 @@ static inline int __must_check tdp_mmu_set_spte_atomic= (struct kvm *kvm, * Returns the old SPTE value, which _may_ be different than @old_spte if = the * SPTE had voldatile bits. */ -static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep, - u64 old_spte, u64 new_spte, gfn_t gfn, int level) +static u64 tdp_mmu_set_spte(struct kvm *kvm, tdp_ptep_t sptep, u64 old_spt= e, + u64 new_spte, gfn_t gfn, int level) { + struct kvm_mmu_page *sp =3D sptep_to_sp(rcu_dereference(sptep)); + lockdep_assert_held_write(&kvm->mmu_lock); =20 /* @@ -711,7 +713,7 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id,= tdp_ptep_t sptep, =20 old_spte =3D kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level); =20 - handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false); + handle_changed_spte(kvm, sp, gfn, old_spte, new_spte, level, false); =20 /* * Users that do non-atomic setting of PTEs don't operate on mirror @@ -729,9 +731,8 @@ static inline void tdp_mmu_iter_set_spte(struct kvm *kv= m, struct tdp_iter *iter, u64 new_spte) { WARN_ON_ONCE(iter->yielded); - iter->old_spte =3D tdp_mmu_set_spte(kvm, iter->as_id, iter->sptep, - iter->old_spte, new_spte, - iter->gfn, iter->level); + iter->old_spte =3D tdp_mmu_set_spte(kvm, iter->sptep, iter->old_spte, + new_spte, iter->gfn, iter->level); } =20 #define tdp_root_for_each_pte(_iter, _kvm, _root, _start, _end) \ --=20 2.43.2 From nobody Sat Jun 13 06:01:23 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id A68B038239B; Sat, 9 May 2026 08:36:30 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.13 ARC-Seal: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1778315792; cv=none; b=kswqdHS6nmAnSiKxYxuVfQXsJp8fQ90lrC9hVLeqTvlm0YRAnvUgoLYCv8WD/0Bxzc/1eBs8zqRyMYWRbUr1Fo0q4/qLAygMVd+SFe0vk9SEPy0uVxQmitP+a08j1KfRG0DI+KlebuvDWwIh4+hN+mNX7nUa+uvq75Wgjnyzg80= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315792; c=relaxed/simple; bh=zJHgnqDY2cnepeWOchbc58PuzD5UV/UhEseRwV5yyn8=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=mAbcIML+a859WfxOuYLRa5TptCy87m46JLijvfwlXdDc+y19icxo72wMAmErNkj8i3T20dMChtHV8PePXZ63maqcdXNAdc0NfWN6J7qTj2TkOb1p5XfQhMtZwMxVfI46X0pmPEPllSk/7aAJADwZD2F9JsYkibqMyoRBzbLW0Vw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=LoTC63DI; arc=none smtp.client-ip=198.175.65.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="LoTC63DI" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778315791; x=1809851791; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=zJHgnqDY2cnepeWOchbc58PuzD5UV/UhEseRwV5yyn8=; b=LoTC63DIWvpEMgcR45+uAOME9ZFWNOF3pEhDIQlNQYHfSkGcYLsokIAf tcQ1bWneaOcqgx1OAxg0+aOzU9YGgHk212bBho4jBtmKZYcu/cdyuj+nn vriU5951odZBhNJDyzKdN6a3F4sourdH1arMW2I6BpzXLI3/zFMKxet7G calInNtG6HiR5IjYTfcZsuUXNw6vxXE6M30MVSGwRxVFlbE4VazzaT/+r UG9zL1mSRMXw4xyEUmFiGwJJlrHRCOxfiwlz3hlS5x/SaCs9WNeIGy0CR vJCMCemz+/NF199eplg9LLod114GMl5YjRtm4RUUhZks1FajcRMQO/B1K g==; X-CSE-ConnectionGUID: VbNSRQq8TcKaZhfTadmM/Q== X-CSE-MsgGUID: 4pzkpYy+SKGFreEcc1Wgxg== X-IronPort-AV: E=McAfee;i="6800,10657,11780"; 
a="90388075" X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="90388075" Received: from orviesa010.jf.intel.com ([10.64.159.150]) by orvoesa105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:36:30 -0700 X-CSE-ConnectionGUID: fwy/BzxdSqKTVg2gdYrPGQ== X-CSE-MsgGUID: DvV9twN+QvK7TDtO81sjnA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="236153032" Received: from yzhao56-desk.sh.intel.com ([10.239.47.19]) by orviesa010-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:36:26 -0700 From: Yan Zhao To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com Subject: [PATCH v2 09/15] KVM: x86/tdp_mmu: Centrally propagate to-present/atomic zap updates to external PTEs Date: Sat, 9 May 2026 15:56:34 +0800 Message-ID: <20260509075634.4274-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20260509075201.4077-1-yan.y.zhao@intel.com> References: <20260509075201.4077-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Move propagation of to-present changes and atomic zap changes to external PTEs from function __tdp_mmu_set_spte_atomic() to function __handle_changed_spte(), which centrally handles changes of SPTEs. When setting a PTE to present in the mirror page tables, the update needs to be propagated to the external page tables (in TDX parlance the S-EPT). Today this is handled by special mirror page tables branching in __tdp_mmu_set_spte_atomic(), which is the only place where present PTEs are set for TDX. 
This keeps things running, but is a bit hacked on. The hook for setting present leaf PTEs is added only where TDX happens to need it. For example, TDX does not support any of the operations that use the non-atomic variant, tdp_mmu_set_spte(), to set present PTEs. Since the hook is missing there, it is very hard to understand the code from a non-TDX lens. If the reader doesn't know the TDX specifics, it could look like the external update is missing.

In addition to being confusing, it also litters the TDP MMU with "external" update callbacks. This is especially unfortunate because there is already a central place to react to TDP updates, handle_changed_spte().

Begin the process of moving towards a model where all mirror page table updates are forwarded to TDX code, where the TDX-specific logic can live with a more proper separation of concerns. Do this by adding a helper __handle_changed_spte() and teaching it how to return error codes, such that it can propagate the failures that may come from TDX external page table updates.

Make the original handle_changed_spte() a no-fail version of __handle_changed_spte(), so it handles no-fail changes which are under exclusive mmu_lock or under the no-fail path handle_removed_pt(), triggering KVM_BUG_ON() on error returns.

Instead of having __tdp_mmu_set_spte_atomic() do the frozen mirror SPTE dance and trigger propagation to external PTEs, make __tdp_mmu_set_spte_atomic() a simple wrapper around try_cmpxchg64() and hoist the frozen mirror SPTE dance up a level to tdp_mmu_set_spte_atomic(). Then, the propagation of changes to present to the external PTEs can be centralized in __handle_changed_spte().

Aging external SPTEs is not yet supported for the mirror page table, so just warn on mirror usage in kvm_tdp_mmu_age_spte() and invoke __tdp_mmu_set_spte_atomic() directly without the frozen dance.
No need to warn on installing FROZEN_SPTE as a long-term value in kvm_tdp_mmu_age_spte(), since clearing the accessed bit is mutually exclusive with installing FROZEN_SPTE (FROZEN_SPTE has the accessed bit set on all x86 platforms).

Since tdp_mmu_set_spte_atomic() can also be invoked to atomically zap SPTEs (though so far no path triggers an atomic zap on the mirror page table), also leverage the set_external_spte() op to propagate the atomic zaps when tdp_mmu_set_spte_atomic() zaps leaf SPTEs directly. (When tdp_mmu_set_spte_atomic() zaps a non-leaf SPTE, zaps of the child leaf SPTEs are propagated via the remove_external_spte() op.)

Note: tdp_mmu_set_spte_atomic() invokes __handle_changed_spte() to handle changes to new_spte while the mirror SPTE is frozen, so (1) the update of the external PTEs and statistics, or (2) the update of child mirror SPTEs, child external PTEs and corresponding statistics, now occurs before the mirror SPTE is actually set to new_spte. (1) is ok since if it fails, the mirror SPTE will be restored to its original value. (2) is also ok since handle_removed_pt() is no-fail.

Link: https://lore.kernel.org/lkml/aYYn0nf2cayYu8e7@google.com/
Not-yet-Signed-off-by: Sean Christopherson
[Rick: Based on a diff by Sean Christopherson]
Signed-off-by: Rick Edgecombe
[Yan: added atomic zap case]
Signed-off-by: Yan Zhao
---
MMU_refactors v2:
- Updated comments and patch log (Yan, Binbin).
- Split the "KVM: x86/mmu: Plumb "sp" _pointer_ into the TDP MMU's handle_changed_spte()" out (which was in Sean's original series but had SPTE instead of "sp" in patch title). (Yan)
- Also invoke set_external_spte() op to propagate changes for atomic zap of leaf SPTEs. (Yan)
- Kept the "Not-yet-Signed-off-by" in v1 https://lore.kernel.org/all/20260327201421.2824383-8-rick.p.edgecombe@intel.com.
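
Illustration only (not part of the patch): the freeze, propagate, unfreeze flow described in the log above can be modelled in userspace roughly as below. This is a minimal sketch, assuming C11 atomics in place of try_cmpxchg64(); every toy_* name, the TOY_FROZEN value and the failure knob are hypothetical and are not KVM symbols. It also omits the bookkeeping that __handle_changed_spte() performs for non-mirror SPTEs.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel; it is NOT the value of KVM's FROZEN_SPTE. */
#define TOY_FROZEN UINT64_C(0x5a0)

struct toy_pte {
	_Atomic uint64_t val;
	bool is_mirror;		/* true: changes must reach an "external" table */
};

/* Test knob (hypothetical): make the external update fail. */
static bool toy_external_fails;

/* Stand-in for the set_external_spte() op; it only reports success/failure. */
static int toy_set_external(uint64_t old_val, uint64_t new_val)
{
	(void)old_val;
	(void)new_val;
	return toy_external_fails ? -EBUSY : 0;
}

/*
 * Bare compare-and-exchange helper. On failure, *old_val is refreshed with
 * the current value so that the caller operates on fresh data if it retries.
 */
static int toy_cmpxchg(struct toy_pte *pte, uint64_t *old_val, uint64_t new_val)
{
	if (!atomic_compare_exchange_strong(&pte->val, old_val, new_val))
		return -EBUSY;
	return 0;
}

static int toy_set_pte_atomic(struct toy_pte *pte, uint64_t *old_val,
			      uint64_t new_val)
{
	int ret;

	/* The caller must never start from, or ask for, the frozen value. */
	assert(*old_val != TOY_FROZEN && new_val != TOY_FROZEN);

	/* Mirror entries are frozen first to lock out concurrent updaters. */
	ret = toy_cmpxchg(pte, old_val, pte->is_mirror ? TOY_FROZEN : new_val);
	if (ret || !pte->is_mirror)
		return ret;

	/* Propagate while frozen, then unfreeze to the new or the old value. */
	ret = toy_set_external(*old_val, new_val);
	atomic_store(&pte->val, ret ? *old_val : new_val);
	return ret;
}
```

A caller would pass the last value it read as *old_val; on -EBUSY it either retries with the refreshed *old_val (lost race) or gives up (external update failed). In both cases the entry is never left holding TOY_FROZEN.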
--- arch/x86/kvm/mmu/tdp_mmu.c | 124 +++++++++++++++++++++++-------------- 1 file changed, 79 insertions(+), 45 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 05dc8bdc1ea5..ada4a0837298 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -495,7 +495,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep= _t pt, bool shared) } =20 /** - * handle_changed_spte - handle bookkeeping associated with an SPTE change + * __handle_changed_spte - handle bookkeeping associated with an SPTE chan= ge * @kvm: kvm instance * @sp: the page table in which the SPTE resides * @gfn: the base GFN that was mapped by the SPTE @@ -510,9 +510,9 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep= _t pt, bool shared) * dirty logging updates are handled in common code, not here (see make_sp= te() * and fast_pf_fix_direct_spte()). */ -static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp, - gfn_t gfn, u64 old_spte, u64 new_spte, - int level, bool shared) +static int __handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp, + gfn_t gfn, u64 old_spte, u64 new_spte, + int level, bool shared) { bool was_present =3D is_shadow_present_pte(old_spte); bool is_present =3D is_shadow_present_pte(new_spte); @@ -549,9 +549,7 @@ static void handle_changed_spte(struct kvm *kvm, struct= kvm_mmu_page *sp, } =20 if (old_spte =3D=3D new_spte) - return; - - trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte); + return 0; =20 if (is_leaf) check_spte_writable_invariants(new_spte); @@ -578,29 +576,49 @@ static void handle_changed_spte(struct kvm *kvm, stru= ct kvm_mmu_page *sp, "a temporary frozen SPTE.\n" "as_id: %d gfn: %llx old_spte: %llx new_spte: %llx level: %d", as_id, gfn, old_spte, new_spte, level); - return; + return 0; } =20 - if (is_leaf !=3D was_leaf) - kvm_update_page_stats(kvm, level, is_leaf ? 
1 : -1); - /* * Recursively handle child PTs if the change removed a subtree from * the paging structure. Note the WARN on the PFN changing without the * SPTE being converted to a hugepage (leaf) or being zapped. Shadow * pages are kernel allocations and should never be migrated. + * + * For the mirror page table, propagate changes to present or changes of + * leaf SPTEs to !present under shared mmu_lock to the external SPTE via + * set_external_spte() op. */ if (was_present && !was_leaf && - (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) + (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) { handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared); + } else if (is_mirror_sp(sp) && (is_present || shared)) { + int r; + + r =3D kvm_x86_call(set_external_spte)(kvm, gfn, old_spte, new_spte, leve= l); + if (r) + return r; + } + trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte); + + if (is_leaf !=3D was_leaf) + kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1); + + return 0; +} + +static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp, + gfn_t gfn, u64 old_spte, u64 new_spte, + int level, bool shared) +{ + KVM_BUG_ON(__handle_changed_spte(kvm, sp, gfn, old_spte, new_spte, + level, shared), kvm); } =20 static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm, struct tdp_iter *iter, u64 new_spte) { - u64 *raw_sptep =3D rcu_dereference(iter->sptep); - /* * The caller is responsible for ensuring the old SPTE is not a FROZEN * SPTE. KVM should never attempt to zap or manipulate a FROZEN SPTE, @@ -609,31 +627,6 @@ static inline int __must_check __tdp_mmu_set_spte_atom= ic(struct kvm *kvm, */ WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte)); =20 - /* Should not set FROZEN_SPTE as a long-term value. 
*/ - KVM_MMU_WARN_ON(is_frozen_spte(new_spte)); - - if (is_mirror_sptep(iter->sptep)) { - int ret; - - /* - * We need to lock out other updates to the SPTE until the external - * page table has been modified. Use FROZEN_SPTE similar to - * the zapping case. - */ - if (!try_cmpxchg64(raw_sptep, &iter->old_spte, FROZEN_SPTE)) - return -EBUSY; - - ret =3D kvm_x86_call(set_external_spte)(kvm, iter->gfn, iter->old_spte, - new_spte, iter->level); - - if (ret) - __kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte); - else - __kvm_tdp_mmu_write_spte(iter->sptep, new_spte); - - return ret; - } - /* * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs and * does not hold the mmu_lock. On failure, i.e. if a different logical @@ -641,7 +634,7 @@ static inline int __must_check __tdp_mmu_set_spte_atomi= c(struct kvm *kvm, * the current value, so the caller operates on fresh data, e.g. if it * retries tdp_mmu_set_spte_atomic(). */ - if (!try_cmpxchg64(raw_sptep, &iter->old_spte, new_spte)) + if (!try_cmpxchg64(rcu_dereference(iter->sptep), &iter->old_spte, new_spt= e)) return -EBUSY; =20 return 0; @@ -673,14 +666,51 @@ static inline int __must_check tdp_mmu_set_spte_atomi= c(struct kvm *kvm, =20 lockdep_assert_held_read(&kvm->mmu_lock); =20 - ret =3D __tdp_mmu_set_spte_atomic(kvm, iter, new_spte); + /* Should not set FROZEN_SPTE as a long-term value. */ + KVM_MMU_WARN_ON(is_frozen_spte(new_spte)); + + /* + * Temporarily freeze the SPTE until the external PTE operation has + * completed, e.g. so that concurrent faults don't attempt to install a + * child PTE in the external page table before the parent PTE has been + * written. 
+ */ + if (is_mirror_sptep(iter->sptep)) + ret =3D __tdp_mmu_set_spte_atomic(kvm, iter, FROZEN_SPTE); + else + ret =3D __tdp_mmu_set_spte_atomic(kvm, iter, new_spte); + if (ret) return ret; =20 - handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte, - new_spte, iter->level, true); - - return 0; + /* + * Handle the change from iter->old_spte to new_spte. + * + * Note: for mirror page table, this means the updates of the external + * PTE, statistics, or updates of child SPTEs, child external PTEs and + * corresponding statistics are performed while the mirror SPTE is in + * frozen state (i.e., before the mirror SPTE is set to new_spte). + */ + ret =3D __handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte, + new_spte, iter->level, true); + /* + * Unfreeze the mirror SPTE. If updating the external SPTE failed, + * restore the old value so that the mirror SPTE isn't frozen in + * perpetuity, otherwise set the mirror SPTE to the new desired value. + */ + if (is_mirror_sptep(iter->sptep)) { + if (ret) + __kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte); + else + __kvm_tdp_mmu_write_spte(iter->sptep, new_spte); + } else { + /* + * Bug the VM if handling the change failed, as failure is only + * allowed if KVM couldn't update the external SPTE. + */ + KVM_BUG_ON(ret, kvm); + } + return ret; } =20 /* @@ -1334,6 +1364,10 @@ static void kvm_tdp_mmu_age_spte(struct kvm *kvm, st= ruct tdp_iter *iter) { u64 new_spte; =20 + /* TODO: Add support for aging external SPTEs, if necessary. 
*/ + if (WARN_ON_ONCE(is_mirror_sptep(iter->sptep))) + return; + if (spte_ad_enabled(iter->old_spte)) { iter->old_spte =3D tdp_mmu_clear_spte_bits_atomic(iter->sptep, shadow_accessed_mask); --=20 2.43.2 From nobody Sat Jun 13 06:01:23 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EA2C038E121; Sat, 9 May 2026 08:36:42 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315804; cv=none; b=IG8kqa62y1JMdCzCZqYXNvYYLfvh2uUZbyVQMv3E93+rplhwb9+K1vTAVRlgEkzr+2wI1izTN791n7MpeBdWY/k2PTJ7P8ELJP+C6vbMjOwnajA+0tSb+AREu5tsxICHbrru7UN5DcwwMBEciIID7JnCZqrXmfSICnBwMQWOX6w= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315804; c=relaxed/simple; bh=CWVlTJ2fw9Hr14bsGHbzw8g4DedxcZzL9H4u85xMyA4=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=ZTfMwUI+CsUDX69aTLQbcerezsL9NEGPsSmTvL6gM7wI52jKQ5un4kQTtYTNdWDoJrWTLDD207qDoBAF2lJ6Gclr9vFYjl5ig56kg3cNEwYwBoVsczRm5sR8CzDWJ4UZHgAcLg1MeDJWc19rpo/n/5k4NHg2HDLIofGM3OnXK2U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=krZgTNyf; arc=none smtp.client-ip=198.175.65.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="krZgTNyf" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778315803; x=1809851803; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=CWVlTJ2fw9Hr14bsGHbzw8g4DedxcZzL9H4u85xMyA4=; b=krZgTNyfDbPeO70b/53mcGNyBlsIoNOcqEzdoYHG3WK4HHymTSfFTck7 SiOL+4LWn0dcLJHVRLBjQR+ICZFG/f8AT08TTWSTj6n6pNbiinsM2cTs5 ilFG+/G7rG/IGaehL8tgYLSmtqYSQPbKHTBiorBd0zfvTjCwrt9SHd8u8 VIebKIKqgpw4OftaVMpui/C5YM+N0OinbE90iRfqFhAAi8DRCsyhbvQJx 2N1VXaXEOSdgyW9j/JctHFyuKAbdTuOO6Ykn2YsaNg6yLN6mXIc3vj+hf R70O51Hi6WgOooorQqe/mJlr4NSjVYdb51uDYCh2Y2zj4EqbvgBGSlqFZ Q==; X-CSE-ConnectionGUID: miNfV0QnQ0S2pACrlkRQFQ== X-CSE-MsgGUID: rBm8L1pJRoCeIPFmkAAwUA== X-IronPort-AV: E=McAfee;i="6800,10657,11780"; a="79306587" X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="79306587" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by orvoesa110.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:36:42 -0700 X-CSE-ConnectionGUID: W/jDTjevRq68nRpsrIxX8w== X-CSE-MsgGUID: 35Tecm0uSbiHNKNABG6h+Q== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="230584765" Received: from yzhao56-desk.sh.intel.com ([10.239.47.19]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:36:39 -0700 From: Yan Zhao To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com Subject: [PATCH v2 10/15] KVM: x86/mmu: Drop KVM_BUG_ON() on shared lock to zap child external PTEs Date: Sat, 9 May 2026 15:56:47 +0800 Message-ID: <20260509075647.4290-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20260509075201.4077-1-yan.y.zhao@intel.com> References: <20260509075201.4077-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: 
quoted-printable Content-Type: text/plain; charset="utf-8" From: Rick Edgecombe Drop the KVM_BUG_ON() in the KVM MMU core before zapping child external PTEs, since requiring zapping PTEs to be protected by exclusive mmu_lock is TDX's specific requirement. No need to plumb the shared/exclusive info into the remove_external_spte() op or move the KVM_BUG_ON() to TDX, because - There's already an assertion of exclusive mmu_lock protection in TDX. - The KVM_BUG_ON() is a bit redundant given that if there's any bug causing zapping of leaf PTEs in S-EPT under shared mmu_lock, SEAMCALL failures due to contention would result in TDX_BUG_ON() in TDX. Link: https://lore.kernel.org/kvm/aYUarHf3KEwHGuJe@google.com/ Suggested-by: Sean Christopherson Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao --- MMU_refactors v2: - Updated commit log and title. (Yan) --- arch/x86/kvm/mmu/tdp_mmu.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index ada4a0837298..553a30628960 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -473,10 +473,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_pte= p_t pt, bool shared) } handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared); =20 - if (is_mirror_sp(sp)) { - KVM_BUG_ON(shared, kvm); + if (is_mirror_sp(sp)) remove_external_spte(kvm, gfn, old_spte, level); - } } =20 if (is_mirror_sp(sp) && --=20 2.43.2 From nobody Sat Jun 13 06:01:23 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 18083355813; Sat, 9 May 2026 08:36:53 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315815; cv=none; 
b=iRYEIrOwakd0P/v13Sj0KIx9DmGFZ91ZjpG+vPNHzBJUkN7D9u+8oOb3Btk5LaVCnjGh+hjbdESjOg1Afuy1MEbxyUyAe9n6bF7r5F0cb3XSTgKklu3UeJN19EYK9xKEnxL13kRr9vck2V9y7JQ7h+HP9Ra+kqvHVVyScLiZUyY= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315815; c=relaxed/simple; bh=2xIa4NP3fin8NYMgYIyqbZxBEs78HwYv0RuAhvdeaxw=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=l3LrUkId87MAhOx7a1fY9wJJ/ZyWAleHuN8hWmIKfkur8NUKS5wm6yhnT88w44SuVv6u0kZaiJjxhiu+mpNVDCHDUOTaovNzytiwlp7ce9OGDFl24sdVd1TD8QfqbiL76bC02GsbUkC1N94ByapHl0FKn1UsougBLFw9pinbkGg= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=LD6E1XUQ; arc=none smtp.client-ip=198.175.65.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="LD6E1XUQ" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778315814; x=1809851814; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=2xIa4NP3fin8NYMgYIyqbZxBEs78HwYv0RuAhvdeaxw=; b=LD6E1XUQNu95NVaQW1dNNLAy1oCY9L69pr4FDxeHDADWi6iqbfXCksEg 8OaN3FlOZthoyJlCBfjqrvxeD6BXF53Ns2Y5Iyp4SwzvrS5va4wKNB/z6 LqTeWAdNoEUVFsAFQmDrB5C2QzYULIySUs4mbrzjvhvJh6Uyu9tuQVnK4 I5gEsI7lgljz96ZCr7dFcliEl3jmbtkDTsvmFwdtH/yY4Veh60Coy1dpE vzAoII/4RMDgo26mtNLagMph28OgXhrbG5acHxBm6uRP43W+3TJmxmY/v ptqQJVz5KenDpXo4oH2gFhDILH14CmJbKX5SP77UMfLOwOs6Fe898C0cP w==; X-CSE-ConnectionGUID: pc/+IUL/T3S8AvPDuzLWgw== X-CSE-MsgGUID: JduBUUwwSie1wCMXOBkwuw== X-IronPort-AV: E=McAfee;i="6800,10657,11780"; a="79306595" X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; 
d="scan'208";a="79306595" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by orvoesa110.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:36:54 -0700 X-CSE-ConnectionGUID: pDwcHvluSNesprpnTBJ50A== X-CSE-MsgGUID: 0ykWu+d5Q/aJwhDx9QpMrA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="230584771" Received: from yzhao56-desk.sh.intel.com ([10.239.47.19]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:36:50 -0700 From: Yan Zhao To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com Subject: [PATCH v2 11/15] KVM: TDX: Hoist tdx_sept_remove_private_spte() above set_private_spte() Date: Sat, 9 May 2026 15:56:58 +0800 Message-ID: <20260509075658.4306-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20260509075201.4077-1-yan.y.zhao@intel.com> References: <20260509075201.4077-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Arrange tdx_sept_remove_private_spte() (and its tdx_track() helper) to be above tdx_sept_set_private_spte() in anticipation of routing all S-EPT writes (with the exception of reclaiming non-leaf pages) through the "set" API. No functional change intended. Signed-off-by: Sean Christopherson Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao --- MMU_refactors v2: -Made the patch log description more generic to match the diff. (Kai, Rick) -Kept tdx_sept_free_private_spt() below tdx_sept_set_private_spte(). 
(Yan) --- arch/x86/kvm/vmx/tdx.c | 80 +++++++++++++++++++++--------------------- 1 file changed, 40 insertions(+), 40 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 48aa7936a7f7..e40a999b0fb8 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1718,23 +1718,6 @@ static int tdx_sept_map_leaf_spte(struct kvm *kvm, g= fn_t gfn, enum pg_level leve return tdx_mem_page_aug(kvm, gfn, level, pfn); } =20 -static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_s= pte, - u64 new_spte, enum pg_level level) -{ - lockdep_assert_held(&kvm->mmu_lock); - - if (KVM_BUG_ON(is_shadow_present_pte(old_spte), kvm)) - return -EIO; - - if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm)) - return -EIO; - - if (!is_last_spte(new_spte, level)) - return tdx_sept_map_nonleaf_spte(kvm, gfn, level, new_spte); - - return tdx_sept_map_leaf_spte(kvm, gfn, level, new_spte); -} - /* * Ensure shared and private EPTs to be flushed on all vCPUs. * tdh_mem_track() is the only caller that increases TD epoch. An increase= in @@ -1781,29 +1764,6 @@ static void tdx_track(struct kvm *kvm) kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); } =20 -static int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, - enum pg_level level, void *private_spt) -{ - struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); - - /* - * free_external_spt() is only called after hkid is freed when TD is - * tearing down. - * KVM doesn't (yet) zap page table pages in mirror page table while - * TD is active, though guest pages mapped in mirror page table could be - * zapped during TD is active, e.g. for shared <-> private conversion - * and slot move/deletion. - */ - if (KVM_BUG_ON(is_hkid_assigned(kvm_tdx), kvm)) - return -EIO; - - /* - * The HKID assigned to this TD was already freed and cache was - * already flushed. We don't have to flush again. 
- */ - return tdx_reclaim_page(virt_to_page(private_spt)); -} - static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, enum pg_level level, u64 mirror_spte) { @@ -1854,6 +1814,46 @@ static void tdx_sept_remove_private_spte(struct kvm = *kvm, gfn_t gfn, tdx_quirk_reset_paddr(PFN_PHYS(pfn), PAGE_SIZE); } =20 +static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_s= pte, + u64 new_spte, enum pg_level level) +{ + lockdep_assert_held(&kvm->mmu_lock); + + if (KVM_BUG_ON(is_shadow_present_pte(old_spte), kvm)) + return -EIO; + + if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm)) + return -EIO; + + if (!is_last_spte(new_spte, level)) + return tdx_sept_map_nonleaf_spte(kvm, gfn, level, new_spte); + + return tdx_sept_map_leaf_spte(kvm, gfn, level, new_spte); +} + +static int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, + enum pg_level level, void *private_spt) +{ + struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); + + /* + * free_external_spt() is only called after hkid is freed when TD is + * tearing down. + * KVM doesn't (yet) zap page table pages in mirror page table while + * TD is active, though guest pages mapped in mirror page table could be + * zapped during TD is active, e.g. for shared <-> private conversion + * and slot move/deletion. + */ + if (KVM_BUG_ON(is_hkid_assigned(kvm_tdx), kvm)) + return -EIO; + + /* + * The HKID assigned to this TD was already freed and cache was + * already flushed. We don't have to flush again. 
+ */ + return tdx_reclaim_page(virt_to_page(private_spt)); +} + void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, int trig_mode, int vector) { --=20 2.43.2 From nobody Sat Jun 13 06:01:23 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.18]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 45DD238C421; Sat, 9 May 2026 08:37:05 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.18 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315829; cv=none; b=EhcvMa61eQYre95hJErxXB0R6Hd2e5mTSZEN4lavYWV0hDjpBNx23lXd/a8Hr99ckjmoRT/Fj8/7epTQ4ZSTwqykgfxL3n+CRaRNUBJ1bLaLdShJgtfjRl5iS37W1nBTgcOkYrELCA37zUxXPgC9D0ZVOPi6GmBmLNwGdLZfWzo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315829; c=relaxed/simple; bh=tfPeTIyWIaphWywq6hpaQtZdrVqTcipgosfK2WhF+mc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=btxEzX8UZ9jlLDoyfc/5IEmO6oedr/gvLWdY+o9seFviBbIYZuJS5LyKo4J8U/1ssG+Kkt1MN/BNYZYP3aET0larm3GLNzv9xKvI0yRWxqhB3svDQOoyRCM9AO2e0AU8AExczPMDBynx7VvQ+tq/966jlMjMAYeY3+BLu2DIQO4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Rp8/Ma2g; arc=none smtp.client-ip=198.175.65.18 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Rp8/Ma2g" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778315826; x=1809851826; 
h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tfPeTIyWIaphWywq6hpaQtZdrVqTcipgosfK2WhF+mc=; b=Rp8/Ma2gZ4eqglcIPknfRjsAkSErei0iIpJUgvcFWW7wwEwY9IJRMNHD Adwh36r0wVpc3Hs7+eymvPuhU49X/Nes+C++O5qLJbl/XZ7rP4V4eQOgL J9XALXxMDiobZ1OdkKelKbtJ+M/m5O71PHQewhVT040VI7EupF3eXNB2D BSt3Xe4bCssVhSHN9noIIJn8FBb6U3zctFxUYrtqbrQ+euClNBhMEcqe6 uypKMWIolriaVZdLz0+iOc8bVFOqW3LDZPhnwB5w4e4FoQNzLamyGvY5+ 9XEUuga2vZFsHOMexEZqvU7kRjmZUYFQMWM0C/11WAckYe4vtj+IIuuta w==; X-CSE-ConnectionGUID: mtLVjCYgQECCksVK5VMDAA== X-CSE-MsgGUID: hcYDo4fZRy6lzhNIsHatzA== X-IronPort-AV: E=McAfee;i="6800,10657,11780"; a="79306602" X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="79306602" Received: from fmviesa009.fm.intel.com ([10.60.135.149]) by orvoesa110.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:37:05 -0700 X-CSE-ConnectionGUID: 20NNMm0aQ6Svr8OzexSNew== X-CSE-MsgGUID: Y9HKrJpJS+iXwlF9l+iU+w== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="230584777" Received: from yzhao56-desk.sh.intel.com ([10.239.47.19]) by fmviesa009-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:37:01 -0700 From: Yan Zhao To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com Subject: [PATCH v2 12/15] KVM: TDX: Drop kvm_x86_ops.remove_external_spte() Date: Sat, 9 May 2026 15:57:09 +0800 Message-ID: <20260509075709.4322-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20260509075201.4077-1-yan.y.zhao@intel.com> References: <20260509075201.4077-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable 
Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Drop kvm_x86_ops.remove_external_spte(), and instead handle the removal of leaf SPTEs in the S-EPT (a.k.a. external page table) in kvm_x86_ops.set_external_spte(). This will also allow extending tdx_sept_set_private_spte() to support splitting a huge S-EPT entry without needing yet another kvm_x86_ops hook. Now all changes for removing leaf mirror SPTEs are propagated through kvm_x86_ops.set_external_spte(). - When removing leaf mirror SPTEs under shared mmu_lock (though currently no path can trigger this scenario and TDX does not support this scenario), tdx_sept_remove_private_spte() may produce a warning due to lockdep_assert_held_write() or may return -EIO and trigger TDX_BUG_ON() due to concurrent BLOCK, TRACK, REMOVE. - When removing leaf mirror SPTEs under exclusive mmu_lock, all errors are unexpected. If any error occurs in this scenario, tdx_sept_remove_private_spte() will return -EIO and trigger KVM_BUG_ON(). A redundant KVM_BUG_ON() call will also be triggered in TDP MMU core in handle_changed_spte(), which is benign (the WARN will fire if and only if the VM isn't already bugged). Signed-off-by: Sean Christopherson Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao --- MMU_refactors v2: - Added expected lock and valid scenarios in function comment of tdx_sept_set_private_spte(). (Yan/Rick) - Updated patch title (was "Handle removal of leaf SPTEs in .set_private_spte()"), since atomic zaps of leaf SPTEs are already handled in kvm_x86_ops.set_external_spte() before this patch. (Yan). 
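
Illustration only (not part of the patch): once removal is routed through the same hook as mapping, a single "set" callback has to tell the cases apart from the old and new values alone. The sketch below is a toy model of that classification under stated assumptions: the TOY_PRESENT/TOY_LEAF bit layout, the toy_op enum and toy_classify() are all hypothetical, and rejecting every other transition with -EIO is a choice made for this sketch. The real dispatch lives in the tdx.c hunk of this patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, unrelated to real SPTE encodings. */
#define TOY_PRESENT	(UINT64_C(1) << 0)
#define TOY_LEAF	(UINT64_C(1) << 1)

enum toy_op {
	TOY_MAP_LEAF,		/* !present -> present leaf */
	TOY_MAP_NONLEAF,	/* !present -> present non-leaf (link a table) */
	TOY_REMOVE_LEAF,	/* present leaf -> !present */
};

/*
 * Decide which external operation a change from old_pte to new_pte needs.
 * Returns 0 and fills *op on success, -EIO for transitions this toy model
 * does not handle (e.g. present -> present).
 */
static int toy_classify(uint64_t old_pte, uint64_t new_pte, enum toy_op *op)
{
	bool was_present = old_pte & TOY_PRESENT;
	bool is_present = new_pte & TOY_PRESENT;

	if (!was_present && is_present) {
		*op = (new_pte & TOY_LEAF) ? TOY_MAP_LEAF : TOY_MAP_NONLEAF;
		return 0;
	}

	if (was_present && !is_present && (old_pte & TOY_LEAF)) {
		*op = TOY_REMOVE_LEAF;
		return 0;
	}

	return -EIO;
}
```

Removal of a present non-leaf entry is deliberately rejected here, matching the log's statement that zapping of non-leaf SPTEs is not propagated through the "set" op.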
--- arch/x86/include/asm/kvm-x86-ops.h | 1 - arch/x86/include/asm/kvm_host.h | 3 --- arch/x86/kvm/mmu/tdp_mmu.c | 37 +++------------------------- arch/x86/kvm/vmx/tdx.c | 39 +++++++++++++++++++++--------- 4 files changed, 31 insertions(+), 49 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index bced6d938702..ed348c6dd445 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -97,7 +97,6 @@ KVM_X86_OP_OPTIONAL_RET0(get_mt_mask) KVM_X86_OP(load_mmu_pgd) KVM_X86_OP_OPTIONAL_RET0(set_external_spte) KVM_X86_OP_OPTIONAL_RET0(free_external_spt) -KVM_X86_OP_OPTIONAL(remove_external_spte) KVM_X86_OP(has_wbinvd_exit) KVM_X86_OP(get_l2_tsc_offset) KVM_X86_OP(get_l2_tsc_multiplier) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 9b55973f194c..c62a14623dcc 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1899,9 +1899,6 @@ struct kvm_x86_ops { int (*free_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level, void *external_spt); =20 - /* Update external page table from spte getting removed, and flush TLB. */ - void (*remove_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level le= vel, - u64 mirror_spte); =20 bool (*has_wbinvd_exit)(void); =20 diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 553a30628960..5cc2e948610b 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -359,25 +359,6 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct = kvm_mmu_page *sp) spin_unlock(&kvm->arch.tdp_mmu_pages_lock); } =20 -static void remove_external_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte, - int level) -{ - /* - * External (TDX) SPTEs are limited to PG_LEVEL_4K, and external - * PTs are removed in a special order, involving free_external_spt(). 
- * But remove_external_spte() will be called on non-leaf PTEs via - * __tdp_mmu_zap_root(), so avoid the error the former would return - * in this case. - */ - if (!is_last_spte(old_spte, level)) - return; - - /* Zapping leaf spte is allowed only when write lock is held. */ - lockdep_assert_held_write(&kvm->mmu_lock); - - kvm_x86_call(remove_external_spte)(kvm, gfn, level, old_spte); -} - /** * handle_removed_pt() - handle a page table removed from the TDP structure * @@ -472,9 +453,6 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep= _t pt, bool shared) FROZEN_SPTE, level); } handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared); - - if (is_mirror_sp(sp)) - remove_external_spte(kvm, gfn, old_spte, level); } =20 if (is_mirror_sp(sp) && @@ -583,14 +561,14 @@ static int __handle_changed_spte(struct kvm *kvm, str= uct kvm_mmu_page *sp, * SPTE being converted to a hugepage (leaf) or being zapped. Shadow * pages are kernel allocations and should never be migrated. * - * For the mirror page table, propagate changes to present or changes of - * leaf SPTEs to !present under shared mmu_lock to the external SPTE via + * For the mirror page table, propagate all changes to the external SPTE + * (except zapping/promotion of non-leaf SPTEs) via the * set_external_spte() op. */ if (was_present && !was_leaf && (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) { handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared); - } else if (is_mirror_sp(sp) && (is_present || shared)) { + } else if (is_mirror_sp(sp)) { int r; =20 r =3D kvm_x86_call(set_external_spte)(kvm, gfn, old_spte, new_spte, leve= l); @@ -743,15 +721,6 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, tdp_ptep_= t sptep, u64 old_spte, =20 handle_changed_spte(kvm, sp, gfn, old_spte, new_spte, level, false); =20 - /* - * Users that do non-atomic setting of PTEs don't operate on mirror - * roots, so don't handle it and bug the VM if it's seen. 
- */ - if (is_mirror_sptep(sptep)) { - KVM_BUG_ON(is_shadow_present_pte(new_spte), kvm); - remove_external_spte(kvm, gfn, old_spte, level); - } - return old_spte; } =20 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index e40a999b0fb8..749883fb8f11 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1764,11 +1764,11 @@ static void tdx_track(struct kvm *kvm) kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); } =20 -static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, - enum pg_level level, u64 mirror_spte) +static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, u64 old_spte) { struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); - kvm_pfn_t pfn =3D spte_to_pfn(mirror_spte); + kvm_pfn_t pfn =3D spte_to_pfn(old_spte); gpa_t gpa =3D gfn_to_gpa(gfn); u64 err, entry, level_state; =20 @@ -1780,16 +1780,16 @@ static void tdx_sept_remove_private_spte(struct kvm= *kvm, gfn_t gfn, * there can't be anything populated in the private EPT. */ if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm)) - return; + return -EIO; =20 /* TODO: handle large pages. */ if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) - return; + return -EIO; =20 err =3D tdh_do_no_vcpus(tdh_mem_range_block, kvm, &kvm_tdx->td, gpa, level, &entry, &level_state); if (TDX_BUG_ON_2(err, TDH_MEM_RANGE_BLOCK, entry, level_state, kvm)) - return; + return -EIO; =20 /* * TDX requires TLB tracking before dropping private page. 
Do @@ -1805,22 +1805,40 @@ static void tdx_sept_remove_private_spte(struct kvm= *kvm, gfn_t gfn, err =3D tdh_do_no_vcpus(tdh_mem_page_remove, kvm, &kvm_tdx->td, gpa, level, &entry, &level_state); if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_REMOVE, entry, level_state, kvm)) - return; + return -EIO; =20 err =3D tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, pfn); if (TDX_BUG_ON(err, TDH_PHYMEM_PAGE_WBINVD, kvm)) - return; + return -EIO; =20 tdx_quirk_reset_paddr(PFN_PHYS(pfn), PAGE_SIZE); + return 0; } =20 +/* + * Handle changes for + * (1) leaf SPTEs from non-present to present + * (2) non-leaf SPTEs from non-present to present + * (3) leaf SPTEs from present to non-present + * + * - (1) and (2) must be under shared mmu_lock. If (1) and (2) are under + * exclusive mmu_lock (currently impossible), contention errors may lead= to + * KVM_BUG_ON() in handle_changed_spte(), e.g., due to tdx_mem_page_aug(= ), + * tdx_mem_page_add(), or tdh_mem_sept_add() contending with tdh_vp_ente= r() + * due to zero-step mitigation or contending with TDCALLs. + * - (3) must be under write mmu_lock. If (3) is under shared mmu_lock + * (currently impossible), warnings will be generated due to + * lockdep_assert_held_write() or TDX_BUG_ON() caused by concurrent BLOC= K, + * TRACK, REMOVE. + * - Promotion/demotion is not yet supported. 
+ */ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_s= pte, u64 new_spte, enum pg_level level) { lockdep_assert_held(&kvm->mmu_lock); =20 - if (KVM_BUG_ON(is_shadow_present_pte(old_spte), kvm)) - return -EIO; + if (is_shadow_present_pte(old_spte)) + return tdx_sept_remove_private_spte(kvm, gfn, level, old_spte); =20 if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm)) return -EIO; @@ -3448,7 +3466,6 @@ int __init tdx_hardware_setup(void) =20 vt_x86_ops.set_external_spte =3D tdx_sept_set_private_spte; vt_x86_ops.free_external_spt =3D tdx_sept_free_private_spt; - vt_x86_ops.remove_external_spte =3D tdx_sept_remove_private_spte; vt_x86_ops.protected_apic_has_interrupt =3D tdx_protected_apic_has_interr= upt; return 0; =20 --=20 2.43.2 From nobody Sat Jun 13 06:01:23 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0B5F938239B; Sat, 9 May 2026 08:37:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315837; cv=none; b=D7/mGaEiKt7Ni6vJxu3bH0hYbnTOQNNiOXOM3cTjRMq7j4K/s0tcoy7aX52XOC44ztzLaiiLdC8HtfXMrBuqgG9u1mjRk+5AZSp67t6j8K/KkSG8SrFuOCStNFGIN8aI2+jDZB2FIfGtzqHDTTR7kQWhIP/RKKLPQHsQer/O3NQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315837; c=relaxed/simple; bh=PuP4wTw1gXLowAtx3WjUawqXh+u2rSJ5YEAUiyfAcPo=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Lm7H2v616Ze84Dv75erfV7ZNnaEgCdbaVdclKyU3Fbq0zRiXiGf3Rb2AUBAbG+NJKv58VTNXKCFWM3si4xNn2hIVLOYpAohomwuif/nGK0LhBDjn7/b9SkjcY/zR8kzsfYq+Usva64JOVeiTYVpNHTaxDoW5wDZafhOnpqpzSvA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; 
dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=AP+S5hPq; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="AP+S5hPq" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778315835; x=1809851835; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=PuP4wTw1gXLowAtx3WjUawqXh+u2rSJ5YEAUiyfAcPo=; b=AP+S5hPq+QbZ3J3S7Z9BzFPLZHoj5QjIVCSl6O+GNNSeLPq86+Uxh/KI 2cRQ66IPhTOrPSx/leGHXzJY8DjxnpR4z54WuOOmZJGLM2vASF34dEMLP lFsBmha4L9Tn9uC0XMfgR71ug42KTRBxnTdbra4R0BeSJZ+IrPPPVg6KO bJKtSrG+9dxbTBR+5k8hFZgYjksfYuB/MUf/FmU4fy+ps34uitcnCzFtq 3L0AcZUxbD0v3MjfTI7HWAEyMW3/MFvUD7k15+Mu28BA3eTjZHTwkWP6/ LyB7xgTLbrQ1bYxUDNOELwWcpKcMi6fQVI1IX9MiBlR17HHlptYzxqdSR g==; X-CSE-ConnectionGUID: aCvdPP0sSCmIY31ke8DhBQ== X-CSE-MsgGUID: uiZ6sgm7QVmuVTdEfn2FyA== X-IronPort-AV: E=McAfee;i="6800,10657,11780"; a="79005338" X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="79005338" Received: from fmviesa004.fm.intel.com ([10.60.135.144]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:37:15 -0700 X-CSE-ConnectionGUID: /XN2E2RORLimCAYo9OAjOA== X-CSE-MsgGUID: +ogWKdmwQpmhs9Z0zh2rDw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="238784786" Received: from yzhao56-desk.sh.intel.com ([10.239.47.19]) by fmviesa004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:37:11 -0700 From: Yan Zhao To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, 
kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com Subject: [PATCH v2 13/15] KVM: TDX: Rename tdx_sept_remove_private_spte() to show it's for leaf SPTEs Date: Sat, 9 May 2026 15:57:19 +0800 Message-ID: <20260509075719.4338-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20260509075201.4077-1-yan.y.zhao@intel.com> References: <20260509075201.4077-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Rename tdx_sept_remove_private_spte() to tdx_sept_remove_leaf_spte() to clearly show that this function is for removal of leaf SPTEs. No functional change intended. Signed-off-by: Yan Zhao --- MMU_refactors v2: - New patch. --- arch/x86/kvm/vmx/tdx.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 749883fb8f11..5a7f304e14af 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1764,8 +1764,8 @@ static void tdx_track(struct kvm *kvm) kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE); } =20 -static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn, - enum pg_level level, u64 old_spte) +static int tdx_sept_remove_leaf_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, u64 old_spte) { struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); kvm_pfn_t pfn =3D spte_to_pfn(old_spte); @@ -1838,7 +1838,7 @@ static int tdx_sept_set_private_spte(struct kvm *kvm,= gfn_t gfn, u64 old_spte, lockdep_assert_held(&kvm->mmu_lock); =20 if (is_shadow_present_pte(old_spte)) - return tdx_sept_remove_private_spte(kvm, gfn, level, old_spte); + return tdx_sept_remove_leaf_spte(kvm, gfn, level, old_spte); =20 if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm)) return -EIO; @@ -2834,7 +2834,7 @@ void tdx_flush_tlb_current(struct kvm_vcpu *vcpu) void 
tdx_flush_tlb_all(struct kvm_vcpu *vcpu) { /* - * TDX has called tdx_track() in tdx_sept_remove_private_spte() to + * TDX has called tdx_track() in tdx_sept_remove_leaf_spte() to * ensure that private EPT will be flushed on the next TD enter. No need * to call tdx_track() here again even when this callback is a result of * zapping private EPT. --=20 2.43.2 From nobody Sat Jun 13 06:01:23 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C54D338E124; Sat, 9 May 2026 08:37:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315846; cv=none; b=o0hfroUiS8bZjv41lschaHhiVN+SXOvRXGSF84fS+hAMoDgj6jrd1fQRd0UFYvqLTTRqLtRTPh8bw9wdYsg5G3hWLem6TqgozEn66t/YwwPJPQyJuMXVjOJiJSJ5OZ1OsNnYEeqXjnrhgPd0Uh4ekVT6bTiJWGNzn7+sEIOHEvg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315846; c=relaxed/simple; bh=keyvAnRPIGpgf0AfD8aSWj5V+z5abmGDkNnLctGyMoQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=DzZzloWeROy0Ie6hVplEPfq34N1OwZUwmvEMa8AU1yzKPbF8yqs003th/7XJrLJ31Use9SpjKR/UJpZN5Awj0rZyz4b7CZss/5vMz2OItomEpAiqOjlVQ75EKKT8P+3OSliVC130oQkMzPRYAT+24XGlSq2Pc7NJZYHF2vVp02U= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=SnCP5QjE; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com 
header.i=@intel.com header.b="SnCP5QjE" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778315845; x=1809851845; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=keyvAnRPIGpgf0AfD8aSWj5V+z5abmGDkNnLctGyMoQ=; b=SnCP5QjEgW/QVT5znygiYSN2tT2CBg6H51aV1zoiN5MHdoM3Ym5FLM/O k5+J79H+RaR3rq696/wODFfluTt+U7ub4kkK7f/j6GwY2oNF6hM0gl9wD U6DzorDXYs3dPCjSTUSBasb+Gfh4oZaFgsiBhUFxeYk8y3tCRgNKF+Juz kllQqqB2Luzj2fZ5eGLZdgmiheHWU5YEOOuMzoRMJ5rnH+sxmV7nvwpfi ofnT/vxooS709KfqxOG29JhM14Bi0aESlKnHbMskKp4BW7EfO2El93vQc ZNZZ/4aHj97nzr4k3PaC3SHF3s0gRAqOsmfkQuctLwjAOk+ZF3Qnyoisg g==; X-CSE-ConnectionGUID: g8G0V23RSJS8f2YRm5UyMw== X-CSE-MsgGUID: dH8OmJ9TQDKik3yVp2nhLw== X-IronPort-AV: E=McAfee;i="6800,10657,11780"; a="79005357" X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="79005357" Received: from fmviesa004.fm.intel.com ([10.60.135.144]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:37:25 -0700 X-CSE-ConnectionGUID: Y1iF8KjORRKyBj91+XcsJw== X-CSE-MsgGUID: OtYW7KwtTRuE+IldcNHfNQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="238784820" Received: from yzhao56-desk.sh.intel.com ([10.239.47.19]) by fmviesa004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:37:22 -0700 From: Yan Zhao To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com Subject: [PATCH v2 14/15] KVM: x86: Move error handling inside free_external_spt() Date: Sat, 9 May 2026 15:57:30 +0800 Message-ID: <20260509075730.4354-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20260509075201.4077-1-yan.y.zhao@intel.com> References: <20260509075201.4077-1-yan.y.zhao@intel.com> 
Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Move the logic for TDX's specific need to leak pages when reclaim fails inside the free_external_spt() op, so this can be done in TDX specific code and not the generic MMU. Do this by passing the "sp" in instead of the external page table pointer. This way TDX code can set sp->external_spt to NULL. Since the error is now handled internally in TDX code (by triggering KVM_BUG_ON() or TDX_BUG_ON_3(), which warn and stop the VM on any error), change the op to return void. This way it also operates like a normal free in that success is guaranteed from the caller's perspective. Opportunistically, drop the unused level and gfn args while adjusting the sp arg. [ Rick: Re-wrote log and massaged op name ] [ Yan: Updated patch log/function comment, dropped unused param in op ] Signed-off-by: Sean Christopherson Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao --- MMU_refactors v2: - Fixed typo in the patch log. (Binbin) - Dropped unused param gfn. (Binbin) - Mentioned that failure is not handled silently in the patch log. (Binbin) - Added expected lock and valid scenarios in function comment of tdx_sept_free_private_spt(). 
(Yan/Rick) --- arch/x86/include/asm/kvm-x86-ops.h | 2 +- arch/x86/include/asm/kvm_host.h | 3 +-- arch/x86/kvm/mmu/tdp_mmu.c | 13 ++----------- arch/x86/kvm/vmx/tdx.c | 28 ++++++++++++++-------------- 4 files changed, 18 insertions(+), 28 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index ed348c6dd445..10ccf6ea9d9a 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -96,7 +96,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr) KVM_X86_OP_OPTIONAL_RET0(get_mt_mask) KVM_X86_OP(load_mmu_pgd) KVM_X86_OP_OPTIONAL_RET0(set_external_spte) -KVM_X86_OP_OPTIONAL_RET0(free_external_spt) +KVM_X86_OP_OPTIONAL(free_external_spt) KVM_X86_OP(has_wbinvd_exit) KVM_X86_OP(get_l2_tsc_offset) KVM_X86_OP(get_l2_tsc_multiplier) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index c62a14623dcc..6b28dd387bc6 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1896,8 +1896,7 @@ struct kvm_x86_ops { u64 new_spte, enum pg_level level); =20 /* Update external page tables for page table about to be freed. */ - int (*free_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level, - void *external_spt); + void (*free_external_spt)(struct kvm *kvm, struct kvm_mmu_page *sp); =20 =20 bool (*has_wbinvd_exit)(void); diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 5cc2e948610b..a847a8f09bc6 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -455,17 +455,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_pte= p_t pt, bool shared) handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared); } =20 - if (is_mirror_sp(sp) && - WARN_ON(kvm_x86_call(free_external_spt)(kvm, base_gfn, sp->role.level, - sp->external_spt))) { - /* - * Failed to free page table page in mirror page table and - * there is nothing to do further. 
- * Intentionally leak the page to prevent the kernel from - * accessing the encrypted page. - */ - sp->external_spt =3D NULL; - } + if (is_mirror_sp(sp)) + kvm_x86_call(free_external_spt)(kvm, sp); =20 call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback); } diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 5a7f304e14af..9431bc443d50 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1849,27 +1849,27 @@ static int tdx_sept_set_private_spte(struct kvm *kv= m, gfn_t gfn, u64 old_spte, return tdx_sept_map_leaf_spte(kvm, gfn, level, new_spte); } =20 -static int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn, - enum pg_level level, void *private_spt) +/* + * Handle changes for non-leaf SPTEs from present to non-present. + * Must be under exclusive mmu_lock and cannot fail. + */ +static void tdx_sept_free_private_spt(struct kvm *kvm, struct kvm_mmu_page= *sp) { - struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); - /* - * free_external_spt() is only called after hkid is freed when TD is - * tearing down. * KVM doesn't (yet) zap page table pages in mirror page table while * TD is active, though guest pages mapped in mirror page table could be * zapped during TD is active, e.g. for shared <-> private conversion * and slot move/deletion. + * + * In other words, KVM should only free mirror page tables after the + * TD's hkid is freed, when the TD is being torn down. + * + * If the S-EPT PTE can't be removed for any reason, intentionally leak + * the page to prevent the kernel from accessing the encrypted page. */ - if (KVM_BUG_ON(is_hkid_assigned(kvm_tdx), kvm)) - return -EIO; - - /* - * The HKID assigned to this TD was already freed and cache was - * already flushed. We don't have to flush again. 
- */ - return tdx_reclaim_page(virt_to_page(private_spt)); + if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) || + tdx_reclaim_page(virt_to_page(sp->external_spt))) + sp->external_spt =3D NULL; } =20 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, --=20 2.43.2 From nobody Sat Jun 13 06:01:23 2026 Received: from mgamail.intel.com (mgamail.intel.com [198.175.65.20]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 3F9C338CFF8; Sat, 9 May 2026 08:37:35 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=198.175.65.20 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315856; cv=none; b=JqWA/V5T9/fqL2GlUAhwGGtF3r/MAtIae7+o6sOeOWjTgBKSAMqXCBhK+LSb6SPdSZLasofcj0Rox9xmyIotbxmPbHcHbBQZRpzatncUes338n59ydYG0tL/5EUm3Z4NiwqQHkVrq4m51heyBE32Cv8aVDZ2bBSRa9ED8T8nq2Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1778315856; c=relaxed/simple; bh=VNGptNcnug1Kynfoow7xYKFTwBXMjzkYDEZiM53td6Q=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=KmrApDEH/cqJDBAIiliClMx7GL+EbxfdgUy3yCWNSLBreXCVxO6LpL5Pl8RhgBO5uHXBYF6oWP+R+dMENE0Hz0f3PRat0ekcd+fffRCw2t7OwH/EnoRp54JaDdv31h25XZVJiLXySnaGKw5CzcO/tRfoSF4LMNyW2SKXjdypx+E= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Q0LXPy6l; arc=none smtp.client-ip=198.175.65.20 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Q0LXPy6l" DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1778315855; x=1809851855; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=VNGptNcnug1Kynfoow7xYKFTwBXMjzkYDEZiM53td6Q=; b=Q0LXPy6l1buAUbrvkVZY7uDdAJIXjCxuNOK2gKy9qCk1IhU2vOwRJJ8W jsrW7xjCDC6w7lLBoA6CktvWPZFQYXxZ0F2iPBHuKjjTurSlkGQoOyBxI jSoFXuaTuzLDAjhLqBbK4D+j3e3d92dm3MrXKqZqj+qNhwN1pLEW/Tqjk ITOH9TwPe007S0ii/0FxIoykVBOUSJE+uRpq6+aCvk0Yvpaoy1JrPJW43 79yDXdJd3F8GaVOQ5WWUabemhhF5M/udEFi2scAXH0idbmoLtMP3fCNOq 7UPxmFEMp680ds3FhEB2bOHqYyYXrw0C2Tq4ggGM/Tbv6vBVtoILi5+js w==; X-CSE-ConnectionGUID: HQq9KIx+TKyehk/saQKEgw== X-CSE-MsgGUID: scw4E+Q0QgatQ3vkELchIA== X-IronPort-AV: E=McAfee;i="6800,10657,11780"; a="79005364" X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="79005364" Received: from fmviesa004.fm.intel.com ([10.60.135.144]) by orvoesa112.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:37:35 -0700 X-CSE-ConnectionGUID: Ffebmn3jRpWQjYQpVqfEXQ== X-CSE-MsgGUID: Gn750mfYQ5CnXlfBjmQRFQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,225,1770624000"; d="scan'208";a="238784850" Received: from yzhao56-desk.sh.intel.com ([10.239.47.19]) by fmviesa004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 May 2026 01:37:32 -0700 From: Yan Zhao To: seanjc@google.com, pbonzini@redhat.com, kvm@vger.kernel.org, rick.p.edgecombe@intel.com, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, kai.huang@intel.com, binbin.wu@linux.intel.com, xiaoyao.li@intel.com, yan.y.zhao@intel.com Subject: [PATCH v2 15/15] KVM: TDX: Move external page table freeing to TDX code Date: Sat, 9 May 2026 15:57:40 +0800 Message-ID: <20260509075740.4371-1-yan.y.zhao@intel.com> X-Mailer: git-send-email 2.43.2 In-Reply-To: <20260509075201.4077-1-yan.y.zhao@intel.com> References: <20260509075201.4077-1-yan.y.zhao@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: 
List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Move the freeing of external page tables into the reclaim operation that lives in TDX code. The TDP MMU supports traversing the TDP without holding locks. Page tables need to be freed via RCU so that a lockless walker can never touch a page table that has already been freed. While none of these lockless walk operations actually happen for the mirror page table, the TDP MMU nonetheless frees the mirror page table in the same way, and (because it's a handy place to plug it in) the external page table as well. However, the external page table definitely can't be walked once the page table pages are reclaimed from the TDX module. The TDX module releases the page for the host VMM to use, so this RCU-time free is unnecessary for the external page table. So move the free_page() call to TDX code. Create a tdp_mmu_free_unused_sp() to allow for freeing external page tables that have never left the TDP MMU code (i.e. don't need to be freed in a special way). Link: https://lore.kernel.org/kvm/aYpjNrtGmogNzqwT@google.com/ Not-yet-Signed-off-by: Sean Christopherson [Based on a diff by Sean, added log] Signed-off-by: Rick Edgecombe Signed-off-by: Yan Zhao --- MMU_refactors v2: - Fixed typos in the patch log. (Yan, Kai) - Still kept "Not-yet-Signed-off-by" tag. Sean, please change it to SoB if the patch looks good to you. - Updated the code comment in tdx_sept_free_private_spt(): invoking free_page() to free S-EPT page in tdx_sept_free_private_spt() is only because RCU-time free is unnecessary, not because it can't be performed from RCU callbacks. 
(Yan) --- arch/x86/kvm/mmu/tdp_mmu.c | 16 +++++++++++----- arch/x86/kvm/vmx/tdx.c | 11 ++++++++++- 2 files changed, 21 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index a847a8f09bc6..bb18e9e61542 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -53,13 +53,18 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) rcu_barrier(); } =20 -static void tdp_mmu_free_sp(struct kvm_mmu_page *sp) +static void __tdp_mmu_free_sp(struct kvm_mmu_page *sp) { - free_page((unsigned long)sp->external_spt); free_page((unsigned long)sp->spt); kmem_cache_free(mmu_page_header_cache, sp); } =20 +static void tdp_mmu_free_unused_sp(struct kvm_mmu_page *sp) +{ + free_page((unsigned long)sp->external_spt); + __tdp_mmu_free_sp(sp); +} + /* * This is called through call_rcu in order to free TDP page table memory * safely with respect to other kernel threads that may be operating on @@ -73,7 +78,8 @@ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head = *head) struct kvm_mmu_page *sp =3D container_of(head, struct kvm_mmu_page, rcu_head); =20 - tdp_mmu_free_sp(sp); + WARN_ON_ONCE(sp->external_spt); + __tdp_mmu_free_sp(sp); } =20 void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root) @@ -1266,7 +1272,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm= _page_fault *fault) * failed, e.g. because a different task modified the SPTE. */ if (r) { - tdp_mmu_free_sp(sp); + tdp_mmu_free_unused_sp(sp); goto retry; } =20 @@ -1577,7 +1583,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *= kvm, * installs its own sp in place of the last sp we tried to split. 
*/ if (sp) - tdp_mmu_free_sp(sp); + tdp_mmu_free_unused_sp(sp); =20 return 0; } diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 9431bc443d50..2539107e0ad3 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1869,7 +1869,16 @@ static void tdx_sept_free_private_spt(struct kvm *kv= m, struct kvm_mmu_page *sp) */ if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) || tdx_reclaim_page(virt_to_page(sp->external_spt))) - sp->external_spt =3D NULL; + goto out; + + /* + * Immediately free the S-EPT page because RCU-time free is unnecessary + * after TDH.PHYMEM.PAGE.RECLAIM ensures there are no outstanding + * readers. + */ + free_page((unsigned long)sp->external_spt); +out: + sp->external_spt =3D NULL; } =20 void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, --=20 2.43.2
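The ownership rules that the last two patches set up for sp->external_spt can be summarized with a small standalone C model. This is a sketch under stated assumptions rather than the kernel implementation: the sketch_* names, the page-state enum and the boolean failure knobs are hypothetical, pages are modeled as state flags instead of real allocations, and the WARN_ON_ONCE() in the RCU callback is modeled as a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page model: a state flag instead of a real allocation. */
enum sketch_page_state {
	SKETCH_PAGE_IN_USE,
	SKETCH_PAGE_FREED,
	SKETCH_PAGE_LEAKED,
};

struct sketch_page {
	enum sketch_page_state state;
};

struct sketch_sp {
	struct sketch_page spt;           /* mirror page table page */
	struct sketch_page ext;           /* backing store for the S-EPT page */
	struct sketch_page *external_spt; /* NULL once freed or leaked */
};

/* Counts how often the RCU callback would have hit WARN_ON_ONCE(). */
int sketch_rcu_warnings;

void sketch_init_sp(struct sketch_sp *sp)
{
	sp->spt.state = SKETCH_PAGE_IN_USE;
	sp->ext.state = SKETCH_PAGE_IN_USE;
	sp->external_spt = &sp->ext;
}

/* Like free_page(), a NULL page is a no-op; the assert catches double frees. */
static void sketch_free_page(struct sketch_page *page)
{
	if (!page)
		return;
	assert(page->state == SKETCH_PAGE_IN_USE);
	page->state = SKETCH_PAGE_FREED;
}

/* Models __tdp_mmu_free_sp(): only the mirror page is freed here. */
static void sketch_free_sp_common(struct sketch_sp *sp)
{
	sketch_free_page(&sp->spt);
}

/* Models tdp_mmu_free_unused_sp(): the S-EPT page never reached TDX code. */
void sketch_free_unused_sp(struct sketch_sp *sp)
{
	sketch_free_page(sp->external_spt);
	sketch_free_sp_common(sp);
}

/*
 * Models tdx_sept_free_private_spt(): free immediately after a successful
 * reclaim, otherwise leak the page on purpose. Either way the pointer is
 * cleared so the later RCU-time free never sees it.
 */
void sketch_free_external_spt(struct sketch_sp *sp, bool hkid_assigned,
			      bool reclaim_fails)
{
	if (hkid_assigned || reclaim_fails)
		sp->external_spt->state = SKETCH_PAGE_LEAKED;
	else
		sketch_free_page(sp->external_spt);

	sp->external_spt = NULL;
}

/* Models tdp_mmu_free_sp_rcu_callback() after this series. */
void sketch_rcu_free(struct sketch_sp *sp)
{
	if (sp->external_spt)
		sketch_rcu_warnings++;
	sketch_free_sp_common(sp);
}
```

The design point the model tries to capture is that the TDX-side hook is the only place that decides between freeing and leaking the S-EPT page, so the generic RCU callback only needs to verify that the pointer was already cleared.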