From nobody Mon Feb 9 05:58:28 2026
Reply-To: Sean Christopherson
Date: Wed, 28 Jan 2026 17:14:37 -0800
In-Reply-To: <20260129011517.3545883-1-seanjc@google.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260129011517.3545883-1-seanjc@google.com>
X-Mailer: git-send-email
2.53.0.rc1.217.geba53bf80e-goog
Message-ID: <20260129011517.3545883-6-seanjc@google.com>
Subject: [RFC PATCH v5 05/45] KVM: TDX: Drop kvm_x86_ops.link_external_spt(), use .set_external_spte() for all
From: Sean Christopherson
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, Kiryl Shutsemau, Sean Christopherson, Paolo Bonzini
Cc: linux-kernel@vger.kernel.org, linux-coco@lists.linux.dev, kvm@vger.kernel.org, Kai Huang, Rick Edgecombe, Yan Zhao, Vishal Annapurve, Ackerley Tng, Sagi Shahar, Binbin Wu, Xiaoyao Li, Isaku Yamahata
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Drop the dedicated .link_external_spt() for linking non-leaf S-EPT pages,
and instead funnel everything through .set_external_spte(). Using separate
hooks doesn't help prevent TDP MMU details from bleeding into TDX, and vice
versa; to the contrary, dedicated callbacks will result in _more_ pollution
when hugepage support is added, e.g. will require the TDP MMU to know
details about the splitting rules for TDX that aren't all that relevant to
the TDP MMU.

Ideally, KVM would provide a single pair of hooks to set S-EPT entries,
one hook for setting SPTEs under write-lock and another for setting SPTEs
under read-lock (e.g. to ensure the entire operation is "atomic", to allow
for failure, etc.). Sadly, TDX's requirement that all child S-EPT entries
are removed before the parent makes that impractical: the TDP MMU
deliberately prunes non-leaf SPTEs and _then_ processes their children,
thus making it quite important for the TDP MMU to differentiate between
zapping leaf and non-leaf S-EPT entries.

However, that's the _only_ case that's truly special, and even that case
could be shoehorned into a single hook; it just wouldn't be a net positive.

Signed-off-by: Sean Christopherson
Reviewed-by: Rick Edgecombe
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 -
 arch/x86/include/asm/kvm_host.h    |  3 --
 arch/x86/kvm/mmu/tdp_mmu.c         | 37 +++---------------
 arch/x86/kvm/vmx/tdx.c             | 61 ++++++++++++++++++++----------
 4 files changed, 48 insertions(+), 54 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index c18a033bee7e..57eb1f4832ae 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -94,7 +94,6 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
 KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
 KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
 KVM_X86_OP(load_mmu_pgd)
-KVM_X86_OP_OPTIONAL_RET0(link_external_spt)
 KVM_X86_OP_OPTIONAL_RET0(set_external_spte)
 KVM_X86_OP_OPTIONAL_RET0(free_external_spt)
 KVM_X86_OP_OPTIONAL(remove_external_spte)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e441f270f354..d12ca0f8a348 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1853,9 +1853,6 @@ struct kvm_x86_ops {
 	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			     int root_level);
 
-	/* Update external mapping with page table link. */
-	int (*link_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
-				 void *external_spt);
 	/* Update the external page table from spte getting set. */
 	int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 				 u64 mirror_spte);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 0feda295859a..56ad056e6042 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -495,31 +495,17 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
 
-static void *get_external_spt(gfn_t gfn, u64 new_spte, int level)
-{
-	if (is_shadow_present_pte(new_spte) && !is_last_spte(new_spte, level)) {
-		struct kvm_mmu_page *sp = spte_to_child_sp(new_spte);
-
-		WARN_ON_ONCE(sp->role.level + 1 != level);
-		WARN_ON_ONCE(sp->gfn != gfn);
-		return sp->external_spt;
-	}
-
-	return NULL;
-}
-
 static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sptep,
 						  gfn_t gfn, u64 *old_spte,
 						  u64 new_spte, int level)
 {
-	bool was_present = is_shadow_present_pte(*old_spte);
-	bool is_present = is_shadow_present_pte(new_spte);
-	bool is_leaf = is_present && is_last_spte(new_spte, level);
-	int ret = 0;
-
-	KVM_BUG_ON(was_present, kvm);
+	int ret;
 
 	lockdep_assert_held(&kvm->mmu_lock);
+
+	if (KVM_BUG_ON(is_shadow_present_pte(*old_spte), kvm))
+		return -EIO;
+
 	/*
 	 * We need to lock out other updates to the SPTE until the external
 	 * page table has been modified. Use FROZEN_SPTE similar to
@@ -528,18 +514,7 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
 	if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
 		return -EBUSY;
 
-	/*
-	 * Use different call to either set up middle level
-	 * external page table, or leaf.
-	 */
-	if (is_leaf) {
-		ret = kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte);
-	} else {
-		void *external_spt = get_external_spt(gfn, new_spte, level);
-
-		KVM_BUG_ON(!external_spt, kvm);
-		ret = kvm_x86_call(link_external_spt)(kvm, gfn, level, external_spt);
-	}
+	ret = kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte);
 	if (ret)
 		__kvm_tdp_mmu_write_spte(sptep, *old_spte);
 	else
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 5688c77616e3..30494f9ceb31 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1664,18 +1664,58 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
+static struct page *tdx_spte_to_external_spt(struct kvm *kvm, gfn_t gfn,
+					     u64 new_spte, enum pg_level level)
+{
+	struct kvm_mmu_page *sp = spte_to_child_sp(new_spte);
+
+	if (KVM_BUG_ON(!sp->external_spt, kvm) ||
+	    KVM_BUG_ON(sp->role.level + 1 != level, kvm) ||
+	    KVM_BUG_ON(sp->gfn != gfn, kvm))
+		return NULL;
+
+	return virt_to_page(sp->external_spt);
+}
+
+static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
+				     enum pg_level level, u64 mirror_spte)
+{
+	gpa_t gpa = gfn_to_gpa(gfn);
+	u64 err, entry, level_state;
+	struct page *external_spt;
+
+	external_spt = tdx_spte_to_external_spt(kvm, gfn, mirror_spte, level);
+	if (!external_spt)
+		return -EIO;
+
+	err = tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, level, external_spt,
+			       &entry, &level_state);
+	if (unlikely(tdx_operand_busy(err)))
+		return -EBUSY;
+
+	if (TDX_BUG_ON_2(err, TDH_MEM_SEPT_ADD, entry, level_state, kvm))
+		return -EIO;
+
+	return 0;
+}
+
 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 				     enum pg_level level, u64 mirror_spte)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
 
+	if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm))
+		return -EIO;
+
+	if (!is_last_spte(mirror_spte, level))
+		return tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte);
+
 	/* TODO: handle large pages. */
 	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
 		return -EIO;
 
-	WARN_ON_ONCE(!is_shadow_present_pte(mirror_spte) ||
-		     (mirror_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
+	WARN_ON_ONCE((mirror_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
 
 	/*
 	 * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory()
@@ -1695,23 +1735,7 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 	return tdx_mem_page_aug(kvm, gfn, level, pfn);
 }
 
-static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
-				     enum pg_level level, void *private_spt)
-{
-	gpa_t gpa = gfn_to_gpa(gfn);
-	struct page *page = virt_to_page(private_spt);
-	u64 err, entry, level_state;
 
-	err = tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, level, page, &entry,
-			       &level_state);
-	if (unlikely(tdx_operand_busy(err)))
-		return -EBUSY;
-
-	if (TDX_BUG_ON_2(err, TDH_MEM_SEPT_ADD, entry, level_state, kvm))
-		return -EIO;
-
-	return 0;
-}
 
 /*
  * Ensure shared and private EPTs to be flushed on all vCPUs.
@@ -3592,7 +3616,6 @@ void __init tdx_hardware_setup(void)
 	 */
 	vt_x86_ops.vm_size = max_t(unsigned int, vt_x86_ops.vm_size, sizeof(struct kvm_tdx));
 
-	vt_x86_ops.link_external_spt = tdx_sept_link_private_spt;
 	vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
 	vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
 	vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
-- 
2.53.0.rc1.217.geba53bf80e-goog
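
As a reading aid for the dispatch described in the changelog, here is a
minimal userspace sketch in C of a single hook that routes non-leaf entries
to a "link table" helper and leaf entries to a "map page" helper. It is not
KVM code: every sketch_* name, the two flag bits, and the level numbering
are assumptions made up for illustration. The real checks live in
tdx_sept_set_private_spte() in the diff above.

```c
/* Illustrative userspace sketch only; none of these names exist in KVM. */
#include <errno.h>
#include <stdint.h>

#define SKETCH_PRESENT	(1ull << 0)	/* hypothetical "entry is present" bit */
#define SKETCH_LEAF	(1ull << 1)	/* hypothetical "entry maps a page" bit */
#define SKETCH_LEVEL_4K	1		/* hypothetical smallest page level */

enum sketch_op { SKETCH_OP_NONE, SKETCH_OP_LINK_TABLE, SKETCH_OP_MAP_PAGE };

/* Records which backend path the hook took, purely for demonstration. */
static enum sketch_op sketch_last_op = SKETCH_OP_NONE;

/* Stand-in for linking a non-leaf page table page. */
static int sketch_link_table(uint64_t gfn, int level, uint64_t spte)
{
	(void)gfn; (void)level; (void)spte;
	sketch_last_op = SKETCH_OP_LINK_TABLE;
	return 0;
}

/* Stand-in for mapping a leaf page. */
static int sketch_map_page(uint64_t gfn, int level, uint64_t spte)
{
	(void)gfn; (void)level; (void)spte;
	sketch_last_op = SKETCH_OP_MAP_PAGE;
	return 0;
}

/*
 * Single entry point: the caller passes the new entry and does not need to
 * know whether it is a leaf.  The backend decides, mirroring the shape of
 * the patch (present check, then non-leaf dispatch, then leaf-only checks).
 */
static int sketch_set_external_spte(uint64_t gfn, int level, uint64_t spte)
{
	if (!(spte & SKETCH_PRESENT))
		return -EIO;

	if (!(spte & SKETCH_LEAF))
		return sketch_link_table(gfn, level, spte);

	/* Leaf entries are only handled at the smallest level in this sketch. */
	if (level != SKETCH_LEVEL_4K)
		return -EIO;

	return sketch_map_page(gfn, level, spte);
}
```

A caller would invoke sketch_set_external_spte() for every new entry and
only inspect the return value. Under this sketch's assumptions, that is the
sense in which the leaf versus non-leaf decision moves out of the caller and
into the backend.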