From nobody Thu Apr 2 14:14:44 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.7]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 190A7391846; Fri, 27 Mar 2026 20:14:28 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.7 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774642469; cv=none; b=Pqxru7mzapQuS+DS2+wGnaCSWM6ZO1G5L6VFsZRMv4Tg0AnrC6hKdEtJ3KiAGVcklG6XzVUP0taoEDITuz5Ujp/IG3RtXzO9rmmLHbxQl/FMOhw/dVFTTzEKg7jAOtUnLaGXEXistz08s+WnLwrGjWQ6/IboaWFdz291iB6k36s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1774642469; c=relaxed/simple; bh=LHUZQ0C68NCKk4sQ2cqrZIJBg5G6QZSyv4anzWHlrH0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tir/Cn14KyKGCq61hsPcAOpYLcs0yJptPvImeVPGMzwsEN8Yd5Mds/0NEu9t12qLRUkakuv+Sm2/a75q+13ZBkuVMejrki4Vr6HDvDzKP+DqQBSDRORKP7Yxko/gYUpz0ASDrBrkbuG93mQJGquoYuSqZc0qAe8kEkqEbeM3cdk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=BbxUnPIq; arc=none smtp.client-ip=192.198.163.7 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="BbxUnPIq" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1774642468; x=1806178468; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=LHUZQ0C68NCKk4sQ2cqrZIJBg5G6QZSyv4anzWHlrH0=; 
b=BbxUnPIqttpr2QcCp821ptvOnY/bmtV7Oxfx3ipjx/M/LUHG886dgmod rdEvXJT2460vrE35tKFLKFLQ7svQCcboKeY0A/NCIhDvbREcw4Appwkf8 +mx4sUu400D932Z1nYEV30hlss3kn8wEh+HTdLUfyGinwVcyqjjJ0meEq BnPx/C3gmCsxABpZMirnbJrZAx9c1FVLvCa4R0F2h9uwJbPI0JVPBAJB1 5wohWGvwPeR1x8E8tBEf/96bKVSLyQKzbyP3fLtaPBtDS4vZgmVGySmhj Ws15uBboFY2iSWM3HOTpGca7SEc7fHChxWyMP4n4WkmZfvX9ssmO1L5YU Q==; X-CSE-ConnectionGUID: CkyQtUOZT8maT7T0a52qAA== X-CSE-MsgGUID: CRR9BnT4QBuRM99xwDHyPg== X-IronPort-AV: E=McAfee;i="6800,10657,11742"; a="101182705" X-IronPort-AV: E=Sophos;i="6.23,144,1770624000"; d="scan'208";a="101182705" Received: from orviesa002.jf.intel.com ([10.64.159.142]) by fmvoesa101.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Mar 2026 13:14:27 -0700 X-CSE-ConnectionGUID: Qw0RE6w2TrqqcCmEU2vh5Q== X-CSE-MsgGUID: UJ4m9BzHQ9Odteo2sZDc7g== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,144,1770624000"; d="scan'208";a="255922889" Received: from rpedgeco-desk.jf.intel.com ([10.88.27.139]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Mar 2026 13:14:26 -0700 From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com Subject: [PATCH 01/17] x86/tdx: Use pg_level in TDX APIs, not the TDX-Module's 0-based level Date: Fri, 27 Mar 2026 13:14:05 -0700 Message-ID: <20260327201421.2824383-2-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com> References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Rework the TDX APIs to take the kernel's 1-based pg_level enum, not the TDX-Module's 
0-based level. The APIs are _kernel_ APIs, not TDX-Module APIs, and the kernel (and KVM) uses "enum pg_level" literally everywhere. Using "enum pg_level" eliminates ambiguity when looking at the APIs (it's NOT clear that "int level" refers to the TDX-Module's level), and will allow for using existing helpers like page_level_size() when support for hugepages is added to the S-EPT APIs. No functional change intended. Cc: Kai Huang Cc: Dave Hansen Cc: Rick Edgecombe Cc: Yan Zhao Cc: Vishal Annapurve Cc: Ackerley Tng Acked-by: Kiryl Shutsemau Reviewed-by: Kai Huang Tested-by: Kai Huang Reviewed-by: Rick Edgecombe Tested-by: Rick Edgecombe Signed-off-by: Sean Christopherson Signed-off-by: Rick Edgecombe --- arch/x86/include/asm/tdx.h | 14 ++++---------- arch/x86/kvm/vmx/tdx.c | 11 ++++------- arch/x86/virt/vmx/tdx/tdx.c | 26 ++++++++++++++++++-------- 3 files changed, 26 insertions(+), 25 deletions(-) diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index a149740b24e8..c140ddde59ff 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -187,19 +187,13 @@ static inline u64 mk_keyed_paddr(u16 hkid, struct pag= e *page) return ret; } =20 -static inline int pg_level_to_tdx_sept_level(enum pg_level level) -{ - WARN_ON_ONCE(level =3D=3D PG_LEVEL_NONE); - return level - 1; -} - u64 tdh_vp_enter(struct tdx_vp *vp, struct tdx_module_args *args); u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page); u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, struct page *page, struct= page *source, u64 *ext_err1, u64 *ext_err2); -u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, int level, struct page *p= age, u64 *ext_err1, u64 *ext_err2); +u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, enum pg_level level, stru= ct page *page, u64 *ext_err1, u64 *ext_err2); u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *tdcx_page); -u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *p= age, u64 *ext_err1, u64 *ext_err2); -u64 
tdh_mem_range_block(struct tdx_td *td, u64 gpa, int level, u64 *ext_er= r1, u64 *ext_err2); +u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, enum pg_level level, stru= ct page *page, u64 *ext_err1, u64 *ext_err2); +u64 tdh_mem_range_block(struct tdx_td *td, u64 gpa, enum pg_level level, u= 64 *ext_err1, u64 *ext_err2); u64 tdh_mng_key_config(struct tdx_td *td); u64 tdh_mng_create(struct tdx_td *td, u16 hkid); u64 tdh_vp_create(struct tdx_td *td, struct tdx_vp *vp); @@ -215,7 +209,7 @@ u64 tdh_vp_rd(struct tdx_vp *vp, u64 field, u64 *data); u64 tdh_vp_wr(struct tdx_vp *vp, u64 field, u64 data, u64 mask); u64 tdh_phymem_page_reclaim(struct page *page, u64 *tdx_pt, u64 *tdx_owner= , u64 *tdx_size); u64 tdh_mem_track(struct tdx_td *tdr); -u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_er= r1, u64 *ext_err2); +u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, enum pg_level level, u= 64 *ext_err1, u64 *ext_err2); u64 tdh_phymem_cache_wb(bool resume); u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td); u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page); diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 1e47c194af53..38e7b6fa8664 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1638,14 +1638,13 @@ static int tdx_mem_page_add(struct kvm *kvm, gfn_t = gfn, enum pg_level level, static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn, enum pg_level level, kvm_pfn_t pfn) { - int tdx_level =3D pg_level_to_tdx_sept_level(level); struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); struct page *page =3D pfn_to_page(pfn); gpa_t gpa =3D gfn_to_gpa(gfn); u64 entry, level_state; u64 err; =20 - err =3D tdh_mem_page_aug(&kvm_tdx->td, gpa, tdx_level, page, &entry, &lev= el_state); + err =3D tdh_mem_page_aug(&kvm_tdx->td, gpa, level, page, &entry, &level_s= tate); if (unlikely(tdx_operand_busy(err))) return -EBUSY; =20 @@ -1689,12 +1688,11 @@ static int tdx_sept_set_private_spte(struct kvm *kv= m, gfn_t gfn, static int 
tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn, enum pg_level level, void *private_spt) { - int tdx_level =3D pg_level_to_tdx_sept_level(level); gpa_t gpa =3D gfn_to_gpa(gfn); struct page *page =3D virt_to_page(private_spt); u64 err, entry, level_state; =20 - err =3D tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, tdx_level, page, &ent= ry, + err =3D tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, level, page, &entry, &level_state); if (unlikely(tdx_operand_busy(err))) return -EBUSY; @@ -1778,7 +1776,6 @@ static void tdx_sept_remove_private_spte(struct kvm *= kvm, gfn_t gfn, enum pg_level level, u64 mirror_spte) { struct page *page =3D pfn_to_page(spte_to_pfn(mirror_spte)); - int tdx_level =3D pg_level_to_tdx_sept_level(level); struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); gpa_t gpa =3D gfn_to_gpa(gfn); u64 err, entry, level_state; @@ -1798,7 +1795,7 @@ static void tdx_sept_remove_private_spte(struct kvm *= kvm, gfn_t gfn, return; =20 err =3D tdh_do_no_vcpus(tdh_mem_range_block, kvm, &kvm_tdx->td, gpa, - tdx_level, &entry, &level_state); + level, &entry, &level_state); if (TDX_BUG_ON_2(err, TDH_MEM_RANGE_BLOCK, entry, level_state, kvm)) return; =20 @@ -1814,7 +1811,7 @@ static void tdx_sept_remove_private_spte(struct kvm *= kvm, gfn_t gfn, * Race with TDH.VP.ENTER due to (0-step mitigation) and Guest TDCALLs. 
*/ err =3D tdh_do_no_vcpus(tdh_mem_page_remove, kvm, &kvm_tdx->td, gpa, - tdx_level, &entry, &level_state); + level, &entry, &level_state); if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_REMOVE, entry, level_state, kvm)) return; =20 diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index cb9b3210ab71..a6e77afafa79 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1568,6 +1568,12 @@ static void tdx_clflush_page(struct page *page) clflush_cache_range(page_to_virt(page), PAGE_SIZE); } =20 +static int pg_level_to_tdx_sept_level(enum pg_level level) +{ + WARN_ON_ONCE(level =3D=3D PG_LEVEL_NONE); + return level - 1; +} + noinstr u64 tdh_vp_enter(struct tdx_vp *td, struct tdx_module_args *args) { args->rcx =3D td->tdvpr_pa; @@ -1608,10 +1614,11 @@ u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, st= ruct page *page, struct page } EXPORT_SYMBOL_FOR_KVM(tdh_mem_page_add); =20 -u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, int level, struct page *p= age, u64 *ext_err1, u64 *ext_err2) +u64 tdh_mem_sept_add(struct tdx_td *td, u64 gpa, enum pg_level level, + struct page *page, u64 *ext_err1, u64 *ext_err2) { struct tdx_module_args args =3D { - .rcx =3D gpa | level, + .rcx =3D gpa | pg_level_to_tdx_sept_level(level), .rdx =3D tdx_tdr_pa(td), .r8 =3D page_to_phys(page), }; @@ -1639,10 +1646,11 @@ u64 tdh_vp_addcx(struct tdx_vp *vp, struct page *td= cx_page) } EXPORT_SYMBOL_FOR_KVM(tdh_vp_addcx); =20 -u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, int level, struct page *p= age, u64 *ext_err1, u64 *ext_err2) +u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, enum pg_level level, + struct page *page, u64 *ext_err1, u64 *ext_err2) { struct tdx_module_args args =3D { - .rcx =3D gpa | level, + .rcx =3D gpa | pg_level_to_tdx_sept_level(level), .rdx =3D tdx_tdr_pa(td), .r8 =3D page_to_phys(page), }; @@ -1658,10 +1666,11 @@ u64 tdh_mem_page_aug(struct tdx_td *td, u64 gpa, in= t level, struct page *page, u } 
EXPORT_SYMBOL_FOR_KVM(tdh_mem_page_aug); =20 -u64 tdh_mem_range_block(struct tdx_td *td, u64 gpa, int level, u64 *ext_er= r1, u64 *ext_err2) +u64 tdh_mem_range_block(struct tdx_td *td, u64 gpa, enum pg_level level, + u64 *ext_err1, u64 *ext_err2) { struct tdx_module_args args =3D { - .rcx =3D gpa | level, + .rcx =3D gpa | pg_level_to_tdx_sept_level(level), .rdx =3D tdx_tdr_pa(td), }; u64 ret; @@ -1874,10 +1883,11 @@ u64 tdh_mem_track(struct tdx_td *td) } EXPORT_SYMBOL_FOR_KVM(tdh_mem_track); =20 -u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_er= r1, u64 *ext_err2) +u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, enum pg_level level, + u64 *ext_err1, u64 *ext_err2) { struct tdx_module_args args =3D { - .rcx =3D gpa | level, + .rcx =3D gpa | pg_level_to_tdx_sept_level(level), .rdx =3D tdx_tdr_pa(td), }; u64 ret; --=20 2.53.0
From nobody Thu Apr 2 14:14:44 2026 From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com,
kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com Subject: [PATCH 02/17] KVM: x86/mmu: Update iter->old_spte if cmpxchg64 on mirror SPTE "fails" Date: Fri, 27 Mar 2026 13:14:06 -0700 Message-ID: <20260327201421.2824383-3-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com> References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Pass a pointer to iter->old_spte, not simply its value, when setting an external SPTE in __tdp_mmu_set_spte_atomic(), so that the iterator's value will be updated if the cmpxchg64 to freeze the mirror SPTE fails. The bug is currently benign as TDX is mutually exclusive with all paths that do "local" retry, e.g. clear_dirty_gfn_range() and wrprot_gfn_range().
Fixes: 77ac7079e66d ("KVM: x86/tdp_mmu: Propagate building mirror page tabl= es") Signed-off-by: Sean Christopherson Signed-off-by: Rick Edgecombe --- arch/x86/kvm/mmu/tdp_mmu.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 7b1102d26f9c..dbaeb80f2b64 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -509,10 +509,10 @@ static void *get_external_spt(gfn_t gfn, u64 new_spte= , int level) } =20 static int __must_check set_external_spte_present(struct kvm *kvm, tdp_pte= p_t sptep, - gfn_t gfn, u64 old_spte, + gfn_t gfn, u64 *old_spte, u64 new_spte, int level) { - bool was_present =3D is_shadow_present_pte(old_spte); + bool was_present =3D is_shadow_present_pte(*old_spte); bool is_present =3D is_shadow_present_pte(new_spte); bool is_leaf =3D is_present && is_last_spte(new_spte, level); int ret =3D 0; @@ -525,7 +525,7 @@ static int __must_check set_external_spte_present(struc= t kvm *kvm, tdp_ptep_t sp * page table has been modified. Use FROZEN_SPTE similar to * the zapping case. 
*/ - if (!try_cmpxchg64(rcu_dereference(sptep), &old_spte, FROZEN_SPTE)) + if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE)) return -EBUSY; =20 /* @@ -541,7 +541,7 @@ static int __must_check set_external_spte_present(struc= t kvm *kvm, tdp_ptep_t sp ret =3D kvm_x86_call(link_external_spt)(kvm, gfn, level, external_spt); } if (ret) - __kvm_tdp_mmu_write_spte(sptep, old_spte); + __kvm_tdp_mmu_write_spte(sptep, *old_spte); else __kvm_tdp_mmu_write_spte(sptep, new_spte); return ret; @@ -670,7 +670,7 @@ static inline int __must_check __tdp_mmu_set_spte_atomi= c(struct kvm *kvm, return -EBUSY; =20 ret =3D set_external_spte_present(kvm, iter->sptep, iter->gfn, - iter->old_spte, new_spte, iter->level); + &iter->old_spte, new_spte, iter->level); if (ret) return ret; } else { --=20 2.53.0
From nobody Thu Apr 2 14:14:44 2026 From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org Cc: linux-kernel@vger.kernel.org,
x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com Subject: [PATCH 03/17] KVM: TDX: Account all non-transient page allocations for per-TD structures Date: Fri, 27 Mar 2026 13:14:07 -0700 Message-ID: <20260327201421.2824383-4-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com> References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Account all non-transient allocations associated with a single TD (or its vCPUs), as KVM's ABI is that allocations that are active for the lifetime of a VM are accounted. Leave temporary allocations, i.e. allocations that are freed within a single function/ioctl, unaccounted, to again align with KVM's existing behavior, e.g. see commit dd103407ca31 ("KVM: X86: Remove unnecessary GFP_KERNEL_ACCOUNT for temporary variables"). Fixes: 8d032b683c29 ("KVM: TDX: create/destroy VM structure") Fixes: a50f673f25e0 ("KVM: TDX: Do TDX specific vcpu initialization") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson Signed-off-by: Rick Edgecombe --- arch/x86/kvm/vmx/tdx.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 38e7b6fa8664..01e070ec10fd 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -2384,7 +2384,7 @@ static int __tdx_td_init(struct kvm *kvm, struct td_p= arams *td_params, =20 ret =3D -ENOMEM; =20 - tdr_page =3D alloc_page(GFP_KERNEL); + tdr_page =3D alloc_page(GFP_KERNEL_ACCOUNT); if (!tdr_page) goto free_hkid; =20 @@ -2392,12 +2392,13 @@ static int __tdx_td_init(struct kvm *kvm, struct td= _params *td_params, /* TDVPS =3D TDVPR(4K page) + TDCX(multiple 4K pages), -1 for TDVPR. 
*/ kvm_tdx->td.tdcx_nr_pages =3D tdx_sysinfo->td_ctrl.tdvps_base_size / PAGE= _SIZE - 1; tdcs_pages =3D kzalloc_objs(*kvm_tdx->td.tdcs_pages, - kvm_tdx->td.tdcs_nr_pages); + kvm_tdx->td.tdcs_nr_pages, + GFP_KERNEL_ACCOUNT); if (!tdcs_pages) goto free_tdr; =20 for (i =3D 0; i < kvm_tdx->td.tdcs_nr_pages; i++) { - tdcs_pages[i] =3D alloc_page(GFP_KERNEL); + tdcs_pages[i] =3D alloc_page(GFP_KERNEL_ACCOUNT); if (!tdcs_pages[i]) goto free_tdcs; } @@ -2872,7 +2873,7 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, u6= 4 vcpu_rcx) int ret, i; u64 err; =20 - page =3D alloc_page(GFP_KERNEL); + page =3D alloc_page(GFP_KERNEL_ACCOUNT); if (!page) return -ENOMEM; tdx->vp.tdvpr_page =3D page; @@ -2885,14 +2886,14 @@ static int tdx_td_vcpu_init(struct kvm_vcpu *vcpu, = u64 vcpu_rcx) tdx->vp.tdvpr_pa =3D page_to_phys(tdx->vp.tdvpr_page); =20 tdx->vp.tdcx_pages =3D kcalloc(kvm_tdx->td.tdcx_nr_pages, sizeof(*tdx->vp= .tdcx_pages), - GFP_KERNEL); + GFP_KERNEL_ACCOUNT); if (!tdx->vp.tdcx_pages) { ret =3D -ENOMEM; goto free_tdvpr; } =20 for (i =3D 0; i < kvm_tdx->td.tdcx_nr_pages; i++) { - page =3D alloc_page(GFP_KERNEL); + page =3D alloc_page(GFP_KERNEL_ACCOUNT); if (!page) { ret =3D -ENOMEM; goto free_tdcx; --=20 2.53.0
From nobody Thu Apr 2 14:14:44 2026
From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com Subject: [PATCH 04/17] KVM: x86: Make "external SPTE" ops that can fail RET0 static calls Date: Fri, 27 Mar 2026 13:14:08 -0700 Message-ID: <20260327201421.2824383-5-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com> References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Define kvm_x86_ops .link_external_spt(), .set_external_spte(), and .free_external_spt() as RET0 static calls so that an unexpected call to a default operation doesn't consume garbage.
Fixes: 77ac7079e66d ("KVM: x86/tdp_mmu: Propagate building mirror page tabl= es") Fixes: 94faba8999b9 ("KVM: x86/tdp_mmu: Propagate tearing down mirror page = tables") Signed-off-by: Sean Christopherson Signed-off-by: Rick Edgecombe --- arch/x86/include/asm/kvm-x86-ops.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-= x86-ops.h index 3776cf5382a2..31d5c5d58ae6 100644 --- a/arch/x86/include/asm/kvm-x86-ops.h +++ b/arch/x86/include/asm/kvm-x86-ops.h @@ -95,9 +95,9 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr) KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr) KVM_X86_OP_OPTIONAL_RET0(get_mt_mask) KVM_X86_OP(load_mmu_pgd) -KVM_X86_OP_OPTIONAL(link_external_spt) -KVM_X86_OP_OPTIONAL(set_external_spte) -KVM_X86_OP_OPTIONAL(free_external_spt) +KVM_X86_OP_OPTIONAL_RET0(link_external_spt) +KVM_X86_OP_OPTIONAL_RET0(set_external_spte) +KVM_X86_OP_OPTIONAL_RET0(free_external_spt) KVM_X86_OP_OPTIONAL(remove_external_spte) KVM_X86_OP(has_wbinvd_exit) KVM_X86_OP(get_l2_tsc_offset) --=20 2.53.0
From nobody Thu Apr 2 14:14:44 2026
([10.88.27.139]) by orviesa002-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 27 Mar 2026 13:14:26 -0700 From: Rick Edgecombe To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com Subject: [PATCH 05/17] KVM: x86/tdp_mmu: Drop zapping KVM_BUG_ON() set_external_spte_present() Date: Fri, 27 Mar 2026 13:14:09 -0700 Message-ID: <20260327201421.2824383-6-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com> References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Drop some KVM_BUG_ON() that are guarding against TDP MMU attempting to propagate unsupported operations to the external set_external_spte() ops. Despite the generic naming, external TDP ops are designed completely around TDX. They hook the bare minimum of what is needed, and exclude the operations that are not supported by TDX. To help wrangle which operations are handleable by various operations, warnings and KVM_BUG_ONs exist in the code. These warnings and KVM_BUG_ON()s put the burden of understanding which operations should be forwarded to TDX code on the TDP MMU developers, who often read the code without TDX context. Future changes will transition the encapsulation of this domain knowledge to TDX code by funneling the external EPT updates through a central update mechanism. In this paradigm, central update mechanism can encapsulate the special knowledge, but will not have as much knowledge about what operation is in progress. So remove the set external SPTE based KVM_BUG_ON()s in preparation for this future change. 
Suggested-by: Sean Christopherson
Signed-off-by: Rick Edgecombe
---
 arch/x86/kvm/mmu/tdp_mmu.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index dbaeb80f2b64..0809fe8e8737 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -512,13 +512,10 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
 						  gfn_t gfn, u64 *old_spte,
 						  u64 new_spte, int level)
 {
-	bool was_present = is_shadow_present_pte(*old_spte);
 	bool is_present = is_shadow_present_pte(new_spte);
 	bool is_leaf = is_present && is_last_spte(new_spte, level);
 	int ret = 0;
 
-	KVM_BUG_ON(was_present, kvm);
-
 	lockdep_assert_held(&kvm->mmu_lock);
 	/*
 	 * We need to lock out other updates to the SPTE until the external
@@ -662,13 +659,6 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	if (is_mirror_sptep(iter->sptep) && !is_frozen_spte(new_spte)) {
 		int ret;
 
-		/*
-		 * Users of atomic zapping don't operate on mirror roots,
-		 * so don't handle it and bug the VM if it's seen.
-		 */
-		if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
-			return -EBUSY;
-
 		ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
 						&iter->old_spte, new_spte, iter->level);
 		if (ret)
-- 
2.53.0

From nobody Thu Apr 2 14:14:44 2026
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com
Subject: [PATCH 06/17] KVM: x86/tdp_mmu: Morph the !is_frozen_spte() check into a KVM_MMU_WARN_ON()
Date: Fri, 27 Mar 2026 13:14:10 -0700
Message-ID: <20260327201421.2824383-7-rick.p.edgecombe@intel.com>
In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Remove the conditional logic in __tdp_mmu_set_spte_atomic() that handles
mirror PTEs being set to the frozen value, and add a warning instead.

The mirror TDP needs to propagate PTE changes to the external TDP. This
presents a problem for atomic updates, which can't update both at once.
So a special value, FROZEN_SPTE, is used as a temporary state during
these updates to prevent concurrent operations on the PTE. If the TDP
MMU tried to install this as a long-term value, it would confuse these
updates. Despite this, __tdp_mmu_set_spte_atomic() includes a check to
handle FROZEN_SPTE being passed in as the new value. Remove this check
and turn it into a warning.

Suggested-by: Sean Christopherson
Signed-off-by: Rick Edgecombe
---
 arch/x86/kvm/mmu/tdp_mmu.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 0809fe8e8737..338957bc5109 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -656,7 +656,13 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	 */
 	WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte));
 
-	if (is_mirror_sptep(iter->sptep) && !is_frozen_spte(new_spte)) {
+	/*
+	 * FROZEN_SPTE is a temporary state and should never be set via higher
+	 * level helpers.
+	 */
+	KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
+
+	if (is_mirror_sptep(iter->sptep)) {
 		int ret;
 
 		ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
-- 
2.53.0

From nobody Thu Apr 2 14:14:44 2026
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com
Subject: [PATCH 07/17] KVM: x86/tdp_mmu: Centralize updates to present external PTEs
Date: Fri, 27 Mar 2026 13:14:11 -0700
Message-ID: <20260327201421.2824383-8-rick.p.edgecombe@intel.com>
In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Sean Christopherson

Centralize the updates to present external PTEs in the
handle_changed_spte() function.

When setting a PTE to present in the mirror page tables, the update needs
to propagate to the external page tables (in TDX parlance, the S-EPT).
Today this is handled by special mirror page table branching in
__tdp_mmu_set_spte_atomic(), which is the only place where present PTEs
are set for TDX. This keeps things running, but is a bit hacked on. The
hooks for setting present leaf PTEs are added only where TDX happens to
need them. For example, TDX does not support any of the operations that
use the non-atomic variant, tdp_mmu_set_spte(), to set present PTEs.
Since the hook is missing there, it is very hard to understand the code
from a non-TDX lens. If the reader doesn't know the TDX specifics, it
could look like the external update is missing.

In addition to being confusing, it also litters the TDP MMU with
"external" update callbacks. This is especially unfortunate because there
is already a central place to react to TDP updates:
handle_changed_spte().

Begin the process of moving towards a model where all mirror page table
updates are forwarded to TDX code, where the TDX-specific logic can live
with a more proper separation of concerns. Do this by teaching
handle_changed_spte() how to return error codes, such that it can
propagate the failures that may come from TDX external page table
updates.

Atomic mirror page table updates need to be done in a special way to
prevent concurrent updates to the mirror page table while the external
page table is updated. The mirror page table is set to the frozen PTE
value while the external version is updated. This frozen PTE dance is
currently done in __tdp_mmu_set_spte_atomic(). Hoist it up a level so
that the external update in handle_changed_spte() can be done while the
PTE is frozen.
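The freeze/update/unfreeze sequence described above can be sketched in
simplified form with C11 atomics. This is an illustrative model only, not
the kernel's code: FROZEN_SPTE_VAL, update_external(), and
set_spte_frozen() are hypothetical stand-ins for FROZEN_SPTE, the S-EPT
update, and the TDP MMU helpers.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical sentinel standing in for KVM's FROZEN_SPTE. */
#define FROZEN_SPTE_VAL ((uint64_t)1 << 59)

/* Stand-in for the external (S-EPT) update, which may fail. */
static int update_external(uint64_t new_spte)
{
	(void)new_spte;
	return 0;
}

/*
 * Freeze the PTE with a cmpxchg, perform the external update, then
 * unfreeze: install the new value on success, or restore the old value
 * on failure, so the PTE is never left frozen in perpetuity.  On a lost
 * cmpxchg race, *old_spte is refreshed with the current value, mirroring
 * try_cmpxchg64() semantics, and the caller can retry with fresh data.
 */
static int set_spte_frozen(_Atomic uint64_t *sptep, uint64_t *old_spte,
			   uint64_t new_spte)
{
	int ret;

	if (!atomic_compare_exchange_strong(sptep, old_spte, FROZEN_SPTE_VAL))
		return -1;	/* lost the race; *old_spte now holds current value */

	ret = update_external(new_spte);

	atomic_store(sptep, ret ? *old_spte : new_spte);
	return ret;
}
```

The key property of the pattern is that concurrent walkers only ever
observe the old value, the sentinel, or the new value, never a
half-completed two-table update.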
Link: https://lore.kernel.org/lkml/aYYn0nf2cayYu8e7@google.com/
Not-yet-Signed-off-by: Sean Christopherson
[Based on a diff by Sean Christopherson]
Signed-off-by: Rick Edgecombe
---
 arch/x86/kvm/mmu/tdp_mmu.c | 150 ++++++++++++++++++++++---------------
 1 file changed, 88 insertions(+), 62 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 338957bc5109..db16e81b9701 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -320,9 +320,9 @@ void kvm_tdp_mmu_alloc_root(struct kvm_vcpu *vcpu, bool mirror)
 	}
 }
 
-static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
-				u64 old_spte, u64 new_spte, int level,
-				bool shared);
+static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
+				gfn_t gfn, u64 old_spte, u64 new_spte,
+				int level, bool shared);
 
 static void tdp_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
@@ -471,8 +471,7 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 			old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte,
 							  FROZEN_SPTE, level);
 		}
-		handle_changed_spte(kvm, kvm_mmu_page_as_id(sp), gfn,
-				    old_spte, FROZEN_SPTE, level, shared);
+		handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared);
 
 		if (is_mirror_sp(sp)) {
 			KVM_BUG_ON(shared, kvm);
@@ -508,22 +507,15 @@ static void *get_external_spt(gfn_t gfn, u64 new_spte, int level)
 	return NULL;
 }
 
-static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sptep,
-						  gfn_t gfn, u64 *old_spte,
-						  u64 new_spte, int level)
+static int __must_check set_external_spte_present(struct kvm *kvm,
+						  gfn_t gfn, u64 old_spte,
+						  u64 new_spte, int level)
 {
 	bool is_present = is_shadow_present_pte(new_spte);
 	bool is_leaf = is_present && is_last_spte(new_spte, level);
 	int ret = 0;
 
 	lockdep_assert_held(&kvm->mmu_lock);
-	/*
-	 * We need to lock out other updates to the SPTE until the external
-	 * page table has been modified. Use FROZEN_SPTE similar to
-	 * the zapping case.
-	 */
-	if (!try_cmpxchg64(rcu_dereference(sptep), old_spte, FROZEN_SPTE))
-		return -EBUSY;
 
 	/*
 	 * Use different call to either set up middle level
@@ -537,17 +529,13 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
 		KVM_BUG_ON(!external_spt, kvm);
 		ret = kvm_x86_call(link_external_spt)(kvm, gfn, level, external_spt);
 	}
-	if (ret)
-		__kvm_tdp_mmu_write_spte(sptep, *old_spte);
-	else
-		__kvm_tdp_mmu_write_spte(sptep, new_spte);
 	return ret;
 }
 
 /**
- * handle_changed_spte - handle bookkeeping associated with an SPTE change
+ * __handle_changed_spte - handle bookkeeping associated with an SPTE change
  * @kvm: kvm instance
- * @as_id: the address space of the paging structure the SPTE was a part of
+ * @sp: the page table in which the SPTE resides
  * @gfn: the base GFN that was mapped by the SPTE
  * @old_spte: The value of the SPTE before the change
  * @new_spte: The value of the SPTE after the change
@@ -560,15 +548,17 @@ static int __must_check set_external_spte_present(struct kvm *kvm, tdp_ptep_t sp
 * dirty logging updates are handled in common code, not here (see make_spte()
 * and fast_pf_fix_direct_spte()).
 */
-static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
-				u64 old_spte, u64 new_spte, int level,
-				bool shared)
+static int __handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
+				 gfn_t gfn, u64 old_spte, u64 new_spte,
+				 int level, bool shared)
 {
 	bool was_present = is_shadow_present_pte(old_spte);
 	bool is_present = is_shadow_present_pte(new_spte);
 	bool was_leaf = was_present && is_last_spte(old_spte, level);
 	bool is_leaf = is_present && is_last_spte(new_spte, level);
 	bool pfn_changed = spte_to_pfn(old_spte) != spte_to_pfn(new_spte);
+	int as_id = kvm_mmu_page_as_id(sp);
+	int r;
 
 	WARN_ON_ONCE(level > PT64_ROOT_MAX_LEVEL);
 	WARN_ON_ONCE(level < PG_LEVEL_4K);
@@ -598,9 +588,7 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 	}
 
 	if (old_spte == new_spte)
-		return;
-
-	trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
+		return 0;
 
 	if (is_leaf)
 		check_spte_writable_invariants(new_spte);
@@ -627,21 +615,41 @@ static void handle_changed_spte(struct kvm *kvm, int as_id, gfn_t gfn,
 			 "a temporary frozen SPTE.\n"
 			 "as_id: %d gfn: %llx old_spte: %llx new_spte: %llx level: %d",
 			 as_id, gfn, old_spte, new_spte, level);
-		return;
+		return 0;
 	}
 
-	if (is_leaf != was_leaf)
-		kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
-
 	/*
 	 * Recursively handle child PTs if the change removed a subtree from
 	 * the paging structure. Note the WARN on the PFN changing without the
 	 * SPTE being converted to a hugepage (leaf) or being zapped. Shadow
 	 * pages are kernel allocations and should never be migrated.
+	 *
+	 * When modifying leaf entries in mirrored page tables, propagate all
+	 * changes to the external SPTE.
	 */
 	if (was_present && !was_leaf &&
-	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed)))
+	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) {
 		handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
+	} else if (is_mirror_sp(sp) && is_present) {
+		r = set_external_spte_present(kvm, gfn, old_spte, new_spte, level);
+		if (r)
+			return r;
+	}
+
+	trace_kvm_tdp_mmu_spte_changed(as_id, gfn, level, old_spte, new_spte);
+
+	if (is_leaf != was_leaf)
+		kvm_update_page_stats(kvm, level, is_leaf ? 1 : -1);
+
+	return 0;
+}
+
+static void handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
+				gfn_t gfn, u64 old_spte, u64 new_spte,
+				int level, bool shared)
+{
+	KVM_BUG_ON(__handle_changed_spte(kvm, sp, gfn, old_spte, new_spte,
+					 level, shared), kvm);
 }
 
 static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
@@ -657,32 +665,14 @@ static inline int __must_check __tdp_mmu_set_spte_atomic(struct kvm *kvm,
 	WARN_ON_ONCE(iter->yielded || is_frozen_spte(iter->old_spte));
 
 	/*
-	 * FROZEN_SPTE is a temporary state and should never be set via higher
-	 * level helpers.
+	 * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs and
+	 * does not hold the mmu_lock. On failure, i.e. if a different logical
+	 * CPU modified the SPTE, try_cmpxchg64() updates iter->old_spte with
+	 * the current value, so the caller operates on fresh data, e.g. if it
+	 * retries tdp_mmu_set_spte_atomic()
	 */
-	KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
-
-	if (is_mirror_sptep(iter->sptep)) {
-		int ret;
-
-		ret = set_external_spte_present(kvm, iter->sptep, iter->gfn,
-						&iter->old_spte, new_spte, iter->level);
-		if (ret)
-			return ret;
-	} else {
-		u64 *sptep = rcu_dereference(iter->sptep);
-
-		/*
-		 * Note, fast_pf_fix_direct_spte() can also modify TDP MMU SPTEs
-		 * and does not hold the mmu_lock. On failure, i.e. if a
-		 * different logical CPU modified the SPTE, try_cmpxchg64()
-		 * updates iter->old_spte with the current value, so the caller
-		 * operates on fresh data, e.g. if it retries
-		 * tdp_mmu_set_spte_atomic()
-		 */
-		if (!try_cmpxchg64(sptep, &iter->old_spte, new_spte))
-			return -EBUSY;
-	}
+	if (!try_cmpxchg64(rcu_dereference(iter->sptep), &iter->old_spte, new_spte))
+		return -EBUSY;
 
 	return 0;
 }
@@ -708,18 +698,49 @@ static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm,
 						       struct tdp_iter *iter,
 						       u64 new_spte)
 {
+	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(iter->sptep));
 	int ret;
 
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
-	ret = __tdp_mmu_set_spte_atomic(kvm, iter, new_spte);
+	/* KVM should never freeze SPTEs using higher level APIs. */
+	KVM_MMU_WARN_ON(is_frozen_spte(new_spte));
+
+	/*
+	 * Temporarily freeze the SPTE until the external PTE operation has
+	 * completed (unless the new SPTE itself will be frozen), e.g. so that
+	 * concurrent faults don't attempt to install a child PTE in the
+	 * external page table before the parent PTE has been written, or try
+	 * to re-install a page table before the old one was removed.
+	 */
+	if (is_mirror_sptep(iter->sptep))
+		ret = __tdp_mmu_set_spte_atomic(kvm, iter, FROZEN_SPTE);
+	else
+		ret = __tdp_mmu_set_spte_atomic(kvm, iter, new_spte);
 	if (ret)
 		return ret;
 
-	handle_changed_spte(kvm, iter->as_id, iter->gfn, iter->old_spte,
-			    new_spte, iter->level, true);
+	ret = __handle_changed_spte(kvm, sp, iter->gfn, iter->old_spte,
+				    new_spte, iter->level, true);
 
-	return 0;
+	/*
+	 * Unfreeze the mirror SPTE. If updating the external SPTE failed,
+	 * restore the old SPTE so that the SPTE isn't frozen in perpetuity,
+	 * otherwise set the mirror SPTE to the new desired value.
+	 */
+	if (is_mirror_sptep(iter->sptep)) {
+		if (ret)
+			__kvm_tdp_mmu_write_spte(iter->sptep, iter->old_spte);
+		else
+			__kvm_tdp_mmu_write_spte(iter->sptep, new_spte);
+	} else {
+		/*
+		 * Bug the VM if handling the change failed, as failure is only
+		 * allowed if KVM couldn't update the external SPTE.
+		 */
+		KVM_BUG_ON(ret, kvm);
+	}
+	return ret;
 }
 
 /*
@@ -738,6 +759,8 @@ static inline int __must_check tdp_mmu_set_spte_atomic(struct kvm *kvm,
 static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
 			    u64 old_spte, u64 new_spte, gfn_t gfn, int level)
 {
+	struct kvm_mmu_page *sp = sptep_to_sp(rcu_dereference(sptep));
+
 	lockdep_assert_held_write(&kvm->mmu_lock);
 
 	/*
@@ -751,7 +774,7 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
 
 	old_spte = kvm_tdp_mmu_write_spte(sptep, old_spte, new_spte, level);
 
-	handle_changed_spte(kvm, as_id, gfn, old_spte, new_spte, level, false);
+	handle_changed_spte(kvm, sp, gfn, old_spte, new_spte, level, false);
 
 	/*
 	 * Users that do non-atomic setting of PTEs don't operate on mirror
@@ -1373,6 +1396,9 @@ static void kvm_tdp_mmu_age_spte(struct kvm *kvm, struct tdp_iter *iter)
 {
 	u64 new_spte;
 
+	if (WARN_ON_ONCE(is_mirror_sptep(iter->sptep)))
+		return;
+
 	if (spte_ad_enabled(iter->old_spte)) {
 		iter->old_spte = tdp_mmu_clear_spte_bits_atomic(iter->sptep,
 								shadow_accessed_mask);
-- 
2.53.0

From nobody Thu Apr 2 14:14:44 2026
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com
Subject: [PATCH 08/17] KVM: TDX: Drop kvm_x86_ops.link_external_spt(), use .set_external_spte() for all
Date: Fri, 27 Mar 2026 13:14:12 -0700
Message-ID: <20260327201421.2824383-9-rick.p.edgecombe@intel.com>
In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Sean Christopherson

Drop the dedicated .link_external_spt() hook for linking non-leaf S-EPT
pages, and instead funnel everything through .set_external_spte(). Using
separate hooks doesn't help prevent TDP MMU details from bleeding into
TDX, and vice versa; to the contrary, dedicated callbacks will result in
_more_ pollution when hugepage support is added, e.g. they will require
the TDP MMU to know details about the splitting rules for TDX that aren't
all that relevant to the TDP MMU.
Ideally, KVM would provide a single pair of hooks to set S-EPT entries:
one hook for setting SPTEs under write-lock and another for setting SPTEs
under read-lock (e.g. to ensure the entire operation is "atomic", to
allow for failure, etc.). Sadly, TDX's requirement that all child S-EPT
entries are removed before the parent makes that impractical: the TDP MMU
deliberately prunes non-leaf SPTEs and _then_ processes their children,
thus making it quite important for the TDP MMU to differentiate between
zapping leaf and non-leaf S-EPT entries. However, that's the _only_ case
that's truly special, and even that case could be shoehorned into a
single hook; it just wouldn't be a net positive.

Signed-off-by: Sean Christopherson
[add in trivial feedback]
Signed-off-by: Rick Edgecombe
---
v6:
 - rename external_spt->sept_pt (Rick, Yan)
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 -
 arch/x86/include/asm/kvm_host.h    |  3 --
 arch/x86/kvm/mmu/tdp_mmu.c         | 31 +--------------
 arch/x86/kvm/vmx/tdx.c             | 61 ++++++++++++++++++++----------
 4 files changed, 43 insertions(+), 53 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 31d5c5d58ae6..bced6d938702 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -95,7 +95,6 @@ KVM_X86_OP_OPTIONAL_RET0(set_tss_addr)
 KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
 KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
 KVM_X86_OP(load_mmu_pgd)
-KVM_X86_OP_OPTIONAL_RET0(link_external_spt)
 KVM_X86_OP_OPTIONAL_RET0(set_external_spte)
 KVM_X86_OP_OPTIONAL_RET0(free_external_spt)
 KVM_X86_OP_OPTIONAL(remove_external_spte)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d3bdc9828133..1139bd89f0cf 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1876,9 +1876,6 @@ struct kvm_x86_ops {
 	void (*load_mmu_pgd)(struct kvm_vcpu *vcpu, hpa_t root_hpa,
 			     int root_level);
 
-	/* Update external mapping with page table link. */
-	int (*link_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
-				 void *external_spt);
 	/* Update the external page table from spte getting set. */
 	int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 				 u64 mirror_spte);
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index db16e81b9701..6dc08fe22841 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -494,42 +494,13 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
 
-static void *get_external_spt(gfn_t gfn, u64 new_spte, int level)
-{
-	if (is_shadow_present_pte(new_spte) && !is_last_spte(new_spte, level)) {
-		struct kvm_mmu_page *sp = spte_to_child_sp(new_spte);
-
-		WARN_ON_ONCE(sp->role.level + 1 != level);
-		WARN_ON_ONCE(sp->gfn != gfn);
-		return sp->external_spt;
-	}
-
-	return NULL;
-}
-
 static int __must_check set_external_spte_present(struct kvm *kvm,
 						  gfn_t gfn, u64 old_spte,
 						  u64 new_spte, int level)
 {
-	bool is_present = is_shadow_present_pte(new_spte);
-	bool is_leaf = is_present && is_last_spte(new_spte, level);
-	int ret = 0;
-
 	lockdep_assert_held(&kvm->mmu_lock);
 
-	/*
-	 * Use different call to either set up middle level
-	 * external page table, or leaf.
-	 */
-	if (is_leaf) {
-		ret = kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte);
-	} else {
-		void *external_spt = get_external_spt(gfn, new_spte, level);
-
-		KVM_BUG_ON(!external_spt, kvm);
-		ret = kvm_x86_call(link_external_spt)(kvm, gfn, level, external_spt);
-	}
-	return ret;
+	return kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte);
 }
 
 /**
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 01e070ec10fd..92a846b91bac 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1654,18 +1654,58 @@ static int tdx_mem_page_aug(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
+static struct page *tdx_spte_to_sept_pt(struct kvm *kvm, gfn_t gfn,
+					u64 new_spte, enum pg_level level)
+{
+	struct kvm_mmu_page *sp = spte_to_child_sp(new_spte);
+
+	if (KVM_BUG_ON(!sp->external_spt, kvm) ||
+	    KVM_BUG_ON(sp->role.level + 1 != level, kvm) ||
+	    KVM_BUG_ON(sp->gfn != gfn, kvm))
+		return NULL;
+
+	return virt_to_page(sp->external_spt);
+}
+
+static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
+				     enum pg_level level, u64 mirror_spte)
+{
+	gpa_t gpa = gfn_to_gpa(gfn);
+	u64 err, entry, level_state;
+	struct page *sept_pt;
+
+	sept_pt = tdx_spte_to_sept_pt(kvm, gfn, mirror_spte, level);
+	if (!sept_pt)
+		return -EIO;
+
+	err = tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, level, sept_pt,
+			       &entry, &level_state);
+	if (unlikely(tdx_operand_busy(err)))
+		return -EBUSY;
+
+	if (TDX_BUG_ON_2(err, TDH_MEM_SEPT_ADD, entry, level_state, kvm))
+		return -EIO;
+
+	return 0;
+}
+
 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 				     enum pg_level level, u64 mirror_spte)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	kvm_pfn_t pfn = spte_to_pfn(mirror_spte);
 
+	if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm))
+		return -EIO;
+
+	if (!is_last_spte(mirror_spte, level))
+		return tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte);
+
 	/* TODO: handle large pages. */
 	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
 		return -EIO;
 
-	WARN_ON_ONCE(!is_shadow_present_pte(mirror_spte) ||
-		     (mirror_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
+	WARN_ON_ONCE((mirror_spte & VMX_EPT_RWX_MASK) != VMX_EPT_RWX_MASK);
 
 	/*
 	 * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory()
@@ -1685,23 +1725,7 @@ static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
 	return tdx_mem_page_aug(kvm, gfn, level, pfn);
 }
 
-static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
-				     enum pg_level level, void *private_spt)
-{
-	gpa_t gpa = gfn_to_gpa(gfn);
-	struct page *page = virt_to_page(private_spt);
-	u64 err, entry, level_state;
-
-	err = tdh_mem_sept_add(&to_kvm_tdx(kvm)->td, gpa, level, page, &entry,
-			       &level_state);
-	if (unlikely(tdx_operand_busy(err)))
-		return -EBUSY;
-
-	if (TDX_BUG_ON_2(err, TDH_MEM_SEPT_ADD, entry, level_state, kvm))
-		return -EIO;
-
-	return 0;
-}
 
 /*
  * Ensure shared and private EPTs to be flushed on all vCPUs.
@@ -3413,7 +3437,6 @@ int __init tdx_hardware_setup(void) =20 vt_x86_ops.vm_size =3D max_t(unsigned int, vt_x86_ops.vm_size, sizeof(str= uct kvm_tdx)); =20 - vt_x86_ops.link_external_spt =3D tdx_sept_link_private_spt; vt_x86_ops.set_external_spte =3D tdx_sept_set_private_spte; vt_x86_ops.free_external_spt =3D tdx_sept_free_private_spt; vt_x86_ops.remove_external_spte =3D tdx_sept_remove_private_spte; --=20 2.53.0
From nobody Thu Apr 2 14:14:44 2026
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com
Subject: [PATCH 09/17] KVM: TDX: Add helper to handle mapping leaf SPTE into S-EPT
Date: Fri, 27 Mar 2026 13:14:13 -0700
Message-ID: <20260327201421.2824383-10-rick.p.edgecombe@intel.com>
In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
References:
<20260327201421.2824383-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Sean Christopherson Add a helper, tdx_sept_map_leaf_spte(), to wrap and isolate PAGE.ADD and PAGE.AUG operations, and thus complete tdx_sept_set_private_spte()'s transition into a "dispatch" routine for setting/writing S-EPT entries. No functional change intended. Signed-off-by: Sean Christopherson Signed-off-by: Rick Edgecombe --- arch/x86/kvm/vmx/tdx.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 92a846b91bac..361a75b42ae7 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1689,18 +1689,12 @@ static int tdx_sept_link_private_spt(struct kvm *kv= m, gfn_t gfn, return 0; } =20 -static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, - enum pg_level level, u64 mirror_spte) +static int tdx_sept_map_leaf_spte(struct kvm *kvm, gfn_t gfn, enum pg_leve= l level, + u64 mirror_spte) { struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); kvm_pfn_t pfn =3D spte_to_pfn(mirror_spte); =20 - if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm)) - return -EIO; - - if (!is_last_spte(mirror_spte, level)) - return tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte); - /* TODO: handle large pages. 
*/ if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) return -EIO; @@ -1725,7 +1719,18 @@ static int tdx_sept_set_private_spte(struct kvm *kvm= , gfn_t gfn, return tdx_mem_page_aug(kvm, gfn, level, pfn); } =20 +static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, + enum pg_level level, u64 mirror_spte) +{ =20 + if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm)) + return -EIO; + + if (!is_last_spte(mirror_spte, level)) + return tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte); + + return tdx_sept_map_leaf_spte(kvm, gfn, level, mirror_spte); +} =20 /* * Ensure shared and private EPTs to be flushed on all vCPUs. --=20 2.53.0
From nobody Thu Apr 2 14:14:44 2026
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com
Subject: [PATCH 10/17] KVM: TDX: Move set_external_spte_present() assert into TDX code
Date: Fri, 27 Mar 2026 13:14:14 -0700
Message-ID: <20260327201421.2824383-11-rick.p.edgecombe@intel.com>
In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>

Move the MMU lockdep assert in set_external_spte_present() into the TDX-specific op, because the assert is TDX specific in intent.

The TDP MMU has many lockdep asserts for various scenarios, and in fact the callchains used for TDX already include a lockdep assert that covers the case in set_external_spte_present(). However, those asserts are for management of the TDP root owned by KVM, whereas the assert in set_external_spte_present() supports a scheme to avoid contention in the TDX module during zap operations. That is very TDX specific.

One option would be to simply remove the assert in set_external_spte_present() and rely on the other asserts in the TDP MMU. But those asserts serve a different intent and sit too far away from the SEAMCALL that needs the guarantee. So just move the assert to TDX code.
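The layering argument — a backend-specific precondition belongs next to the code that relies on it, not in the generic dispatcher — can be sketched in plain C. All names below (mmu_lock_held, backend_set_spte, set_external_spte) are illustrative stand-ins, not kernel APIs:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for lockdep's tracking of kvm->mmu_lock. */
static bool mmu_lock_held;

/*
 * Backend op: it owns and enforces its own locking requirement,
 * analogous to moving lockdep_assert_held(&kvm->mmu_lock) into
 * tdx_sept_set_private_spte().
 */
static int backend_set_spte(unsigned long long new_spte)
{
	assert(mmu_lock_held);     /* precondition checked where it matters */
	return new_spte ? 0 : -22; /* toy -EINVAL-style error for a zero SPTE */
}

/* Generic dispatcher: no backend-specific assert, it just forwards. */
static int set_external_spte(unsigned long long new_spte)
{
	return backend_set_spte(new_spte);
}
```

The benefit of this shape is that the assert sits directly above the call that needs the lock, so a reader (or a future refactor) cannot silently separate the requirement from its enforcement.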
Suggested-by: Sean Christopherson Signed-off-by: Rick Edgecombe --- arch/x86/kvm/mmu/tdp_mmu.c | 2 -- arch/x86/kvm/vmx/tdx.c | 3 ++- 2 files changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 6dc08fe22841..6763537098ee 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -498,8 +498,6 @@ static int __must_check set_external_spte_present(struc= t kvm *kvm, gfn_t gfn, u64 old_spte, u64 new_spte, int level) { - lockdep_assert_held(&kvm->mmu_lock); - return kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte); } =20 diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 361a75b42ae7..b44a9c96c89e 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1722,10 +1722,11 @@ static int tdx_sept_map_leaf_spte(struct kvm *kvm, = gfn_t gfn, enum pg_level leve static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, enum pg_level level, u64 mirror_spte) { - if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm)) return -EIO; =20 + lockdep_assert_held(&kvm->mmu_lock); + if (!is_last_spte(mirror_spte, level)) return tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte); =20 --=20 2.53.0
From nobody Thu Apr 2 14:14:44 2026
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com
Subject: [PATCH 11/17] KVM: x86/mmu: Fold set_external_spte_present() into its sole caller
Date: Fri, 27 Mar 2026 13:14:15 -0700
Message-ID: <20260327201421.2824383-12-rick.p.edgecombe@intel.com>
In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>

From: Sean Christopherson

Fold set_external_spte_present() into its sole caller, as calling the set_external_spte() op is the only functionality the helper has left; it is now a single-line helper that is called exactly once.

No functional change intended.
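The post-fold shape — the caller invoking the vendor op directly through a function-pointer table instead of going through a one-line wrapper — can be sketched as follows. The struct and function names here are toy stand-ins modeled loosely on kvm_x86_ops/kvm_x86_call, not the real kernel definitions:

```c
/* Toy op table standing in for kvm_x86_ops. */
struct x86_ops {
	int (*set_external_spte)(unsigned long long gfn, int level,
				 unsigned long long new_spte);
};

/* Toy backend implementation of the op. */
static int stub_set_external_spte(unsigned long long gfn, int level,
				  unsigned long long new_spte)
{
	(void)gfn;
	(void)level;
	return new_spte ? 0 : -5; /* toy -EIO-style error */
}

static struct x86_ops ops = { .set_external_spte = stub_set_external_spte };

/*
 * After the fold, the sole caller invokes the op directly instead of
 * routing through a single-line set_external_spte_present() wrapper.
 */
static int handle_changed_spte(unsigned long long gfn, int level,
			       unsigned long long new_spte)
{
	return ops.set_external_spte(gfn, level, new_spte);
}
```

Removing a trivial wrapper like this has no behavioral effect but shortens the call chain a reader must follow from the SPTE-change bookkeeping to the vendor callback.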
Signed-off-by: Sean Christopherson Signed-off-by: Rick Edgecombe --- arch/x86/kvm/mmu/tdp_mmu.c | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-) diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 6763537098ee..85c92aec868f 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -494,13 +494,6 @@ static void handle_removed_pt(struct kvm *kvm, tdp_pte= p_t pt, bool shared) call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback); } =20 -static int __must_check set_external_spte_present(struct kvm *kvm, - gfn_t gfn, u64 old_spte, - u64 new_spte, int level) -{ - return kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte); -} - /** * __handle_changed_spte - handle bookkeeping associated with an SPTE chan= ge * @kvm: kvm instance @@ -600,7 +593,7 @@ static int __handle_changed_spte(struct kvm *kvm, struc= t kvm_mmu_page *sp, (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) { handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared); } else if (is_mirror_sp(sp) && is_present) { - r =3D set_external_spte_present(kvm, gfn, old_spte, new_spte, level); + r =3D kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte); if (r) return r; } --=20 2.53.0
From nobody Thu Apr 2 14:14:44 2026
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com
Subject: [PATCH 12/17] KVM: x86/mmu: Plumb the old_spte into kvm_x86_ops.set_external_spte()
Date: Fri, 27 Mar 2026 13:14:16 -0700
Message-ID: <20260327201421.2824383-13-rick.p.edgecombe@intel.com>
In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>

From: Sean Christopherson

Plumb the old SPTE into .set_external_spte() so that the callback can be used to handle removal and splitting of leaf SPTEs.

Rename mirror_spte to new_spte to follow the TDP MMU's naming, and to make it more obvious what value the parameter holds.

Opportunistically tweak the ordering of parameters to match the pattern of most TDP MMU functions, which do "old, new, level".
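The point of passing both the old and new values can be sketched with a toy classifier: once the callback sees the full transition, it can decide on its own whether the change is a map, a removal, or an in-place change. This is a hedged illustration with a made-up "present" bit and made-up names (spte_present, classify_spte_change), not the kernel's SPTE encoding:

```c
#include <stdbool.h>

/* Toy "present" bit; real SPTEs encode presence differently. */
static bool spte_present(unsigned long long spte)
{
	return spte & 1ULL;
}

enum sept_action { SEPT_MAP, SEPT_REMOVE, SEPT_CHANGE, SEPT_NONE };

/*
 * With the old SPTE plumbed through (in the "old, new, level" ordering
 * the patch adopts), the backend can derive the operation from the
 * transition alone, instead of needing a separate callback per case.
 */
static enum sept_action classify_spte_change(unsigned long long old_spte,
					     unsigned long long new_spte,
					     int level)
{
	(void)level;
	if (!spte_present(old_spte) && spte_present(new_spte))
		return SEPT_MAP;
	if (spte_present(old_spte) && !spte_present(new_spte))
		return SEPT_REMOVE;
	if (spte_present(old_spte) && spte_present(new_spte))
		return SEPT_CHANGE;
	return SEPT_NONE;
}
```

This is why the commit message emphasizes removal and splitting: both are transitions that are only distinguishable when the previous value is visible to the callback.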
Signed-off-by: Sean Christopherson Signed-off-by: Rick Edgecombe --- arch/x86/include/asm/kvm_host.h | 4 ++-- arch/x86/kvm/mmu/tdp_mmu.c | 3 ++- arch/x86/kvm/vmx/tdx.c | 18 +++++++++--------- 3 files changed, 13 insertions(+), 12 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 1139bd89f0cf..808d2c7ea546 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1877,8 +1877,8 @@ struct kvm_x86_ops { int root_level); =20 /* Update the external page table from spte getting set. */ - int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level, - u64 mirror_spte); + int (*set_external_spte)(struct kvm *kvm, gfn_t gfn, u64 old_spte, + u64 new_spte, enum pg_level level); =20 /* Update external page tables for page table about to be freed. */ int (*free_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level, diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 85c92aec868f..991870789863 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -593,7 +593,8 @@ static int __handle_changed_spte(struct kvm *kvm, struc= t kvm_mmu_page *sp, (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) { handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared); } else if (is_mirror_sp(sp) && is_present) { - r =3D kvm_x86_call(set_external_spte)(kvm, gfn, level, new_spte); + r =3D kvm_x86_call(set_external_spte)(kvm, gfn, old_spte, + new_spte, level); if (r) return r; } diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index b44a9c96c89e..569a0576e7c9 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1690,16 +1690,16 @@ static int tdx_sept_link_private_spt(struct kvm *kv= m, gfn_t gfn, } =20 static int tdx_sept_map_leaf_spte(struct kvm *kvm, gfn_t gfn, enum pg_leve= l level, - u64 mirror_spte) + u64 new_spte) { struct kvm_tdx *kvm_tdx =3D to_kvm_tdx(kvm); - kvm_pfn_t pfn =3D spte_to_pfn(mirror_spte); + 
kvm_pfn_t pfn =3D spte_to_pfn(new_spte); =20 /* TODO: handle large pages. */ if (KVM_BUG_ON(level !=3D PG_LEVEL_4K, kvm)) return -EIO; =20 - WARN_ON_ONCE((mirror_spte & VMX_EPT_RWX_MASK) !=3D VMX_EPT_RWX_MASK); + WARN_ON_ONCE((new_spte & VMX_EPT_RWX_MASK) !=3D VMX_EPT_RWX_MASK); =20 /* * Ensure pre_fault_allowed is read by kvm_arch_vcpu_pre_fault_memory() @@ -1719,18 +1719,18 @@ static int tdx_sept_map_leaf_spte(struct kvm *kvm, = gfn_t gfn, enum pg_level leve return tdx_mem_page_aug(kvm, gfn, level, pfn); } =20 -static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, - enum pg_level level, u64 mirror_spte) +static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_s= pte, + u64 new_spte, enum pg_level level) { - if (KVM_BUG_ON(!is_shadow_present_pte(mirror_spte), kvm)) + if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm)) return -EIO; =20 lockdep_assert_held(&kvm->mmu_lock); =20 - if (!is_last_spte(mirror_spte, level)) - return tdx_sept_link_private_spt(kvm, gfn, level, mirror_spte); + if (!is_last_spte(new_spte, level)) + return tdx_sept_link_private_spt(kvm, gfn, level, new_spte); =20 - return tdx_sept_map_leaf_spte(kvm, gfn, level, mirror_spte); + return tdx_sept_map_leaf_spte(kvm, gfn, level, new_spte); } =20 /* --=20 2.53.0
From nobody Thu Apr 2 14:14:44 2026
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com
Subject: [PATCH 13/17] KVM: TDX: Hoist tdx_sept_remove_private_spte() above set_private_spte()
Date: Fri, 27 Mar 2026 13:14:17 -0700
Message-ID: <20260327201421.2824383-14-rick.p.edgecombe@intel.com>
In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>

From: Sean Christopherson

Move tdx_sept_remove_private_spte() (and its tdx_track() helper) above tdx_sept_set_private_spte() in anticipation of routing all non-atomic S-EPT writes (with the exception of reclaiming non-leaf pages) through the "set" API.

No functional change intended.
Signed-off-by: Sean Christopherson Signed-off-by: Rick Edgecombe --- arch/x86/kvm/vmx/tdx.c | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 569a0576e7c9..5a1a6610a98f 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -1719,20 +1719,6 @@ static int tdx_sept_map_leaf_spte(struct kvm *kvm, g= fn_t gfn, enum pg_level leve return tdx_mem_page_aug(kvm, gfn, level, pfn); } =20 -static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_s= pte, - u64 new_spte, enum pg_level level) -{ - if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm)) - return -EIO; - - lockdep_assert_held(&kvm->mmu_lock); - - if (!is_last_spte(new_spte, level)) - return tdx_sept_link_private_spt(kvm, gfn, level, new_spte); - - return tdx_sept_map_leaf_spte(kvm, gfn, level, new_spte); -} - /* * Ensure shared and private EPTs to be flushed on all vCPUs. * tdh_mem_track() is the only caller that increases TD epoch. 
An increase= in @@ -1852,6 +1838,20 @@ static void tdx_sept_remove_private_spte(struct kvm = *kvm, gfn_t gfn, tdx_quirk_reset_page(page); } =20 +static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_s= pte, + u64 new_spte, enum pg_level level) +{ + if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm)) + return -EIO; + + lockdep_assert_held(&kvm->mmu_lock); + + if (!is_last_spte(new_spte, level)) + return tdx_sept_link_private_spt(kvm, gfn, level, new_spte); + + return tdx_sept_map_leaf_spte(kvm, gfn, level, new_spte); +} + void tdx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode, int trig_mode, int vector) { --=20 2.53.0
From nobody Thu Apr 2 14:14:44 2026
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com
Subject: [PATCH 14/17] KVM: x86/mmu: Remove KVM_BUG_ON()
 that checks lock when removing PTs
Date: Fri, 27 Mar 2026 13:14:18 -0700
Message-ID: <20260327201421.2824383-15-rick.p.edgecombe@intel.com>
In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>

As part of an ongoing effort to move TDX-specific bits from the MMU into the TDX code, drop the KVM_BUG_ON() that checks that the MMU lock is held for write while removing page tables.

Future changes will forward PTE-removal mirror EPT updates into the .set_private_spte() callback and let TDX code parse the PTE to decide which S-EPT operations to take. That operation does not pass a 'shared' bool for this KVM_BUG_ON() to use in the logic's future home. But even today there are already MMU write-lock lockdep asserts that mostly cover the case. Since the KVM_BUG_ON() is already somewhat redundant, just remove it instead of trying to plumb the bool into TDX code.
Link: https://lore.kernel.org/kvm/aYUarHf3KEwHGuJe@google.com/
Suggested-by: Sean Christopherson
Signed-off-by: Rick Edgecombe
---
 arch/x86/kvm/mmu/tdp_mmu.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 991870789863..5dc9633c866e 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -473,10 +473,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 		}
 		handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared);
 
-		if (is_mirror_sp(sp)) {
-			KVM_BUG_ON(shared, kvm);
+		if (is_mirror_sp(sp))
 			remove_external_spte(kvm, gfn, old_spte, level);
-		}
 	}
 
 	if (is_mirror_sp(sp) &&
-- 
2.53.0

From nobody Thu Apr 2 14:14:44 2026
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com,
 rick.p.edgecombe@intel.com
Subject: [PATCH 15/17] KVM: TDX: Handle removal of leaf SPTEs in .set_private_spte()
Date: Fri, 27 Mar 2026 13:14:19 -0700
Message-ID: <20260327201421.2824383-16-rick.p.edgecombe@intel.com>
In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>

From: Sean Christopherson

Drop kvm_x86_ops.remove_external_spte() and instead handle the removal of leaf SPTEs in the S-EPT (a.k.a. the external root) in .set_private_spte(). This will allow extending tdx_sept_set_private_spte() to support splitting a huge S-EPT entry without needing yet another kvm_x86_ops hook.

Bug the VM if the callback fails, as redundant KVM_BUG_ON() calls are benign (the WARN will fire if and only if the VM isn't already bugged) and handle_changed_spte() is most definitely not prepared to handle failure.
Signed-off-by: Sean Christopherson
Signed-off-by: Rick Edgecombe
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 -
 arch/x86/include/asm/kvm_host.h    |  3 ---
 arch/x86/kvm/mmu/tdp_mmu.c         | 33 +-----------------------------
 arch/x86/kvm/vmx/tdx.c             | 22 ++++++++++++--------
 4 files changed, 14 insertions(+), 45 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index bced6d938702..ed348c6dd445 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -97,7 +97,6 @@ KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
 KVM_X86_OP(load_mmu_pgd)
 KVM_X86_OP_OPTIONAL_RET0(set_external_spte)
 KVM_X86_OP_OPTIONAL_RET0(free_external_spt)
-KVM_X86_OP_OPTIONAL(remove_external_spte)
 KVM_X86_OP(has_wbinvd_exit)
 KVM_X86_OP(get_l2_tsc_offset)
 KVM_X86_OP(get_l2_tsc_multiplier)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 808d2c7ea546..09588e797e4b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1884,9 +1884,6 @@ struct kvm_x86_ops {
 	int (*free_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
 				 void *external_spt);
 
-	/* Update external page table from spte getting removed, and flush TLB. */
-	void (*remove_external_spte)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
-				     u64 mirror_spte);
 
 	bool (*has_wbinvd_exit)(void);
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 5dc9633c866e..806788bdecce 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -359,25 +359,6 @@ static void tdp_mmu_unlink_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
 	spin_unlock(&kvm->arch.tdp_mmu_pages_lock);
 }
 
-static void remove_external_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
-				 int level)
-{
-	/*
-	 * External (TDX) SPTEs are limited to PG_LEVEL_4K, and external
-	 * PTs are removed in a special order, involving free_external_spt().
-	 * But remove_external_spte() will be called on non-leaf PTEs via
-	 * __tdp_mmu_zap_root(), so avoid the error the former would return
-	 * in this case.
-	 */
-	if (!is_last_spte(old_spte, level))
-		return;
-
-	/* Zapping leaf spte is allowed only when write lock is held. */
-	lockdep_assert_held_write(&kvm->mmu_lock);
-
-	kvm_x86_call(remove_external_spte)(kvm, gfn, level, old_spte);
-}
-
 /**
  * handle_removed_pt() - handle a page table removed from the TDP structure
 *
@@ -472,9 +453,6 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 					  FROZEN_SPTE, level);
 		}
 		handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared);
-
-		if (is_mirror_sp(sp))
-			remove_external_spte(kvm, gfn, old_spte, level);
 	}
 
 	if (is_mirror_sp(sp) &&
@@ -590,7 +568,7 @@ static int __handle_changed_spte(struct kvm *kvm, struct kvm_mmu_page *sp,
 	if (was_present && !was_leaf &&
 	    (is_leaf || !is_present || WARN_ON_ONCE(pfn_changed))) {
 		handle_removed_pt(kvm, spte_to_child_pt(old_spte, level), shared);
-	} else if (is_mirror_sp(sp) && is_present) {
+	} else if (is_mirror_sp(sp)) {
 		r = kvm_x86_call(set_external_spte)(kvm, gfn, old_spte,
 						    new_spte, level);
 		if (r)
@@ -737,15 +715,6 @@ static u64 tdp_mmu_set_spte(struct kvm *kvm, int as_id, tdp_ptep_t sptep,
 
 	handle_changed_spte(kvm, sp, gfn, old_spte, new_spte, level, false);
 
-	/*
-	 * Users that do non-atomic setting of PTEs don't operate on mirror
-	 * roots, so don't handle it and bug the VM if it's seen.
-	 */
-	if (is_mirror_sptep(sptep)) {
-		KVM_BUG_ON(is_shadow_present_pte(new_spte), kvm);
-		remove_external_spte(kvm, gfn, old_spte, level);
-	}
-
 	return old_spte;
 }
 
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 5a1a6610a98f..bfbadba8bc08 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1788,10 +1788,10 @@ static int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
 	return tdx_reclaim_page(virt_to_page(private_spt));
 }
 
-static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
-					 enum pg_level level, u64 mirror_spte)
+static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
+					enum pg_level level, u64 old_spte)
 {
-	struct page *page = pfn_to_page(spte_to_pfn(mirror_spte));
+	struct page *page = pfn_to_page(spte_to_pfn(old_spte));
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	gpa_t gpa = gfn_to_gpa(gfn);
 	u64 err, entry, level_state;
@@ -1804,16 +1804,16 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	 * there can't be anything populated in the private EPT.
 	 */
 	if (KVM_BUG_ON(!is_hkid_assigned(to_kvm_tdx(kvm)), kvm))
-		return;
+		return -EIO;
 
 	/* TODO: handle large pages. */
 	if (KVM_BUG_ON(level != PG_LEVEL_4K, kvm))
-		return;
+		return -EIO;
 
 	err = tdh_do_no_vcpus(tdh_mem_range_block, kvm, &kvm_tdx->td, gpa,
 			      level, &entry, &level_state);
 	if (TDX_BUG_ON_2(err, TDH_MEM_RANGE_BLOCK, entry, level_state, kvm))
-		return;
+		return -EIO;
 
 	/*
 	 * TDX requires TLB tracking before dropping private page.
	 * Do
@@ -1829,18 +1829,22 @@ static void tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
 	err = tdh_do_no_vcpus(tdh_mem_page_remove, kvm, &kvm_tdx->td, gpa,
 			      level, &entry, &level_state);
 	if (TDX_BUG_ON_2(err, TDH_MEM_PAGE_REMOVE, entry, level_state, kvm))
-		return;
+		return -EIO;
 
 	err = tdh_phymem_page_wbinvd_hkid((u16)kvm_tdx->hkid, page);
 	if (TDX_BUG_ON(err, TDH_PHYMEM_PAGE_WBINVD, kvm))
-		return;
+		return -EIO;
 
 	tdx_quirk_reset_page(page);
+	return 0;
 }
 
 static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn, u64 old_spte,
 				     u64 new_spte, enum pg_level level)
 {
+	if (is_shadow_present_pte(old_spte))
+		return tdx_sept_remove_private_spte(kvm, gfn, level, old_spte);
+
 	if (KVM_BUG_ON(!is_shadow_present_pte(new_spte), kvm))
 		return -EIO;
 
@@ -3445,7 +3449,7 @@ int __init tdx_hardware_setup(void)
 
 	vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
 	vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
-	vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
+
 	vt_x86_ops.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt;
 
 	return 0;
 
-- 
2.53.0

From nobody Thu Apr 2 14:14:44 2026
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com
Subject: [PATCH 16/17] KVM: x86: Move error handling inside free_external_spt()
Date: Fri, 27 Mar 2026 13:14:20 -0700
Message-ID: <20260327201421.2824383-17-rick.p.edgecombe@intel.com>
In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>

From: Sean Christopherson

Move the logic for TDX's specific need to leak pages when reclaim fails inside the free_external_spt() op, so that it can live in TDX-specific code and not the generic MMU. Do this by passing in the SP instead of the external page table pointer, so that TDX code can set sp->external_spt to NULL itself.

Since the error is now handled internally, change the op to return void. This way it also operates like a normal free, in that success is guaranteed from the caller's perspective.

Opportunistically, drop the unused level arg while adjusting the sp arg.

Signed-off-by: Sean Christopherson
[re-wrote log and massaged op name]
Signed-off-by: Rick Edgecombe
---
Notable changes since last discussion:
 - Since free_external_sp() is dropped in the later DPAMT patches, don't bother renaming free_external_spt().
---
 arch/x86/include/asm/kvm-x86-ops.h |  2 +-
 arch/x86/include/asm/kvm_host.h    |  3 +--
 arch/x86/kvm/mmu/tdp_mmu.c         | 13 ++-----------
 arch/x86/kvm/vmx/tdx.c             | 25 +++++++++++--------------
 4 files changed, 15 insertions(+), 28 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index ed348c6dd445..10ccf6ea9d9a 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -96,7 +96,7 @@ KVM_X86_OP_OPTIONAL_RET0(set_identity_map_addr)
 KVM_X86_OP_OPTIONAL_RET0(get_mt_mask)
 KVM_X86_OP(load_mmu_pgd)
 KVM_X86_OP_OPTIONAL_RET0(set_external_spte)
-KVM_X86_OP_OPTIONAL_RET0(free_external_spt)
+KVM_X86_OP_OPTIONAL(free_external_spt)
 KVM_X86_OP(has_wbinvd_exit)
 KVM_X86_OP(get_l2_tsc_offset)
 KVM_X86_OP(get_l2_tsc_multiplier)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09588e797e4b..fbc39f0bb491 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1881,8 +1881,7 @@ struct kvm_x86_ops {
 				 u64 new_spte, enum pg_level level);
 
 	/* Update external page tables for page table about to be freed. */
-	int (*free_external_spt)(struct kvm *kvm, gfn_t gfn, enum pg_level level,
-				 void *external_spt);
+	void (*free_external_spt)(struct kvm *kvm, gfn_t gfn, struct kvm_mmu_page *sp);
 
 
 	bool (*has_wbinvd_exit)(void);
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 806788bdecce..575033cc7fe4 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -455,17 +455,8 @@ static void handle_removed_pt(struct kvm *kvm, tdp_ptep_t pt, bool shared)
 		handle_changed_spte(kvm, sp, gfn, old_spte, FROZEN_SPTE, level, shared);
 	}
 
-	if (is_mirror_sp(sp) &&
-	    WARN_ON(kvm_x86_call(free_external_spt)(kvm, base_gfn, sp->role.level,
-						    sp->external_spt))) {
-		/*
-		 * Failed to free page table page in mirror page table and
-		 * there is nothing to do further.
-		 * Intentionally leak the page to prevent the kernel from
-		 * accessing the encrypted page.
-		 */
-		sp->external_spt = NULL;
-	}
+	if (is_mirror_sp(sp))
+		kvm_x86_call(free_external_spt)(kvm, base_gfn, sp);
 
 	call_rcu(&sp->rcu_head, tdp_mmu_free_sp_rcu_callback);
 }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index bfbadba8bc08..d064b40a6b31 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1765,27 +1765,24 @@ static void tdx_track(struct kvm *kvm)
 	kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE);
 }
 
-static int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
-				     enum pg_level level, void *private_spt)
+static void tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
+				      struct kvm_mmu_page *sp)
 {
-	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
-
 	/*
-	 * free_external_spt() is only called after hkid is freed when TD is
-	 * tearing down.
 	 * KVM doesn't (yet) zap page table pages in mirror page table while
 	 * TD is active, though guest pages mapped in mirror page table could be
 	 * zapped during TD is active, e.g. for shared <-> private conversion
 	 * and slot move/deletion.
+	 *
+	 * In other words, KVM should only free mirror page tables after the
+	 * TD's hkid is freed, when the TD is being torn down.
+	 *
+	 * If the S-EPT PTE can't be removed for any reason, intentionally leak
+	 * the page to prevent the kernel from accessing the encrypted page.
 	 */
-	if (KVM_BUG_ON(is_hkid_assigned(kvm_tdx), kvm))
-		return -EIO;
-
-	/*
-	 * The HKID assigned to this TD was already freed and cache was
-	 * already flushed. We don't have to flush again.
-	 */
-	return tdx_reclaim_page(virt_to_page(private_spt));
+	if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) ||
+	    tdx_reclaim_page(virt_to_page(sp->external_spt)))
+		sp->external_spt = NULL;
 }
 
 static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
-- 
2.53.0

From nobody Thu Apr 2 14:14:44 2026
From: Rick Edgecombe
To: seanjc@google.com, pbonzini@redhat.com, yan.y.zhao@intel.com, kai.huang@intel.com, kvm@vger.kernel.org, kas@kernel.org
Cc: linux-kernel@vger.kernel.org, x86@kernel.org, dave.hansen@intel.com, rick.p.edgecombe@intel.com
Subject: [PATCH 17/17] KVM: TDX: Move external page table freeing to TDX code
Date: Fri, 27 Mar 2026 13:14:21 -0700
Message-ID: <20260327201421.2824383-18-rick.p.edgecombe@intel.com>
In-Reply-To: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
References: <20260327201421.2824383-1-rick.p.edgecombe@intel.com>
From: Sean Christopherson

Move the freeing of external page tables into the reclaim operation that lives in TDX code.

The TDP MMU supports traversing the TDP structures without holding locks, so page tables need to be freed via RCU to prevent walking one that has been freed. While none of those lockless walks actually happen for the mirror EPT, the TDP MMU nonetheless frees the mirror EPT page tables in the same way and, because it's a handy place to plug it in, the external page tables as well.

However, the external page tables definitely can't be walked once they are reclaimed from the TDX module. The TDX module releases the page for the host VMM to use, so this RCU-time free is unnecessary for external page tables. So move the free_page() call to TDX code. Create tdp_mmu_free_unused_sp() to allow freeing external page tables that have never left the TDP MMU code (i.e. don't need to be freed in a special way).
Link: https://lore.kernel.org/kvm/aYpjNrtGmogNzqwT@google.com/
Not-yet-Signed-off-by: Sean Christopherson
[Based on a diff by Sean, added log]
Signed-off-by: Rick Edgecombe
---
 arch/x86/kvm/mmu/tdp_mmu.c | 16 +++++++++++-----
 arch/x86/kvm/vmx/tdx.c     | 11 ++++++++++-
 2 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 575033cc7fe4..18e11c1c7631 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -53,13 +53,18 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
 	rcu_barrier();
 }
 
-static void tdp_mmu_free_sp(struct kvm_mmu_page *sp)
+static void __tdp_mmu_free_sp(struct kvm_mmu_page *sp)
 {
-	free_page((unsigned long)sp->external_spt);
 	free_page((unsigned long)sp->spt);
 	kmem_cache_free(mmu_page_header_cache, sp);
 }
 
+static void tdp_mmu_free_unused_sp(struct kvm_mmu_page *sp)
+{
+	free_page((unsigned long)sp->external_spt);
+	__tdp_mmu_free_sp(sp);
+}
+
 /*
  * This is called through call_rcu in order to free TDP page table memory
 * safely with respect to other kernel threads that may be operating on
@@ -73,7 +78,8 @@ static void tdp_mmu_free_sp_rcu_callback(struct rcu_head *head)
 	struct kvm_mmu_page *sp = container_of(head, struct kvm_mmu_page,
 					       rcu_head);
 
-	tdp_mmu_free_sp(sp);
+	WARN_ON_ONCE(sp->external_spt);
+	__tdp_mmu_free_sp(sp);
 }
 
 void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root)
@@ -1261,7 +1267,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	 * failed, e.g. because a different task modified the SPTE.
 	 */
 	if (r) {
-		tdp_mmu_free_sp(sp);
+		tdp_mmu_free_unused_sp(sp);
 		goto retry;
 	}
 
@@ -1571,7 +1577,7 @@ static int tdp_mmu_split_huge_pages_root(struct kvm *kvm,
 	 * installs its own sp in place of the last sp we tried to split.
 	 */
 	if (sp)
-		tdp_mmu_free_sp(sp);
+		tdp_mmu_free_unused_sp(sp);
 
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index d064b40a6b31..1346e891ca94 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1782,7 +1782,16 @@ static void tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
 	 */
 	if (KVM_BUG_ON(is_hkid_assigned(to_kvm_tdx(kvm)), kvm) ||
 	    tdx_reclaim_page(virt_to_page(sp->external_spt)))
-		sp->external_spt = NULL;
+		goto out;
+
+	/*
+	 * Immediately free the S-EPT page as the TDX subsystem doesn't support
+	 * freeing pages from RCU callbacks, and more importantly because
+	 * TDH.PHYMEM.PAGE.RECLAIM ensures there are no outstanding readers.
+	 */
+	free_page((unsigned long)sp->external_spt);
+out:
+	sp->external_spt = NULL;
 }
 
 static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
-- 
2.53.0