From nobody Thu Apr 9 15:50:06 2026 Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AE867284880; Sat, 7 Mar 2026 01:04:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.13 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772845454; cv=none; b=XFPOg+6SNpH/J38EGv1bHlimH/yZ6r/bu+qtiky94YFWIEBNU1qJijpwzlVWhPkcH54gHWotTeWpNj0nBgKJtq0Mg/Ya6nCv14zV5OmRKmUU0/uN51SW20Lrjrpx2M7BppOgaqLtAzJEeJkTkhIhTRj/yL4IBZljJKefvb5otRo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772845454; c=relaxed/simple; bh=g/D2jQlObfVMXooMIoY7xGv3HKXJWcnBX6Rf9c1bpsY=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Yaz4UyClWSYs3rueMvg5kKZ/EpYllN02J4nxYe+eZrtSLCadQur3OOamd9EyjfhVm97A9Z56nrZTi6hfDOEkX0nOSYturGa6m8wTA4qamxHxPXk+XxiWFjef9sVCdxTqtxfYWEg9q7OJx8t0M9CkYTUR3jYZC3AHNOfuI+wri8A= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Ur+CzO9e; arc=none smtp.client-ip=192.198.163.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Ur+CzO9e" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1772845451; x=1804381451; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=g/D2jQlObfVMXooMIoY7xGv3HKXJWcnBX6Rf9c1bpsY=; 
b=Ur+CzO9evmcJvXC92Nbkr4DN78PC/P+fZD9/oZSzJ4UZAl9nO3RKnEKB aSAzXxwCnJDKZfpM8SkIT9g8ERJdeJ63CkuNlaG2NSZvS5zq7f7nMHKcH E5Kw/AB5ALcy+N+3L/L+n9jAz+/aOhLZxzMlSV168ODNwjIFaM/LjEoYt MCKaRbhkxpJQ3qqNlv39N8gOaFrPhQ8Uawgs1JAr09cCHSD95ra2oVrNq iHrL0owJnitKUFQ5CUg+w7Osn8SIt69OA/go+kP+4BLsNsOqh/CUstZEr 2B9xd6gpRonDiG05EizeDcSrj4ZtJkU1+cb4PKAkyqWimDqIa6CSPsm+I w==; X-CSE-ConnectionGUID: hMWsm3KQTNG61OGNBrx9WQ== X-CSE-MsgGUID: hN/ZQvpERYe0risinNRc2g== X-IronPort-AV: E=McAfee;i="6800,10657,11721"; a="76565940" X-IronPort-AV: E=Sophos;i="6.23,105,1770624000"; d="scan'208";a="76565940" Received: from orviesa006.jf.intel.com ([10.64.159.146]) by fmvoesa107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Mar 2026 17:04:08 -0800 X-CSE-ConnectionGUID: FBMnce3WTIqNwEt6qBdX3w== X-CSE-MsgGUID: NSaiEWrsSaivLt3guM7IWg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,105,1770624000"; d="scan'208";a="218329617" Received: from rpedgeco-desk.jf.intel.com ([10.88.27.139]) by orviesa006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Mar 2026 17:04:08 -0800 From: Rick Edgecombe To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com, kas@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, mingo@redhat.com, pbonzini@redhat.com, seanjc@google.com, tglx@kernel.org, x86@kernel.org, chao.gao@intel.com, kai.huang@intel.com, ackerleytng@google.com Cc: rick.p.edgecombe@intel.com, vishal.l.verma@intel.com, "Kirill A. 
Shutemov" , Sagi Shahar , Vishal Annapurve
Subject: [PATCH 1/4] x86/tdx: Move all TDX error defines into
Date: Fri, 6 Mar 2026 17:03:55 -0800
Message-ID: <20260307010358.819645-2-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260307010358.819645-1-rick.p.edgecombe@intel.com>
References: <20260307010358.819645-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: "Kirill A. Shutemov"

Today there are two separate locations where TDX error codes are defined:

	arch/x86/include/asm/tdx.h
	arch/x86/kvm/vmx/tdx_errno.h

They have some overlap that is already defined similarly. Reduce the
duplication and prepare to introduce some helpers for these error codes in
the central place by unifying them. Join them at:

	asm/shared/tdx_errno.h

...and update the headers that contained the duplicated definitions to
include the new unified header.

Place the new header in "asm/shared". While the compressed code for the
guest doesn't use these error code header definitions today, it does make
the types of calls that return the values they define. Place the defines in
the "shared" location so that compressed code has the definitions
accessible, but leave cleanups to use proper error codes for future changes.

Opportunistically massage some comments. Also, adjust _BITUL()->_BITULL()
to address 32-bit build errors after the move.

Signed-off-by: Kirill A.
Shutemov
[enhance log]
Tested-by: Sagi Shahar
Signed-off-by: Sean Christopherson
Acked-by: Vishal Annapurve
Signed-off-by: Vishal Verma
Signed-off-by: Rick Edgecombe
Reviewed-by: Kai Huang
---
 arch/x86/include/asm/shared/tdx.h | 1 +
 .../vmx => include/asm/shared}/tdx_errno.h | 27 +++++++++++++++----
 arch/x86/include/asm/tdx.h | 20 --------------
 arch/x86/kvm/vmx/tdx.h | 1 -
 4 files changed, 23 insertions(+), 26 deletions(-)
 rename arch/x86/{kvm/vmx => include/asm/shared}/tdx_errno.h (65%)

diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index 8bc074c8d7c6..6a1646fc2b2f 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -4,6 +4,7 @@
 
 #include
 #include
+#include
 
 #define TDX_HYPERCALL_STANDARD 0
 
diff --git a/arch/x86/kvm/vmx/tdx_errno.h b/arch/x86/include/asm/shared/tdx_errno.h
similarity index 65%
rename from arch/x86/kvm/vmx/tdx_errno.h
rename to arch/x86/include/asm/shared/tdx_errno.h
index 6ff4672c4181..3aa74f6a6119 100644
--- a/arch/x86/kvm/vmx/tdx_errno.h
+++ b/arch/x86/include/asm/shared/tdx_errno.h
@@ -1,14 +1,16 @@
 /* SPDX-License-Identifier: GPL-2.0 */
-/* architectural status code for SEAMCALL */
+#ifndef _X86_SHARED_TDX_ERRNO_H
+#define _X86_SHARED_TDX_ERRNO_H
 
-#ifndef __KVM_X86_TDX_ERRNO_H
-#define __KVM_X86_TDX_ERRNO_H
+#include
 
+/* Upper 32 bit of the TDX error code encodes the status */
 #define TDX_SEAMCALL_STATUS_MASK 0xFFFFFFFF00000000ULL
 
 /*
- * TDX SEAMCALL Status Codes (returned in RAX)
+ * TDX SEAMCALL Status Codes
  */
+#define TDX_SUCCESS 0ULL
 #define TDX_NON_RECOVERABLE_VCPU 0x4000000100000000ULL
 #define TDX_NON_RECOVERABLE_TD 0x4000000200000000ULL
 #define TDX_NON_RECOVERABLE_TD_NON_ACCESSIBLE 0x6000000500000000ULL
@@ -17,6 +19,7 @@
 #define TDX_OPERAND_INVALID 0xC000010000000000ULL
 #define TDX_OPERAND_BUSY 0x8000020000000000ULL
 #define TDX_PREVIOUS_TLB_EPOCH_BUSY 0x8000020100000000ULL
+#define TDX_RND_NO_ENTROPY 0x8000020300000000ULL
 #define
TDX_PAGE_METADATA_INCORRECT 0xC000030000000000ULL
 #define TDX_VCPU_NOT_ASSOCIATED 0x8000070200000000ULL
 #define TDX_KEY_GENERATION_FAILED 0x8000080000000000ULL
@@ -28,6 +31,20 @@
 #define TDX_EPT_ENTRY_STATE_INCORRECT 0xC0000B0D00000000ULL
 #define TDX_METADATA_FIELD_NOT_READABLE 0xC0000C0200000000ULL
 
+/*
+ * SW-defined error codes.
+ *
+ * Bits 47:40 == 0xFF indicate Reserved status code class that never used by
+ * TDX module.
+ */
+#define TDX_ERROR _BITULL(63)
+#define TDX_NON_RECOVERABLE _BITULL(62)
+#define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40))
+#define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _ULL(0xFFFF0000))
+
+#define TDX_SEAMCALL_GP (TDX_SW_ERROR | X86_TRAP_GP)
+#define TDX_SEAMCALL_UD (TDX_SW_ERROR | X86_TRAP_UD)
+
 /*
  * TDX module operand ID, appears in 31:0 part of error code as
  * detail information
@@ -37,4 +54,4 @@
 #define TDX_OPERAND_ID_SEPT 0x92
 #define TDX_OPERAND_ID_TD_EPOCH 0xa9
 
-#endif /* __KVM_X86_TDX_ERRNO_H */
+#endif /* _X86_SHARED_TDX_ERRNO_H */
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index a149740b24e8..0c1ae4954f17 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -12,26 +12,6 @@
 #include
 #include
 
-/*
- * SW-defined error codes.
- *
- * Bits 47:40 == 0xFF indicate Reserved status code class that never used by
- * TDX module.
- */
-#define TDX_ERROR _BITUL(63)
-#define TDX_NON_RECOVERABLE _BITUL(62)
-#define TDX_SW_ERROR (TDX_ERROR | GENMASK_ULL(47, 40))
-#define TDX_SEAMCALL_VMFAILINVALID (TDX_SW_ERROR | _UL(0xFFFF0000))
-
-#define TDX_SEAMCALL_GP (TDX_SW_ERROR | X86_TRAP_GP)
-#define TDX_SEAMCALL_UD (TDX_SW_ERROR | X86_TRAP_UD)
-
-/*
- * TDX module SEAMCALL leaf function error codes
- */
-#define TDX_SUCCESS 0ULL
-#define TDX_RND_NO_ENTROPY 0x8000020300000000ULL
-
 #ifndef __ASSEMBLER__
 
 #include
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index b5cd2ffb303e..ac8323a68b16 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -3,7 +3,6 @@
 #define __KVM_X86_VMX_TDX_H
 
 #include "tdx_arch.h"
-#include "tdx_errno.h"
 
 #ifdef CONFIG_KVM_INTEL_TDX
 #include "common.h"
-- 
2.53.0

From nobody Thu Apr 9 15:50:06 2026
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id EC46A286412; Sat, 7 Mar 2026 01:04:11 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.13
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772845454; cv=none; b=GqyZUHE+EdIFnV6MDPCiuU2pa+m3p9SdEMedQqUCrBy18JmAPek0gTCKN5GPa6FHZmLtx5ZMHFk4My32pRg84hK+JmH6ew9RQQF/oisiTyoMcUyt9kbcvWzlCoeWnlrHYjwFu9ymtjQHbECjfeiJrUuVmeimCkV0vmc5adD+6t0=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772845454; c=relaxed/simple; bh=oTLfhSSlOmM92LJkqNFBzQ2K3fHxLneuG0MsPSGtmy0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=hHGEGKrXUbqLj5zZWk4A50rT0sWoUfF4NgRT6Tl5pZQuEu/u2NHUQFVtDftsUT5+fVxFZRFZHwO+o8R07yhxOOMMC3BqjhU8BT+AyLOLZr9msd5fRyCshpt1BSFyNUNWydgs81sD+lGYujIR5zPej3hullRwT7uPEkZiy47gD5k=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com;
spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=gxmL/SI6; arc=none smtp.client-ip=192.198.163.13 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="gxmL/SI6" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1772845452; x=1804381452; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=oTLfhSSlOmM92LJkqNFBzQ2K3fHxLneuG0MsPSGtmy0=; b=gxmL/SI6b32xMC1EdZSj5DWNmfK/qK2dWtDa8CN3zeEYc7IZKV3kYurA hrDmKd68mi3P1PV4MO1lqvWjbefR+b8MhkbjkXgcUmpe0l2F3OPS34OyC pqP+TmUXT2oN/Tmsg53r6s2+e2KUTkL0/7gSA1OX+Q99uunD/Iv6GqfiM pvr+1s18XP1Sx5sD8sVdqq4aLkgjJRdYjklaWZx49kNGLvsDvomgk1wLT ACQJxWAolrFUDsfJl4jUJJQZDAuA9gx7xNF5bEm39h0Rao3WVSqLh6u7r GG4lMk109tSQo8nS6k9U2DrzSsyrPzxEJcy6PbOKzso4l0kV9TNlKHqjN g==; X-CSE-ConnectionGUID: VUhxL4GKRUGX6VFw9ygfMA== X-CSE-MsgGUID: MBip9xhpTBWsmIYs2O26hw== X-IronPort-AV: E=McAfee;i="6800,10657,11721"; a="76565944" X-IronPort-AV: E=Sophos;i="6.23,105,1770624000"; d="scan'208";a="76565944" Received: from orviesa006.jf.intel.com ([10.64.159.146]) by fmvoesa107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Mar 2026 17:04:08 -0800 X-CSE-ConnectionGUID: +yrCuu/GSI6QofSK7FcTIw== X-CSE-MsgGUID: tmPMvj9KQfmWXtAEsa0ZTg== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,105,1770624000"; d="scan'208";a="218329622" Received: from rpedgeco-desk.jf.intel.com ([10.88.27.139]) by orviesa006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Mar 2026 17:04:08 -0800 From: Rick Edgecombe To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com, kas@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, 
mingo@redhat.com, pbonzini@redhat.com, seanjc@google.com, tglx@kernel.org, x86@kernel.org, chao.gao@intel.com, kai.huang@intel.com, ackerleytng@google.com
Cc: rick.p.edgecombe@intel.com, vishal.l.verma@intel.com
Subject: [PATCH 2/4] x86/virt/tdx: Pull kexec cache flush logic into arch/x86
Date: Fri, 6 Mar 2026 17:03:56 -0800
Message-ID: <20260307010358.819645-3-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260307010358.819645-1-rick.p.edgecombe@intel.com>
References: <20260307010358.819645-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

KVM tries to take care of some required cache flushing earlier in the
kexec path in order to be kind to some long-standing races that can occur
later in the operation. Until recently, VMXOFF was handled within KVM.
Since VMX being enabled is required to make a SEAMCALL, it had the best
per-cpu scoped operation to plug the flushing into.

This early kexec cache flushing in KVM happens via a syscore shutdown
callback. Now that VMX enablement control has moved to arch/x86, which has
grown its own syscore shutdown callback, it no longer makes sense for it to
live in KVM. It fits better with the TDX enablement managing code.

In addition, future changes will add a SEAMCALL that happens immediately
before VMXOFF, which means the cache flush in KVM will be too late to be
helpful. So move it to the newly added TDX arch/x86 syscore shutdown
handler.

Since tdx_cpu_flush_cache_for_kexec() is no longer needed by KVM, make it
static and remove the export. Since it is also not part of an operation
spread across disparate components, remove the redundant comments and
verbose naming.
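
For readers who want to poke at the flush-once behavior outside the kernel, here is a minimal userspace sketch. It is a model under stated assumptions, not the kernel code: the per-CPU cache_state_incoherent flag is replaced by a plain array, fake_wbinvd() only counts calls, and NR_FAKE_CPUS, mark_incoherent() and shutdown_cpu() are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_FAKE_CPUS 4	/* hypothetical CPU count for the model */

/* Stand-in for the kernel's per-CPU cache_state_incoherent flag. */
static bool cache_state_incoherent[NR_FAKE_CPUS];
static int wbinvd_calls;	/* how many times the "expensive" flush ran */

/* Stand-in for WBINVD: only counts invocations. */
static void fake_wbinvd(void)
{
	wbinvd_calls++;
}

/* Model of a SEAMCALL that may have dirtied private-memory cachelines. */
static void mark_incoherent(int cpu)
{
	cache_state_incoherent[cpu] = true;
}

/* Same shape as tdx_cpu_flush_cache() in the patch: flush only if needed. */
static void flush_cache(int cpu)
{
	if (!cache_state_incoherent[cpu])
		return;

	fake_wbinvd();
	cache_state_incoherent[cpu] = false;
}

/* Model of the per-CPU shutdown step: flush first, then give up VMX. */
static void shutdown_cpu(int cpu)
{
	flush_cache(cpu);
	/* In the kernel, x86_virt_put_ref(X86_FEATURE_VMX) follows here. */
}
```

Calling shutdown_cpu() on every modelled CPU flushes only those that were marked incoherent, and a second pass is free because the flag has been cleared.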
Signed-off-by: Rick Edgecombe
Acked-by: Kai Huang
---
 arch/x86/include/asm/tdx.h | 6 ------
 arch/x86/kvm/vmx/tdx.c | 10 ----------
 arch/x86/virt/vmx/tdx/tdx.c | 39 +++++++++++++++++++------------------
 3 files changed, 20 insertions(+), 35 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 0c1ae4954f17..f0826b0a512a 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -206,11 +206,5 @@ static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; }
 static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NULL; }
 #endif /* CONFIG_INTEL_TDX_HOST */
 
-#ifdef CONFIG_KEXEC_CORE
-void tdx_cpu_flush_cache_for_kexec(void);
-#else
-static inline void tdx_cpu_flush_cache_for_kexec(void) { }
-#endif
-
 #endif /* !__ASSEMBLER__ */
 #endif /* _ASM_X86_TDX_H */
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b7264b533feb..50a5cfdbd33e 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -440,16 +440,6 @@ void tdx_disable_virtualization_cpu(void)
 		tdx_flush_vp(&arg);
 	}
 	local_irq_restore(flags);
-
-	/*
-	 * Flush cache now if kexec is possible: this is necessary to avoid
-	 * having dirty private memory cachelines when the new kernel boots,
-	 * but WBINVD is a relatively expensive operation and doing it during
-	 * kexec can exacerbate races in native_stop_other_cpus(). Do it
-	 * now, since this is a safe moment and there is going to be no more
-	 * TDX activity on this CPU from this point on.
-	 */
-	tdx_cpu_flush_cache_for_kexec();
 }
 
 #define TDX_SEAMCALL_RETRIES 10000
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index cb9b3210ab71..0802d0fd18a4 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -224,8 +224,28 @@ static int tdx_offline_cpu(unsigned int cpu)
 	return 0;
 }
 
+static void tdx_cpu_flush_cache(void)
+{
+	lockdep_assert_preemption_disabled();
+
+	if (!this_cpu_read(cache_state_incoherent))
+		return;
+
+	wbinvd();
+	this_cpu_write(cache_state_incoherent, false);
+}
+
 static void tdx_shutdown_cpu(void *ign)
 {
+	/*
+	 * Flush cache now if kexec is possible: this is necessary to avoid
+	 * having dirty private memory cachelines when the new kernel boots,
+	 * but WBINVD is a relatively expensive operation and doing it during
+	 * kexec can exacerbate races in native_stop_other_cpus(). Do it
+	 * now, since this is a safe moment and there is going to be no more
+	 * TDX activity on this CPU from this point on.
+	 */
+	tdx_cpu_flush_cache();
 	x86_virt_put_ref(X86_FEATURE_VMX);
 }
 
@@ -1920,22 +1940,3 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page)
 	return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args);
 }
 EXPORT_SYMBOL_FOR_KVM(tdh_phymem_page_wbinvd_hkid);
-
-#ifdef CONFIG_KEXEC_CORE
-void tdx_cpu_flush_cache_for_kexec(void)
-{
-	lockdep_assert_preemption_disabled();
-
-	if (!this_cpu_read(cache_state_incoherent))
-		return;
-
-	/*
-	 * Private memory cachelines need to be clean at the time of
-	 * kexec. Write them back now, as the caller promises that
-	 * there should be no more SEAMCALLs on this CPU.
-	 */
-	wbinvd();
-	this_cpu_write(cache_state_incoherent, false);
-}
-EXPORT_SYMBOL_FOR_KVM(tdx_cpu_flush_cache_for_kexec);
-#endif
-- 
2.53.0

From nobody Thu Apr 9 15:50:06 2026
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id CDE7328851F; Sat, 7 Mar 2026 01:04:15 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.13
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772845458; cv=none; b=YP6UwELAqeKRtCFsTkfntR45X4htztgWfFEUBMbPkQXkaYmYq/t/HGVqtaOeu5DGDLCt/fNC+9sBjs+TuI+7wU3oOC1Erk9Ep104kc9xP+ErNZ9hPumdIbPJfJV3uj4qKwx6ZSGRSrzgtj251zeF+zCYn0ayNaf1u9TZ2IZy1Wg=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772845458; c=relaxed/simple; bh=Ib4A2dLqs3t1llyi121mdd/5hQlAcyxmpeDuX6pTBgM=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=QtftTb+eUsK9UA8jYHNVrg+RalqnRkgpz9YV0YCdcNmKfSyvKCLxzkxKG/nBbPwlA4RB5lbxEsRSSFsszD9ThJHLOjgl7JIRqRrU7rfvhK5j8gfN5YqsVsv+onUCuwJk+YNHarcKc8TEGEYyQ1eEr0FlKtjaR8xJf4ZFaMAdU/E=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=aPwR+Nn2; arc=none smtp.client-ip=192.198.163.13
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com
Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="aPwR+Nn2"
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1772845456; x=1804381456; h=from:to:cc:subject:date:message-id:in-reply-to:
references:mime-version:content-transfer-encoding; bh=Ib4A2dLqs3t1llyi121mdd/5hQlAcyxmpeDuX6pTBgM=; b=aPwR+Nn2gyZqWRTnI+hGzlr4sqMc8aERkS7r1pbshZXIqjHTmMYBq5kn u5zx17cb4W/JleA1tDylKfzmfb4wPhLGwg4maDbKUwjP3SKl3TeqOf6VB 4ySB3yCqqcdoJL/eEXZeF7p2/cWnmhHqkUlXvrhSLyQtSMMPUzjbODxI3 JftMkB4UabhmNeG6jj/OhhL2rFxWbFqo/stss2qyCWef08EgecABrM2Lr /3YaB7B+46+WUnHZ/9jHdNFowoesRWzsAtchb2wlCBU05Tun+JjJt2rzd dapMYmaEyzoumbmvMrvxqxN1Tyf+U8lpx3tHwxVKavMk81bnYNXSccr3h g==; X-CSE-ConnectionGUID: nKMn6jmsR0iVC5jAvfgu3g== X-CSE-MsgGUID: BcqkT5oMQS+0D3K2DdyD2A== X-IronPort-AV: E=McAfee;i="6800,10657,11721"; a="76565951" X-IronPort-AV: E=Sophos;i="6.23,105,1770624000"; d="scan'208";a="76565951" Received: from orviesa006.jf.intel.com ([10.64.159.146]) by fmvoesa107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Mar 2026 17:04:08 -0800 X-CSE-ConnectionGUID: z6oddU4cTMajb1kLg7sgzg== X-CSE-MsgGUID: FKN6yTkOSg2ojlhEfXf0bQ== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,105,1770624000"; d="scan'208";a="218329625" Received: from rpedgeco-desk.jf.intel.com ([10.88.27.139]) by orviesa006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Mar 2026 17:04:08 -0800 From: Rick Edgecombe To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com, kas@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, mingo@redhat.com, pbonzini@redhat.com, seanjc@google.com, tglx@kernel.org, x86@kernel.org, chao.gao@intel.com, kai.huang@intel.com, ackerleytng@google.com Cc: rick.p.edgecombe@intel.com, vishal.l.verma@intel.com Subject: [PATCH 3/4] x86/virt/tdx: Add SEAMCALL wrapper for TDH.SYS.DISABLE Date: Fri, 6 Mar 2026 17:03:57 -0800 Message-ID: <20260307010358.819645-4-rick.p.edgecombe@intel.com> X-Mailer: git-send-email 2.53.0 In-Reply-To: <20260307010358.819645-1-rick.p.edgecombe@intel.com> References: <20260307010358.819645-1-rick.p.edgecombe@intel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: 
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Vishal Verma

Some early TDX-capable platforms have an erratum where a partial write to
TDX private memory can cause a machine check on a subsequent read. kexec
and kdump have been disabled on these platforms because the old kernel
cannot safely hand off TDX state to the new kernel. Later TDX modules
support the TDH.SYS.DISABLE SEAMCALL, which provides a way to cleanly
disable TDX and allow kexec to proceed.

This can be a long-running operation, and the time needed largely depends
on the amount of memory that has been allocated to TDs. If all TDs have
been destroyed prior to the sys_disable call, then it is fast, needing only
to override the TDX module memory.

After the SEAMCALL completes, the TDX module is disabled and all memory
resources allocated to TDX are freed and reset. The next kernel can then
re-initialize the TDX module from scratch via the normal TDX bring-up
sequence.

The SEAMCALL can be cut short by an interrupt. In that case it returns
TDX_INTERRUPTED_RESUMABLE and must be retried in a loop until the
operation completes.

Add a tdx_sys_disable() helper, which implements the retry loop around the
SEAMCALL to provide this functionality.
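
The retry pattern can be tried out in userspace with a mock. The sketch below is illustrative only: fake_sys_disable(), sys_disable_retry(), pages_left and the FAKE_* constants are hypothetical stand-ins (the constant values are placeholders, not taken from this patch), and the "forward progress" guarantee is modelled as each call scrubbing a nonzero budget of pages before being interrupted.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder status values for the model; not taken from the patch. */
#define FAKE_SUCCESS			0ULL
#define FAKE_INTERRUPTED_RESUMABLE	0x8000000300000000ULL

static unsigned int pages_left;		/* work the fake module still has to do */
static unsigned int seamcall_count;	/* number of invocations so far */

/*
 * Fake TDH.SYS.DISABLE: each call scrubs up to `budget` pages before an
 * "interrupt" arrives. Progress made by earlier calls is kept, which is
 * what makes an unconditional retry loop terminate (for budget > 0).
 */
static uint64_t fake_sys_disable(unsigned int budget)
{
	seamcall_count++;

	if (pages_left > budget) {
		pages_left -= budget;
		return FAKE_INTERRUPTED_RESUMABLE;
	}

	pages_left = 0;
	return FAKE_SUCCESS;
}

/* Retry for as long as the call reports "interrupted, resumable". */
static uint64_t sys_disable_retry(unsigned int budget)
{
	uint64_t ret;

	do {
		ret = fake_sys_disable(budget);
	} while (ret == FAKE_INTERRUPTED_RESUMABLE);

	return ret;
}
```

With 10 pages outstanding and a budget of 4 pages per call, the loop needs three invocations. Note that, like the helper in this patch, the loop exits on any status other than the resumable one, so a caller that cares about failures would have to inspect the final return value.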
Signed-off-by: Vishal Verma
Signed-off-by: Rick Edgecombe
---
 arch/x86/include/asm/tdx.h | 3 +++
 arch/x86/virt/vmx/tdx/tdx.c | 18 ++++++++++++++++++
 arch/x86/virt/vmx/tdx/tdx.h | 1 +
 3 files changed, 22 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index f0826b0a512a..baaf43a09e99 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -173,6 +173,8 @@ static inline int pg_level_to_tdx_sept_level(enum pg_level level)
 	return level - 1;
 }
 
+void tdx_sys_disable(void);
+
 u64 tdh_vp_enter(struct tdx_vp *vp, struct tdx_module_args *args);
 u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
 u64 tdh_mem_page_add(struct tdx_td *td, u64 gpa, struct page *page, struct page *source, u64 *ext_err1, u64 *ext_err2);
@@ -204,6 +206,7 @@ static inline void tdx_init(void) { }
 static inline u32 tdx_get_nr_guest_keyids(void) { return 0; }
 static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; }
 static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NULL; }
+static inline void tdx_sys_disable(void) { }
 #endif /* CONFIG_INTEL_TDX_HOST */
 
 #endif /* !__ASSEMBLER__ */
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 0802d0fd18a4..68bd2618dde4 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1940,3 +1941,20 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page)
 	return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args);
 }
 EXPORT_SYMBOL_FOR_KVM(tdh_phymem_page_wbinvd_hkid);
+
+void tdx_sys_disable(void)
+{
+	struct tdx_module_args args = {};
+
+	/*
+	 * SEAMCALLs that can return TDX_INTERRUPTED_RESUMABLE are guaranteed
+	 * to make forward progress between interrupts, so it is safe to loop
+	 * unconditionally here.
+	 *
+	 * This is a 'destructive' SEAMCALL, in that no other SEAMCALL can be
+	 * run after this until a full reinitialization is done.
+	 */
+	while (seamcall(TDH_SYS_DISABLE, &args) == TDX_INTERRUPTED_RESUMABLE)
+		;
+}
+
diff --git a/arch/x86/virt/vmx/tdx/tdx.h b/arch/x86/virt/vmx/tdx/tdx.h
index dde219c823b4..e2cf2dd48755 100644
--- a/arch/x86/virt/vmx/tdx/tdx.h
+++ b/arch/x86/virt/vmx/tdx/tdx.h
@@ -46,6 +46,7 @@
 #define TDH_PHYMEM_PAGE_WBINVD 41
 #define TDH_VP_WR 43
 #define TDH_SYS_CONFIG 45
+#define TDH_SYS_DISABLE 69
 
 /*
  * SEAMCALL leaf:
-- 
2.53.0

From nobody Thu Apr 9 15:50:06 2026
Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.13]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id D4682288530; Sat, 7 Mar 2026 01:04:15 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.13
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772845457; cv=none; b=uHAzxHzAYhjkbF5RmHQuewWbUKngHAtX07zOY+BgmJ+Vno1wFLGiG0Tp263a5OLsy7y0YBU7DYicPuRhfVtaDzATFbRh5YIsdN4zoBahHrukgthw1ndA+Vkb/eehryWsx5MN53vr2vZK6JzxHMeil1uWxG6obuplZIO8BKpuEEk=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1772845457; c=relaxed/simple; bh=Y9KMueimA6TuqteFC7M2S3rpvtCygw0L+mAHMgMh3rQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=SpEVDxlM6+/qt/a88IQIYXyT8ethb64We7jAHmJk8hyiTZ9sEapp4rJBaOismq/ho6ziB74v2bSntwurqE5PxI4qHnhu28jKqv0JFH+bkVk6JVLtTlaI7ETtRmLkAeXEhTsJ+kBMd+H7q6FjcTUOJdxdoXMo1VwQ9LJUyq4fSNQ=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Y+OM1eSm; arc=none smtp.client-ip=192.198.163.13
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none)
header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Y+OM1eSm" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1772845456; x=1804381456; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Y9KMueimA6TuqteFC7M2S3rpvtCygw0L+mAHMgMh3rQ=; b=Y+OM1eSm2SymZ4SFan+19WU+OsOlyfFFUCwxVnD3du5AkowSb7cpmu7V NVz4E0kKsLdK2hYHCw1rzcfg/UJK19eNUyzcDgmnV2dyjOlQ9YZ0tUEM8 u8fG8sJzkoGe/Ck/lv3bQ20yz68HaDwcGypE0bwbPq9PKbJ0Fynx2bf19 gePVH9o0F3k+KrchyPFG3MU6gRgMKgmUmyMhll2JMb7BTOIwcCIRgSUEv rd6hNX5Rieedd4lRhl7OSle7xqbsOa4hwUakQMmeObYvegdq/3SRTqxoq 94Xnr9mhEZS/BGB8RrcUJLEyZZce0Npt+lqKwgMkURPEHLsMC6Xx7A0g6 w==; X-CSE-ConnectionGUID: QOwr+Sh9RJumJQ14YB2BpA== X-CSE-MsgGUID: MpxwBD9rS5SKPXKfdQ0tXQ== X-IronPort-AV: E=McAfee;i="6800,10657,11721"; a="76565957" X-IronPort-AV: E=Sophos;i="6.23,105,1770624000"; d="scan'208";a="76565957" Received: from orviesa006.jf.intel.com ([10.64.159.146]) by fmvoesa107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Mar 2026 17:04:08 -0800 X-CSE-ConnectionGUID: LcrIWqwxSGms1ueyVs2t3g== X-CSE-MsgGUID: eytv7WfRQvemAVuA54gpAA== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.23,105,1770624000"; d="scan'208";a="218329628" Received: from rpedgeco-desk.jf.intel.com ([10.88.27.139]) by orviesa006-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 06 Mar 2026 17:04:08 -0800 From: Rick Edgecombe To: bp@alien8.de, dave.hansen@intel.com, hpa@zytor.com, kas@kernel.org, kvm@vger.kernel.org, linux-coco@lists.linux.dev, linux-kernel@vger.kernel.org, mingo@redhat.com, pbonzini@redhat.com, seanjc@google.com, tglx@kernel.org, x86@kernel.org, chao.gao@intel.com, kai.huang@intel.com, ackerleytng@google.com Cc: rick.p.edgecombe@intel.com, vishal.l.verma@intel.com Subject: [PATCH 4/4] KVM: 
x86: Disable the TDX module during kexec and kdump
Date: Fri, 6 Mar 2026 17:03:58 -0800
Message-ID: <20260307010358.819645-5-rick.p.edgecombe@intel.com>
X-Mailer: git-send-email 2.53.0
In-Reply-To: <20260307010358.819645-1-rick.p.edgecombe@intel.com>
References: <20260307010358.819645-1-rick.p.edgecombe@intel.com>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Vishal Verma

Use the TDH.SYS.DISABLE SEAMCALL, which disables the TDX module, reclaims
all memory resources assigned to TDX, and clears any partial-write induced
poison, to allow kexec and kdump on platforms with the partial write
erratum.

On TDX-capable platforms with the partial write erratum, kexec has been
disabled because the new kernel could hit a machine check reading a
previously poisoned memory location.

Later TDX modules support TDH.SYS.DISABLE, which disables the module and
reclaims all TDX memory resources, allowing the new kernel to re-initialize
TDX from scratch. This operation also clears the old memory, cleaning up
any poison.

Add tdx_sys_disable() to tdx_shutdown(), which is called in the
syscore_shutdown path for kexec. This is done just before tdx_shutdown()
disables VMX on all CPUs.

For kdump, call tdx_sys_disable() in the crash path before
x86_virt_emergency_disable_virtualization_cpu() does VMXOFF.

Since this clears any poison on TDX-managed memory, the X86_BUG_TDX_PW_MCE
check in machine_kexec_prepare() that blocked kexec on platforms with the
partial write erratum can be removed.
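
Why the call has to land before VMXOFF can be illustrated with a small userspace model. This is a sketch under assumptions, not kernel code: BIT_ULL() and GENMASK_ULL() are local stand-ins for the kernel macros, fake_sys_disable(), fake_vmxoff() and reset_model() are hypothetical, and the model assumes that a SEAMCALL issued outside VMX operation surfaces as the software-defined TDX_SEAMCALL_UD code from patch 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel macros used in tdx_errno.h. */
#define BIT_ULL(n)		(1ULL << (n))
#define GENMASK_ULL(h, l)	((~0ULL << (l)) & (~0ULL >> (63 - (h))))
#define X86_TRAP_UD		6	/* invalid-opcode exception vector */

/* Same composition as the defines moved by patch 1. */
#define TDX_SUCCESS		0ULL
#define TDX_ERROR		BIT_ULL(63)
#define TDX_SW_ERROR		(TDX_ERROR | GENMASK_ULL(47, 40))
#define TDX_SEAMCALL_UD		(TDX_SW_ERROR | X86_TRAP_UD)

static bool vmx_on = true;		/* model: CPU is in VMX operation */
static bool module_enabled = true;	/* model: TDX module is initialized */

/* Fake TDH.SYS.DISABLE: only works while VMX is still on. */
static uint64_t fake_sys_disable(void)
{
	if (!vmx_on)
		return TDX_SEAMCALL_UD;	/* assumed outcome after VMXOFF */

	module_enabled = false;
	return TDX_SUCCESS;
}

/* Fake VMXOFF: from here on, SEAMCALLs can no longer be issued. */
static void fake_vmxoff(void)
{
	vmx_on = false;
}

/* Put the model back into its initial state. */
static void reset_model(void)
{
	vmx_on = true;
	module_enabled = true;
}
```

In the model, disable-then-VMXOFF leaves the module disabled, while the reverse order returns the software-defined error and leaves the module enabled, which is the situation the placement in tdx_shutdown() and native_machine_crash_shutdown() avoids.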
Signed-off-by: Vishal Verma
Signed-off-by: Rick Edgecombe
---
 arch/x86/kernel/crash.c | 2 ++
 arch/x86/kernel/machine_kexec_64.c | 16 ----------------
 arch/x86/virt/vmx/tdx/tdx.c | 1 +
 3 files changed, 3 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index cd796818d94d..623d4474631a 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -38,6 +38,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -112,6 +113,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 
 	crash_smp_send_stop();
 
+	tdx_sys_disable();
 	x86_virt_emergency_disable_virtualization_cpu();
 
 	/*
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 0590d399d4f1..c3f4a389992d 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -347,22 +347,6 @@ int machine_kexec_prepare(struct kimage *image)
 	unsigned long reloc_end = (unsigned long)__relocate_kernel_end;
 	int result;
 
-	/*
-	 * Some early TDX-capable platforms have an erratum. A kernel
-	 * partial write (a write transaction of less than cacheline
-	 * lands at memory controller) to TDX private memory poisons that
-	 * memory, and a subsequent read triggers a machine check.
-	 *
-	 * On those platforms the old kernel must reset TDX private
-	 * memory before jumping to the new kernel otherwise the new
-	 * kernel may see unexpected machine check. For simplicity
-	 * just fail kexec/kdump on those platforms.
-	 */
-	if (boot_cpu_has_bug(X86_BUG_TDX_PW_MCE)) {
-		pr_info_once("Not allowed on platform with tdx_pw_mce bug\n");
-		return -EOPNOTSUPP;
-	}
-
 	/* Setup the identity mapped 64bit page table */
 	result = init_pgtable(image, __pa(control_page));
 	if (result)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 68bd2618dde4..b388fbce5d76 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -252,6 +252,7 @@ static void tdx_shutdown_cpu(void *ign)
 
 static void tdx_shutdown(void *ign)
 {
+	tdx_sys_disable();
 	on_each_cpu(tdx_shutdown_cpu, NULL, 1);
 }
 
-- 
2.53.0