Date: Wed, 26 Mar 2025 08:02:43 -0000
From: "tip-bot2 for Vishal Annapurve"
Sender: tip-bot2@linutronix.de
Reply-to: linux-kernel@vger.kernel.org
To: linux-tip-commits@vger.kernel.org
Subject: [tip: x86/tdx] x86/tdx: Fix arch_safe_halt() execution for TDX VMs
Cc: Vishal Annapurve, Ingo Molnar, "Kirill A. Shutemov", Ryan Afranji,
 Andy Lutomirski, Brian Gerst, Juergen Gross, "H. Peter Anvin",
 Linus Torvalds, Josh Poimboeuf, stable@vger.kernel.org, x86@kernel.org,
 linux-kernel@vger.kernel.org
In-Reply-To: <20250228014416.3925664-3-vannapurve@google.com>
References: <20250228014416.3925664-3-vannapurve@google.com>
Message-ID: <174297617047.14745.9690994192475300024.tip-bot2@tip-bot2>

The following commit has been merged into the x86/tdx branch of tip:

Commit-ID:     9f98a4f4e7216dbe366010b4cdcab6b220f229c4
Gitweb:        https://git.kernel.org/tip/9f98a4f4e7216dbe366010b4cdcab6b220f229c4
Author:        Vishal Annapurve
AuthorDate:    Fri, 28 Feb 2025 01:44:15
Committer:     Ingo Molnar
CommitterDate: Wed, 26 Mar 2025 08:51:20 +01:00

x86/tdx: Fix arch_safe_halt() execution for TDX VMs

Direct HLT instruction execution causes #VEs for TDX VMs which is routed
to hypervisor via TDCALL. If HLT is executed in STI-shadow, resulting #VE
handler will enable interrupts before TDCALL is routed to hypervisor
leading to missed wakeup events, as current TDX spec doesn't expose
interruptibility state information to allow #VE handler to selectively
enable interrupts.

Commit bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
prevented the idle routines from executing HLT instruction in STI-shadow.
But it missed the paravirt routine which can be reached via this path
as an example:

	kvm_wait()       =>
	safe_halt()      =>
	raw_safe_halt()  =>
	arch_safe_halt() =>
	irq.safe_halt()  =>
	pv_native_safe_halt()

To reliably handle arch_safe_halt() for TDX VMs, introduce explicit
dependency on CONFIG_PARAVIRT and override paravirt halt()/safe_halt()
routines with TDX-safe versions that execute direct TDCALL and needed
interrupt flag updates.
Executing direct TDCALL brings in additional benefit of avoiding HLT
related #VEs altogether.

As tested by Ryan Afranji:

  "Tested with the specjbb2015 benchmark. It has heavy lock contention
   which leads to many halt calls. TDX VMs suffered a poor score before
   this patchset. Verified the major performance improvement with this
   patchset applied."

Fixes: bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
Signed-off-by: Vishal Annapurve
Signed-off-by: Ingo Molnar
Reviewed-by: Kirill A. Shutemov
Tested-by: Ryan Afranji
Cc: Andy Lutomirski
Cc: Brian Gerst
Cc: Juergen Gross
Cc: H. Peter Anvin
Cc: Linus Torvalds
Cc: Josh Poimboeuf
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250228014416.3925664-3-vannapurve@google.com
---
 arch/x86/Kconfig           |  1 +
 arch/x86/coco/tdx/tdx.c    | 26 +++++++++++++++++++++++++-
 arch/x86/include/asm/tdx.h |  4 ++--
 arch/x86/kernel/process.c  |  2 +-
 4 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 05b4eca..f614c05 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -878,6 +878,7 @@ config INTEL_TDX_GUEST
 	depends on X86_64 && CPU_SUP_INTEL
 	depends on X86_X2APIC
 	depends on EFI_STUB
+	depends on PARAVIRT
 	select ARCH_HAS_CC_PLATFORM
 	select X86_MEM_ENCRYPT
 	select X86_MCE
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 7772b01..aa0eb40 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -398,7 +399,7 @@ static int handle_halt(struct ve_info *ve)
 	return ve_instr_len(ve);
 }
 
-void __cpuidle tdx_safe_halt(void)
+void __cpuidle tdx_halt(void)
 {
 	const bool irq_disabled = false;
 
@@ -409,6 +410,16 @@ void __cpuidle tdx_safe_halt(void)
 		WARN_ONCE(1, "HLT instruction emulation failed\n");
 }
 
+static void __cpuidle tdx_safe_halt(void)
+{
+	tdx_halt();
+	/*
+	 * "__cpuidle" section doesn't support instrumentation, so stick
+	 * with raw_* variant that avoids tracing hooks.
+	 */
+	raw_local_irq_enable();
+}
+
 static int read_msr(struct pt_regs *regs, struct ve_info *ve)
 {
 	struct tdx_module_args args = {
@@ -1110,6 +1121,19 @@ void __init tdx_early_init(void)
 	x86_platform.guest.enc_kexec_finish = tdx_kexec_finish;
 
 	/*
+	 * Avoid "sti;hlt" execution in TDX guests as HLT induces a #VE that
+	 * will enable interrupts before HLT TDCALL invocation if executed
+	 * in STI-shadow, possibly resulting in missed wakeup events.
+	 *
+	 * Modify all possible HLT execution paths to use TDX specific routines
+	 * that directly execute TDCALL and toggle the interrupt state as
+	 * needed after TDCALL completion. This also reduces HLT related #VEs
+	 * in addition to having a reliable halt logic execution.
+	 */
+	pv_ops.irq.safe_halt = tdx_safe_halt;
+	pv_ops.irq.halt = tdx_halt;
+
+	/*
 	 * TDX intercepts the RDMSR to read the X2APIC ID in the parallel
 	 * bringup low level code. That raises #VE which cannot be handled
 	 * there.
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 65394aa..4a1922e 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -58,7 +58,7 @@ void tdx_get_ve_info(struct ve_info *ve);
 
 bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve);
 
-void tdx_safe_halt(void);
+void tdx_halt(void);
 
 bool tdx_early_handle_ve(struct pt_regs *regs);
 
@@ -72,7 +72,7 @@ void __init tdx_dump_td_ctls(u64 td_ctls);
 #else
 
 static inline void tdx_early_init(void) { };
-static inline void tdx_safe_halt(void) { };
+static inline void tdx_halt(void) { };
 
 static inline bool tdx_early_handle_ve(struct pt_regs *regs) { return false; }
 
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 91f6ff6..962c3ce 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -939,7 +939,7 @@ void __init select_idle_routine(void)
 		static_call_update(x86_idle, mwait_idle);
 	} else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
 		pr_info("using TDX aware idle routine\n");
-		static_call_update(x86_idle, tdx_safe_halt);
+		static_call_update(x86_idle, tdx_halt);
 	} else {
 		static_call_update(x86_idle, default_idle);
 	}
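
Illustration only, not part of the commit: the override described in the
changelog can be modelled in plain userspace C. The sketch below assumes a
simplified ops table with two function pointers. Every identifier ending in
_model is hypothetical and merely stands in for the kernel's pv_ops.irq.halt
and pv_ops.irq.safe_halt hooks. It does not reproduce the missed-wakeup race
itself. It only shows that, once the table entries are replaced, a caller
such as arch_safe_halt() reaches the TDCALL-based routine without any change
at the call site.

```c
/*
 * Userspace model only, not kernel code. All *_model names are
 * hypothetical; the real hooks are pv_ops.irq.halt and
 * pv_ops.irq.safe_halt.
 */
#include <stdbool.h>

struct irq_ops_model {
	void (*halt)(void);      /* halt, leave the interrupt flag alone */
	void (*safe_halt)(void); /* halt, return with interrupts enabled */
};

static bool irqs_enabled_model; /* stands in for RFLAGS.IF */
static int hlt_count_model;     /* HLT executions, each would raise a #VE */
static int tdcall_count_model;  /* direct halt requests to the hypervisor */

/* Native defaults: execute the HLT instruction itself. */
static void native_halt_model(void)
{
	hlt_count_model++;
}

static void native_safe_halt_model(void)
{
	irqs_enabled_model = true; /* "sti; hlt": HLT runs in the STI shadow */
	hlt_count_model++;
}

/* TDX-style replacements: request the halt first, then fix up the flag. */
static void tdx_halt_model(void)
{
	tdcall_count_model++;
}

static void tdx_safe_halt_model(void)
{
	tdx_halt_model();
	irqs_enabled_model = true; /* mirrors raw_local_irq_enable() */
}

static struct irq_ops_model pv_irq_model = {
	.halt = native_halt_model,
	.safe_halt = native_safe_halt_model,
};

/* Mirrors the two assignments added to tdx_early_init(). */
static void tdx_early_init_model(void)
{
	pv_irq_model.safe_halt = tdx_safe_halt_model;
	pv_irq_model.halt = tdx_halt_model;
}

/* Mirrors arch_safe_halt(): always dispatches through the ops table. */
static void arch_safe_halt_model(void)
{
	pv_irq_model.safe_halt();
}
```

Calling arch_safe_halt_model() before tdx_early_init_model() increments
hlt_count_model; calling it afterwards increments tdcall_count_model instead
and still returns with irqs_enabled_model set.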