Direct execution of the HLT instruction causes a #VE in TDX VMs, and the
#VE handler routes the halt request to the hypervisor via TDCALL. If HLT
is executed in the STI-shadow, the resulting #VE handler will enable
interrupts before the TDCALL is routed to the hypervisor, leading to
missed wakeup events.

The current TDX spec doesn't expose interruptibility state information
that would allow the #VE handler to selectively enable interrupts. To
work around this issue, TDX VMs need to replace "sti;hlt" execution with
a direct TDCALL followed by an explicit interrupt flag update.

Commit bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
prevented the idle routines from executing the HLT instruction in the
STI-shadow. But it missed the paravirt routine, which can be reached,
for example, like this:
	acpi_safe_halt() =>
	raw_safe_halt() =>
	arch_safe_halt() =>
	irq.safe_halt() =>
	pv_native_safe_halt()

To reliably handle arch_safe_halt() for TDX VMs, introduce an explicit
dependency on CONFIG_PARAVIRT and override the paravirt halt()/safe_halt()
routines with TDX-safe versions that execute a direct TDCALL and perform
the needed interrupt flag updates. Executing a direct TDCALL brings the
additional benefit of avoiding HLT related #VEs altogether.

Cc: stable@vger.kernel.org
Fixes: bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
Signed-off-by: Vishal Annapurve <vannapurve@google.com>
---
arch/x86/Kconfig | 1 +
arch/x86/coco/tdx/tdx.c | 26 +++++++++++++++++++++++++-
arch/x86/include/asm/tdx.h | 2 +-
arch/x86/kernel/process.c | 2 +-
4 files changed, 28 insertions(+), 3 deletions(-)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index be2c311f5118..933c046e8966 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -902,6 +902,7 @@ config INTEL_TDX_GUEST
depends on X86_64 && CPU_SUP_INTEL
depends on X86_X2APIC
depends on EFI_STUB
+ depends on PARAVIRT
select ARCH_HAS_CC_PLATFORM
select X86_MEM_ENCRYPT
select X86_MCE
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 32809a06dab4..6aad910d119d 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -14,6 +14,7 @@
#include <asm/ia32.h>
#include <asm/insn.h>
#include <asm/insn-eval.h>
+#include <asm/paravirt_types.h>
#include <asm/pgtable.h>
#include <asm/set_memory.h>
#include <asm/traps.h>
@@ -398,7 +399,7 @@ static int handle_halt(struct ve_info *ve)
return ve_instr_len(ve);
}
-void __cpuidle tdx_safe_halt(void)
+void __cpuidle tdx_halt(void)
{
const bool irq_disabled = false;
@@ -409,6 +410,16 @@ void __cpuidle tdx_safe_halt(void)
WARN_ONCE(1, "HLT instruction emulation failed\n");
}
+static void __cpuidle tdx_safe_halt(void)
+{
+ tdx_halt();
+ /*
+ * "__cpuidle" section doesn't support instrumentation, so stick
+ * with raw_* variant that avoids tracing hooks.
+ */
+ raw_local_irq_enable();
+}
+
static int read_msr(struct pt_regs *regs, struct ve_info *ve)
{
struct tdx_module_args args = {
@@ -1109,6 +1120,19 @@ void __init tdx_early_init(void)
x86_platform.guest.enc_kexec_begin = tdx_kexec_begin;
x86_platform.guest.enc_kexec_finish = tdx_kexec_finish;
+ /*
+ * Avoid "sti;hlt" execution in TDX guests as HLT induces a #VE that
+ * will enable interrupts before HLT TDCALL invocation if executed
+ * in STI-shadow, possibly resulting in missed wakeup events.
+ *
+ * Modify all possible HLT execution paths to use TDX specific routines
+ * that directly execute TDCALL and toggle the interrupt state as
+ * needed after TDCALL completion. This also reduces HLT related #VEs
+ * in addition to having a reliable halt logic execution.
+ */
+ pv_ops.irq.safe_halt = tdx_safe_halt;
+ pv_ops.irq.halt = tdx_halt;
+
/*
* TDX intercepts the RDMSR to read the X2APIC ID in the parallel
* bringup low level code. That raises #VE which cannot be handled
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index b4b16dafd55e..393ee2dfaab1 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -58,7 +58,7 @@ void tdx_get_ve_info(struct ve_info *ve);
bool tdx_handle_virt_exception(struct pt_regs *regs, struct ve_info *ve);
-void tdx_safe_halt(void);
+void tdx_halt(void);
bool tdx_early_handle_ve(struct pt_regs *regs);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 6da6769d7254..d11956a178df 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -934,7 +934,7 @@ void __init select_idle_routine(void)
static_call_update(x86_idle, mwait_idle);
} else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
pr_info("using TDX aware idle routine\n");
- static_call_update(x86_idle, tdx_safe_halt);
+ static_call_update(x86_idle, tdx_halt);
} else {
static_call_update(x86_idle, default_idle);
}
--
2.48.1.658.g4767266eb4-goog
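
[Editorial sketch, not part of the posted patch.] The ordering problem the
patch addresses can be illustrated with a small user-space model. Everything
below is hypothetical: the model_* names, the boolean standing in for the
interrupt flag, and the stub standing in for the halt TDCALL are assumptions
made for illustration and do not exist in the kernel. The model only records
whether interrupts were already enabled when the halt request was issued, and
shows how replacing the function pointers changes that.

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
static bool model_irqs_on;         /* stands in for the interrupt flag */
static bool model_irqs_on_at_call; /* flag state seen when the halt TDCALL is issued */

/* Stand-in for the halt TDCALL: it only records the interrupt state. */
static void model_halt_tdcall(void)
{
	model_irqs_on_at_call = model_irqs_on;
}

/*
 * Models "sti;hlt" in a TDX guest as described in the commit message:
 * interrupts are already enabled by the time the #VE path issues the TDCALL.
 */
static void model_native_safe_halt(void)
{
	model_irqs_on = true;
	model_halt_tdcall();
}

/* Models a plain "hlt": the #VE path issues the TDCALL, flag untouched. */
static void model_native_halt(void)
{
	model_halt_tdcall();
}

/* Models tdx_halt(): issue the TDCALL directly, no #VE involved. */
static void model_tdx_halt(void)
{
	model_halt_tdcall();
}

/* Models tdx_safe_halt(): TDCALL first, enable interrupts only afterwards. */
static void model_tdx_safe_halt(void)
{
	model_tdx_halt();
	model_irqs_on = true;
}

/* Simplified stand-in for the halt-related members of pv_ops.irq. */
struct model_pv_irq_ops {
	void (*safe_halt)(void);
	void (*halt)(void);
};

static struct model_pv_irq_ops model_pv_irq = {
	.safe_halt = model_native_safe_halt,
	.halt      = model_native_halt,
};

/* Mirrors the two assignments the patch adds to tdx_early_init(). */
static void model_tdx_early_init(void)
{
	model_pv_irq.safe_halt = model_tdx_safe_halt;
	model_pv_irq.halt      = model_tdx_halt;
}

/*
 * Calls safe_halt from an interrupts-off state and reports whether
 * interrupts were already on when the halt TDCALL was issued.
 */
static bool model_safe_halt_irqs_on_at_call(void)
{
	model_irqs_on = false;
	model_pv_irq.safe_halt();
	return model_irqs_on_at_call;
}
```

In this model, model_safe_halt_irqs_on_at_call() reports that interrupts were
on at the time of the call with the native pointers, and off once
model_tdx_early_init() has run. In both cases interrupts are enabled when
safe_halt returns. The model says nothing about what the real TDCALL passes
to the hypervisor.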
Hi Vishal,
kernel test robot noticed the following build errors:
[auto build test ERROR on tip/x86/core]
[also build test ERROR on tip/master linus/master v6.14-rc4 next-20250227]
[cannot apply to tip/x86/tdx tip/auto-latest]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]
url: https://github.com/intel-lab-lkp/linux/commits/Vishal-Annapurve/x86-paravirt-Move-halt-paravirt-calls-under-CONFIG_PARAVIRT/20250225-085043
base: tip/x86/core
patch link: https://lore.kernel.org/r/20250225004704.603652-3-vannapurve%40google.com
patch subject: [PATCH v6 2/3] x86/tdx: Fix arch_safe_halt() execution for TDX VMs
config: i386-buildonly-randconfig-003-20250227 (https://download.01.org/0day-ci/archive/20250227/202502272346.iiQ6Dptt-lkp@intel.com/config)
compiler: clang version 19.1.7 (https://github.com/llvm/llvm-project cd708029e0b2869e80abe31ddb175f7c35361f90)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250227/202502272346.iiQ6Dptt-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202502272346.iiQ6Dptt-lkp@intel.com/
All errors (new ones prefixed by >>):
In file included from arch/x86/kernel/process.c:6:
In file included from include/linux/mm.h:2224:
include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
504 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
505 | item];
| ~~~~
include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
511 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
512 | NR_VM_NUMA_EVENT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~~
>> arch/x86/kernel/process.c:937:32: error: use of undeclared identifier 'tdx_halt'; did you mean 'tdx_init'?
937 | static_call_update(x86_idle, tdx_halt);
| ^~~~~~~~
| tdx_init
include/linux/static_call.h:154:42: note: expanded from macro 'static_call_update'
154 | typeof(&STATIC_CALL_TRAMP(name)) __F = (func); \
| ^
arch/x86/include/asm/tdx.h:123:20: note: 'tdx_init' declared here
123 | static inline void tdx_init(void) { }
| ^
2 warnings and 1 error generated.
vim +937 arch/x86/kernel/process.c
919
920 void __init select_idle_routine(void)
921 {
922 if (boot_option_idle_override == IDLE_POLL) {
923 if (IS_ENABLED(CONFIG_SMP) && __max_threads_per_core > 1)
924 pr_warn_once("WARNING: polling idle and HT enabled, performance may degrade\n");
925 return;
926 }
927
928 /* Required to guard against xen_set_default_idle() */
929 if (x86_idle_set())
930 return;
931
932 if (prefer_mwait_c1_over_halt()) {
933 pr_info("using mwait in idle threads\n");
934 static_call_update(x86_idle, mwait_idle);
935 } else if (cpu_feature_enabled(X86_FEATURE_TDX_GUEST)) {
936 pr_info("using TDX aware idle routine\n");
> 937 static_call_update(x86_idle, tdx_halt);
938 } else {
939 static_call_update(x86_idle, default_idle);
940 }
941 }
942
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
On Thu, Feb 27, 2025 at 8:25 AM kernel test robot <lkp@intel.com> wrote:
>
> Hi Vishal,
>
> kernel test robot noticed the following build errors:
>
> [auto build test ERROR on tip/x86/core]
> [also build test ERROR on tip/master linus/master v6.14-rc4 next-20250227]
> [cannot apply to tip/x86/tdx tip/auto-latest]
> ...
> All errors (new ones prefixed by >>):
>
> In file included from arch/x86/kernel/process.c:6:
> In file included from include/linux/mm.h:2224:
> include/linux/vmstat.h:504:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
> 504 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
> | ~~~~~~~~~~~~~~~~~~~~~ ^
> 505 | item];
> | ~~~~
> include/linux/vmstat.h:511:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
> 511 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
> | ~~~~~~~~~~~~~~~~~~~~~ ^
> 512 | NR_VM_NUMA_EVENT_ITEMS +
> | ~~~~~~~~~~~~~~~~~~~~~~
> >> arch/x86/kernel/process.c:937:32: error: use of undeclared identifier 'tdx_halt'; did you mean 'tdx_init'?
> 937 | static_call_update(x86_idle, tdx_halt);
> | ^~~~~~~~
> | tdx_init
Will fix this in the next version.
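
[Editorial sketch, not part of the thread.] The failing config is an i386
build, and INTEL_TDX_GUEST depends on X86_64, so process.c is compiled there
against the stub branch of asm/tdx.h (the robot's note about the tdx_init()
stub points the same way). One plausible cause, assuming the stub branch still
declares the halt helper under its old name, is that the rename was applied
only to the real prototype. The snippet below is a hypothetical user-space
model of that header pattern, not the actual header contents. The *_model
names are invented for illustration.

```c
/*
 * Hypothetical model of a header with a real prototype and a no-op stub.
 * Both branches must use the same name, or callers built without the
 * feature fail with "use of undeclared identifier".
 */
#ifdef CONFIG_INTEL_TDX_GUEST
void tdx_halt(void);
#else
static inline void tdx_halt(void) { }
#endif

typedef void (*idle_fn_model)(void);

/* Stand-in for the x86_idle static call target. */
static idle_fn_model x86_idle_model;

/* Stand-in for default_idle(). */
static void default_idle_model(void) { }

/* Simplified version of the branch touched in select_idle_routine(). */
static void select_idle_routine_model(int is_tdx_guest)
{
	if (is_tdx_guest)
		x86_idle_model = tdx_halt;
	else
		x86_idle_model = default_idle_model;
}
```

With CONFIG_INTEL_TDX_GUEST undefined, the reference to tdx_halt resolves to
the no-op stub, so the caller compiles regardless of the config.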
On Tue, Feb 25, 2025 at 12:47:03AM +0000, Vishal Annapurve wrote:
> Direct HLT instruction execution causes #VEs for TDX VMs which is routed
> to hypervisor via TDCALL. If HLT is executed in STI-shadow, resulting #VE
> handler will enable interrupts before TDCALL is routed to hypervisor
> leading to missed wakeup events.
>
> Current TDX spec doesn't expose interruptibility state information to
> allow #VE handler to selectively enable interrupts. To bypass this
> issue, TDX VMs need to replace "sti;hlt" execution with direct TDCALL
> followed by explicit interrupt flag update.
>
> Commit bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
> prevented the idle routines from executing HLT instruction in STI-shadow.
> But it missed the paravirt routine which can be reached like this as an
> example:
> acpi_safe_halt() =>
> raw_safe_halt() =>
> arch_safe_halt() =>
> irq.safe_halt() =>
> pv_native_safe_halt()
I would rather use paravirt spinlock example. It is less controversial.
I still see no point in ACPI cpuidle be a thing in TDX guests.
>
> To reliably handle arch_safe_halt() for TDX VMs, introduce explicit
> dependency on CONFIG_PARAVIRT and override paravirt halt()/safe_halt()
> routines with TDX-safe versions that execute direct TDCALL and needed
> interrupt flag updates. Executing direct TDCALL brings in additional
> benefit of avoiding HLT related #VEs altogether.
>
> Cc: stable@vger.kernel.org
> Fixes: bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
> Signed-off-by: Vishal Annapurve <vannapurve@google.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
--
Kiryl Shutsemau / Kirill A. Shutemov
On Wed, Feb 26, 2025 at 3:49 AM Kirill A. Shutemov <kirill@shutemov.name> wrote:
>
> On Tue, Feb 25, 2025 at 12:47:03AM +0000, Vishal Annapurve wrote:
> > Direct HLT instruction execution causes #VEs for TDX VMs which is routed
> > to hypervisor via TDCALL. If HLT is executed in STI-shadow, resulting #VE
> > handler will enable interrupts before TDCALL is routed to hypervisor
> > leading to missed wakeup events.
> >
> > Current TDX spec doesn't expose interruptibility state information to
> > allow #VE handler to selectively enable interrupts. To bypass this
> > issue, TDX VMs need to replace "sti;hlt" execution with direct TDCALL
> > followed by explicit interrupt flag update.
> >
> > Commit bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
> > prevented the idle routines from executing HLT instruction in STI-shadow.
> > But it missed the paravirt routine which can be reached like this as an
> > example:
> > acpi_safe_halt() =>
> > raw_safe_halt() =>
> > arch_safe_halt() =>
> > irq.safe_halt() =>
> > pv_native_safe_halt()
>
> I would rather use paravirt spinlock example. It is less controversial.
> I still see no point in ACPI cpuidle be a thing in TDX guests.
>
I will modify the description to include a paravirt spinlock example.
> >
> > To reliably handle arch_safe_halt() for TDX VMs, introduce explicit
> > dependency on CONFIG_PARAVIRT and override paravirt halt()/safe_halt()
> > routines with TDX-safe versions that execute direct TDCALL and needed
> > interrupt flag updates. Executing direct TDCALL brings in additional
> > benefit of avoiding HLT related #VEs altogether.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: bfe6ed0c6727 ("x86/tdx: Add HLT support for TDX guests")
> > Signed-off-by: Vishal Annapurve <vannapurve@google.com>
>
> Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
>
> --
> Kiryl Shutsemau / Kirill A. Shutemov