[PATCH] x86/tdx, KVM: fix HKID leak when kexec is initiated with active TDs

Posted by Nowicki, Robert 1 month, 3 weeks ago
When kexec is initiated while TDs are running, vCPU threads can be
mid-TDH.VP.ENTER on other CPUs when tdx_shutdown() fires. The TDX
module rejects TDH.MNG.VPFLUSHDONE for a VP in RUNNING state, leaving
the HKID in a leaked state:

  kvm_intel: tdh_mng_vpflushdone() failed. HKID 33 is leaked.

Fix this by introducing a quiescing flag set at the start of
tdx_shutdown(). KVM's tdx_vcpu_run() checks the flag and returns
EXIT_FASTPATH_NONE before attempting TDH.VP.ENTER. After setting the
flag, tdx_shutdown() calls on_each_cpu(tdx_seam_sync) with wait=1 to
ensure any CPU currently inside TDH.VP.ENTER has exited SEAM before
tdx_sys_disable() is called.

Fixes: 58171ae22e11 ("x86/tdx: Disable the TDX module during kexec and kdump")
Signed-off-by: Nowicki, Robert <robert.nowicki@intel.com>
---
 arch/x86/include/asm/tdx.h  |  2 ++
 arch/x86/kvm/vmx/tdx.c      |  3 +++
 arch/x86/virt/vmx/tdx/tdx.c | 12 ++++++++++++
 3 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index a0a4a15142fc..68a87bdbca9a 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -173,6 +173,7 @@ static inline int pg_level_to_tdx_sept_level(enum pg_level level)
 }
 
 void tdx_sys_disable(void);
+bool tdx_kexec_quiescing(void);
 
 u64 tdh_vp_enter(struct tdx_vp *vp, struct tdx_module_args *args);
 u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
@@ -206,6 +207,7 @@ static inline u32 tdx_get_nr_guest_keyids(void) { return 0; }
 static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; }
 static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NULL; }
 static inline void tdx_sys_disable(void) { }
+static inline bool tdx_kexec_quiescing(void) { return false; }
 #endif	/* CONFIG_INTEL_TDX_HOST */
 
 #endif /* !__ASSEMBLER__ */
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 50a5cfdbd33e..2d658db7700d 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1053,6 +1053,9 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
 	struct vcpu_vt *vt = to_vt(vcpu);
 
+	if (unlikely(tdx_kexec_quiescing()))
+		return EXIT_FASTPATH_NONE;
+
 	/*
 	 * WARN if KVM wants to force an immediate exit, as the TDX module does
 	 * not guarantee entry into the guest, i.e. it's possible for KVM to
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index aaf22a87717a..71c7e4fadda3 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -236,6 +236,16 @@ static void tdx_cpu_flush_cache(void)
 	this_cpu_write(cache_state_incoherent, false);
 }
 
+
+static atomic_t tdx_shutdown_in_progress = ATOMIC_INIT(0);
+
+bool tdx_kexec_quiescing(void)
+{
+	return atomic_read(&tdx_shutdown_in_progress);
+}
+EXPORT_SYMBOL_GPL(tdx_kexec_quiescing);
+
+static void tdx_seam_sync(void *ign) { }
 static void tdx_shutdown_cpu(void *ign)
 {
 	/*
@@ -252,6 +262,8 @@ static void tdx_shutdown_cpu(void *ign)
 
 static void tdx_shutdown(void *ign)
 {
+	atomic_set(&tdx_shutdown_in_progress, 1);
+	on_each_cpu(tdx_seam_sync, NULL, 1);
 	tdx_sys_disable();
 	on_each_cpu(tdx_shutdown_cpu, NULL, 1);
 }
-- 
2.53.0
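The ordering this patch relies on (set the flag, make every CPU pass through an IPI, then disable) can be modelled in userspace. The sketch below is a hypothetical pthreads model, not kernel code. Each thread stands in for a CPU running a vCPU, model_vp_enter() stands in for TDH.VP.ENTER, and a generation counter plus an acknowledgment count stand in for on_each_cpu(tdx_seam_sync, NULL, 1). It assumes, as in tdx_vcpu_run(), that nothing can interrupt a CPU between the flag check and the entry, so the "IPI" is only serviced after the modelled TD exit. All model_* names are made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

static atomic_bool shutdown_in_progress; /* models tdx_shutdown_in_progress */
static atomic_bool module_disabled;      /* set once "tdx_sys_disable()" ran */
static atomic_int  sync_gen;             /* bumped to "send the IPI" */
static atomic_int  sync_acks;            /* CPUs that ran "tdx_seam_sync()" */
static atomic_int  late_entries;         /* VP.ENTERs after disable: must stay 0 */

/* Hypothetical stand-in for TDH.VP.ENTER: count entries after the disable. */
static void model_vp_enter(void)
{
	if (atomic_load(&module_disabled))
		atomic_fetch_add(&late_entries, 1);
}

/* Hypothetical stand-in for the IPI handler: ack each sync request once. */
static void model_handle_ipi(int *seen_gen)
{
	int gen = atomic_load(&sync_gen);

	if (gen != *seen_gen) {
		*seen_gen = gen;
		atomic_fetch_add(&sync_acks, 1);
	}
}

static void *model_vcpu_thread(void *arg)
{
	int seen_gen = 0;

	(void)arg;
	/* Check the flag, then enter; the IPI is only handled after "exit". */
	while (!atomic_load(&shutdown_in_progress)) {
		model_vp_enter();
		model_handle_ipi(&seen_gen);
	}
	/* The CPU keeps servicing IPIs until the shutdown completes. */
	while (!atomic_load(&module_disabled)) {
		model_handle_ipi(&seen_gen);
		sched_yield();
	}
	return NULL;
}

/* Models tdx_shutdown(): set the flag, sync every CPU, then disable. */
static void model_tdx_shutdown(void)
{
	atomic_store(&shutdown_in_progress, true);
	atomic_fetch_add(&sync_gen, 1);                  /* on_each_cpu(...) */
	while (atomic_load(&sync_acks) < MODEL_NR_CPUS)  /* ... with wait=1 */
		sched_yield();
	atomic_store(&module_disabled, true);            /* tdx_sys_disable() */
}

/* Returns the number of VP.ENTERs issued after the disable. */
static int run_quiesce_model(void)
{
	pthread_t cpus[MODEL_NR_CPUS];

	atomic_store(&shutdown_in_progress, false);
	atomic_store(&module_disabled, false);
	atomic_store(&sync_gen, 0);
	atomic_store(&sync_acks, 0);
	atomic_store(&late_entries, 0);

	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		int rc = pthread_create(&cpus[i], NULL, model_vcpu_thread, NULL);
		assert(rc == 0);
		(void)rc;
	}
	model_tdx_shutdown();
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		pthread_join(cpus[i], NULL);
	return atomic_load(&late_entries);
}
```

Under these assumptions run_quiesce_model() returns 0: each thread's last entry precedes its acknowledgment, and the disable only happens after every thread has acknowledged. The model covers VP.ENTER only; it says nothing about other SEAMCALLs still in flight.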

Re: [PATCH] x86/tdx, KVM: fix HKID leak when kexec is initiated with active TDs
Posted by Sean Christopherson 1 month, 3 weeks ago
On Wed, Apr 22, 2026, Robert Nowicki wrote:
> When kexec is initiated while TDs are running, vCPU threads can be
> mid-TDH.VP.ENTER on other CPUs when tdx_shutdown() fires. The TDX
> module rejects TDH.MNG.VPFLUSHDONE for a VP in RUNNING state, leaving
> the HKID in a leaked state:
> 
>   kvm_intel: tdh_mng_vpflushdone() failed. HKID 33 is leaked.
> 
> Fix this by introducing a quiescing flag set at the start of
> tdx_shutdown(). KVM's tdx_vcpu_run() checks the flag and returns
> EXIT_FASTPATH_NONE before attempting TDH.VP.ENTER. After setting the
> flag, tdx_shutdown() calls on_each_cpu(tdx_seam_sync) with wait=1 to
> ensure any CPU currently inside TDH.VP.ENTER has exited SEAM before
> tdx_sys_disable() is called.
> 
> Fixes: 58171ae22e11 ("x86/tdx: Disable the TDX module during kexec and kdump")

Please don't post seemingly standalone patches for code that hasn't yet been
merged, it's quite confusing.

>  u64 tdh_vp_enter(struct tdx_vp *vp, struct tdx_module_args *args);
>  u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
> @@ -206,6 +207,7 @@ static inline u32 tdx_get_nr_guest_keyids(void) { return 0; }
>  static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; }
>  static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NULL; }
>  static inline void tdx_sys_disable(void) { }
> +static inline bool tdx_kexec_quiescing(void) { return false; }
>  #endif	/* CONFIG_INTEL_TDX_HOST */
>  
>  #endif /* !__ASSEMBLER__ */
> diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> index 50a5cfdbd33e..2d658db7700d 100644
> --- a/arch/x86/kvm/vmx/tdx.c
> +++ b/arch/x86/kvm/vmx/tdx.c
> @@ -1053,6 +1053,9 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
>  	struct vcpu_tdx *tdx = to_tdx(vcpu);
>  	struct vcpu_vt *vt = to_vt(vcpu);
>  
> +	if (unlikely(tdx_kexec_quiescing()))

Requiring KVM to check a global on every entry is pretty ugly, especially since
this is for a very rare scenario (in terms of number of entries).  And forcing
KVM to do a CALL+RET to check an almost-never-set flag is especially ugly.

Why not handle this entirely in tdx_shutdown_cpu()?  E.g. have the last CPU through
disable TDX, and hold all the CPUs hostage until that's done.  It's not the prettiest
thing in the world, but it's entirely self-contained.

static void tdx_shutdown_cpu(void *__nr_cpus_remaining)
{
	atomic_t *nr_cpus_remaining = __nr_cpus_remaining;

	/*
	 * Every CPU but the last one through just decrements the count; the
	 * last one (which sees 1) disables TDX and then releases the others.
	 */
	if (!atomic_add_unless(nr_cpus_remaining, -1, 1)) {
		tdx_sys_disable();
		atomic_set(nr_cpus_remaining, 0);
	}

	x86_virt_put_ref(X86_FEATURE_VMX);

	/* Hold all CPUs hostage until the count has dropped to zero. */
	while (atomic_read(nr_cpus_remaining))
		cpu_relax();
}

static void tdx_shutdown(void *ign)
{
	atomic_t nr_cpus_remaining = ATOMIC_INIT(num_online_cpus());

	on_each_cpu(tdx_shutdown_cpu, &nr_cpus_remaining, 1);
}
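The "last CPU through" rendezvous above, with the wait loop spinning while the count is non-zero, can be exercised the same way. This is again a hypothetical userspace model, not kernel code: threads stand in for CPUs, model_atomic_add_unless() is a C11 compare-and-swap equivalent of the kernel's atomic_add_unless(), model_tdx_sys_disable() only records that it ran, and x86_virt_put_ref() is left out.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_NR_CPUS 4

static atomic_int  disable_calls;  /* times "tdx_sys_disable()" ran */
static atomic_bool disable_done;   /* set by the last CPU through */
static atomic_int  early_leavers;  /* CPUs released before the disable */

/* C11 equivalent of the kernel's atomic_add_unless(v, a, u). */
static bool model_atomic_add_unless(atomic_int *v, int a, int u)
{
	int old = atomic_load(v);

	while (old != u) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + a))
			return true;
	}
	return false;
}

static void model_tdx_sys_disable(void)
{
	atomic_fetch_add(&disable_calls, 1);
	atomic_store(&disable_done, true);
}

/* Models tdx_shutdown_cpu(); x86_virt_put_ref() is omitted. */
static void *model_shutdown_cpu(void *arg)
{
	atomic_int *nr_cpus_remaining = arg;

	/* Everyone but the last CPU through decrements the count ... */
	if (!model_atomic_add_unless(nr_cpus_remaining, -1, 1)) {
		/* ... and the last CPU (which sees 1) disables, then releases. */
		model_tdx_sys_disable();
		atomic_store(nr_cpus_remaining, 0);
	}

	/* Hold every CPU until the count has dropped to zero. */
	while (atomic_load(nr_cpus_remaining))
		sched_yield();

	if (!atomic_load(&disable_done))
		atomic_fetch_add(&early_leavers, 1);
	return NULL;
}

/* Models tdx_shutdown(): returns how many times the disable ran. */
static int run_rendezvous_model(void)
{
	pthread_t cpus[MODEL_NR_CPUS];
	atomic_int nr_cpus_remaining;

	atomic_init(&nr_cpus_remaining, MODEL_NR_CPUS);
	atomic_store(&disable_calls, 0);
	atomic_store(&disable_done, false);
	atomic_store(&early_leavers, 0);

	for (int i = 0; i < MODEL_NR_CPUS; i++) {
		int rc = pthread_create(&cpus[i], NULL, model_shutdown_cpu,
					&nr_cpus_remaining);
		assert(rc == 0);
		(void)rc;
	}
	for (int i = 0; i < MODEL_NR_CPUS; i++)
		pthread_join(cpus[i], NULL);
	return atomic_load(&disable_calls);
}
```

Exactly one thread, the one that finds the count at 1, performs the disable, and no thread leaves the wait loop before the disable has completed. Spinning while the count is zero instead would let the early CPUs through immediately and leave the last one spinning forever.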
Re: [PATCH] x86/tdx, KVM: fix HKID leak when kexec is initiated with active TDs
Posted by Edgecombe, Rick P 1 month, 3 weeks ago
On Wed, 2026-04-22 at 06:14 -0700, Sean Christopherson wrote:
> On Wed, Apr 22, 2026, Robert Nowicki wrote:
> > When kexec is initiated while TDs are running, vCPU threads can be
> > mid-TDH.VP.ENTER on other CPUs when tdx_shutdown() fires. The TDX
> > module rejects TDH.MNG.VPFLUSHDONE for a VP in RUNNING state, leaving
> > the HKID in a leaked state:
> > 
> >    kvm_intel: tdh_mng_vpflushdone() failed. HKID 33 is leaked.
> > 
> > Fix this by introducing a quiescing flag set at the start of
> > tdx_shutdown(). KVM's tdx_vcpu_run() checks the flag and returns
> > EXIT_FASTPATH_NONE before attempting TDH.VP.ENTER. After setting the
> > flag, tdx_shutdown() calls on_each_cpu(tdx_seam_sync) with wait=1 to
> > ensure any CPU currently inside TDH.VP.ENTER has exited SEAM before
> > tdx_sys_disable() is called.
> > 
> > Fixes: 58171ae22e11 ("x86/tdx: Disable the TDX module during kexec and kdump")
> 
> Please don't post seemingly standalone patches for code that hasn't yet been
> merged, it's quite confusing.

+1. Robert, we try to coordinate public Linux TDX work internally before posting
because there is so much of it that it gets confusing for the community and
maintainers. Please check in with the Linux TDX developers before posting TDX
patches so we can have a cohesive effort.

> 
> >   u64 tdh_vp_enter(struct tdx_vp *vp, struct tdx_module_args *args);
> >   u64 tdh_mng_addcx(struct tdx_td *td, struct page *tdcs_page);
> > @@ -206,6 +207,7 @@ static inline u32 tdx_get_nr_guest_keyids(void) { return 0; }
> >   static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; }
> >   static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NULL; }
> >   static inline void tdx_sys_disable(void) { }
> > +static inline bool tdx_kexec_quiescing(void) { return false; }
> >   #endif	/* CONFIG_INTEL_TDX_HOST */
> >   
> >   #endif /* !__ASSEMBLER__ */
> > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
> > index 50a5cfdbd33e..2d658db7700d 100644
> > --- a/arch/x86/kvm/vmx/tdx.c
> > +++ b/arch/x86/kvm/vmx/tdx.c
> > @@ -1053,6 +1053,9 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, u64 run_flags)
> >   	struct vcpu_tdx *tdx = to_tdx(vcpu);
> >   	struct vcpu_vt *vt = to_vt(vcpu);
> >   
> > +	if (unlikely(tdx_kexec_quiescing()))

There is essentially an existing kexec race, where vmxoff can happen while
SEAMCALLs could still be issued. It goes back to the first TDX kexec support
(i.e. it was not introduced by the vmxon refactor). KVM's VMX code has some
spurious-fault logic to handle something similar for normal VMs, but TDX doesn't.

I don't see why this TDH.MNG.VPFLUSHDONE case is special. If the TDX module is
shut down and the old kernel is going away, how is anything leaked beyond the
normal leakage that happens during kexec? So I think this may just be the known
vmxoff/SEAMCALL race, with the specific case observed happening to print a
message about leaking.

Also, I'm not sure how handling VP.ENTER would prevent the VPFLUSHDONE call from
hitting an error and emitting the same message. If the TDX module is shut down...
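The VPFLUSHDONE point can be reduced to a toy state model. In the sketch below, the status codes and struct model_tdx are hypothetical and do not correspond to real TDX module error values; the only assumption is the one implied above, that every call fails once the module has been shut down. Under that assumption the HKID-leak path is reached whether or not VP.ENTER was gated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status codes; NOT the real TDX module error values. */
enum model_status {
	MODEL_SUCCESS,
	MODEL_ERR_VP_RUNNING,	/* a VP is still in RUNNING state */
	MODEL_ERR_SHUTDOWN,	/* the module has been shut down */
};

struct model_tdx {
	bool module_up;		/* false once "tdx_sys_disable()" has run */
	int nr_vps_running;	/* VPs currently inside "TDH.VP.ENTER" */
};

/* Toy TDH.MNG.VPFLUSHDONE: fails if the module is gone or a VP runs. */
static enum model_status model_vpflushdone(const struct model_tdx *t)
{
	if (!t->module_up)
		return MODEL_ERR_SHUTDOWN;
	if (t->nr_vps_running)
		return MODEL_ERR_VP_RUNNING;
	return MODEL_SUCCESS;
}

/* In this model, as in the reported log, a failed VPFLUSHDONE leaks the HKID. */
static bool model_hkid_leaked(const struct model_tdx *t)
{
	return model_vpflushdone(t) != MODEL_SUCCESS;
}

static void model_demo(void)
{
	/* The reported case: a VP is mid-entry when VPFLUSHDONE runs. */
	struct model_tdx racing = { .module_up = true, .nr_vps_running = 1 };
	/* VP.ENTER gated, but the module is already shut down. */
	struct model_tdx gated = { .module_up = false, .nr_vps_running = 0 };
	/* Normal teardown with the module still up. */
	struct model_tdx normal = { .module_up = true, .nr_vps_running = 0 };

	assert(model_hkid_leaked(&racing));
	assert(model_hkid_leaked(&gated));
	assert(!model_hkid_leaked(&normal));
}
```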

> 
> Requiring KVM to check a global on every entry is pretty ugly, especially
> since this is for a very rare scenario (in terms of number of entries).  And
> forcing KVM to do a CALL+RET to check an almost-never-set flag is especially
> ugly.
> 
> Why not handle this entirely in tdx_shutdown_cpu()?  E.g. have the last CPU
> through disable TDX, and hold all the CPUs hostage until that's done.  It's not
> the prettiest thing in the world, but it's entirely self-contained.
> 
> static void tdx_shutdown_cpu(void *__nr_cpus_remaining)
> {
> 	atomic_t *nr_cpus_remaining = __nr_cpus_remaining;
> 
> 	if (!atomic_add_unless(nr_cpus_remaining, -1, 1)) {
> 		tdx_sys_disable();
> 		atomic_set(nr_cpus_remaining, 0);
> 	}
> 
> 	x86_virt_put_ref(X86_FEATURE_VMX);
> 
> 	while (atomic_read(nr_cpus_remaining))
> 		cpu_relax();
> }
> 
> static void tdx_shutdown(void *ign)
> {
> 	atomic_t nr_cpus_remaining = ATOMIC_INIT(num_online_cpus());
> 
> 	on_each_cpu(tdx_shutdown_cpu, &nr_cpus_remaining, 1);
> }

After vmxoff happens, the SEAMCALLs will just meet other errors. The wrappers
will morph the vmxoff condition into a SW error that much of the TDX code can't
handle either. So it doesn't help the problem I'm afraid.

I would prefer to fix the existing issue separately from this series. This
series makes kexec much more functional for TDX, and the worst case AFAICT is a
splat during an otherwise successful kexec, i.e. a non-critical, pre-existing
problem.

Kai and I were previously kicking around some ideas about the general case
problem. It somehow missed our cleanup list, but I just added it.