On TDX platforms, during kexec, the kernel needs to make sure there are
no dirty cachelines of TDX private memory before booting to the new
kernel to avoid silent memory corruption to the new kernel.
During kexec, the kexec-ing CPU firstly invokes native_stop_other_cpus()
to stop all remote CPUs before booting to the new kernel. The remote
CPUs will then execute stop_this_cpu() to stop themselves.
The kernel has a percpu boolean to indicate whether the cache of a CPU
may be in incoherent state. In stop_this_cpu(), the kernel does WBINVD
if that percpu boolean is true.
TDX turns on that percpu boolean on a CPU when the kernel does SEAMCALL.
This makes sure the caches will be flushed during kexec.
However, the native_stop_other_cpus() and stop_this_cpu() have a "race"
which is extremely rare to happen but could cause the system to hang.
Specifically, the native_stop_other_cpus() firstly sends normal reboot
IPI to remote CPUs and waits one second for them to stop. If that times
out, native_stop_other_cpus() then sends NMIs to remote CPUs to stop
them.
The aforementioned race happens when NMIs are sent. Doing WBINVD in
stop_this_cpu() makes each CPU take longer time to stop and increases
the chance of the race happening.
Explicitly flush cache in tdx_disable_virtualization_cpu() after which
no more TDX activity can happen on this cpu. This moves the WBINVD to
an earlier stage than stop_this_cpus(), avoiding a possibly lengthy
operation at a time where it could cause this race.
Signed-off-by: Kai Huang <kai.huang@intel.com>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Reviewed-by: Chao Gao <chao.gao@intel.com>
---
v5 -> v6:
- Add Chao's RB.
v4 -> v5:
- No change
v3 -> v4:
- Change doing wbinvd() from rebooting notifier to
tdx_disable_virtualization_cpu() to cover the case where more
SEAMCALL can be made after cache flush, i.e., doing kexec when
there's TD alive. - Chao.
- Add check to skip wbinvd if the boolean is false. -- Chao
- Fix typo in the comment -- Binbin.
---
arch/x86/include/asm/tdx.h | 2 ++
arch/x86/kvm/vmx/tdx.c | 12 ++++++++++++
arch/x86/virt/vmx/tdx/tdx.c | 12 ++++++++++++
3 files changed, 26 insertions(+)
diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 0922265c6bdc..e9a213582f03 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -217,6 +217,7 @@ u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_err1, u6
u64 tdh_phymem_cache_wb(bool resume);
u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td);
u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page);
+void tdx_cpu_flush_cache(void);
#else
static inline void tdx_init(void) { }
static inline int tdx_cpu_enable(void) { return -ENODEV; }
@@ -224,6 +225,7 @@ static inline int tdx_enable(void) { return -ENODEV; }
static inline u32 tdx_get_nr_guest_keyids(void) { return 0; }
static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; }
static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NULL; }
+static inline void tdx_cpu_flush_cache(void) { }
#endif /* CONFIG_INTEL_TDX_HOST */
#endif /* !__ASSEMBLER__ */
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 66744f5768c8..1bc6f52e0cd7 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -442,6 +442,18 @@ void tdx_disable_virtualization_cpu(void)
tdx_flush_vp(&arg);
}
local_irq_restore(flags);
+
+ /*
+ * No more TDX activity on this CPU from here. Flush cache to
+ * avoid having to do WBINVD in stop_this_cpu() during kexec.
+ *
+ * Kexec calls native_stop_other_cpus() to stop remote CPUs
+ * before booting to new kernel, but that code has a "race"
+ * when the normal REBOOT IPI times out and NMIs are sent to
+ * remote CPUs to stop them. Doing WBINVD in stop_this_cpu()
+ * could potentially increase the possibility of the "race".
+ */
+ tdx_cpu_flush_cache();
}
#define TDX_SEAMCALL_RETRIES 10000
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 3ea6f587c81a..c26e2e07ff6b 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1870,3 +1870,15 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page)
return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args);
}
EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_hkid);
+
+void tdx_cpu_flush_cache(void)
+{
+ lockdep_assert_preemption_disabled();
+
+ if (!this_cpu_read(cache_state_incoherent))
+ return;
+
+ wbinvd();
+ this_cpu_write(cache_state_incoherent, false);
+}
+EXPORT_SYMBOL_GPL(tdx_cpu_flush_cache);
--
2.50.1
On Thu, Aug 14, 2025, Kai Huang wrote: > arch/x86/include/asm/tdx.h | 2 ++ > arch/x86/kvm/vmx/tdx.c | 12 ++++++++++++ > arch/x86/virt/vmx/tdx/tdx.c | 12 ++++++++++++ > 3 files changed, 26 insertions(+) > > diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h > index 0922265c6bdc..e9a213582f03 100644 > --- a/arch/x86/include/asm/tdx.h > +++ b/arch/x86/include/asm/tdx.h > @@ -217,6 +217,7 @@ u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_err1, u6 > u64 tdh_phymem_cache_wb(bool resume); > u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td); > u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page); > +void tdx_cpu_flush_cache(void); > #else > static inline void tdx_init(void) { } > static inline int tdx_cpu_enable(void) { return -ENODEV; } > @@ -224,6 +225,7 @@ static inline int tdx_enable(void) { return -ENODEV; } > static inline u32 tdx_get_nr_guest_keyids(void) { return 0; } > static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; } > static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NULL; } > +static inline void tdx_cpu_flush_cache(void) { } Stub is unnecessary. tdx.c is built iff KVM_INTEL_TDX=y, and that depends on INTEL_TDX_HOST. At a glance, some of the existing stubs are useless as well. > #endif /* CONFIG_INTEL_TDX_HOST */ > > #endif /* !__ASSEMBLER__ */ > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > index 66744f5768c8..1bc6f52e0cd7 100644 > --- a/arch/x86/kvm/vmx/tdx.c > +++ b/arch/x86/kvm/vmx/tdx.c > @@ -442,6 +442,18 @@ void tdx_disable_virtualization_cpu(void) > tdx_flush_vp(&arg); > } > local_irq_restore(flags); > + > + /* > + * No more TDX activity on this CPU from here. Flush cache to > + * avoid having to do WBINVD in stop_this_cpu() during kexec. > + * > + * Kexec calls native_stop_other_cpus() to stop remote CPUs > + * before booting to new kernel, but that code has a "race" > + * when the normal REBOOT IPI times out and NMIs are sent to > + * remote CPUs to stop them. Doing WBINVD in stop_this_cpu() > + * could potentially increase the possibility of the "race". > + */ > + tdx_cpu_flush_cache(); IIUC, this can be: if (IS_ENABLED(CONFIG_KEXEC)) tdx_cpu_flush_cache(); > } > > #define TDX_SEAMCALL_RETRIES 10000 > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c > index 3ea6f587c81a..c26e2e07ff6b 100644 > --- a/arch/x86/virt/vmx/tdx/tdx.c > +++ b/arch/x86/virt/vmx/tdx/tdx.c > @@ -1870,3 +1870,15 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page) > return seamcall(TDH_PHYMEM_PAGE_WBINVD, &args); > } > EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_hkid); > + > +void tdx_cpu_flush_cache(void) > +{ > + lockdep_assert_preemption_disabled(); > + > + if (!this_cpu_read(cache_state_incoherent)) > + return; > + > + wbinvd(); > + this_cpu_write(cache_state_incoherent, false); > +} > +EXPORT_SYMBOL_GPL(tdx_cpu_flush_cache); > -- > 2.50.1 >
On Thu, 2025-08-14 at 06:54 -0700, Sean Christopherson wrote: > > diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h > > index 0922265c6bdc..e9a213582f03 100644 > > --- a/arch/x86/include/asm/tdx.h > > +++ b/arch/x86/include/asm/tdx.h > > @@ -217,6 +217,7 @@ u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_err1, u6 > > u64 tdh_phymem_cache_wb(bool resume); > > u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td); > > u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page); > > +void tdx_cpu_flush_cache(void); > > #else > > static inline void tdx_init(void) { } > > static inline int tdx_cpu_enable(void) { return -ENODEV; } > > @@ -224,6 +225,7 @@ static inline int tdx_enable(void) { return -ENODEV; } > > static inline u32 tdx_get_nr_guest_keyids(void) { return 0; } > > static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; } > > static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NULL; } > > +static inline void tdx_cpu_flush_cache(void) { } > > Stub is unnecessary. tdx.c is built iff KVM_INTEL_TDX=y, and that depends on > INTEL_TDX_HOST. > > At a glance, some of the existing stubs are useless as well. OK I will remove it. Thanks.
On Thu, 2025-08-14 at 06:54 -0700, Sean Christopherson wrote: > > static inline void tdx_init(void) { } > > static inline int tdx_cpu_enable(void) { return -ENODEV; } > > @@ -224,6 +225,7 @@ static inline int tdx_enable(void) { return -ENODEV; } > > static inline u32 tdx_get_nr_guest_keyids(void) { return 0; } > > static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; > > } > > static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return > > NULL; } > > +static inline void tdx_cpu_flush_cache(void) { } > > Stub is unnecessary. tdx.c is built iff KVM_INTEL_TDX=y, and that depends on > INTEL_TDX_HOST. > > At a glance, some of the existing stubs are useless as well. Oh, yep. We'll add it to the cleanup list. > > > #endif /* CONFIG_INTEL_TDX_HOST */ > > > > #endif /* !__ASSEMBLER__ */ > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > > index 66744f5768c8..1bc6f52e0cd7 100644 > > --- a/arch/x86/kvm/vmx/tdx.c > > +++ b/arch/x86/kvm/vmx/tdx.c > > @@ -442,6 +442,18 @@ void tdx_disable_virtualization_cpu(void) > > tdx_flush_vp(&arg); > > } > > local_irq_restore(flags); > > + > > + /* > > + * No more TDX activity on this CPU from here. Flush cache to > > + * avoid having to do WBINVD in stop_this_cpu() during kexec. > > + * > > + * Kexec calls native_stop_other_cpus() to stop remote CPUs > > + * before booting to new kernel, but that code has a "race" > > + * when the normal REBOOT IPI times out and NMIs are sent to > > + * remote CPUs to stop them. Doing WBINVD in stop_this_cpu() > > + * could potentially increase the possibility of the "race". > > + */ > > + tdx_cpu_flush_cache(); > > IIUC, this can be: > > if (IS_ENABLED(CONFIG_KEXEC)) > tdx_cpu_flush_cache(); > No strong objection, just 2 cents. I bet !CONFIG_KEXEC && CONFIG_INTEL_TDX_HOST kernels will be the minority. Seems like an opportunity to simplify the code. But if we do this, we should probably compile out the unused tdx_cpu_flush_cache() too. > > } > >
On Thu, Aug 14, 2025, Rick P Edgecombe wrote: > On Thu, 2025-08-14 at 06:54 -0700, Sean Christopherson wrote: > > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > > > index 66744f5768c8..1bc6f52e0cd7 100644 > > > --- a/arch/x86/kvm/vmx/tdx.c > > > +++ b/arch/x86/kvm/vmx/tdx.c > > > @@ -442,6 +442,18 @@ void tdx_disable_virtualization_cpu(void) > > > tdx_flush_vp(&arg); > > > } > > > local_irq_restore(flags); > > > + > > > + /* > > > + * No more TDX activity on this CPU from here. Flush cache to > > > + * avoid having to do WBINVD in stop_this_cpu() during kexec. > > > + * > > > + * Kexec calls native_stop_other_cpus() to stop remote CPUs > > > + * before booting to new kernel, but that code has a "race" > > > + * when the normal REBOOT IPI times out and NMIs are sent to > > > + * remote CPUs to stop them. Doing WBINVD in stop_this_cpu() > > > + * could potentially increase the possibility of the "race". Why is that race problematic? The changelog just says : However, the native_stop_other_cpus() and stop_this_cpu() have a "race" : which is extremely rare to happen but could cause the system to hang. : : Specifically, the native_stop_other_cpus() firstly sends normal reboot : IPI to remote CPUs and waits one second for them to stop. If that times : out, native_stop_other_cpus() then sends NMIs to remote CPUs to stop : them. without explaining how that can cause a system hang. > > > + */ > > > + tdx_cpu_flush_cache(); > > > > IIUC, this can be: > > > > if (IS_ENABLED(CONFIG_KEXEC)) > > tdx_cpu_flush_cache(); > > > > No strong objection, just 2 cents. I bet !CONFIG_KEXEC && CONFIG_INTEL_TDX_HOST > kernels will be the minority. Seems like an opportunity to simplify the code. Reducing the number of lines of code is not always a simplification. IMO, not checking CONFIG_KEXEC adds "complexity" because anyone that reads the comment (and/or the massive changelog) will be left wondering why there's a bunch of documentation that talks about kexec, but no hint of kexec considerations in the code.
On Thu, 2025-08-14 at 11:00 -0700, Sean Christopherson wrote: > On Thu, Aug 14, 2025, Rick P Edgecombe wrote: > > On Thu, 2025-08-14 at 06:54 -0700, Sean Christopherson wrote: > > > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > > > > index 66744f5768c8..1bc6f52e0cd7 100644 > > > > --- a/arch/x86/kvm/vmx/tdx.c > > > > +++ b/arch/x86/kvm/vmx/tdx.c > > > > @@ -442,6 +442,18 @@ void tdx_disable_virtualization_cpu(void) > > > > tdx_flush_vp(&arg); > > > > } > > > > local_irq_restore(flags); > > > > + > > > > + /* > > > > + * No more TDX activity on this CPU from here. Flush cache to > > > > + * avoid having to do WBINVD in stop_this_cpu() during kexec. > > > > + * > > > > + * Kexec calls native_stop_other_cpus() to stop remote CPUs > > > > + * before booting to new kernel, but that code has a "race" > > > > + * when the normal REBOOT IPI times out and NMIs are sent to > > > > + * remote CPUs to stop them. Doing WBINVD in stop_this_cpu() > > > > + * could potentially increase the possibility of the "race". > > Why is that race problematic? The changelog just says > > : However, the native_stop_other_cpus() and stop_this_cpu() have a "race" > : which is extremely rare to happen but could cause the system to hang. > : even > : Specifically, the native_stop_other_cpus() firstly sends normal reboot > : IPI to remote CPUs and waits one second for them to stop. If that times > : out, native_stop_other_cpus() then sends NMIs to remote CPUs to stop > : them. > > without explaining how that can cause a system hang. Thanks for review. Sean. The race is about the kexec-ing CPU could jump to second kernel when other CPUs have not fully stopped. In the patch 3 I appended a link in the changelog to explain the race: https://lore.kernel.org/kvm/b963fcd60abe26c7ec5dc20b42f1a2ebbcc72397.1750934177.git.kai.huang@intel.com/ Please see "[*] The "race" in native_stop_other_cpus()" part. I will put the link in the changelog of this patch too. > > > > > + */ > > > > + tdx_cpu_flush_cache(); > > > > > > IIUC, this can be: > > > > > > if (IS_ENABLED(CONFIG_KEXEC)) > > > tdx_cpu_flush_cache(); > > > > > > > No strong objection, just 2 cents. I bet !CONFIG_KEXEC && CONFIG_INTEL_TDX_HOST > > kernels will be the minority. Seems like an opportunity to simplify the code. > > Reducing the number of lines of code is not always a simplification. IMO, not > checking CONFIG_KEXEC adds "complexity" because anyone that reads the comment > (and/or the massive changelog) will be left wondering why there's a bunch of > documentation that talks about kexec, but no hint of kexec considerations in the > code. I think we can use 'kexec_in_progress', which is even better than IS_ENABLED(CONFIG_KEXEC) IMHO. When CONFIG_KEXEC is on, 'kexec_in_progress' will only be set when kexec is actually happening, thus tdx_cpu_flush_cache() will only be called for kexec. When CONFIG_KEXEC (CONFIG_KEXEC_CORE) is off, then 'kexec_in_progress' is a macro defined to false. The compiler can optimize this out too I suppose. Any comments?
On Thu, Aug 14, 2025, Kai Huang wrote: > On Thu, 2025-08-14 at 11:00 -0700, Sean Christopherson wrote: > > > > > + */ > > > > > + tdx_cpu_flush_cache(); > > > > > > > > IIUC, this can be: > > > > > > > > if (IS_ENABLED(CONFIG_KEXEC)) > > > > tdx_cpu_flush_cache(); > > > > > > > > > > No strong objection, just 2 cents. I bet !CONFIG_KEXEC && CONFIG_INTEL_TDX_HOST > > > kernels will be the minority. Seems like an opportunity to simplify the code. > > > > Reducing the number of lines of code is not always a simplification. IMO, not > > checking CONFIG_KEXEC adds "complexity" because anyone that reads the comment > > (and/or the massive changelog) will be left wondering why there's a bunch of > > documentation that talks about kexec, but no hint of kexec considerations in the > > code. > > I think we can use 'kexec_in_progress', which is even better than > IS_ENABLED(CONFIG_KEXEC) IMHO. I don't think that will accomplish what you want. E.g. kvm-intel.ko is unloaded after doing TDX things, while kexec_in_progress=false, and then some time later a kexec is triggered. In that case, stop_this_cpu() will still get stuck doing WBINVD.
On Thu, 2025-08-14 at 16:22 -0700, Sean Christopherson wrote: > On Thu, Aug 14, 2025, Kai Huang wrote: > > On Thu, 2025-08-14 at 11:00 -0700, Sean Christopherson wrote: > > > > > > + */ > > > > > > + tdx_cpu_flush_cache(); > > > > > > > > > > IIUC, this can be: > > > > > > > > > > if (IS_ENABLED(CONFIG_KEXEC)) > > > > > tdx_cpu_flush_cache(); > > > > > > > > > > > > > No strong objection, just 2 cents. I bet !CONFIG_KEXEC && CONFIG_INTEL_TDX_HOST > > > > kernels will be the minority. Seems like an opportunity to simplify the code. > > > > > > Reducing the number of lines of code is not always a simplification. IMO, not > > > checking CONFIG_KEXEC adds "complexity" because anyone that reads the comment > > > (and/or the massive changelog) will be left wondering why there's a bunch of > > > documentation that talks about kexec, but no hint of kexec considerations in the > > > code. > > > > I think we can use 'kexec_in_progress', which is even better than > > IS_ENABLED(CONFIG_KEXEC) IMHO. > > I don't think that will accomplish what you want. E.g. kvm-intel.ko is unloaded > after doing TDX things, while kexec_in_progress=false, and then some time later > a kexec is triggered. In that case, stop_this_cpu() will still get stuck doing > WBINVD. Right. Thanks. Let me think more on this. One minor thing is I think we should use IS_ENABLED(CONFIG_KEXEC_CORE) instead. Besides the CONFIG_KEXEC, there is another CONFIG_KEXEC_FILE, and both of them select CONFIG_KEXEC_CORE.
On 8/15/25 02:00, Huang, Kai wrote: > On Thu, 2025-08-14 at 16:22 -0700, Sean Christopherson wrote: >> On Thu, Aug 14, 2025, Kai Huang wrote: >>> On Thu, 2025-08-14 at 11:00 -0700, Sean Christopherson wrote: >>>> Reducing the number of lines of code is not always a simplification. IMO, not >>>> checking CONFIG_KEXEC adds "complexity" because anyone that reads the comment >>>> (and/or the massive changelog) will be left wondering why there's a bunch of >>>> documentation that talks about kexec, but no hint of kexec considerations in the >>>> code. > > One minor thing is I think we should use IS_ENABLED(CONFIG_KEXEC_CORE) > instead. Besides the CONFIG_KEXEC, there is another CONFIG_KEXEC_FILE, > and both of them select CONFIG_KEXEC_CORE. Agreed on needing CONFIG_KEXEC_CORE if you did test it, however: 1) The big comment, explaining how this is done for kexec, makes it clear that this is what the WBINVD is needed for. I don't think you'd be left wondering why there's no hint of kexec in the comment. 2) ... but anyway, KVM is the wrong place to do the test. If anything, since we need a v7 to change the unnecessary stub, you could move that stub under #ifndef CONFIG_KEXEC_CORE and rename the function to tdx_cpu_flush_cache_for_kexec(). Paolo
On Tue, 2025-08-19 at 12:31 +0200, Paolo Bonzini wrote: > On 8/15/25 02:00, Huang, Kai wrote: > > On Thu, 2025-08-14 at 16:22 -0700, Sean Christopherson wrote: > > > On Thu, Aug 14, 2025, Kai Huang wrote: > > > > On Thu, 2025-08-14 at 11:00 -0700, Sean Christopherson wrote: > > > > > Reducing the number of lines of code is not always a simplification. IMO, not > > > > > checking CONFIG_KEXEC adds "complexity" because anyone that reads the comment > > > > > (and/or the massive changelog) will be left wondering why there's a bunch of > > > > > documentation that talks about kexec, but no hint of kexec considerations in the > > > > > code. > > > > One minor thing is I think we should use IS_ENABLED(CONFIG_KEXEC_CORE) > > instead. Besides the CONFIG_KEXEC, there is another CONFIG_KEXEC_FILE, > > and both of them select CONFIG_KEXEC_CORE. > > Agreed on needing CONFIG_KEXEC_CORE if you did test it, however: > > 1) The big comment, explaining how this is done for kexec, makes it > clear that this is what the WBINVD is needed for. I don't think you'd > be left wondering why there's no hint of kexec in the comment. (Thanks for the feedback.) Agreed. > > 2) ... but anyway, KVM is the wrong place to do the test. If anything, > since we need a v7 to change the unnecessary stub, you could move that > stub under #ifndef CONFIG_KEXEC_CORE and rename the function to > tdx_cpu_flush_cache_for_kexec(). > > Paolo Agreed on renaming to tdx_cpu_flush_cache_for_kexec(). But with the "for_kexec()" part in the function name, it already implies it is related to kexec, and I kinda think there's no need to test IS_ENABLED(CONFIG_KEXEC_CORE) anymore. One of the main purpose of this series is to unblock TDX_HOST and KEXEC in the Kconfig, since otherwise I've been told distros will simply choose to disable TDX_HOST in the Kconfig. So in reality, I suppose they will be on together probably in like 95% cases, if not 100%. If we want to test CONFIG_KEXEC_CORE in tdx_cpu_flush_cache_for_kexec(), then it would be a little bit weird that why we don't test it in other places, e.g., when setting up the boolean. Testing it in all places would make the code unnecessarily long and harder to read. So my preference is to simply rename to tdx_cpu_flush_cache_for_kexec(). Paolo/Sean, are you OK with this?
On 8/19/25 23:53, Huang, Kai wrote: > If we want to test CONFIG_KEXEC_CORE in tdx_cpu_flush_cache_for_kexec(), > then it would be a little bit weird that why we don't test it in other > places, e.g., when setting up the boolean. Testing it in all places would > make the code unnecessarily long and harder to read. I agree about not checking everywhere. But I think this is a good compromise too if v6 is not acceptable as is (and as far as I am concerned, it would be): diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index e9a213582f03..913199b1954b 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -217,7 +217,6 @@ u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_err1, u6 u64 tdh_phymem_cache_wb(bool resume); u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td); u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page); -void tdx_cpu_flush_cache(void); #else static inline void tdx_init(void) { } static inline int tdx_cpu_enable(void) { return -ENODEV; } @@ -225,8 +224,13 @@ static inline int tdx_enable(void) { return -ENODEV; } static inline u32 tdx_get_nr_guest_keyids(void) { return 0; } static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; } static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NULL; } -static inline void tdx_cpu_flush_cache(void) { } #endif /* CONFIG_INTEL_TDX_HOST */ +#ifdef CONFIG_KEXEC_CORE +void tdx_cpu_flush_cache_for_kexec(void); +#else +static inline void tdx_cpu_flush_cache_for_kexec(void) { } +#endif + #endif /* !__ASSEMBLER__ */ #endif /* _ASM_X86_TDX_H */ diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 93477233baae..376d49ef4472 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -453,7 +453,7 @@ void tdx_disable_virtualization_cpu(void) * remote CPUs to stop them. Doing WBINVD in stop_this_cpu() * could potentially increase the possibility of the "race". */ - tdx_cpu_flush_cache(); + tdx_cpu_flush_cache_for_kexec(); } #define TDX_SEAMCALL_RETRIES 10000 diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index c26e2e07ff6b..cd2a36dbbfc5 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1871,7 +1871,8 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page) } EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_hkid); -void tdx_cpu_flush_cache(void) +#ifdef CONFIG_KEXEC_CORE +void tdx_cpu_flush_cache_for_kexec(void) { lockdep_assert_preemption_disabled(); @@ -1881,4 +1881,5 @@ void tdx_cpu_flush_cache(void) wbinvd(); this_cpu_write(cache_state_incoherent, false); } -EXPORT_SYMBOL_GPL(tdx_cpu_flush_cache); +EXPORT_SYMBOL_GPL(tdx_cpu_flush_cache_for_kexec); +#endif It solves pretty much all the objections that Sean had, in one fell swoop. The name clearly references kexec, it's stubbed out if not in use, and it's not anymore unnecessarily under CONFIG_INTEL_TDX_HOST. Paolo
On 8/19/25 23:53, Huang, Kai wrote: > On Tue, 2025-08-19 at 12:31 +0200, Paolo Bonzini wrote: >> 2) ... but anyway, KVM is the wrong place to do the test. If anything, >> since we need a v7 to change the unnecessary stub, you could move that >> stub under #ifndef CONFIG_KEXEC_CORE and rename the function to >> tdx_cpu_flush_cache_for_kexec(). > > Agreed on renaming to tdx_cpu_flush_cache_for_kexec(). > > But with the "for_kexec()" part in the function name, it already implies > it is related to kexec, and I kinda think there's no need to test > IS_ENABLED(CONFIG_KEXEC_CORE) anymore. > > One of the main purpose of this series is to unblock TDX_HOST and KEXEC in > the Kconfig, since otherwise I've been told distros will simply choose to > disable TDX_HOST in the Kconfig. So in reality, I suppose they will be on > together probably in like 95% cases, if not 100%. > > If we want to test CONFIG_KEXEC_CORE in tdx_cpu_flush_cache_for_kexec(), > then it would be a little bit weird that why we don't test it in other > places, e.g., when setting up the boolean. Testing it in all places would > make the code unnecessarily long and harder to read. No I don't mean testing it there, but just making tdx_cpu_flush_cache_for_kexec() a stub when CONFIG_KEXEC_CORE is undefined: diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index e9a213582f03..913199b1954b 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -217,7 +217,6 @@ u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_err1, u6 u64 tdh_phymem_cache_wb(bool resume); u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td); u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page); -void tdx_cpu_flush_cache(void); #else static inline void tdx_init(void) { } static inline int tdx_cpu_enable(void) { return -ENODEV; } @@ -225,8 +224,13 @@ static inline int tdx_enable(void) { return -ENODEV; } static inline u32 tdx_get_nr_guest_keyids(void) { return 0; } static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; } static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NULL; } -static inline void tdx_cpu_flush_cache(void) { } #endif /* CONFIG_INTEL_TDX_HOST */ +#ifdef CONFIG_KEXEC_CORE +void tdx_cpu_flush_cache_for_kexec(void); +#else +static inline void tdx_cpu_flush_cache_for_kexec(void) { } +#endif + #endif /* !__ASSEMBLER__ */ #endif /* _ASM_X86_TDX_H */ diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 93477233baae..376d49ef4472 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -453,7 +453,7 @@ void tdx_disable_virtualization_cpu(void) * remote CPUs to stop them. Doing WBINVD in stop_this_cpu() * could potentially increase the possibility of the "race". */ - tdx_cpu_flush_cache(); + tdx_cpu_flush_cache_for_kexec(); } #define TDX_SEAMCALL_RETRIES 10000 diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index c26e2e07ff6b..cd2a36dbbfc5 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1871,7 +1871,7 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page) } EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_hkid); -void tdx_cpu_flush_cache(void) +void tdx_cpu_flush_cache_for_kexec(void) { lockdep_assert_preemption_disabled(); @@ -1881,4 +1881,4 @@ void tdx_cpu_flush_cache(void) wbinvd(); this_cpu_write(cache_state_incoherent, false); } -EXPORT_SYMBOL_GPL(tdx_cpu_flush_cache); +EXPORT_SYMBOL_GPL(tdx_cpu_flush_cache_for_kexec); Personally, I'm totally okay with v6. But the above change seems to me like the best way to obey Sean's objection, better than adding the test in KVM. Paolo
On Wed, 2025-08-20 at 11:51 +0200, Paolo Bonzini wrote: > On 8/19/25 23:53, Huang, Kai wrote: > > On Tue, 2025-08-19 at 12:31 +0200, Paolo Bonzini wrote: > > > 2) ... but anyway, KVM is the wrong place to do the test. If anything, > > > since we need a v7 to change the unnecessary stub, you could move that > > > stub under #ifndef CONFIG_KEXEC_CORE and rename the function to > > > tdx_cpu_flush_cache_for_kexec(). > > > > Agreed on renaming to tdx_cpu_flush_cache_for_kexec(). > > > > But with the "for_kexec()" part in the function name, it already implies > > it is related to kexec, and I kinda think there's no need to test > > IS_ENABLED(CONFIG_KEXEC_CORE) anymore. > > > > One of the main purpose of this series is to unblock TDX_HOST and KEXEC in > > the Kconfig, since otherwise I've been told distros will simply choose to > > disable TDX_HOST in the Kconfig. So in reality, I suppose they will be on > > together probably in like 95% cases, if not 100%. > > > > If we want to test CONFIG_KEXEC_CORE in tdx_cpu_flush_cache_for_kexec(), > > then it would be a little bit weird that why we don't test it in other > > places, e.g., when setting up the boolean. Testing it in all places would > > make the code unnecessarily long and harder to read. > > No I don't mean testing it there, but just making > tdx_cpu_flush_cache_for_kexec() a stub when CONFIG_KEXEC_CORE is > undefined: > > diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h > index e9a213582f03..913199b1954b 100644 > --- a/arch/x86/include/asm/tdx.h > +++ b/arch/x86/include/asm/tdx.h > @@ -217,7 +217,6 @@ u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_err1, u6 > u64 tdh_phymem_cache_wb(bool resume); > u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td); > u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page); > -void tdx_cpu_flush_cache(void); > #else > static inline void tdx_init(void) { } > static inline int tdx_cpu_enable(void) { return -ENODEV; } > @@ -225,8 +224,13 @@ static inline int tdx_enable(void) { return -ENODEV; } > static inline u32 tdx_get_nr_guest_keyids(void) { return 0; } > static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; } > static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NULL; } > -static inline void tdx_cpu_flush_cache(void) { } > #endif /* CONFIG_INTEL_TDX_HOST */ > > +#ifdef CONFIG_KEXEC_CORE > +void tdx_cpu_flush_cache_for_kexec(void); > +#else > +static inline void tdx_cpu_flush_cache_for_kexec(void) { } > +#endif > + I think one minor issue here is, when CONFIG_INTEL_TDX_HOST is off but CONFIG_KEXEC_CORE is on, there will be no implementation of tdx_cpu_flush_cache_for_kexec(). This won't result in build error, though, because when TDX_HOST is off, KVM_INTEL_TDX will be off too, i.e., there won't be any caller of tdx_cpu_flush_cache_for_kexec(). But this still doesn't look nice? Btw, the above will provide the stub function when both KEXEC_CORE and TDX_HOST is off, which seems to be a step back too? :-) To me, it's more straightforward to just rename it to tdx_cpu_flush_cache_for_kexec() and remove the stub: diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h index e9a213582f03..5e37ae1a40e8 100644 --- a/arch/x86/include/asm/tdx.h +++ b/arch/x86/include/asm/tdx.h @@ -217,7 +217,7 @@ u64 tdh_mem_page_remove(struct tdx_td *td, u64 gpa, u64 level, u64 *ext_err1, u6 u64 tdh_phymem_cache_wb(bool resume); u64 tdh_phymem_page_wbinvd_tdr(struct tdx_td *td); u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page); -void tdx_cpu_flush_cache(void); +void tdx_cpu_flush_cache_for_kexec(void); #else static inline void tdx_init(void) { } static inline int tdx_cpu_enable(void) { return -ENODEV; } @@ -225,7 +225,6 @@ static inline int tdx_enable(void) { return -ENODEV; } static inline u32 tdx_get_nr_guest_keyids(void) { return 0; } static inline const char *tdx_dump_mce_info(struct mce *m) { return NULL; } static inline const struct tdx_sys_info *tdx_get_sysinfo(void) { return NULL; } -static inline void tdx_cpu_flush_cache(void) { } #endif /* CONFIG_INTEL_TDX_HOST */ #endif /* !__ASSEMBLER__ */ diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c index 1bc6f52e0cd7..382792e31f32 100644 --- a/arch/x86/kvm/vmx/tdx.c +++ b/arch/x86/kvm/vmx/tdx.c @@ -453,7 +453,7 @@ void tdx_disable_virtualization_cpu(void) * remote CPUs to stop them. Doing WBINVD in stop_this_cpu() * could potentially increase the possibility of the "race". */ - tdx_cpu_flush_cache(); + tdx_cpu_flush_cache_for_kexec(); } #define TDX_SEAMCALL_RETRIES 10000 diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c index c26e2e07ff6b..cd2a36dbbfc5 100644 --- a/arch/x86/virt/vmx/tdx/tdx.c +++ b/arch/x86/virt/vmx/tdx/tdx.c @@ -1871,7 +1871,7 @@ u64 tdh_phymem_page_wbinvd_hkid(u64 hkid, struct page *page) } EXPORT_SYMBOL_GPL(tdh_phymem_page_wbinvd_hkid); -void tdx_cpu_flush_cache(void) +void tdx_cpu_flush_cache_for_kexec(void) { lockdep_assert_preemption_disabled(); @@ -1881,4 +1881,4 @@ void tdx_cpu_flush_cache(void) wbinvd(); this_cpu_write(cache_state_incoherent, false); } -EXPORT_SYMBOL_GPL(tdx_cpu_flush_cache); +EXPORT_SYMBOL_GPL(tdx_cpu_flush_cache_for_kexec);
On Wed, Aug 20, 2025 at 1:22 PM Huang, Kai <kai.huang@intel.com> wrote: > I think one minor issue here is, when CONFIG_INTEL_TDX_HOST is off but > CONFIG_KEXEC_CORE is on, there will be no implementation of > tdx_cpu_flush_cache_for_kexec(). This won't result in build error, > though, because when TDX_HOST is off, KVM_INTEL_TDX will be off too, i.e., > there won't be any caller of tdx_cpu_flush_cache_for_kexec(). > > But this still doesn't look nice? Why do you need one? It's called tdx_cpu_flush_cache_for_kexec(), you don't need it if TDX is disabled. > Btw, the above will provide the stub function when both KEXEC_CORE and > TDX_HOST is off, which seems to be a step back too? Let's just stop here. Are we really wasting this much time discussing like 30 characters and 0 bytes of object code? > To me, it's more straightforward to just rename it to > tdx_cpu_flush_cache_for_kexec() and remove the stub: Sure, just rename the function and let's call it a day. If it was me, v6 was good enough. Paolo
On Wed, 2025-08-20 at 22:35 +0200, Paolo Bonzini wrote: > On Wed, Aug 20, 2025 at 1:22 PM Huang, Kai <kai.huang@intel.com> wrote: > > I think one minor issue here is, when CONFIG_INTEL_TDX_HOST is off but > > CONFIG_KEXEC_CORE is on, there will be no implementation of > > tdx_cpu_flush_cache_for_kexec(). This won't result in build error, > > though, because when TDX_HOST is off, KVM_INTEL_TDX will be off too, i.e., > > there won't be any caller of tdx_cpu_flush_cache_for_kexec(). > > > > But this still doesn't look nice? > > Why do you need one? It's called tdx_cpu_flush_cache_for_kexec(), you > don't need it if TDX is disabled. Sorry I meant the declaration will still be there w/o the function body. > > > Btw, the above will provide the stub function when both KEXEC_CORE and > > TDX_HOST is off, which seems to be a step back too? > > Let's just stop here. Are we really wasting this much time discussing > like 30 characters and 0 bytes of object code? > > > To me, it's more straightforward to just rename it to > > tdx_cpu_flush_cache_for_kexec() and remove the stub: > > Sure, just rename the function and let's call it a day. If it was me, > v6 was good enough. > Thanks for your time Paolo!
© 2016 - 2025 Red Hat, Inc.