Before KVM can use TDX to create and run TDX guests, TDX needs to be
initialized from two perspectives: 1) the TDX module must be initialized
properly to a working state; 2) a per-cpu TDX initialization, a.k.a. the
TDH.SYS.LP.INIT SEAMCALL, must be done on a logical cpu before it can
run any other TDX SEAMCALL.

The TDX host core-kernel provides two functions to do the above two
respectively: tdx_enable() and tdx_cpu_enable().

There are two options in terms of when to initialize TDX: initialize TDX
at KVM module loading time, or when creating the first TDX guest.

Choose to initialize TDX during KVM module loading time:

Initializing the TDX module is both memory and CPU time consuming: 1) the
kernel needs to allocate a non-trivial size (~1/256) of system memory
as metadata used by the TDX module to track each TDX-usable memory page's
status; 2) the TDX module needs to initialize this metadata, one entry
for each TDX-usable memory page.
Also, the kernel uses alloc_contig_pages() to allocate those metadata
chunks, because they are large and need to be physically contiguous.
alloc_contig_pages() can fail.  If TDX were initialized when creating the
first TDX guest, there's a chance that KVM wouldn't be able to run any
TDX guest even though KVM _declares_ that it supports TDX.  This isn't
good for the user.

On the other hand, initializing TDX at KVM module loading time makes
sure KVM provides a consistent view to the user of whether KVM can
support TDX.

Only try to initialize TDX after VMX has been initialized.  TDX is
based on VMX, and if VMX fails to initialize then TDX is likely broken
anyway.  Also, in practice, supporting TDX requires parts of VMX and
common x86 infrastructure to be in working order, so TDX cannot be
enabled alone w/o VMX support.

There are two cases that can result in failure to initialize TDX: 1) TDX
cannot be supported (e.g., because TDX is not supported or not enabled
by hardware, or the TDX module is not loaded, or due to a missing
dependency in KVM's configuration); 2) any unexpected error during TDX
bring-up.  For the first case, only mark TDX as disabled but still allow
the KVM module to be loaded.  For the second case, fail to load the KVM
module so that the user is aware.

Because TDX costs additional memory, don't enable TDX by default.  Add a
new module parameter 'enable_tdx' (exposed to the user as 'tdx' via
module_param_named()) to allow the user to opt in.

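Opting in would then look like the following (hypothetical usage sketch;
the user-visible parameter name 'tdx' follows from the
module_param_named() call in this patch, and the modprobe.d path is just
the usual convention):

```sh
# Opt in at load time:
modprobe kvm_intel tdx=1

# Or persistently:
echo "options kvm_intel tdx=1" | sudo tee /etc/modprobe.d/kvm-tdx.conf

# Check the effective value after the module is loaded:
cat /sys/module/kvm_intel/parameters/tdx
```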
Note, the name tdx_init() has already been taken by the early boot code.
Use tdx_bringup() for initializing TDX (and tdx_cleanup() since KVM
doesn't actually tear down TDX).  They don't match vt_init()/vt_exit(),
vmx_init()/vmx_exit() etc. but it's not the end of the world.

Also, once initialized, the TDX module cannot be disabled and enabled
again w/o the TDX module runtime update, which isn't supported by the
kernel.  After TDX is enabled, nothing needs to be done when KVM
disables hardware virtualization, e.g., when offlining a CPU, or during
suspend/resume.  The TDX host core-kernel code internally tracks the TDX
status and can handle the "multiple enabling" scenario.

Similar to KVM_AMD_SEV, add a new KVM_INTEL_TDX Kconfig to guard KVM TDX
code.  Make it depend on INTEL_TDX_HOST but not replace INTEL_TDX_HOST,
because in the longer term there's a use case that requires making
SEAMCALLs w/o KVM, as mentioned by Dan [1].

Link: https://lore.kernel.org/6723fc2070a96_60c3294dc@dwillia2-mobl3.amr.corp.intel.com.notmuch/ [1]
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
arch/x86/kvm/Kconfig | 10 +++
arch/x86/kvm/Makefile | 1 +
arch/x86/kvm/vmx/main.c | 9 +++
arch/x86/kvm/vmx/tdx.c | 153 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/vmx/tdx.h | 13 ++++
5 files changed, 186 insertions(+)
create mode 100644 arch/x86/kvm/vmx/tdx.c
create mode 100644 arch/x86/kvm/vmx/tdx.h
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index d93af5390341..e6da1b4ff3d2 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -128,6 +128,16 @@ config X86_SGX_KVM
If unsure, say N.
+config KVM_INTEL_TDX
+ bool "Intel Trust Domain Extensions (TDX) support"
+ default y
+ depends on INTEL_TDX_HOST
+ help
+ Provides support for launching Intel Trust Domain Extensions (TDX)
+ confidential VMs on Intel processors.
+
+ If unsure, say N.
+
config KVM_AMD
tristate "KVM for AMD processors support"
depends on KVM && (CPU_SUP_AMD || CPU_SUP_HYGON)
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index f9dddb8cb466..a5d362c7b504 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -20,6 +20,7 @@ kvm-intel-y += vmx/vmx.o vmx/vmenter.o vmx/pmu_intel.o vmx/vmcs12.o \
kvm-intel-$(CONFIG_X86_SGX_KVM) += vmx/sgx.o
kvm-intel-$(CONFIG_KVM_HYPERV) += vmx/hyperv.o vmx/hyperv_evmcs.o
+kvm-intel-$(CONFIG_KVM_INTEL_TDX) += vmx/tdx.o
kvm-amd-y += svm/svm.o svm/vmenter.o svm/pmu.o svm/nested.o svm/avic.o
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 6772e560ac7b..bc690c63e511 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -6,6 +6,7 @@
#include "nested.h"
#include "pmu.h"
#include "posted_intr.h"
+#include "tdx.h"
#define VMX_REQUIRED_APICV_INHIBITS \
(BIT(APICV_INHIBIT_REASON_DISABLED) | \
@@ -171,6 +172,7 @@ struct kvm_x86_init_ops vt_init_ops __initdata = {
static void vt_exit(void)
{
kvm_exit();
+ tdx_cleanup();
vmx_exit();
}
module_exit(vt_exit);
@@ -183,6 +185,11 @@ static int __init vt_init(void)
if (r)
return r;
+ /* tdx_init() has been taken */
+ r = tdx_bringup();
+ if (r)
+ goto err_tdx_bringup;
+
/*
* Common KVM initialization _must_ come last, after this, /dev/kvm is
* exposed to userspace!
@@ -195,6 +202,8 @@ static int __init vt_init(void)
return 0;
err_kvm_init:
+ tdx_cleanup();
+err_tdx_bringup:
vmx_exit();
return r;
}
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
new file mode 100644
index 000000000000..d35112758641
--- /dev/null
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -0,0 +1,153 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/cpu.h>
+#include <asm/cpufeature.h>
+#include <asm/tdx.h>
+#include "capabilities.h"
+#include "tdx.h"
+
+#undef pr_fmt
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+static bool enable_tdx __ro_after_init;
+module_param_named(tdx, enable_tdx, bool, 0444);
+
+static enum cpuhp_state tdx_cpuhp_state;
+
+static int tdx_online_cpu(unsigned int cpu)
+{
+ unsigned long flags;
+ int r;
+
+ /* Sanity check CPU is already in post-VMXON */
+ WARN_ON_ONCE(!(cr4_read_shadow() & X86_CR4_VMXE));
+
+ local_irq_save(flags);
+ r = tdx_cpu_enable();
+ local_irq_restore(flags);
+
+ return r;
+}
+
+static void __do_tdx_cleanup(void)
+{
+ /*
+ * Once TDX module is initialized, it cannot be disabled and
+ * re-initialized again w/o runtime update (which isn't
+ * supported by kernel). Only need to remove the cpuhp here.
+ * The TDX host core code tracks TDX status and can handle
+ * 'multiple enabling' scenario.
+ */
+ WARN_ON_ONCE(!tdx_cpuhp_state);
+ cpuhp_remove_state_nocalls(tdx_cpuhp_state);
+ tdx_cpuhp_state = 0;
+}
+
+static int __init __do_tdx_bringup(void)
+{
+ int r;
+
+ /*
+ * TDX-specific cpuhp callback to call tdx_cpu_enable() on all
+ * online CPUs before calling tdx_enable(), and on any new
+ * going-online CPU to make sure it is ready for TDX guest.
+ */
+ r = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN,
+ "kvm/cpu/tdx:online",
+ tdx_online_cpu, NULL);
+ if (r < 0)
+ return r;
+
+ tdx_cpuhp_state = r;
+
+ r = tdx_enable();
+ if (r)
+ __do_tdx_cleanup();
+
+ return r;
+}
+
+static bool __init kvm_can_support_tdx(void)
+{
+ return cpu_feature_enabled(X86_FEATURE_TDX_HOST_PLATFORM);
+}
+
+static int __init __tdx_bringup(void)
+{
+ int r;
+
+ /*
+ * Enabling TDX requires enabling hardware virtualization first,
+ * as making SEAMCALLs requires CPU being in post-VMXON state.
+ */
+ r = kvm_enable_virtualization();
+ if (r)
+ return r;
+
+ cpus_read_lock();
+ r = __do_tdx_bringup();
+ cpus_read_unlock();
+
+ if (r)
+ goto tdx_bringup_err;
+
+ /*
+ * Leave hardware virtualization enabled after TDX is enabled
+ * successfully. TDX CPU hotplug depends on this.
+ */
+ return 0;
+tdx_bringup_err:
+ kvm_disable_virtualization();
+ return r;
+}
+
+void tdx_cleanup(void)
+{
+ if (enable_tdx) {
+ __do_tdx_cleanup();
+ kvm_disable_virtualization();
+ }
+}
+
+int __init tdx_bringup(void)
+{
+ int r;
+
+ enable_tdx = enable_tdx && kvm_can_support_tdx();
+
+ if (!enable_tdx)
+ return 0;
+
+ /*
+ * Ideally KVM should probe whether the TDX module has been loaded
+ * first and then try to bring it up, because KVM should treat
+ * them differently. I.e., KVM should just disable TDX while
+ * still allowing the KVM module to be loaded when the TDX module
+ * is not loaded, but fail the KVM module load when it actually
+ * fails to bring up TDX.
+ *
+ * But unfortunately TDX needs to use SEAMCALL to probe whether
+ * the module is loaded (there is no CPUID or MSR for that),
+ * and making SEAMCALL requires enabling virtualization first,
+ * just like the rest steps of bringing up TDX module.
+ *
+ * The first SEAMCALL to bring up TDX module returns -ENODEV
+ * when the module is not loaded. For simplicity just try to
+ * bring up TDX and use the return code as the way to probe,
+ * albeit this is not perfect, i.e., need to make sure
+ * __tdx_bringup() doesn't return -ENODEV in other cases.
+ */
+ r = __tdx_bringup();
+ if (r) {
+ enable_tdx = 0;
+ /*
+ * Disable TDX only but don't fail to load module when
+ * TDX module is not loaded. No need to print message
+ * saying "module is not loaded" because it was printed
+ * when the first SEAMCALL failed.
+ */
+ if (r == -ENODEV)
+ r = 0;
+ }
+
+ return r;
+}
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
new file mode 100644
index 000000000000..8aee938a968f
--- /dev/null
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __KVM_X86_VMX_TDX_H
+#define __KVM_X86_VMX_TDX_H
+
+#ifdef CONFIG_KVM_INTEL_TDX
+int tdx_bringup(void);
+void tdx_cleanup(void);
+#else
+static inline int tdx_bringup(void) { return 0; }
+static inline void tdx_cleanup(void) {}
+#endif
+
+#endif
--
2.46.2
>+static int tdx_online_cpu(unsigned int cpu)
>+{
>+	unsigned long flags;
>+	int r;
>+
>+	/* Sanity check CPU is already in post-VMXON */
>+	WARN_ON_ONCE(!(cr4_read_shadow() & X86_CR4_VMXE));
>+
>+	local_irq_save(flags);
>+	r = tdx_cpu_enable();
>+	local_irq_restore(flags);

The comment above tdx_cpu_enable() is outdated because now it may be called
from CPU hotplug rather than IPI function calls only.

Can we relax the assertion lockdep_assert_irqs_disabled() in tdx_cpu_enable()?
looks the requirement is just the enabling work won't be migrated and done to
another CPU.

>+
>+	return r;
>+}
>+
>+static void __do_tdx_cleanup(void)
>+{
>+	/*
>+	 * Once TDX module is initialized, it cannot be disabled and
>+	 * re-initialized again w/o runtime update (which isn't
>+	 * supported by kernel). Only need to remove the cpuhp here.
>+	 * The TDX host core code tracks TDX status and can handle
>+	 * 'multiple enabling' scenario.
>+	 */
>+	WARN_ON_ONCE(!tdx_cpuhp_state);
>+	cpuhp_remove_state_nocalls(tdx_cpuhp_state);

...

>+	tdx_cpuhp_state = 0;
>+}
>+
>+static int __init __do_tdx_bringup(void)
>+{
>+	int r;
>+
>+	/*
>+	 * TDX-specific cpuhp callback to call tdx_cpu_enable() on all
>+	 * online CPUs before calling tdx_enable(), and on any new
>+	 * going-online CPU to make sure it is ready for TDX guest.
>+	 */
>+	r = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN,
>+					 "kvm/cpu/tdx:online",
>+					 tdx_online_cpu, NULL);
>+	if (r < 0)
>+		return r;
>+
>+	tdx_cpuhp_state = r;
>+
>+	r = tdx_enable();
>+	if (r)
>+		__do_tdx_cleanup();

this calls cpuhp_remove_state_nocalls(), which acquires cpu locks again,
causing a potential deadlock IIUC.

>+
>+	return r;
>+}
>+
>+static bool __init kvm_can_support_tdx(void)

I think "static __init bool" is the preferred order. see
https://www.kernel.org/doc/html/latest/process/coding-style.html#function-prototypes

>+{
>+	return cpu_feature_enabled(X86_FEATURE_TDX_HOST_PLATFORM);
>+}
>+
>+static int __init __tdx_bringup(void)
>+{
>+	int r;
>+
>+	/*
>+	 * Enabling TDX requires enabling hardware virtualization first,
>+	 * as making SEAMCALLs requires CPU being in post-VMXON state.
>+	 */
>+	r = kvm_enable_virtualization();
>+	if (r)
>+		return r;
>+
>+	cpus_read_lock();
>+	r = __do_tdx_bringup();
>+	cpus_read_unlock();
>+
>+	if (r)
>+		goto tdx_bringup_err;
>+
>+	/*
>+	 * Leave hardware virtualization enabled after TDX is enabled
>+	 * successfully. TDX CPU hotplug depends on this.
>+	 */

Shouldn't we make enable_tdx dependent on enable_virt_at_load? Otherwise, if
someone sets enable_tdx=1 and enable_virt_at_load=0, they will get hardware
virtualization enabled at load time while enable_virt_at_load still shows 0.
This behavior is a bit confusing to me.

I think a check against enable_virt_at_load in kvm_can_support_tdx() will work.

The call of kvm_enable_virtualization() here effectively moves
kvm_init_virtualization() out of kvm_init() when enable_tdx=1. I wonder if it
makes more sense to refactor out kvm_init_virtualization() for non-TDX cases
as well, i.e.,

	vmx_init();
	kvm_init_virtualization();
	tdx_init();
	kvm_init();

I'm not sure if this was ever discussed. To me, this approach is better because
TDX code needn't handle virtualization enabling stuff. It can simply check that
enable_virt_at_load=1, assume virtualization is enabled and needn't disable
virtualization on errors.

A bonus is that on non-TDX-capable systems, hardware virtualization won't be
toggled twice at KVM load time for no good reason.

>+	return 0;
>+tdx_bringup_err:
>+	kvm_disable_virtualization();
>+	return r;
>+}
On 18/11/2024 2:22 pm, Chao Gao wrote:
>> +static int tdx_online_cpu(unsigned int cpu)
>> +{
>> +	unsigned long flags;
>> +	int r;
>> +
>> +	/* Sanity check CPU is already in post-VMXON */
>> +	WARN_ON_ONCE(!(cr4_read_shadow() & X86_CR4_VMXE));
>> +
>> +	local_irq_save(flags);
>> +	r = tdx_cpu_enable();
>> +	local_irq_restore(flags);
> 
> The comment above tdx_cpu_enable() is outdated because now it may be called
> from CPU hotplug rather than IPI function calls only.
> 
> Can we relax the assertion lockdep_assert_irqs_disabled() in tdx_cpu_enable()?
> looks the requirement is just the enabling work won't be migrated and done to
> another CPU.

We can but I don't want to do it now.  We will need to revisit both
tdx_cpu_enable() and tdx_enable() when we move VMXON out of KVM anyway.
I would like to focus on bringing KVM TDX support first and then revisit
them together at that timeframe.

> 
>> +
>> +	return r;
>> +}
>> +
>> +static void __do_tdx_cleanup(void)
>> +{
>> +	/*
>> +	 * Once TDX module is initialized, it cannot be disabled and
>> +	 * re-initialized again w/o runtime update (which isn't
>> +	 * supported by kernel). Only need to remove the cpuhp here.
>> +	 * The TDX host core code tracks TDX status and can handle
>> +	 * 'multiple enabling' scenario.
>> +	 */
>> +	WARN_ON_ONCE(!tdx_cpuhp_state);
>> +	cpuhp_remove_state_nocalls(tdx_cpuhp_state);
> 
> ...
> 
>> +	tdx_cpuhp_state = 0;
>> +}
>> +
>> +static int __init __do_tdx_bringup(void)
>> +{
>> +	int r;
>> +
>> +	/*
>> +	 * TDX-specific cpuhp callback to call tdx_cpu_enable() on all
>> +	 * online CPUs before calling tdx_enable(), and on any new
>> +	 * going-online CPU to make sure it is ready for TDX guest.
>> +	 */
>> +	r = cpuhp_setup_state_cpuslocked(CPUHP_AP_ONLINE_DYN,
>> +					 "kvm/cpu/tdx:online",
>> +					 tdx_online_cpu, NULL);
>> +	if (r < 0)
>> +		return r;
>> +
>> +	tdx_cpuhp_state = r;
>> +
>> +	r = tdx_enable();
>> +	if (r)
>> +		__do_tdx_cleanup();
> 
> this calls cpuhp_remove_state_nocalls(), which acquires cpu locks again,
> causing a potential deadlock IIUC.

Damn.. I'll fix.  Thanks for catching.

> 
>> +
>> +	return r;
>> +}
>> +
>> +static bool __init kvm_can_support_tdx(void)
> 
> I think "static __init bool" is the preferred order. see
> 
> https://www.kernel.org/doc/html/latest/process/coding-style.html#function-prototypes

I think you are right, but IIUC we'd better change all the existing
'static <ret_type> __init' to 'static __init <ret_type>' in KVM code.
I'd rather keep the current way to make them aligned, and we can change
them all at once if needed in the future.

> 
>> +{
>> +	return cpu_feature_enabled(X86_FEATURE_TDX_HOST_PLATFORM);
>> +}
>> +
>> +static int __init __tdx_bringup(void)
>> +{
>> +	int r;
>> +
>> +	/*
>> +	 * Enabling TDX requires enabling hardware virtualization first,
>> +	 * as making SEAMCALLs requires CPU being in post-VMXON state.
>> +	 */
>> +	r = kvm_enable_virtualization();
>> +	if (r)
>> +		return r;
>> +
>> +	cpus_read_lock();
>> +	r = __do_tdx_bringup();
>> +	cpus_read_unlock();
>> +
>> +	if (r)
>> +		goto tdx_bringup_err;
>> +
>> +	/*
>> +	 * Leave hardware virtualization enabled after TDX is enabled
>> +	 * successfully. TDX CPU hotplug depends on this.
>> +	 */
> 
> Shouldn't we make enable_tdx dependent on enable_virt_at_load? Otherwise, if
> someone sets enable_tdx=1 and enable_virt_at_load=0, they will get hardware
> virtualization enabled at load time while enable_virt_at_load still shows 0.
> This behavior is a bit confusing to me.
> 
> I think a check against enable_virt_at_load in kvm_can_support_tdx() will work.
> 
> The call of kvm_enable_virtualization() here effectively moves
> kvm_init_virtualization() out of kvm_init() when enable_tdx=1. I wonder if it
> makes more sense to refactor out kvm_init_virtualization() for non-TDX cases
> as well, i.e.,
> 
> 	vmx_init();
> 	kvm_init_virtualization();
> 	tdx_init();
> 	kvm_init();
> 
> I'm not sure if this was ever discussed. To me, this approach is better because
> TDX code needn't handle virtualization enabling stuff. It can simply check that
> enable_virt_at_load=1, assume virtualization is enabled and needn't disable
> virtualization on errors.

I think this was briefly discussed here:

https://lore.kernel.org/all/ZrrFgBmoywk7eZYC@google.com/

The disadvantage of splitting out kvm_init_virtualization() is that all
other ARCHs (all non-TDX cases actually) will need to explicitly call
kvm_init_virtualization() separately.

> 
> A bonus is that on non-TDX-capable systems, hardware virtualization won't be
> toggled twice at KVM load time for no good reason.

I am fine with either way.

Sean, do you have any comments?
>> Shouldn't we make enable_tdx dependent on enable_virt_at_load? Otherwise, if
>> someone sets enable_tdx=1 and enable_virt_at_load=0, they will get hardware
>> virtualization enabled at load time while enable_virt_at_load still shows 0.
>> This behavior is a bit confusing to me.

Forgot to reply to this ...

>> 
>> I think a check against enable_virt_at_load in kvm_can_support_tdx() will work.
>> 
>> The call of kvm_enable_virtualization() here effectively moves
>> kvm_init_virtualization() out of kvm_init() when enable_tdx=1. I wonder if it
>> makes more sense to refactor out kvm_init_virtualization() for non-TDX cases
>> as well, i.e.,
>> 
>> 	vmx_init();
>> 	kvm_init_virtualization();
>> 	tdx_init();
>> 	kvm_init();
>> 
>> I'm not sure if this was ever discussed. To me, this approach is better because
>> TDX code needn't handle virtualization enabling stuff. It can simply check that
>> enable_virt_at_load=1, assume virtualization is enabled and needn't disable
>> virtualization on errors.
> 
> I think this was briefly discussed here:
> 
> https://lore.kernel.org/all/ZrrFgBmoywk7eZYC@google.com/
> 
> The disadvantage of splitting out kvm_init_virtualization() is all other
> ARCHs (all non-TDX cases actually) will need to explicitly call
> kvm_init_virtualization() separately.
> 
>> 
>> A bonus is that on non-TDX-capable systems, hardware virtualization won't be
>> toggled twice at KVM load time for no good reason.
> 
> I am fine with either way.
> 
> Sean, do you have any comments?

... and yes I think we should make TDX depend on 'enable_virt_at_load'.

And by doing this, I think we can still do kvm_init_virtualization()
inside kvm_init(): 'enable_virt_at_load' still reflects its behaviour --
TDX just enables virt earlier than other cases, but it is still "enable
virt at load".

It's perhaps not perfect, but it saves an unneeded separate call of
kvm_init_virtualization() for other ARCHs.
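For reference, the cpuhp deadlock Chao points out could be addressed along
these lines (a sketch only, not necessarily the final fix: it switches the
cleanup to the lock-held cpuhp_remove_state_nocalls_cpuslocked() variant,
so the error path inside cpus_read_lock() doesn't re-acquire the cpus lock,
and adds a hypothetical __tdx_cleanup() wrapper for the module-exit path,
which is not called with the lock held):

```diff
 static void __do_tdx_cleanup(void)
 {
 	...
 	WARN_ON_ONCE(!tdx_cpuhp_state);
-	cpuhp_remove_state_nocalls(tdx_cpuhp_state);
+	/* Caller must hold cpus_read_lock() */
+	cpuhp_remove_state_nocalls_cpuslocked(tdx_cpuhp_state);
 	tdx_cpuhp_state = 0;
 }
+
+static void __tdx_cleanup(void)
+{
+	cpus_read_lock();
+	__do_tdx_cleanup();
+	cpus_read_unlock();
+}
```

tdx_cleanup() would then call __tdx_cleanup() instead of __do_tdx_cleanup(),
while the failure path in __do_tdx_bringup() (already under cpus_read_lock())
keeps calling __do_tdx_cleanup() directly.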