P-SEAMLDR is another component alongside the TDX module within the
protected SEAM range. Software can invoke its functions by executing the
SEAMCALL instruction with the 63 bit of RAX set to 1. P-SEAMLDR SEAMCALLs
differ from those of the TDX module in terms of error codes and the
handling of the current VMCS.
Add a wrapper for P-SEAMLDR SEAMCALLs based on the SEAMCALL infrastructure.
Intel® Trust Domain CPU Architectural Extensions (May 2021 edition)
Chapter 2.3 states:
SEAMRET from the P-SEAMLDR clears the current VMCS structure pointed to by
the current-VMCS pointer. A VMM that invokes the P-SEAMLDR using SEAMCALL
must reload the current-VMCS, if required, using the VMPTRLD instruction.
So, save and restore the current-VMCS pointer using VMPTRST and VMPTRLD
instructions to avoid breaking KVM, which manages the current-VMCS.
Disable interrupts to prevent KVM code from interfering with P-SEAMLDR
SEAMCALLs. For example, if a vCPU is scheduled before the current VMCS is
restored, it may encounter an invalid current VMCS, causing its VMX
instruction to fail. Additionally, if KVM sends IPIs to invalidate a
current VMCS and the invalidation occurs right after the current VMCS is
saved, that VMCS will be reloaded after P-SEAMLDR SEAMCALLs, leading to
unexpected behavior.
NMIs are not a problem, as the only scenario where instructions relying on
the current-VMCS are used is during guest PMI handling in KVM. This occurs
immediately after VM exits with IRQ and NMI disabled, ensuring no
interference with P-SEAMLDR SEAMCALLs.
Signed-off-by: Chao Gao <chao.gao@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
---
arch/x86/Kconfig | 10 ++++++++
arch/x86/virt/vmx/tdx/Makefile | 1 +
arch/x86/virt/vmx/tdx/seamldr.c | 44 +++++++++++++++++++++++++++++++++
arch/x86/virt/vmx/vmx.h | 40 ++++++++++++++++++++++++++++++
4 files changed, 95 insertions(+)
create mode 100644 arch/x86/virt/vmx/tdx/seamldr.c
create mode 100644 arch/x86/virt/vmx/vmx.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4b9f378e05f6..8b1e0986b7f8 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1932,6 +1932,16 @@ config INTEL_TDX_HOST
If unsure, say N.
+config INTEL_TDX_MODULE_UPDATE
+ bool "Intel TDX module runtime update"
+ depends on INTEL_TDX_HOST
+ help
+ This enables the kernel to support TDX module runtime update. This allows
+ the admin to upgrade the TDX module to a newer one without the need to
+ terminate running TDX guests.
+
+ If unsure, say N.
+
config EFI
bool "EFI runtime service support"
depends on ACPI
diff --git a/arch/x86/virt/vmx/tdx/Makefile b/arch/x86/virt/vmx/tdx/Makefile
index 90da47eb85ee..26aea3531c36 100644
--- a/arch/x86/virt/vmx/tdx/Makefile
+++ b/arch/x86/virt/vmx/tdx/Makefile
@@ -1,2 +1,3 @@
# SPDX-License-Identifier: GPL-2.0-only
obj-y += seamcall.o tdx.o
+obj-$(CONFIG_INTEL_TDX_MODULE_UPDATE) += seamldr.o
diff --git a/arch/x86/virt/vmx/tdx/seamldr.c b/arch/x86/virt/vmx/tdx/seamldr.c
new file mode 100644
index 000000000000..a252f1ae3483
--- /dev/null
+++ b/arch/x86/virt/vmx/tdx/seamldr.c
@@ -0,0 +1,44 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright(c) 2025 Intel Corporation.
+ *
+ * Intel TDX module runtime update
+ */
+#define pr_fmt(fmt) "seamldr: " fmt
+
+#include <linux/cleanup.h>
+
+#include "tdx.h"
+#include "../vmx.h"
+
+static __maybe_unused int seamldr_call(u64 fn, struct tdx_module_args *args)
+{
+ u64 vmcs;
+ int ret;
+
+ if (!is_seamldr_call(fn))
+ return -EINVAL;
+
+ /*
+ * SEAMRET from P-SEAMLDR invalidates the current-VMCS pointer.
+ * Save/restore current-VMCS pointer across P-SEAMLDR SEAMCALLs so
+ * that VMX instructions won't fail due to an invalid current-VMCS.
+ *
+ * Disable interrupt to prevent SMP call functions from seeing the
+ * invalid current-VMCS.
+ */
+ guard(irqsave)();
+
+ ret = cpu_vmcs_store(&vmcs);
+ if (ret)
+ return ret;
+
+ ret = seamldr_prerr(fn, args);
+
+ /* Restore current-VMCS pointer */
+#define INVALID_VMCS -1ULL
+ if (vmcs != INVALID_VMCS)
+ WARN_ON_ONCE(cpu_vmcs_load(vmcs));
+
+ return ret;
+}
diff --git a/arch/x86/virt/vmx/vmx.h b/arch/x86/virt/vmx/vmx.h
new file mode 100644
index 000000000000..51e6460fd1fd
--- /dev/null
+++ b/arch/x86/virt/vmx/vmx.h
@@ -0,0 +1,40 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef ARCH_X86_VIRT_VMX_H
+#define ARCH_X86_VIRT_VMX_H
+
+#include <linux/printk.h>
+
+static inline int cpu_vmcs_load(u64 vmcs_pa)
+{
+ asm goto("1: vmptrld %0\n\t"
+ ".byte 0x2e\n\t" /* branch not taken hint */
+ "jna %l[error]\n\t"
+ _ASM_EXTABLE(1b, %l[fault])
+ : : "m" (vmcs_pa) : "cc" : error, fault);
+ return 0;
+
+error:
+ pr_err_once("vmptrld failed: %llx\n", vmcs_pa);
+ return -EIO;
+fault:
+ pr_err_once("vmptrld faulted\n");
+ return -EIO;
+}
+
+static inline int cpu_vmcs_store(u64 *vmcs_pa)
+{
+ int ret = -EIO;
+
+ asm volatile("1: vmptrst %0\n\t"
+ "mov $0, %1\n\t"
+ "2:\n\t"
+ _ASM_EXTABLE(1b, 2b)
+ : "=m" (*vmcs_pa), "+r" (ret) : :);
+
+ if (ret)
+ pr_err_once("vmptrst faulted\n");
+
+ return ret;
+}
+
+#endif
--
2.47.1
On Fri, May 23, 2025, Chao Gao wrote:
> +static __maybe_unused int seamldr_call(u64 fn, struct tdx_module_args *args)
> +{
> + u64 vmcs;
> + int ret;
> +
> + if (!is_seamldr_call(fn))
> + return -EINVAL;
> +
> + /*
> + * SEAMRET from P-SEAMLDR invalidates the current-VMCS pointer.
I'd rather this use human-friendly language as opposed to the SDM's pedantic
terminology, e.g. just "current VMCS".
> + * Save/restore current-VMCS pointer across P-SEAMLDR SEAMCALLs so
> + * that VMX instructions won't fail due to an invalid current-VMCS.
> + *
> + * Disable interrupt to prevent SMP call functions from seeing the
I would rather we establish a rule that KVM is allowed to do VMREAD/VMWRITE in
IRQ context, i.e. don't single out SMP function calls.
> + * invalid current-VMCS.
> + */
> + guard(irqsave)();
> +
> + ret = cpu_vmcs_store(&vmcs);
> + if (ret)
> + return ret;
> +
> + ret = seamldr_prerr(fn, args);
> +
> + /* Restore current-VMCS pointer */
> +#define INVALID_VMCS -1ULL
> + if (vmcs != INVALID_VMCS)
> + WARN_ON_ONCE(cpu_vmcs_load(vmcs));
> +
> + return ret;
> +}
> diff --git a/arch/x86/virt/vmx/vmx.h b/arch/x86/virt/vmx/vmx.h
> new file mode 100644
> index 000000000000..51e6460fd1fd
> --- /dev/null
> +++ b/arch/x86/virt/vmx/vmx.h
> @@ -0,0 +1,40 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef ARCH_X86_VIRT_VMX_H
> +#define ARCH_X86_VIRT_VMX_H
> +
> +#include <linux/printk.h>
> +
> +static inline int cpu_vmcs_load(u64 vmcs_pa)
> +{
> + asm goto("1: vmptrld %0\n\t"
> + ".byte 0x2e\n\t" /* branch not taken hint */
Heh, don't copy paste the crappy indentation, that was a result of Linus' tree-wide
changes from 4356e9f841f7 ("work around gcc bugs with 'asm goto' with outputs"),
i.e. not intentional.
Regarding question #3 from the cover letter:
3. Two helpers, cpu_vmcs_load() and cpu_vmcs_store(), are added in patch 3
to save and restore the current VMCS. KVM has a variant of cpu_vmcs_load(),
i.e., vmcs_load(). Extracting KVM's version would cause a lot of code
churn, and I don't think that can be justified for reducing ~16 LoC
duplication. Please let me know if you disagree.
I'm fine with the SEAMLDR code having its own code, because I agree it's not worth
extracting KVM's macro maze just to get at VMPTRLD. But I'm not fine with creating
a new, inferior framework. So if we elect to leave KVM alone for the time being,
I would prefer to simply open code VMPTRST and VMPTRLD in seamldr.c, e.g.
static inline int seamldr_call(u64 fn, struct tdx_module_args *args)
{
u64 vmcs;
int ret;
if (!is_seamldr_call(fn))
return -EINVAL;
/*
* SEAMRET from P-SEAMLDR invalidates the current VMCS. Save/restore
* the VMCS across P-SEAMLDR SEAMCALLs to avoid clobbering KVM state.
* Disable interrupts as KVM is allowed to do VMREAD/VMWRITE in IRQ
* context (but not NMI context).
*/
guard(irqsave)();
asm goto("1: vmptrst %0\n\t"
_ASM_EXTABLE(1b, %l[error])
: "=m" (&vmcs) : "cc" : error);
ret = seamldr_prerr(fn, args);
/*
* Restore the current VMCS pointer. VMPTSTR "returns" all ones if the
* current VMCS is invalid.
*/
if (vmcs != -1ULL) {
asm goto("1: vmptrld %0\n\t"
"jna %l[error]\n\t"
_ASM_EXTABLE(1b, %l[error])
: : "m" (&vmcs) : "cc" : error);
}
return ret;
error:
WARN_ONCE(1, "Failed to save/restore the current VMCS");
return -EIO;
}
>Regarding question #3 from the cover letter:
>
> 3. Two helpers, cpu_vmcs_load() and cpu_vmcs_store(), are added in patch 3
> to save and restore the current VMCS. KVM has a variant of cpu_vmcs_load(),
> i.e., vmcs_load(). Extracting KVM's version would cause a lot of code
> churn, and I don't think that can be justified for reducing ~16 LoC
> duplication. Please let me know if you disagree.
>
>I'm fine with the SEAMLDR code having its own code, because I agree it's not worth
>extracting KVM's macro maze just to get at VMPTRLD. But I'm not fine with creating
>a new, inferior framework. So if we elect to leave KVM alone for the time being,
>I would prefer to simply open code VMPTRST and VMPTRLD in seamldr.c, e.g.
Agreed. And the code below makes perfect sense to me, so I will incorporate it
into my next version.
Thanks for your prompt feedback.
>
>static inline int seamldr_call(u64 fn, struct tdx_module_args *args)
>{
> u64 vmcs;
> int ret;
>
> if (!is_seamldr_call(fn))
> return -EINVAL;
>
> /*
> * SEAMRET from P-SEAMLDR invalidates the current VMCS. Save/restore
> * the VMCS across P-SEAMLDR SEAMCALLs to avoid clobbering KVM state.
> * Disable interrupts as KVM is allowed to do VMREAD/VMWRITE in IRQ
> * context (but not NMI context).
> */
> guard(irqsave)();
>
> asm goto("1: vmptrst %0\n\t"
> _ASM_EXTABLE(1b, %l[error])
> : "=m" (&vmcs) : "cc" : error);
>
> ret = seamldr_prerr(fn, args);
>
> /*
> * Restore the current VMCS pointer. VMPTSTR "returns" all ones if the
> * current VMCS is invalid.
> */
> if (vmcs != -1ULL) {
> asm goto("1: vmptrld %0\n\t"
> "jna %l[error]\n\t"
> _ASM_EXTABLE(1b, %l[error])
> : : "m" (&vmcs) : "cc" : error);
> }
>
> return ret;
>
>error:
> WARN_ONCE(1, "Failed to save/restore the current VMCS");
> return -EIO;
>}
On 5/23/25 12:52, Chao Gao wrote: > P-SEAMLDR is another component alongside the TDX module within the > protected SEAM range. Software can invoke its functions by executing the > SEAMCALL instruction with the 63 bit of RAX set to 1. P-SEAMLDR SEAMCALLs > differ from those of the TDX module in terms of error codes and the > handling of the current VMCS. > > Add a wrapper for P-SEAMLDR SEAMCALLs based on the SEAMCALL infrastructure. > > Intel® Trust Domain CPU Architectural Extensions (May 2021 edition) > Chapter 2.3 states: > > SEAMRET from the P-SEAMLDR clears the current VMCS structure pointed to by > the current-VMCS pointer. A VMM that invokes the P-SEAMLDR using SEAMCALL > must reload the current-VMCS, if required, using the VMPTRLD instruction. > > So, save and restore the current-VMCS pointer using VMPTRST and VMPTRLD > instructions to avoid breaking KVM, which manages the current-VMCS. > > Disable interrupts to prevent KVM code from interfering with P-SEAMLDR > SEAMCALLs. For example, if a vCPU is scheduled before the current VMCS is > restored, it may encounter an invalid current VMCS, causing its VMX > instruction to fail. Additionally, if KVM sends IPIs to invalidate a > current VMCS and the invalidation occurs right after the current VMCS is > saved, that VMCS will be reloaded after P-SEAMLDR SEAMCALLs, leading to > unexpected behavior. > > NMIs are not a problem, as the only scenario where instructions relying on > the current-VMCS are used is during guest PMI handling in KVM. This occurs > immediately after VM exits with IRQ and NMI disabled, ensuring no > interference with P-SEAMLDR SEAMCALLs. > > Signed-off-by: Chao Gao <chao.gao@intel.com> > Tested-by: Farrah Chen <farrah.chen@intel.com> > --- > arch/x86/Kconfig | 10 ++++++++ > arch/x86/virt/vmx/tdx/Makefile | 1 + > arch/x86/virt/vmx/tdx/seamldr.c | 44 +++++++++++++++++++++++++++++++++ > arch/x86/virt/vmx/vmx.h | 40 ++++++++++++++++++++++++++++++ > 4 files changed, 95 insertions(+) > create mode 100644 arch/x86/virt/vmx/tdx/seamldr.c > create mode 100644 arch/x86/virt/vmx/vmx.h > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > index 4b9f378e05f6..8b1e0986b7f8 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -1932,6 +1932,16 @@ config INTEL_TDX_HOST > > If unsure, say N. > > +config INTEL_TDX_MODULE_UPDATE > + bool "Intel TDX module runtime update" > + depends on INTEL_TDX_HOST > + help > + This enables the kernel to support TDX module runtime update. This allows > + the admin to upgrade the TDX module to a newer one without the need to > + terminate running TDX guests. > + > + If unsure, say N. > + WHy should this be conditional?
>> +config INTEL_TDX_MODULE_UPDATE
>> + bool "Intel TDX module runtime update"
>> + depends on INTEL_TDX_HOST
>> + help
>> + This enables the kernel to support TDX module runtime update. This allows
>> + the admin to upgrade the TDX module to a newer one without the need to
>> + terminate running TDX guests.
>> +
>> + If unsure, say N.
>> +
>
>WHy should this be conditional?
>
Good question. I don't have a strong reason, but here are my considerations:
1. Runtime updates aren't strictly necessary for TDX functionalities. Users can
update the TDX module via BIOS updates and reboot if service downtime isn't
a concern.
2. Selecting TDX module updates requires selecting FW_UPLOAD and FW_LOADER,
which I think will significantly increase the kernel size if FW_UPLOAD/LOADER
won't otherwise be selected.
It may or may not be wise to assume that most TDX users will enable TDX module
updates. so, I'm taking a conservative approach by making it optional. The
resulting code isn't that complex, as CONFIG_INTEL_TDX_MODULE_UPDATE
appears in only two places:
1. in the Makefile:
obj-y += seamcall.o tdx.o
obj-$(CONFIG_INTEL_TDX_MODULE_UPDATE) += seamldr.o
2. in the seamldr.h:
#ifdef CONFIG_INTEL_TDX_MODULE_UPDATE
extern struct attribute_group seamldr_group;
#define SEAMLDR_GROUP (&seamldr_group)
int get_seamldr_info(void);
void seamldr_init(struct device *dev);
#else
#define SEAMLDR_GROUP NULL
static inline int get_seamldr_info(void) { return 0; }
static inline void seamldr_init(struct device *dev) { }
#endif
That said, I'm open to keeping or dropping the Kconfig option.
On 6/9/25 10:53, Chao Gao wrote:
>>> +config INTEL_TDX_MODULE_UPDATE
>>> + bool "Intel TDX module runtime update"
>>> + depends on INTEL_TDX_HOST
>>> + help
>>> + This enables the kernel to support TDX module runtime update. This allows
>>> + the admin to upgrade the TDX module to a newer one without the need to
>>> + terminate running TDX guests.
>>> +
>>> + If unsure, say N.
>>> +
>>
>> WHy should this be conditional?
>>
>
> Good question. I don't have a strong reason, but here are my considerations:
>
> 1. Runtime updates aren't strictly necessary for TDX functionalities. Users can
> update the TDX module via BIOS updates and reboot if service downtime isn't
> a concern.
>
> 2. Selecting TDX module updates requires selecting FW_UPLOAD and FW_LOADER,
> which I think will significantly increase the kernel size if FW_UPLOAD/LOADER
> won't otherwise be selected.
If size is a consideration (but given the size of machines that are
likely to run CoCo guests I'd say it's not) then don't make this a
user-configurable option but rather make it depend on TDX being selected
and FW_UPLOAD/FW_LOADER being selected.
I'd rather keep the user visible options to a minimum, especially
something such as this update functionality.
But in any case I'd like to hear other opinions as well.
>
> It may or may not be wise to assume that most TDX users will enable TDX module
> updates. so, I'm taking a conservative approach by making it optional. The
> resulting code isn't that complex, as CONFIG_INTEL_TDX_MODULE_UPDATE
> appears in only two places:
>
> 1. in the Makefile:
>
> obj-y += seamcall.o tdx.o
> obj-$(CONFIG_INTEL_TDX_MODULE_UPDATE) += seamldr.o
>
> 2. in the seamldr.h:
>
> #ifdef CONFIG_INTEL_TDX_MODULE_UPDATE
> extern struct attribute_group seamldr_group;
> #define SEAMLDR_GROUP (&seamldr_group)
> int get_seamldr_info(void);
> void seamldr_init(struct device *dev);
> #else
> #define SEAMLDR_GROUP NULL
> static inline int get_seamldr_info(void) { return 0; }
> static inline void seamldr_init(struct device *dev) { }
> #endif
>
> That said, I'm open to keeping or dropping the Kconfig option.
On Mon, Jun 09, 2025 at 11:02:49AM +0300, Nikolay Borisov wrote:
>
>
>On 6/9/25 10:53, Chao Gao wrote:
>> > > +config INTEL_TDX_MODULE_UPDATE
>> > > + bool "Intel TDX module runtime update"
>> > > + depends on INTEL_TDX_HOST
>> > > + help
>> > > + This enables the kernel to support TDX module runtime update. This allows
>> > > + the admin to upgrade the TDX module to a newer one without the need to
>> > > + terminate running TDX guests.
>> > > +
>> > > + If unsure, say N.
>> > > +
>> >
>> > WHy should this be conditional?
>> >
>>
>> Good question. I don't have a strong reason, but here are my considerations:
>>
>> 1. Runtime updates aren't strictly necessary for TDX functionalities. Users can
>> update the TDX module via BIOS updates and reboot if service downtime isn't
>> a concern.
>>
>> 2. Selecting TDX module updates requires selecting FW_UPLOAD and FW_LOADER,
>> which I think will significantly increase the kernel size if FW_UPLOAD/LOADER
>> won't otherwise be selected.
>
>If size is a consideration (but given the size of machines that are likely to
>run CoCo guests I'd say it's not) then don't make this a user-configurable
>option but rather make it depend on TDX being selected and
>FW_UPLOAD/FW_LOADER being selected.
But in almost all existing cases, 'select FW_UPLOAD/LOADER' is used rather than
'depends on FW_UPLOAD/LOADER'. You can verify this by running
find . -name 'Kconfig' -exec grep -E 'FW_UPLOAD|FW_LOADER$' {} +
>
>I'd rather keep the user visible options to a minimum, especially something
>such as this update functionality.
>
>But in any case I'd like to hear other opinions as well.
Yeah, let's see what others think.
<snip>
On 6/10/25 04:03, Chao Gao wrote:
> On Mon, Jun 09, 2025 at 11:02:49AM +0300, Nikolay Borisov wrote:
>>
>>
>> On 6/9/25 10:53, Chao Gao wrote:
>>>>> +config INTEL_TDX_MODULE_UPDATE
>>>>> + bool "Intel TDX module runtime update"
>>>>> + depends on INTEL_TDX_HOST
>>>>> + help
>>>>> + This enables the kernel to support TDX module runtime update. This allows
>>>>> + the admin to upgrade the TDX module to a newer one without the need to
>>>>> + terminate running TDX guests.
>>>>> +
>>>>> + If unsure, say N.
>>>>> +
>>>>
>>>> WHy should this be conditional?
>>>>
>>>
>>> Good question. I don't have a strong reason, but here are my considerations:
>>>
>>> 1. Runtime updates aren't strictly necessary for TDX functionalities. Users can
>>> update the TDX module via BIOS updates and reboot if service downtime isn't
>>> a concern.
>>>
>>> 2. Selecting TDX module updates requires selecting FW_UPLOAD and FW_LOADER,
>>> which I think will significantly increase the kernel size if FW_UPLOAD/LOADER
>>> won't otherwise be selected.
>>
>> If size is a consideration (but given the size of machines that are likely to
>> run CoCo guests I'd say it's not) then don't make this a user-configurable
>> option but rather make it depend on TDX being selected and
>> FW_UPLOAD/FW_LOADER being selected.
>
> But in almost all existing cases, 'select FW_UPLOAD/LOADER' is used rather than
> 'depends on FW_UPLOAD/LOADER'. You can verify this by running
>
> find . -name 'Kconfig' -exec grep -E 'FW_UPLOAD|FW_LOADER$' {} +
Then just have TDX select FW_UPLOAD/FW_LOADER and be done with it.
Still, let's hear other opinions but in this case I'd say size
considerations aren't major so let's make it simpler for the user.
>
>>
>> I'd rather keep the user visible options to a minimum, especially something
>> such as this update functionality.
>>
>> But in any case I'd like to hear other opinions as well.
>
> Yeah, let's see what others think.
>
> <snip>
On 6/9/25 23:52, Nikolay Borisov wrote: > Then just have TDX select FW_UPLOAD/FW_LOADER and be done with it. > Still, let's hear other opinions but in this case I'd say size > considerations aren't major so let's make it simpler for the user. In general, those of us at hardware companies like to mint lots of Kconfig options. Folks like Nikolay at the distros have to deal with fallout from the piles of Kconfig options. I tend to be pretty deferential to the distro guys on matters like this. I haven't heard anything even remotely compelling that we need a user-visible Kconfig option for this.
© 2016 - 2025 Red Hat, Inc.