[PATCH v2 2/5] x86/virt/tdx: Mark memory cache state incoherent when making SEAMCALL

Kai Huang posted 5 patches 9 months ago
[PATCH v2 2/5] x86/virt/tdx: Mark memory cache state incoherent when making SEAMCALL
Posted by Kai Huang 9 months ago
On TDX platforms, at the hardware level, dirty cachelines with and
without a TDX KeyID can coexist, and the CPU can write them back to
memory in any order.  During kexec, the caches must be flushed before
jumping to the new kernel to avoid silent memory corruption when a
cacheline with a different encryption property is written back over
whatever encryption properties the new kernel is using.

A percpu boolean is used to mark whether the cache of a given CPU may
be in an incoherent state, and kexec performs WBINVD on the CPUs with
that boolean turned on.

For TDX, only the TDX module or TDX guests can generate dirty
cachelines of TDX private memory, i.e., they are only generated when
the kernel makes a SEAMCALL.

Turn on that boolean when the kernel makes a SEAMCALL so that kexec can
correctly flush the cache.  Note that not all SEAMCALL leaf functions
generate dirty cachelines of TDX private memory, but for simplicity,
just treat all of them as if they do.

SEAMCALLs can be made from both task context and IRQ-disabled context.
Given that a SEAMCALL is just a lengthy instruction (e.g., thousands of
cycles) from the kernel's point of view and preempt_{disable|enable}()
is cheap compared to it, simply disable preemption unconditionally
while setting the percpu boolean and making the SEAMCALL.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/tdx.h | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 4a1922ec80cf..d017e48958cd 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -97,9 +97,38 @@ u64 __seamcall_saved_ret(u64 fn, struct tdx_module_args *args);
 void tdx_init(void);
 
 #include <asm/archrandom.h>
+#include <asm/processor.h>
 
 typedef u64 (*sc_func_t)(u64 fn, struct tdx_module_args *args);
 
+static inline u64 do_seamcall(sc_func_t func, u64 fn,
+			      struct tdx_module_args *args)
+{
+	u64 ret;
+
+	preempt_disable();
+
+	/*
+	 * SEAMCALLs are made to the TDX module and can generate dirty
+	 * cachelines of TDX private memory.  Mark cache state incoherent
+	 * so that the cache can be flushed during kexec.
+	 *
+	 * Not all SEAMCALL leaf functions generate dirty cachelines,
+	 * but for simplicity just treat all of them as if they do.
+	 *
+	 * This needs to be done before actually making the SEAMCALL,
+	 * because the kexec-ing CPU could send an NMI to stop remote CPUs,
+	 * in which case even disabling IRQs won't help here.
+	 */
+	this_cpu_write(cache_state_incoherent, true);
+
+	ret = func(fn, args);
+
+	preempt_enable();
+
+	return ret;
+}
+
 static inline u64 sc_retry(sc_func_t func, u64 fn,
 			   struct tdx_module_args *args)
 {
@@ -107,7 +136,7 @@ static inline u64 sc_retry(sc_func_t func, u64 fn,
 	u64 ret;
 
 	do {
-		ret = func(fn, args);
+		ret = do_seamcall(func, fn, args);
 	} while (ret == TDX_RND_NO_ENTROPY && --retry);
 
 	return ret;
-- 
2.43.0
[PATCH v2.1 2/5] x86/virt/tdx: Mark memory cache state incoherent when making SEAMCALL
Posted by Kai Huang 9 months ago
On TDX platforms, at the hardware level, dirty cachelines with and
without a TDX KeyID can coexist, and the CPU can write them back to
memory in any order.  During kexec, the caches must be flushed before
jumping to the new kernel to avoid silent memory corruption when a
cacheline with a different encryption property is written back over
whatever encryption properties the new kernel is using.

A percpu boolean is used to mark whether the cache of a given CPU may
be in an incoherent state, and kexec performs WBINVD on the CPUs with
that boolean turned on.

For TDX, only the TDX module or TDX guests can generate dirty
cachelines of TDX private memory, i.e., they are only generated when
the kernel makes a SEAMCALL.

Turn on that boolean when the kernel makes a SEAMCALL so that kexec can
correctly flush the cache.  Note that not all SEAMCALL leaf functions
generate dirty cachelines of TDX private memory, but for simplicity,
just treat all of them as if they do.

SEAMCALLs can be made from both task context and IRQ-disabled context.
Given that a SEAMCALL is just a lengthy instruction (e.g., thousands of
cycles) from the kernel's point of view and preempt_{disable|enable}()
is cheap compared to it, simply disable preemption unconditionally
while setting the percpu boolean and making the SEAMCALL.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---

v2 -> v2.1:
 - Include <linux/preempt.h> to fix a build issue reported by LKP with
   the 'x86_64-allyesconfig' config.

---
 arch/x86/include/asm/tdx.h | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 4a1922ec80cf..e69021aee731 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -96,10 +96,40 @@ u64 __seamcall_ret(u64 fn, struct tdx_module_args *args);
 u64 __seamcall_saved_ret(u64 fn, struct tdx_module_args *args);
 void tdx_init(void);
 
+#include <linux/preempt.h>
 #include <asm/archrandom.h>
+#include <asm/processor.h>
 
 typedef u64 (*sc_func_t)(u64 fn, struct tdx_module_args *args);
 
+static inline u64 do_seamcall(sc_func_t func, u64 fn,
+			      struct tdx_module_args *args)
+{
+	u64 ret;
+
+	preempt_disable();
+
+	/*
+	 * SEAMCALLs are made to the TDX module and can generate dirty
+	 * cachelines of TDX private memory.  Mark cache state incoherent
+	 * so that the cache can be flushed during kexec.
+	 *
+	 * Not all SEAMCALL leaf functions generate dirty cachelines,
+	 * but for simplicity just treat all of them as if they do.
+	 *
+	 * This needs to be done before actually making the SEAMCALL,
+	 * because the kexec-ing CPU could send an NMI to stop remote CPUs,
+	 * in which case even disabling IRQs won't help here.
+	 */
+	this_cpu_write(cache_state_incoherent, true);
+
+	ret = func(fn, args);
+
+	preempt_enable();
+
+	return ret;
+}
+
 static inline u64 sc_retry(sc_func_t func, u64 fn,
 			   struct tdx_module_args *args)
 {
@@ -107,7 +137,7 @@ static inline u64 sc_retry(sc_func_t func, u64 fn,
 	u64 ret;
 
 	do {
-		ret = func(fn, args);
+		ret = do_seamcall(func, fn, args);
 	} while (ret == TDX_RND_NO_ENTROPY && --retry);
 
 	return ret;
-- 
2.49.0
Re: [PATCH v2.1 2/5] x86/virt/tdx: Mark memory cache state incoherent when making SEAMCALL
Posted by Dave Hansen 8 months, 4 weeks ago
On 5/14/25 03:10, Kai Huang wrote:
> Turn on that boolean when the kernel does SEAMCALL so that kexec can
> correctly flush cache.  Note not all SEAMCALL leaf functions generate
> dirty cachelines of TDX private memory, but for simplicity, just treat
> all of them do.

It's not just for simplicity.

There's no contract in place for when the TDX module will dirty memory
or not. A call that is "clean" today might dirty memory tomorrow.

The _only_ thing we know is that SEAMCALLs can dirty cachelines. We
don't know when or how they do it. This blurb makes it sound like it's
possible to optimize this. It's not.
Re: [PATCH v2.1 2/5] x86/virt/tdx: Mark memory cache state incoherent when making SEAMCALL
Posted by Huang, Kai 8 months, 4 weeks ago
On Wed, 2025-05-14 at 08:52 -0700, Hansen, Dave wrote:
> On 5/14/25 03:10, Kai Huang wrote:
> > Turn on that boolean when the kernel does SEAMCALL so that kexec can
> > correctly flush cache.  Note not all SEAMCALL leaf functions generate
> > dirty cachelines of TDX private memory, but for simplicity, just treat
> > all of them do.
> 
> It's not just for simplicity.
> 
> There's no contract in place for when the TDX module will dirty memory
> or not. A call that is "clean" today might dirty memory tomorrow.
> 
> The _only_ thing we know is that SEAMCALLs can dirty cachelines. We
> don't know when or how they do it. This blurb makes it sound like it's
> possible to optimize this. It's not.

I thought it should not be possible to generate dirty cachelines with TDX
private KeyIDs before the very first TDX KeyID (which is the global KeyID)
is configured, but right, I guess we'd better not assume that.

I'll remove the "Note ..." part.  Thanks!