[PATCH v3 3/6] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum

Kai Huang posted 6 patches 3 months, 2 weeks ago
There is a newer version of this series
Posted by Kai Huang 3 months, 2 weeks ago
Some early TDX-capable platforms have an erratum: a partial write by
the kernel (a write transaction smaller than a cacheline that lands at
the memory controller) to TDX private memory poisons that memory, and a
subsequent read triggers a machine check.

On those platforms, the old kernel must reset TDX private memory before
jumping to the new kernel; otherwise the new kernel may see an
unexpected machine check.  Currently the kernel doesn't track which
pages are TDX private pages.  For simplicity, just fail kexec/kdump on
those platforms.

Use the existing machine_kexec_prepare() to fail kexec/kdump by adding
a check for the TDX erratum (the erratum is only detected if the kernel
is built with TDX host support).  This rejects kexec/kdump at the point
where the kernel loads the kexec/kdump kernel image.

The alternative is to reject kexec/kdump when the kernel is jumping to
the new kernel.  But for kexec this requires adding a new check (e.g.,
arch_kexec_allowed()) in the common code to fail kernel_kexec() at an
early stage.  Kdump (crash_kexec()) would need a similar check, but that
is hard to justify because crash_kexec() is not supposed to abort.

It would be feasible to relax this limitation further, i.e., only fail
kexec when TDX is actually enabled by the kernel.  But this is still a
half measure compared to resetting TDX private memory, so just do the
simplest thing for now.

The impact on userspace is that users get an error when loading the
kexec/kdump kernel image:

  kexec_load failed: Operation not supported

This might be confusing to users, so also print the reason in dmesg:

  [..] kexec: Not allowed on platform with tdx_pw_mce bug

Signed-off-by: Kai Huang <kai.huang@intel.com>
Tested-by: Farrah Chen <farrah.chen@intel.com>
---
 arch/x86/kernel/machine_kexec_64.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 4519c7b75c49..d5a85d786e61 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -347,6 +347,22 @@ int machine_kexec_prepare(struct kimage *image)
 	unsigned long reloc_end = (unsigned long)__relocate_kernel_end;
 	int result;
 
+	/*
+	 * Some early TDX-capable platforms have an erratum.  A kernel
+	 * partial write (a write transaction smaller than a cacheline
+	 * landing at the memory controller) to TDX private memory poisons
+	 * that memory, and a subsequent read triggers a machine check.
+	 *
+	 * On those platforms the old kernel must reset TDX private
+	 * memory before jumping to the new kernel; otherwise the new
+	 * kernel may see an unexpected machine check.  For simplicity,
+	 * just fail kexec/kdump on those platforms.
+	 */
+	if (boot_cpu_has_bug(X86_BUG_TDX_PW_MCE)) {
+		pr_info_once("Not allowed on platform with tdx_pw_mce bug\n");
+		return -EOPNOTSUPP;
+	}
+
 	/* Setup the identity mapped 64bit page table */
 	result = init_pgtable(image, __pa(control_page));
 	if (result)
-- 
2.49.0
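
For readers who want to poke at the gating logic outside a kernel tree,
here is a rough userspace model of the early exit added above.  It is
only a sketch under stated assumptions: model_cpu_has_tdx_pw_mce,
model_info_once(), model_kexec_prepare() and model_demo() are
hypothetical stand-ins for boot_cpu_has_bug(X86_BUG_TDX_PW_MCE),
pr_info_once() and machine_kexec_prepare(); none of them exist in the
kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for boot_cpu_has_bug(X86_BUG_TDX_PW_MCE). */
static bool model_cpu_has_tdx_pw_mce;

/* Counts how many times the reason was actually logged. */
static int model_log_count;

/* Mimics pr_info_once(): only the first call emits the message. */
static void model_info_once(const char *msg)
{
	static bool printed;

	if (printed)
		return;
	printed = true;
	model_log_count++;
	fprintf(stderr, "kexec: %s\n", msg);
}

/*
 * Models the check added to machine_kexec_prepare(): refuse to load
 * the image on affected platforms, otherwise carry on as before.
 */
int model_kexec_prepare(void)
{
	if (model_cpu_has_tdx_pw_mce) {
		model_info_once("Not allowed on platform with tdx_pw_mce bug");
		return -EOPNOTSUPP;
	}

	/* The real function would continue with page table setup here. */
	return 0;
}

/* Exercises both paths; repeated failures log the reason only once. */
void model_demo(void)
{
	model_cpu_has_tdx_pw_mce = false;
	assert(model_kexec_prepare() == 0);

	model_cpu_has_tdx_pw_mce = true;
	assert(model_kexec_prepare() == -EOPNOTSUPP);
	assert(model_kexec_prepare() == -EOPNOTSUPP);
}
```

Calling model_demo() from a main() shows the behaviour described in the
commit message: every load attempt on an affected platform fails with
-EOPNOTSUPP, which userspace reports as "Operation not supported", while
the explanatory line is logged a single time.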
Re: [PATCH v3 3/6] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Chao Gao 3 months, 1 week ago
On Thu, Jun 26, 2025 at 10:48:49PM +1200, Kai Huang wrote:
>Some early TDX-capable platforms have an erratum: A kernel partial
>write (a write transaction of less than cacheline lands at memory
>controller) to TDX private memory poisons that memory, and a subsequent
>read triggers a machine check.
>
>On those platforms, the old kernel must reset TDX private memory before
>jumping to the new kernel, otherwise the new kernel may see unexpected
>machine check.  Currently the kernel doesn't track which page is a TDX
>private page.  For simplicity just fail kexec/kdump for those platforms.

My understanding is that the kdump kernel uses a small amount of memory
reserved at boot, which the crashed kernel never accesses. And the kdump
kernel reads the memory of the crashed kernel and doesn't overwrite it.
So it should be safe to allow kdump (i.e., no partial write to private
memory). Anything I missed?

(I am not asking to enable kdump in *this* series; I'm just trying to
understand the rationale behind disabling kdump)

>
>Leverage the existing machine_kexec_prepare() to fail kexec/kdump by
>adding the check of the presence of the TDX erratum (which is only
>checked for if the kernel is built with TDX host support).  This rejects
>kexec/kdump when the kernel is loading the kexec/kdump kernel image.
>
>The alternative is to reject kexec/kdump when the kernel is jumping to
>the new kernel.  But for kexec this requires adding a new check (e.g.,
>arch_kexec_allowed()) in the common code to fail kernel_kexec() at early
>stage.  Kdump (crash_kexec()) needs similar check, but it's hard to
>justify because crash_kexec() is not supposed to abort.
>
>It's feasible to further relax this limitation, i.e., only fail kexec
>when TDX is actually enabled by the kernel.  But this is still a half
>measure compared to resetting TDX private memory so just do the simplest
>thing for now.
>
>The impact to userspace is the users will get an error when loading the
>kexec/kdump kernel image:
>
>  kexec_load failed: Operation not supported
>
>This might be confusing to the users, thus also print the reason in the
>dmesg:
>
>  [..] kexec: not allowed on platform with tdx_pw_mce bug.
>
>Signed-off-by: Kai Huang <kai.huang@intel.com>
>Tested-by: Farrah Chen <farrah.chen@intel.com>
Re: [PATCH v3 3/6] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Huang, Kai 3 months, 1 week ago
On Wed, 2025-07-02 at 16:25 +0800, Gao, Chao wrote:
> On Thu, Jun 26, 2025 at 10:48:49PM +1200, Kai Huang wrote:
> > Some early TDX-capable platforms have an erratum: A kernel partial
> > write (a write transaction of less than cacheline lands at memory
> > controller) to TDX private memory poisons that memory, and a subsequent
> > read triggers a machine check.
> > 
> > On those platforms, the old kernel must reset TDX private memory before
> > jumping to the new kernel, otherwise the new kernel may see unexpected
> > machine check.  Currently the kernel doesn't track which page is a TDX
> > private page.  For simplicity just fail kexec/kdump for those platforms.
> 
> My understanding is that the kdump kernel uses a small amount of memory
> reserved at boot, which the crashed kernel never accesses. And the kdump
> kernel reads the memory of the crashed kernel and doesn't overwrite it.
> So it should be safe to allow kdump (i.e., no partial write to private
> memory). Anything I missed?
> 
> (I am not asking to enable kdump in *this* series; I'm just trying to
> understand the rationale behind disabling kdump)

As you said, it *should* be safe.  The kdump kernel should only read TDX
private memory, not write to it.  But I cannot say I am 100% sure (many
things are involved in generating the kdump file, such as memory
compression), so in internal discussion we decided to just disable it.
Re: [PATCH v3 3/6] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Vishal Annapurve 3 months, 1 week ago
On Wed, Jul 2, 2025 at 1:45 AM Huang, Kai <kai.huang@intel.com> wrote:
>
> On Wed, 2025-07-02 at 16:25 +0800, Gao, Chao wrote:
> > On Thu, Jun 26, 2025 at 10:48:49PM +1200, Kai Huang wrote:
> > > Some early TDX-capable platforms have an erratum: A kernel partial
> > > write (a write transaction of less than cacheline lands at memory
> > > controller) to TDX private memory poisons that memory, and a subsequent
> > > read triggers a machine check.
> > >
> > > On those platforms, the old kernel must reset TDX private memory before
> > > jumping to the new kernel, otherwise the new kernel may see unexpected
> > > machine check.  Currently the kernel doesn't track which page is a TDX
> > > private page.  For simplicity just fail kexec/kdump for those platforms.
> >
> > My understanding is that the kdump kernel uses a small amount of memory
> > reserved at boot, which the crashed kernel never accesses. And the kdump
> > kernel reads the memory of the crashed kernel and doesn't overwrite it.
> > So it should be safe to allow kdump (i.e., no partial write to private
> > memory). Anything I missed?
> >
> > (I am not asking to enable kdump in *this* series; I'm just trying to
> > understand the rationale behind disabling kdump)
>
> As you said it *should* be safe.  The kdump kernel should only read TDX
> private memory but not write.  But I cannot say I am 100% sure (there are
> many things involved when generating the kdump file such as memory
> compression) so in internal discussion we thought we should just disable it.

So what's the side effect of enabling kdump? In the worst case the kdump
kernel crashes, and in the most likely scenario kdump generates a lot of
important data for analyzing the host failure.

Allowing kdump seems to be a net positive outcome to me. Am I missing
something? If not, my vote would be to enable/allow kdump for such
platforms in this series itself.
Re: [PATCH v3 3/6] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Edgecombe, Rick P 3 months, 1 week ago
On Wed, 2025-07-02 at 15:16 -0700, Vishal Annapurve wrote:
> > As you said it *should* be safe.  The kdump kernel should only read TDX
> > private memory but not write.  But I cannot say I am 100% sure (there are
> > many things involved when generating the kdump file such as memory
> > compression) so in internal discussion we thought we should just disable it.
> 
> So what's the side-effect of enabling kdump, in the worst case kdump
> kernel crashes and in the most likely scenario kdump will generate a
> lot of important data to analyze from the host failure.
> 
> Allowing kdump seems to be a net positive outcome to me. Am I missing
> something? If not, my vote would be to enable/allow kdump for such
> platforms in this series itself.

This reasoning makes sense. But today there is no way to even configure kexec
when TDX is configured. That blocks TDX for distro-based hosts. Kdump can always
be expanded in a follow-up. The series has been tricky, so it's nice to not
have to tackle all the angles before getting at least some support back.
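
To make the relaxation debated in this subthread concrete, the policy
"reject normal kexec but still allow kdump on affected platforms" could
look roughly like the userspace model below.  This is purely
illustrative and is not what the posted patch does (the patch rejects
both).  The enum and model_prepare_relaxed() are hypothetical names; the
enum only loosely mirrors the kernel's KEXEC_TYPE_DEFAULT and
KEXEC_TYPE_CRASH image types.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical mirror of the kernel's kexec image types. */
enum model_image_type {
	MODEL_KEXEC_DEFAULT,	/* normal kexec into a new kernel */
	MODEL_KEXEC_CRASH,	/* kdump (crash) kernel */
};

/*
 * Relaxed policy sketch: on platforms with the partial write erratum,
 * only the crash image may be loaded, on the assumption (discussed in
 * the thread, not proven) that the kdump kernel only reads the old
 * kernel's memory and never writes to TDX private memory.
 */
int model_prepare_relaxed(bool has_tdx_pw_mce, enum model_image_type type)
{
	assert(type == MODEL_KEXEC_DEFAULT || type == MODEL_KEXEC_CRASH);

	if (!has_tdx_pw_mce)
		return 0;		/* unaffected platform: nothing to do */

	if (type == MODEL_KEXEC_CRASH)
		return 0;		/* kdump allowed under the assumption above */

	return -EOPNOTSUPP;		/* normal kexec still rejected */
}
```

The trade-off is the one raised above: if the read-only assumption turns
out to be wrong, the worst case is a machine check inside the kdump
kernel, against the benefit of getting a dump in the common case.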
Re: [PATCH v3 3/6] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Binbin Wu 3 months, 1 week ago

On 6/26/2025 6:48 PM, Kai Huang wrote:
> Some early TDX-capable platforms have an erratum: A kernel partial
> write (a write transaction of less than cacheline lands at memory
> controller) to TDX private memory poisons that memory, and a subsequent
> read triggers a machine check.
>
> On those platforms, the old kernel must reset TDX private memory before
> jumping to the new kernel, otherwise the new kernel may see unexpected
> machine check.  Currently the kernel doesn't track which page is a TDX
> private page.  For simplicity just fail kexec/kdump for those platforms.
>
> Leverage the existing machine_kexec_prepare() to fail kexec/kdump by
> adding the check of the presence of the TDX erratum (which is only
> checked for if the kernel is built with TDX host support).  This rejects
> kexec/kdump when the kernel is loading the kexec/kdump kernel image.
>
> The alternative is to reject kexec/kdump when the kernel is jumping to
> the new kernel.  But for kexec this requires adding a new check (e.g.,
> arch_kexec_allowed()) in the common code to fail kernel_kexec() at early
> stage.  Kdump (crash_kexec()) needs similar check, but it's hard to
> justify because crash_kexec() is not supposed to abort.
>
> It's feasible to further relax this limitation, i.e., only fail kexec
> when TDX is actually enabled by the kernel.  But this is still a half
> measure compared to resetting TDX private memory so just do the simplest
> thing for now.
>
> The impact to userspace is the users will get an error when loading the
> kexec/kdump kernel image:
>
>    kexec_load failed: Operation not supported
>
> This might be confusing to the users, thus also print the reason in the
> dmesg:
>
>    [..] kexec: not allowed on platform with tdx_pw_mce bug.
>
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Tested-by: Farrah Chen <farrah.chen@intel.com>
> ---
>   arch/x86/kernel/machine_kexec_64.c | 16 ++++++++++++++++
>   1 file changed, 16 insertions(+)
>
> diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
> index 4519c7b75c49..d5a85d786e61 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -347,6 +347,22 @@ int machine_kexec_prepare(struct kimage *image)
>   	unsigned long reloc_end = (unsigned long)__relocate_kernel_end;
>   	int result;
>   
> +	/*
> +	 * Some early TDX-capable platforms have an erratum.  A kernel
> +	 * partial write (a write transaction of less than cacheline
> +	 * lands at memory controller) to TDX private memory poisons that
> +	 * memory, and a subsequent read triggers a machine check.
> +	 *

Nit: for the description of the erratum, it may be better to refer to the
comment on check_tdx_erratum() to avoid duplication. That would also point
readers to how/when the bug is set.

Otherwise,
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>


> +	 * On those platforms the old kernel must reset TDX private
> +	 * memory before jumping to the new kernel otherwise the new
> +	 * kernel may see unexpected machine check.  For simplicity
> +	 * just fail kexec/kdump on those platforms.
> +	 */
> +	if (boot_cpu_has_bug(X86_BUG_TDX_PW_MCE)) {
> +		pr_info_once("Not allowed on platform with tdx_pw_mce bug\n");
> +		return -EOPNOTSUPP;
> +	}
> +
>   	/* Setup the identity mapped 64bit page table */
>   	result = init_pgtable(image, __pa(control_page));
>   	if (result)
Re: [PATCH v3 3/6] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Huang, Kai 3 months, 1 week ago
> > +	/*
> > +	 * Some early TDX-capable platforms have an erratum.  A kernel
> > +	 * partial write (a write transaction of less than cacheline
> > +	 * lands at memory controller) to TDX private memory poisons that
> > +	 * memory, and a subsequent read triggers a machine check.
> > +	 *
> Nit: About the description of the erratum, maybe it's better to refer to the
> comments of check_tdx_erratum() to avoid duplication. Also it gives a link to
> how/when the bug is set.

I am not sure pointing to a comment in another place is desirable,
especially in another file.  It could be done in some cases, but IMHO in
general it's better to have a "standalone" comment so we don't need to jump
back and forth, or worry about the other comment being updated later.

> 
> Otherwise,
> Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>

Thanks.
Re: [PATCH v3 3/6] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Edgecombe, Rick P 3 months, 1 week ago
On Thu, 2025-06-26 at 22:48 +1200, Kai Huang wrote:
> Some early TDX-capable platforms have an erratum: A kernel partial
> write (a write transaction of less than cacheline lands at memory
> controller) to TDX private memory poisons that memory, and a subsequent
> read triggers a machine check.
> 
> On those platforms, the old kernel must reset TDX private memory before
> jumping to the new kernel, otherwise the new kernel may see unexpected
> machine check.  Currently the kernel doesn't track which page is a TDX
> private page.  For simplicity just fail kexec/kdump for those platforms.
> 
> Leverage the existing machine_kexec_prepare() to fail kexec/kdump by
> adding the check of the presence of the TDX erratum (which is only
> checked for if the kernel is built with TDX host support).  This rejects
> kexec/kdump when the kernel is loading the kexec/kdump kernel image.
> 
> The alternative is to reject kexec/kdump when the kernel is jumping to
> the new kernel.  But for kexec this requires adding a new check (e.g.,
> arch_kexec_allowed()) in the common code to fail kernel_kexec() at early
> stage.  Kdump (crash_kexec()) needs similar check, but it's hard to
> justify because crash_kexec() is not supposed to abort.
> 
> It's feasible to further relax this limitation, i.e., only fail kexec
> when TDX is actually enabled by the kernel.  But this is still a half
> measure compared to resetting TDX private memory so just do the simplest
> thing for now.
> 
> The impact to userspace is the users will get an error when loading the
> kexec/kdump kernel image:
> 
>   kexec_load failed: Operation not supported
> 
> This might be confusing to the users, thus also print the reason in the
> dmesg:
> 
>   [..] kexec: not allowed on platform with tdx_pw_mce bug.
> 
> Signed-off-by: Kai Huang <kai.huang@intel.com>
> Tested-by: Farrah Chen <farrah.chen@intel.com>
> 

This does mean that kdump will not be allowed on these platforms if TDX is
configured in the BIOS, even if users don't set the kvm.tdx module param to
actually use it. Today it is not easy to turn on TDX in the BIOS by accident,
so this would not usually happen unintentionally. Some future platforms might
make it easier, but today we don't support any kexec at all if TDX is
configured. So this still opens up more capability than existed before. All
considered, I think it's a good direction.

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>