[RFC PATCH 3/5] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum

Kai Huang posted 5 patches 9 months, 1 week ago
[RFC PATCH 3/5] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Kai Huang 9 months, 1 week ago
Some early TDX-capable platforms have an erratum: a kernel partial
write (a write transaction of less than a cacheline that lands at the
memory controller) to TDX private memory poisons that memory, and a
subsequent read triggers a machine check.

On those platforms, the old kernel must reset TDX private memory before
jumping to the new kernel, otherwise the new kernel may see an unexpected
machine check.  Currently the kernel doesn't track which pages are TDX
private pages.  For simplicity just fail kexec/kdump on those platforms.

Leverage the existing machine_kexec_prepare() to fail kexec/kdump by
adding a check for the presence of the TDX erratum (which is only
detected if the kernel is built with TDX host support).  This rejects
kexec/kdump at the time the kexec/kdump kernel image is loaded.

The alternative is to reject kexec/kdump when the kernel is jumping to
the new kernel.  But for kexec this requires adding a new check (e.g.,
arch_kexec_allowed()) in the common code to fail kernel_kexec() at an
early stage.  Kdump (crash_kexec()) needs a similar check, but it's hard
to justify because crash_kexec() is not supposed to abort.

It's feasible to further relax this limitation, i.e., only fail kexec
when TDX is actually enabled by the kernel.  But this is still a half
measure compared to resetting TDX private memory, so just do the
simplest thing for now.

The impact on userspace is that users will get an error when loading the
kexec/kdump kernel image:

  kexec_load failed: Operation not supported

This might be confusing to users, so also print the reason in dmesg:

  [..] kexec: not allowed on platform with tdx_pw_mce bug.

Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/kernel/machine_kexec_64.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index 0e9808eeb63e..e438c4163960 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -311,6 +311,22 @@ int machine_kexec_prepare(struct kimage *image)
 	unsigned long reloc_end = (unsigned long)__relocate_kernel_end;
 	int result;
 
+	/*
+	 * Some early TDX-capable platforms have an erratum.  A kernel
+	 * partial write (a write transaction of less than cacheline
+	 * lands at memory controller) to TDX private memory poisons that
+	 * memory, and a subsequent read triggers a machine check.
+	 *
+	 * On those platforms the old kernel must reset TDX private
+	 * memory before jumping to the new kernel otherwise the new
+	 * kernel may see unexpected machine check.  For simplicity
+	 * just fail kexec/kdump on those platforms.
+	 */
+	if (boot_cpu_has_bug(X86_BUG_TDX_PW_MCE)) {
+		pr_info_once("Not allowed on platform with tdx_pw_mce bug\n");
+		return -EOPNOTSUPP;
+	}
+
 	/* Setup the identity mapped 64bit page table */
 	result = init_pgtable(image, __pa(control_page));
 	if (result)
-- 
2.48.1
Re: [RFC PATCH 3/5] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Edgecombe, Rick P 9 months, 1 week ago
On Thu, 2025-03-13 at 00:34 +1300, Kai Huang wrote:
> The impact to userspace is the users will get an error when loading the
> kexec/kdump kernel image:
> 
>   kexec_load failed: Operation not supported
> 
> This might be confusing to the users, thus also print the reason in the
> dmesg:
> 
>   [..] kexec: not allowed on platform with tdx_pw_mce bug.

Continuing an internal discussion... A problem with the plan to more simply
start support for kexec by not supporting the errata platforms is that when
these platforms configure TDX they will lose kexec by default.

Probably a better default for a lot of kernels would be to have kexec work by
default, and require opt-in to use TDX (and lose kexec). One idea was to require
a kernel parameter to enable TDX on those platforms. But then we are starting
to add complexity to avoid other complexity (the errata platform kexec support).

Still, it may be a net win on complexity.
Re: [RFC PATCH 3/5] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Huang, Kai 9 months, 1 week ago
On Wed, 2025-03-12 at 23:27 +0000, Edgecombe, Rick P wrote:
> On Thu, 2025-03-13 at 00:34 +1300, Kai Huang wrote:
> > The impact to userspace is the users will get an error when loading the
> > kexec/kdump kernel image:
> > 
> >   kexec_load failed: Operation not supported
> > 
> > This might be confusing to the users, thus also print the reason in the
> > dmesg:
> > 
> >   [..] kexec: not allowed on platform with tdx_pw_mce bug.
> 
> Continuing an internal discussion... A problem with the plan to more simply
> start support for kexec by not supporting the errata platforms is that when
> these platforms configure TDX they will lose kexec by default.
> 
> Probably a better default for a lot of kernels would be to have kexec work by
> default, and require opt-in to use TDX (and lose kexec). One idea was a kernel
> parameter be required to enable TDX on those platforms. But then we are starting
> to add complexity to avoid other complexity (the errata platform kexec support).
> 
> Still, it may be a net win on complexity.

We can add a kernel parameter 'tdx_host={on|off}' and skip all TDX code (thus no
erratum detection) when it is off.  I suppose it will be useful in general
anyway even w/o the context of kexec.
Re: [RFC PATCH 3/5] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Edgecombe, Rick P 9 months, 1 week ago
On Thu, 2025-03-13 at 00:57 +0000, Huang, Kai wrote:
> > Continuing an internal discussion... A problem with the plan to more simply
> > start support for kexec by not supporting the errata platforms is that when
> > these platforms configure TDX they will lose kexec by default.
> > 
> > Probably a better default for a lot of kernels would be to have kexec work
> > by
> > default, and require opt-in to use TDX (and lose kexec). One idea was a
> > kernel
> > parameter be required to enable TDX on those platforms. But then we are
> > starting
> > to add complexity to avoid other complexity (the errata platform kexec
> > support).
> > 
> > Still, it may be a net win on complexity.
> 
> We can add a kernel parameter 'tdx_host={on|off}' and skip all TDX code (thus
> no
> erratum detection) when it is off.  I suppose it will be useful in general
> anyway even w/o the context of kexec.

What exactly are you thinking? Add a tdx_host parameter, but what is the default
behavior? When tdx_host=on with the errata, kexec must still be disabled, right?
Better to return an error, than proceed and crash.
Re: [RFC PATCH 3/5] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Huang, Kai 9 months, 1 week ago
On Thu, 2025-03-13 at 17:18 +0000, Edgecombe, Rick P wrote:
> On Thu, 2025-03-13 at 00:57 +0000, Huang, Kai wrote:
> > > Continuing an internal discussion... A problem with the plan to more simply
> > > start support for kexec by not supporting the errata platforms is that when
> > > these platforms configure TDX they will lose kexec by default.
> > > 
> > > Probably a better default for a lot of kernels would be to have kexec work
> > > by
> > > default, and require opt-in to use TDX (and lose kexec). One idea was a
> > > kernel
> > > parameter be required to enable TDX on those platforms. But then we are
> > > starting
> > > to add complexity to avoid other complexity (the errata platform kexec
> > > support).
> > > 
> > > Still, it may be a net win on complexity.
> > 
> > We can add a kernel parameter 'tdx_host={on|off}' and skip all TDX code (thus
> > no
> > erratum detection) when it is off.  I suppose it will be useful in general
> > anyway even w/o the context of kexec.
> 
> What exactly are you thinking? Add a tdx_host parameter, but what is the default
> behavior? When tdx_host=on with the errata, kexec must still be disabled, right?
> Better to return an error, than proceed and crash.

The default behaviour is tdx_host=off in order to not disrupt kexec/kdump
behaviour on the TDX platforms with erratum.  The distros will be able to ship
kernels with both CONFIG_KEXEC_CORE and CONFIG_INTEL_TDX_HOST on, and no visible
impact to the user who doesn't care about TDX.

If the user is interested in TDX, tdx_host=on must be set on the kernel command
line, but in this case the user is expected to know that kexec/kdump can only
work normally if the TDX platform doesn't have the erratum -- kexec/kdump are
disabled if the platform has the erratum.
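
The policy described above can be sketched as a small userspace model. This is only an illustration under stated assumptions: `tdx_host=on|off` is the parameter proposed in this thread (it does not exist today), the helper names `tdx_host_enabled()` and `kexec_load_allowed()` are hypothetical, and a real implementation would presumably use the kernel's own command-line parsing rather than string search.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of the proposed policy; not kernel code.
 * "tdx_host=" is the parameter proposed in this thread and both
 * helpers below are hypothetical.
 */

/* Return true only if the command line has the token "tdx_host=on". */
static bool tdx_host_enabled(const char *cmdline)
{
	static const char key[] = "tdx_host=";
	const char *p = cmdline;

	while ((p = strstr(p, key)) != NULL) {
		/* Only accept the key at the start of a token. */
		if (p == cmdline || p[-1] == ' ') {
			const char *val = p + sizeof(key) - 1;
			size_t n = strcspn(val, " ");

			/* In this sketch the first occurrence decides. */
			return n == 2 && strncmp(val, "on", 2) == 0;
		}
		p += sizeof(key) - 1;
	}

	return false;	/* proposed default: off */
}

/*
 * With tdx_host=off all TDX code, including erratum detection, is
 * skipped, so kexec/kdump is never rejected.  With tdx_host=on it is
 * rejected only on platforms that have the erratum.
 */
static bool kexec_load_allowed(const char *cmdline, bool has_pw_mce_erratum)
{
	if (!tdx_host_enabled(cmdline))
		return true;

	return !has_pw_mce_erratum;
}
```

In this model a command line without the parameter keeps kexec/kdump working even on an erratum platform, while `tdx_host=on` on an erratum platform is the only combination that rejects the load.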
Re: [RFC PATCH 3/5] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Edgecombe, Rick P 9 months, 1 week ago
On Thu, 2025-03-13 at 22:32 +0000, Huang, Kai wrote:
> > > 
> > > We can add a kernel parameter 'tdx_host={on|off}' and skip all TDX code
> > > (thus
> > > no
> > > erratum detection) when it is off.  I suppose it will be useful in general
> > > anyway even w/o the context of kexec.
> > 
> > What exactly are you thinking? Add a tdx_host parameter, but what is the
> > default
> > behavior? When tdx_host=on with the errata, kexec must still be disabled,
> > right?
> > Better to return an error, than proceed and crash.
> 
> The default behaviour is tdx_host=off in order to not disrupt kexec/kdump
> behaviour on the TDX platforms with erratum.  The distros will be able to ship
> kernels with both CONFIG_KEXEC_CORE and CONFIG_INTEL_TDX_HOST on, and no
> visible
> impact to the user who doesn't care about TDX.
> 
> If the user is interested in TDX, tdx_host=on must be set in the kernel
> command
> line, but in this case user is expected to know kexec/kdump can only work
> normally if the TDX platform doesn't have the erratum -- kexec/kdump are
> disabled if the platform has the erratum.

So this will switch all of TDX to be default off then, unless the kernel gets a
parameter set. In which case we could also unlock the Kconfig with just one
small change. TDX and kexec would still be mutually exclusive, but only at
runtime. We should try to flag Paolo and see what he thinks.

Or is the proposal to only be default tdx_host=off on the errata platforms? And
tdx_host=on otherwise?

It seems like this series is close though, and would probably be wanted sooner
rather than later.



Re: [RFC PATCH 3/5] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Huang, Kai 9 months, 1 week ago
On Thu, 2025-03-13 at 22:47 +0000, Edgecombe, Rick P wrote:
> On Thu, 2025-03-13 at 22:32 +0000, Huang, Kai wrote:
> > > > 
> > > > We can add a kernel parameter 'tdx_host={on|off}' and skip all TDX code
> > > > (thus
> > > > no
> > > > erratum detection) when it is off.  I suppose it will be useful in general
> > > > anyway even w/o the context of kexec.
> > > 
> > > What exactly are you thinking? Add a tdx_host parameter, but what is the
> > > default
> > > behavior? When tdx_host=on with the errata, kexec must still be disabled,
> > > right?
> > > Better to return an error, than proceed and crash.
> > 
> > The default behaviour is tdx_host=off in order to not disrupt kexec/kdump
> > behaviour on the TDX platforms with erratum.  The distros will be able to ship
> > kernels with both CONFIG_KEXEC_CORE and CONFIG_INTEL_TDX_HOST on, and no
> > visible
> > impact to the user who doesn't care about TDX.
> > 
> > If the user is interested in TDX, tdx_host=on must be set in the kernel
> > command
> > line, but in this case user is expected to know kexec/kdump can only work
> > normally if the TDX platform doesn't have the erratum -- kexec/kdump are
> > disabled if the platform has the erratum.
> 
> So this will switch all of TDX to be default off then, unless the kernel gets a
> parameter set. 
> 

Currently TDX is also off by default in KVM.

> In which case we could also just unlock the Kconfig with just one
> small change. TDX and kexec would still mutually exclusive, but just at runtime.

Yeah, I am thinking this too, given that the "keyID 0 integrity" thing is still
ongoing.

> We should try to flag Paolo and see what he thinks.

I would appreciate it if you could help with that.

> 
> Or is the proposal to only be default tdx_host=off on the errata platforms? And
> tdx_host=on otherwise?

The tricky thing is, naturally, we want to skip all the code in tdx_init() if
tdx_host=off, because there's no reason to do that detection/initialization if
we are not going to use TDX, e.g., we don't need to do this one:

	register_memory_notifier(&tdx_memory_nb);

... which means the erratum detection code will be skipped too.

If we were to make tdx_host=off the default only for erratum platforms, then we
would need to do cleanup (e.g., unregister the above memory notifier).

This isn't nice and seems hacky.

I don't see a problem with making tdx_host=off the default anyway; as mentioned
above, TDX is off by default in KVM.

Re: [RFC PATCH 3/5] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Edgecombe, Rick P 9 months ago
On Thu, 2025-03-13 at 23:57 +0000, Huang, Kai wrote:
> > So this will switch all of TDX to be default off then, unless the kernel
> > gets a
> > parameter set. 
> > 
> 
> Currently in KVM TDX is also default off.

Good point. It begs the question of how many command line options the user
should have to pass to enable TDX.

> 
> > In which case we could also just unlock the Kconfig with just one
> > small change. TDX and kexec would still mutually exclusive, but just at
> > runtime.
> 
> Yeah I am thinking this too, given the "keyID 0 integrity" thing are still on-
> going.

You mentioned offline that there used to be a command line option, but it was
removed after discussion with Dave. I went to look for it and only found this:
https://lore.kernel.org/lkml/7e63912a-895f-d3b3-3173-336beaa86d08@intel.com/

...where Dave just asks why it's needed. In the next version it's dropped.
Unless there is anything more, it doesn't seem like there was really any
objection.

> 
> > We should try to flag Paolo and see what he thinks.
> 
> I appreciate if you could help to do.
> 
> > 
> > Or is the proposal to only be default tdx_host=off on the errata platforms?
> > And
> > tdx_host=on otherwise?
> 
> The tricky thing is, naturally, we want to skip all the code in tdx_init() if
> tdx_host=off, because there's no reason to do those detection/initialization
> if
> we are not going to use TDX, e.g., we don't need to this one:
> 
> 	register_memory_notifier(&tdx_memory_nb);
> 
> .. that means the code of detecting erratum will be skipped too.
> 
> If we only to only make tdx_host=off as default for erratum platforms, then we
> need to do cleanup (e.g., to unregister the above memory notifier).

This is a strange point. The errata detection is not dependent on the earlier
code in TDX init. Couldn't it just be moved?
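
To make the ordering question concrete, here is a rough userspace model of an init path where erratum detection runs first and the default is resolved afterwards. It assumes, as stated above, that detection does not depend on any earlier init step. `model_tdx_init()`, `enum tdx_host_opt` and `struct tdx_init_result` are hypothetical names for illustration only and do not correspond to the real tdx_init().

```c
#include <stdbool.h>

/* Userspace model only; all names here are hypothetical. */
enum tdx_host_opt {
	TDX_HOST_DEFAULT,	/* no tdx_host= on the command line */
	TDX_HOST_OFF,
	TDX_HOST_ON,
};

struct tdx_init_result {
	bool erratum_detected;
	bool tdx_host;			/* resolved on/off decision */
	bool notifier_registered;
};

static struct tdx_init_result model_tdx_init(enum tdx_host_opt opt,
					     bool cpu_has_erratum)
{
	struct tdx_init_result r = { false, false, false };

	/* Step 1: erratum detection, moved ahead of everything else. */
	r.erratum_detected = cpu_has_erratum;

	/* Step 2: resolve the default: off on erratum platforms, else on. */
	if (opt == TDX_HOST_DEFAULT)
		r.tdx_host = !r.erratum_detected;
	else
		r.tdx_host = (opt == TDX_HOST_ON);

	/* Step 3: bail out before registering anything if TDX is off. */
	if (!r.tdx_host)
		return r;

	/* Stands in for register_memory_notifier(&tdx_memory_nb). */
	r.notifier_registered = true;

	return r;
}
```

With this ordering nothing is registered before the on/off decision is made, so the "off" path has nothing to unregister. Whether the extra default logic is worth it is a separate question.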

> 
> This isn't nice and seems hacky.
> 
> I don't see making tdx_host=off as default has problem, anyway, as mentioned
> above TDX is off by default in KVM.

Yea, tdx_host=!errata as a default value makes it more complicated.


So I think the situation is we need at least one kernel parameter. We already
have one for KVM, which controls the late initialization parts of TDX that we
care about here. So what about just using the existing one? I think we don't
want two.

If KVM has not initialized TDX (based on its own TDX parameter), then kexec is
fine. It could work by exposing an interface for features to be exclusive with
TDX, since real TDX module initialization happens late anyway. I don't know if
it's better than a core-kernel one, but I don't see adding a second one going
well.


Very, very rough:

diff --git a/arch/x86/kernel/machine_kexec_64.c
b/arch/x86/kernel/machine_kexec_64.c
index a68f5a0a9f37..bfea4e78c577 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -315,6 +315,12 @@ int machine_kexec_prepare(struct kimage *image)
        result = init_pgtable(image, __pa(control_page));
        if (result)
                return result;
+
+       if (tdx_exclude_feature()) {
+               pr_info_once("Not allowed once TDX has been used.\n");
+               return -EOPNOTSUPP;
+       }
+
        kexec_va_control_page = (unsigned long)control_page;
        kexec_pa_table_page = (unsigned long)__pa(image->arch.pgd);
 
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index f5e2a937c1e7..9b1f42a1059c 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1215,6 +1215,21 @@ int tdx_enable(void)
 }
 EXPORT_SYMBOL_GPL(tdx_enable);
 
+bool tdx_exclude_feature(void)
+{
+       bool ret = false;
+
+       mutex_lock(&tdx_module_lock);
+       if (tdx_module_status == TDX_MODULE_INITIALIZED)
+               ret = true;
+       else
+               tdx_module_status = TDX_MODULE_EXCLUDED;
+       mutex_unlock(&tdx_module_lock);
+
+       return ret;
+}
+
 static bool is_pamt_page(unsigned long phys)
 {
        struct tdmr_info_list *tdmr_list = &tdx_tdmr_list;


Re: [RFC PATCH 3/5] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Huang, Kai 9 months ago
On Fri, 2025-03-14 at 19:03 +0000, Edgecombe, Rick P wrote:
> On Thu, 2025-03-13 at 23:57 +0000, Huang, Kai wrote:
> > > So this will switch all of TDX to be default off then, unless the kernel
> > > gets a
> > > parameter set. 
> > > 
> > 
> > Currently in KVM TDX is also default off.
> 
> Good point. It begs the question of how many command line options the user
> should have to pass to enable TDX.
> 
> > 
> > > In which case we could also just unlock the Kconfig with just one
> > > small change. TDX and kexec would still mutually exclusive, but just at
> > > runtime.
> > 
> > Yeah I am thinking this too, given the "keyID 0 integrity" thing are still on-
> > going.
> 
> You mentioned offline that there used to be a command line option, but it was
> removed after discussion with Dave. I went to look for it and only found this:
> https://lore.kernel.org/lkml/7e63912a-895f-d3b3-3173-336beaa86d08@intel.com/
> 
> ...where Dave just asks why it's needed. In the next version it's dropped.
> Unless there is anything more, it doesn't seem like there was really any
> objection.

Thanks for digging. :-)

I couldn't find any solid reason to argue against Dave so I just dropped it.  I
could argue that "this allows people to disable TDX once and for all" but it was
not something mandatory at that time.

> 
> > 
> > > We should try to flag Paolo and see what he thinks.
> > 
> > I appreciate if you could help to do.
> > 
> > > 
> > > Or is the proposal to only be default tdx_host=off on the errata platforms?
> > > And
> > > tdx_host=on otherwise?
> > 
> > The tricky thing is, naturally, we want to skip all the code in tdx_init() if
> > tdx_host=off, because there's no reason to do those detection/initialization
> > if
> > we are not going to use TDX, e.g., we don't need to this one:
> > 
> > 	register_memory_notifier(&tdx_memory_nb);
> > 
> > .. that means the code of detecting erratum will be skipped too.
> > 
> > If we only to only make tdx_host=off as default for erratum platforms, then we
> > need to do cleanup (e.g., to unregister the above memory notifier).
> 
> This is a strange point. The errata detection is not dependent on the earlier
> code in TDX init. It couldn't just be moved?

Sorry, I don't quite follow your point, but it seems you agreed it's not a good
idea.

> 
> > 
> > This isn't nice and seems hacky.
> > 
> > I don't see making tdx_host=off as default has problem, anyway, as mentioned
> > above TDX is off by default in KVM.
> 
> Yea, tdx_host=!errata as a default value makes it more complicated.

Yes.

> 
> 
> So I think the situation is we need at one kernel parameter. We already have one
> for KVM, which controls the late initialization parts of TDX that we care about
> here. So what about just using the existing one? I think we don't want two.

Logically, KVM is one user of TDX.  I think whether KVM has a parameter should
not affect whether we introduce a kernel parameter for the TDX host
core-kernel.

Dan also made a point that in the context of TDX Connect, there is a requirement
to make SEAMCALLs even if KVM is not going to run any TDX guest:

https://lore.kernel.org/kvm/cover.1730120881.git.kai.huang@intel.com/T/#m6928f5519de25def97d47fc6bbb77f5c3e958f7b

So I agree ideally we don't want two, but I think it is also OK if there's a
good reason to do so.

> 
> If KVM has not initialized TDX (based on its own TDX parameter), then kexec is
> fine. 
> 

For now.  In the future the TDX module could be initialized by other kernel
components.

> It could work by exposing an interface for features to be exclusive with
> TDX. Since real TDX module initialization happens late anyway. I don't know if
> it's better than a kernel one, but I don't see adding a second one going well.
> 
> 
> Very, very rough:
> 
> diff --git a/arch/x86/kernel/machine_kexec_64.c
> b/arch/x86/kernel/machine_kexec_64.c
> index a68f5a0a9f37..bfea4e78c577 100644
> --- a/arch/x86/kernel/machine_kexec_64.c
> +++ b/arch/x86/kernel/machine_kexec_64.c
> @@ -315,6 +315,12 @@ int machine_kexec_prepare(struct kimage *image)
>         result = init_pgtable(image, __pa(control_page));
>         if (result)
>                 return result;
> +
> +       if (tdx_exclude_feature()) {
> +               pr_info_once("Not allowed once TDX has been used.\n");
> +               return -EOPNOTSUPP;
> +       }
> +
>         kexec_va_control_page = (unsigned long)control_page;
>         kexec_pa_table_page = (unsigned long)__pa(image->arch.pgd);
>  
> diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> index f5e2a937c1e7..9b1f42a1059c 100644
> --- a/arch/x86/virt/vmx/tdx/tdx.c
> +++ b/arch/x86/virt/vmx/tdx/tdx.c
> @@ -1215,6 +1215,21 @@ int tdx_enable(void)
>  }
>  EXPORT_SYMBOL_GPL(tdx_enable);
>  
> +bool tdx_exclude_feature(void)
> +{
> +       bool ret = false;
> +
> +       mutex_lock(&tdx_module_lock);
> +       if (tdx_module_status == TDX_MODULE_INITIALIZED)
> +               ret = true;
> +       else
> +               tdx_module_status = TDX_MODULE_EXCLUDED;
> +       mutex_unlock(&tdx_module_lock);
> +
> +       return ret;
> +}

I assume setting the module status to "excluded" means we can never initialize
the TDX module afterwards.

The thing is kexec has two phases: 1) loading the kernel image, and 2) actually
doing the kexec.  Your approach basically marks TDX unusable forever when a user
tries to load a kexec kernel image, but this is a little bit nasty because
loading a kexec kernel image successfully doesn't mean you have to actually do
the kexec, i.e., you can unload the image and move on.

I am not saying this doesn't work, but IMHO it is more straightforward to just
let the user make the decision via a kernel parameter.
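
The load/unload concern can be illustrated with a small userspace model of the state machine. This is a sketch only: TDX_MODULE_EXCLUDED is the hypothetical state from the rough diff, the `model_*` functions are stand-ins rather than real kernel interfaces, and locking is left out.

```c
#include <stdbool.h>

/*
 * Userspace model of the exclusion idea; not kernel code.
 * TDX_MODULE_EXCLUDED is the hypothetical state from the rough diff
 * and locking is omitted for brevity.
 */
enum tdx_module_status {
	TDX_MODULE_UNINITIALIZED,
	TDX_MODULE_INITIALIZED,
	TDX_MODULE_EXCLUDED,
};

static enum tdx_module_status tdx_module_status = TDX_MODULE_UNINITIALIZED;

/* Model of tdx_enable(): 0 on success, -1 if TDX has been excluded. */
static int model_tdx_enable(void)
{
	if (tdx_module_status == TDX_MODULE_EXCLUDED)
		return -1;

	tdx_module_status = TDX_MODULE_INITIALIZED;
	return 0;
}

/* Model of tdx_exclude_feature(): true means the caller is refused. */
static bool model_tdx_exclude_feature(void)
{
	if (tdx_module_status == TDX_MODULE_INITIALIZED)
		return true;

	tdx_module_status = TDX_MODULE_EXCLUDED;
	return false;
}

/* Phase 1 of kexec: load an image.  0 on success, -1 if refused. */
static int model_kexec_load(void)
{
	return model_tdx_exclude_feature() ? -1 : 0;
}

/* Unloading the image does not revert the exclusion in this model. */
static void model_kexec_unload(void)
{
}
```

In this model, loading an image and then unloading it still leaves `model_tdx_enable()` failing, which is the behaviour being questioned. Enabling TDX first makes a later load fail instead.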
Re: [RFC PATCH 3/5] x86/kexec: Disable kexec/kdump on platforms with TDX partial write erratum
Posted by Edgecombe, Rick P 9 months ago
On Mon, 2025-03-17 at 01:19 +0000, Huang, Kai wrote:
> On Fri, 2025-03-14 at 19:03 +0000, Edgecombe, Rick P wrote:
> > On Thu, 2025-03-13 at 23:57 +0000, Huang, Kai wrote:
> > 
> > 
> > 
> > So I think the situation is we need at one kernel parameter. We already have one
> > for KVM, which controls the late initialization parts of TDX that we care about
> > here. So what about just using the existing one? I think we don't want two.
> 
> Logically, KVM is one user of TDX.  I think whether KVM has a parameter should
> not impact whether we should introduce one kernel parameter for TDX host core-
> kernel.
> 
> Dan also made a point that in the context of TDX Connect, there's requirement to
> make SEAMCALLs even KVM is not going to run any TDX guest:
> 
> https://lore.kernel.org/kvm/cover.1730120881.git.kai.huang@intel.com/T/#m6928f5519de25def97d47fc6bbb77f5c3e958f7b
> 
> So I agree ideally we don't want two, but I think it is also OK if there's good
> reason to do so.

What is the good reason to have two though? Do we just want one host side one
and lose the KVM one? It seems adding kernel parameters to make code problems go
away is usually frowned upon.

> 
> > 
> > If KVM has not initialized TDX (based on its own TDX parameter), then kexec is
> > fine. 
> > 
> 
> For now.  In the future TDX module could be initialized by other kernel
> components.
> 
> > It could work by exposing an interface for features to be exclusive with
> > TDX. Since real TDX module initialization happens late anyway. I don't know if
> > it's better than a kernel one, but I don't see adding a second one going well.
> > 
> > 
> > Very, very rough:
> > 
> > diff --git a/arch/x86/kernel/machine_kexec_64.c
> > b/arch/x86/kernel/machine_kexec_64.c
> > index a68f5a0a9f37..bfea4e78c577 100644
> > --- a/arch/x86/kernel/machine_kexec_64.c
> > +++ b/arch/x86/kernel/machine_kexec_64.c
> > @@ -315,6 +315,12 @@ int machine_kexec_prepare(struct kimage *image)
> >         result = init_pgtable(image, __pa(control_page));
> >         if (result)
> >                 return result;
> > +
> > +       if (tdx_exclude_feature()) {
> > +               pr_info_once("Not allowed once TDX has been used.\n");
> > +               return -EOPNOTSUPP;
> > +       }
> > +
> >         kexec_va_control_page = (unsigned long)control_page;
> >         kexec_pa_table_page = (unsigned long)__pa(image->arch.pgd);
> >  
> > diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
> > index f5e2a937c1e7..9b1f42a1059c 100644
> > --- a/arch/x86/virt/vmx/tdx/tdx.c
> > +++ b/arch/x86/virt/vmx/tdx/tdx.c
> > @@ -1215,6 +1215,21 @@ int tdx_enable(void)
> >  }
> >  EXPORT_SYMBOL_GPL(tdx_enable);
> >  
> > +bool tdx_exclude_feature(void)
> > +{
> > +       bool ret = false;
> > +
> > +       mutex_lock(&tdx_module_lock);
> > +       if (tdx_module_status == TDX_MODULE_INITIALIZED)
> > +               ret = true;
> > +       else
> > +               tdx_module_status = TDX_MODULE_EXCLUDED;
> > +       mutex_unlock(&tdx_module_lock);
> > +
> > +       return ret;
> > +}
> 
> Assuming setting module status to "excluded" means we are not able to initialize
> TDX module for ever.

I was going for the simplest approach without adding a new kernel parameter. But
in practice for distros KVM will load at boot and it should work pretty much the
same. If the tdx parameter is set, kexec is disabled; otherwise it's enabled.
> 
> The thing is Kexec has two phases: 1) loading kernel image, and 2) actually do
> kexec.  Your approach basically marks TDX unusable for ever when a user tries to
> load a kxec kernel image, but this is a little bit nasty because loading kexec
> kernel image successfully doesn't mean you have to actually do the kexec, i.e.,
> you can unload the image and move on.

Compared to the tdx_host parameter, this means that the user may sometimes be
able to decide late whether they want TDX or kexec.

> 
> I am not saying this doesn't work, but IMHO it is more straightforward to just
> let user make decision via kernel parameter.

Straightforward, yes, agreed. It's easier to document and the code would be
simpler.

I'm ok trying the tdx_host method, but I do think we need a better reason for
having two tdx kernel parameters when there is only one user of TDX today
(KVM).