[PATCHv10 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest

Kirill A. Shutemov posted 18 patches 1 year, 10 months ago
There is a newer version of this series
[PATCHv10 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
Posted by Kirill A. Shutemov 1 year, 10 months ago
Depending on setup, TDX guests might be allowed to clear CR4.MCE.
Attempt to clear it leads to #VE.

Use alternatives to keep the flag during kexec for TDX guests.

The change doesn't affect non-TDX-guest environments.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/relocate_kernel_64.S | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 56cab1bb25f5..8e2037d78a1f 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -5,6 +5,8 @@
  */
 
 #include <linux/linkage.h>
+#include <linux/stringify.h>
+#include <asm/alternative.h>
 #include <asm/page_types.h>
 #include <asm/kexec.h>
 #include <asm/processor-flags.h>
@@ -145,11 +147,17 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	 * Set cr4 to a known state:
 	 *  - physical address extension enabled
 	 *  - 5-level paging, if it was enabled before
+	 *  - Machine check exception on TDX guest, if it was enabled before.
+	 *    Clearing MCE might not allowed in TDX guests, depending on setup.
 	 */
 	movl	$X86_CR4_PAE, %eax
 	testq	$X86_CR4_LA57, %r13
 	jz	1f
 	orl	$X86_CR4_LA57, %eax
+1:
+	testq	$X86_CR4_MCE, %r13
+	jz	1f
+	ALTERNATIVE "", __stringify(orl $X86_CR4_MCE, %eax), X86_FEATURE_TDX_GUEST
 1:
 	movq	%rax, %cr4
 
-- 
2.43.0
Re: [PATCHv10 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
Posted by Borislav Petkov 1 year, 9 months ago
On Tue, Apr 09, 2024 at 02:29:57PM +0300, Kirill A. Shutemov wrote:
> +1:
> +	testq	$X86_CR4_MCE, %r13
> +	jz	1f
> +	ALTERNATIVE "", __stringify(orl $X86_CR4_MCE, %eax), X86_FEATURE_TDX_GUEST
>  1:

Please add the below patch to your set. Those same-number labels are
just abominable.

Thx.

---
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Date: Tue, 30 Apr 2024 15:00:16 +0200
Subject: [PATCH] x86/relocate_kernel: Use named labels for less confusion

That identity_mapped() function was loving that "1" label to the point
of completely confusing its readers.

Use named labels in each place for clarity.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
---
 arch/x86/kernel/relocate_kernel_64.S | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 8e2037d78a1f..0077c9e562a7 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -152,13 +152,15 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	 */
 	movl	$X86_CR4_PAE, %eax
 	testq	$X86_CR4_LA57, %r13
-	jz	1f
+	jz	no_la57
 	orl	$X86_CR4_LA57, %eax
-1:
+no_la57:
+
 	testq	$X86_CR4_MCE, %r13
-	jz	1f
+	jz	mca_off
 	ALTERNATIVE "", __stringify(orl $X86_CR4_MCE, %eax), X86_FEATURE_TDX_GUEST
-1:
+mca_off:
+
 	movq	%rax, %cr4
 
 	jmp 1f
@@ -173,9 +175,9 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	 * used by kexec. Flush the caches before copying the kernel.
 	 */
 	testq	%r12, %r12
-	jz 1f
+	jz sme_off
 	wbinvd
-1:
+sme_off:
 
 	movq	%rcx, %r11
 	call	swap_pages
@@ -195,7 +197,7 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	 */
 
 	testq	%r11, %r11
-	jnz 1f
+	jnz relocate
 	xorl	%eax, %eax
 	xorl	%ebx, %ebx
 	xorl    %ecx, %ecx
@@ -216,7 +218,7 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	ret
 	int3
 
-1:
+relocate:
 	popq	%rdx
 	leaq	PAGE_SIZE(%r10), %rsp
 	ANNOTATE_RETPOLINE_SAFE
-- 
2.43.0

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCHv10 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
Posted by Kirill A. Shutemov 1 year, 9 months ago
On Tue, Apr 30, 2024 at 03:03:23PM +0200, Borislav Petkov wrote:
> On Tue, Apr 09, 2024 at 02:29:57PM +0300, Kirill A. Shutemov wrote:
> > +1:
> > +	testq	$X86_CR4_MCE, %r13
> > +	jz	1f
> > +	ALTERNATIVE "", __stringify(orl $X86_CR4_MCE, %eax), X86_FEATURE_TDX_GUEST
> >  1:
> 
> Please add the below patch to your set. Those same-number labels are
> just abominable.
> 
> Thx.
> 
> ---
> From: "Borislav Petkov (AMD)" <bp@alien8.de>
> Date: Tue, 30 Apr 2024 15:00:16 +0200
> Subject: [PATCH] x86/relocate_kernel: Use named labels for less confusion
> 
> That identity_mapped() function was loving that "1" label to the point
> of completely confusing its readers.
> 
> Use named labels in each place for clarity.
> 
> No functional changes.
> 
> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
> ---
>  arch/x86/kernel/relocate_kernel_64.S | 18 ++++++++++--------
>  1 file changed, 10 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
> index 8e2037d78a1f..0077c9e562a7 100644
> --- a/arch/x86/kernel/relocate_kernel_64.S
> +++ b/arch/x86/kernel/relocate_kernel_64.S
> @@ -152,13 +152,15 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
>  	 */
>  	movl	$X86_CR4_PAE, %eax
>  	testq	$X86_CR4_LA57, %r13
> -	jz	1f
> +	jz	no_la57
>  	orl	$X86_CR4_LA57, %eax
> -1:
> +no_la57:

I assume all of these new labels have to be prefixed with ".L", right?

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv10 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
Posted by Borislav Petkov 1 year, 9 months ago
On Tue, Apr 30, 2024 at 05:49:08PM +0300, Kirill A. Shutemov wrote:
> I assume all of these new labels have to be prefixed with ".L", right?

Oh yes, please.

Thx!

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCHv10 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
Posted by Borislav Petkov 1 year, 9 months ago
On Thu, May 02, 2024 at 03:22:29PM +0200, Borislav Petkov wrote:
> On Tue, Apr 30, 2024 at 05:49:08PM +0300, Kirill A. Shutemov wrote:
> > I assume all of these new labels have to be prefixed with ".L", right?
> 
> Oh yes, please.

Here's a fixed version:

---
From: "Borislav Petkov (AMD)" <bp@alien8.de>
Date: Tue, 30 Apr 2024 15:00:16 +0200
Subject: [PATCH] x86/relocate_kernel: Use named labels for less confusion

That identity_mapped() functions was loving that "1" label to the point
of completely confusing its readers.

Use named labels in each place for clarity.

No functional changes.

Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
---
 arch/x86/kernel/relocate_kernel_64.S | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 8e2037d78a1f..7f70707c7372 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -152,13 +152,15 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	 */
 	movl	$X86_CR4_PAE, %eax
 	testq	$X86_CR4_LA57, %r13
-	jz	1f
+	jz	.Lno_la57
 	orl	$X86_CR4_LA57, %eax
-1:
+.Lno_la57:
+
 	testq	$X86_CR4_MCE, %r13
-	jz	1f
+	jz	.Lmca_off
 	ALTERNATIVE "", __stringify(orl $X86_CR4_MCE, %eax), X86_FEATURE_TDX_GUEST
-1:
+.Lmca_off:
+
 	movq	%rax, %cr4
 
 	jmp 1f
@@ -173,9 +175,9 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	 * used by kexec. Flush the caches before copying the kernel.
 	 */
 	testq	%r12, %r12
-	jz 1f
+	jz .Lsme_off
 	wbinvd
-1:
+.Lsme_off:
 
 	movq	%rcx, %r11
 	call	swap_pages
@@ -195,7 +197,7 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	 */
 
 	testq	%r11, %r11
-	jnz 1f
+	jnz .Lrelocate
 	xorl	%eax, %eax
 	xorl	%ebx, %ebx
 	xorl    %ecx, %ecx
@@ -216,7 +218,7 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	ret
 	int3
 
-1:
+.Lrelocate:
 	popq	%rdx
 	leaq	PAGE_SIZE(%r10), %rsp
 	ANNOTATE_RETPOLINE_SAFE
-- 
2.43.0

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCHv10 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
Posted by Sean Christopherson 1 year, 10 months ago
On Tue, Apr 09, 2024, Kirill A. Shutemov wrote:
> Depending on setup, TDX guests might be allowed to clear CR4.MCE.
> Attempt to clear it leads to #VE.
> 
> Use alternatives to keep the flag during kexec for TDX guests.
> 
> The change doesn't affect non-TDX-guest environments.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/kernel/relocate_kernel_64.S | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
> index 56cab1bb25f5..8e2037d78a1f 100644
> --- a/arch/x86/kernel/relocate_kernel_64.S
> +++ b/arch/x86/kernel/relocate_kernel_64.S
> @@ -5,6 +5,8 @@
>   */
>  
>  #include <linux/linkage.h>
> +#include <linux/stringify.h>
> +#include <asm/alternative.h>
>  #include <asm/page_types.h>
>  #include <asm/kexec.h>
>  #include <asm/processor-flags.h>
> @@ -145,11 +147,17 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
>  	 * Set cr4 to a known state:
>  	 *  - physical address extension enabled
>  	 *  - 5-level paging, if it was enabled before
> +	 *  - Machine check exception on TDX guest, if it was enabled before.
> +	 *    Clearing MCE might not allowed in TDX guests, depending on setup.
>  	 */
>  	movl	$X86_CR4_PAE, %eax
>  	testq	$X86_CR4_LA57, %r13
>  	jz	1f
>  	orl	$X86_CR4_LA57, %eax
> +1:
> +	testq	$X86_CR4_MCE, %r13
> +	jz	1f
> +	ALTERNATIVE "", __stringify(orl $X86_CR4_MCE, %eax), X86_FEATURE_TDX_GUEST

The TEST+Jcc+OR sequences are rather odd, and require way more instructions and
thus way more copy+paste than is necessary.

	movl	$X86_CR4_LA57, %eax
	ALTERNATIVE "", __stringify(orl $X86_CR4_MCE, %eax), X86_FEATURE_TDX_GUEST
	andl	%r13d, %eax
	orl	$X86_CR4_PAE, %eax
	movq	%rax, %cr4

Then preserving new bits unconditionally only requires adding the flag to the
initial move, and feature-dependent bits only need a single ALTERNATIVE line.

And there's no branches, blazing fast kexec! ;-)
Re: [PATCHv10 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
Posted by Kirill A. Shutemov 1 year, 10 months ago
On Tue, Apr 09, 2024 at 07:22:24AM -0700, Sean Christopherson wrote:
> On Tue, Apr 09, 2024, Kirill A. Shutemov wrote:
> > Depending on setup, TDX guests might be allowed to clear CR4.MCE.
> > Attempt to clear it leads to #VE.
> > 
> > Use alternatives to keep the flag during kexec for TDX guests.
> > 
> > The change doesn't affect non-TDX-guest environments.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/kernel/relocate_kernel_64.S | 8 ++++++++
> >  1 file changed, 8 insertions(+)
> > 
> > diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
> > index 56cab1bb25f5..8e2037d78a1f 100644
> > --- a/arch/x86/kernel/relocate_kernel_64.S
> > +++ b/arch/x86/kernel/relocate_kernel_64.S
> > @@ -5,6 +5,8 @@
> >   */
> >  
> >  #include <linux/linkage.h>
> > +#include <linux/stringify.h>
> > +#include <asm/alternative.h>
> >  #include <asm/page_types.h>
> >  #include <asm/kexec.h>
> >  #include <asm/processor-flags.h>
> > @@ -145,11 +147,17 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
> >  	 * Set cr4 to a known state:
> >  	 *  - physical address extension enabled
> >  	 *  - 5-level paging, if it was enabled before
> > +	 *  - Machine check exception on TDX guest, if it was enabled before.
> > +	 *    Clearing MCE might not allowed in TDX guests, depending on setup.
> >  	 */
> >  	movl	$X86_CR4_PAE, %eax
> >  	testq	$X86_CR4_LA57, %r13
> >  	jz	1f
> >  	orl	$X86_CR4_LA57, %eax
> > +1:
> > +	testq	$X86_CR4_MCE, %r13
> > +	jz	1f
> > +	ALTERNATIVE "", __stringify(orl $X86_CR4_MCE, %eax), X86_FEATURE_TDX_GUEST
> 
> The TEST+Jcc+OR sequences are rather odd, and require way more instructions and
> thus way more copy+paste than is necessary.
> 
> 	movl	$X86_CR4_LA57, %eax
> 	ALTERNATIVE "", __stringify(orl $X86_CR4_MCE, %eax), X86_FEATURE_TDX_GUEST
> 	andl	%r13d, %eax
> 	orl	$X86_CR4_PAE, %eax
> 	movq	%rax, %cr4
> 
> Then preserving new bits unconditionally only requires adding the flag to the
> initial move, and feature-dependent bits only need a single ALTERNATIVE line.

Thanks! It is much better.

> And there's no branches, blazing fast kexec! ;-)

kexec/sec STONKS! :D

Updated patch is below.

From 6be428e3b1c6fb494b2c48ba6a7c133514a0b2b4 Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Fri, 10 Feb 2023 12:53:11 +0300
Subject: [PATCHv10.1 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest

Depending on setup, TDX guests might be allowed to clear CR4.MCE.
Attempt to clear it leads to #VE.

Use alternatives to keep the flag during kexec for TDX guests.

The change doesn't affect non-TDX-guest environments.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/relocate_kernel_64.S | 15 +++++++++------
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 56cab1bb25f5..90246d544eb1 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -5,6 +5,8 @@
  */
 
 #include <linux/linkage.h>
+#include <linux/stringify.h>
+#include <asm/alternative.h>
 #include <asm/page_types.h>
 #include <asm/kexec.h>
 #include <asm/processor-flags.h>
@@ -143,14 +145,15 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 
 	/*
 	 * Set cr4 to a known state:
-	 *  - physical address extension enabled
 	 *  - 5-level paging, if it was enabled before
+	 *  - Machine check exception on TDX guest, if it was enabled before.
+	 *    Clearing MCE might not allowed in TDX guests, depending on setup.
+	 *  - physical address extension enabled
 	 */
-	movl	$X86_CR4_PAE, %eax
-	testq	$X86_CR4_LA57, %r13
-	jz	1f
-	orl	$X86_CR4_LA57, %eax
-1:
+	movl	$X86_CR4_LA57, %eax
+	ALTERNATIVE "", __stringify(orl $X86_CR4_MCE, %eax), X86_FEATURE_TDX_GUEST
+	andl	%r13d, %eax
+	orl	$X86_CR4_PAE, %eax
 	movq	%rax, %cr4
 
 	jmp 1f
-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv10 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
Posted by Borislav Petkov 1 year, 9 months ago
On Tue, Apr 09, 2024 at 06:26:05PM +0300, Kirill A. Shutemov wrote:
> From 6be428e3b1c6fb494b2c48ba6a7c133514a0b2b4 Mon Sep 17 00:00:00 2001
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> Date: Fri, 10 Feb 2023 12:53:11 +0300
> Subject: [PATCHv10.1 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
> 
> Depending on setup, TDX guests might be allowed to clear CR4.MCE.
> Attempt to clear it leads to #VE.
> 
> Use alternatives to keep the flag during kexec for TDX guests.
> 
> The change doesn't affect non-TDX-guest environments.

This is all fine and dandy but nothing explains *why* TDX needs this
special dance.

Why can't TDX do the usual CR4.MCE diddling like the normal kernel
during init and needs to do that here immediately?

> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> ---
>  arch/x86/kernel/relocate_kernel_64.S | 15 +++++++++------
>  1 file changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
> index 56cab1bb25f5..90246d544eb1 100644
> --- a/arch/x86/kernel/relocate_kernel_64.S
> +++ b/arch/x86/kernel/relocate_kernel_64.S
> @@ -5,6 +5,8 @@
>   */
>  
>  #include <linux/linkage.h>
> +#include <linux/stringify.h>
> +#include <asm/alternative.h>
>  #include <asm/page_types.h>
>  #include <asm/kexec.h>
>  #include <asm/processor-flags.h>
> @@ -143,14 +145,15 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
>  
>  	/*
>  	 * Set cr4 to a known state:
> -	 *  - physical address extension enabled
>  	 *  - 5-level paging, if it was enabled before
> +	 *  - Machine check exception on TDX guest, if it was enabled before.
> +	 *    Clearing MCE might not allowed in TDX guests, depending on setup.

			 ... might not be allowed ...

> +	 *  - physical address extension enabled
>  	 */
> -	movl	$X86_CR4_PAE, %eax
> -	testq	$X86_CR4_LA57, %r13
> -	jz	1f
> -	orl	$X86_CR4_LA57, %eax
> -1:
> +	movl	$X86_CR4_LA57, %eax
> +	ALTERNATIVE "", __stringify(orl $X86_CR4_MCE, %eax), X86_FEATURE_TDX_GUEST
> +	andl	%r13d, %eax

%r13 needs a comment here that it contains %cr4 read above in
relocate_kernel()

> +	orl	$X86_CR4_PAE, %eax
>  	movq	%rax, %cr4
>  
>  	jmp 1f
> -- 
>   Kiryl Shutsemau / Kirill A. Shutemov

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCHv10 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
Posted by Kirill A. Shutemov 1 year, 9 months ago
On Sun, Apr 28, 2024 at 07:11:11PM +0200, Borislav Petkov wrote:
> On Tue, Apr 09, 2024 at 06:26:05PM +0300, Kirill A. Shutemov wrote:
> > From 6be428e3b1c6fb494b2c48ba6a7c133514a0b2b4 Mon Sep 17 00:00:00 2001
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > Date: Fri, 10 Feb 2023 12:53:11 +0300
> > Subject: [PATCHv10.1 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
> > 
> > Depending on setup, TDX guests might be allowed to clear CR4.MCE.
> > Attempt to clear it leads to #VE.
> > 
> > Use alternatives to keep the flag during kexec for TDX guests.
> > 
> > The change doesn't affect non-TDX-guest environments.
> 
> This is all fine and dandy but nothing explains *why* TDX needs this
> special dance.
> 
> Why can't TDX do the usual CR4.MCE diddling like the normal kernel
> during init and needs to do that here immediately?

As I mentioned above, clearing CR4.MCE triggers #VE. It is quirk of the
platform.

There's plan to allow it in newer TDX modules, but kernel still has to
assume we cannot touch it in TDX guest case.

> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > ---
> >  arch/x86/kernel/relocate_kernel_64.S | 15 +++++++++------
> >  1 file changed, 9 insertions(+), 6 deletions(-)
> > 
> > diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
> > index 56cab1bb25f5..90246d544eb1 100644
> > --- a/arch/x86/kernel/relocate_kernel_64.S
> > +++ b/arch/x86/kernel/relocate_kernel_64.S
> > @@ -5,6 +5,8 @@
> >   */
> >  
> >  #include <linux/linkage.h>
> > +#include <linux/stringify.h>
> > +#include <asm/alternative.h>
> >  #include <asm/page_types.h>
> >  #include <asm/kexec.h>
> >  #include <asm/processor-flags.h>
> > @@ -143,14 +145,15 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
> >  
> >  	/*
> >  	 * Set cr4 to a known state:
> > -	 *  - physical address extension enabled
> >  	 *  - 5-level paging, if it was enabled before
> > +	 *  - Machine check exception on TDX guest, if it was enabled before.
> > +	 *    Clearing MCE might not allowed in TDX guests, depending on setup.
> 
> 			 ... might not be allowed ...
> 

Oopsie. Thanks.

> > +	 *  - physical address extension enabled
> >  	 */
> > -	movl	$X86_CR4_PAE, %eax
> > -	testq	$X86_CR4_LA57, %r13
> > -	jz	1f
> > -	orl	$X86_CR4_LA57, %eax
> > -1:
> > +	movl	$X86_CR4_LA57, %eax
> > +	ALTERNATIVE "", __stringify(orl $X86_CR4_MCE, %eax), X86_FEATURE_TDX_GUEST
> > +	andl	%r13d, %eax
> 
> %r13 needs a comment here that it contains %cr4 read above in
> relocate_kernel()

Okay.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv10 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
Posted by Borislav Petkov 1 year, 9 months ago
On Mon, Apr 29, 2024 at 04:17:38PM +0300, Kirill A. Shutemov wrote:
> As I mentioned above, clearing CR4.MCE triggers #VE. It is quirk of the
> platform.

You mean when identity_mapped() runs as part of a kexec-ed kernel, it
might clear CR4.MCE and that would trigger the #VE?

So, if that is correct, you basically want to *preserve* the CR4.MCE
setting across kexec?

But then __mcheck_cpu_init_generic() will go and set it
unconditionally.

So what exactly is the correct flow here?

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCHv10 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
Posted by Kirill A. Shutemov 1 year, 9 months ago
On Mon, Apr 29, 2024 at 04:45:08PM +0200, Borislav Petkov wrote:
> On Mon, Apr 29, 2024 at 04:17:38PM +0300, Kirill A. Shutemov wrote:
> > As I mentioned above, clearing CR4.MCE triggers #VE. It is quirk of the
> > platform.
> 
> You mean when identity_mapped() runs as part of a kexec-ed kernel, it
> might clear CR4.MCE and that would trigger the #VE?

Yes, that's what happens in current upstream.

> So, if that is correct, you basically want to *preserve* the CR4.MCE
> setting across kexec?

Yes.

> But then __mcheck_cpu_init_generic() will go and set it
> unconditionally.

__mcheck_cpu_init_generic() will not change anything in this case as the
bit is already set. Everything is hunky-dory.

> So what exactly is the correct flow here?

TDX guest has CR4.MCE set from time 0 and it has to stay this way all the
time including kexec flow.

We have already modified early boot code to preserve CR4.MCE. See
77a512e35db7 ("x86/boot: Avoid #VE during boot for TDX platforms").
The patch extends it to kexec flow.

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCHv10 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
Posted by Borislav Petkov 1 year, 9 months ago
On Mon, Apr 29, 2024 at 06:16:54PM +0300, Kirill A. Shutemov wrote:
> Yes, that's what happens in current upstream.

Let's rewrite that commit message then:

"TDX guests run with MCA enabled (CR4.MCE=1b) from the very start. If
that bit is cleared during CR4 register reprogramming during boot or
kexec flows, a #VE exception will be raised which the guest kernel
cannot handle that early.

Therefore, make sure the CR4.MCE setting is preserved over kexec too and
avoid raising any #VEs."

without that bit about TDX guests might be allowed to clear CR4.MCE
which is only confusing and unnecessary.

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette
Re: [PATCHv10 05/18] x86/kexec: Keep CR4.MCE set during kexec for TDX guest
Posted by Huang, Kai 1 year, 10 months ago
On Tue, 2024-04-09 at 14:29 +0300, Kirill A. Shutemov wrote:
> Depending on setup, TDX guests might be allowed to clear CR4.MCE.
> Attempt to clear it leads to #VE.
> 
> Use alternatives to keep the flag during kexec for TDX guests.
> 
> The change doesn't affect non-TDX-guest environments.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

Reviewed-by: Kai Huang <kai.huang@intel.com>