[PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec

Kirill A. Shutemov posted 2 patches 2 years, 7 months ago
[PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec
Posted by Kirill A. Shutemov 2 years, 7 months ago
TDX guests are not allowed to clear CR4.MCE. Attempt to clear it leads
to #VE.

Preserve the flag during kexec.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/kernel/relocate_kernel_64.S | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 4a73351f87f8..18f19dcc40e9 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -145,8 +145,12 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
 	 * Set cr4 to a known state:
 	 *  - physical address extension enabled
 	 *  - 5-level paging, if it was enabled before
+	 *  - Preserve MCE, if it was set. Clearing MCE may fault in some
+	 *    environments.
 	 */
-	movl	$X86_CR4_PAE, %eax
+	movq	%cr4, %rax
+	andl	$X86_CR4_MCE, %eax
+	orl	$X86_CR4_PAE, %eax
 	testq	$X86_CR4_LA57, %r13
 	jz	1f
 	orl	$X86_CR4_LA57, %eax
-- 
2.39.1
Re: [PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec
Posted by Andrew Cooper 2 years, 6 months ago
On 13/02/2023 11:48 pm, Kirill A. Shutemov wrote:
> TDX guests are not allowed to clear CR4.MCE. Attempt to clear it leads
> to #VE.
>
> Preserve the flag during kexec.
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

That's unfortunate for TDX, but in a non-TDX system you must never
maintain CR4.MCE into kexec.

Async events, including NMIs, cannot be taken between this point and the
target having set itself up into it's intended operating mode.  During
this period you get all kinds of fun with type confusion in the IDT/TSS
and/or not having a safe stack to service the event.

A clean shutdown from not having machine checks enabled is far
preferable to trying to deliver an #MC in purgatory.

That, or you're welcome to debug the next bug report I get where
(amazingly) an NMI managed to hit with a good stack in the new context,
but most of the old context (IDT, TSS and .text) still intact enough to
start emitting a very confused oops onto serial...

~Andrew
Re: [PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec
Posted by Edgecombe, Rick P 2 years, 6 months ago
On Tue, 2023-02-14 at 02:48 +0300, Kirill A. Shutemov wrote:
> TDX guests are not allowed to clear CR4.MCE. Attempt to clear it
> leads
> to #VE.
> 
> Preserve the flag during kexec.
> 
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

I wonder whats going on with the pre-existing switching between eax and
rax in this code for the cr0 and cr4 manipulations. Do you know what
the reason is?

Also, for a simple non-tdx kexec regression test:
Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Re: [PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec
Posted by Kirill A. Shutemov 2 years, 6 months ago
On Thu, Feb 16, 2023 at 01:49:39AM +0000, Edgecombe, Rick P wrote:
> On Tue, 2023-02-14 at 02:48 +0300, Kirill A. Shutemov wrote:
> > TDX guests are not allowed to clear CR4.MCE. Attempt to clear it
> > leads
> > to #VE.
> > 
> > Preserve the flag during kexec.
> > 
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> 
> I wonder whats going on with the pre-existing switching between eax and
> rax in this code for the cr0 and cr4 manipulations. Do you know what
> the reason is?

32-bit ORs and ANDs save one byte per instruction. And there's no 32-bit
MOV to/from control registers in 64-bit mode.

> 
> Also, for a simple non-tdx kexec regression test:
> Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

Thanks!

-- 
  Kiryl Shutsemau / Kirill A. Shutemov
Re: [PATCH 1/2] x86/kexec: Preserve CR4.MCE during kexec
Posted by Edgecombe, Rick P 2 years, 6 months ago
On Thu, 2023-02-16 at 12:43 +0300, Kirill A. Shutemov wrote:
> On Thu, Feb 16, 2023 at 01:49:39AM +0000, Edgecombe, Rick P wrote:
> > On Tue, 2023-02-14 at 02:48 +0300, Kirill A. Shutemov wrote:
> > > TDX guests are not allowed to clear CR4.MCE. Attempt to clear it
> > > leads
> > > to #VE.
> > > 
> > > Preserve the flag during kexec.
> > > 
> > > Signed-off-by: Kirill A. Shutemov <
> > > kirill.shutemov@linux.intel.com>
> > 
> > I wonder whats going on with the pre-existing switching between eax
> > and
> > rax in this code for the cr0 and cr4 manipulations. Do you know
> > what
> > the reason is?
> 
> 32-bit ORs and ANDs save one byte per instruction. And there's no 32-
> bit
> MOV to/from control registers in 64-bit mode.

Oh right, I think I recall now. There is a 64 bit AND in the CR0 piece
here too, which of course is outside of these changes.

But otherwise, it's not clear from the patch what the implications are
of leaving CR4.MCE set for the non-TDX environment. I see in head_64.S
it will clear it during boot if the kernel doesn't support machine
check. So it leaves a little window where CR4.MCE is set where it
wasn't before.

The piece in head_64.S talks about how an #MC will crash the system if
it happens before the machine check stuff is fully setup anyway, so it
doesn't hurt to leave it on. Is that the reasoning for this change as
well? If so it might help to add a little more about the reasoning in
the commit log.