TDX guests are not allowed to clear CR4.MCE. Attempt to clear it leads
to #VE.
Preserve the flag during kexec.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
arch/x86/kernel/relocate_kernel_64.S | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/relocate_kernel_64.S b/arch/x86/kernel/relocate_kernel_64.S
index 4a73351f87f8..18f19dcc40e9 100644
--- a/arch/x86/kernel/relocate_kernel_64.S
+++ b/arch/x86/kernel/relocate_kernel_64.S
@@ -145,8 +145,12 @@ SYM_CODE_START_LOCAL_NOALIGN(identity_mapped)
* Set cr4 to a known state:
* - physical address extension enabled
* - 5-level paging, if it was enabled before
+ * - Preserve MCE, if it was set. Clearing MCE may fault in some
+ * environments.
*/
- movl $X86_CR4_PAE, %eax
+ movq %cr4, %rax
+ andl $X86_CR4_MCE, %eax
+ orl $X86_CR4_PAE, %eax
testq $X86_CR4_LA57, %r13
jz 1f
orl $X86_CR4_LA57, %eax
--
2.39.1
On 13/02/2023 11:48 pm, Kirill A. Shutemov wrote: > TDX guests are not allowed to clear CR4.MCE. Attempt to clear it leads > to #VE. > > Preserve the flag during kexec. > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> That's unfortunate for TDX, but in a non-TDX system you must never maintain CR4.MCE into kexec. Async events, including NMIs, cannot be taken between this point and the target having set itself up into it's intended operating mode. During this period you get all kinds of fun with type confusion in the IDT/TSS and/or not having a safe stack to service the event. A clean shutdown from not having machine checks enabled is far preferable to trying to deliver an #MC in purgatory. That, or you're welcome to debug the next bug report I get where (amazingly) an NMI managed to hit with a good stack in the new context, but most of the old context (IDT, TSS and .text) still intact enough to start emitting a very confused oops onto serial... ~Andrew
On Tue, 2023-02-14 at 02:48 +0300, Kirill A. Shutemov wrote: > TDX guests are not allowed to clear CR4.MCE. Attempt to clear it > leads > to #VE. > > Preserve the flag during kexec. > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> I wonder whats going on with the pre-existing switching between eax and rax in this code for the cr0 and cr4 manipulations. Do you know what the reason is? Also, for a simple non-tdx kexec regression test: Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
On Thu, Feb 16, 2023 at 01:49:39AM +0000, Edgecombe, Rick P wrote: > On Tue, 2023-02-14 at 02:48 +0300, Kirill A. Shutemov wrote: > > TDX guests are not allowed to clear CR4.MCE. Attempt to clear it > > leads > > to #VE. > > > > Preserve the flag during kexec. > > > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> > > I wonder whats going on with the pre-existing switching between eax and > rax in this code for the cr0 and cr4 manipulations. Do you know what > the reason is? 32-bit ORs and ANDs save one byte per instruction. And there's no 32-bit MOV to/from control registers in 64-bit mode. > > Also, for a simple non-tdx kexec regression test: > Tested-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Thanks! -- Kiryl Shutsemau / Kirill A. Shutemov
On Thu, 2023-02-16 at 12:43 +0300, Kirill A. Shutemov wrote: > On Thu, Feb 16, 2023 at 01:49:39AM +0000, Edgecombe, Rick P wrote: > > On Tue, 2023-02-14 at 02:48 +0300, Kirill A. Shutemov wrote: > > > TDX guests are not allowed to clear CR4.MCE. Attempt to clear it > > > leads > > > to #VE. > > > > > > Preserve the flag during kexec. > > > > > > Signed-off-by: Kirill A. Shutemov < > > > kirill.shutemov@linux.intel.com> > > > > I wonder whats going on with the pre-existing switching between eax > > and > > rax in this code for the cr0 and cr4 manipulations. Do you know > > what > > the reason is? > > 32-bit ORs and ANDs save one byte per instruction. And there's no 32- > bit > MOV to/from control registers in 64-bit mode. Oh right, I think I recall now. There is a 64 bit AND in the CR0 piece here too, which of course is outside of these changes. But otherwise, it's not clear from the patch what the implications are of leaving CR4.MCE set for the non-TDX environment. I see in head_64.S it will clear it during boot if the kernel doesn't support machine check. So it leaves a little window where CR4.MCE is set where it wasn't before. The piece in head_64.S talks about how an #MC will crash the system if it happens before the machine check stuff is fully setup anyway, so it doesn't hurt to leave it on. Is that the reasoning for this change as well? If so it might help to add a little more about the reasoning in the commit log.
© 2016 - 2025 Red Hat, Inc.