[PATCH v1] x86/smp: Set up exception handling before cr4_init()

Posted by Xin Li (Intel) 12 hours ago
The current AP boot sequence initializes CR4 before setting up
exception handling.  With FRED enabled, however, CR4.FRED is set
prior to initializing the FRED configuration MSRs, introducing a
brief window where a triple fault could occur.  This wasn't
considered a problem, as the early boot code was carefully designed
to avoid triggering exceptions.  Moreover, if an exception does
occur at this stage, it's preferable for the CPU to triple fault
rather than risk a potential exploit.

However, under TDX or SEV-ES/SNP, printk() triggers a #VE or #VC,
so any logging during this small window results in a triple fault.
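
To illustrate the ordering dependency, here is a minimal stand-alone
model (the model_*() stubs are made-up names; this is not kernel code):

  #include <stdbool.h>
  #include <stdio.h>

  static bool fred_msrs_programmed;   /* stand-in for the FRED MSR state */
  static bool cr4_fred_enabled;       /* stand-in for CR4.FRED */

  static void model_cr4_init(void)
  {
          cr4_fred_enabled = true;    /* cr4_init() sets the pinned CR4.FRED bit */
  }

  static void model_exception_setup(void)
  {
          fred_msrs_programmed = true;        /* cpu_init_exception_handling() */
  }

  static void model_printk(void)
  {
          /*
           * Under TDX/SEV-ES this raises #VE/#VC; with CR4.FRED set but the
           * FRED MSRs not yet programmed, delivery fails and the CPU triple
           * faults.
           */
          if (cr4_fred_enabled && !fred_msrs_programmed)
                  puts("inside the triple-fault window");
          else
                  puts("exception can be delivered");
  }

  int main(void)
  {
          model_cr4_init();           /* old order: CR4 is set up first */
          model_printk();             /* any logging here is fatal */
          model_exception_setup();
          return 0;
  }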

Swap the order of cr4_init() and cpu_init_exception_handling(), since
cr4_init() only involves reading from and writing to CR4, and setting
up exception handling does not depend on any specific CR4 bits being
set (arguably CR4.PAE, CR4.PSE and CR4.PGE are related, but they are
already set before start_secondary() is reached anyway).
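
For reference, cr4_init() boils down to a read-modify-write of CR4 plus
a per-CPU shadow update, roughly (paraphrased, not verbatim kernel
source):

  void cr4_init(void)
  {
          unsigned long cr4 = __read_cr4();

          if (boot_cpu_has(X86_FEATURE_PCID))
                  cr4 |= X86_CR4_PCIDE;

          /* Enforce the pinned bits (cr4_pinned_bits, a subset of cr4_pinned_mask). */
          if (static_branch_likely(&cr_pinning))
                  cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;

          __write_cr4(cr4);

          /* Keep the per-CPU CR4 shadow in sync. */
          this_cpu_write(cpu_tlbstate.cr4, cr4);
  }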

Notably, this triple fault can still occur before FRED is enabled,
while the bringup IDT is in use, since it lacks a #VE handler for
TDX.

Note that, as a side effect, on 32-bit systems loading CR3 with
swapper_pg_dir is moved ahead of cr4_init(), which appears to be
harmless.

Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code")
Reported-by: Nikunj A Dadhania <nikunj@amd.com>
Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Cc: stable@vger.kernel.org # 6.9+
---
 arch/x86/kernel/smpboot.c | 41 ++++++++++++++++++++++++++++++++-------
 1 file changed, 34 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 5cd6950ab672..d2bf1acddb86 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -228,13 +228,6 @@ static void ap_calibrate_delay(void)
  */
 static void notrace __noendbr start_secondary(void *unused)
 {
-	/*
-	 * Don't put *anything* except direct CPU state initialization
-	 * before cpu_init(), SMP booting is too fragile that we want to
-	 * limit the things done here to the most necessary things.
-	 */
-	cr4_init();
-
 	/*
 	 * 32-bit specific. 64-bit reaches this code with the correct page
 	 * table established. Yet another historical divergence.
@@ -245,8 +238,37 @@ static void notrace __noendbr start_secondary(void *unused)
 		__flush_tlb_all();
 	}
 
+	/*
+	 * AP startup assembly code has set up the following before calling
+	 * start_secondary() on 64-bit:
+	 *
+	 * 1) CS set to __KERNEL_CS.
+	 * 2) CR3 switched to the init_top_pgt.
+	 * 3) CR4.PAE, CR4.PSE and CR4.PGE are set.
+	 * 4) GDT set to per-CPU gdt_page.
+	 * 5) ALL data segments set to the NULL descriptor.
+	 * 6) MSR_GS_BASE set to per-CPU offset.
+	 * 7) IDT set to bringup IDT.
+	 * 8) CR0 set to CR0_STATE.
+	 *
+	 * So it's ready to set up exception handling.
+	 */
 	cpu_init_exception_handling(false);
 
+	/*
+	 * Ensure bits set in cr4_pinned_bits are set in CR4.
+	 *
+	 * cr4_pinned_bits is a subset of cr4_pinned_mask, which includes
+	 * the following bits:
+	 *         X86_CR4_SMEP
+	 *         X86_CR4_SMAP
+	 *         X86_CR4_UMIP
+	 *         X86_CR4_FSGSBASE
+	 *         X86_CR4_CET
+	 *         X86_CR4_FRED
+	 */
+	cr4_init();
+
 	/*
 	 * Load the microcode before reaching the AP alive synchronization
 	 * point below so it is not part of the full per CPU serialized
@@ -272,6 +294,11 @@ static void notrace __noendbr start_secondary(void *unused)
 	 */
 	cpuhp_ap_sync_alive();
 
+	/*
+	 * Don't put *anything* except direct CPU state initialization
+	 * before cpu_init(), SMP booting is too fragile that we want to
+	 * limit the things done here to the most necessary things.
+	 */
 	cpu_init();
 	fpu__init_cpu();
 	rcutree_report_cpu_starting(raw_smp_processor_id());
-- 
2.52.0
Re: [PATCH v1] x86/smp: Set up exception handling before cr4_init()
Posted by Dave Hansen 12 hours ago
> +	/*
> +	 * AP startup assembly code has set up the following before calling
> +	 * start_secondary() on 64-bit:
> +	 *
> +	 * 1) CS set to __KERNEL_CS.
> +	 * 2) CR3 switched to the init_top_pgt.
> +	 * 3) CR4.PAE, CR4.PSE and CR4.PGE are set.
> +	 * 4) GDT set to per-CPU gdt_page.
> +	 * 5) ALL data segments set to the NULL descriptor.
> +	 * 6) MSR_GS_BASE set to per-CPU offset.
> +	 * 7) IDT set to bringup IDT.
> +	 * 8) CR0 set to CR0_STATE.
> +	 *
> +	 * So it's ready to set up exception handling.
> +	 */
>  	cpu_init_exception_handling(false);

This is fine because very little of that ^ is ever going to change. It's
not great that it duplicates the assembly logic, but it's good
documentation I guess.

> +	/*
> +	 * Ensure bits set in cr4_pinned_bits are set in CR4.
> +	 *
> +	 * cr4_pinned_bits is a subset of cr4_pinned_mask, which includes
> +	 * the following bits:
> +	 *         X86_CR4_SMEP
> +	 *         X86_CR4_SMAP
> +	 *         X86_CR4_UMIP
> +	 *         X86_CR4_FSGSBASE
> +	 *         X86_CR4_CET
> +	 *         X86_CR4_FRED
> +	 */
> +	cr4_init();

I'm not as big of a fan of this comment. The next pinned bit that gets
added will make this stale. Could we try to make this more timeless, please?
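
E.g. something that points at the mask rather than enumerating bits
(just a sketch):

	/*
	 * Enforce CR4 pinning: make sure every bit in cr4_pinned_bits
	 * (a subset of cr4_pinned_mask, see arch/x86/kernel/cpu/common.c)
	 * is set, so this AP matches the boot CPU's pinned CR4 state.
	 */
	cr4_init();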

I'm also not sure I like the asymmetry of this between the boot and
secondary CPUs. On a boot CPU, CR4.SMEP will get set via
identify_boot_cpu() and eventually setup_smep(). On a secondary CPU,
it'll get set in cr4_init() and *not* in setup_smep().

This asymmetry is (I think) part of the root of the problem here and
part of how this bug came to be.

I really think the current pinning behavior is too invasive. There is
zero benefit to pinning CR bits this early in bringup. It's only
causing pain.

I _really_ think we need a defined per-cpu point where pinning comes
into effect. Marking the CPU online is one idea.
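
Purely as an illustration (names made up), something along these lines,
called from wherever we decide the CPU becomes "online":

  static void cr4_start_pinning(void)
  {
          unsigned long cr4 = __read_cr4();

          /* From this point on the pinned bits are enforced for this CPU. */
          cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
          __write_cr4(cr4);
          this_cpu_write(cpu_tlbstate.cr4, cr4);
  }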

Thoughts?