[PATCH] x86/boot: Disable interrupts when establishing SSP

Andrew Cooper posted 1 patch 1 day, 18 hours ago
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20260501191028.1250225-1-andrew.cooper3@citrix.com
xen/arch/x86/setup.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)
[PATCH] x86/boot: Disable interrupts when establishing SSP
Posted by Andrew Cooper 1 day, 18 hours ago
Gitlab CI reported a crash on boot on Alder Lake hardware.  The bug is years
old, making it an incredibly rare occurance:

  (XEN) *** DOUBLE FAULT ***
  (XEN) ----[ Xen-4.22-unstable  x86_64  debug=y ubsan=y  Not tainted ]----
  (XEN) CPU:    0
  (XEN) RIP:    e008:[<ffff82d04077bbc4>] arch/x86/setup.c#reinit_bsp_stack+0xfa/0x160
  (XEN) RFLAGS: 0000000000010202   CONTEXT: hypervisor
  (XEN) rax: 0000000000000007   rbx: ffff83049a4b0000   rcx: 00000000000006a2
  (XEN) rdx: 0000000000000000   rsi: 0000000000000000   rdi: 0000000000000000
  (XEN) rbp: ffff83049a4b7f00   rsp: ffff83049a4b7ef8   r8:  ffff830497e47000
  (XEN) r9:  00000000ffffffff   r10: 00000000900c2121   r11: 000000009a392956
  (XEN) r12: ffff830497e47000   r13: ffff830497e49f40   r14: 0000000000000000
  (XEN) r15: ffff82d0407dad10   cr0: 0000000080050033   cr4: 0000000000f526e0
  (XEN) cr3: 0000000043c16000   cr2: fffffffffffffffc
  (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
  (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
  (XEN) Xen code around <ffff82d04077bbc4> (arch/x86/setup.c#reinit_bsp_stack+0xfa/0x160):
  (XEN)  00 b9 a2 06 00 00 0f 30 <80> 3d 71 26 f1 ff 00 74 3e 48 8d 93 f8 5f 00 00
  (XEN) Valid stack range: ffff83049a4b6000-ffff83049a4b8000, sp=ffff83049a4b7ef8, tss.rsp0=ffff83049a4b7fb0
  (XEN) No stack overflow detected. Skipping stack trace.
  (XEN)
  (XEN) ****************************************
  (XEN) Panic on CPU 0:
  (XEN) DOUBLE FAULT -- system shutdown
  (XEN) ****************************************

This is on the instruction boundary after enabling CET (writing MSR_S_CET) and
prior to establishing SSP.  Despite identifying this as a critical window
where any fault was deadly (the CPU tries to push a shadow stack frame at 0,
hence the CR2 value wrapping around to the top of the address space), I
clearly forgot that this meant interrupts too, which are enabled.

Along with regular interrupts, NMIs are a problem.  Unlike other cases needing
NMI safety, we can't use a self NMI and callback, as the stack needs to be
empty at the point of enabling Shadow Stacks.

Disable interrupts, and turn off the watchdog if it's configured.

Note that watchdog_{en,dis}able() do not work here.  They cause the watchdog
NMI to be ignored; they do not inhibit the generation of NMIs.

Fixes: b60ab42db2f0 ("x86/shstk: Activate Supervisor Shadow Stacks")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <jbeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Teddy Astie <teddy.astie@vates.tech>

The only way I can think of doing this in NMI context is to have the NMI LRET
off the NMI stack back to the interrupted context.  But this is horrible to
arrange, not to mention different between IDT and FRED.

Also, the {disable,setup}_lapic_nmi_watchdog() API is horrible but I don't
have time to make it sane right now, and this needs backporting a long way.

https://gitlab.com/xen-project/hardware/xen-staging/-/pipelines/2494069238
---
 xen/arch/x86/setup.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index d041cbd5f6f1..95ac36beab37 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -908,6 +908,19 @@ static void __init noreturn reinit_bsp_stack(void)
 
     if ( cpu_has_xen_shstk )
     {
+        bool watchdog = (nmi_watchdog == NMI_LOCAL_APIC);
+
+        /*
+         * Between enabling CET and establishing SSP, any fault or interrupt
+         * is fatal.  We must arrange for none to happen.
+         *
+         * TODO: Figure out how to perform CET enablement in NMI context,
+         *       given the constraint that the stack must be empty.
+         */
+        if ( watchdog )
+            disable_lapic_nmi_watchdog();
+        local_irq_disable();
+
         wrmsrl(MSR_S_CET, xen_msr_s_cet_value());
 
         /*
@@ -932,6 +945,13 @@ static void __init noreturn reinit_bsp_stack(void)
         }
         else
             asm volatile ( "setssbsy" ::: "memory" );
+
+        local_irq_enable();
+        if ( watchdog )
+        {
+            nmi_watchdog = NMI_LOCAL_APIC;
+            setup_apic_nmi_watchdog();
+        }
     }
 
     reset_stack_and_jump(init_done);

base-commit: 61f957d48c78df6c5254b6f54d6170d3bd3d717e
-- 
2.39.5