cpu_smpboot_alloc() is designed to be idempotent with respect to partially
initialised state. This occurs for S3 and CPU parking, where enough state to
handle NMIs/#MCs needs to remain valid for the entire lifetime of Xen, even
when we otherwise want to offline the CPU.

For simplicity across the various configurations, Xen always uses shadow stack
mappings (Read-only + Dirty) for the guard page, irrespective of whether
CET-SS is enabled.

Unfortunately, the CET-SS changes in memguard_guard_stack() broke idempotency
by first writing out the supervisor shadow stack tokens with plain writes,
then changing the mapping to be read-only. A second call on the same stack
therefore attempts a plain write to a page which is already read-only.

This ordering is strictly necessary to configure the BSP, which cannot have
the supervisor tokens be written with WRSS.

Instead of calling memguard_guard_stack() unconditionally, call it only when
actually allocating a new stack. Xenheap allocations are guaranteed to be
writeable, and the net result is idempotency WRT configuring stack_base[].

Fixes: 91d26ed304f ("x86/shstk: Create shadow stacks")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

This can more easily be demonstrated with CPU hotplug than S3, and the absence
of bug reports goes to show how rarely hotplug is used.

v2:
 * Don't break S3/CPU parking in combination with CET-SS.  v1 would, for S3,
   turn the BSP shadow stack into regular mappings, and #DF as soon as the TLB
   shootdown completes.  For CPU Parking, it would invalidate the shadow stack
   of the parked CPUs, causing a #DF on the next NMI/#MC to hit the thread.
---
xen/arch/x86/smpboot.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 5708573c41..67e727cebd 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -997,16 +997,18 @@ static int cpu_smpboot_alloc(unsigned int cpu)
         memflags = MEMF_node(node);
 
     if ( stack_base[cpu] == NULL )
+    {
         stack_base[cpu] = alloc_xenheap_pages(STACK_ORDER, memflags);
-    if ( stack_base[cpu] == NULL )
-        goto out;
+        if ( !stack_base[cpu] )
+            goto out;
+
+        memguard_guard_stack(stack_base[cpu]);
+    }
 
     info = get_cpu_info_from_stack((unsigned long)stack_base[cpu]);
     info->processor_id = cpu;
     info->per_cpu_offset = __per_cpu_offset[cpu];
 
-    memguard_guard_stack(stack_base[cpu]);
-
     gdt = per_cpu(gdt, cpu) ?: alloc_xenheap_pages(0, memflags);
     if ( gdt == NULL )
         goto out;
--
2.11.0
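
For readers following the thread, the idempotency problem described in the
commit message can be modelled in standalone C. This is only a toy sketch and
not Xen code: struct toy_stack, toy_guard_stack(), the two toy_alloc_*()
helpers, the faults counter and the token value are all hypothetical names
invented for illustration, and the "mapping" is reduced to a boolean. It
assumes only what the commit message states, namely that guarding performs a
plain write and then makes the page read-only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4

/* Toy stand-in for a per-CPU stack (hypothetical, not a Xen structure). */
struct toy_stack {
    bool read_only;      /* models the guard page mapping being read-only */
    unsigned long token; /* models the supervisor shadow stack token */
};

static struct toy_stack *stack_base[NR_CPUS];

/* Counts plain writes attempted against an already read-only page. */
static unsigned int faults;

/* Models the ordering in memguard_guard_stack(): plain write, then R/O. */
static void toy_guard_stack(struct toy_stack *s)
{
    if ( s->read_only )
        faults++;          /* real hardware would fault on this write */
    else
        s->token = 0x1234; /* arbitrary illustrative value */

    s->read_only = true;
}

/* Pre-fix shape: the stack is guarded on every call. */
static int toy_alloc_unconditional(unsigned int cpu)
{
    if ( stack_base[cpu] == NULL )
        stack_base[cpu] = calloc(1, sizeof(*stack_base[cpu]));
    if ( stack_base[cpu] == NULL )
        return -1;

    toy_guard_stack(stack_base[cpu]);

    return 0;
}

/* Post-fix shape: only a freshly allocated stack is guarded. */
static int toy_alloc_idempotent(unsigned int cpu)
{
    if ( stack_base[cpu] == NULL )
    {
        stack_base[cpu] = calloc(1, sizeof(*stack_base[cpu]));
        if ( !stack_base[cpu] )
            return -1;

        toy_guard_stack(stack_base[cpu]);
    }

    return 0;
}
```

In this model, calling toy_alloc_unconditional() twice for the same CPU bumps
faults on the second call, because the page is already read-only when the
plain write is attempted. Calling toy_alloc_idempotent() twice leaves faults
untouched, which is the property the patch restores for stack_base[].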

On 14.10.2020 20:47, Andrew Cooper wrote:
> v2:
>  * Don't break S3/CPU parking in combination with CET-SS.  v1 would, for S3,
>    turn the BSP shadow stack into regular mappings, and #DF as soon as the TLB
>    shootdown completes.

The code change looks correct to me, but since I don't understand
this part I'm afraid I may be overlooking something. I understand
the "turn the BSP shadow stack into regular mappings" relates to
cpu_smpboot_free()'s call to memguard_unguard_stack(), but I
didn't think we come through cpu_smpboot_free() for the BSP upon
entering or leaving S3.

Jan

On 15/10/2020 09:50, Jan Beulich wrote:
> On 14.10.2020 20:47, Andrew Cooper wrote:
>> v2:
>>  * Don't break S3/CPU parking in combination with CET-SS.  v1 would, for S3,
>>    turn the BSP shadow stack into regular mappings, and #DF as soon as the TLB
>>    shootdown completes.
> The code change looks correct to me, but since I don't understand
> this part I'm afraid I may be overlooking something. I understand
> the "turn the BSP shadow stack into regular mappings" relates to
> cpu_smpboot_free()'s call to memguard_unguard_stack(), but I
> didn't think we come through cpu_smpboot_free() for the BSP upon
> entering or leaving S3.

The v1 really did fix Marek's repro of the problem.

The only possible way this can occur is if, somewhere, there is a call
to cpu_smpboot_free() for CPU0 with remove=0 on the S3 path.

I have to admit that I can't actually spot where it is.

Either way - it doesn't impact the fix, which attempts to make "the
stack" into a single object. I experimented with introducing
smpboot_{alloc,free}_stack(), but the result wasn't clean and I
abandoned that approach.

~Andrew

On 15.10.2020 16:02, Andrew Cooper wrote:
> On 15/10/2020 09:50, Jan Beulich wrote:
>> I understand
>> the "turn the BSP shadow stack into regular mappings" relates to
>> cpu_smpboot_free()'s call to memguard_unguard_stack(), but I
>> didn't think we come through cpu_smpboot_free() for the BSP upon
>> entering or leaving S3.
>
> The v1 really did fix Marek's repro of the problem.
>
> The only possible way this can occur is if, somewhere, there is a call
> to cpu_smpboot_free() for CPU0 with remove=0 on the S3 path

I didn't think it was the BSP's stack that got written to, but the
first AP's before letting it run.

Jan

On 15/10/2020 16:16, Jan Beulich wrote:
> On 15.10.2020 16:02, Andrew Cooper wrote:
>> The v1 really did fix Marek's repro of the problem.
>>
>> The only possible way this can occur is if, somewhere, there is a call
>> to cpu_smpboot_free() for CPU0 with remove=0 on the S3 path
> I didn't think it was the BSP's stack that got written to, but the
> first AP's before letting it run.

Oh yes - my analysis was wrong.  The CPU notifier for CPU 1 to come up
runs on CPU 0.

So only the --- text was wrong.  Are you happy with the fix now?

~Andrew

On 15.10.2020 18:38, Andrew Cooper wrote:
> On 15/10/2020 16:16, Jan Beulich wrote:
>> I didn't think it was the BSP's stack that got written to, but the
>> first AP's before letting it run.
>
> Oh yes - my analysis was wrong.  The CPU notifier for CPU 1 to come up
> runs on CPU 0.
>
> So only the --- text was wrong.  Are you happy with the fix now?

Indeed I am:
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan