[PATCH v2] x86/smpboot: Don't unconditionally call memguard_guard_stack() in cpu_smpboot_alloc()

Andrew Cooper posted 1 patch 3 years, 6 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/xen tags/patchew/20201014184708.17758-1-andrew.cooper3@citrix.com
xen/arch/x86/smpboot.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
[PATCH v2] x86/smpboot: Don't unconditionally call memguard_guard_stack() in cpu_smpboot_alloc()
Posted by Andrew Cooper 3 years, 6 months ago
cpu_smpboot_alloc() is designed to be idempotent with respect to partially
initialised state.  This occurs for S3 and CPU parking, where enough state to
handle NMIs/#MCs needs to remain valid for the entire lifetime of Xen, even
when we otherwise want to offline the CPU.

For simplicity between various configuration, Xen always uses shadow stack
mappings (Read-only + Dirty) for the guard page, irrespective of whether
CET-SS is enabled.

Unfortunately, the CET-SS changes in memguard_guard_stack() broke idempotency
by first writing out the supervisor shadow stack tokens with plain writes,
then changing the mapping to being read-only.

This ordering is strictly necessary to configure the BSP, which cannot have
the supervisor tokens be written with WRSS.

Instead of calling memguard_guard_stack() unconditionally, call it only when
actually allocating a new stack.  Xenheap allocates are guaranteed to be
writeable, and the net result is idempotency WRT configuring stack_base[].

Fixes: 91d26ed304f ("x86/shstk: Create shadow stacks")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>

This can more easily be demonstrated with CPU hotplug than S3, and the absence
of bug reports goes to show how rarely hotplug is used.

v2:
 * Don't break S3/CPU parking in combination with CET-SS.  v1 would, for S3,
   turn the BSP shadow stack into regular mappings, and #DF as soon as the TLB
   shootdown completes.  For CPU Parking, it would invalidate the shadow stack
   of the parked CPUs, causing a #DF on the next NMI/#MC to hit the thread.
---
 xen/arch/x86/smpboot.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/smpboot.c b/xen/arch/x86/smpboot.c
index 5708573c41..67e727cebd 100644
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -997,16 +997,18 @@ static int cpu_smpboot_alloc(unsigned int cpu)
         memflags = MEMF_node(node);
 
     if ( stack_base[cpu] == NULL )
+    {
         stack_base[cpu] = alloc_xenheap_pages(STACK_ORDER, memflags);
-    if ( stack_base[cpu] == NULL )
-        goto out;
+        if ( !stack_base[cpu] )
+            goto out;
+
+        memguard_guard_stack(stack_base[cpu]);
+    }
 
     info = get_cpu_info_from_stack((unsigned long)stack_base[cpu]);
     info->processor_id = cpu;
     info->per_cpu_offset = __per_cpu_offset[cpu];
 
-    memguard_guard_stack(stack_base[cpu]);
-
     gdt = per_cpu(gdt, cpu) ?: alloc_xenheap_pages(0, memflags);
     if ( gdt == NULL )
         goto out;
-- 
2.11.0


Re: [PATCH v2] x86/smpboot: Don't unconditionally call memguard_guard_stack() in cpu_smpboot_alloc()
Posted by Jan Beulich 3 years, 6 months ago
On 14.10.2020 20:47, Andrew Cooper wrote:
> cpu_smpboot_alloc() is designed to be idempotent with respect to partially
> initialised state.  This occurs for S3 and CPU parking, where enough state to
> handle NMIs/#MCs needs to remain valid for the entire lifetime of Xen, even
> when we otherwise want to offline the CPU.
> 
> For simplicity between various configuration, Xen always uses shadow stack
> mappings (Read-only + Dirty) for the guard page, irrespective of whether
> CET-SS is enabled.
> 
> Unfortunately, the CET-SS changes in memguard_guard_stack() broke idempotency
> by first writing out the supervisor shadow stack tokens with plain writes,
> then changing the mapping to being read-only.
> 
> This ordering is strictly necessary to configure the BSP, which cannot have
> the supervisor tokens be written with WRSS.
> 
> Instead of calling memguard_guard_stack() unconditionally, call it only when
> actually allocating a new stack.  Xenheap allocates are guaranteed to be
> writeable, and the net result is idempotency WRT configuring stack_base[].
> 
> Fixes: 91d26ed304f ("x86/shstk: Create shadow stacks")
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> 
> This can more easily be demonstrated with CPU hotplug than S3, and the absence
> of bug reports goes to show how rarely hotplug is used.
> 
> v2:
>  * Don't break S3/CPU parking in combination with CET-SS.  v1 would, for S3,
>    turn the BSP shadow stack into regular mappings, and #DF as soon as the TLB
>    shootdown completes.

The code change looks correct to me, but since I don't understand
this part I'm afraid I may be overlooking something. I understand
the "turn the BSP shadow stack into regular mappings" relates to
cpu_smpboot_free()'s call to memguard_unguard_stack(), but I
didn't think we come through cpu_smpboot_free() for the BSP upon
entering or leaving S3.

Jan

Re: [PATCH v2] x86/smpboot: Don't unconditionally call memguard_guard_stack() in cpu_smpboot_alloc()
Posted by Andrew Cooper 3 years, 6 months ago
On 15/10/2020 09:50, Jan Beulich wrote:
> On 14.10.2020 20:47, Andrew Cooper wrote:
>> cpu_smpboot_alloc() is designed to be idempotent with respect to partially
>> initialised state.  This occurs for S3 and CPU parking, where enough state to
>> handle NMIs/#MCs needs to remain valid for the entire lifetime of Xen, even
>> when we otherwise want to offline the CPU.
>>
>> For simplicity between various configuration, Xen always uses shadow stack
>> mappings (Read-only + Dirty) for the guard page, irrespective of whether
>> CET-SS is enabled.
>>
>> Unfortunately, the CET-SS changes in memguard_guard_stack() broke idempotency
>> by first writing out the supervisor shadow stack tokens with plain writes,
>> then changing the mapping to being read-only.
>>
>> This ordering is strictly necessary to configure the BSP, which cannot have
>> the supervisor tokens be written with WRSS.
>>
>> Instead of calling memguard_guard_stack() unconditionally, call it only when
>> actually allocating a new stack.  Xenheap allocates are guaranteed to be
>> writeable, and the net result is idempotency WRT configuring stack_base[].
>>
>> Fixes: 91d26ed304f ("x86/shstk: Create shadow stacks")
>> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>> CC: Wei Liu <wl@xen.org>
>>
>> This can more easily be demonstrated with CPU hotplug than S3, and the absence
>> of bug reports goes to show how rarely hotplug is used.
>>
>> v2:
>>  * Don't break S3/CPU parking in combination with CET-SS.  v1 would, for S3,
>>    turn the BSP shadow stack into regular mappings, and #DF as soon as the TLB
>>    shootdown completes.
> The code change looks correct to me, but since I don't understand
> this part I'm afraid I may be overlooking something. I understand
> the "turn the BSP shadow stack into regular mappings" relates to
> cpu_smpboot_free()'s call to memguard_unguard_stack(), but I
> didn't think we come through cpu_smpboot_free() for the BSP upon
> entering or leaving S3.

The v1 really did fix Marek's repro of the problem.

The only possible way this can occur is if, somewhere, there is a call
to cpu_smpboot_free() for CPU0 with remove=0 on the S3 path

I have to admit that I can't actually spot where it is.


Either way - it doesn't impact the fix, which attempts to make "the
stack" into a single object.  I experimented with introducing
smpboot_{alloc,free}_stack(), but the result wasn't clean and I
abandoned that approach.

~Andrew

Re: [PATCH v2] x86/smpboot: Don't unconditionally call memguard_guard_stack() in cpu_smpboot_alloc()
Posted by Jan Beulich 3 years, 6 months ago
On 15.10.2020 16:02, Andrew Cooper wrote:
> On 15/10/2020 09:50, Jan Beulich wrote:
>> On 14.10.2020 20:47, Andrew Cooper wrote:
>>> cpu_smpboot_alloc() is designed to be idempotent with respect to partially
>>> initialised state.  This occurs for S3 and CPU parking, where enough state to
>>> handle NMIs/#MCs needs to remain valid for the entire lifetime of Xen, even
>>> when we otherwise want to offline the CPU.
>>>
>>> For simplicity between various configuration, Xen always uses shadow stack
>>> mappings (Read-only + Dirty) for the guard page, irrespective of whether
>>> CET-SS is enabled.
>>>
>>> Unfortunately, the CET-SS changes in memguard_guard_stack() broke idempotency
>>> by first writing out the supervisor shadow stack tokens with plain writes,
>>> then changing the mapping to being read-only.
>>>
>>> This ordering is strictly necessary to configure the BSP, which cannot have
>>> the supervisor tokens be written with WRSS.
>>>
>>> Instead of calling memguard_guard_stack() unconditionally, call it only when
>>> actually allocating a new stack.  Xenheap allocates are guaranteed to be
>>> writeable, and the net result is idempotency WRT configuring stack_base[].
>>>
>>> Fixes: 91d26ed304f ("x86/shstk: Create shadow stacks")
>>> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>> CC: Jan Beulich <JBeulich@suse.com>
>>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>> CC: Wei Liu <wl@xen.org>
>>>
>>> This can more easily be demonstrated with CPU hotplug than S3, and the absence
>>> of bug reports goes to show how rarely hotplug is used.
>>>
>>> v2:
>>>  * Don't break S3/CPU parking in combination with CET-SS.  v1 would, for S3,
>>>    turn the BSP shadow stack into regular mappings, and #DF as soon as the TLB
>>>    shootdown completes.
>> The code change looks correct to me, but since I don't understand
>> this part I'm afraid I may be overlooking something. I understand
>> the "turn the BSP shadow stack into regular mappings" relates to
>> cpu_smpboot_free()'s call to memguard_unguard_stack(), but I
>> didn't think we come through cpu_smpboot_free() for the BSP upon
>> entering or leaving S3.
> 
> The v1 really did fix Marek's repro of the problem.
> 
> The only possible way this can occur is if, somewhere, there is a call
> to cpu_smpboot_free() for CPU0 with remove=0 on the S3 path

I didn't think it was the BSP's stack that got written to, but the
first AP's before letting it run.

Jan

Re: [PATCH v2] x86/smpboot: Don't unconditionally call memguard_guard_stack() in cpu_smpboot_alloc()
Posted by Andrew Cooper 3 years, 6 months ago
On 15/10/2020 16:16, Jan Beulich wrote:
> On 15.10.2020 16:02, Andrew Cooper wrote:
>> On 15/10/2020 09:50, Jan Beulich wrote:
>>> On 14.10.2020 20:47, Andrew Cooper wrote:
>>>> cpu_smpboot_alloc() is designed to be idempotent with respect to partially
>>>> initialised state.  This occurs for S3 and CPU parking, where enough state to
>>>> handle NMIs/#MCs needs to remain valid for the entire lifetime of Xen, even
>>>> when we otherwise want to offline the CPU.
>>>>
>>>> For simplicity between various configuration, Xen always uses shadow stack
>>>> mappings (Read-only + Dirty) for the guard page, irrespective of whether
>>>> CET-SS is enabled.
>>>>
>>>> Unfortunately, the CET-SS changes in memguard_guard_stack() broke idempotency
>>>> by first writing out the supervisor shadow stack tokens with plain writes,
>>>> then changing the mapping to being read-only.
>>>>
>>>> This ordering is strictly necessary to configure the BSP, which cannot have
>>>> the supervisor tokens be written with WRSS.
>>>>
>>>> Instead of calling memguard_guard_stack() unconditionally, call it only when
>>>> actually allocating a new stack.  Xenheap allocates are guaranteed to be
>>>> writeable, and the net result is idempotency WRT configuring stack_base[].
>>>>
>>>> Fixes: 91d26ed304f ("x86/shstk: Create shadow stacks")
>>>> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>> ---
>>>> CC: Jan Beulich <JBeulich@suse.com>
>>>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>>> CC: Wei Liu <wl@xen.org>
>>>>
>>>> This can more easily be demonstrated with CPU hotplug than S3, and the absence
>>>> of bug reports goes to show how rarely hotplug is used.
>>>>
>>>> v2:
>>>>  * Don't break S3/CPU parking in combination with CET-SS.  v1 would, for S3,
>>>>    turn the BSP shadow stack into regular mappings, and #DF as soon as the TLB
>>>>    shootdown completes.
>>> The code change looks correct to me, but since I don't understand
>>> this part I'm afraid I may be overlooking something. I understand
>>> the "turn the BSP shadow stack into regular mappings" relates to
>>> cpu_smpboot_free()'s call to memguard_unguard_stack(), but I
>>> didn't think we come through cpu_smpboot_free() for the BSP upon
>>> entering or leaving S3.
>> The v1 really did fix Marek's repro of the problem.
>>
>> The only possible way this can occur is if, somewhere, there is a call
>> to cpu_smpboot_free() for CPU0 with remove=0 on the S3 path
> I didn't think it was the BSP's stack that got written to, but the
> first AP's before letting it run.

Oh yes - my analysis was wrong.  The CPU notifier for CPU 1 to come up
runs on CPU 0.

So only the --- text was wrong.  Are you happy with the fix now?

~Andrew

Re: [PATCH v2] x86/smpboot: Don't unconditionally call memguard_guard_stack() in cpu_smpboot_alloc()
Posted by Jan Beulich 3 years, 6 months ago
On 15.10.2020 18:38, Andrew Cooper wrote:
> On 15/10/2020 16:16, Jan Beulich wrote:
>> On 15.10.2020 16:02, Andrew Cooper wrote:
>>> On 15/10/2020 09:50, Jan Beulich wrote:
>>>> On 14.10.2020 20:47, Andrew Cooper wrote:
>>>>> cpu_smpboot_alloc() is designed to be idempotent with respect to partially
>>>>> initialised state.  This occurs for S3 and CPU parking, where enough state to
>>>>> handle NMIs/#MCs needs to remain valid for the entire lifetime of Xen, even
>>>>> when we otherwise want to offline the CPU.
>>>>>
>>>>> For simplicity between various configuration, Xen always uses shadow stack
>>>>> mappings (Read-only + Dirty) for the guard page, irrespective of whether
>>>>> CET-SS is enabled.
>>>>>
>>>>> Unfortunately, the CET-SS changes in memguard_guard_stack() broke idempotency
>>>>> by first writing out the supervisor shadow stack tokens with plain writes,
>>>>> then changing the mapping to being read-only.
>>>>>
>>>>> This ordering is strictly necessary to configure the BSP, which cannot have
>>>>> the supervisor tokens be written with WRSS.
>>>>>
>>>>> Instead of calling memguard_guard_stack() unconditionally, call it only when
>>>>> actually allocating a new stack.  Xenheap allocates are guaranteed to be
>>>>> writeable, and the net result is idempotency WRT configuring stack_base[].
>>>>>
>>>>> Fixes: 91d26ed304f ("x86/shstk: Create shadow stacks")
>>>>> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>>>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>>>> ---
>>>>> CC: Jan Beulich <JBeulich@suse.com>
>>>>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>>>> CC: Wei Liu <wl@xen.org>
>>>>>
>>>>> This can more easily be demonstrated with CPU hotplug than S3, and the absence
>>>>> of bug reports goes to show how rarely hotplug is used.
>>>>>
>>>>> v2:
>>>>>  * Don't break S3/CPU parking in combination with CET-SS.  v1 would, for S3,
>>>>>    turn the BSP shadow stack into regular mappings, and #DF as soon as the TLB
>>>>>    shootdown completes.
>>>> The code change looks correct to me, but since I don't understand
>>>> this part I'm afraid I may be overlooking something. I understand
>>>> the "turn the BSP shadow stack into regular mappings" relates to
>>>> cpu_smpboot_free()'s call to memguard_unguard_stack(), but I
>>>> didn't think we come through cpu_smpboot_free() for the BSP upon
>>>> entering or leaving S3.
>>> The v1 really did fix Marek's repro of the problem.
>>>
>>> The only possible way this can occur is if, somewhere, there is a call
>>> to cpu_smpboot_free() for CPU0 with remove=0 on the S3 path
>> I didn't think it was the BSP's stack that got written to, but the
>> first AP's before letting it run.
> 
> Oh yes - my analysis was wrong.  The CPU notifier for CPU 1 to come up
> runs on CPU 0.
> 
> So only the --- text was wrong.  Are you happy with the fix now?

Indeed I am:
Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan