[PATCH] x86/S3: Restore CR4 earlier during resume

Andrew Cooper posted 1 patch 3 years, 6 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/xen tags/patchew/20201002213650.2197-1-andrew.cooper3@citrix.com
xen/arch/x86/acpi/power.c       | 3 ---
xen/arch/x86/acpi/wakeup_prot.S | 5 +++++
2 files changed, 5 insertions(+), 3 deletions(-)
[PATCH] x86/S3: Restore CR4 earlier during resume
Posted by Andrew Cooper 3 years, 6 months ago
c/s 4304ff420e5 "x86/S3: Drop {save,restore}_rest_processor_state()
completely" moved CR4 restoration up into C, to account for the fact that MCE
was explicitly handled later.

However, time_resume() ends up making an EFI Runtime Service call, and EFI
explodes without OSFXSR, presumably when trying to spill %xmm registers onto
the stack.

Given this codepath, and the potential for other issues of a similar kind (TLB
flushing vs INVPCID, HVM logic vs VMXE, etc), restore CR4 in asm before
entering C.

Ignore the previous MCE special case, because its not actually necessary.  The
handler is already suitably configured from before suspend.

Fixes: 4304ff420e5 ("x86/S3: Drop {save,restore}_rest_processor_state() completely")
Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

This is one definite bug fix.  It doesn't appear to be the only S3 bug
however.
---
 xen/arch/x86/acpi/power.c       | 3 ---
 xen/arch/x86/acpi/wakeup_prot.S | 5 +++++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 4fb1e7a148..7f162a4df9 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -276,9 +276,6 @@ static int enter_state(u32 state)
 
     mcheck_init(&boot_cpu_data, false);
 
-    /* Restore CR4 from cached value, now MCE is set up. */
-    write_cr4(read_cr4());
-
     printk(XENLOG_INFO "Finishing wakeup from ACPI S%d state.\n", state);
 
     if ( (state == ACPI_STATE_S3) && error )
diff --git a/xen/arch/x86/acpi/wakeup_prot.S b/xen/arch/x86/acpi/wakeup_prot.S
index c6b3fcc93d..1ee5551fb5 100644
--- a/xen/arch/x86/acpi/wakeup_prot.S
+++ b/xen/arch/x86/acpi/wakeup_prot.S
@@ -110,6 +110,11 @@ ENTRY(s3_resume)
 
         call    load_system_tables
 
+        /* Restore CR4 from the cpuinfo block. */
+        GET_STACK_END(bx)
+        mov     STACK_CPUINFO_FIELD(cr4)(%rbx), %rax
+        mov     %rax, %cr4
+
 .Lsuspend_err:
         pop     %r15
         pop     %r14
-- 
2.11.0


Re: [PATCH] x86/S3: Restore CR4 earlier during resume
Posted by Jan Beulich 3 years, 6 months ago
On 02.10.2020 23:36, Andrew Cooper wrote:
> c/s 4304ff420e5 "x86/S3: Drop {save,restore}_rest_processor_state()
> completely" moved CR4 restoration up into C, to account for the fact that MCE
> was explicitly handled later.
> 
> However, time_resume() ends up making an EFI Runtime Service call, and EFI
> explodes without OSFXSR, presumably when trying to spill %xmm registers onto
> the stack.
> 
> Given this codepath, and the potential for other issues of a similar kind (TLB
> flushing vs INVPCID, HVM logic vs VMXE, etc), restore CR4 in asm before
> entering C.
> 
> Ignore the previous MCE special case, because its not actually necessary.  The
> handler is already suitably configured from before suspend.

Are you suggesting we could drop the call to mcheck_init() altogether?

> Fixes: 4304ff420e5 ("x86/S3: Drop {save,restore}_rest_processor_state() completely")
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

Jan

Re: [PATCH] x86/S3: Restore CR4 earlier during resume
Posted by Andrew Cooper 3 years, 6 months ago
On 04/10/2020 08:38, Jan Beulich wrote:
> On 02.10.2020 23:36, Andrew Cooper wrote:
>> c/s 4304ff420e5 "x86/S3: Drop {save,restore}_rest_processor_state()
>> completely" moved CR4 restoration up into C, to account for the fact that MCE
>> was explicitly handled later.
>>
>> However, time_resume() ends up making an EFI Runtime Service call, and EFI
>> explodes without OSFXSR, presumably when trying to spill %xmm registers onto
>> the stack.
>>
>> Given this codepath, and the potential for other issues of a similar kind (TLB
>> flushing vs INVPCID, HVM logic vs VMXE, etc), restore CR4 in asm before
>> entering C.
>>
>> Ignore the previous MCE special case, because its not actually necessary.  The
>> handler is already suitably configured from before suspend.
> Are you suggesting we could drop the call to mcheck_init() altogether?

Not completely.  It reconfigures some of the MCE bank controls, which
probably won't survive S3, but the #MC handler itself is fully intact
once the IDT is re-established.

It probably wants splitting in two, but I think some part of it needs to
remain.

>
>> Fixes: 4304ff420e5 ("x86/S3: Drop {save,restore}_rest_processor_state() completely")
>> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>

Thanks,

~Andrew

Re: [PATCH] x86/S3: Restore CR4 earlier during resume
Posted by Marek Marczykowski-Górecki 3 years, 6 months ago
On Fri, Oct 02, 2020 at 10:36:50PM +0100, Andrew Cooper wrote:
> c/s 4304ff420e5 "x86/S3: Drop {save,restore}_rest_processor_state()
> completely" moved CR4 restoration up into C, to account for the fact that MCE
> was explicitly handled later.
> 
> However, time_resume() ends up making an EFI Runtime Service call, and EFI
> explodes without OSFXSR, presumably when trying to spill %xmm registers onto
> the stack.
> 
> Given this codepath, and the potential for other issues of a similar kind (TLB
> flushing vs INVPCID, HVM logic vs VMXE, etc), restore CR4 in asm before
> entering C.
> 
> Ignore the previous MCE special case, because its not actually necessary.  The
> handler is already suitably configured from before suspend.
> 
> Fixes: 4304ff420e5 ("x86/S3: Drop {save,restore}_rest_processor_state() completely")
> Reported-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

This one doesn't build, wakeup_prot.S misses #include <asm/asm_defns.h>
. With that fixed:

Tested-by: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
> 
> This is one definite bug fix.  It doesn't appear to be the only S3 bug
> however.
> ---
>  xen/arch/x86/acpi/power.c       | 3 ---
>  xen/arch/x86/acpi/wakeup_prot.S | 5 +++++
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
> index 4fb1e7a148..7f162a4df9 100644
> --- a/xen/arch/x86/acpi/power.c
> +++ b/xen/arch/x86/acpi/power.c
> @@ -276,9 +276,6 @@ static int enter_state(u32 state)
>  
>      mcheck_init(&boot_cpu_data, false);
>  
> -    /* Restore CR4 from cached value, now MCE is set up. */
> -    write_cr4(read_cr4());
> -
>      printk(XENLOG_INFO "Finishing wakeup from ACPI S%d state.\n", state);
>  
>      if ( (state == ACPI_STATE_S3) && error )
> diff --git a/xen/arch/x86/acpi/wakeup_prot.S b/xen/arch/x86/acpi/wakeup_prot.S
> index c6b3fcc93d..1ee5551fb5 100644
> --- a/xen/arch/x86/acpi/wakeup_prot.S
> +++ b/xen/arch/x86/acpi/wakeup_prot.S
> @@ -110,6 +110,11 @@ ENTRY(s3_resume)
>  
>          call    load_system_tables
>  
> +        /* Restore CR4 from the cpuinfo block. */
> +        GET_STACK_END(bx)
> +        mov     STACK_CPUINFO_FIELD(cr4)(%rbx), %rax
> +        mov     %rax, %cr4
> +
>  .Lsuspend_err:
>          pop     %r15
>          pop     %r14

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?