[Xen-devel] [PATCH] x86: store cr4 during suspend/resume

Roger Pau Monne posted 1 patch 4 years, 4 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/xen tags/patchew/20191209180638.47305-1-roger.pau@citrix.com
xen/arch/x86/acpi/wakeup_prot.S | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
[Xen-devel] [PATCH] x86: store cr4 during suspend/resume
Posted by Roger Pau Monne 4 years, 4 months ago
Currently cr4 is not cached before suspension, and mmu_cr4_features is
used in order to restore the expected cr4 value. This is correct so
far because the tasklet that executes the suspend/resume code is
running in the idle vCPU context.

In order to make the code less fragile, explicitly save the current
cr4 value before suspension, so that it can be restored afterwards.
This ensures that the cr4 value cached in the cpu_info doesn't get out
of sync after resume from suspension.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
---
 xen/arch/x86/acpi/wakeup_prot.S | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/acpi/wakeup_prot.S b/xen/arch/x86/acpi/wakeup_prot.S
index 74261cb4f1..57431e4e2d 100644
--- a/xen/arch/x86/acpi/wakeup_prot.S
+++ b/xen/arch/x86/acpi/wakeup_prot.S
@@ -40,6 +40,9 @@ ENTRY(do_suspend_lowlevel)
         mov     %cr3, GREG(ax)
         mov     GREG(ax), REF(saved_cr3)
 
+        mov     %cr4, GREG(ax)
+        mov     GREG(ax), REF(saved_cr4)
+
         call    save_rest_processor_state
 
         mov     $3, %rdi
@@ -53,8 +56,7 @@ ENTRY(do_suspend_lowlevel)
 ENTRY(__ret_point)
         lgdt    boot_gdtr(%rip)
 
-        /* mmu_cr4_features contains latest cr4 setting */
-        mov     REF(mmu_cr4_features), GREG(ax)
+        mov     REF(saved_cr4), GREG(ax)
         mov     GREG(ax), %cr4
 
         mov     REF(saved_cr3), GREG(ax)
@@ -124,3 +126,4 @@ DECLARE_GREG(15)
 
 saved_cr0:      .quad   0
 saved_cr3:      .quad   0
+saved_cr4:      .quad   0
-- 
2.24.0


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86: store cr4 during suspend/resume
Posted by Jan Beulich 4 years, 4 months ago
On 09.12.2019 19:06, Roger Pau Monne wrote:
> Currently cr4 is not cached before suspension, and mmu_cr4_features is
> used in order to restore the expected cr4 value. This is correct so
> far because the tasklet that executes the suspend/resume code is
> running in the idle vCPU context.
> 
> In order to make the code less fragile, explicitly save the current
> cr4 value before suspension, so that it can be restored afterwards.
> This ensures that the cr4 value cached in the cpu_info doesn't get out
> of sync after resume from suspension.
> 
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86: store cr4 during suspend/resume
Posted by Andrew Cooper 4 years, 4 months ago
On 09/12/2019 18:06, Roger Pau Monne wrote:
> Currently cr4 is not cached before suspension, and mmu_cr4_features is
> used in order to restore the expected cr4 value. This is correct so
> far because the tasklet that executes the suspend/resume code is
> running in the idle vCPU context.
>
> In order to make the code less fragile, explicitly save the current
> cr4 value before suspension, so that it can be restored afterwards.
> This ensures that the cr4 value cached in the cpu_info doesn't get out
> of sync after resume from suspension.
>
> Suggested-by: Jan Beulich <jbeulich@suse.com>
> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>

Why?  There is nothing fragile here.  Suspend/resume is always in idle
context and loads of other logic already depends on this.

I've been slowly stripping out redundant saved state like this.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86: store cr4 during suspend/resume
Posted by Jan Beulich 4 years, 4 months ago
On 10.12.2019 10:59, Andrew Cooper wrote:
> On 09/12/2019 18:06, Roger Pau Monne wrote:
>> Currently cr4 is not cached before suspension, and mmu_cr4_features is
>> used in order to restore the expected cr4 value. This is correct so
>> far because the tasklet that executes the suspend/resume code is
>> running in the idle vCPU context.
>>
>> In order to make the code less fragile, explicitly save the current
>> cr4 value before suspension, so that it can be restored afterwards.
>> This ensures that the cr4 value cached in the cpu_info doesn't get out
>> of sync after resume from suspension.
>>
>> Suggested-by: Jan Beulich <jbeulich@suse.com>
>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
> 
> Why?  There is nothing fragile here.  Suspend/resume is always in idle
> context and loads of other logic already depends on this.
> 
> I've been slowly stripping out redundant saved state like this.

Where it's clearly redundant, this is fine. But I don't think it's
sufficiently clear here, and going back to what was there before
is imo generally less error prone than going to some fixed state.
Furthermore I was hoping we could eventually do away with
mmu_cr4_features.

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86: store cr4 during suspend/resume
Posted by Andrew Cooper 4 years, 4 months ago
On 10/12/2019 10:07, Jan Beulich wrote:
> On 10.12.2019 10:59, Andrew Cooper wrote:
>> On 09/12/2019 18:06, Roger Pau Monne wrote:
>>> Currently cr4 is not cached before suspension, and mmu_cr4_features is
>>> used in order to restore the expected cr4 value. This is correct so
>>> far because the tasklet that executes the suspend/resume code is
>>> running in the idle vCPU context.
>>>
>>> In order to make the code less fragile, explicitly save the current
>>> cr4 value before suspension, so that it can be restored afterwards.
>>> This ensures that the cr4 value cached in the cpu_info doesn't get out
>>> of sync after resume from suspension.
>>>
>>> Suggested-by: Jan Beulich <jbeulich@suse.com>
>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>> Why?  There is nothing fragile here.  Suspend/resume is always in idle
>> context and loads of other logic already depends on this.
>>
>> I've been slowly stripping out redundant saved state like this.
> Where it's clearly redundant, this is fine. But I don't think it's
> sufficiently clear here

There is a reason I made it explicitly crystal clear with c/s 87e7b4d5b

> , and going back to what was there before
> is imo generally less error prone than going to some fixed state.

It is demonstrably more error prone, which is why I'm slowly killing it.

Stashing state wastes unnecessary space, and adds an error case where
state is either stashed incorrectly, or gets modified before restore,
and we'll blindly use.

Two examples of real bugs caused by this are c/s 0c30171cb and 4ee0ad72d

Absolutely nothing remaining in suspend.c should be spilled.  It can all
be (re)caluclated from the same information source as the AP boot path,
and the result will be strictly smaller in RAM, and more robust.

> Furthermore I was hoping we could eventually do away with
> mmu_cr4_features.

How do you plan on doing this?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86: store cr4 during suspend/resume
Posted by Jan Beulich 4 years, 4 months ago
On 11.12.2019 20:36, Andrew Cooper wrote:
> On 10/12/2019 10:07, Jan Beulich wrote:
>> On 10.12.2019 10:59, Andrew Cooper wrote:
>>> On 09/12/2019 18:06, Roger Pau Monne wrote:
>>>> Currently cr4 is not cached before suspension, and mmu_cr4_features is
>>>> used in order to restore the expected cr4 value. This is correct so
>>>> far because the tasklet that executes the suspend/resume code is
>>>> running in the idle vCPU context.
>>>>
>>>> In order to make the code less fragile, explicitly save the current
>>>> cr4 value before suspension, so that it can be restored afterwards.
>>>> This ensures that the cr4 value cached in the cpu_info doesn't get out
>>>> of sync after resume from suspension.
>>>>
>>>> Suggested-by: Jan Beulich <jbeulich@suse.com>
>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>> Why?  There is nothing fragile here.  Suspend/resume is always in idle
>>> context and loads of other logic already depends on this.
>>>
>>> I've been slowly stripping out redundant saved state like this.
>> Where it's clearly redundant, this is fine. But I don't think it's
>> sufficiently clear here
> 
> There is a reason I made it explicitly crystal clear with c/s 87e7b4d5b

Well, this makes clear we're in idle context, yes. But there's
still a disconnect between this and the use of mmu_cr4_features
(up to and including the somewhat misleading comment saying
"mmu_cr4_features contains latest cr4 setting" without it really
being clear what "latest" means, now that we run with varying
CR4 values. Yes, write_ptbase() does use the variable when
switching to idle, but the variable name lack any connection to
this fact.

>> , and going back to what was there before
>> is imo generally less error prone than going to some fixed state.
> 
> It is demonstrably more error prone, which is why I'm slowly killing it.
> 
> Stashing state wastes unnecessary space, and adds an error case where
> state is either stashed incorrectly, or gets modified before restore,
> and we'll blindly use.

The waste of space is entirely secondary here, I think. A value
getting modified before restore is no different from a value
going out of sync with the variable we reload from. It's a blind
use in either case.

> Two examples of real bugs caused by this are c/s 0c30171cb and 4ee0ad72d

I see your point for the former, but the latter seems to be unrelated.

> Absolutely nothing remaining in suspend.c should be spilled.  It can all
> be (re)caluclated from the same information source as the AP boot path,
> and the result will be strictly smaller in RAM, and more robust.

Robustness to me would imply using the same code for doing the
calculations, not re-calculating from the same information source.
This could be as simple as an idle_cr4() wrapper around the read
of mmu_cr4_features for the case at hand (suitably used wherever
applicable).

Anyway - together with your subsequent mail I accept your objections.
Once the code changes proposed there have gone in, I think it'll
become more clear to readers that indeed state saving/restoring is to
be the exception, not the rule (current code doesn't give this
impression, I think).

Jan

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86: store cr4 during suspend/resume
Posted by Andrew Cooper 4 years, 4 months ago
On 12/12/2019 09:52, Jan Beulich wrote:
> On 11.12.2019 20:36, Andrew Cooper wrote:
>> On 10/12/2019 10:07, Jan Beulich wrote:
>>> On 10.12.2019 10:59, Andrew Cooper wrote:
>>>> On 09/12/2019 18:06, Roger Pau Monne wrote:
>>>>> Currently cr4 is not cached before suspension, and mmu_cr4_features is
>>>>> used in order to restore the expected cr4 value. This is correct so
>>>>> far because the tasklet that executes the suspend/resume code is
>>>>> running in the idle vCPU context.
>>>>>
>>>>> In order to make the code less fragile, explicitly save the current
>>>>> cr4 value before suspension, so that it can be restored afterwards.
>>>>> This ensures that the cr4 value cached in the cpu_info doesn't get out
>>>>> of sync after resume from suspension.
>>>>>
>>>>> Suggested-by: Jan Beulich <jbeulich@suse.com>
>>>>> Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
>>>> Why?  There is nothing fragile here.  Suspend/resume is always in idle
>>>> context and loads of other logic already depends on this.
>>>>
>>>> I've been slowly stripping out redundant saved state like this.
>>> Where it's clearly redundant, this is fine. But I don't think it's
>>> sufficiently clear here
>> There is a reason I made it explicitly crystal clear with c/s 87e7b4d5b
> Well, this makes clear we're in idle context, yes. But there's
> still a disconnect between this and the use of mmu_cr4_features
> (up to and including the somewhat misleading comment saying
> "mmu_cr4_features contains latest cr4 setting" without it really
> being clear what "latest" means, now that we run with varying
> CR4 values. Yes, write_ptbase() does use the variable when
> switching to idle, but the variable name lack any connection to
> this fact.

The name of the variable should probably be improved - I'm not a fan of
its current name.

Perhaps this seems more obvious to me than others because I did all the
work for XSA-293, but the commit message of c/s b2dd00574a4 is relevant:

> First of all, modify write_ptbase() to always use mmu_cr4_features for
> IDLE
> and HVM contexts.  mmu_cr4_features *is* the correct value to use, and
> makes
> the ASSERT() obviously redundant.

and the same applies to S3 path.

>
>>> , and going back to what was there before
>>> is imo generally less error prone than going to some fixed state.
>> It is demonstrably more error prone, which is why I'm slowly killing it.
>>
>> Stashing state wastes unnecessary space, and adds an error case where
>> state is either stashed incorrectly, or gets modified before restore,
>> and we'll blindly use.
> The waste of space is entirely secondary here, I think. A value
> getting modified before restore is no different from a value
> going out of sync with the variable we reload from. It's a blind
> use in either case.
>
>> Two examples of real bugs caused by this are c/s 0c30171cb and 4ee0ad72d
> I see your point for the former, but the latter seems to be unrelated.

Oh - quite right.  I wasn't paying quite enough attention when doing
archaeology.

>
>> Absolutely nothing remaining in suspend.c should be spilled.  It can all
>> be (re)caluclated from the same information source as the AP boot path,
>> and the result will be strictly smaller in RAM, and more robust.
> Robustness to me would imply using the same code for doing the
> calculations, not re-calculating from the same information source.
> This could be as simple as an idle_cr4() wrapper around the read
> of mmu_cr4_features for the case at hand (suitably used wherever
> applicable).
>
> Anyway - together with your subsequent mail I accept your objections.
> Once the code changes proposed there have gone in, I think it'll
> become more clear to readers that indeed state saving/restoring is to
> be the exception, not the rule (current code doesn't give this
> impression, I think).

It was all inherited from Linux, and is slowly being stripped out there
as well.

I'll try and do some more cleanup in some free time.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel
Re: [Xen-devel] [PATCH] x86: store cr4 during suspend/resume
Posted by Andrew Cooper 4 years, 4 months ago
On 11/12/2019 19:36, Andrew Cooper wrote:
> Absolutely nothing remaining in suspend.c should be spilled.  It can all
> be (re)caluclated from the same information source as the AP boot path,
> and the result will be strictly smaller in RAM, and more robust.

And at a cursory glance, the logic in suspend.c doesn't correctly handle
the ler=1 case.  This can trivially be fixed by:

diff --git a/xen/arch/x86/acpi/suspend.c b/xen/arch/x86/acpi/suspend.c
index c9dea67bf3..50dbb14528 100644
--- a/xen/arch/x86/acpi/suspend.c
+++ b/xen/arch/x86/acpi/suspend.c
@@ -15,8 +15,6 @@
 #include <asm/xstate.h>
 #include <xen/hypercall.h>
 
-static unsigned long saved_lstar, saved_cstar;
-static unsigned long saved_sysenter_esp, saved_sysenter_eip;
 static unsigned long saved_fs_base, saved_gs_base, saved_kernel_gs_base;
 static uint64_t saved_xcr0;
 
@@ -24,15 +22,6 @@ void save_rest_processor_state(void)
 {
     saved_fs_base = rdfsbase();
     saved_gs_base = rdgsbase();
-    rdmsrl(MSR_SHADOW_GS_BASE, saved_kernel_gs_base);
-    rdmsrl(MSR_CSTAR, saved_cstar);
-    rdmsrl(MSR_LSTAR, saved_lstar);
-
-    if ( cpu_has_sep )
-    {
-        rdmsrl(MSR_IA32_SYSENTER_ESP, saved_sysenter_esp);
-        rdmsrl(MSR_IA32_SYSENTER_EIP, saved_sysenter_eip);
-    }
 
     if ( cpu_has_xsave )
         saved_xcr0 = get_xcr0();
@@ -42,25 +31,12 @@ void save_rest_processor_state(void)
 void restore_rest_processor_state(void)
 {
     load_system_tables();
-
-    /* Recover syscall MSRs */
-    wrmsrl(MSR_LSTAR, saved_lstar);
-    wrmsrl(MSR_CSTAR, saved_cstar);
-    wrmsrl(MSR_STAR, XEN_MSR_STAR);
-    wrmsrl(MSR_SYSCALL_MASK, XEN_SYSCALL_MASK);
+    percpu_traps_init();
 
     wrfsbase(saved_fs_base);
     wrgsbase(saved_gs_base);
     wrmsrl(MSR_SHADOW_GS_BASE, saved_kernel_gs_base);
 
-    if ( cpu_has_sep )
-    {
-        /* Recover sysenter MSRs */
-        wrmsrl(MSR_IA32_SYSENTER_ESP, saved_sysenter_esp);
-        wrmsrl(MSR_IA32_SYSENTER_EIP, saved_sysenter_eip);
-        wrmsr(MSR_IA32_SYSENTER_CS, __HYPERVISOR_CS, 0);
-    }
-
     if ( cpu_has_xsave && !set_xcr0(saved_xcr0) )
         BUG();
 

for starters.

The fs/gs/shadow values are guest-only state so can just be discarded.

xcr0 isn't needed but this code is latently(soon-to-be?) buggy by not
handling XSS at the same time.  Both should follow the BSP logic for
evaluating the system default.

That lets us delete save_rest_processor_state() entirely.

In the assembly side of things, ss and cr3 obviously don't need to be
spilled.  ss is always 0, and we're already on the correct pagetable. 
We're also probably on the correct cr0.

The caller-clobbered GPRs can just be discarded, and that gets our
stashed state down to almost nothing.

I trust I have made my point blindingly obvious.

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel