[PATCH v4 04/14] x86/boot: Document the ordering dependency of _svm_cpu_up()

Posted by Andrew Cooper 3 days, 4 hours ago
Let's just say this took an unreasonable amount of time and effort to track
down, when trying to move traps_init() earlier during boot.

When the SYSCALL linkage MSRs are not configured ahead of _svm_cpu_up() on the
BSP, the first context switch into PV uses svm_load_segs() and clobbers the
later-set-up linkage with the 0's cached here, causing hypercalls issued by
the PV guest to enter at 0 in supervisor mode on the user stack.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>

v4:
 * New

It occurs to me that it's not actually 0's we cache here.  It's whatever
context was left from prior to Xen.  We still don't reliably clean unused
MSRs.
---
 xen/arch/x86/hvm/svm/svm.c | 16 ++++++++++++++++
 xen/arch/x86/setup.c       |  2 +-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 18ba837738c6..f1e02d919cae 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -35,6 +35,7 @@
 #include <asm/p2m.h>
 #include <asm/paging.h>
 #include <asm/processor.h>
+#include <asm/traps.h>
 #include <asm/vm_event.h>
 #include <asm/x86_emulate.h>
 
@@ -1581,6 +1582,21 @@ static int _svm_cpu_up(bool bsp)
     /* Initialize OSVW bits to be used by guests */
     svm_host_osvw_init();
 
+    /*
+     * VMSAVE writes out the current full FS, GS, LDTR and TR segments, and
+     * the GS_SHADOW, SYSENTER and SYSCALL linkage MSRs.
+     *
+     * The segment data gets modified by the svm_load_segs() optimisation for
+     * PV context switches, but all values get reloaded at that point, as well
+     * as during context switch from SVM.
+     *
+     * If PV guests are available (and FRED is not in use), it is critical
+     * that the SYSCALL linkage MSRs have been configured at this juncture.
+     */
+    ASSERT(opt_fred >= 0); /* Confirm that FRED-ness has been resolved */
+    if ( IS_ENABLED(CONFIG_PV) && !opt_fred )
+        ASSERT(rdmsr(MSR_LSTAR));
+
     svm_vmsave_pa(per_cpu(host_vmcb, cpu));
 
     return 0;
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 27c63d1d97c9..675de3a649ea 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -2078,7 +2078,7 @@ void asmlinkage __init noreturn __start_xen(void)
                                            &this_cpu(stubs).mfn);
     BUG_ON(!this_cpu(stubs.addr));
 
-    traps_init(); /* Needs stubs allocated. */
+    traps_init(); /* Needs stubs allocated, must be before presmp_initcalls. */
 
     cpu_init();
 
-- 
2.39.5


Re: [PATCH v4 04/14] x86/boot: Document the ordering dependency of _svm_cpu_up()
Posted by Andrew Cooper 12 hours ago
On 27/02/2026 11:16 pm, Andrew Cooper wrote:
> Let's just say this took an unreasonable amount of time and effort to track
> down, when trying to move traps_init() earlier during boot.
>
> When the SYSCALL linkage MSRs are not configured ahead of _svm_cpu_up() on the
> BSP, the first context switch into PV uses svm_load_segs() and clobbers the
> later-set-up linkage with the 0's cached here, causing hypercalls issued by
> the PV guest to enter at 0 in supervisor mode on the user stack.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
>
> v4:
>  * New
>
> It occurs to me that it's not actually 0's we cache here.  It's whatever
> context was left from prior to Xen.  We still don't reliably clean unused
> MSRs.
> ---
>  xen/arch/x86/hvm/svm/svm.c | 16 ++++++++++++++++
>  xen/arch/x86/setup.c       |  2 +-
>  2 files changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
> index 18ba837738c6..f1e02d919cae 100644
> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -35,6 +35,7 @@
>  #include <asm/p2m.h>
>  #include <asm/paging.h>
>  #include <asm/processor.h>
> +#include <asm/traps.h>
>  #include <asm/vm_event.h>
>  #include <asm/x86_emulate.h>
>  
> @@ -1581,6 +1582,21 @@ static int _svm_cpu_up(bool bsp)
>      /* Initialize OSVW bits to be used by guests */
>      svm_host_osvw_init();
>  
> +    /*
> +     * VMSAVE writes out the current full FS, GS, LDTR and TR segments, and
> +     * the GS_SHADOW, SYSENTER and SYSCALL linkage MSRs.
> +     *
> +     * The segment data gets modified by the svm_load_segs() optimisation for
> +     * PV context switches, but all values get reloaded at that point, as well
> +     * as during context switch from SVM.
> +     *
> +     * If PV guests are available (and FRED is not in use), it is critical
> +     * that the SYSCALL linkage MSRs have been configured at this juncture.
> +     */
> +    ASSERT(opt_fred >= 0); /* Confirm that FRED-ness has been resolved */
> +    if ( IS_ENABLED(CONFIG_PV) && !opt_fred )
> +        ASSERT(rdmsr(MSR_LSTAR));

It has occurred to me that this is subtly wrong.  While FRED doesn't use
LSTAR/SFMASK, it does reuse STAR.

So this needs to be:

    if ( IS_ENABLED(CONFIG_PV) )
        ASSERT(rdmsr(MSR_STAR));

with the include dropped, and the final sentence adjusted to say "even
with FRED".

~Andrew

Re: [PATCH v4 04/14] x86/boot: Document the ordering dependency of _svm_cpu_up()
Posted by Jan Beulich 11 hours ago
On 02.03.2026 16:20, Andrew Cooper wrote:
> On 27/02/2026 11:16 pm, Andrew Cooper wrote:
>> Let's just say this took an unreasonable amount of time and effort to track
>> down, when trying to move traps_init() earlier during boot.
>>
>> When the SYSCALL linkage MSRs are not configured ahead of _svm_cpu_up() on the
>> BSP, the first context switch into PV uses svm_load_segs() and clobbers the
>> later-set-up linkage with the 0's cached here, causing hypercalls issued by
>> the PV guest to enter at 0 in supervisor mode on the user stack.
>>
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> ---
>> CC: Jan Beulich <JBeulich@suse.com>
>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>
>> v4:
>>  * New
>>
>> It occurs to me that it's not actually 0's we cache here.  It's whatever
>> context was left from prior to Xen.  We still don't reliably clean unused
>> MSRs.

Actually, with this, ...

>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -35,6 +35,7 @@
>>  #include <asm/p2m.h>
>>  #include <asm/paging.h>
>>  #include <asm/processor.h>
>> +#include <asm/traps.h>
>>  #include <asm/vm_event.h>
>>  #include <asm/x86_emulate.h>
>>  
>> @@ -1581,6 +1582,21 @@ static int _svm_cpu_up(bool bsp)
>>      /* Initialize OSVW bits to be used by guests */
>>      svm_host_osvw_init();
>>  
>> +    /*
>> +     * VMSAVE writes out the current full FS, GS, LDTR and TR segments, and
>> +     * the GS_SHADOW, SYSENTER and SYSCALL linkage MSRs.
>> +     *
>> +     * The segment data gets modified by the svm_load_segs() optimisation for
>> +     * PV context switches, but all values get reloaded at that point, as well
>> +     * as during context switch from SVM.
>> +     *
>> +     * If PV guests are available (and FRED is not in use), it is critical
>> +     * that the SYSCALL linkage MSRs have been configured at this juncture.
>> +     */
>> +    ASSERT(opt_fred >= 0); /* Confirm that FRED-ness has been resolved */
>> +    if ( IS_ENABLED(CONFIG_PV) && !opt_fred )
>> +        ASSERT(rdmsr(MSR_LSTAR));
> 
> It has occurred to me that this is subtly wrong.  While FRED doesn't use
> LSTAR/SFMASK, it does reuse STAR.
> 
> So this needs to be:
> 
>     if ( IS_ENABLED(CONFIG_PV) )
>         ASSERT(rdmsr(MSR_STAR));
> 
> with the include dropped, and the final sentence adjusted to say "even
> with FRED".

... if we inherit a non-zero value, is the assertion of much use this way?

Jan

Re: [PATCH v4 04/14] x86/boot: Document the ordering dependency of _svm_cpu_up()
Posted by Andrew Cooper 11 hours ago
On 02/03/2026 3:34 pm, Jan Beulich wrote:
> On 02.03.2026 16:20, Andrew Cooper wrote:
>> On 27/02/2026 11:16 pm, Andrew Cooper wrote:
>>> Let's just say this took an unreasonable amount of time and effort to track
>>> down, when trying to move traps_init() earlier during boot.
>>>
>>> When the SYSCALL linkage MSRs are not configured ahead of _svm_cpu_up() on the
>>> BSP, the first context switch into PV uses svm_load_segs() and clobbers the
>>> later-set-up linkage with the 0's cached here, causing hypercalls issued by
>>> the PV guest to enter at 0 in supervisor mode on the user stack.
>>>
>>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>>> ---
>>> CC: Jan Beulich <JBeulich@suse.com>
>>> CC: Roger Pau Monné <roger.pau@citrix.com>
>>>
>>> v4:
>>>  * New
>>>
>>> It occurs to me that it's not actually 0's we cache here.  It's whatever
>>> context was left from prior to Xen.  We still don't reliably clean unused
>>> MSRs.
> Actually, with this, ...
>
>>> --- a/xen/arch/x86/hvm/svm/svm.c
>>> +++ b/xen/arch/x86/hvm/svm/svm.c
>>> @@ -35,6 +35,7 @@
>>>  #include <asm/p2m.h>
>>>  #include <asm/paging.h>
>>>  #include <asm/processor.h>
>>> +#include <asm/traps.h>
>>>  #include <asm/vm_event.h>
>>>  #include <asm/x86_emulate.h>
>>>  
>>> @@ -1581,6 +1582,21 @@ static int _svm_cpu_up(bool bsp)
>>>      /* Initialize OSVW bits to be used by guests */
>>>      svm_host_osvw_init();
>>>  
>>> +    /*
>>> +     * VMSAVE writes out the current full FS, GS, LDTR and TR segments, and
>>> +     * the GS_SHADOW, SYSENTER and SYSCALL linkage MSRs.
>>> +     *
>>> +     * The segment data gets modified by the svm_load_segs() optimisation for
>>> +     * PV context switches, but all values get reloaded at that point, as well
>>> +     * as during context switch from SVM.
>>> +     *
>>> +     * If PV guests are available (and FRED is not in use), it is critical
>>> +     * that the SYSCALL linkage MSRs have been configured at this juncture.
>>> +     */
>>> +    ASSERT(opt_fred >= 0); /* Confirm that FRED-ness has been resolved */
>>> +    if ( IS_ENABLED(CONFIG_PV) && !opt_fred )
>>> +        ASSERT(rdmsr(MSR_LSTAR));
>> It has occurred to me that this is subtly wrong.  While FRED doesn't use
>> LSTAR/SFMASK, it does reuse STAR.
>>
>> So this needs to be:
>>
>>     if ( IS_ENABLED(CONFIG_PV) )
>>         ASSERT(rdmsr(MSR_STAR));
>>
>> with the include dropped, and the final sentence adjusted to say "even
>> with FRED".
> ... if we inherit a non-zero value, is the assertion of much use this way?

The inherited case is normally when we're KEXEC'd into.  That doesn't
happen very much with Xen, and is more of a concern with Linux.

But yes, IMO the assertion is still useful.  CI boots from clean, so the
ASSERT() will catch accidental code movement which violates the dependency.

~Andrew

Re: [PATCH v4 04/14] x86/boot: Document the ordering dependency of _svm_cpu_up()
Posted by Jan Beulich 15 hours ago
On 28.02.2026 00:16, Andrew Cooper wrote:
> Let's just say this took an unreasonable amount of time and effort to track
> down, when trying to move traps_init() earlier during boot.
> 
> When the SYSCALL linkage MSRs are not configured ahead of _svm_cpu_up() on the
> BSP, the first context switch into PV uses svm_load_segs() and clobbers the
> later-set-up linkage with the 0's cached here, causing hypercalls issued by
> the PV guest to enter at 0 in supervisor mode on the user stack.
> 
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Reviewed-by: Jan Beulich <jbeulich@suse.com>