Lets just say this took an unreasoanble amount of time and effort to track
down, when trying to move traps_init() earlier during boot.
When the SYSCALL linkage MSRs are not configured ahead of _svm_cpu_up() on the
BSP, the first context switch into PV uses svm_load_segs() and clobbers the
later-set-up linkage with the 0's cached here, causing hypercalls issues by
the PV guest to enter at 0 in supervisor mode on the user stack.
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
v4:
* New
It occurs to me that it's not actually 0's we cache here. It's whatever
context was left from prior to Xen. We still don't reliably clean unused
MSRs.
---
xen/arch/x86/hvm/svm/svm.c | 16 ++++++++++++++++
xen/arch/x86/setup.c | 2 +-
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index 18ba837738c6..f1e02d919cae 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -35,6 +35,7 @@
#include <asm/p2m.h>
#include <asm/paging.h>
#include <asm/processor.h>
+#include <asm/traps.h>
#include <asm/vm_event.h>
#include <asm/x86_emulate.h>
@@ -1581,6 +1582,21 @@ static int _svm_cpu_up(bool bsp)
/* Initialize OSVW bits to be used by guests */
svm_host_osvw_init();
+ /*
+ * VMSAVE writes out the current full FS, GS, LDTR and TR segments, and
+ * the GS_SHADOW, SYSENTER and SYSCALL linkage MSRs.
+ *
+ * The segment data gets modified by the svm_load_segs() optimisation for
+ * PV context switches, but all values get reloaded at that point, as well
+ * as during context switch from SVM.
+ *
+ * If PV guests are available (and FRED is not in use), it is critical
+ * that the SYSCALL linkage MSRs been configured at this juncture.
+ */
+ ASSERT(opt_fred >= 0); /* Confirm that FRED-ness has been resolved */
+ if ( IS_ENABLED(CONFIG_PV) && !opt_fred )
+ ASSERT(rdmsr(MSR_LSTAR));
+
svm_vmsave_pa(per_cpu(host_vmcb, cpu));
return 0;
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 27c63d1d97c9..675de3a649ea 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -2078,7 +2078,7 @@ void asmlinkage __init noreturn __start_xen(void)
&this_cpu(stubs).mfn);
BUG_ON(!this_cpu(stubs.addr));
- traps_init(); /* Needs stubs allocated. */
+ traps_init(); /* Needs stubs allocated, must be before presmp_initcalls. */
cpu_init();
--
2.39.5
On 27/02/2026 11:16 pm, Andrew Cooper wrote: > Lets just say this took an unreasoanble amount of time and effort to track > down, when trying to move traps_init() earlier during boot. > > When the SYSCALL linkage MSRs are not configured ahead of _svm_cpu_up() on the > BSP, the first context switch into PV uses svm_load_segs() and clobbers the > later-set-up linkage with the 0's cached here, causing hypercalls issues by > the PV guest to enter at 0 in supervisor mode on the user stack. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > --- > CC: Jan Beulich <JBeulich@suse.com> > CC: Roger Pau Monné <roger.pau@citrix.com> > > v4: > * New > > It occurs to me that it's not actually 0's we cache here. It's whatever > context was left from prior to Xen. We still don't reliably clean unused > MSRs. > --- > xen/arch/x86/hvm/svm/svm.c | 16 ++++++++++++++++ > xen/arch/x86/setup.c | 2 +- > 2 files changed, 17 insertions(+), 1 deletion(-) > > diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c > index 18ba837738c6..f1e02d919cae 100644 > --- a/xen/arch/x86/hvm/svm/svm.c > +++ b/xen/arch/x86/hvm/svm/svm.c > @@ -35,6 +35,7 @@ > #include <asm/p2m.h> > #include <asm/paging.h> > #include <asm/processor.h> > +#include <asm/traps.h> > #include <asm/vm_event.h> > #include <asm/x86_emulate.h> > > @@ -1581,6 +1582,21 @@ static int _svm_cpu_up(bool bsp) > /* Initialize OSVW bits to be used by guests */ > svm_host_osvw_init(); > > + /* > + * VMSAVE writes out the current full FS, GS, LDTR and TR segments, and > + * the GS_SHADOW, SYSENTER and SYSCALL linkage MSRs. > + * > + * The segment data gets modified by the svm_load_segs() optimisation for > + * PV context switches, but all values get reloaded at that point, as well > + * as during context switch from SVM. > + * > + * If PV guests are available (and FRED is not in use), it is critical > + * that the SYSCALL linkage MSRs been configured at this juncture. > + */ > + ASSERT(opt_fred >= 0); /* Confirm that FRED-ness has been resolved */ > + if ( IS_ENABLED(CONFIG_PV) && !opt_fred ) > + ASSERT(rdmsr(MSR_LSTAR)); It has occurred to me that this is subtly wrong. While FRED doesn't use LSTAR/SFMASK, it does reuse STAR. So this needs to be: if ( IS_ENABLED(CONFIG_PV) ) ASSERT(rdmsr(MSR_STAR)); with the include dropped, as the final sentence adjusted to say "even with FRED". ~Andrew
On 02.03.2026 16:20, Andrew Cooper wrote: > On 27/02/2026 11:16 pm, Andrew Cooper wrote: >> Lets just say this took an unreasoanble amount of time and effort to track >> down, when trying to move traps_init() earlier during boot. >> >> When the SYSCALL linkage MSRs are not configured ahead of _svm_cpu_up() on the >> BSP, the first context switch into PV uses svm_load_segs() and clobbers the >> later-set-up linkage with the 0's cached here, causing hypercalls issues by >> the PV guest to enter at 0 in supervisor mode on the user stack. >> >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> --- >> CC: Jan Beulich <JBeulich@suse.com> >> CC: Roger Pau Monné <roger.pau@citrix.com> >> >> v4: >> * New >> >> It occurs to me that it's not actually 0's we cache here. It's whatever >> context was left from prior to Xen. We still don't reliably clean unused >> MSRs. Actually, with this, ... >> --- a/xen/arch/x86/hvm/svm/svm.c >> +++ b/xen/arch/x86/hvm/svm/svm.c >> @@ -35,6 +35,7 @@ >> #include <asm/p2m.h> >> #include <asm/paging.h> >> #include <asm/processor.h> >> +#include <asm/traps.h> >> #include <asm/vm_event.h> >> #include <asm/x86_emulate.h> >> >> @@ -1581,6 +1582,21 @@ static int _svm_cpu_up(bool bsp) >> /* Initialize OSVW bits to be used by guests */ >> svm_host_osvw_init(); >> >> + /* >> + * VMSAVE writes out the current full FS, GS, LDTR and TR segments, and >> + * the GS_SHADOW, SYSENTER and SYSCALL linkage MSRs. >> + * >> + * The segment data gets modified by the svm_load_segs() optimisation for >> + * PV context switches, but all values get reloaded at that point, as well >> + * as during context switch from SVM. >> + * >> + * If PV guests are available (and FRED is not in use), it is critical >> + * that the SYSCALL linkage MSRs been configured at this juncture. >> + */ >> + ASSERT(opt_fred >= 0); /* Confirm that FRED-ness has been resolved */ >> + if ( IS_ENABLED(CONFIG_PV) && !opt_fred ) >> + ASSERT(rdmsr(MSR_LSTAR)); > > It has occurred to me that this is subtly wrong. While FRED doesn't use > LSTAR/SFMASK, it does reuse STAR. > > So this needs to be: > > if ( IS_ENABLED(CONFIG_PV) ) > ASSERT(rdmsr(MSR_STAR)); > > with the include dropped, as the final sentence adjusted to say "even > with FRED". ... if we inherit a non-zero value, is the assertion of much use this way? Jan
On 02/03/2026 3:34 pm, Jan Beulich wrote: > On 02.03.2026 16:20, Andrew Cooper wrote: >> On 27/02/2026 11:16 pm, Andrew Cooper wrote: >>> Lets just say this took an unreasoanble amount of time and effort to track >>> down, when trying to move traps_init() earlier during boot. >>> >>> When the SYSCALL linkage MSRs are not configured ahead of _svm_cpu_up() on the >>> BSP, the first context switch into PV uses svm_load_segs() and clobbers the >>> later-set-up linkage with the 0's cached here, causing hypercalls issues by >>> the PV guest to enter at 0 in supervisor mode on the user stack. >>> >>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >>> --- >>> CC: Jan Beulich <JBeulich@suse.com> >>> CC: Roger Pau Monné <roger.pau@citrix.com> >>> >>> v4: >>> * New >>> >>> It occurs to me that it's not actually 0's we cache here. It's whatever >>> context was left from prior to Xen. We still don't reliably clean unused >>> MSRs. > Actually, with this, ... > >>> --- a/xen/arch/x86/hvm/svm/svm.c >>> +++ b/xen/arch/x86/hvm/svm/svm.c >>> @@ -35,6 +35,7 @@ >>> #include <asm/p2m.h> >>> #include <asm/paging.h> >>> #include <asm/processor.h> >>> +#include <asm/traps.h> >>> #include <asm/vm_event.h> >>> #include <asm/x86_emulate.h> >>> >>> @@ -1581,6 +1582,21 @@ static int _svm_cpu_up(bool bsp) >>> /* Initialize OSVW bits to be used by guests */ >>> svm_host_osvw_init(); >>> >>> + /* >>> + * VMSAVE writes out the current full FS, GS, LDTR and TR segments, and >>> + * the GS_SHADOW, SYSENTER and SYSCALL linkage MSRs. >>> + * >>> + * The segment data gets modified by the svm_load_segs() optimisation for >>> + * PV context switches, but all values get reloaded at that point, as well >>> + * as during context switch from SVM. >>> + * >>> + * If PV guests are available (and FRED is not in use), it is critical >>> + * that the SYSCALL linkage MSRs been configured at this juncture. >>> + */ >>> + ASSERT(opt_fred >= 0); /* Confirm that FRED-ness has been resolved */ >>> + if ( IS_ENABLED(CONFIG_PV) && !opt_fred ) >>> + ASSERT(rdmsr(MSR_LSTAR)); >> It has occurred to me that this is subtly wrong. While FRED doesn't use >> LSTAR/SFMASK, it does reuse STAR. >> >> So this needs to be: >> >> if ( IS_ENABLED(CONFIG_PV) ) >> ASSERT(rdmsr(MSR_STAR)); >> >> with the include dropped, as the final sentence adjusted to say "even >> with FRED". > ... if we inherit a non-zero value, is the assertion of much use this way? The inherited case is normally when we're KEXEC'd into. That doesn't happen very much with Xen, and is more of a concern with Linux. But yes, IMO the assertion is still useful. CI boots from clean, so the ASSERT() will catch accidental code movement which violates the dependency. ~Andrew
On 28.02.2026 00:16, Andrew Cooper wrote: > Lets just say this took an unreasoanble amount of time and effort to track > down, when trying to move traps_init() earlier during boot. > > When the SYSCALL linkage MSRs are not configured ahead of _svm_cpu_up() on the > BSP, the first context switch into PV uses svm_load_segs() and clobbers the > later-set-up linkage with the 0's cached here, causing hypercalls issues by > the PV guest to enter at 0 in supervisor mode on the user stack. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com>
© 2016 - 2026 Red Hat, Inc.