c/s 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to
alt-call") went too far with dropping NULL function pointer checks.
smp_callin() calls hvm_cpu_up() unconditionally. When the platform doesn't
support HVM, hvm_enable() exits without filling in hvm_funcs, after which the
altcall pass nukes the (now unconditional) indirect call, causing:
(XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]----
(XEN) CPU: 1
(XEN) RIP: e008:[<ffff82d04034bef5>] start_secondary+0x393/0x3b7
(XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor
...
(XEN) Xen code around <ffff82d04034bef5> (start_secondary+0x393/0x3b7):
(XEN) ff ff 8b 05 1b 84 17 00 <0f> 0b 0f ff ff 90 89 c3 85 c0 0f 84 db fe ff ff
...
(XEN) Xen call trace:
(XEN) [<ffff82d04034bef5>] R start_secondary+0x393/0x3b7
(XEN) [<ffff82d0402000e2>] F __high_start+0x42/0x60
To make matters worse, __stop_this_cpu() calls hvm_cpu_down() unconditionally
too, so what happen next is:
(XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82d04034ab02>] __stop_this_cpu+0x12/0x3c
(XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor
...
(XEN) Xen code around <ffff82d04034ab02> (__stop_this_cpu+0x12/0x3c):
(XEN) 48 89 e5 e8 8a 1d fd ff <0f> 0b 0f ff ff 90 0f 06 db e3 48 89 e0 48 0d ff
...
(XEN) Xen call trace:
(XEN) [<ffff82d04034ab02>] R __stop_this_cpu+0x12/0x3c
(XEN) [<ffff82d04034ac15>] F smp_send_stop+0xdd/0xf8
(XEN) [<ffff82d04034a229>] F machine_restart+0xa2/0x298
(XEN) [<ffff82d04034a42a>] F arch/x86/shutdown.c#__machine_restart+0xb/0x11
(XEN) [<ffff82d04022fd15>] F smp_call_function_interrupt+0xbf/0xea
(XEN) [<ffff82d04034acc6>] F call_function_interrupt+0x35/0x37
(XEN) [<ffff82d040331a70>] F do_IRQ+0xa3/0x6b5
(XEN) [<ffff82d04039482a>] F common_interrupt+0x10a/0x120
(XEN) [<ffff82d04031f649>] F __udelay+0x3a/0x51
(XEN) [<ffff82d04034d5fb>] F __cpu_up+0x48f/0x734
(XEN) [<ffff82d040203c2b>] F cpu_up+0x7d/0xde
(XEN) [<ffff82d0404543d3>] F __start_xen+0x200b/0x2618
(XEN) [<ffff82d0402000ef>] F __high_start+0x4f/0x60
which recurses until hitting a stack overflow. The #DF handler, which resets
its stack on each invocation, loops indefinitely.
Reinstate the NULL function pointer checks for hvm_cpu_{up,down}().
Fixes: 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to alt-call")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
RFC. Not tested yet on the imacted hardware. It's a Xeon PHI with another
werid thing in need of debugging. First boot is fine, while second
boot (loading microcode this time) has a problem with vmx.
I wonder if we want to modify the callers to check for HVM being enabled,
rather than leaving the NULL pointer checks in a position where they're liable
to be reaped again.
Also, the #UD handler really should identify 0f 0b 0f ff ff as the
clobbered-altcall sequence, and provide a message to that effect.
---
xen/arch/x86/include/asm/hvm/hvm.h | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/xen/arch/x86/include/asm/hvm/hvm.h b/xen/arch/x86/include/asm/hvm/hvm.h
index a441cbc22159..2dd08567f730 100644
--- a/xen/arch/x86/include/asm/hvm/hvm.h
+++ b/xen/arch/x86/include/asm/hvm/hvm.h
@@ -553,12 +553,16 @@ static inline void hvm_invlpg(struct vcpu *v, unsigned long linear)
static inline int hvm_cpu_up(void)
{
- return alternative_call(hvm_funcs.cpu_up);
+ if ( hvm_funcs.cpu_up )
+ return alternative_call(hvm_funcs.cpu_up);
+
+ return 0;
}
static inline void hvm_cpu_down(void)
{
- alternative_vcall(hvm_funcs.cpu_down);
+ if ( hvm_funcs.cpu_down )
+ alternative_vcall(hvm_funcs.cpu_down);
}
static inline unsigned int hvm_get_insn_bytes(struct vcpu *v, uint8_t *buf)
--
2.11.0
On Fri, Feb 04, 2022 at 05:34:05PM +0000, Andrew Cooper wrote: > c/s 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to > alt-call") went too far with dropping NULL function pointer checks. > > smp_callin() calls hvm_cpu_up() unconditionally. When the platform doesn't > support HVM, hvm_enable() exits without filling in hvm_funcs, after which the > altcall pass nukes the (now unconditional) indirect call, causing: > > (XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]---- > (XEN) CPU: 1 > (XEN) RIP: e008:[<ffff82d04034bef5>] start_secondary+0x393/0x3b7 > (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor > ... > (XEN) Xen code around <ffff82d04034bef5> (start_secondary+0x393/0x3b7): > (XEN) ff ff 8b 05 1b 84 17 00 <0f> 0b 0f ff ff 90 89 c3 85 c0 0f 84 db fe ff ff > ... > (XEN) Xen call trace: > (XEN) [<ffff82d04034bef5>] R start_secondary+0x393/0x3b7 > (XEN) [<ffff82d0402000e2>] F __high_start+0x42/0x60 > > To make matters worse, __stop_this_cpu() calls hvm_cpu_down() unconditionally > too, so what happen next is: > > (XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]---- > (XEN) CPU: 0 > (XEN) RIP: e008:[<ffff82d04034ab02>] __stop_this_cpu+0x12/0x3c > (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor > ... > (XEN) Xen code around <ffff82d04034ab02> (__stop_this_cpu+0x12/0x3c): > (XEN) 48 89 e5 e8 8a 1d fd ff <0f> 0b 0f ff ff 90 0f 06 db e3 48 89 e0 48 0d ff > ... > (XEN) Xen call trace: > (XEN) [<ffff82d04034ab02>] R __stop_this_cpu+0x12/0x3c > (XEN) [<ffff82d04034ac15>] F smp_send_stop+0xdd/0xf8 > (XEN) [<ffff82d04034a229>] F machine_restart+0xa2/0x298 > (XEN) [<ffff82d04034a42a>] F arch/x86/shutdown.c#__machine_restart+0xb/0x11 > (XEN) [<ffff82d04022fd15>] F smp_call_function_interrupt+0xbf/0xea > (XEN) [<ffff82d04034acc6>] F call_function_interrupt+0x35/0x37 > (XEN) [<ffff82d040331a70>] F do_IRQ+0xa3/0x6b5 > (XEN) [<ffff82d04039482a>] F common_interrupt+0x10a/0x120 > (XEN) [<ffff82d04031f649>] F __udelay+0x3a/0x51 > (XEN) [<ffff82d04034d5fb>] F __cpu_up+0x48f/0x734 > (XEN) [<ffff82d040203c2b>] F cpu_up+0x7d/0xde > (XEN) [<ffff82d0404543d3>] F __start_xen+0x200b/0x2618 > (XEN) [<ffff82d0402000ef>] F __high_start+0x4f/0x60 > > which recurses until hitting a stack overflow. The #DF handler, which resets > its stack on each invocation, loops indefinitely. > > Reinstate the NULL function pointer checks for hvm_cpu_{up,down}(). > > Fixes: 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to alt-call") > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> > --- > CC: Jan Beulich <JBeulich@suse.com> > CC: Roger Pau Monné <roger.pau@citrix.com> > CC: Wei Liu <wl@xen.org> > > RFC. Not tested yet on the imacted hardware. It's a Xeon PHI with another > werid thing in need of debugging. First boot is fine, while second > boot (loading microcode this time) has a problem with vmx. > > I wonder if we want to modify the callers to check for HVM being enabled, > rather than leaving the NULL pointer checks in a position where they're liable > to be reaped again. What about adding a couple of comments to hvm_cpu_{up,down} to note they are called unconditionally regardless of whether HVM is present or not? Thanks, Roger.
On 05.02.2022 10:47, Roger Pau Monné wrote: > On Fri, Feb 04, 2022 at 05:34:05PM +0000, Andrew Cooper wrote: >> c/s 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to >> alt-call") went too far with dropping NULL function pointer checks. Oh, I'm sorry, I should have noticed this. >> smp_callin() calls hvm_cpu_up() unconditionally. When the platform doesn't >> support HVM, hvm_enable() exits without filling in hvm_funcs, after which the >> altcall pass nukes the (now unconditional) indirect call, causing: >> >> (XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]---- >> (XEN) CPU: 1 >> (XEN) RIP: e008:[<ffff82d04034bef5>] start_secondary+0x393/0x3b7 >> (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor >> ... >> (XEN) Xen code around <ffff82d04034bef5> (start_secondary+0x393/0x3b7): >> (XEN) ff ff 8b 05 1b 84 17 00 <0f> 0b 0f ff ff 90 89 c3 85 c0 0f 84 db fe ff ff >> ... >> (XEN) Xen call trace: >> (XEN) [<ffff82d04034bef5>] R start_secondary+0x393/0x3b7 >> (XEN) [<ffff82d0402000e2>] F __high_start+0x42/0x60 >> >> To make matters worse, __stop_this_cpu() calls hvm_cpu_down() unconditionally >> too, so what happen next is: >> >> (XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]---- >> (XEN) CPU: 0 >> (XEN) RIP: e008:[<ffff82d04034ab02>] __stop_this_cpu+0x12/0x3c >> (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor >> ... >> (XEN) Xen code around <ffff82d04034ab02> (__stop_this_cpu+0x12/0x3c): >> (XEN) 48 89 e5 e8 8a 1d fd ff <0f> 0b 0f ff ff 90 0f 06 db e3 48 89 e0 48 0d ff >> ... >> (XEN) Xen call trace: >> (XEN) [<ffff82d04034ab02>] R __stop_this_cpu+0x12/0x3c >> (XEN) [<ffff82d04034ac15>] F smp_send_stop+0xdd/0xf8 >> (XEN) [<ffff82d04034a229>] F machine_restart+0xa2/0x298 >> (XEN) [<ffff82d04034a42a>] F arch/x86/shutdown.c#__machine_restart+0xb/0x11 >> (XEN) [<ffff82d04022fd15>] F smp_call_function_interrupt+0xbf/0xea >> (XEN) [<ffff82d04034acc6>] F call_function_interrupt+0x35/0x37 >> (XEN) [<ffff82d040331a70>] F do_IRQ+0xa3/0x6b5 >> (XEN) [<ffff82d04039482a>] F common_interrupt+0x10a/0x120 >> (XEN) [<ffff82d04031f649>] F __udelay+0x3a/0x51 >> (XEN) [<ffff82d04034d5fb>] F __cpu_up+0x48f/0x734 >> (XEN) [<ffff82d040203c2b>] F cpu_up+0x7d/0xde >> (XEN) [<ffff82d0404543d3>] F __start_xen+0x200b/0x2618 >> (XEN) [<ffff82d0402000ef>] F __high_start+0x4f/0x60 >> >> which recurses until hitting a stack overflow. The #DF handler, which resets >> its stack on each invocation, loops indefinitely. >> >> Reinstate the NULL function pointer checks for hvm_cpu_{up,down}(). >> >> Fixes: 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to alt-call") >> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> > > Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> Reviewed-by: Jan Beulich <jbeulich@suse.com> >> --- >> CC: Jan Beulich <JBeulich@suse.com> >> CC: Roger Pau Monné <roger.pau@citrix.com> >> CC: Wei Liu <wl@xen.org> >> >> RFC. Not tested yet on the imacted hardware. It's a Xeon PHI with another >> werid thing in need of debugging. First boot is fine, while second >> boot (loading microcode this time) has a problem with vmx. Sounds not unfamiliar: My meanwhile oldish Romley needs to be cold- booted for VMX to actually be usable (not locked) on APs. >> I wonder if we want to modify the callers to check for HVM being enabled, >> rather than leaving the NULL pointer checks in a position where they're liable >> to be reaped again. > > What about adding a couple of comments to hvm_cpu_{up,down} to note > they are called unconditionally regardless of whether HVM is present > or not? I second this as the perhaps better alternative: The S3 path is similarly affected (and you may want to mention this in the description), so this would mean up to 5 conditionals (at the source level) instead of the just two you get away with here. Jan
On 07/02/2022 08:29, Jan Beulich wrote: > On 05.02.2022 10:47, Roger Pau Monné wrote: >> On Fri, Feb 04, 2022 at 05:34:05PM +0000, Andrew Cooper wrote: >>> c/s 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to >>> alt-call") went too far with dropping NULL function pointer checks. > Oh, I'm sorry, I should have noticed this. > >>> smp_callin() calls hvm_cpu_up() unconditionally. When the platform doesn't >>> support HVM, hvm_enable() exits without filling in hvm_funcs, after which the >>> altcall pass nukes the (now unconditional) indirect call, causing: >>> >>> (XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]---- >>> (XEN) CPU: 1 >>> (XEN) RIP: e008:[<ffff82d04034bef5>] start_secondary+0x393/0x3b7 >>> (XEN) RFLAGS: 0000000000010086 CONTEXT: hypervisor >>> ... >>> (XEN) Xen code around <ffff82d04034bef5> (start_secondary+0x393/0x3b7): >>> (XEN) ff ff 8b 05 1b 84 17 00 <0f> 0b 0f ff ff 90 89 c3 85 c0 0f 84 db fe ff ff >>> ... >>> (XEN) Xen call trace: >>> (XEN) [<ffff82d04034bef5>] R start_secondary+0x393/0x3b7 >>> (XEN) [<ffff82d0402000e2>] F __high_start+0x42/0x60 >>> >>> To make matters worse, __stop_this_cpu() calls hvm_cpu_down() unconditionally >>> too, so what happen next is: >>> >>> (XEN) ----[ Xen-4.17.0-10.18-d x86_64 debug=y Not tainted ]---- >>> (XEN) CPU: 0 >>> (XEN) RIP: e008:[<ffff82d04034ab02>] __stop_this_cpu+0x12/0x3c >>> (XEN) RFLAGS: 0000000000010046 CONTEXT: hypervisor >>> ... >>> (XEN) Xen code around <ffff82d04034ab02> (__stop_this_cpu+0x12/0x3c): >>> (XEN) 48 89 e5 e8 8a 1d fd ff <0f> 0b 0f ff ff 90 0f 06 db e3 48 89 e0 48 0d ff >>> ... >>> (XEN) Xen call trace: >>> (XEN) [<ffff82d04034ab02>] R __stop_this_cpu+0x12/0x3c >>> (XEN) [<ffff82d04034ac15>] F smp_send_stop+0xdd/0xf8 >>> (XEN) [<ffff82d04034a229>] F machine_restart+0xa2/0x298 >>> (XEN) [<ffff82d04034a42a>] F arch/x86/shutdown.c#__machine_restart+0xb/0x11 >>> (XEN) [<ffff82d04022fd15>] F smp_call_function_interrupt+0xbf/0xea >>> (XEN) [<ffff82d04034acc6>] F call_function_interrupt+0x35/0x37 >>> (XEN) [<ffff82d040331a70>] F do_IRQ+0xa3/0x6b5 >>> (XEN) [<ffff82d04039482a>] F common_interrupt+0x10a/0x120 >>> (XEN) [<ffff82d04031f649>] F __udelay+0x3a/0x51 >>> (XEN) [<ffff82d04034d5fb>] F __cpu_up+0x48f/0x734 >>> (XEN) [<ffff82d040203c2b>] F cpu_up+0x7d/0xde >>> (XEN) [<ffff82d0404543d3>] F __start_xen+0x200b/0x2618 >>> (XEN) [<ffff82d0402000ef>] F __high_start+0x4f/0x60 >>> >>> which recurses until hitting a stack overflow. The #DF handler, which resets >>> its stack on each invocation, loops indefinitely. >>> >>> Reinstate the NULL function pointer checks for hvm_cpu_{up,down}(). >>> >>> Fixes: 27a63cdac388 ("x86/HVM: convert remaining hvm_funcs hook invocations to alt-call") >>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> >> Reviewed-by: Roger Pau Monné <roger.pau@citrix.com> > Reviewed-by: Jan Beulich <jbeulich@suse.com> Thanks. > >>> --- >>> CC: Jan Beulich <JBeulich@suse.com> >>> CC: Roger Pau Monné <roger.pau@citrix.com> >>> CC: Wei Liu <wl@xen.org> >>> >>> RFC. Not tested yet on the imacted hardware. It's a Xeon PHI with another >>> werid thing in need of debugging. First boot is fine, while second >>> boot (loading microcode this time) has a problem with vmx. > Sounds not unfamiliar: My meanwhile oldish Romley needs to be cold- > booted for VMX to actually be usable (not locked) on APs. This is something which goes wrong as a consequence of loading microcode. >>> I wonder if we want to modify the callers to check for HVM being enabled, >>> rather than leaving the NULL pointer checks in a position where they're liable >>> to be reaped again. >> What about adding a couple of comments to hvm_cpu_{up,down} to note >> they are called unconditionally regardless of whether HVM is present >> or not? > I second this as the perhaps better alternative: The S3 path is > similarly affected (and you may want to mention this in the > description), so this would mean up to 5 conditionals (at the > source level) instead of the just two you get away with here. Ok. I've added: /* Called in boot/resume paths. Must cope with no HVM support. */ and: /* Called in shutdown paths. Must cope with no HVM support. */ ~Andrew
© 2016 - 2024 Red Hat, Inc.