xen/arch/x86/hvm/svm/entry.S | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
There is a corner case in the VMRUN instruction where its INTR_SHADOW state
leaks into guest state if a VMExit occurs before the VMRUN is complete. An
example of this could be taking #NPF due to event injection.
Xen can safely execute STI anywhere between CLGI and VMRUN, as CLGI blocks
external interrupts too. Move the STI to the other end of the block, which
moves the VMRUN instruction outside of STI's shadow.
Link: https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4uzJQ@mail.gmail.com/
Fixes: 66b245d9eaeb ("SVM: limit GIF=0 region")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
I'm reasonbly sure this will trigger reliably during LogDirty because of how
we do misconfig propagation.
It's also mostly benign; from the guest's point of view, a pending interrupt
will be delayed by one instruction. Hence, not tagged for 4.20 at this
juncture.
---
xen/arch/x86/hvm/svm/entry.S | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
index 6fd9652c04a1..c710464673f0 100644
--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -57,6 +57,14 @@ __UNLIKELY_END(nsvm_hap)
clgi
+ /*
+ * Set EFLAGS.IF, after CLGI covers us from real interrupts, but not
+ * immediately prior to VMRUN. AMD CPUs leak Xen's INTR_SHADOW from
+ * the STI into guest state if a VMExit occurs during VMEntry
+ * (e.g. taking #NPF during event injecting.)
+ */
+ sti
+
/* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
/* SPEC_CTRL_EXIT_TO_SVM Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
.macro svm_vmentry_spec_ctrl
@@ -91,7 +99,6 @@ __UNLIKELY_END(nsvm_hap)
pop %rsi
pop %rdi
- sti
vmrun
SAVE_ALL
base-commit: 414dde38b0cf8a38230c8c3f9e8564da9762e743
--
2.39.5
On 17.02.2025 17:12, Andrew Cooper wrote: > There is a corner case in the VMRUN instruction where its INTR_SHADOW state > leaks into guest state if a VMExit occurs before the VMRUN is complete. An > example of this could be taking #NPF due to event injection. Ouch. > --- a/xen/arch/x86/hvm/svm/entry.S > +++ b/xen/arch/x86/hvm/svm/entry.S > @@ -57,6 +57,14 @@ __UNLIKELY_END(nsvm_hap) > > clgi > > + /* > + * Set EFLAGS.IF, after CLGI covers us from real interrupts, but not > + * immediately prior to VMRUN. AMD CPUs leak Xen's INTR_SHADOW from > + * the STI into guest state if a VMExit occurs during VMEntry > + * (e.g. taking #NPF during event injecting.) > + */ > + sti > + > /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */ > /* SPEC_CTRL_EXIT_TO_SVM Req: b=curr %rsp=regs/cpuinfo, Clob: acd */ > .macro svm_vmentry_spec_ctrl I'm mildly worried to see it moved this high up. Any exception taken in this exit code would consider the system to have interrupts enabled, when we have more restrictive handling for the IF=0 case. Could we meet in the middle and have STI before we start popping registers off the stack, but after all the speculation machinery? Jan
On 17/02/2025 4:51 pm, Jan Beulich wrote:
> On 17.02.2025 17:12, Andrew Cooper wrote:
>> There is a corner case in the VMRUN instruction where its INTR_SHADOW state
>> leaks into guest state if a VMExit occurs before the VMRUN is complete. An
>> example of this could be taking #NPF due to event injection.
> Ouch.
Yeah. Intel go out of their way to make VM{LAUNCH,RESUME} fail if
they're executed in a shadow.
>
>> --- a/xen/arch/x86/hvm/svm/entry.S
>> +++ b/xen/arch/x86/hvm/svm/entry.S
>> @@ -57,6 +57,14 @@ __UNLIKELY_END(nsvm_hap)
>>
>> clgi
>>
>> + /*
>> + * Set EFLAGS.IF, after CLGI covers us from real interrupts, but not
>> + * immediately prior to VMRUN. AMD CPUs leak Xen's INTR_SHADOW from
>> + * the STI into guest state if a VMExit occurs during VMEntry
>> + * (e.g. taking #NPF during event injecting.)
>> + */
>> + sti
>> +
>> /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
>> /* SPEC_CTRL_EXIT_TO_SVM Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
>> .macro svm_vmentry_spec_ctrl
> I'm mildly worried to see it moved this high up. Any exception taken in
> this exit code would consider the system to have interrupts enabled, when
> we have more restrictive handling for the IF=0 case. Could we meet in the
> middle and have STI before we start popping registers off the stack, but
> after all the speculation machinery?
Any exception taken here is fatal, and going to fail in weird ways.
e.g. we don't clean up GIF before entering the crash kernel.
But yes, we probably should take steps to avoid the interrupted context
from looking even more weird than usual.
I'll put it above the line of pops. They're going to turn into a single
macro when I can dust off that series.
~Andrew
On 17.02.2025 18:40, Andrew Cooper wrote:
> On 17/02/2025 4:51 pm, Jan Beulich wrote:
>> On 17.02.2025 17:12, Andrew Cooper wrote:
>>> There is a corner case in the VMRUN instruction where its INTR_SHADOW state
>>> leaks into guest state if a VMExit occurs before the VMRUN is complete. An
>>> example of this could be taking #NPF due to event injection.
>> Ouch.
>
> Yeah. Intel go out of their way to make VM{LAUNCH,RESUME} fail if
> they're executed in a shadow.
>
>>
>>> --- a/xen/arch/x86/hvm/svm/entry.S
>>> +++ b/xen/arch/x86/hvm/svm/entry.S
>>> @@ -57,6 +57,14 @@ __UNLIKELY_END(nsvm_hap)
>>>
>>> clgi
>>>
>>> + /*
>>> + * Set EFLAGS.IF, after CLGI covers us from real interrupts, but not
>>> + * immediately prior to VMRUN. AMD CPUs leak Xen's INTR_SHADOW from
>>> + * the STI into guest state if a VMExit occurs during VMEntry
>>> + * (e.g. taking #NPF during event injecting.)
>>> + */
>>> + sti
>>> +
>>> /* WARNING! `ret`, `call *`, `jmp *` not safe beyond this point. */
>>> /* SPEC_CTRL_EXIT_TO_SVM Req: b=curr %rsp=regs/cpuinfo, Clob: acd */
>>> .macro svm_vmentry_spec_ctrl
>> I'm mildly worried to see it moved this high up. Any exception taken in
>> this exit code would consider the system to have interrupts enabled, when
>> we have more restrictive handling for the IF=0 case. Could we meet in the
>> middle and have STI before we start popping registers off the stack, but
>> after all the speculation machinery?
>
> Any exception taken here is fatal, and going to fail in weird ways.
> e.g. we don't clean up GIF before entering the crash kernel.
>
> But yes, we probably should take steps to avoid the interrupted context
> from looking even more weird than usual.
>
> I'll put it above the line of pops. They're going to turn into a single
> macro when I can dust off that series.
Then:
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Jan
There is a corner case in the VMRUN instruction where its INTR_SHADOW state
leaks into guest state if a VMExit occurs before the VMRUN is complete. An
example of this could be taking #NPF due to event injection.
Xen can safely execute STI anywhere between CLGI and VMRUN, as CLGI blocks
external interrupts too. However, an exception (while fatal) will appear to
be in an irqs-on region (as GIF isn't considered), so position the STI after
the speculation actions but prior to the GPR pops.
Link: https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4uzJQ@mail.gmail.com/
Fixes: 66b245d9eaeb ("SVM: limit GIF=0 region")
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
---
v2:
* Move after the speculation actions.
Emailed out just for completeness. I've queued it in my for-4.21 branch.
---
xen/arch/x86/hvm/svm/entry.S | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/xen/arch/x86/hvm/svm/entry.S b/xen/arch/x86/hvm/svm/entry.S
index 6fd9652c04a1..91edb3345938 100644
--- a/xen/arch/x86/hvm/svm/entry.S
+++ b/xen/arch/x86/hvm/svm/entry.S
@@ -74,6 +74,14 @@ __UNLIKELY_END(nsvm_hap)
ALTERNATIVE "", svm_vmentry_spec_ctrl, X86_FEATURE_SC_MSR_HVM
ALTERNATIVE "", DO_SPEC_CTRL_DIV, X86_FEATURE_SC_DIV
+ /*
+ * Set EFLAGS.IF after CLGI covers us from real interrupts, but not
+ * immediately prior to VMRUN. The VMRUN instruction leaks it's
+ * INTR_SHADOW into guest state if a VMExit occurs before VMRUN
+ * completes (e.g. taking #NPF during event injecting.)
+ */
+ sti
+
pop %r15
pop %r14
pop %r13
@@ -91,7 +99,6 @@ __UNLIKELY_END(nsvm_hap)
pop %rsi
pop %rdi
- sti
vmrun
SAVE_ALL
base-commit: 81f8b1dd9407e4a3d9dc058b7fbbc591168649ad
prerequisite-patch-id: 838271a62e35fbc5b6696a9d331ba09fd1b54328
prerequisite-patch-id: e597e6f0f303962d4829ab0b8601daa5db15a9e6
prerequisite-patch-id: 2d19885bdacc98098fc44caf68fcd25ac1f19596
--
2.39.5
On 18.02.2025 15:37, Andrew Cooper wrote:
> There is a corner case in the VMRUN instruction where its INTR_SHADOW state
> leaks into guest state if a VMExit occurs before the VMRUN is complete. An
> example of this could be taking #NPF due to event injection.
>
> Xen can safely execute STI anywhere between CLGI and VMRUN, as CLGI blocks
> external interrupts too. However, an exception (while fatal) will appear to
> be in an irqs-on region (as GIF isn't considered), so position the STI after
> the speculation actions but prior to the GPR pops.
>
> Link: https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4uzJQ@mail.gmail.com/
> Fixes: 66b245d9eaeb ("SVM: limit GIF=0 region")
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Reviewed-by: Jan Beulich <jbeulich@suse.com>
> ---
> v2:
> * Move after the speculation actions.
>
> Emailed out just for completeness. I've queued it in my for-4.21 branch.
It'll want backporting, so I wonder if we should persuade Oleksii into
taking it for 4.20.
Jan
On 18/02/2025 2:42 pm, Jan Beulich wrote:
> On 18.02.2025 15:37, Andrew Cooper wrote:
>> There is a corner case in the VMRUN instruction where its INTR_SHADOW state
>> leaks into guest state if a VMExit occurs before the VMRUN is complete. An
>> example of this could be taking #NPF due to event injection.
>>
>> Xen can safely execute STI anywhere between CLGI and VMRUN, as CLGI blocks
>> external interrupts too. However, an exception (while fatal) will appear to
>> be in an irqs-on region (as GIF isn't considered), so position the STI after
>> the speculation actions but prior to the GPR pops.
>>
>> Link: https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4uzJQ@mail.gmail.com/
>> Fixes: 66b245d9eaeb ("SVM: limit GIF=0 region")
>> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Reviewed-by: Jan Beulich <jbeulich@suse.com>
>> ---
>> v2:
>> * Move after the speculation actions.
>>
>> Emailed out just for completeness. I've queued it in my for-4.21 branch.
> It'll want backporting, so I wonder if we should persuade Oleksii into
> taking it for 4.20.
If Oleksii is happy, I can put it into 4.20.
~Andrew
On 2/18/25 3:45 PM, Andrew Cooper wrote:
> On 18/02/2025 2:42 pm, Jan Beulich wrote:
>> On 18.02.2025 15:37, Andrew Cooper wrote:
>>> There is a corner case in the VMRUN instruction where its INTR_SHADOW state
>>> leaks into guest state if a VMExit occurs before the VMRUN is complete. An
>>> example of this could be taking #NPF due to event injection.
>>>
>>> Xen can safely execute STI anywhere between CLGI and VMRUN, as CLGI blocks
>>> external interrupts too. However, an exception (while fatal) will appear to
>>> be in an irqs-on region (as GIF isn't considered), so position the STI after
>>> the speculation actions but prior to the GPR pops.
>>>
>>> Link:https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4uzJQ@mail.gmail.com/
>>> Fixes: 66b245d9eaeb ("SVM: limit GIF=0 region")
>>> Signed-off-by: Andrew Cooper<andrew.cooper3@citrix.com>
>>> Reviewed-by: Jan Beulich<jbeulich@suse.com>
>>> ---
>>> v2:
>>> * Move after the speculation actions.
>>>
>>> Emailed out just for completeness. I've queued it in my for-4.21 branch.
>> It'll want backporting, so I wonder if we should persuade Oleksii into
>> taking it for 4.20.
Based on that ...
> If Oleksii is happy, I can put it into 4.20.
... Release-Acked-by: Oleksii Kurochko<oleksii.kurochko@gmail.com>
Thanks.
~ Oleksii
© 2016 - 2025 Red Hat, Inc.