Rework the handling of the MMIO Stale Data mitigation to clear CPU buffers
immediately prior to VM-Enter, i.e. in the same location that KVM emits a
VERW for unconditional (at runtime) clearing. Co-locating the code and
using a single ALTERNATIVE_2 makes it more obvious how VMX mitigates the
various vulnerabilities.
Deliberately order the alternatives as:
0. Do nothing
1. Clear if vCPU can access MMIO
2. Clear always
since the last alternative wins in ALTERNATIVE_2(), i.e. so that KVM will
honor the strictest mitigation (always clear CPU buffers) if multiple
mitigations are selected. E.g. even if the kernel chooses to mitigate
MMIO Stale Data via X86_FEATURE_CLEAR_CPU_BUF_MMIO, some other mitigation
may enable X86_FEATURE_CLEAR_CPU_BUF_VM, and that other thing needs to win.
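
(Illustrative aside, not part of the patch: a tiny user-space C model of how
ALTERNATIVE_2()'s "last enabled feature wins" selection plays out; the enum
and function names here are invented for illustration.)

	#include <stdbool.h>
	#include <stdio.h>

	enum clear_mode { CLEAR_NONE, CLEAR_FOR_MMIO, CLEAR_ALWAYS };

	/* Mimic alternatives patching: candidates are applied in order, so
	 * the replacement for the last enabled feature is what sticks.
	 */
	static enum clear_mode pick_alternative(bool feat_clear_mmio,
						bool feat_clear_vm)
	{
		enum clear_mode mode = CLEAR_NONE;	/* 0. Do nothing */

		if (feat_clear_mmio)
			mode = CLEAR_FOR_MMIO;	/* 1. Clear if vCPU can access MMIO */
		if (feat_clear_vm)
			mode = CLEAR_ALWAYS;	/* 2. Clear always, i.e. strictest wins */

		return mode;
	}

	int main(void)
	{
		/* Both mitigations selected => the strictest one is honored. */
		printf("%d\n", pick_alternative(true, true));	/* prints 2 */
		return 0;
	}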
Note, decoupling the MMIO mitigation from the L1TF mitigation also fixes
a mostly-benign flaw where KVM wouldn't do any clearing/flushing if the
L1TF mitigation is configured to conditionally flush the L1D, and the MMIO
mitigation, but no other "clear CPU buffers" mitigation, is enabled.
For that specific scenario, KVM would skip clearing CPU buffers for the
MMIO mitigation even though the kernel requested a clear on every VM-Enter.
Note #2, the flaw goes back to the introduction of the MDS mitigation. The
MDS mitigation was inadvertently fixed by commit 43fb862de8f6 ("KVM/VMX:
Move VERW closer to VMentry for MDS mitigation"), but previous kernels
that flush CPU buffers in vmx_vcpu_enter_exit() are affected (though it's
unlikely the flaw is meaningfully exploitable on even older kernels).
Fixes: 650b68a0622f ("x86/kvm/vmx: Add MDS protection when L1D Flush is not active")
Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/vmenter.S | 14 +++++++++++++-
arch/x86/kvm/vmx/vmx.c | 13 -------------
2 files changed, 13 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
index 1f99a98a16a2..61a809790a58 100644
--- a/arch/x86/kvm/vmx/vmenter.S
+++ b/arch/x86/kvm/vmx/vmenter.S
@@ -71,6 +71,7 @@
  * @regs:	unsigned long * (to guest registers)
  * @flags:	VMX_RUN_VMRESUME:			use VMRESUME instead of VMLAUNCH
  *		VMX_RUN_SAVE_SPEC_CTRL:			save guest SPEC_CTRL into vmx->spec_ctrl
+ *		VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO:	vCPU can access host MMIO
  *
  * Returns:
  *	0 on VM-Exit, 1 on VM-Fail
@@ -137,6 +138,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
 	/* Load @regs to RAX. */
 	mov (%_ASM_SP), %_ASM_AX
 
+	/* Stash "clear for MMIO" in EFLAGS.ZF (used below). */
+	ALTERNATIVE_2 "", \
+		      __stringify(test $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, %ebx), \
+		      X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
+		      "", X86_FEATURE_CLEAR_CPU_BUF_VM
+
 	/* Check if vmlaunch or vmresume is needed */
 	bt   $VMX_RUN_VMRESUME_SHIFT, %ebx
 
@@ -161,7 +168,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
 	mov VCPU_RAX(%_ASM_AX), %_ASM_AX
 
 	/* Clobbers EFLAGS.ZF */
-	VM_CLEAR_CPU_BUFFERS
+	ALTERNATIVE_2 "", \
+		      __stringify(jz .Lskip_clear_cpu_buffers; \
+				  CLEAR_CPU_BUFFERS_SEQ; \
+				  .Lskip_clear_cpu_buffers:), \
+		      X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
+		      __CLEAR_CPU_BUFFERS, X86_FEATURE_CLEAR_CPU_BUF_VM
 
 	/* Check EFLAGS.CF from the VMX_RUN_VMRESUME bit test above. */
 	jnc .Lvmlaunch
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 68cde725d1c7..5af2338c7cb8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7339,21 +7339,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 
 	guest_state_enter_irqoff();
 
-	/*
-	 * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW
-	 * mitigation for MDS is done late in VMentry and is still
-	 * executed in spite of L1D Flush. This is because an extra VERW
-	 * should not matter much after the big hammer L1D Flush.
-	 *
-	 * cpu_buf_vm_clear is used when system is not vulnerable to MDS/TAA,
-	 * and is affected by MMIO Stale Data. In such cases mitigation in only
-	 * needed against an MMIO capable guest.
-	 */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
 		vmx_l1d_flush(vcpu);
-	else if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF_MMIO) &&
-		 (flags & VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO))
-		x86_clear_cpu_buffers();
 
 	vmx_disable_fb_clear(vmx);
--
2.51.1.930.gacf6e81ea2-goog
On Thu, Oct 30, 2025 at 05:30:36PM -0700, Sean Christopherson wrote:
> @@ -137,6 +138,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
> /* Load @regs to RAX. */
> mov (%_ASM_SP), %_ASM_AX
>
> + /* Stash "clear for MMIO" in EFLAGS.ZF (used below). */
Oh wow. Alternatives interdependence. What can go wrong. :)
> + ALTERNATIVE_2 "", \
> + __stringify(test $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, %ebx), \
So this VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO bit gets set here:
if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF_MMIO) &&
kvm_vcpu_can_access_host_mmio(&vmx->vcpu))
flags |= VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO;
So how static and/or dynamic is this?
IOW, can you stick this into a simple variable which is unconditionally
updated and you can use it in X86_FEATURE_CLEAR_CPU_BUF_MMIO case and
otherwise it simply remains unused?
Because then you get rid of that yuckiness.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Nov 12, 2025, Borislav Petkov wrote:
> On Thu, Oct 30, 2025 at 05:30:36PM -0700, Sean Christopherson wrote:
> > @@ -137,6 +138,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
> > 	/* Load @regs to RAX. */
> > 	mov (%_ASM_SP), %_ASM_AX
> >
> > +	/* Stash "clear for MMIO" in EFLAGS.ZF (used below). */
>
> Oh wow. Alternatives interdependence. What can go wrong. :)

Nothing, it's perfect. :-D

> > +	ALTERNATIVE_2 "", \
> > +		      __stringify(test $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, %ebx), \
>
> So this VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO bit gets set here:
>
> 	if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF_MMIO) &&
> 	    kvm_vcpu_can_access_host_mmio(&vmx->vcpu))
> 		flags |= VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO;
>
> So how static and/or dynamic is this?

kvm_vcpu_can_access_host_mmio() is very dynamic.  It can be different between
vCPUs in a VM, and can even change on back-to-back runs of the same vCPU.

> IOW, can you stick this into a simple variable which is unconditionally
> updated and you can use it in X86_FEATURE_CLEAR_CPU_BUF_MMIO case and
> otherwise it simply remains unused?

Can you elaborate?  I don't think I follow what you're suggesting.

> Because then you get rid of that yuckiness.
>
> Thx.
>
> --
> Regards/Gruss,
>     Boris.
>
> https://people.kernel.org/tglx/notes-about-netiquette
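
(For context, a self-contained C model of why the bit is "very dynamic": the
run flags are rebuilt from scratch before every VM-Enter, so a change in MMIO
reachability is picked up on the very next entry. The struct and function
names below are invented, and the bit positions are assumptions; only the
VMX_RUN_* names come from the thread.)

	#include <stdbool.h>

	#define VMX_RUN_VMRESUME			(1u << 0)
	#define VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO	(1u << 2)

	struct vcpu_model {
		bool launched;			/* has the VMCS been launched? */
		bool can_access_host_mmio;	/* may flip between two runs */
	};

	/* Recomputed on every VM-Enter; nothing is cached across runs. */
	static unsigned int vcpu_run_flags(const struct vcpu_model *v,
					   bool feat_clear_mmio)
	{
		unsigned int flags = 0;

		if (v->launched)
			flags |= VMX_RUN_VMRESUME;
		if (feat_clear_mmio && v->can_access_host_mmio)
			flags |= VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO;

		return flags;
	}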
On Wed, Nov 12, 2025 at 09:15:00AM -0800, Sean Christopherson wrote:
> On Wed, Nov 12, 2025, Borislav Petkov wrote:
> > On Thu, Oct 30, 2025 at 05:30:36PM -0700, Sean Christopherson wrote:
> > > @@ -137,6 +138,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
> > > /* Load @regs to RAX. */
> > > mov (%_ASM_SP), %_ASM_AX
> > >
> > > + /* Stash "clear for MMIO" in EFLAGS.ZF (used below). */
> >
> > Oh wow. Alternatives interdependence. What can go wrong. :)
>
> Nothing, it's perfect. :-D
Yeah. :-P
>
> > > + ALTERNATIVE_2 "", \
> > > + __stringify(test $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, %ebx), \
> >
> > So this VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO bit gets set here:
> >
> > if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF_MMIO) &&
> > kvm_vcpu_can_access_host_mmio(&vmx->vcpu))
> > flags |= VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO;
> >
> > So how static and/or dynamic is this?
>
> kvm_vcpu_can_access_host_mmio() is very dynamic. It can be different between
> vCPUs in a VM, and can even change on back-to-back runs of the same vCPU.
Hmm, strange. Because looking at those things there:
root->has_mapped_host_mmio and vcpu->kvm->arch.has_mapped_host_mmio
they both read like something that a guest would set up once and that's it.
But what do I know...
> > IOW, can you stick this into a simple variable which is unconditionally
> > updated and you can use it in X86_FEATURE_CLEAR_CPU_BUF_MMIO case and
> > otherwise it simply remains unused?
>
> Can you elaborate? I don't think I follow what you're suggesting.
So I was thinking if you could set a per-guest variable in
C - vmx_per_guest_clear_per_mmio or so and then test it in asm:
testb $1,vmx_per_guest_clear_per_mmio(%rip)
jz .Lskip_clear_cpu_buffers;
CLEAR_CPU_BUFFERS_SEQ;
.Lskip_clear_cpu_buffers:
gcc -O3 suggests also
cmpb $0x0,vmx_per_guest_clear_per_mmio(%rip)
which is the same insn size...
The idea is to get rid of this first asm stashing things and it'll be a bit
more robust, I'd say.
And you don't rely on registers...
and when I say that, I now realize this is 32-bit too and you don't want to
touch regs - that's why you're stashing it - and there's no rip-relative on
32-bit...
I dunno - it might get hairy but I would still opt for a different solution
instead of this fragile stashing in ZF. You could do a function which pushes
and pops a scratch register where you put the value, i.e., you could do
push %reg
mov var, %reg
test or cmp ...
...
jz skip...
skip:
pop %reg
It is still all together in one place instead of spreading it around like
that.
Oh well.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Wed, Nov 12, 2025, Borislav Petkov wrote:
> On Wed, Nov 12, 2025 at 09:15:00AM -0800, Sean Christopherson wrote:
> > On Wed, Nov 12, 2025, Borislav Petkov wrote:
> > > So this VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO bit gets set here:
> > >
> > > if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF_MMIO) &&
> > > kvm_vcpu_can_access_host_mmio(&vmx->vcpu))
> > > flags |= VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO;
> > >
> > > So how static and/or dynamic is this?
> >
> > kvm_vcpu_can_access_host_mmio() is very dynamic. It can be different between
> > vCPUs in a VM, and can even change on back-to-back runs of the same vCPU.
>
> Hmm, strange. Because looking at those things there:
>
> root->has_mapped_host_mmio and vcpu->kvm->arch.has_mapped_host_mmio
>
> they both read like something that a guest would set up once and that's it.
> But what do I know...
They're set based on what memory is mapped into the KVM-controlled page tables,
e.g. into the EPT/NPT tables, that will be used by the vCPU for that VM-Enter.
root->has_mapped_host_mmio is per page table. vcpu->kvm->arch.has_mapped_host_mmio
exists because of nastiness related to shadow paging; for all intents and purposes,
I would just mentally ignore that one.
> > > IOW, can you stick this into a simple variable which is unconditionally
> > > updated and you can use it in X86_FEATURE_CLEAR_CPU_BUF_MMIO case and
> > > otherwise it simply remains unused?
> >
> > Can you elaborate? I don't think I follow what you're suggesting.
>
> So I was thinking if you could set a per-guest variable in
> C - vmx_per_guest_clear_per_mmio or so and then test it in asm:
>
> testb $1,vmx_per_guest_clear_per_mmio(%rip)
> jz .Lskip_clear_cpu_buffers;
> CLEAR_CPU_BUFFERS_SEQ;
>
> .Lskip_clear_cpu_buffers:
>
> gcc -O3 suggests also
>
> cmpb $0x0,vmx_per_guest_clear_per_mmio(%rip)
>
> which is the same insn size...
>
> The idea is to get rid of this first asm stashing things and it'll be a bit
> more robust, I'd say.
VMX "needs" to abuse RFLAGS no matter what, because RFLAGS is the only register
that's available at the time of VMLAUNCH/VMRESUME. On Intel, only RSP and
RFLAGS are context switched via the VMCS, all other GPRs need to be context
switched by software. Which is why I didn't balk at Pawan's idea to use RFLAGS.ZF
to track whether or not a VERW for MMIO is needed.
Hmm, actually, @flags is already on the stack because it's needed at VM-Exit.
Using EBX was a holdover from the conversion from inline asm to "proper" asm,
e.g. from commit 77df549559db ("KVM: VMX: Pass @launched to the vCPU-run asm via
standard ABI regs").
Oooh, and if we stop using bt+RFLAGS.CF, then we drop the annoying SHIFT definitions
in arch/x86/kvm/vmx/run_flags.h.
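
(For reference, a compile-clean sketch of what that run_flags.h cleanup could
look like; the "before" half reflects my assumption of the current header,
and the "after" half is hypothetical, not the actual file.)

	#define BIT(n)	(1UL << (n))

	#if 0	/* Before: bt consumes bit *numbers*, so SHIFT macros exist too. */
	#define VMX_RUN_VMRESUME_SHIFT		0
	#define VMX_RUN_SAVE_SPEC_CTRL_SHIFT	1
	#define VMX_RUN_VMRESUME		BIT(VMX_RUN_VMRESUME_SHIFT)
	#define VMX_RUN_SAVE_SPEC_CTRL		BIT(VMX_RUN_SAVE_SPEC_CTRL_SHIFT)
	#else	/* After: testl consumes an immediate *mask*, so masks suffice. */
	#define VMX_RUN_VMRESUME			BIT(0)
	#define VMX_RUN_SAVE_SPEC_CTRL			BIT(1)
	#define VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO	BIT(2)
	#endif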
Very lightly tested at this point, but I think this can all be simplified to
/*
* Note, ALTERNATIVE_2 works in reverse order. If CLEAR_CPU_BUF_VM is
* enabled, do VERW unconditionally. If CPU_BUF_VM_MMIO is enabled,
* check @flags to see if the vCPU has access to host MMIO, and do VERW
* if so. Else, do nothing (no mitigations needed/enabled).
*/
ALTERNATIVE_2 "", \
__stringify(testl $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, WORD_SIZE(%_ASM_SP); \
jz .Lskip_clear_cpu_buffers; \
VERW; \
.Lskip_clear_cpu_buffers:), \
X86_FEATURE_CLEAR_CPU_BUF_VM_MMIO, \
__stringify(VERW), X86_FEATURE_CLEAR_CPU_BUF_VM
/* Check if vmlaunch or vmresume is needed */
testl $VMX_RUN_VMRESUME, WORD_SIZE(%_ASM_SP)
jz .Lvmlaunch
> And you don't rely on registers...
>
> and when I say that, I now realize this is 32-bit too and you don't want to
> touch regs - that's why you're stashing it - and there's no rip-relative on
> 32-bit...
>
> I dunno - it might get hairy but I would still opt for a different solution
> instead of this fragile stashing in ZF. You could do a function which pushes
> and pops a scratch register where you put the value, i.e., you could do
>
> push %reg
> mov var, %reg
> test or cmp ...
> ...
> jz skip...
> skip:
> pop %reg
>
> It is still all together in one place instead of spreading it around like
> that.
FWIW, all GPRs except RSP are off limits. But as above, getting at @flags via
RSP is trivial.
On Wed, Nov 12, 2025 at 12:30:36PM -0800, Sean Christopherson wrote:
> They're set based on what memory is mapped into the KVM-controlled page tables,
> e.g. into the EPT/NPT tables, that will be used by the vCPU for that VM-Enter.
> root->has_mapped_host_mmio is per page table. vcpu->kvm->arch.has_mapped_host_mmio
> exists because of nastiness related to shadow paging; for all intents and purposes,
> I would just mentally ignore that one.
And you say they're very dynamic because the page table will ofc very likely
change before each VM-Enter. Or rather, as long as the fact that the guest has
mapped host MMIO ranges changes. Oh well, I guess that's dynamic enough...
> Very lightly tested at this point, but I think this can all be simplified to
>
> /*
> * Note, ALTERNATIVE_2 works in reverse order. If CLEAR_CPU_BUF_VM is
> * enabled, do VERW unconditionally. If CPU_BUF_VM_MMIO is enabled,
> * check @flags to see if the vCPU has access to host MMIO, and do VERW
> * if so. Else, do nothing (no mitigations needed/enabled).
> */
> ALTERNATIVE_2 "", \
> __stringify(testl $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, WORD_SIZE(%_ASM_SP); \
> jz .Lskip_clear_cpu_buffers; \
> VERW; \
> .Lskip_clear_cpu_buffers:), \
And just because that label is local to this statement only, you can simply
call it "1" and reduce clutter even more.
> X86_FEATURE_CLEAR_CPU_BUF_VM_MMIO, \
> __stringify(VERW), X86_FEATURE_CLEAR_CPU_BUF_VM
>
> /* Check if vmlaunch or vmresume is needed */
> testl $VMX_RUN_VMRESUME, WORD_SIZE(%_ASM_SP)
> jz .Lvmlaunch
Yap, that's as nice as it gets. Looks much more straight-forward and
contained.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette
On Thu, Nov 13, 2025, Borislav Petkov wrote:
> On Wed, Nov 12, 2025 at 12:30:36PM -0800, Sean Christopherson wrote:
> > They're set based on what memory is mapped into the KVM-controlled page tables,
> > e.g. into the EPT/NPT tables, that will be used by the vCPU for that VM-Enter.
> > root->has_mapped_host_mmio is per page table.  vcpu->kvm->arch.has_mapped_host_mmio
> > exists because of nastiness related to shadow paging; for all intents and purposes,
> > I would just mentally ignore that one.
>
> And you say they're very dynamic because the page table will ofc very likely
> change before each VM-Enter. Or rather, as long as the fact that the guest has
> mapped host MMIO ranges changes. Oh well, I guess that's dynamic enough...

In practice, the flag will be quite static for a given vCPU.  The issue is that
it _could_ be extremely volatile depending on VMM and/or guest behavior, and so
I don't want to try and optimize for any particular behavior/pattern, because
KVM effectively doesn't have any control over whether or not the vCPU can
access MMIO.

> > Very lightly tested at this point, but I think this can all be simplified to
> >
> > 	/*
> > 	 * Note, ALTERNATIVE_2 works in reverse order.  If CLEAR_CPU_BUF_VM is
> > 	 * enabled, do VERW unconditionally.  If CPU_BUF_VM_MMIO is enabled,
> > 	 * check @flags to see if the vCPU has access to host MMIO, and do VERW
> > 	 * if so.  Else, do nothing (no mitigations needed/enabled).
> > 	 */
> > 	ALTERNATIVE_2 "", \
> > 		      __stringify(testl $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, WORD_SIZE(%_ASM_SP); \
> > 				  jz .Lskip_clear_cpu_buffers; \
> > 				  VERW; \
> > 				  .Lskip_clear_cpu_buffers:), \
>
> And just because that label is local to this statement only, you can simply
> call it "1" and reduce clutter even more.

Eh, sort of.  In the past, this code used "simple" numeric labels, and it
became nearly impossible to maintain.  This is quite contained code and so
isn't likely to cause maintenance problems, but unless someone feels *really*
strongly about numeric labels, I'll keep a named label to match the rest of
the code.

Though with it just being VERW, I can shorten it a wee bit and make it more
precise at the same time:

		      __stringify(testl $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, WORD_SIZE(%_ASM_SP); \
				  jz .Lskip_mmio_verw; \
				  VERW; \
				  .Lskip_mmio_verw:), \
On Wed, Nov 12, 2025 at 12:30:36PM -0800, Sean Christopherson wrote:
> VMX "needs" to abuse RFLAGS no matter what, because RFLAGS is the only register
> that's available at the time of VMLAUNCH/VMRESUME. On Intel, only RSP and
> RFLAGS are context switched via the VMCS, all other GPRs need to be context
> switch by software. Which is why I didn't balk at Pawan's idea to use RFLAGS.ZF
> to track whether or not a VERW for MMIO is needed.
>
> Hmm, actually, @flags is already on the stack because it's needed at VM-Exit.
> Using EBX was a holdover from the conversion from inline asm to "proper" asm,
> e.g. from commit 77df549559db ("KVM: VMX: Pass @launched to the vCPU-run asm via
> standard ABI regs").
>
> Oooh, and if we stop using bt+RFLAGS.CF, then we drop the annoying SHIFT definitions
> in arch/x86/kvm/vmx/run_flags.h.
>
> Very lightly tested at this point, but I think this can all be simplified to
>
> /*
> * Note, ALTERNATIVE_2 works in reverse order. If CLEAR_CPU_BUF_VM is
> * enabled, do VERW unconditionally. If CPU_BUF_VM_MMIO is enabled,
> * check @flags to see if the vCPU has access to host MMIO, and do VERW
> * if so. Else, do nothing (no mitigations needed/enabled).
> */
> ALTERNATIVE_2 "", \
> __stringify(testl $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, WORD_SIZE(%_ASM_SP); \
WORD_SIZE(%_ASM_SP) is still a bit fragile, but this is definitely an
improvement.
> jz .Lskip_clear_cpu_buffers; \
> VERW; \
> .Lskip_clear_cpu_buffers:), \
> X86_FEATURE_CLEAR_CPU_BUF_VM_MMIO, \
> __stringify(VERW), X86_FEATURE_CLEAR_CPU_BUF_VM
>
> /* Check if vmlaunch or vmresume is needed */
> testl $VMX_RUN_VMRESUME, WORD_SIZE(%_ASM_SP)
> jz .Lvmlaunch
On Thu, Oct 30, 2025 at 05:30:36PM -0700, Sean Christopherson wrote:
> Rework the handling of the MMIO Stale Data mitigation to clear CPU buffers
> immediately prior to VM-Enter, i.e. in the same location that KVM emits a
> VERW for unconditional (at runtime) clearing. Co-locating the code and
> using a single ALTERNATIVES_2 makes it more obvious how VMX mitigates the
> various vulnerabilities.
>
> Deliberately order the alternatives as:
>
> 0. Do nothing
> 1. Clear if vCPU can access MMIO
> 2. Clear always
>
> since the last alternative wins in ALTERNATIVES_2(), i.e. so that KVM will
> honor the strictest mitigation (always clear CPU buffers) if multiple
> mitigations are selected. E.g. even if the kernel chooses to mitigate
> MMIO Stale Data via X86_FEATURE_CLEAR_CPU_BUF_MMIO, some other mitigation
> may enable X86_FEATURE_CLEAR_CPU_BUF_VM, and that other thing needs to win.
>
> Note, decoupling the MMIO mitigation from the L1TF mitigation also fixes
> a mostly-benign flaw where KVM wouldn't do any clearing/flushing if the
> L1TF mitigation is configured to conditionally flush the L1D, and the MMIO
> mitigation but not any other "clear CPU buffers" mitigation is enabled.
> For that specific scenario, KVM would skip clearing CPU buffers for the
> MMIO mitigation even though the kernel requested a clear on every VM-Enter.
>
> Note #2, the flaw goes back to the introduction of the MDS mitigation. The
> MDS mitigation was inadvertently fixed by commit 43fb862de8f6 ("KVM/VMX:
> Move VERW closer to VMentry for MDS mitigation"), but previous kernels
> that flush CPU buffers in vmx_vcpu_enter_exit() are affected (though it's
> unlikely the flaw is meaningfully exploitable even older kernels).
>
> Fixes: 650b68a0622f ("x86/kvm/vmx: Add MDS protection when L1D Flush is not active")
> Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
Reviewed-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
On Thu, Oct 30, 2025 at 05:30:36PM -0700, Sean Christopherson wrote:
...
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index 1f99a98a16a2..61a809790a58 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -71,6 +71,7 @@
>   * @regs:	unsigned long * (to guest registers)
>   * @flags:	VMX_RUN_VMRESUME:			use VMRESUME instead of VMLAUNCH
>   *		VMX_RUN_SAVE_SPEC_CTRL:			save guest SPEC_CTRL into vmx->spec_ctrl
> + *		VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO:	vCPU can access host MMIO
>   *
>   * Returns:
>   *	0 on VM-Exit, 1 on VM-Fail
> @@ -137,6 +138,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
> 	/* Load @regs to RAX. */
> 	mov (%_ASM_SP), %_ASM_AX
>
> +	/* Stash "clear for MMIO" in EFLAGS.ZF (used below). */
> +	ALTERNATIVE_2 "", \
> +		      __stringify(test $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, %ebx), \
> +		      X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> +		      "", X86_FEATURE_CLEAR_CPU_BUF_VM
> +
> 	/* Check if vmlaunch or vmresume is needed */
> 	bt   $VMX_RUN_VMRESUME_SHIFT, %ebx
>
> @@ -161,7 +168,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
> 	mov VCPU_RAX(%_ASM_AX), %_ASM_AX
>
> 	/* Clobbers EFLAGS.ZF */
> -	VM_CLEAR_CPU_BUFFERS
> +	ALTERNATIVE_2 "", \
> +		      __stringify(jz .Lskip_clear_cpu_buffers; \
> +				  CLEAR_CPU_BUFFERS_SEQ; \
> +				  .Lskip_clear_cpu_buffers:), \
> +		      X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> +		      __CLEAR_CPU_BUFFERS, X86_FEATURE_CLEAR_CPU_BUF_VM

Another way to write this could be:

	ALTERNATIVE_2 "jmp .Lskip_clear_cpu_buffers", \
		      "jz .Lskip_clear_cpu_buffers", X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
		      "", X86_FEATURE_CLEAR_CPU_BUF_VM

	CLEAR_CPU_BUFFERS_SEQ
.Lskip_clear_cpu_buffers:

With this jmp;verw; would show up in the disassembly on unaffected CPUs, I
don't know how big a problem is that. OTOH, I find this easier to understand.
On Fri, Oct 31, 2025 at 04:55:24PM -0700, Pawan Gupta wrote:
> On Thu, Oct 30, 2025 at 05:30:36PM -0700, Sean Christopherson wrote:
> ...
> > diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> > index 1f99a98a16a2..61a809790a58 100644
> > --- a/arch/x86/kvm/vmx/vmenter.S
> > +++ b/arch/x86/kvm/vmx/vmenter.S
> > @@ -71,6 +71,7 @@
> >   * @regs:	unsigned long * (to guest registers)
> >   * @flags:	VMX_RUN_VMRESUME:			use VMRESUME instead of VMLAUNCH
> >   *		VMX_RUN_SAVE_SPEC_CTRL:			save guest SPEC_CTRL into vmx->spec_ctrl
> > + *		VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO:	vCPU can access host MMIO
> >   *
> >   * Returns:
> >   *	0 on VM-Exit, 1 on VM-Fail
> > @@ -137,6 +138,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
> > 	/* Load @regs to RAX. */
> > 	mov (%_ASM_SP), %_ASM_AX
> >
> > +	/* Stash "clear for MMIO" in EFLAGS.ZF (used below). */
> > +	ALTERNATIVE_2 "", \
> > +		      __stringify(test $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, %ebx), \
> > +		      X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> > +		      "", X86_FEATURE_CLEAR_CPU_BUF_VM
> > +
> > 	/* Check if vmlaunch or vmresume is needed */
> > 	bt   $VMX_RUN_VMRESUME_SHIFT, %ebx
> >
> > @@ -161,7 +168,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
> > 	mov VCPU_RAX(%_ASM_AX), %_ASM_AX
> >
> > 	/* Clobbers EFLAGS.ZF */
> > -	VM_CLEAR_CPU_BUFFERS
> > +	ALTERNATIVE_2 "", \
> > +		      __stringify(jz .Lskip_clear_cpu_buffers; \
> > +				  CLEAR_CPU_BUFFERS_SEQ; \
> > +				  .Lskip_clear_cpu_buffers:), \
> > +		      X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> > +		      __CLEAR_CPU_BUFFERS, X86_FEATURE_CLEAR_CPU_BUF_VM
>
> Another way to write this could be:
>
> 	ALTERNATIVE_2 "jmp .Lskip_clear_cpu_buffers", \
> 		      "jz .Lskip_clear_cpu_buffers", X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> 		      "", X86_FEATURE_CLEAR_CPU_BUF_VM
>
> 	CLEAR_CPU_BUFFERS_SEQ
> .Lskip_clear_cpu_buffers:
>
> With this jmp;verw; would show up in the disassembly on unaffected CPUs, I
> don't know how big a problem is that. OTOH, I find this easier to understand.

Generating larger code just to keep disassembly 'simple' seems wrong.
Also, see this:

  https://lkml.kernel.org/r/194ad779-f41f-46a5-9973-e886f483b60a@oracle.com
On Mon, Nov 03, 2025 at 10:17:49AM +0100, Peter Zijlstra wrote:
> On Fri, Oct 31, 2025 at 04:55:24PM -0700, Pawan Gupta wrote:
> > On Thu, Oct 30, 2025 at 05:30:36PM -0700, Sean Christopherson wrote:
> > ...
> > > diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> > > index 1f99a98a16a2..61a809790a58 100644
> > > --- a/arch/x86/kvm/vmx/vmenter.S
> > > +++ b/arch/x86/kvm/vmx/vmenter.S
> > > @@ -71,6 +71,7 @@
> > >   * @regs:	unsigned long * (to guest registers)
> > >   * @flags:	VMX_RUN_VMRESUME:			use VMRESUME instead of VMLAUNCH
> > >   *		VMX_RUN_SAVE_SPEC_CTRL:			save guest SPEC_CTRL into vmx->spec_ctrl
> > > + *		VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO:	vCPU can access host MMIO
> > >   *
> > >   * Returns:
> > >   *	0 on VM-Exit, 1 on VM-Fail
> > > @@ -137,6 +138,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
> > > 	/* Load @regs to RAX. */
> > > 	mov (%_ASM_SP), %_ASM_AX
> > >
> > > +	/* Stash "clear for MMIO" in EFLAGS.ZF (used below). */
> > > +	ALTERNATIVE_2 "", \
> > > +		      __stringify(test $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, %ebx), \
> > > +		      X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> > > +		      "", X86_FEATURE_CLEAR_CPU_BUF_VM
> > > +
> > > 	/* Check if vmlaunch or vmresume is needed */
> > > 	bt   $VMX_RUN_VMRESUME_SHIFT, %ebx
> > >
> > > @@ -161,7 +168,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
> > > 	mov VCPU_RAX(%_ASM_AX), %_ASM_AX
> > >
> > > 	/* Clobbers EFLAGS.ZF */
> > > -	VM_CLEAR_CPU_BUFFERS
> > > +	ALTERNATIVE_2 "", \
> > > +		      __stringify(jz .Lskip_clear_cpu_buffers; \
> > > +				  CLEAR_CPU_BUFFERS_SEQ; \
> > > +				  .Lskip_clear_cpu_buffers:), \
> > > +		      X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> > > +		      __CLEAR_CPU_BUFFERS, X86_FEATURE_CLEAR_CPU_BUF_VM
> >
> > Another way to write this could be:
> >
> > 	ALTERNATIVE_2 "jmp .Lskip_clear_cpu_buffers", \
> > 		      "jz .Lskip_clear_cpu_buffers", X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> > 		      "", X86_FEATURE_CLEAR_CPU_BUF_VM
> >
> > 	CLEAR_CPU_BUFFERS_SEQ
> > .Lskip_clear_cpu_buffers:
> >
> > With this jmp;verw; would show up in the disassembly on unaffected CPUs, I
> > don't know how big a problem is that. OTOH, I find this easier to understand.
>
> Generating larger code just to keep disassembly 'simple' seems wrong.
> Also, see this:
>
>   https://lkml.kernel.org/r/194ad779-f41f-46a5-9973-e886f483b60a@oracle.com

Ok thanks, that settles it.
On Fri, Oct 31, 2025 at 04:55:37PM -0700, Pawan Gupta wrote:
> On Thu, Oct 30, 2025 at 05:30:36PM -0700, Sean Christopherson wrote:
> ...
> > diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> > index 1f99a98a16a2..61a809790a58 100644
> > --- a/arch/x86/kvm/vmx/vmenter.S
> > +++ b/arch/x86/kvm/vmx/vmenter.S
> > @@ -71,6 +71,7 @@
> >   * @regs:	unsigned long * (to guest registers)
> >   * @flags:	VMX_RUN_VMRESUME:			use VMRESUME instead of VMLAUNCH
> >   *		VMX_RUN_SAVE_SPEC_CTRL:			save guest SPEC_CTRL into vmx->spec_ctrl
> > + *		VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO:	vCPU can access host MMIO
> >   *
> >   * Returns:
> >   *	0 on VM-Exit, 1 on VM-Fail
> > @@ -137,6 +138,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
> > 	/* Load @regs to RAX. */
> > 	mov (%_ASM_SP), %_ASM_AX
> >
> > +	/* Stash "clear for MMIO" in EFLAGS.ZF (used below). */
> > +	ALTERNATIVE_2 "", \
> > +		      __stringify(test $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, %ebx), \
> > +		      X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> > +		      "", X86_FEATURE_CLEAR_CPU_BUF_VM
> > +
> > 	/* Check if vmlaunch or vmresume is needed */
> > 	bt   $VMX_RUN_VMRESUME_SHIFT, %ebx
> >
> > @@ -161,7 +168,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
> > 	mov VCPU_RAX(%_ASM_AX), %_ASM_AX
> >
> > 	/* Clobbers EFLAGS.ZF */
> > -	VM_CLEAR_CPU_BUFFERS
> > +	ALTERNATIVE_2 "", \
> > +		      __stringify(jz .Lskip_clear_cpu_buffers; \
> > +				  CLEAR_CPU_BUFFERS_SEQ; \
> > +				  .Lskip_clear_cpu_buffers:), \
> > +		      X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> > +		      __CLEAR_CPU_BUFFERS, X86_FEATURE_CLEAR_CPU_BUF_VM
>
> Another way to write this could be:
>
> 	ALTERNATIVE_2 "jmp .Lskip_clear_cpu_buffers", \
> 		      "jz .Lskip_clear_cpu_buffers", X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> 		      "", X86_FEATURE_CLEAR_CPU_BUF_VM
>
> 	CLEAR_CPU_BUFFERS_SEQ
> .Lskip_clear_cpu_buffers:
>
> With this jmp;verw; would show up in the disassembly on unaffected CPUs, I
> don't know how big a problem is that. OTOH, I find this easier to understand.

As far as execution is concerned, it basically boils down to 9 NOPs:

  54:	48 8b 00		mov    (%rax),%rax
---
  57:	90			nop
  58:	90			nop
  59:	90			nop
  5a:	90			nop
  5b:	90			nop
  5c:	90			nop
  5d:	90			nop
  5e:	90			nop
  5f:	90			nop
---
  60:	73 08			jae

versus 1 near jump:

  54:	48 8b 00		mov    (%rax),%rax
---
  57:	eb 0b			jmp    ffffffff81fa1064
  59:	90			nop
  5a:	90			nop
  5b:	90			nop
  5c:	90			nop
  5d:	0f 00 2d dc ef 05 ff	verw   -0xfa1024(%rip)
---
  64:	73 08			jae

I can't tell which one is better.
On Fri Oct 31, 2025 at 12:30 AM UTC, Sean Christopherson wrote:
> Rework the handling of the MMIO Stale Data mitigation to clear CPU buffers
> immediately prior to VM-Enter, i.e. in the same location that KVM emits a
> VERW for unconditional (at runtime) clearing. Co-locating the code and
> using a single ALTERNATIVES_2 makes it more obvious how VMX mitigates the
> various vulnerabilities.
>
> Deliberately order the alternatives as:
>
> 0. Do nothing
> 1. Clear if vCPU can access MMIO
> 2. Clear always
>
> since the last alternative wins in ALTERNATIVES_2(), i.e. so that KVM will
> honor the strictest mitigation (always clear CPU buffers) if multiple
> mitigations are selected. E.g. even if the kernel chooses to mitigate
> MMIO Stale Data via X86_FEATURE_CLEAR_CPU_BUF_MMIO, some other mitigation
> may enable X86_FEATURE_CLEAR_CPU_BUF_VM, and that other thing needs to win.
>
> Note, decoupling the MMIO mitigation from the L1TF mitigation also fixes
> a mostly-benign flaw where KVM wouldn't do any clearing/flushing if the
> L1TF mitigation is configured to conditionally flush the L1D, and the MMIO
> mitigation but not any other "clear CPU buffers" mitigation is enabled.
> For that specific scenario, KVM would skip clearing CPU buffers for the
> MMIO mitigation even though the kernel requested a clear on every VM-Enter.
>
> Note #2, the flaw goes back to the introduction of the MDS mitigation. The
> MDS mitigation was inadvertently fixed by commit 43fb862de8f6 ("KVM/VMX:
> Move VERW closer to VMentry for MDS mitigation"), but previous kernels
> that flush CPU buffers in vmx_vcpu_enter_exit() are affected (though it's
> unlikely the flaw is meaningfully exploitable even older kernels).
>
> Fixes: 650b68a0622f ("x86/kvm/vmx: Add MDS protection when L1D Flush is not active")
> Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/vmx/vmenter.S | 14 +++++++++++++-
> arch/x86/kvm/vmx/vmx.c | 13 -------------
> 2 files changed, 13 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index 1f99a98a16a2..61a809790a58 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -71,6 +71,7 @@
> * @regs: unsigned long * (to guest registers)
> * @flags: VMX_RUN_VMRESUME: use VMRESUME instead of VMLAUNCH
> * VMX_RUN_SAVE_SPEC_CTRL: save guest SPEC_CTRL into vmx->spec_ctrl
> + * VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO: vCPU can access host MMIO
> *
> * Returns:
> * 0 on VM-Exit, 1 on VM-Fail
> @@ -137,6 +138,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
> /* Load @regs to RAX. */
> mov (%_ASM_SP), %_ASM_AX
>
> + /* Stash "clear for MMIO" in EFLAGS.ZF (used below). */
> + ALTERNATIVE_2 "", \
> + __stringify(test $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, %ebx), \
> + X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> + "", X86_FEATURE_CLEAR_CPU_BUF_VM
Ah, so this ALTERNATIVE_2 (instead of just an ALTERNATIVE that checks
CLEAR_CPU_BUF_MMIO) is really about avoiding the flags needing to be
mutually exclusive? I.e. this is
if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF_MMIO) &&
!cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF_VM))
test $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, %ebx
... right? This is a good idea but I think it warrants a comment to
capture the intent, without having the commit message in short-term
memory I'd have struggled with this code, I think.
> /* Check if vmlaunch or vmresume is needed */
> bt $VMX_RUN_VMRESUME_SHIFT, %ebx
>
> @@ -161,7 +168,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
> mov VCPU_RAX(%_ASM_AX), %_ASM_AX
>
> /* Clobbers EFLAGS.ZF */
> - VM_CLEAR_CPU_BUFFERS
> + ALTERNATIVE_2 "", \
> + __stringify(jz .Lskip_clear_cpu_buffers; \
Maybe I'm just an asm noob (I was very impressed by this trick of
using CF and ZF together like this!) but I think it's helpful to
have the comment like the jnc has below, and Pawan had in his version,
to really make the test->jz dependency obvious, since the two
instructions are quite far apart.
> + CLEAR_CPU_BUFFERS_SEQ; \
> + .Lskip_clear_cpu_buffers:), \
> + X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> + __CLEAR_CPU_BUFFERS, X86_FEATURE_CLEAR_CPU_BUF_VM
Sorry I'm really nitpicking but I think it's justified for asm
readability...
It's a bit unfortunate that one branch says
CLEAR_CPU_BUFFERS_SEQ and the other says __CLEAR_CPU_BUFFERS. With the
current code I think it would be more readable to just have
__stringify(CLEAR_CPU_BUFFERS_SEQ) in the CLEAR_CPU_BUF_VM case, then
you don't have to mentally expand the macro to see how the two branches
actually differ.
Anyway the actual idea here LGTM, thanks.
On Fri, Oct 31, 2025, Brendan Jackman wrote:
> On Fri Oct 31, 2025 at 12:30 AM UTC, Sean Christopherson wrote:
> > Rework the handling of the MMIO Stale Data mitigation to clear CPU buffers
> > immediately prior to VM-Enter, i.e. in the same location that KVM emits a
> > VERW for unconditional (at runtime) clearing. Co-locating the code and
> > using a single ALTERNATIVES_2 makes it more obvious how VMX mitigates the
> > various vulnerabilities.
> >
> > Deliberately order the alternatives as:
> >
> > 0. Do nothing
> > 1. Clear if vCPU can access MMIO
> > 2. Clear always
> >
> > since the last alternative wins in ALTERNATIVES_2(), i.e. so that KVM will
> > honor the strictest mitigation (always clear CPU buffers) if multiple
> > mitigations are selected. E.g. even if the kernel chooses to mitigate
> > MMIO Stale Data via X86_FEATURE_CLEAR_CPU_BUF_MMIO, some other mitigation
> > may enable X86_FEATURE_CLEAR_CPU_BUF_VM, and that other thing needs to win.
> >
> > Note, decoupling the MMIO mitigation from the L1TF mitigation also fixes
> > a mostly-benign flaw where KVM wouldn't do any clearing/flushing if the
> > L1TF mitigation is configured to conditionally flush the L1D, and the MMIO
> > mitigation but not any other "clear CPU buffers" mitigation is enabled.
> > For that specific scenario, KVM would skip clearing CPU buffers for the
> > MMIO mitigation even though the kernel requested a clear on every VM-Enter.
> >
> > Note #2, the flaw goes back to the introduction of the MDS mitigation. The
> > MDS mitigation was inadvertently fixed by commit 43fb862de8f6 ("KVM/VMX:
> > Move VERW closer to VMentry for MDS mitigation"), but previous kernels
> > that flush CPU buffers in vmx_vcpu_enter_exit() are affected (though it's
> > unlikely the flaw is meaningfully exploitable even older kernels).
> >
> > Fixes: 650b68a0622f ("x86/kvm/vmx: Add MDS protection when L1D Flush is not active")
> > Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> > arch/x86/kvm/vmx/vmenter.S | 14 +++++++++++++-
> > arch/x86/kvm/vmx/vmx.c | 13 -------------
> > 2 files changed, 13 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> > index 1f99a98a16a2..61a809790a58 100644
> > --- a/arch/x86/kvm/vmx/vmenter.S
> > +++ b/arch/x86/kvm/vmx/vmenter.S
> > @@ -71,6 +71,7 @@
> > * @regs: unsigned long * (to guest registers)
> > * @flags: VMX_RUN_VMRESUME: use VMRESUME instead of VMLAUNCH
> > * VMX_RUN_SAVE_SPEC_CTRL: save guest SPEC_CTRL into vmx->spec_ctrl
> > + * VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO: vCPU can access host MMIO
> > *
> > * Returns:
> > * 0 on VM-Exit, 1 on VM-Fail
> > @@ -137,6 +138,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
> > /* Load @regs to RAX. */
> > mov (%_ASM_SP), %_ASM_AX
> >
> > + /* Stash "clear for MMIO" in EFLAGS.ZF (used below). */
> > + ALTERNATIVE_2 "", \
> > + __stringify(test $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, %ebx), \
> > + X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> > + "", X86_FEATURE_CLEAR_CPU_BUF_VM
>
> Ah, so this ALTERNATIVE_2 (instead of just an ALTERNATIVE that checks
> CLEAR_CPU_BUF_MMIO) is really about avoiding the flags needing to be
> mutually exclusive?
Yeah, more or less. More specifically, I want to keep the X vs. Y logic in one
place (well, two if you count both ALTERNATIVE_2 flows), so that in general,
from KVM's perspective, the mitigations are handled as independent things. E.g.
even if CLEAR_CPU_BUF_VM and CLEAR_CPU_BUF_MMIO are mutually exclusive within
the kernel (and it's not clear to me that that's 100% guaranteed), I want to
limit how much of KVM assumes they are exclusive. Partly to avoid "oops, we
forgot to mitigate that thing you care about", partly so that reading code like
the setting of VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO doesn't require understanding
the relationship between CLEAR_CPU_BUF_VM and CLEAR_CPU_BUF_MMIO.
> if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF_MMIO) &&
> !cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF_VM))
> test $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, %ebx
>
> ... right? This is a good idea but I think it warrants a comment to
> capture the intent, without having the commit message in short-term
> memory I'd have struggled with this code, I think.
>
> > /* Check if vmlaunch or vmresume is needed */
> > bt $VMX_RUN_VMRESUME_SHIFT, %ebx
> >
> > @@ -161,7 +168,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
> > mov VCPU_RAX(%_ASM_AX), %_ASM_AX
> >
> > /* Clobbers EFLAGS.ZF */
> > - VM_CLEAR_CPU_BUFFERS
> > + ALTERNATIVE_2 "", \
> > + __stringify(jz .Lskip_clear_cpu_buffers; \
>
> Maybe I'm just an asm noob
Nah, all of this is definitely playing on hard mode. I'm just thankful we don't
have to deal with the horrors of KVM doing all of this in inline asm. :-D
> I was very impressed by this trick of using CF and ZF together like this!)
> but I think it's helpful to have the comment like the jnc has below, and
> Pawan had in his version, to really make the test->jz dependency obvious,
> since the two instructions are quite far apart.
>
>
> > + CLEAR_CPU_BUFFERS_SEQ; \
> > + .Lskip_clear_cpu_buffers:), \
> > + X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> > + __CLEAR_CPU_BUFFERS, X86_FEATURE_CLEAR_CPU_BUF_VM
>
> Sorry I'm really nitpicking but I think it's justified for asm
> readability...
>
> It's a bit unfortunate that one branch says
> CLEAR_CPU_BUFFERS_SEQ and the other says __CLEAR_CPU_BUFFERS. With the
> current code I think it would be more readable to jut have
> __stringify(CLEAR_CPU_BUFFERS_SEQ) in the CLEAR_CPU_BUF_VM case, then
> you don't have to mentally expand the macro to see how the two branches
> actually differ.
No preference here (assuming I understand what you're asking).
Is this better?
/*
* Note, this sequence consumes *and* clobbers EFLAGS.ZF. The MMIO
* mitigation uses ZF to track whether or not the vCPU has access to
* host MMIO (see above), and VERW (the instruction microcode hijacks
* to clear CPU buffers) writes ZF.
*/
ALTERNATIVE_2 "", \
__stringify(jz .Lskip_clear_cpu_buffers; \
CLEAR_CPU_BUFFERS_SEQ; \
.Lskip_clear_cpu_buffers:), \
X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
__stringify(CLEAR_CPU_BUFFERS_SEQ), X86_FEATURE_CLEAR_CPU_BUF_VM
On Fri Oct 31, 2025 at 9:44 PM UTC, Sean Christopherson wrote:
> On Fri, Oct 31, 2025, Brendan Jackman wrote:
>> On Fri Oct 31, 2025 at 12:30 AM UTC, Sean Christopherson wrote:
>> > Rework the handling of the MMIO Stale Data mitigation to clear CPU buffers
>> > immediately prior to VM-Enter, i.e. in the same location that KVM emits a
>> > VERW for unconditional (at runtime) clearing. Co-locating the code and
>> > using a single ALTERNATIVES_2 makes it more obvious how VMX mitigates the
>> > various vulnerabilities.
>> >
>> > Deliberately order the alternatives as:
>> >
>> > 0. Do nothing
>> > 1. Clear if vCPU can access MMIO
>> > 2. Clear always
>> >
>> > since the last alternative wins in ALTERNATIVES_2(), i.e. so that KVM will
>> > honor the strictest mitigation (always clear CPU buffers) if multiple
>> > mitigations are selected. E.g. even if the kernel chooses to mitigate
>> > MMIO Stale Data via X86_FEATURE_CLEAR_CPU_BUF_MMIO, some other mitigation
>> > may enable X86_FEATURE_CLEAR_CPU_BUF_VM, and that other thing needs to win.
>> >
>> > Note, decoupling the MMIO mitigation from the L1TF mitigation also fixes
>> > a mostly-benign flaw where KVM wouldn't do any clearing/flushing if the
>> > L1TF mitigation is configured to conditionally flush the L1D, and the MMIO
>> > mitigation but not any other "clear CPU buffers" mitigation is enabled.
>> > For that specific scenario, KVM would skip clearing CPU buffers for the
>> > MMIO mitigation even though the kernel requested a clear on every VM-Enter.
>> >
>> > Note #2, the flaw goes back to the introduction of the MDS mitigation. The
>> > MDS mitigation was inadvertently fixed by commit 43fb862de8f6 ("KVM/VMX:
>> > Move VERW closer to VMentry for MDS mitigation"), but previous kernels
>> > that flush CPU buffers in vmx_vcpu_enter_exit() are affected (though it's
>> > unlikely the flaw is meaningfully exploitable even older kernels).
>> >
>> > Fixes: 650b68a0622f ("x86/kvm/vmx: Add MDS protection when L1D Flush is not active")
>> > Suggested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
>> > Signed-off-by: Sean Christopherson <seanjc@google.com>
>> > ---
>> > arch/x86/kvm/vmx/vmenter.S | 14 +++++++++++++-
>> > arch/x86/kvm/vmx/vmx.c | 13 -------------
>> > 2 files changed, 13 insertions(+), 14 deletions(-)
>> >
>> > diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
>> > index 1f99a98a16a2..61a809790a58 100644
>> > --- a/arch/x86/kvm/vmx/vmenter.S
>> > +++ b/arch/x86/kvm/vmx/vmenter.S
>> > @@ -71,6 +71,7 @@
>> > * @regs: unsigned long * (to guest registers)
>> > * @flags: VMX_RUN_VMRESUME: use VMRESUME instead of VMLAUNCH
>> > * VMX_RUN_SAVE_SPEC_CTRL: save guest SPEC_CTRL into vmx->spec_ctrl
>> > + * VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO: vCPU can access host MMIO
>> > *
>> > * Returns:
>> > * 0 on VM-Exit, 1 on VM-Fail
>> > @@ -137,6 +138,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
>> > /* Load @regs to RAX. */
>> > mov (%_ASM_SP), %_ASM_AX
>> >
>> > + /* Stash "clear for MMIO" in EFLAGS.ZF (used below). */
>> > + ALTERNATIVE_2 "", \
>> > + __stringify(test $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, %ebx), \
>> > + X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
>> > + "", X86_FEATURE_CLEAR_CPU_BUF_VM
>>
>> Ah, so this ALTERNATIVE_2 (instead of just an ALTERNATIVE that checks
>> CLEAR_CPU_BUF_MMIO) is really about avoiding the flags needing to be
>> mutually exclusive?
>
> Yeah, more or less. More specifically, I want to keep the X vs. Y logic in one
> place (well, two if you count both ALTERNATIVE_2 flows), so that in generaly,
> from KVM's perspective, the mitigations are handled as independent things. E.g.
> even if CLEAR_CPU_BUF_VM and CLEAR_CPU_BUF_MMIO are mutually exclusive within
> the kernel (and it's not clear to me that that's 100% guaranteed), I want to
> limit how much of KVM assumes they are exclusive. Partly to avoid "oops, we
> forgot to mitigate that thing you care about", partly so that reading code like
> the setting of VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO doesn't require understanding
> the relationship between CLEAR_CPU_BUF_VM and CLEAR_CPU_BUF_MMIO.
Yeah, this makes sense, if we can avoid creating any unnecessary
and awkward-to-enforce invariants that seems like a win.
>> if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF_MMIO) &&
>> !cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF_VM))
>> test $VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO, %ebx
>>
>> ... right? This is a good idea but I think it warrants a comment to
>> capture the intent, without having the commit message in short-term
>> memory I'd have struggled with this code, I think.
>>
>> > /* Check if vmlaunch or vmresume is needed */
>> > bt $VMX_RUN_VMRESUME_SHIFT, %ebx
>> >
>> > @@ -161,7 +168,12 @@ SYM_FUNC_START(__vmx_vcpu_run)
>> > mov VCPU_RAX(%_ASM_AX), %_ASM_AX
>> >
>> > /* Clobbers EFLAGS.ZF */
>> > - VM_CLEAR_CPU_BUFFERS
>> > + ALTERNATIVE_2 "", \
>> > + __stringify(jz .Lskip_clear_cpu_buffers; \
>>
>> Maybe I'm just an asm noob
>
> Nah, all of this is definitely playing on hard mode. I'm just thankful we don't
> have to deal with the horrors of KVM doing all of this in inline asm. :-D
>
>> I was very impressed by this trick of using CF and ZF together like this!)
>> but I think it's helpful to have the comment like the jnc has below, and
>> Pawan had in his version, to really make the test->jz dependency obvious,
>> since the two instructions are quite far apart.
>>
>>
>> > + CLEAR_CPU_BUFFERS_SEQ; \
>> > + .Lskip_clear_cpu_buffers:), \
>> > + X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
>> > + __CLEAR_CPU_BUFFERS, X86_FEATURE_CLEAR_CPU_BUF_VM
>>
>> Sorry I'm really nitpicking but I think it's justified for asm
>> readability...
>>
>> It's a bit unfortunate that one branch says
>> CLEAR_CPU_BUFFERS_SEQ and the other says __CLEAR_CPU_BUFFERS. With the
>> current code I think it would be more readable to jut have
>> __stringify(CLEAR_CPU_BUFFERS_SEQ) in the CLEAR_CPU_BUF_VM case, then
>> you don't have to mentally expand the macro to see how the two branches
>> actually differ.
>
> No preference here (assuming I understand what you're asking).
>
> Is this better?
>
> /*
> * Note, this sequence consumes *and* clobbers EFLAGS.ZF. The MMIO
> * mitigations uses ZF to track whether or not the vCPU has access to
> * host MMIO (see above), and VERW (the instruction microcode hijacks
> * to clear CPU buffers) writes ZF.
> */
> ALTERNATIVE_2 "", \
> __stringify(jz .Lskip_clear_cpu_buffers; \
> CLEAR_CPU_BUFFERS_SEQ; \
> .Lskip_clear_cpu_buffers:), \
> X86_FEATURE_CLEAR_CPU_BUF_MMIO, \
> __stringify(CLEAR_CPU_BUFFERS_SEQ), X86_FEATURE_CLEAR_CPU_BUF_VM
Yep that looks good to me.