If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
because none of the "heavy" paths that trigger an L1D flush were tripped
since the last VM-Enter.
Note, the flaw goes back to the introduction of the MDS mitigation. The
MDS mitigation was inadvertently fixed by commit 43fb862de8f6 ("KVM/VMX:
Move VERW closer to VMentry for MDS mitigation"), but previous kernels
that flush CPU buffers in vmx_vcpu_enter_exit() are affected.
Fixes: 650b68a0622f ("x86/kvm/vmx: Add MDS protection when L1D Flush is not active")
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
arch/x86/kvm/vmx/vmx.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index f87c216d976d..ce556d5dc39b 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6663,7 +6663,7 @@ int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
* information but as all relevant affected CPUs have 32KiB L1D cache size
* there is no point in doing so.
*/
-static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
+static noinstr bool vmx_l1d_flush(struct kvm_vcpu *vcpu)
{
int size = PAGE_SIZE << L1D_CACHE_ORDER;
@@ -6691,14 +6691,14 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
kvm_clear_cpu_l1tf_flush_l1d();
if (!flush_l1d)
- return;
+ return false;
}
vcpu->stat.l1d_flush++;
if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
native_wrmsrq(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
- return;
+ return true;
}
asm volatile(
@@ -6722,6 +6722,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
:: [flush_pages] "r" (vmx_l1d_flush_pages),
[size] "r" (size)
: "eax", "ebx", "ecx", "edx");
+ return true;
}
void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
@@ -7330,8 +7331,9 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
* and is affected by MMIO Stale Data. In such cases mitigation is only
* needed against an MMIO capable guest.
*/
- if (static_branch_unlikely(&vmx_l1d_should_flush))
- vmx_l1d_flush(vcpu);
+ if (static_branch_unlikely(&vmx_l1d_should_flush) &&
+ vmx_l1d_flush(vcpu))
+ ;
else if (static_branch_unlikely(&cpu_buf_vm_clear) &&
(flags & VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO))
x86_clear_cpu_buffers();
--
2.51.0.858.gf9c4a03a3a-goog
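
For readers skimming the diff: the net effect on VM-Enter is the control
flow below. This is a condensed sketch assembled from the hunks above, not
the verbatim code, with comments paraphrasing the patch's reasoning.

if (static_branch_unlikely(&vmx_l1d_should_flush) &&
    vmx_l1d_flush(vcpu)) {
	/*
	 * vmx_l1d_flush() actually flushed; per the comment in the caller,
	 * the L1D flush is assumed to also clear CPU buffers, so the
	 * VERW-based clearing below would be redundant.
	 */
} else if (static_branch_unlikely(&cpu_buf_vm_clear) &&
	   (flags & VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO)) {
	/*
	 * The L1D flush is disabled, or the conditional flush was skipped
	 * because no "heavy" path was tripped: clear CPU buffers to
	 * mitigate MMIO Stale Data against an MMIO capable guest.
	 */
	x86_clear_cpu_buffers();
}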
On Thu, Oct 16, 2025 at 01:04:14PM -0700, Sean Christopherson wrote:
> If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
> mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
> because none of the "heavy" paths that trigger an L1D flush were tripped
> since the last VM-Enter.
>
> Note, the flaw goes back to the introduction of the MDS mitigation.

I don't think it is a flaw. If the L1D flush was skipped because the VMexit
did not touch any interesting data, then there shouldn't be any need to
flush CPU buffers.

Secondly, when the L1D flush is skipped, flushing the MDS-affected buffers
is of no use, because the data could still be extracted from the L1D cache
using L1TF. Isn't it?
On Tue Oct 21, 2025 at 11:18 PM UTC, Pawan Gupta wrote:
> On Thu, Oct 16, 2025 at 01:04:14PM -0700, Sean Christopherson wrote:
>> If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
>> mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
>> because none of the "heavy" paths that trigger an L1D flush were tripped
>> since the last VM-Enter.
>>
>> Note, the flaw goes back to the introduction of the MDS mitigation.
>
> I don't think it is a flaw. If the L1D flush was skipped because the VMexit
> did not touch any interesting data, then there shouldn't be any need to
> flush CPU buffers.
>
> Secondly, when the L1D flush is skipped, flushing the MDS-affected buffers
> is of no use, because the data could still be extracted from the L1D cache
> using L1TF. Isn't it?

This is assuming an equivalence between what L1TF and MMIO Stale Data
exploits can do that isn't really captured in the code/documentation IMO.
This probably felt much more obvious when the vulns were new...

I dunno, in the end this definitely doesn't seem like a terrifying big
deal. I'm not saying the current behaviour is crazy or anything; it's
just slightly surprising, and people with sophisticated opinions about
this might not be getting what they think they are out of the default
setup.

But I have no evidence that these sophisticated dissidents actually
exist; maybe just adding commentary about this rationale is more than
good enough here.
On Wed, Oct 22, 2025, Brendan Jackman wrote:
> On Tue Oct 21, 2025 at 11:18 PM UTC, Pawan Gupta wrote:
> > On Thu, Oct 16, 2025 at 01:04:14PM -0700, Sean Christopherson wrote:
> >> If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
> >> mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
> >> because none of the "heavy" paths that trigger an L1D flush were tripped
> >> since the last VM-Enter.
> >>
> >> Note, the flaw goes back to the introduction of the MDS mitigation.
> >
> > I don't think it is a flaw. If the L1D flush was skipped because the VMexit
> > did not touch any interesting data, then there shouldn't be any need to
> > flush CPU buffers.

But as Brendan alludes to below, that assumes certain aspects of L1TF and
MDS are equal. Obliterating the L1D is far more costly than flushing CPU
buffers, as evidenced by the much more conditional flushing for L1TF.

My read of the L1TF mitigation is that the conditional flushing is a
compromise between performance and security. Skipping the flush doesn't
necessarily mean nothing interesting was accessed, it just means that KVM
didn't hit any of the flows where a large amount of interesting data was
guaranteed to have been accessed.

> > Secondly, when the L1D flush is skipped, flushing the MDS-affected buffers
> > is of no use, because the data could still be extracted from the L1D cache
> > using L1TF. Isn't it?
>
> This is assuming an equivalence between what L1TF and MMIO Stale Data
> exploits can do that isn't really captured in the code/documentation
> IMO.

And again, the cost. To fully mitigate L1TF, KVM would need to flush on
every entry, but that completely tanks performance. But that doesn't mean
the much cheaper CPU buffer clear should be skipped as well.

> This probably felt much more obvious when the vulns were new...
>
> I dunno, in the end this definitely doesn't seem like a terrifying big
> deal. I'm not saying the current behaviour is crazy or anything; it's
> just slightly surprising, and people with sophisticated opinions about
> this might not be getting what they think they are out of the default
> setup.

Ya. I highly doubt this particular combination matters in practice, but I
don't like surprises. And I find it surprising that the behavior of KVM's
mitigation for MMIO Stale Data changes based on whether or not the L1TF
mitigation is enabled.

> But I have no evidence that these sophisticated dissidents actually
> exist; maybe just adding commentary about this rationale is more than
> good enough here.
On Thu Oct 16, 2025 at 8:04 PM UTC, Sean Christopherson wrote:
> If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
> mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
> because none of the "heavy" paths that trigger an L1D flush were tripped
> since the last VM-Enter.
Presumably the assumption here was that the L1TF conditionality is good
enough for the MMIO stale data vuln too? I'm not qualified to assess if
that assumption is true, but also even if it's a good one it's
definitely not obvious to users that the mitigation you pick for L1TF
has this side-effect. So I think I'm on board with calling this a bug.
If anyone turns out to be depending on the current behaviour for
performance I think they should probably add it back as a separate flag.
> MDS mitigation was inadvertently fixed by commit 43fb862de8f6 ("KVM/VMX:
> Move VERW closer to VMentry for MDS mitigation"), but previous kernels
> that flush CPU buffers in vmx_vcpu_enter_exit() are affected.
>
> Fixes: 650b68a0622f ("x86/kvm/vmx: Add MDS protection when L1D Flush is not active")
> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
> arch/x86/kvm/vmx/vmx.c | 12 +++++++-----
> 1 file changed, 7 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index f87c216d976d..ce556d5dc39b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -6663,7 +6663,7 @@ int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
> * information but as all relevant affected CPUs have 32KiB L1D cache size
> * there is no point in doing so.
> */
> -static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
> +static noinstr bool vmx_l1d_flush(struct kvm_vcpu *vcpu)
> {
> int size = PAGE_SIZE << L1D_CACHE_ORDER;
>
> @@ -6691,14 +6691,14 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
> kvm_clear_cpu_l1tf_flush_l1d();
>
> if (!flush_l1d)
> - return;
> + return false;
> }
>
> vcpu->stat.l1d_flush++;
>
> if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
> native_wrmsrq(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
> - return;
> + return true;
> }
>
> asm volatile(
> @@ -6722,6 +6722,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
> :: [flush_pages] "r" (vmx_l1d_flush_pages),
> [size] "r" (size)
> : "eax", "ebx", "ecx", "edx");
> + return true;
The comment in the caller says the L1D flush "includes CPU buffer clear
to mitigate MDS" - do we actually know that this software sequence
mitigates the MMIO stale data vuln like the verw does? (Do we even know if
it mitigates MDS?)
Anyway, if this is an issue, it's orthogonal to this patch.
Reviewed-by: Brendan Jackman <jackmanb@google.com>
> }
>
> void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
> @@ -7330,8 +7331,9 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
> * and is affected by MMIO Stale Data. In such cases mitigation is only
> * needed against an MMIO capable guest.
> */
> - if (static_branch_unlikely(&vmx_l1d_should_flush))
> - vmx_l1d_flush(vcpu);
> + if (static_branch_unlikely(&vmx_l1d_should_flush) &&
> + vmx_l1d_flush(vcpu))
> + ;
> else if (static_branch_unlikely(&cpu_buf_vm_clear) &&
> (flags & VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO))
> x86_clear_cpu_buffers();
On Tue, Oct 21, 2025, Brendan Jackman wrote:
> On Thu Oct 16, 2025 at 8:04 PM UTC, Sean Christopherson wrote:
> > If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
> > mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
> > because none of the "heavy" paths that trigger an L1D flush were tripped
> > since the last VM-Enter.
>
> Presumably the assumption here was that the L1TF conditionality is good
> enough for the MMIO stale data vuln too? I'm not qualified to assess if
> that assumption is true, but also even if it's a good one it's
> definitely not obvious to users that the mitigation you pick for L1TF
> has this side-effect. So I think I'm on board with calling this a bug.

Yeah, that's where I'm at as well.

> If anyone turns out to be depending on the current behaviour for
> performance I think they should probably add it back as a separate flag.

...

> > @@ -6722,6 +6722,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
> > :: [flush_pages] "r" (vmx_l1d_flush_pages),
> > [size] "r" (size)
> > : "eax", "ebx", "ecx", "edx");
> > + return true;
>
> The comment in the caller says the L1D flush "includes CPU buffer clear
> to mitigate MDS" - do we actually know that this software sequence
> mitigates the MMIO stale data vuln like the verw does? (Do we even know if
> it mitigates MDS?)
>
> Anyway, if this is an issue, it's orthogonal to this patch.

Pawan, any idea?
On Tue, Oct 21, 2025 at 09:48:30AM -0700, Sean Christopherson wrote:
> On Tue, Oct 21, 2025, Brendan Jackman wrote:
> > On Thu Oct 16, 2025 at 8:04 PM UTC, Sean Christopherson wrote:
> > > If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
> > > mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
> > > because none of the "heavy" paths that trigger an L1D flush were tripped
> > > since the last VM-Enter.
> >
> > Presumably the assumption here was that the L1TF conditionality is good
> > enough for the MMIO stale data vuln too? I'm not qualified to assess if
> > that assumption is true, but also even if it's a good one it's
> > definitely not obvious to users that the mitigation you pick for L1TF
> > has this side-effect. So I think I'm on board with calling this a bug.
>
> Yeah, that's where I'm at as well.
>
> > If anyone turns out to be depending on the current behaviour for
> > performance I think they should probably add it back as a separate flag.
>
> ...
>
> > > @@ -6722,6 +6722,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
> > > :: [flush_pages] "r" (vmx_l1d_flush_pages),
> > > [size] "r" (size)
> > > : "eax", "ebx", "ecx", "edx");
> > > + return true;
> >
> > The comment in the caller says the L1D flush "includes CPU buffer clear
> > to mitigate MDS" - do we actually know that this software sequence
> > mitigates the MMIO stale data vuln like the verw does? (Do we even know if
> > it mitigates MDS?)
> >
> > Anyway, if this is an issue, it's orthogonal to this patch.
>
> Pawan, any idea?

I want to say yes, but let me first confirm this internally and get back
to you.
On Tue, Oct 21, 2025 at 04:30:12PM -0700, Pawan Gupta wrote:
> On Tue, Oct 21, 2025 at 09:48:30AM -0700, Sean Christopherson wrote:
> > On Tue, Oct 21, 2025, Brendan Jackman wrote:
> > > On Thu Oct 16, 2025 at 8:04 PM UTC, Sean Christopherson wrote:
> > > > If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
> > > > mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
> > > > because none of the "heavy" paths that trigger an L1D flush were tripped
> > > > since the last VM-Enter.
> > >
> > > Presumably the assumption here was that the L1TF conditionality is good
> > > enough for the MMIO stale data vuln too? I'm not qualified to assess if
> > > that assumption is true, but also even if it's a good one it's
> > > definitely not obvious to users that the mitigation you pick for L1TF
> > > has this side-effect. So I think I'm on board with calling this a bug.
> >
> > Yeah, that's where I'm at as well.
> >
> > > If anyone turns out to be depending on the current behaviour for
> > > performance I think they should probably add it back as a separate flag.
> >
> > ...
> >
> > > > @@ -6722,6 +6722,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
> > > > :: [flush_pages] "r" (vmx_l1d_flush_pages),
> > > > [size] "r" (size)
> > > > : "eax", "ebx", "ecx", "edx");
> > > > + return true;
> > >
> > > The comment in the caller says the L1D flush "includes CPU buffer clear
> > > to mitigate MDS" - do we actually know that this software sequence
> > > mitigates the MMIO stale data vuln like the verw does? (Do we even know if
> > > it mitigates MDS?)
> > >
> > > Anyway, if this is an issue, it's orthogonal to this patch.
> >
> > Pawan, any idea?
>
> I want to say yes, but let me first confirm this internally and get back
> to you.

The software sequence for the L1D flush was not validated to mitigate MMIO
Stale Data. To be on the safer side, it is better to not rely on the
sequence.

OTOH, if a user has not updated the microcode to mitigate L1TF, the system
will not have the microcode to mitigate MMIO Stale Data either, because the
microcode for MMIO Stale Data was released after L1TF. Also, I am not aware
of any CPUs that are vulnerable to L1TF and vulnerable to MMIO Stale Data
only (not MDS).

So decoupling the L1D flush from the MMIO Stale Data mitigation won't have
any practical impact on functionality, and it makes the MMIO Stale Data
mitigation consistent with the MDS mitigation. I hope that makes things
clear.
On Tue, Oct 21, 2025 at 04:30:19PM -0700, Pawan Gupta wrote:
> On Tue, Oct 21, 2025 at 09:48:30AM -0700, Sean Christopherson wrote:
> > On Tue, Oct 21, 2025, Brendan Jackman wrote:
> > > On Thu Oct 16, 2025 at 8:04 PM UTC, Sean Christopherson wrote:
> > > > If the L1D flush for L1TF is conditionally enabled, flush CPU buffers to
> > > > mitigate MMIO Stale Data as needed if KVM skips the L1D flush, e.g.
> > > > because none of the "heavy" paths that trigger an L1D flush were tripped
> > > > since the last VM-Enter.
> > >
> > > Presumably the assumption here was that the L1TF conditionality is good
> > > enough for the MMIO stale data vuln too? I'm not qualified to assess if
> > > that assumption is true, but also even if it's a good one it's
> > > definitely not obvious to users that the mitigation you pick for L1TF
> > > has this side-effect. So I think I'm on board with calling this a bug.
> >
> > Yeah, that's where I'm at as well.
> >
> > > If anyone turns out to be depending on the current behaviour for
> > > performance I think they should probably add it back as a separate flag.
> >
> > ...
> >
> > > > @@ -6722,6 +6722,7 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
> > > > :: [flush_pages] "r" (vmx_l1d_flush_pages),
> > > > [size] "r" (size)
> > > > : "eax", "ebx", "ecx", "edx");
> > > > + return true;
> > >
> > > The comment in the caller says the L1D flush "includes CPU buffer clear
> > > to mitigate MDS" - do we actually know that this software sequence
> > > mitigates the MMIO stale data vuln like the verw does? (Do we even know if
> > > it mitigates MDS?)

Thinking more on this, the software sequence is only invoked when the
system doesn't have the L1D flushing feature added by a microcode update.
In such a case the system is not expected to have a flushing VERW either,
which was introduced after L1TF. Also, the admin needs to have a very good
reason for not updating the microcode for 5+ years :-)

Anyways, I have asked for confirmation that the sequence works for MMIO
Stale Data also. I will update once I get a response.

> > > Anyway, if this is an issue, it's orthogonal to this patch.
> >
> > Pawan, any idea?
>
> I want to say yes, but let me first confirm this internally and get back
> to you.
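
For reference, the gating described here is the flush-method selection in
vmx_l1d_flush() itself. Below is a condensed sketch of that selection based
on the diff earlier in the thread; software_l1d_fill() is a hypothetical
stand-in for the inline asm fill loop, not a real kernel helper.

vcpu->stat.l1d_flush++;

if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
	/* Microcode-provided flush interface, shipped with the L1TF updates. */
	native_wrmsrq(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
	return true;
}

/*
 * No microcode support: fall back to the software sequence, which reads a
 * buffer larger than the 32KiB L1D to displace its contents. This is the
 * sequence that was never validated against MMIO Stale Data.
 */
software_l1d_fill(vmx_l1d_flush_pages, PAGE_SIZE << L1D_CACHE_ORDER);
return true;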
On Tue, Oct 21, 2025 at 6:20 PM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> ...
> Thinking more on this, the software sequence is only invoked when the
> system doesn't have the L1D flushing feature added by a microcode update.
> In such a case the system is not expected to have a flushing VERW either,
> which was introduced after L1TF. Also, the admin needs to have a very good
> reason for not updating the microcode for 5+ years :-)

KVM started reporting MD_CLEAR to userspace in Linux v5.2, but it didn't
report L1D_FLUSH to userspace until Linux v6.4, so there are plenty of
virtual CPUs with a flushing VERW that don't have the L1D flushing
feature.
On Mon, Oct 27, 2025 at 03:03:23PM -0700, Jim Mattson wrote:
> On Tue, Oct 21, 2025 at 6:20 PM Pawan Gupta
> <pawan.kumar.gupta@linux.intel.com> wrote:
> >
> > ...
> > Thinking more on this, the software sequence is only invoked when the
> > system doesn't have the L1D flushing feature added by a microcode update.
> > In such a case the system is not expected to have a flushing VERW either,
> > which was introduced after L1TF. Also, the admin needs to have a very good
> > for not updating the microcode for 5+ years :-)
>
> KVM started reporting MD_CLEAR to userspace in Linux v5.2, but it
> didn't report L1D_FLUSH to userspace until Linux v6.4, so there are
> plenty of virtual CPUs with a flushing VERW that don't have the L1D
> flushing feature.
Shouldn't only the L0 hypervisor be doing the L1D_FLUSH?
kvm_get_arch_capabilities()
{
...
/*
* If we're doing cache flushes (either "always" or "cond")
* we will do one whenever the guest does a vmlaunch/vmresume.
* If an outer hypervisor is doing the cache flush for us
* (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
* capability to the guest too, and if EPT is disabled we're not
* vulnerable. Overall, only VMENTER_L1D_FLUSH_NEVER will
* require a nested hypervisor to do a flush of its own.
*/
if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
On Mon, Oct 27, 2025 at 4:17 PM Pawan Gupta
<pawan.kumar.gupta@linux.intel.com> wrote:
>
> On Mon, Oct 27, 2025 at 03:03:23PM -0700, Jim Mattson wrote:
> > On Tue, Oct 21, 2025 at 6:20 PM Pawan Gupta
> > <pawan.kumar.gupta@linux.intel.com> wrote:
> > >
> > > ...
> > > Thinking more on this, the software sequence is only invoked when the
> > > system doesn't have the L1D flushing feature added by a microcode update.
> > > In such a case the system is not expected to have a flushing VERW either,
> > > which was introduced after L1TF. Also, the admin needs to have a very good
> > > reason for not updating the microcode for 5+ years :-)
> > > for not updating the microcode for 5+ years :-)
> >
> > KVM started reporting MD_CLEAR to userspace in Linux v5.2, but it
> > didn't report L1D_FLUSH to userspace until Linux v6.4, so there are
> > plenty of virtual CPUs with a flushing VERW that don't have the L1D
> > flushing feature.
>
> Shouldn't only the L0 hypervisor be doing the L1D_FLUSH?
>
> kvm_get_arch_capabilities()
> {
> ...
> /*
> * If we're doing cache flushes (either "always" or "cond")
> * we will do one whenever the guest does a vmlaunch/vmresume.
> * If an outer hypervisor is doing the cache flush for us
> * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
> * capability to the guest too, and if EPT is disabled we're not
> * vulnerable. Overall, only VMENTER_L1D_FLUSH_NEVER will
> * require a nested hypervisor to do a flush of its own.
> */
> if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
> data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
>
Unless L0 has chosen L1D_FLUSH_NEVER. :)
On GCE's L1TF-vulnerable hosts, we actually do an L1D flush at ASI
entry rather than VM-entry. ASI entries are two orders of magnitude
less frequent than VM-entries, so we get comparable protection to
L1D_FLUSH_ALWAYS at a fraction of the cost.
At the moment, we still do an L1D flush on emulated VM-entry, but
that's just because we have historically advertised
IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY to L1.
On Mon, Oct 27, 2025 at 04:58:10PM -0700, Jim Mattson wrote:
> On Mon, Oct 27, 2025 at 4:17 PM Pawan Gupta
> <pawan.kumar.gupta@linux.intel.com> wrote:
> >
> > On Mon, Oct 27, 2025 at 03:03:23PM -0700, Jim Mattson wrote:
> > > On Tue, Oct 21, 2025 at 6:20 PM Pawan Gupta
> > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > >
> > > > ...
> > > > Thinking more on this, the software sequence is only invoked when the
> > > > system doesn't have the L1D flushing feature added by a microcode update.
> > > > In such a case the system is not expected to have a flushing VERW either,
> > > > which was introduced after L1TF. Also, the admin needs to have a very good
> > > > reason for not updating the microcode for 5+ years :-)
> > >
> > > KVM started reporting MD_CLEAR to userspace in Linux v5.2, but it
> > > didn't report L1D_FLUSH to userspace until Linux v6.4, so there are
> > > plenty of virtual CPUs with a flushing VERW that don't have the L1D
> > > flushing feature.
> >
> > Shouldn't only the L0 hypervisor be doing the L1D_FLUSH?
> >
> > kvm_get_arch_capabilities()
> > {
> > ...
> > /*
> > * If we're doing cache flushes (either "always" or "cond")
> > * we will do one whenever the guest does a vmlaunch/vmresume.
> > * If an outer hypervisor is doing the cache flush for us
> > * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
> > * capability to the guest too, and if EPT is disabled we're not
> > * vulnerable. Overall, only VMENTER_L1D_FLUSH_NEVER will
> > * require a nested hypervisor to do a flush of its own.
> > */
> > if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
> > data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
> >
>
> Unless L0 has chosen L1D_FLUSH_NEVER. :)
>
> On GCE's L1TF-vulnerable hosts, we actually do an L1D flush at ASI
> entry rather than VM-entry. ASI entries are two orders of magnitude
> less frequent than VM-entries, so we get comparable protection to
> L1D_FLUSH_ALWAYS at a fraction of the cost.
>
> At the moment, we still do an L1D flush on emulated VM-entry, but
> that's just because we have historically advertised
> IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY to L1.
Thanks for the background.
I still don't see the problem: CPUs that are vulnerable to L1TF are also
vulnerable to MDS. So, they don't set mmio_stale_data_clear; instead they
set X86_FEATURE_CLEAR_CPU_BUF and execute VERW in __vmx_vcpu_run()
regardless of whether L1D_FLUSH was done.

But, I agree it is best to decouple L1D flush and MMIO Stale Data to
avoid any confusion.
On Mon, Oct 27, 2025 at 05:19:57PM -0700, Pawan Gupta wrote:
> On Mon, Oct 27, 2025 at 04:58:10PM -0700, Jim Mattson wrote:
> > On Mon, Oct 27, 2025 at 4:17 PM Pawan Gupta
> > <pawan.kumar.gupta@linux.intel.com> wrote:
> > >
> > > On Mon, Oct 27, 2025 at 03:03:23PM -0700, Jim Mattson wrote:
> > > > On Tue, Oct 21, 2025 at 6:20 PM Pawan Gupta
> > > > <pawan.kumar.gupta@linux.intel.com> wrote:
> > > > >
> > > > > ...
> > > > > Thinking more on this, the software sequence is only invoked when the
> > > > > system doesn't have the L1D flushing feature added by a microcode update.
> > > > > In such a case the system is not expected to have a flushing VERW either,
> > > > > which was introduced after L1TF. Also, the admin needs to have a very good
> > > > > reason for not updating the microcode for 5+ years :-)
> > > >
> > > > KVM started reporting MD_CLEAR to userspace in Linux v5.2, but it
> > > > didn't report L1D_FLUSH to userspace until Linux v6.4, so there are
> > > > plenty of virtual CPUs with a flushing VERW that don't have the L1D
> > > > flushing feature.
> > >
> > > Shouldn't only the L0 hypervisor be doing the L1D_FLUSH?
> > >
> > > kvm_get_arch_capabilities()
> > > {
> > > ...
> > > /*
> > > * If we're doing cache flushes (either "always" or "cond")
> > > * we will do one whenever the guest does a vmlaunch/vmresume.
> > > * If an outer hypervisor is doing the cache flush for us
> > > * (ARCH_CAP_SKIP_VMENTRY_L1DFLUSH), we can safely pass that
> > > * capability to the guest too, and if EPT is disabled we're not
> > > * vulnerable. Overall, only VMENTER_L1D_FLUSH_NEVER will
> > > * require a nested hypervisor to do a flush of its own.
> > > */
> > > if (l1tf_vmx_mitigation != VMENTER_L1D_FLUSH_NEVER)
> > > data |= ARCH_CAP_SKIP_VMENTRY_L1DFLUSH;
> > >
> >
> > Unless L0 has chosen L1D_FLUSH_NEVER. :)
> >
> > On GCE's L1TF-vulnerable hosts, we actually do an L1D flush at ASI
> > entry rather than VM-entry. ASI entries are two orders of magnitude
> > less frequent than VM-entries, so we get comparable protection to
> > L1D_FLUSH_ALWAYS at a fraction of the cost.
> >
> > At the moment, we still do an L1D flush on emulated VM-entry, but
> > that's just because we have historically advertised
> > IA32_ARCH_CAPABILITIES.SKIP_L1DFL_VMENTRY to L1.
>
> Thanks for the background.
>
> I still don't see the problem: CPUs that are vulnerable to L1TF are also
> vulnerable to MDS. So, they don't set mmio_stale_data_clear; instead they
Sorry I meant cpu_buf_vm_clear instead of mmio_stale_data_clear (I was
looking at a slightly older kernel).
> set X86_FEATURE_CLEAR_CPU_BUF and execute VERW in __vmx_vcpu_run()
> regardless of whether L1D_FLUSH was done.
>
> But, I agree it is best to decouple L1D flush and MMIO Stale Data to
> avoid any confusion.
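
Taken together, Pawan's point boils down to the following mitigation
selection. This is rough pseudo-C with hypothetical predicates
(cpu_has_bug_mds(), cpu_has_bug_mmio_stale_data()) standing in for the real
X86_BUG_* checks; it sketches the behavior described above, not the actual
arch/x86/kernel/cpu/bugs.c code.

if (cpu_has_bug_mds()) {
	/*
	 * MDS-affected parts, a superset of the L1TF-affected parts: VERW
	 * is executed in __vmx_vcpu_run() on every VM-Enter, regardless of
	 * whether the L1D flush ran.
	 */
	setup_force_cpu_cap(X86_FEATURE_CLEAR_CPU_BUF);
} else if (cpu_has_bug_mmio_stale_data()) {
	/*
	 * Parts affected only by MMIO Stale Data: CPU buffers are cleared
	 * only for guests that can access host MMIO, via the
	 * cpu_buf_vm_clear static key that the patch decouples from the
	 * L1D flush.
	 */
	static_branch_enable(&cpu_buf_vm_clear);
}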