Fix KVM's mitigation of the MMIO Stale Data bug, as the current approach
doesn't actually detect whether or not a guest has access to MMIO.  E.g.
KVM_DEV_VFIO_FILE_ADD is entirely optional, and obviously only covers VFIO
devices, and so is a terrible heuristic for "can this vCPU access MMIO?"

To fix the flaw (hopefully), track whether or not a vCPU has access to MMIO
based on the MMU it will run with.  KVM already detects host MMIO when
installing PTEs in order to force host MMIO to UC (EPT bypasses MTRRs), so
feeding that information into the MMU is rather straightforward.

Note, I haven't actually verified this mitigates the MMIO Stale Data bug, but
I think it's safe to say no one has verified the existing code works either.

All that said, and despite what the subject says, my real interest in this
series is to kill off kvm_arch_{start,end}_assignment().  I.e. precisely
identifying MMIO is a means to an end.  Because as evidenced by the MMIO mess
and other bugs (e.g. vDPA devices not getting device posted interrupts),
keying off KVM_DEV_VFIO_FILE_ADD for anything is a bad idea.

The last two patches of this series depend on the stupidly large device
posted interrupts rework:

  https://lore.kernel.org/all/20250523010004.3240643-1-seanjc@google.com

which in turn depends on a not-tiny prep series:

  https://lore.kernel.org/all/20250519232808.2745331-1-seanjc@google.com

Unless you care deeply about those patches, I honestly recommend just ignoring
them.  I posted them as part of this series, because posting two patches that
depend on *four* series seemed even more ridiculous :-)

Side topic: Pawan, I haven't forgotten about your mmio_stale_data_clear =>
cpu_buf_vm_clear rename, I promise I'll review it soon.

Sean Christopherson (5):
  KVM: x86: Avoid calling kvm_is_mmio_pfn() when kvm_x86_ops.get_mt_mask is NULL
  KVM: x86/mmu: Locally cache whether a PFN is host MMIO when making a SPTE
  KVM: VMX: Apply MMIO Stale Data mitigation if KVM maps MMIO into the guest
  Revert "kvm: detect assigned device via irqbypass manager"
  VFIO: KVM: x86: Drop kvm_arch_{start,end}_assignment()

 arch/x86/include/asm/kvm_host.h |  3 +--
 arch/x86/kvm/irq.c              |  9 +------
 arch/x86/kvm/mmu/mmu_internal.h |  3 +++
 arch/x86/kvm/mmu/spte.c         | 43 ++++++++++++++++++++++++++++++---
 arch/x86/kvm/mmu/spte.h         | 10 ++++++++
 arch/x86/kvm/vmx/run_flags.h    | 10 +++++---
 arch/x86/kvm/vmx/vmx.c          |  8 +++++-
 arch/x86/kvm/x86.c              | 18 --------------
 include/linux/kvm_host.h        | 18 --------------
 virt/kvm/vfio.c                 |  3 ---
 10 files changed, 68 insertions(+), 57 deletions(-)


base-commit: 1f0486097459e53d292db749de70e587339267f5
--
2.49.0.1151.ga128411c76-goog
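The core of the approach described above, latching a "maps host MMIO"
indication while building SPTEs and consulting it on the VM-entry path, can
be modeled with a small standalone C sketch.  This is illustrative only: the
struct layout, helper names, and the PFN check below are made up for the
example and are not the KVM implementation.

#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for a KVM MMU: a sticky "has ever mapped host MMIO" flag. */
struct mmu {
	bool has_mapped_host_mmio;
};

struct vcpu {
	struct mmu *mmu;
};

/* Stand-in for the host-MMIO check done while making a SPTE. */
static bool pfn_is_host_mmio(unsigned long pfn)
{
	/* Pretend PFNs above an arbitrary boundary are device MMIO. */
	return pfn >= 0x100000;
}

static void make_spte(struct vcpu *vcpu, unsigned long pfn)
{
	/* ... build and install the actual PTE here ... */
	if (pfn_is_host_mmio(pfn))
		vcpu->mmu->has_mapped_host_mmio = true;
}

/* VM-entry path: clear CPU buffers only if the guest can reach MMIO. */
static bool need_verw_before_entry(const struct vcpu *vcpu)
{
	return vcpu->mmu->has_mapped_host_mmio;
}

int main(void)
{
	struct mmu mmu = { 0 };
	struct vcpu vcpu = { .mmu = &mmu };

	make_spte(&vcpu, 0x1234);	/* regular RAM => no VERW needed */
	printf("clear buffers: %d\n", need_verw_before_entry(&vcpu));

	make_spte(&vcpu, 0x100200);	/* host MMIO => VERW needed */
	printf("clear buffers: %d\n", need_verw_before_entry(&vcpu));
	return 0;
}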
On Thu, 22 May 2025 18:17:51 -0700, Sean Christopherson wrote:
> Fix KVM's mitigation of the MMIO Stale Data bug, as the current approach
> doesn't actually detect whether or not a guest has access to MMIO.  E.g.
> KVM_DEV_VFIO_FILE_ADD is entirely optional, and obviously only covers VFIO
> devices, and so is a terrible heuristic for "can this vCPU access MMIO?"
>
> To fix the flaw (hopefully), track whether or not a vCPU has access to MMIO
> based on the MMU it will run with.  KVM already detects host MMIO when
> installing PTEs in order to force host MMIO to UC (EPT bypasses MTRRs), so
> feeding that information into the MMU is rather straightforward.
>
> [...]

Applied 1-3 to kvm-x86 mmio, and 4-5 to 'kvm-x86 no_assignment' (which is
based on 'irqs' and includes 'mmio' via a merge, to avoid having the mmio
changes depend on the IRQ overhaul).

[1/5] KVM: x86: Avoid calling kvm_is_mmio_pfn() when kvm_x86_ops.get_mt_mask is NULL
      https://github.com/kvm-x86/linux/commit/c126b46e6fa8
[2/5] KVM: x86/mmu: Locally cache whether a PFN is host MMIO when making a SPTE
      https://github.com/kvm-x86/linux/commit/ffe9d7966d01
[3/5] KVM: VMX: Apply MMIO Stale Data mitigation if KVM maps MMIO into the guest
      https://github.com/kvm-x86/linux/commit/83ebe7157483
[4/5] Revert "kvm: detect assigned device via irqbypass manager"
      https://github.com/kvm-x86/linux/commit/ff845e6a84c8
[5/5] VFIO: KVM: x86: Drop kvm_arch_{start,end}_assignment()
      https://github.com/kvm-x86/linux/commit/bbc13ae593e0

--
https://github.com/kvm-x86/kvm-unit-tests/tree/next
On Thu, May 22, 2025 at 06:17:51PM -0700, Sean Christopherson wrote:
> Fix KVM's mitigation of the MMIO Stale Data bug, as the current approach
> doesn't actually detect whether or not a guest has access to MMIO.  E.g.
> KVM_DEV_VFIO_FILE_ADD is entirely optional, and obviously only covers VFIO

I believe this needs userspace co-operation?

> devices, and so is a terrible heuristic for "can this vCPU access MMIO?"
>
> To fix the flaw (hopefully), track whether or not a vCPU has access to MMIO
> based on the MMU it will run with.  KVM already detects host MMIO when
> installing PTEs in order to force host MMIO to UC (EPT bypasses MTRRs), so
> feeding that information into the MMU is rather straightforward.
>
> Note, I haven't actually verified this mitigates the MMIO Stale Data bug, but
> I think it's safe to say no one has verified the existing code works either.

The mitigation was verified for VFIO devices, but of course not for the cases
you mentioned above.  Typically, it is the PCI config registers on some faulty
devices (that don't respect byte-enable) that are subject to MMIO Stale Data.

But it is impossible to test and confirm with absolute certainty that all
other cases are not affected.  Your patches should rule out those cases as
well.

Regarding validating this, if VERW is executed at VMenter, mitigation was
found to be effective.  This is similar to other bugs like MDS.  I am not a
virtualization expert, but I will try to validate whatever I can.

> All that said, and despite what the subject says, my real interest in this
> series is to kill off kvm_arch_{start,end}_assignment().  I.e. precisely
> identifying MMIO is a means to an end.  Because as evidenced by the MMIO mess
> and other bugs (e.g. vDPA devices not getting device posted interrupts),
> keying off KVM_DEV_VFIO_FILE_ADD for anything is a bad idea.
>
> The last two patches of this series depend on the stupidly large device
> posted interrupts rework:
>
>   https://lore.kernel.org/all/20250523010004.3240643-1-seanjc@google.com
>
> which in turn depends on a not-tiny prep series:
>
>   https://lore.kernel.org/all/20250519232808.2745331-1-seanjc@google.com
>
> Unless you care deeply about those patches, I honestly recommend just ignoring
> them.  I posted them as part of this series, because posting two patches that
> depend on *four* series seemed even more ridiculous :-)
>
> Side topic: Pawan, I haven't forgotten about your mmio_stale_data_clear =>
> cpu_buf_vm_clear rename, I promise I'll review it soon.

No problem.
On Wed, May 28, 2025, Pawan Gupta wrote:
> On Thu, May 22, 2025 at 06:17:51PM -0700, Sean Christopherson wrote:
> > Fix KVM's mitigation of the MMIO Stale Data bug, as the current approach
> > doesn't actually detect whether or not a guest has access to MMIO.  E.g.
> > KVM_DEV_VFIO_FILE_ADD is entirely optional, and obviously only covers VFIO
>
> I believe this needs userspace co-operation?

Yes, more or less.  If the userspace VMM knows it doesn't need to trigger the
side effects of KVM_DEV_VFIO_FILE_ADD (e.g. isn't dealing with non-coherent
DMA), and doesn't need the VFIO<=>KVM binding (e.g. for KVM-GT), then AFAIK
it's safe to skip KVM_DEV_VFIO_FILE_ADD, modulo this mitigation.

> > devices, and so is a terrible heuristic for "can this vCPU access MMIO?"
> >
> > To fix the flaw (hopefully), track whether or not a vCPU has access to MMIO
> > based on the MMU it will run with.  KVM already detects host MMIO when
> > installing PTEs in order to force host MMIO to UC (EPT bypasses MTRRs), so
> > feeding that information into the MMU is rather straightforward.
> >
> > Note, I haven't actually verified this mitigates the MMIO Stale Data bug, but
> > I think it's safe to say no one has verified the existing code works either.
>
> The mitigation was verified for VFIO devices, but of course not for the cases
> you mentioned above.  Typically, it is the PCI config registers on some faulty
> devices (that don't respect byte-enable) that are subject to MMIO Stale Data.
>
> But it is impossible to test and confirm with absolute certainty that all

Yeah, no argument there.

> other cases are not affected.  Your patches should rule out those cases as
> well.
>
> Regarding validating this, if VERW is executed at VMenter, mitigation was
> found to be effective.  This is similar to other bugs like MDS.  I am not a
> virtualization expert, but I will try to validate whatever I can.

If you can re-verify the mitigation works for VFIO devices, that's more than
good enough for me.  The bar at this point is to not regress the existing
mitigation, anything beyond that is gravy.

I've verified the KVM mechanics of tracing MMIO mappings fairly well (famous
last words), the only thing I haven't sanity checked is that the existing
coverage for VFIO devices is maintained.
On Mon, Jun 02, 2025 at 04:41:35PM -0700, Sean Christopherson wrote:
> > Regarding validating this, if VERW is executed at VMenter, mitigation was
> > found to be effective.  This is similar to other bugs like MDS.  I am not a
> > virtualization expert, but I will try to validate whatever I can.
>
> If you can re-verify the mitigation works for VFIO devices, that's more than
> good enough for me.  The bar at this point is to not regress the existing
> mitigation, anything beyond that is gravy.

Ok sure.  I'll verify that VERW is getting executed for VFIO devices.

> I've verified the KVM mechanics of tracing MMIO mappings fairly well (famous
> last words), the only thing I haven't sanity checked is that the existing
> coverage for VFIO devices is maintained.
On Mon, Jun 02, 2025 at 06:22:08PM -0700, Pawan Gupta wrote:
> On Mon, Jun 02, 2025 at 04:41:35PM -0700, Sean Christopherson wrote:
> > > Regarding validating this, if VERW is executed at VMenter, mitigation was
> > > found to be effective.  This is similar to other bugs like MDS.  I am not a
> > > virtualization expert, but I will try to validate whatever I can.
> >
> > If you can re-verify the mitigation works for VFIO devices, that's more than
> > good enough for me.  The bar at this point is to not regress the existing
> > mitigation, anything beyond that is gravy.
>
> Ok sure.  I'll verify that VERW is getting executed for VFIO devices.

I have verified that with the below patches, CPU buffer clearing for MMIO
Stale Data is working as expected for a VFIO device:

  KVM: VMX: Apply MMIO Stale Data mitigation if KVM maps MMIO into the guest
  KVM: x86/mmu: Locally cache whether a PFN is host MMIO when making a SPTE
  KVM: x86: Avoid calling kvm_is_mmio_pfn() when kvm_x86_ops.get_mt_mask is NULL

For the above patches:

  Tested-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>

Below are excerpts from the logs with debug prints added:

# virsh start ubuntu24.04                                                    <------ Guest launched

[ 5737.281649] virbr0: port 1(vnet1) entered blocking state
[ 5737.281659] virbr0: port 1(vnet1) entered disabled state
[ 5737.281686] vnet1: entered allmulticast mode
[ 5737.281775] vnet1: entered promiscuous mode
[ 5737.282026] virbr0: port 1(vnet1) entered blocking state
[ 5737.282032] virbr0: port 1(vnet1) entered listening state
[ 5737.775162] vmx_vcpu_enter_exit: 13085 callbacks suppressed
[ 5737.775169] kvm_intel: vmx_vcpu_enter_exit: CPU buffer NOT cleared for MMIO  <----- Buffers not cleared
[ 5737.775192] kvm_intel: vmx_vcpu_enter_exit: CPU buffer NOT cleared for MMIO
[ 5737.775203] kvm_intel: vmx_vcpu_enter_exit: CPU buffer NOT cleared for MMIO
...

Domain 'ubuntu24.04' started

[ 5739.323529] virbr0: port 1(vnet1) entered learning state
[ 5741.372527] virbr0: port 1(vnet1) entered forwarding state
[ 5741.372540] virbr0: topology change detected, propagating
[ 5742.906218] kvm_intel: vmx_vcpu_enter_exit: CPU buffer NOT cleared for MMIO
[ 5742.906232] kvm_intel: vmx_vcpu_enter_exit: CPU buffer NOT cleared for MMIO
[ 5742.906234] kvm_intel: vmx_vcpu_enter_exit: CPU buffer NOT cleared for MMIO
[ 5747.906515] vmx_vcpu_enter_exit: 267825 callbacks suppressed
...

# virsh attach-device ubuntu24.04 vfio.xml --live                            <----- Device attached

[ 5749.913996] ioatdma 0000:00:01.1: Removing dma and dca services
[ 5750.786112] vfio-pci 0000:00:01.1: resetting
[ 5750.891646] vfio-pci 0000:00:01.1: reset done
[ 5750.900521] vfio-pci 0000:00:01.1: resetting
[ 5751.003645] vfio-pci 0000:00:01.1: reset done

Device attached successfully

[ 5751.074292] kvm_intel: vmx_vcpu_enter_exit: CPU buffer cleared for MMIO      <----- Buffers getting cleared
[ 5751.074293] kvm_intel: vmx_vcpu_enter_exit: CPU buffer cleared for MMIO
[ 5751.074294] kvm_intel: vmx_vcpu_enter_exit: CPU buffer cleared for MMIO
[ 5756.076427] vmx_vcpu_enter_exit: 68991 callbacks suppressed
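The "CPU buffer (NOT) cleared for MMIO" lines above, together with the
"callbacks suppressed" messages, are consistent with a ratelimited printk
keyed off the VM-entry run flags.  A rough guess at what such debug
instrumentation could look like is below; the run-flag name and the exact
placement inside vmx_vcpu_enter_exit() are assumptions for illustration, not
taken from the applied series or from the actual debug patch used for the
test:

	/*
	 * Hypothetical debug hack somewhere on the VM-entry path in
	 * vmx_vcpu_enter_exit(), where 'flags' carries the VMX run flags.
	 * The flag name below is assumed; pr_info_ratelimited() is what
	 * produces the "callbacks suppressed" lines seen in dmesg.
	 */
	if (flags & VMX_RUN_CLEAR_CPU_BUFFERS_FOR_MMIO)
		pr_info_ratelimited("%s: CPU buffer cleared for MMIO\n", __func__);
	else
		pr_info_ratelimited("%s: CPU buffer NOT cleared for MMIO\n", __func__);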