First, rip out KVM's support for virtualizing guest MTRRs on VMX. The
code is costly to maintain, a drag on guest boot performance, imperfect,
and not required for functional correctness with modern guest kernels.
Many details in patch 1's changelog.

With MTRR virtualization gone, always honor guest PAT on Intel CPUs that
support self-snoop, as such CPUs are guaranteed to maintain coherency
even if the guest is aliasing memtypes, e.g. if the host is using WB but
the guest is using WC. Honoring guest PAT is desirable for use cases
where the guest must use WC when accessing memory that is DMA'd from a
non-coherent device that does NOT bounce through VFIO, e.g. for mediated
virtual GPUs.

The SRCU patch adds an API that is effectively documentation for the
memory barrier in srcu_read_lock(). Intel CPUs with self-snoop require a
memory barrier after VM-Exit to ensure coherency, and KVM always does a
srcu_read_lock() before reading guest memory after VM-Exit. Relying on
SRCU to provide the barrier allows KVM to avoid emitting a redundant
barrier of its own.

This series needs a _lot_ more testing; I arguably should have tagged it
RFC, but I'm feeling lucky.
Sean Christopherson (3):
  KVM: x86: Remove VMX support for virtualizing guest MTRR memtypes
  KVM: VMX: Drop support for forcing UC memory when guest CR0.CD=1
  KVM: VMX: Always honor guest PAT on CPUs that support self-snoop

Yan Zhao (2):
  srcu: Add an API for a memory barrier after SRCU read lock
  KVM: x86: Ensure a full memory barrier is emitted in the VM-Exit path

 Documentation/virt/kvm/api.rst        |   6 +-
 Documentation/virt/kvm/x86/errata.rst |  18 +
 arch/x86/include/asm/kvm_host.h       |  15 +-
 arch/x86/kvm/mmu.h                    |   7 +-
 arch/x86/kvm/mmu/mmu.c                |  35 +-
 arch/x86/kvm/mtrr.c                   | 644 ++------------------------
 arch/x86/kvm/vmx/vmx.c                |  40 +-
 arch/x86/kvm/x86.c                    |  24 +-
 arch/x86/kvm/x86.h                    |   4 -
 include/linux/srcu.h                  |  14 +
 10 files changed, 105 insertions(+), 702 deletions(-)

base-commit: 964d0c614c7f71917305a5afdca9178fe8231434
--
2.44.0.278.ge034bb2e1d-goog
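The memtype policy described in the cover letter can be sketched as a small standalone C function. This is only an illustration, not the code from the series: ept_memtype_bits() and both of its parameters are hypothetical names, and the sketch assumes that guest PAT is also honored when a non-coherent DMA device is attached to the VM, which the cover letter does not spell out. The bit positions used (EPT memory type in bits 5:3, "ignore PAT" in bit 6, WB encoded as 6) are the architectural EPT encodings.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural EPT leaf-entry encodings. */
#define EPT_MT_SHIFT  3                     /* memory type lives in bits 5:3 */
#define EPT_IPAT_BIT  (UINT64_C(1) << 6)    /* "ignore guest PAT" */
#define MEMTYPE_WB    UINT64_C(6)           /* write-back */

/*
 * Hypothetical helper: pick the EPT memtype bits for guest RAM.
 *
 * Assumption (not stated in the cover letter): a VM with a non-coherent
 * DMA device has guest PAT honored even without self-snoop.
 */
static uint64_t ept_memtype_bits(bool cpu_has_selfsnoop,
                                 bool vm_has_noncoherent_dma)
{
    /* Nothing requires guest PAT: force WB and ignore guest PAT. */
    if (!cpu_has_selfsnoop && !vm_has_noncoherent_dma)
        return (MEMTYPE_WB << EPT_MT_SHIFT) | EPT_IPAT_BIT;

    /* Honor guest PAT: WB in EPT, guest PAT decides the final type. */
    return MEMTYPE_WB << EPT_MT_SHIFT;
}
```

With IPAT clear, the guest's PAT selection (e.g. WC) combines with the WB EPT type, which is what lets a guest use WC for memory shared with a non-coherent device.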
On Fri, 08 Mar 2024 17:09:24 -0800, Sean Christopherson wrote:
> First, rip out KVM's support for virtualizing guest MTRRs on VMX. The
> code is costly to maintain, a drag on guest boot performance, imperfect, and
> not required for functional correctness with modern guest kernels. Many
> details in patch 1's changelog.
>
> With MTRR virtualization gone, always honor guest PAT on Intel CPUs that
> support self-snoop, as such CPUs are guaranteed to maintain coherency
> even if the guest is aliasing memtypes, e.g. if the host is using WB but
> the guest is using WC. Honoring guest PAT is desirable for use cases
> where the guest must use WC when accessing memory that is DMA'd from a
> non-coherent device that does NOT bounce through VFIO, e.g. for mediated
> virtual GPUs.
>
> [...]
Applied to kvm-x86 mtrrs, to get as much testing as possible before a potential
merge in 6.11.
Paul, if you can take a gander at patch 3, it would be much appreciated.
Thanks!
[1/5] KVM: x86: Remove VMX support for virtualizing guest MTRR memtypes
https://github.com/kvm-x86/linux/commit/0a7b73559b39
[2/5] KVM: VMX: Drop support for forcing UC memory when guest CR0.CD=1
https://github.com/kvm-x86/linux/commit/e1548088ff54
[3/5] srcu: Add an API for a memory barrier after SRCU read lock
https://github.com/kvm-x86/linux/commit/fcfe671e0879
[4/5] KVM: x86: Ensure a full memory barrier is emitted in the VM-Exit path
https://github.com/kvm-x86/linux/commit/eb8d8fc29286
[5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop
https://github.com/kvm-x86/linux/commit/95200f24b862
--
https://github.com/kvm-x86/linux/tree/next
On Wed, Jun 05, 2024 at 04:20:34PM -0700, Sean Christopherson wrote:
> On Fri, 08 Mar 2024 17:09:24 -0800, Sean Christopherson wrote:
> > First, rip out KVM's support for virtualizing guest MTRRs on VMX. The
> > code is costly to maintain, a drag on guest boot performance, imperfect, and
> > not required for functional correctness with modern guest kernels. Many
> > details in patch 1's changelog.
> >
> > With MTRR virtualization gone, always honor guest PAT on Intel CPUs that
> > support self-snoop, as such CPUs are guaranteed to maintain coherency
> > even if the guest is aliasing memtypes, e.g. if the host is using WB but
> > the guest is using WC. Honoring guest PAT is desirable for use cases
> > where the guest must use WC when accessing memory that is DMA'd from a
> > non-coherent device that does NOT bounce through VFIO, e.g. for mediated
> > virtual GPUs.
> >
> > [...]
>
> Applied to kvm-x86 mtrrs, to get as much testing as possible before a potential
> merge in 6.11.
>
> Paul, if you can take a gander at patch 3, it would be much appreciated.
>
> Thanks!
>
> [1/5] KVM: x86: Remove VMX support for virtualizing guest MTRR memtypes
> https://github.com/kvm-x86/linux/commit/0a7b73559b39
> [2/5] KVM: VMX: Drop support for forcing UC memory when guest CR0.CD=1
> https://github.com/kvm-x86/linux/commit/e1548088ff54
> [3/5] srcu: Add an API for a memory barrier after SRCU read lock
> https://github.com/kvm-x86/linux/commit/fcfe671e0879

Looks straightforward enough. We could combine this with the existing
smp_mb__after_srcu_read_unlock(), but if we did that, someone would no
doubt come up with some clever optimization that provided a full barrier
in srcu_read_lock() but not in srcu_read_unlock() or vice versa.

So:

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>

> [4/5] KVM: x86: Ensure a full memory barrier is emitted in the VM-Exit path
> https://github.com/kvm-x86/linux/commit/eb8d8fc29286
> [5/5] KVM: VMX: Always honor guest PAT on CPUs that support self-snoop
> https://github.com/kvm-x86/linux/commit/95200f24b862
>
> --
> https://github.com/kvm-x86/linux/tree/next
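The idea behind patch 3, an API that emits no code and only documents a barrier the lock path already provides, can be modeled in user space with C11 atomics. The sketch below is a toy and not the kernel's SRCU: struct toy_srcu and every toy_* function are hypothetical names invented for illustration, and the real implementation differs in many ways (per-CPU counters, grace periods, and so on).

```c
#include <stdatomic.h>

/* Toy reader-tracking structure; a loose stand-in for an SRCU domain. */
struct toy_srcu {
    atomic_int readers[2];  /* reader counts for the two index slots */
    atomic_int idx;         /* slot that new readers should use */
};

/* Enter a read-side section; returns the slot to pass to unlock. */
static int toy_srcu_read_lock(struct toy_srcu *sp)
{
    int idx = atomic_load_explicit(&sp->idx, memory_order_relaxed) & 1;

    atomic_fetch_add_explicit(&sp->readers[idx], 1, memory_order_relaxed);
    /* Full barrier inside the lock path: callers may rely on it. */
    atomic_thread_fence(memory_order_seq_cst);
    return idx;
}

/* Leave a read-side section entered with the given slot. */
static void toy_srcu_read_unlock(struct toy_srcu *sp, int idx)
{
    atomic_thread_fence(memory_order_seq_cst);
    atomic_fetch_sub_explicit(&sp->readers[idx], 1, memory_order_relaxed);
}

/*
 * Documentation-only hook: intentionally empty because the lock above
 * already contains a full barrier. Calling it records the dependency at
 * the call site, so a future change that weakens the lock has one
 * obvious place to add a real barrier.
 */
static inline void toy_smp_mb__after_srcu_read_lock(void)
{
}
```

Keeping the lock-side and unlock-side hooks separate, as Paul suggests, means an optimization that drops the barrier from only one of the two paths cannot silently break callers of the other.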
Xiangfei found a failure in the KVM unit test rdtsc_vmexit_diff_test,
with the following error log:

"FAIL: RDTSC to VM-exit delta too high in 100 of 100 iterations, last = 902
FAIL: Guest didn't run to completion."

On my side, I fixed it by adding the lines below to the unit test
rdtsc_vmexit_diff_test before entering the guest:

    vmcs_write(HOST_PAT, 0x6);
    vmcs_clear_bits(EXI_CONTROLS, EXI_SAVE_PAT);
    vmcs_set_bits(EXI_CONTROLS, EXI_LOAD_PAT);
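For readers unfamiliar with the PAT layout, the value 0x6 written to HOST_PAT can be decoded with a short standalone C sketch. The helper names pat_entry() and pat_type_name() are hypothetical; the encodings themselves (eight 8-bit entries, with 0 = UC, 1 = WC, 4 = WT, 5 = WP, 6 = WB, 7 = UC-) are the architectural IA32_PAT definitions.

```c
#include <stdint.h>

/* Return the 3-bit memory type stored in PAT entry 'index' (0..7). */
static uint8_t pat_entry(uint64_t pat, unsigned int index)
{
    return (uint8_t)((pat >> (index * 8)) & 0x7);
}

/* Map an architectural PAT encoding to its conventional name. */
static const char *pat_type_name(uint8_t enc)
{
    switch (enc & 0x7) {
    case 0:  return "UC";
    case 1:  return "WC";
    case 4:  return "WT";
    case 5:  return "WP";
    case 6:  return "WB";
    case 7:  return "UC-";
    default: return "reserved";   /* encodings 2 and 3 */
    }
}
```

Read this way, 0x6 sets PAT entry 0 to WB and leaves entries 1 through 7 as UC, whereas the architectural reset value 0x0007040600070406 yields WB, WT, UC-, UC for entries 0 through 3.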
Tested-by: Xiangfei Ma <xiangfeix.ma@intel.com>

I have verified that this method solves the issue.

-----Original Message-----
From: Zhao, Yan Y <yan.y.zhao@intel.com>
Sent: Friday, March 22, 2024 9:08 PM
To: Sean Christopherson <seanjc@google.com>; Ma, XiangfeiX <xiangfeix.ma@intel.com>; Hao, Xudong <xudong.hao@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>; Lai Jiangshan <jiangshanlai@gmail.com>; Paul E. McKenney <paulmck@kernel.org>; Josh Triplett <josh@joshtriplett.org>; kvm@vger.kernel.org; rcu@vger.kernel.org; linux-kernel@vger.kernel.org; Tian, Kevin <kevin.tian@intel.com>; Yiwei Zhang <zzyiwei@google.com>
Subject: Re: [PATCH 0/5] KVM: VMX: Drop MTRR virtualization, honor guest PAT

Xiangfei found a failure in the KVM unit test rdtsc_vmexit_diff_test,
with the following error log:

"FAIL: RDTSC to VM-exit delta too high in 100 of 100 iterations, last = 902
FAIL: Guest didn't run to completion."

On my side, I fixed it by adding the lines below to the unit test
rdtsc_vmexit_diff_test before entering the guest:

    vmcs_write(HOST_PAT, 0x6);
    vmcs_clear_bits(EXI_CONTROLS, EXI_SAVE_PAT);
    vmcs_set_bits(EXI_CONTROLS, EXI_LOAD_PAT);
Tested-by: Xiangfei Ma <xiangfeix.ma@intel.com>

The testing environment is based on the EMR-2S3 platform + CentOS 9
(kernel version: 6.8.0-rc4).
Test cases include cpu, amx, umip, ptvmx, IPIv, vtd, PMU, SGX,
kvm-unit-tests, kvm selftests, etc., plus workload tests in the guest
using Netperf (bridge) and SPECJBB (passthrough NIC).
Except for the known issue and the previously mentioned
"rdtsc_vmexit_diff_test" failure, no other issues were found.

-----Original Message-----
From: Ma, XiangfeiX
Sent: Monday, March 25, 2024 2:56 PM
To: Zhao, Yan Y <yan.y.zhao@intel.com>; Sean Christopherson <seanjc@google.com>; Hao, Xudong <xudong.hao@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>; Lai Jiangshan <jiangshanlai@gmail.com>; Paul E. McKenney <paulmck@kernel.org>; Josh Triplett <josh@joshtriplett.org>; kvm@vger.kernel.org; rcu@vger.kernel.org; linux-kernel@vger.kernel.org; Tian, Kevin <kevin.tian@intel.com>; Yiwei Zhang <zzyiwei@google.com>
Subject: RE: [PATCH 0/5] KVM: VMX: Drop MTRR virtualization, honor guest PAT

Tested-by: Xiangfei Ma <xiangfeix.ma@intel.com>

I have verified that this method solves the issue.

-----Original Message-----
From: Zhao, Yan Y <yan.y.zhao@intel.com>
Sent: Friday, March 22, 2024 9:08 PM
To: Sean Christopherson <seanjc@google.com>; Ma, XiangfeiX <xiangfeix.ma@intel.com>; Hao, Xudong <xudong.hao@intel.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>; Lai Jiangshan <jiangshanlai@gmail.com>; Paul E. McKenney <paulmck@kernel.org>; Josh Triplett <josh@joshtriplett.org>; kvm@vger.kernel.org; rcu@vger.kernel.org; linux-kernel@vger.kernel.org; Tian, Kevin <kevin.tian@intel.com>; Yiwei Zhang <zzyiwei@google.com>
Subject: Re: [PATCH 0/5] KVM: VMX: Drop MTRR virtualization, honor guest PAT

Xiangfei found a failure in the KVM unit test rdtsc_vmexit_diff_test,
with the following error log:

"FAIL: RDTSC to VM-exit delta too high in 100 of 100 iterations, last = 902
FAIL: Guest didn't run to completion."

On my side, I fixed it by adding the lines below to the unit test
rdtsc_vmexit_diff_test before entering the guest:

    vmcs_write(HOST_PAT, 0x6);
    vmcs_clear_bits(EXI_CONTROLS, EXI_SAVE_PAT);
    vmcs_set_bits(EXI_CONTROLS, EXI_LOAD_PAT);
> First, rip out KVM's support for virtualizing guest MTRRs on VMX. The code is
> costly to maintain, a drag on guest boot performance, imperfect, and not
> required for functional correctness with modern guest kernels. Many details
> in patch 1's changelog.
>
> With MTRR virtualization gone, always honor guest PAT on Intel CPUs that
> support self-snoop, as such CPUs are guaranteed to maintain coherency
> even if the guest is aliasing memtypes, e.g. if the host is using WB but the
> guest is using WC. Honoring guest PAT is desirable for use cases where the
> guest must use WC when accessing memory that is DMA'd from a
> non-coherent device that does NOT bounce through VFIO, e.g. for mediated
> virtual GPUs.
>
> The SRCU patch adds an API that is effectively documentation for the
> memory barrier in srcu_read_lock(). Intel CPUs with self-snoop require a
> memory barrier after VM-Exit to ensure coherency, and KVM always does a
> srcu_read_lock() before reading guest memory after VM-Exit. Relying on
> SRCU to provide the barrier allows KVM to avoid emitting a redundant barrier
> of its own.
>
> This series needs a _lot_ more testing; I arguably should have tagged it RFC,
> but I'm feeling lucky.
>
> Sean Christopherson (3):
>   KVM: x86: Remove VMX support for virtualizing guest MTRR memtypes
>   KVM: VMX: Drop support for forcing UC memory when guest CR0.CD=1
>   KVM: VMX: Always honor guest PAT on CPUs that support self-snoop
>
> Yan Zhao (2):
>   srcu: Add an API for a memory barrier after SRCU read lock
>   KVM: x86: Ensure a full memory barrier is emitted in the VM-Exit path
>
>  Documentation/virt/kvm/api.rst        |   6 +-
>  Documentation/virt/kvm/x86/errata.rst |  18 +
>  arch/x86/include/asm/kvm_host.h       |  15 +-
>  arch/x86/kvm/mmu.h                    |   7 +-
>  arch/x86/kvm/mmu/mmu.c                |  35 +-
>  arch/x86/kvm/mtrr.c                   | 644 ++------------------------
>  arch/x86/kvm/vmx/vmx.c                |  40 +-
>  arch/x86/kvm/x86.c                    |  24 +-
>  arch/x86/kvm/x86.h                    |   4 -
>  include/linux/srcu.h                  |  14 +
>  10 files changed, 105 insertions(+), 702 deletions(-)
>
>
> base-commit: 964d0c614c7f71917305a5afdca9178fe8231434
> --
> 2.44.0.278.ge034bb2e1d-goog
>

Verified iGPU passthrough (GVT-d) on Intel platforms
(TGL Core(TM) i5-1135G7 / ADL Core(TM) i7-12700 / RPL / MTL Ultra 7)
+ Ubuntu 22.04 LTS.
Both the Linux Ubuntu 22.04 VM and the Windows 10 VM booted successfully.
The 3D benchmark GLmark2 runs as expected in the guest VM.

Tested-by: Yongwei Ma <yongwei.ma@intel.com>

Best Regards,
Yongwei Ma