Add a blog post on "Micro-Optimizing KVM VM-Exits"

[PATCH v2] Add a blog post on "Micro-Optimizing KVM VM-Exits"

Posted by Kashyap Chamarthy 4 years, 5 months ago

This blog post summarizes the talk "Micro-Optimizing KVM VM-Exits"[1],
given by Andrea Arcangeli at the recently concluded KVM Forum 2019.

[1] https://kvmforum2019.sched.com/event/Tmwr/micro-optimizing-kvm-vm-exits-andrea-arcangeli-red-hat-inc

Signed-off-by: Kashyap Chamarthy <kchamart@redhat.com>
---
v2: Address Rich W.M Jones' feedback
---
 ...019-11-06-micro-optimizing-kvm-vmexits.txt | 115 ++++++++++++++++++
 1 file changed, 115 insertions(+)
 create mode 100644 _posts/2019-11-06-micro-optimizing-kvm-vmexits.txt

diff --git a/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt
new file mode 100644
index 0000000000000000000000000000000000000000..f4a28d58ddb40103dd599fdfd861eeb4c41ed976
--- /dev/null
+++ b/_posts/2019-11-06-micro-optimizing-kvm-vmexits.txt
@@ -0,0 +1,115 @@
+---
+layout: post
+title: "Micro-Optimizing KVM VM-Exits"
+date:   2019-11-08
+categories: [kvm, optimization]
+---
+
+Background on VM-Exits
+----------------------
+
+KVM (Kernel-based Virtual Machine) is the Linux kernel module that
+allows a host to run virtualized guests (Linux, Windows, etc).  The KVM
+"guest execution loop", with QEMU (the open source emulator and
+virtualizer) as its user space, is roughly as follows: QEMU issues the
+ioctl(), KVM_RUN, to tell KVM to prepare to enter the CPU's "Guest Mode"
+-- a special processor mode which allows guest code to safely run
+directly on the physical CPU.  The guest code, which is inside a "jail"
+and thus cannot interfere with the rest of the system, keeps running on
+the hardware until it encounters a request it cannot handle.  Then the
+processor gives the control back (referred to as "VM-Exit") either to
+kernel space, or to the user space to handle the request.  Once the
+request is handled, native execution of guest code on the processor
+resumes again.  And the loop goes on.
+
+There are dozens of reasons for VM-Exits (Intel's Software Developer
+Manual outlines 64 "Basic Exit Reasons").  For example, when a guest
+needs to emulate the CPUID instruction, it causes a "light-weight exit"
+to kernel space, because CPUID (among a few others) is emulated in the
+kernel itself, for performance reasons.  But when the kernel _cannot_
+handle a request, e.g. to emulate certain hardware, it results in a
+"heavy-weight exit" to QEMU, to perform the emulation.  These VM-Exits
+and subsequent re-entries ("VM-Enters"), even the light-weight ones, can
+be expensive.  What can be done about it?
+
+Guest workloads that are hard to virtualize
+-------------------------------------------
+
+At the 2019 edition of the KVM Forum in Lyon, kernel developer, Andrea
+Arcangeli, attempted to address the kernel part of minimizing VM-Exits.
+
+His talk touched on the cost of VM-Exits into the kernel, especially for
+guest workloads (e.g. enterprise databases) that are sensitive to their
+performance penalty.  However, these workloads cannot avoid triggering
+VM-Exits with a high frequency.  Andrea then outlined some of the
+optimizations he's been working on to improve the VM-Exit performance in
+the KVM code path -- especially in light of applying mitigations for
+speculative execution flaws (Spectre v2, MDS, L1TF).
+
+Andrea gave a brief recap of the different kinds of speculative
+execution attacks (retpolines, IBPB, PTI, SSBD, etc).  Followed by that
+he outlined the performance impact of Spectre-v2 mitigations in context
+of KVM.
+
+The microbechmark: CPUID in a one million loop
+----------------------------------------------
+
+The synthetic microbenchmark (meaning, focus on measuring the
+performance of a specific area of code) Andrea used was to run the CPUID
+instruction one million times, without any GCC optimizations or caching.
+This was done to test the latency of VM-Exits.
+
+While stressing that the results of these microbenchmarks do not
+represent real-world workloads, he had two goals in mind with it: (a)
+explain how the software mitigation works; and (b) to justify to the
+broader community the value of the software optimizations he's working
+on in KVM.
+
+Andrea then reasoned through several interesting graphs that show how
+CPU computation time gets impacted when you disable or enable the
+various kernel-space mitigations for Spectre v2, L1TF, MDS, et al.
+
+The proposal: "KVM Monolithic"
+------------------------------
+
+Based on his investigation, Andrea proposed a patch series, ["KVM
+monolithc"](https://lwn.net/Articles/800870/), to get rid of the KVM
+common module, 'kvm.ko'.  Instead the KVM common code gets linked twice
+into each of the vendor-specific KVM modules, 'kvm-intel.ko' and
+'kvm-amd.ko'.
+
+The reason for doing this is that the 'kvm.ko' module indirectly calls
+(via the "retpoline" technique) the vendor-specific KVM modules at every
+VM-Exit, several times.  These indirect calls were not optimal before,
+but the "retpoline" mitigation (which isolates indirect branches, that
+allow a CPU to execute code from arbitrary locations, from speculative
+execution) for Spectre v2 compounds the problem, as it degrades
+performance.
+
+This approach will result in a few MiB of increased disk space for
+'kvm-intel.ko' and 'kvm-amd.ko', but the upside in saved indirect calls,
+and the elimination of "retpoline" overhead at run-time more than
+compensate for it.
+
+With the "KVM Monolithic" patch series applied, Andrea's microbenchmarks
+show a double-digit improvement in performance with default mitigations
+(for Spectre v2, et al) enabled on both Intel 'VMX' and AMD 'SVM'.  And
+with 'spectre_v2=off' or for CPUs with IBRS_ALL in ARCH_CAPABILITIES
+"KVM monolithic" still improve[s] performance, albiet it's on the order
+of 1%.
+
+Conclusion
+----------
+
+Removal of the common KVM module has a non-negligible positive
+performance impact.  And the "KVM Monolitic" patch series is still
+actively being reviewed, modulo some pending clean-ups.  Based on the
+upstream review discussion, KVM Maintainer, Paolo Bonzini, and other
+reviewers seemed amenable to merge the series.
+
+Although, we still have to deal with mitigations for 'indirect branch
+prediction' for a long time, reducing the VM-Exit latency is important
+in general; and more specifically, for guest workloads that happen to
+trigger frequent VM-Exits, without having to disable Spectre v2
+mitigations on the host, as Andrea stated in the cover letter of his
+patch series.
-- 
2.21.0

Re: [PATCH v2] Add a blog post on "Micro-Optimizing KVM VM-Exits"

Posted by no-reply@patchew.org 4 years, 5 months ago

Patchew URL: https://patchew.org/QEMU/20191115102406.31316-1-kchamart@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TEST    iotest-qcow2: 268
Failures: 192
Failed 1 of 108 iotests
make: *** [check-tests/check-block.sh] Error 1
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in <module>
    sys.exit(main())
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=f97d072baeef472cb6f83a5e15c3e32b', '-u', '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-jlp3g276/src/docker-src.2019-11-15-06.40.11.6937:/var/tmp/qemu:z,ro', 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=f97d072baeef472cb6f83a5e15c3e32b
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-jlp3g276/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    13m22.725s
user    0m8.050s


The full log is available at
http://patchew.org/logs/20191115102406.31316-1-kchamart@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com