This series is the result of the recent PUCK discussion[*] on optimizing the
XCR0/XSS loads that are currently done on every VM-Enter and VM-Exit. My
initial thought that swapping XCR0/XSS outside of the fastpath was spot on;
turns out the only reason they're swapped in the fastpath is because of a
hack-a-fix that papered over an egregious #MC handling bug where the kernel #MC
handler would call schedule() from an atomic context. The resulting #GP due to
trying to swap FPU state with a guest XCR0/XSS was "fixed" by loading the host
values before handling #MCs from the guest.
Thankfully, the #MC mess has long since been cleaned up, so it's once again
safe to swap XCR0/XSS outside of the fastpath (but when IRQs are disabled!).
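Roughly speaking, the end state looks like the below.  This is a hand-waving
sketch to illustrate the idea, not the literal diff; the flow, the helper
names, and the run_flags plumbing are approximations, and the vast majority
of vcpu_enter_guest() is elided.

  static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
  {
          fastpath_t exit_fastpath;
          u64 run_flags = 0;

          /* Request processing, event injection, etc. elided. */

          local_irq_disable();

          /* Load guest XCR0/XSS (and PKRU) once, with IRQs disabled... */
          kvm_load_guest_xsave_state(vcpu);

          for (;;) {
                  exit_fastpath = kvm_x86_call(vcpu_run)(vcpu, run_flags);
                  if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST))
                          break;

                  /* Handle the fastpath exit and immediately re-enter. */
          }

          /* ...and restore the host values only after the loop breaks out. */
          kvm_load_host_xsave_state(vcpu);

          /* Remaining exit processing elided. */

          local_irq_enable();

          return 1;
  }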
As for what may be contributing to the SAP HANA performance improvements when
enabling PKU, my instincts again appear to be spot on. As predicted, the
fastpath savings are ~300 cycles on Intel (~500 on AMD). I.e. if the guest
is literally doing _nothing_ but generating fastpath exits, it will see a
~25% improvement. There's basically zero chance the uplift seen with enabling
PKU is due to eliding XCR0 loads; my guess is that the guest actually uses
protection keys to optimize something.
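For reference, the cost being elided on every entry+exit is essentially an
XSETBV plus a WRMSR (and potentially a WRPKRU).  Paraphrasing
kvm_load_guest_xsave_state() from memory, with the exact field and helper
names being approximations:

  void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
  {
          if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) {
                  /* XSETBV isn't cheap; skip it if guest == host. */
                  if (vcpu->arch.xcr0 != kvm_host.xcr0)
                          xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);

                  /* Ditto for the WRMSR to IA32_XSS. */
                  if (kvm_caps.supported_xss &&
                      vcpu->arch.ia32_xss != kvm_host.xss)
                          wrmsrl(MSR_IA32_XSS, vcpu->arch.ia32_xss);
          }

          /* PKRU swap (WRPKRU) elided here; that's patch 4's problem. */
  }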
Why does kvm_load_guest_xsave_state() show up in perf? Probably because it's
the only visible symbol other than vmx_vmexit() (and vmx_vcpu_run() when not
hammering the fastpath). E.g. running perf top on a running VM instance yields
these numbers with various guest workloads (the middle one is running
mmu_stress_test in the guest, which hammers on mmu_lock in L0). But other than
doing INVD (handled in the fastpath) in a tight loop, there's no perceived perf
improvement from the guest.
Overhead Shared Object Symbol
15.65% [kernel] [k] vmx_vmexit
6.78% [kernel] [k] kvm_vcpu_halt
5.15% [kernel] [k] __srcu_read_lock
4.73% [kernel] [k] kvm_load_guest_xsave_state
4.69% [kernel] [k] __srcu_read_unlock
4.65% [kernel] [k] read_tsc
4.44% [kernel] [k] vmx_sync_pir_to_irr
4.03% [kernel] [k] kvm_apic_has_interrupt

45.52% [kernel] [k] queued_spin_lock_slowpath
24.40% [kernel] [k] vmx_vmexit
2.84% [kernel] [k] queued_write_lock_slowpath
1.92% [kernel] [k] vmx_vcpu_run
1.40% [kernel] [k] vcpu_run
1.00% [kernel] [k] kvm_load_guest_xsave_state
0.84% [kernel] [k] kvm_load_host_xsave_state
0.72% [kernel] [k] mmu_try_to_unsync_pages
0.68% [kernel] [k] __srcu_read_lock
0.65% [kernel] [k] try_get_folio

17.78% [kernel] [k] vmx_vmexit
5.08% [kernel] [k] vmx_vcpu_run
4.24% [kernel] [k] vcpu_run
4.21% [kernel] [k] _raw_spin_lock_irqsave
2.99% [kernel] [k] kvm_load_guest_xsave_state
2.51% [kernel] [k] rcu_note_context_switch
2.47% [kernel] [k] ktime_get_update_offsets_now
2.21% [kernel] [k] kvm_load_host_xsave_state
2.16% [kernel] [k] fput
[*] https://drive.google.com/corp/drive/folders/1DCdvqFGudQc7pxXjM7f35vXogTf9uhD4
Sean Christopherson (4):
KVM: SVM: Handle #MCs in guest outside of fastpath
KVM: VMX: Handle #MCs on VM-Enter/TD-Enter outside of the fastpath
KVM: x86: Load guest/host XCR0 and XSS outside of the fastpath run
loop
KVM: x86: Load guest/host PKRU outside of the fastpath run loop
arch/x86/kvm/svm/svm.c | 20 ++++++++--------
arch/x86/kvm/vmx/main.c | 13 ++++++++++-
arch/x86/kvm/vmx/tdx.c | 3 ---
arch/x86/kvm/vmx/vmx.c | 7 ------
arch/x86/kvm/x86.c | 51 ++++++++++++++++++++++++++++-------------
arch/x86/kvm/x86.h | 2 --
6 files changed, 56 insertions(+), 40 deletions(-)
base-commit: 4cc167c50eb19d44ac7e204938724e685e3d8057
--
2.51.1.930.gacf6e81ea2-goog
On Thu, 30 Oct 2025 15:42:42 -0700, Sean Christopherson wrote:
> This series is the result of the recent PUCK discussion[*] on optimizing the
> XCR0/XSS loads that are currently done on every VM-Enter and VM-Exit. My
> initial thought that swapping XCR0/XSS outside of the fastpath was spot on;
> turns out the only reason they're swapped in the fastpath is because of a
> hack-a-fix that papered over an egregious #MC handling bug where the kernel #MC
> handler would call schedule() from an atomic context. The resulting #GP due to
> trying to swap FPU state with a guest XCR0/XSS was "fixed" by loading the host
> values before handling #MCs from the guest.
>
> [...]
Applied to kvm-x86 misc, thanks!
[1/4] KVM: SVM: Handle #MCs in guest outside of fastpath
https://github.com/kvm-x86/linux/commit/6e640bb5caab
[2/4] KVM: VMX: Handle #MCs on VM-Enter/TD-Enter outside of the fastpath
https://github.com/kvm-x86/linux/commit/8934c592bcbf
[3/4] KVM: x86: Load guest/host XCR0 and XSS outside of the fastpath run loop
https://github.com/kvm-x86/linux/commit/3377a9233d30
[4/4] KVM: x86: Load guest/host PKRU outside of the fastpath run loop
https://github.com/kvm-x86/linux/commit/7df3021b622f
--
https://github.com/kvm-x86/linux/tree/next
On Mon, Nov 10, 2025, Sean Christopherson wrote:
> On Thu, 30 Oct 2025 15:42:42 -0700, Sean Christopherson wrote:
> > This series is the result of the recent PUCK discussion[*] on optimizing the
> > XCR0/XSS loads that are currently done on every VM-Enter and VM-Exit. My
> > initial thought that swapping XCR0/XSS outside of the fastpath was spot on;
> > turns out the only reason they're swapped in the fastpath is because of a
> > hack-a-fix that papered over an egregious #MC handling bug where the kernel #MC
> > handler would call schedule() from an atomic context. The resulting #GP due to
> > trying to swap FPU state with a guest XCR0/XSS was "fixed" by loading the host
> > values before handling #MCs from the guest.
> >
> > [...]
>
> Applied to kvm-x86 misc, thanks!
>
> [1/4] KVM: SVM: Handle #MCs in guest outside of fastpath
>       https://github.com/kvm-x86/linux/commit/6e640bb5caab
> [2/4] KVM: VMX: Handle #MCs on VM-Enter/TD-Enter outside of the fastpath
>       https://github.com/kvm-x86/linux/commit/8934c592bcbf
> [3/4] KVM: x86: Load guest/host XCR0 and XSS outside of the fastpath run loop
>       https://github.com/kvm-x86/linux/commit/3377a9233d30
> [4/4] KVM: x86: Load guest/host PKRU outside of the fastpath run loop
>       https://github.com/kvm-x86/linux/commit/7df3021b622f

I've dropped these for now as patch 2 broke TDX.  I'll send a v2 shortly.
On Thu, 2025-10-30 at 15:42 -0700, Sean Christopherson wrote:
> Sean Christopherson (4):
>   KVM: SVM: Handle #MCs in guest outside of fastpath
>   KVM: VMX: Handle #MCs on VM-Enter/TD-Enter outside of the fastpath
>   KVM: x86: Load guest/host XCR0 and XSS outside of the fastpath run
>     loop
>   KVM: x86: Load guest/host PKRU outside of the fastpath run loop

Reviewed-by: Rick Edgecombe <rick.p.edgecombe@intel.com>

Interesting analysis.
> On Oct 30, 2025, at 6:42 PM, Sean Christopherson <seanjc@google.com> wrote:
>
> This series is the result of the recent PUCK discussion[*] on optimizing the
> XCR0/XSS loads that are currently done on every VM-Enter and VM-Exit. My
> initial thought that swapping XCR0/XSS outside of the fastpath was spot on;
> turns out the only reason they're swapped in the fastpath is because of a
> hack-a-fix that papered over an egregious #MC handling bug where the kernel #MC
> handler would call schedule() from an atomic context. The resulting #GP due to
> trying to swap FPU state with a guest XCR0/XSS was "fixed" by loading the host
> values before handling #MCs from the guest.
>
> Thankfully, the #MC mess has long since been cleaned up, so it's once again
> safe to swap XCR0/XSS outside of the fastpath (but when IRQs are disabled!).

Thank you for doing the diligence on this, I appreciate it!

> As for what may be contributing to the SAP HANA performance improvements when
> enabling PKU, my instincts again appear to be spot on. As predicted, the
> fastpath savings are ~300 cycles on Intel (~500 on AMD). I.e. if the guest
> is literally doing _nothing_ but generating fastpath exits, it will see a
> ~25% improvement. There's basically zero chance the uplift seen with enabling
> PKU is due to eliding XCR0 loads; my guess is that the guest actually uses
> protection keys to optimize something.

Every little bit counts, that's a healthy percentage speedup for fast path
stuff, especially on AMD.

> Why does kvm_load_guest_xsave_state() show up in perf? Probably because it's
> the only visible symbol other than vmx_vmexit() (and vmx_vcpu_run() when not
> hammering the fastpath). E.g. running perf top on a running VM instance yields
> these numbers with various guest workloads (the middle one is running
> mmu_stress_test in the guest, which hammers on mmu_lock in L0). But other than
> doing INVD (handled in the fastpath) in a tight loop, there's no perceived perf
> improvement from the guest.

nit: it'd be nice if these bits were labeled with what they were from (the
middle one you called out above, but what's the first and third one)

> Overhead Shared Object Symbol
> 15.65% [kernel] [k] vmx_vmexit
> 6.78% [kernel] [k] kvm_vcpu_halt
> 5.15% [kernel] [k] __srcu_read_lock
> 4.73% [kernel] [k] kvm_load_guest_xsave_state
> 4.69% [kernel] [k] __srcu_read_unlock
> 4.65% [kernel] [k] read_tsc
> 4.44% [kernel] [k] vmx_sync_pir_to_irr
> 4.03% [kernel] [k] kvm_apic_has_interrupt
>
> 45.52% [kernel] [k] queued_spin_lock_slowpath
> 24.40% [kernel] [k] vmx_vmexit
> 2.84% [kernel] [k] queued_write_lock_slowpath
> 1.92% [kernel] [k] vmx_vcpu_run
> 1.40% [kernel] [k] vcpu_run
> 1.00% [kernel] [k] kvm_load_guest_xsave_state
> 0.84% [kernel] [k] kvm_load_host_xsave_state
> 0.72% [kernel] [k] mmu_try_to_unsync_pages
> 0.68% [kernel] [k] __srcu_read_lock
> 0.65% [kernel] [k] try_get_folio
>
> 17.78% [kernel] [k] vmx_vmexit
> 5.08% [kernel] [k] vmx_vcpu_run
> 4.24% [kernel] [k] vcpu_run
> 4.21% [kernel] [k] _raw_spin_lock_irqsave
> 2.99% [kernel] [k] kvm_load_guest_xsave_state
> 2.51% [kernel] [k] rcu_note_context_switch
> 2.47% [kernel] [k] ktime_get_update_offsets_now
> 2.21% [kernel] [k] kvm_load_host_xsave_state
> 2.16% [kernel] [k] fput
>
> [*] https://drive.google.com/drive/folders/1DCdvqFGudQc7pxXjM7f35vXogTf9uhD4
>
> Sean Christopherson (4):
>   KVM: SVM: Handle #MCs in guest outside of fastpath
>   KVM: VMX: Handle #MCs on VM-Enter/TD-Enter outside of the fastpath
>   KVM: x86: Load guest/host XCR0 and XSS outside of the fastpath run
>     loop
>   KVM: x86: Load guest/host PKRU outside of the fastpath run loop
>
>  arch/x86/kvm/svm/svm.c  | 20 ++++++++--------
>  arch/x86/kvm/vmx/main.c | 13 ++++++++++-
>  arch/x86/kvm/vmx/tdx.c  |  3 ---
>  arch/x86/kvm/vmx/vmx.c  |  7 ------
>  arch/x86/kvm/x86.c      | 51 ++++++++++++++++++++++++++++-------------
>  arch/x86/kvm/x86.h      |  2 --
>  6 files changed, 56 insertions(+), 40 deletions(-)
>
> base-commit: 4cc167c50eb19d44ac7e204938724e685e3d8057
> --
> 2.51.1.930.gacf6e81ea2-goog

Had one conversation starter comment on patch 4, but otherwise, LGTM for the
entire series, thanks again for the help!

Reviewed-By: Jon Kohler <jon@nutanix.com>