[PATCH v2 00/20] KVM: x86/xen: Fix Xen/GP/PREEMPT_RT issues with rwlock_t

Sean Christopherson posted 20 patches 1 week, 2 days ago
This series fixes sleeping-in-hardirq bugs in KVM's Xen emulation on
PREEMPT_RT, cleans up the now-unnecessary IRQ disabling in GPC lock usage
throughout KVM, and then adds CLASS()-based APIs for utilizing GPC mappings
to dedup code and (hopefully) make it easier to use GPCs in other places.
  
The core issue is that kvm_xen_set_evtchn_fast() and the Xen timer
callback are called from hardirq/atomic context, but on PREEMPT_RT the
GPC rwlock_t is a sleeping lock.

Assuming I can get an Ack on patch 1, I'm planning on grabbing at least these

  KVM: Move {g,p}fn <=> {g,h}pa conversion helpers to kvm_types.h
  KVM: x86/xen: Don't dirty track "vCPU info" page
  KVM: x86/xen: Explicitly tag "shared info" page as never being dirty tracked
  KVM: x86/xen: Extract delivery of event to vCPU into a separate helper
  KVM: x86/xen: Use guard() to grab kvm->srcu around gpc critical sections
  KVM: Remove unnecessary IRQ disabling from GPC lock in pfncache.c
  KVM: x86: Remove unnecessary irqsave from kvm_setup_guest_pvclock()
  KVM: x86/xen: Remove unnecessary irqsave from GPC lock usage in xen.c
  KVM: x86/xen: Use read_trylock() for GPC locks in hardirq/atomic paths
  locking/rt: Use raw_spin_lock_irqsave() in __rwbase_read_unlock()

for 7.2.  If people like the CLASS() stuff, I'll also probably grab these:

  KVM: Add "extended" gpc CLASS() APIs for sometimes-atomic cases
  KVM: x86/xen: Convert event injection to gpc's CLASS() APIs
  KVM: x86/xen: Drop local "kick_vcpu" from __kvm_xen_set_evtchn_fast()
  KVM: x86/xen: Convert xen_get_guest_pvclock() to gpc's CLASS() APIs
  KVM: x86/xen: Convert kvm_xen_set_evtchn_fast() to gpc's CLASS() APIs
  KVM: x86/xen: Convert wait_pending_event() to gpc's CLASS() APIs
  KVM: x86/xen: Don't bother waiting on gpc->lock in SCHEDOP_poll
  KVM: x86/xen: Convert kvm_xen_shared_info_init() to gpc's CLASS() APIs
  KVM: Add CLASS() constructs to automagically handle lock+check of gpc

I do NOT plan on grabbing the record_steal_time change for 7.2 no matter
what, even though I do like the end result, as I still have concerns over the
lack of range-based invalidation for GPCs.  I 100% agree that such problems
are really only due to flawed VMMs and/or setups, but unfortunately history
has shown that there are a surprising number of deployments running what I
would consider flawed setups, e.g. running with NUMA autobalancing and KSM.

I realize I'm being somewhat paranoid, as KVM already uses a GPC for PV
clocks.  But for modern setups, KVM_REQ_CLOCK_UPDATE is a rare event, whereas
KVM will update steal time (when enabled) on every vCPU load.  So I want a
high level of confidence that KVM won't regress "imperfect" setups before
switching to a GPC for steal time (though again, I definitely like the end
result and want to do so).

[*] https://lore.kernel.org/all/20240821202814.711673-2-dwmw2@infradead.org

v2:
 - Add the CLASS() APIs.
 - Move the steal time change to the very end.
 - "Fix" a dirty logging inconsistency with the Xen vCPU info page.

v1: https://lore.kernel.org/all/20260508181717.3230988-1-dwmw2@infradead.org

Carsten Stollmaier (1):
  KVM: x86: Use gfn_to_pfn_cache for record_steal_time

David Woodhouse (5):
  locking/rt: Use raw_spin_lock_irqsave() in __rwbase_read_unlock()
  KVM: x86/xen: Use read_trylock() for GPC locks in hardirq/atomic paths
  KVM: x86/xen: Remove unnecessary irqsave from GPC lock usage in xen.c
  KVM: x86: Remove unnecessary irqsave from kvm_setup_guest_pvclock()
  KVM: Remove unnecessary IRQ disabling from GPC lock in pfncache.c

Sean Christopherson (14):
  KVM: x86/xen: Use guard() to grab kvm->srcu around gpc critical
    sections
  KVM: x86/xen: Extract delivery of event to vCPU into a separate helper
  KVM: x86/xen: Explicitly tag "shared info" page as never being dirty
    tracked
  KVM: x86/xen: Don't dirty track "vCPU info" page
  KVM: Move {g,p}fn <=> {g,h}pa conversion helpers to kvm_types.h
  KVM: Add CLASS() constructs to automagically handle lock+check of gpc
  KVM: x86/xen: Convert kvm_xen_shared_info_init() to gpc's CLASS() APIs
  KVM: x86/xen: Don't bother waiting on gpc->lock in SCHEDOP_poll
  KVM: x86/xen: Convert wait_pending_event() to gpc's CLASS() APIs
  KVM: x86/xen: Convert kvm_xen_set_evtchn_fast() to gpc's CLASS() APIs
  KVM: x86/xen: Convert xen_get_guest_pvclock() to gpc's CLASS() APIs
  KVM: x86/xen: Drop local "kick_vcpu" from __kvm_xen_set_evtchn_fast()
  KVM: x86/xen: Convert event injection to gpc's CLASS() APIs
  KVM: Add "extended" gpc CLASS() APIs for sometimes-atomic cases

 arch/x86/include/asm/kvm_host.h |   2 +-
 arch/x86/kvm/x86.c              | 140 +++++++---------
 arch/x86/kvm/xen.c              | 288 +++++++++++++-------------------
 include/linux/kvm_host.h        |  84 +++++++---
 include/linux/kvm_types.h       |  17 ++
 kernel/locking/rwbase_rt.c      |   5 +-
 virt/kvm/pfncache.c             |  68 ++++++--
 7 files changed, 304 insertions(+), 300 deletions(-)


base-commit: d1568b1332b6b3b36b222c2868fc102727c12a34
-- 
2.54.0.823.g6e5bcc1fc9-goog