[PATCH v3 0/2] KVM: x86/mmu: Run NX huge page recovery under MMU read lock

Vipin Sharma posted 2 patches 2 months, 3 weeks ago
Split NX huge page recovery into two separate flows, one for the TDP MMU
and one for the non-TDP MMU.

The TDP MMU flow will use the MMU read lock and the non-TDP MMU flow will
use the MMU write lock. This change unblocks vCPUs that are waiting for
the MMU read lock while NX huge page recovery is running and zapping MMU
pages.

A Windows guest was showing network latency jitter, which was
root-caused to vCPUs waiting for the MMU read lock while the NX huge page
recovery thread was holding the MMU write lock. Disabling NX huge page
recovery fixed the jitter issue.

So, to optimize NX huge page recovery, it was modified to run under the
MMU read lock. The switch made the jitter issue disappear completely, and
vCPU wait time for the MMU read lock dropped drastically. The patch 2
commit log has data from the tool showing the observed improvement.

Patch 1 splits NX huge page tracking into two lists, one for the TDP
MMU and one for the shadow and legacy MMU. Patch 2 adds support for
running the recovery worker under the MMU read lock for TDP MMU pages.
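
A rough userspace model of the split tracking is below. It is only a
sketch under stated assumptions: all struct and function names are
hypothetical and do not match the KVM data structures, and plain counters
stand in for the real lists. It shows track and untrack helpers that take
a pointer to the per-MMU-type state, so the same code does the accounting
for both kinds of pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-MMU-type tracking state; a counter replaces the list. */
struct nx_page_track {
	size_t nr_pages;
};

/* Hypothetical per-VM state with one tracker per MMU type. */
struct vm_arch {
	struct nx_page_track tdp_mmu_pages;
	struct nx_page_track shadow_mmu_pages;
};

/* Hypothetical shadow page descriptor. */
struct mmu_page {
	bool tdp_mmu_page;
	bool tracked;
};

/* Pick the tracker that matches the page's MMU type. */
static struct nx_page_track *nx_track_for(struct vm_arch *arch,
					  const struct mmu_page *sp)
{
	return sp->tdp_mmu_page ? &arch->tdp_mmu_pages
				: &arch->shadow_mmu_pages;
}

/* Account the page on the given tracker; tracking twice is a no-op. */
static void track_nx_huge_page(struct nx_page_track *track,
			       struct mmu_page *sp)
{
	if (sp->tracked)
		return;
	sp->tracked = true;
	track->nr_pages++;
}

/* Drop the page from the given tracker; untracked pages are ignored. */
static void untrack_nx_huge_page(struct nx_page_track *track,
				 struct mmu_page *sp)
{
	if (!sp->tracked)
		return;
	sp->tracked = false;
	track->nr_pages--;
}
```

A caller would do track_nx_huge_page(nx_track_for(&arch, sp), sp), which
keeps the two recovery flows from ever walking each other's pages.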

v3:
- Use pointers in track and untrack NX huge pages APIs for accounting.
- Remove #ifdefs from v2.
- Fix error in v2 where the TDP MMU flow was using
  cond_resched_rwlock_write() instead of cond_resched_rwlock_read().
- Keep common code for both TDP and non-TDP MMU logic.
- Create wrappers for TDP MMU data structures to avoid #ifdefs.

v2: https://lore.kernel.org/kvm/20240829191135.2041489-1-vipinsh@google.com/#t
- Track legacy and TDP MMU NX huge pages separately.
- Each list has its own calculation of "to_zap", i.e. number of pages
  to zap.
- Unaccount huge page before dirty log check and zap logic in TDP MMU recovery
  worker. Check patch 4 for more details.
- 32 bit build issue fix.
- Sparse warning fix for comparing RCU pointer with non-RCU pointer.
  (sp->spt == spte_to_child_pt())
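
The per-list "to_zap" calculation mentioned in the v2 notes can be
sketched as follows. This is a minimal illustration, assuming the count
is the number of tracked pages divided by a recovery ratio, rounded up,
with a ratio of 0 meaning nothing is zapped. The helper name
nx_pages_to_zap() is hypothetical, and DIV_ROUND_UP is redefined locally
so the snippet builds in userspace.

```c
#include <assert.h>

/* Local copy of the usual round-up division helper. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Number of pages to zap from one list in a single recovery pass.
 * nr_tracked is the number of pages on that list only, so each list
 * is sized independently of the other. A ratio of 0 zaps nothing.
 */
static unsigned long nx_pages_to_zap(unsigned long nr_tracked,
				     unsigned int ratio)
{
	return ratio ? DIV_ROUND_UP(nr_tracked, ratio) : 0;
}
```

For example, with a ratio of 60, a list with 120 tracked pages and a list
with 61 tracked pages would each zap 2 pages in a pass.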


v1: https://lore.kernel.org/kvm/20240812171341.1763297-1-vipinsh@google.com/#t

Vipin Sharma (2):
  KVM: x86/mmu: Track TDP MMU NX huge pages separately
  KVM: x86/mmu: Recover TDP MMU NX huge pages using MMU read lock

 arch/x86/include/asm/kvm_host.h |  13 +++-
 arch/x86/kvm/mmu/mmu.c          | 116 ++++++++++++++++++++++----------
 arch/x86/kvm/mmu/mmu_internal.h |   8 ++-
 arch/x86/kvm/mmu/tdp_mmu.c      |  73 ++++++++++++++++----
 arch/x86/kvm/mmu/tdp_mmu.h      |   6 +-
 5 files changed, 164 insertions(+), 52 deletions(-)


base-commit: 332d2c1d713e232e163386c35a3ba0c1b90df83f
-- 
2.46.0.469.g59c65b2a67-goog