From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Date: Wed, 21 Dec 2022 18:34:49 -0800
Message-ID: <20221222023457.1764-2-vipinsh@google.com>
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
Subject: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

mmu_shrink_scan() is very disruptive to VMs.
It picks the first VM in vm_list and zaps the oldest page, which is most likely an upper-level SPTE and the most likely to be reused. Prior to the TDP MMU, this was even more disruptive in the nested VM case, since L1 SPTEs will be the oldest even though most of the entries are for L2 SPTEs. As discussed in https://lore.kernel.org/lkml/Y45dldZnI6OIf+a5@google.com/ the shrinker logic has not been very useful in actually keeping VMs performant or in reducing memory usage.

Change mmu_shrink_scan() to free pages from the vCPU's shadow page cache. Freeing pages from the cache doesn't cause vCPU exits, therefore a VM's performance should not be affected. This also allows changing cache capacities without worrying too much about high memory usage in the caches.

Tested this change by running dirty_log_perf_test while dropping the cache via "echo 2 > /proc/sys/vm/drop_caches" at a 1 second interval continuously. There were WARN_ON(!mc->nobjs) messages printed in the kernel logs from kvm_mmu_memory_cache_alloc(), which is expected.

Suggested-by: Sean Christopherson
Signed-off-by: Vipin Sharma
---
 arch/x86/include/asm/kvm_host.h |   5 +
 arch/x86/kvm/mmu/mmu.c          | 163 +++++++++++++++++++-------------
 arch/x86/kvm/mmu/mmu_internal.h |   2 +
 arch/x86/kvm/mmu/tdp_mmu.c      |   3 +-
 include/linux/kvm_host.h        |   1 +
 virt/kvm/kvm_main.c             |  11 ++-
 6 files changed, 114 insertions(+), 71 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa4eb8cfcd7e..89cc809e4a00 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -786,6 +786,11 @@ struct kvm_vcpu_arch {
     struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
     struct kvm_mmu_memory_cache mmu_page_header_cache;
 
+    /*
+     * Protects change in size of mmu_shadow_page_cache cache.
+     */
+    spinlock_t mmu_shadow_page_cache_lock;
+
     /*
      * QEMU userspace and the guest each have their own FPU state.
      * In vcpu_run, we switch between the user and guest FPU contexts.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 254bc46234e0..157417e1cb6e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -164,7 +164,10 @@ struct kvm_shadow_walk_iterator {
 
 static struct kmem_cache *pte_list_desc_cache;
 struct kmem_cache *mmu_page_header_cache;
-static struct percpu_counter kvm_total_used_mmu_pages;
+/*
+ * Total number of unused pages in MMU shadow page cache.
+ */ +static struct percpu_counter kvm_total_unused_mmu_pages; =20 static void mmu_spte_set(u64 *sptep, u64 spte); =20 @@ -655,6 +658,22 @@ static void walk_shadow_page_lockless_end(struct kvm_v= cpu *vcpu) } } =20 +static int mmu_topup_sp_memory_cache(struct kvm_mmu_memory_cache *cache, + spinlock_t *cache_lock) +{ + int orig_nobjs; + int r; + + spin_lock(cache_lock); + orig_nobjs =3D cache->nobjs; + r =3D kvm_mmu_topup_memory_cache(cache, PT64_ROOT_MAX_LEVEL); + if (orig_nobjs !=3D cache->nobjs) + percpu_counter_add(&kvm_total_unused_mmu_pages, + (cache->nobjs - orig_nobjs)); + spin_unlock(cache_lock); + return r; +} + static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indir= ect) { int r; @@ -664,8 +683,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcp= u, bool maybe_indirect) 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM); if (r) return r; - r =3D kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache, - PT64_ROOT_MAX_LEVEL); + r =3D mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache, + &vcpu->arch.mmu_shadow_page_cache_lock); if (r) return r; if (maybe_indirect) { @@ -678,10 +697,25 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *v= cpu, bool maybe_indirect) PT64_ROOT_MAX_LEVEL); } =20 +static void mmu_free_sp_memory_cache(struct kvm_mmu_memory_cache *cache, + spinlock_t *cache_lock) +{ + int orig_nobjs; + + spin_lock(cache_lock); + orig_nobjs =3D cache->nobjs; + kvm_mmu_free_memory_cache(cache); + if (orig_nobjs) + percpu_counter_sub(&kvm_total_unused_mmu_pages, orig_nobjs); + + spin_unlock(cache_lock); +} + static void mmu_free_memory_caches(struct kvm_vcpu *vcpu) { kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache); - kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache); + mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache, + &vcpu->arch.mmu_shadow_page_cache_lock); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache); } @@ -1693,27 +1727,15 @@ static int is_empty_shadow_page(u64 *spt) } #endif =20 -/* - * This value is the sum of all of the kvm instances's - * kvm->arch.n_used_mmu_pages values. We need a global, - * aggregate version in order to make the slab shrinker - * faster - */ -static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr) -{ - kvm->arch.n_used_mmu_pages +=3D nr; - percpu_counter_add(&kvm_total_used_mmu_pages, nr); -} - static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp) { - kvm_mod_used_mmu_pages(kvm, +1); + kvm->arch.n_used_mmu_pages++; kvm_account_pgtable_pages((void *)sp->spt, +1); } =20 static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *s= p) { - kvm_mod_used_mmu_pages(kvm, -1); + kvm->arch.n_used_mmu_pages--; kvm_account_pgtable_pages((void *)sp->spt, -1); } =20 @@ -2150,8 +2172,31 @@ struct shadow_page_caches { struct kvm_mmu_memory_cache *page_header_cache; struct kvm_mmu_memory_cache *shadow_page_cache; struct kvm_mmu_memory_cache *shadowed_info_cache; + /* + * Protects change in size of shadow_page_cache cache. 
+ */ + spinlock_t *shadow_page_cache_lock; }; =20 +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_pa= ge_cache, + spinlock_t *cache_lock) +{ + int orig_nobjs; + void *page; + + if (!cache_lock) { + spin_lock(cache_lock); + orig_nobjs =3D shadow_page_cache->nobjs; + } + page =3D kvm_mmu_memory_cache_alloc(shadow_page_cache); + if (!cache_lock) { + if (orig_nobjs) + percpu_counter_dec(&kvm_total_unused_mmu_pages); + spin_unlock(cache_lock); + } + return page; +} + static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm, struct shadow_page_caches *caches, gfn_t gfn, @@ -2161,7 +2206,8 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page= (struct kvm *kvm, struct kvm_mmu_page *sp; =20 sp =3D kvm_mmu_memory_cache_alloc(caches->page_header_cache); - sp->spt =3D kvm_mmu_memory_cache_alloc(caches->shadow_page_cache); + sp->spt =3D kvm_mmu_sp_memory_cache_alloc(caches->shadow_page_cache, + caches->shadow_page_cache_lock); if (!role.direct) sp->shadowed_translation =3D kvm_mmu_memory_cache_alloc(caches->shadowed= _info_cache); =20 @@ -2218,6 +2264,7 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(s= truct kvm_vcpu *vcpu, .page_header_cache =3D &vcpu->arch.mmu_page_header_cache, .shadow_page_cache =3D &vcpu->arch.mmu_shadow_page_cache, .shadowed_info_cache =3D &vcpu->arch.mmu_shadowed_info_cache, + .shadow_page_cache_lock =3D &vcpu->arch.mmu_shadow_page_cache_lock }; =20 return __kvm_mmu_get_shadow_page(vcpu->kvm, vcpu, &caches, gfn, role); @@ -5916,6 +5963,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) vcpu->arch.mmu_page_header_cache.gfp_zero =3D __GFP_ZERO; =20 vcpu->arch.mmu_shadow_page_cache.gfp_zero =3D __GFP_ZERO; + spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock); =20 vcpu->arch.mmu =3D &vcpu->arch.root_mmu; vcpu->arch.walk_mmu =3D &vcpu->arch.root_mmu; @@ -6051,11 +6099,6 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm) kvm_tdp_mmu_zap_invalidated_roots(kvm); } =20 -static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm) -{ - return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages)); -} - static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, struct kvm_page_track_notifier_node *node) @@ -6277,6 +6320,7 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_spl= it(struct kvm *kvm, u64 *hu /* Direct SPs do not require a shadowed_info_cache. */ caches.page_header_cache =3D &kvm->arch.split_page_header_cache; caches.shadow_page_cache =3D &kvm->arch.split_shadow_page_cache; + caches.shadow_page_cache_lock =3D NULL; =20 /* Safe to pass NULL for vCPU since requesting a direct SP. */ return __kvm_mmu_get_shadow_page(kvm, NULL, &caches, gfn, role); @@ -6646,66 +6690,49 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm,= u64 gen) static unsigned long mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) { - struct kvm *kvm; - int nr_to_scan =3D sc->nr_to_scan; + struct kvm_mmu_memory_cache *cache; + struct kvm *kvm, *first_kvm =3D NULL; unsigned long freed =3D 0; + /* spinlock for memory cache */ + spinlock_t *cache_lock; + struct kvm_vcpu *vcpu; + unsigned long i; =20 mutex_lock(&kvm_lock); =20 list_for_each_entry(kvm, &vm_list, vm_list) { - int idx; - LIST_HEAD(invalid_list); - - /* - * Never scan more than sc->nr_to_scan VM instances. - * Will not hit this condition practically since we do not try - * to shrink more than one VM and it is very unlikely to see - * !n_used_mmu_pages so many times. 
- */ - if (!nr_to_scan--) + if (first_kvm =3D=3D kvm) break; - /* - * n_used_mmu_pages is accessed without holding kvm->mmu_lock - * here. We may skip a VM instance errorneosly, but we do not - * want to shrink a VM that only started to populate its MMU - * anyway. - */ - if (!kvm->arch.n_used_mmu_pages && - !kvm_has_zapped_obsolete_pages(kvm)) - continue; + if (!first_kvm) + first_kvm =3D kvm; + list_move_tail(&kvm->vm_list, &vm_list); =20 - idx =3D srcu_read_lock(&kvm->srcu); - write_lock(&kvm->mmu_lock); + kvm_for_each_vcpu(i, vcpu, kvm) { + cache =3D &vcpu->arch.mmu_shadow_page_cache; + cache_lock =3D &vcpu->arch.mmu_shadow_page_cache_lock; + if (READ_ONCE(cache->nobjs)) { + spin_lock(cache_lock); + freed +=3D kvm_mmu_empty_memory_cache(cache); + spin_unlock(cache_lock); + } =20 - if (kvm_has_zapped_obsolete_pages(kvm)) { - kvm_mmu_commit_zap_page(kvm, - &kvm->arch.zapped_obsolete_pages); - goto unlock; } =20 - freed =3D kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan); - -unlock: - write_unlock(&kvm->mmu_lock); - srcu_read_unlock(&kvm->srcu, idx); - - /* - * unfair on small ones - * per-vm shrinkers cry out - * sadness comes quickly - */ - list_move_tail(&kvm->vm_list, &vm_list); - break; + if (freed >=3D sc->nr_to_scan) + break; } =20 + if (freed) + percpu_counter_sub(&kvm_total_unused_mmu_pages, freed); mutex_unlock(&kvm_lock); + percpu_counter_sync(&kvm_total_unused_mmu_pages); return freed; } =20 static unsigned long mmu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) { - return percpu_counter_read_positive(&kvm_total_used_mmu_pages); + return percpu_counter_sum_positive(&kvm_total_unused_mmu_pages); } =20 static struct shrinker mmu_shrinker =3D { @@ -6820,7 +6847,7 @@ int kvm_mmu_vendor_module_init(void) if (!mmu_page_header_cache) goto out; =20 - if (percpu_counter_init(&kvm_total_used_mmu_pages, 0, GFP_KERNEL)) + if (percpu_counter_init(&kvm_total_unused_mmu_pages, 0, GFP_KERNEL)) goto out; =20 ret =3D register_shrinker(&mmu_shrinker, "x86-mmu"); @@ -6830,7 +6857,7 @@ int kvm_mmu_vendor_module_init(void) return 0; =20 out_shrinker: - percpu_counter_destroy(&kvm_total_used_mmu_pages); + percpu_counter_destroy(&kvm_total_unused_mmu_pages); out: mmu_destroy_caches(); return ret; @@ -6847,7 +6874,7 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu) void kvm_mmu_vendor_module_exit(void) { mmu_destroy_caches(); - percpu_counter_destroy(&kvm_total_used_mmu_pages); + percpu_counter_destroy(&kvm_total_unused_mmu_pages); unregister_shrinker(&mmu_shrinker); } =20 diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index ac00bfbf32f6..c2a342028b6a 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -325,4 +325,6 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cach= e *mc); void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp); void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *s= p); =20 +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_pa= ge_cache, + spinlock_t *cache_lock); #endif /* __KVM_X86_MMU_INTERNAL_H */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 764f7c87286f..4974fa96deff 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -264,7 +264,8 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm= _vcpu *vcpu) struct kvm_mmu_page *sp; =20 sp =3D kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache); - sp->spt =3D 
kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
+    sp->spt = kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache,
+                                            &vcpu->arch.mmu_shadow_page_cache_lock);
 
     return sp;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 01aad8b74162..efd9b38ea9a2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1362,6 +1362,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
 int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
 int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min);
 int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc);
+int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc);
 void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
 void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 13e88297f999..f2d762878b97 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -438,8 +438,10 @@ int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
     return mc->nobjs;
 }
 
-void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc)
 {
+    int freed = mc->nobjs;
+
     while (mc->nobjs) {
         if (mc->kmem_cache)
             kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
@@ -447,8 +449,13 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
         free_page((unsigned long)mc->objects[--mc->nobjs]);
     }
 
-    kvfree(mc->objects);
+    return freed;
+}
 
+void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+{
+    kvm_mmu_empty_memory_cache(mc);
+    kvfree(mc->objects);
     mc->objects = NULL;
     mc->capacity = 0;
 }
-- 
2.39.0.314.g84b9a713c41-goog
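
For readers skimming the diff, the repurposed scan path boils down to the sketch below. This is a condensed illustration, not the literal patch; it reuses the helpers added above (kvm_mmu_empty_memory_cache(), kvm_total_unused_mmu_pages, and the new per-vCPU cache lock):

static unsigned long shrink_vcpu_shadow_page_caches(struct kvm *kvm)
{
    struct kvm_vcpu *vcpu;
    unsigned long freed = 0;
    unsigned long i;

    kvm_for_each_vcpu(i, vcpu, kvm) {
        /* Only take the lock when there is something to free. */
        if (!READ_ONCE(vcpu->arch.mmu_shadow_page_cache.nobjs))
            continue;

        spin_lock(&vcpu->arch.mmu_shadow_page_cache_lock);
        freed += kvm_mmu_empty_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
        spin_unlock(&vcpu->arch.mmu_shadow_page_cache_lock);
    }

    /* Keep the shrinker's global view of unused pages in sync. */
    if (freed)
        percpu_counter_sub(&kvm_total_unused_mmu_pages, freed);

    return freed;
}

Only cached, not-yet-installed pages are freed, so nothing is zapped and running vCPUs pay at most the cost of refilling their caches on the next fault.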
From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Date: Wed, 21 Dec 2022 18:34:50 -0800
Message-ID: <20221222023457.1764-3-vipinsh@google.com>
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
Subject: [Patch v3 2/9] KVM: x86/mmu: Remove zapped_obsolete_pages from struct kvm_arch{}

The zapped_obsolete_pages list in struct kvm_arch{} was used to provide pages to the KVM MMU shrinker. It is no longer needed, as the KVM MMU shrinker has been repurposed to free shadow page caches rather than zapped_obsolete_pages.

Remove zapped_obsolete_pages from struct kvm_arch{} and use a local list in kvm_zap_obsolete_pages().

Signed-off-by: Vipin Sharma
Reviewed-by: David Matlack
---
 arch/x86/include/asm/kvm_host.h | 1 -
 arch/x86/kvm/mmu/mmu.c          | 8 ++++----
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 89cc809e4a00..f89f02e18080 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1215,7 +1215,6 @@ struct kvm_arch {
     u8 mmu_valid_gen;
     struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
     struct list_head active_mmu_pages;
-    struct list_head zapped_obsolete_pages;
     /*
      * A list of kvm_mmu_page structs that, if zapped, could possibly be
      * replaced by an NX huge page.
       A shadow page is on this list if its
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 157417e1cb6e..3364760a1695 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5987,6 +5987,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
 {
     struct kvm_mmu_page *sp, *node;
     int nr_zapped, batch = 0;
+    LIST_HEAD(zapped_pages);
     bool unstable;
 
 restart:
@@ -6019,8 +6020,8 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
             goto restart;
         }
 
-        unstable = __kvm_mmu_prepare_zap_page(kvm, sp,
-                &kvm->arch.zapped_obsolete_pages, &nr_zapped);
+        unstable = __kvm_mmu_prepare_zap_page(kvm, sp, &zapped_pages,
+                                              &nr_zapped);
         batch += nr_zapped;
 
         if (unstable)
@@ -6036,7 +6037,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
      * kvm_mmu_load()), and the reload in the caller ensure no vCPUs are
      * running with an obsolete MMU.
      */
-    kvm_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
+    kvm_mmu_commit_zap_page(kvm, &zapped_pages);
 }
 
 /*
@@ -6112,7 +6113,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
     int r;
 
     INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
-    INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
     INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages);
     spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
 
-- 
2.39.0.314.g84b9a713c41-goog
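
In isolation, the new pattern is just a stack-local list that is always drained before the function returns. The sketch below is illustrative only and omits the batching and yield logic of the real kvm_zap_obsolete_pages():

static void zap_obsolete_pages_sketch(struct kvm *kvm)
{
    LIST_HEAD(zapped_pages);    /* stack-local, replaces kvm->arch.zapped_obsolete_pages */
    struct kvm_mmu_page *sp, *node;
    int nr_zapped;

    list_for_each_entry_safe_reverse(sp, node, &kvm->arch.active_mmu_pages, link) {
        if (!is_obsolete_sp(kvm, sp))
            continue;

        __kvm_mmu_prepare_zap_page(kvm, sp, &zapped_pages, &nr_zapped);
    }

    /* The list is fully drained here, so nothing outlives the call. */
    kvm_mmu_commit_zap_page(kvm, &zapped_pages);
}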
From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Date: Wed, 21 Dec 2022 18:34:51 -0800
Message-ID: <20221222023457.1764-4-vipinsh@google.com>
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
Subject: [Patch v3 3/9] KVM: x86/mmu: Shrink split_shadow_page_cache via KVM MMU shrinker

split_shadow_page_cache is not used after dirty logging is disabled, which makes it a good candidate for freeing memory when mmu_shrink_scan() kicks in.

Account for split_shadow_page_cache via kvm_total_unused_mmu_pages and use it in mmu_shrink_scan().

Signed-off-by: Vipin Sharma
---
 arch/x86/include/asm/kvm_host.h |  5 +++
 arch/x86/kvm/mmu/mmu.c          | 63 +++++++++++++++++++--------------
 2 files changed, 42 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f89f02e18080..293994fabae3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1413,6 +1413,11 @@ struct kvm_arch {
     struct kvm_mmu_memory_cache split_shadow_page_cache;
     struct kvm_mmu_memory_cache split_page_header_cache;
 
+    /*
+     * Protects change in size of split_shadow_page_cache cache.
+     */
+    spinlock_t split_shadow_page_cache_lock;
+
     /*
      * Memory cache used to allocate pte_list_desc structs while splitting
      * huge pages.
In the worst case, to split one huge page, 512 diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 3364760a1695..6f6a10d7a871 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -659,14 +659,15 @@ static void walk_shadow_page_lockless_end(struct kvm_= vcpu *vcpu) } =20 static int mmu_topup_sp_memory_cache(struct kvm_mmu_memory_cache *cache, - spinlock_t *cache_lock) + spinlock_t *cache_lock, + int min) { int orig_nobjs; int r; =20 spin_lock(cache_lock); orig_nobjs =3D cache->nobjs; - r =3D kvm_mmu_topup_memory_cache(cache, PT64_ROOT_MAX_LEVEL); + r =3D kvm_mmu_topup_memory_cache(cache, min); if (orig_nobjs !=3D cache->nobjs) percpu_counter_add(&kvm_total_unused_mmu_pages, (cache->nobjs - orig_nobjs)); @@ -684,7 +685,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcp= u, bool maybe_indirect) if (r) return r; r =3D mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache, - &vcpu->arch.mmu_shadow_page_cache_lock); + &vcpu->arch.mmu_shadow_page_cache_lock, + PT64_ROOT_MAX_LEVEL); if (r) return r; if (maybe_indirect) { @@ -2184,16 +2186,12 @@ void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_= memory_cache *shadow_page_cac int orig_nobjs; void *page; =20 - if (!cache_lock) { - spin_lock(cache_lock); - orig_nobjs =3D shadow_page_cache->nobjs; - } + spin_lock(cache_lock); + orig_nobjs =3D shadow_page_cache->nobjs; page =3D kvm_mmu_memory_cache_alloc(shadow_page_cache); - if (!cache_lock) { - if (orig_nobjs) - percpu_counter_dec(&kvm_total_unused_mmu_pages); - spin_unlock(cache_lock); - } + if (orig_nobjs) + percpu_counter_dec(&kvm_total_unused_mmu_pages); + spin_unlock(cache_lock); return page; } =20 @@ -6130,6 +6128,7 @@ int kvm_mmu_init_vm(struct kvm *kvm) kvm->arch.split_page_header_cache.gfp_zero =3D __GFP_ZERO; =20 kvm->arch.split_shadow_page_cache.gfp_zero =3D __GFP_ZERO; + spin_lock_init(&kvm->arch.split_shadow_page_cache_lock); =20 kvm->arch.split_desc_cache.kmem_cache =3D pte_list_desc_cache; kvm->arch.split_desc_cache.gfp_zero =3D __GFP_ZERO; @@ -6141,7 +6140,8 @@ static void mmu_free_vm_memory_caches(struct kvm *kvm) { kvm_mmu_free_memory_cache(&kvm->arch.split_desc_cache); kvm_mmu_free_memory_cache(&kvm->arch.split_page_header_cache); - kvm_mmu_free_memory_cache(&kvm->arch.split_shadow_page_cache); + mmu_free_sp_memory_cache(&kvm->arch.split_shadow_page_cache, + &kvm->arch.split_shadow_page_cache_lock); } =20 void kvm_mmu_uninit_vm(struct kvm *kvm) @@ -6295,7 +6295,9 @@ static int topup_split_caches(struct kvm *kvm) if (r) return r; =20 - return kvm_mmu_topup_memory_cache(&kvm->arch.split_shadow_page_cache, 1); + return mmu_topup_sp_memory_cache(&kvm->arch.split_shadow_page_cache, + &kvm->arch.split_shadow_page_cache_lock, + 1); } =20 static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u= 64 *huge_sptep) @@ -6320,7 +6322,7 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_spl= it(struct kvm *kvm, u64 *hu /* Direct SPs do not require a shadowed_info_cache. */ caches.page_header_cache =3D &kvm->arch.split_page_header_cache; caches.shadow_page_cache =3D &kvm->arch.split_shadow_page_cache; - caches.shadow_page_cache_lock =3D NULL; + caches.shadow_page_cache_lock =3D &kvm->arch.split_shadow_page_cache_lock; =20 /* Safe to pass NULL for vCPU since requesting a direct SP. 
*/ return __kvm_mmu_get_shadow_page(kvm, NULL, &caches, gfn, role); @@ -6687,14 +6689,23 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm,= u64 gen) } } =20 +static unsigned long mmu_shrink_cache(struct kvm_mmu_memory_cache *cache, + spinlock_t *cache_lock) +{ + unsigned long freed =3D 0; + + spin_lock(cache_lock); + if (cache->nobjs) + freed =3D kvm_mmu_empty_memory_cache(cache); + spin_unlock(cache_lock); + return freed; +} + static unsigned long mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) { - struct kvm_mmu_memory_cache *cache; struct kvm *kvm, *first_kvm =3D NULL; unsigned long freed =3D 0; - /* spinlock for memory cache */ - spinlock_t *cache_lock; struct kvm_vcpu *vcpu; unsigned long i; =20 @@ -6707,15 +6718,15 @@ mmu_shrink_scan(struct shrinker *shrink, struct shr= ink_control *sc) first_kvm =3D kvm; list_move_tail(&kvm->vm_list, &vm_list); =20 - kvm_for_each_vcpu(i, vcpu, kvm) { - cache =3D &vcpu->arch.mmu_shadow_page_cache; - cache_lock =3D &vcpu->arch.mmu_shadow_page_cache_lock; - if (READ_ONCE(cache->nobjs)) { - spin_lock(cache_lock); - freed +=3D kvm_mmu_empty_memory_cache(cache); - spin_unlock(cache_lock); - } + freed +=3D mmu_shrink_cache(&kvm->arch.split_shadow_page_cache, + &kvm->arch.split_shadow_page_cache_lock); =20 + if (freed >=3D sc->nr_to_scan) + break; + + kvm_for_each_vcpu(i, vcpu, kvm) { + freed +=3D mmu_shrink_cache(&vcpu->arch.mmu_shadow_page_cache, + &vcpu->arch.mmu_shadow_page_cache_lock); } =20 if (freed >=3D sc->nr_to_scan) --=20 2.39.0.314.g84b9a713c41-goog From nobody Wed Sep 17 05:56:26 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6CB03C4332F for ; Thu, 22 Dec 2022 02:35:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230319AbiLVCfX (ORCPT ); Wed, 21 Dec 2022 21:35:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46370 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234975AbiLVCfI (ORCPT ); Wed, 21 Dec 2022 21:35:08 -0500 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4C6D224BE9 for ; Wed, 21 Dec 2022 18:35:07 -0800 (PST) Received: by mail-pj1-x1049.google.com with SMTP id il11-20020a17090b164b00b00219a4366109so2315685pjb.0 for ; Wed, 21 Dec 2022 18:35:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=bxkmrVszmYtdX56U0+zIQlMbh2VgS2C48ld2l9CYwus=; b=Rn6JhTg0zhPlYluWmxYHpBlbM65jcOKT08CTpFfaF7H5DiKsfX4cghOlXKtbXYarOK Rly5XOhtvEYkFNI27R7kdmcDaft+DtQZHjIUVe60dOsihgLsa9smgxM+b5XDJjnQ5ViO Lkw5zbkLDE5ElgM22t1HWQhqSPjJc/A7roPsY0/dbXeUhXDdkCO+aerp6M5Gy8UvxMx9 hry1N1NLcMymvSxaWuZTNy40tIRuVsbU4zETnVIVyXentp2aZLJhf0IN9lIyvdeQEICD jjslMf9GOy2+ihLLAQ7wMggtMTn5Pk6LiRqyldHCiyElR3iBYcg59o/9Nzc9a7CBMBBj D2Vw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=bxkmrVszmYtdX56U0+zIQlMbh2VgS2C48ld2l9CYwus=; b=ucqHjmmPBBcd6XRaqeEXEVvzl6lObI5ylqBFUK0Fw2dIyJCCI5mUQYR844o3YahPNI 
From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Date: Wed, 21 Dec 2022 18:34:52 -0800
Message-ID: <20221222023457.1764-5-vipinsh@google.com>
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
Subject: [Patch v3 4/9] KVM: Add module param to make page tables NUMA aware

Add a numa_aware_pagetable module param to make page table allocations NUMA aware.

Signed-off-by: Vipin Sharma
---
 include/linux/kvm_host.h |  2 ++
 virt/kvm/kvm_main.c      | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index efd9b38ea9a2..d48064503b88 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1358,6 +1358,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool usermode_vcpu_not_eligible);
 
 void kvm_flush_remote_tlbs(struct kvm *kvm);
 
+void *kvm_mmu_get_free_page(int nid, gfp_t gfp);
+
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
 int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
 int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f2d762878b97..d96c8146e9ba 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -93,6 +93,13 @@ unsigned int halt_poll_ns_shrink;
 module_param(halt_poll_ns_shrink, uint, 0644);
 EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
 
+/*
+ * If possible, allocate page table's pages on the same node the underlying
+ * physical page is pointing to.
+ */
+static bool __read_mostly numa_aware_pagetable = true;
+module_param_named(numa_aware_pagetable, numa_aware_pagetable, bool, 0644);
+
 /*
  * Ordering of locks:
  *
@@ -384,6 +391,21 @@ static void kvm_flush_shadow_all(struct kvm *kvm)
     kvm_arch_guest_memory_reclaimed(kvm);
 }
 
+void *kvm_mmu_get_free_page(int nid, gfp_t gfp)
+{
+#ifdef CONFIG_NUMA
+    struct page *spt_page;
+
+    if (numa_aware_pagetable) {
+        spt_page = alloc_pages_node(nid, gfp, 0);
+        if (spt_page)
+            return page_address(spt_page);
+    }
+#endif // CONFIG_NUMA
+
+    return (void *)__get_free_page(gfp);
+}
+
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
 static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
                                                gfp_t gfp_flags)
-- 
2.39.0.314.g84b9a713c41-goog
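
Usage is straightforward: a caller passes the node it wants the page table page near plus the usual GFP flags, and the helper silently degrades to __get_free_page() when CONFIG_NUMA is off or the parameter is disabled. The snippet below is illustrative only; the wrapper, gfp flags and error handling are assumptions, not part of the patch:

static int alloc_pt_page_near(int nid, void **sptp)
{
    /*
     * Allocate a page table page near the data it will map.  With
     * numa_aware_pagetable disabled (or !CONFIG_NUMA) this is just
     * __get_free_page().  GFP flags here are illustrative.
     */
    *sptp = kvm_mmu_get_free_page(nid, GFP_KERNEL_ACCOUNT | __GFP_ZERO);

    return *sptp ? 0 : -ENOMEM;
}

Because the parameter is created with mode 0644 it should also be flippable at runtime, e.g. (path assumes KVM built as the "kvm" module): echo 0 > /sys/module/kvm/parameters/numa_aware_pagetable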
From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Date: Wed, 21 Dec 2022 18:34:53 -0800
Message-ID: <20221222023457.1764-6-vipinsh@google.com>
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
Subject: [Patch v3 5/9] KVM: x86/mmu: Allocate TDP page table's page on correct NUMA node on split

When dirty logging is enabled, huge pages are split. The page table pages created during the split are allocated based on the current thread's NUMA node or mempolicy. This causes inefficient page table accesses if the underlying pages are on a different NUMA node.

Allocate page table pages on the same NUMA node as the underlying huge page when dirty logging is enabled and huge pages are split.

The performance gain during the pre-copy phase of live migration of a 416-vCPU, 11 TiB memory VM on an 8-node host was in the range of 130% to 150%.

Suggested-by: David Matlack
Signed-off-by: Vipin Sharma
---
 arch/x86/kvm/mmu/tdp_mmu.c | 12 ++++++++----
 include/linux/kvm_host.h   | 18 ++++++++++++++++++
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 4974fa96deff..376b8dceb3f9 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1403,7 +1403,7 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm,
     return spte_set;
 }
 
-static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
+static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(int nid, gfp_t gfp)
 {
     struct kvm_mmu_page *sp;
 
@@ -1413,7 +1413,8 @@ static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
     if (!sp)
         return NULL;
 
-    sp->spt = (void *)__get_free_page(gfp);
+    sp->spt = kvm_mmu_get_free_page(nid, gfp);
+
     if (!sp->spt) {
         kmem_cache_free(mmu_page_header_cache, sp);
         return NULL;
@@ -1427,6 +1428,9 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
                          bool shared)
 {
     struct kvm_mmu_page *sp;
+    int nid;
+
+    nid = kvm_pfn_to_page_table_nid(spte_to_pfn(iter->old_spte));
 
     /*
      * Since we are allocating while under the MMU lock we have to be
@@ -1437,7 +1441,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
      * If this allocation fails we drop the lock and retry with reclaim
      * allowed.
      */
-    sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT);
+    sp = __tdp_mmu_alloc_sp_for_split(nid, GFP_NOWAIT | __GFP_ACCOUNT);
     if (sp)
         return sp;
 
@@ -1449,7 +1453,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
     write_unlock(&kvm->mmu_lock);
 
     iter->yielded = true;
-    sp = __tdp_mmu_alloc_sp_for_split(GFP_KERNEL_ACCOUNT);
+    sp = __tdp_mmu_alloc_sp_for_split(nid, GFP_KERNEL_ACCOUNT);
 
     if (shared)
         read_lock(&kvm->mmu_lock);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d48064503b88..a262e15ebd19 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1583,6 +1583,24 @@ void kvm_arch_sync_events(struct kvm *kvm);
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
 
 struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn);
+
+/*
+ * Tells the appropriate NUMA node location of the page table's page based on
+ * pfn it will point to.
+ * + * Return the nid of the page if pfn is valid and backed by a refcounted p= age, + * otherwise, return the nearest memory node for the current CPU. + */ +static inline int kvm_pfn_to_page_table_nid(kvm_pfn_t pfn) +{ + struct page *page =3D kvm_pfn_to_refcounted_page(pfn); + + if (page) + return page_to_nid(page); + else + return numa_mem_id(); +} + bool kvm_is_zone_device_page(struct page *page); =20 struct kvm_irq_ack_notifier { --=20 2.39.0.314.g84b9a713c41-goog From nobody Wed Sep 17 05:56:26 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E272CC10F1B for ; Thu, 22 Dec 2022 02:35:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230447AbiLVCfm (ORCPT ); Wed, 21 Dec 2022 21:35:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46540 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235003AbiLVCfR (ORCPT ); Wed, 21 Dec 2022 21:35:17 -0500 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 10C6B25C7E for ; Wed, 21 Dec 2022 18:35:10 -0800 (PST) Received: by mail-pj1-x1049.google.com with SMTP id p11-20020a17090a680b00b002233455d706so296308pjj.4 for ; Wed, 21 Dec 2022 18:35:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=poiDBxCXZuaSeZicG2qgYYUccKGMhjWu3dn9FbyWbdI=; b=TH1iUYrFfRpDWtVxW3+pgFlRAZdRswEo031A+JoBSFw8+0LAdMFXq8dZa92oOrcqew zYK5KnaC50bemT2vcH3I6ljHX1zk29hcYKiDZBNLKVzfySyuRHuu18d89jK8PIfT1R5h LcbSU39GflE5evKZtr1uNxzcziFGWAUPdRR3Mx/UUTStoxEpMFjqdxAwTrADnkgxrxfR MoqyQVpxe61yOYHtHbVvl+2zwqIlTaQOZOu2yejduA1CKn6/fjkWxKF51oEKMyBQO64B G5UcwCIQeSaC9FL/TdKjgMmQqju1POsZRD1JXmJmsK1WsMnE1GTwbAqHf7k7VSLWZPyK ZOlQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=poiDBxCXZuaSeZicG2qgYYUccKGMhjWu3dn9FbyWbdI=; b=OXm0Bvb7kD1i2OAsjqaf7XKwXTP8vKJa8z68leRpIUGWt77vsdDtIy2cLCA70jNMQP nRgb/J4GjrGD9ET+LO7xVzRWPNfQH7xaNlL1u8k9g+ghtAV/WtD4ZCt34f/5Pbm6Hpob 7+MtEtBV6WwPECL0xwsCqI7aKXhzWXoPYr9jJ87TxQ6MqtMHB9kra8sCoHezHwYQnTrA p4Czhj4iHxBN8NjrorfxMv4vuW8mqnzjV4K1kNsPVwwkOQaKAoer+XCDKl4maNauIYeZ BAROHBvwz/ABaajDFSDt1XlO+OS3BhqW4rsG3Bko+BrJkZ2HIEfatNtrd+j1xL4ZhwbL 41FA== X-Gm-Message-State: AFqh2koeiPA/25YeaiNVeVJmWSXdbMPZ88VBAe4BbtBm+42gTmtjJgb8 r1UHpnsGdmzxD3fZxq72vroLOwd9W2w1 X-Google-Smtp-Source: AMrXdXtcmKHWHo1AhNx9UQ0Z/sHm4RIiqsiHuv78kXo5tk+ZPsGbtCKvNa6VwqFR7MArdGz/SkeKXbSRg1zN X-Received: from vipin.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:479f]) (user=vipinsh job=sendgmr) by 2002:a17:90a:4c83:b0:219:ac7f:27d8 with SMTP id k3-20020a17090a4c8300b00219ac7f27d8mr366228pjh.192.1671676510249; Wed, 21 Dec 2022 18:35:10 -0800 (PST) Date: Wed, 21 Dec 2022 18:34:54 -0800 In-Reply-To: <20221222023457.1764-1-vipinsh@google.com> Mime-Version: 1.0 References: <20221222023457.1764-1-vipinsh@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20221222023457.1764-7-vipinsh@google.com> Subject: [Patch v3 6/9] KVM: Provide NUMA node support to 
kvm_mmu_memory_cache{} From: Vipin Sharma To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add 'node' variable in kvm_mmu_memory_cache{} to denote which NUMA node this cache should allocate memory from. Default initialize to NUMA_NO_NODE in all architectures. Signed-off-by: Vipin Sharma --- arch/arm64/kvm/arm.c | 2 +- arch/arm64/kvm/mmu.c | 4 +++- arch/mips/kvm/mips.c | 2 ++ arch/riscv/kvm/mmu.c | 2 +- arch/riscv/kvm/vcpu.c | 2 +- arch/x86/kvm/mmu/mmu.c | 22 ++++++++++++---------- include/linux/kvm_host.h | 6 ++++++ include/linux/kvm_types.h | 2 ++ 8 files changed, 28 insertions(+), 14 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 9c5573bc4614..52a41f4532e2 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -340,7 +340,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) vcpu->arch.target =3D -1; bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES); =20 - vcpu->arch.mmu_page_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_page_cache, NULL, NUMA_NO_NODE); =20 /* * Default value for the FP state, will be overloaded at load diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 31d7fa4c7c14..bd07155e17fa 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -894,12 +894,14 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_= t guest_ipa, { phys_addr_t addr; int ret =3D 0; - struct kvm_mmu_memory_cache cache =3D { .gfp_zero =3D __GFP_ZERO }; + struct kvm_mmu_memory_cache cache; struct kvm_pgtable *pgt =3D kvm->arch.mmu.pgt; enum kvm_pgtable_prot prot =3D KVM_PGTABLE_PROT_DEVICE | KVM_PGTABLE_PROT_R | (writable ? KVM_PGTABLE_PROT_W : 0); =20 + INIT_KVM_MMU_MEMORY_CACHE(&cache, NULL, NUMA_NO_NODE); + if (is_protected_kvm_enabled()) return -EPERM; =20 diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index a25e0b73ee70..b017c29a9340 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -304,6 +304,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) HRTIMER_MODE_REL); vcpu->arch.comparecount_timer.function =3D kvm_mips_comparecount_wakeup; =20 + vcpu->arch.mmu_page_cache.node =3D NUMA_NO_NODE; + /* * Allocate space for host mode exception handlers that handle * guest mode exits diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c index 34b57e0be2ef..119de4520cc6 100644 --- a/arch/riscv/kvm/mmu.c +++ b/arch/riscv/kvm/mmu.c @@ -353,9 +353,9 @@ int kvm_riscv_gstage_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t addr, end; struct kvm_mmu_memory_cache pcache =3D { .gfp_custom =3D (in_atomic) ? 
GFP_ATOMIC | __GFP_ACCOUNT : 0, - .gfp_zero =3D __GFP_ZERO, }; =20 + INIT_KVM_MMU_MEMORY_CACHE(&pcache, NULL, NUMA_NO_NODE); end =3D (gpa + size + PAGE_SIZE - 1) & PAGE_MASK; pfn =3D __phys_to_pfn(hpa); =20 diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c index 7c08567097f0..189b14feb365 100644 --- a/arch/riscv/kvm/vcpu.c +++ b/arch/riscv/kvm/vcpu.c @@ -161,7 +161,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) =20 /* Mark this VCPU never ran */ vcpu->arch.ran_atleast_once =3D false; - vcpu->arch.mmu_page_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_page_cache, NULL, NUMA_NO_NODE); bitmap_zero(vcpu->arch.isa, RISCV_ISA_EXT_MAX); =20 /* Setup ISA features available to VCPU */ diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6f6a10d7a871..23a3b82b2384 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5954,13 +5954,14 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) { int ret; =20 - vcpu->arch.mmu_pte_list_desc_cache.kmem_cache =3D pte_list_desc_cache; - vcpu->arch.mmu_pte_list_desc_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_pte_list_desc_cache, + pte_list_desc_cache, NUMA_NO_NODE); =20 - vcpu->arch.mmu_page_header_cache.kmem_cache =3D mmu_page_header_cache; - vcpu->arch.mmu_page_header_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_page_header_cache, + mmu_page_header_cache, NUMA_NO_NODE); =20 - vcpu->arch.mmu_shadow_page_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_shadow_page_cache, + NULL, NUMA_NO_NODE); spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock); =20 vcpu->arch.mmu =3D &vcpu->arch.root_mmu; @@ -6124,14 +6125,15 @@ int kvm_mmu_init_vm(struct kvm *kvm) node->track_flush_slot =3D kvm_mmu_invalidate_zap_pages_in_memslot; kvm_page_track_register_notifier(kvm, node); =20 - kvm->arch.split_page_header_cache.kmem_cache =3D mmu_page_header_cache; - kvm->arch.split_page_header_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_page_header_cache, + mmu_page_header_cache, NUMA_NO_NODE); =20 - kvm->arch.split_shadow_page_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_shadow_page_cache, + NULL, NUMA_NO_NODE); spin_lock_init(&kvm->arch.split_shadow_page_cache_lock); =20 - kvm->arch.split_desc_cache.kmem_cache =3D pte_list_desc_cache; - kvm->arch.split_desc_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_desc_cache, + pte_list_desc_cache, NUMA_NO_NODE); =20 return 0; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a262e15ebd19..719687a37ef7 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2302,4 +2302,10 @@ static inline void kvm_account_pgtable_pages(void *v= irt, int nr) /* Max number of entries allowed for each kvm dirty ring */ #define KVM_DIRTY_RING_MAX_ENTRIES 65536 =20 +#define INIT_KVM_MMU_MEMORY_CACHE(_cache, _kmem_cache, _node) ({ \ + (_cache)->kmem_cache =3D _kmem_cache; \ + (_cache)->gfp_zero =3D __GFP_ZERO; \ + (_cache)->node =3D _node; \ +}) + #endif diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index 76de36e56cdf..9c70ce95e51f 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -97,6 +97,8 @@ struct kvm_mmu_memory_cache { struct kmem_cache *kmem_cache; int capacity; void **objects; + /* Node on which memory should be allocated by default */ + int node; }; #endif =20 --=20 2.39.0.314.g84b9a713c41-goog From nobody Wed Sep 17 05:56:26 2025 
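
For reference, the INIT_KVM_MMU_MEMORY_CACHE() initializer added in the patch above simply records the kmem_cache, the __GFP_ZERO flag and the node hint in one place; passing NUMA_NO_NODE preserves the current placement behaviour. A minimal sketch (the wrapper function is hypothetical):

static void init_shadow_page_cache(struct kvm_mmu_memory_cache *cache, int nid)
{
    /*
     * Expands to: cache->kmem_cache = NULL; cache->gfp_zero = __GFP_ZERO;
     * cache->node = nid.  Passing NUMA_NO_NODE keeps today's behaviour;
     * a real node id asks the cache to allocate its objects from that node.
     */
    INIT_KVM_MMU_MEMORY_CACHE(cache, NULL, nid);
}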
From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Date: Wed, 21 Dec 2022 18:34:55 -0800
Message-ID: <20221222023457.1764-8-vipinsh@google.com>
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
Subject: [Patch v3 7/9] KVM: x86/mmu: Allocate page table's pages on NUMA node of the underlying pages

Page table pages of a VM are currently allocated based on the current task's NUMA node or its mempolicy.
This can cause suboptimal remote accesses by the vCPU if it is accessing physical pages local to its NUMA node but the page table pages mapping those physical pages were created by some other vCPU which was on a different NUMA node or had a different memory policy. Allocate page table pages on the same NUMA node where the underlying physical page exists. Page tables at levels 5, 4, and 3 might not end up on the same NUMA node as the memory they map, since they can span multiple NUMA nodes. Signed-off-by: Vipin Sharma --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/mmu/mmu.c | 63 ++++++++++++++++++++++----------- arch/x86/kvm/mmu/paging_tmpl.h | 4 +-- arch/x86/kvm/mmu/tdp_mmu.c | 11 +++--- virt/kvm/kvm_main.c | 2 +- 5 files changed, 53 insertions(+), 29 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 293994fabae3..b1f319ad6f89 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -782,7 +782,7 @@ struct kvm_vcpu_arch { struct kvm_mmu *walk_mmu; =20 struct kvm_mmu_memory_cache mmu_pte_list_desc_cache; - struct kvm_mmu_memory_cache mmu_shadow_page_cache; + struct kvm_mmu_memory_cache mmu_shadow_page_cache[MAX_NUMNODES]; struct kvm_mmu_memory_cache mmu_shadowed_info_cache; struct kvm_mmu_memory_cache mmu_page_header_cache; =20 diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 23a3b82b2384..511c6ef265ee 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -677,24 +677,29 @@ static int mmu_topup_sp_memory_cache(struct kvm_mmu_m= emory_cache *cache, =20 static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indir= ect) { - int r; + int r, nid; =20 /* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */ r =3D kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache, 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM); if (r) return r; - r =3D mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache, - &vcpu->arch.mmu_shadow_page_cache_lock, - PT64_ROOT_MAX_LEVEL); - if (r) - return r; + + for_each_online_node(nid) { + r =3D mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache[nid], + &vcpu->arch.mmu_shadow_page_cache_lock, + PT64_ROOT_MAX_LEVEL); + if (r) + return r; + } + if (maybe_indirect) { r =3D kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadowed_info_cache, PT64_ROOT_MAX_LEVEL); if (r) return r; } + return kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, PT64_ROOT_MAX_LEVEL); } @@ -715,9 +720,14 @@ static void mmu_free_sp_memory_cache(struct kvm_mmu_me= mory_cache *cache, =20 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu) { + int nid; + kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache); - mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache, - &vcpu->arch.mmu_shadow_page_cache_lock); + + for_each_node(nid) + mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache[nid], + &vcpu->arch.mmu_shadow_page_cache_lock); + kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache); } @@ -2256,11 +2266,12 @@ static struct kvm_mmu_page *__kvm_mmu_get_shadow_pa= ge(struct kvm *kvm, =20 static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu, gfn_t gfn, - union kvm_mmu_page_role role) + union kvm_mmu_page_role role, + int nid) { struct shadow_page_caches caches =3D { .page_header_cache =3D &vcpu->arch.mmu_page_header_cache, - .shadow_page_cache =3D &vcpu->arch.mmu_shadow_page_cache, + .shadow_page_cache =3D &vcpu->arch.mmu_shadow_page_cache[nid], .shadowed_info_cache =3D
&vcpu->arch.mmu_shadowed_info_cache, .shadow_page_cache_lock =3D &vcpu->arch.mmu_shadow_page_cache_lock }; @@ -2316,15 +2327,19 @@ static union kvm_mmu_page_role kvm_mmu_child_role(u= 64 *sptep, bool direct, =20 static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn, - bool direct, unsigned int access) + bool direct, unsigned int access, + kvm_pfn_t pfn) { union kvm_mmu_page_role role; + int nid; =20 if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) return ERR_PTR(-EEXIST); =20 role =3D kvm_mmu_child_role(sptep, direct, access); - return kvm_mmu_get_shadow_page(vcpu, gfn, role); + nid =3D kvm_pfn_to_page_table_nid(pfn); + + return kvm_mmu_get_shadow_page(vcpu, gfn, role, nid); } =20 static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *i= terator, @@ -3208,7 +3223,8 @@ static int direct_map(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault) if (it.level =3D=3D fault->goal_level) break; =20 - sp =3D kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, ACC_ALL); + sp =3D kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, + ACC_ALL, fault->pfn); if (sp =3D=3D ERR_PTR(-EEXIST)) continue; =20 @@ -3636,7 +3652,7 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gf= n_t gfn, int quadrant, WARN_ON_ONCE(quadrant && !role.has_4_byte_gpte); WARN_ON_ONCE(role.direct && role.has_4_byte_gpte); =20 - sp =3D kvm_mmu_get_shadow_page(vcpu, gfn, role); + sp =3D kvm_mmu_get_shadow_page(vcpu, gfn, role, numa_mem_id()); ++sp->root_count; =20 return __pa(sp->spt); @@ -5952,7 +5968,7 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, st= ruct kvm_mmu *mmu) =20 int kvm_mmu_create(struct kvm_vcpu *vcpu) { - int ret; + int ret, nid; =20 INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_pte_list_desc_cache, pte_list_desc_cache, NUMA_NO_NODE); @@ -5960,8 +5976,9 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_page_header_cache, mmu_page_header_cache, NUMA_NO_NODE); =20 - INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_shadow_page_cache, - NULL, NUMA_NO_NODE); + for_each_node(nid) + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_shadow_page_cache[nid], + NULL, nid); spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock); =20 vcpu->arch.mmu =3D &vcpu->arch.root_mmu; @@ -6692,13 +6709,17 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm,= u64 gen) } =20 static unsigned long mmu_shrink_cache(struct kvm_mmu_memory_cache *cache, + int cache_count, spinlock_t *cache_lock) { unsigned long freed =3D 0; + int nid; =20 spin_lock(cache_lock); - if (cache->nobjs) - freed =3D kvm_mmu_empty_memory_cache(cache); + for (nid =3D 0; nid < cache_count; nid++) { + if (node_online(nid) && cache[nid].nobjs) + freed +=3D kvm_mmu_empty_memory_cache(&cache[nid]); + } spin_unlock(cache_lock); return freed; } @@ -6721,13 +6742,15 @@ mmu_shrink_scan(struct shrinker *shrink, struct shr= ink_control *sc) list_move_tail(&kvm->vm_list, &vm_list); =20 freed +=3D mmu_shrink_cache(&kvm->arch.split_shadow_page_cache, + 1, &kvm->arch.split_shadow_page_cache_lock); =20 if (freed >=3D sc->nr_to_scan) break; =20 kvm_for_each_vcpu(i, vcpu, kvm) { - freed +=3D mmu_shrink_cache(&vcpu->arch.mmu_shadow_page_cache, + freed +=3D mmu_shrink_cache(vcpu->arch.mmu_shadow_page_cache, + MAX_NUMNODES, &vcpu->arch.mmu_shadow_page_cache_lock); } =20 diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index e5662dbd519c..1ceca62ec4cf 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -652,7 +652,7 @@ static int 
FNAME(fetch)(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault, table_gfn =3D gw->table_gfn[it.level - 2]; access =3D gw->pt_access[it.level - 2]; sp =3D kvm_mmu_get_child_sp(vcpu, it.sptep, table_gfn, - false, access); + false, access, fault->pfn); =20 if (sp !=3D ERR_PTR(-EEXIST)) { /* @@ -708,7 +708,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault, validate_direct_spte(vcpu, it.sptep, direct_access); =20 sp =3D kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, - true, direct_access); + true, direct_access, fault->pfn); if (sp =3D=3D ERR_PTR(-EEXIST)) continue; =20 diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 376b8dceb3f9..b5abae2366dd 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -259,12 +259,12 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct = kvm *kvm, kvm_mmu_page_as_id(_root) !=3D _as_id) { \ } else =20 -static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu) +static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu, int ni= d) { struct kvm_mmu_page *sp; =20 sp =3D kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache); - sp->spt =3D kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cac= he, + sp->spt =3D kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cac= he[nid], &vcpu->arch.mmu_shadow_page_cache_lock); =20 return sp; @@ -317,7 +317,7 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vc= pu) goto out; } =20 - root =3D tdp_mmu_alloc_sp(vcpu); + root =3D tdp_mmu_alloc_sp(vcpu, numa_mem_id()); tdp_mmu_init_sp(root, NULL, 0, role); =20 refcount_set(&root->tdp_mmu_root_count, 1); @@ -1149,7 +1149,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm= _page_fault *fault) struct kvm *kvm =3D vcpu->kvm; struct tdp_iter iter; struct kvm_mmu_page *sp; - int ret =3D RET_PF_RETRY; + int ret =3D RET_PF_RETRY, nid; =20 kvm_mmu_hugepage_adjust(vcpu, fault); =20 @@ -1178,11 +1178,12 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault) !is_large_pte(iter.old_spte)) continue; =20 + nid =3D kvm_pfn_to_page_table_nid(fault->pfn); /* * The SPTE is either non-present or points to a huge page that * needs to be split. 
*/ - sp =3D tdp_mmu_alloc_sp(vcpu); + sp =3D tdp_mmu_alloc_sp(vcpu, nid); tdp_mmu_init_child_sp(sp, &iter); =20 sp->nx_huge_page_disallowed =3D fault->huge_page_disallowed; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d96c8146e9ba..4f3db7ffeba8 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -415,7 +415,7 @@ static inline void *mmu_memory_cache_alloc_obj(struct k= vm_mmu_memory_cache *mc, if (mc->kmem_cache) return kmem_cache_alloc(mc->kmem_cache, gfp_flags); else - return (void *)__get_free_page(gfp_flags); + return kvm_mmu_get_free_page(mc->node, gfp_flags); } =20 int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capa= city, int min) --=20 2.39.0.314.g84b9a713c41-goog From nobody Wed Sep 17 05:56:26 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B757BC4332F for ; Thu, 22 Dec 2022 02:36:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235034AbiLVCgB (ORCPT ); Wed, 21 Dec 2022 21:36:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46676 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234980AbiLVCfh (ORCPT ); Wed, 21 Dec 2022 21:35:37 -0500 Received: from mail-pj1-x104a.google.com (mail-pj1-x104a.google.com [IPv6:2607:f8b0:4864:20::104a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A1EE264A0 for ; Wed, 21 Dec 2022 18:35:14 -0800 (PST) Received: by mail-pj1-x104a.google.com with SMTP id pi14-20020a17090b1e4e00b0021d20da7a51so2300625pjb.2 for ; Wed, 21 Dec 2022 18:35:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=OqpNn9k+U+hG3YogdgDR/fOEr/aI6IdVDrzcvJRoBmQ=; b=LZCil3LEBuiqPglCaABHVnyyRktcG83zDWgYL8mZt4giWucmb8FzbuHkl/qRDPEfTL sho0PmZ83Kz1PVrv7LxQITDU73TzGBV57cqQUyLCGPrx9sfwBeRiioD2DlM4/tip3fqi o4IJFivw5AtKaKsdUfv+TfoKhPRPMShpd4EbkgwFft29q/MNo2D6I6u6PD7ToZ52tz4W VKhny6D60ScNULutdaRX1NMuignZ1Y+Q42SR/GGwjeDOYE8dXqUceH2wQdl+BQh9qtF5 92z74m16CNnw3BzXc5KaYuQ6cZkc4YU7xlvloYt7f+L5x8vmyokePQB0TBCZVG3vF0Yg INTQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=OqpNn9k+U+hG3YogdgDR/fOEr/aI6IdVDrzcvJRoBmQ=; b=b8XX7b2uZvcozgBj6OGMmzGTiexqi6XAxEjJK32lZe8NnldgDXdjl4XH5ZOZawHkGd F0J/gTgeTyI1tXKEjf9rHcmwSz64smtDZub1cn7hqo9dO2vk5e1lBeaa7NcFtB5qGDPw 6h/0TTlbC8Ijl9Mfz9XT6zlxTWrO1IPWhqzhvbwD9RY7M8W8aqi22eAYEQ29DzYW0QeN UrMbWwTfbS6zrDHnoKSQQnxtU8UyAv3RuwbTUuudxLU8mKDELLj6VhNii5Mr+98BY2hT aSKgxM3XH+dOB8D37oy9vH2lhofPRra+8/dRqvPcWP2ON8q3Y0pvTW1aWLu44qhBGJt3 fVgg== X-Gm-Message-State: AFqh2krDNTACpeR2KkYpbSGtJ926b9tzIXp89y9NoAO33SMje5ruzHKP 0cveuLDYPQsiR4LWM6jjJWCcRTIH+epO X-Google-Smtp-Source: AMrXdXsdS1uAeaLst9k9vclVCCOLWIbxK9Lx9IWWeNQ8BLMOiOd/S8VYUSYeV2+muAopSD1+l7BaJNjs32ck X-Received: from vipin.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:479f]) (user=vipinsh job=sendgmr) by 2002:a17:902:7d92:b0:18e:bd50:f19a with SMTP id a18-20020a1709027d9200b0018ebd50f19amr219700plm.81.1671676513459; Wed, 21 Dec 2022 18:35:13 -0800 (PST) Date: Wed, 21 Dec 2022 18:34:56 -0800 In-Reply-To: <20221222023457.1764-1-vipinsh@google.com> 
Mime-Version: 1.0 References: <20221222023457.1764-1-vipinsh@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20221222023457.1764-9-vipinsh@google.com> Subject: [Patch v3 8/9] KVM: x86/mmu: Make split_shadow_page_cache NUMA aware From: Vipin Sharma To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Make split_shadow_page_cache NUMA aware and allocate page table pages during the split based on the underlying physical page's NUMA node. Signed-off-by: Vipin Sharma --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/mmu/mmu.c | 50 ++++++++++++++++++--------------- 2 files changed, 29 insertions(+), 23 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index b1f319ad6f89..7b3f36ae37a4 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1410,7 +1410,7 @@ struct kvm_arch { * * Protected by kvm->slots_lock. */ - struct kvm_mmu_memory_cache split_shadow_page_cache; + struct kvm_mmu_memory_cache split_shadow_page_cache[MAX_NUMNODES]; struct kvm_mmu_memory_cache split_page_header_cache; =20 /* diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 511c6ef265ee..7454bfc49a51 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6126,7 +6126,7 @@ static void kvm_mmu_invalidate_zap_pages_in_memslot(s= truct kvm *kvm, int kvm_mmu_init_vm(struct kvm *kvm) { struct kvm_page_track_notifier_node *node =3D &kvm->arch.mmu_sp_tracker; - int r; + int r, nid; =20 INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages); @@ -6145,8 +6145,9 @@ int kvm_mmu_init_vm(struct kvm *kvm) INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_page_header_cache, mmu_page_header_cache, NUMA_NO_NODE); =20 - INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_shadow_page_cache, - NULL, NUMA_NO_NODE); + for_each_node(nid) + INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_shadow_page_cache[nid], + NULL, nid); spin_lock_init(&kvm->arch.split_shadow_page_cache_lock); =20 INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_desc_cache, @@ -6157,10 +6158,13 @@ int kvm_mmu_init_vm(struct kvm *kvm) =20 static void mmu_free_vm_memory_caches(struct kvm *kvm) { + int nid; + kvm_mmu_free_memory_cache(&kvm->arch.split_desc_cache); kvm_mmu_free_memory_cache(&kvm->arch.split_page_header_cache); - mmu_free_sp_memory_cache(&kvm->arch.split_shadow_page_cache, - &kvm->arch.split_shadow_page_cache_lock); + for_each_node(nid) + mmu_free_sp_memory_cache(&kvm->arch.split_shadow_page_cache[nid], + &kvm->arch.split_shadow_page_cache_lock); } =20 void kvm_mmu_uninit_vm(struct kvm *kvm) @@ -6269,7 +6273,7 @@ static inline bool need_topup(struct kvm_mmu_memory_c= ache *cache, int min) return kvm_mmu_memory_cache_nr_free_objects(cache) < min; } =20 -static bool need_topup_split_caches_or_resched(struct kvm *kvm) +static bool need_topup_split_caches_or_resched(struct kvm *kvm, int nid) { if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) return true; @@ -6281,10 +6285,10 @@ static bool need_topup_split_caches_or_resched(stru= ct kvm *kvm) */ return need_topup(&kvm->arch.split_desc_cache, SPLIT_DESC_CACHE_MIN_NR_OB= JECTS) || need_topup(&kvm->arch.split_page_header_cache, 1) || - need_topup(&kvm->arch.split_shadow_page_cache, 1); +
need_topup(&kvm->arch.split_shadow_page_cache[nid], 1); } =20 -static int topup_split_caches(struct kvm *kvm) +static int topup_split_caches(struct kvm *kvm, int nid) { /* * Allocating rmap list entries when splitting huge pages for nested @@ -6314,18 +6318,21 @@ static int topup_split_caches(struct kvm *kvm) if (r) return r; =20 - return mmu_topup_sp_memory_cache(&kvm->arch.split_shadow_page_cache, + return mmu_topup_sp_memory_cache(&kvm->arch.split_shadow_page_cache[nid], &kvm->arch.split_shadow_page_cache_lock, 1); } =20 -static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u= 64 *huge_sptep) +static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, + u64 *huge_sptep, + u64 huge_spte) { struct kvm_mmu_page *huge_sp =3D sptep_to_sp(huge_sptep); struct shadow_page_caches caches =3D {}; union kvm_mmu_page_role role; unsigned int access; gfn_t gfn; + int nid; =20 gfn =3D kvm_mmu_page_get_gfn(huge_sp, spte_index(huge_sptep)); access =3D kvm_mmu_page_get_access(huge_sp, spte_index(huge_sptep)); @@ -6338,9 +6345,11 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_sp= lit(struct kvm *kvm, u64 *hu */ role =3D kvm_mmu_child_role(huge_sptep, /*direct=3D*/true, access); =20 + nid =3D kvm_pfn_to_page_table_nid(spte_to_pfn(huge_spte)); + /* Direct SPs do not require a shadowed_info_cache. */ caches.page_header_cache =3D &kvm->arch.split_page_header_cache; - caches.shadow_page_cache =3D &kvm->arch.split_shadow_page_cache; + caches.shadow_page_cache =3D &kvm->arch.split_shadow_page_cache[nid]; caches.shadow_page_cache_lock =3D &kvm->arch.split_shadow_page_cache_lock; =20 /* Safe to pass NULL for vCPU since requesting a direct SP. */ @@ -6360,7 +6369,7 @@ static void shadow_mmu_split_huge_page(struct kvm *kv= m, gfn_t gfn; int index; =20 - sp =3D shadow_mmu_get_sp_for_split(kvm, huge_sptep); + sp =3D shadow_mmu_get_sp_for_split(kvm, huge_sptep, huge_spte); =20 for (index =3D 0; index < SPTE_ENT_PER_PAGE; index++) { sptep =3D &sp->spt[index]; @@ -6398,7 +6407,7 @@ static int shadow_mmu_try_split_huge_page(struct kvm = *kvm, u64 *huge_sptep) { struct kvm_mmu_page *huge_sp =3D sptep_to_sp(huge_sptep); - int level, r =3D 0; + int level, r =3D 0, nid; gfn_t gfn; u64 spte; =20 @@ -6406,13 +6415,14 @@ static int shadow_mmu_try_split_huge_page(struct kv= m *kvm, gfn =3D kvm_mmu_page_get_gfn(huge_sp, spte_index(huge_sptep)); level =3D huge_sp->role.level; spte =3D *huge_sptep; + nid =3D kvm_pfn_to_page_table_nid(spte_to_pfn(spte)); =20 if (kvm_mmu_available_pages(kvm) <=3D KVM_MIN_FREE_MMU_PAGES) { r =3D -ENOSPC; goto out; } =20 - if (need_topup_split_caches_or_resched(kvm)) { + if (need_topup_split_caches_or_resched(kvm, nid)) { write_unlock(&kvm->mmu_lock); cond_resched(); /* @@ -6420,7 +6430,7 @@ static int shadow_mmu_try_split_huge_page(struct kvm = *kvm, * rmap iterator should be restarted because the MMU lock was * dropped. 
*/ - r =3D topup_split_caches(kvm) ?: -EAGAIN; + r =3D topup_split_caches(kvm, nid) ?: -EAGAIN; write_lock(&kvm->mmu_lock); goto out; } @@ -6709,17 +6719,15 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm,= u64 gen) } =20 static unsigned long mmu_shrink_cache(struct kvm_mmu_memory_cache *cache, - int cache_count, spinlock_t *cache_lock) { unsigned long freed =3D 0; int nid; =20 spin_lock(cache_lock); - for (nid =3D 0; nid < cache_count; nid++) { - if (node_online(nid) && cache[nid].nobjs) + for_each_online_node(nid) + if (cache[nid].nobjs) freed +=3D kvm_mmu_empty_memory_cache(&cache[nid]); - } spin_unlock(cache_lock); return freed; } @@ -6741,8 +6749,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrin= k_control *sc) first_kvm =3D kvm; list_move_tail(&kvm->vm_list, &vm_list); =20 - freed +=3D mmu_shrink_cache(&kvm->arch.split_shadow_page_cache, - 1, + freed +=3D mmu_shrink_cache(kvm->arch.split_shadow_page_cache, &kvm->arch.split_shadow_page_cache_lock); =20 if (freed >=3D sc->nr_to_scan) @@ -6750,7 +6757,6 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrin= k_control *sc) =20 kvm_for_each_vcpu(i, vcpu, kvm) { freed +=3D mmu_shrink_cache(vcpu->arch.mmu_shadow_page_cache, - MAX_NUMNODES, &vcpu->arch.mmu_shadow_page_cache_lock); } =20 --=20 2.39.0.314.g84b9a713c41-goog From nobody Wed Sep 17 05:56:26 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2CFBDC4332F for ; Thu, 22 Dec 2022 02:36:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235045AbiLVCgE (ORCPT ); Wed, 21 Dec 2022 21:36:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235046AbiLVCfj (ORCPT ); Wed, 21 Dec 2022 21:35:39 -0500 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 58B6D26AC6 for ; Wed, 21 Dec 2022 18:35:16 -0800 (PST) Received: by mail-pj1-x1049.google.com with SMTP id pa16-20020a17090b265000b0020a71040b4cso300727pjb.6 for ; Wed, 21 Dec 2022 18:35:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=b9yKHiaQzq9mZ3IJWGzrPWbD3+83EkY9xWMVBG05dNQ=; b=XlyER0aXwnjO8q6dTDihNfSIfjWn8dSVJ8IiBcSHE7yuANREvStJV3g9XDl8A/5Vwr OaNHoUYM9Z3wM78fZ/XqZqc2ZUebk2Up0L9xXgXL4U3gFGGz6w6NBfJtgfGPwYUC+B8s lG8Y2mimnBPeTLnpwg2Gmem25P6HBJfKl/2T+uaujz8TJVx2GozbpVFZB4OD+HpbyDdX FHdfVOyX6+BPp8Ht/TdNLUscgGNFc8b5S04IVXGZFWLXt3JasjvhdrpNTDIHgFTTxdhl C2eu1PX9i9Z3SW518upry6mS+ShsSl8pxZNEoSPq4X6DQp773Xn1V/u5SNKVXrbP8X0c vtUQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=b9yKHiaQzq9mZ3IJWGzrPWbD3+83EkY9xWMVBG05dNQ=; b=x9xuSQJNwgHBxcU5HYm2Dy5FNIj96/5HorcahakgG3P9Ku2yRVrqXRMBE5cFuwD7tN vvBnWRbdpWOjoyCXYwl/wWpdAKR3loYWE0L8OydOrCl3gZw7rvdxANftHOU4tm/s3CM/ xfWqYJmpBBJ9q2PlABo4NZElzgzYHzOC4sXMVxapHDQSVR1T2Sr2D6sEgR4o/TM6ivtX mxGwfA4Rt3tWhmcFJ7uvNd0Nbg3ZdIu1OOJii0o/H6++Z3B+4Dh5/HUTllFjCa0So4R5 2JjHzGjMR0CyT3cApAjCOdPOrBx5jLo6vIN2ItEf8xNAxREP/GzQOD77PrPXSFz7N4Yq 
xfFQ== X-Gm-Message-State: AFqh2kqjAGj+Z8tqtrPg/0TeX8sz8HZxGk4Jl6PmRUF7G/wTr7EePQ95 6vQwNnF30BcwPCYhB1cg+WOhxEvCn6XP X-Google-Smtp-Source: AMrXdXuzNj9EuXh7z055yKJ7hUYnL0IVln/7MJaAb+bM/jJJ7BibhMJSEr7enuNFlr9FF9cWepYFeciD79VX X-Received: from vipin.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:479f]) (user=vipinsh job=sendgmr) by 2002:a17:90b:701:b0:219:1d0a:34a6 with SMTP id s1-20020a17090b070100b002191d0a34a6mr128835pjz.1.1671676515179; Wed, 21 Dec 2022 18:35:15 -0800 (PST) Date: Wed, 21 Dec 2022 18:34:57 -0800 In-Reply-To: <20221222023457.1764-1-vipinsh@google.com> Mime-Version: 1.0 References: <20221222023457.1764-1-vipinsh@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20221222023457.1764-10-vipinsh@google.com> Subject: [Patch v3 9/9] KVM: x86/mmu: Reduce default cache size in KVM from 40 to PT64_ROOT_MAX_LEVEL From: Vipin Sharma To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE is set to 40 without any specific reason. Reduce default size to PT64_ROOT_MAX_LEVEL, which is currently 5. Change mmu_pte_list_desc_cache size to what is needed as it is more than 5 but way less than 40. Tested by running dirty_log_perf_test on both tdp and shadow MMU with 48 vcpu and 2GB/vcpu size on a 2 NUMA node machine. No impact on performance noticed. Ran perf on dirty_log_perf_test and found kvm_mmu_get_free_page() calls reduced by ~3300 which is near to 48 (vcpus) * 2 (nodes) * 35 (cache size). Signed-off-by: Vipin Sharma --- arch/x86/include/asm/kvm_types.h | 2 +- arch/x86/kvm/mmu/mmu.c | 7 ++++--- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/kvm_types.h b/arch/x86/include/asm/kvm_ty= pes.h index 08f1b57d3b62..752dab218a62 100644 --- a/arch/x86/include/asm/kvm_types.h +++ b/arch/x86/include/asm/kvm_types.h @@ -2,6 +2,6 @@ #ifndef _ASM_X86_KVM_TYPES_H #define _ASM_X86_KVM_TYPES_H =20 -#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40 +#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE PT64_ROOT_MAX_LEVEL =20 #endif /* _ASM_X86_KVM_TYPES_H */ diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 7454bfc49a51..f89d933ff380 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -677,11 +677,12 @@ static int mmu_topup_sp_memory_cache(struct kvm_mmu_m= emory_cache *cache, =20 static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indir= ect) { - int r, nid; + int r, nid, desc_capacity; =20 /* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */ - r =3D kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache, - 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM); + desc_capacity =3D 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM; + r =3D __kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache, + desc_capacity, desc_capacity); if (r) return r; =20 --=20 2.39.0.314.g84b9a713c41-goog
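
The per-node plumbing in patches 7-9 hinges on mapping the faulting pfn to a NUMA node and then picking that node's shadow page cache. The helper kvm_pfn_to_page_table_nid() used above is introduced in an earlier patch of the series and is not shown here; the sketch below is only an assumption about its likely shape, built on the generic pfn_to_page()/page_to_nid() pattern, and the sketch_* names are hypothetical rather than part of the series.

	/*
	 * Sketch only, not the series' implementation: map a guest pfn to the
	 * NUMA node backing it, falling back to the current node when the pfn
	 * has no struct page (e.g. MMIO or remapped memory).
	 */
	static int sketch_pfn_to_page_table_nid(kvm_pfn_t pfn)
	{
		if (!pfn_valid(pfn))
			return numa_mem_id();

		return page_to_nid(pfn_to_page(pfn));
	}

	/*
	 * Pick the vCPU's per-node shadow page cache for a fault on @pfn.
	 * mmu_topup_memory_caches() tops up the cache of every online node
	 * before the fault is handled, so this lookup should find
	 * pre-allocated pages.
	 */
	static struct kvm_mmu_memory_cache *
	sketch_pick_shadow_page_cache(struct kvm_vcpu *vcpu, kvm_pfn_t pfn)
	{
		int nid = sketch_pfn_to_page_table_nid(pfn);

		return &vcpu->arch.mmu_shadow_page_cache[nid];
	}

With this selection in place, kvm_mmu_get_child_sp() and kvm_tdp_mmu_map() derive the node from fault->pfn, while roots keep using numa_mem_id() since a root has no single backing physical page.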
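Patch 7 also reroutes the page allocation in mmu_memory_cache_alloc_obj() through kvm_mmu_get_free_page(mc->node, gfp_flags), whose definition likewise lands earlier in the series and is not visible above. A minimal sketch of what such a helper could look like, assuming it preserves the old __get_free_page() behaviour for NUMA_NO_NODE and otherwise allocates on the requested node (the sketch_ prefix again marks it as hypothetical):

	/*
	 * Sketch only: allocate one page for a page table, preferring @nid when
	 * a node is specified and falling back to the node-agnostic path for
	 * NUMA_NO_NODE.
	 */
	static void *sketch_kvm_mmu_get_free_page(int nid, gfp_t gfp)
	{
		struct page *page;

		if (nid == NUMA_NO_NODE)
			return (void *)__get_free_page(gfp);

		page = alloc_pages_node(nid, gfp, 0);
		if (!page)
			return NULL;

		return page_address(page);
	}

Because the caches are initialized with __GFP_ZERO via INIT_KVM_MMU_MEMORY_CACHE(), either branch should hand back zeroed pages, so the behaviour of the existing non-NUMA path is preserved.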