From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Date: Wed, 21 Dec 2022 18:34:49 -0800
Message-ID: <20221222023457.1764-2-vipinsh@google.com>
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
Subject: [Patch v3 1/9] KVM: x86/mmu: Repurpose KVM MMU shrinker to purge shadow page caches

mmu_shrink_scan() is very disruptive to VMs.
It picks the first VM in vm_list and zaps the oldest page, which is most likely an upper-level SPTE and the most likely to be reused. Prior to the TDP MMU, this was even more disruptive in the nested VM case, since L1 SPTEs will be the oldest even though most of the entries are for L2 SPTEs. As discussed in https://lore.kernel.org/lkml/Y45dldZnI6OIf+a5@google.com/ the shrinker logic has not been very useful in actually keeping VMs performant or in reducing memory usage.

Change mmu_shrink_scan() to free pages from the vCPU's shadow page cache. Freeing pages from the cache doesn't cause vCPU exits, therefore a VM's performance should not be affected. This also allows changing cache capacities without worrying too much about high memory usage in the caches.

Tested this change by running dirty_log_perf_test while dropping the cache via "echo 2 > /proc/sys/vm/drop_caches" at a 1 second interval continuously. There were WARN_ON(!mc->nobjs) messages printed in the kernel logs from kvm_mmu_memory_cache_alloc(), which is expected.

Suggested-by: Sean Christopherson
Signed-off-by: Vipin Sharma
---
 arch/x86/include/asm/kvm_host.h |   5 +
 arch/x86/kvm/mmu/mmu.c          | 163 +++++++++++++++++++-------------
 arch/x86/kvm/mmu/mmu_internal.h |   2 +
 arch/x86/kvm/mmu/tdp_mmu.c      |   3 +-
 include/linux/kvm_host.h        |   1 +
 virt/kvm/kvm_main.c             |  11 ++-
 6 files changed, 114 insertions(+), 71 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index aa4eb8cfcd7e..89cc809e4a00 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -786,6 +786,11 @@ struct kvm_vcpu_arch {
     struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
     struct kvm_mmu_memory_cache mmu_page_header_cache;
 
+    /*
+     * Protects change in size of mmu_shadow_page_cache cache.
+     */
+    spinlock_t mmu_shadow_page_cache_lock;
+
     /*
      * QEMU userspace and the guest each have their own FPU state.
      * In vcpu_run, we switch between the user and guest FPU contexts.
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 254bc46234e0..157417e1cb6e 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -164,7 +164,10 @@ struct kvm_shadow_walk_iterator {
 
 static struct kmem_cache *pte_list_desc_cache;
 struct kmem_cache *mmu_page_header_cache;
-static struct percpu_counter kvm_total_used_mmu_pages;
+/*
+ * Total number of unused pages in MMU shadow page cache.
+ */ +static struct percpu_counter kvm_total_unused_mmu_pages; =20 static void mmu_spte_set(u64 *sptep, u64 spte); =20 @@ -655,6 +658,22 @@ static void walk_shadow_page_lockless_end(struct kvm_v= cpu *vcpu) } } =20 +static int mmu_topup_sp_memory_cache(struct kvm_mmu_memory_cache *cache, + spinlock_t *cache_lock) +{ + int orig_nobjs; + int r; + + spin_lock(cache_lock); + orig_nobjs =3D cache->nobjs; + r =3D kvm_mmu_topup_memory_cache(cache, PT64_ROOT_MAX_LEVEL); + if (orig_nobjs !=3D cache->nobjs) + percpu_counter_add(&kvm_total_unused_mmu_pages, + (cache->nobjs - orig_nobjs)); + spin_unlock(cache_lock); + return r; +} + static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indir= ect) { int r; @@ -664,8 +683,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcp= u, bool maybe_indirect) 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM); if (r) return r; - r =3D kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache, - PT64_ROOT_MAX_LEVEL); + r =3D mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache, + &vcpu->arch.mmu_shadow_page_cache_lock); if (r) return r; if (maybe_indirect) { @@ -678,10 +697,25 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *v= cpu, bool maybe_indirect) PT64_ROOT_MAX_LEVEL); } =20 +static void mmu_free_sp_memory_cache(struct kvm_mmu_memory_cache *cache, + spinlock_t *cache_lock) +{ + int orig_nobjs; + + spin_lock(cache_lock); + orig_nobjs =3D cache->nobjs; + kvm_mmu_free_memory_cache(cache); + if (orig_nobjs) + percpu_counter_sub(&kvm_total_unused_mmu_pages, orig_nobjs); + + spin_unlock(cache_lock); +} + static void mmu_free_memory_caches(struct kvm_vcpu *vcpu) { kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache); - kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache); + mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache, + &vcpu->arch.mmu_shadow_page_cache_lock); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache); } @@ -1693,27 +1727,15 @@ static int is_empty_shadow_page(u64 *spt) } #endif =20 -/* - * This value is the sum of all of the kvm instances's - * kvm->arch.n_used_mmu_pages values. We need a global, - * aggregate version in order to make the slab shrinker - * faster - */ -static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, long nr) -{ - kvm->arch.n_used_mmu_pages +=3D nr; - percpu_counter_add(&kvm_total_used_mmu_pages, nr); -} - static void kvm_account_mmu_page(struct kvm *kvm, struct kvm_mmu_page *sp) { - kvm_mod_used_mmu_pages(kvm, +1); + kvm->arch.n_used_mmu_pages++; kvm_account_pgtable_pages((void *)sp->spt, +1); } =20 static void kvm_unaccount_mmu_page(struct kvm *kvm, struct kvm_mmu_page *s= p) { - kvm_mod_used_mmu_pages(kvm, -1); + kvm->arch.n_used_mmu_pages--; kvm_account_pgtable_pages((void *)sp->spt, -1); } =20 @@ -2150,8 +2172,31 @@ struct shadow_page_caches { struct kvm_mmu_memory_cache *page_header_cache; struct kvm_mmu_memory_cache *shadow_page_cache; struct kvm_mmu_memory_cache *shadowed_info_cache; + /* + * Protects change in size of shadow_page_cache cache. 
+ */ + spinlock_t *shadow_page_cache_lock; }; =20 +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_pa= ge_cache, + spinlock_t *cache_lock) +{ + int orig_nobjs; + void *page; + + if (!cache_lock) { + spin_lock(cache_lock); + orig_nobjs =3D shadow_page_cache->nobjs; + } + page =3D kvm_mmu_memory_cache_alloc(shadow_page_cache); + if (!cache_lock) { + if (orig_nobjs) + percpu_counter_dec(&kvm_total_unused_mmu_pages); + spin_unlock(cache_lock); + } + return page; +} + static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page(struct kvm *kvm, struct shadow_page_caches *caches, gfn_t gfn, @@ -2161,7 +2206,8 @@ static struct kvm_mmu_page *kvm_mmu_alloc_shadow_page= (struct kvm *kvm, struct kvm_mmu_page *sp; =20 sp =3D kvm_mmu_memory_cache_alloc(caches->page_header_cache); - sp->spt =3D kvm_mmu_memory_cache_alloc(caches->shadow_page_cache); + sp->spt =3D kvm_mmu_sp_memory_cache_alloc(caches->shadow_page_cache, + caches->shadow_page_cache_lock); if (!role.direct) sp->shadowed_translation =3D kvm_mmu_memory_cache_alloc(caches->shadowed= _info_cache); =20 @@ -2218,6 +2264,7 @@ static struct kvm_mmu_page *kvm_mmu_get_shadow_page(s= truct kvm_vcpu *vcpu, .page_header_cache =3D &vcpu->arch.mmu_page_header_cache, .shadow_page_cache =3D &vcpu->arch.mmu_shadow_page_cache, .shadowed_info_cache =3D &vcpu->arch.mmu_shadowed_info_cache, + .shadow_page_cache_lock =3D &vcpu->arch.mmu_shadow_page_cache_lock }; =20 return __kvm_mmu_get_shadow_page(vcpu->kvm, vcpu, &caches, gfn, role); @@ -5916,6 +5963,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) vcpu->arch.mmu_page_header_cache.gfp_zero =3D __GFP_ZERO; =20 vcpu->arch.mmu_shadow_page_cache.gfp_zero =3D __GFP_ZERO; + spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock); =20 vcpu->arch.mmu =3D &vcpu->arch.root_mmu; vcpu->arch.walk_mmu =3D &vcpu->arch.root_mmu; @@ -6051,11 +6099,6 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm) kvm_tdp_mmu_zap_invalidated_roots(kvm); } =20 -static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm) -{ - return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages)); -} - static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, struct kvm_page_track_notifier_node *node) @@ -6277,6 +6320,7 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_spl= it(struct kvm *kvm, u64 *hu /* Direct SPs do not require a shadowed_info_cache. */ caches.page_header_cache =3D &kvm->arch.split_page_header_cache; caches.shadow_page_cache =3D &kvm->arch.split_shadow_page_cache; + caches.shadow_page_cache_lock =3D NULL; =20 /* Safe to pass NULL for vCPU since requesting a direct SP. */ return __kvm_mmu_get_shadow_page(kvm, NULL, &caches, gfn, role); @@ -6646,66 +6690,49 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm,= u64 gen) static unsigned long mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) { - struct kvm *kvm; - int nr_to_scan =3D sc->nr_to_scan; + struct kvm_mmu_memory_cache *cache; + struct kvm *kvm, *first_kvm =3D NULL; unsigned long freed =3D 0; + /* spinlock for memory cache */ + spinlock_t *cache_lock; + struct kvm_vcpu *vcpu; + unsigned long i; =20 mutex_lock(&kvm_lock); =20 list_for_each_entry(kvm, &vm_list, vm_list) { - int idx; - LIST_HEAD(invalid_list); - - /* - * Never scan more than sc->nr_to_scan VM instances. - * Will not hit this condition practically since we do not try - * to shrink more than one VM and it is very unlikely to see - * !n_used_mmu_pages so many times. 
- */ - if (!nr_to_scan--) + if (first_kvm =3D=3D kvm) break; - /* - * n_used_mmu_pages is accessed without holding kvm->mmu_lock - * here. We may skip a VM instance errorneosly, but we do not - * want to shrink a VM that only started to populate its MMU - * anyway. - */ - if (!kvm->arch.n_used_mmu_pages && - !kvm_has_zapped_obsolete_pages(kvm)) - continue; + if (!first_kvm) + first_kvm =3D kvm; + list_move_tail(&kvm->vm_list, &vm_list); =20 - idx =3D srcu_read_lock(&kvm->srcu); - write_lock(&kvm->mmu_lock); + kvm_for_each_vcpu(i, vcpu, kvm) { + cache =3D &vcpu->arch.mmu_shadow_page_cache; + cache_lock =3D &vcpu->arch.mmu_shadow_page_cache_lock; + if (READ_ONCE(cache->nobjs)) { + spin_lock(cache_lock); + freed +=3D kvm_mmu_empty_memory_cache(cache); + spin_unlock(cache_lock); + } =20 - if (kvm_has_zapped_obsolete_pages(kvm)) { - kvm_mmu_commit_zap_page(kvm, - &kvm->arch.zapped_obsolete_pages); - goto unlock; } =20 - freed =3D kvm_mmu_zap_oldest_mmu_pages(kvm, sc->nr_to_scan); - -unlock: - write_unlock(&kvm->mmu_lock); - srcu_read_unlock(&kvm->srcu, idx); - - /* - * unfair on small ones - * per-vm shrinkers cry out - * sadness comes quickly - */ - list_move_tail(&kvm->vm_list, &vm_list); - break; + if (freed >=3D sc->nr_to_scan) + break; } =20 + if (freed) + percpu_counter_sub(&kvm_total_unused_mmu_pages, freed); mutex_unlock(&kvm_lock); + percpu_counter_sync(&kvm_total_unused_mmu_pages); return freed; } =20 static unsigned long mmu_shrink_count(struct shrinker *shrink, struct shrink_control *sc) { - return percpu_counter_read_positive(&kvm_total_used_mmu_pages); + return percpu_counter_sum_positive(&kvm_total_unused_mmu_pages); } =20 static struct shrinker mmu_shrinker =3D { @@ -6820,7 +6847,7 @@ int kvm_mmu_vendor_module_init(void) if (!mmu_page_header_cache) goto out; =20 - if (percpu_counter_init(&kvm_total_used_mmu_pages, 0, GFP_KERNEL)) + if (percpu_counter_init(&kvm_total_unused_mmu_pages, 0, GFP_KERNEL)) goto out; =20 ret =3D register_shrinker(&mmu_shrinker, "x86-mmu"); @@ -6830,7 +6857,7 @@ int kvm_mmu_vendor_module_init(void) return 0; =20 out_shrinker: - percpu_counter_destroy(&kvm_total_used_mmu_pages); + percpu_counter_destroy(&kvm_total_unused_mmu_pages); out: mmu_destroy_caches(); return ret; @@ -6847,7 +6874,7 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu) void kvm_mmu_vendor_module_exit(void) { mmu_destroy_caches(); - percpu_counter_destroy(&kvm_total_used_mmu_pages); + percpu_counter_destroy(&kvm_total_unused_mmu_pages); unregister_shrinker(&mmu_shrinker); } =20 diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_interna= l.h index ac00bfbf32f6..c2a342028b6a 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -325,4 +325,6 @@ void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cach= e *mc); void track_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *sp); void untrack_possible_nx_huge_page(struct kvm *kvm, struct kvm_mmu_page *s= p); =20 +void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_memory_cache *shadow_pa= ge_cache, + spinlock_t *cache_lock); #endif /* __KVM_X86_MMU_INTERNAL_H */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 764f7c87286f..4974fa96deff 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -264,7 +264,8 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm= _vcpu *vcpu) struct kvm_mmu_page *sp; =20 sp =3D kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache); - sp->spt =3D 
kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
+    sp->spt = kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache,
+                                            &vcpu->arch.mmu_shadow_page_cache_lock);
 
     return sp;
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 01aad8b74162..efd9b38ea9a2 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1362,6 +1362,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
 int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
 int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min);
 int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc);
+int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc);
 void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
 void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 13e88297f999..f2d762878b97 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -438,8 +438,10 @@ int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
     return mc->nobjs;
 }
 
-void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc)
 {
+    int freed = mc->nobjs;
+
     while (mc->nobjs) {
         if (mc->kmem_cache)
             kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
@@ -447,8 +449,13 @@ void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
         free_page((unsigned long)mc->objects[--mc->nobjs]);
     }
 
-    kvfree(mc->objects);
+    return freed;
+}
 
+void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+{
+    kvm_mmu_empty_memory_cache(mc);
+    kvfree(mc->objects);
     mc->objects = NULL;
     mc->capacity = 0;
 }
-- 
2.39.0.314.g84b9a713c41-goog
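
For readers skimming the diff, the repurposed scan path boils down to the sketch below. This is a condensed illustration, not the literal patch; it reuses the helpers added above (kvm_mmu_empty_memory_cache(), kvm_total_unused_mmu_pages, and the new per-vCPU cache lock):

static unsigned long shrink_vcpu_shadow_page_caches(struct kvm *kvm)
{
    struct kvm_vcpu *vcpu;
    unsigned long freed = 0;
    unsigned long i;

    kvm_for_each_vcpu(i, vcpu, kvm) {
        /* Only take the lock when there is something to free. */
        if (!READ_ONCE(vcpu->arch.mmu_shadow_page_cache.nobjs))
            continue;

        spin_lock(&vcpu->arch.mmu_shadow_page_cache_lock);
        freed += kvm_mmu_empty_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
        spin_unlock(&vcpu->arch.mmu_shadow_page_cache_lock);
    }

    /* Keep the shrinker's global view of unused pages in sync. */
    if (freed)
        percpu_counter_sub(&kvm_total_unused_mmu_pages, freed);

    return freed;
}

Only cached, not-yet-installed pages are freed, so nothing is zapped and running vCPUs pay at most the cost of refilling their caches on the next fault.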
From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Date: Wed, 21 Dec 2022 18:34:50 -0800
Message-ID: <20221222023457.1764-3-vipinsh@google.com>
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
Subject: [Patch v3 2/9] KVM: x86/mmu: Remove zapped_obsolete_pages from struct kvm_arch{}

The zapped_obsolete_pages list in struct kvm_arch{} was used to provide pages to the KVM MMU shrinker. It is no longer needed, as the KVM MMU shrinker has been repurposed to free shadow page caches rather than zapped_obsolete_pages.

Remove zapped_obsolete_pages from struct kvm_arch{} and use a local list in kvm_zap_obsolete_pages().

Signed-off-by: Vipin Sharma
Reviewed-by: David Matlack
---
 arch/x86/include/asm/kvm_host.h | 1 -
 arch/x86/kvm/mmu/mmu.c          | 8 ++++----
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 89cc809e4a00..f89f02e18080 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1215,7 +1215,6 @@ struct kvm_arch {
     u8 mmu_valid_gen;
     struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
     struct list_head active_mmu_pages;
-    struct list_head zapped_obsolete_pages;
     /*
      * A list of kvm_mmu_page structs that, if zapped, could possibly be
      * replaced by an NX huge page.
       A shadow page is on this list if its
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 157417e1cb6e..3364760a1695 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5987,6 +5987,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
 {
     struct kvm_mmu_page *sp, *node;
     int nr_zapped, batch = 0;
+    LIST_HEAD(zapped_pages);
     bool unstable;
 
 restart:
@@ -6019,8 +6020,8 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
             goto restart;
         }
 
-        unstable = __kvm_mmu_prepare_zap_page(kvm, sp,
-                &kvm->arch.zapped_obsolete_pages, &nr_zapped);
+        unstable = __kvm_mmu_prepare_zap_page(kvm, sp, &zapped_pages,
+                                              &nr_zapped);
         batch += nr_zapped;
 
         if (unstable)
@@ -6036,7 +6037,7 @@ static void kvm_zap_obsolete_pages(struct kvm *kvm)
      * kvm_mmu_load()), and the reload in the caller ensure no vCPUs are
      * running with an obsolete MMU.
      */
-    kvm_mmu_commit_zap_page(kvm, &kvm->arch.zapped_obsolete_pages);
+    kvm_mmu_commit_zap_page(kvm, &zapped_pages);
 }
 
 /*
@@ -6112,7 +6113,6 @@ int kvm_mmu_init_vm(struct kvm *kvm)
     int r;
 
     INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
-    INIT_LIST_HEAD(&kvm->arch.zapped_obsolete_pages);
     INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages);
     spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
 
-- 
2.39.0.314.g84b9a713c41-goog
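
In isolation, the new pattern is just a stack-local list that is always drained before the function returns. The sketch below is illustrative only and omits the batching and yield logic of the real kvm_zap_obsolete_pages():

static void zap_obsolete_pages_sketch(struct kvm *kvm)
{
    LIST_HEAD(zapped_pages);    /* stack-local, replaces kvm->arch.zapped_obsolete_pages */
    struct kvm_mmu_page *sp, *node;
    int nr_zapped;

    list_for_each_entry_safe_reverse(sp, node, &kvm->arch.active_mmu_pages, link) {
        if (!is_obsolete_sp(kvm, sp))
            continue;

        __kvm_mmu_prepare_zap_page(kvm, sp, &zapped_pages, &nr_zapped);
    }

    /* The list is fully drained here, so nothing outlives the call. */
    kvm_mmu_commit_zap_page(kvm, &zapped_pages);
}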
From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Date: Wed, 21 Dec 2022 18:34:51 -0800
Message-ID: <20221222023457.1764-4-vipinsh@google.com>
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
Subject: [Patch v3 3/9] KVM: x86/mmu: Shrink split_shadow_page_cache via KVM MMU shrinker

split_shadow_page_cache is not used after dirty logging is disabled, which makes it a good candidate for freeing memory when mmu_shrink_scan() kicks in.

Account for split_shadow_page_cache via kvm_total_unused_mmu_pages and use it in mmu_shrink_scan().

Signed-off-by: Vipin Sharma
---
 arch/x86/include/asm/kvm_host.h |  5 +++
 arch/x86/kvm/mmu/mmu.c          | 63 +++++++++++++++++++--------------
 2 files changed, 42 insertions(+), 26 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f89f02e18080..293994fabae3 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1413,6 +1413,11 @@ struct kvm_arch {
     struct kvm_mmu_memory_cache split_shadow_page_cache;
     struct kvm_mmu_memory_cache split_page_header_cache;
 
+    /*
+     * Protects change in size of split_shadow_page_cache cache.
+     */
+    spinlock_t split_shadow_page_cache_lock;
+
     /*
      * Memory cache used to allocate pte_list_desc structs while splitting
      * huge pages.
In the worst case, to split one huge page, 512 diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 3364760a1695..6f6a10d7a871 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -659,14 +659,15 @@ static void walk_shadow_page_lockless_end(struct kvm_= vcpu *vcpu) } =20 static int mmu_topup_sp_memory_cache(struct kvm_mmu_memory_cache *cache, - spinlock_t *cache_lock) + spinlock_t *cache_lock, + int min) { int orig_nobjs; int r; =20 spin_lock(cache_lock); orig_nobjs =3D cache->nobjs; - r =3D kvm_mmu_topup_memory_cache(cache, PT64_ROOT_MAX_LEVEL); + r =3D kvm_mmu_topup_memory_cache(cache, min); if (orig_nobjs !=3D cache->nobjs) percpu_counter_add(&kvm_total_unused_mmu_pages, (cache->nobjs - orig_nobjs)); @@ -684,7 +685,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcp= u, bool maybe_indirect) if (r) return r; r =3D mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache, - &vcpu->arch.mmu_shadow_page_cache_lock); + &vcpu->arch.mmu_shadow_page_cache_lock, + PT64_ROOT_MAX_LEVEL); if (r) return r; if (maybe_indirect) { @@ -2184,16 +2186,12 @@ void *kvm_mmu_sp_memory_cache_alloc(struct kvm_mmu_= memory_cache *shadow_page_cac int orig_nobjs; void *page; =20 - if (!cache_lock) { - spin_lock(cache_lock); - orig_nobjs =3D shadow_page_cache->nobjs; - } + spin_lock(cache_lock); + orig_nobjs =3D shadow_page_cache->nobjs; page =3D kvm_mmu_memory_cache_alloc(shadow_page_cache); - if (!cache_lock) { - if (orig_nobjs) - percpu_counter_dec(&kvm_total_unused_mmu_pages); - spin_unlock(cache_lock); - } + if (orig_nobjs) + percpu_counter_dec(&kvm_total_unused_mmu_pages); + spin_unlock(cache_lock); return page; } =20 @@ -6130,6 +6128,7 @@ int kvm_mmu_init_vm(struct kvm *kvm) kvm->arch.split_page_header_cache.gfp_zero =3D __GFP_ZERO; =20 kvm->arch.split_shadow_page_cache.gfp_zero =3D __GFP_ZERO; + spin_lock_init(&kvm->arch.split_shadow_page_cache_lock); =20 kvm->arch.split_desc_cache.kmem_cache =3D pte_list_desc_cache; kvm->arch.split_desc_cache.gfp_zero =3D __GFP_ZERO; @@ -6141,7 +6140,8 @@ static void mmu_free_vm_memory_caches(struct kvm *kvm) { kvm_mmu_free_memory_cache(&kvm->arch.split_desc_cache); kvm_mmu_free_memory_cache(&kvm->arch.split_page_header_cache); - kvm_mmu_free_memory_cache(&kvm->arch.split_shadow_page_cache); + mmu_free_sp_memory_cache(&kvm->arch.split_shadow_page_cache, + &kvm->arch.split_shadow_page_cache_lock); } =20 void kvm_mmu_uninit_vm(struct kvm *kvm) @@ -6295,7 +6295,9 @@ static int topup_split_caches(struct kvm *kvm) if (r) return r; =20 - return kvm_mmu_topup_memory_cache(&kvm->arch.split_shadow_page_cache, 1); + return mmu_topup_sp_memory_cache(&kvm->arch.split_shadow_page_cache, + &kvm->arch.split_shadow_page_cache_lock, + 1); } =20 static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u= 64 *huge_sptep) @@ -6320,7 +6322,7 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_spl= it(struct kvm *kvm, u64 *hu /* Direct SPs do not require a shadowed_info_cache. */ caches.page_header_cache =3D &kvm->arch.split_page_header_cache; caches.shadow_page_cache =3D &kvm->arch.split_shadow_page_cache; - caches.shadow_page_cache_lock =3D NULL; + caches.shadow_page_cache_lock =3D &kvm->arch.split_shadow_page_cache_lock; =20 /* Safe to pass NULL for vCPU since requesting a direct SP. 
*/ return __kvm_mmu_get_shadow_page(kvm, NULL, &caches, gfn, role); @@ -6687,14 +6689,23 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm,= u64 gen) } } =20 +static unsigned long mmu_shrink_cache(struct kvm_mmu_memory_cache *cache, + spinlock_t *cache_lock) +{ + unsigned long freed =3D 0; + + spin_lock(cache_lock); + if (cache->nobjs) + freed =3D kvm_mmu_empty_memory_cache(cache); + spin_unlock(cache_lock); + return freed; +} + static unsigned long mmu_shrink_scan(struct shrinker *shrink, struct shrink_control *sc) { - struct kvm_mmu_memory_cache *cache; struct kvm *kvm, *first_kvm =3D NULL; unsigned long freed =3D 0; - /* spinlock for memory cache */ - spinlock_t *cache_lock; struct kvm_vcpu *vcpu; unsigned long i; =20 @@ -6707,15 +6718,15 @@ mmu_shrink_scan(struct shrinker *shrink, struct shr= ink_control *sc) first_kvm =3D kvm; list_move_tail(&kvm->vm_list, &vm_list); =20 - kvm_for_each_vcpu(i, vcpu, kvm) { - cache =3D &vcpu->arch.mmu_shadow_page_cache; - cache_lock =3D &vcpu->arch.mmu_shadow_page_cache_lock; - if (READ_ONCE(cache->nobjs)) { - spin_lock(cache_lock); - freed +=3D kvm_mmu_empty_memory_cache(cache); - spin_unlock(cache_lock); - } + freed +=3D mmu_shrink_cache(&kvm->arch.split_shadow_page_cache, + &kvm->arch.split_shadow_page_cache_lock); =20 + if (freed >=3D sc->nr_to_scan) + break; + + kvm_for_each_vcpu(i, vcpu, kvm) { + freed +=3D mmu_shrink_cache(&vcpu->arch.mmu_shadow_page_cache, + &vcpu->arch.mmu_shadow_page_cache_lock); } =20 if (freed >=3D sc->nr_to_scan) --=20 2.39.0.314.g84b9a713c41-goog From nobody Wed Sep 17 05:56:26 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6CB03C4332F for ; Thu, 22 Dec 2022 02:35:25 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230319AbiLVCfX (ORCPT ); Wed, 21 Dec 2022 21:35:23 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46370 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234975AbiLVCfI (ORCPT ); Wed, 21 Dec 2022 21:35:08 -0500 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4C6D224BE9 for ; Wed, 21 Dec 2022 18:35:07 -0800 (PST) Received: by mail-pj1-x1049.google.com with SMTP id il11-20020a17090b164b00b00219a4366109so2315685pjb.0 for ; Wed, 21 Dec 2022 18:35:07 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=bxkmrVszmYtdX56U0+zIQlMbh2VgS2C48ld2l9CYwus=; b=Rn6JhTg0zhPlYluWmxYHpBlbM65jcOKT08CTpFfaF7H5DiKsfX4cghOlXKtbXYarOK Rly5XOhtvEYkFNI27R7kdmcDaft+DtQZHjIUVe60dOsihgLsa9smgxM+b5XDJjnQ5ViO Lkw5zbkLDE5ElgM22t1HWQhqSPjJc/A7roPsY0/dbXeUhXDdkCO+aerp6M5Gy8UvxMx9 hry1N1NLcMymvSxaWuZTNy40tIRuVsbU4zETnVIVyXentp2aZLJhf0IN9lIyvdeQEICD jjslMf9GOy2+ihLLAQ7wMggtMTn5Pk6LiRqyldHCiyElR3iBYcg59o/9Nzc9a7CBMBBj D2Vw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=bxkmrVszmYtdX56U0+zIQlMbh2VgS2C48ld2l9CYwus=; b=ucqHjmmPBBcd6XRaqeEXEVvzl6lObI5ylqBFUK0Fw2dIyJCCI5mUQYR844o3YahPNI 
From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Date: Wed, 21 Dec 2022 18:34:52 -0800
Message-ID: <20221222023457.1764-5-vipinsh@google.com>
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
Subject: [Patch v3 4/9] KVM: Add module param to make page tables NUMA aware

Add a numa_aware_pagetable module param to make page table allocations NUMA aware.

Signed-off-by: Vipin Sharma
---
 include/linux/kvm_host.h |  2 ++
 virt/kvm/kvm_main.c      | 22 ++++++++++++++++++++++
 2 files changed, 24 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index efd9b38ea9a2..d48064503b88 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1358,6 +1358,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool usermode_vcpu_not_eligible);
 
 void kvm_flush_remote_tlbs(struct kvm *kvm);
 
+void *kvm_mmu_get_free_page(int nid, gfp_t gfp);
+
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
 int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
 int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f2d762878b97..d96c8146e9ba 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -93,6 +93,13 @@ unsigned int halt_poll_ns_shrink;
 module_param(halt_poll_ns_shrink, uint, 0644);
 EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
 
+/*
+ * If possible, allocate page table's pages on the same node the underlying
+ * physical page is pointing to.
+ */
+static bool __read_mostly numa_aware_pagetable = true;
+module_param_named(numa_aware_pagetable, numa_aware_pagetable, bool, 0644);
+
 /*
  * Ordering of locks:
  *
@@ -384,6 +391,21 @@ static void kvm_flush_shadow_all(struct kvm *kvm)
     kvm_arch_guest_memory_reclaimed(kvm);
 }
 
+void *kvm_mmu_get_free_page(int nid, gfp_t gfp)
+{
+#ifdef CONFIG_NUMA
+    struct page *spt_page;
+
+    if (numa_aware_pagetable) {
+        spt_page = alloc_pages_node(nid, gfp, 0);
+        if (spt_page)
+            return page_address(spt_page);
+    }
+#endif // CONFIG_NUMA
+
+    return (void *)__get_free_page(gfp);
+}
+
 #ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
 static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
                                                gfp_t gfp_flags)
-- 
2.39.0.314.g84b9a713c41-goog
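
Usage is straightforward: a caller passes the node it wants the page table page near plus the usual GFP flags, and the helper silently degrades to __get_free_page() when CONFIG_NUMA is off or the parameter is disabled. The snippet below is illustrative only; the wrapper, gfp flags and error handling are assumptions, not part of the patch:

static int alloc_pt_page_near(int nid, void **sptp)
{
    /*
     * Allocate a page table page near the data it will map.  With
     * numa_aware_pagetable disabled (or !CONFIG_NUMA) this is just
     * __get_free_page().  GFP flags here are illustrative.
     */
    *sptp = kvm_mmu_get_free_page(nid, GFP_KERNEL_ACCOUNT | __GFP_ZERO);

    return *sptp ? 0 : -ENOMEM;
}

Because the parameter is created with mode 0644 it should also be flippable at runtime, e.g. (path assumes KVM built as the "kvm" module): echo 0 > /sys/module/kvm/parameters/numa_aware_pagetable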
From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Date: Wed, 21 Dec 2022 18:34:53 -0800
Message-ID: <20221222023457.1764-6-vipinsh@google.com>
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
Subject: [Patch v3 5/9] KVM: x86/mmu: Allocate TDP page table's page on correct NUMA node on split

When dirty logging is enabled, huge pages are split. The page table pages created during the split are allocated based on the current thread's NUMA node or mempolicy. This causes inefficient page table accesses if the underlying pages are on a different NUMA node.

Allocate page table pages on the same NUMA node as the underlying huge page when dirty logging is enabled and huge pages are split.

The performance gain during the pre-copy phase of live migration of a 416-vCPU, 11 TiB memory VM on an 8-node host was in the range of 130% to 150%.

Suggested-by: David Matlack
Signed-off-by: Vipin Sharma
---
 arch/x86/kvm/mmu/tdp_mmu.c | 12 ++++++++----
 include/linux/kvm_host.h   | 18 ++++++++++++++++++
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 4974fa96deff..376b8dceb3f9 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1403,7 +1403,7 @@ bool kvm_tdp_mmu_wrprot_slot(struct kvm *kvm,
     return spte_set;
 }
 
-static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
+static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(int nid, gfp_t gfp)
 {
     struct kvm_mmu_page *sp;
 
@@ -1413,7 +1413,8 @@ static struct kvm_mmu_page *__tdp_mmu_alloc_sp_for_split(gfp_t gfp)
     if (!sp)
         return NULL;
 
-    sp->spt = (void *)__get_free_page(gfp);
+    sp->spt = kvm_mmu_get_free_page(nid, gfp);
+
     if (!sp->spt) {
         kmem_cache_free(mmu_page_header_cache, sp);
         return NULL;
@@ -1427,6 +1428,9 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
                          bool shared)
 {
     struct kvm_mmu_page *sp;
+    int nid;
+
+    nid = kvm_pfn_to_page_table_nid(spte_to_pfn(iter->old_spte));
 
     /*
      * Since we are allocating while under the MMU lock we have to be
@@ -1437,7 +1441,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
      * If this allocation fails we drop the lock and retry with reclaim
      * allowed.
      */
-    sp = __tdp_mmu_alloc_sp_for_split(GFP_NOWAIT | __GFP_ACCOUNT);
+    sp = __tdp_mmu_alloc_sp_for_split(nid, GFP_NOWAIT | __GFP_ACCOUNT);
     if (sp)
         return sp;
 
@@ -1449,7 +1453,7 @@ static struct kvm_mmu_page *tdp_mmu_alloc_sp_for_split(struct kvm *kvm,
     write_unlock(&kvm->mmu_lock);
 
     iter->yielded = true;
-    sp = __tdp_mmu_alloc_sp_for_split(GFP_KERNEL_ACCOUNT);
+    sp = __tdp_mmu_alloc_sp_for_split(nid, GFP_KERNEL_ACCOUNT);
 
     if (shared)
         read_lock(&kvm->mmu_lock);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d48064503b88..a262e15ebd19 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1583,6 +1583,24 @@ void kvm_arch_sync_events(struct kvm *kvm);
 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
 
 struct page *kvm_pfn_to_refcounted_page(kvm_pfn_t pfn);
+
+/*
+ * Tells the appropriate NUMA node location of the page table's page based on
+ * pfn it will point to.
+ * + * Return the nid of the page if pfn is valid and backed by a refcounted p= age, + * otherwise, return the nearest memory node for the current CPU. + */ +static inline int kvm_pfn_to_page_table_nid(kvm_pfn_t pfn) +{ + struct page *page =3D kvm_pfn_to_refcounted_page(pfn); + + if (page) + return page_to_nid(page); + else + return numa_mem_id(); +} + bool kvm_is_zone_device_page(struct page *page); =20 struct kvm_irq_ack_notifier { --=20 2.39.0.314.g84b9a713c41-goog From nobody Wed Sep 17 05:56:26 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E272CC10F1B for ; Thu, 22 Dec 2022 02:35:43 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S230447AbiLVCfm (ORCPT ); Wed, 21 Dec 2022 21:35:42 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46540 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235003AbiLVCfR (ORCPT ); Wed, 21 Dec 2022 21:35:17 -0500 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 10C6B25C7E for ; Wed, 21 Dec 2022 18:35:10 -0800 (PST) Received: by mail-pj1-x1049.google.com with SMTP id p11-20020a17090a680b00b002233455d706so296308pjj.4 for ; Wed, 21 Dec 2022 18:35:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=poiDBxCXZuaSeZicG2qgYYUccKGMhjWu3dn9FbyWbdI=; b=TH1iUYrFfRpDWtVxW3+pgFlRAZdRswEo031A+JoBSFw8+0LAdMFXq8dZa92oOrcqew zYK5KnaC50bemT2vcH3I6ljHX1zk29hcYKiDZBNLKVzfySyuRHuu18d89jK8PIfT1R5h LcbSU39GflE5evKZtr1uNxzcziFGWAUPdRR3Mx/UUTStoxEpMFjqdxAwTrADnkgxrxfR MoqyQVpxe61yOYHtHbVvl+2zwqIlTaQOZOu2yejduA1CKn6/fjkWxKF51oEKMyBQO64B G5UcwCIQeSaC9FL/TdKjgMmQqju1POsZRD1JXmJmsK1WsMnE1GTwbAqHf7k7VSLWZPyK ZOlQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=poiDBxCXZuaSeZicG2qgYYUccKGMhjWu3dn9FbyWbdI=; b=OXm0Bvb7kD1i2OAsjqaf7XKwXTP8vKJa8z68leRpIUGWt77vsdDtIy2cLCA70jNMQP nRgb/J4GjrGD9ET+LO7xVzRWPNfQH7xaNlL1u8k9g+ghtAV/WtD4ZCt34f/5Pbm6Hpob 7+MtEtBV6WwPECL0xwsCqI7aKXhzWXoPYr9jJ87TxQ6MqtMHB9kra8sCoHezHwYQnTrA p4Czhj4iHxBN8NjrorfxMv4vuW8mqnzjV4K1kNsPVwwkOQaKAoer+XCDKl4maNauIYeZ BAROHBvwz/ABaajDFSDt1XlO+OS3BhqW4rsG3Bko+BrJkZ2HIEfatNtrd+j1xL4ZhwbL 41FA== X-Gm-Message-State: AFqh2koeiPA/25YeaiNVeVJmWSXdbMPZ88VBAe4BbtBm+42gTmtjJgb8 r1UHpnsGdmzxD3fZxq72vroLOwd9W2w1 X-Google-Smtp-Source: AMrXdXtcmKHWHo1AhNx9UQ0Z/sHm4RIiqsiHuv78kXo5tk+ZPsGbtCKvNa6VwqFR7MArdGz/SkeKXbSRg1zN X-Received: from vipin.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:479f]) (user=vipinsh job=sendgmr) by 2002:a17:90a:4c83:b0:219:ac7f:27d8 with SMTP id k3-20020a17090a4c8300b00219ac7f27d8mr366228pjh.192.1671676510249; Wed, 21 Dec 2022 18:35:10 -0800 (PST) Date: Wed, 21 Dec 2022 18:34:54 -0800 In-Reply-To: <20221222023457.1764-1-vipinsh@google.com> Mime-Version: 1.0 References: <20221222023457.1764-1-vipinsh@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20221222023457.1764-7-vipinsh@google.com> Subject: [Patch v3 6/9] KVM: Provide NUMA node support to 
kvm_mmu_memory_cache{} From: Vipin Sharma To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Add 'node' variable in kvm_mmu_memory_cache{} to denote which NUMA node this cache should allocate memory from. Default initialize to NUMA_NO_NODE in all architectures. Signed-off-by: Vipin Sharma --- arch/arm64/kvm/arm.c | 2 +- arch/arm64/kvm/mmu.c | 4 +++- arch/mips/kvm/mips.c | 2 ++ arch/riscv/kvm/mmu.c | 2 +- arch/riscv/kvm/vcpu.c | 2 +- arch/x86/kvm/mmu/mmu.c | 22 ++++++++++++---------- include/linux/kvm_host.h | 6 ++++++ include/linux/kvm_types.h | 2 ++ 8 files changed, 28 insertions(+), 14 deletions(-) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 9c5573bc4614..52a41f4532e2 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -340,7 +340,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) vcpu->arch.target =3D -1; bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES); =20 - vcpu->arch.mmu_page_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_page_cache, NULL, NUMA_NO_NODE); =20 /* * Default value for the FP state, will be overloaded at load diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c index 31d7fa4c7c14..bd07155e17fa 100644 --- a/arch/arm64/kvm/mmu.c +++ b/arch/arm64/kvm/mmu.c @@ -894,12 +894,14 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_= t guest_ipa, { phys_addr_t addr; int ret =3D 0; - struct kvm_mmu_memory_cache cache =3D { .gfp_zero =3D __GFP_ZERO }; + struct kvm_mmu_memory_cache cache; struct kvm_pgtable *pgt =3D kvm->arch.mmu.pgt; enum kvm_pgtable_prot prot =3D KVM_PGTABLE_PROT_DEVICE | KVM_PGTABLE_PROT_R | (writable ? KVM_PGTABLE_PROT_W : 0); =20 + INIT_KVM_MMU_MEMORY_CACHE(&cache, NULL, NUMA_NO_NODE); + if (is_protected_kvm_enabled()) return -EPERM; =20 diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c index a25e0b73ee70..b017c29a9340 100644 --- a/arch/mips/kvm/mips.c +++ b/arch/mips/kvm/mips.c @@ -304,6 +304,8 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) HRTIMER_MODE_REL); vcpu->arch.comparecount_timer.function =3D kvm_mips_comparecount_wakeup; =20 + vcpu->arch.mmu_page_cache.node =3D NUMA_NO_NODE; + /* * Allocate space for host mode exception handlers that handle * guest mode exits diff --git a/arch/riscv/kvm/mmu.c b/arch/riscv/kvm/mmu.c index 34b57e0be2ef..119de4520cc6 100644 --- a/arch/riscv/kvm/mmu.c +++ b/arch/riscv/kvm/mmu.c @@ -353,9 +353,9 @@ int kvm_riscv_gstage_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t addr, end; struct kvm_mmu_memory_cache pcache =3D { .gfp_custom =3D (in_atomic) ? 
GFP_ATOMIC | __GFP_ACCOUNT : 0, - .gfp_zero =3D __GFP_ZERO, }; =20 + INIT_KVM_MMU_MEMORY_CACHE(&pcache, NULL, NUMA_NO_NODE); end =3D (gpa + size + PAGE_SIZE - 1) & PAGE_MASK; pfn =3D __phys_to_pfn(hpa); =20 diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c index 7c08567097f0..189b14feb365 100644 --- a/arch/riscv/kvm/vcpu.c +++ b/arch/riscv/kvm/vcpu.c @@ -161,7 +161,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) =20 /* Mark this VCPU never ran */ vcpu->arch.ran_atleast_once =3D false; - vcpu->arch.mmu_page_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_page_cache, NULL, NUMA_NO_NODE); bitmap_zero(vcpu->arch.isa, RISCV_ISA_EXT_MAX); =20 /* Setup ISA features available to VCPU */ diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 6f6a10d7a871..23a3b82b2384 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5954,13 +5954,14 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) { int ret; =20 - vcpu->arch.mmu_pte_list_desc_cache.kmem_cache =3D pte_list_desc_cache; - vcpu->arch.mmu_pte_list_desc_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_pte_list_desc_cache, + pte_list_desc_cache, NUMA_NO_NODE); =20 - vcpu->arch.mmu_page_header_cache.kmem_cache =3D mmu_page_header_cache; - vcpu->arch.mmu_page_header_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_page_header_cache, + mmu_page_header_cache, NUMA_NO_NODE); =20 - vcpu->arch.mmu_shadow_page_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_shadow_page_cache, + NULL, NUMA_NO_NODE); spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock); =20 vcpu->arch.mmu =3D &vcpu->arch.root_mmu; @@ -6124,14 +6125,15 @@ int kvm_mmu_init_vm(struct kvm *kvm) node->track_flush_slot =3D kvm_mmu_invalidate_zap_pages_in_memslot; kvm_page_track_register_notifier(kvm, node); =20 - kvm->arch.split_page_header_cache.kmem_cache =3D mmu_page_header_cache; - kvm->arch.split_page_header_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_page_header_cache, + mmu_page_header_cache, NUMA_NO_NODE); =20 - kvm->arch.split_shadow_page_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_shadow_page_cache, + NULL, NUMA_NO_NODE); spin_lock_init(&kvm->arch.split_shadow_page_cache_lock); =20 - kvm->arch.split_desc_cache.kmem_cache =3D pte_list_desc_cache; - kvm->arch.split_desc_cache.gfp_zero =3D __GFP_ZERO; + INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_desc_cache, + pte_list_desc_cache, NUMA_NO_NODE); =20 return 0; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a262e15ebd19..719687a37ef7 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -2302,4 +2302,10 @@ static inline void kvm_account_pgtable_pages(void *v= irt, int nr) /* Max number of entries allowed for each kvm dirty ring */ #define KVM_DIRTY_RING_MAX_ENTRIES 65536 =20 +#define INIT_KVM_MMU_MEMORY_CACHE(_cache, _kmem_cache, _node) ({ \ + (_cache)->kmem_cache =3D _kmem_cache; \ + (_cache)->gfp_zero =3D __GFP_ZERO; \ + (_cache)->node =3D _node; \ +}) + #endif diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index 76de36e56cdf..9c70ce95e51f 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -97,6 +97,8 @@ struct kvm_mmu_memory_cache { struct kmem_cache *kmem_cache; int capacity; void **objects; + /* Node on which memory should be allocated by default */ + int node; }; #endif =20 --=20 2.39.0.314.g84b9a713c41-goog From nobody Wed Sep 17 05:56:26 2025 
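
For reference, the INIT_KVM_MMU_MEMORY_CACHE() initializer added in the patch above simply records the kmem_cache, the __GFP_ZERO flag and the node hint in one place; passing NUMA_NO_NODE preserves the current placement behaviour. A minimal sketch (the wrapper function is hypothetical):

static void init_shadow_page_cache(struct kvm_mmu_memory_cache *cache, int nid)
{
    /*
     * Expands to: cache->kmem_cache = NULL; cache->gfp_zero = __GFP_ZERO;
     * cache->node = nid.  Passing NUMA_NO_NODE keeps today's behaviour;
     * a real node id asks the cache to allocate its objects from that node.
     */
    INIT_KVM_MMU_MEMORY_CACHE(cache, NULL, nid);
}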
From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Date: Wed, 21 Dec 2022 18:34:55 -0800
Message-ID: <20221222023457.1764-8-vipinsh@google.com>
In-Reply-To: <20221222023457.1764-1-vipinsh@google.com>
Subject: [Patch v3 7/9] KVM: x86/mmu: Allocate page table's pages on NUMA node of the underlying pages

Page table pages of a VM are currently allocated based on the current task's NUMA node or its mempolicy.
This can cause suboptimal remote accesses by the vCPU if it is accessing physical pages local to its NUMA node but the page table pages mapping those physical pages were created by some other vCPU which was on a different NUMA node or had a different memory policy. Allocate page table pages on the same NUMA node where the underlying physical page exists. Page tables at levels 5, 4, and 3 might not end up on the same NUMA node as the memory they map, since they can span multiple NUMA nodes. Signed-off-by: Vipin Sharma --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/mmu/mmu.c | 63 ++++++++++++++++++++++----------- arch/x86/kvm/mmu/paging_tmpl.h | 4 +-- arch/x86/kvm/mmu/tdp_mmu.c | 11 +++--- virt/kvm/kvm_main.c | 2 +- 5 files changed, 53 insertions(+), 29 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index 293994fabae3..b1f319ad6f89 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -782,7 +782,7 @@ struct kvm_vcpu_arch { struct kvm_mmu *walk_mmu; =20 struct kvm_mmu_memory_cache mmu_pte_list_desc_cache; - struct kvm_mmu_memory_cache mmu_shadow_page_cache; + struct kvm_mmu_memory_cache mmu_shadow_page_cache[MAX_NUMNODES]; struct kvm_mmu_memory_cache mmu_shadowed_info_cache; struct kvm_mmu_memory_cache mmu_page_header_cache; =20 diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 23a3b82b2384..511c6ef265ee 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -677,24 +677,29 @@ static int mmu_topup_sp_memory_cache(struct kvm_mmu_m= emory_cache *cache, =20 static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indir= ect) { - int r; + int r, nid; =20 /* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */ r =3D kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache, 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM); if (r) return r; - r =3D mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache, - &vcpu->arch.mmu_shadow_page_cache_lock, - PT64_ROOT_MAX_LEVEL); - if (r) - return r; + + for_each_online_node(nid) { + r =3D mmu_topup_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache[nid], + &vcpu->arch.mmu_shadow_page_cache_lock, + PT64_ROOT_MAX_LEVEL); + if (r) + return r; + } + if (maybe_indirect) { r =3D kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadowed_info_cache, PT64_ROOT_MAX_LEVEL); if (r) return r; } + return kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, PT64_ROOT_MAX_LEVEL); } @@ -715,9 +720,14 @@ static void mmu_free_sp_memory_cache(struct kvm_mmu_me= mory_cache *cache, =20 static void mmu_free_memory_caches(struct kvm_vcpu *vcpu) { + int nid; + kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache); - mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache, - &vcpu->arch.mmu_shadow_page_cache_lock); + + for_each_node(nid) + mmu_free_sp_memory_cache(&vcpu->arch.mmu_shadow_page_cache[nid], + &vcpu->arch.mmu_shadow_page_cache_lock); + kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadowed_info_cache); kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache); } @@ -2256,11 +2266,12 @@ static struct kvm_mmu_page *__kvm_mmu_get_shadow_pa= ge(struct kvm *kvm, =20 static struct kvm_mmu_page *kvm_mmu_get_shadow_page(struct kvm_vcpu *vcpu, gfn_t gfn, - union kvm_mmu_page_role role) + union kvm_mmu_page_role role, + int nid) { struct shadow_page_caches caches =3D { .page_header_cache =3D &vcpu->arch.mmu_page_header_cache, - .shadow_page_cache =3D &vcpu->arch.mmu_shadow_page_cache, + .shadow_page_cache =3D &vcpu->arch.mmu_shadow_page_cache[nid], .shadowed_info_cache =3D
&vcpu->arch.mmu_shadowed_info_cache, .shadow_page_cache_lock =3D &vcpu->arch.mmu_shadow_page_cache_lock }; @@ -2316,15 +2327,19 @@ static union kvm_mmu_page_role kvm_mmu_child_role(u= 64 *sptep, bool direct, =20 static struct kvm_mmu_page *kvm_mmu_get_child_sp(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn, - bool direct, unsigned int access) + bool direct, unsigned int access, + kvm_pfn_t pfn) { union kvm_mmu_page_role role; + int nid; =20 if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) return ERR_PTR(-EEXIST); =20 role =3D kvm_mmu_child_role(sptep, direct, access); - return kvm_mmu_get_shadow_page(vcpu, gfn, role); + nid =3D kvm_pfn_to_page_table_nid(pfn); + + return kvm_mmu_get_shadow_page(vcpu, gfn, role, nid); } =20 static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *i= terator, @@ -3208,7 +3223,8 @@ static int direct_map(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault) if (it.level =3D=3D fault->goal_level) break; =20 - sp =3D kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, ACC_ALL); + sp =3D kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, true, + ACC_ALL, fault->pfn); if (sp =3D=3D ERR_PTR(-EEXIST)) continue; =20 @@ -3636,7 +3652,7 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gf= n_t gfn, int quadrant, WARN_ON_ONCE(quadrant && !role.has_4_byte_gpte); WARN_ON_ONCE(role.direct && role.has_4_byte_gpte); =20 - sp =3D kvm_mmu_get_shadow_page(vcpu, gfn, role); + sp =3D kvm_mmu_get_shadow_page(vcpu, gfn, role, numa_mem_id()); ++sp->root_count; =20 return __pa(sp->spt); @@ -5952,7 +5968,7 @@ static int __kvm_mmu_create(struct kvm_vcpu *vcpu, st= ruct kvm_mmu *mmu) =20 int kvm_mmu_create(struct kvm_vcpu *vcpu) { - int ret; + int ret, nid; =20 INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_pte_list_desc_cache, pte_list_desc_cache, NUMA_NO_NODE); @@ -5960,8 +5976,9 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_page_header_cache, mmu_page_header_cache, NUMA_NO_NODE); =20 - INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_shadow_page_cache, - NULL, NUMA_NO_NODE); + for_each_node(nid) + INIT_KVM_MMU_MEMORY_CACHE(&vcpu->arch.mmu_shadow_page_cache[nid], + NULL, nid); spin_lock_init(&vcpu->arch.mmu_shadow_page_cache_lock); =20 vcpu->arch.mmu =3D &vcpu->arch.root_mmu; @@ -6692,13 +6709,17 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm,= u64 gen) } =20 static unsigned long mmu_shrink_cache(struct kvm_mmu_memory_cache *cache, + int cache_count, spinlock_t *cache_lock) { unsigned long freed =3D 0; + int nid; =20 spin_lock(cache_lock); - if (cache->nobjs) - freed =3D kvm_mmu_empty_memory_cache(cache); + for (nid =3D 0; nid < cache_count; nid++) { + if (node_online(nid) && cache[nid].nobjs) + freed +=3D kvm_mmu_empty_memory_cache(&cache[nid]); + } spin_unlock(cache_lock); return freed; } @@ -6721,13 +6742,15 @@ mmu_shrink_scan(struct shrinker *shrink, struct shr= ink_control *sc) list_move_tail(&kvm->vm_list, &vm_list); =20 freed +=3D mmu_shrink_cache(&kvm->arch.split_shadow_page_cache, + 1, &kvm->arch.split_shadow_page_cache_lock); =20 if (freed >=3D sc->nr_to_scan) break; =20 kvm_for_each_vcpu(i, vcpu, kvm) { - freed +=3D mmu_shrink_cache(&vcpu->arch.mmu_shadow_page_cache, + freed +=3D mmu_shrink_cache(vcpu->arch.mmu_shadow_page_cache, + MAX_NUMNODES, &vcpu->arch.mmu_shadow_page_cache_lock); } =20 diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index e5662dbd519c..1ceca62ec4cf 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -652,7 +652,7 @@ static int 
FNAME(fetch)(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault, table_gfn =3D gw->table_gfn[it.level - 2]; access =3D gw->pt_access[it.level - 2]; sp =3D kvm_mmu_get_child_sp(vcpu, it.sptep, table_gfn, - false, access); + false, access, fault->pfn); =20 if (sp !=3D ERR_PTR(-EEXIST)) { /* @@ -708,7 +708,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault, validate_direct_spte(vcpu, it.sptep, direct_access); =20 sp =3D kvm_mmu_get_child_sp(vcpu, it.sptep, base_gfn, - true, direct_access); + true, direct_access, fault->pfn); if (sp =3D=3D ERR_PTR(-EEXIST)) continue; =20 diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 376b8dceb3f9..b5abae2366dd 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -259,12 +259,12 @@ static struct kvm_mmu_page *tdp_mmu_next_root(struct = kvm *kvm, kvm_mmu_page_as_id(_root) !=3D _as_id) { \ } else =20 -static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu) +static struct kvm_mmu_page *tdp_mmu_alloc_sp(struct kvm_vcpu *vcpu, int ni= d) { struct kvm_mmu_page *sp; =20 sp =3D kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache); - sp->spt =3D kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cac= he, + sp->spt =3D kvm_mmu_sp_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cac= he[nid], &vcpu->arch.mmu_shadow_page_cache_lock); =20 return sp; @@ -317,7 +317,7 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vc= pu) goto out; } =20 - root =3D tdp_mmu_alloc_sp(vcpu); + root =3D tdp_mmu_alloc_sp(vcpu, numa_mem_id()); tdp_mmu_init_sp(root, NULL, 0, role); =20 refcount_set(&root->tdp_mmu_root_count, 1); @@ -1149,7 +1149,7 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm= _page_fault *fault) struct kvm *kvm =3D vcpu->kvm; struct tdp_iter iter; struct kvm_mmu_page *sp; - int ret =3D RET_PF_RETRY; + int ret =3D RET_PF_RETRY, nid; =20 kvm_mmu_hugepage_adjust(vcpu, fault); =20 @@ -1178,11 +1178,12 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct k= vm_page_fault *fault) !is_large_pte(iter.old_spte)) continue; =20 + nid =3D kvm_pfn_to_page_table_nid(fault->pfn); /* * The SPTE is either non-present or points to a huge page that * needs to be split. 
*/ - sp =3D tdp_mmu_alloc_sp(vcpu); + sp =3D tdp_mmu_alloc_sp(vcpu, nid); tdp_mmu_init_child_sp(sp, &iter); =20 sp->nx_huge_page_disallowed =3D fault->huge_page_disallowed; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index d96c8146e9ba..4f3db7ffeba8 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -415,7 +415,7 @@ static inline void *mmu_memory_cache_alloc_obj(struct k= vm_mmu_memory_cache *mc, if (mc->kmem_cache) return kmem_cache_alloc(mc->kmem_cache, gfp_flags); else - return (void *)__get_free_page(gfp_flags); + return kvm_mmu_get_free_page(mc->node, gfp_flags); } =20 int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capa= city, int min) --=20 2.39.0.314.g84b9a713c41-goog From nobody Wed Sep 17 05:56:26 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id B757BC4332F for ; Thu, 22 Dec 2022 02:36:02 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235034AbiLVCgB (ORCPT ); Wed, 21 Dec 2022 21:36:01 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:46676 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S234980AbiLVCfh (ORCPT ); Wed, 21 Dec 2022 21:35:37 -0500 Received: from mail-pj1-x104a.google.com (mail-pj1-x104a.google.com [IPv6:2607:f8b0:4864:20::104a]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 7A1EE264A0 for ; Wed, 21 Dec 2022 18:35:14 -0800 (PST) Received: by mail-pj1-x104a.google.com with SMTP id pi14-20020a17090b1e4e00b0021d20da7a51so2300625pjb.2 for ; Wed, 21 Dec 2022 18:35:14 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=OqpNn9k+U+hG3YogdgDR/fOEr/aI6IdVDrzcvJRoBmQ=; b=LZCil3LEBuiqPglCaABHVnyyRktcG83zDWgYL8mZt4giWucmb8FzbuHkl/qRDPEfTL sho0PmZ83Kz1PVrv7LxQITDU73TzGBV57cqQUyLCGPrx9sfwBeRiioD2DlM4/tip3fqi o4IJFivw5AtKaKsdUfv+TfoKhPRPMShpd4EbkgwFft29q/MNo2D6I6u6PD7ToZ52tz4W VKhny6D60ScNULutdaRX1NMuignZ1Y+Q42SR/GGwjeDOYE8dXqUceH2wQdl+BQh9qtF5 92z74m16CNnw3BzXc5KaYuQ6cZkc4YU7xlvloYt7f+L5x8vmyokePQB0TBCZVG3vF0Yg INTQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=OqpNn9k+U+hG3YogdgDR/fOEr/aI6IdVDrzcvJRoBmQ=; b=b8XX7b2uZvcozgBj6OGMmzGTiexqi6XAxEjJK32lZe8NnldgDXdjl4XH5ZOZawHkGd F0J/gTgeTyI1tXKEjf9rHcmwSz64smtDZub1cn7hqo9dO2vk5e1lBeaa7NcFtB5qGDPw 6h/0TTlbC8Ijl9Mfz9XT6zlxTWrO1IPWhqzhvbwD9RY7M8W8aqi22eAYEQ29DzYW0QeN UrMbWwTfbS6zrDHnoKSQQnxtU8UyAv3RuwbTUuudxLU8mKDELLj6VhNii5Mr+98BY2hT aSKgxM3XH+dOB8D37oy9vH2lhofPRra+8/dRqvPcWP2ON8q3Y0pvTW1aWLu44qhBGJt3 fVgg== X-Gm-Message-State: AFqh2krDNTACpeR2KkYpbSGtJ926b9tzIXp89y9NoAO33SMje5ruzHKP 0cveuLDYPQsiR4LWM6jjJWCcRTIH+epO X-Google-Smtp-Source: AMrXdXsdS1uAeaLst9k9vclVCCOLWIbxK9Lx9IWWeNQ8BLMOiOd/S8VYUSYeV2+muAopSD1+l7BaJNjs32ck X-Received: from vipin.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:479f]) (user=vipinsh job=sendgmr) by 2002:a17:902:7d92:b0:18e:bd50:f19a with SMTP id a18-20020a1709027d9200b0018ebd50f19amr219700plm.81.1671676513459; Wed, 21 Dec 2022 18:35:13 -0800 (PST) Date: Wed, 21 Dec 2022 18:34:56 -0800 In-Reply-To: <20221222023457.1764-1-vipinsh@google.com> 
Mime-Version: 1.0 References: <20221222023457.1764-1-vipinsh@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20221222023457.1764-9-vipinsh@google.com> Subject: [Patch v3 8/9] KVM: x86/mmu: Make split_shadow_page_cache NUMA aware From: Vipin Sharma To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" Make split_shadow_page_cache NUMA aware and allocate page table pages during the split based on the underlying physical page's NUMA node. Signed-off-by: Vipin Sharma --- arch/x86/include/asm/kvm_host.h | 2 +- arch/x86/kvm/mmu/mmu.c | 50 ++++++++++++++++++--------------- 2 files changed, 29 insertions(+), 23 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_hos= t.h index b1f319ad6f89..7b3f36ae37a4 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1410,7 +1410,7 @@ struct kvm_arch { * * Protected by kvm->slots_lock. */ - struct kvm_mmu_memory_cache split_shadow_page_cache; + struct kvm_mmu_memory_cache split_shadow_page_cache[MAX_NUMNODES]; struct kvm_mmu_memory_cache split_page_header_cache; =20 /* diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 511c6ef265ee..7454bfc49a51 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -6126,7 +6126,7 @@ static void kvm_mmu_invalidate_zap_pages_in_memslot(s= truct kvm *kvm, int kvm_mmu_init_vm(struct kvm *kvm) { struct kvm_page_track_notifier_node *node =3D &kvm->arch.mmu_sp_tracker; - int r; + int r, nid; =20 INIT_LIST_HEAD(&kvm->arch.active_mmu_pages); INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages); @@ -6145,8 +6145,9 @@ int kvm_mmu_init_vm(struct kvm *kvm) INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_page_header_cache, mmu_page_header_cache, NUMA_NO_NODE); =20 - INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_shadow_page_cache, - NULL, NUMA_NO_NODE); + for_each_node(nid) + INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_shadow_page_cache[nid], + NULL, nid); spin_lock_init(&kvm->arch.split_shadow_page_cache_lock); =20 INIT_KVM_MMU_MEMORY_CACHE(&kvm->arch.split_desc_cache, @@ -6157,10 +6158,13 @@ int kvm_mmu_init_vm(struct kvm *kvm) =20 static void mmu_free_vm_memory_caches(struct kvm *kvm) { + int nid; + kvm_mmu_free_memory_cache(&kvm->arch.split_desc_cache); kvm_mmu_free_memory_cache(&kvm->arch.split_page_header_cache); - mmu_free_sp_memory_cache(&kvm->arch.split_shadow_page_cache, - &kvm->arch.split_shadow_page_cache_lock); + for_each_node(nid) + mmu_free_sp_memory_cache(&kvm->arch.split_shadow_page_cache[nid], + &kvm->arch.split_shadow_page_cache_lock); } =20 void kvm_mmu_uninit_vm(struct kvm *kvm) @@ -6269,7 +6273,7 @@ static inline bool need_topup(struct kvm_mmu_memory_c= ache *cache, int min) return kvm_mmu_memory_cache_nr_free_objects(cache) < min; } =20 -static bool need_topup_split_caches_or_resched(struct kvm *kvm) +static bool need_topup_split_caches_or_resched(struct kvm *kvm, int nid) { if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) return true; @@ -6281,10 +6285,10 @@ static bool need_topup_split_caches_or_resched(stru= ct kvm *kvm) */ return need_topup(&kvm->arch.split_desc_cache, SPLIT_DESC_CACHE_MIN_NR_OB= JECTS) || need_topup(&kvm->arch.split_page_header_cache, 1) || - need_topup(&kvm->arch.split_shadow_page_cache, 1); +
need_topup(&kvm->arch.split_shadow_page_cache[nid], 1); } =20 -static int topup_split_caches(struct kvm *kvm) +static int topup_split_caches(struct kvm *kvm, int nid) { /* * Allocating rmap list entries when splitting huge pages for nested @@ -6314,18 +6318,21 @@ static int topup_split_caches(struct kvm *kvm) if (r) return r; =20 - return mmu_topup_sp_memory_cache(&kvm->arch.split_shadow_page_cache, + return mmu_topup_sp_memory_cache(&kvm->arch.split_shadow_page_cache[nid], &kvm->arch.split_shadow_page_cache_lock, 1); } =20 -static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, u= 64 *huge_sptep) +static struct kvm_mmu_page *shadow_mmu_get_sp_for_split(struct kvm *kvm, + u64 *huge_sptep, + u64 huge_spte) { struct kvm_mmu_page *huge_sp =3D sptep_to_sp(huge_sptep); struct shadow_page_caches caches =3D {}; union kvm_mmu_page_role role; unsigned int access; gfn_t gfn; + int nid; =20 gfn =3D kvm_mmu_page_get_gfn(huge_sp, spte_index(huge_sptep)); access =3D kvm_mmu_page_get_access(huge_sp, spte_index(huge_sptep)); @@ -6338,9 +6345,11 @@ static struct kvm_mmu_page *shadow_mmu_get_sp_for_sp= lit(struct kvm *kvm, u64 *hu */ role =3D kvm_mmu_child_role(huge_sptep, /*direct=3D*/true, access); =20 + nid =3D kvm_pfn_to_page_table_nid(spte_to_pfn(huge_spte)); + /* Direct SPs do not require a shadowed_info_cache. */ caches.page_header_cache =3D &kvm->arch.split_page_header_cache; - caches.shadow_page_cache =3D &kvm->arch.split_shadow_page_cache; + caches.shadow_page_cache =3D &kvm->arch.split_shadow_page_cache[nid]; caches.shadow_page_cache_lock =3D &kvm->arch.split_shadow_page_cache_lock; =20 /* Safe to pass NULL for vCPU since requesting a direct SP. */ @@ -6360,7 +6369,7 @@ static void shadow_mmu_split_huge_page(struct kvm *kv= m, gfn_t gfn; int index; =20 - sp =3D shadow_mmu_get_sp_for_split(kvm, huge_sptep); + sp =3D shadow_mmu_get_sp_for_split(kvm, huge_sptep, huge_spte); =20 for (index =3D 0; index < SPTE_ENT_PER_PAGE; index++) { sptep =3D &sp->spt[index]; @@ -6398,7 +6407,7 @@ static int shadow_mmu_try_split_huge_page(struct kvm = *kvm, u64 *huge_sptep) { struct kvm_mmu_page *huge_sp =3D sptep_to_sp(huge_sptep); - int level, r =3D 0; + int level, r =3D 0, nid; gfn_t gfn; u64 spte; =20 @@ -6406,13 +6415,14 @@ static int shadow_mmu_try_split_huge_page(struct kv= m *kvm, gfn =3D kvm_mmu_page_get_gfn(huge_sp, spte_index(huge_sptep)); level =3D huge_sp->role.level; spte =3D *huge_sptep; + nid =3D kvm_pfn_to_page_table_nid(spte_to_pfn(spte)); =20 if (kvm_mmu_available_pages(kvm) <=3D KVM_MIN_FREE_MMU_PAGES) { r =3D -ENOSPC; goto out; } =20 - if (need_topup_split_caches_or_resched(kvm)) { + if (need_topup_split_caches_or_resched(kvm, nid)) { write_unlock(&kvm->mmu_lock); cond_resched(); /* @@ -6420,7 +6430,7 @@ static int shadow_mmu_try_split_huge_page(struct kvm = *kvm, * rmap iterator should be restarted because the MMU lock was * dropped. 
*/ - r =3D topup_split_caches(kvm) ?: -EAGAIN; + r =3D topup_split_caches(kvm, nid) ?: -EAGAIN; write_lock(&kvm->mmu_lock); goto out; } @@ -6709,17 +6719,15 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm,= u64 gen) } =20 static unsigned long mmu_shrink_cache(struct kvm_mmu_memory_cache *cache, - int cache_count, spinlock_t *cache_lock) { unsigned long freed =3D 0; int nid; =20 spin_lock(cache_lock); - for (nid =3D 0; nid < cache_count; nid++) { - if (node_online(nid) && cache[nid].nobjs) + for_each_online_node(nid) + if (cache[nid].nobjs) freed +=3D kvm_mmu_empty_memory_cache(&cache[nid]); - } spin_unlock(cache_lock); return freed; } @@ -6741,8 +6749,7 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrin= k_control *sc) first_kvm =3D kvm; list_move_tail(&kvm->vm_list, &vm_list); =20 - freed +=3D mmu_shrink_cache(&kvm->arch.split_shadow_page_cache, - 1, + freed +=3D mmu_shrink_cache(kvm->arch.split_shadow_page_cache, &kvm->arch.split_shadow_page_cache_lock); =20 if (freed >=3D sc->nr_to_scan) @@ -6750,7 +6757,6 @@ mmu_shrink_scan(struct shrinker *shrink, struct shrin= k_control *sc) =20 kvm_for_each_vcpu(i, vcpu, kvm) { freed +=3D mmu_shrink_cache(vcpu->arch.mmu_shadow_page_cache, - MAX_NUMNODES, &vcpu->arch.mmu_shadow_page_cache_lock); } =20 --=20 2.39.0.314.g84b9a713c41-goog From nobody Wed Sep 17 05:56:26 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2CFBDC4332F for ; Thu, 22 Dec 2022 02:36:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S235045AbiLVCgE (ORCPT ); Wed, 21 Dec 2022 21:36:04 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:47288 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235046AbiLVCfj (ORCPT ); Wed, 21 Dec 2022 21:35:39 -0500 Received: from mail-pj1-x1049.google.com (mail-pj1-x1049.google.com [IPv6:2607:f8b0:4864:20::1049]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 58B6D26AC6 for ; Wed, 21 Dec 2022 18:35:16 -0800 (PST) Received: by mail-pj1-x1049.google.com with SMTP id pa16-20020a17090b265000b0020a71040b4cso300727pjb.6 for ; Wed, 21 Dec 2022 18:35:16 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=b9yKHiaQzq9mZ3IJWGzrPWbD3+83EkY9xWMVBG05dNQ=; b=XlyER0aXwnjO8q6dTDihNfSIfjWn8dSVJ8IiBcSHE7yuANREvStJV3g9XDl8A/5Vwr OaNHoUYM9Z3wM78fZ/XqZqc2ZUebk2Up0L9xXgXL4U3gFGGz6w6NBfJtgfGPwYUC+B8s lG8Y2mimnBPeTLnpwg2Gmem25P6HBJfKl/2T+uaujz8TJVx2GozbpVFZB4OD+HpbyDdX FHdfVOyX6+BPp8Ht/TdNLUscgGNFc8b5S04IVXGZFWLXt3JasjvhdrpNTDIHgFTTxdhl C2eu1PX9i9Z3SW518upry6mS+ShsSl8pxZNEoSPq4X6DQp773Xn1V/u5SNKVXrbP8X0c vtUQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=b9yKHiaQzq9mZ3IJWGzrPWbD3+83EkY9xWMVBG05dNQ=; b=x9xuSQJNwgHBxcU5HYm2Dy5FNIj96/5HorcahakgG3P9Ku2yRVrqXRMBE5cFuwD7tN vvBnWRbdpWOjoyCXYwl/wWpdAKR3loYWE0L8OydOrCl3gZw7rvdxANftHOU4tm/s3CM/ xfWqYJmpBBJ9q2PlABo4NZElzgzYHzOC4sXMVxapHDQSVR1T2Sr2D6sEgR4o/TM6ivtX mxGwfA4Rt3tWhmcFJ7uvNd0Nbg3ZdIu1OOJii0o/H6++Z3B+4Dh5/HUTllFjCa0So4R5 2JjHzGjMR0CyT3cApAjCOdPOrBx5jLo6vIN2ItEf8xNAxREP/GzQOD77PrPXSFz7N4Yq 
xfFQ== X-Gm-Message-State: AFqh2kqjAGj+Z8tqtrPg/0TeX8sz8HZxGk4Jl6PmRUF7G/wTr7EePQ95 6vQwNnF30BcwPCYhB1cg+WOhxEvCn6XP X-Google-Smtp-Source: AMrXdXuzNj9EuXh7z055yKJ7hUYnL0IVln/7MJaAb+bM/jJJ7BibhMJSEr7enuNFlr9FF9cWepYFeciD79VX X-Received: from vipin.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:479f]) (user=vipinsh job=sendgmr) by 2002:a17:90b:701:b0:219:1d0a:34a6 with SMTP id s1-20020a17090b070100b002191d0a34a6mr128835pjz.1.1671676515179; Wed, 21 Dec 2022 18:35:15 -0800 (PST) Date: Wed, 21 Dec 2022 18:34:57 -0800 In-Reply-To: <20221222023457.1764-1-vipinsh@google.com> Mime-Version: 1.0 References: <20221222023457.1764-1-vipinsh@google.com> X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog Message-ID: <20221222023457.1764-10-vipinsh@google.com> Subject: [Patch v3 9/9] KVM: x86/mmu: Reduce default cache size in KVM from 40 to PT64_ROOT_MAX_LEVEL From: Vipin Sharma To: seanjc@google.com, pbonzini@redhat.com, bgardon@google.com, dmatlack@google.com Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE is set to 40 without any specific reason. Reduce default size to PT64_ROOT_MAX_LEVEL, which is currently 5. Change mmu_pte_list_desc_cache size to what is needed as it is more than 5 but way less than 40. Tested by running dirty_log_perf_test on both tdp and shadow MMU with 48 vcpu and 2GB/vcpu size on a 2 NUMA node machine. No impact on performance noticed. Ran perf on dirty_log_perf_test and found kvm_mmu_get_free_page() calls reduced by ~3300 which is near to 48 (vcpus) * 2 (nodes) * 35 (cache size). Signed-off-by: Vipin Sharma --- arch/x86/include/asm/kvm_types.h | 2 +- arch/x86/kvm/mmu/mmu.c | 7 ++++--- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/arch/x86/include/asm/kvm_types.h b/arch/x86/include/asm/kvm_ty= pes.h index 08f1b57d3b62..752dab218a62 100644 --- a/arch/x86/include/asm/kvm_types.h +++ b/arch/x86/include/asm/kvm_types.h @@ -2,6 +2,6 @@ #ifndef _ASM_X86_KVM_TYPES_H #define _ASM_X86_KVM_TYPES_H =20 -#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40 +#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE PT64_ROOT_MAX_LEVEL =20 #endif /* _ASM_X86_KVM_TYPES_H */ diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 7454bfc49a51..f89d933ff380 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -677,11 +677,12 @@ static int mmu_topup_sp_memory_cache(struct kvm_mmu_m= emory_cache *cache, =20 static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indir= ect) { - int r, nid; + int r, nid, desc_capacity; =20 /* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */ - r =3D kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache, - 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM); + desc_capacity =3D 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM; + r =3D __kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache, + desc_capacity, desc_capacity); if (r) return r; =20 --=20 2.39.0.314.g84b9a713c41-goog
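
The per-node plumbing in patches 7-9 hinges on mapping the faulting pfn to a NUMA node and then picking that node's shadow page cache. The helper kvm_pfn_to_page_table_nid() used above is introduced in an earlier patch of the series and is not shown here; the sketch below is only an assumption about its likely shape, built on the generic pfn_to_page()/page_to_nid() pattern, and the sketch_* names are hypothetical rather than part of the series.

	/*
	 * Sketch only, not the series' implementation: map a guest pfn to the
	 * NUMA node backing it, falling back to the current node when the pfn
	 * has no struct page (e.g. MMIO or remapped memory).
	 */
	static int sketch_pfn_to_page_table_nid(kvm_pfn_t pfn)
	{
		if (!pfn_valid(pfn))
			return numa_mem_id();

		return page_to_nid(pfn_to_page(pfn));
	}

	/*
	 * Pick the vCPU's per-node shadow page cache for a fault on @pfn.
	 * mmu_topup_memory_caches() tops up the cache of every online node
	 * before the fault is handled, so this lookup should find
	 * pre-allocated pages.
	 */
	static struct kvm_mmu_memory_cache *
	sketch_pick_shadow_page_cache(struct kvm_vcpu *vcpu, kvm_pfn_t pfn)
	{
		int nid = sketch_pfn_to_page_table_nid(pfn);

		return &vcpu->arch.mmu_shadow_page_cache[nid];
	}

With this selection in place, kvm_mmu_get_child_sp() and kvm_tdp_mmu_map() derive the node from fault->pfn, while roots keep using numa_mem_id() since a root has no single backing physical page.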
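Patch 7 also reroutes the page allocation in mmu_memory_cache_alloc_obj() through kvm_mmu_get_free_page(mc->node, gfp_flags), whose definition likewise lands earlier in the series and is not visible above. A minimal sketch of what such a helper could look like, assuming it preserves the old __get_free_page() behaviour for NUMA_NO_NODE and otherwise allocates on the requested node (the sketch_ prefix again marks it as hypothetical):

	/*
	 * Sketch only: allocate one page for a page table, preferring @nid when
	 * a node is specified and falling back to the node-agnostic path for
	 * NUMA_NO_NODE.
	 */
	static void *sketch_kvm_mmu_get_free_page(int nid, gfp_t gfp)
	{
		struct page *page;

		if (nid == NUMA_NO_NODE)
			return (void *)__get_free_page(gfp);

		page = alloc_pages_node(nid, gfp, 0);
		if (!page)
			return NULL;

		return page_address(page);
	}

Because the caches are initialized with __GFP_ZERO via INIT_KVM_MMU_MEMORY_CACHE(), either branch should hand back zeroed pages, so the behaviour of the existing non-NUMA path is preserved.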