From nobody Fri Dec 19 19:17:08 2025
Date: Fri, 16 May 2025 14:54:20 -0700
In-Reply-To: <20250516215422.2550669-1-seanjc@google.com>
References: <20250516215422.2550669-1-seanjc@google.com>
Message-ID: <20250516215422.2550669-2-seanjc@google.com>
Subject: [PATCH v3 1/3] KVM: x86/mmu: Dynamically allocate shadow MMU's hashed page list
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Reply-To: Sean Christopherson
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Dynamically allocate the (massive) array of hashed lists used to track
shadow pages, as the array itself is 32KiB, i.e. is an order-3 allocation
all on its own, and is *exactly* an order-3 allocation.  Dynamically
allocating the array will allow allocating "struct kvm" using kvmalloc(),
and will also allow deferring allocation of the array until it's actually
needed, i.e. until the first shadow root is allocated.

Opportunistically use kvmalloc() for the hashed lists, as an order-3
allocation is (stating the obvious) less likely to fail than an order-4
allocation, and the overhead of vmalloc() is undesirable given that the
size of the allocation is fixed.
Cc: Vipin Sharma
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/mmu/mmu.c          | 23 ++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |  5 ++++-
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 330cdcbed1a6..9667d6b929ee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1343,7 +1343,7 @@ struct kvm_arch {
 	bool has_private_mem;
 	bool has_protected_state;
 	bool pre_fault_allowed;
-	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
+	struct hlist_head *mmu_page_hash;
 	struct list_head active_mmu_pages;
 	/*
 	 * A list of kvm_mmu_page structs that, if zapped, could possibly be
@@ -2006,7 +2006,7 @@ void kvm_mmu_vendor_module_exit(void);
 
 void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_mmu_create(struct kvm_vcpu *vcpu);
-void kvm_mmu_init_vm(struct kvm *kvm);
+int kvm_mmu_init_vm(struct kvm *kvm);
 void kvm_mmu_uninit_vm(struct kvm *kvm);
 
 void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cbc84c6abc2e..41da2cb1e3f1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3882,6 +3882,18 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	return r;
 }
 
+static int kvm_mmu_alloc_page_hash(struct kvm *kvm)
+{
+	typeof(kvm->arch.mmu_page_hash) h;
+
+	h = kvcalloc(KVM_NUM_MMU_PAGES, sizeof(*h), GFP_KERNEL_ACCOUNT);
+	if (!h)
+		return -ENOMEM;
+
+	kvm->arch.mmu_page_hash = h;
+	return 0;
+}
+
 static int mmu_first_shadow_root_alloc(struct kvm *kvm)
 {
 	struct kvm_memslots *slots;
@@ -6675,13 +6687,19 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 	kvm_tdp_mmu_zap_invalidated_roots(kvm, true);
 }
 
-void kvm_mmu_init_vm(struct kvm *kvm)
+int kvm_mmu_init_vm(struct kvm *kvm)
 {
+	int r;
+
 	kvm->arch.shadow_mmio_value = shadow_mmio_value;
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
 	INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages);
 	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
 
+	r = kvm_mmu_alloc_page_hash(kvm);
+	if (r)
+		return r;
+
 	if (tdp_mmu_enabled)
 		kvm_mmu_init_tdp_mmu(kvm);
 
@@ -6692,6 +6710,7 @@ void kvm_mmu_init_vm(struct kvm *kvm)
 
 	kvm->arch.split_desc_cache.kmem_cache = pte_list_desc_cache;
 	kvm->arch.split_desc_cache.gfp_zero = __GFP_ZERO;
+	return 0;
 }
 
 static void mmu_free_vm_memory_caches(struct kvm *kvm)
@@ -6703,6 +6722,8 @@ static void mmu_free_vm_memory_caches(struct kvm *kvm)
 
 void kvm_mmu_uninit_vm(struct kvm *kvm)
 {
+	kvfree(kvm->arch.mmu_page_hash);
+
 	if (tdp_mmu_enabled)
 		kvm_mmu_uninit_tdp_mmu(kvm);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f9f798f286ce..d204ba9368f8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12787,7 +12787,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	if (ret)
 		goto out;
 
-	kvm_mmu_init_vm(kvm);
+	ret = kvm_mmu_init_vm(kvm);
+	if (ret)
+		goto out_cleanup_page_track;
 
 	ret = kvm_x86_call(vm_init)(kvm);
 	if (ret)
@@ -12840,6 +12842,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 out_uninit_mmu:
 	kvm_mmu_uninit_vm(kvm);
+out_cleanup_page_track:
 	kvm_page_track_cleanup(kvm);
 out:
 	return ret;
-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Fri Dec 19 19:17:08 2025
Date: Fri, 16 May 2025 14:54:21 -0700
In-Reply-To: <20250516215422.2550669-1-seanjc@google.com>
References: <20250516215422.2550669-1-seanjc@google.com>
Message-ID: <20250516215422.2550669-3-seanjc@google.com>
Subject: [PATCH v3 2/3] KVM: x86: Use kvzalloc() to allocate VM struct
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Reply-To: Sean Christopherson
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Allocate VM structs via kvzalloc(), i.e.
try to use a contiguous physical
allocation before falling back to __vmalloc(), to avoid the overhead of
establishing the virtual mappings.  For non-debug builds, the SVM and VMX
(and TDX) structures are now just below 7000 bytes in the worst case
scenario (see below), i.e. are order-1 allocations, and will likely remain
that way for quite some time.

Add compile-time assertions in vendor code to ensure the size of the
structures, sans the memslots hash tables, are order-0 allocations, i.e.
are less than 4KiB.  There's nothing fundamentally wrong with a larger
kvm_{svm,vmx,tdx} size, but given that the size of the structure (without
the memslots hash tables) is below 2KiB after 18+ years of existence, more
than doubling the size would be quite notable.

Add sanity checks on the memslot hash table sizes, partly to ensure they
aren't resized without accounting for the impact on VM structure size, and
partly to document that the majority of the size of VM structures comes
from the memslots.

Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm/svm.c          |  2 ++
 arch/x86/kvm/vmx/main.c         |  2 ++
 arch/x86/kvm/vmx/vmx.c          |  2 ++
 arch/x86/kvm/x86.h              | 22 ++++++++++++++++++++++
 5 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9667d6b929ee..3a985825a945 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1961,7 +1961,7 @@ void kvm_x86_vendor_exit(void);
 #define __KVM_HAVE_ARCH_VM_ALLOC
 static inline struct kvm *kvm_arch_alloc_vm(void)
 {
-	return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	return kvzalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT);
 }
 
 #define __KVM_HAVE_ARCH_VM_FREE
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0ad1a6d4fb6d..d13e475c3407 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5675,6 +5675,8 @@ static int __init svm_init(void)
 {
 	int r;
 
+	KVM_SANITY_CHECK_VM_STRUCT_SIZE(kvm_svm);
+
 	__unused_size_checks();
 
 	if (!kvm_is_svm_supported())
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index d1e02e567b57..e18dfada2e90 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -64,6 +64,8 @@ static __init int vt_hardware_setup(void)
 		vt_x86_ops.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt;
 	}
 
+	KVM_SANITY_CHECK_VM_STRUCT_SIZE(kvm_tdx);
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9ff00ae9f05a..ef58b727d6c8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8643,6 +8643,8 @@ int __init vmx_init(void)
 {
 	int r, cpu;
 
+	KVM_SANITY_CHECK_VM_STRUCT_SIZE(kvm_vmx);
+
 	if (!kvm_is_vmx_supported())
 		return -EOPNOTSUPP;
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 832f0faf4779..0f3046cccb79 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -55,6 +55,28 @@ struct kvm_host_values {
 
 void kvm_spurious_fault(void);
 
+#define SIZE_OF_MEMSLOTS_HASHTABLE \
+	(sizeof(((struct kvm_memslots *)0)->id_hash) * 2 * KVM_MAX_NR_ADDRESS_SPACES)
+
+/* Sanity check the size of the memslot hash tables. */
+static_assert(SIZE_OF_MEMSLOTS_HASHTABLE ==
+	      (1024 * (1 + IS_ENABLED(CONFIG_X86_64)) * (1 + IS_ENABLED(CONFIG_KVM_SMM))));
+
+/*
+ * Assert that "struct kvm_{svm,vmx,tdx}" is an order-0 or order-1 allocation.
+ * Spilling over to an order-2 allocation isn't fundamentally problematic, but
+ * isn't expected to happen in the foreseeable future (O(years)).  Assert that
+ * the size is an order-0 allocation when ignoring the memslot hash tables, to
+ * help detect and debug unexpected size increases.
+ */
+#define KVM_SANITY_CHECK_VM_STRUCT_SIZE(x)					\
+do {										\
+	BUILD_BUG_ON(get_order(sizeof(struct x) - SIZE_OF_MEMSLOTS_HASHTABLE) && \
+		     !IS_ENABLED(CONFIG_DEBUG_KERNEL) && !IS_ENABLED(CONFIG_KASAN)); \
+	BUILD_BUG_ON(get_order(sizeof(struct x)) > 1 &&				\
+		     !IS_ENABLED(CONFIG_DEBUG_KERNEL) && !IS_ENABLED(CONFIG_KASAN)); \
+} while (0)
+
 #define KVM_NESTED_VMENTER_CONSISTENCY_CHECK(consistency_check)	\
 ({									\
 	bool failed = (consistency_check);				\
-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Fri Dec 19 19:17:08 2025
Date: Fri, 16 May 2025 14:54:22 -0700
In-Reply-To: <20250516215422.2550669-1-seanjc@google.com>
References: <20250516215422.2550669-1-seanjc@google.com>
Message-ID: <20250516215422.2550669-4-seanjc@google.com>
Subject: [PATCH v3 3/3] KVM: x86/mmu: Defer allocation of shadow MMU's hashed page list
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Reply-To: Sean Christopherson
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

When the TDP MMU is enabled, i.e. when the shadow MMU isn't used until a
nested TDP VM is run, defer allocation of the array of hashed lists used
to track shadow MMU pages until the first shadow root is allocated.

Setting the list outside of mmu_lock is safe, as concurrent readers must
hold mmu_lock in some capacity, shadow pages can only be added (or
removed) from the list when mmu_lock is held for write, and tasks that
are creating a shadow root are serialized by slots_arch_lock.  I.e. it's
impossible for the list to become non-empty until all readers go away,
and so readers are guaranteed to see an empty list even if they make
multiple calls to kvm_get_mmu_page_hash() in a single mmu_lock critical
section.
Use {WRITE/READ}_ONCE to set/get the list when mmu_lock isn't held for
write, out of an abundance of paranoia; no sane compiler should tear the
store or load, but it's technically possible.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c | 60 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 50 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 41da2cb1e3f1..edb4ecff9917 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1983,14 +1983,33 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp)
 	return true;
 }
 
+static __ro_after_init HLIST_HEAD(empty_page_hash);
+
+static struct hlist_head *kvm_get_mmu_page_hash(struct kvm *kvm, gfn_t gfn)
+{
+	/*
+	 * Load mmu_page_hash from memory exactly once, as it's set at runtime
+	 * outside of mmu_lock when the TDP MMU is enabled, i.e. when the hash
+	 * table of shadow pages isn't needed unless KVM needs to shadow L1's
+	 * TDP for an L2 guest.
+	 */
+	struct hlist_head *page_hash = READ_ONCE(kvm->arch.mmu_page_hash);
+
+	lockdep_assert_held(&kvm->mmu_lock);
+
+	if (!page_hash)
+		return &empty_page_hash;
+
+	return &page_hash[kvm_page_table_hashfn(gfn)];
+}
+
 #define for_each_valid_sp(_kvm, _sp, _list)				\
 	hlist_for_each_entry(_sp, _list, hash_link)			\
 		if (is_obsolete_sp((_kvm), (_sp))) {			\
 		} else
 
 #define for_each_gfn_valid_sp_with_gptes(_kvm, _sp, _gfn)		\
-	for_each_valid_sp(_kvm, _sp,					\
-	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)])	\
+	for_each_valid_sp(_kvm, _sp, kvm_get_mmu_page_hash(_kvm, _gfn))	\
 		if ((_sp)->gfn != (_gfn) || !sp_has_gptes(_sp)) {} else
 
 static bool kvm_sync_page_check(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
@@ -2358,6 +2377,12 @@ static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,
 	struct kvm_mmu_page *sp;
 	bool created = false;
 
+	/*
+	 * No need for READ_ONCE(), unlike in kvm_get_mmu_page_hash(), because
+	 * mmu_page_hash must be set prior to creating the first shadow root,
+	 * i.e. reaching this point is fully serialized by slots_arch_lock.
+	 */
+	BUG_ON(!kvm->arch.mmu_page_hash);
 	sp_list = &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
 
 	sp = kvm_mmu_find_shadow_page(kvm, vcpu, gfn, sp_list, role);
@@ -3886,11 +3911,21 @@ static int kvm_mmu_alloc_page_hash(struct kvm *kvm)
 {
 	typeof(kvm->arch.mmu_page_hash) h;
 
+	if (kvm->arch.mmu_page_hash)
+		return 0;
+
 	h = kvcalloc(KVM_NUM_MMU_PAGES, sizeof(*h), GFP_KERNEL_ACCOUNT);
 	if (!h)
 		return -ENOMEM;
 
-	kvm->arch.mmu_page_hash = h;
+	/*
+	 * Write mmu_page_hash exactly once as there may be concurrent readers,
+	 * e.g. to check for shadowed PTEs in mmu_try_to_unsync_pages().  Note,
+	 * mmu_lock must be held for write to add (or remove) shadow pages, and
+	 * so readers are guaranteed to see an empty list for their current
+	 * mmu_lock critical section.
+	 */
+	WRITE_ONCE(kvm->arch.mmu_page_hash, h);
 	return 0;
 }
 
@@ -3913,9 +3948,13 @@ static int mmu_first_shadow_root_alloc(struct kvm *kvm)
 	if (kvm_shadow_root_allocated(kvm))
 		goto out_unlock;
 
+	r = kvm_mmu_alloc_page_hash(kvm);
+	if (r)
+		goto out_unlock;
+
 	/*
-	 * Check if anything actually needs to be allocated, e.g. all metadata
-	 * will be allocated upfront if TDP is disabled.
+	 * Check if memslot metadata actually needs to be allocated, e.g. all
+	 * metadata will be allocated upfront if TDP is disabled.
 	 */
 	if (kvm_memslots_have_rmaps(kvm) &&
 	    kvm_page_track_write_tracking_enabled(kvm))
@@ -6696,12 +6735,13 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 	INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages);
 	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
 
-	r = kvm_mmu_alloc_page_hash(kvm);
-	if (r)
-		return r;
-
-	if (tdp_mmu_enabled)
+	if (tdp_mmu_enabled) {
 		kvm_mmu_init_tdp_mmu(kvm);
+	} else {
+		r = kvm_mmu_alloc_page_hash(kvm);
+		if (r)
+			return r;
+	}
 
 	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
 	kvm->arch.split_page_header_cache.gfp_zero = __GFP_ZERO;
-- 
2.49.0.1112.g889b7c5bd8-goog