From nobody Mon Dec 15 00:32:20 2025
Reply-To: Sean Christopherson
Date: Thu, 22 May 2025 17:11:35 -0700
In-Reply-To:
<20250523001138.3182794-1-seanjc@google.com>
References: <20250523001138.3182794-1-seanjc@google.com>
Message-ID: <20250523001138.3182794-2-seanjc@google.com>
Subject: [PATCH v4 1/4] KVM: TDX: Move TDX hardware setup from main.c to tdx.c
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma, James Houghton

Move TDX hardware setup to tdx.c: the code is obviously TDX specific,
co-locating the setup with tdx_bringup() makes it easier to see and
document the success_disable_tdx "error" path, and configuring the
TDX-specific hooks in tdx.c reduces the number of globally visible TDX
symbols.

Signed-off-by: Sean Christopherson
Reviewed-by: Kai Huang
Reviewed-by: Xiaoyao Li
---
 arch/x86/kvm/vmx/main.c    | 36 ++----------------------------
 arch/x86/kvm/vmx/tdx.c     | 45 +++++++++++++++++++++++++++-----------
 arch/x86/kvm/vmx/tdx.h     |  1 +
 arch/x86/kvm/vmx/x86_ops.h | 10 ---------
 4 files changed, 35 insertions(+), 57 deletions(-)

diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index d1e02e567b57..d7178d15ac8f 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -29,40 +29,8 @@ static __init int vt_hardware_setup(void)
 	if (ret)
 		return ret;
 
-	/*
-	 * Update vt_x86_ops::vm_size here so it is ready before
-	 * kvm_ops_update() is called in kvm_x86_vendor_init().
-	 *
-	 * Note, the actual bringing up of TDX must be done after
-	 * kvm_ops_update() because enabling TDX requires enabling
-	 * hardware virtualization first, i.e., all online CPUs must
-	 * be in post-VMXON state.  This means the @vm_size here
-	 * may be updated to TDX's size but TDX may fail to enable
-	 * at later time.
-	 *
-	 * The VMX/VT code could update kvm_x86_ops::vm_size again
-	 * after bringing up TDX, but this would require exporting
-	 * either kvm_x86_ops or kvm_ops_update() from the base KVM
-	 * module, which looks overkill.  Anyway, the worst case here
-	 * is KVM may allocate couple of more bytes than needed for
-	 * each VM.
-	 */
-	if (enable_tdx) {
-		vt_x86_ops.vm_size = max_t(unsigned int, vt_x86_ops.vm_size,
-					   sizeof(struct kvm_tdx));
-		/*
-		 * Note, TDX may fail to initialize in a later time in
-		 * vt_init(), in which case it is not necessary to setup
-		 * those callbacks.  But making them valid here even
-		 * when TDX fails to init later is fine because those
-		 * callbacks won't be called if the VM isn't TDX guest.
-		 */
-		vt_x86_ops.link_external_spt = tdx_sept_link_private_spt;
-		vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
-		vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
-		vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
-		vt_x86_ops.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt;
-	}
+	if (enable_tdx)
+		tdx_hardware_setup();
 
 	return 0;
 }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index b952bc673271..1790f6dee870 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -738,7 +738,7 @@ bool tdx_interrupt_allowed(struct kvm_vcpu *vcpu)
 		!to_tdx(vcpu)->vp_enter_args.r12;
 }
 
-bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
+static bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu)
 {
 	u64 vcpu_state_details;
 
@@ -1543,8 +1543,8 @@ static int tdx_mem_page_record_premap_cnt(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
-int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
-			      enum pg_level level, kvm_pfn_t pfn)
+static int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
+				     enum pg_level level, kvm_pfn_t pfn)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 	struct page *page = pfn_to_page(pfn);
@@ -1624,8 +1624,8 @@ static int tdx_sept_drop_private_spte(struct kvm *kvm, gfn_t gfn,
 	return 0;
 }
 
-int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
-			      enum pg_level level, void *private_spt)
+static int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
+				     enum pg_level level, void *private_spt)
 {
 	int tdx_level = pg_level_to_tdx_sept_level(level);
 	gpa_t gpa = gfn_to_gpa(gfn);
@@ -1760,8 +1760,8 @@ static void tdx_track(struct kvm *kvm)
 	kvm_make_all_cpus_request(kvm, KVM_REQ_OUTSIDE_GUEST_MODE);
 }
 
-int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
-			      enum pg_level level, void *private_spt)
+static int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
+				     enum pg_level level, void *private_spt)
 {
 	struct kvm_tdx *kvm_tdx = to_kvm_tdx(kvm);
 
@@ -1783,8 +1783,8 @@ int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
 	return tdx_reclaim_page(virt_to_page(private_spt));
 }
 
-int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
-				 enum pg_level level, kvm_pfn_t pfn)
+static int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
+					enum pg_level level, kvm_pfn_t pfn)
 {
 	struct page *page = pfn_to_page(pfn);
 	int ret;
@@ -3507,10 +3507,14 @@ int __init tdx_bringup(void)
 	r = __tdx_bringup();
 	if (r) {
 		/*
-		 * Disable TDX only but don't fail to load module if
-		 * the TDX module could not be loaded. No need to print
-		 * message saying "module is not loaded" because it was
-		 * printed when the first SEAMCALL failed.
+		 * Disable TDX only but don't fail to load module if the TDX
+		 * module could not be loaded.  No need to print message saying
+		 * "module is not loaded" because it was printed when the first
+		 * SEAMCALL failed.  Don't bother unwinding the S-EPT hooks or
+		 * vm_size, as kvm_x86_ops have already been finalized (and are
+		 * intentionally not exported).  The S-EPT code is unreachable,
+		 * and allocating a few more bytes per VM in a should-be-rare
+		 * failure scenario is a non-issue.
		 */
		if (r == -ENODEV)
			goto success_disable_tdx;
@@ -3524,3 +3528,18 @@ int __init tdx_bringup(void)
 	enable_tdx = 0;
 	return 0;
 }
+
+void __init tdx_hardware_setup(void)
+{
+	/*
+	 * Note, if the TDX module can't be loaded, KVM TDX support will be
+	 * disabled but KVM will continue loading (see tdx_bringup()).
+	 */
+	vt_x86_ops.vm_size = max_t(unsigned int, vt_x86_ops.vm_size, sizeof(struct kvm_tdx));
+
+	vt_x86_ops.link_external_spt = tdx_sept_link_private_spt;
+	vt_x86_ops.set_external_spte = tdx_sept_set_private_spte;
+	vt_x86_ops.free_external_spt = tdx_sept_free_private_spt;
+	vt_x86_ops.remove_external_spte = tdx_sept_remove_private_spte;
+	vt_x86_ops.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt;
+}
diff --git a/arch/x86/kvm/vmx/tdx.h b/arch/x86/kvm/vmx/tdx.h
index 51f98443e8a2..ca39a9391db1 100644
--- a/arch/x86/kvm/vmx/tdx.h
+++ b/arch/x86/kvm/vmx/tdx.h
@@ -8,6 +8,7 @@
 #ifdef CONFIG_KVM_INTEL_TDX
 #include "common.h"
 
+void tdx_hardware_setup(void);
 int tdx_bringup(void);
 void tdx_cleanup(void);
 
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index b4596f651232..87e855276a88 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -136,7 +136,6 @@ int tdx_vcpu_pre_run(struct kvm_vcpu *vcpu);
 fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit);
 void tdx_prepare_switch_to_guest(struct kvm_vcpu *vcpu);
 void tdx_vcpu_put(struct kvm_vcpu *vcpu);
-bool tdx_protected_apic_has_interrupt(struct kvm_vcpu *vcpu);
 int tdx_handle_exit(struct kvm_vcpu *vcpu,
 		    enum exit_fastpath_completion fastpath);
 
@@ -151,15 +150,6 @@ int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
 
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
 
-int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
-			      enum pg_level level, void *private_spt);
-int tdx_sept_free_private_spt(struct kvm *kvm, gfn_t gfn,
-			      enum pg_level level, void *private_spt);
-int tdx_sept_set_private_spte(struct kvm *kvm, gfn_t gfn,
-			      enum pg_level level, kvm_pfn_t pfn);
-int tdx_sept_remove_private_spte(struct kvm *kvm, gfn_t gfn,
-				 enum pg_level level, kvm_pfn_t pfn);
-
 void tdx_flush_tlb_current(struct kvm_vcpu *vcpu);
 void tdx_flush_tlb_all(struct kvm_vcpu *vcpu);
 void tdx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, int root_level);
-- 
2.49.0.1151.ga128411c76-goog

From nobody Mon Dec 15 00:32:20 2025
Reply-To: Sean Christopherson
Date: Thu, 22 May 2025 17:11:36 -0700
In-Reply-To: <20250523001138.3182794-1-seanjc@google.com>
References: <20250523001138.3182794-1-seanjc@google.com>
Message-ID: <20250523001138.3182794-3-seanjc@google.com>
Subject: [PATCH v4 2/4] KVM: x86/mmu: Dynamically allocate shadow MMU's hashed page list
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma, James Houghton

Dynamically allocate the (massive) array of hashed lists used to track
shadow pages, as the array itself is 32KiB, i.e. is an order-3 allocation
all on its own, and is *exactly* an order-3 allocation.  Dynamically
allocating the array will allow allocating "struct kvm" using kvmalloc(),
and will also allow deferring allocation of the array until it's actually
needed, i.e. until the first shadow root is allocated.

Opportunistically use kvmalloc() for the hashed lists, as an order-3
allocation is (stating the obvious) less likely to fail than an order-4
allocation, and the overhead of vmalloc() is undesirable given that the
size of the allocation is fixed.
Cc: Vipin Sharma
Signed-off-by: Sean Christopherson
Reviewed-by: Xiaoyao Li
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/mmu/mmu.c          | 23 ++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |  5 ++++-
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 330cdcbed1a6..9667d6b929ee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1343,7 +1343,7 @@ struct kvm_arch {
 	bool has_private_mem;
 	bool has_protected_state;
 	bool pre_fault_allowed;
-	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
+	struct hlist_head *mmu_page_hash;
 	struct list_head active_mmu_pages;
 	/*
 	 * A list of kvm_mmu_page structs that, if zapped, could possibly be
@@ -2006,7 +2006,7 @@ void kvm_mmu_vendor_module_exit(void);
 
 void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_mmu_create(struct kvm_vcpu *vcpu);
-void kvm_mmu_init_vm(struct kvm *kvm);
+int kvm_mmu_init_vm(struct kvm *kvm);
 void kvm_mmu_uninit_vm(struct kvm *kvm);
 
 void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cbc84c6abc2e..41da2cb1e3f1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3882,6 +3882,18 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	return r;
 }
 
+static int kvm_mmu_alloc_page_hash(struct kvm *kvm)
+{
+	typeof(kvm->arch.mmu_page_hash) h;
+
+	h = kvcalloc(KVM_NUM_MMU_PAGES, sizeof(*h), GFP_KERNEL_ACCOUNT);
+	if (!h)
+		return -ENOMEM;
+
+	kvm->arch.mmu_page_hash = h;
+	return 0;
+}
+
 static int mmu_first_shadow_root_alloc(struct kvm *kvm)
 {
 	struct kvm_memslots *slots;
@@ -6675,13 +6687,19 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 	kvm_tdp_mmu_zap_invalidated_roots(kvm, true);
 }
 
-void kvm_mmu_init_vm(struct kvm *kvm)
+int kvm_mmu_init_vm(struct kvm *kvm)
 {
+	int r;
+
 	kvm->arch.shadow_mmio_value = shadow_mmio_value;
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
 	INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages);
 	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
 
+	r = kvm_mmu_alloc_page_hash(kvm);
+	if (r)
+		return r;
+
 	if (tdp_mmu_enabled)
 		kvm_mmu_init_tdp_mmu(kvm);
 
@@ -6692,6 +6710,7 @@ void kvm_mmu_init_vm(struct kvm *kvm)
 
 	kvm->arch.split_desc_cache.kmem_cache = pte_list_desc_cache;
 	kvm->arch.split_desc_cache.gfp_zero = __GFP_ZERO;
+	return 0;
 }
 
 static void mmu_free_vm_memory_caches(struct kvm *kvm)
@@ -6703,6 +6722,8 @@ static void mmu_free_vm_memory_caches(struct kvm *kvm)
 
 void kvm_mmu_uninit_vm(struct kvm *kvm)
 {
+	kvfree(kvm->arch.mmu_page_hash);
+
 	if (tdp_mmu_enabled)
 		kvm_mmu_uninit_tdp_mmu(kvm);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f9f798f286ce..d204ba9368f8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12787,7 +12787,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	if (ret)
 		goto out;
 
-	kvm_mmu_init_vm(kvm);
+	ret = kvm_mmu_init_vm(kvm);
+	if (ret)
+		goto out_cleanup_page_track;
 
 	ret = kvm_x86_call(vm_init)(kvm);
 	if (ret)
@@ -12840,6 +12842,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 out_uninit_mmu:
 	kvm_mmu_uninit_vm(kvm);
+out_cleanup_page_track:
 	kvm_page_track_cleanup(kvm);
 out:
 	return ret;
-- 
2.49.0.1151.ga128411c76-goog

From nobody Mon Dec 15 00:32:20 2025
Reply-To: Sean Christopherson
Date: Thu, 22 May 2025 17:11:37 -0700
In-Reply-To: <20250523001138.3182794-1-seanjc@google.com>
References: <20250523001138.3182794-1-seanjc@google.com>
Message-ID: <20250523001138.3182794-4-seanjc@google.com>
Subject: [PATCH v4 3/4] KVM: x86: Use kvzalloc() to allocate VM struct
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma, James Houghton

Allocate VM structs via kvzalloc(), i.e. try to use a contiguous physical
allocation before falling back to __vmalloc(), to avoid the overhead of
establishing the virtual mappings.  For non-debug builds, the SVM and VMX
(and TDX) structures are now just below 7000 bytes in the worst case
scenario (see below), i.e. are order-1 allocations, and will likely remain
that way for quite some time.

Add compile-time assertions in vendor code to ensure the sizes of the
structures, sans the memslots hash tables, are order-0 allocations, i.e.
are less than 4KiB.  There's nothing fundamentally wrong with a larger
kvm_{svm,vmx,tdx} size, but given that the size of the structure (without
the memslots hash tables) is below 2KiB after 18+ years of existence, more
than doubling the size would be quite notable.

Add sanity checks on the memslot hash table sizes, partly to ensure they
aren't resized without accounting for the impact on VM structure size, and
partly to document that the majority of the size of VM structures comes
from the memslots.

Signed-off-by: Sean Christopherson
Reviewed-by: Xiaoyao Li
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm/svm.c          |  2 ++
 arch/x86/kvm/vmx/tdx.c          |  2 ++
 arch/x86/kvm/vmx/vmx.c          |  2 ++
 arch/x86/kvm/x86.h              | 22 ++++++++++++++++++++++
 5 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9667d6b929ee..3a985825a945 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1961,7 +1961,7 @@ void kvm_x86_vendor_exit(void);
 #define __KVM_HAVE_ARCH_VM_ALLOC
 static inline struct kvm *kvm_arch_alloc_vm(void)
 {
-	return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	return kvzalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT);
 }
 
 #define __KVM_HAVE_ARCH_VM_FREE
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0ad1a6d4fb6d..d13e475c3407 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5675,6 +5675,8 @@ static int __init svm_init(void)
 {
 	int r;
 
+	KVM_SANITY_CHECK_VM_STRUCT_SIZE(kvm_svm);
+
 	__unused_size_checks();
 
 	if (!kvm_is_svm_supported())
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 1790f6dee870..559fb18ff9fb 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -3531,6 +3531,8 @@ int __init tdx_bringup(void)
 
 void __init tdx_hardware_setup(void)
 {
+	KVM_SANITY_CHECK_VM_STRUCT_SIZE(kvm_tdx);
+
 	/*
 	 * Note, if the TDX module can't be loaded, KVM TDX support will be
 	 * disabled but KVM will continue loading (see tdx_bringup()).
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9ff00ae9f05a..ef58b727d6c8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8643,6 +8643,8 @@ int __init vmx_init(void)
 {
 	int r, cpu;
 
+	KVM_SANITY_CHECK_VM_STRUCT_SIZE(kvm_vmx);
+
 	if (!kvm_is_vmx_supported())
 		return -EOPNOTSUPP;
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 832f0faf4779..db4e6a90e83d 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -55,6 +55,28 @@ struct kvm_host_values {
 
 void kvm_spurious_fault(void);
 
+#define SIZE_OF_MEMSLOTS_HASHTABLE \
+	(sizeof(((struct kvm_memslots *)0)->id_hash) * 2 * KVM_MAX_NR_ADDRESS_SPACES)
+
+/* Sanity check the size of the memslot hash tables. */
+static_assert(SIZE_OF_MEMSLOTS_HASHTABLE ==
+	      (1024 * (1 + IS_ENABLED(CONFIG_X86_64)) * (1 + IS_ENABLED(CONFIG_KVM_SMM))));
+
+/*
+ * Assert that "struct kvm_{svm,vmx,tdx}" is an order-0 or order-1 allocation.
+ * Spilling over to an order-2 allocation isn't fundamentally problematic, but
+ * isn't expected to happen in the foreseeable future (O(years)).  Assert that
+ * the size is an order-0 allocation when ignoring the memslot hash tables, to
+ * help detect and debug unexpected size increases.
+ */ +#define KVM_SANITY_CHECK_VM_STRUCT_SIZE(x) \ +do { \ + BUILD_BUG_ON(get_order(sizeof(struct x) - SIZE_OF_MEMSLOTS_HASHTABLE) && \ + !IS_ENABLED(CONFIG_DEBUG_KERNEL) && !IS_ENABLED(CONFIG_KASAN)); \ + BUILD_BUG_ON(get_order(sizeof(struct x)) > 1 && \ + !IS_ENABLED(CONFIG_DEBUG_KERNEL) && !IS_ENABLED(CONFIG_KASAN)); \ +} while (0) + #define KVM_NESTED_VMENTER_CONSISTENCY_CHECK(consistency_check) \ ({ \ bool failed =3D (consistency_check); \ --=20 2.49.0.1151.ga128411c76-goog From nobody Mon Dec 15 00:32:20 2025 Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C48E624DD0B for ; Fri, 23 May 2025 00:11:48 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.202 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747959110; cv=none; b=Aw6X1SaJwYTYLUnPQ0pwxHHppDd+uFgt9yu6sfX/XzHedUQVu4GOb1mJBvgNznygnwdfsEQgELLTnWhtud8SACQfahaV9WCMuv//N7c1TICCH9I0C+vS86gLorAUXA9FV/6rgVtWNPFAP/BBnpMP+mu9GFs9DmvtkW2VNyvZ740= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1747959110; c=relaxed/simple; bh=B12X3JWhxznlcRgHh2qi4Ak58CALHcWct2odIF6qY+E=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=sWNU1Jg6n4KSXi+Ae+SU6+BOUDjjpULlpvIpdhB4AovvpqnrPxrAlXgZ/qQSBctmTK/KUEyEWyr2riCz57TXWOFeSA1ugp8iVX5Sml6bgUl02B6iZXCiTRDb050svY6WdW/UR4X7eggnAIP1szhQXafWhwGC2BqHh1qL+a05Mdk= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=LJKcaoi6; arc=none smtp.client-ip=209.85.215.202 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com 
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--seanjc.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="LJKcaoi6" Received: by mail-pg1-f202.google.com with SMTP id 41be03b00d2f7-b16b35ea570so8428713a12.0 for ; Thu, 22 May 2025 17:11:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1747959108; x=1748563908; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:from:to:cc:subject:date:message-id:reply-to; bh=ICBtionEpcInnEHm0R5wGA1N61VckpDnHwHr/zSAgpU=; b=LJKcaoi6iqLYwBANDxnMPqiyIjZ96kLA7ipV2whMLREDkbf61+I4FYJZfLLmu2ZpVn 453G+8V+Yhx1DSUEHCX3uEC5y4mNV80JcZm+V6pmLGYRoEp6LA1Wh9QM6eEVbyC5J6W0 vDS0QQy4x7NHUxIzkZ3sipBqQOrS4m0zNqVGkuxXoge0y5uuSBDuHvyB3vly/TQOP7Va TQ6f6hrMGvkYUIaDU5AnNHDj60N2ahzlB2EERXvNEFB0pq4Fy8SWDe3dxUvmfaEObdMs Pup12kA0glPlClfF90UwTJJlFUSe2dSqyCbg+worb26giFbCrUG8knZXDPPz+tVPCKDF ESKg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747959108; x=1748563908; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:reply-to:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ICBtionEpcInnEHm0R5wGA1N61VckpDnHwHr/zSAgpU=; b=PMvfa3yssEjetw1zC91oJm6g5WWW4G65J/smH/z54zlj1QYMCvcLV3yMZNVrgipOpu JoAAeR42s7NmsgvFLSXASKp5Y8IxnK6cUi7FbKPFBzl6LyP/uYVmor+um7vnSZOrmGTA DSRkPuRXvERYU9vCzTa+xhLvGU7jJT0EfYJSayvbnFBvYhzrTqcdXGQmYEKh3YaKWVvb aGByk6KtfRiTeE0wRSukTin+EMQISvnkZQFpCYqDJtIlU14n2pezW962y3uArQS/5ZtF 33SWEJDqJjaPsGXGELQzMxQDARVrexnMO+zHL83fkK7cY4PEL7If9hKHaHGbWACmuhDm M80w== X-Forwarded-Encrypted: i=1; AJvYcCUIY8loXbSp546k8XUI37KwGnCwn24MwLPYC20Phh/p1F8bPVCWgqTbrX/DI7Eg4drgOvAmuQrX0EU5OJA=@vger.kernel.org X-Gm-Message-State: AOJu0YxbHrqnJlPknRDHlMneAkQ9+sZKVtiHPAGDVBANzQoiVfSPe9he 1T4DWE4yTl/3MuzfUqDTV6kBKfAOkkGx4WXK9Hi4C0ong0eyblbpz55jC+Izv2CbZUpnG9tfjNV vlVISZQ== 
Reply-To: Sean Christopherson <seanjc@google.com>
Date: Thu, 22 May 2025 17:11:38 -0700
In-Reply-To: <20250523001138.3182794-1-seanjc@google.com>
References: <20250523001138.3182794-1-seanjc@google.com>
Message-ID: <20250523001138.3182794-5-seanjc@google.com>
Subject: [PATCH v4 4/4] KVM: x86/mmu: Defer allocation of shadow MMU's hashed page list
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma, James Houghton

When the TDP MMU is enabled, i.e. when the shadow MMU isn't used until a
nested TDP VM is run, defer allocation of the array of hashed lists used
to track shadow MMU pages until the first shadow root is allocated.

Setting the list outside of mmu_lock is safe, as concurrent readers must
hold mmu_lock in some capacity, shadow pages can only be added (or
removed) from the list when mmu_lock is held for write, and tasks that
are creating a shadow root are serialized by slots_arch_lock.  I.e. it's
impossible for the list to become non-empty until all readers go away,
and so readers are guaranteed to see an empty list even if they make
multiple calls to kvm_get_mmu_page_hash() in a single mmu_lock critical
section.
Use smp_store_release() and smp_load_acquire() to access the hash table
pointer to ensure the stores to zero the lists are retired before readers
start to walk the list.  E.g. if the compiler hoisted the store before
the zeroing of memory, for_each_gfn_valid_sp_with_gptes() could consume
stale kernel data.

Cc: James Houghton
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c | 62 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 52 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 41da2cb1e3f1..173f7fdfba21 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1983,14 +1983,35 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp)
 	return true;
 }
 
+static __ro_after_init HLIST_HEAD(empty_page_hash);
+
+static struct hlist_head *kvm_get_mmu_page_hash(struct kvm *kvm, gfn_t gfn)
+{
+	/*
+	 * Ensure the load of the hash table pointer itself is ordered before
+	 * loads to walk the table.  The pointer is set at runtime outside of
+	 * mmu_lock when the TDP MMU is enabled, i.e. when the hash table of
+	 * shadow pages becomes necessary only when KVM needs to shadow L1's
+	 * TDP for an L2 guest.  Pairs with the smp_store_release() in
+	 * kvm_mmu_alloc_page_hash().
+	 */
+	struct hlist_head *page_hash = smp_load_acquire(&kvm->arch.mmu_page_hash);
+
+	lockdep_assert_held(&kvm->mmu_lock);
+
+	if (!page_hash)
+		return &empty_page_hash;
+
+	return &page_hash[kvm_page_table_hashfn(gfn)];
+}
+
 #define for_each_valid_sp(_kvm, _sp, _list)				\
 	hlist_for_each_entry(_sp, _list, hash_link)			\
 		if (is_obsolete_sp((_kvm), (_sp))) {			\
 		} else
 
 #define for_each_gfn_valid_sp_with_gptes(_kvm, _sp, _gfn)		\
-	for_each_valid_sp(_kvm, _sp,					\
-			  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)])	\
+	for_each_valid_sp(_kvm, _sp, kvm_get_mmu_page_hash(_kvm, _gfn))	\
 		if ((_sp)->gfn != (_gfn) || !sp_has_gptes(_sp)) {} else
 
 static bool kvm_sync_page_check(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
@@ -2358,6 +2379,12 @@ static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,
 	struct kvm_mmu_page *sp;
 	bool created = false;
 
+	/*
+	 * No need for memory barriers, unlike in kvm_get_mmu_page_hash(), as
+	 * mmu_page_hash must be set prior to creating the first shadow root,
+	 * i.e. reaching this point is fully serialized by slots_arch_lock.
+	 */
+	BUG_ON(!kvm->arch.mmu_page_hash);
 	sp_list = &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
 
 	sp = kvm_mmu_find_shadow_page(kvm, vcpu, gfn, sp_list, role);
@@ -3886,11 +3913,21 @@ static int kvm_mmu_alloc_page_hash(struct kvm *kvm)
 {
 	typeof(kvm->arch.mmu_page_hash) h;
 
+	if (kvm->arch.mmu_page_hash)
+		return 0;
+
 	h = kvcalloc(KVM_NUM_MMU_PAGES, sizeof(*h), GFP_KERNEL_ACCOUNT);
 	if (!h)
 		return -ENOMEM;
 
-	kvm->arch.mmu_page_hash = h;
+	/*
+	 * Ensure the hash table pointer is set only after all stores to zero
+	 * the memory are retired.  Pairs with the smp_load_acquire() in
+	 * kvm_get_mmu_page_hash().  Note, mmu_lock must be held for write to
+	 * add (or remove) shadow pages, and so readers are guaranteed to see
+	 * an empty list for their current mmu_lock critical section.
+	 */
+	smp_store_release(&kvm->arch.mmu_page_hash, h);
 	return 0;
 }
 
@@ -3913,9 +3950,13 @@ static int mmu_first_shadow_root_alloc(struct kvm *kvm)
 	if (kvm_shadow_root_allocated(kvm))
 		goto out_unlock;
 
+	r = kvm_mmu_alloc_page_hash(kvm);
+	if (r)
+		goto out_unlock;
+
 	/*
-	 * Check if anything actually needs to be allocated, e.g. all metadata
-	 * will be allocated upfront if TDP is disabled.
+	 * Check if memslot metadata actually needs to be allocated, e.g. all
+	 * metadata will be allocated upfront if TDP is disabled.
 	 */
 	if (kvm_memslots_have_rmaps(kvm) &&
 	    kvm_page_track_write_tracking_enabled(kvm))
@@ -6696,12 +6737,13 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 	INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages);
 	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
 
-	r = kvm_mmu_alloc_page_hash(kvm);
-	if (r)
-		return r;
-
-	if (tdp_mmu_enabled)
+	if (tdp_mmu_enabled) {
 		kvm_mmu_init_tdp_mmu(kvm);
+	} else {
+		r = kvm_mmu_alloc_page_hash(kvm);
+		if (r)
+			return r;
+	}
 
 	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
 	kvm->arch.split_page_header_cache.gfp_zero = __GFP_ZERO;
-- 
2.49.0.1151.ga128411c76-goog
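[Editor's note] The barrier pairing the patch relies on -- publish a freshly zeroed table with a release store, consume it with an acquire load, and fall back to a static empty value when the pointer is still NULL -- can be sketched in portable C11 atomics, which smp_store_release()/smp_load_acquire() roughly correspond to. This is an illustrative stand-alone sketch, not kernel code; the names (publish_table, read_bucket, TABLE_SIZE) are hypothetical stand-ins for mmu_page_hash and its accessors.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define TABLE_SIZE 16

/* Stand-in for kvm->arch.mmu_page_hash: NULL until first needed. */
static _Atomic(int *) table_ptr;

/*
 * Writer: allocate zeroed storage, then publish the pointer with
 * release semantics so the zeroing stores are ordered before the
 * pointer becomes visible to acquire-loading readers.
 */
static void publish_table(void)
{
	int *t = calloc(TABLE_SIZE, sizeof(*t));

	if (t)
		atomic_store_explicit(&table_ptr, t, memory_order_release);
}

/*
 * Reader: acquire-load the pointer; a NULL pointer means "not yet
 * allocated" and is treated as an empty bucket, mirroring the
 * empty_page_hash fallback in kvm_get_mmu_page_hash().
 */
static int read_bucket(int idx)
{
	int *t = atomic_load_explicit(&table_ptr, memory_order_acquire);

	if (!t)
		return 0;	/* not yet published: behave as empty */
	return t[idx];
}
```

The sketch is single-threaded when run in isolation; the release/acquire pairing only matters when a concurrent reader can observe the pointer between the calloc() and the store, which is exactly the window the patch's comment about hoisted stores describes.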