From nobody Fri Dec 19 19:17:08 2025
Date: Fri, 16 May 2025 14:54:20 -0700
In-Reply-To: <20250516215422.2550669-1-seanjc@google.com>
References: <20250516215422.2550669-1-seanjc@google.com>
Message-ID: <20250516215422.2550669-2-seanjc@google.com>
Subject: [PATCH v3 1/3] KVM: x86/mmu: Dynamically allocate shadow MMU's hashed page list
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Reply-To: Sean Christopherson
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Dynamically allocate the (massive) array of hashed lists used to track
shadow pages, as the array itself is 32KiB, i.e. is an order-3 allocation
all on its own, and is *exactly* an order-3 allocation.  Dynamically
allocating the array will allow allocating "struct kvm" using kvmalloc(),
and will also allow deferring allocation of the array until it's actually
needed, i.e. until the first shadow root is allocated.

Opportunistically use kvmalloc() for the hashed lists, as an order-3
allocation is (stating the obvious) less likely to fail than an order-4
allocation, and the overhead of vmalloc() is undesirable given that the
size of the allocation is fixed.
Cc: Vipin Sharma
Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h |  4 ++--
 arch/x86/kvm/mmu/mmu.c          | 23 ++++++++++++++++++++++-
 arch/x86/kvm/x86.c              |  5 ++++-
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 330cdcbed1a6..9667d6b929ee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1343,7 +1343,7 @@ struct kvm_arch {
 	bool has_private_mem;
 	bool has_protected_state;
 	bool pre_fault_allowed;
-	struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
+	struct hlist_head *mmu_page_hash;
 	struct list_head active_mmu_pages;
 	/*
 	 * A list of kvm_mmu_page structs that, if zapped, could possibly be
@@ -2006,7 +2006,7 @@ void kvm_mmu_vendor_module_exit(void);
 
 void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_mmu_create(struct kvm_vcpu *vcpu);
-void kvm_mmu_init_vm(struct kvm *kvm);
+int kvm_mmu_init_vm(struct kvm *kvm);
 void kvm_mmu_uninit_vm(struct kvm *kvm);
 
 void kvm_mmu_init_memslot_memory_attributes(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cbc84c6abc2e..41da2cb1e3f1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3882,6 +3882,18 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	return r;
 }
 
+static int kvm_mmu_alloc_page_hash(struct kvm *kvm)
+{
+	typeof(kvm->arch.mmu_page_hash) h;
+
+	h = kvcalloc(KVM_NUM_MMU_PAGES, sizeof(*h), GFP_KERNEL_ACCOUNT);
+	if (!h)
+		return -ENOMEM;
+
+	kvm->arch.mmu_page_hash = h;
+	return 0;
+}
+
 static int mmu_first_shadow_root_alloc(struct kvm *kvm)
 {
 	struct kvm_memslots *slots;
@@ -6675,13 +6687,19 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
 	kvm_tdp_mmu_zap_invalidated_roots(kvm, true);
 }
 
-void kvm_mmu_init_vm(struct kvm *kvm)
+int kvm_mmu_init_vm(struct kvm *kvm)
 {
+	int r;
+
 	kvm->arch.shadow_mmio_value = shadow_mmio_value;
 	INIT_LIST_HEAD(&kvm->arch.active_mmu_pages);
 	INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages);
 	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
 
+	r = kvm_mmu_alloc_page_hash(kvm);
+	if (r)
+		return r;
+
 	if (tdp_mmu_enabled)
 		kvm_mmu_init_tdp_mmu(kvm);
 
@@ -6692,6 +6710,7 @@ void kvm_mmu_init_vm(struct kvm *kvm)
 
 	kvm->arch.split_desc_cache.kmem_cache = pte_list_desc_cache;
 	kvm->arch.split_desc_cache.gfp_zero = __GFP_ZERO;
+	return 0;
 }
 
 static void mmu_free_vm_memory_caches(struct kvm *kvm)
@@ -6703,6 +6722,8 @@ static void mmu_free_vm_memory_caches(struct kvm *kvm)
 
 void kvm_mmu_uninit_vm(struct kvm *kvm)
 {
+	kvfree(kvm->arch.mmu_page_hash);
+
 	if (tdp_mmu_enabled)
 		kvm_mmu_uninit_tdp_mmu(kvm);
 
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f9f798f286ce..d204ba9368f8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -12787,7 +12787,9 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 	if (ret)
 		goto out;
 
-	kvm_mmu_init_vm(kvm);
+	ret = kvm_mmu_init_vm(kvm);
+	if (ret)
+		goto out_cleanup_page_track;
 
 	ret = kvm_x86_call(vm_init)(kvm);
 	if (ret)
@@ -12840,6 +12842,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
 
 out_uninit_mmu:
 	kvm_mmu_uninit_vm(kvm);
+out_cleanup_page_track:
 	kvm_page_track_cleanup(kvm);
 out:
 	return ret;
-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Fri Dec 19 19:17:08 2025
Date: Fri, 16 May 2025 14:54:21 -0700
In-Reply-To: <20250516215422.2550669-1-seanjc@google.com>
References: <20250516215422.2550669-1-seanjc@google.com>
Message-ID: <20250516215422.2550669-3-seanjc@google.com>
Subject: [PATCH v3 2/3] KVM: x86: Use kvzalloc() to allocate VM struct
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Reply-To: Sean Christopherson
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Allocate VM structs via kvzalloc(), i.e.
try to use a contiguous physical
allocation before falling back to __vmalloc(), to avoid the overhead of
establishing the virtual mappings.  For non-debug builds, the SVM and VMX
(and TDX) structures are now just below 7000 bytes in the worst case
scenario (see below), i.e. are order-1 allocations, and will likely remain
that way for quite some time.

Add compile-time assertions in vendor code to ensure the size of the
structures, sans the memslots hash tables, are order-0 allocations, i.e.
are less than 4KiB.  There's nothing fundamentally wrong with a larger
kvm_{svm,vmx,tdx} size, but given that the size of the structure (without
the memslots hash tables) is below 2KiB after 18+ years of existence, more
than doubling the size would be quite notable.

Add sanity checks on the memslot hash table sizes, partly to ensure they
aren't resized without accounting for the impact on VM structure size, and
partly to document that the majority of the size of VM structures comes
from the memslots.

Signed-off-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm/svm.c          |  2 ++
 arch/x86/kvm/vmx/main.c         |  2 ++
 arch/x86/kvm/vmx/vmx.c          |  2 ++
 arch/x86/kvm/x86.h              | 22 ++++++++++++++++++++++
 5 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9667d6b929ee..3a985825a945 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1961,7 +1961,7 @@ void kvm_x86_vendor_exit(void);
 #define __KVM_HAVE_ARCH_VM_ALLOC
 static inline struct kvm *kvm_arch_alloc_vm(void)
 {
-	return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
+	return kvzalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT);
 }
 
 #define __KVM_HAVE_ARCH_VM_FREE
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 0ad1a6d4fb6d..d13e475c3407 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -5675,6 +5675,8 @@ static int __init svm_init(void)
 {
 	int r;
 
+	KVM_SANITY_CHECK_VM_STRUCT_SIZE(kvm_svm);
+
 	__unused_size_checks();
 
 	if (!kvm_is_svm_supported())
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index d1e02e567b57..e18dfada2e90 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -64,6 +64,8 @@ static __init int vt_hardware_setup(void)
 		vt_x86_ops.protected_apic_has_interrupt = tdx_protected_apic_has_interrupt;
 	}
 
+	KVM_SANITY_CHECK_VM_STRUCT_SIZE(kvm_tdx);
+
 	return 0;
 }
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 9ff00ae9f05a..ef58b727d6c8 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -8643,6 +8643,8 @@ int __init vmx_init(void)
 {
 	int r, cpu;
 
+	KVM_SANITY_CHECK_VM_STRUCT_SIZE(kvm_vmx);
+
 	if (!kvm_is_vmx_supported())
 		return -EOPNOTSUPP;
 
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index 832f0faf4779..0f3046cccb79 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -55,6 +55,28 @@ struct kvm_host_values {
 
 void kvm_spurious_fault(void);
 
+#define SIZE_OF_MEMSLOTS_HASHTABLE \
+	(sizeof(((struct kvm_memslots *)0)->id_hash) * 2 * KVM_MAX_NR_ADDRESS_SPACES)
+
+/* Sanity check the size of the memslot hash tables. */
+static_assert(SIZE_OF_MEMSLOTS_HASHTABLE ==
+	      (1024 * (1 + IS_ENABLED(CONFIG_X86_64)) * (1 + IS_ENABLED(CONFIG_KVM_SMM))));
+
+/*
+ * Assert that "struct kvm_{svm,vmx,tdx}" is an order-0 or order-1 allocation.
+ * Spilling over to an order-2 allocation isn't fundamentally problematic, but
+ * isn't expected to happen in the foreseeable future (O(years)).  Assert that
+ * the size is an order-0 allocation when ignoring the memslot hash tables, to
+ * help detect and debug unexpected size increases.
+ */
+#define KVM_SANITY_CHECK_VM_STRUCT_SIZE(x)					\
+do {										\
+	BUILD_BUG_ON(get_order(sizeof(struct x) - SIZE_OF_MEMSLOTS_HASHTABLE) && \
+		     !IS_ENABLED(CONFIG_DEBUG_KERNEL) && !IS_ENABLED(CONFIG_KASAN)); \
+	BUILD_BUG_ON(get_order(sizeof(struct x)) > 1 &&				\
+		     !IS_ENABLED(CONFIG_DEBUG_KERNEL) && !IS_ENABLED(CONFIG_KASAN)); \
+} while (0)
+
 #define KVM_NESTED_VMENTER_CONSISTENCY_CHECK(consistency_check)	\
 ({									\
 	bool failed = (consistency_check);				\
-- 
2.49.0.1112.g889b7c5bd8-goog

From nobody Fri Dec 19 19:17:08 2025
Date: Fri, 16 May 2025 14:54:22 -0700
In-Reply-To: <20250516215422.2550669-1-seanjc@google.com>
References: <20250516215422.2550669-1-seanjc@google.com>
Message-ID: <20250516215422.2550669-4-seanjc@google.com>
Subject: [PATCH v3 3/3] KVM: x86/mmu: Defer allocation of shadow MMU's hashed page list
From: Sean Christopherson
To: Sean Christopherson, Paolo Bonzini
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org, Vipin Sharma
Reply-To: Sean Christopherson
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"

When the TDP MMU is enabled, i.e. when the shadow MMU isn't used until a
nested TDP VM is run, defer allocation of the array of hashed lists used
to track shadow MMU pages until the first shadow root is allocated.

Setting the list outside of mmu_lock is safe, as concurrent readers must
hold mmu_lock in some capacity, shadow pages can only be added (or
removed) from the list when mmu_lock is held for write, and tasks that
are creating a shadow root are serialized by slots_arch_lock.  I.e. it's
impossible for the list to become non-empty until all readers go away,
and so readers are guaranteed to see an empty list even if they make
multiple calls to kvm_get_mmu_page_hash() in a single mmu_lock critical
section.
Use {WRITE/READ}_ONCE to set/get the list when mmu_lock isn't held for
write, out of an abundance of paranoia; no sane compiler should tear the
store or load, but it's technically possible.

Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c | 60 +++++++++++++++++++++++++++++++++++-------
 1 file changed, 50 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 41da2cb1e3f1..edb4ecff9917 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1983,14 +1983,33 @@ static bool sp_has_gptes(struct kvm_mmu_page *sp)
 	return true;
 }
 
+static __ro_after_init HLIST_HEAD(empty_page_hash);
+
+static struct hlist_head *kvm_get_mmu_page_hash(struct kvm *kvm, gfn_t gfn)
+{
+	/*
+	 * Load mmu_page_hash from memory exactly once, as it's set at runtime
+	 * outside of mmu_lock when the TDP MMU is enabled, i.e. when the hash
+	 * table of shadow pages isn't needed unless KVM needs to shadow L1's
+	 * TDP for an L2 guest.
+	 */
+	struct hlist_head *page_hash = READ_ONCE(kvm->arch.mmu_page_hash);
+
+	lockdep_assert_held(&kvm->mmu_lock);
+
+	if (!page_hash)
+		return &empty_page_hash;
+
+	return &page_hash[kvm_page_table_hashfn(gfn)];
+}
+
 #define for_each_valid_sp(_kvm, _sp, _list)				\
 	hlist_for_each_entry(_sp, _list, hash_link)			\
 		if (is_obsolete_sp((_kvm), (_sp))) {			\
 		} else
 
 #define for_each_gfn_valid_sp_with_gptes(_kvm, _sp, _gfn)		\
-	for_each_valid_sp(_kvm, _sp,					\
-	  &(_kvm)->arch.mmu_page_hash[kvm_page_table_hashfn(_gfn)])	\
+	for_each_valid_sp(_kvm, _sp, kvm_get_mmu_page_hash(_kvm, _gfn))	\
 		if ((_sp)->gfn != (_gfn) || !sp_has_gptes(_sp)) {} else
 
 static bool kvm_sync_page_check(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
@@ -2358,6 +2377,12 @@ static struct kvm_mmu_page *__kvm_mmu_get_shadow_page(struct kvm *kvm,
 	struct kvm_mmu_page *sp;
 	bool created = false;
 
+	/*
+	 * No need for READ_ONCE(), unlike in kvm_get_mmu_page_hash(), because
+	 * mmu_page_hash must be set prior to creating the first shadow root,
+	 * i.e. reaching this point is fully serialized by slots_arch_lock.
+	 */
+	BUG_ON(!kvm->arch.mmu_page_hash);
 	sp_list = &kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)];
 
 	sp = kvm_mmu_find_shadow_page(kvm, vcpu, gfn, sp_list, role);
@@ -3886,11 +3911,21 @@ static int kvm_mmu_alloc_page_hash(struct kvm *kvm)
 {
 	typeof(kvm->arch.mmu_page_hash) h;
 
+	if (kvm->arch.mmu_page_hash)
+		return 0;
+
 	h = kvcalloc(KVM_NUM_MMU_PAGES, sizeof(*h), GFP_KERNEL_ACCOUNT);
 	if (!h)
 		return -ENOMEM;
 
-	kvm->arch.mmu_page_hash = h;
+	/*
+	 * Write mmu_page_hash exactly once as there may be concurrent readers,
+	 * e.g. to check for shadowed PTEs in mmu_try_to_unsync_pages().  Note,
+	 * mmu_lock must be held for write to add (or remove) shadow pages, and
+	 * so readers are guaranteed to see an empty list for their current
+	 * mmu_lock critical section.
+	 */
+	WRITE_ONCE(kvm->arch.mmu_page_hash, h);
 	return 0;
 }
 
@@ -3913,9 +3948,13 @@ static int mmu_first_shadow_root_alloc(struct kvm *kvm)
 	if (kvm_shadow_root_allocated(kvm))
 		goto out_unlock;
 
+	r = kvm_mmu_alloc_page_hash(kvm);
+	if (r)
+		goto out_unlock;
+
 	/*
-	 * Check if anything actually needs to be allocated, e.g. all metadata
-	 * will be allocated upfront if TDP is disabled.
+	 * Check if memslot metadata actually needs to be allocated, e.g. all
+	 * metadata will be allocated upfront if TDP is disabled.
 	 */
 	if (kvm_memslots_have_rmaps(kvm) &&
 	    kvm_page_track_write_tracking_enabled(kvm))
@@ -6696,12 +6735,13 @@ int kvm_mmu_init_vm(struct kvm *kvm)
 	INIT_LIST_HEAD(&kvm->arch.possible_nx_huge_pages);
 	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
 
-	r = kvm_mmu_alloc_page_hash(kvm);
-	if (r)
-		return r;
-
-	if (tdp_mmu_enabled)
+	if (tdp_mmu_enabled) {
 		kvm_mmu_init_tdp_mmu(kvm);
+	} else {
+		r = kvm_mmu_alloc_page_hash(kvm);
+		if (r)
+			return r;
+	}
 
 	kvm->arch.split_page_header_cache.kmem_cache = mmu_page_header_cache;
 	kvm->arch.split_page_header_cache.gfp_zero = __GFP_ZERO;
-- 
2.49.0.1112.g889b7c5bd8-goog