From nobody Thu Nov 28 04:51:25 2024
Date: Fri, 4 Oct 2024 12:55:39 -0700
In-Reply-To: <20241004195540.210396-1-vipinsh@google.com>
References: <20241004195540.210396-1-vipinsh@google.com>
X-Mailer: git-send-email 2.47.0.rc0.187.ge670bccf7e-goog
Message-ID: <20241004195540.210396-3-vipinsh@google.com>
Subject: [PATCH v2 2/3] KVM: x86/mmu: Use MMU shrinker to shrink KVM MMU memory caches
From: Vipin Sharma
To: seanjc@google.com, pbonzini@redhat.com, dmatlack@google.com
Cc: zhi.wang.linux@gmail.com, weijiang.yang@intel.com, mizhang@google.com,
    liangchen.linux@gmail.com, kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
    Vipin Sharma

Use the MMU shrinker to iterate through all vCPUs of all VMs and free
pages allocated in the MMU memory caches. Protect cache allocation in
the page fault and MMU load paths from the MMU shrinker with a per-vCPU
mutex.

In the MMU shrinker, move the just-visited VM to the end of the VM list
so that the pain of emptying caches is spread across VMs instead of
repeatedly falling on the same one.

The specific caches to empty are mmu_shadow_page_cache and
mmu_shadowed_info_cache, as these caches store whole pages. Emptying
them gives the shrinker more impact than emptying caches of smaller
objects, such as mmu_pte_list_desc_cache and mmu_page_header_cache.

Holding the per-vCPU mutex ensures that a vCPU is never surprised to
find its caches emptied after filling them up for page table
allocations during page fault handling or an MMU load. Because the lock
is per vCPU, the only possible contention is between the MMU shrinker
and the one vCPU it is currently targeting, so contention should be
very low.
Signed-off-by: Vipin Sharma
Suggested-by: David Matlack
Suggested-by: Sean Christopherson
---
 arch/x86/include/asm/kvm_host.h |  6 +++
 arch/x86/kvm/mmu/mmu.c          | 69 +++++++++++++++++++++++++++------
 arch/x86/kvm/mmu/paging_tmpl.h  | 14 ++++---
 include/linux/kvm_host.h        |  1 +
 virt/kvm/kvm_main.c             |  8 +++-
 5 files changed, 81 insertions(+), 17 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index cbfe31bac6cf6..63eaf03111ebb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -811,6 +811,12 @@ struct kvm_vcpu_arch {
 	 */
 	struct kvm_mmu *walk_mmu;
 
+	/*
+	 * Protect caches from being emptied by the MMU shrinker while a vCPU
+	 * may use them for fault handling or loading the MMU. As this is a
+	 * per-vCPU lock, contention can only happen when the MMU shrinker runs.
+	 */
+	struct mutex mmu_memory_cache_lock;
 	struct kvm_mmu_memory_cache mmu_pte_list_desc_cache;
 	struct kvm_mmu_memory_cache mmu_shadow_page_cache;
 	struct kvm_mmu_memory_cache mmu_shadowed_info_cache;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 213e46b55dda2..8e2935347615d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4524,29 +4524,33 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	if (r != RET_PF_INVALID)
 		return r;
 
+	mutex_lock(&vcpu->arch.mmu_memory_cache_lock);
 	r = mmu_topup_memory_caches(vcpu, false);
 	if (r)
-		return r;
+		goto out_mmu_memory_cache_unlock;
 
 	r = kvm_faultin_pfn(vcpu, fault, ACC_ALL);
 	if (r != RET_PF_CONTINUE)
-		return r;
+		goto out_mmu_memory_cache_unlock;
 
 	r = RET_PF_RETRY;
 	write_lock(&vcpu->kvm->mmu_lock);
 
 	if (is_page_fault_stale(vcpu, fault))
-		goto out_unlock;
+		goto out_mmu_unlock;
 
 	r = make_mmu_pages_available(vcpu);
 	if (r)
-		goto out_unlock;
+		goto out_mmu_unlock;
 
 	r = direct_map(vcpu, fault);
 
-out_unlock:
+out_mmu_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
 	kvm_release_pfn_clean(fault->pfn);
+out_mmu_memory_cache_unlock:
+	mutex_unlock(&vcpu->arch.mmu_memory_cache_lock);
+
 	return r;
 }
 
@@ -4617,25 +4621,28 @@ static int kvm_tdp_mmu_page_fault(struct kvm_vcpu *vcpu,
 	if (r != RET_PF_INVALID)
 		return r;
 
+	mutex_lock(&vcpu->arch.mmu_memory_cache_lock);
 	r = mmu_topup_memory_caches(vcpu, false);
 	if (r)
-		return r;
+		goto out_mmu_memory_cache_unlock;
 
 	r = kvm_faultin_pfn(vcpu, fault, ACC_ALL);
 	if (r != RET_PF_CONTINUE)
-		return r;
+		goto out_mmu_memory_cache_unlock;
 
 	r = RET_PF_RETRY;
 	read_lock(&vcpu->kvm->mmu_lock);
 
 	if (is_page_fault_stale(vcpu, fault))
-		goto out_unlock;
+		goto out_mmu_unlock;
 
 	r = kvm_tdp_mmu_map(vcpu, fault);
 
-out_unlock:
+out_mmu_unlock:
 	read_unlock(&vcpu->kvm->mmu_lock);
 	kvm_release_pfn_clean(fault->pfn);
+out_mmu_memory_cache_unlock:
+	mutex_unlock(&vcpu->arch.mmu_memory_cache_lock);
 	return r;
 }
 #endif
@@ -5691,6 +5698,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 {
 	int r;
 
+	mutex_lock(&vcpu->arch.mmu_memory_cache_lock);
 	r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->root_role.direct);
 	if (r)
 		goto out;
@@ -5717,6 +5725,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
 	 */
 	kvm_x86_call(flush_tlb_current)(vcpu);
 out:
+	mutex_unlock(&vcpu->arch.mmu_memory_cache_lock);
 	return r;
 }
 
@@ -6303,6 +6312,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
 	if (!vcpu->arch.mmu_shadow_page_cache.init_value)
 		vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO;
 
+	mutex_init(&vcpu->arch.mmu_memory_cache_lock);
 	vcpu->arch.mmu = &vcpu->arch.root_mmu;
 	vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
 
@@ -6997,13 +7007,50 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen)
 static unsigned long mmu_shrink_scan(struct shrinker *shrink,
 				     struct shrink_control *sc)
 {
-	return SHRINK_STOP;
+	struct kvm *kvm, *next_kvm, *first_kvm = NULL;
+	unsigned long i, freed = 0;
+	struct kvm_vcpu *vcpu;
+
+	mutex_lock(&kvm_lock);
+	list_for_each_entry_safe(kvm, next_kvm, &vm_list, vm_list) {
+		if (!first_kvm)
+			first_kvm = kvm;
+		else if (first_kvm == kvm)
+			break;
+
+		list_move_tail(&kvm->vm_list, &vm_list);
+
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			if (!mutex_trylock(&vcpu->arch.mmu_memory_cache_lock))
+				continue;
+			freed += kvm_mmu_empty_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
+			freed += kvm_mmu_empty_memory_cache(&vcpu->arch.mmu_shadowed_info_cache);
+			mutex_unlock(&vcpu->arch.mmu_memory_cache_lock);
+			if (freed >= sc->nr_to_scan)
+				goto out;
+		}
+	}
+out:
+	mutex_unlock(&kvm_lock);
+	return freed;
 }
 
 static unsigned long mmu_shrink_count(struct shrinker *shrink,
 				      struct shrink_control *sc)
 {
-	return SHRINK_EMPTY;
+	unsigned long i, count = 0;
+	struct kvm_vcpu *vcpu;
+	struct kvm *kvm;
+
+	mutex_lock(&kvm_lock);
+	list_for_each_entry(kvm, &vm_list, vm_list) {
+		kvm_for_each_vcpu(i, vcpu, kvm) {
+			count += READ_ONCE(vcpu->arch.mmu_shadow_page_cache.nobjs);
+			count += READ_ONCE(vcpu->arch.mmu_shadowed_info_cache.nobjs);
+		}
+	}
+	mutex_unlock(&kvm_lock);
+	return !count ? SHRINK_EMPTY : count;
 }
 
 static struct shrinker *mmu_shrinker;
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 405bd7ceee2a3..084a5c532078f 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -809,13 +809,14 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 		return RET_PF_EMULATE;
 	}
 
+	mutex_lock(&vcpu->arch.mmu_memory_cache_lock);
 	r = mmu_topup_memory_caches(vcpu, true);
 	if (r)
-		return r;
+		goto out_mmu_memory_cache_unlock;
 
 	r = kvm_faultin_pfn(vcpu, fault, walker.pte_access);
 	if (r != RET_PF_CONTINUE)
-		return r;
+		goto out_mmu_memory_cache_unlock;
 
 	/*
 	 * Do not change pte_access if the pfn is a mmio page, otherwise
@@ -840,16 +841,19 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault
 	write_lock(&vcpu->kvm->mmu_lock);
 
 	if (is_page_fault_stale(vcpu, fault))
-		goto out_unlock;
+		goto out_mmu_unlock;
 
 	r = make_mmu_pages_available(vcpu);
 	if (r)
-		goto out_unlock;
+		goto out_mmu_unlock;
 	r = FNAME(fetch)(vcpu, fault, &walker);
 
-out_unlock:
+out_mmu_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
 	kvm_release_pfn_clean(fault->pfn);
+out_mmu_memory_cache_unlock:
+	mutex_unlock(&vcpu->arch.mmu_memory_cache_lock);
+
 	return r;
 }
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b23c6d48392f7..288e503f14a0b 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1446,6 +1446,7 @@ void kvm_flush_remote_tlbs_memslot(struct kvm *kvm,
 int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
 int __kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int capacity, int min);
 int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc);
+int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc);
 void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
 void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index cb2b78e92910f..5d89ca218791b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -451,15 +451,21 @@ int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
 	return mc->nobjs;
 }
 
-void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+int kvm_mmu_empty_memory_cache(struct kvm_mmu_memory_cache *mc)
 {
+	int freed = mc->nobjs;
+
 	while (mc->nobjs) {
 		if (mc->kmem_cache)
 			kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
 		else
 			free_page((unsigned long)mc->objects[--mc->nobjs]);
 	}
+	return freed;
+}
 
+void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+{
+	kvm_mmu_empty_memory_cache(mc);
 	kvfree(mc->objects);
 
 	mc->objects = NULL;
-- 
2.47.0.rc0.187.ge670bccf7e-goog