Date: Wed, 18 Dec 2024 11:56:04 -0500
From: Rik van Riel <riel@surriel.com>
To: Andrew Morton
Cc: "Huang, Ying", Chris Li, Ryan Roberts, David Hildenbrand,
 "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, kernel-team@meta.com
Subject: [PATCH] mm: add maybe_lru_add_drain() that only drains when threshold is exceeded
Message-ID: <20241218115604.7e56bedb@fangorn>

The lru_add_drain() call in zap_page_range_single() always takes some
locks, and will drain the buffers even when there is only a single page
pending.

We probably don't need to do that, since we already deal fine with
zap_page_range encountering pages that are still in the buffers of
other CPUs.

On an AMD Milan CPU, the will-it-scale tlb_flush2_threads test with 36
threads (one for each core) improves from 526k to 730k loops per second.

The overhead in this case was on the lruvec locks, which were being
taken just to flush a single page. There may be other spots where this
variant could be appropriate.
Signed-off-by: Rik van Riel <riel@surriel.com>
---
 include/linux/swap.h |  1 +
 mm/memory.c          |  2 +-
 mm/swap.c            | 18 ++++++++++++++++++
 mm/swap_state.c      |  2 +-
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index dd5ac833150d..a2f06317bd4b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -391,6 +391,7 @@ static inline void lru_cache_enable(void)
 }
 
 extern void lru_cache_disable(void);
+extern void maybe_lru_add_drain(void);
 extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_cpu_zone(struct zone *zone);
diff --git a/mm/memory.c b/mm/memory.c
index 2635f7bceab5..1767c65b93ad 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1919,7 +1919,7 @@ void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
 	struct mmu_notifier_range range;
 	struct mmu_gather tlb;
 
-	lru_add_drain();
+	maybe_lru_add_drain();
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
 				address, end);
 	hugetlb_zap_begin(vma, &range.start, &range.end);
diff --git a/mm/swap.c b/mm/swap.c
index 9caf6b017cf0..001664a652ff 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -777,6 +777,24 @@ void lru_add_drain(void)
 	mlock_drain_local();
 }
 
+static bool should_lru_add_drain(void)
+{
+	struct cpu_fbatches *fbatches = this_cpu_ptr(&cpu_fbatches);
+	int pending = folio_batch_count(&fbatches->lru_add);
+	pending += folio_batch_count(&fbatches->lru_deactivate);
+	pending += folio_batch_count(&fbatches->lru_deactivate_file);
+	pending += folio_batch_count(&fbatches->lru_lazyfree);
+
+	/* Don't bother draining unless we have several pages pending. */
+	return pending > SWAP_CLUSTER_MAX;
+}
+
+void maybe_lru_add_drain(void)
+{
+	if (should_lru_add_drain())
+		lru_add_drain();
+}
+
 /*
  * It's called from per-cpu workqueue context in SMP case so
  * lru_add_drain_cpu and invalidate_bh_lrus_cpu should run on
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 3a0cf965f32b..1ae4cd7b041e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -317,7 +317,7 @@ void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
 	struct folio_batch folios;
 	unsigned int refs[PAGEVEC_SIZE];
 
-	lru_add_drain();
+	maybe_lru_add_drain();
 	folio_batch_init(&folios);
 	for (int i = 0; i < nr; i++) {
 		struct folio *folio = page_folio(encoded_page_ptr(pages[i]));
-- 
2.43.5
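
For readers who want to see the threshold-gated drain pattern in
isolation, below is a minimal standalone userspace sketch of the idea:
an expensive, lock-taking flush is skipped unless a cheap per-CPU
counter check says enough work has accumulated. It is only an
illustration; the names (pending, DRAIN_THRESHOLD, drain_all,
maybe_drain) are hypothetical stand-ins for the per-CPU folio batches,
SWAP_CLUSTER_MAX, lru_add_drain() and maybe_lru_add_drain() in the
patch, and this is not kernel code.

	#include <stdio.h>

	#define DRAIN_THRESHOLD 32	/* plays the role of SWAP_CLUSTER_MAX */

	static int pending;		/* stands in for the per-CPU folio batches */

	static void drain_all(void)	/* stands in for lru_add_drain() */
	{
		printf("draining %d pending pages (lock taken)\n", pending);
		pending = 0;
	}

	static void maybe_drain(void)	/* mirrors maybe_lru_add_drain() */
	{
		if (pending > DRAIN_THRESHOLD)
			drain_all();
	}

	int main(void)
	{
		pending = 1;			/* a single pending page: no drain, no lock */
		maybe_drain();

		pending = DRAIN_THRESHOLD + 1;	/* a full batch: drain as before */
		maybe_drain();
		return 0;
	}

The design point the sketch mirrors is that callers which can already
tolerate stale pages in other CPUs' batches only pay for the drain, and
its locking, once enough pages have piled up locally.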