Date: Wed, 18 Dec 2024 11:56:04 -0500
From: Rik van Riel <riel@surriel.com>
To: Andrew Morton
Cc: "Huang, Ying", Chris Li, Ryan Roberts, David Hildenbrand,
 "Matthew Wilcox (Oracle)", linux-kernel@vger.kernel.org,
 linux-mm@kvack.org, kernel-team@meta.com
Subject: [PATCH] mm: add maybe_lru_add_drain() that only drains when threshold is exceeded
Message-ID: <20241218115604.7e56bedb@fangorn>

The lru_add_drain() call in zap_page_range_single() always takes some
locks, and will drain the buffers even when there is only a single page
pending.

We probably don't need to do that, since we already deal fine with
zap_page_range encountering pages that are still in the buffers of
other CPUs.

On an AMD Milan CPU, the will-it-scale tlb_flush2_threads test with 36
threads (one for each core) improves from 526k to 730k loops per second.

The overhead in this case was on the lruvec locks, which were being
taken just to flush a single page. There may be other spots where this
variant could be appropriate.
Signed-off-by: Rik van Riel <riel@surriel.com>
---
 include/linux/swap.h |  1 +
 mm/memory.c          |  2 +-
 mm/swap.c            | 18 ++++++++++++++++++
 mm/swap_state.c      |  2 +-
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index dd5ac833150d..a2f06317bd4b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -391,6 +391,7 @@ static inline void lru_cache_enable(void)
 }
 
 extern void lru_cache_disable(void);
+extern void maybe_lru_add_drain(void);
 extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_cpu_zone(struct zone *zone);
diff --git a/mm/memory.c b/mm/memory.c
index 2635f7bceab5..1767c65b93ad 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1919,7 +1919,7 @@ void zap_page_range_single(struct vm_area_struct *vma, unsigned long address,
 	struct mmu_notifier_range range;
 	struct mmu_gather tlb;
 
-	lru_add_drain();
+	maybe_lru_add_drain();
 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
 				address, end);
 	hugetlb_zap_begin(vma, &range.start, &range.end);
diff --git a/mm/swap.c b/mm/swap.c
index 9caf6b017cf0..001664a652ff 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -777,6 +777,24 @@ void lru_add_drain(void)
 	mlock_drain_local();
 }
 
+static bool should_lru_add_drain(void)
+{
+	struct cpu_fbatches *fbatches = this_cpu_ptr(&cpu_fbatches);
+	int pending = folio_batch_count(&fbatches->lru_add);
+	pending += folio_batch_count(&fbatches->lru_deactivate);
+	pending += folio_batch_count(&fbatches->lru_deactivate_file);
+	pending += folio_batch_count(&fbatches->lru_lazyfree);
+
+	/* Don't bother draining unless we have several pages pending. */
+	return pending > SWAP_CLUSTER_MAX;
+}
+
+void maybe_lru_add_drain(void)
+{
+	if (should_lru_add_drain())
+		lru_add_drain();
+}
+
 /*
  * It's called from per-cpu workqueue context in SMP case so
  * lru_add_drain_cpu and invalidate_bh_lrus_cpu should run on
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 3a0cf965f32b..1ae4cd7b041e 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -317,7 +317,7 @@ void free_pages_and_swap_cache(struct encoded_page **pages, int nr)
 	struct folio_batch folios;
 	unsigned int refs[PAGEVEC_SIZE];
 
-	lru_add_drain();
+	maybe_lru_add_drain();
 	folio_batch_init(&folios);
 	for (int i = 0; i < nr; i++) {
 		struct folio *folio = page_folio(encoded_page_ptr(pages[i]));
-- 
2.43.5
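
For readers who want to see the threshold-gated drain pattern in
isolation, below is a minimal standalone userspace sketch of the idea:
an expensive, lock-taking flush is skipped unless a cheap per-CPU
counter check says enough work has accumulated. It is only an
illustration; the names (pending, DRAIN_THRESHOLD, drain_all,
maybe_drain) are hypothetical stand-ins for the per-CPU folio batches,
SWAP_CLUSTER_MAX, lru_add_drain() and maybe_lru_add_drain() in the
patch, and this is not kernel code.

	#include <stdio.h>

	#define DRAIN_THRESHOLD 32	/* plays the role of SWAP_CLUSTER_MAX */

	static int pending;		/* stands in for the per-CPU folio batches */

	static void drain_all(void)	/* stands in for lru_add_drain() */
	{
		printf("draining %d pending pages (lock taken)\n", pending);
		pending = 0;
	}

	static void maybe_drain(void)	/* mirrors maybe_lru_add_drain() */
	{
		if (pending > DRAIN_THRESHOLD)
			drain_all();
	}

	int main(void)
	{
		pending = 1;			/* a single pending page: no drain, no lock */
		maybe_drain();

		pending = DRAIN_THRESHOLD + 1;	/* a full batch: drain as before */
		maybe_drain();
		return 0;
	}

The design point the sketch mirrors is that callers which can already
tolerate stale pages in other CPUs' batches only pay for the drain, and
its locking, once enough pages have piled up locally.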