From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Chris Li, Matthew Wilcox, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 1/3] mm, lru_gen: batch update counters on aging
Date: Fri, 12 Jan 2024 02:33:19 +0800
Message-ID: <20240111183321.19984-2-ryncsn@gmail.com>
In-Reply-To: <20240111183321.19984-1-ryncsn@gmail.com>
References: <20240111183321.19984-1-ryncsn@gmail.com>

When lru_gen is aging, it updates mm counters page by page, which causes
higher overhead if aging happens frequently or a lot of pages in one
generation are being moved. Optimize this by updating the counters in
batches.

Although most __mod_*_state helpers have their own caches, the overhead
is still observable.

Tested in a 4G memcg on an EPYC 7K62 with:

  memcached -u nobody -m 16384 -s /tmp/memcached.socket \
    -a 0766 -t 16 -B binary &

  memtier_benchmark -S /tmp/memcached.socket \
    -P memcache_binary -n allkeys \
    --key-minimum=1 --key-maximum=16000000 -d 1024 \
    --ratio=1:0 --key-pattern=P:P -c 2 -t 16 --pipeline 8 -x 6

Average result of 18 test runs:

Before:           44017.78 Ops/sec
After:            44687.08 Ops/sec (+1.5%)

Signed-off-by: Kairui Song
---
 mm/vmscan.c | 64 +++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 55 insertions(+), 9 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4f9c854ce6cc..185d53607c7e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3113,9 +3113,47 @@ static int folio_update_gen(struct folio *folio, int gen)
 	return ((old_flags & LRU_GEN_MASK) >> LRU_GEN_PGOFF) - 1;
 }
 
+/*
+ * Update LRU gen in batch for each lru_gen LRU list. The batch is limited to
+ * one gen / type / zone level LRU. The batch is applied after finishing or
+ * aborting the scan of one LRU list.
+ */
+struct gen_update_batch {
+	int delta[MAX_NR_GENS];
+};
+
+static void lru_gen_update_batch(struct lruvec *lruvec, int type, int zone,
+				 struct gen_update_batch *batch)
+{
+	int gen;
+	int promoted = 0;
+	struct lru_gen_folio *lrugen = &lruvec->lrugen;
+	enum lru_list lru = type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON;
+
+	for (gen = 0; gen < MAX_NR_GENS; gen++) {
+		int delta = batch->delta[gen];
+
+		if (!delta)
+			continue;
+
+		WRITE_ONCE(lrugen->nr_pages[gen][type][zone],
+			   lrugen->nr_pages[gen][type][zone] + delta);
+
+		if (lru_gen_is_active(lruvec, gen))
+			promoted += delta;
+	}
+
+	if (promoted) {
+		__update_lru_size(lruvec, lru, zone, -promoted);
+		__update_lru_size(lruvec, lru + LRU_ACTIVE, zone, promoted);
+	}
+}
+
 /* protect pages accessed multiple times through file descriptors */
-static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclaiming)
+static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio,
+			 bool reclaiming, struct gen_update_batch *batch)
 {
+	int delta = folio_nr_pages(folio);
 	int type = folio_is_file_lru(folio);
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
@@ -3138,7 +3176,8 @@ static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool reclai
 		new_flags |= BIT(PG_reclaim);
 	} while (!try_cmpxchg(&folio->flags, &old_flags, new_flags));
 
-	lru_gen_update_size(lruvec, folio, old_gen, new_gen);
+	batch->delta[old_gen] -= delta;
+	batch->delta[new_gen] += delta;
 
 	return new_gen;
 }
@@ -3672,6 +3711,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 {
 	int zone;
 	int remaining = MAX_LRU_BATCH;
+	struct gen_update_batch batch = { };
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
 
@@ -3690,12 +3730,15 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 			VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) != type, folio);
 			VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
 
-			new_gen = folio_inc_gen(lruvec, folio, false);
+			new_gen = folio_inc_gen(lruvec, folio, false, &batch);
 			list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]);
 
-			if (!--remaining)
+			if (!--remaining) {
+				lru_gen_update_batch(lruvec, type, zone, &batch);
 				return false;
+			}
 		}
+		lru_gen_update_batch(lruvec, type, zone, &batch);
 	}
 done:
 	reset_ctrl_pos(lruvec, type, true);
@@ -4215,7 +4258,7 @@ void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid)
 ******************************************************************************/
 
 static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_control *sc,
-		       int tier_idx)
+		       int tier_idx, struct gen_update_batch *batch)
 {
 	bool success;
 	int gen = folio_lru_gen(folio);
@@ -4257,7 +4300,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	if (tier > tier_idx || refs == BIT(LRU_REFS_WIDTH)) {
 		int hist = lru_hist_from_seq(lrugen->min_seq[type]);
 
-		gen = folio_inc_gen(lruvec, folio, false);
+		gen = folio_inc_gen(lruvec, folio, false, batch);
 		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
 
 		WRITE_ONCE(lrugen->protected[hist][type][tier - 1],
@@ -4267,7 +4310,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 
 	/* ineligible */
 	if (zone > sc->reclaim_idx || skip_cma(folio, sc)) {
-		gen = folio_inc_gen(lruvec, folio, false);
+		gen = folio_inc_gen(lruvec, folio, false, batch);
 		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;
 	}
@@ -4275,7 +4318,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	/* waiting for writeback */
 	if (folio_test_locked(folio) || folio_test_writeback(folio) ||
 	    (type == LRU_GEN_FILE && folio_test_dirty(folio))) {
-		gen = folio_inc_gen(lruvec, folio, true);
+		gen = folio_inc_gen(lruvec, folio, true, batch);
 		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;
 	}
@@ -4341,6 +4384,7 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 	for (i = MAX_NR_ZONES; i > 0; i--) {
 		LIST_HEAD(moved);
 		int skipped_zone = 0;
+		struct gen_update_batch batch = { };
 		int zone = (sc->reclaim_idx + i) % MAX_NR_ZONES;
 		struct list_head *head = &lrugen->folios[gen][type][zone];
 
@@ -4355,7 +4399,7 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 
 			scanned += delta;
 
-			if (sort_folio(lruvec, folio, sc, tier))
+			if (sort_folio(lruvec, folio, sc, tier, &batch))
 				sorted += delta;
 			else if (isolate_folio(lruvec, folio, sc)) {
 				list_add(&folio->lru, list);
@@ -4375,6 +4419,8 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 			skipped += skipped_zone;
 		}
 
+		lru_gen_update_batch(lruvec, type, zone, &batch);
+
 		if (!remaining || isolated >= MIN_LRU_BATCH)
 			break;
 	}
--
2.43.0
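To make the effect of the batching concrete: while one gen/type/zone list is
being scanned, folio_inc_gen() above only accumulates per-generation deltas in
gen_update_batch, and the shared counters are written once when the scan
finishes or aborts. Below is a minimal, self-contained userspace sketch of
that pattern; nr_pages[], batch_move() and batch_apply() are illustrative
stand-ins, not the kernel's structures or API.

#include <stdio.h>

#define MAX_NR_GENS 4

/* stand-in for lrugen->nr_pages[gen][type][zone] */
static long nr_pages[MAX_NR_GENS];

struct gen_update_batch {
        long delta[MAX_NR_GENS];        /* pending, not yet published, changes */
};

/* Record one page moving from old_gen to new_gen; no shared counter is touched. */
static void batch_move(struct gen_update_batch *batch, int old_gen, int new_gen, long nr)
{
        batch->delta[old_gen] -= nr;
        batch->delta[new_gen] += nr;
}

/* Publish all pending deltas with at most one write per generation. */
static void batch_apply(struct gen_update_batch *batch)
{
        for (int gen = 0; gen < MAX_NR_GENS; gen++) {
                if (batch->delta[gen]) {
                        nr_pages[gen] += batch->delta[gen];
                        batch->delta[gen] = 0;
                }
        }
}

int main(void)
{
        struct gen_update_batch batch = { { 0 } };

        nr_pages[0] = 1000;

        /* "Age" 1000 pages: 1000 cheap delta updates instead of 1000 counter writes... */
        for (int i = 0; i < 1000; i++)
                batch_move(&batch, 0, 1, 1);

        /* ...then one publish per generation after the walk, as lru_gen_update_batch() does. */
        batch_apply(&batch);

        printf("gen0=%ld gen1=%ld\n", nr_pages[0], nr_pages[1]);
        return 0;
}

The saving comes purely from collapsing many read-modify-write cycles on
shared counters into one per generation, which is why the gain grows when
aging happens frequently.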
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Chris Li, Matthew Wilcox, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 2/3] mm, lru_gen: move pages in bulk when aging
Date: Fri, 12 Jan 2024 02:33:20 +0800
Message-ID: <20240111183321.19984-3-ryncsn@gmail.com>
In-Reply-To: <20240111183321.19984-1-ryncsn@gmail.com>
References: <20240111183321.19984-1-ryncsn@gmail.com>

Another source of aging overhead is page moving. In most cases, pages
are moved to the same generation after folio_inc_gen() is called,
especially the protected pages, so it's better to move them in bulk.

This also has a good effect on LRU order. Currently, when MGLRU ages,
it walks the LRU backwards and the protected pages are moved to the
tail of the newer generation one by one, which actually reverses the
order of pages in the LRU. Moving them in batches helps keep their
order, though only within a small scope due to the scan limit of
MAX_LRU_BATCH pages.

After this commit, we can see a slight performance gain (with
CONFIG_DEBUG_LIST=n):

Test 1: memcached in a 4G memcg on an EPYC 7K62 with:

  memcached -u nobody -m 16384 -s /tmp/memcached.socket \
    -a 0766 -t 16 -B binary &

  memtier_benchmark -S /tmp/memcached.socket \
    -P memcache_binary -n allkeys \
    --key-minimum=1 --key-maximum=16000000 -d 1024 \
    --ratio=1:0 --key-pattern=P:P -c 2 -t 16 --pipeline 8 -x 6

Average result of 18 test runs:

Before:           44017.78 Ops/sec
After patch 1-2:  44810.01 Ops/sec (+1.8%)

Test 2: MySQL in a 6G memcg with:

  echo 'set GLOBAL innodb_buffer_pool_size=16106127360;' | \
    mysql -u USER -h localhost --password=PASS

  sysbench /usr/share/sysbench/oltp_read_only.lua \
    --mysql-user=USER --mysql-password=PASS --mysql-db=sb \
    --tables=48 --table-size=2000000 --threads=16 --time=1800 \
    --report-interval=5 run

QPS of 6 test runs:

Before:
134126.83 134352.13 134045.19 133985.12 134787.47 134554.43

After patch 1-2 (+0.4%):
134913.38 134695.35 134891.31 134662.66 135090.32 134901.14

Only about 10% of CPU time is spent in kernel space for the MySQL test,
so the improvement is very trivial. There could be a higher performance
gain when pages are protected more aggressively.

Signed-off-by: Kairui Song
---
 mm/vmscan.c | 84 ++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 71 insertions(+), 13 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 185d53607c7e..57b6549946c3 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3120,9 +3120,46 @@ static int folio_update_gen(struct folio *folio, int gen)
  */
 struct gen_update_batch {
 	int delta[MAX_NR_GENS];
+	struct folio *head, *tail;
 };
 
-static void lru_gen_update_batch(struct lruvec *lruvec, int type, int zone,
+static inline void lru_gen_inc_bulk_finish(struct lru_gen_folio *lrugen,
+					   int bulk_gen, bool type, int zone,
+					   struct gen_update_batch *batch)
+{
+	if (!batch->head)
+		return;
+
+	list_bulk_move_tail(&lrugen->folios[bulk_gen][type][zone],
+			    &batch->head->lru,
+			    &batch->tail->lru);
+
+	batch->head = NULL;
+}
+
+/*
+ * When aging, protected pages will go to the tail of the same higher
+ * gen, so they can be moved in batches. Besides reducing overhead, this
+ * also avoids changing their LRU order in a small scope.
+ */
+static inline void lru_gen_try_inc_bulk(struct lru_gen_folio *lrugen, struct folio *folio,
+					int bulk_gen, int gen, bool type, int zone,
+					struct gen_update_batch *batch)
+{
+	/*
+	 * If the folio is not moving to bulk_gen, it raced with promotion,
+	 * so it needs to go to the head of another LRU.
+	 */
+	if (bulk_gen != gen)
+		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
+
+	if (!batch->head)
+		batch->tail = folio;
+
+	batch->head = folio;
+}
+
+static void lru_gen_update_batch(struct lruvec *lruvec, int bulk_gen, int type, int zone,
 				 struct gen_update_batch *batch)
 {
 	int gen;
@@ -3130,6 +3167,8 @@ static void lru_gen_update_batch(struct lruvec *lruvec, int type, int zone,
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	enum lru_list lru = type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON;
 
+	lru_gen_inc_bulk_finish(lrugen, bulk_gen, type, zone, batch);
+
 	for (gen = 0; gen < MAX_NR_GENS; gen++) {
 		int delta = batch->delta[gen];
 
@@ -3714,6 +3753,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 	struct gen_update_batch batch = { };
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	int new_gen, old_gen = lru_gen_from_seq(lrugen->min_seq[type]);
+	int bulk_gen = (old_gen + 1) % MAX_NR_GENS;
 
 	if (type == LRU_GEN_ANON && !can_swap)
 		goto done;
@@ -3721,24 +3761,33 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 	/* prevent cold/hot inversion if force_scan is true */
 	for (zone = 0; zone < MAX_NR_ZONES; zone++) {
 		struct list_head *head = &lrugen->folios[old_gen][type][zone];
+		struct folio *prev = NULL;
 
-		while (!list_empty(head)) {
-			struct folio *folio = lru_to_folio(head);
+		if (!list_empty(head))
+			prev = lru_to_folio(head);
 
+		while (prev) {
+			struct folio *folio = prev;
 			VM_WARN_ON_ONCE_FOLIO(folio_test_unevictable(folio), folio);
 			VM_WARN_ON_ONCE_FOLIO(folio_test_active(folio), folio);
 			VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) != type, folio);
 			VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
 
+			if (unlikely(list_is_first(&folio->lru, head)))
+				prev = NULL;
+			else
+				prev = lru_to_folio(&folio->lru);
+
 			new_gen = folio_inc_gen(lruvec, folio, false, &batch);
-			list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]);
+			lru_gen_try_inc_bulk(lrugen, folio, bulk_gen, new_gen, type, zone, &batch);
 
 			if (!--remaining) {
-				lru_gen_update_batch(lruvec, type, zone, &batch);
+				lru_gen_update_batch(lruvec, bulk_gen, type, zone, &batch);
 				return false;
 			}
 		}
-		lru_gen_update_batch(lruvec, type, zone, &batch);
+
+		lru_gen_update_batch(lruvec, bulk_gen, type, zone, &batch);
 	}
 done:
 	reset_ctrl_pos(lruvec, type, true);
@@ -4258,7 +4307,7 @@ void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid)
 ******************************************************************************/
 
 static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_control *sc,
-		       int tier_idx, struct gen_update_batch *batch)
+		       int tier_idx, int bulk_gen, struct gen_update_batch *batch)
 {
 	bool success;
 	int gen = folio_lru_gen(folio);
@@ -4301,7 +4350,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 		int hist = lru_hist_from_seq(lrugen->min_seq[type]);
 
 		gen = folio_inc_gen(lruvec, folio, false, batch);
-		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
+		lru_gen_try_inc_bulk(lrugen, folio, bulk_gen, gen, type, zone, batch);
 
 		WRITE_ONCE(lrugen->protected[hist][type][tier - 1],
 			   lrugen->protected[hist][type][tier - 1] + delta);
@@ -4311,7 +4360,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	/* ineligible */
 	if (zone > sc->reclaim_idx || skip_cma(folio, sc)) {
 		gen = folio_inc_gen(lruvec, folio, false, batch);
-		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
+		lru_gen_try_inc_bulk(lrugen, folio, bulk_gen, gen, type, zone, batch);
 		return true;
 	}
 
@@ -4385,11 +4434,16 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 		LIST_HEAD(moved);
 		int skipped_zone = 0;
 		struct gen_update_batch batch = { };
+		int bulk_gen = (gen + 1) % MAX_NR_GENS;
 		int zone = (sc->reclaim_idx + i) % MAX_NR_ZONES;
 		struct list_head *head = &lrugen->folios[gen][type][zone];
+		struct folio *prev = NULL;
 
-		while (!list_empty(head)) {
-			struct folio *folio = lru_to_folio(head);
+		if (!list_empty(head))
+			prev = lru_to_folio(head);
+
+		while (prev) {
+			struct folio *folio = prev;
 			int delta = folio_nr_pages(folio);
 
 			VM_WARN_ON_ONCE_FOLIO(folio_test_unevictable(folio), folio);
@@ -4398,8 +4452,12 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 			VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
 
 			scanned += delta;
+			if (unlikely(list_is_first(&folio->lru, head)))
+				prev = NULL;
+			else
+				prev = lru_to_folio(&folio->lru);
 
-			if (sort_folio(lruvec, folio, sc, tier, &batch))
+			if (sort_folio(lruvec, folio, sc, tier, bulk_gen, &batch))
 				sorted += delta;
 			else if (isolate_folio(lruvec, folio, sc)) {
 				list_add(&folio->lru, list);
@@ -4419,7 +4477,7 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 			skipped += skipped_zone;
 		}
 
-		lru_gen_update_batch(lruvec, type, zone, &batch);
+		lru_gen_update_batch(lruvec, bulk_gen, type, zone, &batch);
 
 		if (!remaining || isolated >= MIN_LRU_BATCH)
 			break;
--
2.43.0
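The bulk move above relies on the observation that consecutively scanned
folios almost always land at the tail of the same target generation, so only
a [head, tail] range has to be remembered during the walk and the whole range
can be spliced in one list_bulk_move_tail() call, preserving its internal
order. Below is a minimal userspace sketch of such a splice; the node type
and bulk_move_tail() are simplified stand-ins for the kernel's struct
list_head and list_bulk_move_tail(), not the kernel code itself.

#include <stdio.h>

struct node {
        struct node *prev, *next;
        int id;
};

static void list_init(struct node *head)
{
        head->prev = head->next = head;
}

static void list_add_tail(struct node *head, struct node *n)
{
        n->prev = head->prev;
        n->next = head;
        head->prev->next = n;
        head->prev = n;
}

/* Unlink the range [first, last] and splice it at the tail of @head in one go. */
static void bulk_move_tail(struct node *head, struct node *first, struct node *last)
{
        /* detach the whole range from its current list */
        first->prev->next = last->next;
        last->next->prev = first->prev;

        /* splice it before the head sentinel, i.e. at the tail, order preserved */
        first->prev = head->prev;
        last->next = head;
        head->prev->next = first;
        head->prev = last;
}

int main(void)
{
        struct node old_gen, new_gen, pages[5];

        list_init(&old_gen);
        list_init(&new_gen);
        for (int i = 0; i < 5; i++) {
                pages[i].id = i;
                list_add_tail(&old_gen, &pages[i]);
        }

        /* Move pages 1..3 to new_gen with one splice instead of three list moves. */
        bulk_move_tail(&new_gen, &pages[1], &pages[3]);

        for (struct node *n = new_gen.next; n != &new_gen; n = n->next)
                printf("page %d\n", n->id);     /* prints 1 2 3: order preserved */
        return 0;
}

Splicing keeps nodes 1, 2 and 3 in their original relative order; moving them
to the tail one by one while walking the source list backwards would have
reversed them, which is the LRU-order effect the commit message describes.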
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Yu Zhao, Chris Li, Matthew Wilcox, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 3/3] mm, lru_gen: try to prefetch next page when scanning LRU
Date: Fri, 12 Jan 2024 02:33:21 +0800
Message-ID: <20240111183321.19984-4-ryncsn@gmail.com>
In-Reply-To: <20240111183321.19984-1-ryncsn@gmail.com>
References: <20240111183321.19984-1-ryncsn@gmail.com>

Prefetch for the inactive/active LRU has long existed; apply the same
optimization to MGLRU.

Ramdisk-based swap test in a 4G memcg on an EPYC 7K62 with:

  memcached -u nobody -m 16384 -s /tmp/memcached.socket \
    -a 0766 -t 16 -B binary &

  memtier_benchmark -S /tmp/memcached.socket \
    -P memcache_binary -n allkeys \
    --key-minimum=1 --key-maximum=16000000 -d 1024 \
    --ratio=1:0 --key-pattern=P:P -c 2 -t 16 --pipeline 8 -x 6

Average result of 18 test runs:

Before:           44017.78 Ops/sec
After patch 1-3:  44890.50 Ops/sec (+1.8%)

Ramdisk fio test in a 4G memcg on an EPYC 7K62 with:

  fio -name=mglru --numjobs=16 --directory=/mnt --size=960m \
    --buffered=1 --ioengine=io_uring --iodepth=128 \
    --iodepth_batch_submit=32 --iodepth_batch_complete=32 \
    --rw=randread --random_distribution=zipf:0.5 --norandommap \
    --time_based --ramp_time=1m --runtime=5m --group_reporting

Before this patch:
  bw (  MiB/s): min= 7644, max= 9293, per=100.00%, avg=8777.77, stdev=16.59, samples=9568
  iops        : min=1956954, max=2379053, avg=2247108.51, stdev=4247.22, samples=9568

After this patch (+7.5%):
  bw (  MiB/s): min= 8462, max= 9902, per=100.00%, avg=9444.77, stdev=16.43, samples=9568
  iops        : min=2166433, max=2535135, avg=2417858.23, stdev=4205.15, samples=9568

Prefetch is highly dependent on timing and architecture, so it may only
help in certain cases; some extra tests showed at least no regression
here for the series:

Ramdisk memtier test as above in an 8G memcg on an Intel i7-9700:

  memtier_benchmark -S /tmp/memcached.socket \
    -P memcache_binary -n allkeys --key-minimum=1 \
    --key-maximum=36000000 --key-pattern=P:P -c 1 -t 12 \
    --ratio 1:0 --pipeline 8 -d 1024 -x 4

Average result of 12 test runs:

Before:           61241.96 Ops/sec
After patch 1-3:  61268.53 Ops/sec (+0.0%)

Signed-off-by: Kairui Song
---
 mm/vmscan.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 57b6549946c3..4ef83db40adb 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3773,10 +3773,12 @@ static bool inc_min_seq(struct lruvec *lruvec, int type, bool can_swap)
 			VM_WARN_ON_ONCE_FOLIO(folio_is_file_lru(folio) != type, folio);
 			VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
 
-			if (unlikely(list_is_first(&folio->lru, head)))
+			if (unlikely(list_is_first(&folio->lru, head))) {
 				prev = NULL;
-			else
+			} else {
 				prev = lru_to_folio(&folio->lru);
+				prefetchw(&prev->flags);
+			}
 
 			new_gen = folio_inc_gen(lruvec, folio, false, &batch);
 			lru_gen_try_inc_bulk(lrugen, folio, bulk_gen, new_gen, type, zone, &batch);
@@ -4452,10 +4454,12 @@ static int scan_folios(struct lruvec *lruvec, struct scan_control *sc,
 			VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) != zone, folio);
 
 			scanned += delta;
-			if (unlikely(list_is_first(&folio->lru, head)))
+			if (unlikely(list_is_first(&folio->lru, head))) {
 				prev = NULL;
-			} else {
 				prev = lru_to_folio(&folio->lru);
+				prefetchw(&prev->flags);
+			}
 
 			if (sort_folio(lruvec, folio, sc, tier, bulk_gen, &batch))
 				sorted += delta;
--
2.43.0
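The prefetch above issues a prefetch-for-write on the next folio's flags word
while the current folio is still being sorted, so the flag update in
folio_inc_gen() is less likely to stall on a cache miss. Below is a minimal
userspace sketch of the same access pattern; __builtin_prefetch() is the
GCC/Clang builtin used here as a stand-in for the kernel's prefetchw(), and
the prev-linked chain stands in for the backwards LRU walk.

#include <stdio.h>
#include <stdlib.h>

struct folio_stub {
        struct folio_stub *prev;
        unsigned long flags;
};

int main(void)
{
        enum { N = 1024 };
        struct folio_stub *folios = calloc(N, sizeof(*folios));
        unsigned long sum = 0;

        if (!folios)
                return 1;

        /* Build a simple prev-linked chain, newest to oldest. */
        for (int i = 1; i < N; i++) {
                folios[i].prev = &folios[i - 1];
                folios[i].flags = i;
        }

        /* Walk from the tail; prefetch the next element before touching this one. */
        for (struct folio_stub *f = &folios[N - 1]; f; f = f->prev) {
                if (f->prev)
                        __builtin_prefetch(&f->prev->flags, 1);     /* 1 = prepare for write */
                sum += f->flags;        /* stand-in for the per-folio sorting work */
        }

        printf("sum=%lu\n", sum);
        free(folios);
        return 0;
}

As the commit message notes, whether this helps is highly dependent on timing
and architecture; the sketch only demonstrates the access pattern, not a
guaranteed win.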