From nobody Wed Sep 17 06:42:57 2025
Return-Path: <linux-kernel-owner@vger.kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 6F2DDC4332F
	for <linux-kernel@archiver.kernel.org>; Thu, 22 Dec 2022 04:20:20 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S235014AbiLVEUR (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 21 Dec 2022 23:20:17 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45138 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S235042AbiLVETr (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 21 Dec 2022 23:19:47 -0500
Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com
 [IPv6:2607:f8b0:4864:20::b4a])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4EF40240BD
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:42 -0800 (PST)
Received: by mail-yb1-xb4a.google.com with SMTP id
 g9-20020a25bdc9000000b0073727a20239so694802ybk.4
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:42 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=N4TcA/s/8t2bWWYZiBbVtrtPPkaRo7qGxumdUIvl0ek=;
        b=bG75aKyCcpU6cfyYWl1zzl+PIHqX6D0BG6nxb9rq9djraywRphLGdqUWMK51qHsbkf
         RveqneQH3WAqaehOnRBvR2w5bCgf5mzxRado8Oh6VSctqdjsCEFrOOexrzSXs2qvgtFT
         kHnNKxLUvsQhwuzNWinlPQpfNWtkzLBG6zMQD9e1a1HYzJ0FaudoTt5fpMoXJZweb2/r
         +HNQI//INOZLvHI9743lEW9/m2zl5FHTzKUCdO9JVLWtJs8dJsrQNHLKGZEIIalz/7Jy
         MOQf4F4jjkErPcUB6fC56tXt6SzPyyxDgfg/2YBhz/zsqeAgAFPSOYgvaHX5Pd0F4KjU
         bLOw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=N4TcA/s/8t2bWWYZiBbVtrtPPkaRo7qGxumdUIvl0ek=;
        b=5Orl9l20Y7wUl+3gdnhGtgGN66jLfEyrwcxDP89fralQFeV22ZJP8OUonLGHaDBQ7T
         Jz2SbVzw874a11jcmDY5PG+rD+uT/IQFCYwCdOIEh3A9mVxAmYZaUIzQnBNfNb2uslGm
         YxGRJS76mXwA4uwna69llIFTQMBGILgIHB6m4se9CyiXMih6Qg5Idd6uGcsL4T5Nb4nG
         n2fJaWbaY8CAOgrz6BYZitPCMUnkQ3GEUrCnaNftOrnEBo1Yfvk58HskHtgP/izkZO0K
         J642YE2dL+mMptbGE9hxUb8OgyTSsP/BMNG0KMTIblDOyFPMLf8z/fsZ1pa4kkupFVNx
         sIFA==
X-Gm-Message-State: AFqh2kq5PZP9JwD+dChaiSFouPvP0aEvQHf+jr32xs54H7ORKlAINic2
        KRyaw8RrrOUc+Cx9mNJlAos7+//kZXQ=
X-Google-Smtp-Source: 
 AMrXdXvFsC2yaW02qCDMqrDqFDN0SJMRekr525L1lxoDa69bWaQeswqLxQLsHlKx5usxhH753zJinh3PyRI=
X-Received: from yuzhao.bld.corp.google.com
 ([2620:15c:183:200:a463:5f7b:440e:5c77])
 (user=yuzhao job=sendgmr) by 2002:a0d:db8d:0:b0:329:88ec:ba20 with SMTP id
 d135-20020a0ddb8d000000b0032988ecba20mr499754ywe.492.1671682781586; Wed, 21
 Dec 2022 20:19:41 -0800 (PST)
Date: Wed, 21 Dec 2022 21:18:59 -0700
In-Reply-To: <20221222041905.2431096-1-yuzhao@google.com>
Message-Id: <20221222041905.2431096-2-yuzhao@google.com>
Mime-Version: 1.0
References: <20221222041905.2431096-1-yuzhao@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Subject: [PATCH mm-unstable v3 1/8] mm: multi-gen LRU: rename lru_gen_struct
 to lru_gen_folio
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
        Jonathan Corbet <corbet@lwn.net>,
        Michael Larabel <michael@michaellarabel.com>,
        Michal Hocko <mhocko@kernel.org>,
        Mike Rapoport <rppt@kernel.org>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Suren Baghdasaryan <surenb@google.com>, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org, linux-mm@google.com,
        Yu Zhao <yuzhao@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Rename lru_gen_struct to lru_gen_folio. The new name is more distinct
from the coming lru_gen_memcg.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 include/linux/mm_inline.h |  4 ++--
 include/linux/mmzone.h    |  6 +++---
 mm/vmscan.c               | 34 +++++++++++++++++-----------------
 mm/workingset.c           |  4 ++--
 4 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index e8ed225d8f7c..f63968bd7de5 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -178,7 +178,7 @@ static inline void lru_gen_update_size(struct lruvec *l=
ruvec, struct folio *foli
 	int zone =3D folio_zonenum(folio);
 	int delta =3D folio_nr_pages(folio);
 	enum lru_list lru =3D type * LRU_INACTIVE_FILE;
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
=20
 	VM_WARN_ON_ONCE(old_gen !=3D -1 && old_gen >=3D MAX_NR_GENS);
 	VM_WARN_ON_ONCE(new_gen !=3D -1 && new_gen >=3D MAX_NR_GENS);
@@ -224,7 +224,7 @@ static inline bool lru_gen_add_folio(struct lruvec *lru=
vec, struct folio *folio,
 	int gen =3D folio_lru_gen(folio);
 	int type =3D folio_is_file_lru(folio);
 	int zone =3D folio_zonenum(folio);
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
=20
 	VM_WARN_ON_ONCE_FOLIO(gen !=3D -1, folio);
=20
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cd28a100d9e4..1686fcc4ed01 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -404,7 +404,7 @@ enum {
  * The number of pages in each generation is eventually consistent and the=
refore
  * can be transiently negative when reset_batch_size() is pending.
  */
-struct lru_gen_struct {
+struct lru_gen_folio {
 	/* the aging increments the youngest generation number */
 	unsigned long max_seq;
 	/* the eviction increments the oldest generation numbers */
@@ -461,7 +461,7 @@ struct lru_gen_mm_state {
 struct lru_gen_mm_walk {
 	/* the lruvec under reclaim */
 	struct lruvec *lruvec;
-	/* unstable max_seq from lru_gen_struct */
+	/* unstable max_seq from lru_gen_folio */
 	unsigned long max_seq;
 	/* the next address within an mm to scan */
 	unsigned long next_addr;
@@ -524,7 +524,7 @@ struct lruvec {
 	unsigned long			flags;
 #ifdef CONFIG_LRU_GEN
 	/* evictable pages divided into generations */
-	struct lru_gen_struct		lrugen;
+	struct lru_gen_folio		lrugen;
 	/* to concurrently iterate lru_gen_mm_list */
 	struct lru_gen_mm_state		mm_state;
 #endif
diff --git a/mm/vmscan.c b/mm/vmscan.c
index e83d2a74e942..42507b36698e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3215,7 +3215,7 @@ static int get_nr_gens(struct lruvec *lruvec, int typ=
e)
=20
 static bool __maybe_unused seq_is_valid(struct lruvec *lruvec)
 {
-	/* see the comment on lru_gen_struct */
+	/* see the comment on lru_gen_folio */
 	return get_nr_gens(lruvec, LRU_GEN_FILE) >=3D MIN_NR_GENS &&
 	       get_nr_gens(lruvec, LRU_GEN_FILE) <=3D get_nr_gens(lruvec, LRU_GEN=
_ANON) &&
 	       get_nr_gens(lruvec, LRU_GEN_ANON) <=3D MAX_NR_GENS;
@@ -3612,7 +3612,7 @@ struct ctrl_pos {
 static void read_ctrl_pos(struct lruvec *lruvec, int type, int tier, int g=
ain,
 			  struct ctrl_pos *pos)
 {
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
 	int hist =3D lru_hist_from_seq(lrugen->min_seq[type]);
=20
 	pos->refaulted =3D lrugen->avg_refaulted[type][tier] +
@@ -3627,7 +3627,7 @@ static void read_ctrl_pos(struct lruvec *lruvec, int =
type, int tier, int gain,
 static void reset_ctrl_pos(struct lruvec *lruvec, int type, bool carryover)
 {
 	int hist, tier;
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
 	bool clear =3D carryover ? NR_HIST_GENS =3D=3D 1 : NR_HIST_GENS > 1;
 	unsigned long seq =3D carryover ? lrugen->min_seq[type] : lrugen->max_seq=
 + 1;
=20
@@ -3704,7 +3704,7 @@ static int folio_update_gen(struct folio *folio, int =
gen)
 static int folio_inc_gen(struct lruvec *lruvec, struct folio *folio, bool =
reclaiming)
 {
 	int type =3D folio_is_file_lru(folio);
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
 	int new_gen, old_gen =3D lru_gen_from_seq(lrugen->min_seq[type]);
 	unsigned long new_flags, old_flags =3D READ_ONCE(folio->flags);
=20
@@ -3749,7 +3749,7 @@ static void update_batch_size(struct lru_gen_mm_walk =
*walk, struct folio *folio,
 static void reset_batch_size(struct lruvec *lruvec, struct lru_gen_mm_walk=
 *walk)
 {
 	int gen, type, zone;
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
=20
 	walk->batched =3D 0;
=20
@@ -4263,7 +4263,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int ty=
pe, bool can_swap)
 {
 	int zone;
 	int remaining =3D MAX_LRU_BATCH;
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
 	int new_gen, old_gen =3D lru_gen_from_seq(lrugen->min_seq[type]);
=20
 	if (type =3D=3D LRU_GEN_ANON && !can_swap)
@@ -4299,7 +4299,7 @@ static bool try_to_inc_min_seq(struct lruvec *lruvec,=
 bool can_swap)
 {
 	int gen, type, zone;
 	bool success =3D false;
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
 	DEFINE_MIN_SEQ(lruvec);
=20
 	VM_WARN_ON_ONCE(!seq_is_valid(lruvec));
@@ -4320,7 +4320,7 @@ static bool try_to_inc_min_seq(struct lruvec *lruvec,=
 bool can_swap)
 		;
 	}
=20
-	/* see the comment on lru_gen_struct */
+	/* see the comment on lru_gen_folio */
 	if (can_swap) {
 		min_seq[LRU_GEN_ANON] =3D min(min_seq[LRU_GEN_ANON], min_seq[LRU_GEN_FIL=
E]);
 		min_seq[LRU_GEN_FILE] =3D max(min_seq[LRU_GEN_ANON], lrugen->min_seq[LRU=
_GEN_FILE]);
@@ -4342,7 +4342,7 @@ static void inc_max_seq(struct lruvec *lruvec, bool c=
an_swap, bool force_scan)
 {
 	int prev, next;
 	int type, zone;
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
=20
 	spin_lock_irq(&lruvec->lru_lock);
=20
@@ -4400,7 +4400,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec,=
 unsigned long max_seq,
 	bool success;
 	struct lru_gen_mm_walk *walk;
 	struct mm_struct *mm =3D NULL;
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
=20
 	VM_WARN_ON_ONCE(max_seq > READ_ONCE(lrugen->max_seq));
=20
@@ -4465,7 +4465,7 @@ static bool should_run_aging(struct lruvec *lruvec, u=
nsigned long max_seq, unsig
 	unsigned long old =3D 0;
 	unsigned long young =3D 0;
 	unsigned long total =3D 0;
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
 	struct mem_cgroup *memcg =3D lruvec_memcg(lruvec);
=20
 	for (type =3D !can_swap; type < ANON_AND_FILE; type++) {
@@ -4750,7 +4750,7 @@ static bool sort_folio(struct lruvec *lruvec, struct =
folio *folio, int tier_idx)
 	int delta =3D folio_nr_pages(folio);
 	int refs =3D folio_lru_refs(folio);
 	int tier =3D lru_tier_from_refs(refs);
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
=20
 	VM_WARN_ON_ONCE_FOLIO(gen >=3D MAX_NR_GENS, folio);
=20
@@ -4850,7 +4850,7 @@ static int scan_folios(struct lruvec *lruvec, struct =
scan_control *sc,
 	int scanned =3D 0;
 	int isolated =3D 0;
 	int remaining =3D MAX_LRU_BATCH;
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
 	struct mem_cgroup *memcg =3D lruvec_memcg(lruvec);
=20
 	VM_WARN_ON_ONCE(!list_empty(list));
@@ -5251,7 +5251,7 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruv=
ec, struct scan_control *sc
=20
 static bool __maybe_unused state_is_valid(struct lruvec *lruvec)
 {
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
=20
 	if (lrugen->enabled) {
 		enum lru_list lru;
@@ -5530,7 +5530,7 @@ static void lru_gen_seq_show_full(struct seq_file *m,=
 struct lruvec *lruvec,
 	int i;
 	int type, tier;
 	int hist =3D lru_hist_from_seq(seq);
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
=20
 	for (tier =3D 0; tier < MAX_NR_TIERS; tier++) {
 		seq_printf(m, "            %10d", tier);
@@ -5580,7 +5580,7 @@ static int lru_gen_seq_show(struct seq_file *m, void =
*v)
 	unsigned long seq;
 	bool full =3D !debugfs_real_fops(m->file)->write;
 	struct lruvec *lruvec =3D v;
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
 	int nid =3D lruvec_pgdat(lruvec)->node_id;
 	struct mem_cgroup *memcg =3D lruvec_memcg(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
@@ -5834,7 +5834,7 @@ void lru_gen_init_lruvec(struct lruvec *lruvec)
 {
 	int i;
 	int gen, type, zone;
-	struct lru_gen_struct *lrugen =3D &lruvec->lrugen;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
=20
 	lrugen->max_seq =3D MIN_NR_GENS + 1;
 	lrugen->enabled =3D lru_gen_enabled();
diff --git a/mm/workingset.c b/mm/workingset.c
index 1a86645b7b3c..fd666584515c 100644
--- a/mm/workingset.c
+++ b/mm/workingset.c
@@ -223,7 +223,7 @@ static void *lru_gen_eviction(struct folio *folio)
 	unsigned long token;
 	unsigned long min_seq;
 	struct lruvec *lruvec;
-	struct lru_gen_struct *lrugen;
+	struct lru_gen_folio *lrugen;
 	int type =3D folio_is_file_lru(folio);
 	int delta =3D folio_nr_pages(folio);
 	int refs =3D folio_lru_refs(folio);
@@ -252,7 +252,7 @@ static void lru_gen_refault(struct folio *folio, void *=
shadow)
 	unsigned long token;
 	unsigned long min_seq;
 	struct lruvec *lruvec;
-	struct lru_gen_struct *lrugen;
+	struct lru_gen_folio *lrugen;
 	struct mem_cgroup *memcg;
 	struct pglist_data *pgdat;
 	int type =3D folio_is_file_lru(folio);
--=20
2.39.0.314.g84b9a713c41-goog
From nobody Wed Sep 17 06:42:57 2025
Return-Path: <linux-kernel-owner@vger.kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id DB133C4332F
	for <linux-kernel@archiver.kernel.org>; Thu, 22 Dec 2022 04:20:23 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S235044AbiLVEUW (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 21 Dec 2022 23:20:22 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44882 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S235015AbiLVETs (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 21 Dec 2022 23:19:48 -0500
Received: from mail-il1-x14a.google.com (mail-il1-x14a.google.com
 [IPv6:2607:f8b0:4864:20::14a])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id D64D42409A
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:43 -0800 (PST)
Received: by mail-il1-x14a.google.com with SMTP id
 i21-20020a056e021d1500b003041b04e3ebso493357ila.7
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:43 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=GKQYzYQslmsEvTKoQjl1Dy8mF9dw9swe45tpdyd96Iw=;
        b=hiWrjVIMR/8TnqO43W7ej+Q9H4D1PFrQLabYHgT2Fiz6OmgEI02DldL0TyORs8wy8n
         gzQMTWHm8bPRJjvpDLuxOYEX45gDXAal7JLlT/n18L6hVsg3dngAiFDK3Tn+rpAmxV6C
         +qEslUlp9HbFzqL4mgVM5b5RkhpgXYnyZ+ORq6Ita7eNOYYtIOGIzMuXESbtK6od4r1A
         jWAL37uhaZUkjPoemHffstsks18inpeSIQG+tf76sgmOk7/nZKU+GpNLKMjITNkpPHgq
         53hXe1Fl0FagiNNwuieSMmO4uKwVQ6ZUaFgxxLKWXGELNgQBxX2r4plz2LGpanIqTl28
         9HRA==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=GKQYzYQslmsEvTKoQjl1Dy8mF9dw9swe45tpdyd96Iw=;
        b=PRlp8cVIIu/KvRYLxUgPtydbE/mrlQcPDwL3236t48AAYx0SSbF3WnozHhBp97D/7U
         HSYFjtok1CoOaSyrmTcpWMHZCVsPij6WUu3OouLGv2QZZ8UM0rjxa6dZEa5br+mFYo9B
         QPrn9g5DrlHipqVOw7lJX0gXBRYc19cMJoO2TUgP6WBdTas8xJ846WdKGnEVqk1mgcYx
         BdO8j399T/ZRIxDktRXXl24fEL7k7EHm3amUZ+KPobv6Cqt5yy9qKGDE6OxSn+DRphJu
         JViuW+3nQW4RKg78nX2HHrcpGlcMvZcl6Iqc7+2T9AjNOIw5FWe8mtasBKUXXm8RqX4c
         V8kA==
X-Gm-Message-State: AFqh2kplAMfinwAPe16zTPgaWHCOBH3xJABqwcCi0jUWlDQYzD9n+6sK
        HyrWwTAN27iAkMHXO9O02cElIBavfxY=
X-Google-Smtp-Source: 
 AMrXdXulwki9nSBxGR/8kwCpRV7uTwszRsuNMgerK7fVzStawG8GMbadC6mUNc3+9DnL5KUi28j6NQ4IuKE=
X-Received: from yuzhao.bld.corp.google.com
 ([2620:15c:183:200:a463:5f7b:440e:5c77])
 (user=yuzhao job=sendgmr) by 2002:a02:881a:0:b0:38a:141d:6564 with SMTP id
 r26-20020a02881a000000b0038a141d6564mr112099jai.140.1671682783226; Wed, 21
 Dec 2022 20:19:43 -0800 (PST)
Date: Wed, 21 Dec 2022 21:19:00 -0700
In-Reply-To: <20221222041905.2431096-1-yuzhao@google.com>
Message-Id: <20221222041905.2431096-3-yuzhao@google.com>
Mime-Version: 1.0
References: <20221222041905.2431096-1-yuzhao@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Subject: [PATCH mm-unstable v3 2/8] mm: multi-gen LRU: rename lrugen->lists[]
 to lrugen->folios[]
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
        Jonathan Corbet <corbet@lwn.net>,
        Michael Larabel <michael@michaellarabel.com>,
        Michal Hocko <mhocko@kernel.org>,
        Mike Rapoport <rppt@kernel.org>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Suren Baghdasaryan <surenb@google.com>, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org, linux-mm@google.com,
        Yu Zhao <yuzhao@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Rename lrugen->lists[] to lrugen->folios[]. The coming lrugen->list
will chain lru_gen_folio into per-node lists, and the new name avoids
confusion with it.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 Documentation/mm/multigen_lru.rst |  8 ++++----
 include/linux/mm_inline.h         |  4 ++--
 include/linux/mmzone.h            |  8 ++++----
 mm/vmscan.c                       | 20 ++++++++++----------
 4 files changed, 20 insertions(+), 20 deletions(-)

diff --git a/Documentation/mm/multigen_lru.rst b/Documentation/mm/multigen_=
lru.rst
index d7062c6a8946..d8f721f98868 100644
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -89,15 +89,15 @@ variables are monotonically increasing.
=20
 Generation numbers are truncated into ``order_base_2(MAX_NR_GENS+1)``
 bits in order to fit into the gen counter in ``folio->flags``. Each
-truncated generation number is an index to ``lrugen->lists[]``. The
+truncated generation number is an index to ``lrugen->folios[]``. The
 sliding window technique is used to track at least ``MIN_NR_GENS`` and
 at most ``MAX_NR_GENS`` generations. The gen counter stores a value
 within ``[1, MAX_NR_GENS]`` while a page is on one of
-``lrugen->lists[]``; otherwise it stores zero.
+``lrugen->folios[]``; otherwise it stores zero.
=20
 Each generation is divided into multiple tiers. A page accessed ``N``
 times through file descriptors is in tier ``order_base_2(N)``. Unlike
-generations, tiers do not have dedicated ``lrugen->lists[]``. In
+generations, tiers do not have dedicated ``lrugen->folios[]``. In
 contrast to moving across generations, which requires the LRU lock,
 moving across tiers only involves atomic operations on
 ``folio->flags`` and therefore has a negligible cost. A feedback loop
@@ -127,7 +127,7 @@ page mapped by this PTE to ``(max_seq%MAX_NR_GENS)+1``.
 Eviction
 --------
 The eviction consumes old generations. Given an ``lruvec``, it
-increments ``min_seq`` when ``lrugen->lists[]`` indexed by
+increments ``min_seq`` when ``lrugen->folios[]`` indexed by
 ``min_seq%MAX_NR_GENS`` becomes empty. To select a type and a tier to
 evict from, it first compares ``min_seq[]`` to select the older type.
 If both types are equally old, it selects the one whose first tier has
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index f63968bd7de5..da38e3d962e2 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -256,9 +256,9 @@ static inline bool lru_gen_add_folio(struct lruvec *lru=
vec, struct folio *folio,
 	lru_gen_update_size(lruvec, folio, -1, gen);
 	/* for folio_rotate_reclaimable() */
 	if (reclaiming)
-		list_add_tail(&folio->lru, &lrugen->lists[gen][type][zone]);
+		list_add_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
 	else
-		list_add(&folio->lru, &lrugen->lists[gen][type][zone]);
+		list_add(&folio->lru, &lrugen->folios[gen][type][zone]);
=20
 	return true;
 }
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 1686fcc4ed01..6c96ee823dbd 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -312,7 +312,7 @@ enum lruvec_flags {
  * They form a sliding window of a variable size [MIN_NR_GENS, MAX_NR_GENS=
]. An
  * offset within MAX_NR_GENS, i.e., gen, indexes the LRU list of the
  * corresponding generation. The gen counter in folio->flags stores gen+1 =
while
- * a page is on one of lrugen->lists[]. Otherwise it stores 0.
+ * a page is on one of lrugen->folios[]. Otherwise it stores 0.
  *
  * A page is added to the youngest generation on faulting. The aging needs=
 to
  * check the accessed bit at least twice before handing this page over to =
the
@@ -324,8 +324,8 @@ enum lruvec_flags {
  * rest of generations, if they exist, are considered inactive. See
  * lru_gen_is_active().
  *
- * PG_active is always cleared while a page is on one of lrugen->lists[] s=
o that
- * the aging needs not to worry about it. And it's set again when a page
+ * PG_active is always cleared while a page is on one of lrugen->folios[] =
so
+ * that the aging needs not to worry about it. And it's set again when a p=
age
  * considered active is isolated for non-reclaiming purposes, e.g., migrat=
ion.
  * See lru_gen_add_folio() and lru_gen_del_folio().
  *
@@ -412,7 +412,7 @@ struct lru_gen_folio {
 	/* the birth time of each generation in jiffies */
 	unsigned long timestamps[MAX_NR_GENS];
 	/* the multi-gen LRU lists, lazily sorted on eviction */
-	struct list_head lists[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
+	struct list_head folios[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
 	/* the multi-gen LRU sizes, eventually consistent */
 	long nr_pages[MAX_NR_GENS][ANON_AND_FILE][MAX_NR_ZONES];
 	/* the exponential moving average of refaulted */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 42507b36698e..d94d9fcabf36 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4271,7 +4271,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int ty=
pe, bool can_swap)
=20
 	/* prevent cold/hot inversion if force_scan is true */
 	for (zone =3D 0; zone < MAX_NR_ZONES; zone++) {
-		struct list_head *head =3D &lrugen->lists[old_gen][type][zone];
+		struct list_head *head =3D &lrugen->folios[old_gen][type][zone];
=20
 		while (!list_empty(head)) {
 			struct folio *folio =3D lru_to_folio(head);
@@ -4282,7 +4282,7 @@ static bool inc_min_seq(struct lruvec *lruvec, int ty=
pe, bool can_swap)
 			VM_WARN_ON_ONCE_FOLIO(folio_zonenum(folio) !=3D zone, folio);
=20
 			new_gen =3D folio_inc_gen(lruvec, folio, false);
-			list_move_tail(&folio->lru, &lrugen->lists[new_gen][type][zone]);
+			list_move_tail(&folio->lru, &lrugen->folios[new_gen][type][zone]);
=20
 			if (!--remaining)
 				return false;
@@ -4310,7 +4310,7 @@ static bool try_to_inc_min_seq(struct lruvec *lruvec,=
 bool can_swap)
 			gen =3D lru_gen_from_seq(min_seq[type]);
=20
 			for (zone =3D 0; zone < MAX_NR_ZONES; zone++) {
-				if (!list_empty(&lrugen->lists[gen][type][zone]))
+				if (!list_empty(&lrugen->folios[gen][type][zone]))
 					goto next;
 			}
=20
@@ -4775,7 +4775,7 @@ static bool sort_folio(struct lruvec *lruvec, struct =
folio *folio, int tier_idx)
=20
 	/* promoted */
 	if (gen !=3D lru_gen_from_seq(lrugen->min_seq[type])) {
-		list_move(&folio->lru, &lrugen->lists[gen][type][zone]);
+		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;
 	}
=20
@@ -4784,7 +4784,7 @@ static bool sort_folio(struct lruvec *lruvec, struct =
folio *folio, int tier_idx)
 		int hist =3D lru_hist_from_seq(lrugen->min_seq[type]);
=20
 		gen =3D folio_inc_gen(lruvec, folio, false);
-		list_move_tail(&folio->lru, &lrugen->lists[gen][type][zone]);
+		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
=20
 		WRITE_ONCE(lrugen->protected[hist][type][tier - 1],
 			   lrugen->protected[hist][type][tier - 1] + delta);
@@ -4796,7 +4796,7 @@ static bool sort_folio(struct lruvec *lruvec, struct =
folio *folio, int tier_idx)
 	if (folio_test_locked(folio) || folio_test_writeback(folio) ||
 	    (type =3D=3D LRU_GEN_FILE && folio_test_dirty(folio))) {
 		gen =3D folio_inc_gen(lruvec, folio, true);
-		list_move(&folio->lru, &lrugen->lists[gen][type][zone]);
+		list_move(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;
 	}
=20
@@ -4863,7 +4863,7 @@ static int scan_folios(struct lruvec *lruvec, struct =
scan_control *sc,
 	for (zone =3D sc->reclaim_idx; zone >=3D 0; zone--) {
 		LIST_HEAD(moved);
 		int skipped =3D 0;
-		struct list_head *head =3D &lrugen->lists[gen][type][zone];
+		struct list_head *head =3D &lrugen->folios[gen][type][zone];
=20
 		while (!list_empty(head)) {
 			struct folio *folio =3D lru_to_folio(head);
@@ -5264,7 +5264,7 @@ static bool __maybe_unused state_is_valid(struct lruv=
ec *lruvec)
 		int gen, type, zone;
=20
 		for_each_gen_type_zone(gen, type, zone) {
-			if (!list_empty(&lrugen->lists[gen][type][zone]))
+			if (!list_empty(&lrugen->folios[gen][type][zone]))
 				return false;
 		}
 	}
@@ -5309,7 +5309,7 @@ static bool drain_evictable(struct lruvec *lruvec)
 	int remaining =3D MAX_LRU_BATCH;
=20
 	for_each_gen_type_zone(gen, type, zone) {
-		struct list_head *head =3D &lruvec->lrugen.lists[gen][type][zone];
+		struct list_head *head =3D &lruvec->lrugen.folios[gen][type][zone];
=20
 		while (!list_empty(head)) {
 			bool success;
@@ -5843,7 +5843,7 @@ void lru_gen_init_lruvec(struct lruvec *lruvec)
 		lrugen->timestamps[i] =3D jiffies;
=20
 	for_each_gen_type_zone(gen, type, zone)
-		INIT_LIST_HEAD(&lrugen->lists[gen][type][zone]);
+		INIT_LIST_HEAD(&lrugen->folios[gen][type][zone]);
=20
 	lruvec->mm_state.seq =3D MIN_NR_GENS;
 	init_waitqueue_head(&lruvec->mm_state.wait);
--=20
2.39.0.314.g84b9a713c41-goog
From nobody Wed Sep 17 06:42:57 2025
Return-Path: <linux-kernel-owner@vger.kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 08D4FC4332F
	for <linux-kernel@archiver.kernel.org>; Thu, 22 Dec 2022 04:20:29 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S235076AbiLVEU0 (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 21 Dec 2022 23:20:26 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44816 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S235004AbiLVETt (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 21 Dec 2022 23:19:49 -0500
Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com
 [IPv6:2607:f8b0:4864:20::b4a])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 504F7233B0
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:45 -0800 (PST)
Received: by mail-yb1-xb4a.google.com with SMTP id
 z17-20020a25e311000000b00719e04e59e1so689077ybd.10
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:45 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=BDLLEQAWPiuWvu6uAw7SlKRBGnG2Q5GXtGAMB+TrPE8=;
        b=YYDj3c5fM6hkrbkgsT5AQ5QuP7PmzqKTtUZOCkQp6tMZe2KtlQXDXuSeID5qZPLv5w
         uekCpC4/7VqRHl3u7ovnnOyeEqj/EsywNvGk0SU8TWCKdL5T+UUJ8v4uRxtZmS6vPnl7
         Wzj8lALjPQjik55Xsborgrd9EdV5KGHkidFuPRxek7BFO5jMvsBbkzoNYwVZleOoZih8
         UbO7SNB2W/YpVQufPScszqS4sLh/XpytF5xvF1sxeegGQeYmCWYdyiAVgWPTJHcniUSO
         tcG3i3xlYkukeP0JsAW7kB7ooTJnC5WmKclKn5JT2ujNOCPHbdsDPc8vVma69pRmczRk
         1nkQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=BDLLEQAWPiuWvu6uAw7SlKRBGnG2Q5GXtGAMB+TrPE8=;
        b=Mp4DVX1Cv0o3iPYIMwGf2JeA9ockLqcWiGL0ydhu4B/79UfDO8mkY7vpQn7mDXF8Gx
         REcylFz8mjhLeMz6cDVGEDKCRGqBUvJsGbO1UNgPOwbw2mcIVHsbm3wBX0HVnn8+/t1s
         joaAhevtuMtXV2as/Rfh01Z2EWabXUYLt70qpjCVqkZ/EAeWJAwvIC7ISaiZR8xMp2EV
         SqTUt6onMu8buUxP7e7eeXiETQjV2SxaVHpFJKa3U4PugqAcssD9vph+pgpp3oUb6nK6
         5s1+x7xQpD+LuRcVpjZyHkBE4RF+tfuHH+YYqkcszsfCun2M7dKWQkh5Vbp4BTlDfpBM
         MKjA==
X-Gm-Message-State: AFqh2kqJr//+INtPKcmjq96P9cMm2S3DLHcDBD5+9pcvk7C6Gb3yJ4d9
        RhJIGcfxio9H+QlwKyEIB+Xpitg3+Lo=
X-Google-Smtp-Source: 
 AMrXdXs9f5JEJZRSlLA1c9vebSH3Lts9X7RAt6HHv3GXYgSIKO08I/5btJz8yV9BWIG6W3F5rzxEG2SadvM=
X-Received: from yuzhao.bld.corp.google.com
 ([2620:15c:183:200:a463:5f7b:440e:5c77])
 (user=yuzhao job=sendgmr) by 2002:a25:dcc7:0:b0:6fa:5ab4:12b5 with SMTP id
 y190-20020a25dcc7000000b006fa5ab412b5mr381185ybe.620.1671682784642; Wed, 21
 Dec 2022 20:19:44 -0800 (PST)
Date: Wed, 21 Dec 2022 21:19:01 -0700
In-Reply-To: <20221222041905.2431096-1-yuzhao@google.com>
Message-Id: <20221222041905.2431096-4-yuzhao@google.com>
Mime-Version: 1.0
References: <20221222041905.2431096-1-yuzhao@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Subject: [PATCH mm-unstable v3 3/8] mm: multi-gen LRU: remove eviction
 fairness safeguard
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
        Jonathan Corbet <corbet@lwn.net>,
        Michael Larabel <michael@michaellarabel.com>,
        Michal Hocko <mhocko@kernel.org>,
        Mike Rapoport <rppt@kernel.org>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Suren Baghdasaryan <surenb@google.com>, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org, linux-mm@google.com,
        Yu Zhao <yuzhao@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Recall that the eviction consumes the oldest generation: first it
bucket-sorts folios whose gen counters were updated by the aging and
reclaims the rest; then it increments lrugen->min_seq.

The current eviction fairness safeguard for global reclaim faces a
dilemma: when there are multiple eligible memcgs, should it continue
or stop upon meeting the reclaim goal? If it continues, it overshoots
and increases direct reclaim latency; if it stops, it loses fairness
between the memcgs it has already reclaimed from and those it has yet
to.

With memcg LRU, the eviction, while ensuring eventual fairness, will
stop upon meeting its goal. Therefore the current eviction fairness
safeguard for global reclaim will not be needed.

Note that memcg LRU only applies to global reclaim. For memcg reclaim,
the eviction will continue even if it overshoots. To simplify the
code, this behavior is now unconditional.
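
As a rough illustration of the stop condition described above, the
following standalone sketch models how the reclaim target could be
chosen. The struct and helper names (model_sc, model_nr_to_reclaim,
model_should_stop) are hypothetical simplifications, not kernel APIs,
and the compact_gap field merely stands in for compact_gap(sc->order).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct scan_control. */
struct model_sc {
	bool global_reclaim;          /* no target memcg, or root memcg */
	bool is_kswapd;               /* caller is kswapd */
	unsigned long nr_to_reclaim;  /* reclaim goal */
	unsigned long nr_reclaimed;   /* progress so far */
	unsigned long last_reclaimed; /* kswapd progress at last pass */
	unsigned long compact_gap;    /* stand-in for compact_gap(order) */
};

/* Return the value of nr_reclaimed at which the eviction stops. */
static unsigned long model_nr_to_reclaim(const struct model_sc *sc)
{
	/* memcg reclaim never stops early: the target is unreachable */
	if (!sc->global_reclaim)
		return ULONG_MAX;

	/* kswapd: the goal is relative to its previous progress */
	if (sc->is_kswapd)
		return sc->nr_to_reclaim + sc->last_reclaimed;

	/* direct reclaim: the goal, or the compaction gap if larger */
	return sc->nr_to_reclaim > sc->compact_gap ?
	       sc->nr_to_reclaim : sc->compact_gap;
}

/* True once enough has been reclaimed to meet the target. */
static bool model_should_stop(const struct model_sc *sc)
{
	return sc->nr_reclaimed >= model_nr_to_reclaim(sc);
}
```

In this model, memcg reclaim never reports that it should stop, which
corresponds to the unconditional continuation mentioned above.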

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/vmscan.c | 81 +++++++++++++++--------------------------------------
 1 file changed, 23 insertions(+), 58 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index d94d9fcabf36..49d7c103906a 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -449,6 +449,11 @@ static bool cgroup_reclaim(struct scan_control *sc)
 	return sc->target_mem_cgroup;
 }
=20
+static bool global_reclaim(struct scan_control *sc)
+{
+	return !sc->target_mem_cgroup || mem_cgroup_is_root(sc->target_mem_cgroup=
);
+}
+
 /**
  * writeback_throttling_sane - is the usual dirty throttling mechanism ava=
ilable?
  * @sc: scan_control in question
@@ -499,6 +504,11 @@ static bool cgroup_reclaim(struct scan_control *sc)
 	return false;
 }
=20
+static bool global_reclaim(struct scan_control *sc)
+{
+	return true;
+}
+
 static bool writeback_throttling_sane(struct scan_control *sc)
 {
 	return true;
@@ -5006,8 +5016,7 @@ static int isolate_folios(struct lruvec *lruvec, stru=
ct scan_control *sc, int sw
 	return scanned;
 }
=20
-static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, in=
t swappiness,
-			bool *need_swapping)
+static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, in=
t swappiness)
 {
 	int type;
 	int scanned;
@@ -5096,9 +5105,6 @@ static int evict_folios(struct lruvec *lruvec, struct=
 scan_control *sc, int swap
 		goto retry;
 	}
=20
-	if (need_swapping && type =3D=3D LRU_GEN_ANON)
-		*need_swapping =3D true;
-
 	return scanned;
 }
=20
@@ -5138,67 +5144,26 @@ static unsigned long get_nr_to_scan(struct lruvec *=
lruvec, struct scan_control *
 	return min_seq[!can_swap] + MIN_NR_GENS <=3D max_seq ? nr_to_scan : 0;
 }
=20
-static bool should_abort_scan(struct lruvec *lruvec, unsigned long seq,
-			      struct scan_control *sc, bool need_swapping)
+static unsigned long get_nr_to_reclaim(struct scan_control *sc)
 {
-	int i;
-	DEFINE_MAX_SEQ(lruvec);
+	/* don't abort memcg reclaim to ensure fairness */
+	if (!global_reclaim(sc))
+		return -1;
=20
-	if (!current_is_kswapd()) {
-		/* age each memcg at most once to ensure fairness */
-		if (max_seq - seq > 1)
-			return true;
+	/* discount the previous progress for kswapd */
+	if (current_is_kswapd())
+		return sc->nr_to_reclaim + sc->last_reclaimed;
=20
-		/* over-swapping can increase allocation latency */
-		if (sc->nr_reclaimed >=3D sc->nr_to_reclaim && need_swapping)
-			return true;
-
-		/* give this thread a chance to exit and free its memory */
-		if (fatal_signal_pending(current)) {
-			sc->nr_reclaimed +=3D MIN_LRU_BATCH;
-			return true;
-		}
-
-		if (cgroup_reclaim(sc))
-			return false;
-	} else if (sc->nr_reclaimed - sc->last_reclaimed < sc->nr_to_reclaim)
-		return false;
-
-	/* keep scanning at low priorities to ensure fairness */
-	if (sc->priority > DEF_PRIORITY - 2)
-		return false;
-
-	/*
-	 * A minimum amount of work was done under global memory pressure. For
-	 * kswapd, it may be overshooting. For direct reclaim, the allocation
-	 * may succeed if all suitable zones are somewhat safe. In either case,
-	 * it's better to stop now, and restart later if necessary.
-	 */
-	for (i =3D 0; i <=3D sc->reclaim_idx; i++) {
-		unsigned long wmark;
-		struct zone *zone =3D lruvec_pgdat(lruvec)->node_zones + i;
-
-		if (!managed_zone(zone))
-			continue;
-
-		wmark =3D current_is_kswapd() ? high_wmark_pages(zone) : low_wmark_pages=
(zone);
-		if (wmark > zone_page_state(zone, NR_FREE_PAGES))
-			return false;
-	}
-
-	sc->nr_reclaimed +=3D MIN_LRU_BATCH;
-
-	return true;
+	return max(sc->nr_to_reclaim, compact_gap(sc->order));
 }
=20
 static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_contr=
ol *sc)
 {
 	struct blk_plug plug;
 	bool need_aging =3D false;
-	bool need_swapping =3D false;
 	unsigned long scanned =3D 0;
 	unsigned long reclaimed =3D sc->nr_reclaimed;
-	DEFINE_MAX_SEQ(lruvec);
+	unsigned long nr_to_reclaim =3D get_nr_to_reclaim(sc);
=20
 	lru_add_drain();
=20
@@ -5222,7 +5187,7 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruv=
ec, struct scan_control *sc
 		if (!nr_to_scan)
 			goto done;
=20
-		delta =3D evict_folios(lruvec, sc, swappiness, &need_swapping);
+		delta =3D evict_folios(lruvec, sc, swappiness);
 		if (!delta)
 			goto done;
=20
@@ -5230,7 +5195,7 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruv=
ec, struct scan_control *sc
 		if (scanned >=3D nr_to_scan)
 			break;
=20
-		if (should_abort_scan(lruvec, max_seq, sc, need_swapping))
+		if (sc->nr_reclaimed >=3D nr_to_reclaim)
 			break;
=20
 		cond_resched();
@@ -5677,7 +5642,7 @@ static int run_eviction(struct lruvec *lruvec, unsign=
ed long seq, struct scan_co
 		if (sc->nr_reclaimed >=3D nr_to_reclaim)
 			return 0;
=20
-		if (!evict_folios(lruvec, sc, swappiness, NULL))
+		if (!evict_folios(lruvec, sc, swappiness))
 			return 0;
=20
 		cond_resched();
--=20
2.39.0.314.g84b9a713c41-goog
From nobody Wed Sep 17 06:42:57 2025
Return-Path: <linux-kernel-owner@vger.kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 4E586C4332F
	for <linux-kernel@archiver.kernel.org>; Thu, 22 Dec 2022 04:20:36 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S235106AbiLVEUa (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 21 Dec 2022 23:20:30 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45032 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S235006AbiLVETt (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 21 Dec 2022 23:19:49 -0500
Received: from mail-yb1-xb49.google.com (mail-yb1-xb49.google.com
 [IPv6:2607:f8b0:4864:20::b49])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id EA1F7218AE
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:46 -0800 (PST)
Received: by mail-yb1-xb49.google.com with SMTP id
 h67-20020a25d046000000b00729876d3b2bso669196ybg.17
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:46 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=nzLJiZ4QM5Uh21mWC41LC4GBB5ZaSED6XDHCtugxc08=;
        b=LGK8h9wzmfZbkub7vTd8RNWpNWLSzm7G1OuZlNkyjd72d/DuTcDHYUL4xBQcaqX+6a
         iBARPo3dG0EKbr43vnaB9kXhiLdIVarVSxPRMSmvLc3GrMSsLYLgsfAVBTIA/B44vjLs
         so6T2/KZM+3aQXkpITzI4UM8KbocN8LznaSbs2lDpTf74F8ww3eWoj7GwxhNNeTZDBGS
         3JuGx5KWL6WL3jJU6O7VPlUhEW4+O/Q3JOV/j8eDtLU8oYPgWtWgLX5JetQCIv39UVGk
         62+J38ROwufIeFkVog6el5JN4Yi8QoJ0KVmDw0YykY5v+DyL1rqOiHGd579YyrOxuB05
         c1Sw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=nzLJiZ4QM5Uh21mWC41LC4GBB5ZaSED6XDHCtugxc08=;
        b=tDqJZrmPTylt74tiGCEk4ftN7PNLWnGI8UgZeuZGtPFpxh0A+wulrNapyaZ1JYqe3v
         MUckVEIIpPO45x38T5QdM/ZnaXs3T6D7uu+vqZhPoYjQxbgGlQYjwPXYnHxZRxt7L5Y2
         3kDXVgdXa9D3RVO5fRyhIWTXQOWMenzvO8QM9mQrTPCZsAOSe1AscDeIZUIHt4QX6ycp
         NNEq8lTwyu2MDeC1kbQV7Ie6H3708VrFAFcy/onntyJPqLuTNq4GvgmodUUvpirQ3PpS
         Nt4KZNGISreJykCim5Xv/TO25A1q3WoCHFyN6FiMcQ99l3fT3RjCq0Ji3nCjS1mpzaoP
         Ysmw==
X-Gm-Message-State: AFqh2koL2D725a+TsgymKO73F122Z9Upo+SdhYkXYQXGBNlf6xtn1P5L
        cvP/eQWE6d36wERiYoh1C8HvUABfopw=
X-Google-Smtp-Source: 
 AMrXdXu349F+GjLv+LsWuQvHnJ0vl+EFKxOrB01QlRFTB17yje2mwF78wvgJ6L9LCsoyXjHckCuxD+e8kGc=
X-Received: from yuzhao.bld.corp.google.com
 ([2620:15c:183:200:a463:5f7b:440e:5c77])
 (user=yuzhao job=sendgmr) by 2002:a05:690c:c95:b0:36c:aaa6:e571 with SMTP id
 cm21-20020a05690c0c9500b0036caaa6e571mr312608ywb.467.1671682786236; Wed, 21
 Dec 2022 20:19:46 -0800 (PST)
Date: Wed, 21 Dec 2022 21:19:02 -0700
In-Reply-To: <20221222041905.2431096-1-yuzhao@google.com>
Message-Id: <20221222041905.2431096-5-yuzhao@google.com>
Mime-Version: 1.0
References: <20221222041905.2431096-1-yuzhao@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Subject: [PATCH mm-unstable v3 4/8] mm: multi-gen LRU: remove aging fairness
 safeguard
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
        Jonathan Corbet <corbet@lwn.net>,
        Michael Larabel <michael@michaellarabel.com>,
        Michal Hocko <mhocko@kernel.org>,
        Mike Rapoport <rppt@kernel.org>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Suren Baghdasaryan <surenb@google.com>, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org, linux-mm@google.com,
        Yu Zhao <yuzhao@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Recall that the aging produces the youngest generation: first it scans
for accessed folios and updates their gen counters; then it increments
lrugen->max_seq.

The current aging fairness safeguard for kswapd uses two passes to
ensure fairness among multiple eligible memcgs. On the first pass,
which is shared with the eviction, it checks whether all eligible
memcgs are low on cold folios. If so, it requires a second pass, on
which it ages all those memcgs at the same time.

With memcg LRU, the aging, while ensuring eventual fairness, will run
when necessary. Therefore the current aging fairness safeguard for
kswapd will not be needed.

Note that memcg LRU only applies to global reclaim. For memcg reclaim,
the aging can be unfair to different memcgs, i.e., their
lrugen->max_seq can be incremented at different paces.
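
To make "when necessary" more concrete, the checks performed by
should_run_aging() in this patch can be modeled roughly as below. This
is only a sketch: model_should_run_aging() and MODEL_MIN_NR_GENS are
hypothetical names, the value 2 is assumed for illustration, and the
page counts are passed in directly rather than read from the lruvec.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MIN_NR_GENS 2UL /* assumed value for illustration */

/*
 * Hypothetical model: min_seq and max_seq are generation counters,
 * young and old are the page counts of the youngest generation and
 * of the generation MIN_NR_GENS behind it, total is the overall
 * page count.
 */
static bool model_should_run_aging(unsigned long min_seq,
				   unsigned long max_seq,
				   unsigned long young,
				   unsigned long old,
				   unsigned long total)
{
	/* completely out of cold folios: the aging is required */
	if (min_seq + MODEL_MIN_NR_GENS > max_seq)
		return true;

	/* more generations than the ideal number: stay lazy */
	if (min_seq + MODEL_MIN_NR_GENS < max_seq)
		return false;

	/* too many hot pages in the youngest generation */
	if (young * MODEL_MIN_NR_GENS > total)
		return true;

	/* too few cold pages in the oldest generation */
	if (old * (MODEL_MIN_NR_GENS + 2) < total)
		return true;

	return false;
}
```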

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/vmscan.c | 150 +++++++++++++++++++++++++---------------------------
 1 file changed, 71 insertions(+), 79 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 49d7c103906a..65cc82208b6e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -137,7 +137,6 @@ struct scan_control {
=20
 #ifdef CONFIG_LRU_GEN
 	/* help kswapd make better choices among multiple memcgs */
-	unsigned int memcgs_need_aging:1;
 	unsigned long last_reclaimed;
 #endif
=20
@@ -4468,7 +4467,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec,=
 unsigned long max_seq,
 	return true;
 }
=20
-static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,=
 unsigned long *min_seq,
+static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
 			     struct scan_control *sc, bool can_swap, unsigned long *nr_to_scan)
 {
 	int gen, type, zone;
@@ -4477,6 +4476,13 @@ static bool should_run_aging(struct lruvec *lruvec, =
unsigned long max_seq, unsig
 	unsigned long total =3D 0;
 	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
 	struct mem_cgroup *memcg =3D lruvec_memcg(lruvec);
+	DEFINE_MIN_SEQ(lruvec);
+
+	/* whether this lruvec is completely out of cold folios */
+	if (min_seq[!can_swap] + MIN_NR_GENS > max_seq) {
+		*nr_to_scan =3D 0;
+		return true;
+	}
=20
 	for (type =3D !can_swap; type < ANON_AND_FILE; type++) {
 		unsigned long seq;
@@ -4505,8 +4511,6 @@ static bool should_run_aging(struct lruvec *lruvec, u=
nsigned long max_seq, unsig
 	 * stalls when the number of generations reaches MIN_NR_GENS. Hence, the
 	 * ideal number of generations is MIN_NR_GENS+1.
 	 */
-	if (min_seq[!can_swap] + MIN_NR_GENS > max_seq)
-		return true;
 	if (min_seq[!can_swap] + MIN_NR_GENS < max_seq)
 		return false;
=20
@@ -4525,40 +4529,54 @@ static bool should_run_aging(struct lruvec *lruvec,=
 unsigned long max_seq, unsig
 	return false;
 }
=20
-static bool age_lruvec(struct lruvec *lruvec, struct scan_control *sc, uns=
igned long min_ttl)
+static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *=
sc)
 {
-	bool need_aging;
-	unsigned long nr_to_scan;
-	int swappiness =3D get_swappiness(lruvec, sc);
+	int gen, type, zone;
+	unsigned long total =3D 0;
+	bool can_swap =3D get_swappiness(lruvec, sc);
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
 	struct mem_cgroup *memcg =3D lruvec_memcg(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
 	DEFINE_MIN_SEQ(lruvec);
=20
+	for (type =3D !can_swap; type < ANON_AND_FILE; type++) {
+		unsigned long seq;
+
+		for (seq =3D min_seq[type]; seq <=3D max_seq; seq++) {
+			gen =3D lru_gen_from_seq(seq);
+
+			for (zone =3D 0; zone < MAX_NR_ZONES; zone++)
+				total +=3D max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
+		}
+	}
+
+	/* whether the size is big enough to be helpful */
+	return mem_cgroup_online(memcg) ? (total >> sc->priority) : total;
+}
+
+static bool lruvec_is_reclaimable(struct lruvec *lruvec, struct scan_contr=
ol *sc,
+				  unsigned long min_ttl)
+{
+	int gen;
+	unsigned long birth;
+	struct mem_cgroup *memcg =3D lruvec_memcg(lruvec);
+	DEFINE_MIN_SEQ(lruvec);
+
 	VM_WARN_ON_ONCE(sc->memcg_low_reclaim);
=20
+	/* see the comment on lru_gen_folio */
+	gen =3D lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
+	birth =3D READ_ONCE(lruvec->lrugen.timestamps[gen]);
+
+	if (time_is_after_jiffies(birth + min_ttl))
+		return false;
+
+	if (!lruvec_is_sizable(lruvec, sc))
+		return false;
+
 	mem_cgroup_calculate_protection(NULL, memcg);
=20
-	if (mem_cgroup_below_min(NULL, memcg))
-		return false;
-
-	need_aging =3D should_run_aging(lruvec, max_seq, min_seq, sc, swappiness,=
 &nr_to_scan);
-
-	if (min_ttl) {
-		int gen =3D lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
-		unsigned long birth =3D READ_ONCE(lruvec->lrugen.timestamps[gen]);
-
-		if (time_is_after_jiffies(birth + min_ttl))
-			return false;
-
-		/* the size is likely too small to be helpful */
-		if (!nr_to_scan && sc->priority !=3D DEF_PRIORITY)
-			return false;
-	}
-
-	if (need_aging)
-		try_to_inc_max_seq(lruvec, max_seq, sc, swappiness, false);
-
-	return true;
+	return !mem_cgroup_below_min(NULL, memcg);
 }
=20
 /* to protect the working set of the last N jiffies */
@@ -4567,46 +4585,32 @@ static unsigned long lru_gen_min_ttl __read_mostly;
 static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_contro=
l *sc)
 {
 	struct mem_cgroup *memcg;
-	bool success =3D false;
 	unsigned long min_ttl =3D READ_ONCE(lru_gen_min_ttl);
=20
 	VM_WARN_ON_ONCE(!current_is_kswapd());
=20
 	sc->last_reclaimed =3D sc->nr_reclaimed;
=20
-	/*
-	 * To reduce the chance of going into the aging path, which can be
-	 * costly, optimistically skip it if the flag below was cleared in the
-	 * eviction path. This improves the overall performance when multiple
-	 * memcgs are available.
-	 */
-	if (!sc->memcgs_need_aging) {
-		sc->memcgs_need_aging =3D true;
-		return;
-	}
-
-	set_mm_walk(pgdat);
-
-	memcg =3D mem_cgroup_iter(NULL, NULL, NULL);
-	do {
-		struct lruvec *lruvec =3D mem_cgroup_lruvec(memcg, pgdat);
-
-		if (age_lruvec(lruvec, sc, min_ttl))
-			success =3D true;
-
-		cond_resched();
-	} while ((memcg =3D mem_cgroup_iter(NULL, memcg, NULL)));
-
-	clear_mm_walk();
-
 	/* check the order to exclude compaction-induced reclaim */
-	if (success || !min_ttl || sc->order)
+	if (!min_ttl || sc->order || sc->priority =3D=3D DEF_PRIORITY)
 		return;
=20
+	memcg =3D mem_cgroup_iter(NULL, NULL, NULL);
+	do {
+		struct lruvec *lruvec =3D mem_cgroup_lruvec(memcg, pgdat);
+
+		if (lruvec_is_reclaimable(lruvec, sc, min_ttl)) {
+			mem_cgroup_iter_break(NULL, memcg);
+			return;
+		}
+
+		cond_resched();
+	} while ((memcg =3D mem_cgroup_iter(NULL, memcg, NULL)));
+
 	/*
 	 * The main goal is to OOM kill if every generation from all memcgs is
 	 * younger than min_ttl. However, another possibility is all memcgs are
-	 * either below min or empty.
+	 * either too small or below min.
 	 */
 	if (mutex_trylock(&oom_lock)) {
 		struct oom_control oc =3D {
@@ -5114,34 +5118,28 @@ static int evict_folios(struct lruvec *lruvec, stru=
ct scan_control *sc, int swap
  *    reclaim.
  */
 static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_con=
trol *sc,
-				    bool can_swap, bool *need_aging)
+				    bool can_swap)
 {
 	unsigned long nr_to_scan;
 	struct mem_cgroup *memcg =3D lruvec_memcg(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
-	DEFINE_MIN_SEQ(lruvec);
=20
 	if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg) ||
 	    (mem_cgroup_below_low(sc->target_mem_cgroup, memcg) &&
 	     !sc->memcg_low_reclaim))
 		return 0;
=20
-	*need_aging =3D should_run_aging(lruvec, max_seq, min_seq, sc, can_swap, =
&nr_to_scan);
-	if (!*need_aging)
+	if (!should_run_aging(lruvec, max_seq, sc, can_swap, &nr_to_scan))
 		return nr_to_scan;
=20
 	/* skip the aging path at the default priority */
 	if (sc->priority =3D=3D DEF_PRIORITY)
-		goto done;
-
-	/* leave the work to lru_gen_age_node() */
-	if (current_is_kswapd())
-		return 0;
-
-	if (try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false))
 		return nr_to_scan;
-done:
-	return min_seq[!can_swap] + MIN_NR_GENS <=3D max_seq ? nr_to_scan : 0;
+
+	try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false);
+
+	/* skip this lruvec as it's low on cold folios */
+	return 0;
 }
=20
 static unsigned long get_nr_to_reclaim(struct scan_control *sc)
@@ -5160,9 +5158,7 @@ static unsigned long get_nr_to_reclaim(struct scan_co=
ntrol *sc)
 static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_contr=
ol *sc)
 {
 	struct blk_plug plug;
-	bool need_aging =3D false;
 	unsigned long scanned =3D 0;
-	unsigned long reclaimed =3D sc->nr_reclaimed;
 	unsigned long nr_to_reclaim =3D get_nr_to_reclaim(sc);
=20
 	lru_add_drain();
@@ -5183,13 +5179,13 @@ static void lru_gen_shrink_lruvec(struct lruvec *lr=
uvec, struct scan_control *sc
 		else
 			swappiness =3D 0;
=20
-		nr_to_scan =3D get_nr_to_scan(lruvec, sc, swappiness, &need_aging);
+		nr_to_scan =3D get_nr_to_scan(lruvec, sc, swappiness);
 		if (!nr_to_scan)
-			goto done;
+			break;
=20
 		delta =3D evict_folios(lruvec, sc, swappiness);
 		if (!delta)
-			goto done;
+			break;
=20
 		scanned +=3D delta;
 		if (scanned >=3D nr_to_scan)
@@ -5201,10 +5197,6 @@ static void lru_gen_shrink_lruvec(struct lruvec *lru=
vec, struct scan_control *sc
 		cond_resched();
 	}
=20
-	/* see the comment in lru_gen_age_node() */
-	if (sc->nr_reclaimed - reclaimed >=3D MIN_LRU_BATCH && !need_aging)
-		sc->memcgs_need_aging =3D false;
-done:
 	clear_mm_walk();
=20
 	blk_finish_plug(&plug);
--=20
2.39.0.314.g84b9a713c41-goog
From nobody Wed Sep 17 06:42:57 2025
Return-Path: <linux-kernel-owner@vger.kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 73143C4332F
	for <linux-kernel@archiver.kernel.org>; Thu, 22 Dec 2022 04:20:40 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S235109AbiLVEUj (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 21 Dec 2022 23:20:39 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:45034 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S235016AbiLVETu (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 21 Dec 2022 23:19:50 -0500
Received: from mail-il1-x14a.google.com (mail-il1-x14a.google.com
 [IPv6:2607:f8b0:4864:20::14a])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 709A522295
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:48 -0800 (PST)
Received: by mail-il1-x14a.google.com with SMTP id
 l3-20020a056e021aa300b00304be32e9e5so487800ilv.12
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:48 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=S9ZuAyfwqYLskluTxeMZ1Oyj25JaC3jpnfC7bM5WyD4=;
        b=awokrXW2IaKsda0feV8JW/aYhMrUqKzEHQX+f5UsTBU2+8KVpH/XEmCKdVW/3j64+P
         LX4/xKKwf3yGaOd1RMWjLx+SBKbq5b79RaWXbvO3uvqC1rK19G2lTd+XOyNc4OAudt0G
         BNVuAfZmPTiS12q4Pp+vrGp7KjeUeKAD9fMxEibkXqvB9XoQo6+4i1rCFGj5s/wHCCxE
         YvcF1Rui1WagXwkNShtpDS8/0ZLm9UBhXa7tsws0qPh8YIdTW6gtnBpxOGzSDL8IeNie
         vd507zZFTZU4yPE5FV0HqFBmaZ/37IUyHVnBJLcsj1LUs4dMyMtEtQhrG/k+tqR6EKRe
         4JXw==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=S9ZuAyfwqYLskluTxeMZ1Oyj25JaC3jpnfC7bM5WyD4=;
        b=YBzsnGRXmDqbkoWe7sN9lr7nou4wJpv8/SDWWuqjESTPn2F35iuz7pfjvcsGKqFBOj
         hbt7INYrNnsJsdVqH7pMMu+CgnfP9Fcyybc7XWfK0rkgFtS93hZJ5hCmsnBe/0EcObyJ
         kYAYA/4PEI8e8nsWu9mgGVxImnnvPoBVrJ+4dVeQNLpJA+Nv2DuomUyHaG18J7AXVVCB
         kZPFgn5Wbh+gfX+klmegxkRiaTef+kVMCdeb9htSEZ2PKyMWwaj6AtRQtXJ/fG0ICdMy
         ovzrIQBnFVvgcSENLIcafaIgRlpqYGz90mbDRMbjAQIulg85mj0pQ+dhBWT66IAiO3v/
         fJvQ==
X-Gm-Message-State: AFqh2kq6vjOLHLWgSRuEu5YxRWcyV0cnQpPBVyfwA1tn7CLCJkOYh3w9
        8uFFQiTYpB8325Ijf8cmCtaWifZzOdg=
X-Google-Smtp-Source: 
 AMrXdXuEY3YkuSyiXYaG/rF71AtXpvUi99HGEHzJyq69UdzONGfza3jeMcIIHRBjZ3l13oxwVLYTLfNu04M=
X-Received: from yuzhao.bld.corp.google.com
 ([2620:15c:183:200:a463:5f7b:440e:5c77])
 (user=yuzhao job=sendgmr) by 2002:a92:cb42:0:b0:305:eba6:78ab with SMTP id
 f2-20020a92cb42000000b00305eba678abmr300227ilq.316.1671682787850; Wed, 21 Dec
 2022 20:19:47 -0800 (PST)
Date: Wed, 21 Dec 2022 21:19:03 -0700
In-Reply-To: <20221222041905.2431096-1-yuzhao@google.com>
Message-Id: <20221222041905.2431096-6-yuzhao@google.com>
Mime-Version: 1.0
References: <20221222041905.2431096-1-yuzhao@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Subject: [PATCH mm-unstable v3 5/8] mm: multi-gen LRU: shuffle
 should_run_aging()
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
        Jonathan Corbet <corbet@lwn.net>,
        Michael Larabel <michael@michaellarabel.com>,
        Michal Hocko <mhocko@kernel.org>,
        Mike Rapoport <rppt@kernel.org>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Suren Baghdasaryan <surenb@google.com>, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org, linux-mm@google.com,
        Yu Zhao <yuzhao@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Move should_run_aging() next to its only remaining caller.

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/vmscan.c | 124 ++++++++++++++++++++++++++--------------------------
 1 file changed, 62 insertions(+), 62 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 65cc82208b6e..dd9f7b7abe1c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4467,68 +4467,6 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec=
, unsigned long max_seq,
 	return true;
 }
=20
-static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
-			     struct scan_control *sc, bool can_swap, unsigned long *nr_to_scan)
-{
-	int gen, type, zone;
-	unsigned long old =3D 0;
-	unsigned long young =3D 0;
-	unsigned long total =3D 0;
-	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
-	struct mem_cgroup *memcg =3D lruvec_memcg(lruvec);
-	DEFINE_MIN_SEQ(lruvec);
-
-	/* whether this lruvec is completely out of cold folios */
-	if (min_seq[!can_swap] + MIN_NR_GENS > max_seq) {
-		*nr_to_scan =3D 0;
-		return true;
-	}
-
-	for (type =3D !can_swap; type < ANON_AND_FILE; type++) {
-		unsigned long seq;
-
-		for (seq =3D min_seq[type]; seq <=3D max_seq; seq++) {
-			unsigned long size =3D 0;
-
-			gen =3D lru_gen_from_seq(seq);
-
-			for (zone =3D 0; zone < MAX_NR_ZONES; zone++)
-				size +=3D max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
-
-			total +=3D size;
-			if (seq =3D=3D max_seq)
-				young +=3D size;
-			else if (seq + MIN_NR_GENS =3D=3D max_seq)
-				old +=3D size;
-		}
-	}
-
-	/* try to scrape all its memory if this memcg was deleted */
-	*nr_to_scan =3D mem_cgroup_online(memcg) ? (total >> sc->priority) : tota=
l;
-
-	/*
-	 * The aging tries to be lazy to reduce the overhead, while the eviction
-	 * stalls when the number of generations reaches MIN_NR_GENS. Hence, the
-	 * ideal number of generations is MIN_NR_GENS+1.
-	 */
-	if (min_seq[!can_swap] + MIN_NR_GENS < max_seq)
-		return false;
-
-	/*
-	 * It's also ideal to spread pages out evenly, i.e., 1/(MIN_NR_GENS+1)
-	 * of the total number of pages for each generation. A reasonable range
-	 * for this average portion is [1/MIN_NR_GENS, 1/(MIN_NR_GENS+2)]. The
-	 * aging cares about the upper bound of hot pages, while the eviction
-	 * cares about the lower bound of cold pages.
-	 */
-	if (young * MIN_NR_GENS > total)
-		return true;
-	if (old * (MIN_NR_GENS + 2) < total)
-		return true;
-
-	return false;
-}
-
 static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *=
sc)
 {
 	int gen, type, zone;
@@ -5112,6 +5050,68 @@ static int evict_folios(struct lruvec *lruvec, struc=
t scan_control *sc, int swap
 	return scanned;
 }
=20
+static bool should_run_aging(struct lruvec *lruvec, unsigned long max_seq,
+			     struct scan_control *sc, bool can_swap, unsigned long *nr_to_scan)
+{
+	int gen, type, zone;
+	unsigned long old =3D 0;
+	unsigned long young =3D 0;
+	unsigned long total =3D 0;
+	struct lru_gen_folio *lrugen =3D &lruvec->lrugen;
+	struct mem_cgroup *memcg =3D lruvec_memcg(lruvec);
+	DEFINE_MIN_SEQ(lruvec);
+
+	/* whether this lruvec is completely out of cold folios */
+	if (min_seq[!can_swap] + MIN_NR_GENS > max_seq) {
+		*nr_to_scan =3D 0;
+		return true;
+	}
+
+	for (type =3D !can_swap; type < ANON_AND_FILE; type++) {
+		unsigned long seq;
+
+		for (seq =3D min_seq[type]; seq <=3D max_seq; seq++) {
+			unsigned long size =3D 0;
+
+			gen =3D lru_gen_from_seq(seq);
+
+			for (zone =3D 0; zone < MAX_NR_ZONES; zone++)
+				size +=3D max(READ_ONCE(lrugen->nr_pages[gen][type][zone]), 0L);
+
+			total +=3D size;
+			if (seq =3D=3D max_seq)
+				young +=3D size;
+			else if (seq + MIN_NR_GENS =3D=3D max_seq)
+				old +=3D size;
+		}
+	}
+
+	/* try to scrape all its memory if this memcg was deleted */
+	*nr_to_scan =3D mem_cgroup_online(memcg) ? (total >> sc->priority) : tota=
l;
+
+	/*
+	 * The aging tries to be lazy to reduce the overhead, while the eviction
+	 * stalls when the number of generations reaches MIN_NR_GENS. Hence, the
+	 * ideal number of generations is MIN_NR_GENS+1.
+	 */
+	if (min_seq[!can_swap] + MIN_NR_GENS < max_seq)
+		return false;
+
+	/*
+	 * It's also ideal to spread pages out evenly, i.e., 1/(MIN_NR_GENS+1)
+	 * of the total number of pages for each generation. A reasonable range
+	 * for this average portion is [1/MIN_NR_GENS, 1/(MIN_NR_GENS+2)]. The
+	 * aging cares about the upper bound of hot pages, while the eviction
+	 * cares about the lower bound of cold pages.
+	 */
+	if (young * MIN_NR_GENS > total)
+		return true;
+	if (old * (MIN_NR_GENS + 2) < total)
+		return true;
+
+	return false;
+}
+
 /*
  * For future optimizations:
  * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
--=20
2.39.0.314.g84b9a713c41-goog
From nobody Wed Sep 17 06:42:57 2025
Return-Path: <linux-kernel-owner@vger.kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 6E404C4332F
	for <linux-kernel@archiver.kernel.org>; Thu, 22 Dec 2022 04:20:46 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S235111AbiLVEUp (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 21 Dec 2022 23:20:45 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44818 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S231897AbiLVETw (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 21 Dec 2022 23:19:52 -0500
Received: from mail-yb1-xb4a.google.com (mail-yb1-xb4a.google.com
 [IPv6:2607:f8b0:4864:20::b4a])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 24EBD220CE
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:50 -0800 (PST)
Received: by mail-yb1-xb4a.google.com with SMTP id
 h66-20020a252145000000b0071a7340eea9so691631ybh.6
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:50 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=QjWoCnNqJvghgGM5xTq+UZn44MWyzd/ooNIjEotcyzA=;
        b=gVk09z+meyISg3VNqxt5i7DDOmbulc2ogiSqR7HhEgg8qNueR1ZAOmFWTdW3mwkeeQ
         IvEDmXNlqlKFrpRcz2ljGW8kWPN+h3qdjnYAptEnAV1jH0bDsA4vYJ+91Sb2idREYo1j
         +efmSy6zYsa+g/tMMmiPYuElmaiJ47XK9c8G/dbPrOgQ/EpDV0o1wlf4494YHu86U1yL
         8YMhav2fmfTLN/no9ZGrYYAK7rxYzs/9S4GrQyAQoJv2WmMKhBQxAFt1Y6im40/IF1bC
         +/7vaB8LjBU2OoDnzmMmg82U7BAcrIC+s33gtUCKePMPVtx0OEr9dxpmGjmEGk+9bjdJ
         JtLg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=QjWoCnNqJvghgGM5xTq+UZn44MWyzd/ooNIjEotcyzA=;
        b=XI81OSqM2oXOHJexlPO9L87ckHZnUPJiOCJRV7QlYbs5vxuAEYY421d2FnbwSsGvuT
         DLdWpVu2WLJBtJVTzVEBn2LpzInb+0KLlzzOjcTjEvARu5Rlr6f1Q9fqExK+e/Igj73a
         yMkV/YJoB9VAhpYHGQdSoLMrvQmk7aBNFXOOSLwSfp/s9kS5VioAGjn3c7MinwHMq8uj
         1Jds+fzSTBp3d30vTLXgFUDgfIERnB7alTQ6jBRt1DLTF+pkFbJNg/yaBiRguZGYAvtm
         iLcKiB6CPiRgDD202powwfjPKrLBcSAtYwZC9TW50NTBdQM9xnUKt4cdG+4urQZiw0qP
         aj3w==
X-Gm-Message-State: AFqh2krPzHri6AK/HVhLybVoFNb2ecfVBPsLTdBZR7TE7bsAqA3K7FRe
        sDYrJybrCl2jz6LklUwLCZs0Fh6+4ok=
X-Google-Smtp-Source: 
 AMrXdXu6MLU36MR7Wmyx5EZalLxbE9elTeO/qB32Kp9eR42Wvc0UXcJxCD+Vc4NU6y1HfeFn+akh/l3nnzU=
X-Received: from yuzhao.bld.corp.google.com
 ([2620:15c:183:200:a463:5f7b:440e:5c77])
 (user=yuzhao job=sendgmr) by 2002:a25:bc8b:0:b0:6fb:efbb:f588 with SMTP id
 e11-20020a25bc8b000000b006fbefbbf588mr429922ybk.395.1671682789357; Wed, 21
 Dec 2022 20:19:49 -0800 (PST)
Date: Wed, 21 Dec 2022 21:19:04 -0700
In-Reply-To: <20221222041905.2431096-1-yuzhao@google.com>
Message-Id: <20221222041905.2431096-7-yuzhao@google.com>
Mime-Version: 1.0
References: <20221222041905.2431096-1-yuzhao@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Subject: [PATCH mm-unstable v3 6/8] mm: multi-gen LRU: per-node lru_gen_folio
 lists
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
        Jonathan Corbet <corbet@lwn.net>,
        Michael Larabel <michael@michaellarabel.com>,
        Michal Hocko <mhocko@kernel.org>,
        Mike Rapoport <rppt@kernel.org>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Suren Baghdasaryan <surenb@google.com>, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org, linux-mm@google.com,
        Yu Zhao <yuzhao@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

For each node, memcgs are divided into two generations: the old and
the young. For each generation, memcgs are randomly sharded into
multiple bins to improve scalability. For each bin, an RCU hlist_nulls
is virtually divided into three segments: the head, the tail and the
default.

An onlining memcg is added to the tail of a random bin in the old
generation. The eviction starts at the head of a random bin in the old
generation. The per-node memcg generation counter, whose remainder (mod
2) indexes the old generation, is incremented when all bins of the old
generation become empty.
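
As a rough illustration of the counter described above, the following
userspace sketch models a single node. The names memcg_lru_model,
old_gen(), young_gen() and move_to_young() are hypothetical and are not
part of this patch; bins, locking and RCU are deliberately left out.

```c
#include <assert.h>

#define NR_GENS 2	/* mirrors MEMCG_NR_GENS in this patch */

/* hypothetical, simplified stand-in for the per-node state */
struct memcg_lru_model {
	unsigned long seq;		/* per-node generation counter */
	unsigned long nr[NR_GENS];	/* memcgs in each generation */
};

/* the remainder of seq indexes the old generation */
static int old_gen(const struct memcg_lru_model *m)
{
	return m->seq % NR_GENS;
}

/* the other slot is the young generation */
static int young_gen(const struct memcg_lru_model *m)
{
	return (m->seq + 1) % NR_GENS;
}

/* move one memcg from the old generation to the young one */
static void move_to_young(struct memcg_lru_model *m)
{
	int old = old_gen(m);
	int young = young_gen(m);

	assert(m->nr[old] > 0);

	m->nr[old]--;
	m->nr[young]++;

	/* once the old generation drains, the young one becomes the old one */
	if (!m->nr[old])
		m->seq++;
}
```

With two memcgs in the old generation, the first call leaves seq
unchanged and the second one increments it, at which point the former
young generation is indexed as the old one.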

There are four operations:
1. MEMCG_LRU_HEAD, which moves a memcg to the head of a random bin in
   its current generation (old or young) and updates its "seg" to
   "head";
2. MEMCG_LRU_TAIL, which moves a memcg to the tail of a random bin in
   its current generation (old or young) and updates its "seg" to
   "tail";
3. MEMCG_LRU_OLD, which moves a memcg to the head of a random bin in
   the old generation, updates its "gen" to "old" and resets its "seg"
   to "default";
4. MEMCG_LRU_YOUNG, which moves a memcg to the tail of a random bin
   in the young generation, updates its "gen" to "young" and resets
   its "seg" to "default".
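
The four operations can be summarized by where a memcg ends up. The
sketch below is a userspace model only: struct placement and rotate()
are hypothetical names, the OP_* constants merely follow the order of
the MEMCG_LRU_* enum, and the random bin selection and list
manipulation done by lru_gen_rotate_memcg() are omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_GENS 2	/* mirrors MEMCG_NR_GENS in this patch */

/* same order as the MEMCG_LRU_* enum in this patch */
enum { OP_NOP, OP_HEAD, OP_TAIL, OP_OLD, OP_YOUNG };

/* hypothetical description of the destination of a rotation */
struct placement {
	int gen;	/* index of the destination generation */
	int seg;	/* 0 (default), OP_HEAD or OP_TAIL */
	bool at_head;	/* inserted at the head (true) or tail (false) */
};

static struct placement rotate(unsigned long seq, int cur_gen, int op)
{
	/* by default, stay in the current generation with "seg" reset */
	struct placement p = { .gen = cur_gen, .seg = 0, .at_head = false };

	assert(op >= OP_HEAD && op <= OP_YOUNG);

	if (op == OP_HEAD)
		p.seg = OP_HEAD;
	else if (op == OP_TAIL)
		p.seg = OP_TAIL;
	else if (op == OP_OLD)
		p.gen = seq % NR_GENS;
	else
		p.gen = (seq + 1) % NR_GENS;

	/* HEAD and OLD insert at the head; TAIL and YOUNG at the tail */
	p.at_head = op == OP_HEAD || op == OP_OLD;

	return p;
}
```

For example, with seq equal to 4, OP_OLD places a memcg at the head of
generation 0 and OP_YOUNG places it at the tail of generation 1, both
with "seg" reset to the default.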

The events that trigger the above operations are:
1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
2. The first attempt to reclaim a memcg below low, which triggers
   MEMCG_LRU_TAIL;
3. The first attempt to reclaim a memcg below the reclaimable size
   threshold, which triggers MEMCG_LRU_TAIL;
4. The second attempt to reclaim a memcg below the reclaimable size
   threshold, which triggers MEMCG_LRU_YOUNG;
5. Attempting to reclaim a memcg below min, which triggers
   MEMCG_LRU_YOUNG;
6. Finishing the aging on the eviction path, which triggers
   MEMCG_LRU_YOUNG;
7. Offlining a memcg, which triggers MEMCG_LRU_OLD.
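
Events 2 to 6 are all decided on the eviction path. The sketch below
models that decision in userspace, loosely following shrink_one() in
this patch. struct memcg_state and next_op() are hypothetical names,
and the boolean fields stand in for lruvec_is_sizable(),
mem_cgroup_below_min(), mem_cgroup_below_low() and the return value of
try_to_shrink_lruvec(); no reclaim is actually performed.

```c
#include <assert.h>
#include <stdbool.h>

/* same order as the MEMCG_LRU_* enum in this patch */
enum { OP_NOP, OP_HEAD, OP_TAIL, OP_OLD, OP_YOUNG };

/* hypothetical snapshot of what the eviction path observes */
struct memcg_state {
	int seg;	/* current segment: 0, OP_HEAD or OP_TAIL */
	bool sizable;	/* at or above the reclaimable size threshold */
	bool below_min;	/* usage is below min */
	bool below_low;	/* usage is below low */
	bool aged;	/* the aging finished during this attempt */
};

static int next_op(const struct memcg_state *s)
{
	assert(s);

	/* events 3 and 4: first attempt goes to the tail, second to young */
	if (!s->sizable)
		return s->seg != OP_TAIL ? OP_TAIL : OP_YOUNG;

	/* event 5 */
	if (s->below_min)
		return OP_YOUNG;

	/* event 2: only the first attempt rotates; the second one reclaims */
	if (s->below_low && s->seg != OP_TAIL)
		return OP_TAIL;

	/* the real code reclaims here; event 6 if the aging finished */
	return s->aged ? OP_YOUNG : OP_NOP;
}
```

Events 1 and 7 are not modeled here because they are raised from
mem_cgroup_update_tree() and the offlining path rather than from the
eviction path.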

Note that the memcg LRU applies only to global reclaim, and the
round-robin incrementing of the memcgs' max_seq counters ensures
eventual fairness to all eligible memcgs. Memcg reclaim still relies on
mem_cgroup_iter().

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 include/linux/memcontrol.h |  10 +
 include/linux/mm_inline.h  |  17 ++
 include/linux/mmzone.h     | 117 +++++++++++-
 mm/memcontrol.c            |  16 ++
 mm/page_alloc.c            |   1 +
 mm/vmscan.c                | 374 +++++++++++++++++++++++++++++++++----
 6 files changed, 500 insertions(+), 35 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index d3c8203cab6c..2e08b05bc6bf 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -794,6 +794,11 @@ static inline void obj_cgroup_put(struct obj_cgroup *o=
bjcg)
 	percpu_ref_put(&objcg->refcnt);
 }
=20
+static inline bool mem_cgroup_tryget(struct mem_cgroup *memcg)
+{
+	return !memcg || css_tryget(&memcg->css);
+}
+
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 	if (memcg)
@@ -1301,6 +1306,11 @@ static inline void obj_cgroup_put(struct obj_cgroup =
*objcg)
 {
 }
=20
+static inline bool mem_cgroup_tryget(struct mem_cgroup *memcg)
+{
+	return true;
+}
+
 static inline void mem_cgroup_put(struct mem_cgroup *memcg)
 {
 }
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index da38e3d962e2..c1fd3922dc5d 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -122,6 +122,18 @@ static inline bool lru_gen_in_fault(void)
 	return current->in_lru_fault;
 }
=20
+#ifdef CONFIG_MEMCG
+static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+	return READ_ONCE(lruvec->lrugen.seg);
+}
+#else
+static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+	return 0;
+}
+#endif
+
 static inline int lru_gen_from_seq(unsigned long seq)
 {
 	return seq % MAX_NR_GENS;
@@ -297,6 +309,11 @@ static inline bool lru_gen_in_fault(void)
 	return false;
 }
=20
+static inline int lru_gen_memcg_seg(struct lruvec *lruvec)
+{
+	return 0;
+}
+
 static inline bool lru_gen_add_folio(struct lruvec *lruvec, struct folio *=
folio, bool reclaiming)
 {
 	return false;
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6c96ee823dbd..815c7c2edf45 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -7,6 +7,7 @@
=20
 #include <linux/spinlock.h>
 #include <linux/list.h>
+#include <linux/list_nulls.h>
 #include <linux/wait.h>
 #include <linux/bitops.h>
 #include <linux/cache.h>
@@ -367,6 +368,15 @@ struct page_vma_mapped_walk;
 #define LRU_GEN_MASK		((BIT(LRU_GEN_WIDTH) - 1) << LRU_GEN_PGOFF)
 #define LRU_REFS_MASK		((BIT(LRU_REFS_WIDTH) - 1) << LRU_REFS_PGOFF)
=20
+/* see the comment on MEMCG_NR_GENS */
+enum {
+	MEMCG_LRU_NOP,
+	MEMCG_LRU_HEAD,
+	MEMCG_LRU_TAIL,
+	MEMCG_LRU_OLD,
+	MEMCG_LRU_YOUNG,
+};
+
 #ifdef CONFIG_LRU_GEN
=20
 enum {
@@ -426,6 +436,14 @@ struct lru_gen_folio {
 	atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS];
 	/* whether the multi-gen LRU is enabled */
 	bool enabled;
+#ifdef CONFIG_MEMCG
+	/* the memcg generation this lru_gen_folio belongs to */
+	u8 gen;
+	/* the list segment this lru_gen_folio belongs to */
+	u8 seg;
+	/* per-node lru_gen_folio list for global reclaim */
+	struct hlist_nulls_node list;
+#endif
 };
=20
 enum {
@@ -479,12 +497,87 @@ void lru_gen_init_lruvec(struct lruvec *lruvec);
 void lru_gen_look_around(struct page_vma_mapped_walk *pvmw);
=20
 #ifdef CONFIG_MEMCG
+
+/*
+ * For each node, memcgs are divided into two generations: the old and the
+ * young. For each generation, memcgs are randomly sharded into multiple b=
ins
+ * to improve scalability. For each bin, the hlist_nulls is virtually divi=
ded
+ * into three segments: the head, the tail and the default.
+ *
+ * An onlining memcg is added to the tail of a random bin in the old gener=
ation.
+ * The eviction starts at the head of a random bin in the old generation. =
The
+ * per-node memcg generation counter, whose remainder (mod MEMCG_NR_GENS) =
indexes
+ * the old generation, is incremented when all its bins become empty.
+ *
+ * There are four operations:
+ * 1. MEMCG_LRU_HEAD, which moves an memcg to the head of a random bin in =
its
+ *    current generation (old or young) and updates its "seg" to "head";
+ * 2. MEMCG_LRU_TAIL, which moves an memcg to the tail of a random bin in =
its
+ *    current generation (old or young) and updates its "seg" to "tail";
+ * 3. MEMCG_LRU_OLD, which moves an memcg to the head of a random bin in t=
he old
+ *    generation, updates its "gen" to "old" and resets its "seg" to "defa=
ult";
+ * 4. MEMCG_LRU_YOUNG, which moves an memcg to the tail of a random bin in=
 the
+ *    young generation, updates its "gen" to "young" and resets its "seg" =
to
+ *    "default".
+ *
+ * The events that trigger the above operations are:
+ * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
+ * 2. The first attempt to reclaim an memcg below low, which triggers
+ *    MEMCG_LRU_TAIL;
+ * 3. The first attempt to reclaim an memcg below reclaimable size thresho=
ld,
+ *    which triggers MEMCG_LRU_TAIL;
+ * 4. The second attempt to reclaim an memcg below reclaimable size thresh=
old,
+ *    which triggers MEMCG_LRU_YOUNG;
+ * 5. Attempting to reclaim an memcg below min, which triggers MEMCG_LRU_Y=
OUNG;
+ * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_Y=
OUNG;
+ * 7. Offlining an memcg, which triggers MEMCG_LRU_OLD.
+ *
+ * Note that memcg LRU only applies to global reclaim, and the round-robin
+ * incrementing of their max_seq counters ensures the eventual fairness to=
 all
+ * eligible memcgs. For memcg reclaim, it still relies on mem_cgroup_iter(=
).
+ */
+#define MEMCG_NR_GENS	2
+#define MEMCG_NR_BINS	8
+
+struct lru_gen_memcg {
+	/* the per-node memcg generation counter */
+	unsigned long seq;
+	/* each memcg has one lru_gen_folio per node */
+	unsigned long nr_memcgs[MEMCG_NR_GENS];
+	/* per-node lru_gen_folio list for global reclaim */
+	struct hlist_nulls_head	fifo[MEMCG_NR_GENS][MEMCG_NR_BINS];
+	/* protects the above */
+	spinlock_t lock;
+};
+
+void lru_gen_init_pgdat(struct pglist_data *pgdat);
+
 void lru_gen_init_memcg(struct mem_cgroup *memcg);
 void lru_gen_exit_memcg(struct mem_cgroup *memcg);
-#endif
+void lru_gen_online_memcg(struct mem_cgroup *memcg);
+void lru_gen_offline_memcg(struct mem_cgroup *memcg);
+void lru_gen_release_memcg(struct mem_cgroup *memcg);
+void lru_gen_rotate_memcg(struct lruvec *lruvec, int op);
+
+#else /* !CONFIG_MEMCG */
+
+#define MEMCG_NR_GENS	1
+
+struct lru_gen_memcg {
+};
+
+static inline void lru_gen_init_pgdat(struct pglist_data *pgdat)
+{
+}
+
+#endif /* CONFIG_MEMCG */
=20
 #else /* !CONFIG_LRU_GEN */
=20
+static inline void lru_gen_init_pgdat(struct pglist_data *pgdat)
+{
+}
+
 static inline void lru_gen_init_lruvec(struct lruvec *lruvec)
 {
 }
@@ -494,6 +587,7 @@ static inline void lru_gen_look_around(struct page_vma_=
mapped_walk *pvmw)
 }
=20
 #ifdef CONFIG_MEMCG
+
 static inline void lru_gen_init_memcg(struct mem_cgroup *memcg)
 {
 }
@@ -501,7 +595,24 @@ static inline void lru_gen_init_memcg(struct mem_cgrou=
p *memcg)
 static inline void lru_gen_exit_memcg(struct mem_cgroup *memcg)
 {
 }
-#endif
+
+static inline void lru_gen_online_memcg(struct mem_cgroup *memcg)
+{
+}
+
+static inline void lru_gen_offline_memcg(struct mem_cgroup *memcg)
+{
+}
+
+static inline void lru_gen_release_memcg(struct mem_cgroup *memcg)
+{
+}
+
+static inline void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+{
+}
+
+#endif /* CONFIG_MEMCG */
=20
 #endif /* CONFIG_LRU_GEN */
=20
@@ -1243,6 +1354,8 @@ typedef struct pglist_data {
 #ifdef CONFIG_LRU_GEN
 	/* kswap mm walk data */
 	struct lru_gen_mm_walk	mm_walk;
+	/* lru_gen_folio list */
+	struct lru_gen_memcg memcg_lru;
 #endif
=20
 	CACHELINE_PADDING(_pad2_);
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 92f319ef6c99..36200a1a448f 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -477,6 +477,16 @@ static void mem_cgroup_update_tree(struct mem_cgroup *=
memcg, int nid)
 	struct mem_cgroup_per_node *mz;
 	struct mem_cgroup_tree_per_node *mctz;
=20
+	if (lru_gen_enabled()) {
+		struct lruvec *lruvec =3D &memcg->nodeinfo[nid]->lruvec;
+
+		/* see the comment on MEMCG_NR_GENS */
+		if (soft_limit_excess(memcg) && lru_gen_memcg_seg(lruvec) !=3D MEMCG_LRU=
_HEAD)
+			lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
+
+		return;
+	}
+
 	mctz =3D soft_limit_tree.rb_tree_per_node[nid];
 	if (!mctz)
 		return;
@@ -3526,6 +3536,9 @@ unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t=
 *pgdat, int order,
 	struct mem_cgroup_tree_per_node *mctz;
 	unsigned long excess;
=20
+	if (lru_gen_enabled())
+		return 0;
+
 	if (order > 0)
 		return 0;
=20
@@ -5386,6 +5399,7 @@ static int mem_cgroup_css_online(struct cgroup_subsys=
_state *css)
 	if (unlikely(mem_cgroup_is_root(memcg)))
 		queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
 				   2UL*HZ);
+	lru_gen_online_memcg(memcg);
 	return 0;
 offline_kmem:
 	memcg_offline_kmem(memcg);
@@ -5417,6 +5431,7 @@ static void mem_cgroup_css_offline(struct cgroup_subs=
ys_state *css)
 	memcg_offline_kmem(memcg);
 	reparent_shrinker_deferred(memcg);
 	wb_memcg_offline(memcg);
+	lru_gen_offline_memcg(memcg);
=20
 	drain_all_stock(memcg);
=20
@@ -5428,6 +5443,7 @@ static void mem_cgroup_css_released(struct cgroup_sub=
sys_state *css)
 	struct mem_cgroup *memcg =3D mem_cgroup_from_css(css);
=20
 	invalidate_reclaim_iterators(memcg);
+	lru_gen_release_memcg(memcg);
 }
=20
 static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7d980dc0000e..5668c1a2de49 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7941,6 +7941,7 @@ static void __init free_area_init_node(int nid)
 	pgdat_set_deferred_range(pgdat);
=20
 	free_area_init_core(pgdat);
+	lru_gen_init_pgdat(pgdat);
 }
=20
 static void __init free_area_init_memoryless_node(int nid)
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dd9f7b7abe1c..f22c8876473e 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -55,6 +55,8 @@
 #include <linux/ctype.h>
 #include <linux/debugfs.h>
 #include <linux/khugepaged.h>
+#include <linux/rculist_nulls.h>
+#include <linux/random.h>
=20
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -135,11 +137,6 @@ struct scan_control {
 	/* Always discard instead of demoting to lower tier memory */
 	unsigned int no_demotion:1;
=20
-#ifdef CONFIG_LRU_GEN
-	/* help kswapd make better choices among multiple memcgs */
-	unsigned long last_reclaimed;
-#endif
-
 	/* Allocation order */
 	s8 order;
=20
@@ -3185,6 +3182,9 @@ DEFINE_STATIC_KEY_ARRAY_FALSE(lru_gen_caps, NR_LRU_GE=
N_CAPS);
 		for ((type) =3D 0; (type) < ANON_AND_FILE; (type)++)	\
 			for ((zone) =3D 0; (zone) < MAX_NR_ZONES; (zone)++)
=20
+#define get_memcg_gen(seq)	((seq) % MEMCG_NR_GENS)
+#define get_memcg_bin(bin)	((bin) % MEMCG_NR_BINS)
+
 static struct lruvec *get_lruvec(struct mem_cgroup *memcg, int nid)
 {
 	struct pglist_data *pgdat =3D NODE_DATA(nid);
@@ -4453,8 +4453,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec,=
 unsigned long max_seq,
 		if (sc->priority <=3D DEF_PRIORITY - 2)
 			wait_event_killable(lruvec->mm_state.wait,
 					    max_seq < READ_ONCE(lrugen->max_seq));
-
-		return max_seq < READ_ONCE(lrugen->max_seq);
+		return false;
 	}
=20
 	VM_WARN_ON_ONCE(max_seq !=3D READ_ONCE(lrugen->max_seq));
@@ -4527,8 +4526,6 @@ static void lru_gen_age_node(struct pglist_data *pgda=
t, struct scan_control *sc)
=20
 	VM_WARN_ON_ONCE(!current_is_kswapd());
=20
-	sc->last_reclaimed =3D sc->nr_reclaimed;
-
 	/* check the order to exclude compaction-induced reclaim */
 	if (!min_ttl || sc->order || sc->priority =3D=3D DEF_PRIORITY)
 		return;
@@ -5117,8 +5114,7 @@ static bool should_run_aging(struct lruvec *lruvec, u=
nsigned long max_seq,
  * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
  *    reclaim.
  */
-static unsigned long get_nr_to_scan(struct lruvec *lruvec, struct scan_con=
trol *sc,
-				    bool can_swap)
+static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc,=
 bool can_swap)
 {
 	unsigned long nr_to_scan;
 	struct mem_cgroup *memcg =3D lruvec_memcg(lruvec);
@@ -5136,10 +5132,8 @@ static unsigned long get_nr_to_scan(struct lruvec *l=
ruvec, struct scan_control *
 	if (sc->priority =3D=3D DEF_PRIORITY)
 		return nr_to_scan;
=20
-	try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false);
-
 	/* skip this lruvec as it's low on cold folios */
-	return 0;
+	return try_to_inc_max_seq(lruvec, max_seq, sc, can_swap, false) ? -1 : 0;
 }
=20
 static unsigned long get_nr_to_reclaim(struct scan_control *sc)
@@ -5148,29 +5142,18 @@ static unsigned long get_nr_to_reclaim(struct scan_=
control *sc)
 	if (!global_reclaim(sc))
 		return -1;
=20
-	/* discount the previous progress for kswapd */
-	if (current_is_kswapd())
-		return sc->nr_to_reclaim + sc->last_reclaimed;
-
 	return max(sc->nr_to_reclaim, compact_gap(sc->order));
 }
=20
-static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_contr=
ol *sc)
+static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_contro=
l *sc)
 {
-	struct blk_plug plug;
+	long nr_to_scan;
 	unsigned long scanned =3D 0;
 	unsigned long nr_to_reclaim =3D get_nr_to_reclaim(sc);
=20
-	lru_add_drain();
-
-	blk_start_plug(&plug);
-
-	set_mm_walk(lruvec_pgdat(lruvec));
-
 	while (true) {
 		int delta;
 		int swappiness;
-		unsigned long nr_to_scan;
=20
 		if (sc->may_swap)
 			swappiness =3D get_swappiness(lruvec, sc);
@@ -5180,7 +5163,7 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruv=
ec, struct scan_control *sc
 			swappiness =3D 0;
=20
 		nr_to_scan =3D get_nr_to_scan(lruvec, sc, swappiness);
-		if (!nr_to_scan)
+		if (nr_to_scan <=3D 0)
 			break;
=20
 		delta =3D evict_folios(lruvec, sc, swappiness);
@@ -5197,10 +5180,251 @@ static void lru_gen_shrink_lruvec(struct lruvec *l=
ruvec, struct scan_control *sc
 		cond_resched();
 	}
=20
+	/* whether try_to_inc_max_seq() was successful */
+	return nr_to_scan < 0;
+}
+
+static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
+{
+	bool success;
+	unsigned long scanned =3D sc->nr_scanned;
+	unsigned long reclaimed =3D sc->nr_reclaimed;
+	int seg =3D lru_gen_memcg_seg(lruvec);
+	struct mem_cgroup *memcg =3D lruvec_memcg(lruvec);
+	struct pglist_data *pgdat =3D lruvec_pgdat(lruvec);
+
+	/* see the comment on MEMCG_NR_GENS */
+	if (!lruvec_is_sizable(lruvec, sc))
+		return seg !=3D MEMCG_LRU_TAIL ? MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
+
+	mem_cgroup_calculate_protection(NULL, memcg);
+
+	if (mem_cgroup_below_min(NULL, memcg))
+		return MEMCG_LRU_YOUNG;
+
+	if (mem_cgroup_below_low(NULL, memcg)) {
+		/* see the comment on MEMCG_NR_GENS */
+		if (seg !=3D MEMCG_LRU_TAIL)
+			return MEMCG_LRU_TAIL;
+
+		memcg_memory_event(memcg, MEMCG_LOW);
+	}
+
+	success =3D try_to_shrink_lruvec(lruvec, sc);
+
+	shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
+
+	if (!sc->proactive)
+		vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
+			   sc->nr_reclaimed - reclaimed);
+
+	sc->nr_reclaimed +=3D current->reclaim_state->reclaimed_slab;
+	current->reclaim_state->reclaimed_slab =3D 0;
+
+	return success ? MEMCG_LRU_YOUNG : 0;
+}
+
+#ifdef CONFIG_MEMCG
+
+static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
+{
+	int gen;
+	int bin;
+	int first_bin;
+	struct lruvec *lruvec;
+	struct lru_gen_folio *lrugen;
+	const struct hlist_nulls_node *pos;
+	int op =3D 0;
+	struct mem_cgroup *memcg =3D NULL;
+	unsigned long nr_to_reclaim =3D get_nr_to_reclaim(sc);
+
+	bin =3D first_bin =3D get_random_u32_below(MEMCG_NR_BINS);
+restart:
+	gen =3D get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
+
+	rcu_read_lock();
+
+	hlist_nulls_for_each_entry_rcu(lrugen, pos, &pgdat->memcg_lru.fifo[gen][b=
in], list) {
+		if (op)
+			lru_gen_rotate_memcg(lruvec, op);
+
+		mem_cgroup_put(memcg);
+
+		lruvec =3D container_of(lrugen, struct lruvec, lrugen);
+		memcg =3D lruvec_memcg(lruvec);
+
+		if (!mem_cgroup_tryget(memcg)) {
+			op =3D 0;
+			memcg =3D NULL;
+			continue;
+		}
+
+		rcu_read_unlock();
+
+		op =3D shrink_one(lruvec, sc);
+
+		if (sc->nr_reclaimed >=3D nr_to_reclaim)
+			goto success;
+
+		rcu_read_lock();
+	}
+
+	rcu_read_unlock();
+
+	/* restart if raced with lru_gen_rotate_memcg() */
+	if (gen !=3D get_nulls_value(pos))
+		goto restart;
+
+	/* try the rest of the bins of the current generation */
+	bin =3D get_memcg_bin(bin + 1);
+	if (bin !=3D first_bin)
+		goto restart;
+success:
+	if (op)
+		lru_gen_rotate_memcg(lruvec, op);
+
+	mem_cgroup_put(memcg);
+}
+
+static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_contr=
ol *sc)
+{
+	struct blk_plug plug;
+
+	VM_WARN_ON_ONCE(global_reclaim(sc));
+
+	lru_add_drain();
+
+	blk_start_plug(&plug);
+
+	set_mm_walk(lruvec_pgdat(lruvec));
+
+	if (try_to_shrink_lruvec(lruvec, sc))
+		lru_gen_rotate_memcg(lruvec, MEMCG_LRU_YOUNG);
+
+	clear_mm_walk();
+
+	blk_finish_plug(&plug);
+}
+
+#else /* !CONFIG_MEMCG */
+
+static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
+{
+	BUILD_BUG();
+}
+
+static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_contr=
ol *sc)
+{
+	BUILD_BUG();
+}
+
+#endif
+
+static void set_initial_priority(struct pglist_data *pgdat, struct scan_co=
ntrol *sc)
+{
+	int priority;
+	unsigned long reclaimable;
+	struct lruvec *lruvec =3D mem_cgroup_lruvec(NULL, pgdat);
+
+	if (sc->priority !=3D DEF_PRIORITY || sc->nr_to_reclaim < MIN_LRU_BATCH)
+		return;
+	/*
+	 * Determine the initial priority based on ((total / MEMCG_NR_GENS) >>
+	 * priority) * reclaimed_to_scanned_ratio =3D nr_to_reclaim, where the
+	 * estimated reclaimed_to_scanned_ratio =3D inactive / total.
+	 */
+	reclaimable =3D node_page_state(pgdat, NR_INACTIVE_FILE);
+	if (get_swappiness(lruvec, sc))
+		reclaimable +=3D node_page_state(pgdat, NR_INACTIVE_ANON);
+
+	reclaimable /=3D MEMCG_NR_GENS;
+
+	/* round down reclaimable and round up sc->nr_to_reclaim */
+	priority =3D fls_long(reclaimable) - 1 - fls_long(sc->nr_to_reclaim - 1);
+
+	sc->priority =3D clamp(priority, 0, DEF_PRIORITY);
+}
+
+static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_con=
trol *sc)
+{
+	struct blk_plug plug;
+	unsigned long reclaimed =3D sc->nr_reclaimed;
+
+	VM_WARN_ON_ONCE(!global_reclaim(sc));
+
+	lru_add_drain();
+
+	blk_start_plug(&plug);
+
+	set_mm_walk(pgdat);
+
+	set_initial_priority(pgdat, sc);
+
+	if (current_is_kswapd())
+		sc->nr_reclaimed =3D 0;
+
+	if (mem_cgroup_disabled())
+		shrink_one(&pgdat->__lruvec, sc);
+	else
+		shrink_many(pgdat, sc);
+
+	if (current_is_kswapd())
+		sc->nr_reclaimed +=3D reclaimed;
+
 	clear_mm_walk();
=20
 	blk_finish_plug(&plug);
+
+	/* kswapd should never fail */
+	pgdat->kswapd_failures =3D 0;
+}
+
+#ifdef CONFIG_MEMCG
+void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
+{
+	int seg;
+	int old, new;
+	int bin =3D get_random_u32_below(MEMCG_NR_BINS);
+	struct pglist_data *pgdat =3D lruvec_pgdat(lruvec);
+
+	spin_lock(&pgdat->memcg_lru.lock);
+
+	VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+	seg =3D 0;
+	new =3D old =3D lruvec->lrugen.gen;
+
+	/* see the comment on MEMCG_NR_GENS */
+	if (op =3D=3D MEMCG_LRU_HEAD)
+		seg =3D MEMCG_LRU_HEAD;
+	else if (op =3D=3D MEMCG_LRU_TAIL)
+		seg =3D MEMCG_LRU_TAIL;
+	else if (op =3D=3D MEMCG_LRU_OLD)
+		new =3D get_memcg_gen(pgdat->memcg_lru.seq);
+	else if (op =3D=3D MEMCG_LRU_YOUNG)
+		new =3D get_memcg_gen(pgdat->memcg_lru.seq + 1);
+	else
+		VM_WARN_ON_ONCE(true);
+
+	hlist_nulls_del_rcu(&lruvec->lrugen.list);
+
+	if (op =3D=3D MEMCG_LRU_HEAD || op =3D=3D MEMCG_LRU_OLD)
+		hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[ne=
w][bin]);
+	else
+		hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[ne=
w][bin]);
+
+	pgdat->memcg_lru.nr_memcgs[old]--;
+	pgdat->memcg_lru.nr_memcgs[new]++;
+
+	lruvec->lrugen.gen =3D new;
+	WRITE_ONCE(lruvec->lrugen.seg, seg);
+
+	if (!pgdat->memcg_lru.nr_memcgs[old] && old =3D=3D get_memcg_gen(pgdat->m=
emcg_lru.seq))
+		WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+	spin_unlock(&pgdat->memcg_lru.lock);
 }
+#endif
=20
 /*************************************************************************=
*****
  *                          state change
@@ -5655,11 +5879,11 @@ static int run_cmd(char cmd, int memcg_id, int nid,=
 unsigned long seq,
=20
 	if (!mem_cgroup_disabled()) {
 		rcu_read_lock();
+
 		memcg =3D mem_cgroup_from_id(memcg_id);
-#ifdef CONFIG_MEMCG
-		if (memcg && !css_tryget(&memcg->css))
+		if (!mem_cgroup_tryget(memcg))
 			memcg =3D NULL;
-#endif
+
 		rcu_read_unlock();
=20
 		if (!memcg)
@@ -5807,6 +6031,19 @@ void lru_gen_init_lruvec(struct lruvec *lruvec)
 }
=20
 #ifdef CONFIG_MEMCG
+
+void lru_gen_init_pgdat(struct pglist_data *pgdat)
+{
+	int i, j;
+
+	spin_lock_init(&pgdat->memcg_lru.lock);
+
+	for (i =3D 0; i < MEMCG_NR_GENS; i++) {
+		for (j =3D 0; j < MEMCG_NR_BINS; j++)
+			INIT_HLIST_NULLS_HEAD(&pgdat->memcg_lru.fifo[i][j], i);
+	}
+}
+
 void lru_gen_init_memcg(struct mem_cgroup *memcg)
 {
 	INIT_LIST_HEAD(&memcg->mm_list.fifo);
@@ -5830,7 +6067,69 @@ void lru_gen_exit_memcg(struct mem_cgroup *memcg)
 		}
 	}
 }
-#endif
+
+void lru_gen_online_memcg(struct mem_cgroup *memcg)
+{
+	int gen;
+	int nid;
+	int bin =3D get_random_u32_below(MEMCG_NR_BINS);
+
+	for_each_node(nid) {
+		struct pglist_data *pgdat =3D NODE_DATA(nid);
+		struct lruvec *lruvec =3D get_lruvec(memcg, nid);
+
+		spin_lock(&pgdat->memcg_lru.lock);
+
+		VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+		gen =3D get_memcg_gen(pgdat->memcg_lru.seq);
+
+		hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[ge=
n][bin]);
+		pgdat->memcg_lru.nr_memcgs[gen]++;
+
+		lruvec->lrugen.gen =3D gen;
+
+		spin_unlock(&pgdat->memcg_lru.lock);
+	}
+}
+
+void lru_gen_offline_memcg(struct mem_cgroup *memcg)
+{
+	int nid;
+
+	for_each_node(nid) {
+		struct lruvec *lruvec =3D get_lruvec(memcg, nid);
+
+		lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD);
+	}
+}
+
+void lru_gen_release_memcg(struct mem_cgroup *memcg)
+{
+	int gen;
+	int nid;
+
+	for_each_node(nid) {
+		struct pglist_data *pgdat =3D NODE_DATA(nid);
+		struct lruvec *lruvec =3D get_lruvec(memcg, nid);
+
+		spin_lock(&pgdat->memcg_lru.lock);
+
+		VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
+
+		gen =3D lruvec->lrugen.gen;
+
+		hlist_nulls_del_rcu(&lruvec->lrugen.list);
+		pgdat->memcg_lru.nr_memcgs[gen]--;
+
+		if (!pgdat->memcg_lru.nr_memcgs[gen] && gen =3D=3D get_memcg_gen(pgdat->=
memcg_lru.seq))
+			WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
+
+		spin_unlock(&pgdat->memcg_lru.lock);
+	}
+}
+
+#endif /* CONFIG_MEMCG */
=20
 static int __init init_lru_gen(void)
 {
@@ -5857,6 +6156,10 @@ static void lru_gen_shrink_lruvec(struct lruvec *lru=
vec, struct scan_control *sc
 {
 }
=20
+static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_con=
trol *sc)
+{
+}
+
 #endif /* CONFIG_LRU_GEN */
=20
 static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
@@ -5870,7 +6173,7 @@ static void shrink_lruvec(struct lruvec *lruvec, stru=
ct scan_control *sc)
 	bool proportional_reclaim;
 	struct blk_plug plug;
=20
-	if (lru_gen_enabled()) {
+	if (lru_gen_enabled() && !global_reclaim(sc)) {
 		lru_gen_shrink_lruvec(lruvec, sc);
 		return;
 	}
@@ -6113,6 +6416,11 @@ static void shrink_node(pg_data_t *pgdat, struct sca=
n_control *sc)
 	struct lruvec *target_lruvec;
 	bool reclaimable =3D false;
=20
+	if (lru_gen_enabled() && global_reclaim(sc)) {
+		lru_gen_shrink_node(pgdat, sc);
+		return;
+	}
+
 	target_lruvec =3D mem_cgroup_lruvec(sc->target_mem_cgroup, pgdat);
=20
 again:
--=20
2.39.0.314.g84b9a713c41-goog
From nobody Wed Sep 17 06:42:57 2025
Return-Path: <linux-kernel-owner@vger.kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 66EB9C4332F
	for <linux-kernel@archiver.kernel.org>; Thu, 22 Dec 2022 04:20:53 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S235117AbiLVEUr (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 21 Dec 2022 23:20:47 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44978 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S235007AbiLVETw (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 21 Dec 2022 23:19:52 -0500
Received: from mail-il1-x14a.google.com (mail-il1-x14a.google.com
 [IPv6:2607:f8b0:4864:20::14a])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 799F8248C0
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:51 -0800 (PST)
Received: by mail-il1-x14a.google.com with SMTP id
 x10-20020a056e021bca00b00302b6c0a683so473075ilv.23
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:51 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=beHmgSLzXbJpunP6/l/NxN2gRkL01SNlutgba3N1Qk4=;
        b=KVc8L16VLUlCC+NuT6JsPGih4l/R7AmDZj1lk/UOA7MArp9oW8qRRRAEE/CHL9JFRE
         k6OsWrqWjs+QpUdCZOasfSH0W/8vTYtZg4cIuIuZakFH3G4roLVmeOi6mhpvzf9Rr5Pm
         sG5fevrhs8sxV9/VRvNIlM/H7Jgms4DgL73qCf1rTYOzPm5XuvhBd0DzLtCuuYYtxdBA
         J1jv7wKED+6OqtgdXvWWjtKi1MH+Ffly5IDAHlnvtXC9SvYAIXhRiM6OkvNDWfvhZlSl
         5hLGMSmwhFx6HT88lIBYT9dimMWP19Q8HexyCSyXhoj/OTlnPqL1UW4y0l5Xa0C+5wwP
         8Www==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=beHmgSLzXbJpunP6/l/NxN2gRkL01SNlutgba3N1Qk4=;
        b=m4Oa7pgMgcY4/8YZoBa4UTe1duh4tAYPT5qOIUBuuPKOnZskUkddipHxyzwPQXqUpR
         rQ/IW9yQMI263pNMkfxefqMTkWTjwWCj7kJ1EBmaeU8vVIE8XjptMpFySdHRG5DXo/VI
         3XILfw8Ywao5pUoo19PURnPkeU6hzFO8q/cMu3clv7uHNxsWWWOk9BZdKaPtadf+s/vk
         Kzq+d0YFundJu7UlF2soOwddHre4IcWEb6AkFTGZvibMpKijO+RRYtOhb/vViXDIklBU
         imQ+/CUmnJGS77y0vtLDVqy97vexISOkQTwNBGfwrBoLbhrE41ayRII0WJMF6zHPOzLj
         ChKQ==
X-Gm-Message-State: AFqh2kpGxtXoFuGzfdVM+n4ThopudpvmVo1DQCuMv0/73xb0n9djS9FS
        It2+rTZA7ZF2pe0cNzxaKs9hH+dnV+Y=
X-Google-Smtp-Source: 
 AMrXdXvUCUZtzccxqK7kgdP2DV3G+AxrfRMA4CYtUSyCFpt/vaQ9J1+I6DbC09S/08Y40JAtOCDqp8u0BPU=
X-Received: from yuzhao.bld.corp.google.com
 ([2620:15c:183:200:a463:5f7b:440e:5c77])
 (user=yuzhao job=sendgmr) by 2002:a05:6602:88a:b0:6df:5f05:40b6 with SMTP id
 f10-20020a056602088a00b006df5f0540b6mr286712ioz.74.1671682790897; Wed, 21 Dec
 2022 20:19:50 -0800 (PST)
Date: Wed, 21 Dec 2022 21:19:05 -0700
In-Reply-To: <20221222041905.2431096-1-yuzhao@google.com>
Message-Id: <20221222041905.2431096-8-yuzhao@google.com>
Mime-Version: 1.0
References: <20221222041905.2431096-1-yuzhao@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Subject: [PATCH mm-unstable v3 7/8] mm: multi-gen LRU: clarify scan_control
 flags
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
        Jonathan Corbet <corbet@lwn.net>,
        Michael Larabel <michael@michaellarabel.com>,
        Michal Hocko <mhocko@kernel.org>,
        Mike Rapoport <rppt@kernel.org>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Suren Baghdasaryan <surenb@google.com>, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org, linux-mm@google.com,
        Yu Zhao <yuzhao@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Among the flags in scan_control:
1. sc->may_swap, which indicates swap constraint due to memsw.max, is
   supported as usual.
2. sc->proactive, which indicates reclaim by memory.reclaim, may not
   opportunistically skip the aging path, since it is considered less
   latency sensitive.
3. !(sc->gfp_mask & __GFP_IO), which indicates IO constraint, lowers
   swappiness to prioritize file LRU, since clean file folios are more
   likely to exist.
4. sc->may_writepage and sc->may_unmap, which indicate opportunistic
   reclaim, are rejected, since unmapped clean folios are already
   prioritized. Scanning for more of them is likely futile and can
   cause high reclaim latency when there is a large number of memcgs.

The rest are handled by the existing code.
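
As a small illustration of how the IO constraint is applied once
sc->may_writepage is no longer consulted, the predicate below restates
the "swapping inhibited" check from the isolate_folio() hunk in
userspace terms. can_isolate() and its boolean parameters are
hypothetical stand-ins for the gfp_mask and folio tests; the remaining
checks in isolate_folio() are not modeled.

```c
#include <stdbool.h>

/*
 * io_allowed    stands in for (sc->gfp_mask & __GFP_IO)
 * dirty         stands in for folio_test_dirty(folio)
 * anon          stands in for folio_test_anon(folio)
 * in_swapcache  stands in for folio_test_swapcache(folio)
 */
static bool can_isolate(bool io_allowed, bool dirty, bool anon,
			bool in_swapcache)
{
	/* without IO, skip folios that need writeback or a swap slot */
	if (!io_allowed && (dirty || (anon && !in_swapcache)))
		return false;

	return true;
}
```

Under this model, a clean file folio stays eligible even without
__GFP_IO, which fits the rationale given for item 3.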

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/vmscan.c | 56 ++++++++++++++++++++++++++---------------------------
 1 file changed, 28 insertions(+), 28 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f22c8876473e..a9b318e1bdc2 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3210,6 +3210,9 @@ static int get_swappiness(struct lruvec *lruvec, stru=
ct scan_control *sc)
 	struct mem_cgroup *memcg =3D lruvec_memcg(lruvec);
 	struct pglist_data *pgdat =3D lruvec_pgdat(lruvec);
=20
+	if (!sc->may_swap)
+		return 0;
+
 	if (!can_demote(pgdat->node_id, sc) &&
 	    mem_cgroup_get_nr_swap_pages(memcg) < MIN_LRU_BATCH)
 		return 0;
@@ -4236,7 +4239,7 @@ static void walk_mm(struct lruvec *lruvec, struct mm_=
struct *mm, struct lru_gen_
 	} while (err =3D=3D -EAGAIN);
 }
=20
-static struct lru_gen_mm_walk *set_mm_walk(struct pglist_data *pgdat)
+static struct lru_gen_mm_walk *set_mm_walk(struct pglist_data *pgdat, bool=
 force_alloc)
 {
 	struct lru_gen_mm_walk *walk =3D current->reclaim_state->mm_walk;
=20
@@ -4244,7 +4247,7 @@ static struct lru_gen_mm_walk *set_mm_walk(struct pgl=
ist_data *pgdat)
 		VM_WARN_ON_ONCE(walk);
=20
 		walk =3D &pgdat->mm_walk;
-	} else if (!pgdat && !walk) {
+	} else if (!walk && force_alloc) {
 		VM_WARN_ON_ONCE(current_is_kswapd());
=20
 		walk =3D kzalloc(sizeof(*walk), __GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NO=
WARN);
@@ -4430,7 +4433,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec,=
 unsigned long max_seq,
 		goto done;
 	}
=20
-	walk =3D set_mm_walk(NULL);
+	walk =3D set_mm_walk(NULL, true);
 	if (!walk) {
 		success =3D iterate_mm_list_nowalk(lruvec, max_seq);
 		goto done;
@@ -4499,8 +4502,6 @@ static bool lruvec_is_reclaimable(struct lruvec *lruv=
ec, struct scan_control *sc
 	struct mem_cgroup *memcg =3D lruvec_memcg(lruvec);
 	DEFINE_MIN_SEQ(lruvec);
=20
-	VM_WARN_ON_ONCE(sc->memcg_low_reclaim);
-
 	/* see the comment on lru_gen_folio */
 	gen =3D lru_gen_from_seq(min_seq[LRU_GEN_FILE]);
 	birth =3D READ_ONCE(lruvec->lrugen.timestamps[gen]);
@@ -4756,12 +4757,8 @@ static bool isolate_folio(struct lruvec *lruvec, str=
uct folio *folio, struct sca
 {
 	bool success;
=20
-	/* unmapping inhibited */
-	if (!sc->may_unmap && folio_mapped(folio))
-		return false;
-
 	/* swapping inhibited */
-	if (!(sc->may_writepage && (sc->gfp_mask & __GFP_IO)) &&
+	if (!(sc->gfp_mask & __GFP_IO) &&
 	    (folio_test_dirty(folio) ||
 	     (folio_test_anon(folio) && !folio_test_swapcache(folio))))
 		return false;
@@ -4858,9 +4855,8 @@ static int scan_folios(struct lruvec *lruvec, struct =
scan_control *sc,
 	__count_vm_events(PGSCAN_ANON + type, isolated);
=20
 	/*
-	 * There might not be eligible pages due to reclaim_idx, may_unmap and
-	 * may_writepage. Check the remaining to prevent livelock if it's not
-	 * making progress.
+	 * There might not be eligible folios due to reclaim_idx. Check the
+	 * remaining to prevent livelock if it's not making progress.
 	 */
 	return isolated || !remaining ? scanned : 0;
 }
@@ -5120,9 +5116,7 @@ static long get_nr_to_scan(struct lruvec *lruvec, str=
uct scan_control *sc, bool
 	struct mem_cgroup *memcg =3D lruvec_memcg(lruvec);
 	DEFINE_MAX_SEQ(lruvec);
=20
-	if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg) ||
-	    (mem_cgroup_below_low(sc->target_mem_cgroup, memcg) &&
-	     !sc->memcg_low_reclaim))
+	if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg))
 		return 0;
=20
 	if (!should_run_aging(lruvec, max_seq, sc, can_swap, &nr_to_scan))
@@ -5150,17 +5144,14 @@ static bool try_to_shrink_lruvec(struct lruvec *lru=
vec, struct scan_control *sc)
 	long nr_to_scan;
 	unsigned long scanned =3D 0;
 	unsigned long nr_to_reclaim =3D get_nr_to_reclaim(sc);
+	int swappiness =3D get_swappiness(lruvec, sc);
+
+	/* clean file folios are more likely to exist */
+	if (swappiness && !(sc->gfp_mask & __GFP_IO))
+		swappiness =3D 1;
=20
 	while (true) {
 		int delta;
-		int swappiness;
-
-		if (sc->may_swap)
-			swappiness =3D get_swappiness(lruvec, sc);
-		else if (!cgroup_reclaim(sc) && get_swappiness(lruvec, sc))
-			swappiness =3D 1;
-		else
-			swappiness =3D 0;
=20
 		nr_to_scan =3D get_nr_to_scan(lruvec, sc, swappiness);
 		if (nr_to_scan <=3D 0)
@@ -5291,12 +5282,13 @@ static void lru_gen_shrink_lruvec(struct lruvec *lr=
uvec, struct scan_control *sc
 	struct blk_plug plug;
=20
 	VM_WARN_ON_ONCE(global_reclaim(sc));
+	VM_WARN_ON_ONCE(!sc->may_writepage || !sc->may_unmap);
=20
 	lru_add_drain();
=20
 	blk_start_plug(&plug);
=20
-	set_mm_walk(lruvec_pgdat(lruvec));
+	set_mm_walk(NULL, sc->proactive);
=20
 	if (try_to_shrink_lruvec(lruvec, sc))
 		lru_gen_rotate_memcg(lruvec, MEMCG_LRU_YOUNG);
@@ -5352,11 +5344,19 @@ static void lru_gen_shrink_node(struct pglist_data =
*pgdat, struct scan_control *
=20
 	VM_WARN_ON_ONCE(!global_reclaim(sc));
=20
+	/*
+	 * Unmapped clean folios are already prioritized. Scanning for more of
+	 * them is likely futile and can cause high reclaim latency when there
+	 * is a large number of memcgs.
+	 */
+	if (!sc->may_writepage || !sc->may_unmap)
+		goto done;
+
 	lru_add_drain();
=20
 	blk_start_plug(&plug);
=20
-	set_mm_walk(pgdat);
+	set_mm_walk(pgdat, sc->proactive);
=20
 	set_initial_priority(pgdat, sc);
=20
@@ -5374,7 +5374,7 @@ static void lru_gen_shrink_node(struct pglist_data *p=
gdat, struct scan_control *
 	clear_mm_walk();
=20
 	blk_finish_plug(&plug);
-
+done:
 	/* kswapd should never fail */
 	pgdat->kswapd_failures =3D 0;
 }
@@ -5943,7 +5943,7 @@ static ssize_t lru_gen_seq_write(struct file *file, c=
onst char __user *src,
 	set_task_reclaim_state(current, &sc.reclaim_state);
 	flags =3D memalloc_noreclaim_save();
 	blk_start_plug(&plug);
-	if (!set_mm_walk(NULL)) {
+	if (!set_mm_walk(NULL, true)) {
 		err =3D -ENOMEM;
 		goto done;
 	}
--=20
2.39.0.314.g84b9a713c41-goog
From nobody Wed Sep 17 06:42:57 2025
Return-Path: <linux-kernel-owner@vger.kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 8D478C4332F
	for <linux-kernel@archiver.kernel.org>; Thu, 22 Dec 2022 04:20:57 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S235123AbiLVEUz (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Wed, 21 Dec 2022 23:20:55 -0500
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:44998 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S234968AbiLVETy (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Wed, 21 Dec 2022 23:19:54 -0500
Received: from mail-yw1-x1149.google.com (mail-yw1-x1149.google.com
 [IPv6:2607:f8b0:4864:20::1149])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 37E2C22BDD
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:53 -0800 (PST)
Received: by mail-yw1-x1149.google.com with SMTP id
 00721157ae682-4528903f275so9657567b3.8
        for <linux-kernel@vger.kernel.org>;
 Wed, 21 Dec 2022 20:19:53 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:from:to:cc:subject:date:message-id:reply-to;
        bh=wqbMgd0zowymzGskr7701SDcBxSEYhpmKyp2L4CmwQc=;
        b=FaAk0+IovKrA30gqzGnUXCgvxnAEqM9RNUrx6/qbk2dXbMev+aX6+TBCBI++qJqoJh
         DHNBFkCo27K0mtm3kCypi0Ii7FLQEyOoUcXG+t83h3xzdMvkNUCkZTP8OrqRsYYWm0el
         eWwdtsGBlfK8BfaDX27IjQxVLcqXb9rSOW0ntCxNjAM2Cim1CHaEVKSL6BfOAxE/zGHv
         zshIYCJIZ+wR/JAB7hswAHRV8jE3CLxEuhhNl1iaGkZYIJeIjSv9yuHWpAemQnptTj9G
         AsAn5TBRtcqHkkCAS4JOx0xwOehAeuT/Kj0REFHOiEd6bNs1NdquJnCAzj9joH3GCGp1
         aNTQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20210112;
        h=cc:to:from:subject:references:mime-version:message-id:in-reply-to
         :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to;
        bh=wqbMgd0zowymzGskr7701SDcBxSEYhpmKyp2L4CmwQc=;
        b=DJmZpvGwa47fRGOzxcJH+2hrKMiv0YdQEb71x6ZA7TSFsF+0xjuzQBUvnooT5f2glg
         HM2yB4RxSZlqNmpHr5i96rt6Qf0Yb7CUqjIm3rMXPoPKR5i+PqAQNvl5aP0u0thHwnYv
         5Gx3DevBaLYOsBv2Qo4GbmKl4zsakG2ZNZzyBiKu/vsMZ0YHFOdteIC6nqrW5qpIy6Qg
         JogTScz5mwFddghFY61R8nS8/ddEqmqi5446jsKcuMi112p+zP3bgMRop6XLuRfJHl5n
         leBT0yvrOmOjdez7ANHw1jLBpTKWJNuehOCNTjo8IwCO2/lnft4q4fwapWy8GLeDdvlA
         iZtg==
X-Gm-Message-State: AFqh2kpA3abPB+h8sl0o3cc3KOKrzSF32JkUOPpVvu/pSYLvJ5t6LZIn
        EIsLKJqpWEWs2LgAfqo7ciU/4Ahoc9Y=
X-Google-Smtp-Source: 
 AMrXdXtoQV2nYQsCRHGYe9z8m7Uo+pjFU+YdC4C/Y+Mib9s2WlE6ZwrTFMZkcNSCiAcHyEFs+f2bQg5UyD0=
X-Received: from yuzhao.bld.corp.google.com
 ([2620:15c:183:200:a463:5f7b:440e:5c77])
 (user=yuzhao job=sendgmr) by 2002:a25:3c85:0:b0:6dc:b9ec:7c87 with SMTP id
 j127-20020a253c85000000b006dcb9ec7c87mr460947yba.322.1671682792499; Wed, 21
 Dec 2022 20:19:52 -0800 (PST)
Date: Wed, 21 Dec 2022 21:19:06 -0700
In-Reply-To: <20221222041905.2431096-1-yuzhao@google.com>
Message-Id: <20221222041905.2431096-9-yuzhao@google.com>
Mime-Version: 1.0
References: <20221222041905.2431096-1-yuzhao@google.com>
X-Mailer: git-send-email 2.39.0.314.g84b9a713c41-goog
Subject: [PATCH mm-unstable v3 8/8] mm: multi-gen LRU: simplify
 arch_has_hw_pte_young() check
From: Yu Zhao <yuzhao@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
        Jonathan Corbet <corbet@lwn.net>,
        Michael Larabel <michael@michaellarabel.com>,
        Michal Hocko <mhocko@kernel.org>,
        Mike Rapoport <rppt@kernel.org>,
        Roman Gushchin <roman.gushchin@linux.dev>,
        Suren Baghdasaryan <surenb@google.com>, linux-mm@kvack.org,
        linux-kernel@vger.kernel.org, linux-mm@google.com,
        Yu Zhao <yuzhao@google.com>
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

Scanning page tables when hardware does not set the accessed bit has
no real use cases.
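
The effect on the check in try_to_inc_max_seq() can be sketched with
the standalone C below. It is only an illustration: the sketch_*
functions are hypothetical, and the hw_young and cap_walk parameters
model the results of arch_has_hw_pte_young() and
get_cap(LRU_GEN_MM_WALK) as plain booleans.

```c
#include <assert.h>
#include <stdbool.h>

/* Old check: take the no-walk path unless a scan is forced. */
static bool sketch_old_skips(bool force_scan, bool hw_young, bool cap_walk)
{
	return !force_scan && !(hw_young && cap_walk);
}

/* New check: force_scan no longer overrides missing hardware support. */
static bool sketch_new_skips(bool hw_young, bool cap_walk)
{
	return !hw_young || !cap_walk;
}

/* The two checks agree whenever no scan is forced. */
static void sketch_check_agreement(bool hw_young, bool cap_walk)
{
	assert(sketch_old_skips(false, hw_young, cap_walk)
	       == sketch_new_skips(hw_young, cap_walk));
}
```

The two versions differ only when force_scan is true and either input
is false: the old check would still walk page tables, while the new one
falls back to iterate_mm_list_nowalk().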

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/vmscan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a9b318e1bdc2..71d13c969b52 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4428,7 +4428,7 @@ static bool try_to_inc_max_seq(struct lruvec *lruvec,=
 unsigned long max_seq,
 	 * handful of PTEs. Spreading the work out over a period of time usually
 	 * is less efficient, but it avoids bursty page faults.
 	 */
-	if (!force_scan && !(arch_has_hw_pte_young() && get_cap(LRU_GEN_MM_WALK))=
) {
+	if (!arch_has_hw_pte_young() || !get_cap(LRU_GEN_MM_WALK)) {
 		success =3D iterate_mm_list_nowalk(lruvec, max_seq);
 		goto done;
 	}
--=20
2.39.0.314.g84b9a713c41-goog