From nobody Fri Dec 19 08:23:02 2025
From: Chen Ridong
To: akpm@linux-foundation.org, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, david@kernel.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org, surenb@google.com, mhocko@suse.com, corbet@lwn.net, hannes@cmpxchg.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, zhengqi.arch@bytedance.com
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, lujialin4@huawei.com, chenridong@huaweicloud.com, zhongjinji@honor.com
Subject: [PATCH -next 1/5] mm/mglru: use mem_cgroup_iter for global reclaim
Date: Tue, 9 Dec 2025 01:25:53 +0000
Message-Id: <20251209012557.1949239-2-chenridong@huaweicloud.com>
In-Reply-To: <20251209012557.1949239-1-chenridong@huaweicloud.com>
References: <20251209012557.1949239-1-chenridong@huaweicloud.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Chen Ridong

The memcg LRU was originally introduced to improve the scalability of global reclaim. However, its implementation complexity has led to performance regressions when dealing with a large number of memory cgroups [1].

As suggested by Johannes [1], switch global reclaim to cookie-based iteration with mem_cgroup_iter(), aligning it with the approach already used in shrink_node_memcgs(). This removes the dedicated memcg LRU tracking while preserving the core functionality.

A stress test based on Yu Zhao's methodology [2] was run on a 1 TB, 4-node NUMA system. The results are summarized below:

pgsteal:
                                    memcg LRU    memcg iter
  stddev(pgsteal) / mean(pgsteal)     106.03%        93.20%
  sum(pgsteal) / sum(requested)        98.10%        99.28%

workingset_refault_anon:
                                    memcg LRU    memcg iter
  stddev(refault) / mean(refault)     193.97%       134.67%
  sum(refault)                        1963229       2027567

The new implementation shows a clear fairness improvement, reducing the standard deviation relative to the mean by 12.8 percentage points, and the pgsteal ratio moves closer to 100%. Refault counts increased by 3.2% (from 1,963,229 to 2,027,567).

The primary benefits of this change are:

1. Simplified codebase by removing the custom memcg LRU infrastructure
2. Improved fairness in memory reclaim across multiple cgroups
3.
Better performance when creating many memory cgroups [1] https://lore.kernel.org/r/20251126171513.GC135004@cmpxchg.org [2] https://lore.kernel.org/r/20221222041905.2431096-7-yuzhao@google.com Suggested-by: Johannes Weiner Signed-off-by: Chen Ridong Acked-by: Johannes Weiner --- mm/vmscan.c | 117 ++++++++++++++++------------------------------------ 1 file changed, 36 insertions(+), 81 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index fddd168a9737..70b0e7e5393c 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4895,27 +4895,14 @@ static bool try_to_shrink_lruvec(struct lruvec *lru= vec, struct scan_control *sc) return nr_to_scan < 0; } =20 -static int shrink_one(struct lruvec *lruvec, struct scan_control *sc) +static void shrink_one(struct lruvec *lruvec, struct scan_control *sc) { - bool success; unsigned long scanned =3D sc->nr_scanned; unsigned long reclaimed =3D sc->nr_reclaimed; - struct mem_cgroup *memcg =3D lruvec_memcg(lruvec); struct pglist_data *pgdat =3D lruvec_pgdat(lruvec); + struct mem_cgroup *memcg =3D lruvec_memcg(lruvec); =20 - /* lru_gen_age_node() called mem_cgroup_calculate_protection() */ - if (mem_cgroup_below_min(NULL, memcg)) - return MEMCG_LRU_YOUNG; - - if (mem_cgroup_below_low(NULL, memcg)) { - /* see the comment on MEMCG_NR_GENS */ - if (READ_ONCE(lruvec->lrugen.seg) !=3D MEMCG_LRU_TAIL) - return MEMCG_LRU_TAIL; - - memcg_memory_event(memcg, MEMCG_LOW); - } - - success =3D try_to_shrink_lruvec(lruvec, sc); + try_to_shrink_lruvec(lruvec, sc); =20 shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority); =20 @@ -4924,86 +4911,55 @@ static int shrink_one(struct lruvec *lruvec, struct= scan_control *sc) sc->nr_reclaimed - reclaimed); =20 flush_reclaim_state(sc); - - if (success && mem_cgroup_online(memcg)) - return MEMCG_LRU_YOUNG; - - if (!success && lruvec_is_sizable(lruvec, sc)) - return 0; - - /* one retry if offlined or too small */ - return READ_ONCE(lruvec->lrugen.seg) !=3D MEMCG_LRU_TAIL ? 
- MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG; } =20 static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc) { - int op; - int gen; - int bin; - int first_bin; - struct lruvec *lruvec; - struct lru_gen_folio *lrugen; + struct mem_cgroup *target =3D sc->target_mem_cgroup; + struct mem_cgroup_reclaim_cookie reclaim =3D { + .pgdat =3D pgdat, + }; + struct mem_cgroup_reclaim_cookie *cookie =3D &reclaim; struct mem_cgroup *memcg; - struct hlist_nulls_node *pos; =20 - gen =3D get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq)); - bin =3D first_bin =3D get_random_u32_below(MEMCG_NR_BINS); -restart: - op =3D 0; - memcg =3D NULL; - - rcu_read_lock(); + if (current_is_kswapd() || sc->memcg_full_walk) + cookie =3D NULL; =20 - hlist_nulls_for_each_entry_rcu(lrugen, pos, &pgdat->memcg_lru.fifo[gen][b= in], list) { - if (op) { - lru_gen_rotate_memcg(lruvec, op); - op =3D 0; - } + memcg =3D mem_cgroup_iter(target, NULL, cookie); + while (memcg) { + struct lruvec *lruvec =3D mem_cgroup_lruvec(memcg, pgdat); =20 - mem_cgroup_put(memcg); - memcg =3D NULL; + cond_resched(); =20 - if (gen !=3D READ_ONCE(lrugen->gen)) - continue; + mem_cgroup_calculate_protection(target, memcg); =20 - lruvec =3D container_of(lrugen, struct lruvec, lrugen); - memcg =3D lruvec_memcg(lruvec); + if (mem_cgroup_below_min(target, memcg)) + goto next; =20 - if (!mem_cgroup_tryget(memcg)) { - lru_gen_release_memcg(memcg); - memcg =3D NULL; - continue; + if (mem_cgroup_below_low(target, memcg)) { + if (!sc->memcg_low_reclaim) { + sc->memcg_low_skipped =3D 1; + goto next; + } + memcg_memory_event(memcg, MEMCG_LOW); } =20 - rcu_read_unlock(); + shrink_one(lruvec, sc); =20 - op =3D shrink_one(lruvec, sc); - - rcu_read_lock(); - - if (should_abort_scan(lruvec, sc)) + if (should_abort_scan(lruvec, sc)) { + if (cookie) + mem_cgroup_iter_break(target, memcg); break; - } - - rcu_read_unlock(); - - if (op) - lru_gen_rotate_memcg(lruvec, op); - - mem_cgroup_put(memcg); - - if (!is_a_nulls(pos)) - return; + } =20 - /* restart if raced with lru_gen_rotate_memcg() */ - if (gen !=3D get_nulls_value(pos)) - goto restart; +next: + if (cookie && sc->nr_reclaimed >=3D sc->nr_to_reclaim) { + mem_cgroup_iter_break(target, memcg); + break; + } =20 - /* try the rest of the bins of the current generation */ - bin =3D get_memcg_bin(bin + 1); - if (bin !=3D first_bin) - goto restart; + memcg =3D mem_cgroup_iter(target, memcg, cookie); + } } =20 static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_contr= ol *sc) @@ -5019,8 +4975,7 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruv= ec, struct scan_control *sc =20 set_mm_walk(NULL, sc->proactive); =20 - if (try_to_shrink_lruvec(lruvec, sc)) - lru_gen_rotate_memcg(lruvec, MEMCG_LRU_YOUNG); + try_to_shrink_lruvec(lruvec, sc); =20 clear_mm_walk(); =20 --=20 2.34.1 From nobody Fri Dec 19 08:23:02 2025 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 12DF12FE589; Tue, 9 Dec 2025 01:41:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765244476; cv=none; b=CYHthVbKDi2nHd9v13BttZu7dvvnHIj2/pNOsgsND7aUQPOj7F3g4/hObOfchnNWeyk7ic/ELIwGRA1zSc929kNMFJft9eYeOUEpbcunFHsm1kmgbks6ggx1Av0o3/NOM11ujLjJ+M+AHvfdQMJY6P7E+WJ/GdBJPDbdarOQoCc= ARC-Message-Signature: i=1; a=rsa-sha256; 
d=subspace.kernel.org; s=arc-20240116; t=1765244476; c=relaxed/simple; bh=uC7oDLb8Xv/yD3W/os2+wuGhXkqLKupacXGOQKtEHPk=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=t0XhzSXa8RrOejHboFVGAec4lmmmaVasgy/8iRd+A+o41XPy+NeuKj1o+SZUL2yODKKzuNIcpZINUdc5hUNr6wGpmwGW/tmVsqRLmgk7St49pDtN3hT3Z1SihsSgd2D8Ntvtwa6eIu04tRKkkpyrjd5n0e2cNpLe1wR8Fj3uIHE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.235]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4dQM3x2NMfzKHMKv; Tue, 9 Dec 2025 09:40:13 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.75]) by mail.maildlp.com (Postfix) with ESMTP id 21D191A06D7; Tue, 9 Dec 2025 09:41:11 +0800 (CST) Received: from hulk-vt.huawei.com (unknown [10.67.174.121]) by APP2 (Coremail) with SMTP id Syh0CgBnRlAafjdpkF9fBA--.23909S4; Tue, 09 Dec 2025 09:41:10 +0800 (CST) From: Chen Ridong To: akpm@linux-foundation.org, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, david@kernel.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org, surenb@google.com, mhocko@suse.com, corbet@lwn.net, hannes@cmpxchg.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, zhengqi.arch@bytedance.com Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, lujialin4@huawei.com, chenridong@huaweicloud.com, zhongjinji@honor.com Subject: [PATCH -next 2/5] mm/mglru: remove memcg lru Date: Tue, 9 Dec 2025 01:25:54 +0000 Message-Id: <20251209012557.1949239-3-chenridong@huaweicloud.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20251209012557.1949239-1-chenridong@huaweicloud.com> References: <20251209012557.1949239-1-chenridong@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: Syh0CgBnRlAafjdpkF9fBA--.23909S4 X-Coremail-Antispam: 1UD129KBjvAXoW3Kw1ftw4kKry3tw15Wr18AFb_yoW8XrW7Jo WSvr4DC3Zagr18Xw18ZrnFyF9xZF4DKryrXw15JwsrC3Wjvrn8Wr4DJw4UGFyfJF1rG3y0 vryYqw1UXrZaqw13n29KB7ZKAUJUUUU8529EdanIXcx71UUUUU7v73VFW2AGmfu7bjvjm3 AaLaJ3UjIYCTnIWjp_UUUOc7AC8VAFwI0_Wr0E3s1l1xkIjI8I6I8E6xAIw20EY4v20xva j40_Wr0E3s1l1IIY67AEw4v_Jr0_Jr4l82xGYIkIc2x26280x7IE14v26r15M28IrcIa0x kI8VCY1x0267AKxVW5JVCq3wA2ocxC64kIII0Yj41l84x0c7CEw4AK67xGY2AK021l84AC jcxK6xIIjxv20xvE14v26w1j6s0DM28EF7xvwVC0I7IYx2IY6xkF7I0E14v26r4UJVWxJr 1l84ACjcxK6I8E87Iv67AKxVW0oVCq3wA2z4x0Y4vEx4A2jsIEc7CjxVAFwI0_GcCE3s1l e2I262IYc4CY6c8Ij28IcVAaY2xG8wAqx4xG64xvF2IEw4CE5I8CrVC2j2WlYx0E2Ix0cI 8IcVAFwI0_Jr0_Jr4lYx0Ex4A2jsIE14v26r1j6r4UMcvjeVCFs4IE7xkEbVWUJVW8JwAC jcxG0xvY0x0EwIxGrwACjI8F5VA0II8E6IAqYI8I648v4I1lFIxGxcIEc7CjxVA2Y2ka0x kIwI1lc7CjxVAaw2AFwI0_GFv_Wryl42xK82IYc2Ij64vIr41l4I8I3I0E4IkC6x0Yz7v_ Jr0_Gr1lx2IqxVAqx4xG67AKxVWUJVWUGwC20s026x8GjcxK67AKxVWUGVWUWwC2zVAF1V AY17CE14v26r4a6rW5MIIYrxkI7VAKI48JMIIF0xvE2Ix0cI8IcVAFwI0_Jr0_JF4lIxAI cVC0I7IYx2IY6xkF7I0E14v26F4j6r4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF4lIx AIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8JVW8JrUvcSsGvfC2 KfnxnUUI43ZEXa7sRipB-tUUUUU== X-CM-SenderInfo: 
hfkh02xlgr0w46kxt4xhlfz01xgou0bp/ Content-Type: text/plain; charset="utf-8" From: Chen Ridong Now that the previous patch has switched global reclaim to use mem_cgroup_iter, the specialized memcg LRU infrastructure is no longer needed. This patch removes all related code: Signed-off-by: Chen Ridong Acked-by: Johannes Weiner --- Documentation/mm/multigen_lru.rst | 30 ------ include/linux/mmzone.h | 89 ----------------- mm/memcontrol-v1.c | 6 -- mm/memcontrol.c | 4 - mm/mm_init.c | 1 - mm/vmscan.c | 153 +----------------------------- 6 files changed, 1 insertion(+), 282 deletions(-) diff --git a/Documentation/mm/multigen_lru.rst b/Documentation/mm/multigen_= lru.rst index 52ed5092022f..bf8547e2f592 100644 --- a/Documentation/mm/multigen_lru.rst +++ b/Documentation/mm/multigen_lru.rst @@ -220,36 +220,6 @@ time domain because a CPU can scan pages at different = rates under varying memory pressure. It calculates a moving average for each new generation to avoid being permanently locked in a suboptimal state. =20 -Memcg LRU ---------- -An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs, -since each node and memcg combination has an LRU of folios (see -``mem_cgroup_lruvec()``). Its goal is to improve the scalability of -global reclaim, which is critical to system-wide memory overcommit in -data centers. Note that memcg LRU only applies to global reclaim. - -The basic structure of an memcg LRU can be understood by an analogy to -the active/inactive LRU (of folios): - -1. It has the young and the old (generations), i.e., the counterparts - to the active and the inactive; -2. The increment of ``max_seq`` triggers promotion, i.e., the - counterpart to activation; -3. Other events trigger similar operations, e.g., offlining an memcg - triggers demotion, i.e., the counterpart to deactivation. - -In terms of global reclaim, it has two distinct features: - -1. Sharding, which allows each thread to start at a random memcg (in - the old generation) and improves parallelism; -2. Eventual fairness, which allows direct reclaim to bail out at will - and reduces latency without affecting fairness over some time. - -In terms of traversing memcgs during global reclaim, it improves the -best-case complexity from O(n) to O(1) and does not affect the -worst-case complexity O(n). Therefore, on average, it has a sublinear -complexity. - Summary ------- The multi-gen LRU (of folios) can be disassembled into the following diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 75ef7c9f9307..49952301ff3b 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -509,12 +509,6 @@ struct lru_gen_folio { atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS]; /* whether the multi-gen LRU is enabled */ bool enabled; - /* the memcg generation this lru_gen_folio belongs to */ - u8 gen; - /* the list segment this lru_gen_folio belongs to */ - u8 seg; - /* per-node lru_gen_folio list for global reclaim */ - struct hlist_nulls_node list; }; =20 enum { @@ -558,79 +552,14 @@ struct lru_gen_mm_walk { bool force_scan; }; =20 -/* - * For each node, memcgs are divided into two generations: the old and the - * young. For each generation, memcgs are randomly sharded into multiple b= ins - * to improve scalability. For each bin, the hlist_nulls is virtually divi= ded - * into three segments: the head, the tail and the default. - * - * An onlining memcg is added to the tail of a random bin in the old gener= ation. - * The eviction starts at the head of a random bin in the old generation. 
= The - * per-node memcg generation counter, whose reminder (mod MEMCG_NR_GENS) i= ndexes - * the old generation, is incremented when all its bins become empty. - * - * There are four operations: - * 1. MEMCG_LRU_HEAD, which moves a memcg to the head of a random bin in i= ts - * current generation (old or young) and updates its "seg" to "head"; - * 2. MEMCG_LRU_TAIL, which moves a memcg to the tail of a random bin in i= ts - * current generation (old or young) and updates its "seg" to "tail"; - * 3. MEMCG_LRU_OLD, which moves a memcg to the head of a random bin in th= e old - * generation, updates its "gen" to "old" and resets its "seg" to "defa= ult"; - * 4. MEMCG_LRU_YOUNG, which moves a memcg to the tail of a random bin in = the - * young generation, updates its "gen" to "young" and resets its "seg" = to - * "default". - * - * The events that trigger the above operations are: - * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD; - * 2. The first attempt to reclaim a memcg below low, which triggers - * MEMCG_LRU_TAIL; - * 3. The first attempt to reclaim a memcg offlined or below reclaimable s= ize - * threshold, which triggers MEMCG_LRU_TAIL; - * 4. The second attempt to reclaim a memcg offlined or below reclaimable = size - * threshold, which triggers MEMCG_LRU_YOUNG; - * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YO= UNG; - * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_Y= OUNG; - * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD. - * - * Notes: - * 1. Memcg LRU only applies to global reclaim, and the round-robin increm= enting - * of their max_seq counters ensures the eventual fairness to all eligi= ble - * memcgs. For memcg reclaim, it still relies on mem_cgroup_iter(). - * 2. There are only two valid generations: old (seq) and young (seq+1). - * MEMCG_NR_GENS is set to three so that when reading the generation co= unter - * locklessly, a stale value (seq-1) does not wraparound to young. 
- */ -#define MEMCG_NR_GENS 3 -#define MEMCG_NR_BINS 8 - -struct lru_gen_memcg { - /* the per-node memcg generation counter */ - unsigned long seq; - /* each memcg has one lru_gen_folio per node */ - unsigned long nr_memcgs[MEMCG_NR_GENS]; - /* per-node lru_gen_folio list for global reclaim */ - struct hlist_nulls_head fifo[MEMCG_NR_GENS][MEMCG_NR_BINS]; - /* protects the above */ - spinlock_t lock; -}; - -void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); =20 void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); -void lru_gen_online_memcg(struct mem_cgroup *memcg); -void lru_gen_offline_memcg(struct mem_cgroup *memcg); -void lru_gen_release_memcg(struct mem_cgroup *memcg); -void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid); =20 #else /* !CONFIG_LRU_GEN */ =20 -static inline void lru_gen_init_pgdat(struct pglist_data *pgdat) -{ -} - static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } @@ -648,22 +577,6 @@ static inline void lru_gen_exit_memcg(struct mem_cgrou= p *memcg) { } =20 -static inline void lru_gen_online_memcg(struct mem_cgroup *memcg) -{ -} - -static inline void lru_gen_offline_memcg(struct mem_cgroup *memcg) -{ -} - -static inline void lru_gen_release_memcg(struct mem_cgroup *memcg) -{ -} - -static inline void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid) -{ -} - #endif /* CONFIG_LRU_GEN */ =20 struct lruvec { @@ -1503,8 +1416,6 @@ typedef struct pglist_data { #ifdef CONFIG_LRU_GEN /* kswap mm walk data */ struct lru_gen_mm_walk mm_walk; - /* lru_gen_folio list */ - struct lru_gen_memcg memcg_lru; #endif =20 CACHELINE_PADDING(_pad2_); diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c index 6eed14bff742..8f41e72ae7f0 100644 --- a/mm/memcontrol-v1.c +++ b/mm/memcontrol-v1.c @@ -182,12 +182,6 @@ static void memcg1_update_tree(struct mem_cgroup *memc= g, int nid) struct mem_cgroup_per_node *mz; struct mem_cgroup_tree_per_node *mctz; =20 - if (lru_gen_enabled()) { - if (soft_limit_excess(memcg)) - lru_gen_soft_reclaim(memcg, nid); - return; - } - mctz =3D soft_limit_tree.rb_tree_per_node[nid]; if (!mctz) return; diff --git a/mm/memcontrol.c b/mm/memcontrol.c index be810c1fbfc3..ab3ebecb5ec7 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3874,8 +3874,6 @@ static int mem_cgroup_css_online(struct cgroup_subsys= _state *css) if (unlikely(mem_cgroup_is_root(memcg)) && !mem_cgroup_disabled()) queue_delayed_work(system_unbound_wq, &stats_flush_dwork, FLUSH_TIME); - lru_gen_online_memcg(memcg); - /* Online state pins memcg ID, memcg ID pins CSS */ refcount_set(&memcg->id.ref, 1); css_get(css); @@ -3915,7 +3913,6 @@ static void mem_cgroup_css_offline(struct cgroup_subs= ys_state *css) reparent_deferred_split_queue(memcg); reparent_shrinker_deferred(memcg); wb_memcg_offline(memcg); - lru_gen_offline_memcg(memcg); =20 drain_all_stock(memcg); =20 @@ -3927,7 +3924,6 @@ static void mem_cgroup_css_released(struct cgroup_sub= sys_state *css) struct mem_cgroup *memcg =3D mem_cgroup_from_css(css); =20 invalidate_reclaim_iterators(memcg); - lru_gen_release_memcg(memcg); } =20 static void mem_cgroup_css_free(struct cgroup_subsys_state *css) diff --git a/mm/mm_init.c b/mm/mm_init.c index fc2a6f1e518f..6e5e1fe6ff31 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -1745,7 +1745,6 @@ static void __init free_area_init_node(int nid) pgdat_set_deferred_range(pgdat); =20 free_area_init_core(pgdat); - 
lru_gen_init_pgdat(pgdat); } =20 /* Any regular or high memory on that node ? */ diff --git a/mm/vmscan.c b/mm/vmscan.c index 70b0e7e5393c..584f41eb4c14 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2698,9 +2698,6 @@ static bool should_clear_pmd_young(void) #define for_each_evictable_type(type, swappiness) \ for ((type) =3D min_type(swappiness); (type) <=3D max_type(swappiness); (= type)++) =20 -#define get_memcg_gen(seq) ((seq) % MEMCG_NR_GENS) -#define get_memcg_bin(bin) ((bin) % MEMCG_NR_BINS) - static struct lruvec *get_lruvec(struct mem_cgroup *memcg, int nid) { struct pglist_data *pgdat =3D NODE_DATA(nid); @@ -4287,140 +4284,6 @@ bool lru_gen_look_around(struct page_vma_mapped_wal= k *pvmw) return true; } =20 -/*************************************************************************= ***** - * memcg LRU - *************************************************************************= *****/ - -/* see the comment on MEMCG_NR_GENS */ -enum { - MEMCG_LRU_NOP, - MEMCG_LRU_HEAD, - MEMCG_LRU_TAIL, - MEMCG_LRU_OLD, - MEMCG_LRU_YOUNG, -}; - -static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op) -{ - int seg; - int old, new; - unsigned long flags; - int bin =3D get_random_u32_below(MEMCG_NR_BINS); - struct pglist_data *pgdat =3D lruvec_pgdat(lruvec); - - spin_lock_irqsave(&pgdat->memcg_lru.lock, flags); - - VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list)); - - seg =3D 0; - new =3D old =3D lruvec->lrugen.gen; - - /* see the comment on MEMCG_NR_GENS */ - if (op =3D=3D MEMCG_LRU_HEAD) - seg =3D MEMCG_LRU_HEAD; - else if (op =3D=3D MEMCG_LRU_TAIL) - seg =3D MEMCG_LRU_TAIL; - else if (op =3D=3D MEMCG_LRU_OLD) - new =3D get_memcg_gen(pgdat->memcg_lru.seq); - else if (op =3D=3D MEMCG_LRU_YOUNG) - new =3D get_memcg_gen(pgdat->memcg_lru.seq + 1); - else - VM_WARN_ON_ONCE(true); - - WRITE_ONCE(lruvec->lrugen.seg, seg); - WRITE_ONCE(lruvec->lrugen.gen, new); - - hlist_nulls_del_rcu(&lruvec->lrugen.list); - - if (op =3D=3D MEMCG_LRU_HEAD || op =3D=3D MEMCG_LRU_OLD) - hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[ne= w][bin]); - else - hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[ne= w][bin]); - - pgdat->memcg_lru.nr_memcgs[old]--; - pgdat->memcg_lru.nr_memcgs[new]++; - - if (!pgdat->memcg_lru.nr_memcgs[old] && old =3D=3D get_memcg_gen(pgdat->m= emcg_lru.seq)) - WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1); - - spin_unlock_irqrestore(&pgdat->memcg_lru.lock, flags); -} - -#ifdef CONFIG_MEMCG - -void lru_gen_online_memcg(struct mem_cgroup *memcg) -{ - int gen; - int nid; - int bin =3D get_random_u32_below(MEMCG_NR_BINS); - - for_each_node(nid) { - struct pglist_data *pgdat =3D NODE_DATA(nid); - struct lruvec *lruvec =3D get_lruvec(memcg, nid); - - spin_lock_irq(&pgdat->memcg_lru.lock); - - VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list)); - - gen =3D get_memcg_gen(pgdat->memcg_lru.seq); - - lruvec->lrugen.gen =3D gen; - - hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[ge= n][bin]); - pgdat->memcg_lru.nr_memcgs[gen]++; - - spin_unlock_irq(&pgdat->memcg_lru.lock); - } -} - -void lru_gen_offline_memcg(struct mem_cgroup *memcg) -{ - int nid; - - for_each_node(nid) { - struct lruvec *lruvec =3D get_lruvec(memcg, nid); - - lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD); - } -} - -void lru_gen_release_memcg(struct mem_cgroup *memcg) -{ - int gen; - int nid; - - for_each_node(nid) { - struct pglist_data *pgdat =3D NODE_DATA(nid); - struct lruvec *lruvec =3D get_lruvec(memcg, nid); - - 
spin_lock_irq(&pgdat->memcg_lru.lock); - - if (hlist_nulls_unhashed(&lruvec->lrugen.list)) - goto unlock; - - gen =3D lruvec->lrugen.gen; - - hlist_nulls_del_init_rcu(&lruvec->lrugen.list); - pgdat->memcg_lru.nr_memcgs[gen]--; - - if (!pgdat->memcg_lru.nr_memcgs[gen] && gen =3D=3D get_memcg_gen(pgdat->= memcg_lru.seq)) - WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1); -unlock: - spin_unlock_irq(&pgdat->memcg_lru.lock); - } -} - -void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid) -{ - struct lruvec *lruvec =3D get_lruvec(memcg, nid); - - /* see the comment on MEMCG_NR_GENS */ - if (READ_ONCE(lruvec->lrugen.seg) !=3D MEMCG_LRU_HEAD) - lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD); -} - -#endif /* CONFIG_MEMCG */ - /*************************************************************************= ***** * the eviction *************************************************************************= *****/ @@ -5613,18 +5476,6 @@ static const struct file_operations lru_gen_ro_fops = =3D { * initialization *************************************************************************= *****/ =20 -void lru_gen_init_pgdat(struct pglist_data *pgdat) -{ - int i, j; - - spin_lock_init(&pgdat->memcg_lru.lock); - - for (i =3D 0; i < MEMCG_NR_GENS; i++) { - for (j =3D 0; j < MEMCG_NR_BINS; j++) - INIT_HLIST_NULLS_HEAD(&pgdat->memcg_lru.fifo[i][j], i); - } -} - void lru_gen_init_lruvec(struct lruvec *lruvec) { int i; @@ -5671,9 +5522,7 @@ void lru_gen_exit_memcg(struct mem_cgroup *memcg) struct lru_gen_mm_state *mm_state =3D get_mm_state(lruvec); =20 VM_WARN_ON_ONCE(memchr_inv(lruvec->lrugen.nr_pages, 0, - sizeof(lruvec->lrugen.nr_pages))); - - lruvec->lrugen.list.next =3D LIST_POISON1; + sizeof(lruvec->lrugen.nr_pages))); =20 if (!mm_state) continue; --=20 2.34.1 From nobody Fri Dec 19 08:23:02 2025 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 28C222FF14D; Tue, 9 Dec 2025 01:41:12 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765244476; cv=none; b=DFlhOc7WHiO+TIN7jA5kLrtFV953cLljVhyPRW8Me1MbZiiTDfDEhePbv/gbBTRa/reRQUFqgQfHEddPAaXZX1dWz1RHyuWrZCn21lQvUyF8FWFWBvu3FBruDzV0AFBGo4B7DAPc2zv/q+QMq1zu3xDVsXuasuSuLnUkT7jImxQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765244476; c=relaxed/simple; bh=T+kqe7s3p6OD9UwmnfewzL0JtVgGbWopOKOE++i19n8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=kJgJmWZ95rP1CweeqTsDbaXZUdyA4QydCn15yKNMe+ERtu+g6qIcNoDyu4h6EtAohhYIBwTYJCe0U0gCSYqzg+BXZrHMrOHsz2PRIq3WJsOWQeh4Zq5fs5pwguEZTKDcuL8xBRSYDAK2REQ7jLcr4ZBlXSx2mtKypy76YDs5ZAE= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4dQM3x2vW3zKHMHr; Tue, 9 Dec 2025 09:40:13 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.75]) by mail.maildlp.com (Postfix) with ESMTP id 356EC1A1CB3; 
Tue, 9 Dec 2025 09:41:11 +0800 (CST)
From: Chen Ridong
To: akpm@linux-foundation.org, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, david@kernel.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org, surenb@google.com, mhocko@suse.com, corbet@lwn.net, hannes@cmpxchg.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, zhengqi.arch@bytedance.com
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, lujialin4@huawei.com, chenridong@huaweicloud.com, zhongjinji@honor.com
Subject: [PATCH -next 3/5] mm/mglru: extend shrink_one for both lrugen and non-lrugen
Date: Tue, 9 Dec 2025 01:25:55 +0000
Message-Id: <20251209012557.1949239-4-chenridong@huaweicloud.com>
In-Reply-To: <20251209012557.1949239-1-chenridong@huaweicloud.com>
References: <20251209012557.1949239-1-chenridong@huaweicloud.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

From: Chen Ridong

Currently, flush_reclaim_state() is placed differently on the two paths: shrink_many(), which is used only for MGLRU global reclaim, calls it (via shrink_one()) after each lruvec is shrunk, while on the non-MGLRU path shrink_node() calls it only once, after shrink_node_memcgs() has shrunk all lruvecs.

Move flush_reclaim_state() so that it runs after each lruvec in shrink_node_memcgs() rather than once per node in shrink_node(). This unifies the behavior and is reasonable because:

1. flush_reclaim_state() adds current->reclaim_state->reclaimed to
   sc->nr_reclaimed.
2. For non-MGLRU root reclaim, this can help stop the iteration earlier
   once nr_to_reclaim is reached.
3. For non-root reclaim, the effect is negligible since
   flush_reclaim_state() does nothing in that case.

With flush_reclaim_state() moved into shrink_node_memcgs(), shrink_one() can be extended to support both the lrugen and non-lrugen paths: it calls try_to_shrink_lruvec() for lrugen root reclaim and shrink_lruvec() otherwise.
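For reference, a condensed sketch of the unified helper this patch arrives at, reassembled from the diff below (not a standalone build; comments added for clarity):

static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
{
	unsigned long scanned = sc->nr_scanned;
	unsigned long reclaimed = sc->nr_reclaimed;
	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
	struct mem_cgroup *memcg = lruvec_memcg(lruvec);

	/* MGLRU root reclaim walks generations; everything else uses shrink_lruvec() */
	if (lru_gen_enabled() && root_reclaim(sc))
		try_to_shrink_lruvec(lruvec, sc);
	else
		shrink_lruvec(lruvec, sc);

	shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);

	/* Record the group's reclaim efficiency */
	if (!sc->proactive)
		vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
			   sc->nr_reclaimed - reclaimed);

	/* fold current->reclaim_state->reclaimed into sc->nr_reclaimed per lruvec */
	flush_reclaim_state(sc);
}

Because the flush now happens per lruvec, the caller's check of sc->nr_reclaimed against sc->nr_to_reclaim after each shrink_one() can trigger earlier for partial walks.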
Signed-off-by: Chen Ridong --- mm/vmscan.c | 57 +++++++++++++++++++++-------------------------------- 1 file changed, 23 insertions(+), 34 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 584f41eb4c14..795f5ebd9341 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4758,23 +4758,7 @@ static bool try_to_shrink_lruvec(struct lruvec *lruv= ec, struct scan_control *sc) return nr_to_scan < 0; } =20 -static void shrink_one(struct lruvec *lruvec, struct scan_control *sc) -{ - unsigned long scanned =3D sc->nr_scanned; - unsigned long reclaimed =3D sc->nr_reclaimed; - struct pglist_data *pgdat =3D lruvec_pgdat(lruvec); - struct mem_cgroup *memcg =3D lruvec_memcg(lruvec); - - try_to_shrink_lruvec(lruvec, sc); - - shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority); - - if (!sc->proactive) - vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned, - sc->nr_reclaimed - reclaimed); - - flush_reclaim_state(sc); -} +static void shrink_one(struct lruvec *lruvec, struct scan_control *sc); =20 static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc) { @@ -5760,6 +5744,27 @@ static inline bool should_continue_reclaim(struct pg= list_data *pgdat, return inactive_lru_pages > pages_for_compaction; } =20 +static void shrink_one(struct lruvec *lruvec, struct scan_control *sc) +{ + unsigned long scanned =3D sc->nr_scanned; + unsigned long reclaimed =3D sc->nr_reclaimed; + struct pglist_data *pgdat =3D lruvec_pgdat(lruvec); + struct mem_cgroup *memcg =3D lruvec_memcg(lruvec); + + if (lru_gen_enabled() && root_reclaim(sc)) + try_to_shrink_lruvec(lruvec, sc); + else + shrink_lruvec(lruvec, sc); + + shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority); + + if (!sc->proactive) + vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned, + sc->nr_reclaimed - reclaimed); + + flush_reclaim_state(sc); +} + static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc) { struct mem_cgroup *target_memcg =3D sc->target_mem_cgroup; @@ -5784,8 +5789,6 @@ static void shrink_node_memcgs(pg_data_t *pgdat, stru= ct scan_control *sc) memcg =3D mem_cgroup_iter(target_memcg, NULL, partial); do { struct lruvec *lruvec =3D mem_cgroup_lruvec(memcg, pgdat); - unsigned long reclaimed; - unsigned long scanned; =20 /* * This loop can become CPU-bound when target memcgs @@ -5817,19 +5820,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, str= uct scan_control *sc) memcg_memory_event(memcg, MEMCG_LOW); } =20 - reclaimed =3D sc->nr_reclaimed; - scanned =3D sc->nr_scanned; - - shrink_lruvec(lruvec, sc); - - shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, - sc->priority); - - /* Record the group's reclaim efficiency */ - if (!sc->proactive) - vmpressure(sc->gfp_mask, memcg, false, - sc->nr_scanned - scanned, - sc->nr_reclaimed - reclaimed); + shrink_one(lruvec, sc); =20 /* If partial walks are allowed, bail once goal is reached */ if (partial && sc->nr_reclaimed >=3D sc->nr_to_reclaim) { @@ -5863,8 +5854,6 @@ static void shrink_node(pg_data_t *pgdat, struct scan= _control *sc) =20 shrink_node_memcgs(pgdat, sc); =20 - flush_reclaim_state(sc); - nr_node_reclaimed =3D sc->nr_reclaimed - nr_reclaimed; =20 /* Record the subtree's reclaim efficiency */ --=20 2.34.1 From nobody Fri Dec 19 08:23:02 2025 Received: from dggsgout11.his.huawei.com (dggsgout11.his.huawei.com [45.249.212.51]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 082B12FF153; Tue, 9 Dec 2025 01:41:13 
+0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.51 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765244476; cv=none; b=OvtB9gN7IWv3ArbJAQGY1lDRotSFGGawUAnFh924CUXm5c87p3J+HoaShK+UT7L+M41OTvovJQDJf5BcoEFuvFJL8m2XpTFnwSnyjtGihAPJGYpFmzC0wApe+cR0u1ybl8tlWm+syY3luORuHf0rjLxd1Xdl752NJQSF5m6YnIU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765244476; c=relaxed/simple; bh=IozRJQ+B6OgfbHCeUBmAr2KIQZ4FCMfcLnzh6x/3YPs=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=oScRBA3A3VGpb+6JLBfpyLCb04k5bYTCDSLpJ34WqEAQk97j/PuaU+3Dw74Rf1ojPjqyPhUhU0HlPZlwoGqsf71iI/BaNETt+r1kxuZlEgjRfQtDDt4p1USFElcaFZ4IUeFGYwzoysiWu++LnKrw2zMUhacVJpj/Zxuby+7ZvHY= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.51 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.93.142]) by dggsgout11.his.huawei.com (SkyGuard) with ESMTPS id 4dQM4p4vQ2zYQtkD; Tue, 9 Dec 2025 09:40:58 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.75]) by mail.maildlp.com (Postfix) with ESMTP id 5760C1A084B; Tue, 9 Dec 2025 09:41:11 +0800 (CST) Received: from hulk-vt.huawei.com (unknown [10.67.174.121]) by APP2 (Coremail) with SMTP id Syh0CgBnRlAafjdpkF9fBA--.23909S6; Tue, 09 Dec 2025 09:41:10 +0800 (CST) From: Chen Ridong To: akpm@linux-foundation.org, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, david@kernel.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org, surenb@google.com, mhocko@suse.com, corbet@lwn.net, hannes@cmpxchg.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, zhengqi.arch@bytedance.com Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, lujialin4@huawei.com, chenridong@huaweicloud.com, zhongjinji@honor.com Subject: [PATCH -next 4/5] mm/mglru: combine shrink_many into shrink_node_memcgs Date: Tue, 9 Dec 2025 01:25:56 +0000 Message-Id: <20251209012557.1949239-5-chenridong@huaweicloud.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20251209012557.1949239-1-chenridong@huaweicloud.com> References: <20251209012557.1949239-1-chenridong@huaweicloud.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable X-CM-TRANSID: Syh0CgBnRlAafjdpkF9fBA--.23909S6 X-Coremail-Antispam: 1UD129KBjvJXoWxGF15Jw45Gr4rCF17Cr47CFg_yoWrXr4DpF ZxJ347AayrAF4Yga4fta97ua4fCw48GrW3Ary8J3WfAr1Sga45Ga47CryIyFW5Ca4kur17 ZF90vw18Wa1jvFUanT9S1TB71UUUUU7qnTZGkaVYY2UrUUUUjbIjqfuFe4nvWSU5nxnvy2 9KBjDU0xBIdaVrnRJUUUmS14x267AKxVWrJVCq3wAFc2x0x2IEx4CE42xK8VAvwI8IcIk0 rVWrJVCq3wAFIxvE14AKwVWUJVWUGwA2048vs2IY020E87I2jVAFwI0_JF0E3s1l82xGYI kIc2x26xkF7I0E14v26ryj6s0DM28lY4IEw2IIxxk0rwA2F7IY1VAKz4vEj48ve4kI8wA2 z4x0Y4vE2Ix0cI8IcVAFwI0_tr0E3s1l84ACjcxK6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F 4UJwA2z4x0Y4vEx4A2jsIE14v26rxl6s0DM28EF7xvwVC2z280aVCY1x0267AKxVW0oVCq 3wAS0I0E0xvYzxvE52x082IY62kv0487Mc02F40EFcxC0VAKzVAqx4xG6I80ewAv7VC0I7 IYx2IY67AKxVWUJVWUGwAv7VC2z280aVAFwI0_Jr0_Gr1lOx8S6xCaFVCjc4AY6r1j6r4U 
M4x0Y48IcxkI7VAKI48JM4x0x7Aq67IIx4CEVc8vx2IErcIFxwACI402YVCY1x02628vn2 kIc2xKxwCY1x0262kKe7AKxVW8ZVWrXwCF04k20xvY0x0EwIxGrwCFx2IqxVCFs4IE7xkE bVWUJVW8JwC20s026c02F40E14v26r1j6r18MI8I3I0E7480Y4vE14v26r106r1rMI8E67 AF67kF1VAFwI0_GFv_WrylIxkGc2Ij64vIr41lIxAIcVC0I7IYx2IY67AKxVWUJVWUCwCI 42IY6xIIjxv20xvEc7CjxVAFwI0_Gr1j6F4UJwCI42IY6xAIw20EY4v20xvaj40_Jr0_JF 4lIxAIcVC2z280aVAFwI0_Jr0_Gr1lIxAIcVC2z280aVCY1x0267AKxVW8Jr0_Cr1UYxBI daVFxhVjvjDU0xZFpf9x0pRQJ5wUUUUU= X-CM-SenderInfo: hfkh02xlgr0w46kxt4xhlfz01xgou0bp/ Content-Type: text/plain; charset="utf-8" From: Chen Ridong The previous patch extended shrink_one to support both lrugen and non-lrugen reclaim. Now shrink_many and shrink_node_memcgs are almost identical, except that shrink_many also calls should_abort_scan for lrugen root reclaim. This patch adds the should_abort_scan check to shrink_node_memcgs (which is only meaningful for gen-LRU root reclaim). After this change, shrink_node_memcgs can be used directly instead of shrink_many, allowing shrink_many to be safely removed. Suggested-by: Shakeel Butt Signed-off-by: Chen Ridong --- mm/vmscan.c | 67 ++++++++++++----------------------------------------- 1 file changed, 15 insertions(+), 52 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 795f5ebd9341..dbf2cfbe3243 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4758,57 +4758,6 @@ static bool try_to_shrink_lruvec(struct lruvec *lruv= ec, struct scan_control *sc) return nr_to_scan < 0; } =20 -static void shrink_one(struct lruvec *lruvec, struct scan_control *sc); - -static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc) -{ - struct mem_cgroup *target =3D sc->target_mem_cgroup; - struct mem_cgroup_reclaim_cookie reclaim =3D { - .pgdat =3D pgdat, - }; - struct mem_cgroup_reclaim_cookie *cookie =3D &reclaim; - struct mem_cgroup *memcg; - - if (current_is_kswapd() || sc->memcg_full_walk) - cookie =3D NULL; - - memcg =3D mem_cgroup_iter(target, NULL, cookie); - while (memcg) { - struct lruvec *lruvec =3D mem_cgroup_lruvec(memcg, pgdat); - - cond_resched(); - - mem_cgroup_calculate_protection(target, memcg); - - if (mem_cgroup_below_min(target, memcg)) - goto next; - - if (mem_cgroup_below_low(target, memcg)) { - if (!sc->memcg_low_reclaim) { - sc->memcg_low_skipped =3D 1; - goto next; - } - memcg_memory_event(memcg, MEMCG_LOW); - } - - shrink_one(lruvec, sc); - - if (should_abort_scan(lruvec, sc)) { - if (cookie) - mem_cgroup_iter_break(target, memcg); - break; - } - -next: - if (cookie && sc->nr_reclaimed >=3D sc->nr_to_reclaim) { - mem_cgroup_iter_break(target, memcg); - break; - } - - memcg =3D mem_cgroup_iter(target, memcg, cookie); - } -} - static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_contr= ol *sc) { struct blk_plug plug; @@ -4829,6 +4778,9 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruv= ec, struct scan_control *sc blk_finish_plug(&plug); } =20 +static void shrink_one(struct lruvec *lruvec, struct scan_control *sc); +static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc); + static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_con= trol *sc) { struct blk_plug plug; @@ -4858,7 +4810,7 @@ static void lru_gen_shrink_node(struct pglist_data *p= gdat, struct scan_control * if (mem_cgroup_disabled()) shrink_one(&pgdat->__lruvec, sc); else - shrink_many(pgdat, sc); + shrink_node_memcgs(pgdat, sc); =20 if (current_is_kswapd()) sc->nr_reclaimed +=3D reclaimed; @@ -5554,6 +5506,11 @@ static void lru_gen_shrink_node(struct pglist_data *= pgdat, struct 
scan_control * BUILD_BUG(); } =20 +static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *= sc) +{ + return false; +} + #endif /* CONFIG_LRU_GEN */ =20 static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc) @@ -5822,6 +5779,12 @@ static void shrink_node_memcgs(pg_data_t *pgdat, str= uct scan_control *sc) =20 shrink_one(lruvec, sc); =20 + if (should_abort_scan(lruvec, sc)) { + if (partial) + mem_cgroup_iter_break(target_memcg, memcg); + break; + } + /* If partial walks are allowed, bail once goal is reached */ if (partial && sc->nr_reclaimed >=3D sc->nr_to_reclaim) { mem_cgroup_iter_break(target_memcg, memcg); --=20 2.34.1 From nobody Fri Dec 19 08:23:02 2025 Received: from dggsgout12.his.huawei.com (dggsgout12.his.huawei.com [45.249.212.56]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 560EC2FF151; Tue, 9 Dec 2025 01:41:13 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=45.249.212.56 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765244475; cv=none; b=oHaRoX0ycSgAyfS3HbnSVIMSgyrS0vWbi0Uoi6KbmY2CaTdkfqZ3GrIBivoijbVYJpAghYb6XTismYZOaRYiEzg9qJABEjENWkPJ19zuF8C3bPKIRkUy3M4VCxG2EM2p89u0xAB7gd/cl6E3fLJImLhdyqeTOI80YjCsSV0LnaU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1765244475; c=relaxed/simple; bh=oGyGsrhl+bT6RvC4sisJDQxidB4pVVdXpZbBrZRHauQ=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=tXM4bRA3/tAM5PNUif/+Za6BcKzn/SBu5dcoh11H1b5qzSgDvSgfsgsk4PDiXf6sHYIOBVDtE/3kNB1tsEkKSGuQ72IvLWdrVn9q6mAtuospb+UQgYJ6IPVGGHpX0cvLykO6UMjRFkLwFlixJVKHHA+BnwzLHIPXmx8ZbMHHR+c= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com; spf=pass smtp.mailfrom=huaweicloud.com; arc=none smtp.client-ip=45.249.212.56 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=huaweicloud.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=huaweicloud.com Received: from mail.maildlp.com (unknown [172.19.163.216]) by dggsgout12.his.huawei.com (SkyGuard) with ESMTPS id 4dQM3x4nV0zKHMMf; Tue, 9 Dec 2025 09:40:13 +0800 (CST) Received: from mail02.huawei.com (unknown [10.116.40.75]) by mail.maildlp.com (Postfix) with ESMTP id 7A2581A15CB; Tue, 9 Dec 2025 09:41:11 +0800 (CST) Received: from hulk-vt.huawei.com (unknown [10.67.174.121]) by APP2 (Coremail) with SMTP id Syh0CgBnRlAafjdpkF9fBA--.23909S7; Tue, 09 Dec 2025 09:41:11 +0800 (CST) From: Chen Ridong To: akpm@linux-foundation.org, axelrasmussen@google.com, yuanchu@google.com, weixugc@google.com, david@kernel.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz, rppt@kernel.org, surenb@google.com, mhocko@suse.com, corbet@lwn.net, hannes@cmpxchg.org, roman.gushchin@linux.dev, shakeel.butt@linux.dev, muchun.song@linux.dev, zhengqi.arch@bytedance.com Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org, lujialin4@huawei.com, chenridong@huaweicloud.com, zhongjinji@honor.com Subject: [PATCH -next 5/5] mm/mglru: factor lrugen state out of shrink_lruvec Date: Tue, 9 Dec 2025 01:25:57 +0000 Message-Id: <20251209012557.1949239-6-chenridong@huaweicloud.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20251209012557.1949239-1-chenridong@huaweicloud.com> References: <20251209012557.1949239-1-chenridong@huaweicloud.com> 
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Chen Ridong

A previous patch updated shrink_node_memcgs() to handle lrugen root reclaim and extended shrink_one() to support both lrugen and non-lrugen. However, in shrink_one(), lrugen non-root reclaim still goes through shrink_lruvec(), which should only be used for non-lrugen reclaim.

To clarify the semantics, move the lrugen-specific logic out of shrink_lruvec(), leaving shrink_lruvec() exclusively for non-lrugen reclaim. For lrugen, shrink_one() now invokes lru_gen_shrink_lruvec(); for root reclaim that simply calls try_to_shrink_lruvec(), without the extra setup, since that work has already been done in lru_gen_shrink_node(). Non-root reclaim behavior remains unchanged.

Signed-off-by: Chen Ridong
---
 mm/vmscan.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index dbf2cfbe3243..c5f517ec52a7 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4762,7 +4762,12 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc
 {
 	struct blk_plug plug;
 
-	VM_WARN_ON_ONCE(root_reclaim(sc));
+	/* Root reclaim has finished other extra work outside, just shrink. */
+	if (root_reclaim(sc)) {
+		try_to_shrink_lruvec(lruvec, sc);
+		return;
+	}
+
 	VM_WARN_ON_ONCE(!sc->may_writepage || !sc->may_unmap);
 
 	lru_add_drain();
@@ -5524,11 +5529,6 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 	bool proportional_reclaim;
 	struct blk_plug plug;
 
-	if (lru_gen_enabled() && !root_reclaim(sc)) {
-		lru_gen_shrink_lruvec(lruvec, sc);
-		return;
-	}
-
 	get_scan_count(lruvec, sc, nr);
 
 	/* Record the original scan target for proportional adjustments later */
@@ -5708,8 +5708,8 @@ static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 
-	if (lru_gen_enabled() && root_reclaim(sc))
-		try_to_shrink_lruvec(lruvec, sc);
+	if (lru_gen_enabled())
+		lru_gen_shrink_lruvec(lruvec, sc);
 	else
 		shrink_lruvec(lruvec, sc);
 
-- 
2.34.1
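A rough sketch of how the two entry points relate after this last patch, condensed from the diff above (the unchanged non-root body is elided):

/* shrink_one() now dispatches purely on whether MGLRU is enabled */
static void shrink_one(struct lruvec *lruvec, struct scan_control *sc)
{
	...
	if (lru_gen_enabled())
		lru_gen_shrink_lruvec(lruvec, sc);	/* MGLRU, root or memcg reclaim */
	else
		shrink_lruvec(lruvec, sc);		/* classic active/inactive LRUs */
	...
}

/* lru_gen_shrink_lruvec() absorbs the root/non-root distinction */
static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
{
	if (root_reclaim(sc)) {
		/* setup already done in lru_gen_shrink_node() */
		try_to_shrink_lruvec(lruvec, sc);
		return;
	}

	/*
	 * Non-root (memcg) reclaim path is unchanged: lru_add_drain(),
	 * blk plug, set_mm_walk(), try_to_shrink_lruvec(), clear_mm_walk().
	 */
	...
}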