From nobody Mon Feb 9 15:11:07 2026
From: Chen Ridong
To: akpm@linux-foundation.org, axelrasmussen@google.com,
    yuanchu@google.com, weixugc@google.com, david@kernel.org,
    lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, vbabka@suse.cz,
    rppt@kernel.org, surenb@google.com, mhocko@suse.com, corbet@lwn.net,
    skhan@linuxfoundation.org, hannes@cmpxchg.org,
    roman.gushchin@linux.dev, shakeel.butt@linux.dev,
    muchun.song@linux.dev, zhengqi.arch@bytedance.com
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    lujialin4@huawei.com, chenridong@huaweicloud.com, ryncsn@gmail.com
Subject: [RFC PATCH -next 1/7] vmscan: add memcg heat level for reclaim
Date: Tue, 20 Jan 2026 13:42:50 +0000
Message-Id: <20260120134256.2271710-2-chenridong@huaweicloud.com>
In-Reply-To: <20260120134256.2271710-1-chenridong@huaweicloud.com>
References: <20260120134256.2271710-1-chenridong@huaweicloud.com>

From: Chen Ridong

The memcg LRU was originally introduced to improve scalability during
global reclaim. However, it is complex and only works for MGLRU global
reclaim. Moreover, its implementation complexity has led to performance
regressions when handling a large number of memory cgroups [1].

This patch introduces a per-memcg heat level for reclaim, aiming to
unify MGLRU and traditional LRU global reclaim. The core idea is to
track per-node, per-memcg reclaim state consisting of heat, last_decay,
and last_refault. last_refault records the total refaults observed at
the end of the previous reclaim of this memcg. last_decay is a
timestamp: the heat level decays over time if the memcg is not
reclaimed again. Both last_decay and last_refault are used to compute
the current heat level when reclaim starts.

Three reclaim heat levels are defined: cold, warm, and hot. Cold memcgs
are reclaimed first; only if the cold memcgs cannot reclaim enough
pages do warm memcgs become eligible for reclaim. Hot memcgs are
reclaimed last (see the sketch below for how reclaim efficiency maps to
heat and levels).

While this design could be applied to all memcg reclaim scenarios, this
patch is conservative and only introduces heat levels for traditional
LRU global reclaim. Subsequent patches will replace the memcg LRU with
heat-level-based reclaim.
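The mapping from reclaim efficiency to heat, and from heat to level, can
be modeled in a few lines of userspace C. This is an illustrative sketch
only, not part of the patch: the constants and the delta rules mirror
the code added below, while the simulation loop and the sample
scan/reclaim numbers are invented for the example (time-based decay is
omitted).

#include <stdio.h>

#define MEMCG_HEAT_WARM 4
#define MEMCG_HEAT_HOT  8
#define MEMCG_HEAT_MAX  12

/* Heat is clamped to [0, MEMCG_HEAT_MAX], as memcg_adjust_heat() does. */
static long clamp_heat(long v)
{
	return v < 0 ? 0 : (v > MEMCG_HEAT_MAX ? MEMCG_HEAT_MAX : v);
}

/* Mirror of the delta selection in memcg_record_reclaim_result(). */
static long heat_delta(unsigned long scanned, unsigned long reclaimed)
{
	if (!scanned)
		return MEMCG_HEAT_MAX;	/* nothing scannable: hottest */
	if (reclaimed * 2 >= scanned)
		return -2;		/* >= 50% reclaimed: strong cool */
	if (reclaimed * 4 >= scanned)
		return -1;		/* >= 25% reclaimed: mild cool */
	return 1;			/* poor efficiency: warm up */
}

/* Mirror of memcg_heat_level()'s thresholds. */
static const char *level_name(long heat)
{
	if (heat >= MEMCG_HEAT_HOT)
		return "hot";
	if (heat >= MEMCG_HEAT_WARM)
		return "warm";
	return "cold";
}

int main(void)
{
	long heat = 0;
	/* Six reclaim passes with worsening efficiency (invented numbers). */
	unsigned long scanned[]   = { 100, 100, 100, 100, 100, 0 };
	unsigned long reclaimed[] = {  60,  10,   5,   3,   2, 0 };

	for (int i = 0; i < 6; i++) {
		heat = clamp_heat(heat + heat_delta(scanned[i], reclaimed[i]));
		printf("pass %d: heat=%ld (%s)\n", i, heat, level_name(heat));
	}
	return 0;
}

Running this shows the memcg staying cold while reclaim is productive,
drifting to warm as efficiency drops, and jumping straight to hot once a
pass scans nothing at all.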
Based on tests provided by Yu Zhao, traditional LRU global reclaim
shows a significant performance improvement with heat-level reclaim
enabled. The results below are from a 2-hour run of the test [2].

Throughput (number of requests)
                 before      after    Change
  Total         1734169    2353717      +35%

Tail latency (number of requests)
                 before      after    Change
  [128s, inf)      1231       1057      -14%
  [64s, 128s)       586        444      -24%
  [32s, 64s)       1658       1061      -36%
  [16s, 32s)       4611       2863      -38%

[1] https://lore.kernel.org/r/20251126171513.GC135004@cmpxchg.org
[2] https://lore.kernel.org/all/20221220214923.1229538-1-yuzhao@google.com/

Signed-off-by: Chen Ridong
---
 include/linux/memcontrol.h |   7 ++
 mm/memcontrol.c            |   3 +
 mm/vmscan.c                | 227 +++++++++++++++++++++++++++++--------
 3 files changed, 192 insertions(+), 45 deletions(-)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index af352cabedba..b293caf70034 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -76,6 +76,12 @@ struct memcg_vmstats;
 struct lruvec_stats_percpu;
 struct lruvec_stats;
 
+struct memcg_reclaim_state {
+	atomic_long_t heat;
+	unsigned long last_decay;
+	atomic_long_t last_refault;
+};
+
 struct mem_cgroup_reclaim_iter {
 	struct mem_cgroup *position;
 	/* scan generation, increased every round-trip */
@@ -114,6 +120,7 @@ struct mem_cgroup_per_node {
 	CACHELINE_PADDING(_pad2_);
 	unsigned long lru_zone_size[MAX_NR_ZONES][NR_LRU_LISTS];
 	struct mem_cgroup_reclaim_iter iter;
+	struct memcg_reclaim_state reclaim;
 
 #ifdef CONFIG_MEMCG_NMI_SAFETY_REQUIRES_ATOMIC
 	/* slab stats for nmi context */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f2b87e02574e..675d49ad7e2c 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3713,6 +3713,9 @@ static bool alloc_mem_cgroup_per_node_info(struct mem_cgroup *memcg, int node)
 
 	lruvec_init(&pn->lruvec);
 	pn->memcg = memcg;
+	atomic_long_set(&pn->reclaim.heat, 0);
+	pn->reclaim.last_decay = jiffies;
+	atomic_long_set(&pn->reclaim.last_refault, 0);
 
 	memcg->nodeinfo[node] = pn;
 	return true;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4aa73f125772..3759cd52c336 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5978,6 +5978,124 @@ static inline bool should_continue_reclaim(struct pglist_data *pgdat,
 	return inactive_lru_pages > pages_for_compaction;
 }
 
+enum memcg_scan_level {
+	MEMCG_LEVEL_COLD,
+	MEMCG_LEVEL_WARM,
+	MEMCG_LEVEL_HOT,
+	MEMCG_LEVEL_MAX,
+};
+
+#define MEMCG_HEAT_WARM 4
+#define MEMCG_HEAT_HOT 8
+#define MEMCG_HEAT_MAX 12
+#define MEMCG_HEAT_DECAY_STEP 1
+#define MEMCG_HEAT_DECAY_INTERVAL (1 * HZ)
+
+static void memcg_adjust_heat(struct mem_cgroup_per_node *pn, long delta)
+{
+	long heat, new_heat;
+
+	if (mem_cgroup_is_root(pn->memcg))
+		return;
+
+	heat = atomic_long_read(&pn->reclaim.heat);
+	do {
+		new_heat = clamp_t(long, heat + delta, 0, MEMCG_HEAT_MAX);
+		if (atomic_long_cmpxchg(&pn->reclaim.heat, heat, new_heat) == heat)
+			break;
+		heat = atomic_long_read(&pn->reclaim.heat);
+	} while (1);
+}
+
+static void memcg_decay_heat(struct mem_cgroup_per_node *pn)
+{
+	unsigned long last;
+	unsigned long now = jiffies;
+
+	if (mem_cgroup_is_root(pn->memcg))
+		return;
+
+	last = READ_ONCE(pn->reclaim.last_decay);
+	if (!time_after(now, last + MEMCG_HEAT_DECAY_INTERVAL))
+		return;
+
+	if (cmpxchg(&pn->reclaim.last_decay, last, now) != last)
+		return;
+
+	memcg_adjust_heat(pn, -MEMCG_HEAT_DECAY_STEP);
+}
+
+static int memcg_heat_level(struct mem_cgroup_per_node *pn)
+{
+	long heat;
+
+	if (mem_cgroup_is_root(pn->memcg))
+		return MEMCG_LEVEL_COLD;
+
+	memcg_decay_heat(pn);
+	heat = atomic_long_read(&pn->reclaim.heat);
+
+	if (heat >= MEMCG_HEAT_HOT)
+		return MEMCG_LEVEL_HOT;
+	if (heat >= MEMCG_HEAT_WARM)
+		return MEMCG_LEVEL_WARM;
+	return MEMCG_LEVEL_COLD;
+}
+
+static void memcg_record_reclaim_result(struct mem_cgroup_per_node *pn,
+					struct lruvec *lruvec,
+					unsigned long scanned,
+					unsigned long reclaimed)
+{
+	long delta;
+
+	if (mem_cgroup_is_root(pn->memcg))
+		return;
+
+	memcg_decay_heat(pn);
+
+	/*
+	 * Memory cgroup heat adjustment algorithm:
+	 * - If scanned == 0: mark as hottest (+MAX_HEAT)
+	 * - If reclaimed >= 50% * scanned: strong cool (-2)
+	 * - If reclaimed >= 25% * scanned: mild cool (-1)
+	 * - Otherwise: warm up (+1)
+	 */
+	if (!scanned)
+		delta = MEMCG_HEAT_MAX;
+	else if (reclaimed * 2 >= scanned)
+		delta = -2;
+	else if (reclaimed * 4 >= scanned)
+		delta = -1;
+	else
+		delta = 1;
+
+	/*
+	 * Refault-based heat adjustment:
+	 * - If refault increase > reclaimed pages: heat up (more cautious reclaim)
+	 * - If no refaults and currently warm: cool down (allow more reclaim)
+	 * This prevents thrashing by backing off when refaults indicate over-reclaim.
+	 */
+	if (lruvec) {
+		unsigned long total_refaults;
+		unsigned long prev;
+		long refault_delta;
+
+		total_refaults = lruvec_page_state(lruvec, WORKINGSET_ACTIVATE_ANON);
+		total_refaults += lruvec_page_state(lruvec, WORKINGSET_ACTIVATE_FILE);
+
+		prev = atomic_long_xchg(&pn->reclaim.last_refault, total_refaults);
+		refault_delta = total_refaults - prev;
+
+		if (refault_delta > reclaimed)
+			delta++;
+		else if (!refault_delta && delta > 0)
+			delta--;
+	}
+
+	memcg_adjust_heat(pn, delta);
+}
+
 static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 {
 	struct mem_cgroup *target_memcg = sc->target_mem_cgroup;
@@ -5986,7 +6104,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 	};
 	struct mem_cgroup_reclaim_cookie *partial = &reclaim;
 	struct mem_cgroup *memcg;
-
+	int level;
+	int max_level = root_reclaim(sc) ? MEMCG_LEVEL_MAX : MEMCG_LEVEL_WARM;
 	/*
 	 * In most cases, direct reclaimers can do partial walks
 	 * through the cgroup tree, using an iterator state that
@@ -5999,62 +6118,80 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 	if (current_is_kswapd() || sc->memcg_full_walk)
 		partial = NULL;
 
-	memcg = mem_cgroup_iter(target_memcg, NULL, partial);
-	do {
-		struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
-		unsigned long reclaimed;
-		unsigned long scanned;
-
-		/*
-		 * This loop can become CPU-bound when target memcgs
-		 * aren't eligible for reclaim - either because they
-		 * don't have any reclaimable pages, or because their
-		 * memory is explicitly protected. Avoid soft lockups.
-		 */
-		cond_resched();
+	for (level = MEMCG_LEVEL_COLD; level < max_level; level++) {
+		bool need_next_level = false;
 
-		mem_cgroup_calculate_protection(target_memcg, memcg);
+		memcg = mem_cgroup_iter(target_memcg, NULL, partial);
+		do {
+			struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);
+			unsigned long reclaimed;
+			unsigned long scanned;
+			struct mem_cgroup_per_node *pn = memcg->nodeinfo[pgdat->node_id];
 
-		if (mem_cgroup_below_min(target_memcg, memcg)) {
-			/*
-			 * Hard protection.
-			 * If there is no reclaimable memory, OOM.
-			 */
-			continue;
-		} else if (mem_cgroup_below_low(target_memcg, memcg)) {
 			/*
-			 * Soft protection.
-			 * Respect the protection only as long as
-			 * there is an unprotected supply
-			 * of reclaimable memory from other cgroups.
+			 * This loop can become CPU-bound when target memcgs
+			 * aren't eligible for reclaim - either because they
+			 * don't have any reclaimable pages, or because their
+			 * memory is explicitly protected. Avoid soft lockups.
 			 */
-			if (!sc->memcg_low_reclaim) {
-				sc->memcg_low_skipped = 1;
+			cond_resched();
+
+			mem_cgroup_calculate_protection(target_memcg, memcg);
+
+			if (mem_cgroup_below_min(target_memcg, memcg)) {
+				/*
+				 * Hard protection.
+				 * If there is no reclaimable memory, OOM.
+				 */
 				continue;
+			} else if (mem_cgroup_below_low(target_memcg, memcg)) {
+				/*
+				 * Soft protection.
+				 * Respect the protection only as long as
+				 * there is an unprotected supply
+				 * of reclaimable memory from other cgroups.
+				 */
+				if (!sc->memcg_low_reclaim) {
+					sc->memcg_low_skipped = 1;
+					continue;
+				}
+				memcg_memory_event(memcg, MEMCG_LOW);
 			}
-			memcg_memory_event(memcg, MEMCG_LOW);
-		}
 
-		reclaimed = sc->nr_reclaimed;
-		scanned = sc->nr_scanned;
+			if (root_reclaim(sc) && memcg_heat_level(pn) > level) {
+				need_next_level = true;
+				continue;
+			}
 
-		shrink_lruvec(lruvec, sc);
+			reclaimed = sc->nr_reclaimed;
+			scanned = sc->nr_scanned;
 
-		shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
-			    sc->priority);
+			shrink_lruvec(lruvec, sc);
+			if (!memcg || memcg_page_state(memcg, NR_SLAB_RECLAIMABLE_B))
+				shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
+					    sc->priority);
 
-		/* Record the group's reclaim efficiency */
-		if (!sc->proactive)
-			vmpressure(sc->gfp_mask, memcg, false,
-				   sc->nr_scanned - scanned,
-				   sc->nr_reclaimed - reclaimed);
+			if (root_reclaim(sc))
+				memcg_record_reclaim_result(pn, lruvec,
+						sc->nr_scanned - scanned,
+						sc->nr_reclaimed - reclaimed);
 
-		/* If partial walks are allowed, bail once goal is reached */
-		if (partial && sc->nr_reclaimed >= sc->nr_to_reclaim) {
-			mem_cgroup_iter_break(target_memcg, memcg);
+			/* Record the group's reclaim efficiency */
+			if (!sc->proactive)
+				vmpressure(sc->gfp_mask, memcg, false,
+					   sc->nr_scanned - scanned,
+					   sc->nr_reclaimed - reclaimed);
+
+			/* If partial walks are allowed, bail once goal is reached */
+			if (partial && sc->nr_reclaimed >= sc->nr_to_reclaim) {
+				mem_cgroup_iter_break(target_memcg, memcg);
+				break;
+			}
+		} while ((memcg = mem_cgroup_iter(target_memcg, memcg, partial)));
+
+		if (!need_next_level)
 			break;
-		}
-	} while ((memcg = mem_cgroup_iter(target_memcg, memcg, partial)));
+	}
 }
 
 static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
-- 
2.34.1

From nobody Mon Feb 9 15:11:07 2026
From: Chen Ridong
Subject: [RFC PATCH -next 2/7] mm/mglru: make calls to
 flush_reclaim_state() similar for MGLRU and non-MGLRU
Date: Tue, 20 Jan 2026 13:42:51 +0000
Message-Id: <20260120134256.2271710-3-chenridong@huaweicloud.com>
In-Reply-To: <20260120134256.2271710-1-chenridong@huaweicloud.com>
References: <20260120134256.2271710-1-chenridong@huaweicloud.com>

From: Chen Ridong

Currently, flush_reclaim_state is called from different places for the
two LRU implementations: shrink_many (used only by MGLRU) calls it
after each lruvec is shrunk, while shrink_node_memcgs calls it only
after all lruvecs have been shrunk. This patch moves
flush_reclaim_state into shrink_node_memcgs and calls it after each
lruvec. This unifies the behavior and is reasonable because:

1. flush_reclaim_state adds current->reclaim_state->reclaimed to
   sc->nr_reclaimed.
2. For non-MGLRU root reclaim, this can help stop the iteration earlier
   when nr_to_reclaim is reached (see the sketch below).
3. For non-root reclaim, the effect is negligible since
   flush_reclaim_state does nothing in that case.
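Point 2 is easiest to see with numbers. The following is an
illustrative userspace sketch, not kernel code: the two structs are
pared-down mirrors of scan_control and the task's reclaim_state, and
the loop and the per-lruvec sample numbers are invented. Flushing
inside the per-lruvec loop makes slab pages count toward nr_to_reclaim
immediately, so a partial walk can bail out earlier.

#include <stdio.h>

struct scan_control { unsigned long nr_reclaimed, nr_to_reclaim; };
struct reclaim_state { unsigned long reclaimed; };

/* Fold slab pages freed by the shrinkers into the reclaim total. */
static void flush_reclaim_state(struct scan_control *sc,
				struct reclaim_state *rs)
{
	sc->nr_reclaimed += rs->reclaimed;
	rs->reclaimed = 0;
}

int main(void)
{
	struct scan_control sc = { .nr_reclaimed = 0, .nr_to_reclaim = 32 };
	struct reclaim_state rs = { 0 };
	/* invented per-lruvec results: LRU pages and slab pages reclaimed */
	unsigned long lru[]  = { 10, 10, 10, 10 };
	unsigned long slab[] = {  8,  8,  8,  8 };

	for (int i = 0; i < 4; i++) {
		sc.nr_reclaimed += lru[i];	/* LRU pages count directly */
		rs.reclaimed += slab[i];	/* slab pages are deferred... */
		flush_reclaim_state(&sc, &rs);	/* ...until flushed, per lruvec */
		if (sc.nr_reclaimed >= sc.nr_to_reclaim) {
			/* With per-lruvec flushing: goal met after 2 lruvecs. */
			printf("goal met after %d lruvecs\n", i + 1);
			return 0;
		}
	}
	/* Flushing only at the end would have walked all 4 lruvecs. */
	printf("walked all lruvecs\n");
	return 0;
}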
Signed-off-by: Chen Ridong
---
 mm/vmscan.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 3759cd52c336..5a156ff48520 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -6182,6 +6182,7 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 					   sc->nr_scanned - scanned,
 					   sc->nr_reclaimed - reclaimed);
 
+			flush_reclaim_state(sc);
 			/* If partial walks are allowed, bail once goal is reached */
 			if (partial && sc->nr_reclaimed >= sc->nr_to_reclaim) {
 				mem_cgroup_iter_break(target_memcg, memcg);
@@ -6218,8 +6219,6 @@ static void shrink_node(pg_data_t *pgdat, struct scan_control *sc)
 
 	shrink_node_memcgs(pgdat, sc);
 
-	flush_reclaim_state(sc);
-
 	nr_node_reclaimed = sc->nr_reclaimed - nr_reclaimed;
 
 	/* Record the subtree's reclaim efficiency */
-- 
2.34.1

From nobody Mon Feb 9 15:11:07 2026
From: Chen Ridong
Subject: [RFC PATCH -next 3/7] mm/mglru: rename should_abort_scan to
 lru_gen_should_abort_scan
Date: Tue, 20 Jan 2026 13:42:52 +0000
Message-Id: <20260120134256.2271710-4-chenridong@huaweicloud.com>
In-Reply-To: <20260120134256.2271710-1-chenridong@huaweicloud.com>
References: <20260120134256.2271710-1-chenridong@huaweicloud.com>

From: Chen Ridong

A later patch will call should_abort_scan from shrink_node_memcgs when
it integrates shrink_many into shrink_node_memcgs. Rename the function
to lru_gen_should_abort_scan to clarify that it is specific to the
multi-gen LRU implementation, and add a stub for !CONFIG_LRU_GEN.

Signed-off-by: Chen Ridong
---
 mm/vmscan.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5a156ff48520..ab7a74de80da 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4855,7 +4855,7 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, int swappiness)
 	return try_to_inc_max_seq(lruvec, max_seq, swappiness, false) ? -1 : 0;
 }
 
-static bool should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
+static bool lru_gen_should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
 {
 	int i;
 	enum zone_watermarks mark;
@@ -4907,7 +4907,7 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 		if (scanned >= nr_to_scan)
 			break;
 
-		if (should_abort_scan(lruvec, sc))
+		if (lru_gen_should_abort_scan(lruvec, sc))
 			break;
 
 		cond_resched();
@@ -5011,7 +5011,7 @@ static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
 
 		rcu_read_lock();
 
-		if (should_abort_scan(lruvec, sc))
+		if (lru_gen_should_abort_scan(lruvec, sc))
 			break;
 	}
 
@@ -5788,6 +5788,10 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
 	BUILD_BUG();
 }
 
+static bool lru_gen_should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
+{
+	return false;
+}
 #endif /* CONFIG_LRU_GEN */
 
 static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
-- 
2.34.1

From nobody Mon Feb 9 15:11:07 2026
From: Chen Ridong
Subject: [RFC PATCH -next 4/7] mm/mglru: extend lru_gen_shrink_lruvec to
 support root reclaim
Date: Tue, 20 Jan 2026 13:42:53 +0000
Message-Id: <20260120134256.2271710-5-chenridong@huaweicloud.com>
In-Reply-To: <20260120134256.2271710-1-chenridong@huaweicloud.com>
References: <20260120134256.2271710-1-chenridong@huaweicloud.com>

From: Chen Ridong

The upcoming patch will integrate shrink_many into shrink_node_memcgs.
Currently, lru_gen_shrink_lruvec only supports non-root reclaim invoked
from shrink_node_memcgs; this patch extends it to also handle root
reclaim. Since the initial setup for root reclaim has already been
completed in lru_gen_shrink_node, it is enough to call
try_to_shrink_lruvec directly from lru_gen_shrink_lruvec in that case.

Signed-off-by: Chen Ridong
---
 mm/vmscan.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index ab7a74de80da..27c6fdbc9394 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5039,7 +5039,15 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 {
 	struct blk_plug plug;
 
-	VM_WARN_ON_ONCE(root_reclaim(sc));
+	/*
+	 * For root reclaim, the initial setup has already been completed externally;
+	 * proceed directly with the shrinking operation.
+	 */
+	if (root_reclaim(sc)) {
+		try_to_shrink_lruvec(lruvec, sc);
+		return;
+	}
+
 	VM_WARN_ON_ONCE(!sc->may_writepage || !sc->may_unmap);
 
 	lru_add_drain();
-- 
2.34.1

From nobody Mon Feb 9 15:11:07 2026
From: Chen Ridong
Subject: [RFC PATCH -next 5/7] mm/mglru: combine shrink_many into
 shrink_node_memcgs
Date: Tue, 20 Jan 2026 13:42:54 +0000
Message-Id: <20260120134256.2271710-6-chenridong@huaweicloud.com>
In-Reply-To: <20260120134256.2271710-1-chenridong@huaweicloud.com>
References: <20260120134256.2271710-1-chenridong@huaweicloud.com>

From: Chen Ridong

The memcg LRU was originally introduced to improve scalability during
global reclaim, but it only works for MGLRU global reclaim and its
implementation remains complex. The previous patches introduced
heat-level-based memcg reclaim, which is significantly simpler. This
patch switches MGLRU global reclaim to the heat-level-based reclaim
mechanism.

The following results are from a 24-hour run of the test provided by
Yu Zhao [1]:

Throughput (number of requests)
                 before      after    Change
  Total        22879701   25331956      +10%

Tail latency (number of requests)
                 before      after    Change
  [128s, inf)     19197      15628      -19%
  [64s, 128s)      4500       3815      -29%
  [32s, 64s)      14971      13755      -36%
  [16s, 32s)      46117      42942       -7%

[1] https://lore.kernel.org/all/20221220214923.1229538-1-yuzhao@google.com/

Signed-off-by: Chen Ridong
---
 mm/vmscan.c | 101 ++++++++++++----------------------------------------
 1 file changed, 22 insertions(+), 79 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 27c6fdbc9394..f806838c3cea 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4965,76 +4965,6 @@ static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
 	       MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
 }
 
-static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
-{
-	int op;
-	int gen;
-	int bin;
-	int first_bin;
-	struct lruvec *lruvec;
-	struct lru_gen_folio *lrugen;
-	struct mem_cgroup *memcg;
-	struct hlist_nulls_node *pos;
-
-	gen = get_memcg_gen(READ_ONCE(pgdat->memcg_lru.seq));
-	bin = first_bin = get_random_u32_below(MEMCG_NR_BINS);
-restart:
-	op = 0;
-	memcg = NULL;
-
-	rcu_read_lock();
-
-	hlist_nulls_for_each_entry_rcu(lrugen, pos, &pgdat->memcg_lru.fifo[gen][bin], list) {
-		if (op) {
-			lru_gen_rotate_memcg(lruvec, op);
-			op = 0;
-		}
-
-		mem_cgroup_put(memcg);
-		memcg = NULL;
-
-		if (gen != READ_ONCE(lrugen->gen))
-			continue;
-
-		lruvec = container_of(lrugen, struct lruvec, lrugen);
-		memcg = lruvec_memcg(lruvec);
-
-		if (!mem_cgroup_tryget(memcg)) {
-			lru_gen_release_memcg(memcg);
-			memcg = NULL;
-			continue;
-		}
-
-		rcu_read_unlock();
-
-		op = shrink_one(lruvec, sc);
-
-		rcu_read_lock();
-
-		if (lru_gen_should_abort_scan(lruvec, sc))
-			break;
-	}
-
-	rcu_read_unlock();
-
-	if (op)
-		lru_gen_rotate_memcg(lruvec, op);
-
-	mem_cgroup_put(memcg);
-
-	if (!is_a_nulls(pos))
-		return;
-
-	/* restart if raced with lru_gen_rotate_memcg() */
-	if (gen != get_nulls_value(pos))
-		goto restart;
-
-	/* try the rest of the bins of the current generation */
-	bin = get_memcg_bin(bin + 1);
-	if (bin != first_bin)
-		goto restart;
-}
-
 static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 {
 	struct blk_plug plug;
@@ -5064,6 +4994,7 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 	blk_finish_plug(&plug);
 }
 
+static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc);
 static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
 {
 	struct blk_plug plug;
@@ -5093,7 +5024,7 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
 	if (mem_cgroup_disabled())
 		shrink_one(&pgdat->__lruvec, sc);
 	else
-		shrink_many(pgdat, sc);
+		shrink_node_memcgs(pgdat, sc);
 
 	if (current_is_kswapd())
 		sc->nr_reclaimed += reclaimed;
@@ -5800,6 +5731,11 @@ static bool lru_gen_should_abort_scan(struct lruvec *lruvec, struct scan_control *sc)
 {
 	return false;
 }
+
+static bool lruvec_is_sizable(struct lruvec *lruvec, struct scan_control *sc)
+{
+	BUILD_BUG();
+}
 #endif /* CONFIG_LRU_GEN */
 
 static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
@@ -5813,11 +5749,6 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 	bool proportional_reclaim;
 	struct blk_plug plug;
 
-	if (lru_gen_enabled() && !root_reclaim(sc)) {
-		lru_gen_shrink_lruvec(lruvec, sc);
-		return;
-	}
-
 	get_scan_count(lruvec, sc, nr);
 
 	/* Record the original scan target for proportional adjustments later */
@@ -6127,7 +6058,8 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 	 * For kswapd, reliable forward progress is more important
 	 * than a quick return to idle. Always do full walks.
 	 */
-	if (current_is_kswapd() || sc->memcg_full_walk)
+	if ((current_is_kswapd() && lru_gen_enabled())
+	    || sc->memcg_full_walk)
 		partial = NULL;
 
 	for (level = MEMCG_LEVEL_COLD; level < max_level; level++) {
@@ -6178,7 +6110,13 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 			reclaimed = sc->nr_reclaimed;
 			scanned = sc->nr_scanned;
 
-			shrink_lruvec(lruvec, sc);
+			if (lru_gen_enabled()) {
+				if (!lruvec_is_sizable(lruvec, sc))
+					continue;
+				lru_gen_shrink_lruvec(lruvec, sc);
+			} else
+				shrink_lruvec(lruvec, sc);
+
 			if (!memcg || memcg_page_state(memcg, NR_SLAB_RECLAIMABLE_B))
 				shrink_slab(sc->gfp_mask, pgdat->node_id, memcg,
 					    sc->priority);
@@ -6196,7 +6134,12 @@ static void shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
 
 			flush_reclaim_state(sc);
 			/* If partial walks are allowed, bail once goal is reached */
-			if (partial && sc->nr_reclaimed >= sc->nr_to_reclaim) {
+			if (lru_gen_enabled() && root_reclaim(sc)) {
+				if (lru_gen_should_abort_scan(lruvec, sc)) {
+					mem_cgroup_iter_break(target_memcg, memcg);
+					break;
+				}
+			} else if (partial && sc->nr_reclaimed >= sc->nr_to_reclaim) {
 				mem_cgroup_iter_break(target_memcg, memcg);
 				break;
 			}
-- 
2.34.1

From nobody Mon Feb 9 15:11:07 2026
From: Chen Ridong
Subject: [RFC PATCH -next 6/7] mm/mglru: remove memcg disable handling
 from lru_gen_shrink_node
Date: Tue, 20 Jan 2026 13:42:55 +0000
Message-Id: <20260120134256.2271710-7-chenridong@huaweicloud.com>
In-Reply-To: <20260120134256.2271710-1-chenridong@huaweicloud.com>
References: <20260120134256.2271710-1-chenridong@huaweicloud.com>

From: Chen Ridong

Since shrink_node_memcgs already handles the memcg-disabled case, the
special-case logic in lru_gen_shrink_node is unnecessary. Remove it.

Signed-off-by: Chen Ridong
---
 mm/vmscan.c | 46 +---------------------------------------------
 1 file changed, 1 insertion(+), 45 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f806838c3cea..d4eaa8221174 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4924,47 +4924,6 @@ static bool try_to_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 	return nr_to_scan < 0;
 }
 
-static int shrink_one(struct lruvec *lruvec, struct scan_control *sc)
-{
-	bool success;
-	unsigned long scanned = sc->nr_scanned;
-	unsigned long reclaimed = sc->nr_reclaimed;
-	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
-	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
-
-	/* lru_gen_age_node() called mem_cgroup_calculate_protection() */
-	if (mem_cgroup_below_min(NULL, memcg))
-		return MEMCG_LRU_YOUNG;
-
-	if (mem_cgroup_below_low(NULL, memcg)) {
-		/* see the comment on MEMCG_NR_GENS */
-		if (READ_ONCE(lruvec->lrugen.seg) != MEMCG_LRU_TAIL)
-			return MEMCG_LRU_TAIL;
-
-		memcg_memory_event(memcg, MEMCG_LOW);
-	}
-
-	success = try_to_shrink_lruvec(lruvec, sc);
-
-	shrink_slab(sc->gfp_mask, pgdat->node_id, memcg, sc->priority);
-
-	if (!sc->proactive)
-		vmpressure(sc->gfp_mask, memcg, false, sc->nr_scanned - scanned,
-			   sc->nr_reclaimed - reclaimed);
-
-	flush_reclaim_state(sc);
-
-	if (success && mem_cgroup_online(memcg))
-		return MEMCG_LRU_YOUNG;
-
-	if (!success && lruvec_is_sizable(lruvec, sc))
-		return 0;
-
-	/* one retry if offlined or too small */
-	return READ_ONCE(lruvec->lrugen.seg) != MEMCG_LRU_TAIL ?
-	       MEMCG_LRU_TAIL : MEMCG_LRU_YOUNG;
-}
-
 static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
 {
 	struct blk_plug plug;
@@ -5021,10 +4980,7 @@ static void lru_gen_shrink_node(struct pglist_data *pgdat, struct scan_control *sc)
 	if (current_is_kswapd())
 		sc->nr_reclaimed = 0;
 
-	if (mem_cgroup_disabled())
-		shrink_one(&pgdat->__lruvec, sc);
-	else
-		shrink_node_memcgs(pgdat, sc);
+	shrink_node_memcgs(pgdat, sc);
 
 	if (current_is_kswapd())
 		sc->nr_reclaimed += reclaimed;
-- 
2.34.1

From nobody Mon Feb 9 15:11:07 2026
From: Chen Ridong
Subject: [RFC PATCH -next 7/7] mm/mglru: remove memcg lru
Date: Tue, 20 Jan 2026 13:42:56 +0000
Message-Id: <20260120134256.2271710-8-chenridong@huaweicloud.com>
In-Reply-To: <20260120134256.2271710-1-chenridong@huaweicloud.com>
References: <20260120134256.2271710-1-chenridong@huaweicloud.com>

From: Chen Ridong

Now that the previous patch has switched global reclaim to use
mem_cgroup_iter, the specialized memcg LRU infrastructure is no longer
needed. This patch removes all of the related code.

Signed-off-by: Chen Ridong
---
 Documentation/mm/multigen_lru.rst |  30 ------
 include/linux/mmzone.h            |  89 -----------------
 mm/memcontrol-v1.c                |   6 --
 mm/memcontrol.c                   |   4 -
 mm/mm_init.c                      |   1 -
 mm/vmscan.c                       | 156 +-----------------------------
 6 files changed, 2 insertions(+), 284 deletions(-)

diff --git a/Documentation/mm/multigen_lru.rst b/Documentation/mm/multigen_lru.rst
index 52ed5092022f..bf8547e2f592 100644
--- a/Documentation/mm/multigen_lru.rst
+++ b/Documentation/mm/multigen_lru.rst
@@ -220,36 +220,6 @@ time domain because a CPU can scan pages at different rates under varying
 memory pressure. It calculates a moving average for each new generation
 to avoid being permanently locked in a suboptimal state.
 
-Memcg LRU
----------
-An memcg LRU is a per-node LRU of memcgs. It is also an LRU of LRUs,
-since each node and memcg combination has an LRU of folios (see
-``mem_cgroup_lruvec()``). Its goal is to improve the scalability of
-global reclaim, which is critical to system-wide memory overcommit in
-data centers. Note that memcg LRU only applies to global reclaim.
-
-The basic structure of an memcg LRU can be understood by an analogy to
-the active/inactive LRU (of folios):
-
-1. It has the young and the old (generations), i.e., the counterparts
-   to the active and the inactive;
-2. The increment of ``max_seq`` triggers promotion, i.e., the
-   counterpart to activation;
-3. Other events trigger similar operations, e.g., offlining an memcg
-   triggers demotion, i.e., the counterpart to deactivation.
-
-In terms of global reclaim, it has two distinct features:
-
-1. Sharding, which allows each thread to start at a random memcg (in
-   the old generation) and improves parallelism;
-2. Eventual fairness, which allows direct reclaim to bail out at will
-   and reduces latency without affecting fairness over some time.
-
-In terms of traversing memcgs during global reclaim, it improves the
-best-case complexity from O(n) to O(1) and does not affect the
-worst-case complexity O(n). Therefore, on average, it has a sublinear
-complexity.
-
 Summary
 -------
 The multi-gen LRU (of folios) can be disassembled into the following
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index cf3095198db6..5bb7ed3fa238 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -509,12 +509,6 @@ struct lru_gen_folio {
 	atomic_long_t refaulted[NR_HIST_GENS][ANON_AND_FILE][MAX_NR_TIERS];
 	/* whether the multi-gen LRU is enabled */
 	bool enabled;
-	/* the memcg generation this lru_gen_folio belongs to */
-	u8 gen;
-	/* the list segment this lru_gen_folio belongs to */
-	u8 seg;
-	/* per-node lru_gen_folio list for global reclaim */
-	struct hlist_nulls_node list;
 };
 
 enum {
@@ -558,79 +552,14 @@ struct lru_gen_mm_walk {
 	bool force_scan;
 };
 
-/*
- * For each node, memcgs are divided into two generations: the old and the
- * young. For each generation, memcgs are randomly sharded into multiple bins
- * to improve scalability. For each bin, the hlist_nulls is virtually divided
- * into three segments: the head, the tail and the default.
- *
- * An onlining memcg is added to the tail of a random bin in the old generation.
- * The eviction starts at the head of a random bin in the old generation. The
- * per-node memcg generation counter, whose reminder (mod MEMCG_NR_GENS) indexes
- * the old generation, is incremented when all its bins become empty.
- *
- * There are four operations:
- * 1. MEMCG_LRU_HEAD, which moves a memcg to the head of a random bin in its
- *    current generation (old or young) and updates its "seg" to "head";
- * 2. MEMCG_LRU_TAIL, which moves a memcg to the tail of a random bin in its
- *    current generation (old or young) and updates its "seg" to "tail";
- * 3. MEMCG_LRU_OLD, which moves a memcg to the head of a random bin in the old
- *    generation, updates its "gen" to "old" and resets its "seg" to "default";
- * 4. MEMCG_LRU_YOUNG, which moves a memcg to the tail of a random bin in the
- *    young generation, updates its "gen" to "young" and resets its "seg" to
- *    "default".
- *
- * The events that trigger the above operations are:
- * 1. Exceeding the soft limit, which triggers MEMCG_LRU_HEAD;
- * 2. The first attempt to reclaim a memcg below low, which triggers
- *    MEMCG_LRU_TAIL;
- * 3. The first attempt to reclaim a memcg offlined or below reclaimable size
- *    threshold, which triggers MEMCG_LRU_TAIL;
- * 4. The second attempt to reclaim a memcg offlined or below reclaimable size
- *    threshold, which triggers MEMCG_LRU_YOUNG;
- * 5. Attempting to reclaim a memcg below min, which triggers MEMCG_LRU_YOUNG;
- * 6. Finishing the aging on the eviction path, which triggers MEMCG_LRU_YOUNG;
- * 7. Offlining a memcg, which triggers MEMCG_LRU_OLD.
- *
- * Notes:
- * 1. Memcg LRU only applies to global reclaim, and the round-robin incrementing
- *    of their max_seq counters ensures the eventual fairness to all eligible
- *    memcgs. For memcg reclaim, it still relies on mem_cgroup_iter().
- * 2. There are only two valid generations: old (seq) and young (seq+1).
- *    MEMCG_NR_GENS is set to three so that when reading the generation counter
- *    locklessly, a stale value (seq-1) does not wraparound to young.
- */
-#define MEMCG_NR_GENS	3
-#define MEMCG_NR_BINS	8
-
-struct lru_gen_memcg {
-	/* the per-node memcg generation counter */
-	unsigned long seq;
-	/* each memcg has one lru_gen_folio per node */
-	unsigned long nr_memcgs[MEMCG_NR_GENS];
-	/* per-node lru_gen_folio list for global reclaim */
-	struct hlist_nulls_head fifo[MEMCG_NR_GENS][MEMCG_NR_BINS];
-	/* protects the above */
-	spinlock_t lock;
-};
-
-void lru_gen_init_pgdat(struct pglist_data *pgdat);
 void lru_gen_init_lruvec(struct lruvec *lruvec);
 bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw);

 void lru_gen_init_memcg(struct mem_cgroup *memcg);
 void lru_gen_exit_memcg(struct mem_cgroup *memcg);
-void lru_gen_online_memcg(struct mem_cgroup *memcg);
-void lru_gen_offline_memcg(struct mem_cgroup *memcg);
-void lru_gen_release_memcg(struct mem_cgroup *memcg);
-void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid);

 #else /* !CONFIG_LRU_GEN */

-static inline void lru_gen_init_pgdat(struct pglist_data *pgdat)
-{
-}
-
 static inline void lru_gen_init_lruvec(struct lruvec *lruvec)
 {
 }
@@ -648,22 +577,6 @@ static inline void lru_gen_exit_memcg(struct mem_cgroup *memcg)
 {
 }

-static inline void lru_gen_online_memcg(struct mem_cgroup *memcg)
-{
-}
-
-static inline void lru_gen_offline_memcg(struct mem_cgroup *memcg)
-{
-}
-
-static inline void lru_gen_release_memcg(struct mem_cgroup *memcg)
-{
-}
-
-static inline void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid)
-{
-}
-
 #endif /* CONFIG_LRU_GEN */

 struct lruvec {
@@ -1503,8 +1416,6 @@ typedef struct pglist_data {
 #ifdef CONFIG_LRU_GEN
 	/* kswap mm walk data */
 	struct lru_gen_mm_walk mm_walk;
-	/* lru_gen_folio list */
-	struct lru_gen_memcg memcg_lru;
 #endif

 	CACHELINE_PADDING(_pad2_);
diff --git a/mm/memcontrol-v1.c b/mm/memcontrol-v1.c
index f0ef650d2317..3f0fd1141f37 100644
--- a/mm/memcontrol-v1.c
+++ b/mm/memcontrol-v1.c
@@ -182,12 +182,6 @@ static void memcg1_update_tree(struct mem_cgroup *memcg, int nid)
 	struct mem_cgroup_per_node *mz;
 	struct mem_cgroup_tree_per_node *mctz;

-	if (lru_gen_enabled()) {
-		if (soft_limit_excess(memcg))
-			lru_gen_soft_reclaim(memcg, nid);
-		return;
-	}
-
 	mctz = soft_limit_tree.rb_tree_per_node[nid];
 	if (!mctz)
 		return;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 675d49ad7e2c..f9aace496881 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3894,8 +3894,6 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css)
 	if (unlikely(mem_cgroup_is_root(memcg)) && !mem_cgroup_disabled())
 		queue_delayed_work(system_dfl_wq, &stats_flush_dwork,
 				   FLUSH_TIME);
-	lru_gen_online_memcg(memcg);
-
 	/* Online state pins memcg ID, memcg ID pins CSS */
 	refcount_set(&memcg->id.ref, 1);
 	css_get(css);
@@ -3935,7 +3933,6 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css)
 	reparent_deferred_split_queue(memcg);
 	reparent_shrinker_deferred(memcg);
 	wb_memcg_offline(memcg);
-	lru_gen_offline_memcg(memcg);

 	drain_all_stock(memcg);

@@ -3947,7 +3944,6 @@ static void mem_cgroup_css_released(struct cgroup_subsys_state *css)
 	struct mem_cgroup *memcg = mem_cgroup_from_css(css);

 	invalidate_reclaim_iterators(memcg);
-	lru_gen_release_memcg(memcg);
 }

 static void mem_cgroup_css_free(struct cgroup_subsys_state *css)
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 46ac915558d4..262238925c50 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1742,7 +1742,6 @@ static void __init free_area_init_node(int nid)
 	pgdat_set_deferred_range(pgdat);

 	free_area_init_core(pgdat);
-	lru_gen_init_pgdat(pgdat);
 }

 /* Any regular or high memory on that node? */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index d4eaa8221174..0946ba0af064 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2726,9 +2726,6 @@ static bool should_clear_pmd_young(void)
 #define for_each_evictable_type(type, swappiness)			\
 	for ((type) = min_type(swappiness); (type) <= max_type(swappiness); (type)++)

-#define get_memcg_gen(seq)	((seq) % MEMCG_NR_GENS)
-#define get_memcg_bin(bin)	((bin) % MEMCG_NR_BINS)
-
 static struct lruvec *get_lruvec(struct mem_cgroup *memcg, int nid)
 {
 	struct pglist_data *pgdat = NODE_DATA(nid);
@@ -4315,140 +4312,6 @@ bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw)
 	return true;
 }

-/******************************************************************************
- *                          memcg LRU
- ******************************************************************************/
-
-/* see the comment on MEMCG_NR_GENS */
-enum {
-	MEMCG_LRU_NOP,
-	MEMCG_LRU_HEAD,
-	MEMCG_LRU_TAIL,
-	MEMCG_LRU_OLD,
-	MEMCG_LRU_YOUNG,
-};
-
-static void lru_gen_rotate_memcg(struct lruvec *lruvec, int op)
-{
-	int seg;
-	int old, new;
-	unsigned long flags;
-	int bin = get_random_u32_below(MEMCG_NR_BINS);
-	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
-
-	spin_lock_irqsave(&pgdat->memcg_lru.lock, flags);
-
-	VM_WARN_ON_ONCE(hlist_nulls_unhashed(&lruvec->lrugen.list));
-
-	seg = 0;
-	new = old = lruvec->lrugen.gen;
-
-	/* see the comment on MEMCG_NR_GENS */
-	if (op == MEMCG_LRU_HEAD)
-		seg = MEMCG_LRU_HEAD;
-	else if (op == MEMCG_LRU_TAIL)
-		seg = MEMCG_LRU_TAIL;
-	else if (op == MEMCG_LRU_OLD)
-		new = get_memcg_gen(pgdat->memcg_lru.seq);
-	else if (op == MEMCG_LRU_YOUNG)
-		new = get_memcg_gen(pgdat->memcg_lru.seq + 1);
-	else
-		VM_WARN_ON_ONCE(true);
-
-	WRITE_ONCE(lruvec->lrugen.seg, seg);
-	WRITE_ONCE(lruvec->lrugen.gen, new);
-
-	hlist_nulls_del_rcu(&lruvec->lrugen.list);
-
-	if (op == MEMCG_LRU_HEAD || op == MEMCG_LRU_OLD)
-		hlist_nulls_add_head_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
-	else
-		hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[new][bin]);
-
-	pgdat->memcg_lru.nr_memcgs[old]--;
-	pgdat->memcg_lru.nr_memcgs[new]++;
-
-	if (!pgdat->memcg_lru.nr_memcgs[old] && old == get_memcg_gen(pgdat->memcg_lru.seq))
-		WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
-
-	spin_unlock_irqrestore(&pgdat->memcg_lru.lock, flags);
-}
-
-#ifdef CONFIG_MEMCG
-
-void lru_gen_online_memcg(struct mem_cgroup *memcg)
-{
-	int gen;
-	int nid;
-	int bin = get_random_u32_below(MEMCG_NR_BINS);
-
-	for_each_node(nid) {
-		struct pglist_data *pgdat = NODE_DATA(nid);
-		struct lruvec *lruvec = get_lruvec(memcg, nid);
-
-		spin_lock_irq(&pgdat->memcg_lru.lock);
-
-		VM_WARN_ON_ONCE(!hlist_nulls_unhashed(&lruvec->lrugen.list));
-
-		gen = get_memcg_gen(pgdat->memcg_lru.seq);
-
-		lruvec->lrugen.gen = gen;
-
-		hlist_nulls_add_tail_rcu(&lruvec->lrugen.list, &pgdat->memcg_lru.fifo[gen][bin]);
-		pgdat->memcg_lru.nr_memcgs[gen]++;
-
-		spin_unlock_irq(&pgdat->memcg_lru.lock);
-	}
-}
-
-void lru_gen_offline_memcg(struct mem_cgroup *memcg)
-{
-	int nid;
-
-	for_each_node(nid) {
-		struct lruvec *lruvec = get_lruvec(memcg, nid);
-
-		lru_gen_rotate_memcg(lruvec, MEMCG_LRU_OLD);
-	}
-}
-
-void lru_gen_release_memcg(struct mem_cgroup *memcg)
-{
-	int gen;
-	int nid;
-
-	for_each_node(nid) {
-		struct pglist_data *pgdat = NODE_DATA(nid);
-		struct lruvec *lruvec = get_lruvec(memcg, nid);
-
-		spin_lock_irq(&pgdat->memcg_lru.lock);
-
-		if (hlist_nulls_unhashed(&lruvec->lrugen.list))
-			goto unlock;
-
-		gen = lruvec->lrugen.gen;
-
-		hlist_nulls_del_init_rcu(&lruvec->lrugen.list);
-		pgdat->memcg_lru.nr_memcgs[gen]--;
-
-		if (!pgdat->memcg_lru.nr_memcgs[gen] && gen == get_memcg_gen(pgdat->memcg_lru.seq))
-			WRITE_ONCE(pgdat->memcg_lru.seq, pgdat->memcg_lru.seq + 1);
-unlock:
-		spin_unlock_irq(&pgdat->memcg_lru.lock);
-	}
-}
-
-void lru_gen_soft_reclaim(struct mem_cgroup *memcg, int nid)
-{
-	struct lruvec *lruvec = get_lruvec(memcg, nid);
-
-	/* see the comment on MEMCG_NR_GENS */
-	if (READ_ONCE(lruvec->lrugen.seg) != MEMCG_LRU_HEAD)
-		lru_gen_rotate_memcg(lruvec, MEMCG_LRU_HEAD);
-}
-
-#endif /* CONFIG_MEMCG */
-
 /******************************************************************************
  *                          the eviction
  ******************************************************************************/
@@ -4945,8 +4808,7 @@ static void lru_gen_shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc

 	set_mm_walk(NULL, sc->proactive);

-	if (try_to_shrink_lruvec(lruvec, sc))
-		lru_gen_rotate_memcg(lruvec, MEMCG_LRU_YOUNG);
+	try_to_shrink_lruvec(lruvec, sc);

 	clear_mm_walk();

@@ -5575,18 +5437,6 @@ static const struct file_operations lru_gen_ro_fops = {
 *                          initialization
 ******************************************************************************/

-void lru_gen_init_pgdat(struct pglist_data *pgdat)
-{
-	int i, j;
-
-	spin_lock_init(&pgdat->memcg_lru.lock);
-
-	for (i = 0; i < MEMCG_NR_GENS; i++) {
-		for (j = 0; j < MEMCG_NR_BINS; j++)
-			INIT_HLIST_NULLS_HEAD(&pgdat->memcg_lru.fifo[i][j], i);
-	}
-}
-
 void lru_gen_init_lruvec(struct lruvec *lruvec)
 {
 	int i;
@@ -5633,9 +5483,7 @@ void lru_gen_exit_memcg(struct mem_cgroup *memcg)
 		struct lru_gen_mm_state *mm_state = get_mm_state(lruvec);

 		VM_WARN_ON_ONCE(memchr_inv(lruvec->lrugen.nr_pages, 0,
-					   sizeof(lruvec->lrugen.nr_pages)));
-
-		lruvec->lrugen.list.next = LIST_POISON1;
+					   sizeof(lruvec->lrugen.nr_pages)));

 		if (!mm_state)
 			continue;
-- 
2.34.1
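
For reference, a minimal sketch (not part of the patch) of the traversal
that global reclaim converges on once the memcg LRU is gone: walk the
memcgs with mem_cgroup_iter() and shrink each lruvec, letting the heat
level decide which memcgs a given pass may touch. It is loosely modeled
on the non-lru_gen shrink_node_memcgs() loop; memcg_heat_level() and the
HEAT_* constants are illustrative stand-ins for the heat-level interface
added earlier in this series, not the exact names.

/*
 * Illustrative sketch only. memcg_heat_level() and HEAT_COLD/WARM/HOT
 * are assumed placeholders for the heat-level API from this series.
 */
enum { HEAT_COLD, HEAT_WARM, HEAT_HOT };

static void sketch_shrink_node_memcgs(pg_data_t *pgdat, struct scan_control *sc)
{
	/* shared per-node cookie keeps concurrent reclaimers fair */
	struct mem_cgroup_reclaim_cookie reclaim = { .pgdat = pgdat };
	struct mem_cgroup *memcg;
	int heat;

	/* cold memcgs first; only touch warmer ones if still short */
	for (heat = HEAT_COLD; heat <= HEAT_HOT; heat++) {
		memcg = mem_cgroup_iter(NULL, NULL, &reclaim);
		do {
			struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);

			/* defer memcgs hotter than this pass allows */
			if (memcg_heat_level(memcg, pgdat->node_id) > heat)
				continue;

			shrink_lruvec(lruvec, sc);

			if (sc->nr_reclaimed >= sc->nr_to_reclaim) {
				/* drop the css reference held by the iterator */
				mem_cgroup_iter_break(NULL, memcg);
				return;
			}
			cond_resched();
		} while ((memcg = mem_cgroup_iter(NULL, memcg, &reclaim)));
	}
}

The mem_cgroup_reclaim_cookie already gives concurrent reclaimers a
shared per-node position, so the fairness the memcg LRU used to provide
falls out of mem_cgroup_iter() itself; the cold/warm/hot passes supply
the ordering that the generations and bins used to.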