From: Chen Ridong <chenridong@huawei.com>

The memcg LRU was introduced to improve scalability in global reclaim,
but its implementation has grown complex and can cause performance
regressions when creating many memory cgroups [1].

This series switches shrink_many() to mem_cgroup_iter() with a reclaim
cookie for global reclaim, following the pattern already used in
shrink_node_memcgs(), an approach suggested by Johannes [1]. The new
design maintains fairness across cgroups by preserving iteration state
between reclaim passes.
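
To make the intended walk concrete, here is a rough sketch (illustrative
only, not the patch itself; shrink_one() and the exact signatures here are
placeholders, while mem_cgroup_iter(), mem_cgroup_iter_break() and
struct mem_cgroup_reclaim_cookie are the existing iteration API that
shrink_node_memcgs() already uses):

/* Sketch: global reclaim walks memcgs with a per-node reclaim cookie. */
static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
{
        struct mem_cgroup_reclaim_cookie reclaim = {
                .pgdat = pgdat,
        };
        struct mem_cgroup *memcg;

        /* Resume from where the previous reclaim pass stopped. */
        memcg = mem_cgroup_iter(NULL, NULL, &reclaim);
        do {
                struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);

                /* Reclaim this memcg's lruvec on this node. */
                shrink_one(lruvec, sc);

                if (sc->nr_reclaimed >= sc->nr_to_reclaim) {
                        /*
                         * Stop early; the saved position lets the next
                         * pass continue after the last memcg visited
                         * instead of restarting from the root.
                         */
                        mem_cgroup_iter_break(NULL, memcg);
                        break;
                }
        } while ((memcg = mem_cgroup_iter(NULL, memcg, &reclaim)));
}
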
Testing was performed using the original stress test from Yu Zhao [2] on a
1 TB, 4-node NUMA system. The results show:

pgsteal:
                                   memcg LRU    memcg iter
  stddev(pgsteal) / mean(pgsteal)  106.03%      93.20%
  sum(pgsteal) / sum(requested)    98.10%       99.28%

workingset_refault_anon:
                                   memcg LRU    memcg iter
  stddev(refault) / mean(refault)  193.97%      134.67%
  sum(refault)                     1,963,229    2,027,567

The new implementation shows clear fairness improvements, reducing the
standard deviation relative to the mean by 12.8 percentage points for
pgsteal and bringing the pgsteal ratio closer to 100%. Refault counts
increased by 3.2% (from 1,963,229 to 2,027,567).

To simplify review:
1. Patch 1 uses mem_cgroup_iter with reclaim cookie in shrink_many()
2. Patch 2 removes the now-unused memcg LRU code
3. Patches 3–5 combine shrink_many and shrink_node_memcgs
   (This reorganization is clearer after switching to mem_cgroup_iter)

---

Changes from RFC series:
1. Updated the test result data.
2. Added patches 3–5 to combine shrink_many and shrink_node_memcgs.

RFC: https://lore.kernel.org/all/20251204123124.1822965-1-chenridong@huaweicloud.com/

Chen Ridong (5):
  mm/mglru: use mem_cgroup_iter for global reclaim
  mm/mglru: remove memcg lru
  mm/mglru: extend shrink_one for both lrugen and non-lrugen
  mm/mglru: combine shrink_many into shrink_node_memcgs
  mm/mglru: factor lrugen state out of shrink_lruvec

 Documentation/mm/multigen_lru.rst |  30 ---
 include/linux/mmzone.h            |  89 --------
 mm/memcontrol-v1.c                |   6 -
 mm/memcontrol.c                   |   4 -
 mm/mm_init.c                      |   1 -
 mm/vmscan.c                       | 332 ++++--------------------------
 6 files changed, 44 insertions(+), 418 deletions(-)

--
2.34.1

Hi.

On Tue, Dec 09, 2025 at 01:25:52AM +0000, Chen Ridong <chenridong@huaweicloud.com> wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> The memcg LRU was introduced to improve scalability in global reclaim,
> but its implementation has grown complex and can cause performance
> regressions when creating many memory cgroups [1].
>
> This series implements mem_cgroup_iter with a reclaim cookie in
> shrink_many() for global reclaim, following the pattern already used in
> shrink_node_memcgs(), an approach suggested by Johannes [1]. The new
> design maintains good fairness across cgroups by preserving iteration
> state between reclaim passes.
>
> Testing was performed using the original stress test from Yu Zhao [2] on a
> 1 TB, 4-node NUMA system. The results show:

(I think the cover letter somehow lost the targets of [1],[2]. I assume
I could retrieve those from patch 1/5.)

>
> pgsteal:
>                                    memcg LRU    memcg iter
>   stddev(pgsteal) / mean(pgsteal)  106.03%      93.20%
>   sum(pgsteal) / sum(requested)    98.10%       99.28%
>
> workingset_refault_anon:
>                                    memcg LRU    memcg iter
>   stddev(refault) / mean(refault)  193.97%      134.67%
>   sum(refault)                     1,963,229    2,027,567
>
> The new implementation shows clear fairness improvements, reducing the
> standard deviation relative to the mean by 12.8 percentage points for
> pgsteal and bringing the pgsteal ratio closer to 100%. Refault counts
> increased by 3.2% (from 1,963,229 to 2,027,567).

Just as a quick clarification -- this isn't supposed to affect regular
(CONFIG_LRU_GEN_ENABLED=n) reclaim, correct?

Thanks,
Michal

On 2025/12/16 0:18, Michal Koutný wrote:
> Hi.
>
> On Tue, Dec 09, 2025 at 01:25:52AM +0000, Chen Ridong <chenridong@huaweicloud.com> wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> The memcg LRU was introduced to improve scalability in global reclaim,
>> but its implementation has grown complex and can cause performance
>> regressions when creating many memory cgroups [1].
>>
>> This series implements mem_cgroup_iter with a reclaim cookie in
>> shrink_many() for global reclaim, following the pattern already used in
>> shrink_node_memcgs(), an approach suggested by Johannes [1]. The new
>> design maintains good fairness across cgroups by preserving iteration
>> state between reclaim passes.
>>
>> Testing was performed using the original stress test from Yu Zhao [2] on a
>> 1 TB, 4-node NUMA system. The results show:
>
> (I think the cover letter somehow lost the targets of [1],[2]. I assume
> I could retrieve those from patch 1/5.)
>

Hi Michal,

Thanks for the reminder -- I appreciate you pointing that out. Apologies
for missing the links in the cover letter. You can find them in patch 1/5.

>
>>
>> pgsteal:
>>                                    memcg LRU    memcg iter
>>   stddev(pgsteal) / mean(pgsteal)  106.03%      93.20%
>>   sum(pgsteal) / sum(requested)    98.10%       99.28%
>>
>> workingset_refault_anon:
>>                                    memcg LRU    memcg iter
>>   stddev(refault) / mean(refault)  193.97%      134.67%
>>   sum(refault)                     1,963,229    2,027,567
>>
>> The new implementation shows clear fairness improvements, reducing the
>> standard deviation relative to the mean by 12.8 percentage points for
>> pgsteal and bringing the pgsteal ratio closer to 100%. Refault counts
>> increased by 3.2% (from 1,963,229 to 2,027,567).
>
> Just as a quick clarification -- this isn't supposed to affect regular
> (CONFIG_LRU_GEN_ENABLED=n) reclaim, correct?
>
> Thanks,
> Michal

That's correct. To be precise, it only affects root reclaim when
lru_gen_enabled() returns true. Note that the generation LRU can still be
enabled via /sys/kernel/mm/lru_gen/enabled even when
CONFIG_LRU_GEN_ENABLED=n.

--
Best regards,
Ridong
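
To picture the gating Ridong describes above (an illustrative sketch only,
assuming the mm/vmscan.c context; the helper name below is hypothetical and
not part of the series):

/*
 * Sketch: the reworked memcg walk is only reached for global (root)
 * reclaim with the multi-gen LRU active. CONFIG_LRU_GEN_ENABLED only
 * sets the boot-time default; /sys/kernel/mm/lru_gen/enabled can
 * toggle the feature at runtime either way.
 */
static bool uses_memcg_iter_walk(struct scan_control *sc)
{
        bool is_root_reclaim = !sc->target_mem_cgroup ||
                               mem_cgroup_is_root(sc->target_mem_cgroup);

        return lru_gen_enabled() && is_root_reclaim;
}
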

On 2025/12/9 9:25, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> The memcg LRU was introduced to improve scalability in global reclaim,
> but its implementation has grown complex and can cause performance
> regressions when creating many memory cgroups [1].
>
> This series implements mem_cgroup_iter with a reclaim cookie in
> shrink_many() for global reclaim, following the pattern already used in
> shrink_node_memcgs(), an approach suggested by Johannes [1]. The new
> design maintains good fairness across cgroups by preserving iteration
> state between reclaim passes.
>
> Testing was performed using the original stress test from Yu Zhao [2] on a
> 1 TB, 4-node NUMA system. The results show:
>
> pgsteal:
>                                    memcg LRU    memcg iter
>   stddev(pgsteal) / mean(pgsteal)  106.03%      93.20%
>   sum(pgsteal) / sum(requested)    98.10%       99.28%
>
> workingset_refault_anon:
>                                    memcg LRU    memcg iter
>   stddev(refault) / mean(refault)  193.97%      134.67%
>   sum(refault)                     1,963,229    2,027,567
>
> The new implementation shows clear fairness improvements, reducing the
> standard deviation relative to the mean by 12.8 percentage points for
> pgsteal and bringing the pgsteal ratio closer to 100%. Refault counts
> increased by 3.2% (from 1,963,229 to 2,027,567).
>
> To simplify review:
> 1. Patch 1 uses mem_cgroup_iter with reclaim cookie in shrink_many()
> 2. Patch 2 removes the now-unused memcg LRU code
> 3. Patches 3–5 combine shrink_many and shrink_node_memcgs
>    (This reorganization is clearer after switching to mem_cgroup_iter)
>
> ---
>
> Changes from RFC series:
> 1. Updated the test result data.
> 2. Added patches 3–5 to combine shrink_many and shrink_node_memcgs.
>
> RFC: https://lore.kernel.org/all/20251204123124.1822965-1-chenridong@huaweicloud.com/
>
> Chen Ridong (5):
>   mm/mglru: use mem_cgroup_iter for global reclaim
>   mm/mglru: remove memcg lru
>   mm/mglru: extend shrink_one for both lrugen and non-lrugen
>   mm/mglru: combine shrink_many into shrink_node_memcgs
>   mm/mglru: factor lrugen state out of shrink_lruvec
>
>  Documentation/mm/multigen_lru.rst |  30 ---
>  include/linux/mmzone.h            |  89 --------
>  mm/memcontrol-v1.c                |   6 -
>  mm/memcontrol.c                   |   4 -
>  mm/mm_init.c                      |   1 -
>  mm/vmscan.c                       | 332 ++++--------------------------
>  6 files changed, 44 insertions(+), 418 deletions(-)
>

Hello all,

There's a warning from the kernel test robot, and I would like to update
the series to fix it along with any feedback from your reviews. I'd
appreciate it if you could take a look at this patch series when
convenient.

Hi Shakeel,

I would be very grateful if you could review patches 3-5. They combine
shrink_many and shrink_node_memcgs as you suggested -- does that look good
to you?

--
Best regards,
Ridong