From: Chen Ridong <chenridong@huawei.com>

The memcg LRU was introduced to improve scalability in global reclaim,
but its implementation has grown complex and can cause performance
regressions when creating many memory cgroups [1].

This series switches shrink_many() to mem_cgroup_iter() with a reclaim
cookie for global reclaim, following the pattern already used in
shrink_node_memcgs(), an approach suggested by Johannes [1]. The new
design maintains fairness across cgroups by preserving iteration state
between reclaim passes.
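
To make the intended walk concrete, here is a rough sketch (illustrative
only, not the patch itself; shrink_one() and the exact signatures here are
placeholders, while mem_cgroup_iter(), mem_cgroup_iter_break() and
struct mem_cgroup_reclaim_cookie are the existing iteration API that
shrink_node_memcgs() already uses):

/* Sketch: global reclaim walks memcgs with a per-node reclaim cookie. */
static void shrink_many(struct pglist_data *pgdat, struct scan_control *sc)
{
        struct mem_cgroup_reclaim_cookie reclaim = {
                .pgdat = pgdat,
        };
        struct mem_cgroup *memcg;

        /* Resume from where the previous reclaim pass stopped. */
        memcg = mem_cgroup_iter(NULL, NULL, &reclaim);
        do {
                struct lruvec *lruvec = mem_cgroup_lruvec(memcg, pgdat);

                /* Reclaim this memcg's lruvec on this node. */
                shrink_one(lruvec, sc);

                if (sc->nr_reclaimed >= sc->nr_to_reclaim) {
                        /*
                         * Stop early; the saved position lets the next
                         * pass continue after the last memcg visited
                         * instead of restarting from the root.
                         */
                        mem_cgroup_iter_break(NULL, memcg);
                        break;
                }
        } while ((memcg = mem_cgroup_iter(NULL, memcg, &reclaim)));
}
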
Testing was performed using the original stress test from Yu Zhao [2] on a
1 TB, 4-node NUMA system. The results show:

pgsteal:
                                   memcg LRU    memcg iter
  stddev(pgsteal) / mean(pgsteal)  106.03%      93.20%
  sum(pgsteal) / sum(requested)    98.10%       99.28%

workingset_refault_anon:
                                   memcg LRU    memcg iter
  stddev(refault) / mean(refault)  193.97%      134.67%
  sum(refault)                     1,963,229    2,027,567

The new implementation shows clear fairness improvements, reducing the
standard deviation relative to the mean by 12.8 percentage points for
pgsteal and bringing the pgsteal ratio closer to 100%. Refault counts
increased by 3.2% (from 1,963,229 to 2,027,567).

To simplify review:
1. Patch 1 uses mem_cgroup_iter with reclaim cookie in shrink_many()
2. Patch 2 removes the now-unused memcg LRU code
3. Patches 3–5 combine shrink_many and shrink_node_memcgs
   (This reorganization is clearer after switching to mem_cgroup_iter)

---

Changes from RFC series:
1. Updated the test result data.
2. Added patches 3–5 to combine shrink_many and shrink_node_memcgs.

RFC: https://lore.kernel.org/all/20251204123124.1822965-1-chenridong@huaweicloud.com/

Chen Ridong (5):
  mm/mglru: use mem_cgroup_iter for global reclaim
  mm/mglru: remove memcg lru
  mm/mglru: extend shrink_one for both lrugen and non-lrugen
  mm/mglru: combine shrink_many into shrink_node_memcgs
  mm/mglru: factor lrugen state out of shrink_lruvec

 Documentation/mm/multigen_lru.rst |  30 ---
 include/linux/mmzone.h            |  89 --------
 mm/memcontrol-v1.c                |   6 -
 mm/memcontrol.c                   |   4 -
 mm/mm_init.c                      |   1 -
 mm/vmscan.c                       | 332 ++++--------------------------
 6 files changed, 44 insertions(+), 418 deletions(-)

--
2.34.1

Hi.

On Tue, Dec 09, 2025 at 01:25:52AM +0000, Chen Ridong <chenridong@huaweicloud.com> wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> The memcg LRU was introduced to improve scalability in global reclaim,
> but its implementation has grown complex and can cause performance
> regressions when creating many memory cgroups [1].
>
> This series implements mem_cgroup_iter with a reclaim cookie in
> shrink_many() for global reclaim, following the pattern already used in
> shrink_node_memcgs(), an approach suggested by Johannes [1]. The new
> design maintains good fairness across cgroups by preserving iteration
> state between reclaim passes.
>
> Testing was performed using the original stress test from Yu Zhao [2] on a
> 1 TB, 4-node NUMA system. The results show:

(I think the cover letter somehow lost the targets of [1],[2]. I assume
I could retrieve those from patch 1/5.)

>
> pgsteal:
>                                    memcg LRU    memcg iter
>   stddev(pgsteal) / mean(pgsteal)  106.03%      93.20%
>   sum(pgsteal) / sum(requested)    98.10%       99.28%
>
> workingset_refault_anon:
>                                    memcg LRU    memcg iter
>   stddev(refault) / mean(refault)  193.97%      134.67%
>   sum(refault)                     1,963,229    2,027,567
>
> The new implementation shows clear fairness improvements, reducing the
> standard deviation relative to the mean by 12.8 percentage points for
> pgsteal and bringing the pgsteal ratio closer to 100%. Refault counts
> increased by 3.2% (from 1,963,229 to 2,027,567).

Just as a quick clarification -- this isn't supposed to affect regular
(CONFIG_LRU_GEN_ENABLED=n) reclaim, correct?

Thanks,
Michal

On 2025/12/16 0:18, Michal Koutný wrote:
> Hi.
>
> On Tue, Dec 09, 2025 at 01:25:52AM +0000, Chen Ridong <chenridong@huaweicloud.com> wrote:
>> From: Chen Ridong <chenridong@huawei.com>
>>
>> The memcg LRU was introduced to improve scalability in global reclaim,
>> but its implementation has grown complex and can cause performance
>> regressions when creating many memory cgroups [1].
>>
>> This series implements mem_cgroup_iter with a reclaim cookie in
>> shrink_many() for global reclaim, following the pattern already used in
>> shrink_node_memcgs(), an approach suggested by Johannes [1]. The new
>> design maintains good fairness across cgroups by preserving iteration
>> state between reclaim passes.
>>
>> Testing was performed using the original stress test from Yu Zhao [2] on a
>> 1 TB, 4-node NUMA system. The results show:
>
> (I think the cover letter somehow lost the targets of [1],[2]. I assume
> I could retrieve those from patch 1/5.)
>

Hi Michal,

Thanks for the reminder -- I appreciate you pointing that out. Apologies
for missing the links in the cover letter. You can find them in patch 1/5.

>
>>
>> pgsteal:
>>                                    memcg LRU    memcg iter
>>   stddev(pgsteal) / mean(pgsteal)  106.03%      93.20%
>>   sum(pgsteal) / sum(requested)    98.10%       99.28%
>>
>> workingset_refault_anon:
>>                                    memcg LRU    memcg iter
>>   stddev(refault) / mean(refault)  193.97%      134.67%
>>   sum(refault)                     1,963,229    2,027,567
>>
>> The new implementation shows clear fairness improvements, reducing the
>> standard deviation relative to the mean by 12.8 percentage points for
>> pgsteal and bringing the pgsteal ratio closer to 100%. Refault counts
>> increased by 3.2% (from 1,963,229 to 2,027,567).
>
> Just as a quick clarification -- this isn't supposed to affect regular
> (CONFIG_LRU_GEN_ENABLED=n) reclaim, correct?
>
> Thanks,
> Michal

That's correct. To be precise, it only affects root reclaim when
lru_gen_enabled() returns true. Note that the generation LRU can still be
enabled via /sys/kernel/mm/lru_gen/enabled even when
CONFIG_LRU_GEN_ENABLED=n.

--
Best regards,
Ridong
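
To picture the gating Ridong describes above (an illustrative sketch only,
assuming the mm/vmscan.c context; the helper name below is hypothetical and
not part of the series):

/*
 * Sketch: the reworked memcg walk is only reached for global (root)
 * reclaim with the multi-gen LRU active. CONFIG_LRU_GEN_ENABLED only
 * sets the boot-time default; /sys/kernel/mm/lru_gen/enabled can
 * toggle the feature at runtime either way.
 */
static bool uses_memcg_iter_walk(struct scan_control *sc)
{
        bool is_root_reclaim = !sc->target_mem_cgroup ||
                               mem_cgroup_is_root(sc->target_mem_cgroup);

        return lru_gen_enabled() && is_root_reclaim;
}
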

On 2025/12/9 9:25, Chen Ridong wrote:
> From: Chen Ridong <chenridong@huawei.com>
>
> The memcg LRU was introduced to improve scalability in global reclaim,
> but its implementation has grown complex and can cause performance
> regressions when creating many memory cgroups [1].
>
> This series implements mem_cgroup_iter with a reclaim cookie in
> shrink_many() for global reclaim, following the pattern already used in
> shrink_node_memcgs(), an approach suggested by Johannes [1]. The new
> design maintains good fairness across cgroups by preserving iteration
> state between reclaim passes.
>
> Testing was performed using the original stress test from Yu Zhao [2] on a
> 1 TB, 4-node NUMA system. The results show:
>
> pgsteal:
>                                    memcg LRU    memcg iter
>   stddev(pgsteal) / mean(pgsteal)  106.03%      93.20%
>   sum(pgsteal) / sum(requested)    98.10%       99.28%
>
> workingset_refault_anon:
>                                    memcg LRU    memcg iter
>   stddev(refault) / mean(refault)  193.97%      134.67%
>   sum(refault)                     1,963,229    2,027,567
>
> The new implementation shows clear fairness improvements, reducing the
> standard deviation relative to the mean by 12.8 percentage points for
> pgsteal and bringing the pgsteal ratio closer to 100%. Refault counts
> increased by 3.2% (from 1,963,229 to 2,027,567).
>
> To simplify review:
> 1. Patch 1 uses mem_cgroup_iter with reclaim cookie in shrink_many()
> 2. Patch 2 removes the now-unused memcg LRU code
> 3. Patches 3–5 combine shrink_many and shrink_node_memcgs
>    (This reorganization is clearer after switching to mem_cgroup_iter)
>
> ---
>
> Changes from RFC series:
> 1. Updated the test result data.
> 2. Added patches 3–5 to combine shrink_many and shrink_node_memcgs.
>
> RFC: https://lore.kernel.org/all/20251204123124.1822965-1-chenridong@huaweicloud.com/
>
> Chen Ridong (5):
>   mm/mglru: use mem_cgroup_iter for global reclaim
>   mm/mglru: remove memcg lru
>   mm/mglru: extend shrink_one for both lrugen and non-lrugen
>   mm/mglru: combine shrink_many into shrink_node_memcgs
>   mm/mglru: factor lrugen state out of shrink_lruvec
>
>  Documentation/mm/multigen_lru.rst |  30 ---
>  include/linux/mmzone.h            |  89 --------
>  mm/memcontrol-v1.c                |   6 -
>  mm/memcontrol.c                   |   4 -
>  mm/mm_init.c                      |   1 -
>  mm/vmscan.c                       | 332 ++++--------------------------
>  6 files changed, 44 insertions(+), 418 deletions(-)
>

Hello all,

There's a warning from the kernel test robot, and I would like to update
the series to fix it along with any feedback from your reviews. I'd
appreciate it if you could take a look at this patch series when
convenient.

Hi Shakeel,

I would be very grateful if you could review patches 3-5. They combine
shrink_many and shrink_node_memcgs as you suggested -- does that look good
to you?

--
Best regards,
Ridong