[v1] RE: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs

Posted by wangzicheng 2 months, 1 week ago

Hi Barry,

> Hi Liam,
> 
> I saw you mentioned me, so I just wanted to join in :-)
> 
> On Sat, Nov 29, 2025 at 12:16 AM Liam R. Howlett <Liam.Howlett@oracle.com>
> wrote:
> >
> > * Matthew Wilcox <willy@infradead.org> [251128 10:16]:
> > > On Fri, Nov 28, 2025 at 10:53:12AM +0800, Zicheng Wang wrote:
> > > > Case study:
> > > > A widely observed issue on Android is that after application
> > > > launch,
> >
> > What do you mean by application launch?  What does this mean in the
> > kernel context?
> 
> I think there are two cases. First, a cold start: a new process is forked to
> launch the app. Second, when the app switches from background to
> foreground, for example when we bring it back to the screen after it has
> been running in the background.
> 
> In the first case, you reboot your phone and tap the YouTube icon to start
> the app (cold launch). In the second case, you are watching a video in
> YouTube, then switch to Facebook, and later tap the YouTube icon again to
> bring it from background to foreground.
> 
Thanks for the explanation; that's exactly what I meant.

The Android lifecycle model isn't obvious outside the Android context. I’ll make that
clearer in the next version.
> >
> > > > the oldest anon generation often becomes empty, and file pages are
> > > > over-reclaimed.
> > >
> > > You should fix the bug, not move the debug interface to procfs.  NACK.
> >
> > Barry recently sent an RFC [1] to affect LRU in the exit path for
> > Android.  This was proven incorrect by Johannes, iirc, in another
> > thread I cannot find (destroys performance of calling the same command).
> 
> My understanding is that affecting the LRU in the exit path is not generally
> correct, but it still highlights a requirement: Linux LRU needs a way to
> understand app-cycling behavior in an Android-like system.
> 
> >
> > These ideas seem both related as it points to a suboptimal LRU in the
> > Android ecosystem, at least.  It seems to stem from Android's life
> > (cycle) choices :)
> >
> > I strongly agree with Willy.  We don't want another userspace daemon
> > and/or interface, but this time to play with the LRU to avoid trying
> > to define and fix the problem.
> >
> > Do you know if this affects others or why it is android specific?
> 
> The behavior Zicheng probably wants is a proactive memory reclamation
> interface. For example, since each app may be in a different memcg, if an
> app has been in the background for a long time, he wants to reclaim its
> memory proactively rather than waiting until kswapd hits the watermarks.
> 
> This may help a newly launched app obtain memory more quickly, avoiding
> delays from reclamation, since a new app typically requires a substantial
> amount of memory.
> 
> Zicheng, please let me know if I’m misunderstanding anything.

Yes, but that's not all.

1. Proactive memory reclaim: yes, that's what we are after.
When an app is swiped away, kept in the background, and not used for a while,
proactively reclaiming its memcg can help new foreground apps get memory 
faster (instead of paying the cost of direct reclaim).

2. Anon vs. file: we want to *bias reclaim toward anonymous* pages for background apps.
With MGLRU, however, the oldest generations often contain almost no anon pages,
so simply tuning swappiness cannot achieve that: reclaim will still clear file cache
in the old generations first.
In such scenarios, file caches are effectively `over-reclaimed`, which becomes a disaster
when user-interaction threads get stuck in direct reclaim of anon pages.

See the case in the cover letter (each generation row lists the generation number,
age in ms, anon pages, and file pages):
```
memcg    54 /apps/some_app
node     0
1     119804          0       85461
2     119804          0           5
3     119804     181719       18667
4       1752        392         244
```
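As a rough illustration of how the rows above can be read, here is a minimal C sketch. It assumes each generation row follows the debugfs `lru_gen` layout (generation number, age in ms, anon pages, file pages, oldest generation first); `struct gen_row`, `parse_gen_row()` and `count_anon_empty_oldest()` are hypothetical helpers, not kernel or library APIs.

```c
#include <assert.h>
#include <stdio.h>

/* One generation row of the lru_gen output:
 * gen_nr age_in_ms nr_anon_pages nr_file_pages.
 * Hypothetical helper type for this sketch, not a kernel structure. */
struct gen_row {
	unsigned long gen_nr;
	unsigned long age_ms;
	unsigned long nr_anon;
	unsigned long nr_file;
};

/* Parse one generation row; returns 1 on success and 0 for any other
 * line (e.g. the "memcg ..." and "node ..." header lines). */
static int parse_gen_row(const char *line, struct gen_row *row)
{
	assert(line != NULL && row != NULL);
	return sscanf(line, "%lu %lu %lu %lu", &row->gen_nr, &row->age_ms,
		      &row->nr_anon, &row->nr_file) == 4;
}

/* Given rows ordered oldest first, count how many of the oldest
 * generations hold no anon pages before the first one that does. */
static int count_anon_empty_oldest(const struct gen_row *rows, int n)
{
	int i;

	for (i = 0; i < n; i++)
		if (rows[i].nr_anon != 0)
			break;
	return i;
}
```

On the four rows above it reports two anon-empty oldest generations, while generation 1 alone still holds 85461 file pages, which is the imbalance described in point 2.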

> 
> >
> > [1].
> > https://lore.kernel.org/all/20250514070820.51793-1-21cnbao@gmail.com/
> >
> 
> Thanks
> Barry

Since a semantic gap between user space and kernel space will always exist,
it would be very beneficial to leave some APIs for user hints, just like
madvise/userfaultfd/para-virtualization.
Exposing such hints to the kernel can help improve overall system performance.

Best,
Zicheng

Re: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs

Posted by Barry Song 2 months, 1 week ago

On Mon, Dec 1, 2025 at 2:50 PM wangzicheng <wangzicheng@honor.com> wrote:
>
> Hi Barry,
>
> > Hi Liam,
> >
> > I saw you mentioned me, so I just wanted to join in :-)
> >
> > On Sat, Nov 29, 2025 at 12:16 AM Liam R. Howlett <Liam.Howlett@oracle.com>
> > wrote:
> > >
> > > * Matthew Wilcox <willy@infradead.org> [251128 10:16]:
> > > > On Fri, Nov 28, 2025 at 10:53:12AM +0800, Zicheng Wang wrote:
> > > > > Case study:
> > > > > A widely observed issue on Android is that after application
> > > > > launch,
> > >
> > > What do you mean by application launch?  What does this mean in the
> > > kernel context?
> >
> > I think there are two cases. First, a cold start: a new process is forked to
> > launch the app. Second, when the app switches from background to
> > foreground, for example when we bring it back to the screen after it has
> > been running in the background.
> >
> > In the first case, you reboot your phone and tap the YouTube icon to start
> > the app (cold launch). In the second case, you are watching a video in
> > YouTube, then switch to Facebook, and later tap the YouTube icon again to
> > bring it from background to foreground.
> >
> Thanks for the explanation; that's exactly what I meant.
>
> The Android lifecycle model isn't obvious outside the Android context. I’ll make that
> clearer in the next version.
> > >
> > > > > the oldest anon generation often becomes empty, and file pages are
> > > > > over-reclaimed.
> > > >
> > > > You should fix the bug, not move the debug interface to procfs.  NACK.
> > >
> > > Barry recently sent an RFC [1] to affect LRU in the exit path for
> > > Android.  This was proven incorrect by Johannes, iirc, in another
> > > thread I cannot find (destroys performance of calling the same command).
> >
> > My understanding is that affecting the LRU in the exit path is not generally
> > correct, but it still highlights a requirement: Linux LRU needs a way to
> > understand app-cycling behavior in an Android-like system.
> >
> > >
> > > These ideas seem both related as it points to a suboptimal LRU in the
> > > Android ecosystem, at least.  It seems to stem from Android's life
> > > (cycle) choices :)
> > >
> > > I strongly agree with Willy.  We don't want another userspace daemon
> > > and/or interface, but this time to play with the LRU to avoid trying
> > > to define and fix the problem.
> > >
> > > Do you know if this affects others or why it is android specific?
> >
> > The behavior Zicheng probably wants is a proactive memory reclamation
> > interface. For example, since each app may be in a different memcg, if an
> > app has been in the background for a long time, he wants to reclaim its
> > memory proactively rather than waiting until kswapd hits the watermarks.
> >
> > This may help a newly launched app obtain memory more quickly, avoiding
> > delays from reclamation, since a new app typically requires a substantial
> > amount of memory.
> >
> > Zicheng, please let me know if I’m misunderstanding anything.
>
> Yes, but that's not all.
>
> 1. Proactive memory reclaim: yes, that's what we are after.
> When an app is swiped away, kept in the background, and not used for a while,
> proactively reclaiming its memcg can help new foreground apps get memory
> faster (instead of paying the cost of direct reclaim).
>
> 2. Anon vs. file: we want to *bias reclaim toward anonymous* pages for background apps.
> With MGLRU, however, the oldest generations often contain almost no anon pages,
> so simply tuning swappiness cannot achieve that: reclaim will still clear file cache
> in the old generations first.
> In such scenarios, file caches are effectively `over-reclaimed`, which becomes a disaster
> when user-interaction threads get stuck in direct reclaim of anon pages.

I strongly recommend separating this from your patchset. Avoid including
unrelated changes in a single patchset.

MGLRU has a mechanism to ensure that file and anon pages can keep pace
with each other. In the newest kernel, the minimum number of generations
(MIN_NR_GENS) is 2. For example, if anon has only 2 generations left and
we decide to reclaim anon folios, we will fall back to reclaiming file
pages. Sometimes, this means that anon reclamation is insufficient while
file pages are over-reclaimed.

static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
                       struct scan_control *sc, int type, int tier,
                       struct list_head *list)
{
        ...
        if (get_nr_gens(lruvec, type) == MIN_NR_GENS)
                return 0;
        ...
}

This is probably not a bug, but this design can sometimes work
suboptimally.
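The fallback described above can be modelled in a few lines of C. This is a deliberately simplified user-space sketch, not kernel code: `struct lru_model`, `can_scan()` and `pick_type()` are hypothetical, and the model only captures the rule that a type down to `MIN_NR_GENS` generations yields nothing to scan, so reclaim moves to the other type.

```c
#include <assert.h>
#include <stdbool.h>

/* Value cited above for recent kernels; a model constant here. */
#define MIN_NR_GENS 2

enum lru_type { TYPE_ANON, TYPE_FILE, NR_TYPES };

/* Hypothetical model: how many generations each type currently holds. */
struct lru_model {
	int nr_gens[NR_TYPES];
};

/* Mirrors the early "return 0" in scan_folios(): a type that is down to
 * MIN_NR_GENS generations yields nothing. Assuming the count never drops
 * below MIN_NR_GENS, "> MIN_NR_GENS" matches the "== MIN_NR_GENS" test. */
static bool can_scan(const struct lru_model *m, enum lru_type type)
{
	return m->nr_gens[type] > MIN_NR_GENS;
}

/* Try the preferred type first and fall back to the other type when the
 * preferred one cannot be scanned. Returns NR_TYPES if neither can. */
static enum lru_type pick_type(const struct lru_model *m,
			       enum lru_type preferred)
{
	enum lru_type other;

	assert(preferred == TYPE_ANON || preferred == TYPE_FILE);
	other = preferred == TYPE_ANON ? TYPE_FILE : TYPE_ANON;
	if (can_scan(m, preferred))
		return preferred;
	if (can_scan(m, other))
		return other;
	return NR_TYPES;
}
```

With anon at 2 generations and file at 4, asking for anon returns file, which is how file pages can end up over-reclaimed while anon stays untouched.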

Regarding this issue, both Kairui (from the Linux server side, cc-ed) and I
(from the Android side) have observed it. This should be addressed in
MGLRU's code, and we already have kernel code for that. It is unrelated
to your patchset, so you shouldn’t include so many unrelated changes in
a single patchset.

Please keep your patchset focused solely on whether the MGLRU proactive
reclamation interface should be promoted to sysfs (LRU_GEN already has a
folder in sysfs) instead of debugfs, if there is a v2.

The following is quoted from
`Documentation/admin-guide/mm/multigen_lru.rst`.

Proactive reclaim
-----------------
Proactive reclaim induces page reclaim when there is no memory
pressure. It usually targets cold pages only. E.g., when a new job
comes in, the job scheduler wants to proactively reclaim cold pages on
the server it selected, to improve the chance of successfully landing
this new job.

Users can write the following command to ``lru_gen`` to evict
generations less than or equal to ``min_gen_nr``.

    ``- memcg_id node_id min_gen_nr [swappiness [nr_to_reclaim]]``
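For illustration, a C sketch that builds this command string. The helper name and the conventions for omitting the optional fields are assumptions of this sketch; only the field order comes from the documentation quoted above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: build the documented eviction command
 *   - memcg_id node_id min_gen_nr [swappiness [nr_to_reclaim]]
 * A negative swappiness omits both optional fields; nr_to_reclaim == 0
 * omits only the last one (a convention of this sketch, not the kernel).
 * Returns the command length, or -1 if it does not fit in buf. */
static int format_evict_cmd(char *buf, size_t size, unsigned int memcg_id,
			    unsigned int node_id, unsigned long min_gen_nr,
			    int swappiness, unsigned long nr_to_reclaim)
{
	int n;

	assert(buf != NULL && size > 0);
	if (swappiness < 0)
		n = snprintf(buf, size, "- %u %u %lu", memcg_id, node_id,
			     min_gen_nr);
	else if (nr_to_reclaim == 0)
		n = snprintf(buf, size, "- %u %u %lu %d", memcg_id, node_id,
			     min_gen_nr, swappiness);
	else
		n = snprintf(buf, size, "- %u %u %lu %d %lu", memcg_id,
			     node_id, min_gen_nr, swappiness, nr_to_reclaim);
	return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

For the memcg in the cover letter (ID 54, node 0), `format_evict_cmd(buf, sizeof(buf), 54, 0, 2, 200, 0)` yields `- 54 0 2 200`. The string would then be written to the `lru_gen` file, usually `/sys/kernel/debug/lru_gen` when debugfs is mounted at its default location, which requires root.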


>
> See the case in the cover letter.
> ```
> memcg    54 /apps/some_app
> node     0
> 1     119804          0       85461
> 2     119804          0           5
> 3     119804     181719       18667
> 4       1752        392         244
> ```
>
>
> Since a semantic gap between user space and kernel space will always exist,
> it would be very beneficial to leave some APIs for user hints, just like
> madvise/userfaultfd/para-virtualization.

Nope. This is just an internal detail of MGLRU and shouldn’t be exposed
as an interface.
Hopefully, Kairui or I will send a patchset soon to address the balance
issue between file and anon pages. For now, you can use `swappiness=201`
as a temporary workaround. Take a look at ByteDance's patchset [1].

> Exposing such hints to the kernel can help improve overall system performance.

[1] https://lore.kernel.org/linux-mm/cover.1744169302.git.hezhongkun.hzk@bytedance.com/

Thanks
Barry

RE: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs

Posted by wangzicheng 2 months, 1 week ago

> 
> I strongly recommend separating this from your patchset. Avoid including
> unrelated changes in a single patchset.
> 
Thank you for the clarification; separating it from our patchset makes sense.

Note that the imbalance between file and anon generations is one of the reasons
we want to move the `lru_gen` file out of debugfs.

> MGLRU has a mechanism to ensure that file and anon pages can keep pace
> with each other. In the newest kernel, the minimum number of generations
> (MIN_NR_GENS) is 2. For example, if anon has only 2 generations left and
> we decide to reclaim anon folios, we will fall back to reclaiming file
> pages. Sometimes, this means that anon reclamation is insufficient while
> file pages are over-reclaimed.
> 
> static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
>                        struct scan_control *sc, int type, int tier,
>                        struct list_head *list) {
>         ...
>         if (get_nr_gens(lruvec, type) == MIN_NR_GENS)
>                 return 0;
>         ...
> }
> 
> This is probably not a bug, but this design can sometimes work suboptimally.
> 

Yes, our patchset also aims to address similar cases by proactively aging 2-3 generations.

> Regarding this issue, both Kairui (from the Linux server side, cc-ed) and I
> (from the Android side) have observed it. This should be addressed in
> MGLRU's code, and we already have kernel code for that. It is unrelated to
> your patchset, so you shouldn’t include so many unrelated changes in a
> single patchset.
> 
> Please keep your patchset focused solely on whether the MGLRU proactive
> reclamation interface should be promoted to sysfs (LRU_GEN already has a
> folder in sysfs) instead of debugfs, if there is a v2.
> 
> The following is quoted from
> `Documentation/admin-guide/mm/multigen_lru.rst`.
> 
> Proactive reclaim
> -----------------
> Proactive reclaim induces page reclaim when there is no memory pressure. It
> usually targets cold pages only. E.g., when a new job comes in, the job
> scheduler wants to proactively reclaim cold pages on the server it selected,
> to improve the chance of successfully landing this new job.
> 
> Users can write the following command to ``lru_gen`` to evict generations
> less than or equal to ``min_gen_nr``.
> 
>     ``- memcg_id node_id min_gen_nr [swappiness [nr_to_reclaim]]``
> 
> 
> >
> > See the case in the cover letter.
> > ```
> > memcg    54 /apps/some_app
> > node     0
> > 1     119804          0       85461
> > 2     119804          0           5
> > 3     119804     181719       18667
> > 4       1752        392         244
> > ```
> >
> >
> > Since a semantic gap between user space and kernel space will always exist,
> > it would be very beneficial to leave some APIs for user hints, just
> > like madvise/userfaultfd/para-virtualization.
> 
> Nope. This is just an internal detail of MGLRU and shouldn’t be exposed as an
> interface.
> Hopefully, Kairui or I will send a patchset soon to address the balance issue
> between file and anon pages. For now, you can use `swappiness=201` as a
> temporary workaround. Take a look at bytedance's patchset.[1]
> 
Sounds great :) We are looking forward to it.

> > Exposing such hints to the kernel can help improve overall system
> > performance.
> 
> [1] https://lore.kernel.org/linux-mm/cover.1744169302.git.hezhongkun.hzk@bytedance.com/
> 
And thank you for the `swappiness=201` workaround; we will look into it.

> Thanks
> Barry

Best,
Zicheng

Re: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs

Posted by Barry Song 2 months, 1 week ago

On Mon, Dec 1, 2025 at 4:14 PM wangzicheng <wangzicheng@honor.com> wrote:
>
> >
> > I strongly recommend separating this from your patchset. Avoid including
> > unrelated changes in a single patchset.
> >
> Thank you for the clarification, separating it from our patchset makes sense.
>

Also note that memcg already has an interface for proactive reclamation,
so I’m not certain whether your patchset can coexist with it or extend
it to meet your requirements, which seems quite impossible to me.

memory.reclaim
        A write-only nested-keyed file which exists for all cgroups.

        This is a simple interface to trigger memory reclaim in the
        target cgroup.

        Example::

          echo "1G" > memory.reclaim

        Please note that the kernel can over or under reclaim from
        the target cgroup. If less bytes are reclaimed than the
        specified amount, -EAGAIN is returned.
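A minimal C sketch of driving `memory.reclaim` with the EAGAIN semantics quoted above. The helper name and its return codes are assumptions of this sketch; the caller supplies the cgroup path (for example a hypothetical `/sys/fs/cgroup/apps/some_app/memory.reclaim`).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a request such as "1G" to a cgroup v2
 * memory.reclaim file. Returns 0 if the write succeeded, 1 if the kernel
 * reported EAGAIN (it reclaimed less than requested), and -1 on any
 * other error. */
static int request_reclaim(const char *path, const char *req)
{
	size_t len;
	ssize_t ret;
	int saved_errno;
	int fd;

	assert(path != NULL && req != NULL);
	len = strlen(req);
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	ret = write(fd, req, len);
	saved_errno = errno;	/* close() may clobber errno */
	close(fd);
	if (ret == (ssize_t)len)
		return 0;
	if (ret < 0 && saved_errno == EAGAIN)
		return 1;
	return -1;
}
```

Treating EAGAIN as a distinct, non-fatal result lets a caller decide whether to retry or accept partial reclaim, since the kernel may over- or under-reclaim anyway.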

Thanks
Barry

RE: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs

Posted by wangzicheng 2 months, 1 week ago

Hi Barry,

Thank you for the comment; we are actually aware of that cgroup file.

What we really need is to *proactively age 2-3 generations* before proactive reclaim
(especially after cold launches, when the oldest generations contain no anon pages).

Proactive aging also helps distribute anon and file pages evenly across MGLRU
generations, so reclaiming won't fall into file caches.

> Also note that memcg already has an interface for proactive reclamation,
> so I’m not certain whether your patchset can coexist with it or extend
> it to meet your requirements, which seems quite impossible to me.
> 
> memory.reclaim
>         A write-only nested-keyed file which exists for all cgroups.
> 
>         This is a simple interface to trigger memory reclaim in the
>         target cgroup.
> 
>         Example::
> 
>           echo "1G" > memory.reclaim
> 
>         Please note that the kernel can over or under reclaim from
>         the target cgroup. If less bytes are reclaimed than the
>         specified amount, -EAGAIN is returned.
> 
This reminds me that adding a `memory.aging` file under memcg directories,
rather than adding new procfs files, would also be a great option.

> Thanks
> Barry

Thanks,
Zicheng

Re: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs

Posted by Barry Song 2 months, 1 week ago

Hi Zicheng,

On Mon, Dec 1, 2025 at 5:55 PM wangzicheng <wangzicheng@honor.com> wrote:
>
> Hi Barry,
>
> Thank you for the comment; we are actually aware of that cgroup file.
>
> What we really need is to *proactively age 2-3 generations* before proactive reclaim
> (especially after cold launches, when the oldest generations contain no anon pages).
>
> Proactive aging also helps distribute anon and file pages evenly across MGLRU
> generations, so reclaiming won't fall into file caches.

I’m not quite sure what you mean by “reclaiming won’t fall into file caches.”

I assume you mean you configured a high swappiness for MGLRU proactive
reclamation, so when both anon and file have four generations,
`get_type_to_scan()` effectively always returns anon?

>
> > Also note that memcg already has an interface for proactive reclamation,
> > so I’m not certain whether your patchset can coexist with it or extend
> > it to meet your requirements, which seems quite impossible to me.
> >
> > memory.reclaim
> >         A write-only nested-keyed file which exists for all cgroups.
> >
> >         This is a simple interface to trigger memory reclaim in the
> >         target cgroup.
> >
> >         Example::
> >
> >           echo "1G" > memory.reclaim
> >
> >         Please note that the kernel can over or under reclaim from
> >         the target cgroup. If less bytes are reclaimed than the
> >         specified amount, -EAGAIN is returned.
> >
> This reminds me that adding a `memory.aging` file under memcg directories,
> rather than adding new procfs files, would also be a great option.

I still don’t understand why. Aging is something MGLRU itself should
handle; components outside MGLRU, such as cgroup v2, do not need to be
aware of this concept at all. Exposing it will likely lead to another
immediate NAK.

In short, aging should remain within MGLRU’s internal scope.

But it seems you do want some policy control for your proactive
reclamation, such as always reclaiming anon pages or reclaiming them
more aggressively than file pages. I assume Zhongkun’s patch [1] we
mentioned earlier should provide support for that, correct?

As a workaround, you can set `swappiness=max` for `memory.reclaim` before
we internally improve the handling of the aging issue. In short,
“proactive aging” and similar mechanisms should be handled automatically
and internally within the scope of the MGLRU code.
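The workaround amounts to appending a nested key to the `memory.reclaim` request. The sketch below only formats such a request; `format_reclaim_req()` and the `REQ_SWAPPINESS_*` sentinels are hypothetical, and whether the running kernel accepts the `swappiness=` key (and the `max` value in particular) depends on its version and backports, as discussed later in this thread for v6.6/v6.12.

```c
#include <assert.h>
#include <stdio.h>

/* Sketch-local sentinels, not kernel values. */
#define REQ_SWAPPINESS_OMIT (-1)	/* send only the amount */
#define REQ_SWAPPINESS_MAX (-2)		/* send "swappiness=max" */

/* Hypothetical helper: format a memory.reclaim request such as
 * "1G swappiness=max". Numeric values outside 0..200 are rejected by
 * this sketch. Returns the length, or -1 on a bad swappiness or if the
 * request does not fit in buf. */
static int format_reclaim_req(char *buf, size_t size, const char *amount,
			      int swappiness)
{
	int n;

	assert(buf != NULL && size > 0 && amount != NULL);
	if (swappiness == REQ_SWAPPINESS_OMIT)
		n = snprintf(buf, size, "%s", amount);
	else if (swappiness == REQ_SWAPPINESS_MAX)
		n = snprintf(buf, size, "%s swappiness=max", amount);
	else if (swappiness >= 0 && swappiness <= 200)
		n = snprintf(buf, size, "%s swappiness=%d", amount, swappiness);
	else
		return -1;
	return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

The resulting string, e.g. `1G swappiness=max`, would be written to `memory.reclaim` in the same way as the plain `1G` request.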

[1] https://lore.kernel.org/linux-mm/cover.1744169302.git.hezhongkun.hzk@bytedance.com/

Thanks
Barry

RE: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs

Posted by wangzicheng 2 months, 1 week ago

Hi Barry,

Thank you for the follow-up questions.

It seems that our main testbeds (kernels v6.6/v6.12 on the latest devices)
don't have the SWAPPINESS_ANON_ONLY (201) related patches yet.

Since the max swappiness is 200, there are quite a few scenarios in which file
pages are the only option.

Quoting Kairui's reply:
> Right, we are seeing similar problems on our server too. To workaround
> it we force an age iteration before reclaiming when it happens, which
> isn't the best choice. When the LRU is long and the opposite type of
> the folios we want to reclaim is piling up in the oldest gen, a forced
> age will have to move all these folios, which leads to long tailing
> issues. Let's work on a reasonable solution for that.

Again, thank you for your guidance. We will carefully evaluate the
patchset [1] you recommended.

> Hi Zicheng,
> 
> On Mon, Dec 1, 2025 at 5:55 PM wangzicheng <wangzicheng@honor.com>
> wrote:
> >
> > Hi Barry,
> >
> > Thank you for the comment; we are actually aware of that cgroup file.
> >
> > What we really need is to *proactively age 2-3 generations* before proactive
> > reclaim (especially after cold launches, when the oldest generations contain
> > no anon pages).
> >
> > Proactive aging also helps distribute anon and file pages evenly across MGLRU
> > generations, so reclaiming won't fall into file caches.
> 
> I’m not quite sure what you mean by “reclaiming won’t fall into file caches.”
> 
> I assume you mean you configured a high swappiness for MGLRU proactive
> reclamation, so when both anon and file have four generations,
> `get_type_to_scan()` effectively always returns anon?
> 
> >
> > > Also note that memcg already has an interface for proactive reclamation,
> > > so I’m not certain whether your patchset can coexist with it or extend
> > > it to meet your requirements, which seems quite impossible to me.
> > >
> > > memory.reclaim
> > >         A write-only nested-keyed file which exists for all cgroups.
> > >
> > >         This is a simple interface to trigger memory reclaim in the
> > >         target cgroup.
> > >
> > >         Example::
> > >
> > >           echo "1G" > memory.reclaim
> > >
> > >         Please note that the kernel can over or under reclaim from
> > >         the target cgroup. If less bytes are reclaimed than the
> > >         specified amount, -EAGAIN is returned.
> > >
> > This reminds me that adding a `memory.aging` file under memcg directories,
> > rather than adding new procfs files, would also be a great option.
> 
> I still don’t understand why. Aging is something MGLRU itself should
> handle; components outside MGLRU, such as cgroup v2, do not need to be
> aware of this concept at all. Exposing it will likely lead to another
> immediate NAK.
> 
> In short, aging should remain within MGLRU’s internal scope.

I would like to offer a different point of view. We are working on something
interesting in this area and will share it once it is ready.

> 
> But it seems you do want some policy control for your proactive
> reclamation, such as always reclaiming anon pages or reclaiming them
> more aggressively than file pages. I assume Zhongkun’s patch [1] we
> mentioned earlier should provide support for that, correct?
> 
> As a workaround, you can set `swappiness=max` for `memory.reclaim`
> before we internally improve the handling of the aging issue. In short,
> “proactive aging” and similar mechanisms should be handled automatically
> and internally within the scope of the MGLRU code.

Sure, we will make a careful evaluation.

> 
> [1] https://lore.kernel.org/linux-mm/cover.1744169302.git.hezhongkun.hzk@bytedance.com/
> 
> Thanks
> Barry

Thanks
Zicheng

Re: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs

Posted by Barry Song 2 months, 1 week ago

On Mon, Dec 1, 2025 at 9:32 PM wangzicheng <wangzicheng@honor.com> wrote:
>
> Hi Barry,
>
> Thank you for the follow-up questions.
>
> It seems that our main testbeds (kernels v6.6/v6.12 on the latest devices)
> don't have the SWAPPINESS_ANON_ONLY (201) related patches yet.

Then please check with Suren whether it is possible to backport this to
the Android common kernel.
My understanding is that this should already be present in the Android 6.12
kernel.

>
> Since the max swappiness is 200, there are quite a few scenarios in which file
> pages are the only option.
>
> Quoting Kairui's reply:
> > Right, we are seeing similar problems on our server too. To workaround
> > it we force an age iteration before reclaiming when it happens, which
> > isn't the best choice. When the LRU is long and the opposite type of
> > the folios we want to reclaim is piling up in the oldest gen, a forced
> > age will have to move all these folios, which leads to long tailing
> > issues. Let's work on a reasonable solution for that.
>

We all agree that MGLRU has this generation issue. You mentioned it, I agreed
and noted that both Kairui and I had observed it. Then Kairui replied that he
had indeed seen it as well. Now you are using Kairui’s reply to argue against
me, and I honestly don’t understand the logic behind your responses.

> Again, thank you for your guidance. We will carefully evaluate the
> patchset [1] you recommended.
>
> > Hi Zicheng,
> >
> > On Mon, Dec 1, 2025 at 5:55 PM wangzicheng <wangzicheng@honor.com>
> > wrote:
> > >
> > > Hi Barry,
> > >
> > > Thank you for the comment; we are actually aware of that cgroup file.
> > >
> > > What we really need is to *proactively age 2-3 generations* before proactive
> > > reclaim (especially after cold launches, when the oldest generations contain
> > > no anon pages).
> > >
> > > Proactive aging also helps distribute anon and file pages evenly across MGLRU
> > > generations, so reclaiming won't fall into file caches.
> >
> > I’m not quite sure what you mean by “reclaiming won’t fall into file caches.”
> >
> > I assume you mean you configured a high swappiness for MGLRU proactive
> > reclamation, so when both anon and file have four generations,
> > `get_type_to_scan()` effectively always returns anon?
> >
> > >
> > > > Also note that memcg already has an interface for proactive reclamation,
> > > > so I’m not certain whether your patchset can coexist with it or extend
> > > > it to meet your requirements, which seems quite impossible to me.
> > > >
> > > > memory.reclaim
> > > >         A write-only nested-keyed file which exists for all cgroups.
> > > >
> > > >         This is a simple interface to trigger memory reclaim in the
> > > >         target cgroup.
> > > >
> > > >         Example::
> > > >
> > > >           echo "1G" > memory.reclaim
> > > >
> > > >         Please note that the kernel can over or under reclaim from
> > > >         the target cgroup. If less bytes are reclaimed than the
> > > >         specified amount, -EAGAIN is returned.
> > > >
> > > This reminds me that adding a `memory.aging` file under memcg directories,
> > > rather than adding new procfs files, would also be a great option.
> >
> > I still don’t understand why. Aging is something MGLRU itself should
> > handle; components outside MGLRU, such as cgroup v2, do not need to be
> > aware of this concept at all. Exposing it will likely lead to another
> > immediate NAK.
> >
> > In short, aging should remain within MGLRU’s internal scope.
>
> I would like to offer a different point of view. We are working on something
> interesting in this area and will share it once it is ready.

You are always welcome to share, but please understand that memory.aging is
not of interest to any module outside the scope of MGLRU itself. An interface
is an interface, and internal implementation should remain internal. In other
words, there is no reason for cgroupv2 to be aware of what “aging” is.

You may submit your new code as a "fix" for the generation issue without
introducing a new interface. That would be a good starting point for
discussing how to resolve the problem.

>
> >
> > But it seems you do want some policy control for your proactive
> > reclamation, such as always reclaiming anon pages or reclaiming them
> > more aggressively than file pages. I assume Zhongkun’s patch [1] we
> > mentioned earlier should provide support for that, correct?
> >
> > As a workaround, you can set `swappiness=max` for `memory.reclaim`
> > before we internally improve the handling of the aging issue. In short,
> > “proactive aging” and similar mechanisms should be handled automatically
> > and internally within the scope of the MGLRU code.
>
> Sure, we will make a careful evaluation.

Thanks
Barry

RE: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs

Posted by wangzicheng 2 months, 1 week ago

Hi Barry,

> Then please check with Suren whether it is possible to backport this to
> the Android common kernel.
> My understanding is that this should already be present in the Android 6.12
> kernel.
> 
Thanks for the reminder.

> >
> > Since the max swappiness is 200, there are quite a few scenarios in which file
> > pages are the only option.
> >
> > Quoting Kairui's reply:
> > > Right, we are seeing similar problems on our server too. To workaround
> > > it we force an age iteration before reclaiming when it happens, which
> > > isn't the best choice. When the LRU is long and the opposite type of
> > > the folios we want to reclaim is piling up in the oldest gen, a forced
> > > age will have to move all these folios, which leads to long tailing
> > > issues. Let's work on a reasonable solution for that.
> >
> 
> We all agree that MGLRU has this generation issue. You mentioned it, I agreed
> and noted that both Kairui and I had observed it. Then Kairui replied that he
> had indeed seen it as well. Now you are using Kairui’s reply to argue against
> me, and I honestly don’t understand the logic behind your responses.
> 

My apologies if my previous wording caused any confusion.

The only thing the patchset is meant to do is force aging of 2-3 generations right
before proactive reclaim. Under certain workloads, this helps reclaim more anon pages
and preserve more file pages (a 400-800 MB improvement in MemAvailable).

I quoted Kairui's reply because, as I understand it,
`force aging 2/3 gens before reclaim` is roughly similar in spirit to what Kairui
referred to as `force an age iteration before reclaiming`.

If my understanding is inaccurate, please feel free to correct me.

> > Again, thank you for your guidance. We will carefully evaluate the
> > patchset [1] you recommended.
> >
> > > Hi Zicheng,
> > >
> > > On Mon, Dec 1, 2025 at 5:55 PM wangzicheng <wangzicheng@honor.com>
> > > wrote:
> > > >
> > > > Hi Barry,
> > > >
> > > > Thank you for the comment; we are actually aware of that cgroup file.
> > > >
> > > > What we really need is to *proactively age 2-3 generations* before proactive
> > > > reclaim (especially after cold launches, when the oldest generations contain
> > > > no anon pages).
> > > >
> > > > Proactive aging also helps distribute anon and file pages evenly across MGLRU
> > > > generations, so reclaiming won't fall into file caches.
> > >
> > > I’m not quite sure what you mean by “reclaiming won’t fall into file
> > > caches.”
> > >
> > > I assume you mean you configured a high swappiness for MGLRU
> > > proactive
> > > reclamation, so when both anon and file have four generations,
> > > `get_type_to_scan()` effectively always returns anon?
> > >
> > > >
> > > > > Also note that memcg already has an interface for proactive
> > > > > reclamation, so I’m not certain whether your patchset can coexist with it
> > > > > or extend it to meet your requirements, which seems quite impossible to me.
> > > > >
> > > > > memory.reclaim
> > > > >         A write-only nested-keyed file which exists for all cgroups.
> > > > >
> > > > >         This is a simple interface to trigger memory reclaim in the
> > > > >         target cgroup.
> > > > >
> > > > >         Example::
> > > > >
> > > > >           echo "1G" > memory.reclaim
> > > > >
> > > > >         Please note that the kernel can over or under reclaim from
> > > > >         the target cgroup. If less bytes are reclaimed than the
> > > > >         specified amount, -EAGAIN is returned.
> > > > >
> > > > This reminds me that adding a `memory.aging` file under memcg directories,
> > > > rather than adding new procfs files, would also be a great option.
> > >
> > > I still don’t understand why. Aging is something MGLRU itself should
> > > handle; components outside MGLRU, such as cgroup v2, do not need to be
> > > aware of this concept at all. Exposing it will likely lead to another
> > > immediate NAK.
> > >
> > > In short, aging should remain within MGLRU’s internal scope.
> >
> > I would like to express a different point of view. We are working on
> > something interesting here and will share it once it is ready.
> 
> You are always welcome to share, but please understand that memory.aging
> is not of interest to any module outside the scope of MGLRU itself. An
> interface is an interface, and internal implementation should remain
> internal. In other words, there is no reason for cgroupv2 to be aware of
> what “aging” is.
> 
> You may submit your new code as a "fix" for the generation issue without
> introducing a new interface. That would be a good starting point for
> discussing how to resolve the problem.
> 

I completely agree with your guidance.
We will revisit the design for the next version and try to keep the
mechanism internal.

> >
> > >
> > > But it seems you do want some policy control for your proactive
> > > reclamation, such as always reclaiming anon pages or reclaiming them
> > > more aggressively than file pages. I assume Zhongkun’s patch [1] we
> > > mentioned earlier should provide support for that, correct?
> > >
> > > As a workaround, you can set `swappiness=max` for `memory.reclaim` before
> > > we internally improve the handling of the aging issue. In short,
> > > “proactive aging” and similar mechanisms should be handled automatically
> > > and internally within the scope of the MGLRU code.
> >
> > Sure, we will make a careful evaluation.
> 
> Thanks
> Barry

Best,
Zicheng

Re: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs

Posted by Kairui Song 2 months, 1 week ago

On Mon, Dec 1, 2025 at 3:46 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Mon, Dec 1, 2025 at 2:50 PM wangzicheng <wangzicheng@honor.com> wrote:
> >
> > Hi Barry,
> >
> > > Hi Liam,
> > >
> > > I saw you mentioned me, so I just wanted to join in :-)
> > >
> > > On Sat, Nov 29, 2025 at 12:16 AM Liam R. Howlett <Liam.Howlett@oracle.com>
> > > wrote:
> > > >
> > > > * Matthew Wilcox <willy@infradead.org> [251128 10:16]:
> > > > > On Fri, Nov 28, 2025 at 10:53:12AM +0800, Zicheng Wang wrote:
> > > > > > Case study:
> > > > > > A widely observed issue on Android is that after application
> > > > > > launch,
> > > >
> > > > What do you mean by application launch?  What does this mean in the
> > > > kernel context?
> > >
> > > I think there are two cases. First, a cold start: a new process is forked to
> > > launch the app. Second, when the app switches from background to
> > > foreground, for example when we bring it back to the screen after it has
> > > been running in the background.
> > >
> > > In the first case, you reboot your phone and tap the YouTube icon to start
> > > the app (cold launch). In the second case, you are watching a video in
> > > YouTube, then switch to Facebook, and later tap the YouTube icon again to
> > > bring it from background to foreground.
> > >
> > Thanks for the explanation, that's exactly what I meant.
> >
> > The Android lifecycle model isn't obvious outside the Android context. I’ll make that
> > clearer in the next version.
> > > >
> > > > > > the oldest anon generation often becomes empty, and file pages are
> > > > > > over-reclaimed.
> > > > >
> > > > > You should fix the bug, not move the debug interface to procfs.  NACK.
> > > >
> > > > Barry recently sent an RFC [1] to affect LRU in the exit path for
> > > > Android.  This was proven incorrect by Johannes, iirc, in another
> > > > thread I cannot find (destroys performance of calling the same command).
> > >
> > > My understanding is that affecting the LRU in the exit path is not generally
> > > correct, but it still highlights a requirement: Linux LRU needs a way to
> > > understand app-cycling behavior in an Android-like system.
> > >
> > > >
> > > > These ideas both seem related, as they point to a suboptimal LRU in the
> > > > Android ecosystem, at least.  It seems to stem from Android's life
> > > > (cycle) choices :)
> > > >
> > > > I strongly agree with Willy.  We don't want another userspace daemon
> > > > and/or interface, but this time to play with the LRU to avoid trying
> > > > to define and fix the problem.
> > > >
> > > > Do you know if this affects others or why it is Android-specific?
> > >
> > > The behavior Zicheng probably wants is a proactive memory reclamation
> > > interface. For example, since each app may be in a different memcg, if an
> > > app has been in the background for a long time, he wants to reclaim its
> > > memory proactively rather than waiting until kswapd hits the watermarks.
> > >
> > > This may help a newly launched app obtain memory more quickly, avoiding
> > > delays from reclamation, since a new app typically requires a substantial
> > > amount of memory.
> > >
> > > Zicheng, please let me know if I’m misunderstanding anything.
> >
> > Yes, but that's not all.
> >
> > 1. Proactive memory reclaim: yes, that's what we are after.
> > When an app is swiped away and kept in the background and not used for a while,
> > proactively reclaiming its memcg can help new foreground apps get memory
> > faster (instead of paying the cost of direct reclaim).
> >
> > 2. Anon vs. file: *bias more towards anonymous* pages for background apps.
> > With MGLRU, however, the oldest generations often contain almost no anon pages,
> > so simply tuning swappiness cannot achieve that -- reclaim will still clear file cache
> > in the old generations first.
> > To some extent, file caches are `over-reclaimed` in such a scenario, leading to a disaster
> > when user-interaction threads get stuck in direct reclaim of anon pages.
> I strongly recommend separating this from your patchset. Avoid including
> unrelated changes in a single patchset.
>
> MGLRU has a mechanism to ensure that file and anon pages can keep pace
> with each other. In the newest kernel, the minimum number of generations is 2. For
> example, if anon has only 2 generations left and we decide to reclaim
> anon folios, we will fall back to reclaiming file pages. Sometimes,
> this means that anon reclamation is insufficient while file pages are
> over-reclaimed.
>
> static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
>                        struct scan_control *sc, int type, int tier,
>                        struct list_head *list)
> {
>         ...
>         if (get_nr_gens(lruvec, type) == MIN_NR_GENS)
>                 return 0;
>         ...
> }
>
> This is probably not a bug, but this design can sometimes work
> suboptimally.
>
> Regarding this issue, both Kairui (from the Linux server side, cc-ed) and I
> (from the Android side) have observed it. This should be addressed in
> MGLRU's code, and we already have kernel code for that. It is unrelated
> to your patchset, so you shouldn’t include so many unrelated changes in
> a single patchset.

Thanks for including me in the discussion.

Right, we are seeing similar problems on our server too. To work around
it, we force an aging iteration before reclaiming when it happens, which
isn't the best choice. When the LRU is long and folios of the type opposite
to the one we want to reclaim are piling up in the oldest gen, a forced
age has to move all of these folios, which leads to long-tail latency
issues. Let's work on a reasonable solution for that.

>
> Please keep your patchset focused solely on whether the MGLRU proactive
> reclamation interface should be promoted to sysfs (LRU_GEN already has a
> folder in sysfs) instead of debugfs, if there is a v2.
>
> The following is quoted from
> `Documentation/admin-guide/mm/multigen_lru.rst`.
>
> Proactive reclaim
> -----------------
> Proactive reclaim induces page reclaim when there is no memory
> pressure. It usually targets cold pages only. E.g., when a new job
> comes in, the job scheduler wants to proactively reclaim cold pages on
> the server it selected, to improve the chance of successfully landing
> this new job.
>
> Users can write the following command to ``lru_gen`` to evict
> generations less than or equal to ``min_gen_nr``.
>
>     ``- memcg_id node_id min_gen_nr [swappiness [nr_to_reclaim]]``
>
>
> >
> > See the case in the cover letter.
> > ```
> > memcg    54 /apps/some_app
> > node     0
> > 1     119804          0       85461
> > 2     119804          0           5
> > 3     119804     181719       18667
> > 4       1752        392         244
> > ```
> >
> >
> > Since the semantic gap between user and kernel space will always exist,
> > it would be of great benefit to leave some APIs for user hints, just like
> > madvise/userfault/para-virtualization.
>
> Nope. This is just an internal detail of MGLRU and shouldn’t be exposed
> as an interface.
> Hopefully, Kairui or I will send a patchset soon to address the balance
> issue between file and anon pages. For now, you can use `swappiness=201`
> as a temporary workaround. Take a look at ByteDance's patchset [1].

Agreed, thanks!

Re: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs

Posted by zhongjinji 2 months, 1 week ago

> > I strongly recommend separating this from your patchset. Avoid including
> > unrelated changes in a single patchset.
> >
> > MGLRU has a mechanism to ensure that file and anon pages can keep pace
> > with each other. In the newest kernel, the minimum number of generations is 2. For
> > example, if anon has only 2 generations left and we decide to reclaim
> > anon folios, we will fall back to reclaiming file pages. Sometimes,
> > this means that anon reclamation is insufficient while file pages are
> > over-reclaimed.
> >
> > static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> >                        struct scan_control *sc, int type, int tier,
> >                        struct list_head *list)
> > {
> >         ...
> >         if (get_nr_gens(lruvec, type) == MIN_NR_GENS)
> >                 return 0;
> >         ...
> > }
> >
> > This is probably not a bug, but this design can sometimes work
> > suboptimally.
> >
> > Regarding this issue, both Kairui (from the Linux server side, cc-ed) and I
> > (from the Android side) have observed it. This should be addressed in
> > MGLRU's code, and we already have kernel code for that. It is unrelated
> > to your patchset, so you shouldn’t include so many unrelated changes in
> > a single patchset.
> 
> Thanks for including me in the discussion.
> 
> Right, we are seeing similar problems on our server too. To work around
> it, we force an aging iteration before reclaiming when it happens, which
> isn't the best choice. When the LRU is long and folios of the type opposite
> to the one we want to reclaim are piling up in the oldest gen, a forced
> age has to move all of these folios, which leads to long-tail latency
> issues. Let's work on a reasonable solution for that.

We have encountered the same issue on Android. When an app is frozen
(which may mean the app will not be used for a long time), we want to
reclaim the app's anonymous pages. After all inactive anonymous pages
are reclaimed, the reclamation cannot proceed further. If we actively trigger
aging on anonymous pages at this point, the number of inactive file pages
may become very large.

To address this issue, I have tried using different max_seq values for
anonymous and file pages. When reclaiming anonymous pages through memory.reclaim,
we can age only the anonymous pages. However, this approach requires extensive
code changes, and it does not seem worthwhile to implement.

RE: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs

Posted by wangzicheng 2 months, 1 week ago

> > >
> > > [1] https://lore.kernel.org/all/20250514070820.51793-1-21cnbao@gmail.com/
> > >
> >
> > Thanks
> > Barry
> 
> Since the semantic gap between user and kernel space will always exist,
> it would be of great benefit to leave some APIs for user hints, just like
> madvise/userfault/para-virtualization.
> Exposing such hints to the kernel can help improve overall system
> performance.
> 
> Best,
> Zicheng

More precisely, it’s a form of *proactive scanning and aging*, to
ensure a more even generational distribution between file and anonymous pages.

Best,
Zicheng