From: Kairui Song <kasong@tencent.com>
Make the scan helpers return the exact number of folios being scanned
or isolated. Since the reclaim loop now has a natural scan budget that
controls the scan progress, returning the scan number directly should
make the scan more accurate and easier to follow.
The number of scanned folios in each iteration is now always greater
than 0, unless the reclaim must stop for a forced aging, so the special
handling for the no-progress case is no longer needed:
- `return isolated || !remaining ? scanned : 0` in scan_folios: both
the function and its caller now just return the exact scan count, and
rely on the scan budget introduced in the previous commit to avoid
livelock or under scan.
- `scanned += try_to_inc_min_seq` in evict_folios: adding a bool to a
scan count was confusing and is no longer needed, as the scan number
will never be zero even if none of the folios in the oldest
generation is isolated.
- `evictable_min_seq + MIN_NR_GENS > max_seq` guard in evict_folios:
the per-type get_nr_gens == MIN_NR_GENS check in scan_folios
naturally returns 0 when only two gens remain and breaks the loop.
Also move try_to_inc_min_seq before isolate_folios, so that any empty
gens created by external folio freeing are also skipped.
The scan still stops if there are only two gens left, as the scan number
will be zero; this behavior is the same as before. This forced gen
protection may get removed or softened later to improve the reclaim a
bit more.
Signed-off-by: Kairui Song <kasong@tencent.com>
---
mm/vmscan.c | 46 +++++++++++++++++++++++-----------------------
1 file changed, 23 insertions(+), 23 deletions(-)
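The return-convention change described above can be modelled in user space. This is only a minimal sketch for reviewers, not kernel code: struct scan_result, old_scan_return() and new_scan_return() are hypothetical names invented for illustration, standing in for the counters scan_folios() tracks internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the counters scan_folios() keeps per batch. */
struct scan_result {
	int scanned;	/* folios looked at in this batch */
	int isolated;	/* folios actually taken off the LRU */
	int remaining;	/* unused part of the per-batch budget */
};

/* Old convention: the scan count is hidden when no progress was made. */
static int old_scan_return(const struct scan_result *r)
{
	return (r->isolated || !r->remaining) ? r->scanned : 0;
}

/* New convention: always report the scan count, isolation goes out by pointer. */
static int new_scan_return(const struct scan_result *r, int *isolatedp)
{
	assert(isolatedp != NULL);
	*isolatedp = r->isolated;
	return r->scanned;
}
```

With a batch that scanned 10 folios, isolated none and still had budget left, the old helper reports 0 while the new one reports 10 and lets the caller see isolated == 0 separately.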
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ab81ffdb241a..c5361efa6776 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4686,7 +4686,7 @@ static bool isolate_folio(struct lruvec *lruvec, struct folio *folio, struct sca
static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
struct scan_control *sc, int type, int tier,
- struct list_head *list)
+ struct list_head *list, int *isolatedp)
{
int i;
int gen;
@@ -4756,11 +4756,9 @@ static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
if (type == LRU_GEN_FILE)
sc->nr.file_taken += isolated;
- /*
- * There might not be eligible folios due to reclaim_idx. Check the
- * remaining to prevent livelock if it's not making progress.
- */
- return isolated || !remaining ? scanned : 0;
+
+ *isolatedp = isolated;
+ return scanned;
}
static int get_tier_idx(struct lruvec *lruvec, int type)
@@ -4804,33 +4802,36 @@ static int get_type_to_scan(struct lruvec *lruvec, int swappiness)
static int isolate_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
struct scan_control *sc, int swappiness,
- int *type_scanned, struct list_head *list)
+ struct list_head *list, int *isolated,
+ int *isolate_type, int *isolate_scanned)
{
int i;
+ int scanned = 0;
int type = get_type_to_scan(lruvec, swappiness);
for_each_evictable_type(i, swappiness) {
- int scanned;
+ int type_scan;
int tier = get_tier_idx(lruvec, type);
- *type_scanned = type;
+ type_scan = scan_folios(nr_to_scan, lruvec, sc,
+ type, tier, list, isolated);
- scanned = scan_folios(nr_to_scan, lruvec, sc, type, tier, list);
- if (scanned)
- return scanned;
+ scanned += type_scan;
+ if (*isolated) {
+ *isolate_type = type;
+ *isolate_scanned = type_scan;
+ break;
+ }
type = !type;
}
- return 0;
+ return scanned;
}
static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
struct scan_control *sc, int swappiness)
{
- int type;
- int scanned;
- int reclaimed;
LIST_HEAD(list);
LIST_HEAD(clean);
struct folio *folio;
@@ -4838,19 +4839,18 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
enum node_stat_item item;
struct reclaim_stat stat;
struct lru_gen_mm_walk *walk;
+ int scanned, reclaimed;
+ int isolated = 0, type, type_scanned;
bool skip_retry = false;
- struct lru_gen_folio *lrugen = &lruvec->lrugen;
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
lruvec_lock_irq(lruvec);
- scanned = isolate_folios(nr_to_scan, lruvec, sc, swappiness, &type, &list);
-
- scanned += try_to_inc_min_seq(lruvec, swappiness);
+ try_to_inc_min_seq(lruvec, swappiness);
- if (evictable_min_seq(lrugen->min_seq, swappiness) + MIN_NR_GENS > lrugen->max_seq)
- scanned = 0;
+ scanned = isolate_folios(nr_to_scan, lruvec, sc, swappiness,
+ &list, &isolated, &type, &type_scanned);
lruvec_unlock_irq(lruvec);
@@ -4861,7 +4861,7 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
sc->nr_reclaimed += reclaimed;
trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
- scanned, reclaimed, &stat, sc->priority,
+ type_scanned, reclaimed, &stat, sc->priority,
type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
list_for_each_entry_safe_reverse(folio, next, &list, lru) {
--
2.53.0
On 3/29/26 3:52 AM, Kairui Song via B4 Relay wrote:
> From: Kairui Song <kasong@tencent.com>
>
> Make the scan helpers return the exact number of folios being scanned
> or isolated. Since the reclaim loop now has a natural scan budget that
> controls the scan progress, returning the scan number directly should
> make the scan more accurate and easier to follow.
>
> The number of scanned folios for each iteration is always positive and
> larger than 0, unless the reclaim must stop for a forced aging, so
> there is no more need for any special handling when there is no
> progress made:
>
> - `return isolated || !remaining ? scanned : 0` in scan_folios: both
> the function and the call now just return the exact scan count,
> combined with the scan budget introduced in the previous commit to
> avoid livelock or under scan.
Makes sense to me.
>
> - `scanned += try_to_inc_min_seq` in evict_folios: adding a bool as a
> scan count was kind of confusing and no longer needed too, as scan
> number will never be zero even if none of the folio in oldest
> generation is isolated.
Yes, agree.
>
> - `evictable_min_seq + MIN_NR_GENS > max_seq` guard in evict_folios:
> the per-type get_nr_gens == MIN_NR_GENS check in scan_folios
> naturally returns 0 when only two gens remain and breaks the loop.
>
> Also move try_to_inc_min_seq before isolate_folios, so that any empty
> gens created by external folio freeing are also skipped.
This part is somewhat confusing. You probably mean the case where the
list of that gen becomes empty via isolate_folio(), right?
If that's the case, the original logic would remove the empty gens
produced by isolate_folio() after calling try_to_inc_min_seq().
However, with your changes, this removal won't happen until the next
eviction. Does this provide any additional benefits? Or could you
describe how this change impacts your testing?
> The scan still stops if there are only two gens left as the scan number
> will be zero, this behavior is same as before. This force gen protection
> may get removed or softened later to improve the reclaim a bit more.
>
> Signed-off-by: Kairui Song <kasong@tencent.com>
> ---
> mm/vmscan.c | 46 +++++++++++++++++++++++-----------------------
> 1 file changed, 23 insertions(+), 23 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ab81ffdb241a..c5361efa6776 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4686,7 +4686,7 @@ static bool isolate_folio(struct lruvec *lruvec, struct folio *folio, struct sca
>
> static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> struct scan_control *sc, int type, int tier,
> - struct list_head *list)
> + struct list_head *list, int *isolatedp)
> {
> int i;
> int gen;
> @@ -4756,11 +4756,9 @@ static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
> if (type == LRU_GEN_FILE)
> sc->nr.file_taken += isolated;
> - /*
> - * There might not be eligible folios due to reclaim_idx. Check the
> - * remaining to prevent livelock if it's not making progress.
> - */
> - return isolated || !remaining ? scanned : 0;
> +
> + *isolatedp = isolated;
> + return scanned;
> }
>
> static int get_tier_idx(struct lruvec *lruvec, int type)
> @@ -4804,33 +4802,36 @@ static int get_type_to_scan(struct lruvec *lruvec, int swappiness)
>
> static int isolate_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> struct scan_control *sc, int swappiness,
> - int *type_scanned, struct list_head *list)
> + struct list_head *list, int *isolated,
> + int *isolate_type, int *isolate_scanned)
> {
8 parameters:), can we reduce some of them?
> int i;
> + int scanned = 0;
> int type = get_type_to_scan(lruvec, swappiness);
>
> for_each_evictable_type(i, swappiness) {
> - int scanned;
> + int type_scan;
> int tier = get_tier_idx(lruvec, type);
>
> - *type_scanned = type;
> + type_scan = scan_folios(nr_to_scan, lruvec, sc,
> + type, tier, list, isolated);
>
> - scanned = scan_folios(nr_to_scan, lruvec, sc, type, tier, list);
> - if (scanned)
> - return scanned;
> + scanned += type_scan;
> + if (*isolated) {
> + *isolate_type = type;
> + *isolate_scanned = type_scan;
> + break;
> + }
>
> type = !type;
> }
>
> - return 0;
> + return scanned;
> }
>
> static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> struct scan_control *sc, int swappiness)
> {
> - int type;
> - int scanned;
> - int reclaimed;
> LIST_HEAD(list);
> LIST_HEAD(clean);
> struct folio *folio;
> @@ -4838,19 +4839,18 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> enum node_stat_item item;
> struct reclaim_stat stat;
> struct lru_gen_mm_walk *walk;
> + int scanned, reclaimed;
> + int isolated = 0, type, type_scanned;
> bool skip_retry = false;
> - struct lru_gen_folio *lrugen = &lruvec->lrugen;
> struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> struct pglist_data *pgdat = lruvec_pgdat(lruvec);
>
> lruvec_lock_irq(lruvec);
>
> - scanned = isolate_folios(nr_to_scan, lruvec, sc, swappiness, &type, &list);
> -
> - scanned += try_to_inc_min_seq(lruvec, swappiness);
> + try_to_inc_min_seq(lruvec, swappiness);
>
> - if (evictable_min_seq(lrugen->min_seq, swappiness) + MIN_NR_GENS > lrugen->max_seq)
> - scanned = 0;
> + scanned = isolate_folios(nr_to_scan, lruvec, sc, swappiness,
> + &list, &isolated, &type, &type_scanned);
>
> lruvec_unlock_irq(lruvec);
>
> @@ -4861,7 +4861,7 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> sc->nr.unqueued_dirty += stat.nr_unqueued_dirty;
> sc->nr_reclaimed += reclaimed;
> trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
> - scanned, reclaimed, &stat, sc->priority,
> + type_scanned, reclaimed, &stat, sc->priority,
> type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
>
> list_for_each_entry_safe_reverse(folio, next, &list, lru) {
>
On Tue, Mar 31, 2026 at 04:04:30PM +0800, Baolin Wang wrote:
>
>
> On 3/29/26 3:52 AM, Kairui Song via B4 Relay wrote:
> > From: Kairui Song <kasong@tencent.com>
> >
> > Make the scan helpers return the exact number of folios being scanned
> > or isolated. Since the reclaim loop now has a natural scan budget that
> > controls the scan progress, returning the scan number directly should
> > make the scan more accurate and easier to follow.
> >
> > The number of scanned folios for each iteration is always positive and
> > larger than 0, unless the reclaim must stop for a forced aging, so
> > there is no more need for any special handling when there is no
> > progress made:
> >
> > - `return isolated || !remaining ? scanned : 0` in scan_folios: both
> > the function and the call now just return the exact scan count,
> > combined with the scan budget introduced in the previous commit to
> > avoid livelock or under scan.
>
> Make sense to me.
>
> >
> > - `scanned += try_to_inc_min_seq` in evict_folios: adding a bool as a
> > scan count was kind of confusing and no longer needed too, as scan
> > number will never be zero even if none of the folio in oldest
> > generation is isolated.
>
> Yes, agree.
>
> >
> > - `evictable_min_seq + MIN_NR_GENS > max_seq` guard in evict_folios:
> > the per-type get_nr_gens == MIN_NR_GENS check in scan_folios
> > naturally returns 0 when only two gens remain and breaks the loop.
> >
> > Also move try_to_inc_min_seq before isolate_folios, so that any empty
> > gens created by external folio freeing are also skipped.
>
> This part is somewhat confusing. You probably mean the case where the list
> of that gen becomes empty via isolate_folio(), right?
>
> If that's the case, the original logic would remove the empty gens produced
> by isolate_folio() after calling try_to_inc_min_seq().
>
> However, with your changes, this removal won't happen until the next
> eviction. Does this provide any additional benefits? Or could you describe
> how this change impacts your testing?
Hi Baolin, thanks for the review.
Yeah, I also noticed this issue after sending this, while doing more
self review.
So I ran some tests with the patch below:
static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq, int swappiness)
@@ -4818,11 +4814,15 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
lruvec_lock_irq(lruvec);
+ /* Folio deletion may have created empty gens, flush them */
try_to_inc_min_seq(lruvec, swappiness);
scanned = isolate_folios(nr_to_scan, lruvec, sc, swappiness,
&list, &isolated, &type, &type_scanned);
+ /* Isolation may have created empty gens, flush them */
+ try_to_inc_min_seq(lruvec, swappiness);
+
lruvec_unlock_irq(lruvec);
if (list_empty(&list))
The return value of try_to_inc_min_seq can also be dropped
since it's no longer used, and the function call should be cheap.
System time of a kernel build using 3G of memory and make -j96
with ZRAM as swap, in seconds, averaged over 12 test runs each:
mm-new:
9136.055833
After V2:
8819.932222
After V2, with above patch:
8783.944444
After V2, without above patch but move try_to_inc_min_seq
back to after isolate_folios:
8807.874444
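To put the numbers above in relative terms, a small helper like the following can be used. It is just arithmetic on the four averages listed above; pct_reduction() is a made-up name for illustration.

```c
#include <assert.h>

/* Percentage reduction of `after` relative to `base` (both in seconds). */
static double pct_reduction(double base, double after)
{
	assert(base > 0.0);
	return (base - after) / base * 100.0;
}
```

Taking mm-new (9136.055833) as the baseline, V2 comes out roughly 3.5% lower, V2 with the above patch roughly 3.9% lower, and V2 with try_to_inc_min_seq moved back roughly 3.6% lower. The three V2 variants are within about 0.4 percentage points of each other.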
This series is looking good. This inc_min change seems trivial,
but in theory it does have a real effect.
- Moving the try_to_inc_min_seq after isolate_folios may result in a
wasted isolate_folios call and an early abort of the reclaim loop if
there is a stale oldest gen created by folio deletion.
- Moving the try_to_inc_min_seq before isolate_folios may leave an
empty gen after isolation. Usually that's fine because the next eviction
will still reclaim it. But until the next eviction, new file folios
could be added to the oldest gen and get reclaimed too early. That
looks like a real problem.
This may be trivial since MGLRU itself may also suffer from the same
problem when the oldest gen is just too short, which is a much more
common case (we can solve this short oldest gen issue later).
- Having try_to_inc_min_seq both before and after isolate_folios
seems the best choice here and roughly matches the benchmark
result above, though the difference is very close to the noise level.
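The ordering trade-off in the list above can be sketched with a toy user-space model. Everything here is an assumption made for illustration: the toy_* names, the TOY_* constants and the single per-gen folio counter are not the kernel's data structures, and only the loop condition is shaped after try_to_inc_min_seq.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_GENS 4	/* toy value, in the spirit of MAX_NR_GENS */
#define TOY_MIN_GENS 2	/* toy value, in the spirit of MIN_NR_GENS */

/* Toy lruvec: one folio count per gen slot, indexed by seq % TOY_MAX_GENS. */
struct toy_lruvec {
	unsigned long min_seq;
	unsigned long max_seq;
	int nr_folios[TOY_MAX_GENS];
};

/* Advance min_seq past empty oldest gens, keeping at least TOY_MIN_GENS. */
static bool toy_inc_min_seq(struct toy_lruvec *l)
{
	bool moved = false;

	assert(l->max_seq >= l->min_seq);
	while (l->min_seq + TOY_MIN_GENS <= l->max_seq &&
	       l->nr_folios[l->min_seq % TOY_MAX_GENS] == 0) {
		l->min_seq++;
		moved = true;
	}
	return moved;
}

/* Isolate up to nr folios from the oldest gen, return the number taken. */
static int toy_isolate(struct toy_lruvec *l, int nr)
{
	int *oldest = &l->nr_folios[l->min_seq % TOY_MAX_GENS];
	int taken = nr < *oldest ? nr : *oldest;

	*oldest -= taken;
	return taken;
}

/* One eviction pass, flush_after selects the "call it twice" variant. */
static int toy_evict(struct toy_lruvec *l, int nr, bool flush_after)
{
	int taken;

	toy_inc_min_seq(l);		/* skip gens emptied by folio deletion */
	taken = toy_isolate(l, nr);
	if (flush_after)
		toy_inc_min_seq(l);	/* skip a gen emptied by isolation */
	return taken;
}
```

Starting from gens holding 0, 8, 32 and 32 folios, a pass with only the leading call isolates 8 folios and leaves min_seq pointing at the now-empty gen, while the variant with both calls advances min_seq past it in the same pass.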
Well, I only tested one case. The cover letter described a
larger matrix, which is still all good with this series, and I'm
not 100% sure how this particular change affects it. I guess
it's still trivial.
The try_to_inc_min_seq call should be cheap enough since it's
called only once per batch of 64 folios, and it's only reading
a few lists for the non inc path.
What do you think about just calling it twice here?
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index ab81ffdb241a..c5361efa6776 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4686,7 +4686,7 @@ static bool isolate_folio(struct lruvec *lruvec, struct folio *folio, struct sca
> > static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> > struct scan_control *sc, int type, int tier,
> > - struct list_head *list)
> > + struct list_head *list, int *isolatedp)
> > {
> > int i;
> > int gen;
> > @@ -4756,11 +4756,9 @@ static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> > type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
> > if (type == LRU_GEN_FILE)
> > sc->nr.file_taken += isolated;
> > - /*
> > - * There might not be eligible folios due to reclaim_idx. Check the
> > - * remaining to prevent livelock if it's not making progress.
> > - */
> > - return isolated || !remaining ? scanned : 0;
> > +
> > + *isolatedp = isolated;
> > + return scanned;
> > }
> > static int get_tier_idx(struct lruvec *lruvec, int type)
> > @@ -4804,33 +4802,36 @@ static int get_type_to_scan(struct lruvec *lruvec, int swappiness)
> > static int isolate_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> > struct scan_control *sc, int swappiness,
> > - int *type_scanned, struct list_head *list)
> > + struct list_head *list, int *isolated,
> > + int *isolate_type, int *isolate_scanned)
> > {
>
> 8 parameters:), can we reduce some of them?
I'm not too concerned about this yet since it's a static function
with only one caller, so in most cases it's inlined.
It's a bit long indeed :), I haven't found a good way to shorten it
since the tracepoint below needs the isolate type and number. Maybe
it can be simplified later by unfolding the function here, or by some
other refactor.
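One possible direction, purely as a sketch and not something proposed in this series: bundle the three out-parameters into one result struct. struct isolate_result, toy_scan_type() and toy_isolate_folios() are hypothetical names, and the per-type scan is faked so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bundle for the three out-parameters of isolate_folios(). */
struct isolate_result {
	int isolated;		/* folios isolated in the batch */
	int type;		/* type that produced the isolation */
	int type_scanned;	/* folios scanned for that type only */
};

/* Toy per-type scan: pretend only type 1 has folios that can be isolated. */
static int toy_scan_type(int type, int *isolated)
{
	*isolated = type ? 16 : 0;
	return 32;
}

/* Same loop shape as the patch, with one result pointer instead of three. */
static int toy_isolate_folios(int first_type, struct isolate_result *res)
{
	int i, scanned = 0, type = first_type;

	assert(res != NULL);
	res->isolated = 0;
	for (i = 0; i < 2; i++) {
		int type_scan = toy_scan_type(type, &res->isolated);

		scanned += type_scan;
		if (res->isolated) {
			res->type = type;
			res->type_scanned = type_scan;
			break;
		}
		type = !type;
	}
	return scanned;
}
```

The return value is still the total scan count across both types, while the tracepoint would read the type and per-type count from the struct.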
On 3/31/26 5:01 PM, Kairui Song wrote:
> On Tue, Mar 31, 2026 at 04:04:30PM +0800, Baolin Wang wrote:
>>
>>
>> On 3/29/26 3:52 AM, Kairui Song via B4 Relay wrote:
>>> From: Kairui Song <kasong@tencent.com>
>>>
>>> Make the scan helpers return the exact number of folios being scanned
>>> or isolated. Since the reclaim loop now has a natural scan budget that
>>> controls the scan progress, returning the scan number directly should
>>> make the scan more accurate and easier to follow.
>>>
>>> The number of scanned folios for each iteration is always positive and
>>> larger than 0, unless the reclaim must stop for a forced aging, so
>>> there is no more need for any special handling when there is no
>>> progress made:
>>>
>>> - `return isolated || !remaining ? scanned : 0` in scan_folios: both
>>> the function and the call now just return the exact scan count,
>>> combined with the scan budget introduced in the previous commit to
>>> avoid livelock or under scan.
>>
>> Make sense to me.
>>
>>>
>>> - `scanned += try_to_inc_min_seq` in evict_folios: adding a bool as a
>>> scan count was kind of confusing and no longer needed too, as scan
>>> number will never be zero even if none of the folio in oldest
>>> generation is isolated.
>>
>> Yes, agree.
>>
>>>
>>> - `evictable_min_seq + MIN_NR_GENS > max_seq` guard in evict_folios:
>>> the per-type get_nr_gens == MIN_NR_GENS check in scan_folios
>>> naturally returns 0 when only two gens remain and breaks the loop.
>>>
>>> Also move try_to_inc_min_seq before isolate_folios, so that any empty
>>> gens created by external folio freeing are also skipped.
>>
>> This part is somewhat confusing. You probably mean the case where the list
>> of that gen becomes empty via isolate_folio(), right?
>>
>> If that's the case, the original logic would remove the empty gens produced
>> by isolate_folio() after calling try_to_inc_min_seq().
>>
>> However, with your changes, this removal won't happen until the next
>> eviction. Does this provide any additional benefits? Or could you describe
>> how this change impacts your testing?
>
> Hi Baolin, thanks for the review.
>
> Yeah, I also notices this issue after sending this while doing more
> self review.
>
> So I did some test with the patch below:
>
> static bool inc_max_seq(struct lruvec *lruvec, unsigned long seq, int swappiness)
> @@ -4818,11 +4814,15 @@ static int evict_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
>
> lruvec_lock_irq(lruvec);
>
> + /* In case folio deletion created empty gen, flush them */
> try_to_inc_min_seq(lruvec, swappiness);
>
> scanned = isolate_folios(nr_to_scan, lruvec, sc, swappiness,
> &list, &isolated, &type, &type_scanned);
>
> + /* Isolation might created empty gen, flush them */
> + try_to_inc_min_seq(lruvec, swappiness);
> +
> lruvec_unlock_irq(lruvec);
>
> if (list_empty(&list))
>
> The return value of try_to_inc_min_seq can also be dropped
> since it's no longer used, and the function call should be cheap.
>
> After system time of build kernel using 3G memory and make -j96
> with ZRAM as swap, system time in seconds average of 12 test run each:
>
> mm-new:
> 9136.055833
>
> After V2:
> 8819.932222
>
> After V2, with above patch:
> 8783.944444
>
> After V2, without above patch but move try_to_inc_min_seq
> back to after isolate_folios:
> 8807.874444
>
> This series is looking good, this inc_min change seems trivial
> but in theory it does have have real effect.
>
> - Moving the try_to_inc_min_seq after isolate_folios may result in a
> wasted isolate_folios call and early abort of reclaim loop if there
> is a stalled oldest gen created by folio deletion.

Indeed.

> - Moving the try_to_inc_min_seq before isolate_folios may leave a
> empty gen after isolation. Usually it's fine because next eviction
> will still reclaim them. But before next eviction, during that period,
> new file folios could be added the oldest gen and get reclaim too
> early. That looks a real problem.
>
> This maybe trivial since MGLRU itself also may suffer the same
> problem when the oldest gen is just too short, that's a much more
> common case (For this short oldest gen issue we can solve later).
>
> - Having try_to_inc_min_seq both before and after isolate_folios
> seems the best choice here and somehow matches the benchmark
> result above, very close to the noise level though.
>
> Well I only tested one cases, the cover letter described a
> larger matrix, still all good with this series and I'm not
> 100% sure how this particular change effects them, I guess
> it's still trivial.
>
> The try_to_inc_min_seq call should be cheap enough since it's
> called only for one batch of 64 folios, and it's only reading
> a few lists for the non inc path.
>
> How do you think that we just call it twice here?

Sounds reasonable to me. I'm not sure if we need to split out a new
patch with adding above message, as this patch mainly focuses on
optimizing the number of folios being scanned.