[PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios

Vernon Yang posted 4 patches 1 month, 1 week ago
Posted by Vernon Yang 1 month, 1 week ago
From: Vernon Yang <yanglincheng@kylinos.cn>

For example, create three tasks: hot1 -> cold -> hot2. After all three
tasks are created, each allocates 128 MB of memory. The hot1/hot2 tasks
continuously access their 128 MB of memory, while the cold task only
accesses its memory briefly and then calls madvise(MADV_FREE). However,
khugepaged still prioritizes scanning the cold task and only scans the
hot2 task after completing the scan of the cold task.
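
As a rough illustration of what the cold task does, here is a minimal
user-space sketch. It is not the actual test program used for the numbers
in this changelog; the helper name run_cold_task() and the REGION_SIZE
macro are made up for this example, and the sketch unmaps the region at
the end only to stay self-contained (a real cold task would keep the
mapping alive so that khugepaged can still find it).

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_FREE
#define MADV_FREE 8	/* Linux value, for older libc headers */
#endif

#define REGION_SIZE (128UL << 20)	/* 128 MB, as in the scenario */

/*
 * Hypothetical model of the "cold" task: map anonymous memory, touch it
 * once, then tell the kernel the pages may be freed lazily.
 * Returns 0 on success, -1 on failure.
 */
static int run_cold_task(size_t size)
{
	char *buf;
	int ret;

	assert(size > 0);	/* precondition: mmap() rejects length 0 */

	buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 1, size);	/* brief access: fault every page in */

	/* From here on the pages are lazyfree until written again. */
	ret = madvise(buf, size, MADV_FREE);

	munmap(buf, size);	/* only to keep the sketch self-contained */
	return ret;
}
```

A caller would invoke run_cold_task(REGION_SIZE); the hot1/hot2 tasks
would do the same mmap() but keep accessing the buffer in a loop instead
of calling madvise().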

And if all folios in VM_DROPPABLE are lazyfree, Collapsing maintains
that property, so we can just collapse and memory pressure in the future
will free it up. In contrast, collapsing in !VM_DROPPABLE does not
maintain that property, the collapsed folio will not be lazyfree and
memory pressure in the future will not be able to free it up.

So if the user has explicitly informed us via MADV_FREE that this memory
will be freed, and this vma does not have the VM_DROPPABLE flag, it is
appropriate for khugepaged to simply skip it, thereby avoiding
unnecessary scan and collapse operations and reducing wasted CPU time.
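
The rule can be modelled in user space as a pure predicate. This is only a
sketch: struct scan_input and should_skip_lazyfree() are hypothetical
names that stand in for the kernel state the check reads (cc, vma, folio
and pteval in the diff), and check_model() is just a usage example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel state the check reads. */
struct scan_input {
	bool is_khugepaged;	/* cc->is_khugepaged */
	bool vma_droppable;	/* vma->vm_flags & VM_DROPPABLE */
	bool folio_lazyfree;	/* folio_test_lazyfree(folio) */
	bool pte_dirty;		/* pte_dirty(pteval) */
};

/* Mirrors the condition in the diff; true models SCAN_PAGE_LAZYFREE. */
static bool should_skip_lazyfree(const struct scan_input *in)
{
	return in->is_khugepaged && !in->vma_droppable &&
	       in->folio_lazyfree && !in->pte_dirty;
}

/* Usage example: walk through the four interesting cases. */
static void check_model(void)
{
	struct scan_input in = {
		.is_khugepaged = true,
		.vma_droppable = false,
		.folio_lazyfree = true,
		.pte_dirty = false,
	};

	assert(should_skip_lazyfree(&in));	/* clean lazyfree: skip */

	in.pte_dirty = true;			/* dirty PTE: do not skip */
	assert(!should_skip_lazyfree(&in));

	in.pte_dirty = false;
	in.vma_droppable = true;		/* VM_DROPPABLE: collapse */
	assert(!should_skip_lazyfree(&in));

	in.vma_droppable = false;
	in.is_khugepaged = false;		/* caller is not khugepaged */
	assert(!should_skip_lazyfree(&in));
}
```

The pte_dirty term is there because, as far as I understand MADV_FREE, a
page written after the hint is no longer a candidate for lazy freeing, so
skipping it would be wrong.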

Here are the performance test results:
(For Throughput, higher is better; for the other metrics, lower is better)

Testing on x86_64 machine:

| task hot2           | without patch | with patch    |  delta  |
|---------------------|---------------|---------------|---------|
| total accesses time |  3.14 sec     |  2.93 sec     | -6.69%  |
| cycles per access   |  4.96         |  2.21         | -55.44% |
| Throughput          |  104.38 M/sec |  111.89 M/sec | +7.19%  |
| dTLB-load-misses    |  284814532    |  69597236     | -75.56% |

Testing on qemu-system-x86_64 -enable-kvm:

| task hot2           | without patch | with patch    |  delta  |
|---------------------|---------------|---------------|---------|
| total accesses time |  3.35 sec     |  2.96 sec     | -11.64% |
| cycles per access   |  7.29         |  2.07         | -71.60% |
| Throughput          |  97.67 M/sec  |  110.77 M/sec | +13.41% |
| dTLB-load-misses    |  241600871    |  3216108      | -98.67% |

Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
Acked-by: David Hildenbrand (arm) <david@kernel.org>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
---
 include/trace/events/huge_memory.h |  1 +
 mm/khugepaged.c                    | 13 +++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index 384e29f6bef0..bcdc57eea270 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -25,6 +25,7 @@
 	EM( SCAN_PAGE_LRU,		"page_not_in_lru")		\
 	EM( SCAN_PAGE_LOCK,		"page_locked")			\
 	EM( SCAN_PAGE_ANON,		"page_not_anon")		\
+	EM( SCAN_PAGE_LAZYFREE,		"page_lazyfree")		\
 	EM( SCAN_PAGE_COMPOUND,		"page_compound")		\
 	EM( SCAN_ANY_PROCESS,		"no_process_for_page")		\
 	EM( SCAN_VMA_NULL,		"vma_null")			\
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 61e25cf5424b..e792e9074b48 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -46,6 +46,7 @@ enum scan_result {
 	SCAN_PAGE_LRU,
 	SCAN_PAGE_LOCK,
 	SCAN_PAGE_ANON,
+	SCAN_PAGE_LAZYFREE,
 	SCAN_PAGE_COMPOUND,
 	SCAN_ANY_PROCESS,
 	SCAN_VMA_NULL,
@@ -574,6 +575,12 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		folio = page_folio(page);
 		VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
 
+		if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
+		    folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
+			result = SCAN_PAGE_LAZYFREE;
+			goto out;
+		}
+
 		/* See hpage_collapse_scan_pmd(). */
 		if (folio_maybe_mapped_shared(folio)) {
 			++shared;
@@ -1326,6 +1333,12 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
 		}
 		folio = page_folio(page);
 
+		if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
+		    folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
+			result = SCAN_PAGE_LAZYFREE;
+			goto out_unmap;
+		}
+
 		if (!folio_test_anon(folio)) {
 			result = SCAN_PAGE_ANON;
 			goto out_unmap;
-- 
2.51.0
Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
Posted by Barry Song 1 month, 1 week ago
On Sat, Feb 21, 2026 at 5:40 PM Vernon Yang <vernon2gm@gmail.com> wrote:
>
> From: Vernon Yang <yanglincheng@kylinos.cn>
>
> For example, create three task: hot1 -> cold -> hot2. After all three
> task are created, each allocate memory 128MB. the hot1/hot2 task
> continuously access 128 MB memory, while the cold task only accesses
> its memory briefly and then call madvise(MADV_FREE). However, khugepaged
> still prioritizes scanning the cold task and only scans the hot2 task
> after completing the scan of the cold task.
>
> And if all folios in VM_DROPPABLE are lazyfree, Collapsing maintains
> that property, so we can just collapse and memory pressure in the future

I don’t think this is accurate. A VMA without VM_DROPPABLE
can still have all folios marked as lazyfree. Therefore, having
all folios lazyfree is not the reason why collapsing preserves
the property.

This raises a question: if a VMA without VM_DROPPABLE has
many contiguous lazyfree folios that can be collapsed, and
none of those folios are non-lazyfree, should we collapse
them and pass the lazyfree state to the new folio?

Currently, our approach skips the collapse, which also feels
a bit inconsistent.

> will free it up. In contrast, collapsing in !VM_DROPPABLE does not
> maintain that property, the collapsed folio will not be lazyfree and
> memory pressure in the future will not be able to free it up.
>
> So if the user has explicitly informed us via MADV_FREE that this memory
> will be freed, and this vma does not have VM_DROPPABLE flags, it is
> appropriate for khugepaged to skip it only, thereby avoiding unnecessary
> scan and collapse operations to reducing CPU wastage.
>
> Here are the performance test results:
> (Throughput bigger is better, other smaller is better)
>
> Testing on x86_64 machine:
>
> | task hot2           | without patch | with patch    |  delta  |
> |---------------------|---------------|---------------|---------|
> | total accesses time |  3.14 sec     |  2.93 sec     | -6.69%  |
> | cycles per access   |  4.96         |  2.21         | -55.44% |
> | Throughput          |  104.38 M/sec |  111.89 M/sec | +7.19%  |
> | dTLB-load-misses    |  284814532    |  69597236     | -75.56% |
>
> Testing on qemu-system-x86_64 -enable-kvm:
>
> | task hot2           | without patch | with patch    |  delta  |
> |---------------------|---------------|---------------|---------|
> | total accesses time |  3.35 sec     |  2.96 sec     | -11.64% |
> | cycles per access   |  7.29         |  2.07         | -71.60% |
> | Throughput          |  97.67 M/sec  |  110.77 M/sec | +13.41% |
> | dTLB-load-misses    |  241600871    |  3216108      | -98.67% |
>
> Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
> Acked-by: David Hildenbrand (arm) <david@kernel.org>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> ---

Overall, LGTM,

Reviewed-by: Barry Song <baohua@kernel.org>

>  include/trace/events/huge_memory.h |  1 +
>  mm/khugepaged.c                    | 13 +++++++++++++
>  2 files changed, 14 insertions(+)
>
> diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> index 384e29f6bef0..bcdc57eea270 100644
> --- a/include/trace/events/huge_memory.h
> +++ b/include/trace/events/huge_memory.h
> @@ -25,6 +25,7 @@
>         EM( SCAN_PAGE_LRU,              "page_not_in_lru")              \
>         EM( SCAN_PAGE_LOCK,             "page_locked")                  \
>         EM( SCAN_PAGE_ANON,             "page_not_anon")                \
> +       EM( SCAN_PAGE_LAZYFREE,         "page_lazyfree")                \
>         EM( SCAN_PAGE_COMPOUND,         "page_compound")                \
>         EM( SCAN_ANY_PROCESS,           "no_process_for_page")          \
>         EM( SCAN_VMA_NULL,              "vma_null")                     \
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 61e25cf5424b..e792e9074b48 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -46,6 +46,7 @@ enum scan_result {
>         SCAN_PAGE_LRU,
>         SCAN_PAGE_LOCK,
>         SCAN_PAGE_ANON,
> +       SCAN_PAGE_LAZYFREE,
>         SCAN_PAGE_COMPOUND,
>         SCAN_ANY_PROCESS,
>         SCAN_VMA_NULL,
> @@ -574,6 +575,12 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
>                 folio = page_folio(page);
>                 VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
>
> +               if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
> +                   folio_test_lazyfree(folio) && !pte_dirty(pteval)) {

I would prefer to add a comment about VM_DROPPABLE here
rather than only mentioning it in the changelog.

> +                       result = SCAN_PAGE_LAZYFREE;
> +                       goto out;
> +               }
> +
>                 /* See hpage_collapse_scan_pmd(). */
>                 if (folio_maybe_mapped_shared(folio)) {
>                         ++shared;
> @@ -1326,6 +1333,12 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
>                 }
>                 folio = page_folio(page);
>
> +               if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
> +                   folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
> +                       result = SCAN_PAGE_LAZYFREE;
> +                       goto out_unmap;
> +               }

As above.

> +
>                 if (!folio_test_anon(folio)) {
>                         result = SCAN_PAGE_ANON;
>                         goto out_unmap;
> --
> 2.51.0
>

Thanks
Barry
Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
Posted by Vernon Yang 1 month, 1 week ago
On Sat, Feb 21, 2026 at 06:27:36PM +0800, Barry Song wrote:
> On Sat, Feb 21, 2026 at 5:40 PM Vernon Yang <vernon2gm@gmail.com> wrote:
> >
> > From: Vernon Yang <yanglincheng@kylinos.cn>
> >
> > For example, create three task: hot1 -> cold -> hot2. After all three
> > task are created, each allocate memory 128MB. the hot1/hot2 task
> > continuously access 128 MB memory, while the cold task only accesses
> > its memory briefly and then call madvise(MADV_FREE). However, khugepaged
> > still prioritizes scanning the cold task and only scans the hot2 task
> > after completing the scan of the cold task.
> >
> > And if all folios in VM_DROPPABLE are lazyfree, Collapsing maintains
> > that property, so we can just collapse and memory pressure in the future
>
> I don’t think this is accurate. A VMA without VM_DROPPABLE
> can still have all folios marked as lazyfree. Therefore, having
> all folios lazyfree is not the reason why collapsing preserves
> the property.

In folio_add_new_anon_rmap(), we know that the vma has the VM_DROPPABLE
attribute, which is the root reason why Collapsing maintains that property.
The above commit log clearly states "all folios in VM_DROPPABLE are lazyfree"
                                                ^^^^^^^^^^^^^^^
(the "if" is redundant and should be removed), not "all folios are lazyfree".

> This raises a question: if a VMA without VM_DROPPABLE has
> many contiguous lazyfree folios that can be collapsed, and
> none of those folios are non-lazyfree, should we collapse
> them and pass the lazyfree state to the new folio?
>
> Currently, our approach skips the collapse, which also feels
> a bit inconsistent.

Yes, it is inconsistent, because that approach would need to scan all
folios before making a decision, and it cannot solve the
hot1->cold->hot2 scenario.

> > will free it up. In contrast, collapsing in !VM_DROPPABLE does not
> > maintain that property, the collapsed folio will not be lazyfree and
> > memory pressure in the future will not be able to free it up.
> >
> > So if the user has explicitly informed us via MADV_FREE that this memory
> > will be freed, and this vma does not have VM_DROPPABLE flags, it is
> > appropriate for khugepaged to skip it only, thereby avoiding unnecessary
> > scan and collapse operations to reducing CPU wastage.
> >
> > Here are the performance test results:
> > (Throughput bigger is better, other smaller is better)
> >
> > Testing on x86_64 machine:
> >
> > | task hot2           | without patch | with patch    |  delta  |
> > |---------------------|---------------|---------------|---------|
> > | total accesses time |  3.14 sec     |  2.93 sec     | -6.69%  |
> > | cycles per access   |  4.96         |  2.21         | -55.44% |
> > | Throughput          |  104.38 M/sec |  111.89 M/sec | +7.19%  |
> > | dTLB-load-misses    |  284814532    |  69597236     | -75.56% |
> >
> > Testing on qemu-system-x86_64 -enable-kvm:
> >
> > | task hot2           | without patch | with patch    |  delta  |
> > |---------------------|---------------|---------------|---------|
> > | total accesses time |  3.35 sec     |  2.96 sec     | -11.64% |
> > | cycles per access   |  7.29         |  2.07         | -71.60% |
> > | Throughput          |  97.67 M/sec  |  110.77 M/sec | +13.41% |
> > | dTLB-load-misses    |  241600871    |  3216108      | -98.67% |
> >
> > Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
> > Acked-by: David Hildenbrand (arm) <david@kernel.org>
> > Reviewed-by: Lance Yang <lance.yang@linux.dev>
> > ---
>
> Overall, LGTM,
>
> Reviewed-by: Barry Song <baohua@kernel.org>

Thank you for the review.

> >  include/trace/events/huge_memory.h |  1 +
> >  mm/khugepaged.c                    | 13 +++++++++++++
> >  2 files changed, 14 insertions(+)
> >
> > diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
> > index 384e29f6bef0..bcdc57eea270 100644
> > --- a/include/trace/events/huge_memory.h
> > +++ b/include/trace/events/huge_memory.h
> > @@ -25,6 +25,7 @@
> >         EM( SCAN_PAGE_LRU,              "page_not_in_lru")              \
> >         EM( SCAN_PAGE_LOCK,             "page_locked")                  \
> >         EM( SCAN_PAGE_ANON,             "page_not_anon")                \
> > +       EM( SCAN_PAGE_LAZYFREE,         "page_lazyfree")                \
> >         EM( SCAN_PAGE_COMPOUND,         "page_compound")                \
> >         EM( SCAN_ANY_PROCESS,           "no_process_for_page")          \
> >         EM( SCAN_VMA_NULL,              "vma_null")                     \
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 61e25cf5424b..e792e9074b48 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -46,6 +46,7 @@ enum scan_result {
> >         SCAN_PAGE_LRU,
> >         SCAN_PAGE_LOCK,
> >         SCAN_PAGE_ANON,
> > +       SCAN_PAGE_LAZYFREE,
> >         SCAN_PAGE_COMPOUND,
> >         SCAN_ANY_PROCESS,
> >         SCAN_VMA_NULL,
> > @@ -574,6 +575,12 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
> >                 folio = page_folio(page);
> >                 VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);
> >
> > +               if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
> > +                   folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
>
> I would prefer to add a comment about VM_DROPPABLE here
> rather than only mentioning it in the changelog.

Is the following comment clear?

/*
 * If the vma has the VM_DROPPABLE flag, the collapse will
 * preserve the lazyfree property without needing to skip.
 */

> > +                       result = SCAN_PAGE_LAZYFREE;
> > +                       goto out;
> > +               }
> > +
> >                 /* See hpage_collapse_scan_pmd(). */
> >                 if (folio_maybe_mapped_shared(folio)) {
> >                         ++shared;
> > @@ -1326,6 +1333,12 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
> >                 }
> >                 folio = page_folio(page);
> >
> > +               if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
> > +                   folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
> > +                       result = SCAN_PAGE_LAZYFREE;
> > +                       goto out_unmap;
> > +               }
>
> As above.
>
> > +
> >                 if (!folio_test_anon(folio)) {
> >                         result = SCAN_PAGE_ANON;
> >                         goto out_unmap;
> > --
> > 2.51.0
> >
>
> Thanks
> Barry
>
Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
Posted by Barry Song 1 month, 1 week ago
On Sat, Feb 21, 2026 at 9:39 PM Vernon Yang <vernon2gm@gmail.com> wrote:
>
> On Sat, Feb 21, 2026 at 06:27:36PM +0800, Barry Song wrote:
> > On Sat, Feb 21, 2026 at 5:40 PM Vernon Yang <vernon2gm@gmail.com> wrote:
> > >
> > > From: Vernon Yang <yanglincheng@kylinos.cn>
> > >
> > > For example, create three task: hot1 -> cold -> hot2. After all three
> > > task are created, each allocate memory 128MB. the hot1/hot2 task
> > > continuously access 128 MB memory, while the cold task only accesses
> > > its memory briefly and then call madvise(MADV_FREE). However, khugepaged
> > > still prioritizes scanning the cold task and only scans the hot2 task
> > > after completing the scan of the cold task.
> > >
> > > And if all folios in VM_DROPPABLE are lazyfree, Collapsing maintains
> > > that property, so we can just collapse and memory pressure in the future
> >
> > I don’t think this is accurate. A VMA without VM_DROPPABLE
> > can still have all folios marked as lazyfree. Therefore, having
> > all folios lazyfree is not the reason why collapsing preserves
> > the property.
>
> In folio_add_new_anon_rmap(), we know that the vma has the VM_DROPPABLE
> attribute, which is the root reason why Collapsing maintains that property.
> The above commit log clearly states "all folios in VM_DROPPABLE are lazyfree"
>                                                 ^^^^^^^^^^^^^^^
> (the "if" is redundant and should be removed), not "all folios are lazyfree".

Yes, we should remove the if; otherwise, it’s misleading.

[...]

> >
> > I would prefer to add a comment about VM_DROPPABLE here
> > rather than only mentioning it in the changelog.
>
> Is the following comment clear?
>
> /*
>  * If the vma has the VM_DROPPABLE flag, the collapse will
>  * preserve the lazyfree property without needing to skip.
>  */

Looks good to me.

Best Regards
Barry
Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
Posted by Vernon Yang 1 month ago
On Tue, Feb 24, 2026 at 04:10:47AM +0800, Barry Song wrote:
> On Sat, Feb 21, 2026 at 9:39 PM Vernon Yang <vernon2gm@gmail.com> wrote:
> >
> > On Sat, Feb 21, 2026 at 06:27:36PM +0800, Barry Song wrote:
> > > On Sat, Feb 21, 2026 at 5:40 PM Vernon Yang <vernon2gm@gmail.com> wrote:
> > > >
> > > > From: Vernon Yang <yanglincheng@kylinos.cn>
> > > >
> > > > For example, create three task: hot1 -> cold -> hot2. After all three
> > > > task are created, each allocate memory 128MB. the hot1/hot2 task
> > > > continuously access 128 MB memory, while the cold task only accesses
> > > > its memory briefly and then call madvise(MADV_FREE). However, khugepaged
> > > > still prioritizes scanning the cold task and only scans the hot2 task
> > > > after completing the scan of the cold task.
> > > >
> > > > And if all folios in VM_DROPPABLE are lazyfree, Collapsing maintains
            ^^
here

> > > > that property, so we can just collapse and memory pressure in the future
> > >
> > > I don’t think this is accurate. A VMA without VM_DROPPABLE
> > > can still have all folios marked as lazyfree. Therefore, having
> > > all folios lazyfree is not the reason why collapsing preserves
> > > the property.
> >
> > In folio_add_new_anon_rmap(), we know that the vma has the VM_DROPPABLE
> > attribute, which is the root reason why Collapsing maintains that property.
> > The above commit log clearly states "all folios in VM_DROPPABLE are lazyfree"
> >                                                 ^^^^^^^^^^^^^^^
> > (the "if" is redundant and should be removed), not "all folios are lazyfree".
>
> Yes, we should remove the if; otherwise, it’s misleading.
>
> [...]
>
> > >
> > > I would prefer to add a comment about VM_DROPPABLE here
> > > rather than only mentioning it in the changelog.
> >
> > Is the following comment clear?
> >
> > /*
> >  * If the vma has the VM_DROPPABLE flag, the collapse will
> >  * preserve the lazyfree property without needing to skip.
> >  */
>
> Looks good to me.

Hi Andrew, could you please squash the following fix into this patch?
Also, please remove the "if" in the changelog above.

---
From ab5060c7be655dd00bf3a9abc779915922b2f969 Mon Sep 17 00:00:00 2001
From: Vernon Yang <yanglincheng@kylinos.cn>
Date: Thu, 26 Feb 2026 13:18:39 +0800
Subject: [PATCH] fixup! mm: khugepaged: skip lazy-free folios

Add a comment about VM_DROPPABLE in the code to make it clearer.

Signed-off-by: Vernon Yang <yanglincheng@kylinos.cn>
---
 mm/khugepaged.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index c85d7381adb5..7c1642fbe394 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -575,6 +575,10 @@ static enum scan_result __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		folio = page_folio(page);
 		VM_BUG_ON_FOLIO(!folio_test_anon(folio), folio);

+		/*
+		 * If the vma has the VM_DROPPABLE flag, the collapse will
+		 * preserve the lazyfree property without needing to skip.
+		 */
 		if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
 		    folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
 			result = SCAN_PAGE_LAZYFREE;
@@ -1333,6 +1337,10 @@ static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm,
 		}
 		folio = page_folio(page);

+		/*
+		 * If the vma has the VM_DROPPABLE flag, the collapse will
+		 * preserve the lazyfree property without needing to skip.
+		 */
 		if (cc->is_khugepaged && !(vma->vm_flags & VM_DROPPABLE) &&
 		    folio_test_lazyfree(folio) && !pte_dirty(pteval)) {
 			result = SCAN_PAGE_LAZYFREE;
--
2.51.0

Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
Posted by Andrew Morton 2 weeks, 2 days ago
On Thu, 26 Feb 2026 15:55:45 +0800 Vernon Yang <vernon2gm@gmail.com> wrote:

> Hi Andrew, could you please squash the following fix into this patch?

yup.

> also remove "if" in the changelog above.

So you want it like this?

: All folios in VM_DROPPABLE are lazyfree, Collapsing maintains that
: property, so we can just collapse and memory pressure in the future will
: free it up.  In contrast, collapsing in !VM_DROPPABLE does not maintain
: that property, the collapsed folio will not be lazyfree and memory
: pressure in the future will not be able to free it up.
Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
Posted by Vernon Yang 2 weeks, 2 days ago
On Mon, Mar 16, 2026 at 12:41:57PM -0700, Andrew Morton wrote:
> On Thu, 26 Feb 2026 15:55:45 +0800 Vernon Yang <vernon2gm@gmail.com> wrote:
>
> > Hi Andrew, could you please squash the following fix into this patch?
>
> yup.
>
> > also remove "if" in the changelog above.
>
> So you want it like this?

Yes, we should remove the "if"; otherwise, it’s misleading.

> : All folios in VM_DROPPABLE are lazyfree, Collapsing maintains that
> : property, so we can just collapse and memory pressure in the future will
> : free it up.  In contrast, collapsing in !VM_DROPPABLE does not maintain
> : that property, the collapsed folio will not be lazyfree and memory
> : pressure in the future will not be able to free it up.
>

LGTM, Thanks!

--
Cheers,
Vernon
Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
Posted by David Hildenbrand (Arm) 1 month, 1 week ago
On 2/21/26 14:38, Vernon Yang wrote:
> On Sat, Feb 21, 2026 at 06:27:36PM +0800, Barry Song wrote:
>> On Sat, Feb 21, 2026 at 5:40 PM Vernon Yang <vernon2gm@gmail.com> wrote:
>>>
>>> From: Vernon Yang <yanglincheng@kylinos.cn>
>>>
>>> For example, create three task: hot1 -> cold -> hot2. After all three
>>> task are created, each allocate memory 128MB. the hot1/hot2 task
>>> continuously access 128 MB memory, while the cold task only accesses
>>> its memory briefly and then call madvise(MADV_FREE). However, khugepaged
>>> still prioritizes scanning the cold task and only scans the hot2 task
>>> after completing the scan of the cold task.
>>>
>>> And if all folios in VM_DROPPABLE are lazyfree, Collapsing maintains
>>> that property, so we can just collapse and memory pressure in the future
>>
>> I don’t think this is accurate. A VMA without VM_DROPPABLE
>> can still have all folios marked as lazyfree. Therefore, having
>> all folios lazyfree is not the reason why collapsing preserves
>> the property.
> 
> In folio_add_new_anon_rmap(), we know that the vma has the VM_DROPPABLE
> attribute, which is the root reason why Collapsing maintains that property.
> The above commit log clearly states "all folios in VM_DROPPABLE are lazyfree"
>                                                  ^^^^^^^^^^^^^^^
> (the "if" is redundant and should be removed), not "all folios are lazyfree".


Exactly. folio_add_new_anon_rmap() makes sure that all folios (except 
the shared zero folios ;) ) in VM_DROPPABLE are lazyfree.

In fact, MADV_FREE should be a NOP on VM_DROPPABLE, as 
folio_mark_lazyfree() doesn't do anything.

> 
>> This raises a question: if a VMA without VM_DROPPABLE has
>> many contiguous lazyfree folios that can be collapsed, and
>> none of those folios are non-lazyfree, should we collapse
>> them and pass the lazyfree state to the new folio?

I'd assume we'd only want to add support for that when there are actual 
known use cases that can trigger that + benefit from it.

Adds complexity.

-- 
Cheers,

David
Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
Posted by Barry Song 1 month, 1 week ago
On Mon, Feb 23, 2026 at 9:16 PM David Hildenbrand (Arm)
<david@kernel.org> wrote:
>
> On 2/21/26 14:38, Vernon Yang wrote:
> > On Sat, Feb 21, 2026 at 06:27:36PM +0800, Barry Song wrote:
> >> On Sat, Feb 21, 2026 at 5:40 PM Vernon Yang <vernon2gm@gmail.com> wrote:
> >>>
> >>> From: Vernon Yang <yanglincheng@kylinos.cn>
> >>>
> >>> For example, create three task: hot1 -> cold -> hot2. After all three
> >>> task are created, each allocate memory 128MB. the hot1/hot2 task
> >>> continuously access 128 MB memory, while the cold task only accesses
> >>> its memory briefly and then call madvise(MADV_FREE). However, khugepaged
> >>> still prioritizes scanning the cold task and only scans the hot2 task
> >>> after completing the scan of the cold task.
> >>>
> >>> And if all folios in VM_DROPPABLE are lazyfree, Collapsing maintains
> >>> that property, so we can just collapse and memory pressure in the future
> >>
> >> I don’t think this is accurate. A VMA without VM_DROPPABLE
> >> can still have all folios marked as lazyfree. Therefore, having
> >> all folios lazyfree is not the reason why collapsing preserves
> >> the property.
> >
> > In folio_add_new_anon_rmap(), we know that the vma has the VM_DROPPABLE
> > attribute, which is the root reason why Collapsing maintains that property.
> > The above commit log clearly states "all folios in VM_DROPPABLE are lazyfree"
> >                                                  ^^^^^^^^^^^^^^^
> > (the "if" is redundant and should be removed), not "all folios are lazyfree".
>
>
> Exactly. folio_add_new_anon_rmap() makes sure that all folios (except
> the shared zero folios ;) ) in VM_DROPPABLE are lazyfree.
>
> In fact, MADV_FREE should be a NOP on VM_DROPPABLE, as
> folio_mark_lazyfree() doesn't do anything.
>

Maybe we could do something like the following?

diff --git a/mm/madvise.c b/mm/madvise.c
index c0370d9b4e23..173b0e5308b5 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -817,6 +817,11 @@ static int madvise_free_single_vma(struct
madvise_behavior *madv_behavior)
        range.end = min(vma->vm_end, end_addr);
        if (range.end <= vma->vm_start)
                return -EINVAL;
+
+       /* All folios in the VM_DROPPABLE VMA are already lazyfree */
+       if (vma->vm_flags & VM_DROPPABLE)
+               return 0;
+
        mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
                                range.start, range.end);

Thanks
Barry
Re: [PATCH mm-new v8 4/4] mm: khugepaged: skip lazy-free folios
Posted by David Hildenbrand (Arm) 1 month, 1 week ago
On 2/23/26 21:08, Barry Song wrote:
> On Mon, Feb 23, 2026 at 9:16 PM David Hildenbrand (Arm)
> <david@kernel.org> wrote:
>>
>> On 2/21/26 14:38, Vernon Yang wrote:
>>>
>>> In folio_add_new_anon_rmap(), we know that the vma has the VM_DROPPABLE
>>> attribute, which is the root reason why Collapsing maintains that property.
>>> The above commit log clearly states "all folios in VM_DROPPABLE are lazyfree"
>>>                                                  ^^^^^^^^^^^^^^^
>>> (the "if" is redundant and should be removed), not "all folios are lazyfree".
>>
>>
>> Exactly. folio_add_new_anon_rmap() makes sure that all folios (except
>> the shared zero folios ;) ) in VM_DROPPABLE are lazyfree.
>>
>> In fact, MADV_FREE should be a NOP on VM_DROPPABLE, as
>> folio_mark_lazyfree() doesn't do anything.
>>
> 
> Maybe we could do something like the following?
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index c0370d9b4e23..173b0e5308b5 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -817,6 +817,11 @@ static int madvise_free_single_vma(struct
> madvise_behavior *madv_behavior)
>         range.end = min(vma->vm_end, end_addr);
>         if (range.end <= vma->vm_start)
>                 return -EINVAL;
> +
> +       /* All folios in the VM_DROPPABLE VMA are already lazyfree */
> +       if (vma->vm_flags & VM_DROPPABLE)
> +               return 0;

We could, but it feels like optimizing for a case that likely nobody
triggers :)

-- 
Cheers,

David