[v4] mm: khugepaged cleanups and mTHP prerequisites

[PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites

Posted by Nico Pache 1 week, 1 day ago

MAINTAINER NOTE: This is based on mm-unstable with the corresponding
patches reverted then reapplied.

The following series contains cleanups and prerequisites for my work on
khugepaged mTHP support [1]. These have been separated out to ease review.

The first patch in the series refactors the page-fault folio-to-PTE mapping
into helpers that follow the convention defined by map_anon_folio_pmd_(no)pf().
This not only cleans up the current implementation of do_anonymous_page(),
but also allows the helpers to be reused later in the khugepaged mTHP
implementation.
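The pf/nopf split can be modelled outside the kernel. The following is a
minimal userspace sketch, not the series' code: mock_mm, mock_folio,
mthp_fault_alloc and the mock_map_folio_*() functions are hypothetical
stand-ins. As in map_anon_folio_pte_pf() in the v4 diff later in this
thread, the nopf variant only installs the mapping, and the pf variant
wraps it with the fault-only accounting (the MM_ANONPAGES counter and the
mTHP fault-allocation stat).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real structures. */
struct mock_mm {
	long anon_pages;		/* models the MM_ANONPAGES counter */
};

struct mock_folio {
	unsigned int order;		/* folio covers 1 << order base pages */
	bool mapped;
};

/* Models MTHP_STAT_ANON_FAULT_ALLOC, one slot per folio order. */
static long mthp_fault_alloc[32];

/* "nopf" variant: only installs the mapping, no fault accounting. */
static void mock_map_folio_nopf(struct mock_folio *folio)
{
	folio->mapped = true;
}

/* "pf" variant: the fault path is nopf plus fault-only accounting. */
static void mock_map_folio_pf(struct mock_mm *mm, struct mock_folio *folio)
{
	const unsigned int order = folio->order;

	assert(order < 32);		/* keep the stats index in bounds */
	mock_map_folio_nopf(folio);
	/* 1L << order: the shift is evaluated as long, like the counter */
	mm->anon_pages += 1L << order;
	mthp_fault_alloc[order]++;
}
```

Keeping the accounting out of the nopf variant is presumably what lets a
non-fault caller, such as the planned khugepaged mTHP collapse path, reuse
the mapping step without counting a page fault.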

The second patch adds a small is_pmd_order() helper to check whether an
order is the PMD order. This check is open-coded in a number of places;
the patch cleans these up, and the helper will be used more in the
khugepaged mTHP work. The third patch adds a small define,
KHUGEPAGED_MAX_PTES_LIMIT, for (HPAGE_PMD_NR - 1), which is used often
across the khugepaged code.
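As a rough illustration of the two small additions, here is a sketch
assuming 4 KiB base pages and 2 MiB PMDs (so HPAGE_PMD_ORDER is 9 and
HPAGE_PMD_NR is 512, as on a typical x86_64 configuration). The names
is_pmd_order() and KHUGEPAGED_MAX_PTES_LIMIT come from the series; the
bodies and the local HPAGE_* definitions shown here are assumptions, not
copied from the patches.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumption: 4 KiB base pages and 2 MiB PMDs. In the kernel these
 * values come from the architecture headers, not from local defines.
 */
#define HPAGE_PMD_ORDER	9
#define HPAGE_PMD_NR	(1 << HPAGE_PMD_ORDER)

static_assert(HPAGE_PMD_NR == 512, "sketch assumes 4 KiB pages, 2 MiB PMDs");

/* Sketch of the helper: replaces open-coded "order == HPAGE_PMD_ORDER". */
static inline bool is_pmd_order(unsigned int order)
{
	return order == HPAGE_PMD_ORDER;
}

/* Sketch of the define: one less than the number of PTEs in a PMD range. */
#define KHUGEPAGED_MAX_PTES_LIMIT	(HPAGE_PMD_NR - 1)
```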

The fourth and fifth patches come from the khugepaged mTHP patchset [1].
They rename the hpage_collapse_* function prefixes to collapse_*, and
unify khugepaged and madvise_collapse via a new collapse_single_pmd()
function.
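The unified entry point can be sketched as follows. This is a userspace
model under stated assumptions, not the patch: the mock_* types, the
MOCK_SCAN_* results and the stub scanners are hypothetical. It mirrors the
shape of collapse_single_pmd() visible in the v4 diff later in this
thread: assert that mmap_lock is held, hand anonymous VMAs to the PMD
scanner, and for file-backed VMAs drop the lock first and report that
through *lock_dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock types; the real code uses mm_struct, vm_area_struct
 * and enum scan_result. */
enum mock_result { MOCK_SCAN_SUCCEED, MOCK_SCAN_FAIL };

struct mock_mm { bool mmap_locked; };
struct mock_vma { struct mock_mm *mm; bool anonymous; };

static void mock_mmap_assert_locked(struct mock_mm *mm)
{
	assert(mm->mmap_locked);
}

static void mock_mmap_read_unlock(struct mock_mm *mm)
{
	mm->mmap_locked = false;
}

/*
 * Stub anon scanner. The real one may collapse, which returns with
 * mmap_lock released, and then sets *lock_dropped. This stub never
 * collapses, so it leaves the lock held and the flag untouched.
 */
static enum mock_result mock_scan_anon(struct mock_vma *vma, bool *lock_dropped)
{
	(void)vma;
	(void)lock_dropped;
	return MOCK_SCAN_SUCCEED;
}

/* Stub file scanner: runs without mmap_lock held. */
static enum mock_result mock_scan_file(struct mock_vma *vma)
{
	assert(!vma->mm->mmap_locked);
	return MOCK_SCAN_SUCCEED;
}

/* Single entry point shared by the khugepaged and madvise paths. */
static enum mock_result mock_collapse_single_pmd(struct mock_vma *vma,
						 bool *lock_dropped)
{
	mock_mmap_assert_locked(vma->mm);	/* both callers hold the lock */

	if (vma->anonymous)
		return mock_scan_anon(vma, lock_dropped);

	/* File-backed path: drop mmap_lock first and tell the caller. */
	mock_mmap_read_unlock(vma->mm);
	*lock_dropped = true;
	return mock_scan_file(vma);
}
```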

Patch 1:     refactor do_anonymous_page into map_anon_folio_pte_(no)pf
Patch 2:     add is_pmd_order helper
Patch 3:     Add define for (HPAGE_PMD_NR - 1)
Patch 4:     Refactor/rename hpage_collapse
Patch 5:     Refactoring to combine madvise_collapse and khugepaged

Testing:
- Built for x86_64, aarch64, ppc64le, and s390x
- Ran the test suites provided by the kernel-tests project on all arches
- Ran the mm selftests

V4 Changes:
 - added RB and SB tags
 - Patch1: commit message cleanup/additions
 - Patch1: constify two variables, and change 1<<order to 1L<<..
 - Patch1: change zero-page read path to use update_mmu_cache variant
 - Patch5: remove dead code switch statement (SCAN_PTE_MAPPED_HUGEPAGE)
 - Patch5: remove local mmap_locked from madvise_collapse()
 - Patch5: rename mmap_locked to lock_dropped in ..scan_mm_slot() and
   invert the logic. The madvise|khugepaged code now shares the same
   naming convention across both functions.
 - Patch5: add assertion to collapse_single_pmd() so both madvise_collapse
   and khugepaged assert the lock.
 - Patch5: Convert one of the VM_BUG_ON()s to VM_WARN_ON_ONCE()
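
The Patch5 rename is intended to keep behaviour unchanged: the flag now
starts false and means "the lock was dropped" rather than starting true
and meaning "the lock is still held". A minimal model of the khugepaged
scan loop under both conventions (step_old(), step_new(), scan_old(),
scan_new() and drop_at are hypothetical, not kernel code) stops at the
same iteration either way.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for collapse_single_pmd(): step drop_at models
 * a collapse or the file path releasing mmap_lock. */
static void step_old(int i, int drop_at, bool *mmap_locked)
{
	assert(*mmap_locked);		/* each step starts with the lock held */
	if (i == drop_at)
		*mmap_locked = false;
}

static void step_new(int i, int drop_at, bool *lock_dropped)
{
	assert(!*lock_dropped);		/* each step starts with nothing dropped */
	if (i == drop_at)
		*lock_dropped = true;
}

/* Old convention: flag starts true, meaning "lock still held". */
static int scan_old(int n, int drop_at)
{
	int i;

	for (i = 0; i < n; i++) {
		bool mmap_locked = true;

		step_old(i, drop_at, &mmap_locked);
		if (!mmap_locked)
			break;		/* lock released: stop scanning this mm */
	}
	return i;
}

/* New convention: flag starts false, meaning "lock was dropped". */
static int scan_new(int n, int drop_at)
{
	int i;

	for (i = 0; i < n; i++) {
		bool lock_dropped = false;

		step_new(i, drop_at, &lock_dropped);
		if (lock_dropped)
			break;		/* same exit condition, inverted flag */
	}
	return i;
}
```

For example, scan_old(8, 3) and scan_new(8, 3) both return 3, and with
drop_at = -1 both run all 8 steps.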

V3 - https://lore.kernel.org/all/20260311211315.450947-1-npache@redhat.com
V2 - https://lore.kernel.org/all/20260226012929.169479-1-npache@redhat.com
V1 - https://lore.kernel.org/all/20260212021835.17755-1-npache@redhat.com

A big thanks to everyone who has reviewed, tested, and participated in
the development process.

[1] - https://lore.kernel.org/all/20260122192841.128719-1-npache@redhat.com
[2] - https://lore.kernel.org/all/7334b702-f6a0-4ccf-8ac6-8426a90d1846@kernel.org/
[3] - https://lore.kernel.org/all/25723c0f-c702-44ad-93e9-1056313680cd@kernel.org/
[4] - https://lore.kernel.org/all/81ff9caa-50f2-4951-8d82-2c8dcdf3db91@kernel.org/

Nico Pache (5):
  mm: consolidate anonymous folio PTE mapping into helpers
  mm: introduce is_pmd_order helper
  mm/khugepaged: define KHUGEPAGED_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1
  mm/khugepaged: rename hpage_collapse_* to collapse_*
  mm/khugepaged: unify khugepaged and madv_collapse with
    collapse_single_pmd()

 include/linux/huge_mm.h |   5 +
 include/linux/mm.h      |   4 +
 mm/huge_memory.c        |   2 +-
 mm/khugepaged.c         | 207 ++++++++++++++++++++--------------------
 mm/memory.c             |  63 ++++++++----
 mm/mempolicy.c          |   2 +-
 mm/mremap.c             |   2 +-
 mm/page_alloc.c         |   4 +-
 mm/shmem.c              |   3 +-
 9 files changed, 161 insertions(+), 131 deletions(-)

-- 
2.53.0

Re: [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites

Posted by Lorenzo Stoakes (Oracle) 1 week, 1 day ago

-cc old email

Gentle reminder to please send new stuff to ljs@kernel.org!

Thanks, Lorenzo

On Wed, Mar 25, 2026 at 05:40:17AM -0600, Nico Pache wrote:
> [...]

Re: [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites

Posted by Andrew Morton 1 week ago

On Wed, 25 Mar 2026 05:40:17 -0600 Nico Pache <npache@redhat.com> wrote:

> MAINTAINER NOTE: This is based on mm-unstable with the corresponding
> patches reverted then reapplied.

Unfortunately the update-in-place trick fooled AI review, which might
have been useful.  Oh well.  In retrospect we could have avoided that
by you asking me to drop v3 a couple of days before mailing out v4.

otoh, this series *does* apply to the mm-stable branch.  Roman, I
thought Sashiko was attempting that?

> The following series contains cleanups and prerequisites for my work on
> khugepaged mTHP support [1]. These have been separated out to ease review.

And boy that's a lot of reviewers!  Aren't you a lucky ducky ;)

> The first patch in the series refactors the page-fault folio-to-PTE mapping
> into helpers that follow the convention defined by map_anon_folio_pmd_(no)pf().
> This not only cleans up the current implementation of do_anonymous_page(),
> but also allows the helpers to be reused later in the khugepaged mTHP
> implementation.
> 
> The second patch adds a small is_pmd_order() helper to check whether an
> order is the PMD order. This check is open-coded in a number of places;
> the patch cleans these up, and the helper will be used more in the
> khugepaged mTHP work. The third patch adds a small define,
> KHUGEPAGED_MAX_PTES_LIMIT, for (HPAGE_PMD_NR - 1), which is used often
> across the khugepaged code.
> 
> The fourth and fifth patches come from the khugepaged mTHP patchset [1].
> They rename the hpage_collapse_* function prefixes to collapse_*, and
> unify khugepaged and madvise_collapse via a new collapse_single_pmd()
> function.
> 
> Patch 1:     refactor do_anonymous_page into map_anon_folio_pte_(no)pf
> Patch 2:     add is_pmd_order helper
> Patch 3:     Add define for (HPAGE_PMD_NR - 1)
> Patch 4:     Refactor/rename hpage_collapse
> Patch 5:     Refactoring to combine madvise_collapse and khugepaged
> 

Thanks, I updated mm.git's mm-unstable branch to this version.

> V4 Changes:
>  - added RB and SB tags
>  - Patch1: commit message cleanup/additions
>  - Patch1: constify two variables, and change 1<<order to 1L<<..
>  - Patch1: change zero-page read path to use update_mmu_cache variant
>  - Patch5: remove dead code switch statement (SCAN_PTE_MAPPED_HUGEPAGE)
>  - Patch5: remove local mmap_locked from madvise_collapse()
>  - Patch5: rename mmap_locked to lock_dropped in ..scan_mm_slot() and
>    invert the logic. The madvise|khugepaged code now shares the same
>    naming convention across both functions.
>  - Patch5: add assertion to collapse_single_pmd() so both madvise_collapse
>    and khugepaged assert the lock.
>  - Patch5: Convert one of the VM_BUG_ON()s to VM_WARN_ON_ONCE()

Below is how v4 altered mm.git:

 mm/khugepaged.c |   34 +++++++++++++++-------------------
 mm/memory.c     |   11 +++++------
 2 files changed, 20 insertions(+), 25 deletions(-)

--- a/mm/khugepaged.c~b
+++ a/mm/khugepaged.c
@@ -1250,7 +1250,7 @@ out_nolock:
 
 static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
 		struct vm_area_struct *vma, unsigned long start_addr,
-		bool *mmap_locked, struct collapse_control *cc)
+		bool *lock_dropped, struct collapse_control *cc)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
@@ -1425,7 +1425,7 @@ out_unmap:
 		result = collapse_huge_page(mm, start_addr, referenced,
 					    unmapped, cc);
 		/* collapse_huge_page will return with the mmap_lock released */
-		*mmap_locked = false;
+		*lock_dropped = true;
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, folio, referenced,
@@ -2422,7 +2422,7 @@ static enum scan_result collapse_scan_fi
  * the results.
  */
 static enum scan_result collapse_single_pmd(unsigned long addr,
-		struct vm_area_struct *vma, bool *mmap_locked,
+		struct vm_area_struct *vma, bool *lock_dropped,
 		struct collapse_control *cc)
 {
 	struct mm_struct *mm = vma->vm_mm;
@@ -2431,8 +2431,10 @@ static enum scan_result collapse_single_
 	struct file *file;
 	pgoff_t pgoff;
 
+	mmap_assert_locked(mm);
+
 	if (vma_is_anonymous(vma)) {
-		result = collapse_scan_pmd(mm, vma, addr, mmap_locked, cc);
+		result = collapse_scan_pmd(mm, vma, addr, lock_dropped, cc);
 		goto end;
 	}
 
@@ -2440,7 +2442,7 @@ static enum scan_result collapse_single_
 	pgoff = linear_page_index(vma, addr);
 
 	mmap_read_unlock(mm);
-	*mmap_locked = false;
+	*lock_dropped = true;
 retry:
 	result = collapse_scan_file(mm, addr, file, pgoff, cc);
 
@@ -2537,21 +2539,21 @@ static void collapse_scan_mm_slot(unsign
 		VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
 
 		while (khugepaged_scan.address < hend) {
-			bool mmap_locked = true;
+			bool lock_dropped = false;
 
 			cond_resched();
 			if (unlikely(collapse_test_exit_or_disable(mm)))
 				goto breakouterloop;
 
-			VM_BUG_ON(khugepaged_scan.address < hstart ||
+			VM_WARN_ON_ONCE(khugepaged_scan.address < hstart ||
 				  khugepaged_scan.address + HPAGE_PMD_SIZE >
 				  hend);
 
 			*result = collapse_single_pmd(khugepaged_scan.address,
-						      vma, &mmap_locked, cc);
+						      vma, &lock_dropped, cc);
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
-			if (!mmap_locked)
+			if (lock_dropped)
 				/*
 				 * We released mmap_lock so break loop.  Note
 				 * that we drop mmap_lock before all hugepage
@@ -2826,7 +2828,6 @@ int madvise_collapse(struct vm_area_stru
 	unsigned long hstart, hend, addr;
 	enum scan_result last_fail = SCAN_FAIL;
 	int thps = 0;
-	bool mmap_locked = true;
 
 	BUG_ON(vma->vm_start > start);
 	BUG_ON(vma->vm_end < end);
@@ -2849,10 +2850,10 @@ int madvise_collapse(struct vm_area_stru
 	for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
 		enum scan_result result = SCAN_FAIL;
 
-		if (!mmap_locked) {
+		if (*lock_dropped) {
 			cond_resched();
 			mmap_read_lock(mm);
-			mmap_locked = true;
+			*lock_dropped = false;
 			result = hugepage_vma_revalidate(mm, addr, false, &vma,
 							 cc);
 			if (result  != SCAN_SUCCEED) {
@@ -2862,12 +2863,8 @@ int madvise_collapse(struct vm_area_stru
 
 			hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
 		}
-		mmap_assert_locked(mm);
-
-		result = collapse_single_pmd(addr, vma, &mmap_locked, cc);
 
-		if (!mmap_locked)
-			*lock_dropped = true;
+		result = collapse_single_pmd(addr, vma, lock_dropped, cc);
 
 		switch (result) {
 		case SCAN_SUCCEED:
@@ -2876,7 +2873,6 @@ int madvise_collapse(struct vm_area_stru
 			break;
 		/* Whitelisted set of results where continuing OK */
 		case SCAN_NO_PTE_TABLE:
-		case SCAN_PTE_MAPPED_HUGEPAGE:
 		case SCAN_PTE_NON_PRESENT:
 		case SCAN_PTE_UFFD_WP:
 		case SCAN_LACK_REFERENCED_PAGE:
@@ -2897,7 +2893,7 @@ int madvise_collapse(struct vm_area_stru
 
 out_maybelock:
 	/* Caller expects us to hold mmap_lock on return */
-	if (!mmap_locked)
+	if (*lock_dropped)
 		mmap_read_lock(mm);
 out_nolock:
 	mmap_assert_locked(mm);
--- a/mm/memory.c~b
+++ a/mm/memory.c
@@ -5201,7 +5201,7 @@ void map_anon_folio_pte_nopf(struct foli
 		struct vm_area_struct *vma, unsigned long addr,
 		bool uffd_wp)
 {
-	unsigned int nr_pages = folio_nr_pages(folio);
+	const unsigned int nr_pages = folio_nr_pages(folio);
 	pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
 
 	entry = pte_sw_mkyoung(entry);
@@ -5221,10 +5221,10 @@ void map_anon_folio_pte_nopf(struct foli
 static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
 		struct vm_area_struct *vma, unsigned long addr, bool uffd_wp)
 {
-	unsigned int order = folio_order(folio);
+	const unsigned int order = folio_order(folio);
 
 	map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
-	add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1 << order);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1L << order);
 	count_mthp_stat(order, MTHP_STAT_ANON_FAULT_ALLOC);
 }
 
@@ -5239,7 +5239,7 @@ static vm_fault_t do_anonymous_page(stru
 	unsigned long addr = vmf->address;
 	struct folio *folio;
 	vm_fault_t ret = 0;
-	int nr_pages = 1;
+	int nr_pages;
 	pte_t entry;
 
 	/* File mapping without ->vm_ops ? */
@@ -5279,8 +5279,7 @@ static vm_fault_t do_anonymous_page(stru
 		set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
 
 		/* No need to invalidate - it was non-present before */
-		update_mmu_cache_range(vmf, vma, addr, vmf->pte,
-				       /*nr_pages=*/ 1);
+		update_mmu_cache(vma, addr, vmf->pte);
 		goto unlock;
 	}
 
_

Re: [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites

Posted by Roman Gushchin 1 week ago

Andrew Morton <akpm@linux-foundation.org> writes:

> On Wed, 25 Mar 2026 05:40:17 -0600 Nico Pache <npache@redhat.com> wrote:
>
>> MAINTAINER NOTE: This is based on mm-unstable with the corresponding
>> patches reverted then reapplied.
>
> Unfortunately the update-in-place trick fooled AI review, which might
> have been useful.  Oh well.  In retrospect we could have avoided that
> by you asking me to drop v3 a couple of days before mailing out v4.
>
> otoh, this series *does* apply to the mm-stable branch.  Roman, I
> thought Sashiko was attempting that?

It did, but for some reason it failed.
You can actually see which trees/branches it tried under the Baseline
section for each patchset. And yes, I need to improve it to log specific
SHAs, not only something like mm/mm-new. Now it's kinda hard to say why
it failed.

Re: [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites

Posted by Nico Pache 6 days, 23 hours ago

On Wed, Mar 25, 2026 at 10:45 PM Roman Gushchin
<roman.gushchin@linux.dev> wrote:
>
> Andrew Morton <akpm@linux-foundation.org> writes:
>
> > On Wed, 25 Mar 2026 05:40:17 -0600 Nico Pache <npache@redhat.com> wrote:
> >
> >> MAINTAINER NOTE: This is based on mm-unstable with the corresponding
> >> patches reverted then reapplied.
> >
> > Unfortunately the update-in-place trick fooled AI review, which might
> > have been useful.  Oh well.  In retrospect we could have avoided that
> > by you asking me to drop v3 a couple of days before mailing out v4.
> >
> > otoh, this series *does* apply to the mm-stable branch.  Roman, I
> > thought Sashiko was attempting that?
>
> It did, but for some reason it failed.
> You can actually see which trees/branches it tried under the Baseline
> section for each patchset. And yes, I need to improve it to log specific
> SHAs, not only something like mm/mm-new. Now it's kinda hard to say why
> it failed.

I don't think this was Sashiko's fault; this doesn't apply cleanly to
anything other than mm-unstable with my patches first reverted. I
first tried an mm-stable base but my code is now dependent on some
code in mm-unstable. As Andrew noted, I should have requested that he
pull the patches from mm-unstable before resending them.

-- Nico


Re: [PATCH mm-unstable v4 0/5] mm: khugepaged cleanups and mTHP prerequisites

Posted by Andrew Morton 6 days, 22 hours ago

On Thu, 26 Mar 2026 10:48:13 -0600 Nico Pache <npache@redhat.com> wrote:

> > It did, but for some reason it failed.
> > You can actually see which trees/branches it tried under the Baseline
> > section for each patchset. And yes, I need to improve it to log specific
> > SHAs, not only something like mm/mm-new. Now it's kinda hard to say why
> > it failed.
> 
> I don't think this was Sashiko's fault; this doesn't apply cleanly to
> anything other than mm-unstable with my patches first reverted. I
> first tried an mm-stable base but my code is now dependent on some
> code in mm-unstable. As Andrew noted, I should have requested that he
> pull the patches from mm-unstable before resending them.

Well, you'd be the first person to do this.  We're still figuring out
how to use this tool, taking it day-by-day.

It would be nice for you to have the opportunity - most authors are
appreciating the AI checking.  It just caused Brendan to describe his
own patch as "complete rubbish" ;)

But don't bust a gut over this - let's ease into it rather than aiming
for some possibly premature step transition.