MAINTAINER NOTE: This is based on mm-unstable with the corresponding
patches reverted then reapplied.
The following series contains cleanups and prerequisites for my work on
khugepaged mTHP support [1]. These have been separated out to ease review.
The first patch in the series refactors the page fault folio to pte mapping
and follows a similar convention as defined by map_anon_folio_pmd_(no)pf().
This not only cleans up the current implementation of do_anonymous_page(),
but will allow for reuse later in the khugepaged mTHP implementation.
The second patch adds a small is_pmd_order() helper to check if an order is
the PMD order. This check is open-coded in a number of places. This patch
aims to clean this up and will be used more in the khugepaged mTHP work.
The third patch also adds a small DEFINE for (HPAGE_PMD_NR - 1) which is
used often across the khugepaged code.
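As a rough illustration of the two small additions described above, the sketch below shows what such helpers might look like. The value of HPAGE_PMD_ORDER is an assumption (9, as on x86_64 with 4 KiB base pages), and the bodies are a guess at the shape of the helpers, not the code from the patches themselves.

```c
#include <stdbool.h>

/* Assumed values for illustration only: x86_64 with 4 KiB base pages. */
#define HPAGE_PMD_ORDER 9
#define HPAGE_PMD_NR (1UL << HPAGE_PMD_ORDER)

/* Name taken from the patch 3 subject line; replaces open-coded "HPAGE_PMD_NR - 1". */
#define KHUGEPAGED_MAX_PTES_LIMIT (HPAGE_PMD_NR - 1)

/* Replaces open-coded "order == HPAGE_PMD_ORDER" comparisons. */
static inline bool is_pmd_order(unsigned int order)
{
	return order == HPAGE_PMD_ORDER;
}
```

Under these assumed values, is_pmd_order(9) is true and KHUGEPAGED_MAX_PTES_LIMIT evaluates to 511.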
The fourth and fifth patch come from the khugepaged mTHP patchset [1].
These two patches include the rename of function prefixes, and the
unification of khugepaged and madvise_collapse via a new
collapse_single_pmd function.
Patch 1: refactor do_anonymous_page into map_anon_folio_pte_(no)pf
Patch 2: add is_pmd_order helper
Patch 3: Add define for (HPAGE_PMD_NR - 1)
Patch 4: Refactor/rename hpage_collapse
Patch 5: Refactoring to combine madvise_collapse and khugepaged
Testing:
- Built for x86_64, aarch64, ppc64le, and s390x
- ran all arches on test suites provided by the kernel-tests project
- selftests mm
V4 Changes:
- added RB and SB tags
- Patch1: commit message cleanup/additions
- Patch1: constify two variables, and change 1<<order to 1L<<..
- Patch1: change zero-page read path to use update_mmu_cache variant
- Patch5: remove dead code switch statement (SCAN_PTE_MAPPED_HUGEPAGE)
- Patch5: remove local mmap_locked from madvise_collapse()
- Patch5: rename mmap_locked to lock_dropped in ..scan_mm_slot() and
invert the logic. the madvise|khugepaged code now share the same
naming convention across both functions.
- Patch5: add assertion to collapse_single_pmd() so both madvise_collapse
and khugepaged assert the lock.
- Patch5: Convert one of the VM_BUG_ON's to VM_WARN_ON
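The lock_dropped convention from the Patch5 items above can be sketched in userspace as follows. Everything here is a simplified stand-in: struct fake_mm, the fake_* lock helpers, scan_one() and scan_range() are hypothetical names that only model who sets the flag and who retakes the lock. It is not the khugepaged code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for mm_struct: tracks only whether the lock is held. */
struct fake_mm {
	bool locked;
};

static void fake_read_lock(struct fake_mm *mm)   { mm->locked = true; }
static void fake_read_unlock(struct fake_mm *mm) { mm->locked = false; }

/*
 * Models a callee in the style of collapse_single_pmd(): it asserts the
 * lock on entry and reports through *lock_dropped if it released it.
 */
static void scan_one(struct fake_mm *mm, bool must_drop, bool *lock_dropped)
{
	assert(mm->locked);
	if (must_drop) {
		fake_read_unlock(mm);
		*lock_dropped = true;
	}
}

/* Models a caller loop: retake the lock only when the callee dropped it. */
static int scan_range(struct fake_mm *mm, const bool *must_drop, int n)
{
	bool lock_dropped = false;
	int relocks = 0;

	for (int i = 0; i < n; i++) {
		if (lock_dropped) {
			fake_read_lock(mm);
			lock_dropped = false;
			relocks++;
		}
		scan_one(mm, must_drop[i], &lock_dropped);
	}
	/* Return with the lock held, as the caller expects. */
	if (lock_dropped)
		fake_read_lock(mm);
	return relocks;
}
```

With the flag starting at false and only ever set to true by the callee, the caller's checks read as "if (lock_dropped)" rather than "if (!mmap_locked)", which is the inversion the changelog describes.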
V3 - https://lore.kernel.org/all/20260311211315.450947-1-npache@redhat.com
V2 - https://lore.kernel.org/all/20260226012929.169479-1-npache@redhat.com
V1 - https://lore.kernel.org/all/20260212021835.17755-1-npache@redhat.com
A big thanks to everyone that has reviewed, tested, and participated in
the development process.
[1] - https://lore.kernel.org/all/20260122192841.128719-1-npache@redhat.com
[2] - https://lore.kernel.org/all/7334b702-f6a0-4ccf-8ac6-8426a90d1846@kernel.org/
[3] - https://lore.kernel.org/all/25723c0f-c702-44ad-93e9-1056313680cd@kernel.org/
[4] - https://lore.kernel.org/all/81ff9caa-50f2-4951-8d82-2c8dcdf3db91@kernel.org/
Nico Pache (5):
mm: consolidate anonymous folio PTE mapping into helpers
mm: introduce is_pmd_order helper
mm/khugepaged: define KHUGEPAGED_MAX_PTES_LIMIT as HPAGE_PMD_NR - 1
mm/khugepaged: rename hpage_collapse_* to collapse_*
mm/khugepaged: unify khugepaged and madv_collapse with
collapse_single_pmd()
include/linux/huge_mm.h | 5 +
include/linux/mm.h | 4 +
mm/huge_memory.c | 2 +-
mm/khugepaged.c | 207 ++++++++++++++++++++--------------------
mm/memory.c | 63 ++++++++----
mm/mempolicy.c | 2 +-
mm/mremap.c | 2 +-
mm/page_alloc.c | 4 +-
mm/shmem.c | 3 +-
9 files changed, 161 insertions(+), 131 deletions(-)
--
2.53.0
-cc old email

Gentle reminder to please send new stuff to ljs@kernel.org!

Thanks, Lorenzo
On Wed, 25 Mar 2026 05:40:17 -0600 Nico Pache <npache@redhat.com> wrote:
> MAINTAINER NOTE: This is based on mm-unstable with the corresponding
> patches reverted then reapplied.
Unfortunately the update-in-place trick fooled AI review, which might
have been useful. Oh well. In retrospect we could have avoided that
by you asking me to drop v3 a couple of days before mailing out v4.
otoh, this series *does* apply to the mm-stable branch. Roman, I
thought Sashiko is attempting that?
> The following series contains cleanups and prerequisites for my work on
> khugepaged mTHP support [1]. These have been separated out to ease review.
And boy that's a lot of reviewers! Aren't you a lucky ducky ;)
> The first patch in the series refactors the page fault folio to pte mapping
> and follows a similar convention as defined by map_anon_folio_pmd_(no)pf().
> This not only cleans up the current implementation of do_anonymous_page(),
> but will allow for reuse later in the khugepaged mTHP implementation.
>
> The second patch adds a small is_pmd_order() helper to check if an order is
> the PMD order. This check is open-coded in a number of places. This patch
> aims to clean this up and will be used more in the khugepaged mTHP work.
> The third patch also adds a small DEFINE for (HPAGE_PMD_NR - 1) which is
> used often across the khugepaged code.
>
> The fourth and fifth patch come from the khugepaged mTHP patchset [1].
> These two patches include the rename of function prefixes, and the
> unification of khugepaged and madvise_collapse via a new
> collapse_single_pmd function.
>
> Patch 1: refactor do_anonymous_page into map_anon_folio_pte_(no)pf
> Patch 2: add is_pmd_order helper
> Patch 3: Add define for (HPAGE_PMD_NR - 1)
> Patch 4: Refactor/rename hpage_collapse
> Patch 5: Refactoring to combine madvise_collapse and khugepaged
>
Thanks, I updated mm.git's mm-unstable branch to this version.
> V4 Changes:
> - added RB and SB tags
> - Patch1: commit message cleanup/additions
> - Patch1: constify two variables, and change 1<<order to 1L<<..
> - Patch1: change zero-page read path to use update_mmu_cache variant
> - Patch5: remove dead code switch statement (SCAN_PTE_MAPPED_HUGEPAGE)
> - Patch5: remove local mmap_locked from madvise_collapse()
> - Patch5: rename mmap_locked to lock_dropped in ..scan_mm_slot() and
> invert the logic. the madvise|khugepaged code now share the same
> naming convention across both functions.
> - Patch5: add assertion to collapse_single_pmd() so both madvise_collapse
> and khugepaged assert the lock.
> - Patch5: Convert one of the VM_BUG_ON's to VM_WARN_ON
Below is how v4 altered mm.git:
mm/khugepaged.c | 34 +++++++++++++++-------------------
mm/memory.c | 11 +++++------
2 files changed, 20 insertions(+), 25 deletions(-)
--- a/mm/khugepaged.c~b
+++ a/mm/khugepaged.c
@@ -1250,7 +1250,7 @@ out_nolock:
static enum scan_result collapse_scan_pmd(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long start_addr,
- bool *mmap_locked, struct collapse_control *cc)
+ bool *lock_dropped, struct collapse_control *cc)
{
pmd_t *pmd;
pte_t *pte, *_pte;
@@ -1425,7 +1425,7 @@ out_unmap:
result = collapse_huge_page(mm, start_addr, referenced,
unmapped, cc);
/* collapse_huge_page will return with the mmap_lock released */
- *mmap_locked = false;
+ *lock_dropped = true;
}
out:
trace_mm_khugepaged_scan_pmd(mm, folio, referenced,
@@ -2422,7 +2422,7 @@ static enum scan_result collapse_scan_fi
* the results.
*/
static enum scan_result collapse_single_pmd(unsigned long addr,
- struct vm_area_struct *vma, bool *mmap_locked,
+ struct vm_area_struct *vma, bool *lock_dropped,
struct collapse_control *cc)
{
struct mm_struct *mm = vma->vm_mm;
@@ -2431,8 +2431,10 @@ static enum scan_result collapse_single_
struct file *file;
pgoff_t pgoff;
+ mmap_assert_locked(mm);
+
if (vma_is_anonymous(vma)) {
- result = collapse_scan_pmd(mm, vma, addr, mmap_locked, cc);
+ result = collapse_scan_pmd(mm, vma, addr, lock_dropped, cc);
goto end;
}
@@ -2440,7 +2442,7 @@ static enum scan_result collapse_single_
pgoff = linear_page_index(vma, addr);
mmap_read_unlock(mm);
- *mmap_locked = false;
+ *lock_dropped = true;
retry:
result = collapse_scan_file(mm, addr, file, pgoff, cc);
@@ -2537,21 +2539,21 @@ static void collapse_scan_mm_slot(unsign
VM_BUG_ON(khugepaged_scan.address & ~HPAGE_PMD_MASK);
while (khugepaged_scan.address < hend) {
- bool mmap_locked = true;
+ bool lock_dropped = false;
cond_resched();
if (unlikely(collapse_test_exit_or_disable(mm)))
goto breakouterloop;
- VM_BUG_ON(khugepaged_scan.address < hstart ||
+ VM_WARN_ON_ONCE(khugepaged_scan.address < hstart ||
khugepaged_scan.address + HPAGE_PMD_SIZE >
hend);
*result = collapse_single_pmd(khugepaged_scan.address,
- vma, &mmap_locked, cc);
+ vma, &lock_dropped, cc);
/* move to next address */
khugepaged_scan.address += HPAGE_PMD_SIZE;
- if (!mmap_locked)
+ if (lock_dropped)
/*
* We released mmap_lock so break loop. Note
* that we drop mmap_lock before all hugepage
@@ -2826,7 +2828,6 @@ int madvise_collapse(struct vm_area_stru
unsigned long hstart, hend, addr;
enum scan_result last_fail = SCAN_FAIL;
int thps = 0;
- bool mmap_locked = true;
BUG_ON(vma->vm_start > start);
BUG_ON(vma->vm_end < end);
@@ -2849,10 +2850,10 @@ int madvise_collapse(struct vm_area_stru
for (addr = hstart; addr < hend; addr += HPAGE_PMD_SIZE) {
enum scan_result result = SCAN_FAIL;
- if (!mmap_locked) {
+ if (*lock_dropped) {
cond_resched();
mmap_read_lock(mm);
- mmap_locked = true;
+ *lock_dropped = false;
result = hugepage_vma_revalidate(mm, addr, false, &vma,
cc);
if (result != SCAN_SUCCEED) {
@@ -2862,12 +2863,8 @@ int madvise_collapse(struct vm_area_stru
hend = min(hend, vma->vm_end & HPAGE_PMD_MASK);
}
- mmap_assert_locked(mm);
-
- result = collapse_single_pmd(addr, vma, &mmap_locked, cc);
- if (!mmap_locked)
- *lock_dropped = true;
+ result = collapse_single_pmd(addr, vma, lock_dropped, cc);
switch (result) {
case SCAN_SUCCEED:
@@ -2876,7 +2873,6 @@ int madvise_collapse(struct vm_area_stru
break;
/* Whitelisted set of results where continuing OK */
case SCAN_NO_PTE_TABLE:
- case SCAN_PTE_MAPPED_HUGEPAGE:
case SCAN_PTE_NON_PRESENT:
case SCAN_PTE_UFFD_WP:
case SCAN_LACK_REFERENCED_PAGE:
@@ -2897,7 +2893,7 @@ int madvise_collapse(struct vm_area_stru
out_maybelock:
/* Caller expects us to hold mmap_lock on return */
- if (!mmap_locked)
+ if (*lock_dropped)
mmap_read_lock(mm);
out_nolock:
mmap_assert_locked(mm);
--- a/mm/memory.c~b
+++ a/mm/memory.c
@@ -5201,7 +5201,7 @@ void map_anon_folio_pte_nopf(struct foli
struct vm_area_struct *vma, unsigned long addr,
bool uffd_wp)
{
- unsigned int nr_pages = folio_nr_pages(folio);
+ const unsigned int nr_pages = folio_nr_pages(folio);
pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
entry = pte_sw_mkyoung(entry);
@@ -5221,10 +5221,10 @@ void map_anon_folio_pte_nopf(struct foli
static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
struct vm_area_struct *vma, unsigned long addr, bool uffd_wp)
{
- unsigned int order = folio_order(folio);
+ const unsigned int order = folio_order(folio);
map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
- add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1 << order);
+ add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1L << order);
count_mthp_stat(order, MTHP_STAT_ANON_FAULT_ALLOC);
}
@@ -5239,7 +5239,7 @@ static vm_fault_t do_anonymous_page(stru
unsigned long addr = vmf->address;
struct folio *folio;
vm_fault_t ret = 0;
- int nr_pages = 1;
+ int nr_pages;
pte_t entry;
/* File mapping without ->vm_ops ? */
@@ -5279,8 +5279,7 @@ static vm_fault_t do_anonymous_page(stru
set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
/* No need to invalidate - it was non-present before */
- update_mmu_cache_range(vmf, vma, addr, vmf->pte,
- /*nr_pages=*/ 1);
+ update_mmu_cache(vma, addr, vmf->pte);
goto unlock;
}
_
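The 1 << order to 1L << order change in the mm/memory.c hunk above concerns the type of the shift expression. A minimal sketch, assuming the counter argument is a long as in add_mm_counter(); pages_for_order() is a hypothetical helper used only for illustration, not a kernel function:

```c
/*
 * With a plain "1", the shift is evaluated as int and only converted to
 * long afterwards. With "1L", the whole expression is long from the
 * start, matching a long parameter.
 */
static long pages_for_order(unsigned int order)
{
	return 1L << order;
}

/* Size in bytes of each form of the expression, for comparison. */
static unsigned long sizeof_int_shift(void)  { return sizeof(1 << 0); }
static unsigned long sizeof_long_shift(void) { return sizeof(1L << 0); }
```

For the orders a folio can take, both forms yield the same value; the 1L form simply keeps the arithmetic in the type the callee expects.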
Andrew Morton <akpm@linux-foundation.org> writes:

> On Wed, 25 Mar 2026 05:40:17 -0600 Nico Pache <npache@redhat.com> wrote:
>
>> MAINTAINER NOTE: This is based on mm-unstable with the corresponding
>> patches reverted then reapplied.
>
> Unfortunately the update-in-place trick fooled AI review, which might
> have been useful. Oh well. In retrospect we could have avoided that
> by you asking me to drop v3 a couple of days before mailing out v4.
>
> otoh, this series *does* apply to the mm-stable branch. Roman, I
> thought Sashiko is attempting that?

It did, but for some reason it failed.
You can actually see which trees/branches it tried under the Baseline
section for each patchset. And yes, I need to improve it to log specific
sha's, not only something like mm/mm-new. Now it's kinda hard to say why
it failed.
On Wed, Mar 25, 2026 at 10:45 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> Andrew Morton <akpm@linux-foundation.org> writes:
>
> > On Wed, 25 Mar 2026 05:40:17 -0600 Nico Pache <npache@redhat.com> wrote:
> >
> >> MAINTAINER NOTE: This is based on mm-unstable with the corresponding
> >> patches reverted then reapplied.
> >
> > Unfortunately the update-in-place trick fooled AI review, which might
> > have been useful. Oh well. In retrospect we could have avoided that
> > by you asking me to drop v3 a couple of days before mailing out v4.
> >
> > otoh, this series *does* apply to the mm-stable branch. Roman, I
> > thought Sashiko is attempting that?
>
> It did, but for some reason it failed.
> You can actually see which trees/branches it tried under the Baseline
> section for each patchset. And yes, I need to improve it to log specific
> sha's, not only something like mm/mm-new. Now it's kinda hard to say why
> it failed.

I don't think this was Sashiko's fault; this doesn't apply cleanly to
anything other than mm-unstable with my patches first reverted. I
first tried an mm-stable base but my code is now dependent on some
code in mm-unstable. As Andrew noted, I should have requested that he
pull the patches from mm-unstable before resending them.

-- Nico
On Thu, 26 Mar 2026 10:48:13 -0600 Nico Pache <npache@redhat.com> wrote:

> > It did, but for some reason it failed.
> > You can actually see which trees/branches it tried under the Baseline
> > section for each patchset. And yes, I need to improve it to log specific
> > sha's, not only something like mm/mm-new. Now it's kinda hard to say why
> > it failed.
>
> I don't think this was Sashiko's fault; this doesn't apply cleanly to
> anything other than mm-unstable with my patches first reverted. I
> first tried an mm-stable base but my code is now dependent on some
> code in mm-unstable. As Andrew noted, I should have requested that he
> pull the patches from mm-unstable before resending them.

Well, you'd be the first person to do this. We're still figuring out
how to use this tool, taking it day-by-day.

It would be nice for you to have the opportunity - most authors are
appreciating the AI checking. It just caused Brendan to describe his own
patch as "complete rubbish" ;)

But don't bust a gut over this - let's ease into it rather than aiming
for some possibly premature step transition.