The anonymous page fault handler in do_anonymous_page() open-codes the
sequence to map a newly allocated anonymous folio at the PTE level:
- construct the PTE entry
- add rmap
- add to LRU
- set the PTEs
- update the MMU cache.

Introduce two helpers to consolidate this duplicated logic, mirroring the
existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:

map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
references, adds the anon rmap and LRU entries, sets the PTEs and
updates the MMU cache. It also handles the uffd_wp marker that can
occur in the pf variant.

map_anon_folio_pte_pf(): extends the nopf variant with the MM_ANONPAGES
counter update and the mTHP fault allocation statistics for the page
fault path.

The zero-page read path in do_anonymous_page() is also untangled from the
shared setpte label, since it does not allocate a folio and should not
share the same mapping sequence as the write path. Pass nr_pages = 1
explicitly rather than relying on the variable, which makes it clearer
that this path operates on the zero page only.

This refactoring will also help reduce code duplication between mm/memory.c
and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
mapping that can be reused by future callers.
Signed-off-by: Nico Pache <npache@redhat.com>
---
include/linux/mm.h | 4 ++++
mm/memory.c | 56 ++++++++++++++++++++++++++++++----------------
2 files changed, 41 insertions(+), 19 deletions(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f8a8fd47399c..c3aa1f51e020 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4916,4 +4916,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
void snapshot_page(struct page_snapshot *ps, const struct page *page);
+void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
+ struct vm_area_struct *vma, unsigned long addr,
+ bool uffd_wp);
+
#endif /* _LINUX_MM_H */
diff --git a/mm/memory.c b/mm/memory.c
index 8c19af97f0a0..61c2277c9d9f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5211,6 +5211,35 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
}
+
+void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
+ struct vm_area_struct *vma, unsigned long addr,
+ bool uffd_wp)
+{
+ pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
+ unsigned int nr_pages = folio_nr_pages(folio);
+
+ entry = maybe_mkwrite(pte_mkdirty(entry), vma);
+ if (uffd_wp)
+ entry = pte_mkuffd_wp(entry);
+
+ folio_ref_add(folio, nr_pages - 1);
+ folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
+ folio_add_lru_vma(folio, vma);
+ set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
+ update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
+}
+
+static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
+ struct vm_area_struct *vma, unsigned long addr,
+ unsigned int nr_pages, bool uffd_wp)
+{
+ map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
+ add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
+ count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
+}
+
+
/*
* We enter with non-exclusive mmap_lock (to exclude vma changes,
* but allow concurrent faults), and pte mapped but not yet locked.
@@ -5257,7 +5286,13 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
pte_unmap_unlock(vmf->pte, vmf->ptl);
return handle_userfault(vmf, VM_UFFD_MISSING);
}
- goto setpte;
+ if (vmf_orig_pte_uffd_wp(vmf))
+ entry = pte_mkuffd_wp(entry);
+ set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
+
+ /* No need to invalidate - it was non-present before */
+ update_mmu_cache_range(vmf, vma, addr, vmf->pte, /*nr_pages=*/ 1);
+ goto unlock;
}
/* Allocate our own private page. */
@@ -5281,11 +5316,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
*/
__folio_mark_uptodate(folio);
- entry = folio_mk_pte(folio, vma->vm_page_prot);
- entry = pte_sw_mkyoung(entry);
- if (vma->vm_flags & VM_WRITE)
- entry = pte_mkwrite(pte_mkdirty(entry), vma);
-
vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
if (!vmf->pte)
goto release;
@@ -5307,19 +5337,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
folio_put(folio);
return handle_userfault(vmf, VM_UFFD_MISSING);
}
-
- folio_ref_add(folio, nr_pages - 1);
- add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
- count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
- folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
- folio_add_lru_vma(folio, vma);
-setpte:
- if (vmf_orig_pte_uffd_wp(vmf))
- entry = pte_mkuffd_wp(entry);
- set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
-
- /* No need to invalidate - it was non-present before */
- update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
+ map_anon_folio_pte_pf(folio, vmf->pte, vma, addr, nr_pages, vmf_orig_pte_uffd_wp(vmf));
unlock:
if (vmf->pte)
pte_unmap_unlock(vmf->pte, vmf->ptl);
--
2.53.0
On 11 Feb 2026, at 21:18, Nico Pache wrote:
> The anonymous page fault handler in do_anonymous_page() open-codes the
> sequence to map a newly allocated anonymous folio at the PTE level:
> - construct the PTE entry
> - add rmap
> - add to LRU
> - set the PTEs
> - update the MMU cache.
>
> Introduce two helpers to consolidate this duplicated logic, mirroring the
> existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
>
> map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
> references, adds anon rmap and LRU. This function also handles the
> uffd_wp that can occur in the pf variant.
>
> map_anon_folio_pte_pf(): extends the nopf variant to handle MM_ANONPAGES
> counter updates, and mTHP fault allocation statistics for the page fault
> path.
>
> The zero-page read path in do_anonymous_page() is also untangled from the
> shared setpte label, since it does not allocate a folio and should not
> share the same mapping sequence as the write path. Make nr_pages = 1
> rather than relying on the variable. This makes it more clear that we
> are operating on the zero page only.
>
> This refactoring will also help reduce code duplication between mm/memory.c
> and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
> mapping that can be reused by future callers.
>
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
> include/linux/mm.h | 4 ++++
> mm/memory.c | 56 ++++++++++++++++++++++++++++++----------------
> 2 files changed, 41 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f8a8fd47399c..c3aa1f51e020 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4916,4 +4916,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
>
> void snapshot_page(struct page_snapshot *ps, const struct page *page);
>
> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> + struct vm_area_struct *vma, unsigned long addr,
> + bool uffd_wp);
> +
> #endif /* _LINUX_MM_H */
> diff --git a/mm/memory.c b/mm/memory.c
> index 8c19af97f0a0..61c2277c9d9f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5211,6 +5211,35 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
> }
>
> +
> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> + struct vm_area_struct *vma, unsigned long addr,
> + bool uffd_wp)
> +{
> + pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
> + unsigned int nr_pages = folio_nr_pages(folio);
> +
> + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> + if (uffd_wp)
> + entry = pte_mkuffd_wp(entry);
> +
> + folio_ref_add(folio, nr_pages - 1);
> + folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> + folio_add_lru_vma(folio, vma);
> + set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
> + update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
Copy the comment
/* No need to invalidate - it was non-present before */
above it please.
> +}
> +
> +static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
> + struct vm_area_struct *vma, unsigned long addr,
> + unsigned int nr_pages, bool uffd_wp)
> +{
> + map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
> + add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> + count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> +}
> +
> +
> /*
> * We enter with non-exclusive mmap_lock (to exclude vma changes,
> * but allow concurrent faults), and pte mapped but not yet locked.
> @@ -5257,7 +5286,13 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> return handle_userfault(vmf, VM_UFFD_MISSING);
> }
> - goto setpte;
> + if (vmf_orig_pte_uffd_wp(vmf))
> + entry = pte_mkuffd_wp(entry);
> + set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
entry is only used in this if statement, you can move its declaration inside.
> +
> + /* No need to invalidate - it was non-present before */
> + update_mmu_cache_range(vmf, vma, addr, vmf->pte, /*nr_pages=*/ 1);
> + goto unlock;
> }
>
> /* Allocate our own private page. */
> @@ -5281,11 +5316,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> */
> __folio_mark_uptodate(folio);
>
> - entry = folio_mk_pte(folio, vma->vm_page_prot);
> - entry = pte_sw_mkyoung(entry);
It is removed, can you explain why?
> - if (vma->vm_flags & VM_WRITE)
> - entry = pte_mkwrite(pte_mkdirty(entry), vma);
OK, this becomes maybe_mkwrite(pte_mkdirty(entry), vma).
> -
The above code is moved into map_anon_folio_pte_nopf(), thus executed
later than before the change. folio, vma->vm_flags, and vma->vm_page_prot
are not changed between, so there should be no functional change.
But it is better to explain it in the commit message to make review easier.
> vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
> if (!vmf->pte)
> goto release;
> @@ -5307,19 +5337,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> folio_put(folio);
> return handle_userfault(vmf, VM_UFFD_MISSING);
> }
> -
> - folio_ref_add(folio, nr_pages - 1);
> - add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> - count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
These counter updates are moved after folio_add_new_anon_rmap(),
mirroring map_anon_folio_pmd_pf()’s order. Looks good to me.
> - folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> - folio_add_lru_vma(folio, vma);
> -setpte:
> - if (vmf_orig_pte_uffd_wp(vmf))
> - entry = pte_mkuffd_wp(entry);
This is moved above folio_ref_add() in map_anon_folio_pte_nopf(), but
no functional change is expected.
> - set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
> -
> - /* No need to invalidate - it was non-present before */
> - update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
> + map_anon_folio_pte_pf(folio, vmf->pte, vma, addr, nr_pages, vmf_orig_pte_uffd_wp(vmf));
> unlock:
> if (vmf->pte)
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> --
> 2.53.0
3 things:
1. Copy the comment for update_mmu_cache_range() in map_anon_folio_pte_nopf().
2. Make pte_t entry local in zero-page handling.
3. Explain why entry = pte_sw_mkyoung(entry) is removed.
Thanks.
Best Regards,
Yan, Zi
On Thu, Feb 12, 2026 at 9:09 AM Zi Yan <ziy@nvidia.com> wrote:
>
> On 11 Feb 2026, at 21:18, Nico Pache wrote:
>
> > The anonymous page fault handler in do_anonymous_page() open-codes the
> > sequence to map a newly allocated anonymous folio at the PTE level:
> > - construct the PTE entry
> > - add rmap
> > - add to LRU
> > - set the PTEs
> > - update the MMU cache.
> >
> > Introduce two helpers to consolidate this duplicated logic, mirroring the
> > existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
> >
> > map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
> > references, adds anon rmap and LRU. This function also handles the
> > uffd_wp that can occur in the pf variant.
> >
> > map_anon_folio_pte_pf(): extends the nopf variant to handle MM_ANONPAGES
> > counter updates, and mTHP fault allocation statistics for the page fault
> > path.
> >
> > The zero-page read path in do_anonymous_page() is also untangled from the
> > shared setpte label, since it does not allocate a folio and should not
> > share the same mapping sequence as the write path. Make nr_pages = 1
> > rather than relying on the variable. This makes it more clear that we
> > are operating on the zero page only.
> >
> > This refactoring will also help reduce code duplication between mm/memory.c
> > and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
> > mapping that can be reused by future callers.
> >
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> > include/linux/mm.h | 4 ++++
> > mm/memory.c | 56 ++++++++++++++++++++++++++++++----------------
> > 2 files changed, 41 insertions(+), 19 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index f8a8fd47399c..c3aa1f51e020 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4916,4 +4916,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
> >
> > void snapshot_page(struct page_snapshot *ps, const struct page *page);
> >
> > +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + bool uffd_wp);
> > +
> > #endif /* _LINUX_MM_H */
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 8c19af97f0a0..61c2277c9d9f 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5211,6 +5211,35 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> > return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
> > }
> >
> > +
> > +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + bool uffd_wp)
> > +{
> > + pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
> > + unsigned int nr_pages = folio_nr_pages(folio);
> > +
> > + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> > + if (uffd_wp)
> > + entry = pte_mkuffd_wp(entry);
> > +
> > + folio_ref_add(folio, nr_pages - 1);
> > + folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> > + folio_add_lru_vma(folio, vma);
> > + set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
> > + update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
>
> Copy the comment
> /* No need to invalidate - it was non-present before */
> above it please.
Good call, thank you!
>
> > +}
> > +
> > +static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + unsigned int nr_pages, bool uffd_wp)
> > +{
> > + map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
> > + add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> > + count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> > +}
> > +
> > +
> > /*
> > * We enter with non-exclusive mmap_lock (to exclude vma changes,
> > * but allow concurrent faults), and pte mapped but not yet locked.
> > @@ -5257,7 +5286,13 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> > pte_unmap_unlock(vmf->pte, vmf->ptl);
> > return handle_userfault(vmf, VM_UFFD_MISSING);
> > }
> > - goto setpte;
> > + if (vmf_orig_pte_uffd_wp(vmf))
> > + entry = pte_mkuffd_wp(entry);
> > + set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
>
> entry is only used in this if statement, you can move its declaration inside.
Ack!
>
> > +
> > + /* No need to invalidate - it was non-present before */
> > + update_mmu_cache_range(vmf, vma, addr, vmf->pte, /*nr_pages=*/ 1);
> > + goto unlock;
> > }
> >
> > /* Allocate our own private page. */
> > @@ -5281,11 +5316,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> > */
> > __folio_mark_uptodate(folio);
> >
> > - entry = folio_mk_pte(folio, vma->vm_page_prot);
> > - entry = pte_sw_mkyoung(entry);
>
> It is removed, can you explain why?
Thanks for catching that (as others have too), I will add it back and
run my testing again to make sure everything is still OK. As Joshua
pointed out, it may only affect MIPS, hence no issues in my testing.
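
For background: the generic pte_sw_mkyoung() fallback is a no-op, and only
architectures that manage the accessed bit in software (MIPS being the
notable case) override it to mark the PTE young. A rough sketch of the
generic fallback, approximated from include/linux/pgtable.h rather than
quoted from the tree:

	/*
	 * Generic fallback: hardware maintains the accessed bit, so there
	 * is nothing to do here. Architectures with a software-managed
	 * accessed bit supply their own definition that sets the young bit.
	 */
	static inline pte_t pte_sw_mkyoung(pte_t pte)
	{
		return pte;
	}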
>
> > - if (vma->vm_flags & VM_WRITE)
> > - entry = pte_mkwrite(pte_mkdirty(entry), vma);
>
> OK, this becomes maybe_mkwrite(pte_mkdirty(entry), vma).
Yes, upon further investigation this does seem to slightly change the behavior.
pte_mkdirty() is now being called unconditionally rather than guarded
by the VM_WRITE flag. I noticed other callers in the kernel doing this too.
Is it OK to leave the pte_mkdirty() or should I go back to using
pte_mkwrite() with the VM_WRITE check guarding both mkwrite and mkdirty?
>
> > -
>
> The above code is moved into map_anon_folio_pte_nopf(), thus executed
> later than before the change. folio, vma->vm_flags, and vma->vm_page_prot
> are not changed between, so there should be no functional change.
> But it is better to explain it in the commit message to make review easier.
Will do! Thank you for confirming :) I am pretty sure we can make this
move without any functional change.
>
> > vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
> > if (!vmf->pte)
> > goto release;
> > @@ -5307,19 +5337,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> > folio_put(folio);
> > return handle_userfault(vmf, VM_UFFD_MISSING);
> > }
> > -
> > - folio_ref_add(folio, nr_pages - 1);
>
> > - add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> > - count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
>
> These counter updates are moved after folio_add_new_anon_rmap(),
> mirroring map_anon_folio_pmd_pf()’s order. Looks good to me.
>
> > - folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> > - folio_add_lru_vma(folio, vma);
> > -setpte:
>
> > - if (vmf_orig_pte_uffd_wp(vmf))
> > - entry = pte_mkuffd_wp(entry);
>
> This is moved above folio_ref_add() in map_anon_folio_pte_nopf(), but
> no functional change is expected.
>
> > - set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
> > -
> > - /* No need to invalidate - it was non-present before */
> > - update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
> > + map_anon_folio_pte_pf(folio, vmf->pte, vma, addr, nr_pages, vmf_orig_pte_uffd_wp(vmf));
> > unlock:
> > if (vmf->pte)
> > pte_unmap_unlock(vmf->pte, vmf->ptl);
> > --
> > 2.53.0
>
> 3 things:
> 1. Copy the comment for update_mmu_cache_range() in map_anon_folio_pte_nopf().
> 2. Make pte_t entry local in zero-page handling.
> 3. Explain why entry = pte_sw_mkyoung(entry) is removed.
>
> Thanks.
Thanks for the review :) I'll fix the issues stated above!
-- Nico
>
>
> Best Regards,
> Yan, Zi
>
<snip>
>>
>>> -	if (vma->vm_flags & VM_WRITE)
>>> -		entry = pte_mkwrite(pte_mkdirty(entry), vma);
>>
>> OK, this becomes maybe_mkwrite(pte_mkdirty(entry), vma).
>
> Yes, upon further investigation this does seem to slightly change the behavior.

I did not notice it when I was reviewing it. ;)

> pte_mkdirty() is now being called unconditionally from the VM_WRITE
> flag. I noticed other callers in the kernel doing this too.
>
> Is it ok to leave the pte_mkdirty() or should I go back to using
> pte_mkwrite with the conditional guarding both mkwrite and mkdirty?

IMHO, it is better to use the conditional guarding way.
We reach here when userspace reads an address (VM_WRITE is not set)
and no zero page is used. Using maybe_mkwrite(pte_mkdirty(entry), vma)
means we will get a dirty PTE pointing to the allocated page but user
only reads from it.

Best Regards,
Yan, Zi
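
For readers skimming the thread, the key distinction is that maybe_mkwrite()
only makes the write bit conditional on VM_WRITE; it does not condition the
dirty bit, which stays however the caller set it. A rough sketch of the
helper, approximated from the mm headers rather than quoted from the tree:

	static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
	{
		/* Only the write bit is conditional; dirty is left as-is. */
		if (likely(vma->vm_flags & VM_WRITE))
			pte = pte_mkwrite(pte, vma);
		return pte;
	}

So maybe_mkwrite(pte_mkdirty(entry), vma) always dirties the entry, even on
the read-fault path described above.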
On 2/12/26 21:06, Zi Yan wrote:
> <snip>
>
>>>
>>>
>>> OK, this becomes maybe_mkwrite(pte_mkdirty(entry), vma).
>>
>> Yes, upon further investigation this does seem to slightly change the behavior.
>
> I did not notice it when I was reviewing it. ;)
>
>>
>> pte_mkdirty() is now being called unconditionally from the VM_WRITE
>> flag. I noticed other callers in the kernel doing this too.
>>
>> Is it ok to leave the pte_mkdirty() or should I go back to using
>> pte_mkwrite with the conditional guarding both mkwrite and mkdirty?
>
>
> IMHO, it is better to use the conditional guarding way.
> We reach here when userspace reads an address (VM_WRITE is not set)
> and no zero page is used. Using maybe_mkwrite(pte_mkdirty(entry), vma)
> means we will get a dirty PTE pointing to the allocated page but user
> only reads from it.

In general, it's best to not perform any such changes as part of a
bigger patch. :)

--
Cheers,
David
On Wed, 11 Feb 2026 19:18:31 -0700 Nico Pache <npache@redhat.com> wrote:
Hello Nico,
Thank you for the patch! I hope you are having a good day.
> The anonymous page fault handler in do_anonymous_page() open-codes the
> sequence to map a newly allocated anonymous folio at the PTE level:
> - construct the PTE entry
> - add rmap
> - add to LRU
> - set the PTEs
> - update the MMU cache.
>
> > Introduce two helpers to consolidate this duplicated logic, mirroring the
> existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
>
> map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
> references, adds anon rmap and LRU. This function also handles the
> uffd_wp that can occur in the pf variant.
>
> map_anon_folio_pte_pf(): extends the nopf variant to handle MM_ANONPAGES
> counter updates, and mTHP fault allocation statistics for the page fault
> path.
>
> The zero-page read path in do_anonymous_page() is also untangled from the
> shared setpte label, since it does not allocate a folio and should not
> share the same mapping sequence as the write path. Make nr_pages = 1
> rather than relying on the variable. This makes it more clear that we
> are operating on the zero page only.
>
> This refactoring will also help reduce code duplication between mm/memory.c
> and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
> mapping that can be reused by future callers.
It seems that based on this description, there should be no functional change
in the code below. Is that correct?
> Signed-off-by: Nico Pache <npache@redhat.com>
> ---
> include/linux/mm.h | 4 ++++
> mm/memory.c | 56 ++++++++++++++++++++++++++++++----------------
> 2 files changed, 41 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index f8a8fd47399c..c3aa1f51e020 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4916,4 +4916,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
>
> void snapshot_page(struct page_snapshot *ps, const struct page *page);
>
> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> + struct vm_area_struct *vma, unsigned long addr,
> + bool uffd_wp);
> +
> #endif /* _LINUX_MM_H */
> diff --git a/mm/memory.c b/mm/memory.c
> index 8c19af97f0a0..61c2277c9d9f 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5211,6 +5211,35 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
> }
>
> +
^^^ extra newline?
> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> + struct vm_area_struct *vma, unsigned long addr,
> + bool uffd_wp)
> +{
> + pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
> + unsigned int nr_pages = folio_nr_pages(folio);
Just reading through the code below on what was deleted and what was added:
maybe we are missing a pte_sw_mkyoung(entry) here? Seems like this would
matter for MIPS systems, but I couldn't find this change in the changelog.
> + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> + if (uffd_wp)
> + entry = pte_mkuffd_wp(entry);
The ordering here was also changed, but it wasn't immediately obvious to me
why it was changed.
> +
> + folio_ref_add(folio, nr_pages - 1);
> + folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> + folio_add_lru_vma(folio, vma);
> + set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
> + update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
> +}
> +
> +static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
> + struct vm_area_struct *vma, unsigned long addr,
> + unsigned int nr_pages, bool uffd_wp)
> +{
> + map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
> + add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> + count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> +}
> +
> +
^^^ extra newline?
> /*
> * We enter with non-exclusive mmap_lock (to exclude vma changes,
> * but allow concurrent faults), and pte mapped but not yet locked.
> @@ -5257,7 +5286,13 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> return handle_userfault(vmf, VM_UFFD_MISSING);
> }
> - goto setpte;
> + if (vmf_orig_pte_uffd_wp(vmf))
> + entry = pte_mkuffd_wp(entry);
> + set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
> +
> + /* No need to invalidate - it was non-present before */
> + update_mmu_cache_range(vmf, vma, addr, vmf->pte, /*nr_pages=*/ 1);
NIT: Should we try to keep the line under 80 columns? ;-)
> + goto unlock;
> }
>
> /* Allocate our own private page. */
> @@ -5281,11 +5316,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> */
> __folio_mark_uptodate(folio);
>
> - entry = folio_mk_pte(folio, vma->vm_page_prot);
> - entry = pte_sw_mkyoung(entry);
> - if (vma->vm_flags & VM_WRITE)
> - entry = pte_mkwrite(pte_mkdirty(entry), vma);
> -
> vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
> if (!vmf->pte)
> goto release;
> @@ -5307,19 +5337,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> folio_put(folio);
> return handle_userfault(vmf, VM_UFFD_MISSING);
> }
> -
> - folio_ref_add(folio, nr_pages - 1);
> - add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> - count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> - folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> - folio_add_lru_vma(folio, vma);
> -setpte:
> - if (vmf_orig_pte_uffd_wp(vmf))
> - entry = pte_mkuffd_wp(entry);
> - set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
> -
> - /* No need to invalidate - it was non-present before */
> - update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
> + map_anon_folio_pte_pf(folio, vmf->pte, vma, addr, nr_pages, vmf_orig_pte_uffd_wp(vmf));
NIT: Maybe here as well?
> unlock:
> if (vmf->pte)
> pte_unmap_unlock(vmf->pte, vmf->ptl);
> --
> 2.53.0
Thank you for the patch again. I hope you have a great day!
Joshua
On Thu, Feb 12, 2026 at 8:55 AM Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> On Wed, 11 Feb 2026 19:18:31 -0700 Nico Pache <npache@redhat.com> wrote:
>
> Hello Nico,
>
> Thank you for the patch! I hope you are having a good day.
Hey Joshua!
Thank you for reviewing! I hope you have a good day too :)
>
> > The anonymous page fault handler in do_anonymous_page() open-codes the
> > sequence to map a newly allocated anonymous folio at the PTE level:
> > - construct the PTE entry
> > - add rmap
> > - add to LRU
> > - set the PTEs
> > - update the MMU cache.
> >
> > Introduce two helpers to consolidate this duplicated logic, mirroring the
> > existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
> >
> > map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
> > references, adds anon rmap and LRU. This function also handles the
> > uffd_wp that can occur in the pf variant.
> >
> > map_anon_folio_pte_pf(): extends the nopf variant to handle MM_ANONPAGES
> > counter updates, and mTHP fault allocation statistics for the page fault
> > path.
> >
> > The zero-page read path in do_anonymous_page() is also untangled from the
> > shared setpte label, since it does not allocate a folio and should not
> > share the same mapping sequence as the write path. Make nr_pages = 1
> > rather than relying on the variable. This makes it more clear that we
> > are operating on the zero page only.
> >
> > This refactoring will also help reduce code duplication between mm/memory.c
> > and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
> > mapping that can be reused by future callers.
>
> It seems that based on this description, there should be no functional change
> in the code below. Is that correct?
Correct, but as you and others have pointed out, I believe I missed a
pte_sw_mkyoung() call.
On closer inspection I also may have inadvertently changed some
behavior around pte_mkdirty().
In the previous implementation we only called pte_mkdirty() if VM_WRITE
is set. When switching over to maybe_mkwrite(), pte_mkdirty() is no
longer called conditionally, while pte_mkwrite() still is.
Nothing showed up in my testing, but some of these things can be
tricky to trigger. Other callers also make this "mistake" (if it even
is one), but I'm aiming for no functional change so I appreciate the
thoroughness here! I will clean up both of these issues.
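
For what it's worth, keeping the original construction order inside the new
helper would preserve the old behavior. A rough sketch of what the respin
might look like (illustrative only, not the posted code):

	pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);

	/* Keep the software-young and VM_WRITE-guarded dirty/write bits. */
	entry = pte_sw_mkyoung(entry);
	if (vma->vm_flags & VM_WRITE)
		entry = pte_mkwrite(pte_mkdirty(entry), vma);
	if (uffd_wp)
		entry = pte_mkuffd_wp(entry);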
>
> > Signed-off-by: Nico Pache <npache@redhat.com>
> > ---
> > include/linux/mm.h | 4 ++++
> > mm/memory.c | 56 ++++++++++++++++++++++++++++++----------------
> > 2 files changed, 41 insertions(+), 19 deletions(-)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index f8a8fd47399c..c3aa1f51e020 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -4916,4 +4916,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
> >
> > void snapshot_page(struct page_snapshot *ps, const struct page *page);
> >
> > +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + bool uffd_wp);
> > +
> > #endif /* _LINUX_MM_H */
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 8c19af97f0a0..61c2277c9d9f 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -5211,6 +5211,35 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
> > return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
> > }
> >
> > +
>
> ^^^ extra newline?
oops yes! thanks
>
> > +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + bool uffd_wp)
> > +{
> > + pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
> > + unsigned int nr_pages = folio_nr_pages(folio);
>
> Just reading through the code below on what was deleted and what was added,
> Maybe we are missing a pte_sw_mkyoung(entry) here? Seems like this would
> matter for MIPS systems, but I couldn't find this change in the changelog.
I think you are correct. In my khugepaged implementation this was not
present. I will add it back and run some tests! Thank you.
>
> > + entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> > + if (uffd_wp)
> > + entry = pte_mkuffd_wp(entry);
>
> The ordering here was also changed, but it wasn't immediately obvious to me
> why it was changed.
I don't see any data dependencies between the folio rmap/LRU/ref
changes and the pte changes. The order was most likely due to the
setpte label, which this also cleans up. I believe we can reorder
this, but if you see an issue with it please lmk!
>
> > +
> > + folio_ref_add(folio, nr_pages - 1);
> > + folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> > + folio_add_lru_vma(folio, vma);
> > + set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
> > + update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
> > +}
> > +
> > +static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
> > + struct vm_area_struct *vma, unsigned long addr,
> > + unsigned int nr_pages, bool uffd_wp)
> > +{
> > + map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
> > + add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> > + count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> > +}
> > +
> > +
>
> ^^^ extra newline?
whoops yes thank you!
>
> > /*
> > * We enter with non-exclusive mmap_lock (to exclude vma changes,
> > * but allow concurrent faults), and pte mapped but not yet locked.
> > @@ -5257,7 +5286,13 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> > pte_unmap_unlock(vmf->pte, vmf->ptl);
> > return handle_userfault(vmf, VM_UFFD_MISSING);
> > }
> > - goto setpte;
> > + if (vmf_orig_pte_uffd_wp(vmf))
> > + entry = pte_mkuffd_wp(entry);
> > + set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
> > +
> > + /* No need to invalidate - it was non-present before */
> > + update_mmu_cache_range(vmf, vma, addr, vmf->pte, /*nr_pages=*/ 1);
>
> NIT: Should we try to keep the line under 80 columns? ; -)
Ah yes, I still don't fully understand this rule as it's broken in a
lot of places. I'll move nr_pages to a new line.
>
> > + goto unlock;
> > }
> >
> > /* Allocate our own private page. */
> > @@ -5281,11 +5316,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> > */
> > __folio_mark_uptodate(folio);
> >
> > - entry = folio_mk_pte(folio, vma->vm_page_prot);
> > - entry = pte_sw_mkyoung(entry);
> > - if (vma->vm_flags & VM_WRITE)
> > - entry = pte_mkwrite(pte_mkdirty(entry), vma);
> > -
> > vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
> > if (!vmf->pte)
> > goto release;
> > @@ -5307,19 +5337,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
> > folio_put(folio);
> > return handle_userfault(vmf, VM_UFFD_MISSING);
> > }
> > -
> > - folio_ref_add(folio, nr_pages - 1);
> > - add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> > - count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> > - folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> > - folio_add_lru_vma(folio, vma);
> > -setpte:
> > - if (vmf_orig_pte_uffd_wp(vmf))
> > - entry = pte_mkuffd_wp(entry);
> > - set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
> > -
> > - /* No need to invalidate - it was non-present before */
> > - update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
> > + map_anon_folio_pte_pf(folio, vmf->pte, vma, addr, nr_pages, vmf_orig_pte_uffd_wp(vmf));
>
> NIT: Maybe here as well?
ack
>
> > unlock:
> > if (vmf->pte)
> > pte_unmap_unlock(vmf->pte, vmf->ptl);
> > --
> > 2.53.0
>
> Thank you for the patch again. I hope you have a great day!
> Joshua
Thank you! You as well.
-- Nico
>
On Wed, Feb 11, 2026 at 07:18:31PM -0700, Nico Pache wrote:
> The anonymous page fault handler in do_anonymous_page() open-codes the
> sequence to map a newly allocated anonymous folio at the PTE level:
> - construct the PTE entry
> - add rmap
> - add to LRU
> - set the PTEs
> - update the MMU cache.
>
> Introduce two helpers to consolidate this duplicated logic, mirroring the
> existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
>
> map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
> references, adds anon rmap and LRU. This function also handles the
> uffd_wp that can occur in the pf variant.
>
> map_anon_folio_pte_pf(): extends the nopf variant to handle MM_ANONPAGES
> counter updates, and mTHP fault allocation statistics for the page fault
> path.
>
> The zero-page read path in do_anonymous_page() is also untangled from the
> shared setpte label, since it does not allocate a folio and should not
> share the same mapping sequence as the write path. Make nr_pages = 1
> rather than relying on the variable. This makes it more clear that we
> are operating on the zero page only.
>
> This refactoring will also help reduce code duplication between mm/memory.c
> and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
> mapping that can be reused by future callers.
>
> Signed-off-by: Nico Pache <npache@redhat.com>

Acked-by: Pedro Falcato <pfalcato@suse.de>

Looks a little nicer, thanks :)

--
Pedro