[PATCH mm-unstable v3 1/5] mm: consolidate anonymous folio PTE mapping into helpers

Nico Pache posted 5 patches 3 weeks, 5 days ago
The anonymous page fault handler in do_anonymous_page() open-codes the
sequence to map a newly allocated anonymous folio at the PTE level:
	- construct the PTE entry
	- add rmap
	- add to LRU
	- set the PTEs
	- update the MMU cache.

Introduce two helpers to consolidate this duplicated logic, mirroring the
existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:

	map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
	references, and adds the anon rmap and LRU entries. It also handles
	the uffd-wp marking that can occur in the page fault (pf) variant.

	map_anon_folio_pte_pf(): extends the nopf variant with the
	MM_ANONPAGES counter update and the mTHP fault allocation
	statistics for the page fault path.

The zero-page read path in do_anonymous_page() is also untangled from the
shared setpte label, since it does not allocate a folio and should not
share the same mapping sequence as the write path. Pass nr_pages = 1
explicitly rather than relying on the variable, making it clear that we
are operating on the zero page only.

This refactoring will also help reduce code duplication between mm/memory.c
and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
mapping that can be reused by future callers.

Reviewed-by: Dev Jain <dev.jain@arm.com>
Reviewed-by: Lance Yang <lance.yang@linux.dev>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 include/linux/mm.h |  4 ++++
 mm/memory.c        | 60 +++++++++++++++++++++++++++++++---------------
 2 files changed, 45 insertions(+), 19 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 4c4fd55fc823..9fea354bd17f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -4903,4 +4903,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
 
 void snapshot_page(struct page_snapshot *ps, const struct page *page);
 
+void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
+		struct vm_area_struct *vma, unsigned long addr,
+		bool uffd_wp);
+
 #endif /* _LINUX_MM_H */
diff --git a/mm/memory.c b/mm/memory.c
index 6aa0ea4af1fc..5c8bf1eb55f5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5197,6 +5197,37 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
 	return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
 }
 
+void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
+		struct vm_area_struct *vma, unsigned long addr,
+		bool uffd_wp)
+{
+	unsigned int nr_pages = folio_nr_pages(folio);
+	pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
+
+	entry = pte_sw_mkyoung(entry);
+
+	if (vma->vm_flags & VM_WRITE)
+		entry = pte_mkwrite(pte_mkdirty(entry), vma);
+	if (uffd_wp)
+		entry = pte_mkuffd_wp(entry);
+
+	folio_ref_add(folio, nr_pages - 1);
+	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
+	folio_add_lru_vma(folio, vma);
+	set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
+	update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
+}
+
+static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
+		struct vm_area_struct *vma, unsigned long addr, bool uffd_wp)
+{
+	unsigned int order = folio_order(folio);
+
+	map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
+	add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1 << order);
+	count_mthp_stat(order, MTHP_STAT_ANON_FAULT_ALLOC);
+}
+
 /*
  * We enter with non-exclusive mmap_lock (to exclude vma changes,
  * but allow concurrent faults), and pte mapped but not yet locked.
@@ -5243,7 +5274,14 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 			pte_unmap_unlock(vmf->pte, vmf->ptl);
 			return handle_userfault(vmf, VM_UFFD_MISSING);
 		}
-		goto setpte;
+		if (vmf_orig_pte_uffd_wp(vmf))
+			entry = pte_mkuffd_wp(entry);
+		set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
+
+		/* No need to invalidate - it was non-present before */
+		update_mmu_cache_range(vmf, vma, addr, vmf->pte,
+				       /*nr_pages=*/ 1);
+		goto unlock;
 	}
 
 	/* Allocate our own private page. */
@@ -5267,11 +5305,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	 */
 	__folio_mark_uptodate(folio);
 
-	entry = folio_mk_pte(folio, vma->vm_page_prot);
-	entry = pte_sw_mkyoung(entry);
-	if (vma->vm_flags & VM_WRITE)
-		entry = pte_mkwrite(pte_mkdirty(entry), vma);
-
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
 	if (!vmf->pte)
 		goto release;
@@ -5293,19 +5326,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 		folio_put(folio);
 		return handle_userfault(vmf, VM_UFFD_MISSING);
 	}
-
-	folio_ref_add(folio, nr_pages - 1);
-	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
-	count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
-	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
-	folio_add_lru_vma(folio, vma);
-setpte:
-	if (vmf_orig_pte_uffd_wp(vmf))
-		entry = pte_mkuffd_wp(entry);
-	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
-
-	/* No need to invalidate - it was non-present before */
-	update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
+	map_anon_folio_pte_pf(folio, vmf->pte, vma, addr,
+			      vmf_orig_pte_uffd_wp(vmf));
 unlock:
 	if (vmf->pte)
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-- 
2.53.0
Re: [PATCH mm-unstable v3 1/5] mm: consolidate anonymous folio PTE mapping into helpers
Posted by Lorenzo Stoakes (Oracle) 3 weeks, 1 day ago
On Wed, Mar 11, 2026 at 03:13:11PM -0600, Nico Pache wrote:
> The anonymous page fault handler in do_anonymous_page() open-codes the
> sequence to map a newly allocated anonymous folio at the PTE level:
> 	- construct the PTE entry
> 	- add rmap
> 	- add to LRU
> 	- set the PTEs
> 	- update the MMU cache.

Yikes yeah this all needs work. Thanks for looking at this!

>
> Introduce a two helpers to consolidate this duplicated logic, mirroring the

NIT: 'Introduce a two helpers' -> 'Introduce two helpers'

> existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
>
> 	map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
> 	references, adds anon rmap and LRU. This function also handles the
> 	uffd_wp that can occur in the pf variant.
>
> 	map_anon_folio_pte_pf(): extends the nopf variant to handle MM_ANONPAGES
> 	counter updates, and mTHP fault allocation statistics for the page fault
> 	path.

MEGA nit, not sure why you're not just putting this in a bullet list, just
weird to see not-code indented here :)

>
> The zero-page read path in do_anonymous_page() is also untangled from the
> shared setpte label, since it does not allocate a folio and should not
> share the same mapping sequence as the write path. Make nr_pages = 1
> rather than relying on the variable. This makes it more clear that we
> are operating on the zero page only.
>
> This refactoring will also help reduce code duplication between mm/memory.c
> and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
> mapping that can be reused by future callers.

Maybe worth mentioning the subsequent patches that will use what you set up
here?

Also you split things out into _nopf() and _pf() variants, it might be
worth saying exactly why you're doing that or what you are preparing to do?

>
> Reviewed-by: Dev Jain <dev.jain@arm.com>
> Reviewed-by: Lance Yang <lance.yang@linux.dev>
> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
> Signed-off-by: Nico Pache <npache@redhat.com>

There's nits above and below, but overall the logic looks good, so with
nits addressed/reasonably responded to:

Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

> ---
>  include/linux/mm.h |  4 ++++
>  mm/memory.c        | 60 +++++++++++++++++++++++++++++++---------------
>  2 files changed, 45 insertions(+), 19 deletions(-)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 4c4fd55fc823..9fea354bd17f 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -4903,4 +4903,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
>
>  void snapshot_page(struct page_snapshot *ps, const struct page *page);
>
> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> +		struct vm_area_struct *vma, unsigned long addr,
> +		bool uffd_wp);

How I hate how uffd infiltrates all our code like this.

Not your fault :)

> +
>  #endif /* _LINUX_MM_H */
> diff --git a/mm/memory.c b/mm/memory.c
> index 6aa0ea4af1fc..5c8bf1eb55f5 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5197,6 +5197,37 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>  	return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
>  }
>
> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
> +		struct vm_area_struct *vma, unsigned long addr,
> +		bool uffd_wp)
> +{
> +	unsigned int nr_pages = folio_nr_pages(folio);

const would be good

> +	pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
> +
> +	entry = pte_sw_mkyoung(entry);
> +
> +	if (vma->vm_flags & VM_WRITE)
> +		entry = pte_mkwrite(pte_mkdirty(entry), vma);
> +	if (uffd_wp)
> +		entry = pte_mkuffd_wp(entry);
> +
> +	folio_ref_add(folio, nr_pages - 1);
> +	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> +	folio_add_lru_vma(folio, vma);
> +	set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
> +	update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
> +}
> +
> +static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
> +		struct vm_area_struct *vma, unsigned long addr, bool uffd_wp)
> +{
> +	unsigned int order = folio_order(folio);

const would be good here also!

> +
> +	map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
> +	add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1 << order);

Is 1 << order strictly right here? This field is a long value, so 1L <<
order maybe? I get nervous about these shifts...

Note that folio_large_nr_pages() uses 1L << order so that does seem
preferable.

> +	count_mthp_stat(order, MTHP_STAT_ANON_FAULT_ALLOC);
> +}
> +
>  /*
>   * We enter with non-exclusive mmap_lock (to exclude vma changes,
>   * but allow concurrent faults), and pte mapped but not yet locked.
> @@ -5243,7 +5274,14 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>  			pte_unmap_unlock(vmf->pte, vmf->ptl);
>  			return handle_userfault(vmf, VM_UFFD_MISSING);
>  		}
> -		goto setpte;
> +		if (vmf_orig_pte_uffd_wp(vmf))
> +			entry = pte_mkuffd_wp(entry);
> +		set_pte_at(vma->vm_mm, addr, vmf->pte, entry);

How I _despise_ how uffd is implemented in mm. Feels like open coded
nonsense spills out everywhere.

Not your fault obviously :)

> +
> +		/* No need to invalidate - it was non-present before */
> +		update_mmu_cache_range(vmf, vma, addr, vmf->pte,
> +				       /*nr_pages=*/ 1);

Is there any point in passing vmf here given you pass NULL above, and it
appears that nobody actually uses this? I guess it doesn't matter but
seeing this immediately made me question why you set it in one, and not the
other?

Maybe I'm mistaken and some arch uses it? Don't think so though.

Also can't we then just use update_mmu_cache() which is the single-page
wrapper of this AFAICT? That'd make it even simpler.

Having done this, is there any reason to keep the annoying and confusing
initial assignment of nr_pages = 1 at declaration time?

It seems that nr_pages is unconditionally assigned before it's used
anywhere now at line 5298:

	nr_pages = folio_nr_pages(folio);
	addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
	...

It's kinda weird to use nr_pages again after you go out of your way to
avoid it using folio_nr_pages() in map_anon_folio_pte_nopf() and
folio_order() in map_anon_folio_pte_pf().

But yeah, ok we align the address and it's yucky maybe leave for now (but
we can definitely stop defaulting nr_pages to 1 :)

> +		goto unlock;
>  	}
>
>  	/* Allocate our own private page. */
> @@ -5267,11 +5305,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>  	 */
>  	__folio_mark_uptodate(folio);
>
> -	entry = folio_mk_pte(folio, vma->vm_page_prot);
> -	entry = pte_sw_mkyoung(entry);
> -	if (vma->vm_flags & VM_WRITE)
> -		entry = pte_mkwrite(pte_mkdirty(entry), vma);
> -
>  	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
>  	if (!vmf->pte)
>  		goto release;
> @@ -5293,19 +5326,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>  		folio_put(folio);
>  		return handle_userfault(vmf, VM_UFFD_MISSING);
>  	}
> -
> -	folio_ref_add(folio, nr_pages - 1);
> -	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
> -	count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
> -	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
> -	folio_add_lru_vma(folio, vma);
> -setpte:
> -	if (vmf_orig_pte_uffd_wp(vmf))
> -		entry = pte_mkuffd_wp(entry);
> -	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
> -
> -	/* No need to invalidate - it was non-present before */
> -	update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
> +	map_anon_folio_pte_pf(folio, vmf->pte, vma, addr,
> +			      vmf_orig_pte_uffd_wp(vmf));

So we're going from:

entry = folio_mk_pte(...)
entry = pte_sw_mkyoung(...)
if (write)
	entry = pte_mkwrite(also dirty...)
folio_ref_add(nr_pages - 1)
add_mm_counter(... MM_ANON_PAGES, nr_pages)
count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC)
folio_add_new_anon_rmap(.., RMAP_EXCLUSIVE)
folio_add_lru_vma(folio, vma)
if (vmf_orig_pte_uffd_wp(vmf))
	entry = pte_mkuffd_wp(entry)
set_ptes(mm, addr, vmf->pte, entry, nr_pages)
update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages)

To:

<map_anon_folio_pte_nopf>
entry = folio_mk_pte(...)
entry = pte_sw_mkyoung(...)
if (write)
	entry = pte_mkwrite(also dirty...)
if (vmf_orig_pte_uffd_wp(vmf) <passed in via uffd_wp>) <-- reordered
	entry = pte_mkuffd_wp(entry)
folio_ref_add(nr_pages - 1)
folio_add_new_anon_rmap(.., RMAP_EXCLUSIVE)
folio_add_lru_vma(folio, vma)
set_ptes(mm, addr, pte, entry, nr_pages)
update_mmu_cache_range(NULL, vma, addr, pte, nr_pages)

<map_anon_folio_pte_pf>
add_mm_counter(... MM_ANON_PAGES, nr_pages) <-- reordered
> count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC) <-- reordered

But the reorderings seem fine, and it is achieving the same thing.

All the parameters being passed seem correct too.

>  unlock:
>  	if (vmf->pte)
>  		pte_unmap_unlock(vmf->pte, vmf->ptl);
> --
> 2.53.0
>

Cheers, Lorenzo
Re: [PATCH mm-unstable v3 1/5] mm: consolidate anonymous folio PTE mapping into helpers
Posted by Nico Pache 2 weeks, 6 days ago

On 3/16/26 12:17 PM, Lorenzo Stoakes (Oracle) wrote:
> On Wed, Mar 11, 2026 at 03:13:11PM -0600, Nico Pache wrote:
>> The anonymous page fault handler in do_anonymous_page() open-codes the
>> sequence to map a newly allocated anonymous folio at the PTE level:
>> 	- construct the PTE entry
>> 	- add rmap
>> 	- add to LRU
>> 	- set the PTEs
>> 	- update the MMU cache.
> 
> Yikes yeah this all needs work. Thanks for looking at this!

np! I believe it looks much cleaner now, and it has the added benefit of
cleaning up some of the mTHP patchset.

I also believe you suggested this so I'll add your SB when I add your RB tag :)

> 
>>
>> Introduce a two helpers to consolidate this duplicated logic, mirroring the
> 
> NIT: 'Introduce a two helpers' -> "introduce two helpers'

ack!

> 
>> existing map_anon_folio_pmd_nopf() pattern for PMD-level mappings:
>>
>> 	map_anon_folio_pte_nopf(): constructs the PTE entry, takes folio
>> 	references, adds anon rmap and LRU. This function also handles the
>> 	uffd_wp that can occur in the pf variant.
>>
>> 	map_anon_folio_pte_pf(): extends the nopf variant to handle MM_ANONPAGES
>> 	counter updates, and mTHP fault allocation statistics for the page fault
>> 	path.
> 
> MEGA nit, not sure why you're not just putting this in a bullet list, just
> weird to see not-code indented here :)

I'll drop the indentation!

> 
>>
>> The zero-page read path in do_anonymous_page() is also untangled from the
>> shared setpte label, since it does not allocate a folio and should not
>> share the same mapping sequence as the write path. Make nr_pages = 1
>> rather than relying on the variable. This makes it more clear that we
>> are operating on the zero page only.
>>
>> This refactoring will also help reduce code duplication between mm/memory.c
>> and mm/khugepaged.c, and provides a clean API for PTE-level anonymous folio
>> mapping that can be reused by future callers.
> 
> Maybe worth mentioning subseqent patches that will use what you set up
> here?

OK, I'll add something like "... that will be used when adding mTHP support to
khugepaged, and may find future reuse by other callers." Speaking of which, I
tried to leverage this elsewhere and I believe that will take extra focus and
time, but it may be doable.

> 
> Also you split things out into _nopf() and _pf() variants, it might be
> worth saying exactly why you're doing that or what you are preparing to do?

OK, I'll expand on this in the "bullet list" you reference above.

> 
>>
>> Reviewed-by: Dev Jain <dev.jain@arm.com>
>> Reviewed-by: Lance Yang <lance.yang@linux.dev>
>> Acked-by: David Hildenbrand (Arm) <david@kernel.org>
>> Signed-off-by: Nico Pache <npache@redhat.com>
> 
> There's nits above and below, but overall the logic looks good, so with
> nits addressed/reasonably responded to:
> 
> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>

Thanks :) I'll take care of those

> 
>> ---
>>  include/linux/mm.h |  4 ++++
>>  mm/memory.c        | 60 +++++++++++++++++++++++++++++++---------------
>>  2 files changed, 45 insertions(+), 19 deletions(-)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 4c4fd55fc823..9fea354bd17f 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -4903,4 +4903,8 @@ static inline bool snapshot_page_is_faithful(const struct page_snapshot *ps)
>>
>>  void snapshot_page(struct page_snapshot *ps, const struct page *page);
>>
>> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
>> +		struct vm_area_struct *vma, unsigned long addr,
>> +		bool uffd_wp);
> 
> How I hate how uffd infiltrates all our code like this.
> 
> Not your fault :)
> 
>> +
>>  #endif /* _LINUX_MM_H */
>> diff --git a/mm/memory.c b/mm/memory.c
>> index 6aa0ea4af1fc..5c8bf1eb55f5 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -5197,6 +5197,37 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>>  	return folio_prealloc(vma->vm_mm, vma, vmf->address, true);
>>  }
>>
>> +void map_anon_folio_pte_nopf(struct folio *folio, pte_t *pte,
>> +		struct vm_area_struct *vma, unsigned long addr,
>> +		bool uffd_wp)
>> +{
>> +	unsigned int nr_pages = folio_nr_pages(folio);
> 
> const would be good
> 
>> +	pte_t entry = folio_mk_pte(folio, vma->vm_page_prot);
>> +
>> +	entry = pte_sw_mkyoung(entry);
>> +
>> +	if (vma->vm_flags & VM_WRITE)
>> +		entry = pte_mkwrite(pte_mkdirty(entry), vma);
>> +	if (uffd_wp)
>> +		entry = pte_mkuffd_wp(entry);
>> +
>> +	folio_ref_add(folio, nr_pages - 1);
>> +	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
>> +	folio_add_lru_vma(folio, vma);
>> +	set_ptes(vma->vm_mm, addr, pte, entry, nr_pages);
>> +	update_mmu_cache_range(NULL, vma, addr, pte, nr_pages);
>> +}
>> +
>> +static void map_anon_folio_pte_pf(struct folio *folio, pte_t *pte,
>> +		struct vm_area_struct *vma, unsigned long addr, bool uffd_wp)
>> +{
>> +	unsigned int order = folio_order(folio);
> 
> const would be good here also!

Ack on the consts

> 
>> +
>> +	map_anon_folio_pte_nopf(folio, pte, vma, addr, uffd_wp);
>> +	add_mm_counter(vma->vm_mm, MM_ANONPAGES, 1 << order);
> 
> Is 1 << order strictly right here? This field is a long value, so 1L <<
> order maybe? I get nervous about these shifts...
> 
> Note that folio_large_nr_pages() uses 1L << order so that does seem
> preferable.

Ok sounds good thanks!

> 
>> +	count_mthp_stat(order, MTHP_STAT_ANON_FAULT_ALLOC);
>> +}
>> +
>>  /*
>>   * We enter with non-exclusive mmap_lock (to exclude vma changes,
>>   * but allow concurrent faults), and pte mapped but not yet locked.
>> @@ -5243,7 +5274,14 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>>  			pte_unmap_unlock(vmf->pte, vmf->ptl);
>>  			return handle_userfault(vmf, VM_UFFD_MISSING);
>>  		}
>> -		goto setpte;
>> +		if (vmf_orig_pte_uffd_wp(vmf))
>> +			entry = pte_mkuffd_wp(entry);
>> +		set_pte_at(vma->vm_mm, addr, vmf->pte, entry);
> 
> How I _despise_ how uffd is implemented in mm. Feels like open coded
> nonsense spills out everywhere.
> 
> Not your fault obviously :)
> 
>> +
>> +		/* No need to invalidate - it was non-present before */
>> +		update_mmu_cache_range(vmf, vma, addr, vmf->pte,
>> +				       /*nr_pages=*/ 1);
> 
> Is there any point in passing vmf here given you pass NULL above, and it
> appears that nobody actually uses this? I guess it doesn't matter but
> seeing this immediately made we question why you set it in one, and not the
> other?
> 
> Maybe I'm mistaken and some arch uses it? Don't think so though.
> 
> Also can't we then just use update_mmu_cache() which is the single-page
> wrapper of this AFAICT? That'd make it even simpler.

> 
> Having done this, is there any reason to keep the annoying and confusing
> initial assignment of nr_pages = 1 at declaration time?
> 
> It seems that nr_pages is unconditionally assigned before it's used
> anywhere now at line 5298:
> 
> 	nr_pages = folio_nr_pages(folio);
> 	addr = ALIGN_DOWN(vmf->address, nr_pages * PAGE_SIZE);
> 	...
> 
> It's kinda weird to use nr_pages again after you go out of your way to
> avoid it using folio_nr_pages() in map_anon_folio_pte_nopf() and
> folio_order() in map_anon_folio_pte_pf().

Yes, I believe so, thank you, that is cleaner! In the past I had nr_pages as a
variable, but David suggested being explicit with the "1" so it would be
obvious it's a single page... update_mmu_cache() should indicate the same thing.

> 
> But yeah, ok we align the address and it's yucky maybe leave for now (but
> we can definitely stop defaulting nr_pages to 1 :)
> 
>> +		goto unlock;
>>  	}
>>
>>  	/* Allocate our own private page. */
>> @@ -5267,11 +5305,6 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>>  	 */
>>  	__folio_mark_uptodate(folio);
>>
>> -	entry = folio_mk_pte(folio, vma->vm_page_prot);
>> -	entry = pte_sw_mkyoung(entry);
>> -	if (vma->vm_flags & VM_WRITE)
>> -		entry = pte_mkwrite(pte_mkdirty(entry), vma);
>> -
>>  	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
>>  	if (!vmf->pte)
>>  		goto release;
>> @@ -5293,19 +5326,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>>  		folio_put(folio);
>>  		return handle_userfault(vmf, VM_UFFD_MISSING);
>>  	}
>> -
>> -	folio_ref_add(folio, nr_pages - 1);
>> -	add_mm_counter(vma->vm_mm, MM_ANONPAGES, nr_pages);
>> -	count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC);
>> -	folio_add_new_anon_rmap(folio, vma, addr, RMAP_EXCLUSIVE);
>> -	folio_add_lru_vma(folio, vma);
>> -setpte:
>> -	if (vmf_orig_pte_uffd_wp(vmf))
>> -		entry = pte_mkuffd_wp(entry);
>> -	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
>> -
>> -	/* No need to invalidate - it was non-present before */
>> -	update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
>> +	map_anon_folio_pte_pf(folio, vmf->pte, vma, addr,
>> +			      vmf_orig_pte_uffd_wp(vmf));
> 
> So we're going from:
> 
> entry = folio_mk_pte(...)
> entry = pte_sw_mkyoung(...)
> if (write)
> 	entry = pte_mkwrite(also dirty...)
> folio_ref_add(nr_pages - 1)
> add_mm_counter(... MM_ANON_PAGES, nr_pages)
> count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC)
> folio_add_new_anon_rmap(.., RMAP_EXCLUSIVE)
> folio_add_lru_vma(folio, vma)
> if (vmf_orig_pte_uffd_wp(vmf))
> 	entry = pte_mkuffd_wp(entry)
> set_ptes(mm, addr, vmf->pte, entry, nr_pages)
> update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages)
> 
> To:
> 
> <map_anon_folio_pte_nopf>
> entry = folio_mk_pte(...)
> entry = pte_sw_mkyoung(...)
> if (write)
> 	entry = pte_mkwrite(also dirty...)
> if (vmf_orig_pte_uffd_wp(vmf) <passed in via uffd_wp>) <-- reordered
> 	entry = pte_mkuffd_wp(entry)
> folio_ref_add(nr_pages - 1)
> folio_add_new_anon_rmap(.., RMAP_EXCLUSIVE)
> folio_add_lru_vma(folio, vma)
> set_ptes(mm, addr, pte, entry, nr_pages)
> update_mmu_cache_range(NULL, vma, addr, pte, nr_pages)
> 
> <map_anon_folio_pte_pf>
> add_mm_counter(... MM_ANON_PAGES, nr_pages) <-- reordered
> count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_FAULT_ALLOC) <-- reodrdered
> 
> But the reorderings seem fine, and it is achieving the same thing.
> 
> All the parameters being passed seem correct too.

Thanks for the review and verifying the logic :) I was particularly scared of
changing the page fault handler, so I'm glad multiple people have confirmed
this seems fine.

Cheers,
-- Nico

> 
>>  unlock:
>>  	if (vmf->pte)
>>  		pte_unmap_unlock(vmf->pte, vmf->ptl);
>> --
>> 2.53.0
>>
> 
> Cheers, Lorenzo
>