[PATCH 8/9] mm/rmap: introduce folio_try_share_anon_rmap_ptes

Dev Jain posted 9 patches 1 month ago
Posted by Dev Jain 1 month ago
In the quest of enabling batched unmapping of anonymous folios, we need to
handle the sharing of exclusive pages. Hence, a batched version of
folio_try_share_anon_rmap_pte is required.

Currently, the sole purpose of nr_pages in __folio_try_share_anon_rmap is
to do some rmap sanity checks. Add helpers to set and clear the
PageAnonExclusive bit on a batch of nr_pages. Note that
__folio_try_share_anon_rmap can receive nr_pages == HPAGE_PMD_NR from the
PMD path, but currently we only clear the bit on the head page. Retain this
behaviour by setting nr_pages = 1 in case the caller is
folio_try_share_anon_rmap_pmd.

Signed-off-by: Dev Jain <dev.jain@arm.com>
---
 include/linux/page-flags.h | 11 +++++++++++
 include/linux/rmap.h       | 28 ++++++++++++++++++++++++++--
 mm/rmap.c                  |  2 +-
 3 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 0e03d816e8b9d..1d74ed9a28c41 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -1178,6 +1178,17 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
 	__clear_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags.f);
 }
 
+static __always_inline void ClearPagesAnonExclusive(struct page *page,
+		unsigned int nr)
+{
+	for (;;) {
+		ClearPageAnonExclusive(page);
+		if (--nr == 0)
+			break;
+		++page;
+	}
+}
+
 #ifdef CONFIG_MMU
 #define __PG_MLOCKED		(1UL << PG_mlocked)
 #else
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 1b7720c66ac87..7a67776dca3fe 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -712,9 +712,13 @@ static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
 	VM_WARN_ON_FOLIO(!PageAnonExclusive(page), folio);
 	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
 
+	/* We only clear anon-exclusive from head page of PMD folio */
+	if (level == PGTABLE_LEVEL_PMD)
+		nr_pages = 1;
+
 	/* device private folios cannot get pinned via GUP. */
 	if (unlikely(folio_is_device_private(folio))) {
-		ClearPageAnonExclusive(page);
+		ClearPagesAnonExclusive(page, nr_pages);
 		return 0;
 	}
 
@@ -766,7 +770,7 @@ static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
 
 	if (unlikely(folio_maybe_dma_pinned(folio)))
 		return -EBUSY;
-	ClearPageAnonExclusive(page);
+	ClearPagesAnonExclusive(page, nr_pages);
 
 	/*
 	 * This is conceptually a smp_wmb() paired with the smp_rmb() in
@@ -804,6 +808,26 @@ static inline int folio_try_share_anon_rmap_pte(struct folio *folio,
 	return __folio_try_share_anon_rmap(folio, page, 1, PGTABLE_LEVEL_PTE);
 }
 
+/**
+ * folio_try_share_anon_rmap_ptes - try marking exclusive anonymous pages
+ *				    mapped by PTEs possibly shared to prepare
+ *				   for KSM or temporary unmapping
+ * @folio:	The folio to share a mapping of
+ * @page:	The first mapped exclusive page of the batch
+ * @nr_pages:	The number of pages to share (batch size)
+ *
+ * See folio_try_share_anon_rmap_pte for full description.
+ *
+ * Context: The caller needs to hold the page table lock and has to have the
+ * page table entries cleared/invalidated. Those PTEs used to map consecutive
+ * pages of the folio passed here. The PTEs are all in the same PMD and VMA.
+ */
+static inline int folio_try_share_anon_rmap_ptes(struct folio *folio,
+		struct page *page, unsigned int nr)
+{
+	return __folio_try_share_anon_rmap(folio, page, nr, PGTABLE_LEVEL_PTE);
+}
+
 /**
  * folio_try_share_anon_rmap_pmd - try marking an exclusive anonymous page
  *				   range mapped by a PMD possibly shared to
diff --git a/mm/rmap.c b/mm/rmap.c
index 42f6b00cced01..bba5b571946d8 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -2300,7 +2300,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 
 			/* See folio_try_share_anon_rmap(): clear PTE first. */
 			if (anon_exclusive &&
-			    folio_try_share_anon_rmap_pte(folio, subpage)) {
+			    folio_try_share_anon_rmap_ptes(folio, subpage, 1)) {
 				folio_put_swap(folio, subpage, 1);
 				set_pte_at(mm, address, pvmw.pte, pteval);
 				goto walk_abort;
-- 
2.34.1
Re: [PATCH 8/9] mm/rmap: introduce folio_try_share_anon_rmap_ptes
Posted by Lorenzo Stoakes (Oracle) 4 weeks, 1 day ago
On Tue, Mar 10, 2026 at 01:00:12PM +0530, Dev Jain wrote:
> In the quest of enabling batched unmapping of anonymous folios, we need to
> handle the sharing of exclusive pages. Hence, a batched version of
> folio_try_share_anon_rmap_pte is required.
>
> Currently, the sole purpose of nr_pages in __folio_try_share_anon_rmap is
> to do some rmap sanity checks. Add helpers to set and clear the
> PageAnonExclusive bit on a batch of nr_pages. Note that
> __folio_try_share_anon_rmap can receive nr_pages == HPAGE_PMD_NR from the
> PMD path, but currently we only clear the bit on the head page. Retain this
> behaviour by setting nr_pages = 1 in case the caller is
> folio_try_share_anon_rmap_pmd.
>
> Signed-off-by: Dev Jain <dev.jain@arm.com>
> ---
>  include/linux/page-flags.h | 11 +++++++++++
>  include/linux/rmap.h       | 28 ++++++++++++++++++++++++++--
>  mm/rmap.c                  |  2 +-
>  3 files changed, 38 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> index 0e03d816e8b9d..1d74ed9a28c41 100644
> --- a/include/linux/page-flags.h
> +++ b/include/linux/page-flags.h
> @@ -1178,6 +1178,17 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
>  	__clear_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags.f);
>  }
>
> +static __always_inline void ClearPagesAnonExclusive(struct page *page,
> +		unsigned int nr)

You're randomly moving between nr and nr_pages, can we just consistently use
nr_pages please.

> +{
> +	for (;;) {
> +		ClearPageAnonExclusive(page);
> +		if (--nr == 0)

You really require nr != 0 here, or otherwise you're going to be clearing 4
billion pages :)

> +			break;
> +		++page;
> +	}
> +}

Can we put this in mm.h or somewhere else please, and can we do away with this
HorribleNamingConvention, this is new, we can 'get away' with making it
something sensible :)

I wonder if we shouldn't also add a folio pointer here, and some
VM_WARN_ON_ONCE()'s. Like:

static inline void folio_clear_page_batch(struct folio *folio,
					  struct page *first_subpage,
					  unsigned int nr_pages)
{
	struct page *subpage = first_subpage;

	VM_WARN_ON_ONCE(!nr_pages);
	VM_WARN_ON_ONCE(... check first_subpage in folio ...);
	VM_WARN_ON_ONCE(... check first_subpage -> first_subpage + nr_pages in folio ...);

	while (nr_pages--)
		ClearPageAnonExclusive(subpage++);
}

> +
>  #ifdef CONFIG_MMU
>  #define __PG_MLOCKED		(1UL << PG_mlocked)
>  #else
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 1b7720c66ac87..7a67776dca3fe 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -712,9 +712,13 @@ static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
>  	VM_WARN_ON_FOLIO(!PageAnonExclusive(page), folio);
>  	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
>
> +	/* We only clear anon-exclusive from head page of PMD folio */

Is this accurate? David? I thought anon exclusive was per-subpage for any large
folio...?

If we're explicitly doing this for some reason here, then why introduce it?

> +	if (level == PGTABLE_LEVEL_PMD)
> +		nr_pages = 1;
> +
>  	/* device private folios cannot get pinned via GUP. */
>  	if (unlikely(folio_is_device_private(folio))) {
> -		ClearPageAnonExclusive(page);
> +		ClearPagesAnonExclusive(page, nr_pages);

I really kind of hate this 'we are looking at subpage X with variable page in
folio Y, but we don't mention Y' thing. It's super confusing that we have a
pointer to a thing which sometimes we deref and treat as a value we care about
and sometimes treat as an array.

This pattern exists throughout all the batched stuff and I kind of hate it
everywhere.

I guess the batching means that we are looking at a sub-folio range.

If C had a better type system we could somehow have a type that encoded this,
but it doesn't :>)

But I wonder if we shouldn't just go ahead and rename page -> pages and be
consistent about this?

>  		return 0;
>  	}
>
> @@ -766,7 +770,7 @@ static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
>
>  	if (unlikely(folio_maybe_dma_pinned(folio)))
>  		return -EBUSY;
> -	ClearPageAnonExclusive(page);
> +	ClearPagesAnonExclusive(page, nr_pages);
>
>  	/*
>  	 * This is conceptually a smp_wmb() paired with the smp_rmb() in
> @@ -804,6 +808,26 @@ static inline int folio_try_share_anon_rmap_pte(struct folio *folio,
>  	return __folio_try_share_anon_rmap(folio, page, 1, PGTABLE_LEVEL_PTE);
>  }
>
> +/**
> + * folio_try_share_anon_rmap_ptes - try marking exclusive anonymous pages
> + *				    mapped by PTEs possibly shared to prepare
> + *				   for KSM or temporary unmapping

This description is very confusing. 'Try marking exclusive anonymous pages
[... marking them as what?] mapped by PTEs[, or (]possibly shared[, or )] to
prepare for KSM[under what circumstances? Why mention KSM here?] or temporary
unmapping [why temporary?]

OK I think you mean to say 'marking' them as 'possibly' shared.

But really by 'shared' you mean clearing anon exclusive right? So maybe the
function name and description should reference that instead.

But this needs clarifying. This isn't an exercise in minimum number of words to
describe the function.

Ohhh now I see this is what the comment is in folio_try_share_anon_rmap_pte() :P

Well, I wish we could update the original too ;) but OK this is fine as-is to
match that then.

> + * @folio:	The folio to share a mapping of
> + * @page:	The first mapped exclusive page of the batch
> + * @nr_pages:	The number of pages to share (batch size)
> + *
> + * See folio_try_share_anon_rmap_pte for full description.
> + *
> + * Context: The caller needs to hold the page table lock and has to have the
> + * page table entries cleared/invalidated. Those PTEs used to map consecutive
> + * pages of the folio passed here. The PTEs are all in the same PMD and VMA.

Can we VM_WARN_ON_ONCE() any of this? Not completely a necessity.

> + */
> +static inline int folio_try_share_anon_rmap_ptes(struct folio *folio,
> +		struct page *page, unsigned int nr)
> +{
> +	return __folio_try_share_anon_rmap(folio, page, nr, PGTABLE_LEVEL_PTE);
> +}
> +
>  /**
>   * folio_try_share_anon_rmap_pmd - try marking an exclusive anonymous page
>   *				   range mapped by a PMD possibly shared to
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 42f6b00cced01..bba5b571946d8 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -2300,7 +2300,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>
>  			/* See folio_try_share_anon_rmap(): clear PTE first. */
>  			if (anon_exclusive &&
> -			    folio_try_share_anon_rmap_pte(folio, subpage)) {
> +			    folio_try_share_anon_rmap_ptes(folio, subpage, 1)) {

I guess this is because you intend to make this batched later with >1, but I
don't see the point of adding this since folio_try_share_anon_rmap_pte() already
does what you're doing here.

So why not just change this when you actually batch?

Buuuut.... haven't you already changed this whole function to now 'jump'
ahead if batched, so why are we only specifying nr_pages = 1 here?

Honestly I think this function needs to be fully refactored away from the
appalling giant-ball-of-string mess it is now before we try to add in batching
to be honest.

Let's not accumulate more technical debt.


>  				folio_put_swap(folio, subpage, 1);
>  				set_pte_at(mm, address, pvmw.pte, pteval);
>  				goto walk_abort;
> --
> 2.34.1
>

Thanks, Lorenzo
Re: [PATCH 8/9] mm/rmap: introduce folio_try_share_anon_rmap_ptes
Posted by Dev Jain 4 weeks, 1 day ago

On 10/03/26 3:08 pm, Lorenzo Stoakes (Oracle) wrote:
> On Tue, Mar 10, 2026 at 01:00:12PM +0530, Dev Jain wrote:
>> In the quest of enabling batched unmapping of anonymous folios, we need to
>> handle the sharing of exclusive pages. Hence, a batched version of
>> folio_try_share_anon_rmap_pte is required.
>>
>> Currently, the sole purpose of nr_pages in __folio_try_share_anon_rmap is
>> to do some rmap sanity checks. Add helpers to set and clear the
>> PageAnonExclusive bit on a batch of nr_pages. Note that
>> __folio_try_share_anon_rmap can receive nr_pages == HPAGE_PMD_NR from the
>> PMD path, but currently we only clear the bit on the head page. Retain this
>> behaviour by setting nr_pages = 1 in case the caller is
>> folio_try_share_anon_rmap_pmd.
>>
>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>> ---
>>  include/linux/page-flags.h | 11 +++++++++++
>>  include/linux/rmap.h       | 28 ++++++++++++++++++++++++++--
>>  mm/rmap.c                  |  2 +-
>>  3 files changed, 38 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>> index 0e03d816e8b9d..1d74ed9a28c41 100644
>> --- a/include/linux/page-flags.h
>> +++ b/include/linux/page-flags.h
>> @@ -1178,6 +1178,17 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
>>  	__clear_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags.f);
>>  }
>>
>> +static __always_inline void ClearPagesAnonExclusive(struct page *page,
>> +		unsigned int nr)
> 
> You're randomly moving between nr and nr_pages, can we just consistently use
> nr_pages please.

Okay.

> 
>> +{
>> +	for (;;) {
>> +		ClearPageAnonExclusive(page);
>> +		if (--nr == 0)
> 
> You really require nr != 0 here, or otherwise you're going to be clearing 4
> billion pages :)

I'm following the pattern in pgtable.h, see set_ptes,
clear_young_dirty_ptes, etc.

> 
>> +			break;
>> +		++page;
>> +	}
>> +}
> 
> Can we put this in mm.h or somewhere else please, and can we do away with this

What is the problem with page-flags.h? I am not very aware on which
function to put in which header file semantically, so please educate me on
this.

> HorribleNamingConvention, this is new, we can 'get away' with making it
> something sensible :)

I'll name it folio_clear_pages_anon_exclusive.


> 
> I wonder if we shouldn't also add a folio pointer here, and some
> VM_WARN_ON_ONCE()'s. Like:
> 
> static inline void folio_clear_page_batch(struct folio *folio,
> 					  struct page *first_subpage,
> 					  unsigned int nr_pages)
> {
> 	struct page *subpage = first_subpage;
> 
> 	VM_WARN_ON_ONCE(!nr_pages);
> 	VM_WARN_ON_ONCE(... check first_subpage in folio ...);
> 	VM_WARN_ON_ONCE(... check first_subpage -> first_subpage + nr_pages in folio ...);

I like what you are saying, but __folio_rmap_sanity_checks in the caller
checks exactly this :)

> 
> 	while (nr_pages--)
> 		ClearPageAnonExclusive(subpage++);
> }
> 
>> +
>>  #ifdef CONFIG_MMU
>>  #define __PG_MLOCKED		(1UL << PG_mlocked)
>>  #else
>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>> index 1b7720c66ac87..7a67776dca3fe 100644
>> --- a/include/linux/rmap.h
>> +++ b/include/linux/rmap.h
>> @@ -712,9 +712,13 @@ static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
>>  	VM_WARN_ON_FOLIO(!PageAnonExclusive(page), folio);
>>  	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
>>
>> +	/* We only clear anon-exclusive from head page of PMD folio */
> 
> Is this accurate? David? I thought anon exclusive was per-subpage for any large
> folio...?

The current behaviour is to do this only. I was also surprised with this,
so I had dug in and found out:

https://lore.kernel.org/all/20220428083441.37290-13-david@redhat.com/

where David says:

"Long story short: once
PTE-mapped, we have to track information about exclusivity per sub-page,
but until then, we can just track it for the compound page in the head
page and not having to update a whole bunch of subpages all of the time
for a simple PMD mapping of a THP."


> 
> If we're explicitly doing this for some reason here, then why introduce it?
> 
>> +	if (level == PGTABLE_LEVEL_PMD)
>> +		nr_pages = 1;
>> +
>>  	/* device private folios cannot get pinned via GUP. */
>>  	if (unlikely(folio_is_device_private(folio))) {
>> -		ClearPageAnonExclusive(page);
>> +		ClearPagesAnonExclusive(page, nr_pages);
> 
> I really kind of hate this 'we are looking at subpage X with variable page in
> folio Y, but we don't mention Y' thing. It's super confusing that we have a
> pointer to a thing which sometimes we deref and treat as a value we care about
> and sometimes treat as an array.
> 
> This pattern exists throughout all the batched stuff and I kind of hate it
> everywhere.
> 
> I guess the batching means that we are looking at a sub-folio range.
> 
> If C had a better type system we could somehow have a type that encoded this,
> but it doesn't :>)
> 
> But I wonder if we shouldn't just go ahead and rename page -> pages and be
> consistent about this?

Agreed. You are correct in saying that this function should receive struct
folio to assert that we are essentially in a page array, and some sanity
checking should happen. But the callers are already doing the checking
in folio_rmap_sanity_checks. Let me think on this.


> 
>>  		return 0;
>>  	}
>>
>> @@ -766,7 +770,7 @@ static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
>>
>>  	if (unlikely(folio_maybe_dma_pinned(folio)))
>>  		return -EBUSY;
>> -	ClearPageAnonExclusive(page);
>> +	ClearPagesAnonExclusive(page, nr_pages);
>>
>>  	/*
>>  	 * This is conceptually a smp_wmb() paired with the smp_rmb() in
>> @@ -804,6 +808,26 @@ static inline int folio_try_share_anon_rmap_pte(struct folio *folio,
>>  	return __folio_try_share_anon_rmap(folio, page, 1, PGTABLE_LEVEL_PTE);
>>  }
>>
>> +/**
>> + * folio_try_share_anon_rmap_ptes - try marking exclusive anonymous pages
>> + *				    mapped by PTEs possibly shared to prepare
>> + *				   for KSM or temporary unmapping
> 
> This description is very confusing. 'Try marking exclusive anonymous pages
> [... marking them as what?] mapped by PTEs[, or (]possibly shared[, or )] to
> prepare for KSM[under what circumstances? Why mention KSM here?] or temporary
> unmapping [why temporary?]
> 
> OK I think you mean to say 'marking' them as 'possibly' shared.
> 
> But really by 'shared' you mean clearing anon exclusive right? So maybe the
> function name and description should reference that instead.
> 
> But this needs clarifying. This isn't an exercise in minimum number of words to
> describe the function.
> 
> Ohhh now I see this is what the comment is in folio_try_share_anon_rmap_pte() :P
> 
> Well, I wish we could update the original too ;) but OK this is fine as-is to
> matc that then.
> 
>> + * @folio:	The folio to share a mapping of
>> + * @page:	The first mapped exclusive page of the batch
>> + * @nr_pages:	The number of pages to share (batch size)
>> + *
>> + * See folio_try_share_anon_rmap_pte for full description.
>> + *
>> + * Context: The caller needs to hold the page table lock and has to have the
>> + * page table entries cleared/invalidated. Those PTEs used to map consecutive
>> + * pages of the folio passed here. The PTEs are all in the same PMD and VMA.
> 
> Can we VM_WARN_ON_ONCE() any of this? Not completely a necessity.

Again, we have WARN checks in folio_rmap_sanity_checks, and even in
folio_pte_batch. I am afraid of duplication.

> 
>> + */
>> +static inline int folio_try_share_anon_rmap_ptes(struct folio *folio,
>> +		struct page *page, unsigned int nr)
>> +{
>> +	return __folio_try_share_anon_rmap(folio, page, nr, PGTABLE_LEVEL_PTE);
>> +}
>> +
>>  /**
>>   * folio_try_share_anon_rmap_pmd - try marking an exclusive anonymous page
>>   *				   range mapped by a PMD possibly shared to
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 42f6b00cced01..bba5b571946d8 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -2300,7 +2300,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>
>>  			/* See folio_try_share_anon_rmap(): clear PTE first. */
>>  			if (anon_exclusive &&
>> -			    folio_try_share_anon_rmap_pte(folio, subpage)) {
>> +			    folio_try_share_anon_rmap_ptes(folio, subpage, 1)) {
> 
> I guess this is because you intend to make this batched later with >1, but I
> don't see the point of adding this since folio_try_share_anon_rmap_pte() already
> does what you're doing here.
> 
> So why not just change this when you actually batch?

It is generally the consensus to introduce a new function along with its
caller. Although one may argue that the caller introduced here is not
a functional change (still passing nr_pages = 1). So I am fine doing
what you suggest.

> 
> > Buuuut.... haven't you already changed this whole function to now 'jump'
> ahead if batched, so why are we only specifying nr_pages = 1 here?

Because...please bear with the insanity :) currently we are in a ridiculous
situation where

nr_pages can be > 1 for file folios, and lazyfree folios, *and* it is
required that the VMA is non-uffd.

So, the "jump" thingy I was doing in the previous patch was adding support
for file folios, belonging to uffd VMAs (see pte_install_uffd_wp_if_needed,
we need to handle uffd-wp marker for file folios only, and also I should
have mentioned "file folio" in the subject line of that patch, of course
I missed that because reasoning through this code is very difficult)

To answer your question, currently for anon folios nr_pages == 1, so
the jump is a no-op.

When I had discovered the uffd-wp bug some weeks back, I was pushing
back against the idea of hacking around it by disabling batching
for uffd-VMAs in folio_unmap_pte_batch, but solve it then and there
properly. Now we have too many cases - first we added lazyfree support,
then file-non-uffd support, my patch 5 adds file-uffd support, and
last patch finally completes this with anon support.


> 
> Honestly I think this function needs to be fully refactored away from the
> appalling giant-ball-of-string mess it is now before we try to add in batching
> to be honest.
> 
> Let's not accumulate more technical debt.

I agree, I am happy to help in cleaning up this function.

> 
> 
>>  				folio_put_swap(folio, subpage, 1);
>>  				set_pte_at(mm, address, pvmw.pte, pteval);
>>  				goto walk_abort;
>> --
>> 2.34.1
>>
> 
> Thanks, Lorenzo
Re: [PATCH 8/9] mm/rmap: introduce folio_try_share_anon_rmap_ptes
Posted by Lorenzo Stoakes (Oracle) 2 weeks, 6 days ago
On Wed, Mar 11, 2026 at 01:39:25PM +0530, Dev Jain wrote:
>
>
> On 10/03/26 3:08 pm, Lorenzo Stoakes (Oracle) wrote:
> > On Tue, Mar 10, 2026 at 01:00:12PM +0530, Dev Jain wrote:
> >> In the quest of enabling batched unmapping of anonymous folios, we need to
> >> handle the sharing of exclusive pages. Hence, a batched version of
> >> folio_try_share_anon_rmap_pte is required.
> >>
> >> Currently, the sole purpose of nr_pages in __folio_try_share_anon_rmap is
> >> to do some rmap sanity checks. Add helpers to set and clear the
> >> PageAnonExclusive bit on a batch of nr_pages. Note that
> >> __folio_try_share_anon_rmap can receive nr_pages == HPAGE_PMD_NR from the
> >> PMD path, but currently we only clear the bit on the head page. Retain this
> >> behaviour by setting nr_pages = 1 in case the caller is
> >> folio_try_share_anon_rmap_pmd.
> >>
> >> Signed-off-by: Dev Jain <dev.jain@arm.com>
> >> ---
> >>  include/linux/page-flags.h | 11 +++++++++++
> >>  include/linux/rmap.h       | 28 ++++++++++++++++++++++++++--
> >>  mm/rmap.c                  |  2 +-
> >>  3 files changed, 38 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
> >> index 0e03d816e8b9d..1d74ed9a28c41 100644
> >> --- a/include/linux/page-flags.h
> >> +++ b/include/linux/page-flags.h
> >> @@ -1178,6 +1178,17 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
> >>  	__clear_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags.f);
> >>  }
> >>
> >> +static __always_inline void ClearPagesAnonExclusive(struct page *page,
> >> +		unsigned int nr)
> >
> > You're randomly moving between nr and nr_pages, can we just consistently use
> > nr_pages please.
>
> Okay.
>
> >
> >> +{
> >> +	for (;;) {
> >> +		ClearPageAnonExclusive(page);
> >> +		if (--nr == 0)
> >
> > You really require nr != 0 here, or otherwise you're going to be clearing 4
> > billion pages :)
>
> I'm following the pattern in pgtable.h, see set_ptes,
> clear_young_dirty_ptes, etc.
>
> >
> >> +			break;
> >> +		++page;
> >> +	}
> >> +}
> >
> > Can we put this in mm.h or somewhere else please, and can we do away with this
>
> What is the problem with page-flags.h? I am not very aware on which
> function to put in which header file semantically, so please educate me on
> this.

It's a mess in there and this doesn't really belong.

>
> > HorribleNamingConvention, this is new, we can 'get away' with making it
> > something sensible :)
>
> I'll name it folio_clear_pages_anon_exclusive.

OK

>
>
> >
> > I wonder if we shouldn't also add a folio pointer here, and some
> > VM_WARN_ON_ONCE()'s. Like:
> >
> > static inline void folio_clear_page_batch(struct folio *folio,
> > 					  struct page *first_subpage,
> > 					  unsigned int nr_pages)
> > {
> > 	struct page *subpage = first_subpage;
> >
> > 	VM_WARN_ON_ONCE(!nr_pages);
> > 	VM_WARN_ON_ONCE(... check first_subpage in folio ...);
> > 	VM_WARN_ON_ONCE(... check first_subpage -> first_subpage + nr_pages in folio ...);
>
> I like what you are saying, but __folio_rmap_sanity_checks in the caller
> checks exactly this :)

This is a shared function that can't be assumed to always be called from a
context where that has run. VM_WARN_ON_ONCE()'s are free in release kernels so I
don't think it should be such an issue.

>
> >
> > 	while (nr_pages--)
> > 		ClearPageAnonExclusive(subpage++);
> > }
> >
> >> +
> >>  #ifdef CONFIG_MMU
> >>  #define __PG_MLOCKED		(1UL << PG_mlocked)
> >>  #else
> >> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> >> index 1b7720c66ac87..7a67776dca3fe 100644
> >> --- a/include/linux/rmap.h
> >> +++ b/include/linux/rmap.h
> >> @@ -712,9 +712,13 @@ static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
> >>  	VM_WARN_ON_FOLIO(!PageAnonExclusive(page), folio);
> >>  	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
> >>
> >> +	/* We only clear anon-exclusive from head page of PMD folio */
> >
> > Is this accurate? David? I thought anon exclusive was per-subpage for any large
> > folio...?
>
> The current behaviour is to do this only. I was also surprised with this,
> so I had dug in and found out:
>
> https://lore.kernel.org/all/20220428083441.37290-13-david@redhat.com/
>
> where David says:
>
> "Long story short: once
> PTE-mapped, we have to track information about exclusivity per sub-page,
> but until then, we can just track it for the compound page in the head
> page and not having to update a whole bunch of subpages all of the time
> for a simple PMD mapping of a THP."

OK

>
>
> >
> > If we're explicitly doing this for some reason here, then why introduce it?
> >
> >> +	if (level == PGTABLE_LEVEL_PMD)
> >> +		nr_pages = 1;
> >> +
> >>  	/* device private folios cannot get pinned via GUP. */
> >>  	if (unlikely(folio_is_device_private(folio))) {
> >> -		ClearPageAnonExclusive(page);
> >> +		ClearPagesAnonExclusive(page, nr_pages);
> >
> > I really kind of hate this 'we are looking at subpage X with variable page in
> > folio Y, but we don't mention Y' thing. It's super confusing that we have a
> > pointer to a thing which sometimes we deref and treat as a value we care about
> > and sometimes treat as an array.
> >
> > This pattern exists throughout all the batched stuff and I kind of hate it
> > everywhere.
> >
> > I guess the batching means that we are looking at a sub-folio range.
> >
> > If C had a better type system we could somehow have a type that encoded this,
> > but it doesn't :>)
> >
> > But I wonder if we shouldn't just go ahead and rename page -> pages and be
> > consistent about this?
>
> Agreed. You are correct in saying that this function should receive struct
> folio to assert that we are essentially in a page array, and some sanity
> checking should happen. But the callers are already doing the checking
> in folio_rmap_sanity_checks. Let me think on this.

You treat that like a license to never put an assert anywhere else in mm... In
any case I'm not sure I mentioned an assert here, more so the pattern around
folio vs. subpages.

>
>
> >
> >>  		return 0;
> >>  	}
> >>
> >> @@ -766,7 +770,7 @@ static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
> >>
> >>  	if (unlikely(folio_maybe_dma_pinned(folio)))
> >>  		return -EBUSY;
> >> -	ClearPageAnonExclusive(page);
> >> +	ClearPagesAnonExclusive(page, nr_pages);
> >>
> >>  	/*
> >>  	 * This is conceptually a smp_wmb() paired with the smp_rmb() in
> >> @@ -804,6 +808,26 @@ static inline int folio_try_share_anon_rmap_pte(struct folio *folio,
> >>  	return __folio_try_share_anon_rmap(folio, page, 1, PGTABLE_LEVEL_PTE);
> >>  }
> >>
> >> +/**
> >> + * folio_try_share_anon_rmap_ptes - try marking exclusive anonymous pages
> >> + *				    mapped by PTEs possibly shared to prepare
> >> + *				   for KSM or temporary unmapping
> >
> > This description is very confusing. 'Try marking exclusive anonymous pages
> > [... marking them as what?] mapped by PTEs[, or (]possibly shared[, or )] to
> > prepare for KSM[under what circumstances? Why mention KSM here?] or temporary
> > unmapping [why temporary?]
> >
> > OK I think you mean to say 'marking' them as 'possibly' shared.
> >
> > But really by 'shared' you mean clearing anon exclusive right? So maybe the
> > function name and description should reference that instead.
> >
> > But this needs clarifying. This isn't an exercise in minimum number of words to
> > describe the function.
> >
> > Ohhh now I see this is what the comment is in folio_try_share_anon_rmap_pte() :P
> >
> > Well, I wish we could update the original too ;) but OK this is fine as-is to
> > match that then.
> >
> >> + * @folio:	The folio to share a mapping of
> >> + * @page:	The first mapped exclusive page of the batch
> >> + * @nr_pages:	The number of pages to share (batch size)
> >> + *
> >> + * See folio_try_share_anon_rmap_pte for full description.
> >> + *
> >> + * Context: The caller needs to hold the page table lock and has to have the
> >> + * page table entries cleared/invalidated. Those PTEs used to map consecutive
> >> + * pages of the folio passed here. The PTEs are all in the same PMD and VMA.
> >
> > Can we VM_WARN_ON_ONCE() any of this? Not completely a necessity.
>
> Again, we have WARN checks in folio_rmap_sanity_checks, and even in
> folio_pte_batch. I am afraid of duplication.

Why on earth are you bothering with 'context' then?

If having run folio_rmap_sanity_checks() means we never have to assert or think
about state or context again then why bother?

I looked in __folio_rmap_sanity_checks() and I see none of this checked.

Breaking it down:

Context: The caller needs to hold the page table lock and has to have the
         page table entries cleared/invalidated. Those PTEs used to map consecutive
         pages of the folio passed here. The PTEs are all in the same PMD and VMA.

- Page table lock <-- NOT checked
- Page table entries cleared/invalidated <-- NOT checked
- PTEs passed in must map all pages in the folio? <-- you aren't passing in PTEs?
- PTEs are all in the same PMD <- You aren't passing in PTEs?
- PTES are all in the same VMA <- You aren't passing in PTEs?

So that's hardly a coherent picture no?

>
> >
> >> + */
> >> +static inline int folio_try_share_anon_rmap_ptes(struct folio *folio,
> >> +		struct page *page, unsigned int nr)
> >> +{
> >> +	return __folio_try_share_anon_rmap(folio, page, nr, PGTABLE_LEVEL_PTE);
> >> +}
> >> +
> >>  /**
> >>   * folio_try_share_anon_rmap_pmd - try marking an exclusive anonymous page
> >>   *				   range mapped by a PMD possibly shared to
> >> diff --git a/mm/rmap.c b/mm/rmap.c
> >> index 42f6b00cced01..bba5b571946d8 100644
> >> --- a/mm/rmap.c
> >> +++ b/mm/rmap.c
> >> @@ -2300,7 +2300,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
> >>
> >>  			/* See folio_try_share_anon_rmap(): clear PTE first. */
> >>  			if (anon_exclusive &&
> >> -			    folio_try_share_anon_rmap_pte(folio, subpage)) {
> >> +			    folio_try_share_anon_rmap_ptes(folio, subpage, 1)) {
> >
> > I guess this is because you intend to make this batched later with >1, but I
> > don't see the point of adding this since folio_try_share_anon_rmap_pte() already
> > does what you're doing here.
> >
> > So why not just change this when you actually batch?
>
> It is generally the consensus to introduce a new function along with its
> caller. Although one may argue that the caller introduced here is not

That's not always the case, sometimes it makes more sense and is cleaner to
introduce it first.

All rules of thumb are open to sensible interpretation.

I'm not sure arbitrarily using the function in a way that makes no sense in the
code is a good approach but this isn't the end of the world.

> a functional change (still passing nr_pages = 1). So I am fine doing
> what you suggest.
>
> >
> > Buuuut.... haven't you already changed this whole function to now 'jump'
> > ahead if batched, so why are we only specifying nr_pages = 1 here?
>
> Because...please bear with the insanity :) currently we are in a ridiculous
> situation where
>
> nr_pages can be > 1 for file folios, and lazyfree folios, *and* it is
> required that the VMA is non-uffd.
>
> So, the "jump" thingy I was doing in the previous patch was adding support
> for file folios, belonging to uffd VMAs (see pte_install_uffd_wp_if_needed,
> we need to handle uffd-wp marker for file folios only, and also I should
> have mentioned "file folio" in the subject line of that patch, of course
> I missed that because reasoning through this code is very difficult)
>
> To answer your question, currently for anon folios nr_pages == 1, so
> the jump is a no-op.
>
> When I had discovered the uffd-wp bug some weeks back, I was pushing
> back against the idea of hacking around it by disabling batching
> for uffd-VMAs in folio_unmap_pte_batch, preferring to solve it then and there
> properly. Now we have too many cases - first we added lazyfree support,
> then file-non-uffd support, my patch 5 adds file-uffd support, and
> last patch finally completes this with anon support.

I am not a fan of any of that, this just speaks to the need to clean this up
before endlessly adding more functionality and piles of complexity and
confusion.

Let's just add some patches to do things sanely please.

>
>
> >
> > Honestly I think this function needs to be fully refactored away from the
> > appalling giant-ball-of-string mess it is now before we try to add in batching
> > to be honest.
> >
> > Let's not accumulate more technical debt.
>
> I agree, I am happy to help in cleaning up this function.

Right, well then this series needs to have some clean up patches in it first.

>
> >
> >
> >>  				folio_put_swap(folio, subpage, 1);
> >>  				set_pte_at(mm, address, pvmw.pte, pteval);
> >>  				goto walk_abort;
> >> --
> >> 2.34.1
> >>
> >
> > Thanks, Lorenzo
>

Thanks, Lorenzo
Re: [PATCH 8/9] mm/rmap: introduce folio_try_share_anon_rmap_ptes
Posted by Dev Jain 1 day, 1 hour ago

On 19/03/26 9:17 pm, Lorenzo Stoakes (Oracle) wrote:
> On Wed, Mar 11, 2026 at 01:39:25PM +0530, Dev Jain wrote:
>>
>>
>> On 10/03/26 3:08 pm, Lorenzo Stoakes (Oracle) wrote:
>>> On Tue, Mar 10, 2026 at 01:00:12PM +0530, Dev Jain wrote:
>>>> In the quest of enabling batched unmapping of anonymous folios, we need to
>>>> handle the sharing of exclusive pages. Hence, a batched version of
>>>> folio_try_share_anon_rmap_pte is required.
>>>>
>>>> Currently, the sole purpose of nr_pages in __folio_try_share_anon_rmap is
>>>> to do some rmap sanity checks. Add helpers to set and clear the
>>>> PageAnonExclusive bit on a batch of nr_pages. Note that
>>>> __folio_try_share_anon_rmap can receive nr_pages == HPAGE_PMD_NR from the
>>>> PMD path, but currently we only clear the bit on the head page. Retain this
>>>> behaviour by setting nr_pages = 1 in case the caller is
>>>> folio_try_share_anon_rmap_pmd.
>>>>
>>>> Signed-off-by: Dev Jain <dev.jain@arm.com>
>>>> ---
>>>>  include/linux/page-flags.h | 11 +++++++++++
>>>>  include/linux/rmap.h       | 28 ++++++++++++++++++++++++++--
>>>>  mm/rmap.c                  |  2 +-
>>>>  3 files changed, 38 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
>>>> index 0e03d816e8b9d..1d74ed9a28c41 100644
>>>> --- a/include/linux/page-flags.h
>>>> +++ b/include/linux/page-flags.h
>>>> @@ -1178,6 +1178,17 @@ static __always_inline void __ClearPageAnonExclusive(struct page *page)
>>>>  	__clear_bit(PG_anon_exclusive, &PF_ANY(page, 1)->flags.f);
>>>>  }
>>>>
>>>> +static __always_inline void ClearPagesAnonExclusive(struct page *page,
>>>> +		unsigned int nr)
>>>
>>> You're randomly moving between nr and nr_pages, can we just consistently use
>>> nr_pages please.
>>
>> Okay.
>>
>>>
>>>> +{
>>>> +	for (;;) {
>>>> +		ClearPageAnonExclusive(page);
>>>> +		if (--nr == 0)
>>>
>>> You really require nr != 0 here or otherwise you're going to be clearing 4
>>> billion pages :)
>>
>> I'm following the pattern in pgtable.h, see set_ptes,
>> clear_young_dirty_ptes, etc.
>>
>>>
>>>> +			break;
>>>> +		++page;
>>>> +	}
>>>> +}
>>>
>>> Can we put this in mm.h or somewhere else please, and can we do away with this
>>
>> What is the problem with page-flags.h? I am not very aware on which
>> function to put in which header file semantically, so please educate me on
>> this.
> 
> It's a mess in there and this doesn't really belong.
> 
>>
>>> HorribleNamingConvention, this is new, we can 'get away' with making it
>>> something sensible :)
>>
>> I'll name it folio_clear_pages_anon_exclusive.
> 
> OK
> 
>>
>>
>>>
>>> I wonder if we shouldn't also add a folio pointer here, and some
>>> VM_WARN_ON_ONCE()'s. Like:
>>>
>>> static inline void folio_clear_page_batch(struct folio *folio,
>>> 					  struct page *first_subpage,
>>> 					  unsigned int nr_pages)
>>> {
>>> 	struct page *subpage = first_subpage;
>>>
>>> 	VM_WARN_ON_ONCE(!nr_pages);
>>> 	VM_WARN_ON_ONCE(... check first_subpage in folio ...);
>>> 	VM_WARN_ON_ONCE(... check first_subpage -> first_subpage + nr_pages in folio ...);
>>
>> I like what you are saying, but __folio_rmap_sanity_checks in the caller
>> checks exactly this :)
> 
> This is a shared function that can't be assumed to always be called from a
> context where that has run. VM_WARN_ON_ONCE()'s are free in release kernels so I
> don't think it should be such an issue.
> 
>>
>>>
>>> 	while (nr_pages--)
>>> 		ClearPageAnonExclusive(subpage++);
>>> }
>>>
>>>> +
>>>>  #ifdef CONFIG_MMU
>>>>  #define __PG_MLOCKED		(1UL << PG_mlocked)
>>>>  #else
>>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>>> index 1b7720c66ac87..7a67776dca3fe 100644
>>>> --- a/include/linux/rmap.h
>>>> +++ b/include/linux/rmap.h
>>>> @@ -712,9 +712,13 @@ static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
>>>>  	VM_WARN_ON_FOLIO(!PageAnonExclusive(page), folio);
>>>>  	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
>>>>
>>>> +	/* We only clear anon-exclusive from head page of PMD folio */
>>>
>>> Is this accurate? David? I thought anon exclusive was per-subpage for any large
>>> folio...?
>>
>> The current behaviour is to do this only. I was also surprised with this,
>> so I had dug in and found out:
>>
>> https://lore.kernel.org/all/20220428083441.37290-13-david@redhat.com/
>>
>> where David says:
>>
>> "Long story short: once
>> PTE-mapped, we have to track information about exclusivity per sub-page,
>> but until then, we can just track it for the compound page in the head
>> page and not having to update a whole bunch of subpages all of the time
>> for a simple PMD mapping of a THP."
> 
> OK
> 
>>
>>
>>>
>>> If we're explicitly doing this for some reason here, then why introduce it?
>>>
>>>> +	if (level == PGTABLE_LEVEL_PMD)
>>>> +		nr_pages = 1;
>>>> +
>>>>  	/* device private folios cannot get pinned via GUP. */
>>>>  	if (unlikely(folio_is_device_private(folio))) {
>>>> -		ClearPageAnonExclusive(page);
>>>> +		ClearPagesAnonExclusive(page, nr_pages);
>>>
>>> I really kind of hate this 'we are looking at subpage X with variable page in
>>> folio Y, but we don't mention Y' thing. It's super confusing that we have a
>>> pointer to a thing which sometimes we deref and treat as a value we care about
>>> and sometimes treat as an array.
>>>
>>> This pattern exists throughout all the batched stuff and I kind of hate it
>>> everywhere.
>>>
>>> I guess the batching means that we are looking at a sub-folio range.
>>>
>>> If C had a better type system we could somehow have a type that encoded this,
>>> but it doesn't :>)
>>>
>>> But I wonder if we shouldn't just go ahead and rename page -> pages and be
>>> consistent about this?
>>
>> Agreed. You are correct in saying that this function should receive struct
>> folio to assert that we are essentially in a page array, and some sanity
>> checking should happen. But the callers are already doing the checking
>> in folio_rmap_sanity_checks. Let me think on this.

I have no idea why I used the word "caller" here. It's been some weeks so hopefully
my brain works more coherently after a vacation ...

See folio_add_anon_rmap_ptes, folio_remove_rmap_ptes, folio_try_dup_anon_rmap_ptes
etc, all of them eventually assert the constraints in the deeper functions -
__folio_add_rmap, __folio_remove_rmap, __folio_try_dup_anon_rmap - via __folio_rmap_sanity_checks.

So I meant to say that the batched function advertised to the rest of the mm code
eventually checks all of this.

Indeed my argument regarding "folio_pte_batch() in the caller already checks this"
is shaky as you correctly reason.

So, following the pattern in these functions, I can probably drop all of the
"Context: " part, or just mention that the page table lock needs to be held.

Regarding how to assert that [page, page + nr_pages) lies in the folio ...

Renaming page -> pages sounds somewhat confusing - usually when we mean
"an array of pages", we pass a double pointer.

I think the main thing we are missing to document here is that all the
batched functions present in the mm code have an unsaid prerequisite:
the caller *always* first goes through folio_pte_batch_flags()
(if I am wrong on this then please correct me). The caller cannot
construct an nr_pages and pass it to a batched function without first
checking whether those nr_pages are "uniform" in some bits, which is
why we first go through folio_pte_batch_flags().

So one alternate solution is to update kerneldoc of every batched helper
to say that the nr_pages passed here must be derived from folio_pte_batch_flags.

Another alternate I was thinking was to do something like

struct folio_subrange {
	struct folio *folio;
	struct page *first_page;
	unsigned long nr_pages;
};

and have a constructor, folio_subrange_init(), which does the necessary
checks. Then we change all batched functions to accept this. Not sure how
convincing this is.


> 
> You treat that like a license to never put an assert anywhere else in mm... In
> any case I'm not sure I mentioned an assert here, more so the pattern around
> folio vs. subpages.
> 
>>
>>
>>>
>>>>  		return 0;
>>>>  	}
>>>>
>>>> @@ -766,7 +770,7 @@ static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
>>>>
>>>>  	if (unlikely(folio_maybe_dma_pinned(folio)))
>>>>  		return -EBUSY;
>>>> -	ClearPageAnonExclusive(page);
>>>> +	ClearPagesAnonExclusive(page, nr_pages);
>>>>
>>>>  	/*
>>>>  	 * This is conceptually a smp_wmb() paired with the smp_rmb() in
>>>> @@ -804,6 +808,26 @@ static inline int folio_try_share_anon_rmap_pte(struct folio *folio,
>>>>  	return __folio_try_share_anon_rmap(folio, page, 1, PGTABLE_LEVEL_PTE);
>>>>  }
>>>>
>>>> +/**
>>>> + * folio_try_share_anon_rmap_ptes - try marking exclusive anonymous pages
>>>> + *				    mapped by PTEs possibly shared to prepare
>>>> + *				   for KSM or temporary unmapping
>>>
>>> This description is very confusing. 'Try marking exclusive anonymous pages
>>> [... marking them as what?] mapped by PTEs[, or (]possibly shared[, or )] to
>>> prepare for KSM[under what circumstances? Why mention KSM here?] or temporary
>>> unmapping [why temporary?]
>>>
>>> OK I think you mean to say 'marking' them as 'possibly' shared.
>>>
>>> But really by 'shared' you mean clearing anon exclusive right? So maybe the
>>> function name and description should reference that instead.
>>>
>>> But this needs clarifying. This isn't an exercise in minimum number of words to
>>> describe the function.
>>>
>>> Ohhh now I see this is what the comment is in folio_try_share_anon_rmap_pte() :P
>>>
>>> Well, I wish we could update the original too ;) but OK this is fine as-is to
>>> match that then.
>>>
>>>> + * @folio:	The folio to share a mapping of
>>>> + * @page:	The first mapped exclusive page of the batch
>>>> + * @nr_pages:	The number of pages to share (batch size)
>>>> + *
>>>> + * See folio_try_share_anon_rmap_pte for full description.
>>>> + *
>>>> + * Context: The caller needs to hold the page table lock and has to have the
>>>> + * page table entries cleared/invalidated. Those PTEs used to map consecutive
>>>> + * pages of the folio passed here. The PTEs are all in the same PMD and VMA.
>>>
>>> Can we VM_WARN_ON_ONCE() any of this? Not completely a necessity.
>>
>> Again, we have WARN checks in folio_rmap_sanity_checks, and even in
>> folio_pte_batch. I am afraid of duplication.
> 
> Why on earth are you bothering with 'context' then?
> 
> If having run folio_rmap_sanity_checks() means we never have to assert or think
> about state or context again then why bother?
> 
> I looked in __folio_rmap_sanity_checks() and I see no such checks.
> 
> Breaking it down:
> 
> Context: The caller needs to hold the page table lock and has to have the
>          page table entries cleared/invalidated. Those PTEs used to map consecutive
>          pages of the folio passed here. The PTEs are all in the same PMD and VMA.
> 
> - Page table lock <-- NOT checked
> - Page table entries cleared/invalidated <-- NOT checked
> - PTEs passed in must map all pages in the folio? <-- you aren't passing in PTEs?
> - PTEs are all in the same PMD <- You aren't passing in PTEs?
> - PTES are all in the same VMA <- You aren't passing in PTEs?
> 
> So that's hardly a coherent picture no?
> 
>>
>>>
>>>> + */
>>>> +static inline int folio_try_share_anon_rmap_ptes(struct folio *folio,
>>>> +		struct page *page, unsigned int nr)
>>>> +{
>>>> +	return __folio_try_share_anon_rmap(folio, page, nr, PGTABLE_LEVEL_PTE);
>>>> +}
>>>> +
>>>>  /**
>>>>   * folio_try_share_anon_rmap_pmd - try marking an exclusive anonymous page
>>>>   *				   range mapped by a PMD possibly shared to
>>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>>> index 42f6b00cced01..bba5b571946d8 100644
>>>> --- a/mm/rmap.c
>>>> +++ b/mm/rmap.c
>>>> @@ -2300,7 +2300,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>>>>
>>>>  			/* See folio_try_share_anon_rmap(): clear PTE first. */
>>>>  			if (anon_exclusive &&
>>>> -			    folio_try_share_anon_rmap_pte(folio, subpage)) {
>>>> +			    folio_try_share_anon_rmap_ptes(folio, subpage, 1)) {
>>>
>>> I guess this is because you intend to make this batched later with >1, but I
>>> don't see the point of adding this since folio_try_share_anon_rmap_pte() already
>>> does what you're doing here.
>>>
>>> So why not just change this when you actually batch?
>>
>> It is generally the consensus to introduce a new function along with its
>> caller. Although one may argue that the caller introduced here is not
> 
> That's not always the case, sometimes it makes more sense and is cleaner to
> introduce it first.
> 
> All rules of thumb are open to sensible interpretation.
> 
> I'm not sure arbitrarily using the function in a way that makes no sense in the
> code is a good approach but this isn't the end of the world.
> 
>> a functional change (still passing nr_pages = 1). So I am fine doing
>> what you suggest.
>>
>>>
>>> Buuuut.... haven't you already changed this whole function to now 'jump'
>>> ahead if batched, so why are we only specifying nr_pages = 1 here?
>>
>> Because...please bear with the insanity :) currently we are in a ridiculous
>> situation where
>>
>> nr_pages can be > 1 for file folios, and lazyfree folios, *and* it is
>> required that the VMA is non-uffd.
>>
>> So, the "jump" thingy I was doing in the previous patch was adding support
>> for file folios, belonging to uffd VMAs (see pte_install_uffd_wp_if_needed,
>> we need to handle uffd-wp marker for file folios only, and also I should
>> have mentioned "file folio" in the subject line of that patch, of course
>> I missed that because reasoning through this code is very difficult)
>>
>> To answer your question, currently for anon folios nr_pages == 1, so
>> the jump is a no-op.
>>
>> When I had discovered the uffd-wp bug some weeks back, I was pushing
>> back against the idea of hacking around it by disabling batching
>> for uffd-VMAs in folio_unmap_pte_batch, preferring to solve it then and there
>> properly. Now we have too many cases - first we added lazyfree support,
>> then file-non-uffd support, my patch 5 adds file-uffd support, and
>> last patch finally completes this with anon support.
> 
> I am not a fan of any of that, this just speaks to the need to clean this up
> before endlessly adding more functionality and piles of complexity and
> confusion.
> 
> Let's just add some patches to do things sanely please.
> 
>>
>>
>>>
>>> Honestly I think this function needs to be fully refactored away from the
>>> appalling giant-ball-of-string mess it is now before we try to add in batching
>>> to be honest.
>>>
>>> Let's not accumulate more technical debt.
>>
>> I agree, I am happy to help in cleaning up this function.
> 
> Right, well then this series needs to have some clean up patches in it first.
> 
>>
>>>
>>>
>>>>  				folio_put_swap(folio, subpage, 1);
>>>>  				set_pte_at(mm, address, pvmw.pte, pteval);
>>>>  				goto walk_abort;
>>>> --
>>>> 2.34.1
>>>>
>>>
>>> Thanks, Lorenzo
>>
> 
> Thanks, Lorenzo
>
Re: [PATCH 8/9] mm/rmap: introduce folio_try_share_anon_rmap_ptes
Posted by Wei Yang 3 weeks, 6 days ago
On Wed, Mar 11, 2026 at 01:39:25PM +0530, Dev Jain wrote:
>
[...]
>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>> index 1b7720c66ac87..7a67776dca3fe 100644
>>> --- a/include/linux/rmap.h
>>> +++ b/include/linux/rmap.h
>>> @@ -712,9 +712,13 @@ static __always_inline int __folio_try_share_anon_rmap(struct folio *folio,
>>>  	VM_WARN_ON_FOLIO(!PageAnonExclusive(page), folio);
>>>  	__folio_rmap_sanity_checks(folio, page, nr_pages, level);
>>>
>>> +	/* We only clear anon-exclusive from head page of PMD folio */
>> 
>> Is this accurate? David? I thought anon exclusive was per-subpage for any large
>> folio...?
>
>The current behaviour is to do this only. I was also surprised with this,
>so I had dug in and found out:
>
>https://lore.kernel.org/all/20220428083441.37290-13-david@redhat.com/
>
>where David says:
>
>"Long story short: once
>PTE-mapped, we have to track information about exclusivity per sub-page,
>but until then, we can just track it for the compound page in the head
>page and not having to update a whole bunch of subpages all of the time
>for a simple PMD mapping of a THP."
>

Thanks for digging.

One tiny thing:

Now we have a comment in PageAnonExclusive(), which says:

	/*
	 * HugeTLB stores this information on the head page; THP keeps it per
	 * page
	 */

This may confuse readers? Not your fault, just want to point it out.

-- 
Wei Yang
Help you, Help me