[PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()

Baolin Wang posted 2 patches 2 weeks, 6 days ago
There is a newer version of this series
[PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by Baolin Wang 2 weeks, 6 days ago
Currently, we no longer attempt to write back filesystem folios in pageout(),
and only tmpfs/shmem folios and anonymous swapcache folios can be written back.
Moreover, tmpfs/shmem and swapcache folios do not use the PG_private flag,
which means no fs-private private data is used. Therefore, we can remove the
redundant folio_test_private() checks and related buffer_head release logic.

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 mm/vmscan.c | 16 +---------------
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f1fc36729ddd..8056fccb9cc4 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -697,22 +697,8 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
 	 * swap_backing_dev_info is bust: it doesn't reflect the
 	 * congestion state of the swapdevs.  Easy to fix, if needed.
 	 */
-	if (!is_page_cache_freeable(folio))
+	if (!is_page_cache_freeable(folio) || !mapping)
 		return PAGE_KEEP;
-	if (!mapping) {
-		/*
-		 * Some data journaling orphaned folios can have
-		 * folio->mapping == NULL while being dirty with clean buffers.
-		 */
-		if (folio_test_private(folio)) {
-			if (try_to_free_buffers(folio)) {
-				folio_clear_dirty(folio);
-				pr_info("%s: orphaned folio\n", __func__);
-				return PAGE_CLEAN;
-			}
-		}
-		return PAGE_KEEP;
-	}
 
 	if (!shmem_mapping(mapping) && !folio_test_anon(folio))
 		return PAGE_ACTIVATE;
-- 
2.43.7
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by Shakeel Butt 2 weeks, 6 days ago
On Fri, Sep 12, 2025 at 11:45:07AM +0800, Baolin Wang wrote:
> Currently, we no longer attempt to write back filesystem folios in pageout(),
> and only tmpfs/shmem folios and anonymous swapcache folios can be written back.
> Moreover, tmpfs/shmem and swapcache folios do not use the PG_private flag,
> which means no fs-private private data is used. Therefore, we can remove the
> redundant folio_test_private() checks and related buffer_head release logic.
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/vmscan.c | 16 +---------------
>  1 file changed, 1 insertion(+), 15 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f1fc36729ddd..8056fccb9cc4 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -697,22 +697,8 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
>  	 * swap_backing_dev_info is bust: it doesn't reflect the
>  	 * congestion state of the swapdevs.  Easy to fix, if needed.
>  	 */
> -	if (!is_page_cache_freeable(folio))
> +	if (!is_page_cache_freeable(folio) || !mapping)
>  		return PAGE_KEEP;
> -	if (!mapping) {
> -		/*
> -		 * Some data journaling orphaned folios can have
> -		 * folio->mapping == NULL while being dirty with clean buffers.
> -		 */

Can this case not happen anymore and try_to_free_buffers is not needed?

> -		if (folio_test_private(folio)) {
> -			if (try_to_free_buffers(folio)) {
> -				folio_clear_dirty(folio);
> -				pr_info("%s: orphaned folio\n", __func__);
> -				return PAGE_CLEAN;
> -			}
> -		}
> -		return PAGE_KEEP;
> -	}
>  
>  	if (!shmem_mapping(mapping) && !folio_test_anon(folio))
>  		return PAGE_ACTIVATE;
> -- 
> 2.43.7
>
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by Baolin Wang 2 weeks, 5 days ago

On 2025/9/13 00:13, Shakeel Butt wrote:
> On Fri, Sep 12, 2025 at 11:45:07AM +0800, Baolin Wang wrote:
>> Currently, we no longer attempt to write back filesystem folios in pageout(),
>> and only tmpfs/shmem folios and anonymous swapcache folios can be written back.
>> Moreover, tmpfs/shmem and swapcache folios do not use the PG_private flag,
>> which means no fs-private private data is used. Therefore, we can remove the
>> redundant folio_test_private() checks and related buffer_head release logic.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   mm/vmscan.c | 16 +---------------
>>   1 file changed, 1 insertion(+), 15 deletions(-)
>>
>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>> index f1fc36729ddd..8056fccb9cc4 100644
>> --- a/mm/vmscan.c
>> +++ b/mm/vmscan.c
>> @@ -697,22 +697,8 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
>>   	 * swap_backing_dev_info is bust: it doesn't reflect the
>>   	 * congestion state of the swapdevs.  Easy to fix, if needed.
>>   	 */
>> -	if (!is_page_cache_freeable(folio))
>> +	if (!is_page_cache_freeable(folio) || !mapping)
>>   		return PAGE_KEEP;
>> -	if (!mapping) {
>> -		/*
>> -		 * Some data journaling orphaned folios can have
>> -		 * folio->mapping == NULL while being dirty with clean buffers.
>> -		 */
> 
> Can this case not happen anymore and try_to_free_buffers is not needed?

For dirty file folios, pageout() will return PAGE_KEEP and put them back 
on the LRU list. So even if mapping = NULL, background workers for 
writeback will continue to handle them, rather than in shrink_folio_list().

For clean file folios, the !mapping case will be be handled later in 
shrink_folio_list(), please see the following comments:

/*
  * If the folio has buffers, try to free the buffer
  * mappings associated with this folio. If we succeed
  * we try to free the folio as well.
  *
  * We do this even if the folio is dirty.
  * filemap_release_folio() does not perform I/O, but it
  * is possible for a folio to have the dirty flag set,
  * but it is actually clean (all its buffers are clean).
  * This happens if the buffers were written out directly,
  * with submit_bh(). ext3 will do this, as well as
  * the blockdev mapping.  filemap_release_folio() will
  * discover that cleanness and will drop the buffers
  * and mark the folio clean - it can be freed.
  *
  * Rarely, folios can have buffers and no ->mapping.
  * These are the folios which were not successfully
  * invalidated in truncate_cleanup_folio().  We try to
  * drop those buffers here and if that worked, and the
  * folio is no longer mapped into process address space
  * (refcount == 1) it can be freed.  Otherwise, leave
  * the folio on the LRU so it is swappable.
  */

>> -		if (folio_test_private(folio)) {
>> -			if (try_to_free_buffers(folio)) {
>> -				folio_clear_dirty(folio);
>> -				pr_info("%s: orphaned folio\n", __func__);
>> -				return PAGE_CLEAN;
>> -			}
>> -		}
>> -		return PAGE_KEEP;
>> -	}
>>   
>>   	if (!shmem_mapping(mapping) && !folio_test_anon(folio))
>>   		return PAGE_ACTIVATE;
>> -- 
>> 2.43.7
>>
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by Hugh Dickins 2 weeks, 2 days ago
On Sat, 13 Sep 2025, Baolin Wang wrote:
> On 2025/9/13 00:13, Shakeel Butt wrote:
> > On Fri, Sep 12, 2025 at 11:45:07AM +0800, Baolin Wang wrote:
> >> Currently, we no longer attempt to write back filesystem folios in
> >> pageout(),
> >> and only tmpfs/shmem folios and anonymous swapcache folios can be written
> >> back.
> >> Moreover, tmpfs/shmem and swapcache folios do not use the PG_private flag,
> >> which means no fs-private private data is used. Therefore, we can remove
> >> the
> >> redundant folio_test_private() checks and related buffer_head release
> >> logic.
> >>
> >> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> >> ---
> >>   mm/vmscan.c | 16 +---------------
> >>   1 file changed, 1 insertion(+), 15 deletions(-)
> >>
> >> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >> index f1fc36729ddd..8056fccb9cc4 100644
> >> --- a/mm/vmscan.c
> >> +++ b/mm/vmscan.c
> >> @@ -697,22 +697,8 @@ static pageout_t pageout(struct folio *folio, struct
> >> address_space *mapping,
> >>     * swap_backing_dev_info is bust: it doesn't reflect the
> >>     * congestion state of the swapdevs.  Easy to fix, if needed.
> >>     */
> >> -	if (!is_page_cache_freeable(folio))
> >> +	if (!is_page_cache_freeable(folio) || !mapping)
> >>   		return PAGE_KEEP;
> >> -	if (!mapping) {
> >> -		/*
> >> -		 * Some data journaling orphaned folios can have
> >> -		 * folio->mapping == NULL while being dirty with clean
> >> buffers.
> >> -		 */
> > 
> > Can this case not happen anymore and try_to_free_buffers is not needed?
> 
> For dirty file folios, pageout() will return PAGE_KEEP and put them back on
> the LRU list. So even if mapping = NULL, background workers for writeback will
> continue to handle them, rather than in shrink_folio_list().

You've persuaded everyone else, but I'm still not convinced:
what are those "background workers for writeback",
that manage to work on orphaned folios with NULL mapping?

I think *this* is the place which deals with that case, and you're
now proposing to remove it (and returning PAGE_KEEP not PAGE_CLEAN,
so it misses the filemap_release_folio() below the switch(pageout())).

There's even a comment over in migrate_folio_unmap():
"Everywhere else except page reclaim, the page is invisible to the vm".

And your argument that the code *afterwards* rejects everything but
shmem or anon, and neither of those would have folio_test_private(),
certainly did not convince me.

Please persuade me.  But I've no evidence that this case does or does
not still arise; and agree that there must be cleaner ways of doing it.

Hugh

> >> -		if (folio_test_private(folio)) {
> >> -			if (try_to_free_buffers(folio)) {
> >> -				folio_clear_dirty(folio);
> >> -				pr_info("%s: orphaned folio\n", __func__);
> >> -				return PAGE_CLEAN;
> >> -			}
> >> -		}
> >> -		return PAGE_KEEP;
> >> -	}
> >>   
> >>    if (!shmem_mapping(mapping) && !folio_test_anon(folio))
> >>   		return PAGE_ACTIVATE;
> >> -- 
> >> 2.43.7
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by Baolin Wang 2 weeks, 2 days ago

On 2025/9/16 12:00, Hugh Dickins wrote:
> On Sat, 13 Sep 2025, Baolin Wang wrote:
>> On 2025/9/13 00:13, Shakeel Butt wrote:
>>> On Fri, Sep 12, 2025 at 11:45:07AM +0800, Baolin Wang wrote:
>>>> Currently, we no longer attempt to write back filesystem folios in
>>>> pageout(),
>>>> and only tmpfs/shmem folios and anonymous swapcache folios can be written
>>>> back.
>>>> Moreover, tmpfs/shmem and swapcache folios do not use the PG_private flag,
>>>> which means no fs-private private data is used. Therefore, we can remove
>>>> the
>>>> redundant folio_test_private() checks and related buffer_head release
>>>> logic.
>>>>
>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> ---
>>>>    mm/vmscan.c | 16 +---------------
>>>>    1 file changed, 1 insertion(+), 15 deletions(-)
>>>>
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index f1fc36729ddd..8056fccb9cc4 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -697,22 +697,8 @@ static pageout_t pageout(struct folio *folio, struct
>>>> address_space *mapping,
>>>>      * swap_backing_dev_info is bust: it doesn't reflect the
>>>>      * congestion state of the swapdevs.  Easy to fix, if needed.
>>>>      */
>>>> -	if (!is_page_cache_freeable(folio))
>>>> +	if (!is_page_cache_freeable(folio) || !mapping)
>>>>    		return PAGE_KEEP;
>>>> -	if (!mapping) {
>>>> -		/*
>>>> -		 * Some data journaling orphaned folios can have
>>>> -		 * folio->mapping == NULL while being dirty with clean
>>>> buffers.
>>>> -		 */
>>>
>>> Can this case not happen anymore and try_to_free_buffers is not needed?
>>
>> For dirty file folios, pageout() will return PAGE_KEEP and put them back on
>> the LRU list. So even if mapping = NULL, background workers for writeback will
>> continue to handle them, rather than in shrink_folio_list().
> 
> You've persuaded everyone else, but I'm still not convinced:
> what are those "background workers for writeback",
> that manage to work on orphaned folios with NULL mapping?

Sorry for not being clear. The ‘background workers for writeback’ here 
refer to the workers responsible for handling the writeback of dirty 
data (see wb_workfn()).

> I think *this* is the place which deals with that case, and you're
> now proposing to remove it (and returning PAGE_KEEP not PAGE_CLEAN,
> so it misses the filemap_release_folio() below the switch(pageout())).
> 
> There's even a comment over in migrate_folio_unmap():
> "Everywhere else except page reclaim, the page is invisible to the vm".
> 
> And your argument that the code *afterwards* rejects everything but
> shmem or anon, and neither of those would have folio_test_private(),
> certainly did not convince me.
> 
> Please persuade me.  But I've no evidence that this case does or does
> not still arise; and agree that there must be cleaner ways of doing it.

I did some further analysis, and seems you are right. The flush worker 
does check the folio mapping when writeback, but it does not further 
release the private data, for example, in mpage_prepare_extent_to_map():

/*
  * If the page is no longer dirty, or its mapping no
  * longer corresponds to inode we are writing (which
  * means it has been truncated or invalidated), or the
  * page is already under writeback and we are not doing
  * a data integrity writeback, skip the page
  */
if (!folio_test_dirty(folio) ||
     (folio_test_writeback(folio) &&
      (mpd->wbc->sync_mode == WB_SYNC_NONE)) ||
     unlikely(folio->mapping != mapping)) {
	folio_unlock(folio);
	continue;
}

This is somewhat beyond my expectations. I expected the flush worker 
could handle such cases, allowing page reclaim to skip from handling 
dirty file folios to improve reclaim efficiency. Obviously, I overlooked 
this corner case.

Additionally, I'm still struggling to understand this case where a folio 
is dirty but has a NULL mapping, but I might understand that ext3 
journaling might do this from the comments in truncate_cleanup_folio().

But I still doubt whether this case exists because the refcount check in 
is_page_cache_freeable() considers the pagecache. This means if this 
dirty folio's mapping is NULL, the following check would return false. 
If it returns true, it means that even if we release the private data 
here, the orphaned folio's refcount still doesn't meet the requirements 
for being reclaimed. Please correct me if I missed anything.

static inline int is_page_cache_freeable(struct folio *folio)
{
         /*
          * A freeable page cache folio is referenced only by the caller
          * that isolated the folio, the page cache and optional filesystem
          * private data at folio->private.
          */
         return folio_ref_count(folio) - folio_test_private(folio) ==
                 1 + folio_nr_pages(folio);
}

Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by Baolin Wang 2 weeks, 1 day ago

On 2025/9/16 15:18, Baolin Wang wrote:
> 
> 
> On 2025/9/16 12:00, Hugh Dickins wrote:
>> On Sat, 13 Sep 2025, Baolin Wang wrote:
>>> On 2025/9/13 00:13, Shakeel Butt wrote:
>>>> On Fri, Sep 12, 2025 at 11:45:07AM +0800, Baolin Wang wrote:
>>>>> Currently, we no longer attempt to write back filesystem folios in
>>>>> pageout(),
>>>>> and only tmpfs/shmem folios and anonymous swapcache folios can be 
>>>>> written
>>>>> back.
>>>>> Moreover, tmpfs/shmem and swapcache folios do not use the 
>>>>> PG_private flag,
>>>>> which means no fs-private private data is used. Therefore, we can 
>>>>> remove
>>>>> the
>>>>> redundant folio_test_private() checks and related buffer_head release
>>>>> logic.
>>>>>
>>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>> ---
>>>>>    mm/vmscan.c | 16 +---------------
>>>>>    1 file changed, 1 insertion(+), 15 deletions(-)
>>>>>
>>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>>> index f1fc36729ddd..8056fccb9cc4 100644
>>>>> --- a/mm/vmscan.c
>>>>> +++ b/mm/vmscan.c
>>>>> @@ -697,22 +697,8 @@ static pageout_t pageout(struct folio *folio, 
>>>>> struct
>>>>> address_space *mapping,
>>>>>      * swap_backing_dev_info is bust: it doesn't reflect the
>>>>>      * congestion state of the swapdevs.  Easy to fix, if needed.
>>>>>      */
>>>>> -    if (!is_page_cache_freeable(folio))
>>>>> +    if (!is_page_cache_freeable(folio) || !mapping)
>>>>>            return PAGE_KEEP;
>>>>> -    if (!mapping) {
>>>>> -        /*
>>>>> -         * Some data journaling orphaned folios can have
>>>>> -         * folio->mapping == NULL while being dirty with clean
>>>>> buffers.
>>>>> -         */
>>>>
>>>> Can this case not happen anymore and try_to_free_buffers is not needed?
>>>
>>> For dirty file folios, pageout() will return PAGE_KEEP and put them 
>>> back on
>>> the LRU list. So even if mapping = NULL, background workers for 
>>> writeback will
>>> continue to handle them, rather than in shrink_folio_list().
>>
>> You've persuaded everyone else, but I'm still not convinced:
>> what are those "background workers for writeback",
>> that manage to work on orphaned folios with NULL mapping?
> 
> Sorry for not being clear. The ‘background workers for writeback’ here 
> refer to the workers responsible for handling the writeback of dirty 
> data (see wb_workfn()).
> 
>> I think *this* is the place which deals with that case, and you're
>> now proposing to remove it (and returning PAGE_KEEP not PAGE_CLEAN,
>> so it misses the filemap_release_folio() below the switch(pageout())).
>>
>> There's even a comment over in migrate_folio_unmap():
>> "Everywhere else except page reclaim, the page is invisible to the vm".
>>
>> And your argument that the code *afterwards* rejects everything but
>> shmem or anon, and neither of those would have folio_test_private(),
>> certainly did not convince me.
>>
>> Please persuade me.  But I've no evidence that this case does or does
>> not still arise; and agree that there must be cleaner ways of doing it.
> 
> I did some further analysis, and seems you are right. The flush worker 
> does check the folio mapping when writeback, but it does not further 
> release the private data, for example, in mpage_prepare_extent_to_map():
> 
> /*
>   * If the page is no longer dirty, or its mapping no
>   * longer corresponds to inode we are writing (which
>   * means it has been truncated or invalidated), or the
>   * page is already under writeback and we are not doing
>   * a data integrity writeback, skip the page
>   */
> if (!folio_test_dirty(folio) ||
>      (folio_test_writeback(folio) &&
>       (mpd->wbc->sync_mode == WB_SYNC_NONE)) ||
>      unlikely(folio->mapping != mapping)) {
>      folio_unlock(folio);
>      continue;
> }
> 
> This is somewhat beyond my expectations. I expected the flush worker 
> could handle such cases, allowing page reclaim to skip from handling 
> dirty file folios to improve reclaim efficiency. Obviously, I overlooked 
> this corner case.
> 
> Additionally, I'm still struggling to understand this case where a folio 
> is dirty but has a NULL mapping, but I might understand that ext3 
> journaling might do this from the comments in truncate_cleanup_folio().
> 
> But I still doubt whether this case exists because the refcount check in 
> is_page_cache_freeable() considers the pagecache. This means if this 
> dirty folio's mapping is NULL, the following check would return false. 
> If it returns true, it means that even if we release the private data 
> here, the orphaned folio's refcount still doesn't meet the requirements 
> for being reclaimed. Please correct me if I missed anything.
> 
> static inline int is_page_cache_freeable(struct folio *folio)
> {
>          /*
>           * A freeable page cache folio is referenced only by the caller
>           * that isolated the folio, the page cache and optional filesystem
>           * private data at folio->private.
>           */
>          return folio_ref_count(folio) - folio_test_private(folio) ==
>                  1 + folio_nr_pages(folio);
> }
> 

I continued to dig into the historical commits, where the private check 
was introduced in 2005 by commit ce91b575332b ("orphaned pagecache 
memleak fix"), as the commit message mentioned, it was to address the 
issue where reiserfs pagecache may be truncated while still pinned:

"
Chris found that with data journaling a reiserfs pagecache may be 
truncate while still pinned.  The truncation removes the page->mapping, 
but the page is still listed in the VM queues because it still has 
buffers.  Then during the journaling process, a buffer is marked dirty 
and that sets the PG_dirty bitflag as well (in mark_buffer_dirty). 
After that the page is leaked because it's both dirty and without a mapping.

So we must allow pages without mapping and dirty to reach the 
PagePrivate check.  The page->mapping will be checked again right after 
the PagePrivate check.
"

In 2008, commit a2b345642f530 ("Fix dirty page accounting leak with ext3 
data=journal") seems to be dealing with a similar issue, where the page 
becomes dirty after truncation, and provides a very useful call stack:
truncate_complete_page()
       cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
       do_invalidatepage()
         ext3_invalidatepage()
           journal_invalidatepage()
             journal_unmap_buffer()
               __dispose_buffer()
                 __journal_unfile_buffer()
                   __journal_temp_unlink_buffer()
                     mark_buffer_dirty(); // PG_dirty set, incr. dirty pages

In this fix, we forcefully clear the page's dirty flag during truncation 
(in truncate_complete_page()).

However, I am still unsure how the reiserfs case is checked through 
is_page_cache_freeable() (if the pagecache is truncated, then the 
pagecache refcount would be decreased). Fortunately, reiserfs was 
removed in 2024 by commit fb6f20ecb121 ("reiserfs: The last commit").
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by Hugh Dickins 2 weeks, 1 day ago
On Wed, 17 Sep 2025, Baolin Wang wrote:
> On 2025/9/16 15:18, Baolin Wang wrote:
...
> > 
> > Additionally, I'm still struggling to understand this case where a folio is
> > dirty but has a NULL mapping, but I might understand that ext3 journaling
> > might do this from the comments in truncate_cleanup_folio().
> > 
> > But I still doubt whether this case exists because the refcount check in
> > is_page_cache_freeable() considers the pagecache. This means if this dirty
> > folio's mapping is NULL, the following check would return false. If it
> > returns true, it means that even if we release the private data here, the
> > orphaned folio's refcount still doesn't meet the requirements for being
> > reclaimed. Please correct me if I missed anything.
> > 
> > static inline int is_page_cache_freeable(struct folio *folio)
> > {
> >          /*
> >           * A freeable page cache folio is referenced only by the caller
> >           * that isolated the folio, the page cache and optional filesystem
> >           * private data at folio->private.
> >           */
> >          return folio_ref_count(folio) - folio_test_private(folio) ==
> >                  1 + folio_nr_pages(folio);
> > }
> > 

Good point, yes, it's surprising that that such a folio could pass
that check and reach the code you're proposing to delete.

(Though a racing scanner of physical memory could raise the refcount
momentarily, causing the folio to look like a page cache freeable.)

> 
> I continued to dig into the historical commits, where the private check was
> introduced in 2005 by commit ce91b575332b ("orphaned pagecache memleak fix"),
> as the commit message mentioned, it was to address the issue where reiserfs
> pagecache may be truncated while still pinned:

Yes, I had been doing the same research, coming to that same 2.6.12 commit,
one of the last to go in before the birth of git.

> 
> "
> Chris found that with data journaling a reiserfs pagecache may be truncate
> while still pinned.  The truncation removes the page->mapping, but the page is
> still listed in the VM queues because it still has buffers.  Then during the
> journaling process, a buffer is marked dirty and that sets the PG_dirty
> bitflag as well (in mark_buffer_dirty). After that the page is leaked because
> it's both dirty and without a mapping.
> 
> So we must allow pages without mapping and dirty to reach the PagePrivate
> check.  The page->mapping will be checked again right after the PagePrivate
> check.
> "
> 
> In 2008, commit a2b345642f530 ("Fix dirty page accounting leak with ext3
> data=journal") seems to be dealing with a similar issue, where the page
> becomes dirty after truncation, and provides a very useful call stack:
> truncate_complete_page()
>       cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
>       do_invalidatepage()
>         ext3_invalidatepage()
>           journal_invalidatepage()
>             journal_unmap_buffer()
>               __dispose_buffer()
>                 __journal_unfile_buffer()
>                   __journal_temp_unlink_buffer()
>                     mark_buffer_dirty(); // PG_dirty set, incr. dirty pages
> 
> In this fix, we forcefully clear the page's dirty flag during truncation (in
> truncate_complete_page()).

But missed that one.

> 
> However, I am still unsure how the reiserfs case is checked through
> is_page_cache_freeable() (if the pagecache is truncated, then the pagecache
> refcount would be decreased). Fortunately, reiserfs was removed in 2024 by
> commit fb6f20ecb121 ("reiserfs: The last commit").

I did find a single report of the "pageout: orphaned page" message
(where Andrew claims the message as his forgotten temporary debugging):

https://lore.kernel.org/all/20061002170353.GA26816@king.bitgnome.net/

From 2006 on 2.6.18: and indeed it was on reiserfs - maybe reiserfs
had some extra refcounting on these pages, which caused them to pass
the is_page_cache_freeable() check (but would they actually be freeable,
or leaked? TBH I haven't tried to work that out, nor care very much).

Where does this leave us?  I think it says that your deletion of that
block from pageout() is acceptable now, with reiserfs gone to history.

Though somehow I would prefer, like that ext3 fix, that we would just
clear dirty on such a folio (to avoid "Bad page state" later if it is
freeable), not go to pageout(), but proceed to the folio_needs_release()
block like for clean folios.

But whatever: you've persuaded me! I withdraw my objection to your patch.

Thanks,
Hugh
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by Baolin Wang 2 weeks ago

On 2025/9/17 15:49, Hugh Dickins wrote:
> On Wed, 17 Sep 2025, Baolin Wang wrote:
>> On 2025/9/16 15:18, Baolin Wang wrote:
> ...
>>>
>>> Additionally, I'm still struggling to understand this case where a folio is
>>> dirty but has a NULL mapping, but I might understand that ext3 journaling
>>> might do this from the comments in truncate_cleanup_folio().
>>>
>>> But I still doubt whether this case exists because the refcount check in
>>> is_page_cache_freeable() considers the pagecache. This means if this dirty
>>> folio's mapping is NULL, the following check would return false. If it
>>> returns true, it means that even if we release the private data here, the
>>> orphaned folio's refcount still doesn't meet the requirements for being
>>> reclaimed. Please correct me if I missed anything.
>>>
>>> static inline int is_page_cache_freeable(struct folio *folio)
>>> {
>>>           /*
>>>            * A freeable page cache folio is referenced only by the caller
>>>            * that isolated the folio, the page cache and optional filesystem
>>>            * private data at folio->private.
>>>            */
>>>           return folio_ref_count(folio) - folio_test_private(folio) ==
>>>                   1 + folio_nr_pages(folio);
>>> }
>>>
> 
> Good point, yes, it's surprising that that such a folio could pass
> that check and reach the code you're proposing to delete.
> 
> (Though a racing scanner of physical memory could raise the refcount
> momentarily, causing the folio to look like a page cache freeable.)
> 
>>
>> I continued to dig into the historical commits, where the private check was
>> introduced in 2005 by commit ce91b575332b ("orphaned pagecache memleak fix"),
>> as the commit message mentioned, it was to address the issue where reiserfs
>> pagecache may be truncated while still pinned:
> 
> Yes, I had been doing the same research, coming to that same 2.6.12 commit,
> one of the last to go in before the birth of git.
> 
>>
>> "
>> Chris found that with data journaling a reiserfs pagecache may be truncate
>> while still pinned.  The truncation removes the page->mapping, but the page is
>> still listed in the VM queues because it still has buffers.  Then during the
>> journaling process, a buffer is marked dirty and that sets the PG_dirty
>> bitflag as well (in mark_buffer_dirty). After that the page is leaked because
>> it's both dirty and without a mapping.
>>
>> So we must allow pages without mapping and dirty to reach the PagePrivate
>> check.  The page->mapping will be checked again right after the PagePrivate
>> check.
>> "
>>
>> In 2008, commit a2b345642f530 ("Fix dirty page accounting leak with ext3
>> data=journal") seems to be dealing with a similar issue, where the page
>> becomes dirty after truncation, and provides a very useful call stack:
>> truncate_complete_page()
>>        cancel_dirty_page() // PG_dirty cleared, decr. dirty pages
>>        do_invalidatepage()
>>          ext3_invalidatepage()
>>            journal_invalidatepage()
>>              journal_unmap_buffer()
>>                __dispose_buffer()
>>                  __journal_unfile_buffer()
>>                    __journal_temp_unlink_buffer()
>>                      mark_buffer_dirty(); // PG_dirty set, incr. dirty pages
>>
>> In this fix, we forcefully clear the page's dirty flag during truncation (in
>> truncate_complete_page()).
> 
> But missed that one.
> 
>>
>> However, I am still unsure how the reiserfs case is checked through
>> is_page_cache_freeable() (if the pagecache is truncated, then the pagecache
>> refcount would be decreased). Fortunately, reiserfs was removed in 2024 by
>> commit fb6f20ecb121 ("reiserfs: The last commit").
> 
> I did find a single report of the "pageout: orphaned page" message
> (where Andrew claims the message as his forgotten temporary debugging):
> 
> https://lore.kernel.org/all/20061002170353.GA26816@king.bitgnome.net/
> 
>  From 2006 on 2.6.18: and indeed it was on reiserfs - maybe reiserfs
> had some extra refcounting on these pages, which caused them to pass
> the is_page_cache_freeable() check (but would they actually be freeable,
> or leaked? TBH I haven't tried to work that out, nor care very much).
> 
> Where does this leave us?  I think it says that your deletion of that
> block from pageout() is acceptable now, with reiserfs gone to history.
> 
> Though somehow I would prefer, like that ext3 fix, that we would just
> clear dirty on such a folio (to avoid "Bad page state" later if it is
> freeable), not go to pageout(), but proceed to the folio_needs_release()
> block like for clean folios.
> 
> But whatever: you've persuaded me! I withdraw my objection to your patch.

Thanks for confirming. I will update the commit message based on our 
discussion.
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by Shakeel Butt 2 weeks, 2 days ago
On Sat, Sep 13, 2025 at 11:24:35AM +0800, Baolin Wang wrote:
> 
> 
> On 2025/9/13 00:13, Shakeel Butt wrote:
> > On Fri, Sep 12, 2025 at 11:45:07AM +0800, Baolin Wang wrote:
> > > Currently, we no longer attempt to write back filesystem folios in pageout(),
> > > and only tmpfs/shmem folios and anonymous swapcache folios can be written back.
> > > Moreover, tmpfs/shmem and swapcache folios do not use the PG_private flag,
> > > which means no fs-private private data is used. Therefore, we can remove the
> > > redundant folio_test_private() checks and related buffer_head release logic.
> > > 
> > > Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> > > ---
> > >   mm/vmscan.c | 16 +---------------
> > >   1 file changed, 1 insertion(+), 15 deletions(-)
> > > 
> > > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > > index f1fc36729ddd..8056fccb9cc4 100644
> > > --- a/mm/vmscan.c
> > > +++ b/mm/vmscan.c
> > > @@ -697,22 +697,8 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> > >   	 * swap_backing_dev_info is bust: it doesn't reflect the
> > >   	 * congestion state of the swapdevs.  Easy to fix, if needed.
> > >   	 */
> > > -	if (!is_page_cache_freeable(folio))
> > > +	if (!is_page_cache_freeable(folio) || !mapping)
> > >   		return PAGE_KEEP;
> > > -	if (!mapping) {
> > > -		/*
> > > -		 * Some data journaling orphaned folios can have
> > > -		 * folio->mapping == NULL while being dirty with clean buffers.
> > > -		 */
> > 
> > Can this case not happen anymore and try_to_free_buffers is not needed?
> 
> For dirty file folios, pageout() will return PAGE_KEEP and put them back on
> the LRU list. So even if mapping = NULL, background workers for writeback
> will continue to handle them, rather than in shrink_folio_list().
> 
> For clean file folios, the !mapping case will be be handled later in
> shrink_folio_list(), please see the following comments:
> 
> /*
>  * If the folio has buffers, try to free the buffer
>  * mappings associated with this folio. If we succeed
>  * we try to free the folio as well.
>  *
>  * We do this even if the folio is dirty.
>  * filemap_release_folio() does not perform I/O, but it
>  * is possible for a folio to have the dirty flag set,
>  * but it is actually clean (all its buffers are clean).
>  * This happens if the buffers were written out directly,
>  * with submit_bh(). ext3 will do this, as well as
>  * the blockdev mapping.  filemap_release_folio() will
>  * discover that cleanness and will drop the buffers
>  * and mark the folio clean - it can be freed.
>  *
>  * Rarely, folios can have buffers and no ->mapping.
>  * These are the folios which were not successfully
>  * invalidated in truncate_cleanup_folio().  We try to
>  * drop those buffers here and if that worked, and the
>  * folio is no longer mapped into process address space
>  * (refcount == 1) it can be freed.  Otherwise, leave
>  * the folio on the LRU so it is swappable.
>  */
> 

Thanks a lot for the explanation, yes this makes sense.

With that, please add:

Acked-by: Shakeel Butt <shakeel.butt@linux.dev>
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by Matthew Wilcox 2 weeks, 6 days ago
On Fri, Sep 12, 2025 at 11:45:07AM +0800, Baolin Wang wrote:
> @@ -697,22 +697,8 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
>  	 * swap_backing_dev_info is bust: it doesn't reflect the
>  	 * congestion state of the swapdevs.  Easy to fix, if needed.
>  	 */
> -	if (!is_page_cache_freeable(folio))
> +	if (!is_page_cache_freeable(folio) || !mapping)
>  		return PAGE_KEEP;

I feel like we need to keep the comment (assuming it's still true ...
which it probably is, although there's nobody who would think to update
this comment if it became no longer true).  I would certainly wonder why
we can have this !mapping test.

> -		/*
> -		 * Some data journaling orphaned folios can have
> -		 * folio->mapping == NULL while being dirty with clean buffers.
> -		 */

I approve of this simplification, and I think there's more work to be
done in this area.  Thanks for doing it.
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by Baolin Wang 2 weeks, 5 days ago

On 2025/9/12 23:21, Matthew Wilcox wrote:
> On Fri, Sep 12, 2025 at 11:45:07AM +0800, Baolin Wang wrote:
>> @@ -697,22 +697,8 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
>>   	 * swap_backing_dev_info is bust: it doesn't reflect the
>>   	 * congestion state of the swapdevs.  Easy to fix, if needed.
>>   	 */
>> -	if (!is_page_cache_freeable(folio))
>> +	if (!is_page_cache_freeable(folio) || !mapping)
>>   		return PAGE_KEEP;
> 
> I feel like we need to keep the comment (assuming it's still true ...
> which it probably is, although there's nobody who would think to update
> this comment if it became no longer true).  I would certainly wonder why
> we can have this !mapping test.

I think the !mapping check is still needed here because the tmpfs/shmem 
folios truncation might race with folio reclamation, see shmem_undo_range().

Additionally, the comments here are no longer related to tmpfs/shmem and 
swapcache folios, so I still prefer to remove them.

>> -		/*
>> -		 * Some data journaling orphaned folios can have
>> -		 * folio->mapping == NULL while being dirty with clean buffers.
>> -		 */
> 
> I approve of this simplification, and I think there's more work to be
> done in this area.  Thanks for doing it.

Thanks for taking a look. Yes, I will continue to take a closer look at 
the further work here.
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by Matthew Wilcox 2 weeks, 2 days ago
On Sat, Sep 13, 2025 at 11:04:48AM +0800, Baolin Wang wrote:
> On 2025/9/12 23:21, Matthew Wilcox wrote:
> > On Fri, Sep 12, 2025 at 11:45:07AM +0800, Baolin Wang wrote:
> > > @@ -697,22 +697,8 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
> > >   	 * swap_backing_dev_info is bust: it doesn't reflect the
> > >   	 * congestion state of the swapdevs.  Easy to fix, if needed.
> > >   	 */
> > > -	if (!is_page_cache_freeable(folio))
> > > +	if (!is_page_cache_freeable(folio) || !mapping)
> > >   		return PAGE_KEEP;
> > 
> > I feel like we need to keep the comment (assuming it's still true ...
> > which it probably is, although there's nobody who would think to update
> > this comment if it became no longer true).  I would certainly wonder why
> > we can have this !mapping test.
> 
> I think the !mapping check is still needed here because the tmpfs/shmem
> folios truncation might race with folio reclamation, see shmem_undo_range().

I agree that we still need the !mapping check.  But it needs this comment
that you're deleting, because it's not obvious why we'd have a dirty
folio with a NULL mapping on the LRU list.

> > > -		/*
> > > -		 * Some data journaling orphaned folios can have
> > > -		 * folio->mapping == NULL while being dirty with clean buffers.
> > > -		 */
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by Baolin Wang 2 weeks ago

On 2025/9/16 04:47, Matthew Wilcox wrote:
> On Sat, Sep 13, 2025 at 11:04:48AM +0800, Baolin Wang wrote:
>> On 2025/9/12 23:21, Matthew Wilcox wrote:
>>> On Fri, Sep 12, 2025 at 11:45:07AM +0800, Baolin Wang wrote:
>>>> @@ -697,22 +697,8 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
>>>>    	 * swap_backing_dev_info is bust: it doesn't reflect the
>>>>    	 * congestion state of the swapdevs.  Easy to fix, if needed.
>>>>    	 */
>>>> -	if (!is_page_cache_freeable(folio))
>>>> +	if (!is_page_cache_freeable(folio) || !mapping)
>>>>    		return PAGE_KEEP;
>>>
>>> I feel like we need to keep the comment (assuming it's still true ...
>>> which it probably is, although there's nobody who would think to update
>>> this comment if it became no longer true).  I would certainly wonder why
>>> we can have this !mapping test.
>>
>> I think the !mapping check is still needed here because the tmpfs/shmem
>> folios truncation might race with folio reclamation, see shmem_undo_range().

Sorry for noise. tmpfs/shmem folios will be clean after calling 
truncate_inode_folio().

> I agree that we still need the !mapping check.  But it needs this comment
> that you're deleting, because it's not obvious why we'd have a dirty
> folio with a NULL mapping on the LRU list.

As I discussed with Hugh in another thread[1], the issue of a folio 
being dirty but having a NULL mapping was fixed by commit a2b345642f530 
("Fix dirty page accounting leak with ext3 data=journal"). I can hardly 
believe this kind of situation can still occur nowadays. Anyway, let me 
leave a warning comment here.

[1] 
https://lore.kernel.org/all/1111883c-974f-e4da-a38f-bb3d337185ad@google.com/

>>>> -		/*
>>>> -		 * Some data journaling orphaned folios can have
>>>> -		 * folio->mapping == NULL while being dirty with clean buffers.
>>>> -		 */
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by David Hildenbrand 2 weeks, 6 days ago
On 12.09.25 05:45, Baolin Wang wrote:
> Currently, we no longer attempt to write back filesystem folios in pageout(),
> and only tmpfs/shmem folios and anonymous swapcache folios can be written back.
> Moreover, tmpfs/shmem and swapcache folios do not use the PG_private flag,
> which means no fs-private private data is used. Therefore, we can remove the
> redundant folio_test_private() checks and related buffer_head release logic.
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers

David / dhildenb
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by David Hildenbrand 2 weeks, 6 days ago
On 12.09.25 05:45, Baolin Wang wrote:
> Currently, we no longer attempt to write back filesystem folios in pageout(),
> and only tmpfs/shmem folios and anonymous swapcache folios can be written back.

Can you point me at the code where that is fenced off?

I can spot a folio_is_file_lru() check before we call it, but the 
description tells me that there are indeed ways we could still pass that 
check for file-lru folios if we are kswapd.

> Moreover, tmpfs/shmem and swapcache folios do not use the PG_private flag,
> which means no fs-private private data is used. Therefore, we can remove the
> redundant folio_test_private() checks and related buffer_head release logic.

If that's indeed the case, do we still need the folio_test_private() 
check in is_page_cache_freeable()?

> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>   mm/vmscan.c | 16 +---------------
>   1 file changed, 1 insertion(+), 15 deletions(-)
> 
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f1fc36729ddd..8056fccb9cc4 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -697,22 +697,8 @@ static pageout_t pageout(struct folio *folio, struct address_space *mapping,
>   	 * swap_backing_dev_info is bust: it doesn't reflect the
>   	 * congestion state of the swapdevs.  Easy to fix, if needed.
>   	 */
> -	if (!is_page_cache_freeable(folio))
> +	if (!is_page_cache_freeable(folio) || !mapping)
>   		return PAGE_KEEP;
> -	if (!mapping) {
> -		/*
> -		 * Some data journaling orphaned folios can have
> -		 * folio->mapping == NULL while being dirty with clean buffers.
> -		 */
> -		if (folio_test_private(folio)) {
> -			if (try_to_free_buffers(folio)) {
> -				folio_clear_dirty(folio);
> -				pr_info("%s: orphaned folio\n", __func__);
> -				return PAGE_CLEAN;
> -			}
> -		}
> -		return PAGE_KEEP;
> -	}
>   
>   	if (!shmem_mapping(mapping) && !folio_test_anon(folio))
>   		return PAGE_ACTIVATE;


-- 
Cheers

David / dhildenb
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by David Hildenbrand 2 weeks, 6 days ago
On 12.09.25 10:24, David Hildenbrand wrote:
> On 12.09.25 05:45, Baolin Wang wrote:
>> Currently, we no longer attempt to write back filesystem folios in pageout(),
>> and only tmpfs/shmem folios and anonymous swapcache folios can be written back.
> 
> Can you point me at the code where that is fenced off?
> 
> I can spot a folio_is_file_lru() check before we call it, but the
> description tells me that there are indeed ways we could still pass that
> check for file-lru folios if we are kswapd.
> 
>> Moreover, tmpfs/shmem and swapcache folios do not use the PG_private flag,
>> which means no fs-private private data is used. Therefore, we can remove the
>> redundant folio_test_private() checks and related buffer_head release logic.
> 
> If that's indeed the case, do we still need the folio_test_private()
> check in is_page_cache_freeable()?

Ah, that's patch #2 :)

-- 
Cheers

David / dhildenb
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by Baolin Wang 2 weeks, 6 days ago

On 2025/9/12 16:24, David Hildenbrand wrote:
> On 12.09.25 10:24, David Hildenbrand wrote:
>> On 12.09.25 05:45, Baolin Wang wrote:
>>> Currently, we no longer attempt to write back filesystem folios in 
>>> pageout(),
>>> and only tmpfs/shmem folios and anonymous swapcache folios can be 
>>> written back.
>>
>> Can you point me at the code where that is fenced off?

Please see the following check in pageout():

if (!shmem_mapping(mapping) && !folio_test_anon(folio))
	return PAGE_ACTIVATE;

>> I can spot a folio_is_file_lru() check before we call it, but the
>> description tells me that there are indeed ways we could still pass that
>> check for file-lru folios if we are kswapd.

Yes, but this also needs further cleanup, as kswapd also cannot reclaim 
filesystem dirty folios in pageout(). I plan to continue optimizing 
dirty file folios in isolate_lru_folios() to avoid some unnecessary scans.

>>> Moreover, tmpfs/shmem and swapcache folios do not use the PG_private 
>>> flag,
>>> which means no fs-private private data is used. Therefore, we can 
>>> remove the
>>> redundant folio_test_private() checks and related buffer_head release 
>>> logic.
>>
>> If that's indeed the case, do we still need the folio_test_private()
>> check in is_page_cache_freeable()?
> 
> Ah, that's patch #2 :)
>
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by David Hildenbrand 2 weeks, 6 days ago
On 12.09.25 10:31, Baolin Wang wrote:
> 
> 
> On 2025/9/12 16:24, David Hildenbrand wrote:
>> On 12.09.25 10:24, David Hildenbrand wrote:
>>> On 12.09.25 05:45, Baolin Wang wrote:
>>>> Currently, we no longer attempt to write back filesystem folios in
>>>> pageout(),
>>>> and only tmpfs/shmem folios and anonymous swapcache folios can be
>>>> written back.
>>>
>>> Can you point me at the code where that is fenced off?
> 
> Please see the following check in pageout():
> 
> if (!shmem_mapping(mapping) && !folio_test_anon(folio))
> 	return PAGE_ACTIVATE;
> 

Oh! I was assuming that we had an earlier check for that, not a check 
afterwards. It would be worth spelling that out in the patch description.

Makes sense, thanks!

-- 
Cheers

David / dhildenb
Re: [PATCH 1/2] mm: vmscan: remove folio_test_private() check in pageout()
Posted by Baolin Wang 2 weeks, 6 days ago

On 2025/9/12 16:57, David Hildenbrand wrote:
> On 12.09.25 10:31, Baolin Wang wrote:
>>
>>
>> On 2025/9/12 16:24, David Hildenbrand wrote:
>>> On 12.09.25 10:24, David Hildenbrand wrote:
>>>> On 12.09.25 05:45, Baolin Wang wrote:
>>>>> Currently, we no longer attempt to write back filesystem folios in
>>>>> pageout(),
>>>>> and only tmpfs/shmem folios and anonymous swapcache folios can be
>>>>> written back.
>>>>
>>>> Can you point me at the code where that is fenced off?
>>
>> Please see the following check in pageout():
>>
>> if (!shmem_mapping(mapping) && !folio_test_anon(folio))
>>     return PAGE_ACTIVATE;
>>
> 
> Oh! I was assuming that we had an earlier check for that, not a check 
> afterwards. It would be worth spelling that out in the patch description.

Sure. I'll mention that if I need a respin. Thanks for reviewing.