Users of filemap_lock_folio() need to guard against the situation where
release_folio() has been invoked during reclaim but the folio was
ultimately not removed from the page cache. This patch covers one location
that was overlooked. The affected code has changed as of 6.17, so this
patch only targets stable trees prior to that.
After acquiring the folio, use set_folio_extent_mapped() to ensure the
folio private state is valid. This is especially important in the subpage
case, where the private field is an allocated struct containing bitmap and
lock data.
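As a sketch of the guard pattern (this mirrors the hunk below; mapping,
index and the surrounding locals are placeholders, not the exact code):

	folio = filemap_lock_folio(mapping, index);
	if (!IS_ERR(folio)) {
		/* Re-attach folio->private in case release_folio() dropped it. */
		ret = set_folio_extent_mapped(folio);
		if (ret) {
			folio_unlock(folio);
			folio_put(folio);
			return ret;
		}
		/* Only now is it safe to use subpage helpers that read folio->private. */
		...
	}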
Without this protection, the race below is possible:
[mm] page cache reclaim path              [fs] relocation in subpage mode
shrink_folio_list()
  folio_trylock() /* lock acquired */
  filemap_release_folio()
    mapping->a_ops->release_folio()
      btrfs_release_folio()
        __btrfs_release_folio()
          clear_folio_extent_mapped()
            btrfs_detach_folio_state()
              bfs = folio_detach_private(folio)
              btrfs_free_folio_state(folio)
                kfree(bfs) /* point A */

                                          prealloc_file_extent_cluster()
                                            filemap_lock_folio()
                                              folio_try_get() /* inc refcount */
                                              folio_lock() /* wait for lock */

  if (...)
    ...
  else if (!mapping || !__remove_mapping(..))
    /*
     * __remove_mapping() returns zero when
     * folio_ref_freeze(folio, refcount) fails /* point B */
     */
    goto keep_locked /* folio remains in cache */

keep_locked:
  folio_unlock(folio) /* lock released */

                                              /* lock acquired */
                                              btrfs_subpage_clear_uptodate()
                                              bfs = folio->priv /* use-after-free */
This patch is intended as a minimal fix for backporting to affected
kernels. As of 6.17, commit [0] replaced the vulnerable
filemap_lock_folio() + btrfs_subpage_clear_uptodate() sequence with
filemap_invalidate_inode(), avoiding the race entirely. That commit was
part of a series with a different goal of preparing for large folio
support, so backporting it may not be straightforward.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Fixes: 9d9ea1e68a05 ("btrfs: subpage: fix relocation potentially overwriting last page data")
[0] 4e346baee95f ("btrfs: reloc: unconditionally invalidate the page cache for each cluster")
---
fs/btrfs/relocation.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 0d5a3846811a..040e8f28b200 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2811,6 +2811,20 @@ static noinline_for_stack int prealloc_file_extent_cluster(struct reloc_control
* will re-read the whole page anyway.
*/
if (!IS_ERR(folio)) {
+ /*
+ * release_folio() could have cleared the folio private data
+ * while we were not holding the lock.
+ * Reset the mapping if needed so subpage operations can access
+ * a valid private folio state.
+ */
+ ret = set_folio_extent_mapped(folio);
+ if (ret) {
+ folio_unlock(folio);
+ folio_put(folio);
+
+ return ret;
+ }
+
btrfs_subpage_clear_uptodate(fs_info, folio, i_size,
round_up(i_size, PAGE_SIZE) - i_size);
folio_unlock(folio);
--
2.52.0
On 2026/2/1 05:23, JP Kobryn wrote:
> Users of filemap_lock_folio() need to guard against the situation where
> release_folio() has been invoked during reclaim but the folio was
> ultimately not removed from the page cache. This patch covers one location
> that was overlooked. Affected code has changed as of 6.17, so this patch is
> only targeting stable trees prior.
>
> After acquiring the folio, use set_folio_extent_mapped() to ensure the
> folio private state is valid. This is especially important in the subpage
> case, where the private field is an allocated struct containing bitmap and
> lock data.
>
> Without this protection, the race below is possible:
>
> [mm] page cache reclaim path [fs] relocation in subpage mode
> shrink_folio_list()
> folio_trylock() /* lock acquired */
> filemap_release_folio()
> mapping->a_ops->release_folio()
> btrfs_release_folio()
> __btrfs_release_folio()
> clear_folio_extent_mapped()
> btrfs_detach_folio_state()
> bfs = folio_detach_private(folio)
> btrfs_free_folio_state(folio)
> kfree(bfs) /* point A */
>
> prealloc_file_extent_cluster()
> filemap_lock_folio()
> folio_try_get() /* inc refcount */
> folio_lock() /* wait for lock */
>
> if (...)
> ...
> else if (!mapping || !__remove_mapping(..))
> /*
> * __remove_mapping() returns zero when
> * folio_ref_freeze(folio, refcount) fails /* point B */
> */
> goto keep_locked /* folio remains in cache */
>
> keep_locked:
> folio_unlock(folio) /* lock released */
>
> /* lock acquired */
> btrfs_subpage_clear_updodate()
> bfs = folio->priv /* use-after-free */
This patch itself and the root cause look good to me.
Reviewed-by: Qu Wenruo <wqu@suse.com>
>
> This patch is intended as a minimal fix for backporting to affected
> kernels. As of 6.17, a commit [0] replaced the vulnerable
> filemap_lock_folio() + btrfs_subpage_clear_uptodate() sequence with
> filemap_invalidate_inode() avoiding the race entirely. That commit was part
> of a series with a different goal of preparing for large folio support so
> backporting may not be straight forward.
However I'm not sure if the stable tree even accepts non-upstreamed patches.
Thus the stable maintainer may ask you the same question as I did
before, why not backport the upstream commit 4e346baee95f?
If it's lacking the reason why it's a bug fix, I believe you can modify
the commit message to include the analysis and the Fixes tag.
I'm also curious to learn the proper way for such a situation.
Thanks,
Qu
>
> Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
> Fixes: 9d9ea1e68a05 ("btrfs: subpage: fix relocation potentially overwriting last page data")
>
> [0] 4e346baee95f ("btrfs: reloc: unconditionally invalidate the page cache for each cluster")
> ---
> fs/btrfs/relocation.c | 14 ++++++++++++++
> 1 file changed, 14 insertions(+)
>
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index 0d5a3846811a..040e8f28b200 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -2811,6 +2811,20 @@ static noinline_for_stack int prealloc_file_extent_cluster(struct reloc_control
> * will re-read the whole page anyway.
> */
> if (!IS_ERR(folio)) {
> + /*
> + * release_folio() could have cleared the folio private data
> + * while we were not holding the lock.
> + * Reset the mapping if needed so subpage operations can access
> + * a valid private folio state.
> + */
> + ret = set_folio_extent_mapped(folio);
> + if (ret) {
> + folio_unlock(folio);
> + folio_put(folio);
> +
> + return ret;
> + }
> +
> btrfs_subpage_clear_uptodate(fs_info, folio, i_size,
> round_up(i_size, PAGE_SIZE) - i_size);
> folio_unlock(folio);
On 1/31/26 1:08 PM, Qu Wenruo wrote:
> This patch itself and the root cause look good to me.
>
> Reviewed-by: Qu Wenruo <wqu@suse.com>

Much appreciated :)

> However I'm not sure if the stable tree even accepts non-upstreamed patches.
>
> Thus the stable maintainer may ask you the same question as I did
> before, why not backport the upstream commit 4e346baee95f?

That commit relies on filemap_invalidate_inode(), which was introduced in
6.10, so it would not apply to earlier stable branches.

We need to fix as far back as 5.15, so I can send one additional patch to
cover stable trees 5.15 to 6.6. The patch would be almost identical, with
the only change being the use of the page API instead of the folio API
(set_folio_extent_mapped() -> set_page_extent_mapped()). Let me know if
you're in agreement and I can send the extra patch.

> If it's lacking the reason why it's a bug fix, I believe you can modify
> the commit message to include the analysis and the Fixes tag.
>
> I'm also curious to learn the proper way for such a situation.

It's new to me as well. For reference, there are some commits on the list
that have language like this: "This is a stable-only fix".
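For illustration, a minimal sketch of the page-API variant described above
for the 5.15 to 6.6 trees, assuming the page-based
prealloc_file_extent_cluster() in those kernels uses find_lock_page() and
set_page_extent_mapped(); this is a hypothetical sketch, not the actual
backport:

	page = find_lock_page(inode->vfs_inode.i_mapping, i_size >> PAGE_SHIFT);
	if (page) {
		/* Re-attach the private state in case it was released during reclaim. */
		ret = set_page_extent_mapped(page);
		if (ret) {
			unlock_page(page);
			put_page(page);
			return ret;
		}
		btrfs_subpage_clear_uptodate(fs_info, page, i_size,
					     round_up(i_size, PAGE_SIZE) - i_size);
		unlock_page(page);
		put_page(page);
	}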
On 2026/2/1 09:53, JP Kobryn wrote:
> That commit relies on filemap_invalidate_inode(), which was introduced in
> 6.10, so it would not apply to earlier stable branches.
>
> We need to fix as far back as 5.15, so I can send one additional patch to
> cover stable trees 5.15 to 6.6. The patch would be almost identical, with
> the only change being the use of the page API instead of the folio API
> (set_folio_extent_mapped() -> set_page_extent_mapped()). Let me know if
> you're in agreement and I can send the extra patch.

If stable chooses to go this path, I'm totally fine with similar backports.

Thanks,
Qu
On 1/31/26 3:32 PM, Qu Wenruo wrote:
> If stable chooses to go this path, I'm totally fine with similar backports.

Great. Let me resend this patch and the additional one with the proper
stable tags showing the specific kernel versions each apply to.