[PATCH v1 1/5] f2fs: Zero f2fs_folio_state on allocation

Nanzhe Zhao posted 5 patches 1 month ago
[PATCH v1 1/5] f2fs: Zero f2fs_folio_state on allocation
Posted by Nanzhe Zhao 1 month ago
f2fs_folio_state is attached to folio->private and is expected to start
with read_pages_pending == 0.  However, the structure was allocated from
ffs_entry_slab without being fully initialized, which can leave
read_pages_pending with stale values.

Allocate the object with __GFP_ZERO so all fields are reliably zeroed at
creation time.

Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
---
 fs/f2fs/data.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index 471e52c6c1e0..ab091b294fa7 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -2389,7 +2389,7 @@ static struct f2fs_folio_state *ffs_find_or_alloc(struct folio *folio)
 	if (ffs)
 		return ffs;

-	ffs = f2fs_kmem_cache_alloc(ffs_entry_slab, GFP_NOIO, true, NULL);
+	ffs = f2fs_kmem_cache_alloc(ffs_entry_slab, GFP_NOIO | __GFP_ZERO, true, NULL);

 	spin_lock_init(&ffs->state_lock);
 	folio_attach_private(folio, ffs);
--
2.34.1
Re: [PATCH v1 1/5] f2fs: Zero f2fs_folio_state on allocation
Posted by Chao Yu 1 month ago
On 1/5/2026 11:30 PM, Nanzhe Zhao wrote:
> f2fs_folio_state is attached to folio->private and is expected to start
> with read_pages_pending == 0.  However, the structure was allocated from
> ffs_entry_slab without being fully initialized, which can leave
> read_pages_pending with stale values.
> 
> Allocate the object with __GFP_ZERO so all fields are reliably zeroed at
> creation time.
> 
> Signed-off-by: Nanzhe Zhao <nzzhao@126.com>

Reviewed-by: Chao Yu <chao@kernel.org>

Thanks,
Re: [PATCH v1 1/5] f2fs: Zero f2fs_folio_state on allocation
Posted by Barry Song 1 month ago
On Tue, Jan 6, 2026 at 12:12 AM Nanzhe Zhao <nzzhao@126.com> wrote:
>
> f2fs_folio_state is attached to folio->private and is expected to start
> with read_pages_pending == 0.  However, the structure was allocated from
> ffs_entry_slab without being fully initialized, which can leave
> read_pages_pending with stale values.
>
> Allocate the object with __GFP_ZERO so all fields are reliably zeroed at
> creation time.
>
> Signed-off-by: Nanzhe Zhao <nzzhao@126.com>


We already have GFP_F2FS_ZERO, but it includes GFP_IO. Should we
introduce another variant, such as GFP_F2FS_NOIO_ZERO (or similar)?
Overall, LGTM.
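For reference, a minimal sketch of what such a variant could look like in
fs/f2fs/f2fs.h, next to the existing GFP_F2FS_ZERO definition (the name
GFP_F2FS_NOIO_ZERO is only a suggestion here, not an existing macro):

```c
/* existing helper in fs/f2fs/f2fs.h */
#define GFP_F2FS_ZERO		(GFP_NOFS | __GFP_ZERO)

/* hypothetical NOIO variant as suggested above */
#define GFP_F2FS_NOIO_ZERO	(GFP_NOIO | __GFP_ZERO)
```

The call site would then read
f2fs_kmem_cache_alloc(ffs_entry_slab, GFP_F2FS_NOIO_ZERO, true, NULL);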

Reviewed-by: Barry Song <baohua@kernel.org>

> ---
>  fs/f2fs/data.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 471e52c6c1e0..ab091b294fa7 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -2389,7 +2389,7 @@ static struct f2fs_folio_state *ffs_find_or_alloc(struct folio *folio)
>         if (ffs)
>                 return ffs;
>
> -       ffs = f2fs_kmem_cache_alloc(ffs_entry_slab, GFP_NOIO, true, NULL);
> +       ffs = f2fs_kmem_cache_alloc(ffs_entry_slab, GFP_NOIO | __GFP_ZERO, true, NULL);
>
>         spin_lock_init(&ffs->state_lock);
>         folio_attach_private(folio, ffs);
> --
> 2.34.1

Thanks
Barry
Re: Re: [PATCH v1 1/5] f2fs: Zero f2fs_folio_state on allocation
Posted by Nanzhe Zhao 1 month ago
Hi Barry:

>At 2026-01-06 11:38:49, "Barry Song" <21cnbao@gmail.com> wrote:
>>On Tue, Jan 6, 2026 at 12:12 AM Nanzhe Zhao <nzzhao@126.com> wrote:
>>>
>>> f2fs_folio_state is attached to folio->private and is expected to start
>>> with read_pages_pending == 0.  However, the structure was allocated from
>>> ffs_entry_slab without being fully initialized, which can leave
>>> read_pages_pending with stale values.
>>>
>>> Allocate the object with __GFP_ZERO so all fields are reliably zeroed at
>>> creation time.
>>>
>>> Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
>>
>>
>>We already have GFP_F2FS_ZERO, but it includes GFP_IO. Should we
>>introduce another variant, such as GFP_F2FS_NOIO_ZERO (or similar)?
>>Overall, LGTM.
>>

I still don't fully understand the exact semantics of GFP_NOIO vs GFP_NOFS.
I did a bit of digging and, in the current buffered read / readahead context, it seems
like there may be no meaningful difference for the purpose of avoiding direct-reclaim
recursion deadlocks?

My current (possibly incomplete) understanding is that in may_enter_fs(), GFP_NOIO
only changes behavior for swapcache folios, rather than file-backed folios that are
currently in the read IO path, and the swap writeback path won't recurse back into f2fs's
own writeback function anyway. (On phones there usually isn't a swap partition; for zram
I guess swap writeback is effectively writing to RAM via the zram block device?
Sorry for not being very familiar with the details there.)

I noticed iomap's ifs_alloc uses GFP_NOFS | __GFP_NOFAIL. So if GFP_NOFS is acceptable here, 
we could simply use GFP_F2FS_ZERO and avoid introducing a new GFP_F2FS_NOIO_ZERO variant?
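(For context, the iomap allocation I was referring to looks roughly like this;
paraphrased from fs/iomap/buffered-io.c, so the exact shape may differ by kernel
version:)

```c
/* fs/iomap/buffered-io.c, ifs_alloc() -- paraphrased, may differ by version */
if (flags & IOMAP_NOWAIT)
	gfp = GFP_NOWAIT;
else
	gfp = GFP_NOFS | __GFP_NOFAIL;

ifs = kzalloc(struct_size(ifs, state, BITS_TO_LONGS(2 * nr_blocks)), gfp);
```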

Just curious. I'll vote for GFP_NOIO here, from a semantic-clarity perspective.

Thanks,
Nanzhe
Re: Re: [PATCH v1 1/5] f2fs: Zero f2fs_folio_state on allocation
Posted by Barry Song 1 month ago
On Wed, Jan 7, 2026 at 4:45 PM Nanzhe Zhao <nzzhao@126.com> wrote:
>
> Hi Barry:
>
> >At 2026-01-06 11:38:49, "Barry Song" <21cnbao@gmail.com> wrote:
> >>On Tue, Jan 6, 2026 at 12:12 AM Nanzhe Zhao <nzzhao@126.com> wrote:
> >>>
> >>> f2fs_folio_state is attached to folio->private and is expected to start
> >>> with read_pages_pending == 0.  However, the structure was allocated from
> >>> ffs_entry_slab without being fully initialized, which can leave
> >>> read_pages_pending with stale values.
> >>>
> >>> Allocate the object with __GFP_ZERO so all fields are reliably zeroed at
> >>> creation time.
> >>>
> >>> Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
> >>
> >>
> >>We already have GFP_F2FS_ZERO, but it includes GFP_IO. Should we
> >>introduce another variant, such as GFP_F2FS_NOIO_ZERO (or similar)?
> >>Overall, LGTM.
> >>
>
> I still don't fully understand the exact semantics of GFP_NOIO vs GFP_NOFS.
> I did a bit of digging and, in the current buffered read / readahead context, it seems
> like there may be no meaningful difference for the purpose of avoiding direct-reclaim
> recursion deadlocks?

With GFP_NOIO, we will not swap out pages, including anonymous folios.

                if (folio_test_anon(folio) && folio_test_swapbacked(folio)) {
                        if (!folio_test_swapcache(folio)) {
                                if (!(sc->gfp_mask & __GFP_IO))
                                        goto keep_locked;

When using GFP_NOFS, reclaim can still swap out an anon folio,
provided its swap entry is not filesystem-backed
(see folio_swap_flags(folio)).

static bool may_enter_fs(struct folio *folio, gfp_t gfp_mask)
{
        if (gfp_mask & __GFP_FS)
                return true;
        if (!folio_test_swapcache(folio) || !(gfp_mask & __GFP_IO))
                return false;
        /*
         * We can "enter_fs" for swap-cache with only __GFP_IO
         * providing this isn't SWP_FS_OPS.
         * ->flags can be updated non-atomicially (scan_swap_map_slots),
         * but that will never affect SWP_FS_OPS, so the data_race
         * is safe.
         */
        return !data_race(folio_swap_flags(folio) & SWP_FS_OPS);
}

Note that swap may be backed either by a filesystem swapfile or
directly by a block device.

In short, GFP_NOIO is stricter than GFP_NOFS: it disallows any I/O,
even if the I/O does not involve a filesystem, whereas GFP_NOFS
still permits I/O that is not filesystem-related.

>
> My current (possibly incomplete) understanding is that in may_enter_fs(), GFP_NOIO
> only changes behavior for swapcache folios, rather than file-backed folios that are
> currently in the read IO path, and the swap writeback path won't recurse back into f2fs's
> own writeback function anyway. (On phones there usually isn't a swap partition; for zram
> I guess swap writeback is effectively writing to RAM via the zram block device?
> Sorry for not being very familiar with the details there.)

This can be the case for a swapfile on F2FS. Note that the check is
performed per folio. On a system with both zRAM and a filesystem-
backed swapfile, some folios may be swapped out while others may
not, depending on where their swap slots are allocated.

>
> I noticed iomap's ifs_alloc uses GFP_NOFS | __GFP_NOFAIL. So if GFP_NOFS is acceptable here,
> we could simply use GFP_F2FS_ZERO and avoid introducing a new GFP_F2FS_NOIO_ZERO variant?
>
> Just curious. I'll vote for GFP_NOIO here, from a semantic-clarity perspective.

In general, GFP_NOIO is used when handling bios or requests.

Thanks
Barry