fs/f2fs/data.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
f2fs_write_end_io() currently decrements the writeback page counter before
waking sbi->cp_wait for the last F2FS_WB_CP_DATA completion.
That decrement can drop the F2FS_WB_CP_DATA count to zero. It can unblock
a concurrent unmount path waiting in f2fs_wait_on_all_pages(). Unmount can
continue through f2fs_put_super() and eventually free sbi while the end_io
callback is still about to evaluate wq_has_sleeper() and wake_up() on
sbi->cp_wait.
Commit 2d9c4a4ed4ee ("f2fs: fix UAF caused by decrementing sbi->nr_pages[]
in f2fs_write_end_io()") fixed one post-decrement sbi access by moving the
warm-node-list handling before dec_page_count(). The compressed writeback
path follows the same rule and documents that dec_page_count() must be the
last access to sbi when it can drop F2FS_WB_CP_DATA to zero.
Apply the same ordering rule to the cp_wait wakeup. Check whether this is
the last F2FS_WB_CP_DATA completion and wake the waiter before the counter
decrement. Then the callback no longer dereferences sbi->cp_wait after the
lifetime boundary. A waiter that runs before the decrement may observe old
count and sleep until the one-jiffy timeout, but correctness no longer
depends on touching sbi after the counter reaches zero.
Fixes: ce2739e482bc ("f2fs: fix to avoid UAF in f2fs_write_end_io()")
Cc: stable@kernel.org
Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com>
---
fs/f2fs/data.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
index d83a21998ec2..b1e9fb5ca159 100644
--- a/fs/f2fs/data.c
+++ b/fs/f2fs/data.c
@@ -392,16 +392,16 @@ static void f2fs_write_end_io(struct bio *bio)
if (f2fs_in_warm_node_list(folio))
f2fs_del_fsync_node_entry(sbi, folio);
- dec_page_count(sbi, type);
-
/*
- * we should access sbi before folio_end_writeback() to
- * avoid racing w/ kill_f2fs_super()
+ * Access sbi before dec_page_count() and folio_end_writeback()
+ * to avoid racing w/ kill_f2fs_super().
*/
- if (type == F2FS_WB_CP_DATA && !get_pages(sbi, type) &&
- wq_has_sleeper(&sbi->cp_wait))
+ if (type == F2FS_WB_CP_DATA && get_pages(sbi, type) == 1 &&
+ wq_has_sleeper(&sbi->cp_wait))
wake_up(&sbi->cp_wait);
+ dec_page_count(sbi, type);
+
folio_clear_f2fs_gcing(folio);
folio_end_writeback(folio);
}
--
2.43.0
On 5/25/26 13:30, Wenjie Qi wrote:
> f2fs_write_end_io() currently decrements the writeback page counter before
> waking sbi->cp_wait for the last F2FS_WB_CP_DATA completion.
>
> That decrement can drop the F2FS_WB_CP_DATA count to zero. It can unblock
> a concurrent unmount path waiting in f2fs_wait_on_all_pages(). Unmount can
> continue through f2fs_put_super() and eventually free sbi while the end_io
> callback is still about to evaluate wq_has_sleeper() and wake_up() on
> sbi->cp_wait.
>
> Commit 2d9c4a4ed4ee ("f2fs: fix UAF caused by decrementing sbi->nr_pages[]
> in f2fs_write_end_io()") fixed one post-decrement sbi access by moving the
> warm-node-list handling before dec_page_count(). The compressed writeback
> path follows the same rule and documents that dec_page_count() must be the
> last access to sbi when it can drop F2FS_WB_CP_DATA to zero.
>
> Apply the same ordering rule to the cp_wait wakeup. Check whether this is
> the last F2FS_WB_CP_DATA completion and wake the waiter before the counter
> decrement. Then the callback no longer dereferences sbi->cp_wait after the
> lifetime boundary. A waiter that runs before the decrement may observe old
> count and sleep until the one-jiffy timeout, but correctness no longer
> depends on touching sbi after the counter reaches zero.
>
> Fixes: ce2739e482bc ("f2fs: fix to avoid UAF in f2fs_write_end_io()")
> Cc: stable@kernel.org
> Signed-off-by: Wenjie Qi <qiwenjie@xiaomi.com>
> ---
> fs/f2fs/data.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index d83a21998ec2..b1e9fb5ca159 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -392,16 +392,16 @@ static void f2fs_write_end_io(struct bio *bio)
> if (f2fs_in_warm_node_list(folio))
> f2fs_del_fsync_node_entry(sbi, folio);
>
> - dec_page_count(sbi, type);
> -
> /*
> - * we should access sbi before folio_end_writeback() to
> - * avoid racing w/ kill_f2fs_super()
> + * Access sbi before dec_page_count() and folio_end_writeback()
> + * to avoid racing w/ kill_f2fs_super().
> */
> - if (type == F2FS_WB_CP_DATA && !get_pages(sbi, type) &&
> - wq_has_sleeper(&sbi->cp_wait))
> + if (type == F2FS_WB_CP_DATA && get_pages(sbi, type) == 1 &&
> + wq_has_sleeper(&sbi->cp_wait))
> wake_up(&sbi->cp_wait);
If we call dec_page_count() after wake_up(), get_pages() in below function
may return true, and then ckpt thread may need to wait on cp_wait for another
DEFAULT_SCHEDULE_TIMEOUT?
void f2fs_wait_on_all_pages(struct f2fs_sb_info *sbi, int type)
{
DEFINE_WAIT(wait);
for (;;) {
if (!get_pages(sbi, type))
break;
if (unlikely(f2fs_cp_error(sbi) &&
!is_sbi_flag_set(sbi, SBI_IS_CLOSE)))
break;
if (type == F2FS_DIRTY_META)
f2fs_sync_meta_pages(sbi, LONG_MAX, FS_CP_META_IO);
else if (type == F2FS_WB_CP_DATA)
f2fs_submit_merged_write(sbi, DATA);
prepare_to_wait(&sbi->cp_wait, &wait, TASK_UNINTERRUPTIBLE);
io_schedule_timeout(DEFAULT_SCHEDULE_TIMEOUT);
}
finish_wait(&sbi->cp_wait, &wait);
}
How about:
static inline int dec_return_page_count(struct f2fs_sb_info *sbi, int count_type)
{
return atomic_dec_return(&sbi->nr_pages[count_type]);
}
f2fs_write_end_io()
{
...
bool need_wakeup = false;
...
if (type == F2FS_WB_CP_DATA)
need_wakeup = !dec_return_page_count(sbi, type);
else
dec_page_count(sbi, type);
if (need_wakeup && wq_has_sleeper(&sbi->cp_wait))
wake_up(&sbi->cp_wait);
...
}
Thanks,
>
> + dec_page_count(sbi, type);
> +
> folio_clear_f2fs_gcing(folio);
> folio_end_writeback(folio);
> }
Agreed, v1 can wake cp_wait before nr_pages[F2FS_WB_CP_DATA] reaches zero, so the waiter may recheck the counter, still see a non-zero value, and sleep until DEFAULT_SCHEDULE_TIMEOUT. I sent v2 to address this: https://lore.kernel.org/r/20260526034439.1017521-1-qiwenjie@xiaomi.com In v2, I used atomic_dec_and_lock_irqsave() for the F2FS_WB_CP_DATA zero transition and wake waiters while holding cp_wait.lock. f2fs_wait_on_all_pages() also prepares the wait entry and rechecks get_pages() under the same lock before sleeping. This should avoid the missed wakeup you pointed out, while also avoiding an unprotected post-zero access to sbi->cp_wait in f2fs_write_end_io().
© 2016 - 2026 Red Hat, Inc.