[v1] f2fs: pack same-inode blocks by inode during FG_GC

[PATCH 0/2] f2fs: pack same-inode blocks by inode during FG_GC
Posted by Daejun Park 1 month ago
Hi,

This series teaches FG_GC to migrate a victim section's valid blocks in
inode order instead of source segment-offset order, so destination
curseg writes form inode-contiguous runs that span the whole victim
section.  The end result is a measurable drop in post-GC file
fragmentation (filefrag total extents) on large sections.

Patch 1 is a pure refactor: it lifts the per-block migration body
(lock acquisition, move_data_{page,block}() dispatch, rwsem release,
stat update) out of gc_data_segment() into a do_migrate_one_data_block()
helper, and lets add_gc_inode() return the inserted inode_entry pointer.
Patch 2 is the actual packing change: it hangs a per-inode gc_blocks
list off the inode_entry created in phase 3, then drains it once per
section via pack_gc_section() after every source segment has been
parsed.
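
As a rough userspace illustration of the queue-then-drain idea (not
the kernel code): every type and function below is a hypothetical
stand-in with a sketch_ prefix, and the linear inode lookup is a
simplification chosen only to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for one queued block record. */
struct sketch_gc_block {
	unsigned int src_blkaddr;	/* block in the victim section */
	struct sketch_gc_block *next;
};

/* Hypothetical stand-in for an inode entry with its gc_blocks list. */
struct sketch_inode_entry {
	unsigned long ino;
	struct sketch_gc_block *head, *tail;	/* FIFO of queued blocks */
	struct sketch_inode_entry *next;
};

struct sketch_gc_list {
	struct sketch_inode_entry *head, *tail;	/* first-seen inode order */
};

/* Find or create the entry for @ino and return a pointer to it. */
static struct sketch_inode_entry *
sketch_add_inode(struct sketch_gc_list *l, unsigned long ino)
{
	struct sketch_inode_entry *ie;

	for (ie = l->head; ie; ie = ie->next)
		if (ie->ino == ino)
			return ie;
	ie = calloc(1, sizeof(*ie));
	if (!ie)
		return NULL;
	ie->ino = ino;
	if (l->tail)
		l->tail->next = ie;
	else
		l->head = ie;
	l->tail = ie;
	return ie;
}

/* Queue one valid block on its owner's list; 0 on success, -1 on OOM. */
static int sketch_queue_block(struct sketch_gc_list *l, unsigned long ino,
			      unsigned int blkaddr)
{
	struct sketch_inode_entry *ie = sketch_add_inode(l, ino);
	struct sketch_gc_block *gb;

	if (!ie)
		return -1;
	gb = malloc(sizeof(*gb));
	if (!gb)
		return -1;
	gb->src_blkaddr = blkaddr;
	gb->next = NULL;
	if (ie->tail)
		ie->tail->next = gb;
	else
		ie->head = gb;
	ie->tail = gb;
	return 0;
}

/*
 * Drain once per section: emit blocks inode by inode into @out (at most
 * @cap entries), free every record, and return the number emitted.
 */
static size_t sketch_drain_section(struct sketch_gc_list *l,
				   unsigned int *out, size_t cap)
{
	struct sketch_inode_entry *ie, *ie_next;
	size_t n = 0;

	assert(l && out);
	for (ie = l->head; ie; ie = ie_next) {
		struct sketch_gc_block *gb, *gb_next;

		for (gb = ie->head; gb; gb = gb_next) {
			gb_next = gb->next;
			if (n < cap)
				out[n++] = gb->src_blkaddr;
			free(gb);
		}
		ie_next = ie->next;
		free(ie);
	}
	l->head = l->tail = NULL;
	return n;
}
```

Queueing blocks 100..104 that alternate between two inodes and then
draining yields all of the first inode's blocks followed by all of
the second's, which is the ordering the destination writes would see.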

Activation conditions:
  * sbi->gc_inode_local_packing == true.  Exposed as a sysfs RW knob,
    default derived from __is_large_section(sbi).  Sysfs writes other
    than 0 or 1 are rejected.
  * gc_type == FG_GC.  BG_GC's move_data_page() path defers destination
    allocation to the writeback flusher, so reordering applied during
    GC would be lost.
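
The two conditions and the 0/1-only sysfs rule can be sketched in
userspace C as follows.  All sketch_ names are hypothetical; the
real store handler lives in fs/f2fs/sysfs.c and is not reproduced
here, and the default helper assumes a large section simply means
more than one segment per section.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical mirror of the two GC types named in the cover letter. */
enum { SKETCH_BG_GC = 0, SKETCH_FG_GC = 1 };

/* Assumed default: enabled only when a section spans several segments. */
static bool sketch_default_packing(unsigned int segs_per_sec)
{
	return segs_per_sec > 1;
}

/*
 * Parse a sysfs-style write.  Only "0" or "1" (optionally followed by a
 * single newline) is accepted; anything else returns -EINVAL and leaves
 * *knob untouched.
 */
static int sketch_store_packing(const char *buf, bool *knob)
{
	unsigned long v;
	char *end;

	assert(buf && knob);
	errno = 0;
	v = strtoul(buf, &end, 10);
	if (errno || end == buf)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0' || v > 1)
		return -EINVAL;
	*knob = (v == 1);
	return 0;
}

/* Packing applies only when the knob is set and this is foreground GC. */
static bool sketch_should_pack(bool knob, int gc_type)
{
	return knob && gc_type == SKETCH_FG_GC;
}
```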

The knob is sampled once per do_garbage_collect() call into a local
'pack_by_inode' bool, which is then passed to gc_data_segment() and
pack_gc_section(), so a concurrent sysfs toggle cannot make phase 3
enqueue blocks that pack_gc_section() then skips.
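
A minimal userspace sketch of that sample-once pattern, with a
callback standing in for a concurrent sysfs write.  Every sketch_
name is hypothetical, and the block counters only model the
queue/drain bookkeeping, not real migration; in-kernel code would
presumably read the knob with READ_ONCE() or similar.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_sbi {
	bool gc_inode_local_packing;	/* may flip at any time */
};

struct sketch_gc_stats {
	size_t queued;		/* blocks put on per-inode lists */
	size_t drained;		/* blocks migrated by the section drain */
	size_t migrated_legacy;	/* blocks migrated in source order */
};

/* Per-segment pass: decides from the caller's snapshot, never from sbi. */
static void sketch_gc_data_segment(size_t nblocks, bool pack_by_inode,
				   struct sketch_gc_stats *st)
{
	if (pack_by_inode)
		st->queued += nblocks;
	else
		st->migrated_legacy += nblocks;
}

/* Per-section drain: uses the same snapshot, so nothing is left queued. */
static void sketch_pack_gc_section(bool pack_by_inode,
				   struct sketch_gc_stats *st)
{
	if (pack_by_inode)
		st->drained = st->queued;
}

/* Stand-in for a racing sysfs write that flips the knob. */
static void sketch_toggle(struct sketch_sbi *sbi)
{
	sbi->gc_inode_local_packing = !sbi->gc_inode_local_packing;
}

static struct sketch_gc_stats
sketch_do_garbage_collect(struct sketch_sbi *sbi, size_t nsegs,
			  size_t blks_per_seg,
			  void (*between)(struct sketch_sbi *))
{
	struct sketch_gc_stats st = {0};
	bool pack_by_inode;
	size_t i;

	assert(sbi);
	pack_by_inode = sbi->gc_inode_local_packing;	/* sampled once */

	for (i = 0; i < nsegs; i++) {
		sketch_gc_data_segment(blks_per_seg, pack_by_inode, &st);
		if (between)
			between(sbi);	/* knob changes mid-section */
	}
	sketch_pack_gc_section(pack_by_inode, &st);
	return st;
}
```

Because both phases consume the same local bool, queued and drained
stay equal even when the knob flips between segments; reading
sbi->gc_inode_local_packing separately in each phase would not give
that guarantee.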

Per-block records come from a dedicated f2fs_gc_block slab
(SLAB_RECLAIM_ACCOUNT via f2fs_kmem_cache_create()).  A fully valid
64 MiB section (SEGS_PER_SEC=32) can queue up to
SEGS_PER_SEC * BLKS_PER_SEG = 16384 records (~512 KiB at 32 B per
gc_block).
On gc_block alloc failure the block falls through to the legacy
phase 4 'goto do_migrate' body, so FG_GC progress is preserved under
memory pressure (the very condition that triggers FG_GC).
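
The worst-case bound and the fallback decision can be checked with a
small userspace sketch.  It assumes 512 blocks per segment (2 MiB
segments of 4 KiB blocks) and takes the 32 B record size from the
text above; the sketch_ names and the injectable allocator are
hypothetical and exist only so the failure path can be exercised.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_BLKS_PER_SEG	512u	/* assumed: 2 MiB / 4 KiB */
#define SKETCH_GC_BLOCK_SIZE	32u	/* record size quoted above */

/* Worst case: every block of every segment in the section is valid. */
static size_t sketch_max_queue_bytes(unsigned int segs_per_sec)
{
	return (size_t)segs_per_sec * SKETCH_BLKS_PER_SEG *
	       SKETCH_GC_BLOCK_SIZE;
}

enum sketch_path {
	SKETCH_QUEUED,		/* record allocated, block deferred */
	SKETCH_MIGRATED_NOW,	/* allocation failed, migrate at once */
};

typedef void *(*sketch_alloc_fn)(size_t);

/*
 * Try to allocate a record for one block.  On failure the block is not
 * dropped: the caller migrates it immediately in source order, so GC
 * still makes progress under memory pressure.
 */
static enum sketch_path sketch_handle_block(sketch_alloc_fn alloc,
					    void **rec_out)
{
	void *rec;

	assert(alloc && rec_out);
	rec = alloc(SKETCH_GC_BLOCK_SIZE);
	if (!rec)
		return SKETCH_MIGRATED_NOW;
	*rec_out = rec;
	return SKETCH_QUEUED;
}

/* Allocator that always fails, to simulate memory pressure. */
static void *sketch_failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}
```

With these assumptions sketch_max_queue_bytes(32) evaluates to
524288 bytes, i.e. the ~512 KiB figure quoted for a 64 MiB section.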

Measurements (QEMU virtio guest, 4-cycle fragmentation harness,
gc_urgent 40 s; values are filefrag total extents, before -> after GC):

  Large section (-s 32 = 64 MiB, 64 files x 4 MiB):
    legacy   65536 -> 65536  ( 0 % reduction)
    packed   65536 -> 49170  (25 % reduction)

  Default section (-s 1 = 2 MiB, 128 files x 256 KiB):
    legacy    8192 ->  8192  ( 0 % reduction)
    packed    8192 ->  7690  ( 6 % reduction)

  Natural FG_GC under tight cold migration
  (-s 32, 2 GiB disk 90 % fill, 6 hot x 200 MiB + 6 cold x 100 MiB
   interleaved, background_gc=sync, 300 s hot rewrite):
    legacy   cold extents 350 -> 357 (+7,  no improvement)
    packed   cold extents 350 -> 132 (-218, -63 %)
    move_blks        legacy 42344  packed 34822  (-18 %)
    skipped_gc_rwsem legacy 108    packed   44   (-59 %)
    hot rewrite iters in fixed 300 s window: +45 %

Daejun Park (2):
  f2fs: extract do_migrate_one_data_block() helper for GC migration
  f2fs: pack same-inode blocks by inode during FG_GC

 Documentation/ABI/testing/sysfs-fs-f2fs |  10 ++
 fs/f2fs/f2fs.h                          |   7 +-
 fs/f2fs/gc.c                            | 218 ++++++++++++++++++------
 fs/f2fs/super.c                         |   1 +
 fs/f2fs/sysfs.c                         |   7 +
 5 files changed, 187 insertions(+), 56 deletions(-)

-- 
2.43.0
[PATCH 1/2] f2fs: extract do_migrate_one_data_block() helper for GC migration