Hi,

This series teaches FG_GC to migrate a victim section's valid blocks in
inode order instead of source segment-offset order, so destination
curseg writes form inode-contiguous runs that span the whole victim
section. The end result is a measurable drop in post-GC file
fragmentation (filefrag total extents) on large sections.

Patch 1 is a pure refactor: it lifts the per-block migration body
(lock acquisition, move_data_{page,block}() dispatch, rwsem release,
stat update) out of gc_data_segment() into a do_migrate_one_data_block()
helper, and lets add_gc_inode() return the inserted inode_entry pointer.

Patch 2 is the actual packing change: it hangs a per-inode gc_blocks
list off the inode_entry created in phase 3, then drains it once per
section via pack_gc_section() after every source segment has been
parsed.
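The queue-then-drain flow can be illustrated with a rough user-space sketch. This is not code from the patches: the struct layouts, enqueue_gc_block() and put_gc_inodes() are hypothetical stand-ins, and only the names add_gc_inode(), pack_gc_section(), inode_entry and gc_block are borrowed from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; not the structures from the patch. */
struct gc_block {
	unsigned int blkaddr;		/* source block in the victim section */
	struct gc_block *next;
};

struct inode_entry {
	unsigned int ino;
	struct gc_block *head, *tail;	/* per-inode gc_blocks list (FIFO) */
	struct inode_entry *next;
};

/* Find or append the entry for ino and return it to the caller. */
static struct inode_entry *add_gc_inode(struct inode_entry **list,
					unsigned int ino)
{
	struct inode_entry **pp = list;

	for (; *pp; pp = &(*pp)->next)
		if ((*pp)->ino == ino)
			return *pp;
	*pp = calloc(1, sizeof(**pp));
	if (*pp)
		(*pp)->ino = ino;
	return *pp;
}

/* Phase 3 analogue: queue one valid block on its owner's list. */
static int enqueue_gc_block(struct inode_entry *ie, unsigned int blkaddr)
{
	struct gc_block *gb = calloc(1, sizeof(*gb));

	if (!gb)
		return -1;	/* caller would migrate this block right away */
	gb->blkaddr = blkaddr;
	if (ie->tail)
		ie->tail->next = gb;
	else
		ie->head = gb;
	ie->tail = gb;
	return 0;
}

/* Drain each inode's list in turn; out[] records the migration order. */
static size_t pack_gc_section(struct inode_entry *list, unsigned int *out)
{
	size_t n = 0;

	assert(out);
	for (; list; list = list->next) {
		while (list->head) {
			struct gc_block *gb = list->head;

			list->head = gb->next;
			out[n++] = gb->blkaddr;
			free(gb);
		}
		list->tail = NULL;
	}
	return n;
}

/* Release the inode entries once the section has been drained. */
static void put_gc_inodes(struct inode_entry *list)
{
	while (list) {
		struct inode_entry *next = list->next;

		free(list);
		list = next;
	}
}
```

Feeding source blocks 0..5 whose owners alternate between two inodes yields one run per inode in the drained order, rather than the interleaved source order.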

Activation conditions:

  * sbi->gc_inode_local_packing == true. Exposed as a sysfs RW knob,
    default derived from __is_large_section(sbi). Sysfs writes other
    than 0 or 1 are rejected.
  * gc_type == FG_GC. BG_GC's move_data_page() path defers destination
    allocation to the writeback flusher, so reordering applied during
    GC would be lost.

The packing decision is sampled once per do_garbage_collect() call into
a local 'pack_by_inode' bool and threaded through gc_data_segment() and
pack_gc_section(), so a concurrent sysfs toggle cannot make phase 3
enqueue blocks that pack_gc_section() then skips.
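A minimal sketch of the gate and the snapshot, assuming simplified types: struct sbi_sketch, store_packing_knob() and snapshot_pack_by_inode() are hypothetical names for illustration, and the -1 return merely stands in for whatever error the real sysfs store hook reports.

```c
#include <stdbool.h>

enum gc_type { BG_GC, FG_GC };	/* names as in f2fs, values illustrative */

struct sbi_sketch {		/* hypothetical stand-in for f2fs_sb_info */
	bool gc_inode_local_packing;	/* sysfs-controlled knob */
};

/* Sysfs store analogue: accept only 0 or 1, reject everything else. */
static int store_packing_knob(struct sbi_sketch *sbi, unsigned long val)
{
	if (val > 1)
		return -1;	/* assumption: real code returns an errno */
	sbi->gc_inode_local_packing = (val != 0);
	return 0;
}

/*
 * Read the knob once per GC pass. The caller keeps the result in a local
 * and passes it down, so a later toggle cannot change this pass.
 */
static bool snapshot_pack_by_inode(const struct sbi_sketch *sbi,
				   enum gc_type type)
{
	return type == FG_GC && sbi->gc_inode_local_packing;
}
```

Because the enqueue side and the drain side both consume the same local value, they cannot disagree within one pass even if the knob is flipped in between.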

Per-block records use a dedicated f2fs_gc_block slab
(SLAB_RECLAIM_ACCOUNT via f2fs_kmem_cache_create). A fully valid
64 MiB section (SEGS_PER_SEC=32) can queue up to
SEGS_PER_SEC * BLKS_PER_SEG = 16384 records (~512 KiB at 32 B per
gc_block). On gc_block alloc failure the block falls through to the
legacy phase 4 'goto do_migrate' body, so FG_GC progress is preserved
under memory pressure (the very condition that triggers FG_GC).
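The worst-case footprint can be rechecked with a few lines of arithmetic. This sketch assumes 4 KiB blocks and 2 MiB segments (512 blocks per segment) and uses the 32 B record size quoted above; gc_block_worst_case_bytes() is a hypothetical helper, not part of the series.

```c
#include <stddef.h>

#define BLKS_PER_SEG	512u	/* assumption: 2 MiB segment / 4 KiB block */
#define GC_BLOCK_BYTES	32u	/* per-record size quoted in this letter */

/* Upper bound on queued gc_block memory for one fully valid section. */
static size_t gc_block_worst_case_bytes(unsigned int segs_per_sec)
{
	return (size_t)segs_per_sec * BLKS_PER_SEG * GC_BLOCK_BYTES;
}
```

With segs_per_sec = 32 this gives 32 * 512 * 32 = 524288 bytes, i.e. 512 KiB.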

Measurements (QEMU virtio guest, 4-cycle fragmentation harness,
gc_urgent 40 s; filefrag total extents):

  Large section (-s 32 = 64 MiB, 64 files x 4 MiB):
    legacy  65536 -> 65536  ( 0 % reduction)
    packed  65536 -> 49170  (25 % reduction)

  Default section (-s 1 = 2 MiB, 128 files x 256 KiB):
    legacy   8192 ->  8192  ( 0 % reduction)
    packed   8192 ->  7690  ( 6 % reduction)

  Natural FG_GC under tight cold migration
  (-s 32, 2 GiB disk 90 % fill, 6 hot x 200 MiB + 6 cold x 100 MiB
  interleaved, background_gc=sync, 300 s hot rewrite):
    legacy cold extents  350 -> 357  (+7, no improvement)
    packed cold extents  350 -> 132  (-218, -62 %)
    move_blks            legacy 42344  packed 34822  (-18 %)
    skipped_gc_rwsem     legacy   108  packed    44  (-59 %)
    hot rewrite iters in fixed 300 s window: +45 %

Daejun Park (2):
  f2fs: extract do_migrate_one_data_block() helper for GC migration
  f2fs: pack same-inode blocks by inode during FG_GC

 Documentation/ABI/testing/sysfs-fs-f2fs |  10 ++
 fs/f2fs/f2fs.h                          |   7 +-
 fs/f2fs/gc.c                            | 218 ++++++++++++++++++------
 fs/f2fs/super.c                         |   1 +
 fs/f2fs/sysfs.c                         |   7 +
 5 files changed, 187 insertions(+), 56 deletions(-)

--
2.43.0