v6->v7:
1. Remove struct ext4_mark_context and call ext4_mb_mark_context with
all needed arguments directly.
2. Add a new patch to make state in ext4_mb_mark_bb bool and make state
in the newly added function ext4_mb_mark_context bool.
3. Fix typos in git messages.
4. Keep the on-disk bitmap marking code tight in the patch separating
on-disk bitmap and buddy bitmap freeing.
5. Test 64k blocksize instead of 256k blocksize.
6. Remove RVB from Ojaswin in changed patches and collect updated RVB
from Ritesh.
7. As there is not much functional change compared to the initial
version, the "kvm-xfstest smoke" test and the kunit test were run and
the results are good.
8. Plan to add ext4_inode_block_valid in ext4_mb_mark_context in a new
series!

v5->v6:
1. Separate block bitmap and buddy bitmap freeing into individual
patches and rewrite the descriptions.
2. Remove the #ifdef around KUNIT_STATIC_STUB_REDIRECT, which should
only be defined when CONFIG_KUNIT is enabled, after fix [7] was merged
into kunit-next/kunit.
3. Use the mbt prefix to distinguish test APIs from actual kernel APIs.
4. Add a prepare function for ext4_mark_context and rename
ext4_mb_mark_group_bb to ext4_mb_mark_context.
5. Support running the mballoc test with different layout settings;
different block sizes are tested.

v4->v5:
1. The WARN on freeing blocks to an uninitialized group is removed, as
the normal fast commit path may trigger this, see [1] for details. The
review tag from Ojaswin on the changed patch is also removed and a
further review is needed.

v3->v4:
1. Collect Reviewed-by from Ojaswin.
2. Do improvements as Ojaswin kindly suggested: fix a typo in a commit,
WARN if we try to clear a bit of an uninitialized group, and improve
the refactoring of the AGGRESSIVE_CHECK code.
3. Fix a conflict on patch 16.
4. Improve the git log of patches 16 and 17.

v2->v3:
1. Fix build warnings on "ext4: add some kunit stub for mballoc kunit
test" and "ext4: add first unit test for ext4_mb_new_blocks_simple in
mballoc".

This series is a new version of the unmerged patches from [1]. The
first 9 patches of this series factor out code to mark blocks used or
freed, which updates the on-disk block bitmap and group descriptor.
Several reasons to do this:
1. pair the behavior of alloc/free bits. For example,
ext4_mb_new_blocks_simple will update free_clusters in struct
flex_groups in ext4_mb_mark_bb while ext4_free_blocks_simple forgets
this.
2. remove repeated code to read from disk, update and write back to
disk.
3. reduce future unit test mocks to avoid real IO to update structures
on disk.

The last 3 patches add a unit test for mballoc. Before more unit tests
are added, there are some things that should be discussed:

1. How to test static functions in mballoc.c.
Currently, mballoc-test.c is included in mballoc.c to test static
functions in mballoc.c from mballoc-test.c, which is one way suggested
in [2]. Not sure if there is a more elegant way to test static
functions without touching mballoc.c.

2. How to add mocks to functions in mballoc.c which may issue IO to
disk.
Currently, KUNIT_STATIC_STUB_REDIRECT is added to those functions as
suggested in the kunit documentation [3].

3. How to simulate a block bitmap.
Currently, a fake buffer_head with bitmap data is returned, so no
further change is needed. If we instead simulate a block bitmap with an
array of structures like:
struct test_bitmap {
	unsigned int start;
	unsigned int len;
}
which is suggested by Theodore in [4], then we need to add mocks to the
functions which expect a bitmap in bitmap_bh->b_data, like
mb_find_next_bit, mb_find_next_zero_bit and maybe more (a rough sketch
of this alternative follows below).

Would like to hear any suggestion! Thanks!
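To make point 3 concrete, here is a minimal sketch of the range-based
alternative. struct test_bitmap comes from the discussion above; the
helper names (mbt_range_test_bit, mb_find_next_bit_stub) and their
signatures are hypothetical illustration, not code from this series:

#include <stdbool.h>

/* Range-based fake bitmap, as suggested in [4]. */
struct test_bitmap {
	unsigned int start;
	unsigned int len;
};

/* Hypothetical helper: is "bit" inside any used range? */
static bool mbt_range_test_bit(const struct test_bitmap *used,
			       unsigned int nr, unsigned int bit)
{
	unsigned int i;

	for (i = 0; i < nr; i++)
		if (bit >= used[i].start && bit - used[i].start < used[i].len)
			return true;
	return false;
}

/*
 * Hypothetical mock for mb_find_next_bit(): first set bit in
 * [start, max), or max if none. Every helper that walks
 * bitmap_bh->b_data directly would need a sibling like this.
 */
static int mb_find_next_bit_stub(const struct test_bitmap *used,
				 unsigned int nr, int max, int start)
{
	while (start < max && !mbt_range_test_bit(used, nr, start))
		start++;
	return start;
}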
I ran kvm-xfstests with config "ext4/all" and "-g auto" together with
the patchset for resize; the detailed report can be seen in [6]. The
unit test result is as follows (a sketch of the .kunitconfig used
follows the logs):

# ./tools/testing/kunit/kunit.py run --kunitconfig=fs/ext4/.kunitconfig --raw_output
[02:56:14] Configuring KUnit Kernel ...
[02:56:14] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=88
[02:56:19] Starting KUnit Kernel (1/1)...
KTAP version 1
1..2
    KTAP version 1
    # Subtest: ext4_mballoc_test
    1..1
        KTAP version 1
        # Subtest: test_new_blocks_simple
        ok 1 block_bits=10 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
        ok 2 block_bits=12 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
        ok 3 block_bits=16 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
    # test_new_blocks_simple: pass:3 fail:0 skip:0 total:3
    ok 1 test_new_blocks_simple
# Totals: pass:3 fail:0 skip:0 total:3
ok 1 ext4_mballoc_test
    KTAP version 1
    # Subtest: ext4_inode_test
    1..1
        KTAP version 1
        # Subtest: inode_test_xtimestamp_decoding
        ok 1 1901-12-13 Lower bound of 32bit < 0 timestamp, no extra bits
        ok 2 1969-12-31 Upper bound of 32bit < 0 timestamp, no extra bits
        ok 3 1970-01-01 Lower bound of 32bit >=0 timestamp, no extra bits
        ok 4 2038-01-19 Upper bound of 32bit >=0 timestamp, no extra bits
        ok 5 2038-01-19 Lower bound of 32bit <0 timestamp, lo extra sec bit on
        ok 6 2106-02-07 Upper bound of 32bit <0 timestamp, lo extra sec bit on
        ok 7 2106-02-07 Lower bound of 32bit >=0 timestamp, lo extra sec bit on
        ok 8 2174-02-25 Upper bound of 32bit >=0 timestamp, lo extra sec bit on
        ok 9 2174-02-25 Lower bound of 32bit <0 timestamp, hi extra sec bit on
        ok 10 2242-03-16 Upper bound of 32bit <0 timestamp, hi extra sec bit on
        ok 11 2242-03-16 Lower bound of 32bit >=0 timestamp, hi extra sec bit on
        ok 12 2310-04-04 Upper bound of 32bit >=0 timestamp, hi extra sec bit on
        ok 13 2310-04-04 Upper bound of 32bit>=0 timestamp, hi extra sec bit 1. 1 ns
        ok 14 2378-04-22 Lower bound of 32bit>= timestamp. Extra sec bits 1. Max ns
        ok 15 2378-04-22 Lower bound of 32bit >=0 timestamp. All extra sec bits on
        ok 16 2446-05-10 Upper bound of 32bit >=0 timestamp. All extra sec bits on
    # inode_test_xtimestamp_decoding: pass:16 fail:0 skip:0 total:16
    ok 1 inode_test_xtimestamp_decoding
# Totals: pass:16 fail:0 skip:0 total:16
ok 2 ext4_inode_test
[02:56:19] Elapsed time: 5.173s total, 0.001s configuring, 5.055s building, 0.074s running

[1] https://lore.kernel.org/linux-ext4/20230603150327.3596033-1-shikemeng@huaweicloud.com/T/#m5ff8e3a058ce1cb272dfef3262cd3202ce6e4358
[2] https://lore.kernel.org/linux-ext4/ZC3MoWn2UO6p+Swp@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com/
[3] https://docs.kernel.org/dev-tools/kunit/usage.html#testing-static-functions
[4] https://docs.kernel.org/dev-tools/kunit/api/functionredirection.html#c.KUNIT_STATIC_STUB_REDIRECT
[5] https://lore.kernel.org/linux-ext4/20230317155047.GB3270589@mit.edu/
[6] https://lore.kernel.org/linux-ext4/20230629120044.1261968-1-shikemeng@huaweicloud.com/T/#mcc8fb0697fd54d9267c02c027e1eb3468026ae56
[7] https://lore.kernel.org/lkml/20230725172051.2142641-1-shikemeng@huaweicloud.com/

Kemeng Shi (12):
  ext4: make state in ext4_mb_mark_bb to be bool
  ext4: factor out codes to update block bitmap and group descriptor on
    disk from ext4_mb_mark_bb
  ext4: call ext4_mb_mark_context in ext4_free_blocks_simple
  ext4: extend ext4_mb_mark_context to support allocation under journal
  ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used
  ext4: Separate block bitmap and buddy bitmap freeing in
    ext4_mb_clear_bb()
  ext4: call ext4_mb_mark_context in ext4_mb_clear_bb
  ext4: Separate block bitmap and buddy bitmap freeing in
    ext4_group_add_blocks()
  ext4: call ext4_mb_mark_context in ext4_group_add_blocks()
  ext4: add some kunit stub for mballoc kunit test
  ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc
  ext4: run mballoc test with different layouts setting

 fs/ext4/balloc.c       |  10 +
 fs/ext4/ext4.h         |   2 +-
 fs/ext4/fast_commit.c  |   8 +-
 fs/ext4/mballoc-test.c | 349 ++++++++++++++++++++++++++++
 fs/ext4/mballoc.c      | 512 +++++++++++++++--------------------------
 5 files changed, 550 insertions(+), 331 deletions(-)
 create mode 100644 fs/ext4/mballoc-test.c

-- 
2.30.0
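For reference, the --kunitconfig file passed to kunit.py above is
fs/ext4/.kunitconfig. As a hedged sketch (the exact contents may differ
between trees), it needs to enable at least:

CONFIG_KUNIT=y
CONFIG_EXT4_FS=y
CONFIG_EXT4_KUNIT_TESTS=y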
As state could only be either 0 or 1, just make it bool.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/ext4.h        | 2 +-
 fs/ext4/fast_commit.c | 8 ++++----
 fs/ext4/mballoc.c     | 2 +-
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -XXX,XX +XXX,XX @@ extern int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 extern int ext4_trim_fs(struct super_block *, struct fstrim_range *);
 extern void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid);
 extern void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
-		int len, int state);
+		int len, bool state);
 static inline bool ext4_mb_cr_expensive(enum criteria cr)
 {
 	return cr >= CR_GOAL_LEN_SLOW;
diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/fast_commit.c
+++ b/fs/ext4/fast_commit.c
@@ -XXX,XX +XXX,XX @@ static int ext4_fc_replay_add_range(struct super_block *sb,
 			 * at the end of the FC replay using our array of
 			 * modified inodes.
 			 */
-			ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0);
+			ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, false);
 			goto next;
 		}
 
@@ -XXX,XX +XXX,XX @@ ext4_fc_replay_del_range(struct super_block *sb,
 		if (ret > 0) {
 			remaining -= ret;
 			cur += ret;
-			ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0);
+			ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, false);
 		} else {
 			remaining -= map.m_len;
 			cur += map.m_len;
@@ -XXX,XX +XXX,XX @@ static void ext4_fc_set_bitmaps_and_counters(struct super_block *sb)
 			if (!IS_ERR(path)) {
 				for (j = 0; j < path->p_depth; j++)
 					ext4_mb_mark_bb(inode->i_sb,
-							path[j].p_block, 1, 1);
+							path[j].p_block, 1, true);
 				ext4_free_ext_path(path);
 			}
 			cur += ret;
 			ext4_mb_mark_bb(inode->i_sb, map.m_pblk,
-					map.m_len, 1);
+					map.m_len, true);
 		} else {
 			cur = cur + (map.m_len ? map.m_len : 1);
 		}
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
  * blocks in bitmaps and update counters.
  */
 void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
-			int len, int state)
+			int len, bool state)
 {
 	struct buffer_head *bitmap_bh = NULL;
 	struct ext4_group_desc *gdp;
-- 
2.30.0
There are several reasons to add a general function ext4_mb_mark_context
to update the block bitmap and the group descriptor on disk:
1. pair the behavior of alloc/free bits. For example,
ext4_mb_new_blocks_simple will update free_clusters in struct
flex_groups in ext4_mb_mark_bb while ext4_free_blocks_simple forgets
this.
2. remove repeated code to read from disk, update and write back to
disk.
3. reduce future unit test mocks to avoid real IO to update structures
on disk.
(A short usage sketch follows after this patch.)

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 147 ++++++++++++++++++++++++----------------------
 1 file changed, 77 insertions(+), 70 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@ void ext4_exit_mballoc(void)
 	ext4_groupinfo_destroy_slabs();
 }
 
+static int
+ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
+		     ext4_grpblk_t blkoff, ext4_grpblk_t len)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct buffer_head *bitmap_bh = NULL;
+	struct ext4_group_desc *gdp;
+	struct buffer_head *gdp_bh;
+	int err;
+	unsigned int i, already, changed;
+
+	bitmap_bh = ext4_read_block_bitmap(sb, group);
+	if (IS_ERR(bitmap_bh))
+		return PTR_ERR(bitmap_bh);
+
+	err = -EIO;
+	gdp = ext4_get_group_desc(sb, group, &gdp_bh);
+	if (!gdp)
+		goto out_err;
+
+	ext4_lock_group(sb, group);
+	if (ext4_has_group_desc_csum(sb) &&
+	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
+		gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
+		ext4_free_group_clusters_set(sb, gdp,
+			ext4_free_clusters_after_init(sb, group, gdp));
+	}
+
+	already = 0;
+	for (i = 0; i < len; i++)
+		if (mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
+		    state)
+			already++;
+	changed = len - already;
+
+	if (state) {
+		mb_set_bits(bitmap_bh->b_data, blkoff, len);
+		ext4_free_group_clusters_set(sb, gdp,
+			ext4_free_group_clusters(sb, gdp) - changed);
+	} else {
+		mb_clear_bits(bitmap_bh->b_data, blkoff, len);
+		ext4_free_group_clusters_set(sb, gdp,
+			ext4_free_group_clusters(sb, gdp) + changed);
+	}
+
+	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
+	ext4_group_desc_csum_set(sb, group, gdp);
+	ext4_unlock_group(sb, group);
+
+	if (sbi->s_log_groups_per_flex) {
+		ext4_group_t flex_group = ext4_flex_group(sbi, group);
+		struct flex_groups *fg = sbi_array_rcu_deref(sbi,
+					 s_flex_groups, flex_group);
+
+		if (state)
+			atomic64_sub(changed, &fg->free_clusters);
+		else
+			atomic64_add(changed, &fg->free_clusters);
+	}
+
+	err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
+	if (err)
+		goto out_err;
+	err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
+	if (err)
+		goto out_err;
+
+	sync_dirty_buffer(bitmap_bh);
+	sync_dirty_buffer(gdp_bh);
+
+out_err:
+	brelse(bitmap_bh);
+	return err;
+}
 
 /*
  * Check quota and mark chosen space (ac->ac_b_ex) non-free in bitmaps
@@ -XXX,XX +XXX,XX @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
 			int len, bool state)
 {
-	struct buffer_head *bitmap_bh = NULL;
-	struct ext4_group_desc *gdp;
-	struct buffer_head *gdp_bh;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	ext4_group_t group;
 	ext4_grpblk_t blkoff;
-	int i, err = 0;
-	int already;
-	unsigned int clen, clen_changed, thisgrp_len;
+	int err = 0;
+	unsigned int clen, thisgrp_len;
 
 	while (len > 0) {
 		ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
@@ -XXX,XX +XXX,XX @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
 			ext4_error(sb, "Marking blocks in system zone - "
 				   "Block = %llu, len = %u",
 				   block, thisgrp_len);
-			bitmap_bh = NULL;
 			break;
 		}
 
-		bitmap_bh = ext4_read_block_bitmap(sb, group);
-		if (IS_ERR(bitmap_bh)) {
-			err = PTR_ERR(bitmap_bh);
-			bitmap_bh = NULL;
-			break;
-		}
-
-		err = -EIO;
-		gdp = ext4_get_group_desc(sb, group, &gdp_bh);
-		if (!gdp)
-			break;
-
-		ext4_lock_group(sb, group);
-		already = 0;
-		for (i = 0; i < clen; i++)
-			if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
-			    !state)
-				already++;
-
-		clen_changed = clen - already;
-		if (state)
-			mb_set_bits(bitmap_bh->b_data, blkoff, clen);
-		else
-			mb_clear_bits(bitmap_bh->b_data, blkoff, clen);
-		if (ext4_has_group_desc_csum(sb) &&
-		    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
-			gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
-			ext4_free_group_clusters_set(sb, gdp,
-				ext4_free_clusters_after_init(sb, group, gdp));
-		}
-		if (state)
-			clen = ext4_free_group_clusters(sb, gdp) - clen_changed;
-		else
-			clen = ext4_free_group_clusters(sb, gdp) + clen_changed;
-
-		ext4_free_group_clusters_set(sb, gdp, clen);
-		ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
-		ext4_group_desc_csum_set(sb, group, gdp);
-
-		ext4_unlock_group(sb, group);
-
-		if (sbi->s_log_groups_per_flex) {
-			ext4_group_t flex_group = ext4_flex_group(sbi, group);
-			struct flex_groups *fg = sbi_array_rcu_deref(sbi,
-						 s_flex_groups, flex_group);
-
-			if (state)
-				atomic64_sub(clen_changed, &fg->free_clusters);
-			else
-				atomic64_add(clen_changed, &fg->free_clusters);
-
-		}
-
-		err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
-		if (err)
-			break;
-		sync_dirty_buffer(bitmap_bh);
-		err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
-		sync_dirty_buffer(gdp_bh);
+		err = ext4_mb_mark_context(sb, state, group, blkoff, clen);
 		if (err)
 			break;
 
 		block += thisgrp_len;
 		len -= thisgrp_len;
-		brelse(bitmap_bh);
 		BUG_ON(len < 0);
 	}
-
-	if (err)
-		brelse(bitmap_bh);
 }
 
 /*
-- 
2.30.0
Call ext4_mb_mark_context in ext4_free_blocks_simple to:
1. remove repeated code
2. pair the update of free_clusters done in ext4_mb_new_blocks_simple
3. add the missing ext4_lock_group/ext4_unlock_group protection

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 33 +--------------------------------
 1 file changed, 1 insertion(+), 32 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
 static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
 					unsigned long count)
 {
-	struct buffer_head *bitmap_bh;
 	struct super_block *sb = inode->i_sb;
-	struct ext4_group_desc *gdp;
-	struct buffer_head *gdp_bh;
 	ext4_group_t group;
 	ext4_grpblk_t blkoff;
-	int already_freed = 0, err, i;
 
 	ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
-	bitmap_bh = ext4_read_block_bitmap(sb, group);
-	if (IS_ERR(bitmap_bh)) {
-		pr_warn("Failed to read block bitmap\n");
-		return;
-	}
-	gdp = ext4_get_group_desc(sb, group, &gdp_bh);
-	if (!gdp)
-		goto err_out;
-
-	for (i = 0; i < count; i++) {
-		if (!mb_test_bit(blkoff + i, bitmap_bh->b_data))
-			already_freed++;
-	}
-	mb_clear_bits(bitmap_bh->b_data, blkoff, count);
-	err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
-	if (err)
-		goto err_out;
-	ext4_free_group_clusters_set(
-		sb, gdp, ext4_free_group_clusters(sb, gdp) +
-		count - already_freed);
-	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
-	ext4_group_desc_csum_set(sb, group, gdp);
-	ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
-	sync_dirty_buffer(bitmap_bh);
-	sync_dirty_buffer(gdp_bh);
-
-err_out:
-	brelse(bitmap_bh);
+	ext4_mb_mark_context(sb, false, group, blkoff, count);
 }
 
 /**
-- 
2.30.0
Previously, ext4_mb_mark_context was only called on the fast commit
replay path, so there was no valid handle when we updated the block
bitmap and group descriptor. This patch extends ext4_mb_mark_context so
it can be used by code running under the journal. There are several
improvements:
1. Add a "handle_t *handle" parameter to ext4_mb_mark_context to
journal the block bitmap and group descriptor updates inside
ext4_mb_mark_context (the added journal code is based on
ext4_mb_mark_diskspace_used, where ext4_mb_mark_context is going to be
used).
2. Add the EXT4_MB_BITMAP_MARKED_CHECK flag to control whether to check
if bits in the block bitmap are already marked, as allocation code
under the journal asserts that none of the bits to be changed are
marked beforehand.
3. Add an "ext4_grpblk_t *ret_changed" parameter to
ext4_mb_mark_context to report the number of bits in the block bitmap
that actually changed.
(The resulting calling conventions are summarized after this patch.)

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 64 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 49 insertions(+), 15 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@ void ext4_exit_mballoc(void)
 	ext4_groupinfo_destroy_slabs();
 }
 
+#define EXT4_MB_BITMAP_MARKED_CHECK 0x0001
+#define EXT4_MB_SYNC_UPDATE 0x0002
 static int
-ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
-		     ext4_grpblk_t blkoff, ext4_grpblk_t len)
+ext4_mb_mark_context(handle_t *handle, struct super_block *sb, bool state,
+		     ext4_group_t group, ext4_grpblk_t blkoff,
+		     ext4_grpblk_t len, int flags, ext4_grpblk_t *ret_changed)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct buffer_head *bitmap_bh = NULL;
 	struct ext4_group_desc *gdp;
 	struct buffer_head *gdp_bh;
 	int err;
-	unsigned int i, already, changed;
+	unsigned int i, already, changed = len;
 
+	if (ret_changed)
+		*ret_changed = 0;
 	bitmap_bh = ext4_read_block_bitmap(sb, group);
 	if (IS_ERR(bitmap_bh))
 		return PTR_ERR(bitmap_bh);
 
+	if (handle) {
+		BUFFER_TRACE(bitmap_bh, "getting write access");
+		err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
+						    EXT4_JTR_NONE);
+		if (err)
+			goto out_err;
+	}
+
 	err = -EIO;
 	gdp = ext4_get_group_desc(sb, group, &gdp_bh);
 	if (!gdp)
 		goto out_err;
 
+	if (handle) {
+		BUFFER_TRACE(gdp_bh, "get_write_access");
+		err = ext4_journal_get_write_access(handle, sb, gdp_bh,
+						    EXT4_JTR_NONE);
+		if (err)
+			goto out_err;
+	}
+
 	ext4_lock_group(sb, group);
 	if (ext4_has_group_desc_csum(sb) &&
 	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
@@ -XXX,XX +XXX,XX @@ ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
 			ext4_free_clusters_after_init(sb, group, gdp));
 	}
 
-	already = 0;
-	for (i = 0; i < len; i++)
-		if (mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
-		    state)
-			already++;
-	changed = len - already;
+	if (flags & EXT4_MB_BITMAP_MARKED_CHECK) {
+		already = 0;
+		for (i = 0; i < len; i++)
+			if (mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
+			    state)
+				already++;
+		changed = len - already;
+	}
 
 	if (state) {
 		mb_set_bits(bitmap_bh->b_data, blkoff, len);
@@ -XXX,XX +XXX,XX @@ ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
 	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
 	ext4_group_desc_csum_set(sb, group, gdp);
 	ext4_unlock_group(sb, group);
+	if (ret_changed)
+		*ret_changed = changed;
 
 	if (sbi->s_log_groups_per_flex) {
 		ext4_group_t flex_group = ext4_flex_group(sbi, group);
@@ -XXX,XX +XXX,XX @@ ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
 			atomic64_add(changed, &fg->free_clusters);
 	}
 
-	err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
+	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
 	if (err)
 		goto out_err;
-	err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
+	err = ext4_handle_dirty_metadata(handle, NULL, gdp_bh);
 	if (err)
 		goto out_err;
 
-	sync_dirty_buffer(bitmap_bh);
-	sync_dirty_buffer(gdp_bh);
+	if (flags & EXT4_MB_SYNC_UPDATE) {
+		sync_dirty_buffer(bitmap_bh);
+		sync_dirty_buffer(gdp_bh);
+	}
 
 out_err:
 	brelse(bitmap_bh);
@@ -XXX,XX +XXX,XX @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
 			break;
 		}
 
-		err = ext4_mb_mark_context(sb, state, group, blkoff, clen);
+		err = ext4_mb_mark_context(NULL, sb, state,
+					   group, blkoff, clen,
+					   EXT4_MB_BITMAP_MARKED_CHECK |
+					   EXT4_MB_SYNC_UPDATE,
+					   NULL);
 		if (err)
 			break;
 
@@ -XXX,XX +XXX,XX @@ static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
 	ext4_grpblk_t blkoff;
 
 	ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
-	ext4_mb_mark_context(sb, false, group, blkoff, count);
+	ext4_mb_mark_context(NULL, sb, false, group, blkoff, count,
+			     EXT4_MB_BITMAP_MARKED_CHECK |
+			     EXT4_MB_SYNC_UPDATE,
+			     NULL);
 }
 
 /**
-- 
2.30.0
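To summarize the resulting calling conventions, here is a sketch
distilled from the hunks above (not new API, just the two patterns side
by side; surrounding declarations elided):

/* Fast-commit replay / no journal: no handle, write through to disk,
 * and tolerate bits already being in the requested state. */
err = ext4_mb_mark_context(NULL, sb, state, group, blkoff, len,
			   EXT4_MB_BITMAP_MARKED_CHECK |
			   EXT4_MB_SYNC_UPDATE,
			   NULL);

/* Journaled path: pass the handle so both buffers go through
 * ext4_journal_get_write_access()/ext4_handle_dirty_metadata(), skip
 * the sync, and optionally get back how many bits actually changed. */
err = ext4_mb_mark_context(handle, sb, state, group, blkoff, len,
			   0, &changed);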
Call ext4_mb_mark_context in ext4_mb_mark_diskspace_used to:
1. remove the repeated code that updates the bitmap and group
descriptor on disk in the normal path.
2. call ext4_mb_mark_context instead of only setting bits in the block
bitmap when fixing the bitmap. ext4_mb_mark_context also updates the
bitmap checksum and the other counters along with the bit change, to
keep them consistent with the bit change; otherwise the block bitmap
would be marked corrupted, as its checksum would be left in an
inconsistent state.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 86 +++++++++++------------------------------------
 1 file changed, 20 insertions(+), 66 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@ static noinline_for_stack int
 ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 				handle_t *handle, unsigned int reserv_clstrs)
 {
-	struct buffer_head *bitmap_bh = NULL;
 	struct ext4_group_desc *gdp;
-	struct buffer_head *gdp_bh;
 	struct ext4_sb_info *sbi;
 	struct super_block *sb;
 	ext4_fsblk_t block;
 	int err, len;
+	int flags = 0;
+	ext4_grpblk_t changed;
 
 	BUG_ON(ac->ac_status != AC_STATUS_FOUND);
 	BUG_ON(ac->ac_b_ex.fe_len <= 0);
@@ -XXX,XX +XXX,XX @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 	sb = ac->ac_sb;
 	sbi = EXT4_SB(sb);
 
-	bitmap_bh = ext4_read_block_bitmap(sb, ac->ac_b_ex.fe_group);
-	if (IS_ERR(bitmap_bh)) {
-		return PTR_ERR(bitmap_bh);
-	}
-
-	BUFFER_TRACE(bitmap_bh, "getting write access");
-	err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
-					    EXT4_JTR_NONE);
-	if (err)
-		goto out_err;
-
-	err = -EIO;
-	gdp = ext4_get_group_desc(sb, ac->ac_b_ex.fe_group, &gdp_bh);
+	gdp = ext4_get_group_desc(sb, ac->ac_b_ex.fe_group, NULL);
 	if (!gdp)
-		goto out_err;
-
+		return -EIO;
 	ext4_debug("using block group %u(%d)\n", ac->ac_b_ex.fe_group,
 			ext4_free_group_clusters(sb, gdp));
 
-	BUFFER_TRACE(gdp_bh, "get_write_access");
-	err = ext4_journal_get_write_access(handle, sb, gdp_bh, EXT4_JTR_NONE);
-	if (err)
-		goto out_err;
-
 	block = ext4_grp_offs_to_block(sb, &ac->ac_b_ex);
-
 	len = EXT4_C2B(sbi, ac->ac_b_ex.fe_len);
 	if (!ext4_inode_block_valid(ac->ac_inode, block, len)) {
 		ext4_error(sb, "Allocating blocks %llu-%llu which overlap "
@@ -XXX,XX +XXX,XX @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 		 * Fix the bitmap and return EFSCORRUPTED
 		 * We leak some of the blocks here.
 		 */
-		ext4_lock_group(sb, ac->ac_b_ex.fe_group);
-		mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
-			    ac->ac_b_ex.fe_len);
-		ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
-		err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
+		err = ext4_mb_mark_context(handle, sb, true,
+					   ac->ac_b_ex.fe_group,
+					   ac->ac_b_ex.fe_start,
+					   ac->ac_b_ex.fe_len,
+					   0, NULL);
 		if (!err)
 			err = -EFSCORRUPTED;
-		goto out_err;
+		return err;
 	}
 
-	ext4_lock_group(sb, ac->ac_b_ex.fe_group);
 #ifdef AGGRESSIVE_CHECK
-	{
-		int i;
-		for (i = 0; i < ac->ac_b_ex.fe_len; i++) {
-			BUG_ON(mb_test_bit(ac->ac_b_ex.fe_start + i,
-						bitmap_bh->b_data));
-		}
-	}
+	flags |= EXT4_MB_BITMAP_MARKED_CHECK;
 #endif
-	mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start,
-		    ac->ac_b_ex.fe_len);
-	if (ext4_has_group_desc_csum(sb) &&
-	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
-		gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT);
-		ext4_free_group_clusters_set(sb, gdp,
-					     ext4_free_clusters_after_init(sb,
-						ac->ac_b_ex.fe_group, gdp));
-	}
-	len = ext4_free_group_clusters(sb, gdp) - ac->ac_b_ex.fe_len;
-	ext4_free_group_clusters_set(sb, gdp, len);
-	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
-	ext4_group_desc_csum_set(sb, ac->ac_b_ex.fe_group, gdp);
+	err = ext4_mb_mark_context(handle, sb, true, ac->ac_b_ex.fe_group,
+				   ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len,
+				   flags, &changed);
+
+	if (err && changed == 0)
+		return err;
 
-	ext4_unlock_group(sb, ac->ac_b_ex.fe_group);
+#ifdef AGGRESSIVE_CHECK
+	BUG_ON(changed != ac->ac_b_ex.fe_len);
+#endif
 	percpu_counter_sub(&sbi->s_freeclusters_counter, ac->ac_b_ex.fe_len);
 	/*
 	 * Now reduce the dirty block count also. Should not go negative
@@ -XXX,XX +XXX,XX @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac,
 		percpu_counter_sub(&sbi->s_dirtyclusters_counter,
 				   reserv_clstrs);
 
-	if (sbi->s_log_groups_per_flex) {
-		ext4_group_t flex_group = ext4_flex_group(sbi,
-							  ac->ac_b_ex.fe_group);
-		atomic64_sub(ac->ac_b_ex.fe_len,
-			     &sbi_array_rcu_deref(sbi, s_flex_groups,
-						  flex_group)->free_clusters);
-	}
-
-	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
-	if (err)
-		goto out_err;
-	err = ext4_handle_dirty_metadata(handle, NULL, gdp_bh);
-
-out_err:
-	brelse(bitmap_bh);
 	return err;
 }
-- 
2.30.0
This patch separates block bitmap and buddy bitmap freeing in order to
update the block bitmap with ext4_mb_mark_context in the following
patch.

Separated freeing is safe with concurrent allocation as long as:
1. Blocks are first allocated in the buddy bitmap, and then in the
block bitmap.
2. Blocks are first freed in the block bitmap, and then in the buddy
bitmap.
Then a freed block only becomes available to allocation once both the
buddy bitmap and the block bitmap have been updated by the freeing.
Allocation obeys rule 1 already, so just do the separated freeing
following rule 2 (a condensed sketch follows after this patch).

Separated freeing has no race with generate_buddy: once
ext4_mb_load_buddy_gfp has executed successfully, the up-to-date buddy
page can be found in sbi->s_buddy_cache and no further buddy
initialization of that buddy page will run concurrently until the buddy
page is unloaded. As we always free in a "load buddy, free, unload
buddy" sequence, separated freeing has no race with generate_buddy.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 98 +++++++++++++++++++++++------------------------
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		ext4_error(sb, "Freeing blocks in system zone - "
 			   "Block = %llu, count = %lu", block, count);
 		/* err = 0. ext4_std_error should be a no op */
-		goto error_return;
+		goto error_out;
 	}
 	flags |= EXT4_FREE_BLOCKS_VALIDATED;
 
@@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		flags &= ~EXT4_FREE_BLOCKS_VALIDATED;
 	}
 	count_clusters = EXT4_NUM_B2C(sbi, count);
+	trace_ext4_mballoc_free(sb, inode, block_group, bit, count_clusters);
+
+	/* __GFP_NOFAIL: retry infinitely, ignore TIF_MEMDIE and memcg limit. */
+	err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
+				     GFP_NOFS|__GFP_NOFAIL);
+	if (err)
+		goto error_out;
+
+	if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
+	    !ext4_inode_block_valid(inode, block, count)) {
+		ext4_error(sb, "Freeing blocks in system zone - "
+			   "Block = %llu, count = %lu", block, count);
+		/* err = 0. ext4_std_error should be a no op */
+		goto error_clean;
+	}
+
 	bitmap_bh = ext4_read_block_bitmap(sb, block_group);
 	if (IS_ERR(bitmap_bh)) {
 		err = PTR_ERR(bitmap_bh);
 		bitmap_bh = NULL;
-		goto error_return;
+		goto error_clean;
 	}
 	gdp = ext4_get_group_desc(sb, block_group, &gd_bh);
 	if (!gdp) {
 		err = -EIO;
-		goto error_return;
-	}
-
-	if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
-	    !ext4_inode_block_valid(inode, block, count)) {
-		ext4_error(sb, "Freeing blocks in system zone - "
-			   "Block = %llu, count = %lu", block, count);
-		/* err = 0. ext4_std_error should be a no op */
-		goto error_return;
+		goto error_clean;
 	}
 
 	BUFFER_TRACE(bitmap_bh, "getting write access");
 	err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
 					    EXT4_JTR_NONE);
 	if (err)
-		goto error_return;
+		goto error_clean;
 
 	/*
 	 * We are about to modify some metadata. Call the journal APIs
@@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 	BUFFER_TRACE(gd_bh, "get_write_access");
 	err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE);
 	if (err)
-		goto error_return;
+		goto error_clean;
 #ifdef AGGRESSIVE_CHECK
 	{
 		int i;
@@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 			BUG_ON(!mb_test_bit(bit + i, bitmap_bh->b_data));
 	}
 #endif
-	trace_ext4_mballoc_free(sb, inode, block_group, bit, count_clusters);
+	ext4_lock_group(sb, block_group);
+	mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
+	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
+	ext4_free_group_clusters_set(sb, gdp, ret);
+	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
+	ext4_group_desc_csum_set(sb, block_group, gdp);
+	ext4_unlock_group(sb, block_group);
 
-	/* __GFP_NOFAIL: retry infinitely, ignore TIF_MEMDIE and memcg limit. */
-	err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
-				     GFP_NOFS|__GFP_NOFAIL);
-	if (err)
-		goto error_return;
+	if (sbi->s_log_groups_per_flex) {
+		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
+		atomic64_add(count_clusters,
+			     &sbi_array_rcu_deref(sbi, s_flex_groups,
+						  flex_group)->free_clusters);
+	}
+
+	/* We dirtied the bitmap block */
+	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
+	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
+
+	/* And the group descriptor block */
+	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
+	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
+	if (!err)
+		err = ret;
 
 	/*
 	 * We need to make sure we don't reuse the freed block until after the
@@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		new_entry->efd_tid = handle->h_transaction->t_tid;
 
 		ext4_lock_group(sb, block_group);
-		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
 		ext4_mb_free_metadata(handle, &e4b, new_entry);
 	} else {
-		/* need to update group_info->bb_free and bitmap
-		 * with group lock held. generate_buddy look at
-		 * them with group lock_held
-		 */
 		if (test_opt(sb, DISCARD)) {
 			err = ext4_issue_discard(sb, block_group, bit,
 						 count_clusters, NULL);
@@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 			EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info);
 
 		ext4_lock_group(sb, block_group);
-		mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
 		mb_free_blocks(inode, &e4b, bit, count_clusters);
 	}
 
-	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
-	ext4_free_group_clusters_set(sb, gdp, ret);
-	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
-	ext4_group_desc_csum_set(sb, block_group, gdp);
 	ext4_unlock_group(sb, block_group);
 
-	if (sbi->s_log_groups_per_flex) {
-		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
-		atomic64_add(count_clusters,
-			     &sbi_array_rcu_deref(sbi, s_flex_groups,
-						  flex_group)->free_clusters);
-	}
-
 	/*
 	 * on a bigalloc file system, defer the s_freeclusters_counter
 	 * update to the caller (ext4_remove_space and friends) so they
@@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 				   count_clusters);
 	}
 
-	ext4_mb_unload_buddy(&e4b);
-
-	/* We dirtied the bitmap block */
-	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
-	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
-
-	/* And the group descriptor block */
-	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
-	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
-	if (!err)
-		err = ret;
-
 	if (overflow && !err) {
 		block += count;
 		count = overflow;
+		ext4_mb_unload_buddy(&e4b);
 		put_bh(bitmap_bh);
 		/* The range changed so it's no longer validated */
 		flags &= ~EXT4_FREE_BLOCKS_VALIDATED;
 		goto do_more;
 	}
 
-error_return:
+
+error_clean:
+	ext4_mb_unload_buddy(&e4b);
 	brelse(bitmap_bh);
+error_out:
 	ext4_std_error(sb, err);
 }
-- 
2.30.0
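The ordering rules from the commit message, condensed into a sketch.
This is a simplified illustration, not patch code: it uses
ext4_mb_mark_context for the on-disk side, as the following patches do,
and elides error handling and the surrounding declarations:

/* Allocation obeys rule 1: buddy bitmap first, then on-disk bitmap. */
ext4_lock_group(sb, group);
mb_mark_used(&e4b, &ac->ac_b_ex);                 /* buddy bitmap */
ext4_unlock_group(sb, group);
err = ext4_mb_mark_context(handle, sb, true, group,
			   ac->ac_b_ex.fe_start,
			   ac->ac_b_ex.fe_len, 0, NULL); /* on-disk bitmap */

/* Freeing follows rule 2: on-disk bitmap first, then buddy bitmap, so
 * a block is never allocatable in one bitmap while still busy in the
 * other. */
err = ext4_mb_mark_context(handle, sb, false, group, bit,
			   count_clusters, 0, NULL);     /* on-disk bitmap */
ext4_lock_group(sb, group);
mb_free_blocks(inode, &e4b, bit, count_clusters);       /* buddy bitmap */
ext4_unlock_group(sb, group);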
Call ext4_mb_mark_context in ext4_mb_clear_bb to remove repeated code.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 69 +++++++----------------------------------------
 1 file changed, 10 insertions(+), 59 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 			       ext4_fsblk_t block, unsigned long count,
 			       int flags)
 {
-	struct buffer_head *bitmap_bh = NULL;
 	struct super_block *sb = inode->i_sb;
-	struct ext4_group_desc *gdp;
 	struct ext4_group_info *grp;
 	unsigned int overflow;
 	ext4_grpblk_t bit;
-	struct buffer_head *gd_bh;
 	ext4_group_t block_group;
 	struct ext4_sb_info *sbi;
 	struct ext4_buddy e4b;
 	unsigned int count_clusters;
 	int err = 0;
-	int ret;
+	int mark_flags = 0;
+	ext4_grpblk_t changed;
 
 	sbi = EXT4_SB(sb);
 
@@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		goto error_clean;
 	}
 
-	bitmap_bh = ext4_read_block_bitmap(sb, block_group);
-	if (IS_ERR(bitmap_bh)) {
-		err = PTR_ERR(bitmap_bh);
-		bitmap_bh = NULL;
-		goto error_clean;
-	}
-	gdp = ext4_get_group_desc(sb, block_group, &gd_bh);
-	if (!gdp) {
-		err = -EIO;
-		goto error_clean;
-	}
-
-	BUFFER_TRACE(bitmap_bh, "getting write access");
-	err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
-					    EXT4_JTR_NONE);
-	if (err)
-		goto error_clean;
-
-	/*
-	 * We are about to modify some metadata. Call the journal APIs
-	 * to unshare ->b_data if a currently-committing transaction is
-	 * using it
-	 */
-	BUFFER_TRACE(gd_bh, "get_write_access");
-	err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE);
-	if (err)
-		goto error_clean;
 #ifdef AGGRESSIVE_CHECK
-	{
-		int i;
-		for (i = 0; i < count_clusters; i++)
-			BUG_ON(!mb_test_bit(bit + i, bitmap_bh->b_data));
-	}
+	mark_flags |= EXT4_MB_BITMAP_MARKED_CHECK;
 #endif
-	ext4_lock_group(sb, block_group);
-	mb_clear_bits(bitmap_bh->b_data, bit, count_clusters);
-	ret = ext4_free_group_clusters(sb, gdp) + count_clusters;
-	ext4_free_group_clusters_set(sb, gdp, ret);
-	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
-	ext4_group_desc_csum_set(sb, block_group, gdp);
-	ext4_unlock_group(sb, block_group);
+	err = ext4_mb_mark_context(handle, sb, false, block_group, bit,
+				   count_clusters, mark_flags, &changed);
 
-	if (sbi->s_log_groups_per_flex) {
-		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
-		atomic64_add(count_clusters,
-			     &sbi_array_rcu_deref(sbi, s_flex_groups,
-						  flex_group)->free_clusters);
-	}
-
-	/* We dirtied the bitmap block */
-	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
-	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
+	if (err && changed == 0)
+		goto error_clean;
 
-	/* And the group descriptor block */
-	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
-	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
-	if (!err)
-		err = ret;
+#ifdef AGGRESSIVE_CHECK
+	BUG_ON(changed != count_clusters);
+#endif
 
 	/*
 	 * We need to make sure we don't reuse the freed block until after the
@@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		block += count;
 		count = overflow;
 		ext4_mb_unload_buddy(&e4b);
-		put_bh(bitmap_bh);
 		/* The range changed so it's no longer validated */
 		flags &= ~EXT4_FREE_BLOCKS_VALIDATED;
 		goto do_more;
@@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 
 error_clean:
 	ext4_mb_unload_buddy(&e4b);
-	brelse(bitmap_bh);
 error_out:
 	ext4_std_error(sb, err);
 }
-- 
2.30.0
This patch separates block bitmap and buddy bitmap freeing in order to
update the block bitmap with ext4_mb_mark_context in the following
patch.

The reason why this can be separated is explained in the previous
commit. The explanation is repeated here to simplify future code
archeology of ext4_group_add_blocks():

Separated freeing is safe with concurrent allocation as long as:
1. Blocks are first allocated in the buddy bitmap, and then in the
block bitmap.
2. Blocks are first freed in the block bitmap, and then in the buddy
bitmap.
Then a freed block only becomes available to allocation once both the
buddy bitmap and the block bitmap have been updated by the freeing.
Allocation obeys rule 1 already, so just do the separated freeing
following rule 2.

Separated freeing has no race with generate_buddy: once
ext4_mb_load_buddy_gfp has executed successfully, the up-to-date buddy
page can be found in sbi->s_buddy_cache and no further buddy
initialization of that buddy page will run concurrently until the buddy
page is unloaded. As we always free in a "load buddy, free, unload
buddy" sequence, separated freeing has no race with generate_buddy.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 54 +++++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 28 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 		ext4_warning(sb, "too many blocks added to group %u",
 			     block_group);
 		err = -EINVAL;
-		goto error_return;
+		goto error_out;
+	}
+
+	err = ext4_mb_load_buddy(sb, block_group, &e4b);
+	if (err)
+		goto error_out;
+
+	if (!ext4_sb_block_valid(sb, NULL, block, count)) {
+		ext4_error(sb, "Adding blocks in system zones - "
+			   "Block = %llu, count = %lu",
+			   block, count);
+		err = -EINVAL;
+		goto error_clean;
 	}
 
 	bitmap_bh = ext4_read_block_bitmap(sb, block_group);
 	if (IS_ERR(bitmap_bh)) {
 		err = PTR_ERR(bitmap_bh);
 		bitmap_bh = NULL;
-		goto error_return;
+		goto error_clean;
 	}
 
 	desc = ext4_get_group_desc(sb, block_group, &gd_bh);
 	if (!desc) {
 		err = -EIO;
-		goto error_return;
-	}
-
-	if (!ext4_sb_block_valid(sb, NULL, block, count)) {
-		ext4_error(sb, "Adding blocks in system zones - "
-			   "Block = %llu, count = %lu",
-			   block, count);
-		err = -EINVAL;
-		goto error_return;
+		goto error_clean;
 	}
 
 	BUFFER_TRACE(bitmap_bh, "getting write access");
 	err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
 					    EXT4_JTR_NONE);
 	if (err)
-		goto error_return;
+		goto error_clean;
 
 	/*
 	 * We are about to modify some metadata. Call the journal APIs
@@ -XXX,XX +XXX,XX @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 	BUFFER_TRACE(gd_bh, "get_write_access");
 	err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE);
 	if (err)
-		goto error_return;
+		goto error_clean;
 
 	for (i = 0, clusters_freed = 0; i < cluster_count; i++) {
 		BUFFER_TRACE(bitmap_bh, "clear bit");
@@ -XXX,XX +XXX,XX @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 		}
 	}
 
-	err = ext4_mb_load_buddy(sb, block_group, &e4b);
-	if (err)
-		goto error_return;
-
-	/*
-	 * need to update group_info->bb_free and bitmap
-	 * with group lock held. generate_buddy look at
-	 * them with group lock_held
-	 */
 	ext4_lock_group(sb, block_group);
 	mb_clear_bits(bitmap_bh->b_data, bit, cluster_count);
-	mb_free_blocks(NULL, &e4b, bit, cluster_count);
 	free_clusters_count = clusters_freed +
 		ext4_free_group_clusters(sb, desc);
 	ext4_free_group_clusters_set(sb, desc, free_clusters_count);
 	ext4_block_bitmap_csum_set(sb, desc, bitmap_bh);
 	ext4_group_desc_csum_set(sb, block_group, desc);
 	ext4_unlock_group(sb, block_group);
-	percpu_counter_add(&sbi->s_freeclusters_counter,
-			   clusters_freed);
 
 	if (sbi->s_log_groups_per_flex) {
 		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
@@ -XXX,XX +XXX,XX @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 				   flex_group)->free_clusters);
 	}
 
-	ext4_mb_unload_buddy(&e4b);
-
 	/* We dirtied the bitmap block */
 	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
 	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
@@ -XXX,XX +XXX,XX @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 	if (!err)
 		err = ret;
 
-error_return:
+	ext4_lock_group(sb, block_group);
+	mb_free_blocks(NULL, &e4b, bit, cluster_count);
+	ext4_unlock_group(sb, block_group);
+	percpu_counter_add(&sbi->s_freeclusters_counter,
+			   clusters_freed);
+
+error_clean:
 	brelse(bitmap_bh);
+	ext4_mb_unload_buddy(&e4b);
+error_out:
 	ext4_std_error(sb, err);
 	return err;
 }
-- 
2.30.0
Call ext4_mb_mark_context in ext4_group_add_blocks() to remove repeated
code.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc.c | 82 ++++++-----------------------------------------
 1 file changed, 10 insertions(+), 72 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@ void ext4_free_blocks(handle_t *handle, struct inode *inode,
 int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 			 ext4_fsblk_t block, unsigned long count)
 {
-	struct buffer_head *bitmap_bh = NULL;
-	struct buffer_head *gd_bh;
 	ext4_group_t block_group;
 	ext4_grpblk_t bit;
-	unsigned int i;
-	struct ext4_group_desc *desc;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct ext4_buddy e4b;
-	int err = 0, ret, free_clusters_count;
-	ext4_grpblk_t clusters_freed;
+	int err = 0;
 	ext4_fsblk_t first_cluster = EXT4_B2C(sbi, block);
 	ext4_fsblk_t last_cluster = EXT4_B2C(sbi, block + count - 1);
 	unsigned long cluster_count = last_cluster - first_cluster + 1;
+	ext4_grpblk_t changed;
 
 	ext4_debug("Adding block(s) %llu-%llu\n", block, block + count - 1);
 
-	if (count == 0)
+	if (cluster_count == 0)
 		return 0;
 
 	ext4_get_group_no_and_offset(sb, block, &block_group, &bit);
@@ -XXX,XX +XXX,XX @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 		goto error_clean;
 	}
 
-	bitmap_bh = ext4_read_block_bitmap(sb, block_group);
-	if (IS_ERR(bitmap_bh)) {
-		err = PTR_ERR(bitmap_bh);
-		bitmap_bh = NULL;
-		goto error_clean;
-	}
-
-	desc = ext4_get_group_desc(sb, block_group, &gd_bh);
-	if (!desc) {
-		err = -EIO;
-		goto error_clean;
-	}
-
-	BUFFER_TRACE(bitmap_bh, "getting write access");
-	err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
-					    EXT4_JTR_NONE);
-	if (err)
-		goto error_clean;
-
-	/*
-	 * We are about to modify some metadata. Call the journal APIs
-	 * to unshare ->b_data if a currently-committing transaction is
-	 * using it
-	 */
-	BUFFER_TRACE(gd_bh, "get_write_access");
-	err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE);
-	if (err)
+	err = ext4_mb_mark_context(handle, sb, false, block_group, bit,
+				   cluster_count, EXT4_MB_BITMAP_MARKED_CHECK,
+				   &changed);
+	if (err && changed == 0)
 		goto error_clean;
 
-	for (i = 0, clusters_freed = 0; i < cluster_count; i++) {
-		BUFFER_TRACE(bitmap_bh, "clear bit");
-		if (!mb_test_bit(bit + i, bitmap_bh->b_data)) {
-			ext4_error(sb, "bit already cleared for block %llu",
-				   (ext4_fsblk_t)(block + i));
-			BUFFER_TRACE(bitmap_bh, "bit already cleared");
-		} else {
-			clusters_freed++;
-		}
-	}
-
-	ext4_lock_group(sb, block_group);
-	mb_clear_bits(bitmap_bh->b_data, bit, cluster_count);
-	free_clusters_count = clusters_freed +
-		ext4_free_group_clusters(sb, desc);
-	ext4_free_group_clusters_set(sb, desc, free_clusters_count);
-	ext4_block_bitmap_csum_set(sb, desc, bitmap_bh);
-	ext4_group_desc_csum_set(sb, block_group, desc);
-	ext4_unlock_group(sb, block_group);
-
-	if (sbi->s_log_groups_per_flex) {
-		ext4_group_t flex_group = ext4_flex_group(sbi, block_group);
-		atomic64_add(clusters_freed,
-			     &sbi_array_rcu_deref(sbi, s_flex_groups,
-						  flex_group)->free_clusters);
-	}
-
-	/* We dirtied the bitmap block */
-	BUFFER_TRACE(bitmap_bh, "dirtied bitmap block");
-	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
-
-	/* And the group descriptor block */
-	BUFFER_TRACE(gd_bh, "dirtied group descriptor block");
-	ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh);
-	if (!err)
-		err = ret;
+	if (changed != cluster_count)
+		ext4_error(sb, "bit already cleared in group %u", block_group);
 
 	ext4_lock_group(sb, block_group);
 	mb_free_blocks(NULL, &e4b, bit, cluster_count);
 	ext4_unlock_group(sb, block_group);
 	percpu_counter_add(&sbi->s_freeclusters_counter,
-			   clusters_freed);
+			   changed);
 
 error_clean:
-	brelse(bitmap_bh);
 	ext4_mb_unload_buddy(&e4b);
 error_out:
 	ext4_std_error(sb, err);
-- 
2.30.0
Multiblock allocation reads and writes the block bitmap and the group
descriptor, which reside on disk. Add kunit stubs to
ext4_get_group_desc, ext4_read_block_bitmap_nowait,
ext4_wait_block_bitmap and ext4_mb_mark_context to avoid real IO to
disk (a generic sketch of the stub pattern follows after this patch).

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext4/balloc.c  | 10 ++++++++++
 fs/ext4/mballoc.c |  5 +++++
 2 files changed, 15 insertions(+)

diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/balloc.c
+++ b/fs/ext4/balloc.c
@@ -XXX,XX +XXX,XX @@
 #include "mballoc.h"
 
 #include <trace/events/ext4.h>
+#include <kunit/static_stub.h>
 
 static unsigned ext4_num_base_meta_clusters(struct super_block *sb,
 					    ext4_group_t block_group);
@@ -XXX,XX +XXX,XX @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block *sb,
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct buffer_head *bh_p;
 
+	KUNIT_STATIC_STUB_REDIRECT(ext4_get_group_desc,
+				   sb, block_group, bh);
+
 	if (block_group >= ngroups) {
 		ext4_error(sb, "block_group >= groups_count - block_group = %u,"
 			   " groups_count = %u", block_group, ngroups);
@@ -XXX,XX +XXX,XX @@ ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group,
 	ext4_fsblk_t bitmap_blk;
 	int err;
 
+	KUNIT_STATIC_STUB_REDIRECT(ext4_read_block_bitmap_nowait,
+				   sb, block_group, ignore_locked);
+
 	desc = ext4_get_group_desc(sb, block_group, NULL);
 	if (!desc)
 		return ERR_PTR(-EFSCORRUPTED);
@@ -XXX,XX +XXX,XX @@ int ext4_wait_block_bitmap(struct super_block *sb, ext4_group_t block_group,
 {
 	struct ext4_group_desc *desc;
 
+	KUNIT_STATIC_STUB_REDIRECT(ext4_wait_block_bitmap,
+				   sb, block_group, bh);
+
 	if (!buffer_new(bh))
 		return 0;
 	desc = ext4_get_group_desc(sb, block_group, NULL);
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@
 #include <linux/backing-dev.h>
 #include <linux/freezer.h>
 #include <trace/events/ext4.h>
+#include <kunit/static_stub.h>
 
 /*
  * MUSTDO:
@@ -XXX,XX +XXX,XX @@ ext4_mb_mark_context(handle_t *handle, struct super_block *sb, bool state,
 	int err;
 	unsigned int i, already, changed = len;
 
+	KUNIT_STATIC_STUB_REDIRECT(ext4_mb_mark_context,
+				   handle, sb, state, group, blkoff, len,
+				   flags, ret_changed);
+
 	if (ret_changed)
 		*ret_changed = 0;
 	bitmap_bh = ext4_read_block_bitmap(sb, group);
-- 
2.30.0
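For readers unfamiliar with the KUnit static-stub API used above, here
is a generic sketch of the pattern (see
Documentation/dev-tools/kunit; "mbt_do_io" is a made-up example name,
not a function from this series):

/* In the code under test, the redirect must be the first statement: */
int mbt_do_io(struct super_block *sb)
{
	KUNIT_STATIC_STUB_REDIRECT(mbt_do_io, sb);
	/* ... real implementation, would touch the disk ... */
	return 0;
}

/* In the test, activate a replacement with the same signature; KUnit
 * deactivates it automatically when the test exits: */
static int mbt_do_io_stub(struct super_block *sb)
{
	return 0;	/* pretend the IO succeeded */
}

static void mbt_example_test(struct kunit *test)
{
	kunit_activate_static_stub(test, mbt_do_io, mbt_do_io_stub);
	KUNIT_EXPECT_EQ(test, mbt_do_io(NULL), 0); /* hits the stub */
}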
Here is the preparation work:
1. Include mballoc-test.c in mballoc.c to be able to test static
functions in mballoc.c.
2. Implement static stubs to avoid read IO to disk.
3. Construct a fake super_block. Only some members are set; more
members will be set when more functions are tested.
Then a unit test for ext4_mb_new_blocks_simple is added.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
---
 fs/ext4/mballoc-test.c | 325 +++++++++++++++++++++++++++++++++++++++++
 fs/ext4/mballoc.c      |   4 +
 2 files changed, 329 insertions(+)
 create mode 100644 fs/ext4/mballoc-test.c

diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/fs/ext4/mballoc-test.c
@@ -XXX,XX +XXX,XX @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit test of ext4 multiblocks allocation.
+ */
+
+#include <kunit/test.h>
+#include <kunit/static_stub.h>
+
+#include "ext4.h"
+
+struct mbt_grp_ctx {
+	struct buffer_head bitmap_bh;
+	/* desc and gd_bh are just the place holders for now */
+	struct ext4_group_desc desc;
+	struct buffer_head gd_bh;
+};
+
+struct mbt_ctx {
+	struct mbt_grp_ctx *grp_ctx;
+};
+
+struct mbt_ext4_super_block {
+	struct super_block sb;
+	struct mbt_ctx mbt_ctx;
+};
+
+#define MBT_CTX(_sb) (&(container_of((_sb), struct mbt_ext4_super_block, sb)->mbt_ctx))
+#define MBT_GRP_CTX(_sb, _group) (&MBT_CTX(_sb)->grp_ctx[_group])
+
+static struct super_block *mbt_ext4_alloc_super_block(void)
+{
+	struct ext4_super_block *es = kzalloc(sizeof(*es), GFP_KERNEL);
+	struct ext4_sb_info *sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
+	struct mbt_ext4_super_block *fsb = kzalloc(sizeof(*fsb), GFP_KERNEL);
+
+	if (fsb == NULL || sbi == NULL || es == NULL)
+		goto out;
+
+	sbi->s_es = es;
+	fsb->sb.s_fs_info = sbi;
+	return &fsb->sb;
+
+out:
+	kfree(fsb);
+	kfree(sbi);
+	kfree(es);
+	return NULL;
+}
+
+static void mbt_ext4_free_super_block(struct super_block *sb)
+{
+	struct mbt_ext4_super_block *fsb =
+		container_of(sb, struct mbt_ext4_super_block, sb);
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+	kfree(sbi->s_es);
+	kfree(sbi);
+	kfree(fsb);
+}
+
+struct mbt_ext4_block_layout {
+	unsigned char blocksize_bits;
+	unsigned int cluster_bits;
+	uint32_t blocks_per_group;
+	ext4_group_t group_count;
+	uint16_t desc_size;
+};
+
+static void mbt_init_sb_layout(struct super_block *sb,
+			       struct mbt_ext4_block_layout *layout)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_super_block *es = sbi->s_es;
+
+	sb->s_blocksize = 1UL << layout->blocksize_bits;
+	sb->s_blocksize_bits = layout->blocksize_bits;
+
+	sbi->s_groups_count = layout->group_count;
+	sbi->s_blocks_per_group = layout->blocks_per_group;
+	sbi->s_cluster_bits = layout->cluster_bits;
+	sbi->s_cluster_ratio = 1U << layout->cluster_bits;
+	sbi->s_clusters_per_group = layout->blocks_per_group >>
+				    layout->cluster_bits;
+	sbi->s_desc_size = layout->desc_size;
+
+	es->s_first_data_block = cpu_to_le32(0);
+	es->s_blocks_count_lo = cpu_to_le32(layout->blocks_per_group *
+					    layout->group_count);
+}
+
+static int mbt_grp_ctx_init(struct super_block *sb,
+			    struct mbt_grp_ctx *grp_ctx)
+{
+	grp_ctx->bitmap_bh.b_data = kzalloc(EXT4_BLOCK_SIZE(sb), GFP_KERNEL);
+	if (grp_ctx->bitmap_bh.b_data == NULL)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void mbt_grp_ctx_release(struct mbt_grp_ctx *grp_ctx)
+{
+	kfree(grp_ctx->bitmap_bh.b_data);
+	grp_ctx->bitmap_bh.b_data = NULL;
+}
+
+static void mbt_ctx_mark_used(struct super_block *sb, ext4_group_t group,
+			      unsigned int start, unsigned int len)
+{
+	struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(sb, group);
+
+	mb_set_bits(grp_ctx->bitmap_bh.b_data, start, len);
+}
+
+/* called after mbt_init_sb_layout */
+static int mbt_ctx_init(struct super_block *sb)
+{
+	struct mbt_ctx *ctx = MBT_CTX(sb);
+	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
+
+	ctx->grp_ctx = kcalloc(ngroups, sizeof(struct mbt_grp_ctx),
+			       GFP_KERNEL);
+	if (ctx->grp_ctx == NULL)
+		return -ENOMEM;
+
+	for (i = 0; i < ngroups; i++)
+		if (mbt_grp_ctx_init(sb, &ctx->grp_ctx[i]))
+			goto out;
+
+	/*
+	 * The first data block (first cluster in the first group) is used
+	 * by metadata; mark it used to avoid allocating a data block at
+	 * the first block, which would fail the ext4_sb_block_valid check.
+	 */
+	mb_set_bits(ctx->grp_ctx[0].bitmap_bh.b_data, 0, 1);
+
+	return 0;
+out:
+	while (i-- > 0)
+		mbt_grp_ctx_release(&ctx->grp_ctx[i]);
+	kfree(ctx->grp_ctx);
+	return -ENOMEM;
+}
+
+static void mbt_ctx_release(struct super_block *sb)
+{
+	struct mbt_ctx *ctx = MBT_CTX(sb);
+	ext4_group_t i, ngroups = ext4_get_groups_count(sb);
+
+	for (i = 0; i < ngroups; i++)
+		mbt_grp_ctx_release(&ctx->grp_ctx[i]);
+	kfree(ctx->grp_ctx);
+}
+
+static struct buffer_head *
+ext4_read_block_bitmap_nowait_stub(struct super_block *sb, ext4_group_t block_group,
+				   bool ignore_locked)
+{
+	struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(sb, block_group);
+
+	/* paired with brelse from caller of ext4_read_block_bitmap_nowait */
+	get_bh(&grp_ctx->bitmap_bh);
+	return &grp_ctx->bitmap_bh;
+}
+
+static int ext4_wait_block_bitmap_stub(struct super_block *sb,
+				       ext4_group_t block_group,
+				       struct buffer_head *bh)
+{
+	return 0;
+}
+
+static struct ext4_group_desc *
+ext4_get_group_desc_stub(struct super_block *sb, ext4_group_t block_group,
+			 struct buffer_head **bh)
+{
+	struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(sb, block_group);
+
+	if (bh != NULL)
+		*bh = &grp_ctx->gd_bh;
+
+	return &grp_ctx->desc;
+}
+
+static int
+ext4_mb_mark_context_stub(handle_t *handle, struct super_block *sb, bool state,
+			  ext4_group_t group, ext4_grpblk_t blkoff,
+			  ext4_grpblk_t len, int flags,
+			  ext4_grpblk_t *ret_changed)
+{
+	struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(sb, group);
+	struct buffer_head *bitmap_bh = &grp_ctx->bitmap_bh;
+
+	if (state)
+		mb_set_bits(bitmap_bh->b_data, blkoff, len);
+	else
+		mb_clear_bits(bitmap_bh->b_data, blkoff, len);
+
+	return 0;
+}
+
+#define TEST_BLOCKSIZE_BITS 10
+#define TEST_CLUSTER_BITS 3
+#define TEST_BLOCKS_PER_GROUP 8192
+#define TEST_GROUP_COUNT 4
+#define TEST_DESC_SIZE 64
+#define TEST_GOAL_GROUP 1
+static int mbt_kunit_init(struct kunit *test)
+{
+	struct mbt_ext4_block_layout layout = {
+		.blocksize_bits = TEST_BLOCKSIZE_BITS,
+		.cluster_bits = TEST_CLUSTER_BITS,
+		.blocks_per_group = TEST_BLOCKS_PER_GROUP,
+		.group_count = TEST_GROUP_COUNT,
+		.desc_size = TEST_DESC_SIZE,
+	};
+	struct super_block *sb;
+	int ret;
+
+	sb = mbt_ext4_alloc_super_block();
+	if (sb == NULL)
+		return -ENOMEM;
+
+	mbt_init_sb_layout(sb, &layout);
+
+	ret = mbt_ctx_init(sb);
+	if (ret != 0) {
+		mbt_ext4_free_super_block(sb);
+		return ret;
+	}
+
+	test->priv = sb;
+	kunit_activate_static_stub(test,
+				   ext4_read_block_bitmap_nowait,
+				   ext4_read_block_bitmap_nowait_stub);
+	kunit_activate_static_stub(test,
+				   ext4_wait_block_bitmap,
+				   ext4_wait_block_bitmap_stub);
+	kunit_activate_static_stub(test,
+				   ext4_get_group_desc,
+				   ext4_get_group_desc_stub);
+	kunit_activate_static_stub(test,
+				   ext4_mb_mark_context,
+				   ext4_mb_mark_context_stub);
+	return 0;
+}
+
+static void mbt_kunit_exit(struct kunit *test)
+{
+	struct super_block *sb = (struct super_block *)test->priv;
+
+	mbt_ctx_release(sb);
+	mbt_ext4_free_super_block(sb);
+}
+
+static void test_new_blocks_simple(struct kunit *test)
+{
+	struct super_block *sb = (struct super_block *)test->priv;
+	struct inode inode = { .i_sb = sb, };
+	struct ext4_allocation_request ar;
+	ext4_group_t i, goal_group = TEST_GOAL_GROUP;
+	int err = 0;
+	ext4_fsblk_t found;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+	ar.inode = &inode;
+
+	/* get block at goal */
+	ar.goal = ext4_group_first_block_no(sb, goal_group);
+	found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_EQ_MSG(test, ar.goal, found,
+		"failed to alloc block at goal, expected %llu found %llu",
+		ar.goal, found);
+
+	/* get block after goal in goal group */
+	ar.goal = ext4_group_first_block_no(sb, goal_group);
+	found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_EQ_MSG(test, ar.goal + EXT4_C2B(sbi, 1), found,
+		"failed to alloc block after goal in goal group, expected %llu found %llu",
+		ar.goal + EXT4_C2B(sbi, 1), found);
+
+	/* get block after goal group */
+	mbt_ctx_mark_used(sb, goal_group, 0, EXT4_CLUSTERS_PER_GROUP(sb));
+	ar.goal = ext4_group_first_block_no(sb, goal_group);
+	found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_EQ_MSG(test,
+		ext4_group_first_block_no(sb, goal_group + 1), found,
+		"failed to alloc block after goal group, expected %llu found %llu",
+		ext4_group_first_block_no(sb, goal_group + 1), found);
+
+	/* get block before goal group */
+	for (i = goal_group; i < ext4_get_groups_count(sb); i++)
+		mbt_ctx_mark_used(sb, i, 0, EXT4_CLUSTERS_PER_GROUP(sb));
+	ar.goal = ext4_group_first_block_no(sb, goal_group);
+	found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_EQ_MSG(test,
+		ext4_group_first_block_no(sb, 0) + EXT4_C2B(sbi, 1), found,
+		"failed to alloc block before goal group, expected %llu found %llu",
+		ext4_group_first_block_no(sb, 0) + EXT4_C2B(sbi, 1), found);
+
+	/* no block available, fail to allocate block */
+	for (i = 0; i < ext4_get_groups_count(sb); i++)
+		mbt_ctx_mark_used(sb, i, 0, EXT4_CLUSTERS_PER_GROUP(sb));
+	ar.goal = ext4_group_first_block_no(sb, goal_group);
+	found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_NE_MSG(test, err, 0,
+		"unexpectedly get block when no block is available");
+}
+
+
+static struct kunit_case mbt_test_cases[] = {
+	KUNIT_CASE(test_new_blocks_simple),
+	{}
+};
+
+static struct kunit_suite mbt_test_suite = {
+	.name = "ext4_mballoc_test",
+	.init = mbt_kunit_init,
+	.exit = mbt_kunit_exit,
+	.test_cases = mbt_test_cases,
+};
+
+kunit_test_suites(&mbt_test_suite);
+
+MODULE_LICENSE("GPL");
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@ ext4_mballoc_query_range(
 	return error;
 }
 
+
+#ifdef CONFIG_EXT4_KUNIT_TESTS
+#include "mballoc-test.c"
+#endif
-- 
2.30.0
Use KUNIT_CASE_PARAM to run the mballoc test with different layout settings.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext4/mballoc-test.c | 52 ++++++++++++++++++++++++++++++------------
 1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc-test.c
+++ b/fs/ext4/mballoc-test.c
@@ -XXX,XX +XXX,XX @@ ext4_mb_mark_context_stub(handle_t *handle, struct super_block *sb, bool state,
 	return 0;
 }
 
-#define TEST_BLOCKSIZE_BITS 10
-#define TEST_CLUSTER_BITS 3
-#define TEST_BLOCKS_PER_GROUP 8192
-#define TEST_GROUP_COUNT 4
-#define TEST_DESC_SIZE 64
 #define TEST_GOAL_GROUP 1
 static int mbt_kunit_init(struct kunit *test)
 {
-	struct mbt_ext4_block_layout layout = {
-		.blocksize_bits = TEST_BLOCKSIZE_BITS,
-		.cluster_bits = TEST_CLUSTER_BITS,
-		.blocks_per_group = TEST_BLOCKS_PER_GROUP,
-		.group_count = TEST_GROUP_COUNT,
-		.desc_size = TEST_DESC_SIZE,
-	};
+	struct mbt_ext4_block_layout *layout =
+		(struct mbt_ext4_block_layout *)(test->param_value);
 	struct super_block *sb;
 	int ret;
 
@@ -XXX,XX +XXX,XX @@ static int mbt_kunit_init(struct kunit *test)
 	if (sb == NULL)
 		return -ENOMEM;
 
-	mbt_init_sb_layout(sb, &layout);
+	mbt_init_sb_layout(sb, layout);
 
 	ret = mbt_ctx_init(sb);
 	if (ret != 0) {
@@ -XXX,XX +XXX,XX @@ static void test_new_blocks_simple(struct kunit *test)
 		"unexpectedly get block when no block is available");
 }
 
+static const struct mbt_ext4_block_layout mbt_test_layouts[] = {
+	{
+		.blocksize_bits = 10,
+		.cluster_bits = 3,
+		.blocks_per_group = 8192,
+		.group_count = 4,
+		.desc_size = 64,
+	},
+	{
+		.blocksize_bits = 12,
+		.cluster_bits = 3,
+		.blocks_per_group = 8192,
+		.group_count = 4,
+		.desc_size = 64,
+	},
+	{
+		.blocksize_bits = 16,
+		.cluster_bits = 3,
+		.blocks_per_group = 8192,
+		.group_count = 4,
+		.desc_size = 64,
+	},
+};
+
+static void mbt_show_layout(const struct mbt_ext4_block_layout *layout,
+			    char *desc)
+{
+	snprintf(desc, KUNIT_PARAM_DESC_SIZE, "block_bits=%d cluster_bits=%d "
+		 "blocks_per_group=%d group_count=%d desc_size=%d\n",
+		 layout->blocksize_bits, layout->cluster_bits,
+		 layout->blocks_per_group, layout->group_count,
+		 layout->desc_size);
+}
+KUNIT_ARRAY_PARAM(mbt_layouts, mbt_test_layouts, mbt_show_layout);
 
 static struct kunit_case mbt_test_cases[] = {
-	KUNIT_CASE(test_new_blocks_simple),
+	KUNIT_CASE_PARAM(test_new_blocks_simple, mbt_layouts_gen_params),
 	{}
 };
 
-- 
2.30.0
v7->v8:
1. Convert all ext4_mb_mark_bb to use bool in patch 1.
2. Improve commit msg as Ritesh suggested.
3. Collect RBs from Ritesh. Many thanks to Ritesh!
v6->v7:
1. Remove struct ext4_mark_context and call ext4_mb_mark_context with all needed arguments directly.
2. Add a new patch to make state in ext4_mb_mark_bb bool and make state in the newly-added function ext4_mb_mark_context bool.
3. Fix typos in git messages.
4. Keep the on-disk bitmap marking code tight in the patch separating on-disk bitmap and buddy bitmap freeing.
5. Test 64k blocksize instead of 256k blocksize.
6. Remove RVB from Ojaswin in changed patches and collect updated RVB from Ritesh.
7. As there is not much functional change compared to the initial version, the "kvm-xfstest smoke" test and kunit test were run and the results are good.
8. Plan to add ext4_inode_block_valid in ext4_mb_mark_context in a new series!
v5->v6:
1. Separate block bitmap and buddy bitmap freeing in individual patches and rewrite the descriptions.
2. Remove the #ifdef around KUNIT_STATIC_STUB_REDIRECT, which is only defined when CONFIG_KUNIT is enabled, after fix [7] was merged into kunit-next/kunit.
3. Use the mbt prefix to distinguish test APIs from actual kernel APIs.
4. Add a prepare function for ext4_mark_context and rename ext4_mb_mark_group_bb to ext4_mb_mark_context.
5. Support running the mballoc test with different layout settings; different block sizes are tested.
v4->v5:
1. The WARN on freeing blocks to an uninitialized group is removed, as the normal fast commit route may trigger this; see [1] for details. The review tag from Ojaswin on the changed patch is also removed and a further review is needed.
v3->v4:
1. Collect Reviewed-by from Ojaswin.
2. Do improvements as Ojaswin kindly suggested: fix a typo in a commit, WARN when trying to clear a bit of an uninitialized group and improve the refactoring of the AGGRESSIVE_CHECK code.
3. Fix conflict on patch 16.
4. Improve git log in patches 16,17.
v2->v3:
1. Fix build warnings on "ext4: add some kunit stub for mballoc kunit test" and "ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc".

This series is a new version of unmerged patches from [1]. The first 9 patches of this series factor out code to mark blocks used or freed, which updates the on-disk block bitmap and group descriptor. Several reasons to do this:
1. Pair the behavior of allocating and freeing bits. For example, ext4_mb_new_blocks_simple will update free_clusters in struct flex_groups in ext4_mb_mark_bb while ext4_free_blocks_simple forgets this.
2. Remove repeated code to read from disk, update and write back to disk.
3. Reduce the future unit test mocks needed to avoid real IO updating structures on disk.

The last 3 patches add kunit stubs and a unit test for mballoc. Before more unit tests are added, there are some things that should be discussed:
1. How to test static functions in mballoc.c.
Currently, mballoc-test.c is included in mballoc.c to test static functions in mballoc.c from mballoc-test.c, which is one way suggested in [2]. Not sure if there is a more elegant way to test static functions without touching mballoc.c.
2. How to add mocks to functions in mballoc.c which may issue IO to disk.
Currently, KUNIT_STATIC_STUB_REDIRECT is added to the functions as suggested in the kunit documentation [3].
3. How to simulate a block bitmap.
Currently, a fake buffer_head with bitmap data is returned, so no further change is needed.
If we simulate a block bitmap with an array of data structures like:

struct test_bitmap {
	unsigned int start;
	unsigned int len;
}

which is suggested by Theodore in [5], then we need to add mocks to the functions which expect a bitmap from bitmap_bh->b_data, like mb_find_next_bit, mb_find_next_zero_bit and maybe more (a rough sketch of this alternative follows the cover letter). Would like to hear any suggestions! Thanks!

I ran kvm-xfstest with config "ext4/all" and "-g auto" together with the patchset for resize; you can see the detailed report in [6]. Unit test results are as follows:

# ./tools/testing/kunit/kunit.py run --kunitconfig=fs/ext4/.kunitconfig --raw_output
[02:56:14] Configuring KUnit Kernel ...
[02:56:14] Building KUnit Kernel ...
Populating config with:
$ make ARCH=um O=.kunit olddefconfig
Building with:
$ make ARCH=um O=.kunit --jobs=88
[02:56:19] Starting KUnit Kernel (1/1)...
KTAP version 1
1..2
KTAP version 1
# Subtest: ext4_mballoc_test
1..1
KTAP version 1
# Subtest: test_new_blocks_simple
ok 1 block_bits=10 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
ok 2 block_bits=12 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
ok 3 block_bits=16 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
# test_new_blocks_simple: pass:3 fail:0 skip:0 total:3
ok 1 test_new_blocks_simple
# Totals: pass:3 fail:0 skip:0 total:3
ok 1 ext4_mballoc_test
KTAP version 1
# Subtest: ext4_inode_test
1..1
KTAP version 1
# Subtest: inode_test_xtimestamp_decoding
ok 1 1901-12-13 Lower bound of 32bit < 0 timestamp, no extra bits
ok 2 1969-12-31 Upper bound of 32bit < 0 timestamp, no extra bits
ok 3 1970-01-01 Lower bound of 32bit >=0 timestamp, no extra bits
ok 4 2038-01-19 Upper bound of 32bit >=0 timestamp, no extra bits
ok 5 2038-01-19 Lower bound of 32bit <0 timestamp, lo extra sec bit on
ok 6 2106-02-07 Upper bound of 32bit <0 timestamp, lo extra sec bit on
ok 7 2106-02-07 Lower bound of 32bit >=0 timestamp, lo extra sec bit on
ok 8 2174-02-25 Upper bound of 32bit >=0 timestamp, lo extra sec bit on
ok 9 2174-02-25 Lower bound of 32bit <0 timestamp, hi extra sec bit on
ok 10 2242-03-16 Upper bound of 32bit <0 timestamp, hi extra sec bit on
ok 11 2242-03-16 Lower bound of 32bit >=0 timestamp, hi extra sec bit on
ok 12 2310-04-04 Upper bound of 32bit >=0 timestamp, hi extra sec bit on
ok 13 2310-04-04 Upper bound of 32bit>=0 timestamp, hi extra sec bit 1. 1 ns
ok 14 2378-04-22 Lower bound of 32bit>= timestamp. Extra sec bits 1. Max ns
ok 15 2378-04-22 Lower bound of 32bit >=0 timestamp. All extra sec bits on
ok 16 2446-05-10 Upper bound of 32bit >=0 timestamp.
All extra sec bits on # inode_test_xtimestamp_decoding: pass:16 fail:0 skip:0 total:16 ok 1 inode_test_xtimestamp_decoding # Totals: pass:16 fail:0 skip:0 total:16 ok 2 ext4_inode_test [02:56:19] Elapsed time: 5.173s total, 0.001s configuring, 5.055s building, 0.074s running [1] https://lore.kernel.org/linux-ext4/20230603150327.3596033-1-shikemeng@huaweicloud.com/T/#m5ff8e3a058ce1cb272dfef3262cd3202ce6e4358 [2] https://lore.kernel.org/linux-ext4/ZC3MoWn2UO6p+Swp@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com/ [3] https://docs.kernel.org/dev-tools/kunit/usage.html#testing-static-functions [4] https://docs.kernel.org/dev-tools/kunit/api/functionredirection.html#c.KUNIT_STATIC_STUB_REDIRECT [5] https://lore.kernel.org/linux-ext4/20230317155047.GB3270589@mit.edu/ [6] https://lore.kernel.org/linux-ext4/20230629120044.1261968-1-shikemeng@huaweicloud.com/T/#mcc8fb0697fd54d9267c02c027e1eb3468026ae56 [7] https://lore.kernel.org/lkml/20230725172051.2142641-1-shikemeng@huaweicloud.com/ Kemeng Shi (12): ext4: make state in ext4_mb_mark_bb to be bool ext4: factor out codes to update block bitmap and group descriptor on disk from ext4_mb_mark_bb ext4: call ext4_mb_mark_context in ext4_free_blocks_simple ext4: extend ext4_mb_mark_context to support allocation under journal ext4: call ext4_mb_mark_context in ext4_mb_mark_diskspace_used ext4: Separate block bitmap and buddy bitmap freeing in ext4_mb_clear_bb() ext4: call ext4_mb_mark_context in ext4_mb_clear_bb ext4: Separate block bitmap and buddy bitmap freeing in ext4_group_add_blocks() ext4: call ext4_mb_mark_context in ext4_group_add_blocks() ext4: add some kunit stub for mballoc kunit test ext4: add first unit test for ext4_mb_new_blocks_simple in mballoc ext4: run mballoc test with different layouts setting fs/ext4/balloc.c | 10 + fs/ext4/ext4.h | 2 +- fs/ext4/extents.c | 4 +- fs/ext4/fast_commit.c | 8 +- fs/ext4/mballoc-test.c | 349 ++++++++++++++++++++++++++++ fs/ext4/mballoc.c | 514 +++++++++++++++-------------------------- 6 files changed, 553 insertions(+), 334 deletions(-) create mode 100644 fs/ext4/mballoc-test.c -- 2.30.0
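To make discussion point 3 above more concrete, here is a rough, hypothetical sketch of the range-based simulation suggested by Theodore; none of these names exist in the series, and the point is only to show that bit-level helpers such as mb_find_next_bit would then need stubs that translate bit queries into range lookups:

struct test_bitmap {
	unsigned int start;
	unsigned int len;
};

/*
 * Hypothetical helper: report whether bit @bit falls in any of the @nr
 * used ranges. A mb_find_next_bit stub would have to loop over the
 * ranges in a similar way instead of scanning bitmap_bh->b_data.
 */
static bool mbt_range_test_bit(const struct test_bitmap *ranges,
			       unsigned int nr, unsigned int bit)
{
	unsigned int i;

	for (i = 0; i < nr; i++)
		if (bit >= ranges[i].start &&
		    bit - ranges[i].start < ranges[i].len)
			return true;
	return false;
}

The fake buffer_head approach avoids all of this, because the existing mb_* helpers can operate on the fake b_data directly.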
As state could only be either 0 or 1, just make it bool. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> --- fs/ext4/ext4.h | 2 +- fs/ext4/extents.c | 4 ++-- fs/ext4/fast_commit.c | 8 ++++---- fs/ext4/mballoc.c | 4 ++-- 4 files changed, 9 insertions(+), 9 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index XXXXXXX..XXXXXXX 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -XXX,XX +XXX,XX @@ extern int ext4_group_add_blocks(handle_t *handle, struct super_block *sb, extern int ext4_trim_fs(struct super_block *, struct fstrim_range *); extern void ext4_process_freed_data(struct super_block *sb, tid_t commit_tid); extern void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block, - int len, int state); + int len, bool state); static inline bool ext4_mb_cr_expensive(enum criteria cr) { return cr >= CR_GOAL_LEN_SLOW; diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index XXXXXXX..XXXXXXX 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -XXX,XX +XXX,XX @@ int ext4_ext_clear_bb(struct inode *inode) for (j = 0; j < path->p_depth; j++) { ext4_mb_mark_bb(inode->i_sb, - path[j].p_block, 1, 0); + path[j].p_block, 1, false); ext4_fc_record_regions(inode->i_sb, inode->i_ino, 0, path[j].p_block, 1, 1); } ext4_free_ext_path(path); } - ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0); + ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, false); ext4_fc_record_regions(inode->i_sb, inode->i_ino, map.m_lblk, map.m_pblk, map.m_len, 1); } diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c index XXXXXXX..XXXXXXX 100644 --- a/fs/ext4/fast_commit.c +++ b/fs/ext4/fast_commit.c @@ -XXX,XX +XXX,XX @@ static int ext4_fc_replay_add_range(struct super_block *sb, * at the end of the FC replay using our array of * modified inodes. */ - ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0); + ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, false); goto next; } @@ -XXX,XX +XXX,XX @@ ext4_fc_replay_del_range(struct super_block *sb, if (ret > 0) { remaining -= ret; cur += ret; - ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, 0); + ext4_mb_mark_bb(inode->i_sb, map.m_pblk, map.m_len, false); } else { remaining -= map.m_len; cur += map.m_len; @@ -XXX,XX +XXX,XX @@ static void ext4_fc_set_bitmaps_and_counters(struct super_block *sb) if (!IS_ERR(path)) { for (j = 0; j < path->p_depth; j++) ext4_mb_mark_bb(inode->i_sb, - path[j].p_block, 1, 1); + path[j].p_block, 1, true); ext4_free_ext_path(path); } cur += ret; ext4_mb_mark_bb(inode->i_sb, map.m_pblk, - map.m_len, 1); + map.m_len, true); } else { cur = cur + (map.m_len ? map.m_len : 1); } diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index XXXXXXX..XXXXXXX 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -XXX,XX +XXX,XX @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, * blocks in bitmaps and update counters. */ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block, - int len, int state) + int len, bool state) { struct buffer_head *bitmap_bh = NULL; struct ext4_group_desc *gdp; @@ -XXX,XX +XXX,XX @@ ext4_mb_new_blocks_simple(struct ext4_allocation_request *ar, int *errp) } block = ext4_group_first_block_no(sb, group) + EXT4_C2B(sbi, i); - ext4_mb_mark_bb(sb, block, 1, 1); + ext4_mb_mark_bb(sb, block, 1, true); ar->len = 1; return block; -- 2.30.0
There are several reasons to add a general function ext4_mb_mark_context to update block bitmap and group descriptor on disk: 1. pair behavior of alloc/free bits. For example, ext4_mb_new_blocks_simple will update free_clusters in struct flex_groups in ext4_mb_mark_bb while ext4_free_blocks_simple forgets this. 2. remove repeat code to read from disk, update and write back to disk. 3. reduce future unit test mocks to catch real IO to update structure on disk. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> --- fs/ext4/mballoc.c | 147 ++++++++++++++++++++++++---------------------- 1 file changed, 77 insertions(+), 70 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index XXXXXXX..XXXXXXX 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -XXX,XX +XXX,XX @@ void ext4_exit_mballoc(void) ext4_groupinfo_destroy_slabs(); } +static int +ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group, + ext4_grpblk_t blkoff, ext4_grpblk_t len) +{ + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct buffer_head *bitmap_bh = NULL; + struct ext4_group_desc *gdp; + struct buffer_head *gdp_bh; + int err; + unsigned int i, already, changed; + + bitmap_bh = ext4_read_block_bitmap(sb, group); + if (IS_ERR(bitmap_bh)) + return PTR_ERR(bitmap_bh); + + err = -EIO; + gdp = ext4_get_group_desc(sb, group, &gdp_bh); + if (!gdp) + goto out_err; + + ext4_lock_group(sb, group); + if (ext4_has_group_desc_csum(sb) && + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { + gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); + ext4_free_group_clusters_set(sb, gdp, + ext4_free_clusters_after_init(sb, group, gdp)); + } + + already = 0; + for (i = 0; i < len; i++) + if (mb_test_bit(blkoff + i, bitmap_bh->b_data) == + state) + already++; + changed = len - already; + + if (state) { + mb_set_bits(bitmap_bh->b_data, blkoff, len); + ext4_free_group_clusters_set(sb, gdp, + ext4_free_group_clusters(sb, gdp) - changed); + } else { + mb_clear_bits(bitmap_bh->b_data, blkoff, len); + ext4_free_group_clusters_set(sb, gdp, + ext4_free_group_clusters(sb, gdp) + changed); + } + + ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh); + ext4_group_desc_csum_set(sb, group, gdp); + ext4_unlock_group(sb, group); + + if (sbi->s_log_groups_per_flex) { + ext4_group_t flex_group = ext4_flex_group(sbi, group); + struct flex_groups *fg = sbi_array_rcu_deref(sbi, + s_flex_groups, flex_group); + + if (state) + atomic64_sub(changed, &fg->free_clusters); + else + atomic64_add(changed, &fg->free_clusters); + } + + err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh); + if (err) + goto out_err; + err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh); + if (err) + goto out_err; + + sync_dirty_buffer(bitmap_bh); + sync_dirty_buffer(gdp_bh); + +out_err: + brelse(bitmap_bh); + return err; +} /* * Check quota and mark chosen space (ac->ac_b_ex) non-free in bitmaps @@ -XXX,XX +XXX,XX @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block, int len, bool state) { - struct buffer_head *bitmap_bh = NULL; - struct ext4_group_desc *gdp; - struct buffer_head *gdp_bh; struct ext4_sb_info *sbi = EXT4_SB(sb); ext4_group_t group; ext4_grpblk_t blkoff; - int i, err = 0; - int already; - unsigned int clen, clen_changed, thisgrp_len; + int err = 0; + unsigned int clen, thisgrp_len; while (len > 0) { ext4_get_group_no_and_offset(sb, block, &group, &blkoff); @@ -XXX,XX +XXX,XX @@ void 
ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block, ext4_error(sb, "Marking blocks in system zone - " "Block = %llu, len = %u", block, thisgrp_len); - bitmap_bh = NULL; break; } - bitmap_bh = ext4_read_block_bitmap(sb, group); - if (IS_ERR(bitmap_bh)) { - err = PTR_ERR(bitmap_bh); - bitmap_bh = NULL; - break; - } - - err = -EIO; - gdp = ext4_get_group_desc(sb, group, &gdp_bh); - if (!gdp) - break; - - ext4_lock_group(sb, group); - already = 0; - for (i = 0; i < clen; i++) - if (!mb_test_bit(blkoff + i, bitmap_bh->b_data) == - !state) - already++; - - clen_changed = clen - already; - if (state) - mb_set_bits(bitmap_bh->b_data, blkoff, clen); - else - mb_clear_bits(bitmap_bh->b_data, blkoff, clen); - if (ext4_has_group_desc_csum(sb) && - (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { - gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); - ext4_free_group_clusters_set(sb, gdp, - ext4_free_clusters_after_init(sb, group, gdp)); - } - if (state) - clen = ext4_free_group_clusters(sb, gdp) - clen_changed; - else - clen = ext4_free_group_clusters(sb, gdp) + clen_changed; - - ext4_free_group_clusters_set(sb, gdp, clen); - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh); - ext4_group_desc_csum_set(sb, group, gdp); - - ext4_unlock_group(sb, group); - - if (sbi->s_log_groups_per_flex) { - ext4_group_t flex_group = ext4_flex_group(sbi, group); - struct flex_groups *fg = sbi_array_rcu_deref(sbi, - s_flex_groups, flex_group); - - if (state) - atomic64_sub(clen_changed, &fg->free_clusters); - else - atomic64_add(clen_changed, &fg->free_clusters); - - } - - err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh); - if (err) - break; - sync_dirty_buffer(bitmap_bh); - err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh); - sync_dirty_buffer(gdp_bh); + err = ext4_mb_mark_context(sb, state, group, blkoff, clen); if (err) break; block += thisgrp_len; len -= thisgrp_len; - brelse(bitmap_bh); BUG_ON(len < 0); } - - if (err) - brelse(bitmap_bh); } /* -- 2.30.0
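As a quick illustration of the helper introduced above (an editorial sketch with made-up values, not a call site from the series): with the signature at this point of the series, marking a range of clusters in-use on disk is a single call, and the helper takes care of the group lock, checksums, flex_bg counters and write-back.

	int err;

	/*
	 * Hypothetical caller: mark 8 clusters starting at cluster offset
	 * 128 of group 2 as in-use. ext4_mb_mark_context reads the block
	 * bitmap, updates it and the group descriptor under the group
	 * lock, refreshes checksums and flex_bg counters, and writes both
	 * buffers back.
	 */
	err = ext4_mb_mark_context(sb, true, 2, 128, 8);
	if (err)
		ext4_std_error(sb, err);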
Call ext4_mb_mark_context in ext4_free_blocks_simple to:
1. remove repeated code;
2. pair the free_clusters update done in ext4_mb_new_blocks_simple;
3. add the missing ext4_lock_group/ext4_unlock_group protection.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext4/mballoc.c | 33 +--------------------------------
 1 file changed, 1 insertion(+), 32 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
 static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
 					unsigned long count)
 {
-	struct buffer_head *bitmap_bh;
 	struct super_block *sb = inode->i_sb;
-	struct ext4_group_desc *gdp;
-	struct buffer_head *gdp_bh;
 	ext4_group_t group;
 	ext4_grpblk_t blkoff;
-	int already_freed = 0, err, i;
 
 	ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
-	bitmap_bh = ext4_read_block_bitmap(sb, group);
-	if (IS_ERR(bitmap_bh)) {
-		pr_warn("Failed to read block bitmap\n");
-		return;
-	}
-	gdp = ext4_get_group_desc(sb, group, &gdp_bh);
-	if (!gdp)
-		goto err_out;
-
-	for (i = 0; i < count; i++) {
-		if (!mb_test_bit(blkoff + i, bitmap_bh->b_data))
-			already_freed++;
-	}
-	mb_clear_bits(bitmap_bh->b_data, blkoff, count);
-	err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
-	if (err)
-		goto err_out;
-	ext4_free_group_clusters_set(
-		sb, gdp, ext4_free_group_clusters(sb, gdp) +
-		count - already_freed);
-	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
-	ext4_group_desc_csum_set(sb, group, gdp);
-	ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
-	sync_dirty_buffer(bitmap_bh);
-	sync_dirty_buffer(gdp_bh);
-
-err_out:
-	brelse(bitmap_bh);
+	ext4_mb_mark_context(sb, false, group, blkoff, count);
 }
 
 /**
-- 
2.30.0
Previously, ext4_mb_mark_context was only called under the fast commit replay path, so there was no valid handle when we updated the block bitmap and group descriptor. This patch extends ext4_mb_mark_context so it can be used by code running under the journal. There are several improvements:
1. Add a "handle_t *handle" parameter to ext4_mb_mark_context to journal the block bitmap and group descriptor update inside ext4_mb_mark_context (the added journal code is based on ext4_mb_mark_diskspace_used, where ext4_mb_mark_context is going to be used).
2. Add a flags argument to ext4_mb_mark_context() which controls:
   a. EXT4_MB_BITMAP_MARKED_CHECK - whether block bitmap checking is needed.
   b. EXT4_MB_SYNC_UPDATE - whether dirty buffers (bitmap and group descriptor) need to be synced.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext4/mballoc.c | 64 ++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 49 insertions(+), 15 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@ void ext4_exit_mballoc(void)
 	ext4_groupinfo_destroy_slabs();
 }
 
+#define EXT4_MB_BITMAP_MARKED_CHECK 0x0001
+#define EXT4_MB_SYNC_UPDATE 0x0002
 static int
-ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
-		     ext4_grpblk_t blkoff, ext4_grpblk_t len)
+ext4_mb_mark_context(handle_t *handle, struct super_block *sb, bool state,
+		     ext4_group_t group, ext4_grpblk_t blkoff,
+		     ext4_grpblk_t len, int flags, ext4_grpblk_t *ret_changed)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
 	struct buffer_head *bitmap_bh = NULL;
 	struct ext4_group_desc *gdp;
 	struct buffer_head *gdp_bh;
 	int err;
-	unsigned int i, already, changed;
+	unsigned int i, already, changed = len;
 
+	if (ret_changed)
+		*ret_changed = 0;
 	bitmap_bh = ext4_read_block_bitmap(sb, group);
 	if (IS_ERR(bitmap_bh))
 		return PTR_ERR(bitmap_bh);
 
+	if (handle) {
+		BUFFER_TRACE(bitmap_bh, "getting write access");
+		err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
+						    EXT4_JTR_NONE);
+		if (err)
+			goto out_err;
+	}
+
 	err = -EIO;
 	gdp = ext4_get_group_desc(sb, group, &gdp_bh);
 	if (!gdp)
 		goto out_err;
 
+	if (handle) {
+		BUFFER_TRACE(gdp_bh, "get_write_access");
+		err = ext4_journal_get_write_access(handle, sb, gdp_bh,
+						    EXT4_JTR_NONE);
+		if (err)
+			goto out_err;
+	}
+
 	ext4_lock_group(sb, group);
 	if (ext4_has_group_desc_csum(sb) &&
 	    (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) {
@@ -XXX,XX +XXX,XX @@ ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
 		ext4_free_clusters_after_init(sb, group, gdp));
 	}
 
-	already = 0;
-	for (i = 0; i < len; i++)
-		if (mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
-		    state)
-			already++;
-	changed = len - already;
+	if (flags & EXT4_MB_BITMAP_MARKED_CHECK) {
+		already = 0;
+		for (i = 0; i < len; i++)
+			if (mb_test_bit(blkoff + i, bitmap_bh->b_data) ==
+			    state)
+				already++;
+		changed = len - already;
+	}
 
 	if (state) {
 		mb_set_bits(bitmap_bh->b_data, blkoff, len);
@@ -XXX,XX +XXX,XX @@ ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
 	ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh);
 	ext4_group_desc_csum_set(sb, group, gdp);
 	ext4_unlock_group(sb, group);
+	if (ret_changed)
+		*ret_changed = changed;
 
 	if (sbi->s_log_groups_per_flex) {
 		ext4_group_t flex_group = ext4_flex_group(sbi, group);
@@ -XXX,XX +XXX,XX @@ ext4_mb_mark_context(struct super_block *sb, bool state, ext4_group_t group,
 		atomic64_add(changed, &fg->free_clusters);
 	}
 
-	err = ext4_handle_dirty_metadata(NULL, NULL, bitmap_bh);
+	err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh);
 	if (err)
 		goto out_err;
-	err = ext4_handle_dirty_metadata(NULL, NULL, gdp_bh);
+	err = ext4_handle_dirty_metadata(handle, NULL, gdp_bh);
 	if (err)
 		goto out_err;
 
-	sync_dirty_buffer(bitmap_bh);
-	sync_dirty_buffer(gdp_bh);
+	if (flags & EXT4_MB_SYNC_UPDATE) {
+		sync_dirty_buffer(bitmap_bh);
+		sync_dirty_buffer(gdp_bh);
+	}
 
 out_err:
 	brelse(bitmap_bh);
@@ -XXX,XX +XXX,XX @@ void ext4_mb_mark_bb(struct super_block *sb, ext4_fsblk_t block,
 			break;
 		}
 
-		err = ext4_mb_mark_context(sb, state, group, blkoff, clen);
+		err = ext4_mb_mark_context(NULL, sb, state,
+					   group, blkoff, clen,
+					   EXT4_MB_BITMAP_MARKED_CHECK |
+					   EXT4_MB_SYNC_UPDATE,
+					   NULL);
 		if (err)
 			break;
 
@@ -XXX,XX +XXX,XX @@ static void ext4_free_blocks_simple(struct inode *inode, ext4_fsblk_t block,
 	ext4_grpblk_t blkoff;
 
 	ext4_get_group_no_and_offset(sb, block, &group, &blkoff);
-	ext4_mb_mark_context(sb, false, group, blkoff, count);
+	ext4_mb_mark_context(NULL, sb, false, group, blkoff, count,
+			     EXT4_MB_BITMAP_MARKED_CHECK |
+			     EXT4_MB_SYNC_UPDATE,
+			     NULL);
 }
 
 /**
-- 
2.30.0
Call ext4_mb_mark_context in ext4_mb_mark_diskspace_used to: 1. Remove repeat code to normally update bitmap and group descriptor on disk. 2. Now that we have a common API for marking blocks inuse/free in block bitmap, use that instead of open coding it in function ext4_mb_mark_diskspace_used(). The current code was not updating checksum and other counters. ext4_mb_mark_context() should fix these consistency problems. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> --- fs/ext4/mballoc.c | 86 +++++++++++------------------------------------ 1 file changed, 20 insertions(+), 66 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index XXXXXXX..XXXXXXX 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -XXX,XX +XXX,XX @@ static noinline_for_stack int ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, handle_t *handle, unsigned int reserv_clstrs) { - struct buffer_head *bitmap_bh = NULL; struct ext4_group_desc *gdp; - struct buffer_head *gdp_bh; struct ext4_sb_info *sbi; struct super_block *sb; ext4_fsblk_t block; int err, len; + int flags = 0; + ext4_grpblk_t changed; BUG_ON(ac->ac_status != AC_STATUS_FOUND); BUG_ON(ac->ac_b_ex.fe_len <= 0); @@ -XXX,XX +XXX,XX @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, sb = ac->ac_sb; sbi = EXT4_SB(sb); - bitmap_bh = ext4_read_block_bitmap(sb, ac->ac_b_ex.fe_group); - if (IS_ERR(bitmap_bh)) { - return PTR_ERR(bitmap_bh); - } - - BUFFER_TRACE(bitmap_bh, "getting write access"); - err = ext4_journal_get_write_access(handle, sb, bitmap_bh, - EXT4_JTR_NONE); - if (err) - goto out_err; - - err = -EIO; - gdp = ext4_get_group_desc(sb, ac->ac_b_ex.fe_group, &gdp_bh); + gdp = ext4_get_group_desc(sb, ac->ac_b_ex.fe_group, NULL); if (!gdp) - goto out_err; - + return -EIO; ext4_debug("using block group %u(%d)\n", ac->ac_b_ex.fe_group, ext4_free_group_clusters(sb, gdp)); - BUFFER_TRACE(gdp_bh, "get_write_access"); - err = ext4_journal_get_write_access(handle, sb, gdp_bh, EXT4_JTR_NONE); - if (err) - goto out_err; - block = ext4_grp_offs_to_block(sb, &ac->ac_b_ex); - len = EXT4_C2B(sbi, ac->ac_b_ex.fe_len); if (!ext4_inode_block_valid(ac->ac_inode, block, len)) { ext4_error(sb, "Allocating blocks %llu-%llu which overlap " @@ -XXX,XX +XXX,XX @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, * Fix the bitmap and return EFSCORRUPTED * We leak some of the blocks here. 
*/ - ext4_lock_group(sb, ac->ac_b_ex.fe_group); - mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start, - ac->ac_b_ex.fe_len); - ext4_unlock_group(sb, ac->ac_b_ex.fe_group); - err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh); + err = ext4_mb_mark_context(handle, sb, true, + ac->ac_b_ex.fe_group, + ac->ac_b_ex.fe_start, + ac->ac_b_ex.fe_len, + 0, NULL); if (!err) err = -EFSCORRUPTED; - goto out_err; + return err; } - ext4_lock_group(sb, ac->ac_b_ex.fe_group); #ifdef AGGRESSIVE_CHECK - { - int i; - for (i = 0; i < ac->ac_b_ex.fe_len; i++) { - BUG_ON(mb_test_bit(ac->ac_b_ex.fe_start + i, - bitmap_bh->b_data)); - } - } + flags |= EXT4_MB_BITMAP_MARKED_CHECK; #endif - mb_set_bits(bitmap_bh->b_data, ac->ac_b_ex.fe_start, - ac->ac_b_ex.fe_len); - if (ext4_has_group_desc_csum(sb) && - (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT))) { - gdp->bg_flags &= cpu_to_le16(~EXT4_BG_BLOCK_UNINIT); - ext4_free_group_clusters_set(sb, gdp, - ext4_free_clusters_after_init(sb, - ac->ac_b_ex.fe_group, gdp)); - } - len = ext4_free_group_clusters(sb, gdp) - ac->ac_b_ex.fe_len; - ext4_free_group_clusters_set(sb, gdp, len); - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh); - ext4_group_desc_csum_set(sb, ac->ac_b_ex.fe_group, gdp); + err = ext4_mb_mark_context(handle, sb, true, ac->ac_b_ex.fe_group, + ac->ac_b_ex.fe_start, ac->ac_b_ex.fe_len, + flags, &changed); + + if (err && changed == 0) + return err; - ext4_unlock_group(sb, ac->ac_b_ex.fe_group); +#ifdef AGGRESSIVE_CHECK + BUG_ON(changed != ac->ac_b_ex.fe_len); +#endif percpu_counter_sub(&sbi->s_freeclusters_counter, ac->ac_b_ex.fe_len); /* * Now reduce the dirty block count also. Should not go negative @@ -XXX,XX +XXX,XX @@ ext4_mb_mark_diskspace_used(struct ext4_allocation_context *ac, percpu_counter_sub(&sbi->s_dirtyclusters_counter, reserv_clstrs); - if (sbi->s_log_groups_per_flex) { - ext4_group_t flex_group = ext4_flex_group(sbi, - ac->ac_b_ex.fe_group); - atomic64_sub(ac->ac_b_ex.fe_len, - &sbi_array_rcu_deref(sbi, s_flex_groups, - flex_group)->free_clusters); - } - - err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh); - if (err) - goto out_err; - err = ext4_handle_dirty_metadata(handle, NULL, gdp_bh); - -out_err: - brelse(bitmap_bh); return err; } -- 2.30.0
This patch separates block bitmap and buddy bitmap freeing in order to update the block bitmap with ext4_mb_mark_context in a following patch.

Separated freeing is safe with concurrent allocation as long as:
1. First allocate the block in the buddy bitmap, and then in the block bitmap.
2. First free the block in the block bitmap, and then in the buddy bitmap.
Then a freed block will only be available to allocation when both the buddy bitmap and the block bitmap are updated by the freeing. Allocation obeys rule 1 already, so just do separated freeing following rule 2.

Separated freeing has no race with generate_buddy: once ext4_mb_load_buddy_gfp is executed successfully, the up-to-date buddy page can be found in sbi->s_buddy_cache and no more buddy initialization of the buddy page will be executed concurrently until the buddy page is unloaded. As we always do the free in a "load buddy, free, unload buddy" sequence, separated freeing has no race with generate_buddy.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext4/mballoc.c | 98 +++++++++++++++++++++++------------------------
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		ext4_error(sb, "Freeing blocks in system zone - "
 			   "Block = %llu, count = %lu", block, count);
 		/* err = 0. ext4_std_error should be a no op */
-		goto error_return;
+		goto error_out;
 	}
 	flags |= EXT4_FREE_BLOCKS_VALIDATED;
 
@@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode,
 		flags &= ~EXT4_FREE_BLOCKS_VALIDATED;
 	}
 	count_clusters = EXT4_NUM_B2C(sbi, count);
+	trace_ext4_mballoc_free(sb, inode, block_group, bit, count_clusters);
+
+	/* __GFP_NOFAIL: retry infinitely, ignore TIF_MEMDIE and memcg limit. */
+	err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b,
+				     GFP_NOFS|__GFP_NOFAIL);
+	if (err)
+		goto error_out;
+
+	if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
+	    !ext4_inode_block_valid(inode, block, count)) {
+		ext4_error(sb, "Freeing blocks in system zone - "
+			   "Block = %llu, count = %lu", block, count);
+		/* err = 0. ext4_std_error should be a no op */
+		goto error_clean;
+	}
+
 	bitmap_bh = ext4_read_block_bitmap(sb, block_group);
 	if (IS_ERR(bitmap_bh)) {
 		err = PTR_ERR(bitmap_bh);
 		bitmap_bh = NULL;
-		goto error_return;
+		goto error_clean;
 	}
 	gdp = ext4_get_group_desc(sb, block_group, &gd_bh);
 	if (!gdp) {
 		err = -EIO;
-		goto error_return;
-	}
-
-	if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) &&
-	    !ext4_inode_block_valid(inode, block, count)) {
-		ext4_error(sb, "Freeing blocks in system zone - "
-			   "Block = %llu, count = %lu", block, count);
-		/* err = 0. ext4_std_error should be a no op */
-		goto error_return;
+		goto error_clean;
 	}
 
 	BUFFER_TRACE(bitmap_bh, "getting write access");
 	err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
 					    EXT4_JTR_NONE);
 	if (err)
-		goto error_return;
+		goto error_clean;
 
 	/*
 	 * We are about to modify some metadata.
Call the journal APIs @@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, BUFFER_TRACE(gd_bh, "get_write_access"); err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE); if (err) - goto error_return; + goto error_clean; #ifdef AGGRESSIVE_CHECK { int i; @@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, BUG_ON(!mb_test_bit(bit + i, bitmap_bh->b_data)); } #endif - trace_ext4_mballoc_free(sb, inode, block_group, bit, count_clusters); + ext4_lock_group(sb, block_group); + mb_clear_bits(bitmap_bh->b_data, bit, count_clusters); + ret = ext4_free_group_clusters(sb, gdp) + count_clusters; + ext4_free_group_clusters_set(sb, gdp, ret); + ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh); + ext4_group_desc_csum_set(sb, block_group, gdp); + ext4_unlock_group(sb, block_group); - /* __GFP_NOFAIL: retry infinitely, ignore TIF_MEMDIE and memcg limit. */ - err = ext4_mb_load_buddy_gfp(sb, block_group, &e4b, - GFP_NOFS|__GFP_NOFAIL); - if (err) - goto error_return; + if (sbi->s_log_groups_per_flex) { + ext4_group_t flex_group = ext4_flex_group(sbi, block_group); + atomic64_add(count_clusters, + &sbi_array_rcu_deref(sbi, s_flex_groups, + flex_group)->free_clusters); + } + + /* We dirtied the bitmap block */ + BUFFER_TRACE(bitmap_bh, "dirtied bitmap block"); + err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh); + + /* And the group descriptor block */ + BUFFER_TRACE(gd_bh, "dirtied group descriptor block"); + ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh); + if (!err) + err = ret; /* * We need to make sure we don't reuse the freed block until after the @@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, new_entry->efd_tid = handle->h_transaction->t_tid; ext4_lock_group(sb, block_group); - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters); ext4_mb_free_metadata(handle, &e4b, new_entry); } else { - /* need to update group_info->bb_free and bitmap - * with group lock held. 
generate_buddy look at - * them with group lock_held - */ if (test_opt(sb, DISCARD)) { err = ext4_issue_discard(sb, block_group, bit, count_clusters, NULL); @@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, EXT4_MB_GRP_CLEAR_TRIMMED(e4b.bd_info); ext4_lock_group(sb, block_group); - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters); mb_free_blocks(inode, &e4b, bit, count_clusters); } - ret = ext4_free_group_clusters(sb, gdp) + count_clusters; - ext4_free_group_clusters_set(sb, gdp, ret); - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh); - ext4_group_desc_csum_set(sb, block_group, gdp); ext4_unlock_group(sb, block_group); - if (sbi->s_log_groups_per_flex) { - ext4_group_t flex_group = ext4_flex_group(sbi, block_group); - atomic64_add(count_clusters, - &sbi_array_rcu_deref(sbi, s_flex_groups, - flex_group)->free_clusters); - } - /* * on a bigalloc file system, defer the s_freeclusters_counter * update to the caller (ext4_remove_space and friends) so they @@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, count_clusters); } - ext4_mb_unload_buddy(&e4b); - - /* We dirtied the bitmap block */ - BUFFER_TRACE(bitmap_bh, "dirtied bitmap block"); - err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh); - - /* And the group descriptor block */ - BUFFER_TRACE(gd_bh, "dirtied group descriptor block"); - ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh); - if (!err) - err = ret; - if (overflow && !err) { block += count; count = overflow; + ext4_mb_unload_buddy(&e4b); put_bh(bitmap_bh); /* The range changed so it's no longer validated */ flags &= ~EXT4_FREE_BLOCKS_VALIDATED; goto do_more; } -error_return: + +error_clean: + ext4_mb_unload_buddy(&e4b); brelse(bitmap_bh); +error_out: ext4_std_error(sb, err); } -- 2.30.0
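To illustrate why the two ordering rules in the commit message above are sufficient (an editorial sketch, not part of the patch), consider a free racing with an allocation; whichever way the buddy check falls, the block bitmap is already consistent with what the allocator may do:

/*
 * freer                                 allocator
 * -----                                 ---------
 * clear range in block bitmap
 *                                       buddy still shows range in-use,
 *                                       so the range is not returned
 * mb_free_blocks() updates buddy
 *                                       buddy now shows range free; the
 *                                       block bitmap was cleared first,
 *                                       so re-marking it used is safe
 */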
Call ext4_mb_mark_context in ext4_mb_clear_bb to remove repeat code. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> --- fs/ext4/mballoc.c | 69 +++++++---------------------------------------- 1 file changed, 10 insertions(+), 59 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index XXXXXXX..XXXXXXX 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, ext4_fsblk_t block, unsigned long count, int flags) { - struct buffer_head *bitmap_bh = NULL; struct super_block *sb = inode->i_sb; - struct ext4_group_desc *gdp; struct ext4_group_info *grp; unsigned int overflow; ext4_grpblk_t bit; - struct buffer_head *gd_bh; ext4_group_t block_group; struct ext4_sb_info *sbi; struct ext4_buddy e4b; unsigned int count_clusters; int err = 0; - int ret; + int mark_flags = 0; + ext4_grpblk_t changed; sbi = EXT4_SB(sb); @@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, goto error_clean; } - bitmap_bh = ext4_read_block_bitmap(sb, block_group); - if (IS_ERR(bitmap_bh)) { - err = PTR_ERR(bitmap_bh); - bitmap_bh = NULL; - goto error_clean; - } - gdp = ext4_get_group_desc(sb, block_group, &gd_bh); - if (!gdp) { - err = -EIO; - goto error_clean; - } - - BUFFER_TRACE(bitmap_bh, "getting write access"); - err = ext4_journal_get_write_access(handle, sb, bitmap_bh, - EXT4_JTR_NONE); - if (err) - goto error_clean; - - /* - * We are about to modify some metadata. Call the journal APIs - * to unshare ->b_data if a currently-committing transaction is - * using it - */ - BUFFER_TRACE(gd_bh, "get_write_access"); - err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE); - if (err) - goto error_clean; #ifdef AGGRESSIVE_CHECK - { - int i; - for (i = 0; i < count_clusters; i++) - BUG_ON(!mb_test_bit(bit + i, bitmap_bh->b_data)); - } + mark_flags |= EXT4_MB_BITMAP_MARKED_CHECK; #endif - ext4_lock_group(sb, block_group); - mb_clear_bits(bitmap_bh->b_data, bit, count_clusters); - ret = ext4_free_group_clusters(sb, gdp) + count_clusters; - ext4_free_group_clusters_set(sb, gdp, ret); - ext4_block_bitmap_csum_set(sb, gdp, bitmap_bh); - ext4_group_desc_csum_set(sb, block_group, gdp); - ext4_unlock_group(sb, block_group); + err = ext4_mb_mark_context(handle, sb, false, block_group, bit, + count_clusters, mark_flags, &changed); - if (sbi->s_log_groups_per_flex) { - ext4_group_t flex_group = ext4_flex_group(sbi, block_group); - atomic64_add(count_clusters, - &sbi_array_rcu_deref(sbi, s_flex_groups, - flex_group)->free_clusters); - } - /* We dirtied the bitmap block */ - BUFFER_TRACE(bitmap_bh, "dirtied bitmap block"); - err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh); + if (err && changed == 0) + goto error_clean; - /* And the group descriptor block */ - BUFFER_TRACE(gd_bh, "dirtied group descriptor block"); - ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh); - if (!err) - err = ret; +#ifdef AGGRESSIVE_CHECK + BUG_ON(changed != count_clusters); +#endif /* * We need to make sure we don't reuse the freed block until after the @@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, block += count; count = overflow; ext4_mb_unload_buddy(&e4b); - put_bh(bitmap_bh); /* The range changed so it's no longer validated */ flags &= ~EXT4_FREE_BLOCKS_VALIDATED; goto do_more; @@ -XXX,XX +XXX,XX @@ static void ext4_mb_clear_bb(handle_t *handle, struct inode *inode, error_clean: 
ext4_mb_unload_buddy(&e4b); - brelse(bitmap_bh); error_out: ext4_std_error(sb, err); } -- 2.30.0
This patch separates block bitmap and buddy bitmap freeing in order to update the block bitmap with ext4_mb_mark_context in a following patch. The reason why the freeing can be separated is explained in the previous patch; the explanation is repeated here to simplify future code archeology of ext4_group_add_blocks():

Separated freeing is safe with concurrent allocation as long as:
1. First allocate the block in the buddy bitmap, and then in the block bitmap.
2. First free the block in the block bitmap, and then in the buddy bitmap.
Then a freed block will only be available to allocation when both the buddy bitmap and the block bitmap are updated by the freeing. Allocation obeys rule 1 already, so just do separated freeing following rule 2.

Separated freeing has no race with generate_buddy: once ext4_mb_load_buddy_gfp is executed successfully, the up-to-date buddy page can be found in sbi->s_buddy_cache and no more buddy initialization of the buddy page will be executed concurrently until the buddy page is unloaded. As we always do the free in a "load buddy, free, unload buddy" sequence, separated freeing has no race with generate_buddy.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext4/mballoc.c | 54 +++++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 28 deletions(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 		ext4_warning(sb, "too many blocks added to group %u",
 			     block_group);
 		err = -EINVAL;
-		goto error_return;
+		goto error_out;
+	}
+
+	err = ext4_mb_load_buddy(sb, block_group, &e4b);
+	if (err)
+		goto error_out;
+
+	if (!ext4_sb_block_valid(sb, NULL, block, count)) {
+		ext4_error(sb, "Adding blocks in system zones - "
+			   "Block = %llu, count = %lu",
+			   block, count);
+		err = -EINVAL;
+		goto error_clean;
 	}
 
 	bitmap_bh = ext4_read_block_bitmap(sb, block_group);
 	if (IS_ERR(bitmap_bh)) {
 		err = PTR_ERR(bitmap_bh);
 		bitmap_bh = NULL;
-		goto error_return;
+		goto error_clean;
 	}
 
 	desc = ext4_get_group_desc(sb, block_group, &gd_bh);
 	if (!desc) {
 		err = -EIO;
-		goto error_return;
-	}
-
-	if (!ext4_sb_block_valid(sb, NULL, block, count)) {
-		ext4_error(sb, "Adding blocks in system zones - "
-			   "Block = %llu, count = %lu",
-			   block, count);
-		err = -EINVAL;
-		goto error_return;
+		goto error_clean;
 	}
 
 	BUFFER_TRACE(bitmap_bh, "getting write access");
 	err = ext4_journal_get_write_access(handle, sb, bitmap_bh,
 					    EXT4_JTR_NONE);
 	if (err)
-		goto error_return;
+		goto error_clean;
 
 	/*
 	 * We are about to modify some metadata. Call the journal APIs
@@ -XXX,XX +XXX,XX @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 	BUFFER_TRACE(gd_bh, "get_write_access");
 	err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE);
 	if (err)
-		goto error_return;
+		goto error_clean;
 
 	for (i = 0, clusters_freed = 0; i < cluster_count; i++) {
 		BUFFER_TRACE(bitmap_bh, "clear bit");
@@ -XXX,XX +XXX,XX @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb,
 		}
 	}
 
-	err = ext4_mb_load_buddy(sb, block_group, &e4b);
-	if (err)
-		goto error_return;
-
-	/*
-	 * need to update group_info->bb_free and bitmap
-	 * with group lock held.
generate_buddy look at - * them with group lock_held - */ ext4_lock_group(sb, block_group); mb_clear_bits(bitmap_bh->b_data, bit, cluster_count); - mb_free_blocks(NULL, &e4b, bit, cluster_count); free_clusters_count = clusters_freed + ext4_free_group_clusters(sb, desc); ext4_free_group_clusters_set(sb, desc, free_clusters_count); ext4_block_bitmap_csum_set(sb, desc, bitmap_bh); ext4_group_desc_csum_set(sb, block_group, desc); ext4_unlock_group(sb, block_group); - percpu_counter_add(&sbi->s_freeclusters_counter, - clusters_freed); if (sbi->s_log_groups_per_flex) { ext4_group_t flex_group = ext4_flex_group(sbi, block_group); @@ -XXX,XX +XXX,XX @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb, flex_group)->free_clusters); } - ext4_mb_unload_buddy(&e4b); - /* We dirtied the bitmap block */ BUFFER_TRACE(bitmap_bh, "dirtied bitmap block"); err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh); @@ -XXX,XX +XXX,XX @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb, if (!err) err = ret; -error_return: + ext4_lock_group(sb, block_group); + mb_free_blocks(NULL, &e4b, bit, cluster_count); + ext4_unlock_group(sb, block_group); + percpu_counter_add(&sbi->s_freeclusters_counter, + clusters_freed); + +error_clean: brelse(bitmap_bh); + ext4_mb_unload_buddy(&e4b); +error_out: ext4_std_error(sb, err); return err; } -- 2.30.0
Call ext4_mb_mark_context in ext4_group_add_blocks() to remove repeat code. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> --- fs/ext4/mballoc.c | 82 ++++++----------------------------------------- 1 file changed, 10 insertions(+), 72 deletions(-) diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index XXXXXXX..XXXXXXX 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -XXX,XX +XXX,XX @@ void ext4_free_blocks(handle_t *handle, struct inode *inode, int ext4_group_add_blocks(handle_t *handle, struct super_block *sb, ext4_fsblk_t block, unsigned long count) { - struct buffer_head *bitmap_bh = NULL; - struct buffer_head *gd_bh; ext4_group_t block_group; ext4_grpblk_t bit; - unsigned int i; - struct ext4_group_desc *desc; struct ext4_sb_info *sbi = EXT4_SB(sb); struct ext4_buddy e4b; - int err = 0, ret, free_clusters_count; - ext4_grpblk_t clusters_freed; + int err = 0; ext4_fsblk_t first_cluster = EXT4_B2C(sbi, block); ext4_fsblk_t last_cluster = EXT4_B2C(sbi, block + count - 1); unsigned long cluster_count = last_cluster - first_cluster + 1; + ext4_grpblk_t changed; ext4_debug("Adding block(s) %llu-%llu\n", block, block + count - 1); - if (count == 0) + if (cluster_count == 0) return 0; ext4_get_group_no_and_offset(sb, block, &block_group, &bit); @@ -XXX,XX +XXX,XX @@ int ext4_group_add_blocks(handle_t *handle, struct super_block *sb, goto error_clean; } - bitmap_bh = ext4_read_block_bitmap(sb, block_group); - if (IS_ERR(bitmap_bh)) { - err = PTR_ERR(bitmap_bh); - bitmap_bh = NULL; - goto error_clean; - } - - desc = ext4_get_group_desc(sb, block_group, &gd_bh); - if (!desc) { - err = -EIO; - goto error_clean; - } - - BUFFER_TRACE(bitmap_bh, "getting write access"); - err = ext4_journal_get_write_access(handle, sb, bitmap_bh, - EXT4_JTR_NONE); - if (err) - goto error_clean; - - /* - * We are about to modify some metadata. 
Call the journal APIs - * to unshare ->b_data if a currently-committing transaction is - * using it - */ - BUFFER_TRACE(gd_bh, "get_write_access"); - err = ext4_journal_get_write_access(handle, sb, gd_bh, EXT4_JTR_NONE); - if (err) + err = ext4_mb_mark_context(handle, sb, false, block_group, bit, + cluster_count, EXT4_MB_BITMAP_MARKED_CHECK, + &changed); + if (err && changed == 0) goto error_clean; - for (i = 0, clusters_freed = 0; i < cluster_count; i++) { - BUFFER_TRACE(bitmap_bh, "clear bit"); - if (!mb_test_bit(bit + i, bitmap_bh->b_data)) { - ext4_error(sb, "bit already cleared for block %llu", - (ext4_fsblk_t)(block + i)); - BUFFER_TRACE(bitmap_bh, "bit already cleared"); - } else { - clusters_freed++; - } - } - - ext4_lock_group(sb, block_group); - mb_clear_bits(bitmap_bh->b_data, bit, cluster_count); - free_clusters_count = clusters_freed + - ext4_free_group_clusters(sb, desc); - ext4_free_group_clusters_set(sb, desc, free_clusters_count); - ext4_block_bitmap_csum_set(sb, desc, bitmap_bh); - ext4_group_desc_csum_set(sb, block_group, desc); - ext4_unlock_group(sb, block_group); - - if (sbi->s_log_groups_per_flex) { - ext4_group_t flex_group = ext4_flex_group(sbi, block_group); - atomic64_add(clusters_freed, - &sbi_array_rcu_deref(sbi, s_flex_groups, - flex_group)->free_clusters); - } - - /* We dirtied the bitmap block */ - BUFFER_TRACE(bitmap_bh, "dirtied bitmap block"); - err = ext4_handle_dirty_metadata(handle, NULL, bitmap_bh); - - /* And the group descriptor block */ - BUFFER_TRACE(gd_bh, "dirtied group descriptor block"); - ret = ext4_handle_dirty_metadata(handle, NULL, gd_bh); - if (!err) - err = ret; + if (changed != cluster_count) + ext4_error(sb, "bit already cleared in group %u", block_group); ext4_lock_group(sb, block_group); mb_free_blocks(NULL, &e4b, bit, cluster_count); ext4_unlock_group(sb, block_group); percpu_counter_add(&sbi->s_freeclusters_counter, - clusters_freed); + changed); error_clean: - brelse(bitmap_bh); ext4_mb_unload_buddy(&e4b); error_out: ext4_std_error(sb, err); -- 2.30.0
Multiblocks allocation will read and write block bitmap and group descriptor which reside on disk. Add kunit stub to function ext4_get_group_desc, ext4_read_block_bitmap_nowait, ext4_wait_block_bitmap and ext4_mb_mark_context to avoid real IO to disk. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> --- fs/ext4/balloc.c | 10 ++++++++++ fs/ext4/mballoc.c | 5 +++++ 2 files changed, 15 insertions(+) diff --git a/fs/ext4/balloc.c b/fs/ext4/balloc.c index XXXXXXX..XXXXXXX 100644 --- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -XXX,XX +XXX,XX @@ #include "mballoc.h" #include <trace/events/ext4.h> +#include <kunit/static_stub.h> static unsigned ext4_num_base_meta_clusters(struct super_block *sb, ext4_group_t block_group); @@ -XXX,XX +XXX,XX @@ struct ext4_group_desc * ext4_get_group_desc(struct super_block *sb, struct ext4_sb_info *sbi = EXT4_SB(sb); struct buffer_head *bh_p; + KUNIT_STATIC_STUB_REDIRECT(ext4_get_group_desc, + sb, block_group, bh); + if (block_group >= ngroups) { ext4_error(sb, "block_group >= groups_count - block_group = %u," " groups_count = %u", block_group, ngroups); @@ -XXX,XX +XXX,XX @@ ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group, ext4_fsblk_t bitmap_blk; int err; + KUNIT_STATIC_STUB_REDIRECT(ext4_read_block_bitmap_nowait, + sb, block_group, ignore_locked); + desc = ext4_get_group_desc(sb, block_group, NULL); if (!desc) return ERR_PTR(-EFSCORRUPTED); @@ -XXX,XX +XXX,XX @@ int ext4_wait_block_bitmap(struct super_block *sb, ext4_group_t block_group, { struct ext4_group_desc *desc; + KUNIT_STATIC_STUB_REDIRECT(ext4_wait_block_bitmap, + sb, block_group, bh); + if (!buffer_new(bh)) return 0; desc = ext4_get_group_desc(sb, block_group, NULL); diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index XXXXXXX..XXXXXXX 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -XXX,XX +XXX,XX @@ #include <linux/backing-dev.h> #include <linux/freezer.h> #include <trace/events/ext4.h> +#include <kunit/static_stub.h> /* * MUSTDO: @@ -XXX,XX +XXX,XX @@ ext4_mb_mark_context(handle_t *handle, struct super_block *sb, bool state, int err; unsigned int i, already, changed = len; + KUNIT_STATIC_STUB_REDIRECT(ext4_mb_mark_context, + handle, sb, state, group, blkoff, len, + flags, ret_changed); + if (ret_changed) *ret_changed = 0; bitmap_bh = ext4_read_block_bitmap(sb, group); -- 2.30.0
Here is the preparation work:
1. Include mballoc-test.c in mballoc.c to be able to test static functions in mballoc.c.
2. Implement static stubs to avoid real IO to disk.
3. Construct a fake super_block. Only some members are set; more members will be set when more functions are tested.
Then a unit test for ext4_mb_new_blocks_simple is added.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext4/mballoc-test.c | 325 +++++++++++++++++++++++++++++++++++++++++
 fs/ext4/mballoc.c      |   4 +
 2 files changed, 329 insertions(+)
 create mode 100644 fs/ext4/mballoc-test.c

diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
new file mode 100644
index XXXXXXX..XXXXXXX
--- /dev/null
+++ b/fs/ext4/mballoc-test.c
@@ -XXX,XX +XXX,XX @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * KUnit test of ext4 multiblocks allocation.
+ */
+
+#include <kunit/test.h>
+#include <kunit/static_stub.h>
+
+#include "ext4.h"
+
+struct mbt_grp_ctx {
+	struct buffer_head bitmap_bh;
+	/* desc and gd_bh are just placeholders for now */
+	struct ext4_group_desc desc;
+	struct buffer_head gd_bh;
+};
+
+struct mbt_ctx {
+	struct mbt_grp_ctx *grp_ctx;
+};
+
+struct mbt_ext4_super_block {
+	struct super_block sb;
+	struct mbt_ctx mbt_ctx;
+};
+
+#define MBT_CTX(_sb) (&(container_of((_sb), struct mbt_ext4_super_block, sb)->mbt_ctx))
+#define MBT_GRP_CTX(_sb, _group) (&MBT_CTX(_sb)->grp_ctx[_group])
+
+static struct super_block *mbt_ext4_alloc_super_block(void)
+{
+	struct ext4_super_block *es = kzalloc(sizeof(*es), GFP_KERNEL);
+	struct ext4_sb_info *sbi = kzalloc(sizeof(*sbi), GFP_KERNEL);
+	struct mbt_ext4_super_block *fsb = kzalloc(sizeof(*fsb), GFP_KERNEL);
+
+	if (fsb == NULL || sbi == NULL || es == NULL)
+		goto out;
+
+	sbi->s_es = es;
+	fsb->sb.s_fs_info = sbi;
+	return &fsb->sb;
+
+out:
+	kfree(fsb);
+	kfree(sbi);
+	kfree(es);
+	return NULL;
+}
+
+static void mbt_ext4_free_super_block(struct super_block *sb)
+{
+	struct mbt_ext4_super_block *fsb =
+		container_of(sb, struct mbt_ext4_super_block, sb);
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+	kfree(sbi->s_es);
+	kfree(sbi);
+	kfree(fsb);
+}
+
+struct mbt_ext4_block_layout {
+	unsigned char blocksize_bits;
+	unsigned int cluster_bits;
+	uint32_t blocks_per_group;
+	ext4_group_t group_count;
+	uint16_t desc_size;
+};
+
+static void mbt_init_sb_layout(struct super_block *sb,
+			       struct mbt_ext4_block_layout *layout)
+{
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	struct ext4_super_block *es = sbi->s_es;
+
+	sb->s_blocksize = 1UL << layout->blocksize_bits;
+	sb->s_blocksize_bits = layout->blocksize_bits;
+
+	sbi->s_groups_count = layout->group_count;
+	sbi->s_blocks_per_group = layout->blocks_per_group;
+	sbi->s_cluster_bits = layout->cluster_bits;
+	sbi->s_cluster_ratio = 1U << layout->cluster_bits;
+	sbi->s_clusters_per_group = layout->blocks_per_group >>
+				    layout->cluster_bits;
+	sbi->s_desc_size = layout->desc_size;
+
+	es->s_first_data_block = cpu_to_le32(0);
+	es->s_blocks_count_lo = cpu_to_le32(layout->blocks_per_group *
+					    layout->group_count);
+}
+
+static int mbt_grp_ctx_init(struct super_block *sb,
+			    struct mbt_grp_ctx *grp_ctx)
+{
+	grp_ctx->bitmap_bh.b_data = kzalloc(EXT4_BLOCK_SIZE(sb), GFP_KERNEL);
+	if (grp_ctx->bitmap_bh.b_data == NULL)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void mbt_grp_ctx_release(struct mbt_grp_ctx *grp_ctx)
+{
+	kfree(grp_ctx->bitmap_bh.b_data);
+	grp_ctx->bitmap_bh.b_data = NULL;
+}
+
+static void mbt_ctx_mark_used(struct super_block *sb, ext4_group_t group,
+			      unsigned int
start, unsigned int len) +{ + struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(sb, group); + + mb_set_bits(grp_ctx->bitmap_bh.b_data, start, len); +} + +/* called after mbt_init_sb_layout */ +static int mbt_ctx_init(struct super_block *sb) +{ + struct mbt_ctx *ctx = MBT_CTX(sb); + ext4_group_t i, ngroups = ext4_get_groups_count(sb); + + ctx->grp_ctx = kcalloc(ngroups, sizeof(struct mbt_grp_ctx), + GFP_KERNEL); + if (ctx->grp_ctx == NULL) + return -ENOMEM; + + for (i = 0; i < ngroups; i++) + if (mbt_grp_ctx_init(sb, &ctx->grp_ctx[i])) + goto out; + + /* + * first data block(first cluster in first group) is used by + * metadata, mark it used to avoid to alloc data block at first + * block which will fail ext4_sb_block_valid check. + */ + mb_set_bits(ctx->grp_ctx[0].bitmap_bh.b_data, 0, 1); + + return 0; +out: + while (i-- > 0) + mbt_grp_ctx_release(&ctx->grp_ctx[i]); + kfree(ctx->grp_ctx); + return -ENOMEM; +} + +static void mbt_ctx_release(struct super_block *sb) +{ + struct mbt_ctx *ctx = MBT_CTX(sb); + ext4_group_t i, ngroups = ext4_get_groups_count(sb); + + for (i = 0; i < ngroups; i++) + mbt_grp_ctx_release(&ctx->grp_ctx[i]); + kfree(ctx->grp_ctx); +} + +static struct buffer_head * +ext4_read_block_bitmap_nowait_stub(struct super_block *sb, ext4_group_t block_group, + bool ignore_locked) +{ + struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(sb, block_group); + + /* paired with brelse from caller of ext4_read_block_bitmap_nowait */ + get_bh(&grp_ctx->bitmap_bh); + return &grp_ctx->bitmap_bh; +} + +static int ext4_wait_block_bitmap_stub(struct super_block *sb, + ext4_group_t block_group, + struct buffer_head *bh) +{ + return 0; +} + +static struct ext4_group_desc * +ext4_get_group_desc_stub(struct super_block *sb, ext4_group_t block_group, + struct buffer_head **bh) +{ + struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(sb, block_group); + + if (bh != NULL) + *bh = &grp_ctx->gd_bh; + + return &grp_ctx->desc; +} + +static int +ext4_mb_mark_context_stub(handle_t *handle, struct super_block *sb, bool state, + ext4_group_t group, ext4_grpblk_t blkoff, + ext4_grpblk_t len, int flags, + ext4_grpblk_t *ret_changed) +{ + struct mbt_grp_ctx *grp_ctx = MBT_GRP_CTX(sb, group); + struct buffer_head *bitmap_bh = &grp_ctx->bitmap_bh; + + if (state) + mb_set_bits(bitmap_bh->b_data, blkoff, len); + else + mb_clear_bits(bitmap_bh->b_data, blkoff, len); + + return 0; +} + +#define TEST_BLOCKSIZE_BITS 10 +#define TEST_CLUSTER_BITS 3 +#define TEST_BLOCKS_PER_GROUP 8192 +#define TEST_GROUP_COUNT 4 +#define TEST_DESC_SIZE 64 +#define TEST_GOAL_GROUP 1 +static int mbt_kunit_init(struct kunit *test) +{ + struct mbt_ext4_block_layout layout = { + .blocksize_bits = TEST_BLOCKSIZE_BITS, + .cluster_bits = TEST_CLUSTER_BITS, + .blocks_per_group = TEST_BLOCKS_PER_GROUP, + .group_count = TEST_GROUP_COUNT, + .desc_size = TEST_DESC_SIZE, + }; + struct super_block *sb; + int ret; + + sb = mbt_ext4_alloc_super_block(); + if (sb == NULL) + return -ENOMEM; + + mbt_init_sb_layout(sb, &layout); + + ret = mbt_ctx_init(sb); + if (ret != 0) { + mbt_ext4_free_super_block(sb); + return ret; + } + + test->priv = sb; + kunit_activate_static_stub(test, + ext4_read_block_bitmap_nowait, + ext4_read_block_bitmap_nowait_stub); + kunit_activate_static_stub(test, + ext4_wait_block_bitmap, + ext4_wait_block_bitmap_stub); + kunit_activate_static_stub(test, + ext4_get_group_desc, + ext4_get_group_desc_stub); + kunit_activate_static_stub(test, + ext4_mb_mark_context, + ext4_mb_mark_context_stub); + return 0; +} + +static void mbt_kunit_exit(struct kunit *test) +{ + 
+	struct super_block *sb = (struct super_block *)test->priv;
+
+	mbt_ctx_release(sb);
+	mbt_ext4_free_super_block(sb);
+}
+
+static void test_new_blocks_simple(struct kunit *test)
+{
+	struct super_block *sb = (struct super_block *)test->priv;
+	struct inode inode = { .i_sb = sb, };
+	struct ext4_allocation_request ar;
+	ext4_group_t i, goal_group = TEST_GOAL_GROUP;
+	int err = 0;
+	ext4_fsblk_t found;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+
+	ar.inode = &inode;
+
+	/* get block at goal */
+	ar.goal = ext4_group_first_block_no(sb, goal_group);
+	found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_EQ_MSG(test, ar.goal, found,
+		"failed to alloc block at goal, expected %llu found %llu",
+		ar.goal, found);
+
+	/* get block after goal in goal group */
+	ar.goal = ext4_group_first_block_no(sb, goal_group);
+	found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_EQ_MSG(test, ar.goal + EXT4_C2B(sbi, 1), found,
+		"failed to alloc block after goal in goal group, expected %llu found %llu",
+		ar.goal + EXT4_C2B(sbi, 1), found);
+
+	/* get block after goal group */
+	mbt_ctx_mark_used(sb, goal_group, 0, EXT4_CLUSTERS_PER_GROUP(sb));
+	ar.goal = ext4_group_first_block_no(sb, goal_group);
+	found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_EQ_MSG(test,
+		ext4_group_first_block_no(sb, goal_group + 1), found,
+		"failed to alloc block after goal group, expected %llu found %llu",
+		ext4_group_first_block_no(sb, goal_group + 1), found);
+
+	/* get block before goal group */
+	for (i = goal_group; i < ext4_get_groups_count(sb); i++)
+		mbt_ctx_mark_used(sb, i, 0, EXT4_CLUSTERS_PER_GROUP(sb));
+	ar.goal = ext4_group_first_block_no(sb, goal_group);
+	found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_EQ_MSG(test,
+		ext4_group_first_block_no(sb, 0) + EXT4_C2B(sbi, 1), found,
+		"failed to alloc block before goal group, expected %llu found %llu",
+		ext4_group_first_block_no(sb, 0) + EXT4_C2B(sbi, 1), found);
+
+	/* no block available, fail to allocate block */
+	for (i = 0; i < ext4_get_groups_count(sb); i++)
+		mbt_ctx_mark_used(sb, i, 0, EXT4_CLUSTERS_PER_GROUP(sb));
+	ar.goal = ext4_group_first_block_no(sb, goal_group);
+	found = ext4_mb_new_blocks_simple(&ar, &err);
+	KUNIT_ASSERT_NE_MSG(test, err, 0,
+		"unexpectedly got block when no block is available");
+}
+
+static struct kunit_case mbt_test_cases[] = {
+	KUNIT_CASE(test_new_blocks_simple),
+	{}
+};
+
+static struct kunit_suite mbt_test_suite = {
+	.name = "ext4_mballoc_test",
+	.init = mbt_kunit_init,
+	.exit = mbt_kunit_exit,
+	.test_cases = mbt_test_cases,
+};
+
+kunit_test_suites(&mbt_test_suite);
+
+MODULE_LICENSE("GPL");
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -XXX,XX +XXX,XX @@ ext4_mballoc_query_range(
 	return error;
 }
+
+#ifdef CONFIG_EXT4_KUNIT_TESTS
+#include "mballoc-test.c"
+#endif
--
2.30.0
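A note for readers reproducing this setup: kunit_activate_static_stub() in mbt_kunit_init() only takes effect if the real ext4 helpers carry a KUnit redirect hook, which earlier patches in this series add. Below is a minimal, hypothetical sketch of that hook shape based on the KUnit static stub API (include/kunit/static_stub.h); the actual ext4 hunks differ:

#include <kunit/static_stub.h>

struct buffer_head *ext4_read_block_bitmap_nowait(struct super_block *sb,
						  ext4_group_t block_group,
						  bool ignore_locked)
{
	/*
	 * If a test registered a replacement via kunit_activate_static_stub(),
	 * this returns the stub's result instead of doing real bitmap I/O.
	 * With CONFIG_KUNIT disabled the macro compiles to a no-op.
	 */
	KUNIT_STATIC_STUB_REDIRECT(ext4_read_block_bitmap_nowait,
				   sb, block_group, ignore_locked);

	/* ... the real on-disk read path would continue here ... */
	return NULL; /* placeholder body for this sketch only */
}

With hooks like this in place, the stubs registered in mbt_kunit_init() make ext4_mb_new_blocks_simple() operate entirely on the fake in-memory bitmaps, so the test never touches a block device.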
Use KUNIT_CASE_PARAM to run the mballoc test with different layout settings.

Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
---
 fs/ext4/mballoc-test.c | 52 ++++++++++++++++++++++++++++++------------
 1 file changed, 38 insertions(+), 14 deletions(-)

diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
index XXXXXXX..XXXXXXX 100644
--- a/fs/ext4/mballoc-test.c
+++ b/fs/ext4/mballoc-test.c
@@ -XXX,XX +XXX,XX @@ ext4_mb_mark_context_stub(handle_t *handle, struct super_block *sb, bool state,
 	return 0;
 }
 
-#define TEST_BLOCKSIZE_BITS 10
-#define TEST_CLUSTER_BITS 3
-#define TEST_BLOCKS_PER_GROUP 8192
-#define TEST_GROUP_COUNT 4
-#define TEST_DESC_SIZE 64
 #define TEST_GOAL_GROUP 1
 static int mbt_kunit_init(struct kunit *test)
 {
-	struct mbt_ext4_block_layout layout = {
-		.blocksize_bits = TEST_BLOCKSIZE_BITS,
-		.cluster_bits = TEST_CLUSTER_BITS,
-		.blocks_per_group = TEST_BLOCKS_PER_GROUP,
-		.group_count = TEST_GROUP_COUNT,
-		.desc_size = TEST_DESC_SIZE,
-	};
+	struct mbt_ext4_block_layout *layout =
+		(struct mbt_ext4_block_layout *)(test->param_value);
 	struct super_block *sb;
 	int ret;
 
@@ -XXX,XX +XXX,XX @@ static int mbt_kunit_init(struct kunit *test)
 	if (sb == NULL)
 		return -ENOMEM;
 
-	mbt_init_sb_layout(sb, &layout);
+	mbt_init_sb_layout(sb, layout);
 
 	ret = mbt_ctx_init(sb);
 	if (ret != 0) {
@@ -XXX,XX +XXX,XX @@ static void test_new_blocks_simple(struct kunit *test)
 		"unexpectedly got block when no block is available");
 }
 
+static const struct mbt_ext4_block_layout mbt_test_layouts[] = {
+	{
+		.blocksize_bits = 10,
+		.cluster_bits = 3,
+		.blocks_per_group = 8192,
+		.group_count = 4,
+		.desc_size = 64,
+	},
+	{
+		.blocksize_bits = 12,
+		.cluster_bits = 3,
+		.blocks_per_group = 8192,
+		.group_count = 4,
+		.desc_size = 64,
+	},
+	{
+		.blocksize_bits = 16,
+		.cluster_bits = 3,
+		.blocks_per_group = 8192,
+		.group_count = 4,
+		.desc_size = 64,
+	},
+};
+
+static void mbt_show_layout(const struct mbt_ext4_block_layout *layout,
+			    char *desc)
+{
+	snprintf(desc, KUNIT_PARAM_DESC_SIZE, "block_bits=%d cluster_bits=%d "
+		 "blocks_per_group=%d group_count=%d desc_size=%d\n",
+		 layout->blocksize_bits, layout->cluster_bits,
+		 layout->blocks_per_group, layout->group_count,
+		 layout->desc_size);
+}
+KUNIT_ARRAY_PARAM(mbt_layouts, mbt_test_layouts, mbt_show_layout);
 
 static struct kunit_case mbt_test_cases[] = {
-	KUNIT_CASE(test_new_blocks_simple),
+	KUNIT_CASE_PARAM(test_new_blocks_simple, mbt_layouts_gen_params),
 	{}
 };
 
--
2.30.0
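For readers unfamiliar with parameterized KUnit tests: the KUNIT_ARRAY_PARAM() line above is what defines the mbt_layouts_gen_params generator passed to KUNIT_CASE_PARAM(). The real macro lives in include/kunit/test.h; a simplified sketch of what the generated function does:

static const void *mbt_layouts_gen_params(const void *prev, char *desc)
{
	/* Start at the first layout, or step past the previously returned one. */
	const struct mbt_ext4_block_layout *next = prev ?
		(const struct mbt_ext4_block_layout *)prev + 1 :
		mbt_test_layouts;

	if (next - mbt_test_layouts < ARRAY_SIZE(mbt_test_layouts)) {
		/* Fill in the description that labels each "ok" line in KTAP output. */
		mbt_show_layout(next, desc);
		return next;
	}
	return NULL;	/* out of layouts: stop repeating the test case */
}

KUnit then runs test_new_blocks_simple() once per layout, invoking mbt_kunit_init() before each run so every layout gets a fresh fake super block via test->param_value.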