[PATCH v3 15/17] ext4: convert free groups order lists to xarrays

Posted by Baokun Li 2 months, 3 weeks ago
While traversing the list, holding a spin_lock prevents load_buddy, making
direct use of ext4_try_lock_group impossible. This can lead to a bouncing
scenario where spin_is_locked(grp_A) succeeds, but ext4_try_lock_group()
fails, forcing the list traversal to repeatedly restart from grp_A.

In contrast, linear traversal directly uses ext4_try_lock_group(),
avoiding this bouncing. Therefore, we need a lockless, ordered traversal
to achieve linear-like efficiency.

To that end, this commit converts both the average fragment size lists
and the largest free order lists into ordered xarrays.

In an xarray, the index represents the block group number and the value
holds the block group information; a non-empty value indicates the block
group's presence.
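
For illustration, group membership in an order xarray maps onto the
standard xarray primitives (a sketch only, not verbatim patch code;
"order" here names the target array slot):

	/* publish the group in the order xarray; insertion may fail */
	xa_insert(&sbi->s_mb_avg_fragment_size[order], grp->bb_group,
		  grp, GFP_ATOMIC);

	/* presence check: a non-NULL entry means the group is tracked */
	present = xa_load(&sbi->s_mb_avg_fragment_size[order],
			  grp->bb_group) != NULL;

	/* remove the group when its order changes */
	xa_erase(&sbi->s_mb_avg_fragment_size[order], grp->bb_group);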

While insertion and deletion complexity remain O(1), lookup complexity
changes from O(1) to O(nlogn), which may slightly reduce single-threaded
performance.

Additionally, xarray insertions might fail, potentially due to memory
allocation issues. However, since we have linear traversal as a fallback,
this isn't a major problem. Therefore, we've only added a warning message
for insertion failures here.

A helper function ext4_mb_find_good_group_xarray() is added to find good
groups in the specified xarray starting at the specified position start,
and when it reaches ngroups-1, it wraps around to 0 and then to start-1.
This ensures an ordered traversal within the xarray.
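
Illustratively, the helper's wrap-around reduces to two ranged
iterations (a sketch rather than the patch's exact code; try_group()
is a hypothetical stand-in for the ext4_mb_good_group() check):

	unsigned long group;
	struct ext4_group_info *grp;

	/* first pass: groups in [start, ngroups - 1] */
	xa_for_each_range(xa, group, grp, start, ngroups - 1)
		try_group(grp);

	/* wrap around: groups in [0, start - 1] */
	if (start)
		xa_for_each_range(xa, group, grp, 0, start - 1)
			try_group(grp);

For example, with ngroups = 8 and start = 5, the groups present in the
xarray are visited in the order 5, 6, 7, 0, 1, 2, 3, 4.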

Performance test results are as follows: Single-process operations
on an empty disk show negligible impact, while multi-process workloads
demonstrate a noticeable performance gain.

|CPU: Kunpeng 920   |          P80           |            P1           |
|Memory: 512GB      |------------------------|-------------------------|
|960GB SSD (0.5GB/s)| base  |    patched     | base   |    patched     |
|-------------------|-------|----------------|--------|----------------|
|mb_optimize_scan=0 | 20097 | 19555 (-2.6%)  | 316141 | 315636 (-0.2%) |
|mb_optimize_scan=1 | 13318 | 15496 (+16.3%) | 325273 | 323569 (-0.5%) |

|CPU: AMD 9654 * 2  |          P96           |             P1          |
|Memory: 1536GB     |------------------------|-------------------------|
|960GB SSD (1GB/s)  | base  |    patched     | base   |    patched     |
|-------------------|-------|----------------|--------|----------------|
|mb_optimize_scan=0 | 53603 | 53192 (-0.7%)  | 214243 | 212678 (-0.7%) |
|mb_optimize_scan=1 | 20887 | 37636 (+80.1%) | 213632 | 214189 (+0.2%) |

Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
 fs/ext4/ext4.h    |   8 +-
 fs/ext4/mballoc.c | 254 +++++++++++++++++++++++++---------------------
 2 files changed, 140 insertions(+), 122 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 52a72af6ec34..ea412fdb0b76 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1608,10 +1608,8 @@ struct ext4_sb_info {
 	struct list_head s_discard_list;
 	struct work_struct s_discard_work;
 	atomic_t s_retry_alloc_pending;
-	struct list_head *s_mb_avg_fragment_size;
-	rwlock_t *s_mb_avg_fragment_size_locks;
-	struct list_head *s_mb_largest_free_orders;
-	rwlock_t *s_mb_largest_free_orders_locks;
+	struct xarray *s_mb_avg_fragment_size;
+	struct xarray *s_mb_largest_free_orders;
 
 	/* tunables */
 	unsigned long s_stripe;
@@ -3485,8 +3483,6 @@ struct ext4_group_info {
 	void            *bb_bitmap;
 #endif
 	struct rw_semaphore alloc_sem;
-	struct list_head bb_avg_fragment_size_node;
-	struct list_head bb_largest_free_order_node;
 	ext4_grpblk_t	bb_counters[];	/* Nr of free power-of-two-block
 					 * regions, index is order.
 					 * bb_counters[3] = 5 means
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 0c3cbc7e2e85..a9eb997b8c9b 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -132,25 +132,30 @@
  * If "mb_optimize_scan" mount option is set, we maintain in memory group info
  * structures in two data structures:
  *
- * 1) Array of largest free order lists (sbi->s_mb_largest_free_orders)
+ * 1) Array of largest free order xarrays (sbi->s_mb_largest_free_orders)
  *
- *    Locking: sbi->s_mb_largest_free_orders_locks(array of rw locks)
+ *    Locking: Writers use xa_lock, readers use rcu_read_lock.
  *
- *    This is an array of lists where the index in the array represents the
+ *    This is an array of xarrays where the index in the array represents the
  *    largest free order in the buddy bitmap of the participating group infos of
- *    that list. So, there are exactly MB_NUM_ORDERS(sb) (which means total
- *    number of buddy bitmap orders possible) number of lists. Group-infos are
- *    placed in appropriate lists.
+ *    that xarray. So, there are exactly MB_NUM_ORDERS(sb) (which means total
+ *    number of buddy bitmap orders possible) number of xarrays. Group-infos are
+ *    placed in appropriate xarrays.
  *
- * 2) Average fragment size lists (sbi->s_mb_avg_fragment_size)
+ * 2) Average fragment size xarrays (sbi->s_mb_avg_fragment_size)
  *
- *    Locking: sbi->s_mb_avg_fragment_size_locks(array of rw locks)
+ *    Locking: Writers use xa_lock, readers use rcu_read_lock.
  *
- *    This is an array of lists where in the i-th list there are groups with
+ *    This is an array of xarrays where in the i-th xarray there are groups with
  *    average fragment size >= 2^i and < 2^(i+1). The average fragment size
  *    is computed as ext4_group_info->bb_free / ext4_group_info->bb_fragments.
- *    Note that we don't bother with a special list for completely empty groups
- *    so we only have MB_NUM_ORDERS(sb) lists.
+ *    Note that we don't bother with a special xarray for completely empty
+ *    groups so we only have MB_NUM_ORDERS(sb) xarrays. Group-infos are placed
+ *    in appropriate xarrays.
+ *
+ * In xarray, the index is the block group number, the value is the block group
+ * information, and a non-empty value indicates the block group is present in
+ * the current xarray.
  *
  * When "mb_optimize_scan" mount option is set, mballoc consults the above data
  * structures to decide the order in which groups are to be traversed for
@@ -852,21 +857,75 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
 	if (new == old)
 		return;
 
-	if (old >= 0) {
-		write_lock(&sbi->s_mb_avg_fragment_size_locks[old]);
-		list_del(&grp->bb_avg_fragment_size_node);
-		write_unlock(&sbi->s_mb_avg_fragment_size_locks[old]);
-	}
+	if (old >= 0)
+		xa_erase(&sbi->s_mb_avg_fragment_size[old], grp->bb_group);
 
 	grp->bb_avg_fragment_size_order = new;
 	if (new >= 0) {
-		write_lock(&sbi->s_mb_avg_fragment_size_locks[new]);
-		list_add_tail(&grp->bb_avg_fragment_size_node,
-				&sbi->s_mb_avg_fragment_size[new]);
-		write_unlock(&sbi->s_mb_avg_fragment_size_locks[new]);
+		/*
+		* Cannot use __GFP_NOFAIL because we hold the group lock.
+		* Although allocation for insertion may fails, it's not fatal
+		* as we have linear traversal to fall back on.
+		*/
+		int err = xa_insert(&sbi->s_mb_avg_fragment_size[new],
+				    grp->bb_group, grp, GFP_ATOMIC);
+		if (err)
+			mb_debug(sb, "insert group: %u to s_mb_avg_fragment_size[%d] failed, err %d",
+				 grp->bb_group, new, err);
 	}
 }
 
+static struct ext4_group_info *
+ext4_mb_find_good_group_xarray(struct ext4_allocation_context *ac,
+			       struct xarray *xa, ext4_group_t start)
+{
+	struct super_block *sb = ac->ac_sb;
+	struct ext4_sb_info *sbi = EXT4_SB(sb);
+	enum criteria cr = ac->ac_criteria;
+	ext4_group_t ngroups = ext4_get_groups_count(sb);
+	unsigned long group = start;
+	ext4_group_t end = ngroups;
+	struct ext4_group_info *grp;
+
+	if (WARN_ON_ONCE(start >= end))
+		return NULL;
+
+wrap_around:
+	xa_for_each_range(xa, group, grp, start, end - 1) {
+		if (sbi->s_mb_stats)
+			atomic64_inc(&sbi->s_bal_cX_groups_considered[cr]);
+
+		if (!spin_is_locked(ext4_group_lock_ptr(sb, group)) &&
+		    likely(ext4_mb_good_group(ac, group, cr)))
+			return grp;
+
+		cond_resched();
+	}
+
+	if (start) {
+		end = start;
+		start = 0;
+		goto wrap_around;
+	}
+
+	return NULL;
+}
+
+/*
+ * Find a suitable group of given order from the largest free orders xarray.
+ */
+static struct ext4_group_info *
+ext4_mb_find_good_group_largest_free_order(struct ext4_allocation_context *ac,
+					   int order, ext4_group_t start)
+{
+	struct xarray *xa = &EXT4_SB(ac->ac_sb)->s_mb_largest_free_orders[order];
+
+	if (xa_empty(xa))
+		return NULL;
+
+	return ext4_mb_find_good_group_xarray(ac, xa, start);
+}
+
 /*
  * Choose next group by traversing largest_free_order lists. Updates *new_cr if
  * cr level needs an update.
@@ -875,7 +934,7 @@ static void ext4_mb_choose_next_group_p2_aligned(struct ext4_allocation_context
 			enum criteria *new_cr, ext4_group_t *group)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
-	struct ext4_group_info *iter;
+	struct ext4_group_info *grp;
 	int i;
 
 	if (ac->ac_status == AC_STATUS_FOUND)
@@ -885,26 +944,12 @@ static void ext4_mb_choose_next_group_p2_aligned(struct ext4_allocation_context
 		atomic_inc(&sbi->s_bal_p2_aligned_bad_suggestions);
 
 	for (i = ac->ac_2order; i < MB_NUM_ORDERS(ac->ac_sb); i++) {
-		if (list_empty(&sbi->s_mb_largest_free_orders[i]))
-			continue;
-		read_lock(&sbi->s_mb_largest_free_orders_locks[i]);
-		if (list_empty(&sbi->s_mb_largest_free_orders[i])) {
-			read_unlock(&sbi->s_mb_largest_free_orders_locks[i]);
-			continue;
-		}
-		list_for_each_entry(iter, &sbi->s_mb_largest_free_orders[i],
-				    bb_largest_free_order_node) {
-			if (sbi->s_mb_stats)
-				atomic64_inc(&sbi->s_bal_cX_groups_considered[CR_POWER2_ALIGNED]);
-			if (!spin_is_locked(ext4_group_lock_ptr(ac->ac_sb, iter->bb_group)) &&
-			    likely(ext4_mb_good_group(ac, iter->bb_group, CR_POWER2_ALIGNED))) {
-				*group = iter->bb_group;
-				ac->ac_flags |= EXT4_MB_CR_POWER2_ALIGNED_OPTIMIZED;
-				read_unlock(&sbi->s_mb_largest_free_orders_locks[i]);
-				return;
-			}
+		grp = ext4_mb_find_good_group_largest_free_order(ac, i, *group);
+		if (grp) {
+			*group = grp->bb_group;
+			ac->ac_flags |= EXT4_MB_CR_POWER2_ALIGNED_OPTIMIZED;
+			return;
 		}
-		read_unlock(&sbi->s_mb_largest_free_orders_locks[i]);
 	}
 
 	/* Increment cr and search again if no group is found */
@@ -912,35 +957,18 @@ static void ext4_mb_choose_next_group_p2_aligned(struct ext4_allocation_context
 }
 
 /*
- * Find a suitable group of given order from the average fragments list.
+ * Find a suitable group of given order from the average fragments xarray.
  */
 static struct ext4_group_info *
-ext4_mb_find_good_group_avg_frag_lists(struct ext4_allocation_context *ac, int order)
+ext4_mb_find_good_group_avg_frag_xarray(struct ext4_allocation_context *ac,
+					int order, ext4_group_t start)
 {
-	struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb);
-	struct list_head *frag_list = &sbi->s_mb_avg_fragment_size[order];
-	rwlock_t *frag_list_lock = &sbi->s_mb_avg_fragment_size_locks[order];
-	struct ext4_group_info *grp = NULL, *iter;
-	enum criteria cr = ac->ac_criteria;
+	struct xarray *xa = &EXT4_SB(ac->ac_sb)->s_mb_avg_fragment_size[order];
 
-	if (list_empty(frag_list))
+	if (xa_empty(xa))
 		return NULL;
-	read_lock(frag_list_lock);
-	if (list_empty(frag_list)) {
-		read_unlock(frag_list_lock);
-		return NULL;
-	}
-	list_for_each_entry(iter, frag_list, bb_avg_fragment_size_node) {
-		if (sbi->s_mb_stats)
-			atomic64_inc(&sbi->s_bal_cX_groups_considered[cr]);
-		if (!spin_is_locked(ext4_group_lock_ptr(ac->ac_sb, iter->bb_group)) &&
-		    likely(ext4_mb_good_group(ac, iter->bb_group, cr))) {
-			grp = iter;
-			break;
-		}
-	}
-	read_unlock(frag_list_lock);
-	return grp;
+
+	return ext4_mb_find_good_group_xarray(ac, xa, start);
 }
 
 /*
@@ -961,7 +989,7 @@ static void ext4_mb_choose_next_group_goal_fast(struct ext4_allocation_context *
 
 	for (i = mb_avg_fragment_size_order(ac->ac_sb, ac->ac_g_ex.fe_len);
 	     i < MB_NUM_ORDERS(ac->ac_sb); i++) {
-		grp = ext4_mb_find_good_group_avg_frag_lists(ac, i);
+		grp = ext4_mb_find_good_group_avg_frag_xarray(ac, i, *group);
 		if (grp) {
 			*group = grp->bb_group;
 			ac->ac_flags |= EXT4_MB_CR_GOAL_LEN_FAST_OPTIMIZED;
@@ -1057,7 +1085,8 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
 		frag_order = mb_avg_fragment_size_order(ac->ac_sb,
 							ac->ac_g_ex.fe_len);
 
-		grp = ext4_mb_find_good_group_avg_frag_lists(ac, frag_order);
+		grp = ext4_mb_find_good_group_avg_frag_xarray(ac, frag_order,
+							      *group);
 		if (grp) {
 			*group = grp->bb_group;
 			ac->ac_flags |= EXT4_MB_CR_BEST_AVAIL_LEN_OPTIMIZED;
@@ -1162,18 +1191,25 @@ mb_set_largest_free_order(struct super_block *sb, struct ext4_group_info *grp)
 	if (new == old)
 		return;
 
-	if (old >= 0 && !list_empty(&grp->bb_largest_free_order_node)) {
-		write_lock(&sbi->s_mb_largest_free_orders_locks[old]);
-		list_del_init(&grp->bb_largest_free_order_node);
-		write_unlock(&sbi->s_mb_largest_free_orders_locks[old]);
+	if (old >= 0) {
+		struct xarray *xa = &sbi->s_mb_largest_free_orders[old];
+
+		if (!xa_empty(xa) && xa_load(xa, grp->bb_group))
+			xa_erase(xa, grp->bb_group);
 	}
 
 	grp->bb_largest_free_order = new;
 	if (test_opt2(sb, MB_OPTIMIZE_SCAN) && new >= 0 && grp->bb_free) {
-		write_lock(&sbi->s_mb_largest_free_orders_locks[new]);
-		list_add_tail(&grp->bb_largest_free_order_node,
-			      &sbi->s_mb_largest_free_orders[new]);
-		write_unlock(&sbi->s_mb_largest_free_orders_locks[new]);
+		/*
+		* Cannot use __GFP_NOFAIL because we hold the group lock.
+		* Although allocation for insertion may fails, it's not fatal
+		* as we have linear traversal to fall back on.
+		*/
+		int err = xa_insert(&sbi->s_mb_largest_free_orders[new],
+				    grp->bb_group, grp, GFP_ATOMIC);
+		if (err)
+			mb_debug(sb, "insert group: %u to s_mb_largest_free_orders[%d] failed, err %d",
+				 grp->bb_group, new, err);
 	}
 }
 
@@ -3269,6 +3305,7 @@ static int ext4_mb_seq_structs_summary_show(struct seq_file *seq, void *v)
 	unsigned long position = ((unsigned long) v);
 	struct ext4_group_info *grp;
 	unsigned int count;
+	unsigned long idx;
 
 	position--;
 	if (position >= MB_NUM_ORDERS(sb)) {
@@ -3277,11 +3314,8 @@ static int ext4_mb_seq_structs_summary_show(struct seq_file *seq, void *v)
 			seq_puts(seq, "avg_fragment_size_lists:\n");
 
 		count = 0;
-		read_lock(&sbi->s_mb_avg_fragment_size_locks[position]);
-		list_for_each_entry(grp, &sbi->s_mb_avg_fragment_size[position],
-				    bb_avg_fragment_size_node)
+		xa_for_each(&sbi->s_mb_avg_fragment_size[position], idx, grp)
 			count++;
-		read_unlock(&sbi->s_mb_avg_fragment_size_locks[position]);
 		seq_printf(seq, "\tlist_order_%u_groups: %u\n",
 					(unsigned int)position, count);
 		return 0;
@@ -3293,11 +3327,8 @@ static int ext4_mb_seq_structs_summary_show(struct seq_file *seq, void *v)
 		seq_puts(seq, "max_free_order_lists:\n");
 	}
 	count = 0;
-	read_lock(&sbi->s_mb_largest_free_orders_locks[position]);
-	list_for_each_entry(grp, &sbi->s_mb_largest_free_orders[position],
-			    bb_largest_free_order_node)
+	xa_for_each(&sbi->s_mb_largest_free_orders[position], idx, grp)
 		count++;
-	read_unlock(&sbi->s_mb_largest_free_orders_locks[position]);
 	seq_printf(seq, "\tlist_order_%u_groups: %u\n",
 		   (unsigned int)position, count);
 
@@ -3417,8 +3448,6 @@ int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
 	INIT_LIST_HEAD(&meta_group_info[i]->bb_prealloc_list);
 	init_rwsem(&meta_group_info[i]->alloc_sem);
 	meta_group_info[i]->bb_free_root = RB_ROOT;
-	INIT_LIST_HEAD(&meta_group_info[i]->bb_largest_free_order_node);
-	INIT_LIST_HEAD(&meta_group_info[i]->bb_avg_fragment_size_node);
 	meta_group_info[i]->bb_largest_free_order = -1;  /* uninit */
 	meta_group_info[i]->bb_avg_fragment_size_order = -1;  /* uninit */
 	meta_group_info[i]->bb_group = group;
@@ -3628,6 +3657,20 @@ static void ext4_discard_work(struct work_struct *work)
 		ext4_mb_unload_buddy(&e4b);
 }
 
+static inline void ext4_mb_avg_fragment_size_destory(struct ext4_sb_info *sbi)
+{
+	for (int i = 0; i < MB_NUM_ORDERS(sbi->s_sb); i++)
+		xa_destroy(&sbi->s_mb_avg_fragment_size[i]);
+	kfree(sbi->s_mb_avg_fragment_size);
+}
+
+static inline void ext4_mb_largest_free_orders_destory(struct ext4_sb_info *sbi)
+{
+	for (int i = 0; i < MB_NUM_ORDERS(sbi->s_sb); i++)
+		xa_destroy(&sbi->s_mb_largest_free_orders[i]);
+	kfree(sbi->s_mb_largest_free_orders);
+}
+
 int ext4_mb_init(struct super_block *sb)
 {
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
@@ -3673,41 +3716,24 @@ int ext4_mb_init(struct super_block *sb)
 	} while (i < MB_NUM_ORDERS(sb));
 
 	sbi->s_mb_avg_fragment_size =
-		kmalloc_array(MB_NUM_ORDERS(sb), sizeof(struct list_head),
+		kmalloc_array(MB_NUM_ORDERS(sb), sizeof(struct xarray),
 			GFP_KERNEL);
 	if (!sbi->s_mb_avg_fragment_size) {
 		ret = -ENOMEM;
 		goto out;
 	}
-	sbi->s_mb_avg_fragment_size_locks =
-		kmalloc_array(MB_NUM_ORDERS(sb), sizeof(rwlock_t),
-			GFP_KERNEL);
-	if (!sbi->s_mb_avg_fragment_size_locks) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	for (i = 0; i < MB_NUM_ORDERS(sb); i++) {
-		INIT_LIST_HEAD(&sbi->s_mb_avg_fragment_size[i]);
-		rwlock_init(&sbi->s_mb_avg_fragment_size_locks[i]);
-	}
+	for (i = 0; i < MB_NUM_ORDERS(sb); i++)
+		xa_init(&sbi->s_mb_avg_fragment_size[i]);
+
 	sbi->s_mb_largest_free_orders =
-		kmalloc_array(MB_NUM_ORDERS(sb), sizeof(struct list_head),
+		kmalloc_array(MB_NUM_ORDERS(sb), sizeof(struct xarray),
 			GFP_KERNEL);
 	if (!sbi->s_mb_largest_free_orders) {
 		ret = -ENOMEM;
 		goto out;
 	}
-	sbi->s_mb_largest_free_orders_locks =
-		kmalloc_array(MB_NUM_ORDERS(sb), sizeof(rwlock_t),
-			GFP_KERNEL);
-	if (!sbi->s_mb_largest_free_orders_locks) {
-		ret = -ENOMEM;
-		goto out;
-	}
-	for (i = 0; i < MB_NUM_ORDERS(sb); i++) {
-		INIT_LIST_HEAD(&sbi->s_mb_largest_free_orders[i]);
-		rwlock_init(&sbi->s_mb_largest_free_orders_locks[i]);
-	}
+	for (i = 0; i < MB_NUM_ORDERS(sb); i++)
+		xa_init(&sbi->s_mb_largest_free_orders[i]);
 
 	spin_lock_init(&sbi->s_md_lock);
 	atomic_set(&sbi->s_mb_free_pending, 0);
@@ -3792,10 +3818,8 @@ int ext4_mb_init(struct super_block *sb)
 	kfree(sbi->s_mb_last_groups);
 	sbi->s_mb_last_groups = NULL;
 out:
-	kfree(sbi->s_mb_avg_fragment_size);
-	kfree(sbi->s_mb_avg_fragment_size_locks);
-	kfree(sbi->s_mb_largest_free_orders);
-	kfree(sbi->s_mb_largest_free_orders_locks);
+	ext4_mb_avg_fragment_size_destory(sbi);
+	ext4_mb_largest_free_orders_destory(sbi);
 	kfree(sbi->s_mb_offsets);
 	sbi->s_mb_offsets = NULL;
 	kfree(sbi->s_mb_maxs);
@@ -3862,10 +3886,8 @@ void ext4_mb_release(struct super_block *sb)
 		kvfree(group_info);
 		rcu_read_unlock();
 	}
-	kfree(sbi->s_mb_avg_fragment_size);
-	kfree(sbi->s_mb_avg_fragment_size_locks);
-	kfree(sbi->s_mb_largest_free_orders);
-	kfree(sbi->s_mb_largest_free_orders_locks);
+	ext4_mb_avg_fragment_size_destory(sbi);
+	ext4_mb_largest_free_orders_destory(sbi);
 	kfree(sbi->s_mb_offsets);
 	kfree(sbi->s_mb_maxs);
 	iput(sbi->s_buddy_cache);
-- 
2.46.1
Re: [PATCH v3 15/17] ext4: convert free groups order lists to xarrays
Posted by Guenter Roeck 2 months, 2 weeks ago
Hi,

On Mon, Jul 14, 2025 at 09:03:25PM +0800, Baokun Li wrote:
> While traversing the list, holding a spin_lock prevents load_buddy, making
> direct use of ext4_try_lock_group impossible. This can lead to a bouncing
> scenario where spin_is_locked(grp_A) succeeds, but ext4_try_lock_group()
> fails, forcing the list traversal to repeatedly restart from grp_A.
> 

This patch causes crashes for pretty much every architecture when
running unit tests as part of booting.

Example (from x8_64) as well as bisect log attached below.

Guenter

---
...
[    9.353832]         # Subtest: test_new_blocks_simple
[    9.366711] BUG: kernel NULL pointer dereference, address: 0000000000000014
[    9.366931] #PF: supervisor read access in kernel mode
[    9.366993] #PF: error_code(0x0000) - not-present page
[    9.367165] PGD 0 P4D 0
[    9.367305] Oops: Oops: 0000 [#1] SMP PTI
[    9.367686] CPU: 0 UID: 0 PID: 217 Comm: kunit_try_catch Tainted: G                 N  6.16.0-rc7-next-20250722 #1 PREEMPT(voluntary)
[    9.367846] Tainted: [N]=TEST
[    9.367891] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    9.368063] RIP: 0010:ext4_mb_release+0x26e/0x510
[    9.368374] Code: 28 4a cb ff e8 03 5a cf ff 31 db 48 8d 3c 9b 48 83 c3 01 48 c1 e7 04 48 03 bd 60 05 00 00 e8 c9 a6 48 01 48 8b 85 68 03 00 00 <0f> b6 40 14 83 c0 02 39 d8 7f d6 48 8b bd 60 05 00 00 31 db e8 d9
[    9.368581] RSP: 0000:ffffb33b8041fe40 EFLAGS: 00010286
[    9.368659] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[    9.368732] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9a319e36
[    9.368802] RBP: ffff8b89c3502400 R08: 0000000000000001 R09: 0000000000000000
[    9.368872] R10: 0000000000000001 R11: 0000000000000120 R12: ffff8b89c2f49160
[    9.368941] R13: ffff8b89c2f49158 R14: ffff8b89c2f24000 R15: ffff8b89c2f24000
[    9.369042] FS:  0000000000000000(0000) GS:ffff8b8a3381a000(0000) knlGS:0000000000000000
[    9.369127] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.369194] CR2: 0000000000000014 CR3: 0000000009a9c000 CR4: 00000000003506f0
[    9.369324] Call Trace:
[    9.369440]  <TASK>
[    9.369637]  mbt_kunit_exit+0x47/0xf0
[    9.369745]  ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10
[    9.369813]  kunit_try_run_case_cleanup+0x2f/0x40
[    9.369865]  kunit_generic_run_threadfn_adapter+0x1c/0x40
[    9.369922]  kthread+0x10b/0x230
[    9.369965]  ? __pfx_kthread+0x10/0x10
[    9.370013]  ret_from_fork+0x165/0x1b0
[    9.370057]  ? __pfx_kthread+0x10/0x10
[    9.370099]  ret_from_fork_asm+0x1a/0x30
[    9.370188]  </TASK>
[    9.370250] Modules linked in:
[    9.370428] CR2: 0000000000000014
[    9.370657] ---[ end trace 0000000000000000 ]---
[    9.370791] RIP: 0010:ext4_mb_release+0x26e/0x510
[    9.370847] Code: 28 4a cb ff e8 03 5a cf ff 31 db 48 8d 3c 9b 48 83 c3 01 48 c1 e7 04 48 03 bd 60 05 00 00 e8 c9 a6 48 01 48 8b 85 68 03 00 00 <0f> b6 40 14 83 c0 02 39 d8 7f d6 48 8b bd 60 05 00 00 31 db e8 d9
[    9.370996] RSP: 0000:ffffb33b8041fe40 EFLAGS: 00010286
[    9.371050] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[    9.371112] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9a319e36
[    9.371174] RBP: ffff8b89c3502400 R08: 0000000000000001 R09: 0000000000000000
[    9.371235] R10: 0000000000000001 R11: 0000000000000120 R12: ffff8b89c2f49160
[    9.371297] R13: ffff8b89c2f49158 R14: ffff8b89c2f24000 R15: ffff8b89c2f24000
[    9.371358] FS:  0000000000000000(0000) GS:ffff8b8a3381a000(0000) knlGS:0000000000000000
[    9.371428] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.371484] CR2: 0000000000000014 CR3: 0000000009a9c000 CR4: 00000000003506f0
[    9.371598] note: kunit_try_catch[217] exited with irqs disabled
[    9.371861]     # test_new_blocks_simple: try faulted: last line seen fs/ext4/mballoc-test.c:452
[    9.372123]     # test_new_blocks_simple: internal error occurred during test case cleanup: -4
[    9.372440]         not ok 1 block_bits=10 cluster_bits=3 blocks_per_group=8192 group_count=4 desc_size=64
[    9.375702] BUG: kernel NULL pointer dereference, address: 0000000000000014
[    9.375782] #PF: supervisor read access in kernel mode
[    9.375832] #PF: error_code(0x0000) - not-present page
[    9.375881] PGD 0 P4D 0 
[    9.375919] Oops: Oops: 0000 [#2] SMP PTI
[    9.375966] CPU: 0 UID: 0 PID: 219 Comm: kunit_try_catch Tainted: G      D          N  6.16.0-rc7-next-20250722 #1 PREEMPT(voluntary) 
[    9.376085] Tainted: [D]=DIE, [N]=TEST
[    9.376123] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[    9.376220] RIP: 0010:ext4_mb_release+0x26e/0x510
[    9.376275] Code: 28 4a cb ff e8 03 5a cf ff 31 db 48 8d 3c 9b 48 83 c3 01 48 c1 e7 04 48 03 bd 60 05 00 00 e8 c9 a6 48 01 48 8b 85 68 03 00 00 <0f> b6 40 14 83 c0 02 39 d8 7f d6 48 8b bd 60 05 00 00 31 db e8 d9
[    9.376425] RSP: 0000:ffffb33b803f7e40 EFLAGS: 00010286
[    9.376482] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[    9.376546] RDX: 0000000002000008 RSI: ffffffff9a319e36 RDI: ffffffff9a319e36
[    9.376608] RBP: ffff8b89c352a400 R08: 0000000000000000 R09: 0000000000000000
[    9.376669] R10: 0000000000000000 R11: 0000000058d996d7 R12: ffff8b89c2f49cc0
[    9.376730] R13: ffff8b89c2f49cb8 R14: ffff8b89c3524000 R15: ffff8b89c3524000
[    9.376792] FS:  0000000000000000(0000) GS:ffff8b8a3381a000(0000) knlGS:0000000000000000
[    9.376861] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.376913] CR2: 0000000000000014 CR3: 0000000009a9c000 CR4: 00000000003506f0
[    9.376975] Call Trace:
[    9.377004]  <TASK>
[    9.377040]  mbt_kunit_exit+0x47/0xf0
[    9.377089]  ? __pfx_kunit_generic_run_threadfn_adapter+0x10/0x10
[    9.377150]  kunit_try_run_case_cleanup+0x2f/0x40
[    9.377207]  kunit_generic_run_threadfn_adapter+0x1c/0x40
[    9.377266]  kthread+0x10b/0x230
[    9.377308]  ? __pfx_kthread+0x10/0x10
[    9.377353]  ret_from_fork+0x165/0x1b0
[    9.377397]  ? __pfx_kthread+0x10/0x10
[    9.377439]  ret_from_fork_asm+0x1a/0x30
[    9.377505]  </TASK>
[    9.377531] Modules linked in:
[    9.377571] CR2: 0000000000000014
[    9.377609] ---[ end trace 0000000000000000 ]---

---
Bisect log:

# bad: [a933d3dc1968fcfb0ab72879ec304b1971ed1b9a] Add linux-next specific files for 20250723
# good: [89be9a83ccf1f88522317ce02f854f30d6115c41] Linux 6.16-rc7
git bisect start 'HEAD' 'v6.16-rc7'
# bad: [a56f8f8967ad980d45049973561b89dcd9e37e5d] Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect bad a56f8f8967ad980d45049973561b89dcd9e37e5d
# bad: [f6a8dede4030970707e9bae5b3ae76f60df4b75a] Merge branch 'fs-next' of linux-next
git bisect bad f6a8dede4030970707e9bae5b3ae76f60df4b75a
# good: [b863560c5a26fbcf164f5759c98bb5e72e26848d] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git
git bisect good b863560c5a26fbcf164f5759c98bb5e72e26848d
# bad: [690056682cc4de56d8de794bc06a3c04bc7f624b] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs.git
git bisect bad 690056682cc4de56d8de794bc06a3c04bc7f624b
# good: [fea76c3eb7455d1e941fba6fdd89ab41ab7797c8] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux.git
git bisect good fea76c3eb7455d1e941fba6fdd89ab41ab7797c8
# bad: [714a183e8cf1cc1ddddb3318de1694a33f49c694] Merge branch 'dev' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs.git
git bisect bad 714a183e8cf1cc1ddddb3318de1694a33f49c694
# good: [5fb60c0365c4dad347e4958f78976cb733d903f2] f2fs: Pass a folio to __has_merged_page()
git bisect good 5fb60c0365c4dad347e4958f78976cb733d903f2
# bad: [a8a47fa84cc2168b2b3bd645c2c0918eed994fc0] ext4: do not BUG when INLINE_DATA_FL lacks system.data xattr
git bisect bad a8a47fa84cc2168b2b3bd645c2c0918eed994fc0
# good: [a35454ecf8a320c49954fdcdae0e8d3323067632] ext4: use memcpy() instead of strcpy()
git bisect good a35454ecf8a320c49954fdcdae0e8d3323067632
# good: [3772fe7b4225f21a1bfe63e4a338702cc3c153de] ext4: convert sbi->s_mb_free_pending to atomic_t
git bisect good 3772fe7b4225f21a1bfe63e4a338702cc3c153de
# good: [12a5b877c314778ddf9a5c603eeb1803a514ab58] ext4: factor out ext4_mb_might_prefetch()
git bisect good 12a5b877c314778ddf9a5c603eeb1803a514ab58
# bad: [458bfb991155c2e8ba51861d1ef3c81c5a0846f9] ext4: convert free groups order lists to xarrays
git bisect bad 458bfb991155c2e8ba51861d1ef3c81c5a0846f9
# good: [6e0275f6e713f55dd3fc23be317ec11f8db1766d] ext4: factor out ext4_mb_scan_group()
git bisect good 6e0275f6e713f55dd3fc23be317ec11f8db1766d
# first bad commit: [458bfb991155c2e8ba51861d1ef3c81c5a0846f9] ext4: convert free groups order lists to xarrays
Re: [PATCH v3 15/17] ext4: convert free groups order lists to xarrays
Posted by Theodore Ts'o 2 months, 2 weeks ago
On Wed, Jul 23, 2025 at 08:55:14PM -0700, Guenter Roeck wrote:
> Hi,
> 
> On Mon, Jul 14, 2025 at 09:03:25PM +0800, Baokun Li wrote:
> > While traversing the list, holding a spin_lock prevents load_buddy, making
> > direct use of ext4_try_lock_group impossible. This can lead to a bouncing
> > scenario where spin_is_locked(grp_A) succeeds, but ext4_try_lock_group()
> > fails, forcing the list traversal to repeatedly restart from grp_A.
> > 
> 
> This patch causes crashes for pretty much every architecture when
> running unit tests as part of booting.

I'm assuming that you're using a randconfig that happened to enable
CONFIG_EXT4_KUNIT_TESTS=y.

A simpler reproducer is to have a .kunitconfig containing:

CONFIG_KUNIT=y
CONFIG_KUNIT_TEST=y
CONFIG_KUNIT_EXAMPLE_TEST=y
CONFIG_EXT4_KUNIT_TESTS=y

... and then run "./tools/testing/kunit/kunit.py run".

The first failure is actually with [11/17] ext4: fix largest free
orders lists corruption on mb_optimize_scan switch, which triggers a
failure of test_mb_mark_used.

Baokun, can you take a look please?   Many thanks!

	    	       	    	      - Ted
Re: [PATCH v3 15/17] ext4: convert free groups order lists to xarrays
Posted by Zhang Yi 2 months, 2 weeks ago
On 2025/7/24 12:54, Theodore Ts'o wrote:
> On Wed, Jul 23, 2025 at 08:55:14PM -0700, Guenter Roeck wrote:
>> Hi,
>>
>> On Mon, Jul 14, 2025 at 09:03:25PM +0800, Baokun Li wrote:
>>> While traversing the list, holding a spin_lock prevents load_buddy, making
>>> direct use of ext4_try_lock_group impossible. This can lead to a bouncing
>>> scenario where spin_is_locked(grp_A) succeeds, but ext4_try_lock_group()
>>> fails, forcing the list traversal to repeatedly restart from grp_A.
>>>
>>
>> This patch causes crashes for pretty much every architecture when
>> running unit tests as part of booting.
> 
> I'm assuming that you're using a randconfig that happened to enable
> CONFIG_EXT4_KUNIT_TESTS=y.
> 
> A simpler reproducer is to have a .kunitconfig containing:
> 
> CONFIG_KUNIT=y
> CONFIG_KUNIT_TEST=y
> CONFIG_KUNIT_EXAMPLE_TEST=y
> CONFIG_EXT4_KUNIT_TESTS=y
> 
> ... and then run "./tools/testing/kunit/kunit.py run".
> 
> The first failure is actually with [11/17] ext4: fix largest free
> orders lists corruption on mb_optimize_scan switch, which triggers a
> failure of test_mb_mark_used.
> 
> Baokun, can you take a look please?   Many thanks!
> 

Hi Ted and Guenter,

I'm sorry for this regression, we didn't run these tests. Baokun is
currently on a business trip, so I am helping to look into this issue.
The reason for the failure is that the variable initialization in the
mb unit tests is insufficient, but this series relies on it.

Could you please try the following diff? I have tested it on my
machine, and the issue does not recur. If everything looks fine, I
will send out the official patch.

Thanks,
Yi.


diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
index d634c12f1984..a9416b20ff64 100644
--- a/fs/ext4/mballoc-test.c
+++ b/fs/ext4/mballoc-test.c
@@ -155,6 +155,7 @@ static struct super_block *mbt_ext4_alloc_super_block(void)
 	bgl_lock_init(sbi->s_blockgroup_lock);

 	sbi->s_es = &fsb->es;
+	sbi->s_sb = sb;
 	sb->s_fs_info = sbi;

 	up_write(&sb->s_umount);
@@ -802,6 +803,8 @@ static void test_mb_mark_used(struct kunit *test)
 	KUNIT_ASSERT_EQ(test, ret, 0);

 	grp->bb_free = EXT4_CLUSTERS_PER_GROUP(sb);
+	grp->bb_largest_free_order = -1;
+	grp->bb_avg_fragment_size_order = -1;
 	mbt_generate_test_ranges(sb, ranges, TEST_RANGE_COUNT);
 	for (i = 0; i < TEST_RANGE_COUNT; i++)
 		test_mb_mark_used_range(test, &e4b, ranges[i].start,
@@ -875,6 +878,8 @@ static void test_mb_free_blocks(struct kunit *test)
 	ext4_unlock_group(sb, TEST_GOAL_GROUP);

 	grp->bb_free = 0;
+	grp->bb_largest_free_order = -1;
+	grp->bb_avg_fragment_size_order = -1;
 	memset(bitmap, 0xff, sb->s_blocksize);

 	mbt_generate_test_ranges(sb, ranges, TEST_RANGE_COUNT);
Re: [PATCH v3 15/17] ext4: convert free groups order lists to xarrays
Posted by Theodore Ts'o 2 months, 2 weeks ago
On Thu, Jul 24, 2025 at 07:14:58PM +0800, Zhang Yi wrote:
> 
> I'm sorry for this regression, we didn't run these tests.

No worries, I didn't run them either.

> Could you please try the following diff? I have tested it on my
> machine, and the issue does not recur. If everything looks fine, I
> will send out the official patch.

This patch fixes the test bug which was causing the failure of
test_new_blocks_simple.

However, there is still a test failure of test_mb_mark_used in the patch
series starting with bbe11dd13a3f ("ext4: fix largest free orders
lists corruption on mb_optimize_scan switch").  The test failure is
fixed by 458bfb991155 ("ext4: convert free groups order lists to
xarrays").  The reason why this is especially problematic is that the
commit which introduced the problem is marked as "cc: stable", which
means it will get backported to LTS kernels, thus introducing a
potential bug.

One of the advantages of unit tests is that they are lightweight
enough that it is tractable to run them against every commit in the
patch series.  So we should strive to add more unit tests, since it
makes it easier to detect regressions.

Anyway, here's the stack trace starting with "ext4: fix largest free
orders lists corruption on mb_optimize_scan switch".  Could you
investigate this failure?  Many thanks!!

						- Ted

[09:35:46] ==================== test_mb_mark_used  ====================
[09:35:46] [ERROR] Test: test_mb_mark_used: missing subtest result line!
[09:35:46] 
[09:35:46] Pid: 35, comm: kunit_try_catch Tainted: G        W        N  6.16.0-rc4-00031-gbbe11dd13a3f-dirty
[09:35:46] RIP: 0033:mb_set_largest_free_order+0x5c/0xc0
[09:35:46] RSP: 00000000a0883d98  EFLAGS: 00010206
[09:35:46] RAX: 0000000060aeaa28 RBX: 0000000060a2d400 RCX: 0000000000000008
[09:35:46] RDX: 0000000060aea9c0 RSI: 0000000000000000 RDI: 0000000060864000
[09:35:46] RBP: 0000000060aea9c0 R08: 0000000000000000 R09: 0000000060a2d400
[09:35:46] R10: 0000000000000400 R11: 0000000060a9cc00 R12: 0000000000000006
[09:35:46] R13: 0000000000000400 R14: 0000000000000305 R15: 0000000000000000
[09:35:46] Kernel panic - not syncing: Segfault with no mm
[09:35:46] CPU: 0 UID: 0 PID: 35 Comm: kunit_try_catch Tainted: G        W        N  6.16.0-rc4-00031-gbbe11dd13a3f-dirty #36 NONE
[09:35:46] Tainted: [W]=WARN, [N]=TEST
[09:35:46] Stack:
[09:35:46]  60210c60 00000200 60a9e400 00000400
[09:35:46]  40060300280 60864000 60a9cc00 60a2d400
[09:35:46]  00000400 60aea9c0 60a9cc00 60aea9c0
[09:35:46] Call Trace:
[09:35:46]  [<60210c60>] ? ext4_mb_generate_buddy+0x1f0/0x230
[09:35:46]  [<60215c3b>] ? test_mb_mark_used+0x28b/0x4e0
[09:35:46]  [<601df5bc>] ? ext4_get_group_desc+0xbc/0x150
[09:35:46]  [<600bf1c0>] ? ktime_get_ts64+0x0/0x190
[09:35:46]  [<60086370>] ? to_kthread+0x0/0x40
[09:35:46]  [<602b559b>] ? kunit_try_run_case+0x7b/0x100
[09:35:46]  [<60086370>] ? to_kthread+0x0/0x40
[09:35:46]  [<602b7850>] ? kunit_generic_run_threadfn_adapter+0x0/0x30
[09:35:46]  [<602b7862>] ? kunit_generic_run_threadfn_adapter+0x12/0x30
[09:35:46]  [<60086a51>] ? kthread+0xf1/0x250
[09:35:46]  [<6004a541>] ? new_thread_handler+0x41/0x60
[09:35:46] [ERROR] Test: test_mb_mark_used: 0 tests run!
[09:35:46] ============= [NO TESTS RUN] test_mb_mark_used =============
Re: [PATCH v3 15/17] ext4: convert free groups order lists to xarrays
Posted by Zhang Yi 2 months, 2 weeks ago
On 2025/7/24 22:54, Theodore Ts'o wrote:
> On Thu, Jul 24, 2025 at 07:14:58PM +0800, Zhang Yi wrote:
>>
>> I'm sorry for this regression, we didn't run these tests.
> 
> No worries, I didn't run them either.
> 
>> Could you please try the following diff? I have tested it on my
>> machine, and the issue does not recur. If everything looks fine, I
>> will send out the official patch.
> 
> This patch fixes the test bug which was causing the failure of
> test_new_blocks_simple.
> 

The official patch to fix test_new_blocks_simple for the next
branch:

https://lore.kernel.org/linux-ext4/20250725021550.3177573-1-yi.zhang@huaweicloud.com/

> However, there is still a test failure of test_mb_mark_used in the patch
> series starting with bbe11dd13a3f ("ext4: fix largest free orders
> lists corruption on mb_optimize_scan switch").  The test failure is
> fixed by 458bfb991155 ("ext4: convert free groups order lists to
> xarrays").  The reason why this is especially problematic is that the
> commit which introduced the problem is marked as "cc: stable", which
> means it will get backported to LTS kernels, thus introducing a
> potential bug.
> 

Indeed!

> One of the advantages of unit tests is that they are lightweight
> enough that it is tractable to run them against every commit in the
> patch series.  So we should strive to add more unit tests, since it
> makes it easier to detect regressions.
> 
> Anyway, here's the stack trace starting with "ext4: fix largest free
> orders lists corruption on mb_optimize_scan switch".  Could you
> investigate this failure?  Many thanks!!
> 

Sure! I've sent out the fix that applies to the kernel that has only
merged bbe11dd13a3f ("ext4: fix largest free orders lists corruption
on mb_optimize_scan switch"), but not merged 458bfb991155 ("ext4:
convert free groups order lists to xarrays"). Please give it a try.

https://lore.kernel.org/linux-ext4/20250725021654.3188798-1-yi.zhang@huaweicloud.com/

Best Regards,
Yi.

> 
> [09:35:46] ==================== test_mb_mark_used  ====================
> [09:35:46] [ERROR] Test: test_mb_mark_used: missing subtest result line!
> [09:35:46] 
> [09:35:46] Pid: 35, comm: kunit_try_catch Tainted: G        W        N  6.16.0-rc4-00031-gbbe11dd13a3f-dirty
> [09:35:46] RIP: 0033:mb_set_largest_free_order+0x5c/0xc0
> [09:35:46] RSP: 00000000a0883d98  EFLAGS: 00010206
> [09:35:46] RAX: 0000000060aeaa28 RBX: 0000000060a2d400 RCX: 0000000000000008
> [09:35:46] RDX: 0000000060aea9c0 RSI: 0000000000000000 RDI: 0000000060864000
> [09:35:46] RBP: 0000000060aea9c0 R08: 0000000000000000 R09: 0000000060a2d400
> [09:35:46] R10: 0000000000000400 R11: 0000000060a9cc00 R12: 0000000000000006
> [09:35:46] R13: 0000000000000400 R14: 0000000000000305 R15: 0000000000000000
> [09:35:46] Kernel panic - not syncing: Segfault with no mm
> [09:35:46] CPU: 0 UID: 0 PID: 35 Comm: kunit_try_catch Tainted: G        W        N  6.16.0-rc4-00031-gbbe11dd13a3f-dirty #36 NONE
> [09:35:46] Tainted: [W]=WARN, [N]=TEST
> [09:35:46] Stack:
> [09:35:46]  60210c60 00000200 60a9e400 00000400
> [09:35:46]  40060300280 60864000 60a9cc00 60a2d400
> [09:35:46]  00000400 60aea9c0 60a9cc00 60aea9c0
> [09:35:46] Call Trace:
> [09:35:46]  [<60210c60>] ? ext4_mb_generate_buddy+0x1f0/0x230
> [09:35:46]  [<60215c3b>] ? test_mb_mark_used+0x28b/0x4e0
> [09:35:46]  [<601df5bc>] ? ext4_get_group_desc+0xbc/0x150
> [09:35:46]  [<600bf1c0>] ? ktime_get_ts64+0x0/0x190
> [09:35:46]  [<60086370>] ? to_kthread+0x0/0x40
> [09:35:46]  [<602b559b>] ? kunit_try_run_case+0x7b/0x100
> [09:35:46]  [<60086370>] ? to_kthread+0x0/0x40
> [09:35:46]  [<602b7850>] ? kunit_generic_run_threadfn_adapter+0x0/0x30
> [09:35:46]  [<602b7862>] ? kunit_generic_run_threadfn_adapter+0x12/0x30
> [09:35:46]  [<60086a51>] ? kthread+0xf1/0x250
> [09:35:46]  [<6004a541>] ? new_thread_handler+0x41/0x60
> [09:35:46] [ERROR] Test: test_mb_mark_used: 0 tests run!
> [09:35:46] ============= [NO TESTS RUN] test_mb_mark_used =============
>
Re: [PATCH v3 15/17] ext4: convert free groups order lists to xarrays
Posted by Baokun Li 2 months, 1 week ago
On 7/25/2025 10:28 AM, Zhang Yi wrote:
> On 2025/7/24 22:54, Theodore Ts'o wrote:
>> On Thu, Jul 24, 2025 at 07:14:58PM +0800, Zhang Yi wrote:
>>> I'm sorry for this regression, we didn't run these tests.
>> No worries, I didn't run them either.
>>
>>> Could you please try the following diff? I have tested it on my
>>> machine, and the issue does not recur. If everything looks fine, I
>>> will send out the official patch.
>> This patch fixes the test bug which was causing the failure of
>> test_new_blocks_simple.
>>
> The official patch to fix test_new_blocks_simple for the next
> branch:
>
> https://lore.kernel.org/linux-ext4/20250725021550.3177573-1-yi.zhang@huaweicloud.com/
>
>> However, there is still a test failure of test_mb_mark_used in the patch
>> series starting with bbe11dd13a3f ("ext4: fix largest free orders
>> lists corruption on mb_optimize_scan switch").  The test failure is
>> fixed by 458bfb991155 ("ext4: convert free groups order lists to
>> xarrays").  The reason why this is especially problematic is that the
>> commit which introduced the problem is marked as "cc: stable", which
>> means it will get backported to LTS kernels, thus introducing a
>> potential bug.
>>
> Indeed!
>
>> One of the advantages of unit tests is that they are lightweight
>> enough that it is tractable to run them against every commit in the
>> patch series.  So we should strive to add more unit tests, since it
>> makes it easier to detect regressions.
>>
>> Anyway, here's the stack trace starting with "ext4: fix largest free
>> orders lists corruption on mb_optimize_scan switch".  Could you
>> investigate this failure?  Many thanks!!
>>
> Sure! I've sent out the fix that applies to the kernel that has only
> merged bbe11dd13a3f ("ext4: fix largest free orders lists corruption
> on mb_optimize_scan switch"), but not merged 458bfb991155 ("ext4:
> convert free groups order lists to xarrays"). Please give it a try.
>
> https://lore.kernel.org/linux-ext4/20250725021654.3188798-1-yi.zhang@huaweicloud.com/
>
Sorry for the late reply; I haven't had time to look into this this week.
I really appreciate Yi taking the time to help address these issues.
I'm also very sorry for introducing a regression in the ext4 kunit tests.


Thanks,
Baokun

>
>> [09:35:46] ==================== test_mb_mark_used  ====================
>> [09:35:46] [ERROR] Test: test_mb_mark_used: missing subtest result line!
>> [09:35:46]
>> [09:35:46] Pid: 35, comm: kunit_try_catch Tainted: G        W        N  6.16.0-rc4-00031-gbbe11dd13a3f-dirty
>> [09:35:46] RIP: 0033:mb_set_largest_free_order+0x5c/0xc0
>> [09:35:46] RSP: 00000000a0883d98  EFLAGS: 00010206
>> [09:35:46] RAX: 0000000060aeaa28 RBX: 0000000060a2d400 RCX: 0000000000000008
>> [09:35:46] RDX: 0000000060aea9c0 RSI: 0000000000000000 RDI: 0000000060864000
>> [09:35:46] RBP: 0000000060aea9c0 R08: 0000000000000000 R09: 0000000060a2d400
>> [09:35:46] R10: 0000000000000400 R11: 0000000060a9cc00 R12: 0000000000000006
>> [09:35:46] R13: 0000000000000400 R14: 0000000000000305 R15: 0000000000000000
>> [09:35:46] Kernel panic - not syncing: Segfault with no mm
>> [09:35:46] CPU: 0 UID: 0 PID: 35 Comm: kunit_try_catch Tainted: G        W        N  6.16.0-rc4-00031-gbbe11dd13a3f-dirty #36 NONE
>> [09:35:46] Tainted: [W]=WARN, [N]=TEST
>> [09:35:46] Stack:
>> [09:35:46]  60210c60 00000200 60a9e400 00000400
>> [09:35:46]  40060300280 60864000 60a9cc00 60a2d400
>> [09:35:46]  00000400 60aea9c0 60a9cc00 60aea9c0
>> [09:35:46] Call Trace:
>> [09:35:46]  [<60210c60>] ? ext4_mb_generate_buddy+0x1f0/0x230
>> [09:35:46]  [<60215c3b>] ? test_mb_mark_used+0x28b/0x4e0
>> [09:35:46]  [<601df5bc>] ? ext4_get_group_desc+0xbc/0x150
>> [09:35:46]  [<600bf1c0>] ? ktime_get_ts64+0x0/0x190
>> [09:35:46]  [<60086370>] ? to_kthread+0x0/0x40
>> [09:35:46]  [<602b559b>] ? kunit_try_run_case+0x7b/0x100
>> [09:35:46]  [<60086370>] ? to_kthread+0x0/0x40
>> [09:35:46]  [<602b7850>] ? kunit_generic_run_threadfn_adapter+0x0/0x30
>> [09:35:46]  [<602b7862>] ? kunit_generic_run_threadfn_adapter+0x12/0x30
>> [09:35:46]  [<60086a51>] ? kthread+0xf1/0x250
>> [09:35:46]  [<6004a541>] ? new_thread_handler+0x41/0x60
>> [09:35:46] [ERROR] Test: test_mb_mark_used: 0 tests run!
>> [09:35:46] ============= [NO TESTS RUN] test_mb_mark_used =============
>>
Re: [PATCH v3 15/17] ext4: convert free groups order lists to xarrays
Posted by Guenter Roeck 2 months, 2 weeks ago
On Thu, Jul 24, 2025 at 07:14:58PM +0800, Zhang Yi wrote:
> On 2025/7/24 12:54, Theodore Ts'o wrote:
> > On Wed, Jul 23, 2025 at 08:55:14PM -0700, Guenter Roeck wrote:
> >> Hi,
> >>
> >> On Mon, Jul 14, 2025 at 09:03:25PM +0800, Baokun Li wrote:
> >>> While traversing the list, holding a spin_lock prevents load_buddy, making
> >>> direct use of ext4_try_lock_group impossible. This can lead to a bouncing
> >>> scenario where spin_is_locked(grp_A) succeeds, but ext4_try_lock_group()
> >>> fails, forcing the list traversal to repeatedly restart from grp_A.
> >>>
> >>
> >> This patch causes crashes for pretty much every architecture when
> >> running unit tests as part of booting.
> > 
> > I'm assuming that you're using a randconfig that happened to enable
> > CONFIG_EXT4_KUNIT_TESTS=y.
> > 
> > A simpler reproducer is to have a .kunitconfig containing:
> > 
> > CONFIG_KUNIT=y
> > CONFIG_KUNIT_TEST=y
> > CONFIG_KUNIT_EXAMPLE_TEST=y
> > CONFIG_EXT4_KUNIT_TESTS=y
> > 
> > ... and then run "./tools/testing/kunit/kunit.py run".
> > 
> > The first failure is actually with [11/17] ext4: fix largest free
> > orders lists corruption on mb_optimize_scan switch, which triggers a
> > failure of test_mb_mark_used.
> > 
> > Baokun, can you take a look please?   Many thanks!
> > 
> 
> Hi Ted and Guenter,
> 
> I'm sorry for this regression, we didn't run these tests. Baokun is
> currently on a business trip, so I am helping to look into this issue.
> The reason for the failure is that the variable initialization in the
> mb unit tests is insufficient, but this series relies on it.
> 
> Could you please try the following diff? I have tested it on my
> machine, and the issue does not recur. If everything looks fine, I
> will send out the official patch.
> 

Confirmed to fix the problem. Please feel free to add

Tested-by: Guenter Roeck <linux@roeck-us.net>

Thanks,
Guenter

> Thanks,
> Yi.
> 
> 
> diff --git a/fs/ext4/mballoc-test.c b/fs/ext4/mballoc-test.c
> index d634c12f1984..a9416b20ff64 100644
> --- a/fs/ext4/mballoc-test.c
> +++ b/fs/ext4/mballoc-test.c
> @@ -155,6 +155,7 @@ static struct super_block *mbt_ext4_alloc_super_block(void)
>  	bgl_lock_init(sbi->s_blockgroup_lock);
> 
>  	sbi->s_es = &fsb->es;
> +	sbi->s_sb = sb;
>  	sb->s_fs_info = sbi;
> 
>  	up_write(&sb->s_umount);
> @@ -802,6 +803,8 @@ static void test_mb_mark_used(struct kunit *test)
>  	KUNIT_ASSERT_EQ(test, ret, 0);
> 
>  	grp->bb_free = EXT4_CLUSTERS_PER_GROUP(sb);
> +	grp->bb_largest_free_order = -1;
> +	grp->bb_avg_fragment_size_order = -1;
>  	mbt_generate_test_ranges(sb, ranges, TEST_RANGE_COUNT);
>  	for (i = 0; i < TEST_RANGE_COUNT; i++)
>  		test_mb_mark_used_range(test, &e4b, ranges[i].start,
> @@ -875,6 +878,8 @@ static void test_mb_free_blocks(struct kunit *test)
>  	ext4_unlock_group(sb, TEST_GOAL_GROUP);
> 
>  	grp->bb_free = 0;
> +	grp->bb_largest_free_order = -1;
> +	grp->bb_avg_fragment_size_order = -1;
>  	memset(bitmap, 0xff, sb->s_blocksize);
> 
>  	mbt_generate_test_ranges(sb, ranges, TEST_RANGE_COUNT);
> 
> 
>
Re: [PATCH v3 15/17] ext4: convert free groups order lists to xarrays
Posted by Guenter Roeck 2 months, 2 weeks ago
On 7/23/25 21:54, Theodore Ts'o wrote:
> On Wed, Jul 23, 2025 at 08:55:14PM -0700, Guenter Roeck wrote:
>> Hi,
>>
>> On Mon, Jul 14, 2025 at 09:03:25PM +0800, Baokun Li wrote:
>>> While traversing the list, holding a spin_lock prevents load_buddy, making
>>> direct use of ext4_try_lock_group impossible. This can lead to a bouncing
>>> scenario where spin_is_locked(grp_A) succeeds, but ext4_try_lock_group()
>>> fails, forcing the list traversal to repeatedly restart from grp_A.
>>>
>>
>> This patch causes crashes for pretty much every architecture when
>> running unit tests as part of booting.
> 
> I'm assuming that you're using a randconfig that happened to enable
> CONFIG_EXT4_KUNIT_TESTS=y.
> 

I enable as many kunit tests as possible, including CONFIG_EXT4_KUNIT_TESTS=y,
on top of various defconfigs. That results in:
	total: 637 pass: 59 fail: 578
with my qemu boot tests, which in a way is quite impressive ;-).

Guenter
Re: [PATCH v3 15/17] ext4: convert free groups order lists to xarrays
Posted by Jan Kara 2 months, 2 weeks ago
On Mon 14-07-25 21:03:25, Baokun Li wrote:
> While traversing the list, holding a spin_lock prevents load_buddy, making
> direct use of ext4_try_lock_group impossible. This can lead to a bouncing
> scenario where spin_is_locked(grp_A) succeeds, but ext4_try_lock_group()
> fails, forcing the list traversal to repeatedly restart from grp_A.
> 
> In contrast, linear traversal directly uses ext4_try_lock_group(),
> avoiding this bouncing. Therefore, we need a lockless, ordered traversal
> to achieve linear-like efficiency.
> 
> To that end, this commit converts both the average fragment size lists
> and the largest free order lists into ordered xarrays.
> 
> In an xarray, the index represents the block group number and the value
> holds the block group information; a non-empty value indicates the block
> group's presence.
> 
> While insertion and deletion complexity remain O(1), lookup complexity
> changes from O(1) to O(nlogn), which may slightly reduce single-threaded
> performance.
> 
> Additionally, xarray insertions might fail, potentially due to memory
> allocation issues. However, since we have linear traversal as a fallback,
> this isn't a major problem. Therefore, we've only added a warning message
> for insertion failures here.
> 
> A helper function ext4_mb_find_good_group_xarray() is added to find good
> groups in the specified xarray starting at the specified position start,
> and when it reaches ngroups-1, it wraps around to 0 and then to start-1.
> This ensures an ordered traversal within the xarray.
> 
> Performance test results are as follows: Single-process operations
> on an empty disk show negligible impact, while multi-process workloads
> demonstrate a noticeable performance gain.
> 
> |CPU: Kunpeng 920   |          P80           |            P1           |
> |Memory: 512GB      |------------------------|-------------------------|
> |960GB SSD (0.5GB/s)| base  |    patched     | base   |    patched     |
> |-------------------|-------|----------------|--------|----------------|
> |mb_optimize_scan=0 | 20097 | 19555 (-2.6%)  | 316141 | 315636 (-0.2%) |
> |mb_optimize_scan=1 | 13318 | 15496 (+16.3%) | 325273 | 323569 (-0.5%) |
> 
> |CPU: AMD 9654 * 2  |          P96           |             P1          |
> |Memory: 1536GB     |------------------------|-------------------------|
> |960GB SSD (1GB/s)  | base  |    patched     | base   |    patched     |
> |-------------------|-------|----------------|--------|----------------|
> |mb_optimize_scan=0 | 53603 | 53192 (-0.7%)  | 214243 | 212678 (-0.7%) |
> |mb_optimize_scan=1 | 20887 | 37636 (+80.1%) | 213632 | 214189 (+0.2%) |
> 
> Signed-off-by: Baokun Li <libaokun1@huawei.com>

The patch looks good and the results are nice. I've just noticed two typos:

> +static inline void ext4_mb_avg_fragment_size_destory(struct ext4_sb_info *sbi)
						^^^ destroy


> +{
> +	for (int i = 0; i < MB_NUM_ORDERS(sbi->s_sb); i++)
> +		xa_destroy(&sbi->s_mb_avg_fragment_size[i]);
> +	kfree(sbi->s_mb_avg_fragment_size);
> +}
> +
> +static inline void ext4_mb_largest_free_orders_destory(struct ext4_sb_info *sbi)
						  ^^^ destroy

> +{
> +	for (int i = 0; i < MB_NUM_ORDERS(sbi->s_sb); i++)
> +		xa_destroy(&sbi->s_mb_largest_free_orders[i]);
> +	kfree(sbi->s_mb_largest_free_orders);
> +}

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
Re: [PATCH v3 15/17] ext4: convert free groups order lists to xarrays
Posted by Baokun Li 2 months, 2 weeks ago
On 2025/7/21 19:07, Jan Kara wrote:
> On Mon 14-07-25 21:03:25, Baokun Li wrote:
>> |CPU: Kunpeng 920   |          P80           |            P1           |
>> |Memory: 512GB      |------------------------|-------------------------|
>> |960GB SSD (0.5GB/s)| base  |    patched     | base   |    patched     |
>> |-------------------|-------|----------------|--------|----------------|
>> |mb_optimize_scan=0 | 20097 | 19555 (-2.6%)  | 316141 | 315636 (-0.2%) |
>> |mb_optimize_scan=1 | 13318 | 15496 (+16.3%) | 325273 | 323569 (-0.5%) |
>>
>> |CPU: AMD 9654 * 2  |          P96           |             P1          |
>> |Memory: 1536GB     |------------------------|-------------------------|
>> |960GB SSD (1GB/s)  | base  |    patched     | base   |    patched     |
>> |-------------------|-------|----------------|--------|----------------|
>> |mb_optimize_scan=0 | 53603 | 53192 (-0.7%)  | 214243 | 212678 (-0.7%) |
>> |mb_optimize_scan=1 | 20887 | 37636 (+80.1%) | 213632 | 214189 (+0.2%) |
>>
>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
> The patch looks good and the results are nice. I've just noticed two typos:
>
>> +static inline void ext4_mb_avg_fragment_size_destory(struct ext4_sb_info *sbi)
> 						^^^ destroy
>
>
>> +static inline void ext4_mb_largest_free_orders_destory(struct ext4_sb_info *sbi)
> 						  ^^^ destroy

Hi Jan, thanks for the review! While examining this patch, I also
identified a comment formatting error that I regret overlooking previously.
My apologies for this oversight.

Hey Ted, could you please help apply the following diff to correct the
spelling errors and comment formatting issues? Or would you prefer I send
out a new patch series or a separate cleanup patch?


Thanks,
Baokun

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a9eb997b8c9b..c61955cba370 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -863,10 +863,10 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
  	grp->bb_avg_fragment_size_order = new;
  	if (new >= 0) {
  		/*
-		* Cannot use __GFP_NOFAIL because we hold the group lock.
-		* Although allocation for insertion may fails, it's not fatal
-		* as we have linear traversal to fall back on.
-		*/
+		 * Cannot use __GFP_NOFAIL because we hold the group lock.
+		 * Although allocation for insertion may fails, it's not fatal
+		 * as we have linear traversal to fall back on.
+		 */
  		int err = xa_insert(&sbi->s_mb_avg_fragment_size[new],
  				    grp->bb_group, grp, GFP_ATOMIC);
  		if (err)
@@ -1201,10 +1201,10 @@ mb_set_largest_free_order(struct super_block *sb, struct ext4_group_info *grp)
  	grp->bb_largest_free_order = new;
  	if (test_opt2(sb, MB_OPTIMIZE_SCAN) && new >= 0 && grp->bb_free) {
  		/*
-		* Cannot use __GFP_NOFAIL because we hold the group lock.
-		* Although allocation for insertion may fails, it's not fatal
-		* as we have linear traversal to fall back on.
-		*/
+		 * Cannot use __GFP_NOFAIL because we hold the group lock.
+		 * Although allocation for insertion may fails, it's not fatal
+		 * as we have linear traversal to fall back on.
+		 */
  		int err = xa_insert(&sbi->s_mb_largest_free_orders[new],
  				    grp->bb_group, grp, GFP_ATOMIC);
  		if (err)
@@ -3657,14 +3657,14 @@ static void ext4_discard_work(struct work_struct *work)
  		ext4_mb_unload_buddy(&e4b);
  }
  
-static inline void ext4_mb_avg_fragment_size_destory(struct ext4_sb_info *sbi)
+static inline void ext4_mb_avg_fragment_size_destroy(struct ext4_sb_info *sbi)
  {
  	for (int i = 0; i < MB_NUM_ORDERS(sbi->s_sb); i++)
  		xa_destroy(&sbi->s_mb_avg_fragment_size[i]);
  	kfree(sbi->s_mb_avg_fragment_size);
  }
  
-static inline void ext4_mb_largest_free_orders_destory(struct ext4_sb_info *sbi)
+static inline void ext4_mb_largest_free_orders_destroy(struct ext4_sb_info *sbi)
  {
  	for (int i = 0; i < MB_NUM_ORDERS(sbi->s_sb); i++)
  		xa_destroy(&sbi->s_mb_largest_free_orders[i]);
@@ -3818,8 +3818,8 @@ int ext4_mb_init(struct super_block *sb)
  	kfree(sbi->s_mb_last_groups);
  	sbi->s_mb_last_groups = NULL;
  out:
-	ext4_mb_avg_fragment_size_destory(sbi);
-	ext4_mb_largest_free_orders_destory(sbi);
+	ext4_mb_avg_fragment_size_destroy(sbi);
+	ext4_mb_largest_free_orders_destroy(sbi);
  	kfree(sbi->s_mb_offsets);
  	sbi->s_mb_offsets = NULL;
  	kfree(sbi->s_mb_maxs);
@@ -3886,8 +3886,8 @@ void ext4_mb_release(struct super_block *sb)
  		kvfree(group_info);
  		rcu_read_unlock();
  	}
-	ext4_mb_avg_fragment_size_destory(sbi);
-	ext4_mb_largest_free_orders_destory(sbi);
+	ext4_mb_avg_fragment_size_destroy(sbi);
+	ext4_mb_largest_free_orders_destroy(sbi);
  	kfree(sbi->s_mb_offsets);
  	kfree(sbi->s_mb_maxs);
  	iput(sbi->s_buddy_cache);
Re: [PATCH v3 15/17] ext4: convert free groups order lists to xarrays
Posted by Baokun Li 2 months, 2 weeks ago
On 2025/7/21 20:33, Baokun Li wrote:
> On 2025/7/21 19:07, Jan Kara wrote:
>> On Mon 14-07-25 21:03:25, Baokun Li wrote:
>>> |CPU: Kunpeng 920   |          P80           |            P1           |
>>> |Memory: 512GB      |------------------------|-------------------------|
>>> |960GB SSD (0.5GB/s)| base  |    patched     | base   |    patched     |
>>> |-------------------|-------|----------------|--------|----------------|
>>> |mb_optimize_scan=0 | 20097 | 19555 (-2.6%)  | 316141 | 315636 (-0.2%) |
>>> |mb_optimize_scan=1 | 13318 | 15496 (+16.3%) | 325273 | 323569 (-0.5%) |
>>>
>>> |CPU: AMD 9654 * 2  |          P96           |             P1          |
>>> |Memory: 1536GB     |------------------------|-------------------------|
>>> |960GB SSD (1GB/s)  | base  |    patched     | base   |    patched     |
>>> |-------------------|-------|----------------|--------|----------------|
>>> |mb_optimize_scan=0 | 53603 | 53192 (-0.7%)  | 214243 | 212678 (-0.7%) |
>>> |mb_optimize_scan=1 | 20887 | 37636 (+80.1%) | 213632 | 214189 (+0.2%) |
>>>
>>> Signed-off-by: Baokun Li <libaokun1@huawei.com>
>> The patch looks good and the results are nice. I've just noticed two 
>> typos:
>>
>>> +static inline void ext4_mb_avg_fragment_size_destory(struct 
>>> ext4_sb_info *sbi)
>>                         ^^^ destroy
>>
>>
>>> +static inline void ext4_mb_largest_free_orders_destory(struct 
>>> ext4_sb_info *sbi)
>>                           ^^^ destroy
> 
> Hi Jan, thanks for the review! While examining this patch, I also
> identified a comment formatting error that I regret overlooking previously.
> My apologies for this oversight.
> 
> Hey Ted, could you please help apply the following diff to correct the
> spelling errors and comment formatting issues? Or would you prefer I send
> out a new patch series or a separate cleanup patch?
> 
> 
Sorry, Thunderbird automatically converted tabs to spaces in the
code; try the diff below.


Thanks,
Baokun


diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a9eb997b8c9b..c61955cba370 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -863,10 +863,10 @@ mb_update_avg_fragment_size(struct super_block *sb, struct ext4_group_info *grp)
  	grp->bb_avg_fragment_size_order = new;
  	if (new >= 0) {
  		/*
-		* Cannot use __GFP_NOFAIL because we hold the group lock.
-		* Although allocation for insertion may fails, it's not fatal
-		* as we have linear traversal to fall back on.
-		*/
+		 * Cannot use __GFP_NOFAIL because we hold the group lock.
+		 * Although allocation for insertion may fails, it's not fatal
+		 * as we have linear traversal to fall back on.
+		 */
  		int err = xa_insert(&sbi->s_mb_avg_fragment_size[new],
  				    grp->bb_group, grp, GFP_ATOMIC);
  		if (err)
@@ -1201,10 +1201,10 @@ mb_set_largest_free_order(struct super_block *sb, struct ext4_group_info *grp)
  	grp->bb_largest_free_order = new;
  	if (test_opt2(sb, MB_OPTIMIZE_SCAN) && new >= 0 && grp->bb_free) {
  		/*
-		* Cannot use __GFP_NOFAIL because we hold the group lock.
-		* Although allocation for insertion may fails, it's not fatal
-		* as we have linear traversal to fall back on.
-		*/
+		 * Cannot use __GFP_NOFAIL because we hold the group lock.
+		 * Although allocation for insertion may fails, it's not fatal
+		 * as we have linear traversal to fall back on.
+		 */
  		int err = xa_insert(&sbi->s_mb_largest_free_orders[new],
  				    grp->bb_group, grp, GFP_ATOMIC);
  		if (err)
@@ -3657,14 +3657,14 @@ static void ext4_discard_work(struct work_struct *work)
  		ext4_mb_unload_buddy(&e4b);
  }

-static inline void ext4_mb_avg_fragment_size_destory(struct ext4_sb_info *sbi)
+static inline void ext4_mb_avg_fragment_size_destroy(struct ext4_sb_info *sbi)
  {
  	for (int i = 0; i < MB_NUM_ORDERS(sbi->s_sb); i++)
  		xa_destroy(&sbi->s_mb_avg_fragment_size[i]);
  	kfree(sbi->s_mb_avg_fragment_size);
  }

-static inline void ext4_mb_largest_free_orders_destory(struct ext4_sb_info *sbi)
+static inline void ext4_mb_largest_free_orders_destroy(struct ext4_sb_info *sbi)
  {
  	for (int i = 0; i < MB_NUM_ORDERS(sbi->s_sb); i++)
  		xa_destroy(&sbi->s_mb_largest_free_orders[i]);
@@ -3818,8 +3818,8 @@ int ext4_mb_init(struct super_block *sb)
  	kfree(sbi->s_mb_last_groups);
  	sbi->s_mb_last_groups = NULL;
  out:
-	ext4_mb_avg_fragment_size_destory(sbi);
-	ext4_mb_largest_free_orders_destory(sbi);
+	ext4_mb_avg_fragment_size_destroy(sbi);
+	ext4_mb_largest_free_orders_destroy(sbi);
  	kfree(sbi->s_mb_offsets);
  	sbi->s_mb_offsets = NULL;
  	kfree(sbi->s_mb_maxs);
@@ -3886,8 +3886,8 @@ void ext4_mb_release(struct super_block *sb)
  		kvfree(group_info);
  		rcu_read_unlock();
  	}
-	ext4_mb_avg_fragment_size_destory(sbi);
-	ext4_mb_largest_free_orders_destory(sbi);
+	ext4_mb_avg_fragment_size_destroy(sbi);
+	ext4_mb_largest_free_orders_destroy(sbi);
  	kfree(sbi->s_mb_offsets);
  	kfree(sbi->s_mb_maxs);
  	iput(sbi->s_buddy_cache);

Re: [PATCH v3 15/17] ext4: convert free groups order lists to xarrays
Posted by Theodore Ts'o 2 months, 2 weeks ago
Thanks, Baokun!  I've updated the ext4 dev branch with the spelling
fixes integrated into "ext4: convert free groups order lists to
xarrays".

						- Ted
Re: [PATCH v3 15/17] ext4: convert free groups order lists to xarrays
Posted by Baokun Li 2 months, 2 weeks ago
On 7/22/2025 2:01 AM, Theodore Ts'o wrote:
> Thanks, Baokun!  I've updated the ext4 dev branch with the spelling
> fixes integrated into "ext4: convert free groups order lists to
> xarrays".
>
> 						- Ted
>
Thanks for updating the code!


Regards,
Baokun