fs/btrfs/block-group.c | 11 +++++++++++ fs/btrfs/block-group.h | 1 + 2 files changed, 12 insertions(+)
Differential analysis of block-group.c shows an inconsistency between
btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes().
When space is reserved, btrfs_use_block_group_size_class() is called to
set a block group's size class, specializing it for a specific allocation
size to reduce fragmentation. However, when these reservations are
subsequently freed (e.g., due to an error or transaction abort),
btrfs_free_reserved_bytes() fails to perform the corresponding cleanup.
This leads to a state leak where a block group remains stuck with a
specific size class even if it contains no used or reserved bytes. This
stale state causes find_free_extent to unnecessarily skip these block
groups for mismatched size requests, leading to suboptimal allocation
behavior.
Fix this by resetting the size class to BTRFS_BG_SZ_NONE in
btrfs_free_reserved_bytes() when the block group becomes completely
empty.
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
---
fs/btrfs/block-group.c | 11 +++++++++++
fs/btrfs/block-group.h | 1 +
2 files changed, 12 insertions(+)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 08b14449fabe..1ecac4613a3e 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3865,6 +3865,10 @@ void btrfs_free_reserved_bytes(struct btrfs_block_group *cache, u64 num_bytes,
spin_lock(&space_info->lock);
spin_lock(&cache->lock);
+
+ if (btrfs_block_group_should_use_size_class(cache))
+ btrfs_maybe_reset_size_class(cache);
+
bg_ro = cache->ro;
cache->reserved -= num_bytes;
if (is_delalloc)
@@ -4717,3 +4721,10 @@ bool btrfs_block_group_should_use_size_class(const struct btrfs_block_group *bg)
return false;
return true;
}
+
+void btrfs_maybe_reset_size_class(struct btrfs_block_group *bg)
+{
+ lockdep_assert_held(&bg->lock);
+ if (bg->used == 0 && bg->reserved == 0)
+ bg->size_class = BTRFS_BG_SZ_NONE;
+}
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 5f933455118c..7e02db8a8bc6 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -395,5 +395,6 @@ int btrfs_use_block_group_size_class(struct btrfs_block_group *bg,
enum btrfs_block_group_size_class size_class,
bool force_wrong_size_class);
bool btrfs_block_group_should_use_size_class(const struct btrfs_block_group *bg);
+void btrfs_maybe_reset_size_class(struct btrfs_block_group *bg);
#endif /* BTRFS_BLOCK_GROUP_H */
--
2.25.1
On Sat, Jan 10, 2026 at 8:34 PM Jiasheng Jiang
<jiashengjiangcool@gmail.com> wrote:
>
> Differential analysis of block-group.c shows an inconsistency between
> btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes().
>
> When space is reserved, btrfs_use_block_group_size_class() is called to
> set a block group's size class, specializing it for a specific allocation
> size to reduce fragmentation. However, when these reservations are
> subsequently freed (e.g., due to an error or transaction abort),
> btrfs_free_reserved_bytes() fails to perform the corresponding cleanup.
>
> This leads to a state leak where a block group remains stuck with a
> specific size class even if it contains no used or reserved bytes. This
> stale state causes find_free_extent to unnecessarily skip these block
> groups for mismatched size requests, leading to suboptimal allocation
> behavior.
>
> Fix this by resetting the size class to BTRFS_BG_SZ_NONE in
> btrfs_free_reserved_bytes() when the block group becomes completely
> empty.
>
> Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
> ---
> fs/btrfs/block-group.c | 11 +++++++++++
> fs/btrfs/block-group.h | 1 +
> 2 files changed, 12 insertions(+)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 08b14449fabe..1ecac4613a3e 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -3865,6 +3865,10 @@ void btrfs_free_reserved_bytes(struct btrfs_block_group *cache, u64 num_bytes,
>
> spin_lock(&space_info->lock);
> spin_lock(&cache->lock);
> +
> + if (btrfs_block_group_should_use_size_class(cache))
> + btrfs_maybe_reset_size_class(cache);
This will do nothing, since we decrement the block group's reserved
counter below, and btrfs_maybe_reset_size_class() only resets the
size class if cache->reserved == 0.
> +
> bg_ro = cache->ro;
> cache->reserved -= num_bytes;
> if (is_delalloc)
> @@ -4717,3 +4721,10 @@ bool btrfs_block_group_should_use_size_class(const struct btrfs_block_group *bg)
> return false;
> return true;
> }
> +
> +void btrfs_maybe_reset_size_class(struct btrfs_block_group *bg)
> +{
> + lockdep_assert_held(&bg->lock);
> + if (bg->used == 0 && bg->reserved == 0)
> + bg->size_class = BTRFS_BG_SZ_NONE;
> +}
This is so short and only used in one place.
So no need to make this a function and certainly no need to export it
as it's only used in this file.
Thanks.
> diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
> index 5f933455118c..7e02db8a8bc6 100644
> --- a/fs/btrfs/block-group.h
> +++ b/fs/btrfs/block-group.h
> @@ -395,5 +395,6 @@ int btrfs_use_block_group_size_class(struct btrfs_block_group *bg,
> enum btrfs_block_group_size_class size_class,
> bool force_wrong_size_class);
> bool btrfs_block_group_should_use_size_class(const struct btrfs_block_group *bg);
> +void btrfs_maybe_reset_size_class(struct btrfs_block_group *bg);
>
> #endif /* BTRFS_BLOCK_GROUP_H */
> --
> 2.25.1
>
>
Differential analysis of block-group.c shows an inconsistency between
btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes().
When space is reserved, btrfs_use_block_group_size_class() is called to
set a block group's size class, specializing it for a specific allocation
size to reduce fragmentation. However, when these reservations are
subsequently freed (e.g., due to an error or transaction abort),
btrfs_free_reserved_bytes() fails to perform the corresponding cleanup.
This leads to a state leak where a block group remains stuck with a
specific size class even if it contains no used or reserved bytes. This
stale state causes find_free_extent to unnecessarily skip these block
groups for mismatched size requests, leading to suboptimal allocation
behavior.
Fix this by resetting the size class to BTRFS_BG_SZ_NONE in
btrfs_free_reserved_bytes() when the block group becomes completely
empty.
Fixes: 606d1bf10d7e ("btrfs: migrate the block group space accounting helpers")
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
---
Changelog:
v1 -> v2:
1. Inlined btrfs_maybe_reset_size_class() function.
2. Moved check below the reserved bytes decrement in btrfs_free_reserved_bytes().
---
fs/btrfs/block-group.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 08b14449fabe..8339ad001d3f 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3867,6 +3867,12 @@ void btrfs_free_reserved_bytes(struct btrfs_block_group *cache, u64 num_bytes,
spin_lock(&cache->lock);
bg_ro = cache->ro;
cache->reserved -= num_bytes;
+
+ if (btrfs_block_group_should_use_size_class(cache)) {
+ if (cache->used == 0 && cache->reserved == 0)
+ cache->size_class = BTRFS_BG_SZ_NONE;
+ }
+
if (is_delalloc)
cache->delalloc_bytes -= num_bytes;
spin_unlock(&cache->lock);
--
2.25.1
On Sun, Jan 11, 2026 at 8:25 PM Jiasheng Jiang
<jiashengjiangcool@gmail.com> wrote:
>
> Differential analysis of block-group.c shows an inconsistency between
> btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes().
>
> When space is reserved, btrfs_use_block_group_size_class() is called to
> set a block group's size class, specializing it for a specific allocation
> size to reduce fragmentation. However, when these reservations are
> subsequently freed (e.g., due to an error or transaction abort),
> btrfs_free_reserved_bytes() fails to perform the corresponding cleanup.
>
> This leads to a state leak where a block group remains stuck with a
> specific size class even if it contains no used or reserved bytes. This
> stale state causes find_free_extent to unnecessarily skip these block
> groups for mismatched size requests, leading to suboptimal allocation
> behavior.
Not necessarily always. If there are subsequent allocations for the
same extent size, then there's no problem at all.
There's more than skipping, it can cause allocation of new block
groups if there are none with a matching size class and there aren't
any without a size class.
I wonder if you observed this in practice and what kind of workload.
I think that should be rephrased because as it's stated it gives the
wrong idea that it will always cause bad behaviour, while in reality
that depends a lot on the workload.
>
> Fix this by resetting the size class to BTRFS_BG_SZ_NONE in
> btrfs_free_reserved_bytes() when the block group becomes completely
> empty.
>
> Fixes: 606d1bf10d7e ("btrfs: migrate the block group space accounting helpers")
What? That's completely wrong.
First, that commit only moved code around.
Secondly, that commit happened (2019) before we had support for block
group size classes (2022).
The proper commit would be 52bb7a2166af ("btrfs: introduce size class
to block group allocator").
> Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
> ---
> Changelog:
>
> v1 -> v2:
>
> 1. Inlined btrfs_maybe_reset_size_class() function.
> 2. Moved check below the reserved bytes decrement in btrfs_free_reserved_bytes().
> ---
> fs/btrfs/block-group.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 08b14449fabe..8339ad001d3f 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -3867,6 +3867,12 @@ void btrfs_free_reserved_bytes(struct btrfs_block_group *cache, u64 num_bytes,
> spin_lock(&cache->lock);
> bg_ro = cache->ro;
> cache->reserved -= num_bytes;
> +
> + if (btrfs_block_group_should_use_size_class(cache)) {
> + if (cache->used == 0 && cache->reserved == 0)
> + cache->size_class = BTRFS_BG_SZ_NONE;
> + }
> +
> if (is_delalloc)
> cache->delalloc_bytes -= num_bytes;
> spin_unlock(&cache->lock);
> --
> 2.25.1
>
On Mon, Jan 12, 2026 at 12:39:43 +0000, Filipe Manana fdmanana@kernel.org wrote:
> On Sun, Jan 11, 2026 at 8:25 PM Jiasheng Jiang
> <jiashengjiangcool@gmail.com> wrote:
>>
>> Differential analysis of block-group.c shows an inconsistency between
>> btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes().
>>
>> When space is reserved, btrfs_use_block_group_size_class() is called to
>> set a block group's size class, specializing it for a specific allocation
>> size to reduce fragmentation. However, when these reservations are
>> subsequently freed (e.g., due to an error or transaction abort),
>> btrfs_free_reserved_bytes() fails to perform the corresponding cleanup.
>>
>> This leads to a state leak where a block group remains stuck with a
>> specific size class even if it contains no used or reserved bytes. This
>> stale state causes find_free_extent to unnecessarily skip these block
>> groups for mismatched size requests, leading to suboptimal allocation
>> behavior.
>
> Not necessarily always. If there are subsequent allocations for the
> same extent size, then there's no problem at all.
>
>There's more than skipping, it can cause allocation of new block
>groups if there are none with a matching size class and there aren't
>any without a size class.
>
> I wonder if you observed this in practice and what kind of workload.
>
> I think that should be rephrased because as it's stated it gives the
> wrong idea that it will always cause bad behaviour, while in reality
> that depends a lot on the workload.
You are right. This inconsistency was identified through differential analysis of the space accounting logic. I haven't observed it in a specific production workload yet. I will rephrase the description in v3 to clarify that the impact is workload-dependent and can lead to unnecessary allocation of new block groups.
>>
>> Fix this by resetting the size class to BTRFS_BG_SZ_NONE in
>> btrfs_free_reserved_bytes() when the block group becomes completely
>> empty.
>>
>> Fixes: 606d1bf10d7e ("btrfs: migrate the block group space accounting helpers")
>
> What? That's completely wrong.
>
> First, that commit only moved code around.
> Secondly, that commit happened (2019) before we had support for block
> group size classes (2022).
>
> The proper commit would be 52bb7a2166af ("btrfs: introduce size class
> to block group allocator").
My apologies for the oversight. I will correct the Fixes tag to 52bb7a2166af in v3.
I will send a v3 with the updated commit message and corrected tags shortly.
Best regards,
Jiasheng
Differential analysis of block-group.c shows an inconsistency between
btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes().
When space is reserved, btrfs_use_block_group_size_class() is called to
set a block group's size class, specializing it for a specific allocation
size to reduce fragmentation. However, when these reservations are
subsequently freed (e.g., due to an error or transaction abort),
btrfs_free_reserved_bytes() fails to perform the corresponding cleanup.
This leads to a state where a block group remains stuck with a specific
size class even if it contains no used or reserved bytes. While the
impact depends on the workload, this stale state can cause
find_free_extent to unnecessarily skip these block groups for mismatched
size requests.
In some cases, it may even trigger the allocation of new block groups if
no matching or unassigned block groups are available.
Fix this by resetting the size class to BTRFS_BG_SZ_NONE in
btrfs_free_reserved_bytes() when the block group becomes completely
empty.
Fixes: 52bb7a2166af ("btrfs: introduce size class to block group allocator")
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
---
Changelog:
v2 -> v3:
1. Corrected the "Fixes" tag to 52bb7a2166af.
2. Updated the commit message to reflect that the performance impact is workload-dependent.
3. Added mention that the issue can lead to unnecessary allocation of new block groups.
v1 -> v2:
1. Inlined btrfs_maybe_reset_size_class() function.
2. Moved check below the reserved bytes decrement in btrfs_free_reserved_bytes().
---
fs/btrfs/block-group.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 08b14449fabe..8339ad001d3f 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3867,6 +3867,12 @@ void btrfs_free_reserved_bytes(struct btrfs_block_group *cache, u64 num_bytes,
spin_lock(&cache->lock);
bg_ro = cache->ro;
cache->reserved -= num_bytes;
+
+ if (btrfs_block_group_should_use_size_class(cache)) {
+ if (cache->used == 0 && cache->reserved == 0)
+ cache->size_class = BTRFS_BG_SZ_NONE;
+ }
+
if (is_delalloc)
cache->delalloc_bytes -= num_bytes;
spin_unlock(&cache->lock);
--
2.25.1
On Mon, Jan 12, 2026 at 03:02:48PM +0000, Jiasheng Jiang wrote:
> Differential analysis of block-group.c shows an inconsistency between
> btrfs_add_reserved_bytes() and btrfs_free_reserved_bytes().
>
> When space is reserved, btrfs_use_block_group_size_class() is called to
> set a block group's size class, specializing it for a specific allocation
> size to reduce fragmentation. However, when these reservations are
> subsequently freed (e.g., due to an error or transaction abort),
> btrfs_free_reserved_bytes() fails to perform the corresponding cleanup.
>
> This leads to a state where a block group remains stuck with a specific
> size class even if it contains no used or reserved bytes. While the
> impact depends on the workload, this stale state can cause
> find_free_extent to unnecessarily skip these block groups for mismatched
> size requests.
> In some cases, it may even trigger the allocation of new block groups if
> no matching or unassigned block groups are available.
Sorry I missed v1 and v2 of this, so I may be late with my questions.
This seems reasonable enough, but I am wondering why you are only doing
it for the error handling path.
We can also legitimately free the last extent in a block group which will
go through btrfs_update_block_group(). That will either mark it unused or
for async discard, which may eventually result in the bg getting cleaned up.
But in the meantime it is eligible to be used again so by the same
reasoning as this patch, it shouldn't have a size class.
That aside, I don't think this should be a huge issue for two reasons:
1. size classes are advisory and allocations won't fail because of them,
so I don't think your assertion about triggering new allocations is
accurate. (see LOOP_WRONG_SIZE_CLASS)
2. the bg is likely to end up getting used anyway, so it would take
pretty specific conditions to meaningfully "leak" one.
If this issue is something you have observed in practice rather than a
(valid) theoretical problem, I'd be curious to hear more about your
workload and the negative effects you observed.
One final point: this also makes me wonder if it would be beneficial to
merge some of the logic on hitting 0 used+reserved between
btrfs_update_block_group() and btrfs_free_reserve_bytes(). I imagine we
might be able to also miss a block group that needs to be marked unused
on this path.
Thanks,
Boris
>
> Fix this by resetting the size class to BTRFS_BG_SZ_NONE in
> btrfs_free_reserved_bytes() when the block group becomes completely
> empty.
>
> Fixes: 52bb7a2166af ("btrfs: introduce size class to block group allocator")
> Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
> ---
> Changelog:
>
> v2 -> v3:
> 1. Corrected the "Fixes" tag to 52bb7a2166af.
> 2. Updated the commit message to reflect that the performance impact is workload-dependent.
> 3. Added mention that the issue can lead to unnecessary allocation of new block groups.
>
> v1 -> v2:
> 1. Inlined btrfs_maybe_reset_size_class() function.
> 2. Moved check below the reserved bytes decrement in btrfs_free_reserved_bytes().
> ---
> fs/btrfs/block-group.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 08b14449fabe..8339ad001d3f 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -3867,6 +3867,12 @@ void btrfs_free_reserved_bytes(struct btrfs_block_group *cache, u64 num_bytes,
> spin_lock(&cache->lock);
> bg_ro = cache->ro;
> cache->reserved -= num_bytes;
> +
> + if (btrfs_block_group_should_use_size_class(cache)) {
> + if (cache->used == 0 && cache->reserved == 0)
> + cache->size_class = BTRFS_BG_SZ_NONE;
> + }
> +
> if (is_delalloc)
> cache->delalloc_bytes -= num_bytes;
> spin_unlock(&cache->lock);
> --
> 2.25.1
>
Differential analysis of block-group.c shows an inconsistency in how
block group size classes are managed.
Currently, btrfs_use_block_group_size_class() sets a block group's size
class to specialize it for a specific allocation size. However, this
size class remains "stale" even if the block group becomes completely
empty (both used and reserved bytes reach zero).
This happens in two scenarios:
1. When space reservations are freed (e.g., due to errors or transaction
aborts) via btrfs_free_reserved_bytes().
2. When the last extent in a block group is freed via
btrfs_update_block_group().
While size classes are advisory, a stale size class can cause
find_free_extent to unnecessarily skip candidate block groups during
initial search loops. This undermines the purpose of size classes—to
reduce fragmentation—by keeping block groups restricted to a specific
size class when they could be reused for any size.
Fix this by resetting the size class to BTRFS_BG_SZ_NONE whenever a
block group's used and reserved counts both reach zero. This ensures
that empty block groups are fully available for any allocation size in
the next cycle.
Fixes: 52bb7a2166af ("btrfs: introduce size class to block group allocator")
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
---
Changelog:
v3 -> v4:
1. Introduced btrfs_maybe_reset_size_class() helper to unify the logic.
2. Expanded the fix to include btrfs_update_block_group() to handle cases where the last extent in a block group is freed.
3. Refined the commit message to clarify that size classes are advisory and their stale state impacts allocation efficiency rather than causing absolute allocation failures.
v2 -> v3:
1. Corrected the "Fixes" tag to 52bb7a2166af.
2. Updated the commit message to reflect that the performance impact is workload-dependent.
3. Added mention that the issue can lead to unnecessary allocation of new block groups.
v1 -> v2:
1. Inlined btrfs_maybe_reset_size_class() function.
2. Moved check below the reserved bytes decrement in btrfs_free_reserved_bytes().
---
fs/btrfs/block-group.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 08b14449fabe..343d7724939f 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3675,6 +3675,14 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans)
return ret;
}
+static void btrfs_maybe_reset_size_class(struct btrfs_block_group *cache)
+{
+ lockdep_assert_held(&cache->lock);
+ if (btrfs_block_group_should_use_size_class(cache) &&
+ cache->used == 0 && cache->reserved == 0)
+ cache->size_class = BTRFS_BG_SZ_NONE;
+}
+
int btrfs_update_block_group(struct btrfs_trans_handle *trans,
u64 bytenr, u64 num_bytes, bool alloc)
{
@@ -3739,6 +3747,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
old_val -= num_bytes;
cache->used = old_val;
cache->pinned += num_bytes;
+ btrfs_maybe_reset_size_class(cache);
btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
space_info->bytes_used -= num_bytes;
space_info->disk_used -= num_bytes * factor;
@@ -3867,6 +3876,7 @@ void btrfs_free_reserved_bytes(struct btrfs_block_group *cache, u64 num_bytes,
spin_lock(&cache->lock);
bg_ro = cache->ro;
cache->reserved -= num_bytes;
+ btrfs_maybe_reset_size_class(cache);
if (is_delalloc)
cache->delalloc_bytes -= num_bytes;
spin_unlock(&cache->lock);
--
2.25.1
On Tue, Jan 13, 2026 at 08:55:32PM +0000, Jiasheng Jiang wrote:
> Differential analysis of block-group.c shows an inconsistency in how
> block group size classes are managed.
>
> Currently, btrfs_use_block_group_size_class() sets a block group's size
> class to specialize it for a specific allocation size. However, this
> size class remains "stale" even if the block group becomes completely
> empty (both used and reserved bytes reach zero).
>
> This happens in two scenarios:
> 1. When space reservations are freed (e.g., due to errors or transaction
> aborts) via btrfs_free_reserved_bytes().
> 2. When the last extent in a block group is freed via
> btrfs_update_block_group().
>
> While size classes are advisory, a stale size class can cause
> find_free_extent to unnecessarily skip candidate block groups during
> initial search loops. This undermines the purpose of size classes—to
> reduce fragmentation—by keeping block groups restricted to a specific
> size class when they could be reused for any size.
>
> Fix this by resetting the size class to BTRFS_BG_SZ_NONE whenever a
> block group's used and reserved counts both reach zero. This ensures
> that empty block groups are fully available for any allocation size in
> the next cycle.
>
> Fixes: 52bb7a2166af ("btrfs: introduce size class to block group allocator")
I don't think this necessarily amounts to a bug of the level that needs
a Fixes tag, since I don't think it should realistically break any
workload nor is it that urgent to backport to stable branches.
But if we want to use these tags to note targeted improvements / semi-bugs
then I am fine leaving it.
Either way,
Reviewed-by: Boris Burkov <boris@bur.io>
> Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
> ---
> Changelog:
>
> v3 -> v4:
> 1. Introduced btrfs_maybe_reset_size_class() helper to unify the logic.
> 2. Expanded the fix to include btrfs_update_block_group() to handle cases where the last extent in a block group is freed.
> 3. Refined the commit message to clarify that size classes are advisory and their stale state impacts allocation efficiency rather than causing absolute allocation failures.
>
> v2 -> v3:
> 1. Corrected the "Fixes" tag to 52bb7a2166af.
> 2. Updated the commit message to reflect that the performance impact is workload-dependent.
> 3. Added mention that the issue can lead to unnecessary allocation of new block groups.
>
> v1 -> v2:
> 1. Inlined btrfs_maybe_reset_size_class() function.
> 2. Moved check below the reserved bytes decrement in btrfs_free_reserved_bytes().
> ---
> fs/btrfs/block-group.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
>
> diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
> index 08b14449fabe..343d7724939f 100644
> --- a/fs/btrfs/block-group.c
> +++ b/fs/btrfs/block-group.c
> @@ -3675,6 +3675,14 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans)
> return ret;
> }
>
> +static void btrfs_maybe_reset_size_class(struct btrfs_block_group *cache)
> +{
> + lockdep_assert_held(&cache->lock);
> + if (btrfs_block_group_should_use_size_class(cache) &&
> + cache->used == 0 && cache->reserved == 0)
> + cache->size_class = BTRFS_BG_SZ_NONE;
> +}
> +
> int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> u64 bytenr, u64 num_bytes, bool alloc)
> {
> @@ -3739,6 +3747,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
> old_val -= num_bytes;
> cache->used = old_val;
> cache->pinned += num_bytes;
> + btrfs_maybe_reset_size_class(cache);
> btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
> space_info->bytes_used -= num_bytes;
> space_info->disk_used -= num_bytes * factor;
> @@ -3867,6 +3876,7 @@ void btrfs_free_reserved_bytes(struct btrfs_block_group *cache, u64 num_bytes,
> spin_lock(&cache->lock);
> bg_ro = cache->ro;
> cache->reserved -= num_bytes;
> + btrfs_maybe_reset_size_class(cache);
> if (is_delalloc)
> cache->delalloc_bytes -= num_bytes;
> spin_unlock(&cache->lock);
> --
> 2.25.1
>
Differential analysis of block-group.c shows an inconsistency in how
block group size classes are managed.
Currently, btrfs_use_block_group_size_class() sets a block group's size
class to specialize it for a specific allocation size. However, this
size class remains "stale" even if the block group becomes completely
empty (both used and reserved bytes reach zero).
This happens in two scenarios:
1. When space reservations are freed (e.g., due to errors or transaction
aborts) via btrfs_free_reserved_bytes().
2. When the last extent in a block group is freed via
btrfs_update_block_group().
While size classes are advisory, a stale size class can cause
find_free_extent to unnecessarily skip candidate block groups during
initial search loops. This undermines the purpose of size classes—to
reduce fragmentation—by keeping block groups restricted to a specific
size class when they could be reused for any size.
Fix this by resetting the size class to BTRFS_BG_SZ_NONE whenever a
block group's used and reserved counts both reach zero. This ensures
that empty block groups are fully available for any allocation size in
the next cycle.
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
---
Changelog:
v4 -> v5:
1. Remove the Fixes tag.
v3 -> v4:
1. Introduced btrfs_maybe_reset_size_class() helper to unify the logic.
2. Expanded the fix to include btrfs_update_block_group() to handle cases where the last extent in a block group is freed.
3. Refined the commit message to clarify that size classes are advisory and their stale state impacts allocation efficiency rather than causing absolute allocation failures.
v2 -> v3:
1. Corrected the "Fixes" tag to 52bb7a2166af.
2. Updated the commit message to reflect that the performance impact is workload-dependent.
3. Added mention that the issue can lead to unnecessary allocation of new block groups.
v1 -> v2:
1. Inlined btrfs_maybe_reset_size_class() function.
2. Moved check below the reserved bytes decrement in btrfs_free_reserved_bytes().
---
fs/btrfs/block-group.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index 08b14449fabe..343d7724939f 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -3675,6 +3675,14 @@ int btrfs_write_dirty_block_groups(struct btrfs_trans_handle *trans)
return ret;
}
+static void btrfs_maybe_reset_size_class(struct btrfs_block_group *cache)
+{
+ lockdep_assert_held(&cache->lock);
+ if (btrfs_block_group_should_use_size_class(cache) &&
+ cache->used == 0 && cache->reserved == 0)
+ cache->size_class = BTRFS_BG_SZ_NONE;
+}
+
int btrfs_update_block_group(struct btrfs_trans_handle *trans,
u64 bytenr, u64 num_bytes, bool alloc)
{
@@ -3739,6 +3747,7 @@ int btrfs_update_block_group(struct btrfs_trans_handle *trans,
old_val -= num_bytes;
cache->used = old_val;
cache->pinned += num_bytes;
+ btrfs_maybe_reset_size_class(cache);
btrfs_space_info_update_bytes_pinned(space_info, num_bytes);
space_info->bytes_used -= num_bytes;
space_info->disk_used -= num_bytes * factor;
@@ -3867,6 +3876,7 @@ void btrfs_free_reserved_bytes(struct btrfs_block_group *cache, u64 num_bytes,
spin_lock(&cache->lock);
bg_ro = cache->ro;
cache->reserved -= num_bytes;
+ btrfs_maybe_reset_size_class(cache);
if (is_delalloc)
cache->delalloc_bytes -= num_bytes;
spin_unlock(&cache->lock);
--
2.25.1
On Wed, Jan 14, 2026 at 01:13:38AM +0000, Jiasheng Jiang wrote: > Differential analysis of block-group.c shows an inconsistency in how > block group size classes are managed. > > Currently, btrfs_use_block_group_size_class() sets a block group's size > class to specialize it for a specific allocation size. However, this > size class remains "stale" even if the block group becomes completely > empty (both used and reserved bytes reach zero). > > This happens in two scenarios: > 1. When space reservations are freed (e.g., due to errors or transaction > aborts) via btrfs_free_reserved_bytes(). > 2. When the last extent in a block group is freed via > btrfs_update_block_group(). > > While size classes are advisory, a stale size class can cause > find_free_extent to unnecessarily skip candidate block groups during > initial search loops. This undermines the purpose of size classes—to > reduce fragmentation—by keeping block groups restricted to a specific > size class when they could be reused for any size. > > Fix this by resetting the size class to BTRFS_BG_SZ_NONE whenever a > block group's used and reserved counts both reach zero. This ensures > that empty block groups are fully available for any allocation size in > the next cycle. > > Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Added to for-next, thanks. > v4 -> v5: > 1. Remove the Fixes tag. I've added the tag back as I think it may make sense to backport it. > +static void btrfs_maybe_reset_size_class(struct btrfs_block_group *cache) Renamed 'cache' to 'bg' as the cache was from old code where it was referring to the in memory cache of block groups, while the object we care about is the block group.
© 2016 - 2026 Red Hat, Inc.