From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
In our CI system, we're seeing the following ASSERT() trigger when
running RAID stripe-tree tests on non-zoned devices:

assertion failed: found_start >= start && found_end <= end, in fs/btrfs/raid-stripe-tree.c:64

This ASSERT() triggers because, for the initial design of the RAID
stripe-tree, I had the "one ordered-extent equals one bio" rule of zoned
btrfs in mind.

But for a RAID stripe-tree based system that is hosted on a regular
device rather than a zoned storage device, this rule doesn't apply.

So in case the range we want to delete starts in the middle of the
previous item, grab the item and "truncate" its length. That is, subtract
the deleted portion from the key's offset.

In case the range to delete ends in the middle of an item, we have to
adjust both the item's key and the stripe extents.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
Changes to v1:
- ASSERT() that slot > 0 before calling btrfs_previous_item()
fs/btrfs/raid-stripe-tree.c | 52 ++++++++++++++++++++++++++++++++++++-
1 file changed, 51 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/raid-stripe-tree.c b/fs/btrfs/raid-stripe-tree.c
index 4c859b550f6c..075fecd08d87 100644
--- a/fs/btrfs/raid-stripe-tree.c
+++ b/fs/btrfs/raid-stripe-tree.c
@@ -61,7 +61,57 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
 		trace_btrfs_raid_extent_delete(fs_info, start, end,
 					       found_start, found_end);
-		ASSERT(found_start >= start && found_end <= end);
+		if (found_start < start) {
+			struct btrfs_key prev;
+			u64 diff = start - found_start;
+
+			ASSERT(slot > 0);
+
+			ret = btrfs_previous_item(stripe_root, path, start,
+						  BTRFS_RAID_STRIPE_KEY);
+			leaf = path->nodes[0];
+			slot = path->slots[0];
+			btrfs_item_key_to_cpu(leaf, &prev, slot);
+			prev.offset -= diff;
+
+			btrfs_mark_buffer_dirty(trans, leaf);
+
+			start += diff;
+			length -= diff;
+
+			btrfs_release_path(path);
+			continue;
+		}
+
+		if (end < found_end && found_end - end < key.offset) {
+			struct btrfs_stripe_extent *stripe_extent;
+			u64 diff = key.offset - length;
+			int num_stripes;
+
+			num_stripes = btrfs_num_raid_stripes(
+						btrfs_item_size(leaf, slot));
+			stripe_extent = btrfs_item_ptr(
+						leaf, slot, struct btrfs_stripe_extent);
+
+			for (int i = 0; i < num_stripes; i++) {
+				struct btrfs_raid_stride *stride =
+						&stripe_extent->strides[i];
+				u64 physical = btrfs_raid_stride_physical(
+						leaf, stride);
+
+				physical += diff;
+				btrfs_set_raid_stride_physical(leaf, stride,
+							       physical);
+			}
+
+			key.objectid += diff;
+			key.offset -= diff;
+
+			btrfs_mark_buffer_dirty(trans, leaf);
+			btrfs_release_path(path);
+			break;
+		}
+
 		ret = btrfs_del_item(trans, stripe_root, path);
 		if (ret)
 			break;
--
2.43.0
On Wed, Sep 11, 2024 at 10:52 AM Johannes Thumshirn <jth@kernel.org> wrote:
>
[...]
>
> @@ -61,7 +61,57 @@ int btrfs_delete_raid_extent(struct btrfs_trans_handle *trans, u64 start, u64 le
> 		trace_btrfs_raid_extent_delete(fs_info, start, end,
> 					       found_start, found_end);
>
> -		ASSERT(found_start >= start && found_end <= end);

I was looking at this in for-next, and several questions popped in my
mind, see below.

> +		if (found_start < start) {
> +			struct btrfs_key prev;
> +			u64 diff = start - found_start;
> +
> +			ASSERT(slot > 0);
> +
> +			ret = btrfs_previous_item(stripe_root, path, start,
> +						  BTRFS_RAID_STRIPE_KEY);

This doesn't do anything, we ignore the return value and then the
"continue" below makes us go to the start of the loop and do a
btrfs_search_slot(), without using the leaf (see below as well).

> +			leaf = path->nodes[0];
> +			slot = path->slots[0];
> +			btrfs_item_key_to_cpu(leaf, &prev, slot);
> +			prev.offset -= diff;

Why do we decrement prev.offset? It's a local variable and then we
don't use it anymore.

I.e. these 4 lines can be removed.

> +
> +			btrfs_mark_buffer_dirty(trans, leaf);

Why do we mark the leaf as dirty? We haven't changed it.

> +
> +			start += diff;
> +			length -= diff;
> +
> +			btrfs_release_path(path);

We don't use the path for anything, we extract the key, don't do
anything with it and then release the path.

Or did I miss something?

Thanks.

> +			continue;
> +		}
[...]
On Wed, Sep 11, 2024 at 11:52:05AM +0200, Johannes Thumshirn wrote:
> From: Johannes Thumshirn <johannes.thumshirn@wdc.com>
>
[...]
>
> +		if (end < found_end && found_end - end < key.offset) {
> +			struct btrfs_stripe_extent *stripe_extent;
> +			u64 diff = key.offset - length;
> +			int num_stripes;
> +
> +			num_stripes = btrfs_num_raid_stripes(
> +						btrfs_item_size(leaf, slot));
> +			stripe_extent = btrfs_item_ptr(
> +						leaf, slot, struct btrfs_stripe_extent);
> +
> +			for (int i = 0; i < num_stripes; i++) {
> +				struct btrfs_raid_stride *stride =
> +						&stripe_extent->strides[i];
> +				u64 physical = btrfs_raid_stride_physical(
> +						leaf, stride);
> +
> +				physical += diff;
> +				btrfs_set_raid_stride_physical(leaf, stride,
> +							       physical);
> +			}
> +
> +			key.objectid += diff;
> +			key.offset -= diff;

This part was confusing and isn't necessary, you can drop this bit and
then add

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef