Per Qu Wenruo, on a very large and mostly empty disk, e.g. an 8TiB device,
we split the discard range around the super block locations, but the last
super block ends at 256G, so we can still submit a single huge discard for
the range [256G, 8T), causing a very long delay.
We now split the remaining space to discard based on BTRFS_MAX_DATA_CHUNK_SIZE,
in preparation for the introduction of cancellation signal handling.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
---
fs/btrfs/extent-tree.c | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a5966324607d..cbe66d0acff8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1239,7 +1239,7 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
u64 *discarded_bytes)
{
int j, ret = 0;
- u64 bytes_left, end;
+ u64 bytes_left, bytes_to_discard, end;
u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
/* Adjust the range to be aligned to 512B sectors if necessary. */
@@ -1300,13 +1300,27 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
bytes_left = end - start;
}
- if (bytes_left) {
+ while (bytes_left) {
+ if (bytes_left > BTRFS_MAX_DATA_CHUNK_SIZE)
+ bytes_to_discard = BTRFS_MAX_DATA_CHUNK_SIZE;
+ else
+ bytes_to_discard = bytes_left;
+
ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
- bytes_left >> SECTOR_SHIFT,
+ bytes_to_discard >> SECTOR_SHIFT,
GFP_NOFS);
- if (!ret)
- *discarded_bytes += bytes_left;
+
+ if (ret) {
+ if (ret != -EOPNOTSUPP)
+ break;
+ continue;
+ }
+
+ start += bytes_to_discard;
+ bytes_left -= bytes_to_discard;
+ *discarded_bytes += bytes_to_discard;
}
+
return ret;
}
--
2.46.0
On 2024/9/16 19:46, Luca Stefani wrote:
> Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
> mostly empty although we will do the split according to our super block
> locations, the last super block ends at 256G, we can submit a huge
> discard for the range [256G, 8T), causing a super large delay.
>
> We now split the space left to discard based on BTRFS_MAX_DATA_CHUNK_SIZE
> in preparation of introduction of cancellation signals handling.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
> Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
> ---
> fs/btrfs/extent-tree.c | 24 +++++++++++++++++++-----
> 1 file changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a5966324607d..cbe66d0acff8 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1239,7 +1239,7 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
> u64 *discarded_bytes)
> {
> int j, ret = 0;
> - u64 bytes_left, end;
> + u64 bytes_left, bytes_to_discard, end;
> u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
>
> /* Adjust the range to be aligned to 512B sectors if necessary. */
> @@ -1300,13 +1300,27 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
> bytes_left = end - start;
> }
>
> - if (bytes_left) {
> + while (bytes_left) {
> + if (bytes_left > BTRFS_MAX_DATA_CHUNK_SIZE)
> + bytes_to_discard = BTRFS_MAX_DATA_CHUNK_SIZE;
That MAX_DATA_CHUNK_SIZE is only possible for RAID0/RAID10/RAID5/RAID6,
by spanning the device extents across multiple devices.
For each device, the maximum size is limited to 1G (check
init_alloc_chunk_ctl_policy_regular()).
So you can just limit it to 1G instead.
(If you want, you can also extract that into a macro as a cleanup).
Furthermore, you can use min() instead of an if ().
So you only need:
bytes_to_discard = min(SZ_1G, bytes_left);
Otherwise this looks good enough to me.
If the 1G size is not good enough, we can later tune it to smaller values.
Personally speaking I think 1G would be enough.
Thanks,
Qu
> + else
> + bytes_to_discard = bytes_left;
> +
> ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
> - bytes_left >> SECTOR_SHIFT,
> + bytes_to_discard >> SECTOR_SHIFT,
> GFP_NOFS);
> - if (!ret)
> - *discarded_bytes += bytes_left;
> +
> + if (ret) {
> + if (ret != -EOPNOTSUPP)
> + break;
> + continue;
> + }
> +
> + start += bytes_to_discard;
> + bytes_left -= bytes_to_discard;
> + *discarded_bytes += bytes_to_discard;
> }
> +
> return ret;
> }
>
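For illustration, the min()-based loop suggested in the review can be sketched as a user-space C fragment. This is a minimal sketch under stated assumptions, not the kernel code: discard_fn, discard_in_chunks(), min_u64(), struct discard_stats and count_discard() are hypothetical names introduced here, the callback stands in for blkdev_issue_discard(), and SZ_1G is redefined locally with the value the kernel uses.

```c
#include <assert.h>
#include <stdint.h>

/* Local definition for the sketch; same value as the kernel's SZ_1G. */
#define SZ_1G (1ULL << 30)

/* Hypothetical stand-in for blkdev_issue_discard(): returns 0 or -errno. */
typedef int (*discard_fn)(uint64_t start, uint64_t len, void *ctx);

/* Hypothetical bookkeeping used by the example callback below. */
struct discard_stats {
	uint64_t calls;		/* number of discard requests issued */
	uint64_t largest;	/* largest single request, in bytes */
};

static inline uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Discard [start, start + len) in pieces of at most chunk_size bytes.
 * Assumption: this sketch stops on the first error of any kind and
 * reports how many bytes were discarded before that point.
 */
static int discard_in_chunks(uint64_t start, uint64_t len,
			     uint64_t chunk_size, discard_fn fn, void *ctx,
			     uint64_t *discarded_bytes)
{
	uint64_t bytes_left = len;
	int ret = 0;

	assert(chunk_size > 0);

	while (bytes_left) {
		uint64_t bytes_to_discard = min_u64(chunk_size, bytes_left);

		ret = fn(start, bytes_to_discard, ctx);
		if (ret)
			break;

		/* Advance only after the piece was discarded successfully. */
		start += bytes_to_discard;
		bytes_left -= bytes_to_discard;
		*discarded_bytes += bytes_to_discard;
	}

	return ret;
}

/* Example callback: always succeeds and records request sizes. */
static int count_discard(uint64_t start, uint64_t len, void *ctx)
{
	struct discard_stats *stats = ctx;

	(void)start;
	stats->calls++;
	if (len > stats->largest)
		stats->largest = len;
	return 0;
}
```

Called with start = 256 * SZ_1G, len = (8192 - 256) * SZ_1G and chunk_size = SZ_1G, the callback is invoked once per 1G piece rather than once for the whole [256G, 8T) range. Unlike the patch, the sketch stops on any error and does not retry the same range after -EOPNOTSUPP, since retrying without advancing start would not terminate if the callback kept failing.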
On 16/09/24 12:39, Qu Wenruo wrote:
>
>
> On 2024/9/16 19:46, Luca Stefani wrote:
>> Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
>> mostly empty although we will do the split according to our super block
>> locations, the last super block ends at 256G, we can submit a huge
>> discard for the range [256G, 8T), causing a super large delay.
>>
>> We now split the space left to discard based on BTRFS_MAX_DATA_CHUNK_SIZE
>> in preparation of introduction of cancellation signals handling.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
>> Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
>> Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
>> ---
>> fs/btrfs/extent-tree.c | 24 +++++++++++++++++++-----
>> 1 file changed, 19 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index a5966324607d..cbe66d0acff8 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -1239,7 +1239,7 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
>> u64 *discarded_bytes)
>> {
>> int j, ret = 0;
>> - u64 bytes_left, end;
>> + u64 bytes_left, bytes_to_discard, end;
>> u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
>> /* Adjust the range to be aligned to 512B sectors if necessary. */
>> @@ -1300,13 +1300,27 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
>> bytes_left = end - start;
>> }
>> - if (bytes_left) {
>> + while (bytes_left) {
>> + if (bytes_left > BTRFS_MAX_DATA_CHUNK_SIZE)
>> + bytes_to_discard = BTRFS_MAX_DATA_CHUNK_SIZE;
>
> That MAX_DATA_CHUNK_SIZE is only possible for RAID0/RAID10/RAID5/RAID6,
> by spanning the device extents across multiple devices.
>
> For each device, the maximum size is limited to 1G (check
> init_alloc_chunk_ctl_policy_regular()).
>
> So you can just limit it to 1G instead.
> (If you want, you can also extract that into a macro as a cleanup).
I think SZ_1G is enough for now.
>
> Furthermore, you can use min() instead of an if ().
>
> So you only need:
>
> bytes_to_discard = min(SZ_1G, bytes_left);
>
> Otherwise this looks good enough to me.
> If the 1G size is not good enough, we can later tune it to smaller values.
>
> Personally speaking I think 1G would be enough.
>
> Thanks,
> Qu
Ack, done in v5
>> + else
>> + bytes_to_discard = bytes_left;
>> +
>> ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
>> - bytes_left >> SECTOR_SHIFT,
>> + bytes_to_discard >> SECTOR_SHIFT,
>> GFP_NOFS);
>> - if (!ret)
>> - *discarded_bytes += bytes_left;
>> +
>> + if (ret) {
>> + if (ret != -EOPNOTSUPP)
>> + break;
>> + continue;
>> + }
>> +
>> + start += bytes_to_discard;
>> + bytes_left -= bytes_to_discard;
>> + *discarded_bytes += bytes_to_discard;
>> }
>> +
>> return ret;
>> }