Per Qu Wenruo, on a very large and mostly empty disk, e.g. an 8TiB device,
we split the discard according to our super block locations, but the last
super block ends at 256G, so we can still submit one huge discard for the
range [256G, 8T), causing a very long delay.

Split the remaining space to discard into pieces of at most
BTRFS_MAX_DATA_CHUNK_SIZE, in preparation for the introduction of
cancellation signal handling.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
---
fs/btrfs/extent-tree.c | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index a5966324607d..cbe66d0acff8 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -1239,7 +1239,7 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 			       u64 *discarded_bytes)
 {
 	int j, ret = 0;
-	u64 bytes_left, end;
+	u64 bytes_left, bytes_to_discard, end;
 	u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
 
 	/* Adjust the range to be aligned to 512B sectors if necessary. */
@@ -1300,13 +1300,27 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
 		bytes_left = end - start;
 	}
 
-	if (bytes_left) {
+	while (bytes_left) {
+		if (bytes_left > BTRFS_MAX_DATA_CHUNK_SIZE)
+			bytes_to_discard = BTRFS_MAX_DATA_CHUNK_SIZE;
+		else
+			bytes_to_discard = bytes_left;
+
 		ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
-					   bytes_left >> SECTOR_SHIFT,
+					   bytes_to_discard >> SECTOR_SHIFT,
 					   GFP_NOFS);
-		if (!ret)
-			*discarded_bytes += bytes_left;
+
+		if (ret) {
+			if (ret != -EOPNOTSUPP)
+				break;
+			continue;
+		}
+
+		start += bytes_to_discard;
+		bytes_left -= bytes_to_discard;
+		*discarded_bytes += bytes_to_discard;
 	}
+
 	return ret;
 }
--
2.46.0
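
As a rough illustration of the example in the commit message (this is not part of the patch), the userspace sketch below counts how many discard requests the range [256G, 8T) turns into once each request is capped. The helper names are hypothetical, and the 10 GiB cap is an assumption about the value of BTRFS_MAX_DATA_CHUNK_SIZE that should be checked against the tree being patched.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)
#define SZ_1T (1ULL << 40)

/*
 * Assumption: stands in for BTRFS_MAX_DATA_CHUNK_SIZE, taken here to be
 * 10 GiB. The value is not stated in the patch itself.
 */
#define ASSUMED_MAX_DATA_CHUNK_SIZE (10ULL * SZ_1G)

/* Hypothetical helper: number of capped requests needed to cover [start, end). */
static uint64_t discard_request_count(uint64_t start, uint64_t end,
				      uint64_t max_chunk)
{
	assert(max_chunk > 0 && end >= start);

	uint64_t len = end - start;

	/* Ceiling division: a partial trailing piece still needs one request. */
	return len / max_chunk + (len % max_chunk != 0);
}

/* Hypothetical helper: size of the final, possibly short, request. */
static uint64_t last_request_size(uint64_t start, uint64_t end,
				  uint64_t max_chunk)
{
	assert(max_chunk > 0 && end >= start);

	uint64_t len = end - start;
	uint64_t rem = len % max_chunk;

	if (len == 0)
		return 0;
	return rem ? rem : max_chunk;
}
```

Under that assumption, the single [256G, 8T) discard becomes 794 requests, the last of which covers 6 GiB. Each request is then short enough that a later cancellation check between requests can take effect promptly.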
On 2024/9/16 19:46, Luca Stefani wrote:
> Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
> mostly empty although we will do the split according to our super block
> locations, the last super block ends at 256G, we can submit a huge
> discard for the range [256G, 8T), causing a super large delay.
>
> We now split the space left to discard based on BTRFS_MAX_DATA_CHUNK_SIZE
> in preparation of introduction of cancellation signals handling.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
> Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
> Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
> ---
>  fs/btrfs/extent-tree.c | 24 +++++++++++++++++++-----
>  1 file changed, 19 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index a5966324607d..cbe66d0acff8 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -1239,7 +1239,7 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
>  			       u64 *discarded_bytes)
>  {
>  	int j, ret = 0;
> -	u64 bytes_left, end;
> +	u64 bytes_left, bytes_to_discard, end;
>  	u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
>
>  	/* Adjust the range to be aligned to 512B sectors if necessary. */
> @@ -1300,13 +1300,27 @@ static int btrfs_issue_discard(struct block_device *bdev, u64 start, u64 len,
>  		bytes_left = end - start;
>  	}
>
> -	if (bytes_left) {
> +	while (bytes_left) {
> +		if (bytes_left > BTRFS_MAX_DATA_CHUNK_SIZE)
> +			bytes_to_discard = BTRFS_MAX_DATA_CHUNK_SIZE;

That MAX_DATA_CHUNK_SIZE is only possible for RAID0/RAID10/RAID5/RAID6,
by spanning the device extents across multiple devices.

For each device, the maximum size is limited to 1G (check
init_alloc_chunk_ctl_policy_regular()).

So you can just limit it to 1G instead.
(If you want, you can also extract that into a macro as a cleanup).

Furthermore, you can use min() instead of an if ().

So you only need:

	bytes_to_discard = min(SZ_1G, bytes_left);

Otherwise this looks good enough to me.
If the 1G size is not good enough, we can later tune it to smaller values.

Personally speaking I think 1G would be enough.

Thanks,
Qu

> +		else
> +			bytes_to_discard = bytes_left;
> +
>  		ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
> -					   bytes_left >> SECTOR_SHIFT,
> +					   bytes_to_discard >> SECTOR_SHIFT,
>  					   GFP_NOFS);
> -		if (!ret)
> -			*discarded_bytes += bytes_left;
> +
> +		if (ret) {
> +			if (ret != -EOPNOTSUPP)
> +				break;
> +			continue;
> +		}
> +
> +		start += bytes_to_discard;
> +		bytes_left -= bytes_to_discard;
> +		*discarded_bytes += bytes_to_discard;
>  	}
> +
>  	return ret;
>  }
>
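
The min()-based clamp suggested in the review can be tried outside the kernel with a sketch like the one below. It is only an approximation under stated assumptions: min_u64() stands in for the kernel's min() macro, the discard_fn callback is a hypothetical replacement for blkdev_issue_discard(), and the loop stops on any error, which is a simplification and not the -EOPNOTSUPP handling shown in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_1G (1ULL << 30)

/* Userspace stand-in for the kernel's min() macro, u64 operands only. */
static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical callback standing in for blkdev_issue_discard().
 * Returns 0 on success or a negative errno-style value.
 */
typedef int (*discard_fn)(uint64_t start, uint64_t len, void *ctx);

/* Discard [start, start + len) in pieces of at most 1 GiB each. */
static int discard_in_chunks(uint64_t start, uint64_t len, discard_fn issue,
			     void *ctx, uint64_t *discarded_bytes)
{
	uint64_t bytes_left = len;
	int ret = 0;

	assert(issue && discarded_bytes);

	while (bytes_left) {
		/* The one-line clamp that replaces the if/else in the patch. */
		uint64_t bytes_to_discard = min_u64(SZ_1G, bytes_left);

		ret = issue(start, bytes_to_discard, ctx);
		if (ret)
			break;	/* simplification: stop on any error */

		start += bytes_to_discard;
		bytes_left -= bytes_to_discard;
		*discarded_bytes += bytes_to_discard;
	}

	return ret;
}

/* Example callback context: records how many requests were issued. */
struct counting_ctx {
	unsigned int calls;
	uint64_t last_len;
};

/* Example callback: always succeeds and remembers the last request size. */
static int counting_discard(uint64_t start, uint64_t len, void *ctx)
{
	struct counting_ctx *c = ctx;

	(void)start;
	c->calls++;
	c->last_len = len;
	return 0;
}
```

Calling discard_in_chunks() with counting_discard on a range of 2 GiB plus 512 bytes issues three requests, the last one for the trailing 512 bytes, and accounts for every byte in *discarded_bytes.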
On 16/09/24 12:39, Qu Wenruo wrote:
>
>
> On 2024/9/16 19:46, Luca Stefani wrote:
>> Per Qu Wenruo in case we have a very large disk, e.g. 8TiB device,
>> mostly empty although we will do the split according to our super block
>> locations, the last super block ends at 256G, we can submit a huge
>> discard for the range [256G, 8T), causing a super large delay.
>>
>> We now split the space left to discard based on BTRFS_MAX_DATA_CHUNK_SIZE
>> in preparation of introduction of cancellation signals handling.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=219180
>> Link: https://bugzilla.suse.com/show_bug.cgi?id=1229737
>> Signed-off-by: Luca Stefani <luca.stefani.ge1@gmail.com>
>> ---
>>  fs/btrfs/extent-tree.c | 24 +++++++++++++++++++-----
>>  1 file changed, 19 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index a5966324607d..cbe66d0acff8 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -1239,7 +1239,7 @@ static int btrfs_issue_discard(struct
>> block_device *bdev, u64 start, u64 len,
>>  			       u64 *discarded_bytes)
>>  {
>>  	int j, ret = 0;
>> -	u64 bytes_left, end;
>> +	u64 bytes_left, bytes_to_discard, end;
>>  	u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
>>  	/* Adjust the range to be aligned to 512B sectors if necessary. */
>> @@ -1300,13 +1300,27 @@ static int btrfs_issue_discard(struct
>> block_device *bdev, u64 start, u64 len,
>>  		bytes_left = end - start;
>>  	}
>> -	if (bytes_left) {
>> +	while (bytes_left) {
>> +		if (bytes_left > BTRFS_MAX_DATA_CHUNK_SIZE)
>> +			bytes_to_discard = BTRFS_MAX_DATA_CHUNK_SIZE;
>
> That MAX_DATA_CHUNK_SIZE is only possible for RAID0/RAID10/RAID5/RAID6,
> by spanning the device extents across multiple devices.
>
> For each device, the maximum size is limited to 1G (check
> init_alloc_chunk_ctl_policy_regular()).
>
> So you can just limit it to 1G instead.
> (If you want, you can also extract that into a macro as a cleanup).

I think SZ_1G is enough for now.

>
> Furthermore, you can use min() instead of an if ().
>
> So you only need:
>
> 	bytes_to_discard = min(SZ_1G, bytes_left);
>
> Otherwise this looks good enough to me.
> If the 1G size is not good enough, we can later tune it to smaller values.
>
> Personally speaking I think 1G would be enough.
>
> Thanks,
> Qu

Ack, done in v5

>> +		else
>> +			bytes_to_discard = bytes_left;
>> +
>>  		ret = blkdev_issue_discard(bdev, start >> SECTOR_SHIFT,
>> -					   bytes_left >> SECTOR_SHIFT,
>> +					   bytes_to_discard >> SECTOR_SHIFT,
>>  					   GFP_NOFS);
>> -		if (!ret)
>> -			*discarded_bytes += bytes_left;
>> +
>> +		if (ret) {
>> +			if (ret != -EOPNOTSUPP)
>> +				break;
>> +			continue;
>> +		}
>> +
>> +		start += bytes_to_discard;
>> +		bytes_left -= bytes_to_discard;
>> +		*discarded_bytes += bytes_to_discard;
>>  	}
>> +
>>  	return ret;
>>  }