We are generally moving to int64_t for both offset and bytes parameters
on all io paths.
The main motivation is implementing a 64-bit write_zeroes operation for
fast zeroing of large disk chunks, up to the whole disk.
We chose a signed type to be consistent with off_t (which is signed) and
to allow a signed return type (where a negative value means an
error).
So, convert driver discard handlers bytes parameter to int64_t.
The only caller of all the updated functions is bdrv_co_pdiscard in
block/io.c. It is already prepared to work with 64bit requests, but
passes at most min(bs->bl.max_pdiscard, INT_MAX) to the driver.
Let's look at all updated functions:
backup-top: pass to bdrv_co_pdiscard which is 64bit
blkdebug: all calculations are still OK, thanks to
bdrv_check_qiov_request().
both rule_check and bdrv_co_pdiscard are 64bit
blklogwrites: pass to blk_log_writes_co_log which is 64bit
blkreply, copy-on-read, filter-compress: pass to bdrv_co_pdiscard, OK
file-posix: one handler calls raw_account_discard(), which is 64bit, and
both handlers call raw_do_pdiscard(). Update raw_do_pdiscard, which
passes to RawPosixAIOData::aio_nbytes, which is 64bit (and calls
raw_account_discard())
gluster: somehow, third argument of glfs_discard_async is size_t.
Let's set max_pdiscard accordingly.
iscsi: iscsi_allocmap_set_invalid is 64bit,
!is_byte_request_lun_aligned is 64bit.
list.num is uint32_t. Let's clarify max_pdiscard and
pdiscard_alignment.
mirror_top, preallocate: pass to bdrv_mirror_top_do_write() which is
64bit
nbd: protocol limitation. max_pdiscard is already set strict enough,
keep it as is for now.
nvmd: buf.nlb is uint32_t and we do shift. So, add corresponding limits
to nvme_refresh_limits().
qcow2: calculations are still OK, thanks to bdrv_check_qiov_request(),
qcow2_cluster_discard() is 64bit.
raw-format: raw_adjust_offset() is 64bit, bdrv_co_pdiscard too.
sheepdog: the format is deprecated. Don't bother, just make the old
INT_MAX limit explicit
throttle: pass to bdrv_co_pdiscard() which is 64bit and to
throttle_group_co_io_limits_intercept() which is 64bit as well.
test-block-iothread: bytes argument is unused
Great! Now all drivers are prepared to 64bit discard requests or has
explicit max_pdiscard limit.
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
include/block/block_int.h | 2 +-
block/backup-top.c | 2 +-
block/blkdebug.c | 2 +-
block/blklogwrites.c | 4 ++--
block/blkreplay.c | 2 +-
block/copy-on-read.c | 2 +-
block/file-posix.c | 7 ++++---
block/filter-compress.c | 2 +-
block/gluster.c | 7 +++++--
block/iscsi.c | 10 +++++-----
block/mirror.c | 2 +-
block/nbd.c | 6 ++++--
block/nvme.c | 14 +++++++++++++-
block/preallocate.c | 2 +-
block/qcow2.c | 2 +-
block/raw-format.c | 2 +-
block/sheepdog.c | 15 ++++++++++++++-
block/throttle.c | 2 +-
tests/unit/test-block-iothread.c | 2 +-
block/trace-events | 4 ++--
20 files changed, 61 insertions(+), 30 deletions(-)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index cb36ba93a6..adc5ea12cc 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -302,7 +302,7 @@ struct BlockDriver {
int coroutine_fn (*bdrv_co_pwrite_zeroes)(BlockDriverState *bs,
int64_t offset, int64_t bytes, BdrvRequestFlags flags);
int coroutine_fn (*bdrv_co_pdiscard)(BlockDriverState *bs,
- int64_t offset, int bytes);
+ int64_t offset, int64_t bytes);
/* Map [offset, offset + nbytes) range onto a child of @bs to copy from,
* and invoke bdrv_co_copy_range_from(child, ...), or invoke
diff --git a/block/backup-top.c b/block/backup-top.c
index f193cc549c..45240aef9e 100644
--- a/block/backup-top.c
+++ b/block/backup-top.c
@@ -64,7 +64,7 @@ static coroutine_fn int backup_top_cbw(BlockDriverState *bs, uint64_t offset,
}
static int coroutine_fn backup_top_co_pdiscard(BlockDriverState *bs,
- int64_t offset, int bytes)
+ int64_t offset, int64_t bytes)
{
int ret = backup_top_cbw(bs, offset, bytes, 0);
if (ret < 0) {
diff --git a/block/blkdebug.c b/block/blkdebug.c
index c81cb9cb1a..2d98a33982 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -700,7 +700,7 @@ static int coroutine_fn blkdebug_co_pwrite_zeroes(BlockDriverState *bs,
}
static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs,
- int64_t offset, int bytes)
+ int64_t offset, int64_t bytes)
{
uint32_t align = bs->bl.pdiscard_alignment;
int err;
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index d7ae64c22d..f7a251e91f 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -484,9 +484,9 @@ static int coroutine_fn blk_log_writes_co_flush_to_disk(BlockDriverState *bs)
}
static int coroutine_fn
-blk_log_writes_co_pdiscard(BlockDriverState *bs, int64_t offset, int count)
+blk_log_writes_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
{
- return blk_log_writes_co_log(bs, offset, count, NULL, 0,
+ return blk_log_writes_co_log(bs, offset, bytes, NULL, 0,
blk_log_writes_co_do_file_pdiscard,
LOG_DISCARD_FLAG, false);
}
diff --git a/block/blkreplay.c b/block/blkreplay.c
index 89d74a3cca..dcbe780ddb 100644
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -105,7 +105,7 @@ static int coroutine_fn blkreplay_co_pwrite_zeroes(BlockDriverState *bs,
}
static int coroutine_fn blkreplay_co_pdiscard(BlockDriverState *bs,
- int64_t offset, int bytes)
+ int64_t offset, int64_t bytes)
{
uint64_t reqid = blkreplay_next_id();
int ret = bdrv_co_pdiscard(bs->file, offset, bytes);
diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index 758a5d44d5..c29cfdd10e 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -214,7 +214,7 @@ static int coroutine_fn cor_co_pwrite_zeroes(BlockDriverState *bs,
static int coroutine_fn cor_co_pdiscard(BlockDriverState *bs,
- int64_t offset, int bytes)
+ int64_t offset, int64_t bytes)
{
return bdrv_co_pdiscard(bs->file, offset, bytes);
}
diff --git a/block/file-posix.c b/block/file-posix.c
index 6114bdd308..6959e2feba 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2875,7 +2875,8 @@ static void raw_account_discard(BDRVRawState *s, uint64_t nbytes, int ret)
}
static coroutine_fn int
-raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int bytes, bool blkdev)
+raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ bool blkdev)
{
BDRVRawState *s = bs->opaque;
RawPosixAIOData acb;
@@ -2899,7 +2900,7 @@ raw_do_pdiscard(BlockDriverState *bs, int64_t offset, int bytes, bool blkdev)
}
static coroutine_fn int
-raw_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes)
+raw_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
{
return raw_do_pdiscard(bs, offset, bytes, false);
}
@@ -3530,7 +3531,7 @@ static int fd_open(BlockDriverState *bs)
}
static coroutine_fn int
-hdev_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes)
+hdev_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
{
BDRVRawState *s = bs->opaque;
int ret;
diff --git a/block/filter-compress.c b/block/filter-compress.c
index fb85686b69..d5be538619 100644
--- a/block/filter-compress.c
+++ b/block/filter-compress.c
@@ -94,7 +94,7 @@ static int coroutine_fn compress_co_pwrite_zeroes(BlockDriverState *bs,
static int coroutine_fn compress_co_pdiscard(BlockDriverState *bs,
- int64_t offset, int bytes)
+ int64_t offset, int64_t bytes)
{
return bdrv_co_pdiscard(bs->file, offset, bytes);
}
diff --git a/block/gluster.c b/block/gluster.c
index 6a17b37c0c..066fdf60fa 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -891,6 +891,7 @@ out:
static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
{
bs->bl.max_transfer = GLUSTER_MAX_TRANSFER;
+ bs->bl.max_pdiscard = SIZE_MAX;
}
static int qemu_gluster_reopen_prepare(BDRVReopenState *state,
@@ -1297,18 +1298,20 @@ error:
#ifdef CONFIG_GLUSTERFS_DISCARD
static coroutine_fn int qemu_gluster_co_pdiscard(BlockDriverState *bs,
- int64_t offset, int size)
+ int64_t offset, int64_t bytes)
{
int ret;
GlusterAIOCB acb;
BDRVGlusterState *s = bs->opaque;
+ assert(bytes <= SIZE_MAX); /* rely on max_pdiscard */
+
acb.size = 0;
acb.ret = 0;
acb.coroutine = qemu_coroutine_self();
acb.aio_context = bdrv_get_aio_context(bs);
- ret = glfs_discard_async(s->fd, offset, size, gluster_finish_aiocb, &acb);
+ ret = glfs_discard_async(s->fd, offset, bytes, gluster_finish_aiocb, &acb);
if (ret < 0) {
return -errno;
}
diff --git a/block/iscsi.c b/block/iscsi.c
index b90ed67377..297919ebc2 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1141,7 +1141,8 @@ iscsi_getlength(BlockDriverState *bs)
}
static int
-coroutine_fn iscsi_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes)
+coroutine_fn iscsi_co_pdiscard(BlockDriverState *bs, int64_t offset,
+ int64_t bytes)
{
IscsiLun *iscsilun = bs->opaque;
struct IscsiTask iTask;
@@ -2075,10 +2076,9 @@ static void iscsi_refresh_limits(BlockDriverState *bs, Error **errp)
}
if (iscsilun->lbp.lbpu) {
- if (iscsilun->bl.max_unmap < 0xffffffff / block_size) {
- bs->bl.max_pdiscard =
- iscsilun->bl.max_unmap * iscsilun->block_size;
- }
+ bs->bl.max_pdiscard =
+ MIN_NON_ZERO(iscsilun->bl.max_unmap * iscsilun->block_size,
+ (uint64_t)UINT32_MAX * iscsilun->block_size);
bs->bl.pdiscard_alignment =
iscsilun->bl.opt_unmap_gran * iscsilun->block_size;
} else {
diff --git a/block/mirror.c b/block/mirror.c
index f0a3eac216..3dbe696873 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1486,7 +1486,7 @@ static int coroutine_fn bdrv_mirror_top_pwrite_zeroes(BlockDriverState *bs,
}
static int coroutine_fn bdrv_mirror_top_pdiscard(BlockDriverState *bs,
- int64_t offset, int bytes)
+ int64_t offset, int64_t bytes)
{
return bdrv_mirror_top_do_write(bs, MIRROR_METHOD_DISCARD, offset, bytes,
NULL, 0);
diff --git a/block/nbd.c b/block/nbd.c
index bf56735e4a..03f74b6f60 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1680,15 +1680,17 @@ static int nbd_client_co_flush(BlockDriverState *bs)
}
static int nbd_client_co_pdiscard(BlockDriverState *bs, int64_t offset,
- int bytes)
+ int64_t bytes)
{
BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
NBDRequest request = {
.type = NBD_CMD_TRIM,
.from = offset,
- .len = bytes,
+ .len = bytes, /* len is uint32_t */
};
+ assert(bytes <= UINT32_MAX); /* rely on max_pdiscard */
+
assert(!(s->info.flags & NBD_FLAG_READ_ONLY));
if (!(s->info.flags & NBD_FLAG_SEND_TRIM) || !bytes) {
return 0;
diff --git a/block/nvme.c b/block/nvme.c
index 51fc65fc91..1d59514f63 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -1330,7 +1330,7 @@ static coroutine_fn int nvme_co_pwrite_zeroes(BlockDriverState *bs,
static int coroutine_fn nvme_co_pdiscard(BlockDriverState *bs,
int64_t offset,
- int bytes)
+ int64_t bytes)
{
BDRVNVMeState *s = bs->opaque;
NVMeQueuePair *ioq = s->queues[INDEX_IO(0)];
@@ -1357,6 +1357,14 @@ static int coroutine_fn nvme_co_pdiscard(BlockDriverState *bs,
assert(s->queue_count > 1);
+ /*
+ * Filling the @buf requires @offset and @bytes to satisfy restrictions
+ * defined in nvme_refresh_limits().
+ */
+ assert(QEMU_IS_ALIGNED(bytes, 1UL << s->blkshift));
+ assert(QEMU_IS_ALIGNED(offset, 1UL << s->blkshift));
+ assert((bytes >> s->blkshift) <= UINT32_MAX);
+
buf = qemu_try_memalign(s->page_size, s->page_size);
if (!buf) {
return -ENOMEM;
@@ -1460,6 +1468,10 @@ static void nvme_refresh_limits(BlockDriverState *bs, Error **errp)
bs->bl.max_pwrite_zeroes = 1ULL << (s->blkshift + 16);
bs->bl.pwrite_zeroes_alignment = MAX(bs->bl.request_alignment,
1UL << s->blkshift);
+
+ bs->bl.max_pdiscard = (uint64_t)UINT32_MAX << s->blkshift;
+ bs->bl.pdiscard_alignment = MAX(bs->bl.request_alignment,
+ 1UL << s->blkshift);
}
static void nvme_detach_aio_context(BlockDriverState *bs)
diff --git a/block/preallocate.c b/block/preallocate.c
index 99e28d9f08..1d4233f730 100644
--- a/block/preallocate.c
+++ b/block/preallocate.c
@@ -235,7 +235,7 @@ static coroutine_fn int preallocate_co_preadv_part(
}
static int coroutine_fn preallocate_co_pdiscard(BlockDriverState *bs,
- int64_t offset, int bytes)
+ int64_t offset, int64_t bytes)
{
return bdrv_co_pdiscard(bs->file, offset, bytes);
}
diff --git a/block/qcow2.c b/block/qcow2.c
index 59c5137410..442917b85d 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -3968,7 +3968,7 @@ static coroutine_fn int qcow2_co_pwrite_zeroes(BlockDriverState *bs,
}
static coroutine_fn int qcow2_co_pdiscard(BlockDriverState *bs,
- int64_t offset, int bytes)
+ int64_t offset, int64_t bytes)
{
int ret;
BDRVQcow2State *s = bs->opaque;
diff --git a/block/raw-format.c b/block/raw-format.c
index 4e9304c63b..45846e42d5 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -302,7 +302,7 @@ static int coroutine_fn raw_co_pwrite_zeroes(BlockDriverState *bs,
}
static int coroutine_fn raw_co_pdiscard(BlockDriverState *bs,
- int64_t offset, int bytes)
+ int64_t offset, int64_t bytes)
{
int ret;
diff --git a/block/sheepdog.c b/block/sheepdog.c
index a45c73826d..80e04dccfd 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -3101,7 +3101,7 @@ static int sd_load_vmstate(BlockDriverState *bs, QEMUIOVector *qiov,
static coroutine_fn int sd_co_pdiscard(BlockDriverState *bs, int64_t offset,
- int bytes)
+ int64_t bytes)
{
SheepdogAIOCB acb;
BDRVSheepdogState *s = bs->opaque;
@@ -3113,6 +3113,8 @@ static coroutine_fn int sd_co_pdiscard(BlockDriverState *bs, int64_t offset,
return 0;
}
+ assert(bytes <= INT_MAX); /* thanks to max_pdiscard */
+
memset(&discard_iov, 0, sizeof(discard_iov));
memset(&iov, 0, sizeof(iov));
iov.iov_base = &zero;
@@ -3186,6 +3188,11 @@ static int64_t sd_get_allocated_file_size(BlockDriverState *bs)
return size;
}
+static void sd_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+ bs->bl.max_pdiscard = INT_MAX;
+}
+
static QemuOptsList sd_create_opts = {
.name = "sheepdog-create-opts",
.head = QTAILQ_HEAD_INITIALIZER(sd_create_opts.head),
@@ -3269,6 +3276,8 @@ static BlockDriver bdrv_sheepdog = {
.create_opts = &sd_create_opts,
.strong_runtime_opts = sd_strong_runtime_opts,
+
+ .bdrv_refresh_limits = sd_refresh_limits,
};
static BlockDriver bdrv_sheepdog_tcp = {
@@ -3307,6 +3316,8 @@ static BlockDriver bdrv_sheepdog_tcp = {
.create_opts = &sd_create_opts,
.strong_runtime_opts = sd_strong_runtime_opts,
+
+ .bdrv_refresh_limits = sd_refresh_limits,
};
static BlockDriver bdrv_sheepdog_unix = {
@@ -3345,6 +3356,8 @@ static BlockDriver bdrv_sheepdog_unix = {
.create_opts = &sd_create_opts,
.strong_runtime_opts = sd_strong_runtime_opts,
+
+ .bdrv_refresh_limits = sd_refresh_limits,
};
static void bdrv_sheepdog_init(void)
diff --git a/block/throttle.c b/block/throttle.c
index c13fe9067f..6e8d52fa24 100644
--- a/block/throttle.c
+++ b/block/throttle.c
@@ -145,7 +145,7 @@ static int coroutine_fn throttle_co_pwrite_zeroes(BlockDriverState *bs,
}
static int coroutine_fn throttle_co_pdiscard(BlockDriverState *bs,
- int64_t offset, int bytes)
+ int64_t offset, int64_t bytes)
{
ThrottleGroupMember *tgm = bs->opaque;
throttle_group_co_io_limits_intercept(tgm, bytes, true);
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index 50b8718b2a..9656814814 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -48,7 +48,7 @@ static int coroutine_fn bdrv_test_co_pwritev(BlockDriverState *bs,
}
static int coroutine_fn bdrv_test_co_pdiscard(BlockDriverState *bs,
- int64_t offset, int bytes)
+ int64_t offset, int64_t bytes)
{
return 0;
}
diff --git a/block/trace-events b/block/trace-events
index 3edd2899c2..3b86c03b2f 100644
--- a/block/trace-events
+++ b/block/trace-events
@@ -152,8 +152,8 @@ nvme_write_zeroes(void *s, uint64_t offset, uint64_t bytes, int flags) "s %p off
nvme_qiov_unaligned(const void *qiov, int n, void *base, size_t size, int align) "qiov %p n %d base %p size 0x%zx align 0x%x"
nvme_prw_buffered(void *s, uint64_t offset, uint64_t bytes, int niov, int is_write) "s %p offset 0x%"PRIx64" bytes %"PRId64" niov %d is_write %d"
nvme_rw_done(void *s, int is_write, uint64_t offset, uint64_t bytes, int ret) "s %p is_write %d offset 0x%"PRIx64" bytes %"PRId64" ret %d"
-nvme_dsm(void *s, uint64_t offset, uint64_t bytes) "s %p offset 0x%"PRIx64" bytes %"PRId64""
-nvme_dsm_done(void *s, uint64_t offset, uint64_t bytes, int ret) "s %p offset 0x%"PRIx64" bytes %"PRId64" ret %d"
+nvme_dsm(void *s, int64_t offset, int64_t bytes) "s %p offset 0x%"PRIx64" bytes %"PRId64""
+nvme_dsm_done(void *s, int64_t offset, int64_t bytes, int ret) "s %p offset 0x%"PRIx64" bytes %"PRId64" ret %d"
nvme_dma_map_flush(void *s) "s %p"
nvme_free_req_queue_wait(void *s, unsigned q_index) "s %p q #%u"
nvme_create_queue_pair(unsigned q_index, void *q, unsigned size, void *aio_context, int fd) "index %u q %p size %u aioctx %p fd %d"
--
2.29.2
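The splitting that the commit message attributes to bdrv_co_pdiscard can be
sketched roughly as below. This is a standalone illustration, not QEMU code:
discard_in_chunks(), fake_driver_discard(), the two counters and the limit
passed in are hypothetical names invented for the example. Only the idea of
clamping each driver call to a per-driver maximum is taken from the message
above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bookkeeping so the effect of the split can be observed. */
static int calls;
static int64_t largest_chunk;

/* Hypothetical stand-in for a driver discard handler taking int64_t bytes. */
static int fake_driver_discard(int64_t offset, int64_t bytes)
{
    (void)offset;
    calls++;
    if (bytes > largest_chunk) {
        largest_chunk = bytes;
    }
    return 0; /* a negative value would signal an error */
}

/*
 * Split a 64-bit request into pieces no larger than max_chunk, so the
 * driver never sees more than its advertised limit in one call.
 */
static int discard_in_chunks(int64_t offset, int64_t bytes, int64_t max_chunk)
{
    assert(offset >= 0 && bytes >= 0 && max_chunk > 0);

    while (bytes > 0) {
        int64_t num = bytes < max_chunk ? bytes : max_chunk;
        int ret = fake_driver_discard(offset, num);

        if (ret < 0) {
            return ret; /* stop at the first failing chunk */
        }
        offset += num;
        bytes -= num;
    }
    return 0;
}
```

With a limit of 30 bytes, a 100-byte request would reach the stand-in driver
as four calls, none larger than 30 bytes. A signed length keeps the
"negative means error" convention usable for both arguments and return values.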
On Wed, May 05, 2021 at 10:50:00AM +0300, Vladimir Sementsov-Ogievskiy wrote:
> We are generally moving to int64_t for both offset and bytes parameters
> on all io paths.
>
> Main motivation is realization of 64-bit write_zeroes operation for
> fast zeroing large disk chunks, up to the whole disk.
>
> We chose signed type, to be consistent with off_t (which is signed) and
> with possibility for signed return type (where negative value means
> error).
>
> So, convert driver discard handlers bytes parameter to int64_t.
>
> The only caller of all updated function is bdrv_co_pdiscard in
> block/io.c. It is already prepared to work with 64bit requests, but
> pass at most max(bs->bl.max_pdiscard, INT_MAX) to the driver.
>
> Let's look at all updated functions:
>
> backup-top: pass to bdrv_co_pdiscard which is 64bit
and to backup_top_cbw, but that is also 64-bit
>
> blkdebug: all calculations are still OK, thanks to
> bdrv_check_qiov_request().
> both rule_check and bdrv_co_pdiscard are 64bit
>
> blklogwrites: pass to blk_loc_writes_co_log which is 64bit
>
> blkreply, copy-on-read, filter-compress: pass to bdrv_co_pdiscard, OK
blkreplay
>
> file-posix: one handler calls raw_account_discard() is 64bit and both
> handlers calls raw_do_pdiscard(). Update raw_do_pdiscard, which pass
> to RawPosixAIOData::aio_nbytes, which is 64bit (and calls
> raw_account_discard())
>
> gluster: somehow, third argument of glfs_discard_async is size_t.
> Let's set max_pdiscard accordingly.
>
> iscsi: iscsi_allocmap_set_invalid is 64bit,
> !is_byte_request_lun_aligned is 64bit.
> list.num is uint32_t. Let's clarify max_pdiscard and
> pdiscard_alignment.
The patch tweaks max_pdiscard, but doesn't change pdiscard_alignment.
>
> mirror_top, preallocate: pass to bdrv_mirror_top_do_write() which is
> 64bit
file is mirror.c, not mirror-top.c. But it matches the BlockDriver
bdrv_mirror_top name. preallocate does not call
bdrv_mirror_top_do_write, so it's probably worth separating that line
out.
>
> nbd: protocol limitation. max_pdiscard is alredy set strict enough,
> keep it as is for now.
>
> nvmd: buf.nlb is uint32_t and we do shift. So, add corresponding limits
> to nvme_refresh_limits().
nvme
>
> qcow2: calculations are still OK, thanks to bdrv_check_qiov_request(),
> qcow2_cluster_discard() is 64bit.
>
> raw-format: raw_adjust_offset() is 64bit, bdrv_co_pdiscard too.
>
> sheepdog: the format is deprecated. Don't care and just make old
> INT_MAX limit to be explicit
>
> throttle: pass to bdrv_co_pdiscard() which is 64bit and to
> throttle_group_co_io_limits_intercept() which is 64bit as well.
>
> test-block-iothread: bytes argument is unused
>
> Great! Now all drivers are prepared to 64bit discard requests or has
> explicit max_pdiscard limit.
are prepared to handle 64-bit discard requests, or else have explicit
max_pdiscard limits.
>
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
> include/block/block_int.h | 2 +-
> block/backup-top.c | 2 +-
> block/blkdebug.c | 2 +-
> block/blklogwrites.c | 4 ++--
> block/blkreplay.c | 2 +-
> block/copy-on-read.c | 2 +-
> block/file-posix.c | 7 ++++---
> block/filter-compress.c | 2 +-
> block/gluster.c | 7 +++++--
> block/iscsi.c | 10 +++++-----
> block/mirror.c | 2 +-
> block/nbd.c | 6 ++++--
> block/nvme.c | 14 +++++++++++++-
> block/preallocate.c | 2 +-
> block/qcow2.c | 2 +-
> block/raw-format.c | 2 +-
> block/sheepdog.c | 15 ++++++++++++++-
> block/throttle.c | 2 +-
> tests/unit/test-block-iothread.c | 2 +-
> block/trace-events | 4 ++--
> 20 files changed, 61 insertions(+), 30 deletions(-)
>
> +++ b/block/gluster.c
> @@ -891,6 +891,7 @@ out:
> static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
> {
> bs->bl.max_transfer = GLUSTER_MAX_TRANSFER;
> + bs->bl.max_pdiscard = SIZE_MAX;
We probably want this to be MIN(GLUSTER_MAX_TRANSFER, SIZE_MAX). Also,
do we want to round it down to alignment boundaries?
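For illustration only, the two ideas raised here (taking the smaller of two
caps, then rounding down to an alignment boundary) could look like the
following standalone sketch. round_down() and capped_aligned_limit() are
hypothetical helpers written for this example, not QEMU macros, and no
particular value of GLUSTER_MAX_TRANSFER is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: round value down to a multiple of align. */
static uint64_t round_down(uint64_t value, uint64_t align)
{
    assert(align > 0);
    return value - value % align;
}

/* Hypothetical helper: the smaller of two caps, rounded down to align. */
static uint64_t capped_aligned_limit(uint64_t cap_a, uint64_t cap_b,
                                     uint64_t align)
{
    uint64_t cap = cap_a < cap_b ? cap_a : cap_b;

    return round_down(cap, align);
}
```

For example, a cap of INT32_MAX combined with a much larger second cap and a
4096-byte alignment would give 2147479552, the largest multiple of 4096 that
does not exceed INT32_MAX.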
> +++ b/block/iscsi.c
> @@ -1141,7 +1141,8 @@ iscsi_getlength(BlockDriverState *bs)
> }
>
> static int
> -coroutine_fn iscsi_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes)
> +coroutine_fn iscsi_co_pdiscard(BlockDriverState *bs, int64_t offset,
> + int64_t bytes)
> {
> IscsiLun *iscsilun = bs->opaque;
> struct IscsiTask iTask;
Did you want to add some sort of assert(bytes / iscsilun->block_size
<= UINT32_MAX), or a comment that we are relying on bl.max_pdiscard?
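A minimal standalone sketch of the kind of check being asked about, assuming
the block count ends up in a uint32_t field. block_count_fits_u32() is a
hypothetical helper for this example; it is not part of the patch or of
libiscsi.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: does a byte count, expressed in blocks of
 * block_size bytes, fit into a uint32_t field without overflow?
 */
static bool block_count_fits_u32(int64_t bytes, uint32_t block_size)
{
    assert(bytes >= 0);
    assert(block_size > 0);
    /* The division is done in int64_t, so the comparison cannot wrap. */
    return bytes / block_size <= UINT32_MAX;
}
```

In a driver, the same condition could be used inside an assert() with a
comment noting that the request size is bounded by bl.max_pdiscard.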
> +++ b/block/sheepdog.c
> +static void sd_refresh_limits(BlockDriverState *bs, Error **errp)
> +{
> + bs->bl.max_pdiscard = INT_MAX;
Do we want to round this down to alignment?
Looks close!
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
07.06.2021 21:13, Eric Blake wrote:
> On Wed, May 05, 2021 at 10:50:00AM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> We are generally moving to int64_t for both offset and bytes parameters
>> on all io paths.
>>
>> Main motivation is realization of 64-bit write_zeroes operation for
>> fast zeroing large disk chunks, up to the whole disk.
>>
>> We chose signed type, to be consistent with off_t (which is signed) and
>> with possibility for signed return type (where negative value means
>> error).
>>
>> So, convert driver discard handlers bytes parameter to int64_t.
>>
>> The only caller of all updated function is bdrv_co_pdiscard in
>> block/io.c. It is already prepared to work with 64bit requests, but
>> pass at most max(bs->bl.max_pdiscard, INT_MAX) to the driver.
>>
>> Let's look at all updated functions:
>>
>> backup-top: pass to bdrv_co_pdiscard which is 64bit
>
> and to backup_top_cbw, but that is also 64-bit
>
>>
>> blkdebug: all calculations are still OK, thanks to
>> bdrv_check_qiov_request().
>> both rule_check and bdrv_co_pdiscard are 64bit
>>
>> blklogwrites: pass to blk_loc_writes_co_log which is 64bit
>>
>> blkreply, copy-on-read, filter-compress: pass to bdrv_co_pdiscard, OK
>
> blkreplay
>
>>
>> file-posix: one handler calls raw_account_discard() is 64bit and both
>> handlers calls raw_do_pdiscard(). Update raw_do_pdiscard, which pass
>> to RawPosixAIOData::aio_nbytes, which is 64bit (and calls
>> raw_account_discard())
>>
>> gluster: somehow, third argument of glfs_discard_async is size_t.
>> Let's set max_pdiscard accordingly.
>>
>> iscsi: iscsi_allocmap_set_invalid is 64bit,
>> !is_byte_request_lun_aligned is 64bit.
>> list.num is uint32_t. Let's clarify max_pdiscard and
>> pdiscard_alignment.
>
> The patch tweaks max_pdiscard, but doesn't change pdiscard_alignment.
>
>>
>> mirror_top, preallocate: pass to bdrv_mirror_top_do_write() which is
>> 64bit
>
> file is mirror.c, not mirror-top.c. But it matches the BlockDriver
> bdrv_mirror_top name. preallocate does not call
> bdrv_mirror_top_do_write, so it's probably worth separating that line
> out.
>
>>
>> nbd: protocol limitation. max_pdiscard is alredy set strict enough,
>> keep it as is for now.
>>
>> nvmd: buf.nlb is uint32_t and we do shift. So, add corresponding limits
>> to nvme_refresh_limits().
>
> nvme
>
>>
>> qcow2: calculations are still OK, thanks to bdrv_check_qiov_request(),
>> qcow2_cluster_discard() is 64bit.
>>
>> raw-format: raw_adjust_offset() is 64bit, bdrv_co_pdiscard too.
>>
>> sheepdog: the format is deprecated. Don't care and just make old
>> INT_MAX limit to be explicit
>>
>> throttle: pass to bdrv_co_pdiscard() which is 64bit and to
>> throttle_group_co_io_limits_intercept() which is 64bit as well.
>>
>> test-block-iothread: bytes argument is unused
>>
>> Great! Now all drivers are prepared to 64bit discard requests or has
>> explicit max_pdiscard limit.
>
> are prepared to handle 64-bit discard requests, or else have explicit
> max_pdiscard limits.
>
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>> include/block/block_int.h | 2 +-
>> block/backup-top.c | 2 +-
>> block/blkdebug.c | 2 +-
>> block/blklogwrites.c | 4 ++--
>> block/blkreplay.c | 2 +-
>> block/copy-on-read.c | 2 +-
>> block/file-posix.c | 7 ++++---
>> block/filter-compress.c | 2 +-
>> block/gluster.c | 7 +++++--
>> block/iscsi.c | 10 +++++-----
>> block/mirror.c | 2 +-
>> block/nbd.c | 6 ++++--
>> block/nvme.c | 14 +++++++++++++-
>> block/preallocate.c | 2 +-
>> block/qcow2.c | 2 +-
>> block/raw-format.c | 2 +-
>> block/sheepdog.c | 15 ++++++++++++++-
>> block/throttle.c | 2 +-
>> tests/unit/test-block-iothread.c | 2 +-
>> block/trace-events | 4 ++--
>> 20 files changed, 61 insertions(+), 30 deletions(-)
>>
>
>> +++ b/block/gluster.c
>> @@ -891,6 +891,7 @@ out:
>> static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
>> {
>> bs->bl.max_transfer = GLUSTER_MAX_TRANSFER;
>> + bs->bl.max_pdiscard = SIZE_MAX;
>
> We probably want this to be MIN(GLUSTER_MAX_TRANSFER, SIZE_MAX). Also,
> do we want to round it down to alignment boundaries?
I don't think so. We just call glfs_discard_async(), which is not part of QEMU, so I think we shouldn't assume any restrictions beyond the argument types. The size argument is size_t, so the maximum is SIZE_MAX.
>
>> +++ b/block/iscsi.c
>> @@ -1141,7 +1141,8 @@ iscsi_getlength(BlockDriverState *bs)
>> }
>>
>> static int
>> -coroutine_fn iscsi_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes)
>> +coroutine_fn iscsi_co_pdiscard(BlockDriverState *bs, int64_t offset,
>> + int64_t bytes)
>> {
>> IscsiLun *iscsilun = bs->opaque;
>> struct IscsiTask iTask;
>
> Did you want to add some sort of assert(bytes / iscsilun->block_size
> <= UINT32_MAX), or a comment that we are relying on bl.max_pdiscard?
Yes, will add. We store it in list.num, which is uint32_t, and we don't want it to overflow.
>
>> +++ b/block/sheepdog.c
>
>> +static void sd_refresh_limits(BlockDriverState *bs, Error **errp)
>> +{
>> + bs->bl.max_pdiscard = INT_MAX;
>
> Do we want to round this down to alignment?
>
Anyway, block/sheepdog.c no longer exists :)
--
Best regards,
Vladimir