For block host devices, I/O can happen through either the kernel file
descriptor I/O system calls (preadv/pwritev, io_submit, io_uring)
or the SCSI passthrough ioctl SG_IO.
In the latter case, the size of each transfer can be limited by the
HBA, while for file descriptor I/O the kernel is able to split and
merge I/O in smaller pieces as needed. Applying the HBA limits to
file descriptor I/O results in more system calls and suboptimal
performance, so this patch splits the max_transfer limit in two:
max_transfer remains valid and is used in general, while max_hw_transfer
is limited to the maximum hardware size. max_hw_transfer can then be
included by the scsi-generic driver in the block limits page, to ensure
that the stricter hardware limit is used.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
block/block-backend.c | 12 ++++++++++++
block/file-posix.c | 2 +-
block/io.c | 1 +
hw/scsi/scsi-generic.c | 2 +-
include/block/block_int.h | 7 +++++++
include/sysemu/block-backend.h | 1 +
6 files changed, 23 insertions(+), 2 deletions(-)
diff --git a/block/block-backend.c b/block/block-backend.c
index 15f1ea4288..2ea1412a54 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -1953,6 +1953,18 @@ uint32_t blk_get_request_alignment(BlockBackend *blk)
return bs ? bs->bl.request_alignment : BDRV_SECTOR_SIZE;
}
+/* Returns the maximum hardware transfer length, in bytes; guaranteed nonzero */
+uint64_t blk_get_max_hw_transfer(BlockBackend *blk)
+{
+ BlockDriverState *bs = blk_bs(blk);
+ uint64_t max = INT_MAX;
+
+ if (bs) {
+ max = MIN_NON_ZERO(bs->bl.max_hw_transfer, bs->bl.max_transfer);
+ }
+ return max;
+}
+
/* Returns the maximum transfer length, in bytes; guaranteed nonzero */
uint32_t blk_get_max_transfer(BlockBackend *blk)
{
diff --git a/block/file-posix.c b/block/file-posix.c
index e3241a0dd3..f55f92d0f5 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1234,7 +1234,7 @@ static void raw_refresh_limits(BlockDriverState *bs, Error **errp)
int ret = sg_get_max_transfer_length(s->fd);
if (ret > 0 && ret <= BDRV_REQUEST_MAX_BYTES) {
- bs->bl.max_transfer = pow2floor(ret);
+ bs->bl.max_hw_transfer = pow2floor(ret);
}
ret = sg_get_max_segments(s->fd);
diff --git a/block/io.c b/block/io.c
index 323854d063..089b99bb0c 100644
--- a/block/io.c
+++ b/block/io.c
@@ -127,6 +127,7 @@ static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src)
{
dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer);
dst->max_transfer = MIN_NON_ZERO(dst->max_transfer, src->max_transfer);
+ dst->max_hw_transfer = MIN_NON_ZERO(dst->max_hw_transfer, src->max_hw_transfer);
dst->opt_mem_alignment = MAX(dst->opt_mem_alignment,
src->opt_mem_alignment);
dst->min_mem_alignment = MAX(dst->min_mem_alignment,
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 82e1e2ee79..3762dce749 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -179,7 +179,7 @@ static void scsi_handle_inquiry_reply(SCSIGenericReq *r, SCSIDevice *s)
(r->req.cmd.buf[1] & 0x01)) {
page = r->req.cmd.buf[2];
if (page == 0xb0) {
- uint32_t max_transfer = blk_get_max_transfer(s->conf.blk);
+ uint64_t max_transfer = blk_get_max_hw_transfer(s->conf.blk);
uint32_t max_iov = blk_get_max_iov(s->conf.blk);
assert(max_transfer);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 057d88b1fc..f1a54db0f8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -695,6 +695,13 @@ typedef struct BlockLimits {
* clamped down. */
uint32_t max_transfer;
+ /* Maximal hardware transfer length in bytes. Applies whenever
+ * transfers to the device bypass the kernel I/O scheduler, for
+ * example with SG_IO. If larger than max_transfer or if zero,
+ * blk_get_max_hw_transfer will fall back to max_transfer.
+ */
+ uint64_t max_hw_transfer;
+
/* memory alignment, in bytes so that no bounce buffer is needed */
size_t min_mem_alignment;
diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h
index 5423e3d9c6..9ac5f7bbd3 100644
--- a/include/sysemu/block-backend.h
+++ b/include/sysemu/block-backend.h
@@ -208,6 +208,7 @@ void blk_eject(BlockBackend *blk, bool eject_flag);
int blk_get_flags(BlockBackend *blk);
uint32_t blk_get_request_alignment(BlockBackend *blk);
uint32_t blk_get_max_transfer(BlockBackend *blk);
+uint64_t blk_get_max_hw_transfer(BlockBackend *blk);
int blk_get_max_iov(BlockBackend *blk);
void blk_set_guest_block_size(BlockBackend *blk, int align);
void *blk_try_blockalign(BlockBackend *blk, size_t size);
--
2.31.1
On Thu, Jun 03, 2021 at 03:37:18PM +0200, Paolo Bonzini wrote:
> For block host devices, I/O can happen through either the kernel file
> descriptor I/O system calls (preadv/pwritev, io_submit, io_uring)
> or the SCSI passthrough ioctl SG_IO.
>
> In the latter case, the size of each transfer can be limited by the
> HBA, while for file descriptor I/O the kernel is able to split and
> merge I/O in smaller pieces as needed. Applying the HBA limits to
> file descriptor I/O results in more system calls and suboptimal
> performance, so this patch splits the max_transfer limit in two:
> max_transfer remains valid and is used in general, while max_hw_transfer
> is limited to the maximum hardware size. max_hw_transfer can then be
> included by the scsi-generic driver in the block limits page, to ensure
> that the stricter hardware limit is used.
>
> +/* Returns the maximum hardware transfer length, in bytes; guaranteed nonzero */
> +uint64_t blk_get_max_hw_transfer(BlockBackend *blk)
> +{
> + BlockDriverState *bs = blk_bs(blk);
> + uint64_t max = INT_MAX;
This is an unaligned value; should we instead round it down to the
request_alignment granularity?
> +
> + if (bs) {
> + max = MIN_NON_ZERO(bs->bl.max_hw_transfer, bs->bl.max_transfer);
> + }
> + return max;
> +}
> +
> /* Returns the maximum transfer length, in bytes; guaranteed nonzero */
> uint32_t blk_get_max_transfer(BlockBackend *blk)
> {
> +++ b/include/block/block_int.h
> @@ -695,6 +695,13 @@ typedef struct BlockLimits {
> * clamped down. */
> uint32_t max_transfer;
>
> + /* Maximal hardware transfer length in bytes. Applies whenever
Leading /* on its own line, per our style.
> + * transfers to the device bypass the kernel I/O scheduler, for
> + * example with SG_IO. If larger than max_transfer or if zero,
> + * blk_get_max_hw_transfer will fall back to max_transfer.
> + */
Should we mandate any additional requirements on this value such as
multiple of request_alignment or even power-of-2?
> + uint64_t max_hw_transfer;
> +
> /* memory alignment, in bytes so that no bounce buffer is needed */
> size_t min_mem_alignment;
>
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
On 03/06/21 19:33, Eric Blake wrote:
>> +/* Returns the maximum hardware transfer length, in bytes; guaranteed nonzero */
>> +uint64_t blk_get_max_hw_transfer(BlockBackend *blk)
>> +{
>> + BlockDriverState *bs = blk_bs(blk);
>> + uint64_t max = INT_MAX;
>
> This is an unaligned value; should we instead round it down to the
> request_alignment granularity?
See below...
>> +++ b/include/block/block_int.h
>> @@ -695,6 +695,13 @@ typedef struct BlockLimits {
>> * clamped down. */
>> uint32_t max_transfer;
>>
>> + /* Maximal hardware transfer length in bytes. Applies whenever
>
> Leading /* on its own line, per our style.
The whole file still uses this style, I can change it if desired or do
it later for the whole file or even the whole block subsystem.
>> + * transfers to the device bypass the kernel I/O scheduler, for
>> + * example with SG_IO. If larger than max_transfer or if zero,
>> + * blk_get_max_hw_transfer will fall back to max_transfer.
>> + */
>
> Should we mandate any additional requirements on this value such as
> multiple of request_alignment or even power-of-2?
Certainly not power of 2. Multiple of request_alignment probably makes
sense, but max_transfer doesn't have that limit.
Paolo
>> + uint64_t max_hw_transfer;
>> +
>> /* memory alignment, in bytes so that no bounce buffer is needed */
>> size_t min_mem_alignment;
>>
>
On 03.06.21 15:37, Paolo Bonzini wrote:
> For block host devices, I/O can happen through either the kernel file
> descriptor I/O system calls (preadv/pwritev, io_submit, io_uring)
> or the SCSI passthrough ioctl SG_IO.
>
> In the latter case, the size of each transfer can be limited by the
> HBA, while for file descriptor I/O the kernel is able to split and
> merge I/O in smaller pieces as needed. Applying the HBA limits to
> file descriptor I/O results in more system calls and suboptimal
> performance, so this patch splits the max_transfer limit in two:
> max_transfer remains valid and is used in general, while max_hw_transfer
> is limited to the maximum hardware size. max_hw_transfer can then be
> included by the scsi-generic driver in the block limits page, to ensure
> that the stricter hardware limit is used.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block/block-backend.c | 12 ++++++++++++
> block/file-posix.c | 2 +-
> block/io.c | 1 +
> hw/scsi/scsi-generic.c | 2 +-
> include/block/block_int.h | 7 +++++++
> include/sysemu/block-backend.h | 1 +
> 6 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 15f1ea4288..2ea1412a54 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -1953,6 +1953,18 @@ uint32_t blk_get_request_alignment(BlockBackend *blk)
> return bs ? bs->bl.request_alignment : BDRV_SECTOR_SIZE;
> }
>
> +/* Returns the maximum hardware transfer length, in bytes; guaranteed nonzero */
> +uint64_t blk_get_max_hw_transfer(BlockBackend *blk)
> +{
> + BlockDriverState *bs = blk_bs(blk);
> + uint64_t max = INT_MAX;
> +
> + if (bs) {
> + max = MIN_NON_ZERO(bs->bl.max_hw_transfer, bs->bl.max_transfer);
> + }
> + return max;
Both `max_hw_transfer` and `max_transfer` can be 0, so this could return
0, contrary to what the comment above promises.
Should `max` be initialized to 0 with a `MIN_NON_ZERO(max, INT_MAX)`
here (like `blk_get_max_transfer()` does it)?
(As for the rest, I think aligning to the request alignment makes sense,
but then again we don’t do that for max_transfer either, so... this at
least wouldn’t be a new bug.
Regarding the comment, checkpatch complains about it, so it should be
fixed so that /* is on its own line.
Speaking of checkpatch, now that I ran it, it also complains about the
new line in bdrv_merge_limits() exceeding 80 characters, so that should
be fixed, too.)
Max
On 15/06/21 18:18, Max Reitz wrote:
>> }
>> +/* Returns the maximum hardware transfer length, in bytes; guaranteed
>> nonzero */
>> +uint64_t blk_get_max_hw_transfer(BlockBackend *blk)
>> +{
>> + BlockDriverState *bs = blk_bs(blk);
>> + uint64_t max = INT_MAX;
>> +
>> + if (bs) {
>> + max = MIN_NON_ZERO(bs->bl.max_hw_transfer, bs->bl.max_transfer);
>> + }
>> + return max;
>
> Both `max_hw_transfer` and `max_transfer` can be 0, so this could return
> 0, contrary to what the comment above promises.
>
> Should `max` be initialized to 0 with a `MIN_NON_ZERO(max, INT_MAX)`
> here (like `blk_get_max_transfer()` does it)?
Yes, something to that effect.
> (As for the rest, I think aligning to the request alignment makes sense,
> but then again we don’t do that for max_transfer either, so... this at
> least wouldn’t be a new bug.
Ok, will do. I will also add a new patch to align max_transfer to the
request alignment.
> Regarding the comment, checkpatch complains about it, so it should be
> fixed so that /* is on its own line.
That makes it different from every other comment in block_int.h though.
Is it okay to fix all of them in a follow-up?
Paolo
> Speaking of checkpatch, now that I ran it, it also complains about the
> new line in bdrv_merge_limits() exceeding 80 characters, so that should
> be fixed, too.)
On 16.06.21 15:18, Paolo Bonzini wrote:
> On 15/06/21 18:18, Max Reitz wrote:
>>> }
>>> +/* Returns the maximum hardware transfer length, in bytes;
>>> guaranteed nonzero */
>>> +uint64_t blk_get_max_hw_transfer(BlockBackend *blk)
>>> +{
>>> + BlockDriverState *bs = blk_bs(blk);
>>> + uint64_t max = INT_MAX;
>>> +
>>> + if (bs) {
>>> + max = MIN_NON_ZERO(bs->bl.max_hw_transfer,
>>> bs->bl.max_transfer);
>>> + }
>>> + return max;
>>
>> Both `max_hw_transfer` and `max_transfer` can be 0, so this could
>> return 0, contrary to what the comment above promises.
>>
>> Should `max` be initialized to 0 with a `MIN_NON_ZERO(max, INT_MAX)`
>> here (like `blk_get_max_transfer()` does it)?
>
> Yes, something to that effect.
>
>> (As for the rest, I think aligning to the request alignment makes
>> sense, but then again we don’t do that for max_transfer either, so...
>> this at least wouldn’t be a new bug.
>
> Ok, will do. I will also add a new patch to align max_transfer to the
> request alignment.
>
>> Regarding the comment, checkpatch complains about it, so it should be
>> fixed so that /* is on its own line.
>
> That makes it different from every other comment in block_int.h
> though. Is it okay to fix all of them in a follow-up?
The reason it’s different is that the comment style in question was
added to checkpatch only relatively recently. I can’t speak for others,
but I’m a simple person. I just do what makes checkpatch happy. :)
Given that the checkpatch complaint is only a warning, I think it’s OK
to keep the comment as it is here, and perhaps optionally fix all
comments in block_int.h in a follow-up. I don’t think we need to fix
existing comments, but, well, it wouldn’t be wrong.
Max
On 16/06/21 15:46, Max Reitz wrote: > Given that the checkpatch complaint is only a warning, I think it’s OK > to keep the comment as it is here, and perhaps optionally fix all > comments in block_int.h in a follow-up. I don’t think we need to fix > existing comments, but, well, it wouldn’t be wrong. We don't need to, but sometimes the benefit (of not discussing checkpatch over and over) does seem to outweigh the churn. Thanks for the review! Paolo
© 2016 - 2026 Red Hat, Inc.