[v3] 64bit block-layer

[PATCH v3 05/17] block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes()

Posted by Vladimir Sementsov-Ogievskiy 5 years, 9 months ago

We are generally moving to int64_t for both offset and bytes parameters
on all io paths.

Main motivation is realization of 64-bit write_zeroes operation for
fast zeroing large disk chunks, up to the whole disk.

We chose signed type, to be consistent with off_t (which is signed) and
with possibility for signed return type (where negative value means
error).

So, prepare bdrv_co_do_pwrite_zeroes() now.

Patch-correctness audit by Eric Blake:

  Widens from 32- to 64-bit.  Callers:

    bdrv_co_do_copy_on_readv() - passes 'int64_t pnum' bounded by
      fragmenting loop limited to MAX_BOUNCE_BUFFER
    bdrv_aligned_pwritev() - passes 'unsigned int bytes' - latent bug
      fix for sizes between 2G and 4G, if any

    to see if that bug could be tickled, look at callers of
    bdrv_aligned_pwritev:

    bdrv_co_do_zero_pwritev() - splits 'unsigned int bytes' into
      head|body|tail; head and tail are safe but body could be > 2G
    bdrv_co_pwritev_part() - gates with bdrv_check_byte_request()

    continuing the audit, callers of bdrv_co_do_zero_pwritev:

    bdrv_co_pwritev_part() - gates with bdrv_check_byte_request()

    okay, all callers pass < 2G per our current code in
    bdrv_check_byte_request(), so there is no actual bug.

  Use of 'bytes' within the function:

    compute 'int tail' via % 'int alignment' - safe
    fragmentation loop 'int num' - still fragments with a cap on
      max_transfer

    use of 'num' within the loop
    compute 'int head' via % 'int alignment' - safe
    clamp size by 'int max_write_zeroes' - safe
    drv->bdrv_co_pwrite_zeroes(int) - safe because of clamping
    clamp size by 'int max_transfer' - safe
    qemu_iovec_init_buf(size_t) - safe because of clamping
    bdrv_driver_pwritev(uint64_t) [well, int64_t after 4/17] - safe

    So even with the wider type, we aren't exceeding the contract of
    anything we pass it on to.  Later patches may improve
    drv->bdrv_co_pwrite_zeroes and qemu_iovec_init_buf to be 64-bit
    clean, at which point we would want to revisit this function to use
    64-bit clamping rather than 32-bit clamping, but it does not have
    to happen here.

Series: 64bit-block-status
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/io.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/io.c b/block/io.c
index eeba3b828c..b83749cc50 100644
--- a/block/io.c
+++ b/block/io.c
@@ -42,7 +42,7 @@
 
 static void bdrv_parent_cb_resize(BlockDriverState *bs);
 static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
-    int64_t offset, int bytes, BdrvRequestFlags flags);
+    int64_t offset, int64_t bytes, BdrvRequestFlags flags);
 
 static void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
                                       bool ignore_bds_parents)
@@ -1575,7 +1575,7 @@ int coroutine_fn bdrv_co_preadv_part(BdrvChild *child,
 }
 
 static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
-    int64_t offset, int bytes, BdrvRequestFlags flags)
+    int64_t offset, int64_t bytes, BdrvRequestFlags flags)
 {
     BlockDriver *drv = bs->drv;
     QEMUIOVector qiov;
@@ -1605,7 +1605,7 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
     assert(max_write_zeroes >= bs->bl.request_alignment);
 
     while (bytes > 0 && !ret) {
-        int num = bytes;
+        int64_t num = bytes;
 
         /* Align request.  Block drivers can expect the "bulk" of the request
          * to be aligned, and that unaligned requests do not cross cluster
-- 
2.21.0

Re: [PATCH v3 05/17] block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes()

Posted by Eric Blake 5 years, 9 months ago

On 4/30/20 6:10 AM, Vladimir Sementsov-Ogievskiy wrote:
> We are generally moving to int64_t for both offset and bytes parameters
> on all io paths.
> 
> Main motivation is realization of 64-bit write_zeroes operation for
> fast zeroing large disk chunks, up to the whole disk.
> 
> We chose signed type, to be consistent with off_t (which is signed) and
> with possibility for signed return type (where negative value means
> error).
> 
> So, prepare bdrv_co_do_pwrite_zeroes() now.
> 
> Patch-correctness audit by Eric Blake:
> 

> 
>      use of 'num' within the loop
>      compute 'int head' via % 'int alignment' - safe
>      clamp size by 'int max_write_zeroes' - safe
>      drv->bdrv_co_pwrite_zeroes(int) - safe because of clamping
>      clamp size by 'int max_transfer' - safe
>      qemu_iovec_init_buf(size_t) - safe because of clamping
>      bdrv_driver_pwritev(uint64_t) [well, int64_t after 4/17] - safe

I know you were quoting me, but the [comment] can be dropped (I wrote my 
audit on patches in isolation while reviewing the pre-series state of 
the tree, but when this commit is finally applied, the previous patch 
will already be in place)

> 
>      So even with the wider type, we aren't exceeding the contract of
>      anything we pass it on to.  Later patches may improve
>      drv->bdrv_co_pwrite_zeroes and qemu_iovec_init_buf to be 64-bit
>      clean, at which point we would want to revisit this function to use
>      64-bit clamping rather than 32-bit clamping, but it does not have
>      to happen here.
> 
> Series: 64bit-block-status
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> ---
>   block/io.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v3 05/17] block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes()

Posted by Alberto Garcia 5 years, 9 months ago

On Thu 30 Apr 2020 01:10:21 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
>     compute 'int tail' via % 'int alignment' - safe

    tail = (offset + bytes) % alignment;

both are int64_t, no chance of overflow here?

Berto

Re: [PATCH v3 05/17] block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes()

Posted by Eric Blake 5 years, 9 months ago

On 5/11/20 12:17 PM, Alberto Garcia wrote:
> On Thu 30 Apr 2020 01:10:21 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
>>      compute 'int tail' via % 'int alignment' - safe
> 
>      tail = (offset + bytes) % alignment;
> 
> both are int64_t, no chance of overflow here?

Good question - I know several places check that offset+bytes does not 
overflow, but did not specifically audit if this one does.  Adding an 
assert() in this function may be easier than trying to prove all callers 
pass in safe values.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org

Re: [PATCH v3 05/17] block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes()

Posted by Vladimir Sementsov-Ogievskiy 5 years, 7 months ago

11.05.2020 21:34, Eric Blake wrote:
> On 5/11/20 12:17 PM, Alberto Garcia wrote:
>> On Thu 30 Apr 2020 01:10:21 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
>>>      compute 'int tail' via % 'int alignment' - safe
>>
>>      tail = (offset + bytes) % alignment;
>>
>> both are int64_t, no chance of overflow here?
> 
> Good question - I know several places check that offset+bytes does not overflow, but did not specifically audit if this one does.  Adding an assert() in this function may be easier than trying to prove all callers pass in safe values.
> 

Hm, it's preexisting, as int64_t + int may overflow as well. Strange, but I don't see overflow check neither in blk_check_byte_request nor in bdrv_check_byte_request. Only discard, which recently dropped call of bdrv_check_byte_request() has this check.

I can add a patch for overflow check in blk_check_byte_request and bdrv_check_byte_request.. But what about alignment? There may be requests, for which bytes + offset doesn't overflow, but do overflow after aligning up. Refactor bdrv_pad_request() to return an error if we can't pad request due to overflow?

-- 
Best regards,
Vladimir

Re: [PATCH v3 05/17] block/io: support int64_t bytes in bdrv_co_do_pwrite_zeroes()

Posted by Eric Blake 5 years, 7 months ago

On 6/23/20 5:20 AM, Vladimir Sementsov-Ogievskiy wrote:
> 11.05.2020 21:34, Eric Blake wrote:
>> On 5/11/20 12:17 PM, Alberto Garcia wrote:
>>> On Thu 30 Apr 2020 01:10:21 PM CEST, Vladimir Sementsov-Ogievskiy wrote:
>>>>      compute 'int tail' via % 'int alignment' - safe
>>>
>>>      tail = (offset + bytes) % alignment;
>>>
>>> both are int64_t, no chance of overflow here?
>>
>> Good question - I know several places check that offset+bytes does not 
>> overflow, but did not specifically audit if this one does.  Adding an 
>> assert() in this function may be easier than trying to prove all 
>> callers pass in safe values.
>>
> 
> Hm, it's preexisting, as int64_t + int may overflow as well. Strange, 
> but I don't see overflow check neither in blk_check_byte_request nor in 
> bdrv_check_byte_request. Only discard, which recently dropped call of 
> bdrv_check_byte_request() has this check.

In fact, iotest 197 (see commit 461743390) is an instance of testing for 
a bug where we overflowed INT_MAX due to rounding up to cluster size, 
even with a transaction request smaller than limits.

> 
> I can add a patch for overflow check in blk_check_byte_request and 
> bdrv_check_byte_request.. But what about alignment? There may be 
> requests, for which bytes + offset doesn't overflow, but do overflow 
> after aligning up. Refactor bdrv_pad_request() to return an error if we 
> can't pad request due to overflow?

The only cases where int64_t + int can overflow due to rounding up for 
alignment are when the file size is extremely close to 2^63 bytes 
already.  The easiest fix is to reject opening a file that reports a 
size that would overflow when rounded up to alignment (that is, if size 
 > INT64_MAX - alignment, we should refuse to proceed).  Such images 
will never occur for actual disk images (because that is really a LOT of 
storage), but are possible over things like NBD (in fact, nbdkit has 
intentionally made it easy to provoke boundary testing near 2^63 bytes, 
and is already aware that anything larger than 2^63-512 is problematic 
in qemu).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org