[PATCH for-5.1] nbd: Fix large trim/zero requests

Eric Blake posted 1 patch 3 years, 9 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20200722212231.535072-1-eblake@redhat.com
Maintainers: Eric Blake <eblake@redhat.com>
[PATCH for-5.1] nbd: Fix large trim/zero requests
Posted by Eric Blake 3 years, 9 months ago
Although qemu as NBD client limits requests to <2G, the NBD protocol
allows clients to send requests almost all the way up to 4G.  But
because our block layer is not yet 64-bit clean, we accidentally wrap
such requests into a negative size, and fail with EIO instead of
performing the intended operation.

The bug is visible in modern systems with something as simple as:

$ qemu-img create -f qcow2 /tmp/image.img 5G
$ sudo qemu-nbd --connect=/dev/nbd0 /tmp/image.img
$ sudo blkdiscard /dev/nbd0

or with user-space only:

$ truncate --size=3G file
$ qemu-nbd -f raw file
$ nbdsh -u nbd://localhost:10809 -c 'h.trim(3*1024*1024*1024,0)'

Alas, our iotests do not currently make it easy to add external
dependencies on blkdiscard or nbdsh, so we have to rely on manual
testing for now.

This patch can be reverted when we later improve the overall block
layer to be 64-bit clean, but for now, a minimal fix was deemed less
risky prior to release.

CC: qemu-stable@nongnu.org
Fixes: 1f4d6d18ed
Fixes: 1c6c4bb7f0
Fixes: https://github.com/systemd/systemd/issues/16242
Signed-off-by: Eric Blake <eblake@redhat.com>
---
 nbd/server.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 4752a6c8bc07..029618017c90 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -2378,8 +2378,17 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
         if (request->flags & NBD_CMD_FLAG_FAST_ZERO) {
             flags |= BDRV_REQ_NO_FALLBACK;
         }
-        ret = blk_pwrite_zeroes(exp->blk, request->from + exp->dev_offset,
-                                request->len, flags);
+        ret = 0;
+        /* FIXME simplify this when blk_pwrite_zeroes switches to 64-bit */
+        while (ret == 0 && request->len) {
+            int align = client->check_align ?: 1;
+            int len = MIN(request->len, QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES,
+                                                        align));
+            ret = blk_pwrite_zeroes(exp->blk, request->from + exp->dev_offset,
+                                    len, flags);
+            request->len -= len;
+            request->from += len;
+        }
         return nbd_send_generic_reply(client, request->handle, ret,
                                       "writing to file failed", errp);

@@ -2393,8 +2402,17 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
                                       "flush failed", errp);

     case NBD_CMD_TRIM:
-        ret = blk_co_pdiscard(exp->blk, request->from + exp->dev_offset,
-                              request->len);
+        ret = 0;
+        /* FIXME simplify this when blk_co_pdiscard switches to 64-bit */
+        while (ret == 0 && request->len) {
+            int align = client->check_align ?: 1;
+            int len = MIN(request->len, QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES,
+                                                        align));
+            ret = blk_co_pdiscard(exp->blk, request->from + exp->dev_offset,
+                                  len);
+            request->len -= len;
+            request->from += len;
+        }
         if (ret == 0 && request->flags & NBD_CMD_FLAG_FUA) {
             ret = blk_co_flush(exp->blk);
         }
-- 
2.27.0


Re: [PATCH for-5.1] nbd: Fix large trim/zero requests
Posted by Vladimir Sementsov-Ogievskiy 3 years, 9 months ago
23.07.2020 00:22, Eric Blake wrote:
> Although qemu as NBD client limits requests to <2G, the NBD protocol
> allows clients to send requests almost all the way up to 4G.  But
> because our block layer is not yet 64-bit clean, we accidentally wrap
> such requests into a negative size, and fail with EIO instead of
> performing the intended operation.
> 
> The bug is visible in modern systems with something as simple as:
> 
> $ qemu-img create -f qcow2 /tmp/image.img 5G
> $ sudo qemu-nbd --connect=/dev/nbd0 /tmp/image.img
> $ sudo blkdiscard /dev/nbd0
> 
> or with user-space only:
> 
> $ truncate --size=3G file
> $ qemu-nbd -f raw file
> $ nbdsh -u nbd://localhost:10809 -c 'h.trim(3*1024*1024*1024,0)'
> 
> Alas, our iotests do not currently make it easy to add external
> dependencies on blkdiscard or nbdsh, so we have to rely on manual
> testing for now.
> 
> This patch can be reverted when we later improve the overall block
> layer to be 64-bit clean, but for now, a minimal fix was deemed less
> risky prior to release.
> 
> CC: qemu-stable@nongnu.org
> Fixes: 1f4d6d18ed
> Fixes: 1c6c4bb7f0
> Fixes: https://github.com/systemd/systemd/issues/16242
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   nbd/server.c | 26 ++++++++++++++++++++++----
>   1 file changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 4752a6c8bc07..029618017c90 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -2378,8 +2378,17 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
>           if (request->flags & NBD_CMD_FLAG_FAST_ZERO) {
>               flags |= BDRV_REQ_NO_FALLBACK;
>           }
> -        ret = blk_pwrite_zeroes(exp->blk, request->from + exp->dev_offset,
> -                                request->len, flags);
> +        ret = 0;
> +        /* FIXME simplify this when blk_pwrite_zeroes switches to 64-bit */
> +        while (ret == 0 && request->len) {
> +            int align = client->check_align ?: 1;
> +            int len = MIN(request->len, QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES,
> +                                                        align));
> +            ret = blk_pwrite_zeroes(exp->blk, request->from + exp->dev_offset,
> +                                    len, flags);
> +            request->len -= len;
> +            request->from += len;
> +        }
>           return nbd_send_generic_reply(client, request->handle, ret,
>                                         "writing to file failed", errp);
> 
> @@ -2393,8 +2402,17 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
>                                         "flush failed", errp);
> 
>       case NBD_CMD_TRIM:
> -        ret = blk_co_pdiscard(exp->blk, request->from + exp->dev_offset,
> -                              request->len);
> +        ret = 0;
> +        /* FIXME simplify this when blk_co_pdiscard switches to 64-bit */
> +        while (ret == 0 && request->len) {

Did you check that none of these paths return a positive ret on success? I'd prefer to compare ret >= 0 in this temporary code so we don't have to care about that.

> +            int align = client->check_align ?: 1;
> +            int len = MIN(request->len, QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES,
> +                                                        align));
> +            ret = blk_co_pdiscard(exp->blk, request->from + exp->dev_offset,
> +                                  len);
> +            request->len -= len;
> +            request->from += len;

Hmm.. this modifies the function parameter. It's safe for now, as the nbd_handle_request() call is the last use of request in nbd_trip().

> +        }
>           if (ret == 0 && request->flags & NBD_CMD_FLAG_FUA) {
>               ret = blk_co_flush(exp->blk);
>           }
> 


-- 
Best regards,
Vladimir

Re: [PATCH for-5.1] nbd: Fix large trim/zero requests
Posted by Eric Blake 3 years, 9 months ago
On 7/23/20 2:23 AM, Vladimir Sementsov-Ogievskiy wrote:
> 23.07.2020 00:22, Eric Blake wrote:
>> Although qemu as NBD client limits requests to <2G, the NBD protocol
>> allows clients to send requests almost all the way up to 4G.  But
>> because our block layer is not yet 64-bit clean, we accidentally wrap
>> such requests into a negative size, and fail with EIO instead of
>> performing the intended operation.
>>

>> @@ -2378,8 +2378,17 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
>>           if (request->flags & NBD_CMD_FLAG_FAST_ZERO) {
>>               flags |= BDRV_REQ_NO_FALLBACK;
>>           }
>> -        ret = blk_pwrite_zeroes(exp->blk, request->from + exp->dev_offset,
>> -                                request->len, flags);
>> +        ret = 0;
>> +        /* FIXME simplify this when blk_pwrite_zeroes switches to 64-bit */
>> +        while (ret == 0 && request->len) {
>> +            int align = client->check_align ?: 1;
>> +            int len = MIN(request->len, QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES,
>> +                                                        align));
>> +            ret = blk_pwrite_zeroes(exp->blk, request->from + exp->dev_offset,
>> +                                    len, flags);
>> +            request->len -= len;
>> +            request->from += len;
>> +        }
>>           return nbd_send_generic_reply(client, request->handle, ret,
>>                                         "writing to file failed", errp);

It's easy enough to audit that blk_pwrite_zeroes returns 0 (and not an arbitrary positive value) on success.

>>
>> @@ -2393,8 +2402,17 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
>>                                         "flush failed", errp);
>>
>>       case NBD_CMD_TRIM:
>> -        ret = blk_co_pdiscard(exp->blk, request->from + exp->dev_offset,
>> -                              request->len);
>> +        ret = 0;
>> +        /* FIXME simplify this when blk_co_pdiscard switches to 64-bit */
>> +        while (ret == 0 && request->len) {
> 
> Did you check that none of these paths return a positive ret on
> success? I'd prefer to compare ret >= 0 in this temporary code so we
> don't have to care about that.

And for blk_co_pdiscard, the audit is already right here in the patch: 
pre-patch, ret = blk_co_pdiscard, followed by...

> 
>> +            int align = client->check_align ?: 1;
>> +            int len = MIN(request->len, QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES,
>> +                                                        align));
>> +            ret = blk_co_pdiscard(exp->blk, request->from + exp->dev_offset,
>> +                                  len);
>> +            request->len -= len;
>> +            request->from += len;
> 
> Hmm.. this modifies the function parameter. It's safe for now, as the
> nbd_handle_request() call is the last use of request in nbd_trip().
> 
>> +        }
>>           if (ret == 0 && request->flags & NBD_CMD_FLAG_FUA) {

...a check for ret == 0.

>>               ret = blk_co_flush(exp->blk);
>>           }
>>
> 
> 

Yes, I can respin the patch to use >= 0 as the check for success in both 
functions, but it doesn't change the behavior.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org


Re: [PATCH for-5.1] nbd: Fix large trim/zero requests
Posted by Vladimir Sementsov-Ogievskiy 3 years, 9 months ago
23.07.2020 14:47, Eric Blake wrote:
> On 7/23/20 2:23 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 23.07.2020 00:22, Eric Blake wrote:
>>> Although qemu as NBD client limits requests to <2G, the NBD protocol
>>> allows clients to send requests almost all the way up to 4G.  But
>>> because our block layer is not yet 64-bit clean, we accidentally wrap
>>> such requests into a negative size, and fail with EIO instead of
>>> performing the intended operation.
>>>
> 
>>> @@ -2378,8 +2378,17 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
>>>           if (request->flags & NBD_CMD_FLAG_FAST_ZERO) {
>>>               flags |= BDRV_REQ_NO_FALLBACK;
>>>           }
>>> -        ret = blk_pwrite_zeroes(exp->blk, request->from + exp->dev_offset,
>>> -                                request->len, flags);
>>> +        ret = 0;
>>> +        /* FIXME simplify this when blk_pwrite_zeroes switches to 64-bit */
>>> +        while (ret == 0 && request->len) {
>>> +            int align = client->check_align ?: 1;
>>> +            int len = MIN(request->len, QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES,
>>> +                                                        align));
>>> +            ret = blk_pwrite_zeroes(exp->blk, request->from + exp->dev_offset,
>>> +                                    len, flags);
>>> +            request->len -= len;
>>> +            request->from += len;
>>> +        }
>>>           return nbd_send_generic_reply(client, request->handle, ret,
>>>                                         "writing to file failed", errp);
> 
> It's easy enough to audit that blk_pwrite_zeroes returns 0 (and not an arbitrary positive value) on success.

Yes, that's why I posted my comment on blk_co_pdiscard :)

> 
>>>
>>> @@ -2393,8 +2402,17 @@ static coroutine_fn int nbd_handle_request(NBDClient *client,
>>>                                         "flush failed", errp);
>>>
>>>       case NBD_CMD_TRIM:
>>> -        ret = blk_co_pdiscard(exp->blk, request->from + exp->dev_offset,
>>> -                              request->len);
>>> +        ret = 0;
>>> +        /* FIXME simplify this when blk_co_pdiscard switches to 64-bit */
>>> +        while (ret == 0 && request->len) {
>>
>> Did you check that none of these paths return a positive ret on success? I'd prefer to compare ret >= 0 in this temporary code so we don't have to care about that.
> 
> And for blk_co_pdiscard, the audit is already right here in the patch: pre-patch, ret = blk_co_pdiscard, followed by...
> 
>>
>>> +            int align = client->check_align ?: 1;
>>> +            int len = MIN(request->len, QEMU_ALIGN_DOWN(BDRV_REQUEST_MAX_BYTES,
>>> +                                                        align));
>>> +            ret = blk_co_pdiscard(exp->blk, request->from + exp->dev_offset,
>>> +                                  len);
>>> +            request->len -= len;
>>> +            request->from += len;
>>
>> Hmm.. this modifies the function parameter. It's safe for now, as the nbd_handle_request() call is the last use of request in nbd_trip().
>>
>>> +        }
>>>           if (ret == 0 && request->flags & NBD_CMD_FLAG_FUA) {
> 
> ...a check for ret == 0.

Hmm. Is it a bug or not? Anyway, bdrv_co_pdiscard() does "if (ret && ..." as well, so if some driver returned a positive ret,
it would already fail earlier.

> 
>>>               ret = blk_co_flush(exp->blk);
>>>           }
>>>
>>
>>
> 
> Yes, I can respin the patch to use >= 0 as the check for success in both functions, but it doesn't change the behavior.
> 

OK. Anyway:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

-- 
Best regards,
Vladimir