[Qemu-devel] [PATCH for-2.11] nbd: Don't crash when server reports NBD_CMD_READ failure

Eric Blake posted 1 patch 6 years, 4 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20171112013936.5942-1-eblake@redhat.com
Test checkpatch passed
Test docker passed
Test ppc passed
Test s390x passed
block/nbd-client.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
[Qemu-devel] [PATCH for-2.11] nbd: Don't crash when server reports NBD_CMD_READ failure
Posted by Eric Blake 6 years, 4 months ago
If a server fails a read, for example with EIO, but the connection
is still live, then we would crash trying to print a non-existent
error message.  Bug introduced in commit f140e300.

Signed-off-by: Eric Blake <eblake@redhat.com>
---
 block/nbd-client.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index b44d4d4a01..5f3375a970 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -78,7 +78,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
     while (!s->quit) {
         assert(s->reply.handle == 0);
         ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
-        if (ret < 0) {
+        if (local_err) {
             error_report_err(local_err);
         }
         if (ret <= 0) {
@@ -681,7 +681,7 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,

     ret = nbd_co_receive_cmdread_reply(client, request.handle, offset, qiov,
                                        &local_err);
-    if (ret < 0) {
+    if (local_err) {
         error_report_err(local_err);
     }
     return ret;
-- 
2.13.6


Re: [Qemu-devel] [PATCH for-2.11] nbd: Don't crash when server reports NBD_CMD_READ failure
Posted by Vladimir Sementsov-Ogievskiy 6 years, 4 months ago
12.11.2017 04:39, Eric Blake wrote:
> If a server fails a read, for example with EIO, but the connection
> is still live, then we would crash trying to print a non-existent
> error message.  Bug introduced in commit f140e300.
>
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>   block/nbd-client.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index b44d4d4a01..5f3375a970 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -78,7 +78,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>       while (!s->quit) {
>           assert(s->reply.handle == 0);
>           ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);

hmm.

/* nbd_receive_reply
  * Returns 1 on success
  *         0 on eof, when no data was read (errp is not set)
  *         negative errno on failure (errp is set)
  */
int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)


- looks like errp should be set if ret < 0. may be the function should 
be fixed?
- don't you think that this function returns transferred server error? 
it doesn't.

> -        if (ret < 0) {
> +        if (local_err) {
>               error_report_err(local_err);
>           }
>           if (ret <= 0) {
> @@ -681,7 +681,7 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
>
>       ret = nbd_co_receive_cmdread_reply(client, request.handle, offset, qiov,
>                                          &local_err);
> -    if (ret < 0) {
> +    if (local_err) {
>           error_report_err(local_err);
>       }
>       return ret;

but here, it looks like you are right. My first attempt was to store 
server error to ret and
server error msg to errp (bad idea, as I can see now), but you proposed 
to do not print every server error msg and
deleted this functionality. And now we have a set of functions which can 
return ret < 0
and not set errp.. It looks very confusing, I think. Better solution is 
to refactor this somehow,
may be not mixing server-error-reply with local normal errors..

For now, for second chunk:

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>



-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH for-2.11] nbd: Don't crash when server reports NBD_CMD_READ failure
Posted by Eric Blake 6 years, 4 months ago
On 11/13/2017 05:37 AM, Vladimir Sementsov-Ogievskiy wrote:
> 12.11.2017 04:39, Eric Blake wrote:
>> If a server fails a read, for example with EIO, but the connection
>> is still live, then we would crash trying to print a non-existent
>> error message.  Bug introduced in commit f140e300.
>>
>> Signed-off-by: Eric Blake <eblake@redhat.com>
>> ---
>>   block/nbd-client.c | 4 ++--
>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/block/nbd-client.c b/block/nbd-client.c
>> index b44d4d4a01..5f3375a970 100644
>> --- a/block/nbd-client.c
>> +++ b/block/nbd-client.c
>> @@ -78,7 +78,7 @@ static coroutine_fn void nbd_read_reply_entry(void
>> *opaque)
>>       while (!s->quit) {
>>           assert(s->reply.handle == 0);
>>           ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
> 
> hmm.
> 
> /* nbd_receive_reply
>  * Returns 1 on success
>  *         0 on eof, when no data was read (errp is not set)
>  *         negative errno on failure (errp is set)
>  */
> int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
> 
> 
> - looks like errp should be set if ret < 0. may be the function should
> be fixed?
> - don't you think that this function returns transferred server error?
> it doesn't.
> 
>> -        if (ret < 0) {
>> +        if (local_err) {
>>               error_report_err(local_err);

In my testing, I did not trip on this error_report_err() crashing.
Which means I think we are already honoring our contract:
nbd_receive_reply() is setting local_err if it returns -1.  Thus, for
this call site, 'if (ret < 0)' is equivalent to 'if (local_err)'.  The
only reason to change it, then, is for consistency with the other 2
callers of error_report_err() in this file...

>>           }
>>           if (ret <= 0) {
>> @@ -681,7 +681,7 @@ int nbd_client_co_preadv(BlockDriverState *bs,
>> uint64_t offset,
>>
>>       ret = nbd_co_receive_cmdread_reply(client, request.handle,
>> offset, qiov,
>>                                          &local_err);
>> -    if (ret < 0) {
>> +    if (local_err) {
>>           error_report_err(local_err);
>>       }
>>       return ret;
> 
> but here, it looks like you are right. My first attempt was to store
> server error to ret and
> server error msg to errp (bad idea, as I can see now), but you proposed
> to do not print every server error msg and
> deleted this functionality. And now we have a set of functions which can
> return ret < 0
> and not set errp..

Indeed, this is the caller that MUST return a negative value to get the
caller to report a failed operation, but where the negative value does
NOT imply a dead connection unless local_err was also set, so here, we
must not print anything unless local_err was set.  I caught one of the
two spots like this while revising your patch prior to soft freeze, but
missed this one.

> It looks very confusing, I think. Better solution is
> to refactor this somehow,
> may be not mixing server-error-reply with local normal errors..

A refactoring may be nice, but it would be 2.12 material; at this point,
I want to minimize what further changes go into 2.11.  That said,

> 
> For now, for second chunk:

I see no problem with both chunks going in.  The second is a bug fix,
but the first is for consistency, and I'd rather have all callers to
error_report_err() look consistent, rather than trying to remember that
'ret < 0' does not always imply errp is set.

> 
> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> 
> 
> 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH for-2.11] nbd: Don't crash when server reports NBD_CMD_READ failure
Posted by Vladimir Sementsov-Ogievskiy 6 years, 4 months ago
13.11.2017 17:45, Eric Blake wrote:
> On 11/13/2017 05:37 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 12.11.2017 04:39, Eric Blake wrote:
>>> If a server fails a read, for example with EIO, but the connection
>>> is still live, then we would crash trying to print a non-existent
>>> error message.  Bug introduced in commit f140e300.
>>>
>>> Signed-off-by: Eric Blake <eblake@redhat.com>
>>> ---
>>>    block/nbd-client.c | 4 ++--
>>>    1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/block/nbd-client.c b/block/nbd-client.c
>>> index b44d4d4a01..5f3375a970 100644
>>> --- a/block/nbd-client.c
>>> +++ b/block/nbd-client.c
>>> @@ -78,7 +78,7 @@ static coroutine_fn void nbd_read_reply_entry(void
>>> *opaque)
>>>        while (!s->quit) {
>>>            assert(s->reply.handle == 0);
>>>            ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
>> hmm.
>>
>> /* nbd_receive_reply
>>   * Returns 1 on success
>>   *         0 on eof, when no data was read (errp is not set)
>>   *         negative errno on failure (errp is set)
>>   */
>> int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
>>
>>
>> - looks like errp should be set if ret < 0. may be the function should
>> be fixed?
>> - don't you think that this function returns transferred server error?
>> it doesn't.
>>
>>> -        if (ret < 0) {
>>> +        if (local_err) {
>>>                error_report_err(local_err);
> In my testing, I did not trip on this error_report_err() crashing.
> Which means I think we are already honoring our contract:
> nbd_receive_reply() is setting local_err if it returns -1.  Thus, for
> this call site, 'if (ret < 0)' is equivalent to 'if (local_err)'.  The
> only reason to change it, then, is for consistency with the other 2
> callers of error_report_err() in this file...
>
>>>            }
>>>            if (ret <= 0) {
>>> @@ -681,7 +681,7 @@ int nbd_client_co_preadv(BlockDriverState *bs,
>>> uint64_t offset,
>>>
>>>        ret = nbd_co_receive_cmdread_reply(client, request.handle,
>>> offset, qiov,
>>>                                           &local_err);
>>> -    if (ret < 0) {
>>> +    if (local_err) {
>>>            error_report_err(local_err);
>>>        }
>>>        return ret;
>> but here, it looks like you are right. My first attempt was to store
>> server error to ret and
>> server error msg to errp (bad idea, as I can see now), but you proposed
>> to do not print every server error msg and
>> deleted this functionality. And now we have a set of functions which can
>> return ret < 0
>> and not set errp..
> Indeed, this is the caller that MUST return a negative value to get the
> caller to report a failed operation, but where the negative value does
> NOT imply a dead connection unless local_err was also set, so here, we
> must not print anything unless local_err was set.  I caught one of the
> two spots like this while revising your patch prior to soft freeze, but
> missed this one.
>
>> It looks very confusing, I think. Better solution is
>> to refactor this somehow,
>> may be not mixing server-error-reply with local normal errors..
> A refactoring may be nice, but it would be 2.12 material; at this point,
> I want to minimize what further changes go into 2.11.  That said,
>
>> For now, for second chunk:
> I see no problem with both chunks going in.  The second is a bug fix,
> but the first is for consistency, and I'd rather have all callers to
> error_report_err() look consistent, rather than trying to remember that
> 'ret < 0' does not always imply errp is set.

ok, lets proceed with both chunks.

>
>> Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>
>>
>>


-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH for-2.11] nbd: Don't crash when server reports NBD_CMD_READ failure
Posted by Eric Blake 6 years, 4 months ago
On 11/11/2017 07:39 PM, Eric Blake wrote:
> If a server fails a read, for example with EIO, but the connection
> is still live, then we would crash trying to print a non-existent
> error message.  Bug introduced in commit f140e300.
> 
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
>  block/nbd-client.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Queued on my NBD tree; will be in 2.11 (I may do another pull request
prior to -rc1, otherwise it will be -rc2).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org