Changeset
include/block/nbd.h                      | 132 +++++++++-
nbd/nbd-internal.h                       |  56 ++---
block/nbd-client.c                       | 401 ++++++++++++++++++++++++++++---
nbd/client.c                             | 205 ++++++++++------
nbd/server.c                             | 189 ++++++++++-----
nbd/trace-events                         |  12 +-
tests/qemu-iotests/nbd-fault-injector.py |   4 +-
7 files changed, 784 insertions(+), 215 deletions(-)
Git apply log
Switched to a new branch '20171012095319.136610-1-vsementsov@virtuozzo.com'
Applying: block/nbd-client: assert qiov len once in nbd_co_request
Applying: block/nbd-client: refactor nbd_co_receive_reply
Applying: nbd: rename some simple-request related objects to be _simple_
Applying: nbd/server: structurize simple reply header sending
Applying: nbd/server: do not use NBDReply structure
Applying: nbd/server: refactor nbd_co_send_simple_reply parameters
Applying: nbd-server: simplify reply transmission
Applying: nbd: header constants indenting
Applying: nbd: Minimal structured read for server
Applying: nbd/client: refactor nbd_receive_starttls
Applying: nbd: share some nbd entities to be reused in block/nbd-client.c
Applying: nbd/client: prepare nbd_receive_reply for structured reply
Applying: nbd: Minimal structured read for client
To https://github.com/patchew-project/qemu
 + 3b431196e5...a7fecd8798 patchew/20171012095319.136610-1-vsementsov@virtuozzo.com -> patchew/20171012095319.136610-1-vsementsov@virtuozzo.com (forced update)
Test passed: s390x

loading

Test passed: docker

loading

Test failed: checkpatch

loading

[Qemu-devel] [PATCH v3 00/13] nbd minimal structured read
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
Minimally implement nbd structured read extension.

v3:

clone: tag up-nbd-minimal-structured-read-v3 from https://src.openvz.org/scm/~vsementsov/qemu.git
online: https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=up-nbd-minimal-structured-read-v3

03: rename  nbd_co_send_reply here too, remove r-b
04-07: splitted nbd server refactoring, with some changes
08: add Eric's r-b
13: rewrite receiving loops by Paolo's suggestion


v2:

01: improve assertion (Eric)
02: add Eric's r-b
06: define NBD_SREP_TYPE_OFFSET_HOLE and NBD_SREP_TYPE_ERROR_OFFSET
    here (Eric)
07: fix return value of nbd_request_simple_option
08-09: new patches
10: mostly rewritten

Vladimir Sementsov-Ogievskiy (13):
  block/nbd-client: assert qiov len once in nbd_co_request
  block/nbd-client: refactor nbd_co_receive_reply
  nbd: rename some simple-request related objects to be _simple_
  nbd/server: structurize simple reply header sending
  nbd/server: do not use NBDReply structure
  nbd/server: refactor nbd_co_send_simple_reply parameters
  nbd-server: simplify reply transmission
  nbd: header constants indenting
  nbd: Minimal structured read for server
  nbd/client: refactor nbd_receive_starttls
  nbd: share some nbd entities to be reused in block/nbd-client.c
  nbd/client: prepare nbd_receive_reply for structured reply
  nbd: Minimal structured read for client

 include/block/nbd.h                      | 132 +++++++++-
 nbd/nbd-internal.h                       |  56 ++---
 block/nbd-client.c                       | 401 ++++++++++++++++++++++++++++---
 nbd/client.c                             | 205 ++++++++++------
 nbd/server.c                             | 189 ++++++++++-----
 nbd/trace-events                         |  12 +-
 tests/qemu-iotests/nbd-fault-injector.py |   4 +-
 7 files changed, 784 insertions(+), 215 deletions(-)

-- 
2.11.1


Re: [Qemu-devel] [PATCH v3 00/13] nbd minimal structured read
Posted by no-reply@patchew.org, 8 weeks ago
Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20171012095319.136610-1-vsementsov@virtuozzo.com
Subject: [Qemu-devel] [PATCH v3 00/13] nbd minimal structured read

=== TEST SCRIPT BEGIN ===
#!/bin/bash

BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0

git config --local diff.renamelimit 0
git config --local diff.renames True

commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
    echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
    if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
        failed=1
        echo
    fi
    n=$((n+1))
done

exit $failed
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 t [tag update]            patchew/20171005151313.32015-1-cohuck@redhat.com -> patchew/20171005151313.32015-1-cohuck@redhat.com
Switched to a new branch 'test'
803c96aeca nbd: Minimal structured read for client
55b2692ff6 nbd/client: prepare nbd_receive_reply for structured reply
3155f99d3a nbd: share some nbd entities to be reused in block/nbd-client.c
cf17e2f26c nbd/client: refactor nbd_receive_starttls
7934ecd6e3 nbd: Minimal structured read for server
abfc671d5e nbd: header constants indenting
1c2944d9d1 nbd-server: simplify reply transmission
fe87c0f68c nbd/server: refactor nbd_co_send_simple_reply parameters
a9de77d411 nbd/server: do not use NBDReply structure
691f470797 nbd/server: structurize simple reply header sending
7f1652db0f nbd: rename some simple-request related objects to be _simple_
ab2e6ba2f6 block/nbd-client: refactor nbd_co_receive_reply
c5baea9e86 block/nbd-client: assert qiov len once in nbd_co_request

=== OUTPUT BEGIN ===
Checking PATCH 1/13: block/nbd-client: assert qiov len once in nbd_co_request...
Checking PATCH 2/13: block/nbd-client: refactor nbd_co_receive_reply...
Checking PATCH 3/13: nbd: rename some simple-request related objects to be _simple_...
Checking PATCH 4/13: nbd/server: structurize simple reply header sending...
Checking PATCH 5/13: nbd/server: do not use NBDReply structure...
Checking PATCH 6/13: nbd/server: refactor nbd_co_send_simple_reply parameters...
Checking PATCH 7/13: nbd-server: simplify reply transmission...
Checking PATCH 8/13: nbd: header constants indenting...
Checking PATCH 9/13: nbd: Minimal structured read for server...
Checking PATCH 10/13: nbd/client: refactor nbd_receive_starttls...
Checking PATCH 11/13: nbd: share some nbd entities to be reused in block/nbd-client.c...
ERROR: return of an errno should typically be -ve (return -EPERM)
#46: FILE: include/block/nbd.h:206:
+        return EPERM;

ERROR: return of an errno should typically be -ve (return -EIO)
#48: FILE: include/block/nbd.h:208:
+        return EIO;

ERROR: return of an errno should typically be -ve (return -ENOMEM)
#50: FILE: include/block/nbd.h:210:
+        return ENOMEM;

ERROR: return of an errno should typically be -ve (return -ENOSPC)
#52: FILE: include/block/nbd.h:212:
+        return ENOSPC;

ERROR: return of an errno should typically be -ve (return -ESHUTDOWN)
#54: FILE: include/block/nbd.h:214:
+        return ESHUTDOWN;

ERROR: return of an errno should typically be -ve (return -EINVAL)
#56: FILE: include/block/nbd.h:216:
+        return EINVAL;

ERROR: return of an errno should typically be -ve (return -EINVAL)
#59: FILE: include/block/nbd.h:219:
+    return EINVAL;

total: 7 errors, 0 warnings, 149 lines checked

Your patch has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

Checking PATCH 12/13: nbd/client: prepare nbd_receive_reply for structured reply...
Checking PATCH 13/13: nbd: Minimal structured read for client...
=== OUTPUT END ===

Test command exited with code: 1


---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org
Re: [Qemu-devel] [PATCH v3 00/13] nbd minimal structured read
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
This is a code movement, no changes.

12.10.2017 13:49, no-reply@patchew.org wrote:
> Hi,
>
> This series seems to have some coding style problems. See output below for
> more information:
>
> Type: series
> Message-id: 20171012095319.136610-1-vsementsov@virtuozzo.com
> Subject: [Qemu-devel] [PATCH v3 00/13] nbd minimal structured read
>
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
>
> BASE=base
> n=1
> total=$(git log --oneline $BASE.. | wc -l)
> failed=0
>
> git config --local diff.renamelimit 0
> git config --local diff.renames True
>
> commits="$(git log --format=%H --reverse $BASE..)"
> for c in $commits; do
>      echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
>      if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
>          failed=1
>          echo
>      fi
>      n=$((n+1))
> done
>
> exit $failed
> === TEST SCRIPT END ===
>
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
>  From https://github.com/patchew-project/qemu
>   t [tag update]            patchew/20171005151313.32015-1-cohuck@redhat.com -> patchew/20171005151313.32015-1-cohuck@redhat.com
> Switched to a new branch 'test'
> 803c96aeca nbd: Minimal structured read for client
> 55b2692ff6 nbd/client: prepare nbd_receive_reply for structured reply
> 3155f99d3a nbd: share some nbd entities to be reused in block/nbd-client.c
> cf17e2f26c nbd/client: refactor nbd_receive_starttls
> 7934ecd6e3 nbd: Minimal structured read for server
> abfc671d5e nbd: header constants indenting
> 1c2944d9d1 nbd-server: simplify reply transmission
> fe87c0f68c nbd/server: refactor nbd_co_send_simple_reply parameters
> a9de77d411 nbd/server: do not use NBDReply structure
> 691f470797 nbd/server: structurize simple reply header sending
> 7f1652db0f nbd: rename some simple-request related objects to be _simple_
> ab2e6ba2f6 block/nbd-client: refactor nbd_co_receive_reply
> c5baea9e86 block/nbd-client: assert qiov len once in nbd_co_request
>
> === OUTPUT BEGIN ===
> Checking PATCH 1/13: block/nbd-client: assert qiov len once in nbd_co_request...
> Checking PATCH 2/13: block/nbd-client: refactor nbd_co_receive_reply...
> Checking PATCH 3/13: nbd: rename some simple-request related objects to be _simple_...
> Checking PATCH 4/13: nbd/server: structurize simple reply header sending...
> Checking PATCH 5/13: nbd/server: do not use NBDReply structure...
> Checking PATCH 6/13: nbd/server: refactor nbd_co_send_simple_reply parameters...
> Checking PATCH 7/13: nbd-server: simplify reply transmission...
> Checking PATCH 8/13: nbd: header constants indenting...
> Checking PATCH 9/13: nbd: Minimal structured read for server...
> Checking PATCH 10/13: nbd/client: refactor nbd_receive_starttls...
> Checking PATCH 11/13: nbd: share some nbd entities to be reused in block/nbd-client.c...
> ERROR: return of an errno should typically be -ve (return -EPERM)
> #46: FILE: include/block/nbd.h:206:
> +        return EPERM;
>
> ERROR: return of an errno should typically be -ve (return -EIO)
> #48: FILE: include/block/nbd.h:208:
> +        return EIO;
>
> ERROR: return of an errno should typically be -ve (return -ENOMEM)
> #50: FILE: include/block/nbd.h:210:
> +        return ENOMEM;
>
> ERROR: return of an errno should typically be -ve (return -ENOSPC)
> #52: FILE: include/block/nbd.h:212:
> +        return ENOSPC;
>
> ERROR: return of an errno should typically be -ve (return -ESHUTDOWN)
> #54: FILE: include/block/nbd.h:214:
> +        return ESHUTDOWN;
>
> ERROR: return of an errno should typically be -ve (return -EINVAL)
> #56: FILE: include/block/nbd.h:216:
> +        return EINVAL;
>
> ERROR: return of an errno should typically be -ve (return -EINVAL)
> #59: FILE: include/block/nbd.h:219:
> +    return EINVAL;
>
> total: 7 errors, 0 warnings, 149 lines checked
>
> Your patch has style problems, please review.  If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
>
> Checking PATCH 12/13: nbd/client: prepare nbd_receive_reply for structured reply...
> Checking PATCH 13/13: nbd: Minimal structured read for client...
> === OUTPUT END ===
>
> Test command exited with code: 1
>
>
> ---
> Email generated automatically by Patchew [http://patchew.org/].
> Please send your feedback to patchew-devel@freelists.org


-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v3 00/13] nbd minimal structured read
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 06:15 AM, Vladimir Sementsov-Ogievskiy wrote:
> This is a code movement, no changes.
> 

>> Checking PATCH 11/13: nbd: share some nbd entities to be reused in
>> block/nbd-client.c...
>> ERROR: return of an errno should typically be -ve (return -EPERM)
>> #46: FILE: include/block/nbd.h:206:
>> +        return EPERM;

Indeed.  We could avoid the false positive by doing:

case ...:
    ret = EPERM;
    break;
...
return ret;

to hide the positive errno value, but I'm also fine ignoring checkpatch
on this one.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v3 00/13] nbd minimal structured read
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Minimally implement nbd structured read extension.
> 
> v3:
> 
> clone: tag up-nbd-minimal-structured-read-v3 from https://src.openvz.org/scm/~vsementsov/qemu.git
> online: https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=up-nbd-minimal-structured-read-v3
> 
> 03: rename  nbd_co_send_reply here too, remove r-b
> 04-07: splitted nbd server refactoring, with some changes

English does not have 'splitted'; 'split' is one of those irregular
verbs that is the same form for present, past, and past participle tenses.

I've pushed 1-7 with tweaks to my NBD staging queue:
http://repo.or.cz/qemu/ericb.git/shortlog/refs/heads/nbd

Let me know if I need to redo any of the tweaks I made to your work.

> 08: add Eric's r-b
> 13: rewrite receiving loops by Paolo's suggestion
> 

and I will resume review of the rest tomorrow.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v3 00/13] nbd minimal structured read
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
13.10.2017 01:39, Eric Blake wrote:
> On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Minimally implement nbd structured read extension.
>>
>> v3:
>>
>> clone: tag up-nbd-minimal-structured-read-v3 from https://src.openvz.org/scm/~vsementsov/qemu.git
>> online: https://src.openvz.org/users/vsementsov/repos/qemu/browse?at=up-nbd-minimal-structured-read-v3
>>
>> 03: rename  nbd_co_send_reply here too, remove r-b
>> 04-07: splitted nbd server refactoring, with some changes
> English does not have 'splitted'; 'split' is one of those irregular
> verbs that is the same form for present, past, and past participle tenses.
>
> I've pushed 1-7 with tweaks to my NBD staging queue:
> http://repo.or.cz/qemu/ericb.git/shortlog/refs/heads/nbd
>
> Let me know if I need to redo any of the tweaks I made to your work.

I've checked, and I agree with all your changes, thank you for this work!

>
>> 08: add Eric's r-b
>> 13: rewrite receiving loops by Paolo's suggestion
>>
> and I will resume review of the rest tomorrow.
>


-- 
Best regards,
Vladimir


[Qemu-devel] [PATCH v3 01/13] block/nbd-client: assert qiov len once in nbd_co_request
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
Also improve the assertion: check that qiov is NULL for other commands
than CMD_READ and CMD_WRITE.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 block/nbd-client.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 72651dcdb1..ddf273a6a4 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -156,7 +156,6 @@ static int nbd_co_send_request(BlockDriverState *bs,
         qio_channel_set_cork(s->ioc, true);
         rc = nbd_send_request(s->ioc, request);
         if (rc >= 0 && !s->quit) {
-            assert(request->len == iov_size(qiov->iov, qiov->niov));
             if (qio_channel_writev_all(s->ioc, qiov->iov, qiov->niov,
                                        NULL) < 0) {
                 rc = -EIO;
@@ -197,7 +196,6 @@ static int nbd_co_receive_reply(NBDClientSession *s,
         assert(s->reply.handle == request->handle);
         ret = -s->reply.error;
         if (qiov && s->reply.error == 0) {
-            assert(request->len == iov_size(qiov->iov, qiov->niov));
             if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
                                       NULL) < 0) {
                 ret = -EIO;
@@ -231,8 +229,12 @@ static int nbd_co_request(BlockDriverState *bs,
     NBDClientSession *client = nbd_get_client_session(bs);
     int ret;
 
-    assert(!qiov || request->type == NBD_CMD_WRITE ||
-           request->type == NBD_CMD_READ);
+    if (qiov) {
+        assert(request->type == NBD_CMD_WRITE || request->type == NBD_CMD_READ);
+        assert(request->len == iov_size(qiov->iov, qiov->niov));
+    } else {
+        assert(request->type != NBD_CMD_WRITE && request->type != NBD_CMD_READ);
+    }
     ret = nbd_co_send_request(bs, request,
                               request->type == NBD_CMD_WRITE ? qiov : NULL);
     if (ret < 0) {
-- 
2.11.1


Re: [Qemu-devel] [PATCH v3 01/13] block/nbd-client: assert qiov len once in nbd_co_request
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Also improve the assertion: check that qiov is NULL for other commands
> than CMD_READ and CMD_WRITE.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  block/nbd-client.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)

Looks identical to v2, where I gave R-b:
https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg01903.html

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[Qemu-devel] [PATCH v3 02/13] block/nbd-client: refactor nbd_co_receive_reply
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
Pass handle parameter directly, as the whole request isn't needed.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 block/nbd-client.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index ddf273a6a4..c0683c3c83 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -180,11 +180,11 @@ err:
 }
 
 static int nbd_co_receive_reply(NBDClientSession *s,
-                                NBDRequest *request,
+                                uint64_t handle,
                                 QEMUIOVector *qiov)
 {
     int ret;
-    int i = HANDLE_TO_INDEX(s, request->handle);
+    int i = HANDLE_TO_INDEX(s, handle);
 
     /* Wait until we're woken up by nbd_read_reply_entry.  */
     s->requests[i].receiving = true;
@@ -193,7 +193,7 @@ static int nbd_co_receive_reply(NBDClientSession *s,
     if (!s->ioc || s->quit) {
         ret = -EIO;
     } else {
-        assert(s->reply.handle == request->handle);
+        assert(s->reply.handle == handle);
         ret = -s->reply.error;
         if (qiov && s->reply.error == 0) {
             if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
@@ -241,7 +241,7 @@ static int nbd_co_request(BlockDriverState *bs,
         return ret;
     }
 
-    return nbd_co_receive_reply(client, request,
+    return nbd_co_receive_reply(client, request->handle,
                                 request->type == NBD_CMD_READ ? qiov : NULL);
 }
 
-- 
2.11.1


[Qemu-devel] [PATCH v3 03/13] nbd: rename some simple-request related objects to be _simple_
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
To be consistent when their _structured_analogs will be introduced.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 nbd/nbd-internal.h                       |  2 +-
 nbd/client.c                             |  4 ++--
 nbd/server.c                             | 12 ++++++------
 nbd/trace-events                         |  2 +-
 tests/qemu-iotests/nbd-fault-injector.py |  4 ++--
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 8a609a227f..2d6663de23 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -47,7 +47,7 @@
 #define NBD_OLDSTYLE_NEGOTIATE_SIZE (8 + 8 + 8 + 4 + 124)
 
 #define NBD_REQUEST_MAGIC       0x25609513
-#define NBD_REPLY_MAGIC         0x67446698
+#define NBD_SIMPLE_REPLY_MAGIC  0x67446698
 #define NBD_OPTS_MAGIC          0x49484156454F5054LL
 #define NBD_CLIENT_MAGIC        0x0000420281861253LL
 #define NBD_REP_MAGIC           0x0003e889045565a9LL
diff --git a/nbd/client.c b/nbd/client.c
index 68a0bc1ffc..cd5a2c80ac 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -931,7 +931,7 @@ int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
     }
 
     /* Reply
-       [ 0 ..  3]    magic   (NBD_REPLY_MAGIC)
+       [ 0 ..  3]    magic   (NBD_SIMPLE_REPLY_MAGIC)
        [ 4 ..  7]    error   (0 == no error)
        [ 7 .. 15]    handle
      */
@@ -949,7 +949,7 @@ int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
     }
     trace_nbd_receive_reply(magic, reply->error, reply->handle);
 
-    if (magic != NBD_REPLY_MAGIC) {
+    if (magic != NBD_SIMPLE_REPLY_MAGIC) {
         error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic);
         return -EINVAL;
     }
diff --git a/nbd/server.c b/nbd/server.c
index 993ade30bb..bc7f9f0fd6 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -911,11 +911,11 @@ static int nbd_send_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
     trace_nbd_send_reply(reply->error, reply->handle);
 
     /* Reply
-       [ 0 ..  3]    magic   (NBD_REPLY_MAGIC)
+       [ 0 ..  3]    magic   (NBD_SIMPLE_REPLY_MAGIC)
        [ 4 ..  7]    error   (0 == no error)
        [ 7 .. 15]    handle
      */
-    stl_be_p(buf, NBD_REPLY_MAGIC);
+    stl_be_p(buf, NBD_SIMPLE_REPLY_MAGIC);
     stl_be_p(buf + 4, reply->error);
     stq_be_p(buf + 8, reply->handle);
 
@@ -1208,15 +1208,15 @@ void nbd_export_close_all(void)
     }
 }
 
-static int nbd_co_send_reply(NBDRequestData *req, NBDReply *reply, int len,
-                             Error **errp)
+static int nbd_co_send_simple_reply(NBDRequestData *req, NBDReply *reply,
+                                    int len, Error **errp)
 {
     NBDClient *client = req->client;
     int ret;
 
     g_assert(qemu_in_coroutine());
 
-    trace_nbd_co_send_reply(reply->handle, reply->error, len);
+    trace_nbd_co_send_simple_reply(reply->handle, reply->error, len);
 
     qemu_co_mutex_lock(&client->send_lock);
     client->send_coroutine = qemu_coroutine_self();
@@ -1465,7 +1465,7 @@ reply:
         local_err = NULL;
     }
 
-    if (nbd_co_send_reply(req, &reply, reply_data_len, &local_err) < 0) {
+    if (nbd_co_send_simple_reply(req, &reply, reply_data_len, &local_err) < 0) {
         error_prepend(&local_err, "Failed to send reply: ");
         goto disconnect;
     }
diff --git a/nbd/trace-events b/nbd/trace-events
index 48a4f27682..c5f2d97be2 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -54,7 +54,7 @@ nbd_receive_request(uint32_t magic, uint16_t flags, uint16_t type, uint64_t from
 nbd_send_reply(int32_t error, uint64_t handle) "Sending response to client: { .error = %" PRId32 ", handle = %" PRIu64 " }"
 nbd_blk_aio_attached(const char *name, void *ctx) "Export %s: Attaching clients to AIO context %p\n"
 nbd_blk_aio_detach(const char *name, void *ctx) "Export %s: Detaching clients from AIO context %p\n"
-nbd_co_send_reply(uint64_t handle, uint32_t error, int len) "Send reply: handle = %" PRIu64 ", error = %" PRIu32 ", len = %d"
+nbd_co_send_simple_reply(uint64_t handle, uint32_t error, int len) "Send reply: handle = %" PRIu64 ", error = %" PRIu32 ", len = %d"
 nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type, const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16 " (%s)"
 nbd_co_receive_request_payload_received(uint64_t handle, uint32_t len) "Payload received: handle = %" PRIu64 ", len = %" PRIu32
 nbd_co_receive_request_cmd_write(uint32_t len) "Reading %" PRIu32 " byte(s)"
diff --git a/tests/qemu-iotests/nbd-fault-injector.py b/tests/qemu-iotests/nbd-fault-injector.py
index 1c10dcb51c..8a04d979aa 100755
--- a/tests/qemu-iotests/nbd-fault-injector.py
+++ b/tests/qemu-iotests/nbd-fault-injector.py
@@ -56,7 +56,7 @@ NBD_CMD_READ = 0
 NBD_CMD_WRITE = 1
 NBD_CMD_DISC = 2
 NBD_REQUEST_MAGIC = 0x25609513
-NBD_REPLY_MAGIC = 0x67446698
+NBD_SIMPLE_REPLY_MAGIC = 0x67446698
 NBD_PASSWD = 0x4e42444d41474943
 NBD_OPTS_MAGIC = 0x49484156454F5054
 NBD_CLIENT_MAGIC = 0x0000420281861253
@@ -166,7 +166,7 @@ def read_request(conn):
     return req
 
 def write_reply(conn, error, handle):
-    buf = reply_struct.pack(NBD_REPLY_MAGIC, error, handle)
+    buf = reply_struct.pack(NBD_SIMPLE_REPLY_MAGIC, error, handle)
     conn.send(buf, event='reply')
 
 def handle_connection(conn, use_export):
-- 
2.11.1


Re: [Qemu-devel] [PATCH v3 03/13] nbd: rename some simple-request related objects to be _simple_
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> To be consistent when their _structured_analogs will be introduced.

space before analogs

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  nbd/nbd-internal.h                       |  2 +-
>  nbd/client.c                             |  4 ++--
>  nbd/server.c                             | 12 ++++++------
>  nbd/trace-events                         |  2 +-
>  tests/qemu-iotests/nbd-fault-injector.py |  4 ++--
>  5 files changed, 12 insertions(+), 12 deletions(-)
> 

> +++ b/nbd/trace-events
> @@ -54,7 +54,7 @@ nbd_receive_request(uint32_t magic, uint16_t flags, uint16_t type, uint64_t from
>  nbd_send_reply(int32_t error, uint64_t handle) "Sending response to client: { .error = %" PRId32 ", handle = %" PRIu64 " }"
>  nbd_blk_aio_attached(const char *name, void *ctx) "Export %s: Attaching clients to AIO context %p\n"
>  nbd_blk_aio_detach(const char *name, void *ctx) "Export %s: Detaching clients from AIO context %p\n"
> -nbd_co_send_reply(uint64_t handle, uint32_t error, int len) "Send reply: handle = %" PRIu64 ", error = %" PRIu32 ", len = %d"
> +nbd_co_send_simple_reply(uint64_t handle, uint32_t error, int len) "Send reply: handle = %" PRIu64 ", error = %" PRIu32 ", len = %d"

I'd also insert 'simple' in the log message.

Interesting that we trace when the server sends a simple message, but
the client does not (yet) trace which type of message it receives; but
that can be for later patches (I haven't checked if you already added it
later in this series)

Changes are small enough that I can make them as part of queueing them
on my NBD list, with:

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[Qemu-devel] [PATCH v3 04/13] nbd/server: structurize simple reply header sending
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
Use packed structure instead of pointer arithmetics.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/nbd.h |  6 ++++++
 nbd/server.c        | 36 ++++++++++++++----------------------
 2 files changed, 20 insertions(+), 22 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 707fd37575..49008980f4 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -63,6 +63,12 @@ struct NBDReply {
 };
 typedef struct NBDReply NBDReply;
 
+typedef struct NBDSimpleReply {
+    uint32_t magic;  /* NBD_SIMPLE_REPLY_MAGIC */
+    uint32_t error;
+    uint64_t handle;
+} QEMU_PACKED NBDSimpleReply;
+
 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
 #define NBD_FLAG_HAS_FLAGS      (1 << 0)        /* Flags are there */
diff --git a/nbd/server.c b/nbd/server.c
index bc7f9f0fd6..43ade30ba3 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -902,26 +902,6 @@ static int nbd_receive_request(QIOChannel *ioc, NBDRequest *request,
     return 0;
 }
 
-static int nbd_send_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
-{
-    uint8_t buf[NBD_REPLY_SIZE];
-
-    reply->error = system_errno_to_nbd_errno(reply->error);
-
-    trace_nbd_send_reply(reply->error, reply->handle);
-
-    /* Reply
-       [ 0 ..  3]    magic   (NBD_SIMPLE_REPLY_MAGIC)
-       [ 4 ..  7]    error   (0 == no error)
-       [ 7 .. 15]    handle
-     */
-    stl_be_p(buf, NBD_SIMPLE_REPLY_MAGIC);
-    stl_be_p(buf + 4, reply->error);
-    stq_be_p(buf + 8, reply->handle);
-
-    return nbd_write(ioc, buf, sizeof(buf), errp);
-}
-
 #define MAX_NBD_REQUESTS 16
 
 void nbd_client_get(NBDClient *client)
@@ -1208,24 +1188,36 @@ void nbd_export_close_all(void)
     }
 }
 
+static inline void set_be_simple_reply(NBDSimpleReply *reply, uint64_t error,
+                                       uint64_t handle)
+{
+    stl_be_p(&reply->magic, NBD_SIMPLE_REPLY_MAGIC);
+    stl_be_p(&reply->error, error);
+    stq_be_p(&reply->handle, handle);
+}
+
 static int nbd_co_send_simple_reply(NBDRequestData *req, NBDReply *reply,
                                     int len, Error **errp)
 {
     NBDClient *client = req->client;
+    NBDSimpleReply simple_reply;
     int ret;
 
     g_assert(qemu_in_coroutine());
 
     trace_nbd_co_send_simple_reply(reply->handle, reply->error, len);
 
+    set_be_simple_reply(&simple_reply, system_errno_to_nbd_errno(reply->error),
+                        reply->handle);
+
     qemu_co_mutex_lock(&client->send_lock);
     client->send_coroutine = qemu_coroutine_self();
 
     if (!len) {
-        ret = nbd_send_reply(client->ioc, reply, errp);
+        ret = nbd_write(client->ioc, &simple_reply, sizeof(simple_reply), NULL);
     } else {
         qio_channel_set_cork(client->ioc, true);
-        ret = nbd_send_reply(client->ioc, reply, errp);
+        ret = nbd_write(client->ioc, &simple_reply, sizeof(simple_reply), NULL);
         if (ret == 0) {
             ret = nbd_write(client->ioc, req->data, len, errp);
             if (ret < 0) {
-- 
2.11.1


Re: [Qemu-devel] [PATCH v3 04/13] nbd/server: structurize simple reply header sending
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Use packed structure instead of pointer arithmetics.

English is fun!  In the subject line, I'm fairly certain that
"structurize" is not likely to be in any dictionary, yet it is a perfect
word describing the patch, so I'm not touching it ;)

> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/nbd.h |  6 ++++++
>  nbd/server.c        | 36 ++++++++++++++----------------------
>  2 files changed, 20 insertions(+), 22 deletions(-)
> 

> +++ b/nbd/server.c
> @@ -902,26 +902,6 @@ static int nbd_receive_request(QIOChannel *ioc, NBDRequest *request,
>      return 0;
>  }
>  
> -static int nbd_send_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
> -{
> -    uint8_t buf[NBD_REPLY_SIZE];
> -
> -    reply->error = system_errno_to_nbd_errno(reply->error);
> -
> -    trace_nbd_send_reply(reply->error, reply->handle);

We lose a trace here...

>  static int nbd_co_send_simple_reply(NBDRequestData *req, NBDReply *reply,
>                                      int len, Error **errp)
>  {
>      NBDClient *client = req->client;
> +    NBDSimpleReply simple_reply;
>      int ret;
>  
>      g_assert(qemu_in_coroutine());
>  
>      trace_nbd_co_send_simple_reply(reply->handle, reply->error, len);
>  
> +    set_be_simple_reply(&simple_reply, system_errno_to_nbd_errno(reply->error),
> +                        reply->handle);
> +

...but it always occurred immediately after another trace that has
redundant information (well, the trace you kept shows pre- rather than
post-translation of errno value to NBD wire value, and the trace you
drop didn't show length).  I'm fine with the reduction, but it needs a
tweak to trace-events to reap the dead trace.

With that change,
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v3 04/13] nbd/server: structurize simple reply header sending
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:42 PM, Eric Blake wrote:
> On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Use packed structure instead of pointer arithmetics.
> 

>> +    set_be_simple_reply(&simple_reply, system_errno_to_nbd_errno(reply->error),
>> +                        reply->handle);
>> +
> 
> ...but it always occurred immediately after another trace that has
> redundant information (well, the trace you kept shows pre- rather than
> post-translation of errno value to NBD wire value,

On second thought, tracing what gets sent over the wire is probably
nicer than what we had internally (especially if the client traces what
it receives - it's nice to match up values on both sides of the wire).

> 
> With that change,
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 

So here's what I'm squashing in:

diff --git i/nbd/server.c w/nbd/server.c
index 69cd2cda76..65c08fa1cc 100644
--- i/nbd/server.c
+++ w/nbd/server.c
@@ -1201,14 +1201,13 @@ static int
nbd_co_send_simple_reply(NBDRequestData *req, NBDReply *reply,
 {
     NBDClient *client = req->client;
     NBDSimpleReply simple_reply;
+    int nbd_err = system_errno_to_nbd_errno(reply->error);
     int ret;

     g_assert(qemu_in_coroutine());

-    trace_nbd_co_send_simple_reply(reply->handle, reply->error, len);
-
-    set_be_simple_reply(&simple_reply,
system_errno_to_nbd_errno(reply->error),
-                        reply->handle);
+    trace_nbd_co_send_simple_reply(reply->handle, nbd_err, len);
+    set_be_simple_reply(&simple_reply, nbd_err, reply->handle);

     qemu_co_mutex_lock(&client->send_lock);
     client->send_coroutine = qemu_coroutine_self();
diff --git i/nbd/trace-events w/nbd/trace-events
index 4d6f86c2d4..e27614f050 100644
--- i/nbd/trace-events
+++ w/nbd/trace-events
@@ -51,7 +51,6 @@ nbd_negotiate_old_style(uint64_t size, unsigned flags)
"advertising size %" PRIu
 nbd_negotiate_new_style_size_flags(uint64_t size, unsigned flags)
"advertising size %" PRIu64 " and flags 0x%x"
 nbd_negotiate_success(void) "Negotiation succeeded"
 nbd_receive_request(uint32_t magic, uint16_t flags, uint16_t type,
uint64_t from, uint32_t len) "Got request: { magic = 0x%" PRIx32 ",
.flags = 0x%" PRIx16 ", .type = 0x%" PRIx16 ", from = %" PRIu64 ", len =
%" PRIu32 " }"
-nbd_send_reply(int32_t error, uint64_t handle) "Sending response to
client: { .error = %" PRId32 ", handle = %" PRIu64 " }"
 nbd_blk_aio_attached(const char *name, void *ctx) "Export %s: Attaching
clients to AIO context %p\n"
 nbd_blk_aio_detach(const char *name, void *ctx) "Export %s: Detaching
clients from AIO context %p\n"
 nbd_co_send_simple_reply(uint64_t handle, uint32_t error, int len)
"Send simple reply: handle = %" PRIu64 ", error = %" PRIu32 ", len = %d"


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v3 04/13] nbd/server: structurize simple reply header sending
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Use packed structure instead of pointer arithmetics.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/nbd.h |  6 ++++++
>  nbd/server.c        | 36 ++++++++++++++----------------------
>  2 files changed, 20 insertions(+), 22 deletions(-)
> 

>      if (!len) {
> -        ret = nbd_send_reply(client->ioc, reply, errp);
> +        ret = nbd_write(client->ioc, &simple_reply, sizeof(simple_reply), NULL);
>      } else {
>          qio_channel_set_cork(client->ioc, true);
> -        ret = nbd_send_reply(client->ioc, reply, errp);
> +        ret = nbd_write(client->ioc, &simple_reply, sizeof(simple_reply), NULL);

One more thing: this should be errp, not NULL - we don't want to lose
the error, even if the next patch changes things yet again.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[Qemu-devel] [PATCH v3 05/13] nbd/server: do not use NBDReply structure
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
NBDReply structure will be upgraded in future patches to handle both
simple and structured replies and will be used only in the client

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 nbd/server.c | 36 +++++++++++++++---------------------
 1 file changed, 15 insertions(+), 21 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 43ade30ba3..3878145f63 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1196,7 +1196,9 @@ static inline void set_be_simple_reply(NBDSimpleReply *reply, uint64_t error,
     stq_be_p(&reply->handle, handle);
 }
 
-static int nbd_co_send_simple_reply(NBDRequestData *req, NBDReply *reply,
+static int nbd_co_send_simple_reply(NBDRequestData *req,
+                                    uint64_t handle,
+                                    uint32_t error,
                                     int len, Error **errp)
 {
     NBDClient *client = req->client;
@@ -1205,10 +1207,10 @@ static int nbd_co_send_simple_reply(NBDRequestData *req, NBDReply *reply,
 
     g_assert(qemu_in_coroutine());
 
-    trace_nbd_co_send_simple_reply(reply->handle, reply->error, len);
+    trace_nbd_co_send_simple_reply(handle, error, len);
 
-    set_be_simple_reply(&simple_reply, system_errno_to_nbd_errno(reply->error),
-                        reply->handle);
+    set_be_simple_reply(&simple_reply, system_errno_to_nbd_errno(error),
+                        handle);
 
     qemu_co_mutex_lock(&client->send_lock);
     client->send_coroutine = qemu_coroutine_self();
@@ -1323,7 +1325,6 @@ static coroutine_fn void nbd_trip(void *opaque)
     NBDExport *exp = client->exp;
     NBDRequestData *req;
     NBDRequest request = { 0 };    /* GCC thinks it can be used uninitialized */
-    NBDReply reply;
     int ret;
     int flags;
     int reply_data_len = 0;
@@ -1343,11 +1344,7 @@ static coroutine_fn void nbd_trip(void *opaque)
         goto disconnect;
     }
 
-    reply.handle = request.handle;
-    reply.error = 0;
-
     if (ret < 0) {
-        reply.error = -ret;
         goto reply;
     }
 
@@ -1366,7 +1363,6 @@ static coroutine_fn void nbd_trip(void *opaque)
             ret = blk_co_flush(exp->blk);
             if (ret < 0) {
                 error_setg_errno(&local_err, -ret, "flush failed");
-                reply.error = -ret;
                 break;
             }
         }
@@ -1375,7 +1371,6 @@ static coroutine_fn void nbd_trip(void *opaque)
                         req->data, request.len);
         if (ret < 0) {
             error_setg_errno(&local_err, -ret, "reading from file failed");
-            reply.error = -ret;
             break;
         }
 
@@ -1384,7 +1379,7 @@ static coroutine_fn void nbd_trip(void *opaque)
         break;
     case NBD_CMD_WRITE:
         if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
-            reply.error = EROFS;
+            ret = -EROFS;
             break;
         }
 
@@ -1396,14 +1391,13 @@ static coroutine_fn void nbd_trip(void *opaque)
                          req->data, request.len, flags);
         if (ret < 0) {
             error_setg_errno(&local_err, -ret, "writing to file failed");
-            reply.error = -ret;
         }
 
         break;
     case NBD_CMD_WRITE_ZEROES:
         if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
             error_setg(&local_err, "Server is read-only, return error");
-            reply.error = EROFS;
+            ret = -EROFS;
             break;
         }
 
@@ -1418,7 +1412,6 @@ static coroutine_fn void nbd_trip(void *opaque)
                                 request.len, flags);
         if (ret < 0) {
             error_setg_errno(&local_err, -ret, "writing to file failed");
-            reply.error = -ret;
         }
 
         break;
@@ -1430,7 +1423,6 @@ static coroutine_fn void nbd_trip(void *opaque)
         ret = blk_co_flush(exp->blk);
         if (ret < 0) {
             error_setg_errno(&local_err, -ret, "flush failed");
-            reply.error = -ret;
         }
 
         break;
@@ -1439,25 +1431,27 @@ static coroutine_fn void nbd_trip(void *opaque)
                               request.len);
         if (ret < 0) {
             error_setg_errno(&local_err, -ret, "discard failed");
-            reply.error = -ret;
         }
 
         break;
     default:
         error_setg(&local_err, "invalid request type (%" PRIu32 ") received",
                    request.type);
-        reply.error = EINVAL;
+        ret = -EINVAL;
     }
 
 reply:
     if (local_err) {
-        /* If we are here local_err is not fatal error, already stored in
-         * reply.error */
+        /* If we get here, local_err was not a fatal error, and should be sent
+         * to the client. */
         error_report_err(local_err);
         local_err = NULL;
     }
 
-    if (nbd_co_send_simple_reply(req, &reply, reply_data_len, &local_err) < 0) {
+    if (nbd_co_send_simple_reply(req, request.handle,
+                                 ret < 0 ? -ret : 0,
+                                 reply_data_len, &local_err) < 0)
+    {
         error_prepend(&local_err, "Failed to send reply: ");
         goto disconnect;
     }
-- 
2.11.1


Re: [Qemu-devel] [PATCH v3 05/13] nbd/server: do not use NBDReply structure
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> NBDReply structure will be upgraded in future patches to handle both
> simple and structured replies and will be used only in the client
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  nbd/server.c | 36 +++++++++++++++---------------------
>  1 file changed, 15 insertions(+), 21 deletions(-)
> 
> diff --git a/nbd/server.c b/nbd/server.c
> index 43ade30ba3..3878145f63 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1196,7 +1196,9 @@ static inline void set_be_simple_reply(NBDSimpleReply *reply, uint64_t error,
>      stq_be_p(&reply->handle, handle);
>  }
>  
> -static int nbd_co_send_simple_reply(NBDRequestData *req, NBDReply *reply,
> +static int nbd_co_send_simple_reply(NBDRequestData *req,
> +                                    uint64_t handle,
> +                                    uint32_t error,

Feels a bit like a step backwards for the parameter list, but I'm not
opposed to the patch (it does make for some nice cleanup elsewhere).

>                                      int len, Error **errp)
>  {
>      NBDClient *client = req->client;
> @@ -1205,10 +1207,10 @@ static int nbd_co_send_simple_reply(NBDRequestData *req, NBDReply *reply,
>  
>      g_assert(qemu_in_coroutine());
>  
> -    trace_nbd_co_send_simple_reply(reply->handle, reply->error, len);
> +    trace_nbd_co_send_simple_reply(handle, error, len);

I had to tweak this to rebase on top of my churn on 4/13.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[Qemu-devel] [PATCH v3 06/13] nbd/server: refactor nbd_co_send_simple_reply parameters
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
Pass client and buffer (*data) parameters directly, to make the function
consistent with further structured reply sending functions.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 nbd/server.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index 3878145f63..afc127bbd9 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1196,12 +1196,13 @@ static inline void set_be_simple_reply(NBDSimpleReply *reply, uint64_t error,
     stq_be_p(&reply->handle, handle);
 }
 
-static int nbd_co_send_simple_reply(NBDRequestData *req,
+static int nbd_co_send_simple_reply(NBDClient *client,
                                     uint64_t handle,
                                     uint32_t error,
-                                    int len, Error **errp)
+                                    void *data,
+                                    size_t len,
+                                    Error **errp)
 {
-    NBDClient *client = req->client;
     NBDSimpleReply simple_reply;
     int ret;
 
@@ -1221,7 +1222,7 @@ static int nbd_co_send_simple_reply(NBDRequestData *req,
         qio_channel_set_cork(client->ioc, true);
         ret = nbd_write(client->ioc, &simple_reply, sizeof(simple_reply), NULL);
         if (ret == 0) {
-            ret = nbd_write(client->ioc, req->data, len, errp);
+            ret = nbd_write(client->ioc, data, len, errp);
             if (ret < 0) {
                 ret = -EIO;
             }
@@ -1448,9 +1449,9 @@ reply:
         local_err = NULL;
     }
 
-    if (nbd_co_send_simple_reply(req, request.handle,
+    if (nbd_co_send_simple_reply(req->client, request.handle,
                                  ret < 0 ? -ret : 0,
-                                 reply_data_len, &local_err) < 0)
+                                 req->data, reply_data_len, &local_err) < 0)
     {
         error_prepend(&local_err, "Failed to send reply: ");
         goto disconnect;
-- 
2.11.1


Re: [Qemu-devel] [PATCH v3 06/13] nbd/server: refactor nbd_co_send_simple_reply parameters
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Pass client and buffer (*data) parameters directly, to make the function
> consistent with further structured reply sending functions.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  nbd/server.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[Qemu-devel] [PATCH v3 07/13] nbd-server: simplify reply transmission
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
Send qiov via qio_channel_writev_all instead of calling nbd_write twice
with a cork.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 nbd/server.c | 50 ++++++++++++++++++++++++--------------------------
 1 file changed, 24 insertions(+), 26 deletions(-)

diff --git a/nbd/server.c b/nbd/server.c
index afc127bbd9..c1bbe8d2d1 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1188,6 +1188,23 @@ void nbd_export_close_all(void)
     }
 }
 
+static int coroutine_fn nbd_co_send_iov(NBDClient *client, struct iovec *iov,
+                                        unsigned niov, Error **errp)
+{
+    int ret;
+
+    g_assert(qemu_in_coroutine());
+    qemu_co_mutex_lock(&client->send_lock);
+    client->send_coroutine = qemu_coroutine_self();
+
+    ret = qio_channel_writev_all(client->ioc, iov, niov, errp) < 0 ? -EIO : 0;
+
+    client->send_coroutine = NULL;
+    qemu_co_mutex_unlock(&client->send_lock);
+
+    return ret;
+}
+
 static inline void set_be_simple_reply(NBDSimpleReply *reply, uint64_t error,
                                        uint64_t handle)
 {
@@ -1203,36 +1220,17 @@ static int nbd_co_send_simple_reply(NBDClient *client,
                                     size_t len,
                                     Error **errp)
 {
-    NBDSimpleReply simple_reply;
-    int ret;
-
-    g_assert(qemu_in_coroutine());
+    NBDSimpleReply reply;
+    struct iovec iov[] = {
+        {.iov_base = &reply, .iov_len = sizeof(reply)},
+        {.iov_base = data, .iov_len = len}
+    };
 
     trace_nbd_co_send_simple_reply(handle, error, len);
 
-    set_be_simple_reply(&simple_reply, system_errno_to_nbd_errno(error),
-                        handle);
-
-    qemu_co_mutex_lock(&client->send_lock);
-    client->send_coroutine = qemu_coroutine_self();
-
-    if (!len) {
-        ret = nbd_write(client->ioc, &simple_reply, sizeof(simple_reply), NULL);
-    } else {
-        qio_channel_set_cork(client->ioc, true);
-        ret = nbd_write(client->ioc, &simple_reply, sizeof(simple_reply), NULL);
-        if (ret == 0) {
-            ret = nbd_write(client->ioc, data, len, errp);
-            if (ret < 0) {
-                ret = -EIO;
-            }
-        }
-        qio_channel_set_cork(client->ioc, false);
-    }
+    set_be_simple_reply(&reply, system_errno_to_nbd_errno(error), handle);
 
-    client->send_coroutine = NULL;
-    qemu_co_mutex_unlock(&client->send_lock);
-    return ret;
+    return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);
 }
 
 /* nbd_co_receive_request
-- 
2.11.1


Re: [Qemu-devel] [PATCH v3 07/13] nbd-server: simplify reply transmission
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Send qiov via qio_channel_writev_all instead of calling nbd_write twice
> with a cork.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  nbd/server.c | 50 ++++++++++++++++++++++++--------------------------
>  1 file changed, 24 insertions(+), 26 deletions(-)
> 

> @@ -1203,36 +1220,17 @@ static int nbd_co_send_simple_reply(NBDClient *client,
>                                      size_t len,
>                                      Error **errp)
>  {
> -    NBDSimpleReply simple_reply;
> -    int ret;
> -
> -    g_assert(qemu_in_coroutine());
> +    NBDSimpleReply reply;

Why the rename from simple_reply to reply?

> +    struct iovec iov[] = {
> +        {.iov_base = &reply, .iov_len = sizeof(reply)},

I guess it made this line shorter.  But we could reduce some churn by
naming it 'reply' in the first place, back in earlier patches.  At the
same time, I'm not going to bother if there's not a reason to respin the
series (or at least the first half).

> +        {.iov_base = data, .iov_len = len}
> +    };
>  
>      trace_nbd_co_send_simple_reply(handle, error, len);
>  
> -    set_be_simple_reply(&simple_reply, system_errno_to_nbd_errno(error),
> -                        handle);
> -
> -    qemu_co_mutex_lock(&client->send_lock);
> -    client->send_coroutine = qemu_coroutine_self();
> -
> -    if (!len) {
> -        ret = nbd_write(client->ioc, &simple_reply, sizeof(simple_reply), NULL);
> -    } else {
> -        qio_channel_set_cork(client->ioc, true);
> -        ret = nbd_write(client->ioc, &simple_reply, sizeof(simple_reply), NULL);
> -        if (ret == 0) {
> -            ret = nbd_write(client->ioc, data, len, errp);
> -            if (ret < 0) {
> -                ret = -EIO;
> -            }
> -        }
> -        qio_channel_set_cork(client->ioc, false);
> -    }
> +    set_be_simple_reply(&reply, system_errno_to_nbd_errno(error), handle);
>  
> -    client->send_coroutine = NULL;
> -    qemu_co_mutex_unlock(&client->send_lock);
> -    return ret;
> +    return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);

This part is definitely nicer!  Thanks for splitting the v2 patch, it
made review a lot more pleasant.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v3 07/13] nbd-server: simplify reply transmission
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 05:27 PM, Eric Blake wrote:
> On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Send qiov via qio_channel_writev_all instead of calling nbd_write twice
>> with a cork.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>  nbd/server.c | 50 ++++++++++++++++++++++++--------------------------
>>  1 file changed, 24 insertions(+), 26 deletions(-)
>>

Subject-line consistency: elsewhere you used nbd/server instead of
nbd-server. I'll make the obvious tweak.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v3 07/13] nbd-server: simplify reply transmission
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 05:27 PM, Eric Blake wrote:
> On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Send qiov via qio_channel_writev_all instead of calling nbd_write twice
>> with a cork.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>  nbd/server.c | 50 ++++++++++++++++++++++++--------------------------
>>  1 file changed, 24 insertions(+), 26 deletions(-)
>>
> 
>> @@ -1203,36 +1220,17 @@ static int nbd_co_send_simple_reply(NBDClient *client,
>>                                      size_t len,
>>                                      Error **errp)
>>  {
>> -    NBDSimpleReply simple_reply;
>> -    int ret;
>> -
>> -    g_assert(qemu_in_coroutine());
>> +    NBDSimpleReply reply;
> 
> Why the rename from simple_reply to reply?
> 
>> +    struct iovec iov[] = {
>> +        {.iov_base = &reply, .iov_len = sizeof(reply)},
> 
> I guess it made this line shorter.  But we could reduce some churn by
> naming it 'reply' in the first place, back in earlier patches.  At the
> same time, I'm not going to bother if there's not a reason to respin the
> series (or at least the first half).

Answering myself - you couldn't use the name 'reply' until 5/13 removed
it as a parameter name, even though you introduced the name
'simple_reply' in 4/13.  Okay, the rename is fine here.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[Qemu-devel] [PATCH v3 08/13] nbd: header constants indenting
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
Prepare indenting for the following commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
---
 include/block/nbd.h | 15 ++++++++-------
 nbd/nbd-internal.h  | 34 +++++++++++++++++-----------------
 2 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 49008980f4..a6df5ce8b5 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -71,13 +71,14 @@ typedef struct NBDSimpleReply {
 
 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
-#define NBD_FLAG_HAS_FLAGS      (1 << 0)        /* Flags are there */
-#define NBD_FLAG_READ_ONLY      (1 << 1)        /* Device is read-only */
-#define NBD_FLAG_SEND_FLUSH     (1 << 2)        /* Send FLUSH */
-#define NBD_FLAG_SEND_FUA       (1 << 3)        /* Send FUA (Force Unit Access) */
-#define NBD_FLAG_ROTATIONAL     (1 << 4)        /* Use elevator algorithm - rotational media */
-#define NBD_FLAG_SEND_TRIM      (1 << 5)        /* Send TRIM (discard) */
-#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6)     /* Send WRITE_ZEROES */
+#define NBD_FLAG_HAS_FLAGS         (1 << 0) /* Flags are there */
+#define NBD_FLAG_READ_ONLY         (1 << 1) /* Device is read-only */
+#define NBD_FLAG_SEND_FLUSH        (1 << 2) /* Send FLUSH */
+#define NBD_FLAG_SEND_FUA          (1 << 3) /* Send FUA (Force Unit Access) */
+#define NBD_FLAG_ROTATIONAL        (1 << 4) /* Use elevator algorithm -
+                                               rotational media */
+#define NBD_FLAG_SEND_TRIM         (1 << 5) /* Send TRIM (discard) */
+#define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6) /* Send WRITE_ZEROES */
 
 /* New-style handshake (global) flags, sent from server to client, and
    control what will happen during handshake phase. */
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 2d6663de23..11a130d050 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -46,23 +46,23 @@
 /* Size of oldstyle negotiation */
 #define NBD_OLDSTYLE_NEGOTIATE_SIZE (8 + 8 + 8 + 4 + 124)
 
-#define NBD_REQUEST_MAGIC       0x25609513
-#define NBD_SIMPLE_REPLY_MAGIC  0x67446698
-#define NBD_OPTS_MAGIC          0x49484156454F5054LL
-#define NBD_CLIENT_MAGIC        0x0000420281861253LL
-#define NBD_REP_MAGIC           0x0003e889045565a9LL
-
-#define NBD_SET_SOCK            _IO(0xab, 0)
-#define NBD_SET_BLKSIZE         _IO(0xab, 1)
-#define NBD_SET_SIZE            _IO(0xab, 2)
-#define NBD_DO_IT               _IO(0xab, 3)
-#define NBD_CLEAR_SOCK          _IO(0xab, 4)
-#define NBD_CLEAR_QUE           _IO(0xab, 5)
-#define NBD_PRINT_DEBUG         _IO(0xab, 6)
-#define NBD_SET_SIZE_BLOCKS     _IO(0xab, 7)
-#define NBD_DISCONNECT          _IO(0xab, 8)
-#define NBD_SET_TIMEOUT         _IO(0xab, 9)
-#define NBD_SET_FLAGS           _IO(0xab, 10)
+#define NBD_REQUEST_MAGIC           0x25609513
+#define NBD_SIMPLE_REPLY_MAGIC      0x67446698
+#define NBD_OPTS_MAGIC              0x49484156454F5054LL
+#define NBD_CLIENT_MAGIC            0x0000420281861253LL
+#define NBD_REP_MAGIC               0x0003e889045565a9LL
+
+#define NBD_SET_SOCK                _IO(0xab, 0)
+#define NBD_SET_BLKSIZE             _IO(0xab, 1)
+#define NBD_SET_SIZE                _IO(0xab, 2)
+#define NBD_DO_IT                   _IO(0xab, 3)
+#define NBD_CLEAR_SOCK              _IO(0xab, 4)
+#define NBD_CLEAR_QUE               _IO(0xab, 5)
+#define NBD_PRINT_DEBUG             _IO(0xab, 6)
+#define NBD_SET_SIZE_BLOCKS         _IO(0xab, 7)
+#define NBD_DISCONNECT              _IO(0xab, 8)
+#define NBD_SET_TIMEOUT             _IO(0xab, 9)
+#define NBD_SET_FLAGS               _IO(0xab, 10)
 
 /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
  * but only a limited set of errno values is specified in the protocol.
-- 
2.11.1


[Qemu-devel] [PATCH v3 09/13] nbd: Minimal structured read for server
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
Minimal implementation of structured read: one structured reply chunk,
no segmentation.
Minimal structured error implementation: no text message.
Support DF flag, but just ignore it, as there is no segmentation any
way.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/nbd.h |  33 +++++++++++++++++
 nbd/nbd-internal.h  |   1 +
 nbd/server.c        | 100 ++++++++++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 128 insertions(+), 6 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index a6df5ce8b5..dd261f66f0 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -69,6 +69,25 @@ typedef struct NBDSimpleReply {
     uint64_t handle;
 } QEMU_PACKED NBDSimpleReply;
 
+typedef struct NBDStructuredReplyChunk {
+    uint32_t magic;  /* NBD_STRUCTURED_REPLY_MAGIC */
+    uint16_t flags;  /* combination of NBD_SREP_FLAG_* */
+    uint16_t type;   /* NBD_SREP_TYPE_* */
+    uint64_t handle; /* request handle */
+    uint32_t length; /* length of payload */
+} QEMU_PACKED NBDStructuredReplyChunk;
+
+typedef struct NBDStructuredRead {
+    NBDStructuredReplyChunk h;
+    uint64_t offset;
+} QEMU_PACKED NBDStructuredRead;
+
+typedef struct NBDStructuredError {
+    NBDStructuredReplyChunk h;
+    uint32_t error;
+    uint16_t message_length;
+} QEMU_PACKED NBDStructuredError;
+
 /* Transmission (export) flags: sent from server to client during handshake,
    but describe what will happen during transmission */
 #define NBD_FLAG_HAS_FLAGS         (1 << 0) /* Flags are there */
@@ -79,6 +98,7 @@ typedef struct NBDSimpleReply {
                                                rotational media */
 #define NBD_FLAG_SEND_TRIM         (1 << 5) /* Send TRIM (discard) */
 #define NBD_FLAG_SEND_WRITE_ZEROES (1 << 6) /* Send WRITE_ZEROES */
+#define NBD_FLAG_SEND_DF           (1 << 7) /* Send DF (Do not Fragment) */
 
 /* New-style handshake (global) flags, sent from server to client, and
    control what will happen during handshake phase. */
@@ -125,6 +145,7 @@ typedef struct NBDSimpleReply {
 /* Request flags, sent from client to server during transmission phase */
 #define NBD_CMD_FLAG_FUA        (1 << 0) /* 'force unit access' during write */
 #define NBD_CMD_FLAG_NO_HOLE    (1 << 1) /* don't punch hole on zero run */
+#define NBD_CMD_FLAG_DF         (1 << 2) /* don't fragment structured read */
 
 /* Supported request types */
 enum {
@@ -149,6 +170,18 @@ enum {
  * aren't overflowing some other buffer. */
 #define NBD_MAX_NAME_SIZE 256
 
+/* Structured reply flags */
+#define NBD_SREP_FLAG_DONE          (1 << 0) /* This reply-chunk is last */
+
+/* Structured reply types */
+#define NBD_SREP_ERR(value)         ((1 << 15) | (value))
+
+#define NBD_SREP_TYPE_NONE          0
+#define NBD_SREP_TYPE_OFFSET_DATA   1
+#define NBD_SREP_TYPE_OFFSET_HOLE   2
+#define NBD_SREP_TYPE_ERROR         NBD_SREP_ERR(1)
+#define NBD_SREP_TYPE_ERROR_OFFSET  NBD_SREP_ERR(2)
+
 /* Details collected by NBD_OPT_EXPORT_NAME and NBD_OPT_GO */
 struct NBDExportInfo {
     /* Set by client before nbd_receive_negotiate() */
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index 11a130d050..beb30a7a3e 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -48,6 +48,7 @@
 
 #define NBD_REQUEST_MAGIC           0x25609513
 #define NBD_SIMPLE_REPLY_MAGIC      0x67446698
+#define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
 #define NBD_OPTS_MAGIC              0x49484156454F5054LL
 #define NBD_CLIENT_MAGIC            0x0000420281861253LL
 #define NBD_REP_MAGIC               0x0003e889045565a9LL
diff --git a/nbd/server.c b/nbd/server.c
index c1bbe8d2d1..502873b645 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -98,6 +98,8 @@ struct NBDClient {
     QTAILQ_ENTRY(NBDClient) next;
     int nb_requests;
     bool closing;
+
+    bool structured_reply;
 };
 
 /* That's all folks */
@@ -760,6 +762,20 @@ static int nbd_negotiate_options(NBDClient *client, uint16_t myflags,
                     return ret;
                 }
                 break;
+
+            case NBD_OPT_STRUCTURED_REPLY:
+                if (client->structured_reply) {
+                    error_setg(errp, "Double negotiation of structured reply");
+                    return -EINVAL;
+                }
+                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK, option,
+                                             errp);
+                if (ret < 0) {
+                    return ret;
+                }
+                client->structured_reply = true;
+                break;
+
             default:
                 if (nbd_drop(client->ioc, length, errp) < 0) {
                     return -EIO;
@@ -1233,6 +1249,61 @@ static int nbd_co_send_simple_reply(NBDClient *client,
     return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);
 }
 
+static inline void set_be_chunk(NBDStructuredReplyChunk *chunk, uint16_t flags,
+                                uint16_t type, uint64_t handle, uint32_t length)
+{
+    stl_be_p(&chunk->magic, NBD_STRUCTURED_REPLY_MAGIC);
+    stw_be_p(&chunk->flags, flags);
+    stw_be_p(&chunk->type, type);
+    stq_be_p(&chunk->handle, handle);
+    stl_be_p(&chunk->length, length);
+}
+
+static inline int coroutine_fn nbd_co_send_buf(NBDClient *client, void *buf,
+                                               size_t size, Error **errp)
+{
+    struct iovec iov[] = {
+        {.iov_base = buf, .iov_len = size}
+    };
+
+    return nbd_co_send_iov(client, iov, 1, errp);
+}
+
+static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
+                                                    uint64_t handle,
+                                                    uint64_t offset,
+                                                    void *data,
+                                                    size_t size,
+                                                    Error **errp)
+{
+    NBDStructuredRead chunk;
+    struct iovec iov[] = {
+        {.iov_base = &chunk, .iov_len = sizeof(chunk)},
+        {.iov_base = data, .iov_len = size}
+    };
+
+    set_be_chunk(&chunk.h, NBD_SREP_FLAG_DONE, NBD_SREP_TYPE_OFFSET_DATA,
+                 handle, sizeof(chunk) - sizeof(chunk.h) + size);
+    stq_be_p(&chunk.offset, offset);
+
+    return nbd_co_send_iov(client, iov, 2, errp);
+}
+
+static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
+                                                     uint64_t handle,
+                                                     uint32_t error,
+                                                     Error **errp)
+{
+    NBDStructuredError chunk;
+
+    set_be_chunk(&chunk.h, NBD_SREP_FLAG_DONE, NBD_SREP_TYPE_ERROR, handle,
+                 sizeof(chunk) - sizeof(chunk.h));
+    stl_be_p(&chunk.error, error);
+    stw_be_p(&chunk.message_length, 0);
+
+    return nbd_co_send_buf(client, &chunk, sizeof(chunk), errp);
+}
+
 /* nbd_co_receive_request
  * Collect a client request. Return 0 if request looks valid, -EIO to drop
  * connection right away, and any other negative value to report an error to
@@ -1304,10 +1375,17 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
                    (uint64_t)client->exp->size);
         return request->type == NBD_CMD_WRITE ? -ENOSPC : -EINVAL;
     }
-    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
+    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE |
+                           NBD_CMD_FLAG_DF))
+    {
         error_setg(errp, "unsupported flags (got 0x%x)", request->flags);
         return -EINVAL;
     }
+    if (request->type != NBD_CMD_READ && (request->flags & NBD_CMD_FLAG_DF)) {
+        error_setg(errp, "DF flag used with command %d (%s) which is not READ",
+                   request->type, nbd_cmd_lookup(request->type));
+        return -EINVAL;
+    }
     if (request->type != NBD_CMD_WRITE_ZEROES &&
         (request->flags & NBD_CMD_FLAG_NO_HOLE)) {
         error_setg(errp, "unexpected flags (got 0x%x)", request->flags);
@@ -1374,7 +1452,6 @@ static coroutine_fn void nbd_trip(void *opaque)
         }
 
         reply_data_len = request.len;
-
         break;
     case NBD_CMD_WRITE:
         if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
@@ -1447,10 +1524,21 @@ reply:
         local_err = NULL;
     }
 
-    if (nbd_co_send_simple_reply(req->client, request.handle,
-                                 ret < 0 ? -ret : 0,
-                                 req->data, reply_data_len, &local_err) < 0)
-    {
+    if (client->structured_reply && request.type == NBD_CMD_READ) {
+        if (ret < 0) {
+            ret = nbd_co_send_structured_error(req->client, request.handle,
+                                               -ret, &local_err);
+        } else {
+            ret = nbd_co_send_structured_read(req->client, request.handle,
+                                              request.from, req->data,
+                                              reply_data_len, &local_err);
+        }
+    } else {
+        ret = nbd_co_send_simple_reply(req->client, request.handle,
+                                       ret < 0 ? -ret : 0,
+                                       req->data, reply_data_len, &local_err);
+    }
+    if (ret < 0) {
         error_prepend(&local_err, "Failed to send reply: ");
         goto disconnect;
     }
-- 
2.11.1


Re: [Qemu-devel] [PATCH v3 09/13] nbd: Minimal structured read for server
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Minimal implementation of structured read: one structured reply chunk,
> no segmentation.
> Minimal structured error implementation: no text message.
> Support DF flag, but just ignore it, as there is no segmentation any
> way.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/nbd.h |  33 +++++++++++++++++
>  nbd/nbd-internal.h  |   1 +
>  nbd/server.c        | 100 ++++++++++++++++++++++++++++++++++++++++++++++++----
>  3 files changed, 128 insertions(+), 6 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index a6df5ce8b5..dd261f66f0 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -69,6 +69,25 @@ typedef struct NBDSimpleReply {
>      uint64_t handle;
>  } QEMU_PACKED NBDSimpleReply;
>  
> +typedef struct NBDStructuredReplyChunk {
> +    uint32_t magic;  /* NBD_STRUCTURED_REPLY_MAGIC */
> +    uint16_t flags;  /* combination of NBD_SREP_FLAG_* */

The NBD spec proposal [1] names these NBD_REPLY_FLAG_*. Your naming
NBD_SREP_FLAG_ is not bad, but consistency between the two may be
desirable; on the other hand, since this is the first implementation of
the spec, it is also a possibility that we could rewrite the NBD spec to
use your naming.

[1]
https://github.com/NetworkBlockDevice/nbd/blob/extension-blockstatus/doc/proto.md

> +    uint16_t type;   /* NBD_SREP_TYPE_* */

Ditto for the spec naming it NBD_REPLY_TYPE_* instead of NBD_SREP_TYPE_*.

> +    uint64_t handle; /* request handle */
> +    uint32_t length; /* length of payload */
> +} QEMU_PACKED NBDStructuredReplyChunk;
> +
> +typedef struct NBDStructuredRead {
> +    NBDStructuredReplyChunk h;
> +    uint64_t offset;
> +} QEMU_PACKED NBDStructuredRead;

Worth a comment that this type is used with NBD_SREP_TYPE_OFFSET_DATA
(but without the additional data payload), and is also usable with
NBD_SREP_TYPE_OFFSET_HOLE (no additional payload required)?

> +
> +typedef struct NBDStructuredError {
> +    NBDStructuredReplyChunk h;
> +    uint32_t error;
> +    uint16_t message_length;
> +} QEMU_PACKED NBDStructuredError;

Worth a comment that this type is a common prefix to all
NBD_REPLY_TYPE_ERROR* chunks?

> +++ b/nbd/server.c
> @@ -98,6 +98,8 @@ struct NBDClient {
>      QTAILQ_ENTRY(NBDClient) next;
>      int nb_requests;
>      bool closing;
> +
> +    bool structured_reply;
>  };
>  
>  /* That's all folks */
> @@ -760,6 +762,20 @@ static int nbd_negotiate_options(NBDClient *client, uint16_t myflags,
>                      return ret;
>                  }
>                  break;
> +
> +            case NBD_OPT_STRUCTURED_REPLY:
> +                if (client->structured_reply) {
> +                    error_setg(errp, "Double negotiation of structured reply");
> +                    return -EINVAL;
> +                }

That's rather harsh, to hang up on the client without even replying.  At
the very least, we should try to be nice and tell the client about their
buggy handshake before pulling the plug; or we could be generous and
ignore the request as a no-op (by returning success again).  The spec is
currently silent on whether we have to silently tolerate this condition
or if we can reply with an error; but I would lean towards updating the
spec to permit a reply of NBD_REP_ERR_INVALID in this situation.  I can
tweak the code along those lines.

Writing this makes me remember another thing.  I know that nbdkit has a
finite limit on the number of NBD_OPT_* it is willing to accept from a
client before it kills the connection on the grounds that a client that
doesn't eventually move into transmission phase is more likely trying to
act as a denial of service by consuming server CPU - maybe we should
consider a followup patch to qemu to do likewise, so that an infinite
string of NBD_OPT_STRUCTURED_REPLY doesn't let us spin forever with a
client that got stuck sending the same option.

> @@ -1233,6 +1249,61 @@ static int nbd_co_send_simple_reply(NBDClient *client,
>      return nbd_co_send_iov(client, iov, len ? 2 : 1, errp);
>  }
>  
> +static inline void set_be_chunk(NBDStructuredReplyChunk *chunk, uint16_t flags,
> +                                uint16_t type, uint64_t handle, uint32_t length)
> +{

Might be nice to place this function near set_be_simple_reply(),

> +
> +static inline int coroutine_fn nbd_co_send_buf(NBDClient *client, void *buf,
> +                                               size_t size, Error **errp)
> +{
> +    struct iovec iov[] = {
> +        {.iov_base = buf, .iov_len = size}
> +    };
> +
> +    return nbd_co_send_iov(client, iov, 1, errp);
> +}

and to place this near nbd_co_send_iov().  But see below, at [1]

> +
> +static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
> +                                                    uint64_t handle,
> +                                                    uint64_t offset,
> +                                                    void *data,
> +                                                    size_t size,
> +                                                    Error **errp)
> +{
> +    NBDStructuredRead chunk;
> +    struct iovec iov[] = {
> +        {.iov_base = &chunk, .iov_len = sizeof(chunk)},
> +        {.iov_base = data, .iov_len = size}
> +    };
> +
> +    set_be_chunk(&chunk.h, NBD_SREP_FLAG_DONE, NBD_SREP_TYPE_OFFSET_DATA,
> +                 handle, sizeof(chunk) - sizeof(chunk.h) + size);
> +    stq_be_p(&chunk.offset, offset);

This is indeed a bare minimum implementation (it doesn't try to
recognize holes at all) - but that's reasonable for the first cut. We
can add hole support on top as later patches.

> +
> +    return nbd_co_send_iov(client, iov, 2, errp);
> +}
> +
> +static int coroutine_fn nbd_co_send_structured_error(NBDClient *client,
> +                                                     uint64_t handle,
> +                                                     uint32_t error,
> +                                                     Error **errp)

It would be trivial to support a human-readable error message as a const
char * parameter. But I'm also okay doing that as a followup patch.

> +{
> +    NBDStructuredError chunk;
> +
> +    set_be_chunk(&chunk.h, NBD_SREP_FLAG_DONE, NBD_SREP_TYPE_ERROR, handle,
> +                 sizeof(chunk) - sizeof(chunk.h));
> +    stl_be_p(&chunk.error, error);

We could get away with assert(error), to prove our server is following spec.

Ouch - I don't think you converted the host errno to the NBD wire value
at any point along the chain.  You also lack a trace message for
anything sent by these two new functions.  Those belong in this patch.

> +    stw_be_p(&chunk.message_length, 0);
> +
> +    return nbd_co_send_buf(client, &chunk, sizeof(chunk), errp);

[1] If we send an optional human-readable message, then we have to use
nbd_co_send_iov anyways; at which point, we have no clients of
nbd_co_send_buf.  If I do a followup that adds the error message, then
it feels like churn if I have to delete something added here; so maybe
the best course is for this patch to just drop the helper function, and
inline nbd_co_send_buf()'s actions here, to make the followup patch
easier to write.

> @@ -1304,10 +1375,17 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
>                     (uint64_t)client->exp->size);
>          return request->type == NBD_CMD_WRITE ? -ENOSPC : -EINVAL;
>      }
> -    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
> +    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE |
> +                           NBD_CMD_FLAG_DF))

This says we accept the client sending NBD_CMD_FLAG_DF, but where did we
advertise it?  I see no reason to advertise it unless a client
negotiates structured replies, at which point the obvious place to set
it is when we reply to the negotiation (and NBD_OPT_INFO prior to that
point won't see the flag, but the NBD_OPT_GO will see it).

> @@ -1374,7 +1452,6 @@ static coroutine_fn void nbd_trip(void *opaque)
>          }
>  
>          reply_data_len = request.len;
> -
>          break;

Spurious whitespace change hunk.

So, here's what I'm squashing, before adding:

Reviewed-by: Eric Blake <eblake@redhat.com>

diff --git i/include/block/nbd.h w/include/block/nbd.h
index dd261f66f0..f1b8c28f58 100644
--- i/include/block/nbd.h
+++ w/include/block/nbd.h
@@ -69,6 +69,7 @@ typedef struct NBDSimpleReply {
     uint64_t handle;
 } QEMU_PACKED NBDSimpleReply;

+/* Header of all structured replies */
 typedef struct NBDStructuredReplyChunk {
     uint32_t magic;  /* NBD_STRUCTURED_REPLY_MAGIC */
     uint16_t flags;  /* combination of NBD_SREP_FLAG_* */
@@ -77,11 +78,13 @@ typedef struct NBDStructuredReplyChunk {
     uint32_t length; /* length of payload */
 } QEMU_PACKED NBDStructuredReplyChunk;

+/* Header of NBD_SREP_TYPE_OFFSET_DATA, complete
NBD_SREP_TYPE_OFFSET_HOLE */
 typedef struct NBDStructuredRead {
     NBDStructuredReplyChunk h;
     uint64_t offset;
 } QEMU_PACKED NBDStructuredRead;

+/* Header of all NBD_SREP_TYPE_ERROR* errors */
 typedef struct NBDStructuredError {
     NBDStructuredReplyChunk h;
     uint32_t error;
diff --git i/nbd/server.c w/nbd/server.c
index e55825cc91..b4966ed8b1 100644
--- i/nbd/server.c
+++ w/nbd/server.c
@@ -765,15 +765,18 @@ static int nbd_negotiate_options(NBDClient
*client, uint16_t myflags,

             case NBD_OPT_STRUCTURED_REPLY:
                 if (client->structured_reply) {
-                    error_setg(errp, "Double negotiation of structured
reply");
-                    return -EINVAL;
+                    ret = nbd_negotiate_send_rep_err(
+                        client->ioc, NBD_REP_ERR_INVALID, option, errp,
+                        "structured reply already negotiated");
+                } else {
+                    ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
+                                                 option, errp);
                 }
-                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
option,
-                                             errp);
                 if (ret < 0) {
                     return ret;
                 }
                 client->structured_reply = true;
+                myflags |= NBD_CMD_FLAG_DF;
                 break;

             default:
@@ -1259,16 +1262,6 @@ static inline void
set_be_chunk(NBDStructuredReplyChunk *chunk, uint16_t flags,
     stl_be_p(&chunk->length, length);
 }

-static inline int coroutine_fn nbd_co_send_buf(NBDClient *client, void
*buf,
-                                               size_t size, Error **errp)
-{
-    struct iovec iov[] = {
-        {.iov_base = buf, .iov_len = size}
-    };
-
-    return nbd_co_send_iov(client, iov, 1, errp);
-}
-
 static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
                                                     uint64_t handle,
                                                     uint64_t offset,
@@ -1282,6 +1275,7 @@ static int coroutine_fn
nbd_co_send_structured_read(NBDClient *client,
         {.iov_base = data, .iov_len = size}
     };

+    trace_nbd_co_send_structured_read(handle, offset, data, size);
     set_be_chunk(&chunk.h, NBD_SREP_FLAG_DONE, NBD_SREP_TYPE_OFFSET_DATA,
                  handle, sizeof(chunk) - sizeof(chunk.h) + size);
     stq_be_p(&chunk.offset, offset);
@@ -1295,13 +1289,20 @@ static int coroutine_fn
nbd_co_send_structured_error(NBDClient *client,
                                                      Error **errp)
 {
     NBDStructuredError chunk;
+    int nbd_err = system_errno_to_nbd_errno(error);
+    struct iovec iov[] = {
+        {.iov_base = &chunk, .iov_len = sizeof(chunk)},
+        /* FIXME: Support human-readable error message */
+    };

+    assert(nbd_err);
+    trace_nbd_co_send_structured_error(handle, nbd_err);
     set_be_chunk(&chunk.h, NBD_SREP_FLAG_DONE, NBD_SREP_TYPE_ERROR, handle,
                  sizeof(chunk) - sizeof(chunk.h));
-    stl_be_p(&chunk.error, error);
+    stl_be_p(&chunk.error, nbd_err);
     stw_be_p(&chunk.message_length, 0);

-    return nbd_co_send_buf(client, &chunk, sizeof(chunk), errp);
+    return nbd_co_send_iov(client, iov, 1, errp);
 }

 /* nbd_co_receive_request
@@ -1452,6 +1453,7 @@ static coroutine_fn void nbd_trip(void *opaque)
         }

         reply_data_len = request.len;
+
         break;
     case NBD_CMD_WRITE:
         if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
diff --git i/nbd/trace-events w/nbd/trace-events
index e27614f050..0d7c3b4887 100644
--- i/nbd/trace-events
+++ w/nbd/trace-events
@@ -54,6 +54,8 @@ nbd_receive_request(uint32_t magic, uint16_t flags,
uint16_t type, uint64_t from
 nbd_blk_aio_attached(const char *name, void *ctx) "Export %s: Attaching
clients to AIO context %p\n"
 nbd_blk_aio_detach(const char *name, void *ctx) "Export %s: Detaching
clients from AIO context %p\n"
 nbd_co_send_simple_reply(uint64_t handle, uint32_t error, int len)
"Send simple reply: handle = %" PRIu64 ", error = %" PRIu32 ", len = %d"
+nbd_co_send_structured_read(uint64_t handle, uint64_t offset, void
*data, size_t size) "Send structured read data reply: handle = %" PRIu64
", offset = %" PRIu64 ", data = %p, len = %zu"
+nbd_co_send_structured_error(uint64_t handle, int err) "Send structured
error reply: handle = %" PRIu64 ", error = %d"
 nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type,
const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16
" (%s)"
 nbd_co_receive_request_payload_received(uint64_t handle, uint32_t len)
"Payload received: handle = %" PRIu64 ", len = %" PRIu32
 nbd_co_receive_request_cmd_write(uint32_t len) "Reading %" PRIu32 "
byte(s)"



-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v3 09/13] nbd: Minimal structured read for server
Posted by Eric Blake, 8 weeks ago
On 10/13/2017 11:00 AM, Eric Blake wrote:
> On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Minimal implementation of structured read: one structured reply chunk,
>> no segmentation.
>> Minimal structured error implementation: no text message.
>> Support DF flag, but just ignore it, as there is no segmentation any
>> way.
>>
>> @@ -1304,10 +1375,17 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
>>                     (uint64_t)client->exp->size);
>>          return request->type == NBD_CMD_WRITE ? -ENOSPC : -EINVAL;
>>      }
>> -    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
>> +    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE |
>> +                           NBD_CMD_FLAG_DF))
> 
> This says we accept the client sending NBD_CMD_FLAG_DF, but where did we
> advertise it?  I see no reason to advertise it unless a client
> negotiates structured replies, at which point the obvious place to set
> it is when we reply to the negotiation (and NBD_OPT_INFO prior to that
> point won't see the flag, but the NBD_OPT_GO will see it).

Oh, one more tweak: if we didn't advertise the flag, the client
shouldn't be sending it.

> 
> So, here's what I'm squashing, before adding:
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>

I'm also adding:

diff --git i/nbd/server.c w/nbd/server.c
index b4966ed8b1..acad2ceb59 100644
--- i/nbd/server.c
+++ w/nbd/server.c
@@ -1315,6 +1315,7 @@ static int nbd_co_receive_request(NBDRequestData
*req, NBDRequest *request,
                                   Error **errp)
 {
     NBDClient *client = req->client;
+    int valid_flags;

     g_assert(qemu_in_coroutine());
     assert(client->recv_coroutine == qemu_coroutine_self());
@@ -1376,20 +1377,15 @@ static int nbd_co_receive_request(NBDRequestData
*req, NBDRequest *request,
                    (uint64_t)client->exp->size);
         return request->type == NBD_CMD_WRITE ? -ENOSPC : -EINVAL;
     }
-    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE |
-                           NBD_CMD_FLAG_DF))
-    {
-        error_setg(errp, "unsupported flags (got 0x%x)", request->flags);
-        return -EINVAL;
+    valid_flags = NBD_CMD_FLAG_FUA;
+    if (request->type == NBD_CMD_READ && client->structured_reply) {
+        valid_flags |= NBD_CMD_FLAG_DF;
+    } else if (request->type == NBD_CMD_WRITE_ZEROES) {
+        valid_flags |= NBD_CMD_FLAG_NO_HOLE;
     }
-    if (request->type != NBD_CMD_READ && (request->flags &
NBD_CMD_FLAG_DF)) {
-        error_setg(errp, "DF flag used with command %d (%s) which is
not READ",
-                   request->type, nbd_cmd_lookup(request->type));
-        return -EINVAL;
-    }
-    if (request->type != NBD_CMD_WRITE_ZEROES &&
-        (request->flags & NBD_CMD_FLAG_NO_HOLE)) {
-        error_setg(errp, "unexpected flags (got 0x%x)", request->flags);
+    if (request->flags & ~valid_flags) {
+        error_setg(errp, "unsupported flags for command %s (got 0x%x)",
+                   nbd_cmd_lookup(request->type), request->flags);
         return -EINVAL;
     }



-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v3 09/13] nbd: Minimal structured read for server
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
13.10.2017 19:15, Eric Blake wrote:
> On 10/13/2017 11:00 AM, Eric Blake wrote:
>> On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Minimal implementation of structured read: one structured reply chunk,
>>> no segmentation.
>>> Minimal structured error implementation: no text message.
>>> Support DF flag, but just ignore it, as there is no segmentation any
>>> way.
>>>
>>> @@ -1304,10 +1375,17 @@ static int nbd_co_receive_request(NBDRequestData *req, NBDRequest *request,
>>>                      (uint64_t)client->exp->size);
>>>           return request->type == NBD_CMD_WRITE ? -ENOSPC : -EINVAL;
>>>       }
>>> -    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE)) {
>>> +    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE |
>>> +                           NBD_CMD_FLAG_DF))
>> This says we accept the client sending NBD_CMD_FLAG_DF, but where did we
>> advertise it?  I see no reason to advertise it unless a client
>> negotiates structured replies, at which point the obvious place to set
>> it is when we reply to the negotiation (and NBD_OPT_INFO prior to that
>> point won't see the flag, but the NBD_OPT_GO will see it).
> Oh, one more tweak: if we didn't advertise the flag, the client
> shouldn't be sending it.
>
>> So, here's what I'm squashing, before adding:
>>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
> I'm also adding:
>
> diff --git i/nbd/server.c w/nbd/server.c
> index b4966ed8b1..acad2ceb59 100644
> --- i/nbd/server.c
> +++ w/nbd/server.c
> @@ -1315,6 +1315,7 @@ static int nbd_co_receive_request(NBDRequestData
> *req, NBDRequest *request,
>                                     Error **errp)
>   {
>       NBDClient *client = req->client;
> +    int valid_flags;
>
>       g_assert(qemu_in_coroutine());
>       assert(client->recv_coroutine == qemu_coroutine_self());
> @@ -1376,20 +1377,15 @@ static int nbd_co_receive_request(NBDRequestData
> *req, NBDRequest *request,
>                      (uint64_t)client->exp->size);
>           return request->type == NBD_CMD_WRITE ? -ENOSPC : -EINVAL;
>       }
> -    if (request->flags & ~(NBD_CMD_FLAG_FUA | NBD_CMD_FLAG_NO_HOLE |
> -                           NBD_CMD_FLAG_DF))
> -    {
> -        error_setg(errp, "unsupported flags (got 0x%x)", request->flags);
> -        return -EINVAL;
> +    valid_flags = NBD_CMD_FLAG_FUA;
> +    if (request->type == NBD_CMD_READ && client->structured_reply) {
> +        valid_flags |= NBD_CMD_FLAG_DF;
> +    } else if (request->type == NBD_CMD_WRITE_ZEROES) {
> +        valid_flags |= NBD_CMD_FLAG_NO_HOLE;
>       }
> -    if (request->type != NBD_CMD_READ && (request->flags &
> NBD_CMD_FLAG_DF)) {
> -        error_setg(errp, "DF flag used with command %d (%s) which is
> not READ",
> -                   request->type, nbd_cmd_lookup(request->type));
> -        return -EINVAL;
> -    }
> -    if (request->type != NBD_CMD_WRITE_ZEROES &&
> -        (request->flags & NBD_CMD_FLAG_NO_HOLE)) {
> -        error_setg(errp, "unexpected flags (got 0x%x)", request->flags);
> +    if (request->flags & ~valid_flags) {
> +        error_setg(errp, "unsupported flags for command %s (got 0x%x)",
> +                   nbd_cmd_lookup(request->type), request->flags);
>           return -EINVAL;
>       }
>
>
>

ok for me.

-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v3 09/13] nbd: Minimal structured read for server
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
13.10.2017 19:00, Eric Blake wrote:
> On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Minimal implementation of structured read: one structured reply chunk,
>> no segmentation.
>> Minimal structured error implementation: no text message.
>> Support DF flag, but just ignore it, as there is no segmentation any
>> way.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---

[...]

>> Spurious whitespace change hunk.
>>
>> So, here's what I'm squashing, before adding:
>>
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>>
>> diff --git i/include/block/nbd.h w/include/block/nbd.h
>> index dd261f66f0..f1b8c28f58 100644
>> --- i/include/block/nbd.h
>> +++ w/include/block/nbd.h
>> @@ -69,6 +69,7 @@ typedef struct NBDSimpleReply {
>>       uint64_t handle;
>>   } QEMU_PACKED NBDSimpleReply;
>>
>> +/* Header of all structured replies */
>>   typedef struct NBDStructuredReplyChunk {
>>       uint32_t magic;  /* NBD_STRUCTURED_REPLY_MAGIC */
>>       uint16_t flags;  /* combination of NBD_SREP_FLAG_* */
>> @@ -77,11 +78,13 @@ typedef struct NBDStructuredReplyChunk {
>>       uint32_t length; /* length of payload */
>>   } QEMU_PACKED NBDStructuredReplyChunk;
>>
>> +/* Header of NBD_SREP_TYPE_OFFSET_DATA, complete
>> NBD_SREP_TYPE_OFFSET_HOLE */
>>   typedef struct NBDStructuredRead {
>>       NBDStructuredReplyChunk h;
>>       uint64_t offset;
>>   } QEMU_PACKED NBDStructuredRead;
>>
>> +/* Header of all NBD_SREP_TYPE_ERROR* errors */
>>   typedef struct NBDStructuredError {
>>       NBDStructuredReplyChunk h;
>>       uint32_t error;
>> diff --git i/nbd/server.c w/nbd/server.c
>> index e55825cc91..b4966ed8b1 100644
>> --- i/nbd/server.c
>> +++ w/nbd/server.c
>> @@ -765,15 +765,18 @@ static int nbd_negotiate_options(NBDClient
>> *client, uint16_t myflags,
>>
>>               case NBD_OPT_STRUCTURED_REPLY:
>>                   if (client->structured_reply) {
>> -                    error_setg(errp, "Double negotiation of structured
>> reply");
>> -                    return -EINVAL;
>> +                    ret = nbd_negotiate_send_rep_err(
>> +                        client->ioc, NBD_REP_ERR_INVALID, option, errp,
>> +                        "structured reply already negotiated");
>> +                } else {
>> +                    ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
>> +                                                 option, errp);
>>                   }
>> -                ret = nbd_negotiate_send_rep(client->ioc, NBD_REP_ACK,
>> option,
>> -                                             errp);
>>                   if (ret < 0) {
>>                       return ret;
>>                   }
>>                   client->structured_reply = true;
>> +                myflags |= NBD_CMD_FLAG_DF;
>>                   break;
>>
>>               default:
>> @@ -1259,16 +1262,6 @@ static inline void
>> set_be_chunk(NBDStructuredReplyChunk *chunk, uint16_t flags,
>>       stl_be_p(&chunk->length, length);
>>   }
>>
>> -static inline int coroutine_fn nbd_co_send_buf(NBDClient *client, void
>> *buf,
>> -                                               size_t size, Error **errp)
>> -{
>> -    struct iovec iov[] = {
>> -        {.iov_base = buf, .iov_len = size}
>> -    };
>> -
>> -    return nbd_co_send_iov(client, iov, 1, errp);
>> -}
>> -
>>   static int coroutine_fn nbd_co_send_structured_read(NBDClient *client,
>>                                                       uint64_t handle,
>>                                                       uint64_t offset,
>> @@ -1282,6 +1275,7 @@ static int coroutine_fn
>> nbd_co_send_structured_read(NBDClient *client,
>>           {.iov_base = data, .iov_len = size}
>>       };
>>
>> +    trace_nbd_co_send_structured_read(handle, offset, data, size);
>>       set_be_chunk(&chunk.h, NBD_SREP_FLAG_DONE, NBD_SREP_TYPE_OFFSET_DATA,
>>                    handle, sizeof(chunk) - sizeof(chunk.h) + size);
>>       stq_be_p(&chunk.offset, offset);
>> @@ -1295,13 +1289,20 @@ static int coroutine_fn
>> nbd_co_send_structured_error(NBDClient *client,
>>                                                        Error **errp)
>>   {
>>       NBDStructuredError chunk;
>> +    int nbd_err = system_errno_to_nbd_errno(error);
>> +    struct iovec iov[] = {
>> +        {.iov_base = &chunk, .iov_len = sizeof(chunk)},
>> +        /* FIXME: Support human-readable error message */
>> +    };
>>
>> +    assert(nbd_err);
>> +    trace_nbd_co_send_structured_error(handle, nbd_err);
>>       set_be_chunk(&chunk.h, NBD_SREP_FLAG_DONE, NBD_SREP_TYPE_ERROR, handle,
>>                    sizeof(chunk) - sizeof(chunk.h));
>> -    stl_be_p(&chunk.error, error);
>> +    stl_be_p(&chunk.error, nbd_err);
>>       stw_be_p(&chunk.message_length, 0);
>>
>> -    return nbd_co_send_buf(client, &chunk, sizeof(chunk), errp);
>> +    return nbd_co_send_iov(client, iov, 1, errp);
>>   }
>>
>>   /* nbd_co_receive_request
>> @@ -1452,6 +1453,7 @@ static coroutine_fn void nbd_trip(void *opaque)
>>           }
>>
>>           reply_data_len = request.len;
>> +
>>           break;
>>       case NBD_CMD_WRITE:
>>           if (exp->nbdflags & NBD_FLAG_READ_ONLY) {
>> diff --git i/nbd/trace-events w/nbd/trace-events
>> index e27614f050..0d7c3b4887 100644
>> --- i/nbd/trace-events
>> +++ w/nbd/trace-events
>> @@ -54,6 +54,8 @@ nbd_receive_request(uint32_t magic, uint16_t flags,
>> uint16_t type, uint64_t from
>>   nbd_blk_aio_attached(const char *name, void *ctx) "Export %s: Attaching
>> clients to AIO context %p\n"
>>   nbd_blk_aio_detach(const char *name, void *ctx) "Export %s: Detaching
>> clients from AIO context %p\n"
>>   nbd_co_send_simple_reply(uint64_t handle, uint32_t error, int len)
>> "Send simple reply: handle = %" PRIu64 ", error = %" PRIu32 ", len = %d"
>> +nbd_co_send_structured_read(uint64_t handle, uint64_t offset, void
>> *data, size_t size) "Send structured read data reply: handle = %" PRIu64
>> ", offset = %" PRIu64 ", data = %p, len = %zu"
>> +nbd_co_send_structured_error(uint64_t handle, int err) "Send structured
>> error reply: handle = %" PRIu64 ", error = %d"
>>   nbd_co_receive_request_decode_type(uint64_t handle, uint16_t type,
>> const char *name) "Decoding type: handle = %" PRIu64 ", type = %" PRIu16
>> " (%s)"
>>   nbd_co_receive_request_payload_received(uint64_t handle, uint32_t len)
>> "Payload received: handle = %" PRIu64 ", len = %" PRIu32
>>   nbd_co_receive_request_cmd_write(uint32_t len) "Reading %" PRIu32 "
>> byte(s)"
>>
>>
>>

Looks OK for me, thanks.

However, now I don't like _SREP_... It's not very obvious abbreviation. 
Simple and Structured are both with first symbol 'S')))
I don't remember, what was the reason to use it. (obviously to 
distinguish it with simple reply, but where are simple reply flags? 
there no intersection anyway)
So, I prefer s/_SREP_/_REPLY_/ for the whole series.

-- 
Best regards,
Vladimir


[Qemu-devel] [PATCH v3 10/13] nbd/client: refactor nbd_receive_starttls
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
Split out nbd_receive_simple_option to be reused for structured reply
option.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 nbd/client.c     | 64 ++++++++++++++++++++++++++++++++++++++++----------------
 nbd/trace-events |  7 ++++---
 2 files changed, 50 insertions(+), 21 deletions(-)

diff --git a/nbd/client.c b/nbd/client.c
index cd5a2c80ac..c8702a80b1 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -540,35 +540,63 @@ static int nbd_receive_query_exports(QIOChannel *ioc,
     }
 }
 
-static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
-                                        QCryptoTLSCreds *tlscreds,
-                                        const char *hostname, Error **errp)
+/* nbd_request_simple_option
+ * return 1 for successful negotiation,
+ *        0 if operation is unsupported,
+ *        -1 with errp set for any other error
+ */
+static int nbd_request_simple_option(QIOChannel *ioc, int opt, Error **errp)
 {
     nbd_opt_reply reply;
-    QIOChannelTLS *tioc;
-    struct NBDTLSHandshakeData data = { 0 };
 
-    trace_nbd_receive_starttls_request();
-    if (nbd_send_option_request(ioc, NBD_OPT_STARTTLS, 0, NULL, errp) < 0) {
-        return NULL;
+    trace_nbd_receive_simple_option_request(opt, nbd_opt_lookup(opt));
+    if (nbd_send_option_request(ioc, opt, 0, NULL, errp) < 0) {
+        return -1;
     }
 
-    trace_nbd_receive_starttls_reply();
-    if (nbd_receive_option_reply(ioc, NBD_OPT_STARTTLS, &reply, errp) < 0) {
-        return NULL;
+    trace_nbd_receive_simple_option_reply(opt, nbd_opt_lookup(opt));
+    if (nbd_receive_option_reply(ioc, opt, &reply, errp) < 0) {
+        return -1;
     }
 
-    if (reply.type != NBD_REP_ACK) {
-        error_setg(errp, "Server rejected request to start TLS %" PRIx32,
-                   reply.type);
+    if (reply.length != 0) {
+        error_setg(errp, "Option %d ('%s') response length is %" PRIu32
+                   " (it should be zero)", opt, nbd_opt_lookup(opt),
+                   reply.length);
         nbd_send_opt_abort(ioc);
-        return NULL;
+        return -1;
     }
 
-    if (reply.length != 0) {
-        error_setg(errp, "Start TLS response was not zero %" PRIu32,
-                   reply.length);
+    if (reply.type == NBD_REP_ERR_UNSUP) {
+        return 0;
+    }
+
+    if (reply.type != NBD_REP_ACK) {
+        error_setg(errp, "Server rejected request for option %d (%s) "
+                   "with reply %" PRIx32 " (%s)", opt, nbd_opt_lookup(opt),
+                   reply.type, nbd_rep_lookup(reply.type));
         nbd_send_opt_abort(ioc);
+        return -1;
+    }
+
+    trace_nbd_receive_simple_option_approved(opt, nbd_opt_lookup(opt));
+    return 1;
+}
+
+static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
+                                        QCryptoTLSCreds *tlscreds,
+                                        const char *hostname, Error **errp)
+{
+    int ret;
+    QIOChannelTLS *tioc;
+    struct NBDTLSHandshakeData data = { 0 };
+
+    ret = nbd_request_simple_option(ioc, NBD_OPT_STARTTLS, errp);
+    if (ret <= 0) {
+        if (ret == 0) {
+            error_setg(errp, "Server don't support STARTTLS option");
+            nbd_send_opt_abort(ioc);
+        }
         return NULL;
     }
 
diff --git a/nbd/trace-events b/nbd/trace-events
index c5f2d97be2..3235922370 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -9,9 +9,10 @@ nbd_opt_go_info_unknown(int info, const char *name) "Ignoring unknown info %d (%
 nbd_opt_go_info_block_size(uint32_t minimum, uint32_t preferred, uint32_t maximum) "Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%" PRIx32
 nbd_receive_query_exports_start(const char *wantname) "Querying export list for '%s'"
 nbd_receive_query_exports_success(const char *wantname) "Found desired export name '%s'"
-nbd_receive_starttls_request(void) "Requesting TLS from server"
-nbd_receive_starttls_reply(void) "Getting TLS reply from server"
-nbd_receive_starttls_new_client(void) "TLS request approved, setting up TLS"
+nbd_receive_simple_option_request(int opt, const char *name) "Requesting option %d (%s) from server"
+nbd_receive_simple_option_reply(int opt, const char *name) "Getting reply for option %d (%s) from server"
+nbd_receive_simple_option_approved(int opt, const char *name) "Option %d (%s) approved"
+nbd_receive_starttls_new_client(void) "Setting up TLS"
 nbd_receive_starttls_tls_handshake(void) "Starting TLS handshake"
 nbd_receive_negotiate(void *tlscreds, const char *hostname) "Receiving negotiation tlscreds=%p hostname=%s"
 nbd_receive_negotiate_magic(uint64_t magic) "Magic is 0x%" PRIx64
-- 
2.11.1


Re: [Qemu-devel] [PATCH v3 10/13] nbd/client: refactor nbd_receive_starttls
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Split out nbd_receive_simple_option to be reused for structured reply

s/receive/request/, but see [1]

> option.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  nbd/client.c     | 64 ++++++++++++++++++++++++++++++++++++++++----------------
>  nbd/trace-events |  7 ++++---
>  2 files changed, 50 insertions(+), 21 deletions(-)
> 
> diff --git a/nbd/client.c b/nbd/client.c
> index cd5a2c80ac..c8702a80b1 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -540,35 +540,63 @@ static int nbd_receive_query_exports(QIOChannel *ioc,
>      }
>  }
>  
> -static QIOChannel *nbd_receive_starttls(QIOChannel *ioc,
> -                                        QCryptoTLSCreds *tlscreds,
> -                                        const char *hostname, Error **errp)
> +/* nbd_request_simple_option
> + * return 1 for successful negotiation,
> + *        0 if operation is unsupported,
> + *        -1 with errp set for any other error
> + */
> +static int nbd_request_simple_option(QIOChannel *ioc, int opt, Error **errp)
>  {
>      nbd_opt_reply reply;

Unrelated, but a potential cleanup for a future patch: we really should
be using our preferred naming scheme, where this would be NBDOptReply or
similar.

> -    QIOChannelTLS *tioc;
> -    struct NBDTLSHandshakeData data = { 0 };
>  
> -    trace_nbd_receive_starttls_request();
> -    if (nbd_send_option_request(ioc, NBD_OPT_STARTTLS, 0, NULL, errp) < 0) {
> -        return NULL;
> +    trace_nbd_receive_simple_option_request(opt, nbd_opt_lookup(opt));

Naming of the trace doesn't quite match the naming of the function...

> +    if (nbd_send_option_request(ioc, opt, 0, NULL, errp) < 0) {
> +        return -1;
>      }
>  
> -    trace_nbd_receive_starttls_reply();
> -    if (nbd_receive_option_reply(ioc, NBD_OPT_STARTTLS, &reply, errp) < 0) {
> -        return NULL;
> +    trace_nbd_receive_simple_option_reply(opt, nbd_opt_lookup(opt));

[1] ...However, trace_nbd_request_simple_option_request sounds silly.
Can we get by with the shorter nbd_simple_option() as the function name,
then have two traces nbd_simple_option_request and
nbd_simple_option_reply?  Or, recognize that nbd_receive_option_reply()
already traces everything, so this trace is redundant with that one.  In
fact, nbd_send_option_request also traces everything (it doesn't trace
payload, but we aren't sending payload; the only time we need two traces
instead of one is if the local trace can document payload before the
common trace documents everything else.  Another reason for redundant
traces is if you envision needing to trace one local thing without being
overwhelmed by the noise of a common trace firing for unrelated events,
but NBD startup is not that frequent to need fine-grained filtering on
the traces).

> +    if (nbd_receive_option_reply(ioc, opt, &reply, errp) < 0) {
> +        return -1;
>      }
>  
> -    if (reply.type != NBD_REP_ACK) {
> -        error_setg(errp, "Server rejected request to start TLS %" PRIx32,
> -                   reply.type);
> +    if (reply.length != 0) {
> +        error_setg(errp, "Option %d ('%s') response length is %" PRIu32
> +                   " (it should be zero)", opt, nbd_opt_lookup(opt),
> +                   reply.length);
>          nbd_send_opt_abort(ioc);

Just thinking aloud here. For starttls, the NBD spec already says
payload must be 0 length. But for NBD_OPT_STRUCTURED_REPLY, which is not
yet standardized, it is conceivable that in the future, we might have a
way for the server to return a payload to the ACK reply, which informs
the client what additional options have been unlocked by negotiating
structured reply (after all, we document that some options should not be
negotiated unless structured reply is negotiated first).  We don't have
that in the spec now, but writing our client to reject any payload from
the server (rather than silently ignoring the payload) means that we
can't ever write a server with that extended reply.  So it's worth
discussing on the NBD list whether we want to make any guarantees about
forcing a 0-length payload vs. allowing clients to ignore an
unrecognized non-empty payload.  Depending on what the NBD community
decides, the easiest approach here would be a 'bool ignore_payload'
parameter, true when requesting OPT_STRUCTURED_REPLY, false when
requesting OPT_STARTTLS.  But as this is all speculation, I'm also fine
with checking in your stricter version as-is for now; we have until qemu
2.11 is released to tweak our implementation as followups based on
conversations on the NBD list.

> +
> +    trace_nbd_receive_simple_option_approved(opt, nbd_opt_lookup(opt));

Another trace that doesn't add much value.

So, here's what I'm squashing, before I give:
Reviewed-by: Eric Blake <eblake@redhat.com>

diff --git i/nbd/client.c w/nbd/client.c
index c8702a80b1..fa64ec1c5b 100644
--- i/nbd/client.c
+++ w/nbd/client.c
@@ -540,7 +540,7 @@ static int nbd_receive_query_exports(QIOChannel *ioc,
     }
 }

-/* nbd_request_simple_option
+/* nbd_request_simple_option: Send an option request, and parse the reply
  * return 1 for successful negotiation,
  *        0 if operation is unsupported,
  *        -1 with errp set for any other error
@@ -549,12 +549,10 @@ static int nbd_request_simple_option(QIOChannel
*ioc, int opt, Error **errp)
 {
     nbd_opt_reply reply;

-    trace_nbd_receive_simple_option_request(opt, nbd_opt_lookup(opt));
     if (nbd_send_option_request(ioc, opt, 0, NULL, errp) < 0) {
         return -1;
     }

-    trace_nbd_receive_simple_option_reply(opt, nbd_opt_lookup(opt));
     if (nbd_receive_option_reply(ioc, opt, &reply, errp) < 0) {
         return -1;
     }
@@ -579,7 +577,6 @@ static int nbd_request_simple_option(QIOChannel
*ioc, int opt, Error **errp)
         return -1;
     }

-    trace_nbd_receive_simple_option_approved(opt, nbd_opt_lookup(opt));
     return 1;
 }

diff --git i/nbd/trace-events w/nbd/trace-events
index 0c8138ab92..59c0718a6b 100644
--- i/nbd/trace-events
+++ w/nbd/trace-events
@@ -9,9 +9,6 @@ nbd_opt_go_info_unknown(int info, const char *name)
"Ignoring unknown info %d (%
 nbd_opt_go_info_block_size(uint32_t minimum, uint32_t preferred,
uint32_t maximum) "Block sizes are 0x%" PRIx32 ", 0x%" PRIx32 ", 0x%" PRIx32
 nbd_receive_query_exports_start(const char *wantname) "Querying export
list for '%s'"
 nbd_receive_query_exports_success(const char *wantname) "Found desired
export name '%s'"
-nbd_receive_simple_option_request(int opt, const char *name)
"Requesting option %d (%s) from server"
-nbd_receive_simple_option_reply(int opt, const char *name) "Getting
reply for option %d (%s) from server"
-nbd_receive_simple_option_approved(int opt, const char *name) "Option
%d (%s) approved"
 nbd_receive_starttls_new_client(void) "Setting up TLS"
 nbd_receive_starttls_tls_handshake(void) "Starting TLS handshake"
 nbd_receive_negotiate(void *tlscreds, const char *hostname) "Receiving
negotiation tlscreds=%p hostname=%s"


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[Qemu-devel] [PATCH v3 11/13] nbd: share some nbd entities to be reused in block/nbd-client.c
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/nbd.h | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 nbd/nbd-internal.h  | 25 -------------------------
 nbd/client.c        | 32 --------------------------------
 3 files changed, 48 insertions(+), 57 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index dd261f66f0..09e4592971 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -77,6 +77,9 @@ typedef struct NBDStructuredReplyChunk {
     uint32_t length; /* length of payload */
 } QEMU_PACKED NBDStructuredReplyChunk;
 
+#define NBD_SIMPLE_REPLY_MAGIC      0x67446698
+#define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
+
 typedef struct NBDStructuredRead {
     NBDStructuredReplyChunk h;
     uint64_t offset;
@@ -182,6 +185,40 @@ enum {
 #define NBD_SREP_TYPE_ERROR         NBD_SREP_ERR(1)
 #define NBD_SREP_TYPE_ERROR_OFFSET  NBD_SREP_ERR(2)
 
+/* NBD errors are based on errno numbers, so there is a 1:1 mapping,
+ * but only a limited set of errno values is specified in the protocol.
+ * Everything else is squashed to EINVAL.
+ */
+#define NBD_SUCCESS    0
+#define NBD_EPERM      1
+#define NBD_EIO        5
+#define NBD_ENOMEM     12
+#define NBD_EINVAL     22
+#define NBD_ENOSPC     28
+#define NBD_ESHUTDOWN  108
+
+static inline int nbd_errno_to_system_errno(int err)
+{
+    switch (err) {
+    case NBD_SUCCESS:
+        return 0;
+    case NBD_EPERM:
+        return EPERM;
+    case NBD_EIO:
+        return EIO;
+    case NBD_ENOMEM:
+        return ENOMEM;
+    case NBD_ENOSPC:
+        return ENOSPC;
+    case NBD_ESHUTDOWN:
+        return ESHUTDOWN;
+    case NBD_EINVAL:
+        return EINVAL;
+    }
+
+    return EINVAL;
+}
+
 /* Details collected by NBD_OPT_EXPORT_NAME and NBD_OPT_GO */
 struct NBDExportInfo {
     /* Set by client before nbd_receive_negotiate() */
@@ -235,4 +272,15 @@ void nbd_client_put(NBDClient *client);
 void nbd_server_start(SocketAddress *addr, const char *tls_creds,
                       Error **errp);
 
+/* nbd_read
+ * Reads @size bytes from @ioc. Returns 0 on success.
+ */
+static inline int nbd_read(QIOChannel *ioc, void *buffer, size_t size,
+                           Error **errp)
+{
+    return qio_channel_read_all(ioc, buffer, size, errp) < 0 ? -EIO : 0;
+}
+
+int nbd_drop(QIOChannel *ioc, size_t size, Error **errp);
+
 #endif
diff --git a/nbd/nbd-internal.h b/nbd/nbd-internal.h
index beb30a7a3e..970b560d11 100644
--- a/nbd/nbd-internal.h
+++ b/nbd/nbd-internal.h
@@ -47,8 +47,6 @@
 #define NBD_OLDSTYLE_NEGOTIATE_SIZE (8 + 8 + 8 + 4 + 124)
 
 #define NBD_REQUEST_MAGIC           0x25609513
-#define NBD_SIMPLE_REPLY_MAGIC      0x67446698
-#define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
 #define NBD_OPTS_MAGIC              0x49484156454F5054LL
 #define NBD_CLIENT_MAGIC            0x0000420281861253LL
 #define NBD_REP_MAGIC               0x0003e889045565a9LL
@@ -65,18 +63,6 @@
 #define NBD_SET_TIMEOUT             _IO(0xab, 9)
 #define NBD_SET_FLAGS               _IO(0xab, 10)
 
-/* NBD errors are based on errno numbers, so there is a 1:1 mapping,
- * but only a limited set of errno values is specified in the protocol.
- * Everything else is squashed to EINVAL.
- */
-#define NBD_SUCCESS    0
-#define NBD_EPERM      1
-#define NBD_EIO        5
-#define NBD_ENOMEM     12
-#define NBD_EINVAL     22
-#define NBD_ENOSPC     28
-#define NBD_ESHUTDOWN  108
-
 /* nbd_read_eof
  * Tries to read @size bytes from @ioc.
  * Returns 1 on success
@@ -96,15 +82,6 @@ static inline int nbd_read_eof(QIOChannel *ioc, void *buffer, size_t size,
     return ret;
 }
 
-/* nbd_read
- * Reads @size bytes from @ioc. Returns 0 on success.
- */
-static inline int nbd_read(QIOChannel *ioc, void *buffer, size_t size,
-                           Error **errp)
-{
-    return qio_channel_read_all(ioc, buffer, size, errp) < 0 ? -EIO : 0;
-}
-
 /* nbd_write
  * Writes @size bytes to @ioc. Returns 0 on success.
  */
@@ -128,6 +105,4 @@ const char *nbd_rep_lookup(uint32_t rep);
 const char *nbd_info_lookup(uint16_t info);
 const char *nbd_cmd_lookup(uint16_t info);
 
-int nbd_drop(QIOChannel *ioc, size_t size, Error **errp);
-
 #endif
diff --git a/nbd/client.c b/nbd/client.c
index c8702a80b1..f0f3075569 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -22,38 +22,6 @@
 #include "trace.h"
 #include "nbd-internal.h"
 
-static int nbd_errno_to_system_errno(int err)
-{
-    int ret;
-    switch (err) {
-    case NBD_SUCCESS:
-        ret = 0;
-        break;
-    case NBD_EPERM:
-        ret = EPERM;
-        break;
-    case NBD_EIO:
-        ret = EIO;
-        break;
-    case NBD_ENOMEM:
-        ret = ENOMEM;
-        break;
-    case NBD_ENOSPC:
-        ret = ENOSPC;
-        break;
-    case NBD_ESHUTDOWN:
-        ret = ESHUTDOWN;
-        break;
-    default:
-        trace_nbd_unknown_error(err);
-        /* fallthrough */
-    case NBD_EINVAL:
-        ret = EINVAL;
-        break;
-    }
-    return ret;
-}
-
 /* Definitions for opaque data types */
 
 static QTAILQ_HEAD(, NBDExport) exports = QTAILQ_HEAD_INITIALIZER(exports);
-- 
2.11.1


Re: [Qemu-devel] [PATCH v3 11/13] nbd: share some nbd entities to be reused in block/nbd-client.c
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/nbd.h | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  nbd/nbd-internal.h  | 25 -------------------------
>  nbd/client.c        | 32 --------------------------------
>  3 files changed, 48 insertions(+), 57 deletions(-)
> 

> +static inline int nbd_errno_to_system_errno(int err)
> +{
> +    switch (err) {
> +    case NBD_SUCCESS:
> +        return 0;
> +    case NBD_EPERM:
> +        return EPERM;
> +    case NBD_EIO:
> +        return EIO;
> +    case NBD_ENOMEM:
> +        return ENOMEM;
> +    case NBD_ENOSPC:
> +        return ENOSPC;
> +    case NBD_ESHUTDOWN:
> +        return ESHUTDOWN;
> +    case NBD_EINVAL:
> +        return EINVAL;
> +    }
> +
> +    return EINVAL;
> +}

This lacks a trace...

> +++ b/nbd/client.c
> @@ -22,38 +22,6 @@
>  #include "trace.h"
>  #include "nbd-internal.h"
>  
> -static int nbd_errno_to_system_errno(int err)
> -{
> -    int ret;
> -    switch (err) {
> -    case NBD_SUCCESS:
> -        ret = 0;
> -        break;
> -    case NBD_EPERM:
> -        ret = EPERM;
> -        break;
> -    case NBD_EIO:
> -        ret = EIO;
> -        break;
> -    case NBD_ENOMEM:
> -        ret = ENOMEM;
> -        break;
> -    case NBD_ENOSPC:
> -        ret = ENOSPC;
> -        break;
> -    case NBD_ESHUTDOWN:
> -        ret = ESHUTDOWN;
> -        break;
> -    default:
> -        trace_nbd_unknown_error(err);
> -        /* fallthrough */
> -    case NBD_EINVAL:

...that was present here.  And you didn't do straight code motion, but
modified things on the way (hence why checkpatch complained that your
more concise version is suspicious with regards to returning positive
errno).  Does the function still need to be static inline, or should we
just declare a prototype in the header and put the function itself in
nbd/common.c?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v3 11/13] nbd: share some nbd entities to be reused in block/nbd-client.c
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
13.10.2017 21:47, Eric Blake wrote:
> On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   include/block/nbd.h | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>>   nbd/nbd-internal.h  | 25 -------------------------
>>   nbd/client.c        | 32 --------------------------------
>>   3 files changed, 48 insertions(+), 57 deletions(-)
>>
>> +static inline int nbd_errno_to_system_errno(int err)
>> +{
>> +    switch (err) {
>> +    case NBD_SUCCESS:
>> +        return 0;
>> +    case NBD_EPERM:
>> +        return EPERM;
>> +    case NBD_EIO:
>> +        return EIO;
>> +    case NBD_ENOMEM:
>> +        return ENOMEM;
>> +    case NBD_ENOSPC:
>> +        return ENOSPC;
>> +    case NBD_ESHUTDOWN:
>> +        return ESHUTDOWN;
>> +    case NBD_EINVAL:
>> +        return EINVAL;
>> +    }
>> +
>> +    return EINVAL;
>> +}
> This lacks a trace...
>
>> +++ b/nbd/client.c
>> @@ -22,38 +22,6 @@
>>   #include "trace.h"
>>   #include "nbd-internal.h"
>>   
>> -static int nbd_errno_to_system_errno(int err)
>> -{
>> -    int ret;
>> -    switch (err) {
>> -    case NBD_SUCCESS:
>> -        ret = 0;
>> -        break;
>> -    case NBD_EPERM:
>> -        ret = EPERM;
>> -        break;
>> -    case NBD_EIO:
>> -        ret = EIO;
>> -        break;
>> -    case NBD_ENOMEM:
>> -        ret = ENOMEM;
>> -        break;
>> -    case NBD_ENOSPC:
>> -        ret = ENOSPC;
>> -        break;
>> -    case NBD_ESHUTDOWN:
>> -        ret = ESHUTDOWN;
>> -        break;
>> -    default:
>> -        trace_nbd_unknown_error(err);
>> -        /* fallthrough */
>> -    case NBD_EINVAL:
> ...that was present here.  And you didn't do straight code motion, but

hmm, sorry.

> modified things on the way (hence why checkpatch complained that your
> more concise version is suspicious with regards to returning positive
> errno).  Does the function still need to be static inline, or should we
> just declare a prototype in the header and put the function itself in
> nbd/common.c?
>

ok for me to do so.

-- 
Best regards,
Vladimir


Re: [Qemu-devel] [PATCH v3 11/13] nbd: share some nbd entities to be reused in block/nbd-client.c
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/nbd.h | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  nbd/nbd-internal.h  | 25 -------------------------
>  nbd/client.c        | 32 --------------------------------
>  3 files changed, 48 insertions(+), 57 deletions(-)
> 

Another thing I noticed looking at this patch:

> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index dd261f66f0..09e4592971 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -77,6 +77,9 @@ typedef struct NBDStructuredReplyChunk {
>      uint32_t length; /* length of payload */
>  } QEMU_PACKED NBDStructuredReplyChunk;
>  
> +#define NBD_SIMPLE_REPLY_MAGIC      0x67446698
> +#define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef

We have some churn here, as we defined this earlier in the series.
Also, in an ideal world (although I don't know if we're quite there), we
should be able to backport patches for JUST the server, or for JUST the
client, in isolation, to talk to an independent implementation on the
other side of the wire.  To do that, it may be better to define all our
new constants in a standalone patch, rather than embedded as part of the
server implementation in 9/13.

I think at this point, I will take 1-8 as amended, and even prepare a
pull request for those, and then post a v4 series based on your work but
with some things moved around, and get consensus on my changes, before
worrying about the pull request for the second half.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v3 11/13] nbd: share some nbd entities to be reused in block/nbd-client.c
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/nbd.h | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  nbd/nbd-internal.h  | 25 -------------------------
>  nbd/client.c        | 32 --------------------------------
>  3 files changed, 48 insertions(+), 57 deletions(-)
> 

By the end of your series, I still don't see any use of

> +++ b/include/block/nbd.h

> @@ -235,4 +272,15 @@ void nbd_client_put(NBDClient *client);
>  void nbd_server_start(SocketAddress *addr, const char *tls_creds,
>                        Error **errp);
>  
> +/* nbd_read
> + * Reads @size bytes from @ioc. Returns 0 on success.
> + */
> +static inline int nbd_read(QIOChannel *ioc, void *buffer, size_t size,
> +                           Error **errp)
> +{
> +    return qio_channel_read_all(ioc, buffer, size, errp) < 0 ? -EIO : 0;
> +}
> +
> +int nbd_drop(QIOChannel *ioc, size_t size, Error **errp);

either of these functions in block/nbd-client.c.  I think that's a good
thing (we refactored it so that nbd/client.c is doing ALL the reading
from the wire, and block/nbd-client.c is relying on nbd/client.c to do
the work), but it means this part of the patch is no longer necessary,
unless I'm missing something.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[Qemu-devel] [PATCH v3 12/13] nbd/client: prepare nbd_receive_reply for structured reply
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
In following patch nbd_receive_reply will be used both for simple
and structured reply header receiving.
NBDReply is altered into union of simple reply header and structured
reply chunk header, simple error translation moved to block/nbd-client
to be consistent with further structured reply error translation.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/nbd.h |  30 ++++++++++++----
 block/nbd-client.c  |   8 +++--
 nbd/client.c        | 102 +++++++++++++++++++++++++++++++++++++++++-----------
 nbd/trace-events    |   3 +-
 4 files changed, 113 insertions(+), 30 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 09e4592971..1ef8c8897f 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -57,12 +57,6 @@ struct NBDRequest {
 };
 typedef struct NBDRequest NBDRequest;
 
-struct NBDReply {
-    uint64_t handle;
-    uint32_t error;
-};
-typedef struct NBDReply NBDReply;
-
 typedef struct NBDSimpleReply {
     uint32_t magic;  /* NBD_SIMPLE_REPLY_MAGIC */
     uint32_t error;
@@ -77,9 +71,33 @@ typedef struct NBDStructuredReplyChunk {
     uint32_t length; /* length of payload */
 } QEMU_PACKED NBDStructuredReplyChunk;
 
+typedef union NBDReply {
+    NBDSimpleReply simple;
+    NBDStructuredReplyChunk structured;
+    struct {
+        /* @magic and @handle fields have the same offset and size both in
+         * simple reply and structured reply chunk, so let them be accessible
+         * without ".simple." or ".structured." specification
+         */
+        uint32_t magic;
+        uint32_t _skip;
+        uint64_t handle;
+    } QEMU_PACKED;
+} NBDReply;
+
 #define NBD_SIMPLE_REPLY_MAGIC      0x67446698
 #define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
 
+static inline bool nbd_reply_is_simple(NBDReply *reply)
+{
+    return reply->magic == NBD_SIMPLE_REPLY_MAGIC;
+}
+
+static inline bool nbd_reply_is_structured(NBDReply *reply)
+{
+    return reply->magic == NBD_STRUCTURED_REPLY_MAGIC;
+}
+
 typedef struct NBDStructuredRead {
     NBDStructuredReplyChunk h;
     uint64_t offset;
diff --git a/block/nbd-client.c b/block/nbd-client.c
index c0683c3c83..58493b7ac4 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -92,7 +92,9 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
         i = HANDLE_TO_INDEX(s, s->reply.handle);
         if (i >= MAX_NBD_REQUESTS ||
             !s->requests[i].coroutine ||
-            !s->requests[i].receiving) {
+            !s->requests[i].receiving ||
+            nbd_reply_is_structured(&s->reply))
+        {
             break;
         }
 
@@ -194,8 +196,8 @@ static int nbd_co_receive_reply(NBDClientSession *s,
         ret = -EIO;
     } else {
         assert(s->reply.handle == handle);
-        ret = -s->reply.error;
-        if (qiov && s->reply.error == 0) {
+        ret = -nbd_errno_to_system_errno(s->reply.simple.error);
+        if (qiov && ret == 0) {
             if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
                                       NULL) < 0) {
                 ret = -EIO;
diff --git a/nbd/client.c b/nbd/client.c
index f0f3075569..a38e1a7d8e 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -910,6 +910,57 @@ int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
     return nbd_write(ioc, buf, sizeof(buf), NULL);
 }
 
+/* nbd_receive_simple_reply
+ * Read simple reply except magic field (which should be already read).
+ * Payload is not read (payload is possible for CMD_READ, but here we even
+ * don't know whether it take place or not).
+ */
+static int nbd_receive_simple_reply(QIOChannel *ioc, NBDSimpleReply *reply,
+                                    Error **errp)
+{
+    int ret;
+
+    assert(reply->magic == NBD_SIMPLE_REPLY_MAGIC);
+
+    ret = nbd_read(ioc, (uint8_t *)reply + sizeof(reply->magic),
+                   sizeof(*reply) - sizeof(reply->magic), errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    be32_to_cpus(&reply->error);
+    be64_to_cpus(&reply->handle);
+
+    return 0;
+}
+
+/* nbd_receive_structured_reply_chunk
+ * Read structured reply chunk except magic field (which should be already
+ * read).
+ * Payload is not read.
+ */
+static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
+                                              NBDStructuredReplyChunk *chunk,
+                                              Error **errp)
+{
+    int ret;
+
+    assert(chunk->magic == NBD_STRUCTURED_REPLY_MAGIC);
+
+    ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic),
+                   sizeof(*chunk) - sizeof(chunk->magic), errp);
+    if (ret < 0) {
+        return ret;
+    }
+
+    be16_to_cpus(&chunk->flags);
+    be16_to_cpus(&chunk->type);
+    be64_to_cpus(&chunk->handle);
+    be32_to_cpus(&chunk->length);
+
+    return 0;
+}
+
 /* nbd_receive_reply
  * Returns 1 on success
  *         0 on eof, when no data was read (errp is not set)
@@ -917,37 +968,48 @@ int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
  */
 int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
 {
-    uint8_t buf[NBD_REPLY_SIZE];
-    uint32_t magic;
     int ret;
 
-    ret = nbd_read_eof(ioc, buf, sizeof(buf), errp);
+    ret = nbd_read_eof(ioc, &reply->magic, sizeof(reply->magic), errp);
     if (ret <= 0) {
         return ret;
     }
 
-    /* Reply
-       [ 0 ..  3]    magic   (NBD_SIMPLE_REPLY_MAGIC)
-       [ 4 ..  7]    error   (0 == no error)
-       [ 7 .. 15]    handle
-     */
+    be32_to_cpus(&reply->magic);
 
-    magic = ldl_be_p(buf);
-    reply->error  = ldl_be_p(buf + 4);
-    reply->handle = ldq_be_p(buf + 8);
+    switch (reply->magic) {
+    case NBD_SIMPLE_REPLY_MAGIC:
+        ret = nbd_receive_simple_reply(ioc, &reply->simple, errp);
+        if (ret < 0) {
+            break;
+        }
 
-    reply->error = nbd_errno_to_system_errno(reply->error);
+        if (reply->simple.error == NBD_ESHUTDOWN) {
+            /* This works even on mingw which lacks a native ESHUTDOWN */
+            error_setg(errp, "server shutting down");
+            return -EINVAL;
+        }
 
-    if (reply->error == ESHUTDOWN) {
-        /* This works even on mingw which lacks a native ESHUTDOWN */
-        error_setg(errp, "server shutting down");
+        trace_nbd_receive_simple_reply(
+                nbd_errno_to_system_errno(reply->simple.error),
+                reply->simple.handle);
+        break;
+    case NBD_STRUCTURED_REPLY_MAGIC:
+        ret = nbd_receive_structured_reply_chunk(ioc, &reply->structured, errp);
+        if (ret < 0) {
+            break;
+        }
+        trace_nbd_receive_structured_reply_chunk(reply->structured.flags,
+                                                 reply->structured.type,
+                                                 reply->structured.handle,
+                                                 reply->structured.length);
+        break;
+    default:
+        error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", reply->magic);
         return -EINVAL;
     }
-    trace_nbd_receive_reply(magic, reply->error, reply->handle);
-
-    if (magic != NBD_SIMPLE_REPLY_MAGIC) {
-        error_setg(errp, "invalid magic (got 0x%" PRIx32 ")", magic);
-        return -EINVAL;
+    if (ret < 0) {
+        return ret;
     }
 
     return 1;
diff --git a/nbd/trace-events b/nbd/trace-events
index 3235922370..c5b24f6715 100644
--- a/nbd/trace-events
+++ b/nbd/trace-events
@@ -30,7 +30,8 @@ nbd_client_loop_ret(int ret, const char *error) "NBD loop returned %d: %s"
 nbd_client_clear_queue(void) "Clearing NBD queue"
 nbd_client_clear_socket(void) "Clearing NBD socket"
 nbd_send_request(uint64_t from, uint32_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }"
-nbd_receive_reply(uint32_t magic, int32_t error, uint64_t handle) "Got reply: { magic = 0x%" PRIx32 ", .error = % " PRId32 ", handle = %" PRIu64" }"
+nbd_receive_simple_reply(int32_t error, uint64_t handle) "Got simple reply: { error = % " PRId32 ", handle = %" PRIu64" }"
+nbd_receive_structured_reply_chunk(uint16_t flags, uint16_t type, uint64_t handle, uint32_t length) "Got structured reply chunk: { flags = 0x%" PRIx16 ", type = %d, handle = %" PRIu64 ", length = %" PRIu32 " }"
 
 # nbd/server.c
 nbd_negotiate_send_rep_len(uint32_t opt, const char *optname, uint32_t type, const char *typename, uint32_t len) "Reply opt=0x%" PRIx32 " (%s), type=0x%" PRIx32 " (%s), len=%" PRIu32
-- 
2.11.1


Re: [Qemu-devel] [PATCH v3 12/13] nbd/client: prepare nbd_receive_reply for structured reply
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> In following patch nbd_receive_reply will be used both for simple
> and structured reply header receiving.
> NBDReply is altered into union of simple reply header and structured
> reply chunk header, simple error translation moved to block/nbd-client
> to be consistent with further structured reply error translation.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/nbd.h |  30 ++++++++++++----
>  block/nbd-client.c  |   8 +++--
>  nbd/client.c        | 102 +++++++++++++++++++++++++++++++++++++++++-----------
>  nbd/trace-events    |   3 +-
>  4 files changed, 113 insertions(+), 30 deletions(-)
> 
> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 09e4592971..1ef8c8897f 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -57,12 +57,6 @@ struct NBDRequest {
>  };
>  typedef struct NBDRequest NBDRequest;
>  
> -struct NBDReply {
> -    uint64_t handle;
> -    uint32_t error;
> -};
> -typedef struct NBDReply NBDReply;

The old struct is not packed,

> -
>  typedef struct NBDSimpleReply {
>      uint32_t magic;  /* NBD_SIMPLE_REPLY_MAGIC */
>      uint32_t error;
> @@ -77,9 +71,33 @@ typedef struct NBDStructuredReplyChunk {
>      uint32_t length; /* length of payload */
>  } QEMU_PACKED NBDStructuredReplyChunk;
>  
> +typedef union NBDReply {
> +    NBDSimpleReply simple;
> +    NBDStructuredReplyChunk structured;
> +    struct {
> +        /* @magic and @handle fields have the same offset and size both in
> +         * simple reply and structured reply chunk, so let them be accessible
> +         * without ".simple." or ".structured." specification
> +         */
> +        uint32_t magic;
> +        uint32_t _skip;
> +        uint64_t handle;
> +    } QEMU_PACKED;
> +} NBDReply;

but the new is. I'm not sure if that's a good idea.  When we are dealing
with items on the wire, compilers may emit inefficient code for dealing
with packed structs; once we've parsed off the wire, putting things in a
non-packed struct for the remainder of our local handling may have
better performance.  I'm not striking this outright, especially if Paolo
has an opinion here.

> +
>  #define NBD_SIMPLE_REPLY_MAGIC      0x67446698
>  #define NBD_STRUCTURED_REPLY_MAGIC  0x668e33ef
>  
> +static inline bool nbd_reply_is_simple(NBDReply *reply)
> +{
> +    return reply->magic == NBD_SIMPLE_REPLY_MAGIC;
> +}
> +
> +static inline bool nbd_reply_is_structured(NBDReply *reply)
> +{
> +    return reply->magic == NBD_STRUCTURED_REPLY_MAGIC;
> +}
> +

Putting functions in the middle of structs/constants feels odd; I would
sink these down to the bottom.

>  typedef struct NBDStructuredRead {
>      NBDStructuredReplyChunk h;
>      uint64_t offset;
> diff --git a/block/nbd-client.c b/block/nbd-client.c
> index c0683c3c83..58493b7ac4 100644
> --- a/block/nbd-client.c
> +++ b/block/nbd-client.c
> @@ -92,7 +92,9 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>          i = HANDLE_TO_INDEX(s, s->reply.handle);
>          if (i >= MAX_NBD_REQUESTS ||
>              !s->requests[i].coroutine ||
> -            !s->requests[i].receiving) {
> +            !s->requests[i].receiving ||
> +            nbd_reply_is_structured(&s->reply))
> +        {

I did a double-take here, but it makes sense: for this patch, we aren't
actually expecting structured replies (we aren't negotiating it with the
server yet), and therefore a server sending us one is a bug.

>              break;
>          }
>  
> @@ -194,8 +196,8 @@ static int nbd_co_receive_reply(NBDClientSession *s,
>          ret = -EIO;
>      } else {
>          assert(s->reply.handle == handle);
> -        ret = -s->reply.error;
> -        if (qiov && s->reply.error == 0) {
> +        ret = -nbd_errno_to_system_errno(s->reply.simple.error);

So here, you are stating that client.c no longer performs the errno
translation, by placing the duty on the caller instead. [1]

> +        if (qiov && ret == 0) {
>              if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
>                                        NULL) < 0) {
>                  ret = -EIO;
> diff --git a/nbd/client.c b/nbd/client.c
> index f0f3075569..a38e1a7d8e 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -910,6 +910,57 @@ int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
>      return nbd_write(ioc, buf, sizeof(buf), NULL);
>  }
>  
> +/* nbd_receive_simple_reply
> + * Read simple reply except magic field (which should be already read).
> + * Payload is not read (payload is possible for CMD_READ, but here we even
> + * don't know whether it take place or not).

I'll probably do some wordsmithing here when I post v4.

> + */
> +static int nbd_receive_simple_reply(QIOChannel *ioc, NBDSimpleReply *reply,
> +                                    Error **errp)
> +{
> +    int ret;
> +
> +    assert(reply->magic == NBD_SIMPLE_REPLY_MAGIC);
> +
> +    ret = nbd_read(ioc, (uint8_t *)reply + sizeof(reply->magic),
> +                   sizeof(*reply) - sizeof(reply->magic), errp);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    be32_to_cpus(&reply->error);
> +    be64_to_cpus(&reply->handle);
> +

Worth a trace here? [2]

> +    return 0;
> +}
> +
> +/* nbd_receive_structured_reply_chunk
> + * Read structured reply chunk except magic field (which should be already
> + * read).
> + * Payload is not read.
> + */
> +static int nbd_receive_structured_reply_chunk(QIOChannel *ioc,
> +                                              NBDStructuredReplyChunk *chunk,
> +                                              Error **errp)
> +{
> +    int ret;
> +
> +    assert(chunk->magic == NBD_STRUCTURED_REPLY_MAGIC);
> +
> +    ret = nbd_read(ioc, (uint8_t *)chunk + sizeof(chunk->magic),
> +                   sizeof(*chunk) - sizeof(chunk->magic), errp);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    be16_to_cpus(&chunk->flags);
> +    be16_to_cpus(&chunk->type);
> +    be64_to_cpus(&chunk->handle);
> +    be32_to_cpus(&chunk->length);

Worth a trace here?

> +
> +    return 0;
> +}
> +
>  /* nbd_receive_reply
>   * Returns 1 on success
>   *         0 on eof, when no data was read (errp is not set)
> @@ -917,37 +968,48 @@ int nbd_send_request(QIOChannel *ioc, NBDRequest *request)
>   */
>  int nbd_receive_reply(QIOChannel *ioc, NBDReply *reply, Error **errp)
>  {
> -    uint8_t buf[NBD_REPLY_SIZE];
> -    uint32_t magic;
>      int ret;
>  
> -    ret = nbd_read_eof(ioc, buf, sizeof(buf), errp);
> +    ret = nbd_read_eof(ioc, &reply->magic, sizeof(reply->magic), errp);
>      if (ret <= 0) {
>          return ret;
>      }
>  
> -    /* Reply
> -       [ 0 ..  3]    magic   (NBD_SIMPLE_REPLY_MAGIC)
> -       [ 4 ..  7]    error   (0 == no error)
> -       [ 7 .. 15]    handle
> -     */
> +    be32_to_cpus(&reply->magic);
>  
> -    magic = ldl_be_p(buf);
> -    reply->error  = ldl_be_p(buf + 4);
> -    reply->handle = ldq_be_p(buf + 8);
> +    switch (reply->magic) {
> +    case NBD_SIMPLE_REPLY_MAGIC:
> +        ret = nbd_receive_simple_reply(ioc, &reply->simple, errp);
> +        if (ret < 0) {
> +            break;
> +        }
>  
> -    reply->error = nbd_errno_to_system_errno(reply->error);

[1] here's where we changed the responsibility. But I see you mentioned
it in the commit message, so we're okay.

> +        if (reply->simple.error == NBD_ESHUTDOWN) {
> +            /* This works even on mingw which lacks a native ESHUTDOWN */
> +            error_setg(errp, "server shutting down");
> +            return -EINVAL;
> +        }
>  
> -    if (reply->error == ESHUTDOWN) {
> -        /* This works even on mingw which lacks a native ESHUTDOWN */
> -        error_setg(errp, "server shutting down");
> +        trace_nbd_receive_simple_reply(
> +                nbd_errno_to_system_errno(reply->simple.error),

However, it does make this trace a bit awkward.  Maybe we just tweak the
trace to now output the raw value received over the wire, rather than
the local host value; in fact, if we had a lookup function for the
various NBD errors, the trace could include the name in addition to the
error number.  Separate cleanup, as it is something that existing code
could benefit from, so I'll tackle that in my v4.

> +                reply->simple.handle);
> +        break;
> +    case NBD_STRUCTURED_REPLY_MAGIC:
> +        ret = nbd_receive_structured_reply_chunk(ioc, &reply->structured, errp);
> +        if (ret < 0) {
> +            break;
> +        }
> +        trace_nbd_receive_structured_reply_chunk(reply->structured.flags,
> +                                                 reply->structured.type,
> +                                                 reply->structured.handle,
> +                                                 reply->structured.length);

[2] oh, you are tracing here on success, rather than in the helper. Not
sure if it makes much difference.

> +++ b/nbd/trace-events
> @@ -30,7 +30,8 @@ nbd_client_loop_ret(int ret, const char *error) "NBD loop returned %d: %s"
>  nbd_client_clear_queue(void) "Clearing NBD queue"
>  nbd_client_clear_socket(void) "Clearing NBD socket"
>  nbd_send_request(uint64_t from, uint32_t len, uint64_t handle, uint16_t flags, uint16_t type, const char *name) "Sending request to server: { .from = %" PRIu64", .len = %" PRIu32 ", .handle = %" PRIu64 ", .flags = 0x%" PRIx16 ", .type = %" PRIu16 " (%s) }"
> -nbd_receive_reply(uint32_t magic, int32_t error, uint64_t handle) "Got reply: { magic = 0x%" PRIx32 ", .error = % " PRId32 ", handle = %" PRIu64" }"
> +nbd_receive_simple_reply(int32_t error, uint64_t handle) "Got simple reply: { error = % " PRId32 ", handle = %" PRIu64" }"
> +nbd_receive_structured_reply_chunk(uint16_t flags, uint16_t type, uint64_t handle, uint32_t length) "Got structured reply chunk: { flags = 0x%" PRIx16 ", type = %d, handle = %" PRIu64 ", length = %" PRIu32 " }"

I'd love to see a %s for the structured type, another lookup function
I'll propose for my v4.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[Qemu-devel] [PATCH v3 13/13] nbd: Minimal structured read for client
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
Minimal implementation: for structured error only error_report error
message.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
---
 include/block/nbd.h |   6 +
 block/nbd-client.c  | 395 ++++++++++++++++++++++++++++++++++++++++++++++++----
 nbd/client.c        |   7 +
 3 files changed, 379 insertions(+), 29 deletions(-)

diff --git a/include/block/nbd.h b/include/block/nbd.h
index 1ef8c8897f..e3350b67a4 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -203,6 +203,11 @@ enum {
 #define NBD_SREP_TYPE_ERROR         NBD_SREP_ERR(1)
 #define NBD_SREP_TYPE_ERROR_OFFSET  NBD_SREP_ERR(2)
 
+static inline bool nbd_srep_type_is_error(int type)
+{
+    return type & (1 << 15);
+}
+
 /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
  * but only a limited set of errno values is specified in the protocol.
  * Everything else is squashed to EINVAL.
@@ -241,6 +246,7 @@ static inline int nbd_errno_to_system_errno(int err)
 struct NBDExportInfo {
     /* Set by client before nbd_receive_negotiate() */
     bool request_sizes;
+    bool structured_reply;
     /* Set by server results during nbd_receive_negotiate() */
     uint64_t size;
     uint16_t flags;
diff --git a/block/nbd-client.c b/block/nbd-client.c
index 58493b7ac4..4d08cf3fd3 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -29,6 +29,7 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qemu/error-report.h"
 #include "nbd-client.h"
 
 #define HANDLE_TO_INDEX(bs, handle) ((handle) ^ (uint64_t)(intptr_t)(bs))
@@ -93,7 +94,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
         if (i >= MAX_NBD_REQUESTS ||
             !s->requests[i].coroutine ||
             !s->requests[i].receiving ||
-            nbd_reply_is_structured(&s->reply))
+            (nbd_reply_is_structured(&s->reply) && !s->info.structured_reply))
         {
             break;
         }
@@ -181,75 +182,406 @@ err:
     return rc;
 }
 
-static int nbd_co_receive_reply(NBDClientSession *s,
-                                uint64_t handle,
-                                QEMUIOVector *qiov)
+static inline void payload_advance16(uint8_t **payload, uint16_t **ptr)
+{
+    *ptr = (uint16_t *)*payload;
+    be16_to_cpus(*ptr);
+    *payload += sizeof(**ptr);
+}
+
+static inline void payload_advance32(uint8_t **payload, uint32_t **ptr)
+{
+    *ptr = (uint32_t *)*payload;
+    be32_to_cpus(*ptr);
+    *payload += sizeof(**ptr);
+}
+
+static inline void payload_advance64(uint8_t **payload, uint64_t **ptr)
+{
+    *ptr = (uint64_t *)*payload;
+    be64_to_cpus(*ptr);
+    *payload += sizeof(**ptr);
+}
+
+static int nbd_parse_offset_hole_payload(NBDStructuredReplyChunk *chunk,
+                                         uint8_t *payload, QEMUIOVector *qiov)
+{
+    uint64_t *offset;
+    uint32_t *hole_size;
+
+    if (chunk->length != sizeof(*offset) + sizeof(*hole_size)) {
+        return -EINVAL;
+    }
+
+    payload_advance64(&payload, &offset);
+    payload_advance32(&payload, &hole_size);
+
+    if (*offset + *hole_size > qiov->size) {
+        return -EINVAL;
+    }
+
+    qemu_iovec_memset(qiov, *offset, 0, *hole_size);
+
+    return 0;
+}
+
+static int nbd_parse_error_payload(NBDStructuredReplyChunk *chunk,
+                                   uint8_t *payload, int *request_ret)
+{
+    uint32_t *error;
+    uint16_t *message_size;
+
+    assert(chunk->type & (1 << 15));
+
+    if (chunk->length < sizeof(error) + sizeof(message_size)) {
+        return -EINVAL;
+    }
+
+    payload_advance32(&payload, &error);
+    payload_advance16(&payload, &message_size);
+
+    error_report("%.*s", *message_size, payload);
+
+    /* TODO add special case for ERROR_OFFSET */
+
+    *request_ret = nbd_errno_to_system_errno(*error);
+
+    return 0;
+}
+
+static int nbd_co_receive_offset_data_payload(NBDClientSession *s,
+                                              QEMUIOVector *qiov)
+{
+    QEMUIOVector sub_qiov;
+    uint64_t offset;
+    size_t data_size;
+    int ret;
+    NBDStructuredReplyChunk *chunk = &s->reply.structured;
+
+    assert(nbd_reply_is_structured(&s->reply));
+
+    if (chunk->length < sizeof(offset)) {
+        return -EINVAL;
+    }
+
+    if (nbd_read(s->ioc, &offset, sizeof(offset), NULL) < 0) {
+        return -EIO;
+    }
+    be64_to_cpus(&offset);
+
+    data_size = chunk->length - sizeof(offset);
+    if (offset + data_size > qiov->size) {
+        return -EINVAL;
+    }
+
+    qemu_iovec_init(&sub_qiov, qiov->niov);
+    qemu_iovec_concat(&sub_qiov, qiov, offset, data_size);
+    ret = qio_channel_readv_all(s->ioc, sub_qiov.iov, sub_qiov.niov, NULL);
+    qemu_iovec_destroy(&sub_qiov);
+
+    return ret < 0 ? -EIO : 0;
+}
+
+#define NBD_MAX_MALLOC_PAYLOAD 1000
+static int nbd_co_receive_structured_payload(NBDClientSession *s,
+                                             void **payload)
+{
+    int ret;
+    uint32_t len;
+
+    assert(nbd_reply_is_structured(&s->reply));
+
+    len = s->reply.structured.length;
+
+    if (len == 0) {
+        return 0;
+    }
+
+    if (payload == NULL) {
+        return -EINVAL;
+    }
+
+    if (len > NBD_MAX_MALLOC_PAYLOAD) {
+        return -EINVAL;
+    }
+
+    *payload = qemu_memalign(8, len);
+    ret = nbd_read(s->ioc, *payload, len, NULL);
+    if (ret < 0) {
+        qemu_vfree(*payload);
+        *payload = NULL;
+        return ret;
+    }
+
+    return 0;
+}
+
+/* nbd_co_do_receive_one_chunk
+ * for simple reply:
+ *   set request_ret to received reply error
+ *   if qiov is not NULL: read payload to @qiov
+ * for structured reply chunk:
+ *   if error chunk: read payload, set @request_ret, do not set @payload
+ *   else if offset_data chunk: read payload data to @qiov, do not set @payload
+ *   else: read payload to @payload
+ */
+static int nbd_co_do_receive_one_chunk(NBDClientSession *s, uint64_t handle,
+                                       bool only_structured, int *request_ret,
+                                       QEMUIOVector *qiov, void **payload)
 {
     int ret;
     int i = HANDLE_TO_INDEX(s, handle);
+    void *local_payload = NULL;
+
+    if (payload) {
+        *payload = NULL;
+    }
+    *request_ret = 0;
 
     /* Wait until we're woken up by nbd_read_reply_entry.  */
     s->requests[i].receiving = true;
     qemu_coroutine_yield();
     s->requests[i].receiving = false;
     if (!s->ioc || s->quit) {
-        ret = -EIO;
-    } else {
-        assert(s->reply.handle == handle);
-        ret = -nbd_errno_to_system_errno(s->reply.simple.error);
-        if (qiov && ret == 0) {
-            if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
-                                      NULL) < 0) {
-                ret = -EIO;
-                s->quit = true;
-            }
+        return -EIO;
+    }
+
+    assert(s->reply.handle == handle);
+
+    if (nbd_reply_is_simple(&s->reply)) {
+        if (only_structured) {
+            return -EINVAL;
         }
 
-        /* Tell the read handler to read another header.  */
-        s->reply.handle = 0;
+        *request_ret = -nbd_errno_to_system_errno(s->reply.simple.error);
+        if (*request_ret < 0 || !qiov) {
+            return 0;
+        }
+
+        return qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
+                                     NULL) < 0 ? -EIO : 0;
+    }
+
+    /* handle structured reply chunk */
+    assert(s->info.structured_reply);
+
+    if (s->reply.structured.type == NBD_SREP_TYPE_NONE) {
+        return 0;
+    }
+
+    if (s->reply.structured.type == NBD_SREP_TYPE_OFFSET_DATA) {
+        if (!qiov) {
+            return -EINVAL;
+        }
+
+        return nbd_co_receive_offset_data_payload(s, qiov);
+    }
+
+    if (nbd_srep_type_is_error(s->reply.structured.type)) {
+        payload = &local_payload;
+    }
+
+    ret = nbd_co_receive_structured_payload(s, payload);
+    if (ret < 0) {
+        return ret;
     }
 
-    s->requests[i].coroutine = NULL;
+    if (nbd_srep_type_is_error(s->reply.structured.type)) {
+        ret = nbd_parse_error_payload(&s->reply.structured, local_payload,
+                                      request_ret);
+        qemu_vfree(local_payload);
+        return ret;
+    }
+
+    return 0;
+}
+
+/* nbd_co_receive_one_chunk
+ * Read reply, wake up read_reply_co and set s->quit if needed.
+ * Return value is a fatal error code or normal nbd reply error code
+ */
+static int nbd_co_receive_one_chunk(NBDClientSession *s, uint64_t handle,
+                                    bool only_structured,
+                                    QEMUIOVector *qiov, NBDReply *reply,
+                                    void **payload)
+{
+    int request_ret;
+    int ret = nbd_co_do_receive_one_chunk(s, handle, only_structured,
+                                          &request_ret, qiov, payload);
+
+    if (ret < 0) {
+        s->quit = true;
+    } else {
+        /* For assert at loop start in nbd_read_reply_entry */
+        if (reply) {
+            *reply = s->reply;
+        }
+        s->reply.handle = 0;
+        ret = request_ret;
+    }
 
-    /* Kick the read_reply_co to get the next reply.  */
     if (s->read_reply_co) {
         aio_co_wake(s->read_reply_co);
     }
 
+    return ret;
+}
+
+typedef struct NBDReplyChunkIter {
+    int ret;
+    bool done, only_structured;
+} NBDReplyChunkIter;
+
+#define NBD_FOREACH_REPLY_CHUNK(s, iter, handle, structured, \
+                                qiov, reply, payload) \
+    for (iter = (NBDReplyChunkIter) { .only_structured = structured }; \
+         nbd_reply_chunk_iter_receive(s, &iter, handle, qiov, reply, payload);)
+
+static bool nbd_reply_chunk_iter_receive(NBDClientSession *s,
+                                         NBDReplyChunkIter *iter,
+                                         uint64_t handle,
+                                         QEMUIOVector *qiov, NBDReply *reply,
+                                         void **payload)
+{
+    int ret;
+    NBDReply local_reply;
+    NBDStructuredReplyChunk *chunk;
+    if (s->quit) {
+        if (iter->ret == 0) {
+            iter->ret = -EIO;
+        }
+        goto break_loop;
+    }
+
+    if (iter->done) {
+        /* Previous iteration was last. */
+        goto break_loop;
+    }
+
+    if (reply == NULL) {
+        reply = &local_reply;
+    }
+
+    ret = nbd_co_receive_one_chunk(s, handle, iter->only_structured,
+                                   qiov, reply, payload);
+    if (ret < 0 && iter->ret == 0) {
+        /* If it is a fatal error s->qiov is set by nbd_co_receive_one_chunk */
+        iter->ret = ret;
+    }
+
+    /* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */
+    if (nbd_reply_is_simple(&s->reply) || s->quit) {
+        goto break_loop;
+    }
+
+    chunk = &reply->structured;
+    iter->only_structured = true;
+
+    if (chunk->type == NBD_SREP_TYPE_NONE) {
+        if (!(chunk->flags & NBD_SREP_FLAG_DONE)) {
+            /* protocol error */
+            s->quit = true;
+            if (iter->ret == 0) {
+                iter->ret = -EIO;
+            }
+        }
+        goto break_loop;
+    }
+
+    if (chunk->flags & NBD_SREP_FLAG_DONE) {
+        /* This iteration is last. */
+        iter->done = true;
+    }
+
+    /* Execute the loop body */
+    return true;
+
+break_loop:
+    s->requests[HANDLE_TO_INDEX(s, handle)].coroutine = NULL;
+
     qemu_co_mutex_lock(&s->send_mutex);
     s->in_flight--;
     qemu_co_queue_next(&s->free_sema);
     qemu_co_mutex_unlock(&s->send_mutex);
 
-    return ret;
+    return false;
+}
+
+static int nbd_co_receive_return_code(NBDClientSession *s, uint64_t handle)
+{
+    NBDReplyChunkIter iter;
+
+    NBD_FOREACH_REPLY_CHUNK(s, iter, handle, false, NULL, NULL, NULL) {
+        /* nbd_reply_chunk_iter_receive does all the work */
+        ;
+    }
+
+    return iter.ret;
+}
+
+static int nbd_co_receive_cmdread_reply(NBDClientSession *s, uint64_t handle,
+                                        QEMUIOVector *qiov)
+{
+    NBDReplyChunkIter iter;
+    NBDReply reply;
+    void *payload = NULL;
+
+    NBD_FOREACH_REPLY_CHUNK(s, iter, handle, s->info.structured_reply,
+                            qiov, &reply, &payload)
+    {
+        int ret;
+
+        switch (reply.structured.type) {
+        case NBD_SREP_TYPE_OFFSET_DATA:
+            /* special cased in nbd_co_receive_one_chunk, data is already
+             * in qiov */
+            break;
+        case NBD_SREP_TYPE_OFFSET_HOLE:
+            ret = nbd_parse_offset_hole_payload(&reply.structured, payload,
+                                                qiov);
+            if (ret < 0) {
+                s->quit = true;
+            }
+            break;
+        default:
+            /* not allowed reply type */
+            s->quit = true;
+        }
+
+        qemu_vfree(payload);
+        payload = NULL;
+    }
+
+    return iter.ret;
 }
 
 static int nbd_co_request(BlockDriverState *bs,
                           NBDRequest *request,
-                          QEMUIOVector *qiov)
+                          QEMUIOVector *write_qiov)
 {
     NBDClientSession *client = nbd_get_client_session(bs);
     int ret;
 
-    if (qiov) {
-        assert(request->type == NBD_CMD_WRITE || request->type == NBD_CMD_READ);
-        assert(request->len == iov_size(qiov->iov, qiov->niov));
+    assert(request->type != NBD_CMD_READ);
+    if (write_qiov) {
+        assert(request->type == NBD_CMD_WRITE);
+        assert(request->len == iov_size(write_qiov->iov, write_qiov->niov));
     } else {
-        assert(request->type != NBD_CMD_WRITE && request->type != NBD_CMD_READ);
+        assert(request->type != NBD_CMD_WRITE);
     }
-    ret = nbd_co_send_request(bs, request,
-                              request->type == NBD_CMD_WRITE ? qiov : NULL);
+    ret = nbd_co_send_request(bs, request, write_qiov);
     if (ret < 0) {
         return ret;
     }
 
-    return nbd_co_receive_reply(client, request->handle,
-                                request->type == NBD_CMD_READ ? qiov : NULL);
+    return nbd_co_receive_return_code(client, request->handle);
 }
 
 int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
                          uint64_t bytes, QEMUIOVector *qiov, int flags)
 {
+    int ret;
+    NBDClientSession *client = nbd_get_client_session(bs);
     NBDRequest request = {
         .type = NBD_CMD_READ,
         .from = offset,
@@ -259,7 +591,12 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
     assert(bytes <= NBD_MAX_BUFFER_SIZE);
     assert(!flags);
 
-    return nbd_co_request(bs, &request, qiov);
+    ret = nbd_co_send_request(bs, &request, NULL);
+    if (ret < 0) {
+        return ret;
+    }
+
+    return nbd_co_receive_cmdread_reply(client, request.handle, qiov);
 }
 
 int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
diff --git a/nbd/client.c b/nbd/client.c
index a38e1a7d8e..2f256ee771 100644
--- a/nbd/client.c
+++ b/nbd/client.c
@@ -687,6 +687,13 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
         if (fixedNewStyle) {
             int result;
 
+            result = nbd_request_simple_option(ioc, NBD_OPT_STRUCTURED_REPLY,
+                                               errp);
+            if (result < 0) {
+                goto fail;
+            }
+            info->structured_reply = result == 1;
+
             /* Try NBD_OPT_GO first - if it works, we are done (it
              * also gives us a good message if the server requires
              * TLS).  If it is not available, fall back to
-- 
2.11.1


Re: [Qemu-devel] [PATCH v3 13/13] nbd: Minimal structured read for client
Posted by Eric Blake, 8 weeks ago
On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
> Minimal implementation: for structured error only error_report error
> message.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> ---
>  include/block/nbd.h |   6 +
>  block/nbd-client.c  | 395 ++++++++++++++++++++++++++++++++++++++++++++++++----
>  nbd/client.c        |   7 +
>  3 files changed, 379 insertions(+), 29 deletions(-)
> 

The fun stuff!

> diff --git a/include/block/nbd.h b/include/block/nbd.h
> index 1ef8c8897f..e3350b67a4 100644
> --- a/include/block/nbd.h
> +++ b/include/block/nbd.h
> @@ -203,6 +203,11 @@ enum {
>  #define NBD_SREP_TYPE_ERROR         NBD_SREP_ERR(1)
>  #define NBD_SREP_TYPE_ERROR_OFFSET  NBD_SREP_ERR(2)
>  
> +static inline bool nbd_srep_type_is_error(int type)
> +{
> +    return type & (1 << 15);
> +}

Knock-on effects to your earlier reply that s/srep/reply/ was okay.
Again, I'm not a fan of inline functions until after all the structs and
constants have been declared, so I'd sink this a bit lower.

> +
>  /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
>   * but only a limited set of errno values is specified in the protocol.
>   * Everything else is squashed to EINVAL.
> @@ -241,6 +246,7 @@ static inline int nbd_errno_to_system_errno(int err)
>  struct NBDExportInfo {
>      /* Set by client before nbd_receive_negotiate() */
>      bool request_sizes;
> +    bool structured_reply;
>      /* Set by server results during nbd_receive_negotiate() */

You are correct that the client has to set this before negotiate (if we
are in qemu-nbd and about to hand over to the kernel, we must NOT
request structured_reply unless the kernel has been patched to
understand structured replies - but upstream NBD isn't there yet; but if
we are using block/nbd-client.c, then we can request it). But we must
also check this AFTER negotiate, to see if the server understood our
request (how we handle reads depends on what mode the server is in).  So
maybe this needs another comment.

> +++ b/block/nbd-client.c
> @@ -29,6 +29,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qapi/error.h"
> +#include "qemu/error-report.h"

What errors are we reporting directly, instead of feeding back through
errp?  Without seeing the rest of the patch yet, I'm suspicious of this
include.

>  #include "nbd-client.h"
>  
>  #define HANDLE_TO_INDEX(bs, handle) ((handle) ^ (uint64_t)(intptr_t)(bs))
> @@ -93,7 +94,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>          if (i >= MAX_NBD_REQUESTS ||
>              !s->requests[i].coroutine ||
>              !s->requests[i].receiving ||
> -            nbd_reply_is_structured(&s->reply))
> +            (nbd_reply_is_structured(&s->reply) && !s->info.structured_reply))
>          {

Do we set a good error message when the server sends us garbage? [1]

>              break;
>          }
> @@ -181,75 +182,406 @@ err:
>      return rc;
>  }
>  
> -static int nbd_co_receive_reply(NBDClientSession *s,
> -                                uint64_t handle,
> -                                QEMUIOVector *qiov)
> +static inline void payload_advance16(uint8_t **payload, uint16_t **ptr)
> +{
> +    *ptr = (uint16_t *)*payload;
> +    be16_to_cpus(*ptr);

Won't work.  This is a potentially unaligned cast, where you don't have
the benefit of a packed struct, and the compiler will probably gripe at
you on platforms stricter than x86.  Instead, if I remember correctly,
we should use
 *ptr = ldw_be_p(*ptr);
Ditto to your other sizes.

Why not a helper for an 8-bit read?

> +static int nbd_parse_offset_hole_payload(NBDStructuredReplyChunk *chunk,
> +                                         uint8_t *payload, QEMUIOVector *qiov)
> +{
> +    uint64_t *offset;
> +    uint32_t *hole_size;
> +
> +    if (chunk->length != sizeof(*offset) + sizeof(*hole_size)) {
> +        return -EINVAL;

Should we plumb in errp, and return a decent error message that way
rather than relying on a mere -EINVAL which forces us to drop connection
with the server and treat all remaining outstanding requests as EIO?

Thinking a bit more: If we get here, the server sent us the wrong number
of bytes - but we know from chunk->length how much data the server
claims to have coming, even if we don't know what to do with that data.
And we've already read the payload into malloc'd memory, so we are in
sync to read more replies from the server.  So we don't have to hang up,
but it's probably safer to hang up than to misinterpret what the server
sent and risking misbehavior down the road.

> +    }
> +
> +    payload_advance64(&payload, &offset);
> +    payload_advance32(&payload, &hole_size);
> +
> +    if (*offset + *hole_size > qiov->size) {
> +        return -EINVAL;
> +    }

Whereas here, the server sent us the right number of bytes, but with
invalid semantics (a weaker class of error than above).  Still, once we
know the server is violating protocol, we're probably wiser to hang up
than persisting on keeping the connection with the server.

We aren't checking for overlap with any previously-received chunk, or
with any previously-received error-with-offset.  I'm okay with that, as
it really is easier to implement on the client side (just because the
spec says the server is buggy for doing that does not mean we have to
spend resources in the client validating if the server is buggy).

> +
> +    qemu_iovec_memset(qiov, *offset, 0, *hole_size);
> +
> +    return 0;
> +}
> +
> +static int nbd_parse_error_payload(NBDStructuredReplyChunk *chunk,
> +                                   uint8_t *payload, int *request_ret)
> +{
> +    uint32_t *error;
> +    uint16_t *message_size;
> +
> +    assert(chunk->type & (1 << 15));
> +
> +    if (chunk->length < sizeof(error) + sizeof(message_size)) {
> +        return -EINVAL;
> +    }

Again, should we plumb in the use of errp, rather than just
disconnecting from the server?

> +
> +    payload_advance32(&payload, &error);
> +    payload_advance16(&payload, &message_size);
> +
> +    error_report("%.*s", *message_size, payload);

I think this one is wrong - we definitely want to log the error message
that we got (or at least trace it), but since the chunk is in reply to a
pending request, we should be able to feed the error message to errp of
the request, rather than reporting it here and losing it.

Also, if *message_size is 0, error_report("") isn't very useful (and
right now, the simple server implementation doesn't send a message).

> +
> +    /* TODO add special case for ERROR_OFFSET */
> +
> +    *request_ret = nbd_errno_to_system_errno(*error);

We should validate that the server didn't send us 0 for success.

> +
> +    return 0;

So if we get here, we know we have an error, but we did the minimal
handling of it (regardless of what chunk type it has), and can resume
communication with the server.  This matches the NBD spec.

> +}
> +
> +static int nbd_co_receive_offset_data_payload(NBDClientSession *s,
> +                                              QEMUIOVector *qiov)
> +{
> +    QEMUIOVector sub_qiov;
> +    uint64_t offset;
> +    size_t data_size;
> +    int ret;
> +    NBDStructuredReplyChunk *chunk = &s->reply.structured;
> +
> +    assert(nbd_reply_is_structured(&s->reply));
> +
> +    if (chunk->length < sizeof(offset)) {
> +        return -EINVAL;
> +    }
> +
> +    if (nbd_read(s->ioc, &offset, sizeof(offset), NULL) < 0) {
> +        return -EIO;

errp plumbing is missing (instead of ignoring the nbd_read() error and
relying just on our own -EIO, we should also try to preserve the
original error message).

> +    }
> +    be64_to_cpus(&offset);
> +
> +    data_size = chunk->length - sizeof(offset);
> +    if (offset + data_size > qiov->size) {
> +        return -EINVAL;
> +    }
> +
> +    qemu_iovec_init(&sub_qiov, qiov->niov);
> +    qemu_iovec_concat(&sub_qiov, qiov, offset, data_size);
> +    ret = qio_channel_readv_all(s->ioc, sub_qiov.iov, sub_qiov.niov, NULL);

errp plumbing again.

> +    qemu_iovec_destroy(&sub_qiov);
> +
> +    return ret < 0 ? -EIO : 0;
> +}
> +
> +#define NBD_MAX_MALLOC_PAYLOAD 1000

Feels rather arbitrary, and potentially somewhat small.  What is the
justification for this number, as opposed to reusing NBD_MAX_BUFFER_SIZE
(32M) or some other value?

Should this limit be in a .h file, next to NBD_MAX_BUFFER_SIZE?

> +static int nbd_co_receive_structured_payload(NBDClientSession *s,
> +                                             void **payload)

Documentation should mention that the result requires qemu_vfree()
instead of the more usual g_free()

> +{
> +    int ret;
> +    uint32_t len;
> +
> +    assert(nbd_reply_is_structured(&s->reply));
> +
> +    len = s->reply.structured.length;
> +
> +    if (len == 0) {
> +        return 0;
> +    }
> +
> +    if (payload == NULL) {
> +        return -EINVAL;
> +    }

Again, missing a useful error message (the server sent us payload that
we aren't expecting) for reporting back through errp.

> +
> +    if (len > NBD_MAX_MALLOC_PAYLOAD) {
> +        return -EINVAL;
> +    }
> +
> +    *payload = qemu_memalign(8, len);
> +    ret = nbd_read(s->ioc, *payload, len, NULL);

errp plumbing

> +    if (ret < 0) {
> +        qemu_vfree(*payload);
> +        *payload = NULL;
> +        return ret;
> +    }
> +
> +    return 0;
> +}
> +
> +/* nbd_co_do_receive_one_chunk
> + * for simple reply:
> + *   set request_ret to received reply error
> + *   if qiov is not NULL: read payload to @qiov
> + * for structured reply chunk:
> + *   if error chunk: read payload, set @request_ret, do not set @payload
> + *   else if offset_data chunk: read payload data to @qiov, do not set @payload
> + *   else: read payload to @payload

Mention that payload must be freed with qemu_vfree()

> + */
> +static int nbd_co_do_receive_one_chunk(NBDClientSession *s, uint64_t handle,
> +                                       bool only_structured, int *request_ret,
> +                                       QEMUIOVector *qiov, void **payload)
>  {
>      int ret;
>      int i = HANDLE_TO_INDEX(s, handle);
> +    void *local_payload = NULL;
> +
> +    if (payload) {
> +        *payload = NULL;
> +    }
> +    *request_ret = 0;
>  
>      /* Wait until we're woken up by nbd_read_reply_entry.  */
>      s->requests[i].receiving = true;
>      qemu_coroutine_yield();
>      s->requests[i].receiving = false;
>      if (!s->ioc || s->quit) {
> -        ret = -EIO;
> -    } else {
> -        assert(s->reply.handle == handle);
> -        ret = -nbd_errno_to_system_errno(s->reply.simple.error);
> -        if (qiov && ret == 0) {
> -            if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
> -                                      NULL) < 0) {

Okay, I'll admit that we are already lacking errp plumbing, so adding it
in this patch is not fair (if we add it, it can be a separate patch).

> -                ret = -EIO;
> -                s->quit = true;
> -            }
> +        return -EIO;
> +    }
> +
> +    assert(s->reply.handle == handle);
> +
> +    if (nbd_reply_is_simple(&s->reply)) {
> +        if (only_structured) {
> +            return -EINVAL;
>          }

[1] Earlier, you checked that the server isn't sending structured
replies where we expect simple; and here you're checking that the server
isn't sending simple replies where we expect structured.  Good, we've
covered both mismatches, and I can see why you have it in separate
locations.

>  
> -        /* Tell the read handler to read another header.  */
> -        s->reply.handle = 0;
> +        *request_ret = -nbd_errno_to_system_errno(s->reply.simple.error);
> +        if (*request_ret < 0 || !qiov) {
> +            return 0;
> +        }
> +
> +        return qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
> +                                     NULL) < 0 ? -EIO : 0;
> +    }
> +
> +    /* handle structured reply chunk */
> +    assert(s->info.structured_reply);
> +
> +    if (s->reply.structured.type == NBD_SREP_TYPE_NONE) {
> +        return 0;

We should also check that the server properly set NBD_REPLY_FLAG_DONE
(if we got this type and the flag wasn't set, the server is sending
garbage, and we should request disconnect soon). [2]

> +    }
> +
> +    if (s->reply.structured.type == NBD_SREP_TYPE_OFFSET_DATA) {
> +        if (!qiov) {
> +            return -EINVAL;
> +        }
> +
> +        return nbd_co_receive_offset_data_payload(s, qiov);
> +    }
> +
> +    if (nbd_srep_type_is_error(s->reply.structured.type)) {
> +        payload = &local_payload;
> +    }
> +
> +    ret = nbd_co_receive_structured_payload(s, payload);
> +    if (ret < 0) {
> +        return ret;
>      }
>  
> -    s->requests[i].coroutine = NULL;
> +    if (nbd_srep_type_is_error(s->reply.structured.type)) {
> +        ret = nbd_parse_error_payload(&s->reply.structured, local_payload,
> +                                      request_ret);
> +        qemu_vfree(local_payload);
> +        return ret;
> +    }
> +
> +    return 0;
> +}
> +
> +/* nbd_co_receive_one_chunk
> + * Read reply, wake up read_reply_co and set s->quit if needed.
> + * Return value is a fatal error code or normal nbd reply error code
> + */
> +static int nbd_co_receive_one_chunk(NBDClientSession *s, uint64_t handle,

Is this function called in coroutine context?  Presumably yes, because
of the _co_ infix; but then it should also have the coroutine_fn marker
in the declaration.

> +                                    bool only_structured,
> +                                    QEMUIOVector *qiov, NBDReply *reply,
> +                                    void **payload)
> +{
> +    int request_ret;
> +    int ret = nbd_co_do_receive_one_chunk(s, handle, only_structured,
> +                                          &request_ret, qiov, payload);
> +
> +    if (ret < 0) {
> +        s->quit = true;
> +    } else {
> +        /* For assert at loop start in nbd_read_reply_entry */
> +        if (reply) {
> +            *reply = s->reply;
> +        }
> +        s->reply.handle = 0;
> +        ret = request_ret;
> +    }
>  
> -    /* Kick the read_reply_co to get the next reply.  */
>      if (s->read_reply_co) {
>          aio_co_wake(s->read_reply_co);
>      }
>  
> +    return ret;
> +}
> +
> +typedef struct NBDReplyChunkIter {
> +    int ret;
> +    bool done, only_structured;
> +} NBDReplyChunkIter;
> +
> +#define NBD_FOREACH_REPLY_CHUNK(s, iter, handle, structured, \
> +                                qiov, reply, payload) \
> +    for (iter = (NBDReplyChunkIter) { .only_structured = structured }; \
> +         nbd_reply_chunk_iter_receive(s, &iter, handle, qiov, reply, payload);)
> +
> +static bool nbd_reply_chunk_iter_receive(NBDClientSession *s,
> +                                         NBDReplyChunkIter *iter,
> +                                         uint64_t handle,
> +                                         QEMUIOVector *qiov, NBDReply *reply,
> +                                         void **payload)
> +{
> +    int ret;
> +    NBDReply local_reply;
> +    NBDStructuredReplyChunk *chunk;
> +    if (s->quit) {
> +        if (iter->ret == 0) {
> +            iter->ret = -EIO;
> +        }
> +        goto break_loop;
> +    }
> +
> +    if (iter->done) {
> +        /* Previous iteration was last. */
> +        goto break_loop;
> +    }
> +
> +    if (reply == NULL) {
> +        reply = &local_reply;
> +    }
> +
> +    ret = nbd_co_receive_one_chunk(s, handle, iter->only_structured,
> +                                   qiov, reply, payload);
> +    if (ret < 0 && iter->ret == 0) {
> +        /* If it is a fatal error s->qiov is set by nbd_co_receive_one_chunk */

did you mean s->quit here?

> +        iter->ret = ret;
> +    }
> +
> +    /* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */
> +    if (nbd_reply_is_simple(&s->reply) || s->quit) {
> +        goto break_loop;
> +    }
> +
> +    chunk = &reply->structured;
> +    iter->only_structured = true;

Interesting observation - the NBD spec lets us return a structured error
even to a non-read command.  Only NBD_CMD_READ requires a structured
reply when we haven't yet received any reply; but you are correct that
once a given handle sees one structured chunk without a done bit, then
all future replies for that handle must also be structured.

> +
> +    if (chunk->type == NBD_SREP_TYPE_NONE) {
> +        if (!(chunk->flags & NBD_SREP_FLAG_DONE)) {
> +            /* protocol error */
> +            s->quit = true;
> +            if (iter->ret == 0) {
> +                iter->ret = -EIO;
> +            }

[2] Ah, I see you did it here instead!

> +        }
> +        goto break_loop;
> +    }
> +
> +    if (chunk->flags & NBD_SREP_FLAG_DONE) {
> +        /* This iteration is last. */
> +        iter->done = true;
> +    }
> +
> +    /* Execute the loop body */
> +    return true;
> +
> +break_loop:
> +    s->requests[HANDLE_TO_INDEX(s, handle)].coroutine = NULL;
> +
>      qemu_co_mutex_lock(&s->send_mutex);
>      s->in_flight--;
>      qemu_co_queue_next(&s->free_sema);
>      qemu_co_mutex_unlock(&s->send_mutex);
>  
> -    return ret;
> +    return false;
> +}
> +
> +static int nbd_co_receive_return_code(NBDClientSession *s, uint64_t handle)
> +{
> +    NBDReplyChunkIter iter;
> +
> +    NBD_FOREACH_REPLY_CHUNK(s, iter, handle, false, NULL, NULL, NULL) {
> +        /* nbd_reply_chunk_iter_receive does all the work */
> +        ;
> +    }
> +
> +    return iter.ret;
> +}
> +
> +static int nbd_co_receive_cmdread_reply(NBDClientSession *s, uint64_t handle,
> +                                        QEMUIOVector *qiov)
> +{
> +    NBDReplyChunkIter iter;
> +    NBDReply reply;
> +    void *payload = NULL;
> +
> +    NBD_FOREACH_REPLY_CHUNK(s, iter, handle, s->info.structured_reply,
> +                            qiov, &reply, &payload)
> +    {
> +        int ret;
> +
> +        switch (reply.structured.type) {
> +        case NBD_SREP_TYPE_OFFSET_DATA:
> +            /* special cased in nbd_co_receive_one_chunk, data is already
> +             * in qiov */
> +            break;
> +        case NBD_SREP_TYPE_OFFSET_HOLE:
> +            ret = nbd_parse_offset_hole_payload(&reply.structured, payload,
> +                                                qiov);
> +            if (ret < 0) {
> +                s->quit = true;
> +            }
> +            break;
> +        default:
> +            /* not allowed reply type */

Slightly misleading; may want to also point out that error chunks were
already captured during the foreach loop.

> +            s->quit = true;
> +        }
> +
> +        qemu_vfree(payload);
> +        payload = NULL;
> +    }
> +
> +    return iter.ret;
>  }
>  
>  static int nbd_co_request(BlockDriverState *bs,
>                            NBDRequest *request,
> -                          QEMUIOVector *qiov)
> +                          QEMUIOVector *write_qiov)

The rename is somewhat cosmetic, but I think it reads well.

>  {
>      NBDClientSession *client = nbd_get_client_session(bs);
>      int ret;
>  
> -    if (qiov) {
> -        assert(request->type == NBD_CMD_WRITE || request->type == NBD_CMD_READ);
> -        assert(request->len == iov_size(qiov->iov, qiov->niov));
> +    assert(request->type != NBD_CMD_READ);
> +    if (write_qiov) {
> +        assert(request->type == NBD_CMD_WRITE);
> +        assert(request->len == iov_size(write_qiov->iov, write_qiov->niov));
>      } else {
> -        assert(request->type != NBD_CMD_WRITE && request->type != NBD_CMD_READ);
> +        assert(request->type != NBD_CMD_WRITE);
>      }
> -    ret = nbd_co_send_request(bs, request,
> -                              request->type == NBD_CMD_WRITE ? qiov : NULL);
> +    ret = nbd_co_send_request(bs, request, write_qiov);
>      if (ret < 0) {
>          return ret;
>      }
>  
> -    return nbd_co_receive_reply(client, request->handle,
> -                                request->type == NBD_CMD_READ ? qiov : NULL);
> +    return nbd_co_receive_return_code(client, request->handle);
>  }

So basically read was special enough that it no longer shares the common
code at this level.  Still, I like how you've divided things among the
various functions.

>  
>  int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
>                           uint64_t bytes, QEMUIOVector *qiov, int flags)
>  {
> +    int ret;
> +    NBDClientSession *client = nbd_get_client_session(bs);
>      NBDRequest request = {
>          .type = NBD_CMD_READ,
>          .from = offset,
> @@ -259,7 +591,12 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
>      assert(bytes <= NBD_MAX_BUFFER_SIZE);
>      assert(!flags);
>  
> -    return nbd_co_request(bs, &request, qiov);
> +    ret = nbd_co_send_request(bs, &request, NULL);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    return nbd_co_receive_cmdread_reply(client, request.handle, qiov);
>  }
>  

And of course, your next goal is adding a block_status function that
will also special-case chunked replies - but definitely a later patch ;)

>  int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
> diff --git a/nbd/client.c b/nbd/client.c
> index a38e1a7d8e..2f256ee771 100644
> --- a/nbd/client.c
> +++ b/nbd/client.c
> @@ -687,6 +687,13 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
>          if (fixedNewStyle) {
>              int result;
>  
> +            result = nbd_request_simple_option(ioc, NBD_OPT_STRUCTURED_REPLY,
> +                                               errp);

Bug - we must NOT request this option unless we know we can handle the
server saying yes.  When nbd-client is handling the connection, we're
fine; but when qemu-nbd is handling the connection by using ioctls to
hand off to the kernel, we MUST query the kernel to see if it supports
structured replies (well, for now, we can get by with saying that we
KNOW the kernel code has not been written yet, and therefore our query
is a constant false, but someday we'll have a future patch in that area).

> +            if (result < 0) {
> +                goto fail;
> +            }
> +            info->structured_reply = result == 1;
> +
>              /* Try NBD_OPT_GO first - if it works, we are done (it
>               * also gives us a good message if the server requires
>               * TLS).  If it is not available, fall back to
> 

A lot to digest in this message. While I was comfortable tweaking
previous patches, and/or inserting some of my own for a v4, I think this
one is complex enough that I'd prefer it if I send 9-12 + my own
followup patches as my v4, then you rebase this one on top of that.  But
I also think you have a very promising patch here; you got a lot of
things right (even if it didn't have much in the way of comments), and
the design is looking reasonable.  Perhaps it is also worth seeing if
Paolo has review comments on this one.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v3 13/13] nbd: Minimal structured read for client
Posted by Vladimir Sementsov-Ogievskiy, 8 weeks ago
13.10.2017 23:51, Eric Blake wrote:
> On 10/12/2017 04:53 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Minimal implementation: for structured error only error_report error
>> message.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
>>   include/block/nbd.h |   6 +
>>   block/nbd-client.c  | 395 ++++++++++++++++++++++++++++++++++++++++++++++++----
>>   nbd/client.c        |   7 +
>>   3 files changed, 379 insertions(+), 29 deletions(-)
>>
> The fun stuff!
>
>> diff --git a/include/block/nbd.h b/include/block/nbd.h
>> index 1ef8c8897f..e3350b67a4 100644
>> --- a/include/block/nbd.h
>> +++ b/include/block/nbd.h
>> @@ -203,6 +203,11 @@ enum {
>>   #define NBD_SREP_TYPE_ERROR         NBD_SREP_ERR(1)
>>   #define NBD_SREP_TYPE_ERROR_OFFSET  NBD_SREP_ERR(2)
>>   
>> +static inline bool nbd_srep_type_is_error(int type)
>> +{
>> +    return type & (1 << 15);
>> +}
> Knock-on effects to your earlier reply that s/srep/reply/ was okay.
> Again, I'm not a fan of inline functions until after all the structs and
> constants have been declared, so I'd sink this a bit lower.

why? I put it here to be close to errors, and to NBD_REPLY_ERR which 
uses the same semantics..

>
>> +
>>   /* NBD errors are based on errno numbers, so there is a 1:1 mapping,
>>    * but only a limited set of errno values is specified in the protocol.
>>    * Everything else is squashed to EINVAL.
>> @@ -241,6 +246,7 @@ static inline int nbd_errno_to_system_errno(int err)
>>   struct NBDExportInfo {
>>       /* Set by client before nbd_receive_negotiate() */
>>       bool request_sizes;
>> +    bool structured_reply;
>>       /* Set by server results during nbd_receive_negotiate() */
> You are correct that the client has to set this before negotiate (if we
> are in qemu-nbd and about to hand over to the kernel, we must NOT
> request structured_reply unless the kernel has been patched to
> understand structured replies - but upstream NBD isn't there yet; but if
> we are using block/nbd-client.c, then we can request it). But we must
> also check this AFTER negotiate, to see if the server understood our
> request (how we handle reads depends on what mode the server is in).  So
> maybe this needs another comment.

I just missed these comments. Accordingly to the patch, this field 
should be in the second section.
However, better is make it "in-out" as in your description.


>
>> +++ b/block/nbd-client.c
>> @@ -29,6 +29,7 @@
>>   
>>   #include "qemu/osdep.h"
>>   #include "qapi/error.h"
>> +#include "qemu/error-report.h"
> What errors are we reporting directly, instead of feeding back through
> errp?  Without seeing the rest of the patch yet, I'm suspicious of this
> include.

I've postponed errp handling.. We have no errp parameters in 
.bdrv_co_preadv and friends, so where to feed them back?

>
>>   #include "nbd-client.h"
>>   
>>   #define HANDLE_TO_INDEX(bs, handle) ((handle) ^ (uint64_t)(intptr_t)(bs))
>> @@ -93,7 +94,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>>           if (i >= MAX_NBD_REQUESTS ||
>>               !s->requests[i].coroutine ||
>>               !s->requests[i].receiving ||
>> -            nbd_reply_is_structured(&s->reply))
>> +            (nbd_reply_is_structured(&s->reply) && !s->info.structured_reply))
>>           {
> Do we set a good error message when the server sends us garbage? [1]

no, and did not do it before.

>
>>               break;
>>           }
>> @@ -181,75 +182,406 @@ err:
>>       return rc;
>>   }
>>   
>> -static int nbd_co_receive_reply(NBDClientSession *s,
>> -                                uint64_t handle,
>> -                                QEMUIOVector *qiov)
>> +static inline void payload_advance16(uint8_t **payload, uint16_t **ptr)
>> +{
>> +    *ptr = (uint16_t *)*payload;
>> +    be16_to_cpus(*ptr);
> Won't work.  This is a potentially unaligned cast, where you don't have

hmm, yes. payload is allocated by qemu_memalign, but it may be already 
"advanced".

> the benefit of a packed struct, and the compiler will probably gripe at
> you on platforms stricter than x86.  Instead, if I remember correctly,
> we should use
>   *ptr = ldw_be_p(*ptr);
> Ditto to your other sizes.
>
> Why not a helper for an 8-bit read?

as there is no usage for it

>
>> +static int nbd_parse_offset_hole_payload(NBDStructuredReplyChunk *chunk,
>> +                                         uint8_t *payload, QEMUIOVector *qiov)
>> +{
>> +    uint64_t *offset;
>> +    uint32_t *hole_size;
>> +
>> +    if (chunk->length != sizeof(*offset) + sizeof(*hole_size)) {
>> +        return -EINVAL;
> Should we plumb in errp, and return a decent error message that way
> rather than relying on a mere -EINVAL which forces us to drop connection
> with the server and treat all remaining outstanding requests as EIO?
>
> Thinking a bit more: If we get here, the server sent us the wrong number
> of bytes - but we know from chunk->length how much data the server
> claims to have coming, even if we don't know what to do with that data.
> And we've already read the payload into malloc'd memory, so we are in
> sync to read more replies from the server.  So we don't have to hang up,
> but it's probably safer to hang up than to misinterpret what the server
> sent and risking misbehavior down the road.

I'm for safer path.

>
>> +    }
>> +
>> +    payload_advance64(&payload, &offset);
>> +    payload_advance32(&payload, &hole_size);
>> +
>> +    if (*offset + *hole_size > qiov->size) {
>> +        return -EINVAL;
>> +    }
> Whereas here, the server sent us the right number of bytes, but with
> invalid semantics (a weaker class of error than above).  Still, once we
> know the server is violating protocol, we're probably wiser to hang up
> than persisting on keeping the connection with the server.
>
> We aren't checking for overlap with any previously-received chunk, or
> with any previously-received error-with-offset.  I'm okay with that, as
> it really is easier to implement on the client side (just because the
> spec says the server is buggy for doing that does not mean we have to
> spend resources in the client validating if the server is buggy).
>
>> +
>> +    qemu_iovec_memset(qiov, *offset, 0, *hole_size);
>> +
>> +    return 0;
>> +}
>> +
>> +static int nbd_parse_error_payload(NBDStructuredReplyChunk *chunk,
>> +                                   uint8_t *payload, int *request_ret)
>> +{
>> +    uint32_t *error;
>> +    uint16_t *message_size;
>> +
>> +    assert(chunk->type & (1 << 15));
>> +
>> +    if (chunk->length < sizeof(error) + sizeof(message_size)) {
>> +        return -EINVAL;
>> +    }
> Again, should we plumb in the use of errp, rather than just
> disconnecting from the server?
>
>> +
>> +    payload_advance32(&payload, &error);
>> +    payload_advance16(&payload, &message_size);
>> +
>> +    error_report("%.*s", *message_size, payload);
> I think this one is wrong - we definitely want to log the error message
> that we got (or at least trace it), but since the chunk is in reply to a
> pending request, we should be able to feed the error message to errp of
> the request, rather than reporting it here and losing it.
>
> Also, if *message_size is 0, error_report("") isn't very useful (and
> right now, the simple server implementation doesn't send a message).
>
>> +
>> +    /* TODO add special case for ERROR_OFFSET */
>> +
>> +    *request_ret = nbd_errno_to_system_errno(*error);
> We should validate that the server didn't send us 0 for success.
>
>> +
>> +    return 0;
> So if we get here, we know we have an error, but we did the minimal
> handling of it (regardless of what chunk type it has), and can resume
> communication with the server.  This matches the NBD spec.
>
>> +}
>> +
>> +static int nbd_co_receive_offset_data_payload(NBDClientSession *s,
>> +                                              QEMUIOVector *qiov)
>> +{
>> +    QEMUIOVector sub_qiov;
>> +    uint64_t offset;
>> +    size_t data_size;
>> +    int ret;
>> +    NBDStructuredReplyChunk *chunk = &s->reply.structured;
>> +
>> +    assert(nbd_reply_is_structured(&s->reply));
>> +
>> +    if (chunk->length < sizeof(offset)) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (nbd_read(s->ioc, &offset, sizeof(offset), NULL) < 0) {
>> +        return -EIO;
> errp plumbing is missing (instead of ignoring the nbd_read() error and
> relying just on our own -EIO, we should also try to preserve the
> original error message).
>
>> +    }
>> +    be64_to_cpus(&offset);
>> +
>> +    data_size = chunk->length - sizeof(offset);
>> +    if (offset + data_size > qiov->size) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    qemu_iovec_init(&sub_qiov, qiov->niov);
>> +    qemu_iovec_concat(&sub_qiov, qiov, offset, data_size);
>> +    ret = qio_channel_readv_all(s->ioc, sub_qiov.iov, sub_qiov.niov, NULL);
> errp plumbing again.
>
>> +    qemu_iovec_destroy(&sub_qiov);
>> +
>> +    return ret < 0 ? -EIO : 0;
>> +}
>> +
>> +#define NBD_MAX_MALLOC_PAYLOAD 1000
> Feels rather arbitrary, and potentially somewhat small.  What is the
> justification for this number, as opposed to reusing NBD_MAX_BUFFER_SIZE
> (32M) or some other value?

The aim is to make this limit not large, as commands which receive large 
buffers like CMD_READ should
not go to this function.

>
> Should this limit be in a .h file, next to NBD_MAX_BUFFER_SIZE?

it is used only in one function and is there any reason to move it to .h?
I don't see any other usage for this constant.

>
>> +static int nbd_co_receive_structured_payload(NBDClientSession *s,
>> +                                             void **payload)
> Documentation should mention that the result requires qemu_vfree()
> instead of the more usual g_free()
>
>> +{
>> +    int ret;
>> +    uint32_t len;
>> +
>> +    assert(nbd_reply_is_structured(&s->reply));
>> +
>> +    len = s->reply.structured.length;
>> +
>> +    if (len == 0) {
>> +        return 0;
>> +    }
>> +
>> +    if (payload == NULL) {
>> +        return -EINVAL;
>> +    }
> Again, missing a useful error message (the server sent us payload that
> we aren't expecting) for reporting back through errp.
>
>> +
>> +    if (len > NBD_MAX_MALLOC_PAYLOAD) {
>> +        return -EINVAL;
>> +    }
>> +
>> +    *payload = qemu_memalign(8, len);
>> +    ret = nbd_read(s->ioc, *payload, len, NULL);
> errp plumbing
>
>> +    if (ret < 0) {
>> +        qemu_vfree(*payload);
>> +        *payload = NULL;
>> +        return ret;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/* nbd_co_do_receive_one_chunk
>> + * for simple reply:
>> + *   set request_ret to received reply error
>> + *   if qiov is not NULL: read payload to @qiov
>> + * for structured reply chunk:
>> + *   if error chunk: read payload, set @request_ret, do not set @payload
>> + *   else if offset_data chunk: read payload data to @qiov, do not set @payload
>> + *   else: read payload to @payload
> Mention that payload must be freed with qemu_vfree()
>
>> + */
>> +static int nbd_co_do_receive_one_chunk(NBDClientSession *s, uint64_t handle,
>> +                                       bool only_structured, int *request_ret,
>> +                                       QEMUIOVector *qiov, void **payload)
>>   {
>>       int ret;
>>       int i = HANDLE_TO_INDEX(s, handle);
>> +    void *local_payload = NULL;
>> +
>> +    if (payload) {
>> +        *payload = NULL;
>> +    }
>> +    *request_ret = 0;
>>   
>>       /* Wait until we're woken up by nbd_read_reply_entry.  */
>>       s->requests[i].receiving = true;
>>       qemu_coroutine_yield();
>>       s->requests[i].receiving = false;
>>       if (!s->ioc || s->quit) {
>> -        ret = -EIO;
>> -    } else {
>> -        assert(s->reply.handle == handle);
>> -        ret = -nbd_errno_to_system_errno(s->reply.simple.error);
>> -        if (qiov && ret == 0) {
>> -            if (qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
>> -                                      NULL) < 0) {
> Okay, I'll admit that we are already lacking errp plumbing, so adding it
> in this patch is not fair (if we add it, it can be a separate patch).
>
>> -                ret = -EIO;
>> -                s->quit = true;
>> -            }
>> +        return -EIO;
>> +    }
>> +
>> +    assert(s->reply.handle == handle);
>> +
>> +    if (nbd_reply_is_simple(&s->reply)) {
>> +        if (only_structured) {
>> +            return -EINVAL;
>>           }
> [1] Earlier, you checked that the server isn't sending structured
> replies where we expect simple; and here you're checking that the server
> isn't sending simple replies where we expect structured.  Good, we've
> covered both mismatches, and I can see why you have it in separate
> locations.
>
>>   
>> -        /* Tell the read handler to read another header.  */
>> -        s->reply.handle = 0;
>> +        *request_ret = -nbd_errno_to_system_errno(s->reply.simple.error);
>> +        if (*request_ret < 0 || !qiov) {
>> +            return 0;
>> +        }
>> +
>> +        return qio_channel_readv_all(s->ioc, qiov->iov, qiov->niov,
>> +                                     NULL) < 0 ? -EIO : 0;
>> +    }
>> +
>> +    /* handle structured reply chunk */
>> +    assert(s->info.structured_reply);
>> +
>> +    if (s->reply.structured.type == NBD_SREP_TYPE_NONE) {
>> +        return 0;
> We should also check that the server properly set NBD_REPLY_FLAG_DONE
> (if we got this type and the flag wasn't set, the server is sending
> garbage, and we should request disconnect soon). [2]
>
>> +    }
>> +
>> +    if (s->reply.structured.type == NBD_SREP_TYPE_OFFSET_DATA) {
>> +        if (!qiov) {
>> +            return -EINVAL;
>> +        }
>> +
>> +        return nbd_co_receive_offset_data_payload(s, qiov);
>> +    }
>> +
>> +    if (nbd_srep_type_is_error(s->reply.structured.type)) {
>> +        payload = &local_payload;
>> +    }
>> +
>> +    ret = nbd_co_receive_structured_payload(s, payload);
>> +    if (ret < 0) {
>> +        return ret;
>>       }
>>   
>> -    s->requests[i].coroutine = NULL;
>> +    if (nbd_srep_type_is_error(s->reply.structured.type)) {
>> +        ret = nbd_parse_error_payload(&s->reply.structured, local_payload,
>> +                                      request_ret);
>> +        qemu_vfree(local_payload);
>> +        return ret;
>> +    }
>> +
>> +    return 0;
>> +}
>> +
>> +/* nbd_co_receive_one_chunk
>> + * Read reply, wake up read_reply_co and set s->quit if needed.
>> + * Return value is a fatal error code or normal nbd reply error code
>> + */
>> +static int nbd_co_receive_one_chunk(NBDClientSession *s, uint64_t handle,
> Is this function called in coroutine context?  Presumably yes, because
> of the _co_ infix; but then it should also have the coroutine_fn marker
> in the declaration.
>
>> +                                    bool only_structured,
>> +                                    QEMUIOVector *qiov, NBDReply *reply,
>> +                                    void **payload)
>> +{
>> +    int request_ret;
>> +    int ret = nbd_co_do_receive_one_chunk(s, handle, only_structured,
>> +                                          &request_ret, qiov, payload);
>> +
>> +    if (ret < 0) {
>> +        s->quit = true;
>> +    } else {
>> +        /* For assert at loop start in nbd_read_reply_entry */
>> +        if (reply) {
>> +            *reply = s->reply;
>> +        }
>> +        s->reply.handle = 0;
>> +        ret = request_ret;
>> +    }
>>   
>> -    /* Kick the read_reply_co to get the next reply.  */
>>       if (s->read_reply_co) {
>>           aio_co_wake(s->read_reply_co);
>>       }
>>   
>> +    return ret;
>> +}
>> +
>> +typedef struct NBDReplyChunkIter {
>> +    int ret;
>> +    bool done, only_structured;
>> +} NBDReplyChunkIter;
>> +
>> +#define NBD_FOREACH_REPLY_CHUNK(s, iter, handle, structured, \
>> +                                qiov, reply, payload) \
>> +    for (iter = (NBDReplyChunkIter) { .only_structured = structured }; \
>> +         nbd_reply_chunk_iter_receive(s, &iter, handle, qiov, reply, payload);)
>> +
>> +static bool nbd_reply_chunk_iter_receive(NBDClientSession *s,
>> +                                         NBDReplyChunkIter *iter,
>> +                                         uint64_t handle,
>> +                                         QEMUIOVector *qiov, NBDReply *reply,
>> +                                         void **payload)
>> +{
>> +    int ret;
>> +    NBDReply local_reply;
>> +    NBDStructuredReplyChunk *chunk;
>> +    if (s->quit) {
>> +        if (iter->ret == 0) {
>> +            iter->ret = -EIO;
>> +        }
>> +        goto break_loop;
>> +    }
>> +
>> +    if (iter->done) {
>> +        /* Previous iteration was last. */
>> +        goto break_loop;
>> +    }
>> +
>> +    if (reply == NULL) {
>> +        reply = &local_reply;
>> +    }
>> +
>> +    ret = nbd_co_receive_one_chunk(s, handle, iter->only_structured,
>> +                                   qiov, reply, payload);
>> +    if (ret < 0 && iter->ret == 0) {
>> +        /* If it is a fatal error s->qiov is set by nbd_co_receive_one_chunk */
> did you mean s->quit here?
>
>> +        iter->ret = ret;
>> +    }
>> +
>> +    /* Do not execute the body of NBD_FOREACH_REPLY_CHUNK for simple reply. */
>> +    if (nbd_reply_is_simple(&s->reply) || s->quit) {
>> +        goto break_loop;
>> +    }
>> +
>> +    chunk = &reply->structured;
>> +    iter->only_structured = true;
> Interesting observation - the NBD spec lets us return a structured error
> even to a non-read command.  Only NBD_CMD_READ requires a structured
> reply when we haven't yet received any reply; but you are correct that
> once a given handle sees one structured chunk without a done bit, then
> all future replies for that handle must also be structured.
>
>> +
>> +    if (chunk->type == NBD_SREP_TYPE_NONE) {
>> +        if (!(chunk->flags & NBD_SREP_FLAG_DONE)) {
>> +            /* protocol error */
>> +            s->quit = true;
>> +            if (iter->ret == 0) {
>> +                iter->ret = -EIO;
>> +            }
> [2] Ah, I see you did it here instead!

However, looks like [2] is better place, I'll move.

>
>> +        }
>> +        goto break_loop;
>> +    }
>> +
>> +    if (chunk->flags & NBD_SREP_FLAG_DONE) {
>> +        /* This iteration is last. */
>> +        iter->done = true;
>> +    }
>> +
>> +    /* Execute the loop body */
>> +    return true;
>> +
>> +break_loop:
>> +    s->requests[HANDLE_TO_INDEX(s, handle)].coroutine = NULL;
>> +
>>       qemu_co_mutex_lock(&s->send_mutex);
>>       s->in_flight--;
>>       qemu_co_queue_next(&s->free_sema);
>>       qemu_co_mutex_unlock(&s->send_mutex);
>>   
>> -    return ret;
>> +    return false;
>> +}
>> +
>> +static int nbd_co_receive_return_code(NBDClientSession *s, uint64_t handle)
>> +{
>> +    NBDReplyChunkIter iter;
>> +
>> +    NBD_FOREACH_REPLY_CHUNK(s, iter, handle, false, NULL, NULL, NULL) {
>> +        /* nbd_reply_chunk_iter_receive does all the work */
>> +        ;
>> +    }
>> +
>> +    return iter.ret;
>> +}
>> +
>> +static int nbd_co_receive_cmdread_reply(NBDClientSession *s, uint64_t handle,
>> +                                        QEMUIOVector *qiov)
>> +{
>> +    NBDReplyChunkIter iter;
>> +    NBDReply reply;
>> +    void *payload = NULL;
>> +
>> +    NBD_FOREACH_REPLY_CHUNK(s, iter, handle, s->info.structured_reply,
>> +                            qiov, &reply, &payload)
>> +    {
>> +        int ret;
>> +
>> +        switch (reply.structured.type) {
>> +        case NBD_SREP_TYPE_OFFSET_DATA:
>> +            /* special cased in nbd_co_receive_one_chunk, data is already
>> +             * in qiov */
>> +            break;
>> +        case NBD_SREP_TYPE_OFFSET_HOLE:
>> +            ret = nbd_parse_offset_hole_payload(&reply.structured, payload,
>> +                                                qiov);
>> +            if (ret < 0) {
>> +                s->quit = true;
>> +            }
>> +            break;
>> +        default:
>> +            /* not allowed reply type */
> Slightly misleading; may want to also point out that error chunks were
> already captured during the foreach loop.

looks like a bug. they are captured but this loop body should be 
executed on them anyway, so we should
not set quit...

>
>> +            s->quit = true;
>> +        }
>> +
>> +        qemu_vfree(payload);
>> +        payload = NULL;
>> +    }
>> +
>> +    return iter.ret;
>>   }
>>   
>>   static int nbd_co_request(BlockDriverState *bs,
>>                             NBDRequest *request,
>> -                          QEMUIOVector *qiov)
>> +                          QEMUIOVector *write_qiov)
> The rename is somewhat cosmetic, but I think it reads well.
>
>>   {
>>       NBDClientSession *client = nbd_get_client_session(bs);
>>       int ret;
>>   
>> -    if (qiov) {
>> -        assert(request->type == NBD_CMD_WRITE || request->type == NBD_CMD_READ);
>> -        assert(request->len == iov_size(qiov->iov, qiov->niov));
>> +    assert(request->type != NBD_CMD_READ);
>> +    if (write_qiov) {
>> +        assert(request->type == NBD_CMD_WRITE);
>> +        assert(request->len == iov_size(write_qiov->iov, write_qiov->niov));
>>       } else {
>> -        assert(request->type != NBD_CMD_WRITE && request->type != NBD_CMD_READ);
>> +        assert(request->type != NBD_CMD_WRITE);
>>       }
>> -    ret = nbd_co_send_request(bs, request,
>> -                              request->type == NBD_CMD_WRITE ? qiov : NULL);
>> +    ret = nbd_co_send_request(bs, request, write_qiov);
>>       if (ret < 0) {
>>           return ret;
>>       }
>>   
>> -    return nbd_co_receive_reply(client, request->handle,
>> -                                request->type == NBD_CMD_READ ? qiov : NULL);
>> +    return nbd_co_receive_return_code(client, request->handle);
>>   }
> So basically read was special enough that it no longer shares the common
> code at this level.  Still, I like how you've divided things among the
> various functions.
>
>>   
>>   int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
>>                            uint64_t bytes, QEMUIOVector *qiov, int flags)
>>   {
>> +    int ret;
>> +    NBDClientSession *client = nbd_get_client_session(bs);
>>       NBDRequest request = {
>>           .type = NBD_CMD_READ,
>>           .from = offset,
>> @@ -259,7 +591,12 @@ int nbd_client_co_preadv(BlockDriverState *bs, uint64_t offset,
>>       assert(bytes <= NBD_MAX_BUFFER_SIZE);
>>       assert(!flags);
>>   
>> -    return nbd_co_request(bs, &request, qiov);
>> +    ret = nbd_co_send_request(bs, &request, NULL);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    return nbd_co_receive_cmdread_reply(client, request.handle, qiov);
>>   }
>>   
> And of course, your next goal is adding a block_status function that
> will also special-case chunked replies - but definitely a later patch ;)
>
>>   int nbd_client_co_pwritev(BlockDriverState *bs, uint64_t offset,
>> diff --git a/nbd/client.c b/nbd/client.c
>> index a38e1a7d8e..2f256ee771 100644
>> --- a/nbd/client.c
>> +++ b/nbd/client.c
>> @@ -687,6 +687,13 @@ int nbd_receive_negotiate(QIOChannel *ioc, const char *name,
>>           if (fixedNewStyle) {
>>               int result;
>>   
>> +            result = nbd_request_simple_option(ioc, NBD_OPT_STRUCTURED_REPLY,
>> +                                               errp);
> Bug - we must NOT request this option unless we know we can handle the
> server saying yes.  When nbd-client is handling the connection, we're
> fine; but when qemu-nbd is handling the connection by using ioctls to
> hand off to the kernel, we MUST query the kernel to see if it supports
> structured replies (well, for now, we can get by with saying that we
> KNOW the kernel code has not been written yet, and therefore our query
> is a constant false, but someday we'll have a future patch in that area).

I don't know what is the nbd_client_thread and what are you saying 
about, but looks
like I understand what I should do: make structured_reply field of 
NBDExportInfo to be
"in-out", and do not request it if it is false. Set it to true in 
block/nbd-client.c and to false
in qemu-nbd.c, right?

>
>> +            if (result < 0) {
>> +                goto fail;
>> +            }
>> +            info->structured_reply = result == 1;
>> +
>>               /* Try NBD_OPT_GO first - if it works, we are done (it
>>                * also gives us a good message if the server requires
>>                * TLS).  If it is not available, fall back to
>>
> A lot to digest in this message. While I was comfortable tweaking
> previous patches, and/or inserting some of my own for a v4, I think this
> one is complex enough that I'd prefer it if I send 9-12 + my own
> followup patches as my v4, then you rebase this one on top of that.  But
> I also think you have a very promising patch here; you got a lot of
> things right (even if it didn't have much in the way of comments), and
> the design is looking reasonable.  Perhaps it is also worth seeing if
> Paolo has review comments on this one.
>

Ok, thank you for the your review and fixing! I'll send [9/8] soon.

-- 
Best regards,
Vladimir