The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be
advertised when the server promises cache consistency between
simultaneous clients (basically, rules that determine what FUA and
flush from one client are able to guarantee for reads from another
client). When we don't permit simultaneous clients (such as qemu-nbd
without -e), the bit makes no sense; and for writable images, we
probably have a lot more work before we can declare that actions from
one client are cache-consistent with actions from another. But for
read-only images, where flush isn't changing any data, we might as
well advertise multi-conn support. What's more, advertisement of the
bit makes it easier for clients to determine if 'qemu-nbd -e' was in
use, where a second connection will succeed rather than hang until the
first client goes away.
This patch affects qemu as server in advertising the bit. We may want
to consider patches to qemu as client to attempt parallel connections
for higher throughput by spreading the load over those connections
when a server advertises multi-conn, but for now sticking to one
connection per nbd:// BDS is okay.
See also: https://bugzilla.redhat.com/1708300
Signed-off-by: Eric Blake <eblake@redhat.com>
---
docs/interop/nbd.txt | 1 +
include/block/nbd.h | 2 +-
blockdev-nbd.c | 2 +-
nbd/server.c | 4 +++-
qemu-nbd.c | 2 +-
5 files changed, 7 insertions(+), 4 deletions(-)
diff --git a/docs/interop/nbd.txt b/docs/interop/nbd.txt
index fc64473e02b2..6dfec7f47647 100644
--- a/docs/interop/nbd.txt
+++ b/docs/interop/nbd.txt
@@ -53,3 +53,4 @@ the operation of that feature.
* 2.12: NBD_CMD_BLOCK_STATUS for "base:allocation"
* 3.0: NBD_OPT_STARTTLS with TLS Pre-Shared Keys (PSK),
NBD_CMD_BLOCK_STATUS for "qemu:dirty-bitmap:", NBD_CMD_CACHE
+* 4.2: NBD_FLAG_CAN_MULTI_CONN for sharable read-only exports
diff --git a/include/block/nbd.h b/include/block/nbd.h
index 7b36d672f046..991fd52a5134 100644
--- a/include/block/nbd.h
+++ b/include/block/nbd.h
@@ -326,7 +326,7 @@ typedef struct NBDClient NBDClient;
NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
uint64_t size, const char *name, const char *desc,
- const char *bitmap, uint16_t nbdflags,
+ const char *bitmap, uint16_t nbdflags, bool shared,
void (*close)(NBDExport *), bool writethrough,
BlockBackend *on_eject_blk, Error **errp);
void nbd_export_close(NBDExport *exp);
diff --git a/blockdev-nbd.c b/blockdev-nbd.c
index 66eebab31875..e5d228771292 100644
--- a/blockdev-nbd.c
+++ b/blockdev-nbd.c
@@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
}
exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
- writable ? 0 : NBD_FLAG_READ_ONLY,
+ writable ? 0 : NBD_FLAG_READ_ONLY, true,
NULL, false, on_eject_blk, errp);
if (!exp) {
return;
diff --git a/nbd/server.c b/nbd/server.c
index a2cf085f7635..a602d85070ff 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1460,7 +1460,7 @@ static void nbd_eject_notifier(Notifier *n, void *data)
NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
uint64_t size, const char *name, const char *desc,
- const char *bitmap, uint16_t nbdflags,
+ const char *bitmap, uint16_t nbdflags, bool shared,
void (*close)(NBDExport *), bool writethrough,
BlockBackend *on_eject_blk, Error **errp)
{
@@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
perm = BLK_PERM_CONSISTENT_READ;
if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
perm |= BLK_PERM_WRITE;
+ } else if (shared) {
+ nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
}
blk = blk_new(bdrv_get_aio_context(bs), perm,
BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED |
diff --git a/qemu-nbd.c b/qemu-nbd.c
index 049645491dab..55f5ceaf5c92 100644
--- a/qemu-nbd.c
+++ b/qemu-nbd.c
@@ -1173,7 +1173,7 @@ int main(int argc, char **argv)
}
export = nbd_export_new(bs, dev_offset, fd_size, export_name,
- export_description, bitmap, nbdflags,
+ export_description, bitmap, nbdflags, shared > 1,
nbd_export_closed, writethrough, NULL,
&error_fatal);
--
2.20.1
On Thu, Aug 15, 2019 at 01:50:24PM -0500, Eric Blake wrote:
> The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be
> advertised when the server promises cache consistency between
> simultaneous clients (basically, rules that determine what FUA and
> flush from one client are able to guarantee for reads from another
> client).
[...]

Multi-conn is a no-brainer.  For nbdkit it more than doubled throughput:

https://github.com/libguestfs/nbdkit/commit/910a220aa454b410c44731e8d965e92244b536f5

Those results are for loopback mounts of a file located on /dev/shm and
served by the nbdkit file plugin, and I would imagine that without the
loop mounting / filesystem overhead the results could be even better.

For read-only connections where the server can handle more than one
connection (-e) it ought to be safe.  You have to tell the client how
many connections the server may accept, but that's a limitation of the
current protocol.

So yes ACK, patch makes sense.

Worth noting that fio has NBD support so you can test NBD servers
directly these days:

https://github.com/axboe/fio/commit/d643a1e29d31bf974a613866819dde241c928b6d
https://github.com/axboe/fio/blob/master/examples/nbd.fio#L5

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine.  Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/
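For illustration, a minimal fio job in the spirit of that linked example might look like the fragment below. This is a sketch, not a copy of examples/nbd.fio: the socket path is a placeholder, and the exact option set should be checked against fio's own nbd ioengine documentation.

```ini
; Hypothetical fio job exercising an NBD server over a Unix socket.
; Point the uri at wherever qemu-nbd (or nbdkit) is actually listening.
[global]
ioengine=nbd
uri=nbd+unix:///?socket=/tmp/nbd.sock
rw=randread
time_based
runtime=30
iodepth=16

[nbd-read-test]
```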
On 8/15/19 2:50 PM, Eric Blake wrote:
> The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be
> advertised when the server promises cache consistency between
> simultaneous clients (basically, rules that determine what FUA and
> flush from one client are able to guarantee for reads from another
> client).
[...]
> +++ b/blockdev-nbd.c
> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
>      }
> 
>      exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
> -                         writable ? 0 : NBD_FLAG_READ_ONLY,
> +                         writable ? 0 : NBD_FLAG_READ_ONLY, true,
>                           NULL, false, on_eject_blk, errp);

Why is it okay to force the share bit on regardless of the value of
'writable' ?
On 8/15/19 4:45 PM, John Snow wrote:
> On 8/15/19 2:50 PM, Eric Blake wrote:
[...]
>> +++ b/blockdev-nbd.c
>> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
>>      }
>> 
>>      exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
>> -                         writable ? 0 : NBD_FLAG_READ_ONLY,
>> +                         writable ? 0 : NBD_FLAG_READ_ONLY, true,
>>                           NULL, false, on_eject_blk, errp);
> 
> Why is it okay to force the share bit on regardless of the value of
> 'writable' ?

Well, it's probably not, except that...

>> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>      perm = BLK_PERM_CONSISTENT_READ;
>>      if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
>>          perm |= BLK_PERM_WRITE;
>> +    } else if (shared) {
>> +        nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
>>      }

requesting shared=true has no effect for a writable export.

I can tweak it for less confusion, though.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
On 8/15/19 5:54 PM, Eric Blake wrote:
> On 8/15/19 4:45 PM, John Snow wrote:
>> Why is it okay to force the share bit on regardless of the value of
>> 'writable' ?
> 
> Well, it's probably not, except that...
[...]
>>> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>>      perm = BLK_PERM_CONSISTENT_READ;
>>>      if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
>>>          perm |= BLK_PERM_WRITE;
>>> +    } else if (shared) {
>>> +        nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
>>>      }
> 
> requesting shared=true has no effect for a writable export.
> 
> I can tweak it for less confusion, though.

"Yes John, when it's an else-if it really does matter what specific
condition it's following."

(Ah, there it is.)

Yeah, I think if you have hopes to support this flag in the future for
writable exports, I think it might be nicer to reject this bit for RW;
and adjust the caller to only request it conditionally.

Or not. I guess we don't have to maintain backwards compatibility for
internal API like that, so ... dealer's choice:

Reviewed-by: John Snow <jsnow@redhat.com>
Patchew URL: https://patchew.org/QEMU/20190815185024.7010-1-eblake@redhat.com/

Hi,

This series failed build test on s390x host. Please find the details below.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
# Testing script will be invoked under the git checkout with
# HEAD pointing to a commit that has the patches applied on top of "base"
# branch
set -e

echo
echo "=== ENV ==="
env

echo
echo "=== PACKAGES ==="
rpm -qa

echo
echo "=== UNAME ==="
uname -a

CC=$HOME/bin/cc
INSTALL=$PWD/install
BUILD=$PWD/build
mkdir -p $BUILD $INSTALL
SRC=$PWD
cd $BUILD
$SRC/configure --cc=$CC --prefix=$INSTALL
make -j4
# XXX: we need reliable clean up
# make check -j4 V=1
make install
=== TEST SCRIPT END ===

  CC      mips64-softmmu/trace/control-target.o
  CC      mips64-softmmu/trace/generated-helpers.o
  LINK    mips64-softmmu/qemu-system-mips64
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:209: qemu-system-mips64] Error 1
make: *** [Makefile:472: mips64-softmmu/all] Error 2
make: *** Waiting for unfinished jobs....

The full log is available at
http://patchew.org/logs/20190815185024.7010-1-eblake@redhat.com/testing.s390x/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
15.08.2019 21:50, Eric Blake wrote:
> The NBD specification defines NBD_FLAG_CAN_MULTI_CONN, which can be
> advertised when the server promises cache consistency between
> simultaneous clients (basically, rules that determine what FUA and
> flush from one client are able to guarantee for reads from another
> client).
[...]
> +++ b/blockdev-nbd.c
> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool has_name, const char *name,
>      }
> 
>      exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
> -                         writable ? 0 : NBD_FLAG_READ_ONLY,
> +                         writable ? 0 : NBD_FLAG_READ_ONLY, true,

s/true/!writable ?

>                           NULL, false, on_eject_blk, errp);
[...]

-- 
Best regards,
Vladimir
16.08.2019 13:23, Vladimir Sementsov-Ogievskiy wrote:
> 15.08.2019 21:50, Eric Blake wrote:
[...]
>>      exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
>> -                         writable ? 0 : NBD_FLAG_READ_ONLY,
>> +                         writable ? 0 : NBD_FLAG_READ_ONLY, true,
>
> s/true/!writable ?

Oh, I see, John already noticed this, it's checked in nbd_export_new
anyway..

>> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>      perm = BLK_PERM_CONSISTENT_READ;
>>      if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
>>          perm |= BLK_PERM_WRITE;
>> +    } else if (shared) {
>> +        nbdflags |= NBD_FLAG_CAN_MULTI_CONN;

For me it looks a bit strange: we already have nbdflags parameter for
nbd_export_new(), why to add a separate boolean to pass one of nbdflags
flags?

Also, for qemu-nbd, shouldn't we allow -e only together with -r ?

-- 
Best regards,
Vladimir
On 8/16/19 5:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>      exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
>>> -                         writable ? 0 : NBD_FLAG_READ_ONLY,
>>> +                         writable ? 0 : NBD_FLAG_READ_ONLY, true,
>>
>> s/true/!writable ?
>
> Oh, I see, John already noticed this, it's checked in nbd_export_new
> anyway..

Still, since two reviewers have caught it, I'm fixing it :)

>>> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t dev_offset,
>>>      perm = BLK_PERM_CONSISTENT_READ;
>>>      if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) {
>>>          perm |= BLK_PERM_WRITE;
>>> +    } else if (shared) {
>>> +        nbdflags |= NBD_FLAG_CAN_MULTI_CONN;
>
> For me it looks a bit strange: we already have nbdflags parameter for
> nbd_export_new(), why to add a separate boolean to pass one of nbdflags
> flags?

Because I want to get rid of the nbdflags in my next patch.

> Also, for qemu-nbd, shouldn't we allow -e only together with -r ?

I'm reluctant to; it might break whatever existing user is okay exposing
it (although such users are questionable, so maybe we can argue they
were already broken). Maybe it's time to start a deprecation cycle?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
On Sat, Aug 17, 2019 at 5:30 PM Eric Blake <eblake@redhat.com> wrote: > On 8/16/19 5:47 AM, Vladimir Sementsov-Ogievskiy wrote: > > >>> +++ b/blockdev-nbd.c > >>> @@ -189,7 +189,7 @@ void qmp_nbd_server_add(const char *device, bool > has_name, const char *name, > >>> } > >>> > >>> exp = nbd_export_new(bs, 0, len, name, NULL, bitmap, > >>> - writable ? 0 : NBD_FLAG_READ_ONLY, > >>> + writable ? 0 : NBD_FLAG_READ_ONLY, true, > >> > >> s/true/!writable ? > > > > Oh, I see, John already noticed this, it's checked in nbd_export_new > anyway.. > > Still, since two reviewers have caught it, I'm fixing it :) > > > >>> @@ -1486,6 +1486,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, > uint64_t dev_offset, > >>> perm = BLK_PERM_CONSISTENT_READ; > >>> if ((nbdflags & NBD_FLAG_READ_ONLY) == 0) { > >>> perm |= BLK_PERM_WRITE; > >>> + } else if (shared) { > >>> + nbdflags |= NBD_FLAG_CAN_MULTI_CONN; > > > > For me it looks a bit strange: we already have nbdflags parameter for > nbd_export_new(), why > > to add a separate boolean to pass one of nbdflags flags? > > Because I want to get rid of the nbdflags in my next patch. > > > > > Also, for qemu-nbd, shouldn't we allow -e only together with -r ? > > I'm reluctant to; it might break whatever existing user is okay exposing > it (although such users are questionable, so maybe we can argue they > were already broken). Maybe it's time to start a deprecation cycle? > man qemu-nbd (on Centos 7.6) says: -e, --shared=num Allow up to num clients to share the device (default 1) I see that in qemu-img 4.1 there is a note about consistency with writers: -e, --shared=num Allow up to num clients to share the device (default 1). Safe for readers, but for now, consistency is not guaranteed between multiple writers. But it is not clear what are the consistency guarantees. Supporting multiple writers is important. 
oVirt is giving the user a URL (since 4.3), and the user can use multiple
connections using the same URL, each having a connection to the same
qemu-nbd socket.  I know that some backup vendors tried to use multiple
connections to speed up backups, and they may try to do this also for
restore.

An interesting use case would be using multiple connections on the client
side to write in parallel to the same image, with every client writing
different ranges.

Do we have a real issue with qemu-nbd serving multiple clients writing to
different parts of the same image?

Nir
On 8/17/19 8:31 PM, Nir Soffer wrote:
>>> Also, for qemu-nbd, shouldn't we allow -e only together with -r ?
>>
>> I'm reluctant to; it might break whatever existing user is okay exposing
>> it (although such users are questionable, so maybe we can argue they
>> were already broken). Maybe it's time to start a deprecation cycle?
>
> man qemu-nbd (on CentOS 7.6) says:
>
>   -e, --shared=num
>       Allow up to num clients to share the device (default 1)
>
> I see that in qemu-nbd 4.1 there is a note about consistency with writers:
>
>   -e, --shared=num
>       Allow up to num clients to share the device (default 1).  Safe
>       for readers, but for now, consistency is not guaranteed between
>       multiple writers.
>
> But it is not clear what the consistency guarantees are.
>
> Supporting multiple writers is important.  oVirt is giving the user a
> URL (since 4.3), and the user can use multiple connections using the
> same URL, each having a connection to the same qemu-nbd socket.  I know
> that some backup vendors tried to use multiple connections to speed up
> backups, and they may try to do this also for restore.
>
> An interesting use case would be using multiple connections on the
> client side to write in parallel to the same image, with every client
> writing different ranges.

Good to know.

> Do we have a real issue with qemu-nbd serving multiple clients writing
> to different parts of the same image?

If a server advertises multi-conn on a writable image, then clients have
stronger guarantees about behavior on what happens with flush on one
client vs. write in another, to the point that you can make some better
assumptions about image consistency, including what one client will read
after another has written.  But as long as multiple clients only ever
access distinct portions of the disk, then multi-conn is not important
to that client (whether for reading or for writing).

So it sounds like I have no reason to deprecate qemu-nbd -e 2, even for
writable images.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
On Mon, Aug 19, 2019 at 9:04 PM Eric Blake <eblake@redhat.com> wrote:
> If a server advertises multi-conn on a writable image, then clients have
> stronger guarantees about behavior on what happens with flush on one
> client vs. write in another, to the point that you can make some better
> assumptions about image consistency, including what one client will read
> after another has written.  But as long as multiple clients only ever
> access distinct portions of the disk, then multi-conn is not important
> to that client (whether for reading or for writing).

Thanks for making this clear.  I think we need to document this in oVirt,
so users will be careful about using multiple connections.

> So it sounds like I have no reason to deprecate qemu-nbd -e 2, even for
> writable images.
17.08.2019 17:30, Eric Blake wrote:
> On 8/16/19 5:47 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> exp = nbd_export_new(bs, 0, len, name, NULL, bitmap,
>>>> -                        writable ? 0 : NBD_FLAG_READ_ONLY,
>>>> +                        writable ? 0 : NBD_FLAG_READ_ONLY, true,
>>>
>>> s/true/!writable ?
>>
>> Oh, I see, John already noticed this, it's checked in nbd_export_new anyway..
>
> Still, since two reviewers have caught it, I'm fixing it :)

With it or without:
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>

>> For me it looks a bit strange: we already have nbdflags parameter for nbd_export_new(), why
>> to add a separate boolean to pass one of nbdflags flags?
>
> Because I want to get rid of the nbdflags in my next patch.
>
>> Also, for qemu-nbd, shouldn't we allow -e only together with -r ?
>
> I'm reluctant to; it might break whatever existing user is okay exposing
> it (although such users are questionable, so maybe we can argue they
> were already broken). Maybe it's time to start a deprecation cycle?

-- 
Best regards,
Vladimir