Hi all!

We realized that keepalive is almost useless with the default of 2 hours
in /proc/sys/net/ipv4/tcp_keepalive_time. Forcing the user to set up
keepalive for the whole system doesn't seem right; it is better to set it
up per socket.

This series suggests some defaults for the keepalive settings, and enables
keepalive itself by default. Keepalive helps us avoid hanging in I/O when
the other side is down.

Vladimir Sementsov-Ogievskiy (2):
  sockets: keep-alive settings
  util/qemu-sockets: make keep-alive enabled by default

 qapi/sockets.json   | 33 +++++++++++++++++-
 util/qemu-sockets.c | 81 +++++++++++++++++++++++++++++++++++++++------
 2 files changed, 102 insertions(+), 12 deletions(-)

-- 
2.21.0
Patchew URL: https://patchew.org/QEMU/20200708191540.28455-1-vsementsov@virtuozzo.com/

Hi,

This series failed the docker-mingw@fedora build test. Please find the
testing commands and their output below. If you have Docker installed, you
can probably reproduce it locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      hw/hppa/trace.o
In file included from /tmp/qemu-test/src/util/qemu-sockets.c:24:
/tmp/qemu-test/src/util/qemu-sockets.c: In function 'inet_set_keepalive':
/tmp/qemu-test/src/util/qemu-sockets.c:469:46: error: 'TCP_KEEPIDLE' undeclared (first use in this function)
  469 |     ret = qemu_setsockopt(sock, IPPROTO_TCP, TCP_KEEPIDLE, &val, sizeof(val));
      |                                              ^~~~~~~~~~~~
/tmp/qemu-test/src/include/qemu-common.h:48:31: note: in definition of macro 'qemu_setsockopt'
---
/tmp/qemu-test/src/include/qemu-common.h:48:31: note: in definition of macro 'qemu_setsockopt'
   48 |     setsockopt(sockfd, level, optname, (const void *)optval, optlen)
      |                               ^~~~~~~
/tmp/qemu-test/src/util/qemu-sockets.c:476:46: error: 'TCP_KEEPINTVL' undeclared (first use in this function)
  476 |     ret = qemu_setsockopt(sock, IPPROTO_TCP, TCP_KEEPINTVL, &val, sizeof(val));
      |                                              ^~~~~~~~~~~~~
/tmp/qemu-test/src/include/qemu-common.h:48:31: note: in definition of macro 'qemu_setsockopt'
   48 |     setsockopt(sockfd, level, optname, (const void *)optval, optlen)
      |                               ^~~~~~~
/tmp/qemu-test/src/util/qemu-sockets.c:483:46: error: 'TCP_KEEPCNT' undeclared (first use in this function)
  483 |     ret = qemu_setsockopt(sock, IPPROTO_TCP, TCP_KEEPCNT, &val, sizeof(val));
      |                                              ^~~~~~~~~~~
/tmp/qemu-test/src/include/qemu-common.h:48:31: note: in definition of macro 'qemu_setsockopt'
   48 |     setsockopt(sockfd, level, optname, (const void *)optval, optlen)
      |                               ^~~~~~~
make: *** [/tmp/qemu-test/src/rules.mak:69: util/qemu-sockets.o] Error 1
make: *** Waiting for unfinished jobs....
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 669, in <module>
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=bab255245d94432aabefeb4189c6aced', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-mck04b5u/src/docker-src.2020-07-08-15.38.08.17376:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=bab255245d94432aabefeb4189c6aced
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-mck04b5u/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    3m11.264s
user    0m8.976s

The full log is available at
http://patchew.org/logs/20200708191540.28455-1-vsementsov@virtuozzo.com/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
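[For context: the failure above is the usual portability trap with these
options. SO_KEEPALIVE exists on every platform, but TCP_KEEPIDLE,
TCP_KEEPINTVL and TCP_KEEPCNT are not defined on Windows, which is why the
mingw build breaks. Below is a minimal sketch of the conventional #ifdef
guard — plain setsockopt() rather than QEMU's qemu_setsockopt() wrapper,
to stay self-contained; the helper name and the idea of passing the three
values as parameters are illustrative, not the actual patch.]

/*
 * Sketch: enable TCP keepalive on one socket, skipping the tuning
 * options on platforms (e.g. Windows) that do not define them.
 */
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

static int set_tcp_keepalive(int fd, int idle, int interval, int count)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) {
        return -1;
    }
#ifdef TCP_KEEPIDLE     /* seconds of idle time before the first probe */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0) {
        return -1;
    }
#endif
#ifdef TCP_KEEPINTVL    /* seconds between unanswered probes */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval,
                   sizeof(interval)) < 0) {
        return -1;
    }
#endif
#ifdef TCP_KEEPCNT      /* unanswered probes before the kernel drops it */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count, sizeof(count)) < 0) {
        return -1;
    }
#endif
    return 0;
}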
On Wed, Jul 08, 2020 at 10:15:37PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
>
> We realized that keepalive is almost useless with the default of 2 hours
> in /proc/sys/net/ipv4/tcp_keepalive_time. Forcing the user to set up
> keepalive for the whole system doesn't seem right; it is better to set
> it up per socket.

Adding the ability to explicitly configure the keepalive settings makes
sense for QEMU. Completely ignoring system defaults when no explicit
settings are given, though, is not valid IMHO.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
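[In code, Daniel's objection maps to a pattern along these lines: issue a
setsockopt() only when the user actually configured something, so an unset
option leaves the system-wide /proc defaults untouched. A minimal sketch,
assuming a has_keep_alive/keep_alive pair like the QAPI-generated
InetSocketAddress fields; the helper name is hypothetical.]

#include <stdbool.h>
#include <sys/socket.h>

static void apply_keepalive_if_configured(int fd, bool has_keep_alive,
                                          bool keep_alive)
{
    if (!has_keep_alive) {
        return;             /* not configured: keep the kernel defaults */
    }
    int val = keep_alive;
    setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof(val));
}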
On 7/9/20 3:35 AM, Daniel P. Berrangé wrote:
> On Wed, Jul 08, 2020 at 10:15:37PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> We realized that keepalive is almost useless with the default of 2 hours
>> in /proc/sys/net/ipv4/tcp_keepalive_time. Forcing the user to set up
>> keepalive for the whole system doesn't seem right; it is better to set
>> it up per socket.
>
> Adding the ability to explicitly configure the keepalive settings makes
> sense for QEMU. Completely ignoring system defaults when no explicit
> settings are given, though, is not valid IMHO.

We already have the ability to add per-socket keepalive (see commit
aec21d3175, in 4.2). I guess what you are trying to further do is
determine whether the default value for that field, when not explicitly
specified by the user, can have saner semantics (default off for chardev
sockets, default on for NBD clients where retry was enabled). But since
you already have to explicitly opt in to NBD retry, what's so hard about
opting in to keepalive at the same time, other than more typing? Given
that the easiest way to do this is via machine-coded generation of the
command line or QMP command, it doesn't seem that hard to just keep
things as they are, with explicit opt-in to per-socket keepalive.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3226
Virtualization:  qemu.org | libvirt.org
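[For concreteness, the explicit opt-in Eric mentions looks roughly like
this over QMP, using the keep-alive flag that commit aec21d3175 added to
InetSocketAddress; the id, host and port values here are placeholders,
and this is an illustrative sketch rather than a transcript.]

{ "execute": "chardev-add",
  "arguments": {
    "id": "ch0",
    "backend": {
      "type": "socket",
      "data": {
        "server": false,
        "addr": {
          "type": "inet",
          "data": { "host": "remote.example.org", "port": "4321",
                    "keep-alive": true } } } } }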
09.07.2020 18:34, Eric Blake wrote:
> On 7/9/20 3:35 AM, Daniel P. Berrangé wrote:
>> On Wed, Jul 08, 2020 at 10:15:37PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi all!
>>>
>>> We realized that keepalive is almost useless with the default of 2 hours
>>> in /proc/sys/net/ipv4/tcp_keepalive_time. Forcing the user to set up
>>> keepalive for the whole system doesn't seem right; it is better to set
>>> it up per socket.
>>
>> Adding the ability to explicitly configure the keepalive settings makes
>> sense for QEMU. Completely ignoring system defaults when no explicit
>> settings are given, though, is not valid IMHO.
>
> We already have the ability to add per-socket keepalive (see commit
> aec21d3175, in 4.2). I guess what you are trying to further do is
> determine whether the default value for that field, when not explicitly
> specified by the user, can have saner semantics (default off for chardev
> sockets, default on for NBD clients where retry was enabled). But since
> you already have to explicitly opt in to NBD retry, what's so hard about
> opting in to keepalive at the same time, other than more typing? Given
> that the easiest way to do this is via machine-coded generation of the
> command line or QMP command, it doesn't seem that hard to just keep
> things as they are, with explicit opt-in to per-socket keepalive.

Hmm. Looking at the code, I remember that reconnect is not optional; it
works by default now. The option we have is "reconnect-delay", which only
specifies after how many seconds we will automatically fail requests
rather than retrying them (0 seconds by default). Still, NBD tries to
reconnect in the background anyway.

In our downstream we currently have an old version of the nbd-reconnect
interface, with a non-zero "delay" enabled by default.

-- 
Best regards,
Vladimir
09.07.2020 20:14, Vladimir Sementsov-Ogievskiy wrote:
> 09.07.2020 18:34, Eric Blake wrote:
>> On 7/9/20 3:35 AM, Daniel P. Berrangé wrote:
>>> On Wed, Jul 08, 2020 at 10:15:37PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>>>> Hi all!
>>>>
>>>> We realized that keepalive is almost useless with the default of 2
>>>> hours in /proc/sys/net/ipv4/tcp_keepalive_time. Forcing the user to
>>>> set up keepalive for the whole system doesn't seem right; it is
>>>> better to set it up per socket.
>>>
>>> Adding the ability to explicitly configure the keepalive settings makes
>>> sense for QEMU. Completely ignoring system defaults when no explicit
>>> settings are given, though, is not valid IMHO.
>>
>> We already have the ability to add per-socket keepalive (see commit
>> aec21d3175, in 4.2). I guess what you are trying to further do is
>> determine whether the default value for that field, when not explicitly
>> specified by the user, can have saner semantics (default off for chardev
>> sockets, default on for NBD clients where retry was enabled). But since
>> you already have to explicitly opt in to NBD retry, what's so hard about
>> opting in to keepalive at the same time, other than more typing? Given
>> that the easiest way to do this is via machine-coded generation of the
>> command line or QMP command, it doesn't seem that hard to just keep
>> things as they are, with explicit opt-in to per-socket keepalive.
>
> Hmm. Looking at the code, I remember that reconnect is not optional; it
> works by default now. The option we have is "reconnect-delay", which only
> specifies after how many seconds we will automatically fail requests
> rather than retrying them (0 seconds by default). Still, NBD tries to
> reconnect in the background anyway.
>
> In our downstream we currently have an old version of the nbd-reconnect
> interface, with a non-zero "delay" enabled by default.

And now we need to migrate to the upstream code. Hmm. So I have several
options:

1. Set a downstream default for reconnect-delay to something like 5 min.
   [the simplest thing to do]

2. Set an upstream default of 5 min. [needs discussion, but may be useful]

3. Force all users (not customers, I mean, but other teams (and even one
   other company)) to set the reconnect-delay option. [just for
   completeness; I will not go this way]

So, what do you think about [2]? This includes:

- a non-zero reconnect-delay by default, so requests will wait some time
  for reconnect and then be retried

- keepalive enabled by default, with some default keepalive parameters,
  so we don't hang in recvmsg() for a long time

Some other related ideas:

- non-blocking nbd_open
- move open to the coroutine
- use a non-blocking socket connect (for example
  qio_channel_socket_connect_async, which runs connect in a thread).
  Currently, if you try to blockdev-add an unavailable NBD host, the VM
  hangs during the connect() call, which may take about a minute
- make blockdev-add an async QMP command (Kevin's series)
- allow reconnect on open

Also, I recently noted that bdrv_close may hang due to reconnect: it does
bdrv_flush, which waits for NBD to reconnect. This seems inconvenient;
probably we should disable reconnect before bdrv_drained_begin() in
bdrv_close().

-- 
Best regards,
Vladimir
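[For reference, here is roughly what the explicit per-client opt-in looks
like today over QMP — the behavior that option [2] above would turn into a
default. reconnect-delay is in seconds, so 300 matches the proposed 5 min;
the host, port, export and node names are placeholders, and this is a
sketch rather than a transcript.]

{ "execute": "blockdev-add",
  "arguments": {
    "driver": "nbd",
    "node-name": "nbd0",
    "export": "drive0",
    "server": { "type": "inet", "host": "nbd.example.org", "port": "10809",
                "keep-alive": true },
    "reconnect-delay": 300 } }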