migration/fd.c | 39 +++++++++++++++++++++++++++++++-------- migration/trace-events | 4 ++-- 2 files changed, 33 insertions(+), 10 deletions(-)
Currently such case is possible for incoming:
QMP: add-fd (fdset = 0, fd of some file):
adds fd to fdset 0 and returns QEMU's fd (e.g. 33)
QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc
...
Incoming migration completes and unrefs ioc -> close(33)
QMP: remove-fd (fdset = 0, fd = 33):
removes fd from fdset 0 and qemu_close() -> close(33) => double close
For outgoing migration the case is the same but getfd instead of add-fd.
Fix it by duping client's fd.
Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
---
migration/fd.c | 39 +++++++++++++++++++++++++++++++--------
migration/trace-events | 4 ++--
2 files changed, 33 insertions(+), 10 deletions(-)
diff --git a/migration/fd.c b/migration/fd.c
index a7c13df4ad..c9ff07ac41 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -15,6 +15,7 @@
*/
#include "qemu/osdep.h"
+#include "qapi/error.h"
#include "channel.h"
#include "fd.h"
#include "migration.h"
@@ -26,15 +27,27 @@
void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
{
QIOChannel *ioc;
- int fd = monitor_get_fd(cur_mon, fdname, errp);
+ int fd, dup_fd;
+
+ fd = monitor_get_fd(cur_mon, fdname, errp);
if (fd == -1) {
return;
}
- trace_migration_fd_outgoing(fd);
- ioc = qio_channel_new_fd(fd, errp);
+ /* fd is previously created by qmp command 'getfd',
+ * so client is responsible to close it. Dup it to save original value from
+ * QIOChannel's destructor */
+ dup_fd = qemu_dup(fd);
+ if (dup_fd == -1) {
+ error_setg(errp, "Cannot dup fd %s: %s (%d)", fdname, strerror(errno),
+ errno);
+ return;
+ }
+
+ trace_migration_fd_outgoing(fd, dup_fd);
+ ioc = qio_channel_new_fd(dup_fd, errp);
if (!ioc) {
- close(fd);
+ close(dup_fd);
return;
}
@@ -55,14 +68,24 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
void fd_start_incoming_migration(const char *infd, Error **errp)
{
QIOChannel *ioc;
- int fd;
+ int fd, dup_fd;
fd = strtol(infd, NULL, 0);
- trace_migration_fd_incoming(fd);
- ioc = qio_channel_new_fd(fd, errp);
+ /* fd is previously created by qmp command 'add-fd' or something else,
+ * so client is responsible to close it. Dup it to save original value from
+ * QIOChannel's destructor */
+ dup_fd = qemu_dup(fd);
+ if (dup_fd == -1) {
+ error_setg(errp, "Cannot dup fd %d: %s (%d)", fd, strerror(errno),
+ errno);
+ return;
+ }
+
+ trace_migration_fd_incoming(fd, dup_fd);
+ ioc = qio_channel_new_fd(dup_fd, errp);
if (!ioc) {
- close(fd);
+ close(dup_fd);
return;
}
diff --git a/migration/trace-events b/migration/trace-events
index de2e136e57..d2d30a6b3c 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -258,8 +258,8 @@ migration_exec_outgoing(const char *cmd) "cmd=%s"
migration_exec_incoming(const char *cmd) "cmd=%s"
# fd.c
-migration_fd_outgoing(int fd) "fd=%d"
-migration_fd_incoming(int fd) "fd=%d"
+migration_fd_outgoing(int fd, int dup_fd) "fd=%d dup_fd=%d"
+migration_fd_incoming(int fd, int dup_fd) "fd=%d dup_fd=%d"
# socket.c
migration_socket_incoming_accepted(void) ""
--
2.21.0
Patchew URL: https://patchew.org/QEMU/20190410092652.22616-1-yury-kotov@yandex-team.ru/ Hi, This series failed the docker-mingw@fedora build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. === TEST SCRIPT BEGIN === #!/bin/bash time make docker-test-mingw@fedora SHOW_ENV=1 J=14 NETWORK=1 === TEST SCRIPT END === CC migration/xbzrle.o CC migration/postcopy-ram.o /tmp/qemu-test/src/migration/fd.c: In function 'fd_start_outgoing_migration': /tmp/qemu-test/src/migration/fd.c:40:14: error: implicit declaration of function 'qemu_dup'; did you mean 'qemu_hexdump'? [-Werror=implicit-function-declaration] dup_fd = qemu_dup(fd); ^~~~~~~~ qemu_hexdump /tmp/qemu-test/src/migration/fd.c:40:14: error: nested extern declaration of 'qemu_dup' [-Werror=nested-externs] cc1: all warnings being treated as errors make: *** [/tmp/qemu-test/src/rules.mak:69: migration/fd.o] Error 1 make: *** Waiting for unfinished jobs.... The full log is available at http://patchew.org/logs/20190410092652.22616-1-yury-kotov@yandex-team.ru/testing.docker-mingw@fedora/?type=message. --- Email generated automatically by Patchew [https://patchew.org/]. Please send your feedback to patchew-devel@redhat.com
* Yury Kotov (yury-kotov@yandex-team.ru) wrote: > Currently such case is possible for incoming: > QMP: add-fd (fdset = 0, fd of some file): > adds fd to fdset 0 and returns QEMU's fd (e.g. 33) > QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc > ... > Incoming migration completes and unrefs ioc -> close(33) > QMP: remove-fd (fdset = 0, fd = 33): > removes fd from fdset 0 and qemu_close() -> close(33) => double close Well spotted! That would very rarely cause a problem, but is a race. > For outgoing migration the case is the same but getfd instead of add-fd. > Fix it by duping client's fd. > > Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru> Note your patch hit a problem building on windows; I don't think we have a qemu_dup for windows. However, I think this problem is wider than just migration. For example, I see that dump.c also uses monitor_get_fd, and it's dump_cleanup also does a close on the fd. So I guess it hits the same problem? Also, qmp.c in qmp_add_client does a close on the fd in some error cases (I've not followed the normal case). So perhaps all the users of monitor_get_fd are hitting this problem. Should we be doing the dup in monitor_get_fd? Dave > --- > migration/fd.c | 39 +++++++++++++++++++++++++++++++-------- > migration/trace-events | 4 ++-- > 2 files changed, 33 insertions(+), 10 deletions(-) > > diff --git a/migration/fd.c b/migration/fd.c > index a7c13df4ad..c9ff07ac41 100644 > --- a/migration/fd.c > +++ b/migration/fd.c > @@ -15,6 +15,7 @@ > */ > > #include "qemu/osdep.h" > +#include "qapi/error.h" > #include "channel.h" > #include "fd.h" > #include "migration.h" > @@ -26,15 +27,27 @@ > void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp) > { > QIOChannel *ioc; > - int fd = monitor_get_fd(cur_mon, fdname, errp); > + int fd, dup_fd; > + > + fd = monitor_get_fd(cur_mon, fdname, errp); > if (fd == -1) { > return; > } > > - trace_migration_fd_outgoing(fd); > - ioc = qio_channel_new_fd(fd, errp); > + /* fd is previously created by qmp command 'getfd', > + * so client is responsible to close it. Dup it to save original value from > + * QIOChannel's destructor */ > + dup_fd = qemu_dup(fd); > + if (dup_fd == -1) { > + error_setg(errp, "Cannot dup fd %s: %s (%d)", fdname, strerror(errno), > + errno); > + return; > + } > + > + trace_migration_fd_outgoing(fd, dup_fd); > + ioc = qio_channel_new_fd(dup_fd, errp); > if (!ioc) { > - close(fd); > + close(dup_fd); > return; > } > > @@ -55,14 +68,24 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc, > void fd_start_incoming_migration(const char *infd, Error **errp) > { > QIOChannel *ioc; > - int fd; > + int fd, dup_fd; > > fd = strtol(infd, NULL, 0); > - trace_migration_fd_incoming(fd); > > - ioc = qio_channel_new_fd(fd, errp); > + /* fd is previously created by qmp command 'add-fd' or something else, > + * so client is responsible to close it. Dup it to save original value from > + * QIOChannel's destructor */ > + dup_fd = qemu_dup(fd); > + if (dup_fd == -1) { > + error_setg(errp, "Cannot dup fd %d: %s (%d)", fd, strerror(errno), > + errno); > + return; > + } > + > + trace_migration_fd_incoming(fd, dup_fd); > + ioc = qio_channel_new_fd(dup_fd, errp); > if (!ioc) { > - close(fd); > + close(dup_fd); > return; > } > > diff --git a/migration/trace-events b/migration/trace-events > index de2e136e57..d2d30a6b3c 100644 > --- a/migration/trace-events > +++ b/migration/trace-events > @@ -258,8 +258,8 @@ migration_exec_outgoing(const char *cmd) "cmd=%s" > migration_exec_incoming(const char *cmd) "cmd=%s" > > # fd.c > -migration_fd_outgoing(int fd) "fd=%d" > -migration_fd_incoming(int fd) "fd=%d" > +migration_fd_outgoing(int fd, int dup_fd) "fd=%d dup_fd=%d" > +migration_fd_incoming(int fd, int dup_fd) "fd=%d dup_fd=%d" > > # socket.c > migration_socket_incoming_accepted(void) "" > -- > 2.21.0 > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
Hi, 10.04.2019, 16:58, "Dr. David Alan Gilbert" <dgilbert@redhat.com>: > * Yury Kotov (yury-kotov@yandex-team.ru) wrote: >> Currently such case is possible for incoming: >> QMP: add-fd (fdset = 0, fd of some file): >> adds fd to fdset 0 and returns QEMU's fd (e.g. 33) >> QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc >> ... >> Incoming migration completes and unrefs ioc -> close(33) >> QMP: remove-fd (fdset = 0, fd = 33): >> removes fd from fdset 0 and qemu_close() -> close(33) => double close > > Well spotted! That would very rarely cause a problem, but is a race. > Actually, it hits for incoming case on 2 of 50 VMs on our production... >> For outgoing migration the case is the same but getfd instead of add-fd. >> Fix it by duping client's fd. >> >> Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru> > > Note your patch hit a problem building on windows; I don't think we have > a qemu_dup for windows. > Oh, I see... I'll add an ifdef for this in v2. > However, I think this problem is wider than just migration. > For example, I see that dump.c also uses monitor_get_fd, and it's > dump_cleanup also does a close on the fd. So I guess it hits the same > problem? > Also, qmp.c in qmp_add_client does a close on the fd in some error cases > (I've not followed the normal case). > > So perhaps all the users of monitor_get_fd are hitting this problem. > > Should we be doing the dup in monitor_get_fd? > Hmm, it sounds reasonable but Windows's case will remain broken. And also using fd from fdset without qemu_open will remain broken. Another way to fix them: 1) Add searching of monitor's fds and not duped fdset's fds to qemu_close 2) Replace broken closes to qemu_close Regards, Yury Kotov > Dave > >> --- >> migration/fd.c | 39 +++++++++++++++++++++++++++++++-------- >> migration/trace-events | 4 ++-- >> 2 files changed, 33 insertions(+), 10 deletions(-) >> >> diff --git a/migration/fd.c b/migration/fd.c >> index a7c13df4ad..c9ff07ac41 100644 >> --- a/migration/fd.c >> +++ b/migration/fd.c >> @@ -15,6 +15,7 @@ >> */ >> >> #include "qemu/osdep.h" >> +#include "qapi/error.h" >> #include "channel.h" >> #include "fd.h" >> #include "migration.h" >> @@ -26,15 +27,27 @@ >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp) >> { >> QIOChannel *ioc; >> - int fd = monitor_get_fd(cur_mon, fdname, errp); >> + int fd, dup_fd; >> + >> + fd = monitor_get_fd(cur_mon, fdname, errp); >> if (fd == -1) { >> return; >> } >> >> - trace_migration_fd_outgoing(fd); >> - ioc = qio_channel_new_fd(fd, errp); >> + /* fd is previously created by qmp command 'getfd', >> + * so client is responsible to close it. Dup it to save original value from >> + * QIOChannel's destructor */ >> + dup_fd = qemu_dup(fd); >> + if (dup_fd == -1) { >> + error_setg(errp, "Cannot dup fd %s: %s (%d)", fdname, strerror(errno), >> + errno); >> + return; >> + } >> + >> + trace_migration_fd_outgoing(fd, dup_fd); >> + ioc = qio_channel_new_fd(dup_fd, errp); >> if (!ioc) { >> - close(fd); >> + close(dup_fd); >> return; >> } >> >> @@ -55,14 +68,24 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc, >> void fd_start_incoming_migration(const char *infd, Error **errp) >> { >> QIOChannel *ioc; >> - int fd; >> + int fd, dup_fd; >> >> fd = strtol(infd, NULL, 0); >> - trace_migration_fd_incoming(fd); >> >> - ioc = qio_channel_new_fd(fd, errp); >> + /* fd is previously created by qmp command 'add-fd' or something else, >> + * so client is responsible to close it. Dup it to save original value from >> + * QIOChannel's destructor */ >> + dup_fd = qemu_dup(fd); >> + if (dup_fd == -1) { >> + error_setg(errp, "Cannot dup fd %d: %s (%d)", fd, strerror(errno), >> + errno); >> + return; >> + } >> + >> + trace_migration_fd_incoming(fd, dup_fd); >> + ioc = qio_channel_new_fd(dup_fd, errp); >> if (!ioc) { >> - close(fd); >> + close(dup_fd); >> return; >> } >> >> diff --git a/migration/trace-events b/migration/trace-events >> index de2e136e57..d2d30a6b3c 100644 >> --- a/migration/trace-events >> +++ b/migration/trace-events >> @@ -258,8 +258,8 @@ migration_exec_outgoing(const char *cmd) "cmd=%s" >> migration_exec_incoming(const char *cmd) "cmd=%s" >> >> # fd.c >> -migration_fd_outgoing(int fd) "fd=%d" >> -migration_fd_incoming(int fd) "fd=%d" >> +migration_fd_outgoing(int fd, int dup_fd) "fd=%d dup_fd=%d" >> +migration_fd_incoming(int fd, int dup_fd) "fd=%d dup_fd=%d" >> >> # socket.c >> migration_socket_incoming_accepted(void) "" >> -- >> 2.21.0 > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Wed, Apr 10, 2019 at 12:26:52PM +0300, Yury Kotov wrote: > Currently such case is possible for incoming: > QMP: add-fd (fdset = 0, fd of some file): > adds fd to fdset 0 and returns QEMU's fd (e.g. 33) > QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc > ... > Incoming migration completes and unrefs ioc -> close(33) > QMP: remove-fd (fdset = 0, fd = 33): > removes fd from fdset 0 and qemu_close() -> close(33) => double close IMHO this is user error. You've given the FD to QEMU and told it to use it for migration. Once you've done that you should not be calling remove-fd. remove-fd should only be used in some error code path. ie if you have called add-fd, and then some failure means you've decided you can't invoke 'migrate-incoming'. Now you should call "remove-fd". AFAIK, we have never documented that 'add-fd' must be paired with 'remove-fd'. IOW, I think this "fix" is in fact not a fix - it is instead changing the semantics of when "remove-fd" should be used. > For outgoing migration the case is the same but getfd instead of add-fd. > Fix it by duping client's fd. > > Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru> > --- > migration/fd.c | 39 +++++++++++++++++++++++++++++++-------- > migration/trace-events | 4 ++-- > 2 files changed, 33 insertions(+), 10 deletions(-) > > diff --git a/migration/fd.c b/migration/fd.c > index a7c13df4ad..c9ff07ac41 100644 > --- a/migration/fd.c > +++ b/migration/fd.c > @@ -15,6 +15,7 @@ > */ > > #include "qemu/osdep.h" > +#include "qapi/error.h" > #include "channel.h" > #include "fd.h" > #include "migration.h" > @@ -26,15 +27,27 @@ > void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp) > { > QIOChannel *ioc; > - int fd = monitor_get_fd(cur_mon, fdname, errp); > + int fd, dup_fd; > + > + fd = monitor_get_fd(cur_mon, fdname, errp); > if (fd == -1) { > return; > } > > - trace_migration_fd_outgoing(fd); > - ioc = qio_channel_new_fd(fd, errp); > + /* fd is previously created by qmp command 'getfd', > + * so client is responsible to close it. Dup it to save original value from > + * QIOChannel's destructor */ > + dup_fd = qemu_dup(fd); > + if (dup_fd == -1) { > + error_setg(errp, "Cannot dup fd %s: %s (%d)", fdname, strerror(errno), > + errno); > + return; > + } > + > + trace_migration_fd_outgoing(fd, dup_fd); > + ioc = qio_channel_new_fd(dup_fd, errp); > if (!ioc) { > - close(fd); > + close(dup_fd); > return; > } > > @@ -55,14 +68,24 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc, > void fd_start_incoming_migration(const char *infd, Error **errp) > { > QIOChannel *ioc; > - int fd; > + int fd, dup_fd; > > fd = strtol(infd, NULL, 0); > - trace_migration_fd_incoming(fd); > > - ioc = qio_channel_new_fd(fd, errp); > + /* fd is previously created by qmp command 'add-fd' or something else, > + * so client is responsible to close it. Dup it to save original value from > + * QIOChannel's destructor */ > + dup_fd = qemu_dup(fd); > + if (dup_fd == -1) { > + error_setg(errp, "Cannot dup fd %d: %s (%d)", fd, strerror(errno), > + errno); > + return; > + } > + > + trace_migration_fd_incoming(fd, dup_fd); > + ioc = qio_channel_new_fd(dup_fd, errp); > if (!ioc) { > - close(fd); > + close(dup_fd); > return; > } > > diff --git a/migration/trace-events b/migration/trace-events > index de2e136e57..d2d30a6b3c 100644 > --- a/migration/trace-events > +++ b/migration/trace-events > @@ -258,8 +258,8 @@ migration_exec_outgoing(const char *cmd) "cmd=%s" > migration_exec_incoming(const char *cmd) "cmd=%s" > > # fd.c > -migration_fd_outgoing(int fd) "fd=%d" > -migration_fd_incoming(int fd) "fd=%d" > +migration_fd_outgoing(int fd, int dup_fd) "fd=%d dup_fd=%d" > +migration_fd_incoming(int fd, int dup_fd) "fd=%d dup_fd=%d" > > # socket.c > migration_socket_incoming_accepted(void) "" > -- > 2.21.0 > > Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
11.04.2019, 15:04, "Daniel P. Berrangé" <berrange@redhat.com>: > On Wed, Apr 10, 2019 at 12:26:52PM +0300, Yury Kotov wrote: >> Currently such case is possible for incoming: >> QMP: add-fd (fdset = 0, fd of some file): >> adds fd to fdset 0 and returns QEMU's fd (e.g. 33) >> QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc >> ... >> Incoming migration completes and unrefs ioc -> close(33) >> QMP: remove-fd (fdset = 0, fd = 33): >> removes fd from fdset 0 and qemu_close() -> close(33) => double close > > IMHO this is user error. > > You've given the FD to QEMU and told it to use it for migration. > > Once you've done that you should not be calling remove-fd. > > remove-fd should only be used in some error code path. ie if you > have called add-fd, and then some failure means you've decided you > can't invoke 'migrate-incoming'. Now you should call "remove-fd". > > AFAIK, we have never documented that 'add-fd' must be paired > with 'remove-fd'. > > IOW, I think this "fix" is in fact not a fix - it is instead > changing the semantics of when "remove-fd" should be used. > I think it might be user's error but fd is still in cur_mon->fds (in getfd case) or in mon_fdsets (in add-fd case). So if fd is still exists in the lists why user can't close/remove it? So, there are 2 valid options: 1) fd is still valid after migration (dup) 2) fd is closed but also removed from the appropriate list Regards, Yury >> For outgoing migration the case is the same but getfd instead of add-fd. >> Fix it by duping client's fd. >> >> Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru> >> --- >> migration/fd.c | 39 +++++++++++++++++++++++++++++++-------- >> migration/trace-events | 4 ++-- >> 2 files changed, 33 insertions(+), 10 deletions(-) >> >> diff --git a/migration/fd.c b/migration/fd.c >> index a7c13df4ad..c9ff07ac41 100644 >> --- a/migration/fd.c >> +++ b/migration/fd.c >> @@ -15,6 +15,7 @@ >> */ >> >> #include "qemu/osdep.h" >> +#include "qapi/error.h" >> #include "channel.h" >> #include "fd.h" >> #include "migration.h" >> @@ -26,15 +27,27 @@ >> void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp) >> { >> QIOChannel *ioc; >> - int fd = monitor_get_fd(cur_mon, fdname, errp); >> + int fd, dup_fd; >> + >> + fd = monitor_get_fd(cur_mon, fdname, errp); >> if (fd == -1) { >> return; >> } >> >> - trace_migration_fd_outgoing(fd); >> - ioc = qio_channel_new_fd(fd, errp); >> + /* fd is previously created by qmp command 'getfd', >> + * so client is responsible to close it. Dup it to save original value from >> + * QIOChannel's destructor */ >> + dup_fd = qemu_dup(fd); >> + if (dup_fd == -1) { >> + error_setg(errp, "Cannot dup fd %s: %s (%d)", fdname, strerror(errno), >> + errno); >> + return; >> + } >> + >> + trace_migration_fd_outgoing(fd, dup_fd); >> + ioc = qio_channel_new_fd(dup_fd, errp); >> if (!ioc) { >> - close(fd); >> + close(dup_fd); >> return; >> } >> >> @@ -55,14 +68,24 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc, >> void fd_start_incoming_migration(const char *infd, Error **errp) >> { >> QIOChannel *ioc; >> - int fd; >> + int fd, dup_fd; >> >> fd = strtol(infd, NULL, 0); >> - trace_migration_fd_incoming(fd); >> >> - ioc = qio_channel_new_fd(fd, errp); >> + /* fd is previously created by qmp command 'add-fd' or something else, >> + * so client is responsible to close it. Dup it to save original value from >> + * QIOChannel's destructor */ >> + dup_fd = qemu_dup(fd); >> + if (dup_fd == -1) { >> + error_setg(errp, "Cannot dup fd %d: %s (%d)", fd, strerror(errno), >> + errno); >> + return; >> + } >> + >> + trace_migration_fd_incoming(fd, dup_fd); >> + ioc = qio_channel_new_fd(dup_fd, errp); >> if (!ioc) { >> - close(fd); >> + close(dup_fd); >> return; >> } >> >> diff --git a/migration/trace-events b/migration/trace-events >> index de2e136e57..d2d30a6b3c 100644 >> --- a/migration/trace-events >> +++ b/migration/trace-events >> @@ -258,8 +258,8 @@ migration_exec_outgoing(const char *cmd) "cmd=%s" >> migration_exec_incoming(const char *cmd) "cmd=%s" >> >> # fd.c >> -migration_fd_outgoing(int fd) "fd=%d" >> -migration_fd_incoming(int fd) "fd=%d" >> +migration_fd_outgoing(int fd, int dup_fd) "fd=%d dup_fd=%d" >> +migration_fd_incoming(int fd, int dup_fd) "fd=%d dup_fd=%d" >> >> # socket.c >> migration_socket_incoming_accepted(void) "" >> -- >> 2.21.0 > > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Thu, Apr 11, 2019 at 03:31:43PM +0300, Yury Kotov wrote: > 11.04.2019, 15:04, "Daniel P. Berrangé" <berrange@redhat.com>: > > On Wed, Apr 10, 2019 at 12:26:52PM +0300, Yury Kotov wrote: > >> Currently such case is possible for incoming: > >> QMP: add-fd (fdset = 0, fd of some file): > >> adds fd to fdset 0 and returns QEMU's fd (e.g. 33) > >> QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc > >> ... > >> Incoming migration completes and unrefs ioc -> close(33) > >> QMP: remove-fd (fdset = 0, fd = 33): > >> removes fd from fdset 0 and qemu_close() -> close(33) => double close > > > > IMHO this is user error. > > > > You've given the FD to QEMU and told it to use it for migration. > > > > Once you've done that you should not be calling remove-fd. > > > > remove-fd should only be used in some error code path. ie if you > > have called add-fd, and then some failure means you've decided you > > can't invoke 'migrate-incoming'. Now you should call "remove-fd". > > > > AFAIK, we have never documented that 'add-fd' must be paired > > with 'remove-fd'. > > > > IOW, I think this "fix" is in fact not a fix - it is instead > > changing the semantics of when "remove-fd" should be used. > > > > I think it might be user's error but fd is still in cur_mon->fds (in getfd case) > or in mon_fdsets (in add-fd case). So if fd is still exists in the lists why > user can't close/remove it? > > So, there are 2 valid options: > 1) fd is still valid after migration (dup) If we do this then existing mgmt apps using QEMU who don't expect to need to call remove-fd are going to be leaking FDs forever. > 2) fd is closed but also removed from the appropriate list monitor_get_fd currently leaves the FD on the list. if all current users of that API are closing the FD themselves, then we could change that method to remove it from the list. Either way the requirements in this area are pooly documented both from QEMU's internal POV and from mgmt app public QMP pov. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
11.04.2019, 15:39, "Daniel P. Berrangé" <berrange@redhat.com>: > On Thu, Apr 11, 2019 at 03:31:43PM +0300, Yury Kotov wrote: >> 11.04.2019, 15:04, "Daniel P. Berrangé" <berrange@redhat.com>: >> > On Wed, Apr 10, 2019 at 12:26:52PM +0300, Yury Kotov wrote: >> >> Currently such case is possible for incoming: >> >> QMP: add-fd (fdset = 0, fd of some file): >> >> adds fd to fdset 0 and returns QEMU's fd (e.g. 33) >> >> QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc >> >> ... >> >> Incoming migration completes and unrefs ioc -> close(33) >> >> QMP: remove-fd (fdset = 0, fd = 33): >> >> removes fd from fdset 0 and qemu_close() -> close(33) => double close >> > >> > IMHO this is user error. >> > >> > You've given the FD to QEMU and told it to use it for migration. >> > >> > Once you've done that you should not be calling remove-fd. >> > >> > remove-fd should only be used in some error code path. ie if you >> > have called add-fd, and then some failure means you've decided you >> > can't invoke 'migrate-incoming'. Now you should call "remove-fd". >> > >> > AFAIK, we have never documented that 'add-fd' must be paired >> > with 'remove-fd'. >> > >> > IOW, I think this "fix" is in fact not a fix - it is instead >> > changing the semantics of when "remove-fd" should be used. >> > >> >> I think it might be user's error but fd is still in cur_mon->fds (in getfd case) >> or in mon_fdsets (in add-fd case). So if fd is still exists in the lists why >> user can't close/remove it? >> >> So, there are 2 valid options: >> 1) fd is still valid after migration (dup) > > If we do this then existing mgmt apps using QEMU who don't expect to need > to call remove-fd are going to be leaking FDs forever. > Ok, so what do you think about modifying qemu_close to remove fd from these two lists? Currently it tryes to remove fd only from mon_fdsets->dup_fds. Regards, Yury >> 2) fd is closed but also removed from the appropriate list > > monitor_get_fd currently leaves the FD on the list. > > if all current users of that API are closing the FD > themselves, then we could change that method to remove > it from the list. > > Either way the requirements in this area are pooly documented > both from QEMU's internal POV and from mgmt app public QMP pov. > > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Hi, Just to clarify. I see two possible solutions: 1) Since the migration code doesn't receive fd, it isn't responsible for closing it. So, it may be better to use migrate_fd_param for both incoming/outgoing and add dupping for migrate_fd_param. Thus, clients must close the fd themselves. But existing clients will have a leak. 2) If we don't duplicate fd, then at least we should remove fd from the corresponding list. Therefore, the solution is to fix qemu_close to find the list and remove fd from it. But qemu_close is currently consistent with qemu_open (which opens/dups fd), so adding additional logic might not be a very good idea. I don't see any other solution, but I might miss something. What do you think? Regards, Yury 11.04.2019, 15:58, "Yury Kotov" <yury-kotov@yandex-team.ru>: > 11.04.2019, 15:39, "Daniel P. Berrangé" <berrange@redhat.com>: >> On Thu, Apr 11, 2019 at 03:31:43PM +0300, Yury Kotov wrote: >>> 11.04.2019, 15:04, "Daniel P. Berrangé" <berrange@redhat.com>: >>> > On Wed, Apr 10, 2019 at 12:26:52PM +0300, Yury Kotov wrote: >>> >> Currently such case is possible for incoming: >>> >> QMP: add-fd (fdset = 0, fd of some file): >>> >> adds fd to fdset 0 and returns QEMU's fd (e.g. 33) >>> >> QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc >>> >> ... >>> >> Incoming migration completes and unrefs ioc -> close(33) >>> >> QMP: remove-fd (fdset = 0, fd = 33): >>> >> removes fd from fdset 0 and qemu_close() -> close(33) => double close >>> > >>> > IMHO this is user error. >>> > >>> > You've given the FD to QEMU and told it to use it for migration. >>> > >>> > Once you've done that you should not be calling remove-fd. >>> > >>> > remove-fd should only be used in some error code path. ie if you >>> > have called add-fd, and then some failure means you've decided you >>> > can't invoke 'migrate-incoming'. Now you should call "remove-fd". >>> > >>> > AFAIK, we have never documented that 'add-fd' must be paired >>> > with 'remove-fd'. >>> > >>> > IOW, I think this "fix" is in fact not a fix - it is instead >>> > changing the semantics of when "remove-fd" should be used. >>> > >>> >>> I think it might be user's error but fd is still in cur_mon->fds (in getfd case) >>> or in mon_fdsets (in add-fd case). So if fd is still exists in the lists why >>> user can't close/remove it? >>> >>> So, there are 2 valid options: >>> 1) fd is still valid after migration (dup) >> >> If we do this then existing mgmt apps using QEMU who don't expect to need >> to call remove-fd are going to be leaking FDs forever. > > Ok, so what do you think about modifying qemu_close to remove fd from these two > lists? Currently it tryes to remove fd only from mon_fdsets->dup_fds. > > Regards, > Yury > >>> 2) fd is closed but also removed from the appropriate list >> >> monitor_get_fd currently leaves the FD on the list. >> >> if all current users of that API are closing the FD >> themselves, then we could change that method to remove >> it from the list. >> >> Either way the requirements in this area are pooly documented >> both from QEMU's internal POV and from mgmt app public QMP pov. >> >> Regards, >> Daniel >> -- >> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| >> |: https://libvirt.org -o- https://fstop138.berrange.com :| >> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Mon, Apr 15, 2019 at 12:50:08PM +0300, Yury Kotov wrote: > Hi, > > Just to clarify. I see two possible solutions: > > 1) Since the migration code doesn't receive fd, it isn't responsible for > closing it. So, it may be better to use migrate_fd_param for both > incoming/outgoing and add dupping for migrate_fd_param. Thus, clients must > close the fd themselves. But existing clients will have a leak. We can't break existing clients in this way as they are correctly using the monitor with its current semantics. > 2) If we don't duplicate fd, then at least we should remove fd from > the corresponding list. Therefore, the solution is to fix qemu_close to find > the list and remove fd from it. But qemu_close is currently consistent with > qemu_open (which opens/dups fd), so adding additional logic might not be > a very good idea. qemu_close is not appropriate place to deal with something speciifc to the montor. > I don't see any other solution, but I might miss something. > What do you think? All callers of monitor_get_fd() will close() the FD they get back. Thus monitor_get_fd() should remove it from the list when it returns it, and we should add API docs to monitor_get_fd() to explain this. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
15.04.2019, 13:11, "Daniel P. Berrangé" <berrange@redhat.com>: > On Mon, Apr 15, 2019 at 12:50:08PM +0300, Yury Kotov wrote: >> Hi, >> >> Just to clarify. I see two possible solutions: >> >> 1) Since the migration code doesn't receive fd, it isn't responsible for >> closing it. So, it may be better to use migrate_fd_param for both >> incoming/outgoing and add dupping for migrate_fd_param. Thus, clients must >> close the fd themselves. But existing clients will have a leak. > > We can't break existing clients in this way as they are correctly > using the monitor with its current semantics. > >> 2) If we don't duplicate fd, then at least we should remove fd from >> the corresponding list. Therefore, the solution is to fix qemu_close to find >> the list and remove fd from it. But qemu_close is currently consistent with >> qemu_open (which opens/dups fd), so adding additional logic might not be >> a very good idea. > > qemu_close is not appropriate place to deal with something speciifc > to the montor. > >> I don't see any other solution, but I might miss something. >> What do you think? > > All callers of monitor_get_fd() will close() the FD they get back. > Thus monitor_get_fd() should remove it from the list when it returns > it, and we should add API docs to monitor_get_fd() to explain this. > Ok, it sounds reasonable. But monitor_get_fd is only about outgoing migration. But what about the incoming migration? It doesn't use monitor_get_fd but just converts input string to int and use it as fd. Regards, Yury
15.04.2019, 13:17, "Yury Kotov" <yury-kotov@yandex-team.ru>: > 15.04.2019, 13:11, "Daniel P. Berrangé" <berrange@redhat.com>: >> On Mon, Apr 15, 2019 at 12:50:08PM +0300, Yury Kotov wrote: >>> Hi, >>> >>> Just to clarify. I see two possible solutions: >>> >>> 1) Since the migration code doesn't receive fd, it isn't responsible for >>> closing it. So, it may be better to use migrate_fd_param for both >>> incoming/outgoing and add dupping for migrate_fd_param. Thus, clients must >>> close the fd themselves. But existing clients will have a leak. >> >> We can't break existing clients in this way as they are correctly >> using the monitor with its current semantics. >> >>> 2) If we don't duplicate fd, then at least we should remove fd from >>> the corresponding list. Therefore, the solution is to fix qemu_close to find >>> the list and remove fd from it. But qemu_close is currently consistent with >>> qemu_open (which opens/dups fd), so adding additional logic might not be >>> a very good idea. >> >> qemu_close is not appropriate place to deal with something speciifc >> to the montor. >> But qemu_close is already deal with monitor: It uses monitor_fdset_dup_fd_find & monitor_fdset_dup_fd_remove to find and remove fd from monitor's dup_fds list. >>> I don't see any other solution, but I might miss something. >>> What do you think? >> >> All callers of monitor_get_fd() will close() the FD they get back. >> Thus monitor_get_fd() should remove it from the list when it returns >> it, and we should add API docs to monitor_get_fd() to explain this. > > Ok, it sounds reasonable. But monitor_get_fd is only about outgoing migration. > But what about the incoming migration? It doesn't use monitor_get_fd but just > converts input string to int and use it as fd. > Regards, Yury
On Mon, Apr 15, 2019 at 01:17:06PM +0300, Yury Kotov wrote: > 15.04.2019, 13:11, "Daniel P. Berrangé" <berrange@redhat.com>: > > On Mon, Apr 15, 2019 at 12:50:08PM +0300, Yury Kotov wrote: > >> Hi, > >> > >> Just to clarify. I see two possible solutions: > >> > >> 1) Since the migration code doesn't receive fd, it isn't responsible for > >> closing it. So, it may be better to use migrate_fd_param for both > >> incoming/outgoing and add dupping for migrate_fd_param. Thus, clients must > >> close the fd themselves. But existing clients will have a leak. > > > > We can't break existing clients in this way as they are correctly > > using the monitor with its current semantics. > > > >> 2) If we don't duplicate fd, then at least we should remove fd from > >> the corresponding list. Therefore, the solution is to fix qemu_close to find > >> the list and remove fd from it. But qemu_close is currently consistent with > >> qemu_open (which opens/dups fd), so adding additional logic might not be > >> a very good idea. > > > > qemu_close is not appropriate place to deal with something speciifc > > to the montor. > > > >> I don't see any other solution, but I might miss something. > >> What do you think? > > > > All callers of monitor_get_fd() will close() the FD they get back. > > Thus monitor_get_fd() should remove it from the list when it returns > > it, and we should add API docs to monitor_get_fd() to explain this. > > > Ok, it sounds reasonable. But monitor_get_fd is only about outgoing migration. > But what about the incoming migration? It doesn't use monitor_get_fd but just > converts input string to int and use it as fd. The incoming migration expects the FD to be passed into QEMU by the mgmt app when it is exec'ing the QEMU binary. It doesn't interact with the monitor at all AFAIR. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
© 2016 - 2024 Red Hat, Inc.