[Qemu-devel] [PATCH] migration: Fix handling fd protocol

Yury Kotov posted 1 patch 5 years ago
Test docker-mingw@fedora failed
Test docker-clang@ubuntu passed
Test checkpatch passed
Test asan passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20190410092652.22616-1-yury-kotov@yandex-team.ru
Maintainers: "Dr. David Alan Gilbert" <dgilbert@redhat.com>, Juan Quintela <quintela@redhat.com>
migration/fd.c         | 39 +++++++++++++++++++++++++++++++--------
migration/trace-events |  4 ++--
2 files changed, 33 insertions(+), 10 deletions(-)
[Qemu-devel] [PATCH] migration: Fix handling fd protocol
Posted by Yury Kotov 5 years ago
Currently such case is possible for incoming:
QMP: add-fd (fdset = 0, fd of some file):
    adds fd to fdset 0 and returns QEMU's fd (e.g. 33)
QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc
...
Incoming migration completes and unrefs ioc -> close(33)
QMP: remove-fd (fdset = 0, fd = 33):
    removes fd from fdset 0 and qemu_close() -> close(33) => double close

For outgoing migration the case is the same but getfd instead of add-fd.
Fix it by duping client's fd.

Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
---
 migration/fd.c         | 39 +++++++++++++++++++++++++++++++--------
 migration/trace-events |  4 ++--
 2 files changed, 33 insertions(+), 10 deletions(-)

diff --git a/migration/fd.c b/migration/fd.c
index a7c13df4ad..c9ff07ac41 100644
--- a/migration/fd.c
+++ b/migration/fd.c
@@ -15,6 +15,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qapi/error.h"
 #include "channel.h"
 #include "fd.h"
 #include "migration.h"
@@ -26,15 +27,27 @@
 void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
 {
     QIOChannel *ioc;
-    int fd = monitor_get_fd(cur_mon, fdname, errp);
+    int fd, dup_fd;
+
+    fd = monitor_get_fd(cur_mon, fdname, errp);
     if (fd == -1) {
         return;
     }
 
-    trace_migration_fd_outgoing(fd);
-    ioc = qio_channel_new_fd(fd, errp);
+    /* fd is previously created by qmp command 'getfd',
+     * so client is responsible to close it. Dup it to save original value from
+     * QIOChannel's destructor */
+    dup_fd = qemu_dup(fd);
+    if (dup_fd == -1) {
+        error_setg(errp, "Cannot dup fd %s: %s (%d)", fdname, strerror(errno),
+                   errno);
+        return;
+    }
+
+    trace_migration_fd_outgoing(fd, dup_fd);
+    ioc = qio_channel_new_fd(dup_fd, errp);
     if (!ioc) {
-        close(fd);
+        close(dup_fd);
         return;
     }
 
@@ -55,14 +68,24 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
 void fd_start_incoming_migration(const char *infd, Error **errp)
 {
     QIOChannel *ioc;
-    int fd;
+    int fd, dup_fd;
 
     fd = strtol(infd, NULL, 0);
-    trace_migration_fd_incoming(fd);
 
-    ioc = qio_channel_new_fd(fd, errp);
+    /* fd is previously created by qmp command 'add-fd' or something else,
+     * so client is responsible to close it. Dup it to save original value from
+     * QIOChannel's destructor */
+    dup_fd = qemu_dup(fd);
+    if (dup_fd == -1) {
+        error_setg(errp, "Cannot dup fd %d: %s (%d)", fd, strerror(errno),
+                   errno);
+        return;
+    }
+
+    trace_migration_fd_incoming(fd, dup_fd);
+    ioc = qio_channel_new_fd(dup_fd, errp);
     if (!ioc) {
-        close(fd);
+        close(dup_fd);
         return;
     }
 
diff --git a/migration/trace-events b/migration/trace-events
index de2e136e57..d2d30a6b3c 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -258,8 +258,8 @@ migration_exec_outgoing(const char *cmd) "cmd=%s"
 migration_exec_incoming(const char *cmd) "cmd=%s"
 
 # fd.c
-migration_fd_outgoing(int fd) "fd=%d"
-migration_fd_incoming(int fd) "fd=%d"
+migration_fd_outgoing(int fd, int dup_fd) "fd=%d dup_fd=%d"
+migration_fd_incoming(int fd, int dup_fd) "fd=%d dup_fd=%d"
 
 # socket.c
 migration_socket_incoming_accepted(void) ""
-- 
2.21.0


Re: [Qemu-devel] [PATCH] migration: Fix handling fd protocol
Posted by no-reply@patchew.org 5 years ago
Patchew URL: https://patchew.org/QEMU/20190410092652.22616-1-yury-kotov@yandex-team.ru/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
time make docker-test-mingw@fedora SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC      migration/xbzrle.o
  CC      migration/postcopy-ram.o
/tmp/qemu-test/src/migration/fd.c: In function 'fd_start_outgoing_migration':
/tmp/qemu-test/src/migration/fd.c:40:14: error: implicit declaration of function 'qemu_dup'; did you mean 'qemu_hexdump'? [-Werror=implicit-function-declaration]
     dup_fd = qemu_dup(fd);
              ^~~~~~~~
              qemu_hexdump
/tmp/qemu-test/src/migration/fd.c:40:14: error: nested extern declaration of 'qemu_dup' [-Werror=nested-externs]
cc1: all warnings being treated as errors
make: *** [/tmp/qemu-test/src/rules.mak:69: migration/fd.o] Error 1
make: *** Waiting for unfinished jobs....


The full log is available at
http://patchew.org/logs/20190410092652.22616-1-yury-kotov@yandex-team.ru/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Re: [Qemu-devel] [PATCH] migration: Fix handling fd protocol
Posted by Dr. David Alan Gilbert 5 years ago
* Yury Kotov (yury-kotov@yandex-team.ru) wrote:
> Currently such case is possible for incoming:
> QMP: add-fd (fdset = 0, fd of some file):
>     adds fd to fdset 0 and returns QEMU's fd (e.g. 33)
> QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc
> ...
> Incoming migration completes and unrefs ioc -> close(33)
> QMP: remove-fd (fdset = 0, fd = 33):
>     removes fd from fdset 0 and qemu_close() -> close(33) => double close

Well spotted! That would very rarely cause a problem, but is a race.

> For outgoing migration the case is the same but getfd instead of add-fd.
> Fix it by duping client's fd.
> 
> Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>

Note your patch hit a problem building on windows; I don't think we have
a qemu_dup for windows.

However, I think this problem is wider than just migration.
For example, I see that dump.c also uses monitor_get_fd, and it's
dump_cleanup also does a close on the fd. So I guess it hits the same
problem?
Also, qmp.c in qmp_add_client does a close on the fd in some error cases
(I've not followed the normal case).

So perhaps all the users of monitor_get_fd are hitting this problem.

Should we be doing the dup in monitor_get_fd?

Dave


> ---
>  migration/fd.c         | 39 +++++++++++++++++++++++++++++++--------
>  migration/trace-events |  4 ++--
>  2 files changed, 33 insertions(+), 10 deletions(-)
> 
> diff --git a/migration/fd.c b/migration/fd.c
> index a7c13df4ad..c9ff07ac41 100644
> --- a/migration/fd.c
> +++ b/migration/fd.c
> @@ -15,6 +15,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qapi/error.h"
>  #include "channel.h"
>  #include "fd.h"
>  #include "migration.h"
> @@ -26,15 +27,27 @@
>  void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
>  {
>      QIOChannel *ioc;
> -    int fd = monitor_get_fd(cur_mon, fdname, errp);
> +    int fd, dup_fd;
> +
> +    fd = monitor_get_fd(cur_mon, fdname, errp);
>      if (fd == -1) {
>          return;
>      }
>  
> -    trace_migration_fd_outgoing(fd);
> -    ioc = qio_channel_new_fd(fd, errp);
> +    /* fd is previously created by qmp command 'getfd',
> +     * so client is responsible to close it. Dup it to save original value from
> +     * QIOChannel's destructor */
> +    dup_fd = qemu_dup(fd);
> +    if (dup_fd == -1) {
> +        error_setg(errp, "Cannot dup fd %s: %s (%d)", fdname, strerror(errno),
> +                   errno);
> +        return;
> +    }
> +
> +    trace_migration_fd_outgoing(fd, dup_fd);
> +    ioc = qio_channel_new_fd(dup_fd, errp);
>      if (!ioc) {
> -        close(fd);
> +        close(dup_fd);
>          return;
>      }
>  
> @@ -55,14 +68,24 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
>  void fd_start_incoming_migration(const char *infd, Error **errp)
>  {
>      QIOChannel *ioc;
> -    int fd;
> +    int fd, dup_fd;
>  
>      fd = strtol(infd, NULL, 0);
> -    trace_migration_fd_incoming(fd);
>  
> -    ioc = qio_channel_new_fd(fd, errp);
> +    /* fd is previously created by qmp command 'add-fd' or something else,
> +     * so client is responsible to close it. Dup it to save original value from
> +     * QIOChannel's destructor */
> +    dup_fd = qemu_dup(fd);
> +    if (dup_fd == -1) {
> +        error_setg(errp, "Cannot dup fd %d: %s (%d)", fd, strerror(errno),
> +                   errno);
> +        return;
> +    }
> +
> +    trace_migration_fd_incoming(fd, dup_fd);
> +    ioc = qio_channel_new_fd(dup_fd, errp);
>      if (!ioc) {
> -        close(fd);
> +        close(dup_fd);
>          return;
>      }
>  
> diff --git a/migration/trace-events b/migration/trace-events
> index de2e136e57..d2d30a6b3c 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -258,8 +258,8 @@ migration_exec_outgoing(const char *cmd) "cmd=%s"
>  migration_exec_incoming(const char *cmd) "cmd=%s"
>  
>  # fd.c
> -migration_fd_outgoing(int fd) "fd=%d"
> -migration_fd_incoming(int fd) "fd=%d"
> +migration_fd_outgoing(int fd, int dup_fd) "fd=%d dup_fd=%d"
> +migration_fd_incoming(int fd, int dup_fd) "fd=%d dup_fd=%d"
>  
>  # socket.c
>  migration_socket_incoming_accepted(void) ""
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migration: Fix handling fd protocol
Posted by Yury Kotov 5 years ago
Hi,

10.04.2019, 16:58, "Dr. David Alan Gilbert" <dgilbert@redhat.com>:
> * Yury Kotov (yury-kotov@yandex-team.ru) wrote:
>>  Currently such case is possible for incoming:
>>  QMP: add-fd (fdset = 0, fd of some file):
>>      adds fd to fdset 0 and returns QEMU's fd (e.g. 33)
>>  QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc
>>  ...
>>  Incoming migration completes and unrefs ioc -> close(33)
>>  QMP: remove-fd (fdset = 0, fd = 33):
>>      removes fd from fdset 0 and qemu_close() -> close(33) => double close
>
> Well spotted! That would very rarely cause a problem, but is a race.
>

Actually, it hits for incoming case on 2 of 50 VMs on our production...

>>  For outgoing migration the case is the same but getfd instead of add-fd.
>>  Fix it by duping client's fd.
>>
>>  Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
>
> Note your patch hit a problem building on windows; I don't think we have
> a qemu_dup for windows.
>

Oh, I see... I'll add an ifdef for this in v2.

> However, I think this problem is wider than just migration.
> For example, I see that dump.c also uses monitor_get_fd, and it's
> dump_cleanup also does a close on the fd. So I guess it hits the same
> problem?
> Also, qmp.c in qmp_add_client does a close on the fd in some error cases
> (I've not followed the normal case).
>
> So perhaps all the users of monitor_get_fd are hitting this problem.
>
> Should we be doing the dup in monitor_get_fd?
>

Hmm, it sounds reasonable but Windows's case will remain broken.
And also using fd from fdset without qemu_open will remain broken.

Another way to fix them:
1) Add searching of monitor's fds and not duped fdset's fds to qemu_close
2) Replace broken closes to qemu_close

Regards,
Yury Kotov

> Dave
>
>>  ---
>>   migration/fd.c | 39 +++++++++++++++++++++++++++++++--------
>>   migration/trace-events | 4 ++--
>>   2 files changed, 33 insertions(+), 10 deletions(-)
>>
>>  diff --git a/migration/fd.c b/migration/fd.c
>>  index a7c13df4ad..c9ff07ac41 100644
>>  --- a/migration/fd.c
>>  +++ b/migration/fd.c
>>  @@ -15,6 +15,7 @@
>>    */
>>
>>   #include "qemu/osdep.h"
>>  +#include "qapi/error.h"
>>   #include "channel.h"
>>   #include "fd.h"
>>   #include "migration.h"
>>  @@ -26,15 +27,27 @@
>>   void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
>>   {
>>       QIOChannel *ioc;
>>  - int fd = monitor_get_fd(cur_mon, fdname, errp);
>>  + int fd, dup_fd;
>>  +
>>  + fd = monitor_get_fd(cur_mon, fdname, errp);
>>       if (fd == -1) {
>>           return;
>>       }
>>
>>  - trace_migration_fd_outgoing(fd);
>>  - ioc = qio_channel_new_fd(fd, errp);
>>  + /* fd is previously created by qmp command 'getfd',
>>  + * so client is responsible to close it. Dup it to save original value from
>>  + * QIOChannel's destructor */
>>  + dup_fd = qemu_dup(fd);
>>  + if (dup_fd == -1) {
>>  + error_setg(errp, "Cannot dup fd %s: %s (%d)", fdname, strerror(errno),
>>  + errno);
>>  + return;
>>  + }
>>  +
>>  + trace_migration_fd_outgoing(fd, dup_fd);
>>  + ioc = qio_channel_new_fd(dup_fd, errp);
>>       if (!ioc) {
>>  - close(fd);
>>  + close(dup_fd);
>>           return;
>>       }
>>
>>  @@ -55,14 +68,24 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
>>   void fd_start_incoming_migration(const char *infd, Error **errp)
>>   {
>>       QIOChannel *ioc;
>>  - int fd;
>>  + int fd, dup_fd;
>>
>>       fd = strtol(infd, NULL, 0);
>>  - trace_migration_fd_incoming(fd);
>>
>>  - ioc = qio_channel_new_fd(fd, errp);
>>  + /* fd is previously created by qmp command 'add-fd' or something else,
>>  + * so client is responsible to close it. Dup it to save original value from
>>  + * QIOChannel's destructor */
>>  + dup_fd = qemu_dup(fd);
>>  + if (dup_fd == -1) {
>>  + error_setg(errp, "Cannot dup fd %d: %s (%d)", fd, strerror(errno),
>>  + errno);
>>  + return;
>>  + }
>>  +
>>  + trace_migration_fd_incoming(fd, dup_fd);
>>  + ioc = qio_channel_new_fd(dup_fd, errp);
>>       if (!ioc) {
>>  - close(fd);
>>  + close(dup_fd);
>>           return;
>>       }
>>
>>  diff --git a/migration/trace-events b/migration/trace-events
>>  index de2e136e57..d2d30a6b3c 100644
>>  --- a/migration/trace-events
>>  +++ b/migration/trace-events
>>  @@ -258,8 +258,8 @@ migration_exec_outgoing(const char *cmd) "cmd=%s"
>>   migration_exec_incoming(const char *cmd) "cmd=%s"
>>
>>   # fd.c
>>  -migration_fd_outgoing(int fd) "fd=%d"
>>  -migration_fd_incoming(int fd) "fd=%d"
>>  +migration_fd_outgoing(int fd, int dup_fd) "fd=%d dup_fd=%d"
>>  +migration_fd_incoming(int fd, int dup_fd) "fd=%d dup_fd=%d"
>>
>>   # socket.c
>>   migration_socket_incoming_accepted(void) ""
>>  --
>>  2.21.0
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migration: Fix handling fd protocol
Posted by Daniel P. Berrangé 5 years ago
On Wed, Apr 10, 2019 at 12:26:52PM +0300, Yury Kotov wrote:
> Currently such case is possible for incoming:
> QMP: add-fd (fdset = 0, fd of some file):
>     adds fd to fdset 0 and returns QEMU's fd (e.g. 33)
> QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc
> ...
> Incoming migration completes and unrefs ioc -> close(33)
> QMP: remove-fd (fdset = 0, fd = 33):
>     removes fd from fdset 0 and qemu_close() -> close(33) => double close

IMHO this is user error.

You've given the FD to QEMU and told it to use it for migration.

Once you've done that you should not be calling remove-fd.

remove-fd should only be used in some error code path. ie if you
have called  add-fd, and then some failure means you've decided you
can't invoke 'migrate-incoming'. Now you should call "remove-fd".

AFAIK, we have never documented that 'add-fd' must be paired
with 'remove-fd'.

IOW, I think this "fix" is in fact not a fix - it is instead
changing the semantics of when "remove-fd" should be used.

> For outgoing migration the case is the same but getfd instead of add-fd.
> Fix it by duping client's fd.
> 
> Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
> ---
>  migration/fd.c         | 39 +++++++++++++++++++++++++++++++--------
>  migration/trace-events |  4 ++--
>  2 files changed, 33 insertions(+), 10 deletions(-)
> 
> diff --git a/migration/fd.c b/migration/fd.c
> index a7c13df4ad..c9ff07ac41 100644
> --- a/migration/fd.c
> +++ b/migration/fd.c
> @@ -15,6 +15,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qapi/error.h"
>  #include "channel.h"
>  #include "fd.h"
>  #include "migration.h"
> @@ -26,15 +27,27 @@
>  void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
>  {
>      QIOChannel *ioc;
> -    int fd = monitor_get_fd(cur_mon, fdname, errp);
> +    int fd, dup_fd;
> +
> +    fd = monitor_get_fd(cur_mon, fdname, errp);
>      if (fd == -1) {
>          return;
>      }
>  
> -    trace_migration_fd_outgoing(fd);
> -    ioc = qio_channel_new_fd(fd, errp);
> +    /* fd is previously created by qmp command 'getfd',
> +     * so client is responsible to close it. Dup it to save original value from
> +     * QIOChannel's destructor */
> +    dup_fd = qemu_dup(fd);
> +    if (dup_fd == -1) {
> +        error_setg(errp, "Cannot dup fd %s: %s (%d)", fdname, strerror(errno),
> +                   errno);
> +        return;
> +    }
> +
> +    trace_migration_fd_outgoing(fd, dup_fd);
> +    ioc = qio_channel_new_fd(dup_fd, errp);
>      if (!ioc) {
> -        close(fd);
> +        close(dup_fd);
>          return;
>      }
>  
> @@ -55,14 +68,24 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
>  void fd_start_incoming_migration(const char *infd, Error **errp)
>  {
>      QIOChannel *ioc;
> -    int fd;
> +    int fd, dup_fd;
>  
>      fd = strtol(infd, NULL, 0);
> -    trace_migration_fd_incoming(fd);
>  
> -    ioc = qio_channel_new_fd(fd, errp);
> +    /* fd is previously created by qmp command 'add-fd' or something else,
> +     * so client is responsible to close it. Dup it to save original value from
> +     * QIOChannel's destructor */
> +    dup_fd = qemu_dup(fd);
> +    if (dup_fd == -1) {
> +        error_setg(errp, "Cannot dup fd %d: %s (%d)", fd, strerror(errno),
> +                   errno);
> +        return;
> +    }
> +
> +    trace_migration_fd_incoming(fd, dup_fd);
> +    ioc = qio_channel_new_fd(dup_fd, errp);
>      if (!ioc) {
> -        close(fd);
> +        close(dup_fd);
>          return;
>      }
>  
> diff --git a/migration/trace-events b/migration/trace-events
> index de2e136e57..d2d30a6b3c 100644
> --- a/migration/trace-events
> +++ b/migration/trace-events
> @@ -258,8 +258,8 @@ migration_exec_outgoing(const char *cmd) "cmd=%s"
>  migration_exec_incoming(const char *cmd) "cmd=%s"
>  
>  # fd.c
> -migration_fd_outgoing(int fd) "fd=%d"
> -migration_fd_incoming(int fd) "fd=%d"
> +migration_fd_outgoing(int fd, int dup_fd) "fd=%d dup_fd=%d"
> +migration_fd_incoming(int fd, int dup_fd) "fd=%d dup_fd=%d"
>  
>  # socket.c
>  migration_socket_incoming_accepted(void) ""
> -- 
> 2.21.0
> 
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] migration: Fix handling fd protocol
Posted by Yury Kotov 5 years ago
11.04.2019, 15:04, "Daniel P. Berrangé" <berrange@redhat.com>:
> On Wed, Apr 10, 2019 at 12:26:52PM +0300, Yury Kotov wrote:
>>  Currently such case is possible for incoming:
>>  QMP: add-fd (fdset = 0, fd of some file):
>>      adds fd to fdset 0 and returns QEMU's fd (e.g. 33)
>>  QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc
>>  ...
>>  Incoming migration completes and unrefs ioc -> close(33)
>>  QMP: remove-fd (fdset = 0, fd = 33):
>>      removes fd from fdset 0 and qemu_close() -> close(33) => double close
>
> IMHO this is user error.
>
> You've given the FD to QEMU and told it to use it for migration.
>
> Once you've done that you should not be calling remove-fd.
>
> remove-fd should only be used in some error code path. ie if you
> have called add-fd, and then some failure means you've decided you
> can't invoke 'migrate-incoming'. Now you should call "remove-fd".
>
> AFAIK, we have never documented that 'add-fd' must be paired
> with 'remove-fd'.
>
> IOW, I think this "fix" is in fact not a fix - it is instead
> changing the semantics of when "remove-fd" should be used.
>

I think it might be user's error but fd is still in cur_mon->fds (in getfd case)
or in mon_fdsets (in add-fd case). So if fd is still exists in the lists why
user can't close/remove it?

So, there are 2 valid options:
1) fd is still valid after migration (dup)
2) fd is closed but also removed from the appropriate list

Regards,
Yury

>>  For outgoing migration the case is the same but getfd instead of add-fd.
>>  Fix it by duping client's fd.
>>
>>  Signed-off-by: Yury Kotov <yury-kotov@yandex-team.ru>
>>  ---
>>   migration/fd.c | 39 +++++++++++++++++++++++++++++++--------
>>   migration/trace-events | 4 ++--
>>   2 files changed, 33 insertions(+), 10 deletions(-)
>>
>>  diff --git a/migration/fd.c b/migration/fd.c
>>  index a7c13df4ad..c9ff07ac41 100644
>>  --- a/migration/fd.c
>>  +++ b/migration/fd.c
>>  @@ -15,6 +15,7 @@
>>    */
>>
>>   #include "qemu/osdep.h"
>>  +#include "qapi/error.h"
>>   #include "channel.h"
>>   #include "fd.h"
>>   #include "migration.h"
>>  @@ -26,15 +27,27 @@
>>   void fd_start_outgoing_migration(MigrationState *s, const char *fdname, Error **errp)
>>   {
>>       QIOChannel *ioc;
>>  - int fd = monitor_get_fd(cur_mon, fdname, errp);
>>  + int fd, dup_fd;
>>  +
>>  + fd = monitor_get_fd(cur_mon, fdname, errp);
>>       if (fd == -1) {
>>           return;
>>       }
>>
>>  - trace_migration_fd_outgoing(fd);
>>  - ioc = qio_channel_new_fd(fd, errp);
>>  + /* fd is previously created by qmp command 'getfd',
>>  + * so client is responsible to close it. Dup it to save original value from
>>  + * QIOChannel's destructor */
>>  + dup_fd = qemu_dup(fd);
>>  + if (dup_fd == -1) {
>>  + error_setg(errp, "Cannot dup fd %s: %s (%d)", fdname, strerror(errno),
>>  + errno);
>>  + return;
>>  + }
>>  +
>>  + trace_migration_fd_outgoing(fd, dup_fd);
>>  + ioc = qio_channel_new_fd(dup_fd, errp);
>>       if (!ioc) {
>>  - close(fd);
>>  + close(dup_fd);
>>           return;
>>       }
>>
>>  @@ -55,14 +68,24 @@ static gboolean fd_accept_incoming_migration(QIOChannel *ioc,
>>   void fd_start_incoming_migration(const char *infd, Error **errp)
>>   {
>>       QIOChannel *ioc;
>>  - int fd;
>>  + int fd, dup_fd;
>>
>>       fd = strtol(infd, NULL, 0);
>>  - trace_migration_fd_incoming(fd);
>>
>>  - ioc = qio_channel_new_fd(fd, errp);
>>  + /* fd is previously created by qmp command 'add-fd' or something else,
>>  + * so client is responsible to close it. Dup it to save original value from
>>  + * QIOChannel's destructor */
>>  + dup_fd = qemu_dup(fd);
>>  + if (dup_fd == -1) {
>>  + error_setg(errp, "Cannot dup fd %d: %s (%d)", fd, strerror(errno),
>>  + errno);
>>  + return;
>>  + }
>>  +
>>  + trace_migration_fd_incoming(fd, dup_fd);
>>  + ioc = qio_channel_new_fd(dup_fd, errp);
>>       if (!ioc) {
>>  - close(fd);
>>  + close(dup_fd);
>>           return;
>>       }
>>
>>  diff --git a/migration/trace-events b/migration/trace-events
>>  index de2e136e57..d2d30a6b3c 100644
>>  --- a/migration/trace-events
>>  +++ b/migration/trace-events
>>  @@ -258,8 +258,8 @@ migration_exec_outgoing(const char *cmd) "cmd=%s"
>>   migration_exec_incoming(const char *cmd) "cmd=%s"
>>
>>   # fd.c
>>  -migration_fd_outgoing(int fd) "fd=%d"
>>  -migration_fd_incoming(int fd) "fd=%d"
>>  +migration_fd_outgoing(int fd, int dup_fd) "fd=%d dup_fd=%d"
>>  +migration_fd_incoming(int fd, int dup_fd) "fd=%d dup_fd=%d"
>>
>>   # socket.c
>>   migration_socket_incoming_accepted(void) ""
>>  --
>>  2.21.0
>
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] migration: Fix handling fd protocol
Posted by Daniel P. Berrangé 5 years ago
On Thu, Apr 11, 2019 at 03:31:43PM +0300, Yury Kotov wrote:
> 11.04.2019, 15:04, "Daniel P. Berrangé" <berrange@redhat.com>:
> > On Wed, Apr 10, 2019 at 12:26:52PM +0300, Yury Kotov wrote:
> >>  Currently such case is possible for incoming:
> >>  QMP: add-fd (fdset = 0, fd of some file):
> >>      adds fd to fdset 0 and returns QEMU's fd (e.g. 33)
> >>  QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc
> >>  ...
> >>  Incoming migration completes and unrefs ioc -> close(33)
> >>  QMP: remove-fd (fdset = 0, fd = 33):
> >>      removes fd from fdset 0 and qemu_close() -> close(33) => double close
> >
> > IMHO this is user error.
> >
> > You've given the FD to QEMU and told it to use it for migration.
> >
> > Once you've done that you should not be calling remove-fd.
> >
> > remove-fd should only be used in some error code path. ie if you
> > have called add-fd, and then some failure means you've decided you
> > can't invoke 'migrate-incoming'. Now you should call "remove-fd".
> >
> > AFAIK, we have never documented that 'add-fd' must be paired
> > with 'remove-fd'.
> >
> > IOW, I think this "fix" is in fact not a fix - it is instead
> > changing the semantics of when "remove-fd" should be used.
> >
> 
> I think it might be user's error but fd is still in cur_mon->fds (in getfd case)
> or in mon_fdsets (in add-fd case). So if fd is still exists in the lists why
> user can't close/remove it?
> 
> So, there are 2 valid options:
> 1) fd is still valid after migration (dup)

If we do this then existing mgmt apps using QEMU who don't expect to need
to call remove-fd are going to be leaking FDs forever.

> 2) fd is closed but also removed from the appropriate list

monitor_get_fd currently leaves the FD on the list.

if all current users of that API are closing the FD
themselves, then we could change that method to remove
it from the list.

Either way the requirements in this area are pooly documented
both from QEMU's internal POV and from mgmt app public QMP pov.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] migration: Fix handling fd protocol
Posted by Yury Kotov 5 years ago
11.04.2019, 15:39, "Daniel P. Berrangé" <berrange@redhat.com>:
> On Thu, Apr 11, 2019 at 03:31:43PM +0300, Yury Kotov wrote:
>>  11.04.2019, 15:04, "Daniel P. Berrangé" <berrange@redhat.com>:
>>  > On Wed, Apr 10, 2019 at 12:26:52PM +0300, Yury Kotov wrote:
>>  >>  Currently such case is possible for incoming:
>>  >>  QMP: add-fd (fdset = 0, fd of some file):
>>  >>      adds fd to fdset 0 and returns QEMU's fd (e.g. 33)
>>  >>  QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc
>>  >>  ...
>>  >>  Incoming migration completes and unrefs ioc -> close(33)
>>  >>  QMP: remove-fd (fdset = 0, fd = 33):
>>  >>      removes fd from fdset 0 and qemu_close() -> close(33) => double close
>>  >
>>  > IMHO this is user error.
>>  >
>>  > You've given the FD to QEMU and told it to use it for migration.
>>  >
>>  > Once you've done that you should not be calling remove-fd.
>>  >
>>  > remove-fd should only be used in some error code path. ie if you
>>  > have called add-fd, and then some failure means you've decided you
>>  > can't invoke 'migrate-incoming'. Now you should call "remove-fd".
>>  >
>>  > AFAIK, we have never documented that 'add-fd' must be paired
>>  > with 'remove-fd'.
>>  >
>>  > IOW, I think this "fix" is in fact not a fix - it is instead
>>  > changing the semantics of when "remove-fd" should be used.
>>  >
>>
>>  I think it might be user's error but fd is still in cur_mon->fds (in getfd case)
>>  or in mon_fdsets (in add-fd case). So if fd is still exists in the lists why
>>  user can't close/remove it?
>>
>>  So, there are 2 valid options:
>>  1) fd is still valid after migration (dup)
>
> If we do this then existing mgmt apps using QEMU who don't expect to need
> to call remove-fd are going to be leaking FDs forever.
>

Ok, so what do you think about modifying qemu_close to remove fd from these two
lists? Currently it tryes to remove fd only from mon_fdsets->dup_fds.

Regards,
Yury

>>  2) fd is closed but also removed from the appropriate list
>
> monitor_get_fd currently leaves the FD on the list.
>
> if all current users of that API are closing the FD
> themselves, then we could change that method to remove
> it from the list.
>
> Either way the requirements in this area are pooly documented
> both from QEMU's internal POV and from mgmt app public QMP pov.
>
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] migration: Fix handling fd protocol
Posted by Yury Kotov 5 years ago
Hi,

Just to clarify. I see two possible solutions:

1) Since the migration code doesn't receive fd, it isn't responsible for
closing it. So, it may be better to use migrate_fd_param for both
incoming/outgoing and add dupping for migrate_fd_param. Thus, clients must
close the fd themselves. But existing clients will have a leak.

2) If we don't duplicate fd, then at least we should remove fd from
the corresponding list. Therefore, the solution is to fix qemu_close to find
the list and remove fd from it. But qemu_close is currently consistent with
qemu_open (which opens/dups fd), so adding additional logic might not be
a very good idea.

I don't see any other solution, but I might miss something.
What do you think?

Regards,
Yury

11.04.2019, 15:58, "Yury Kotov" <yury-kotov@yandex-team.ru>:
> 11.04.2019, 15:39, "Daniel P. Berrangé" <berrange@redhat.com>:
>>  On Thu, Apr 11, 2019 at 03:31:43PM +0300, Yury Kotov wrote:
>>>   11.04.2019, 15:04, "Daniel P. Berrangé" <berrange@redhat.com>:
>>>   > On Wed, Apr 10, 2019 at 12:26:52PM +0300, Yury Kotov wrote:
>>>   >>  Currently such case is possible for incoming:
>>>   >>  QMP: add-fd (fdset = 0, fd of some file):
>>>   >>      adds fd to fdset 0 and returns QEMU's fd (e.g. 33)
>>>   >>  QMP: migrate-incoming (uri = "fd:33"): fd is stored in QIOChannel *ioc
>>>   >>  ...
>>>   >>  Incoming migration completes and unrefs ioc -> close(33)
>>>   >>  QMP: remove-fd (fdset = 0, fd = 33):
>>>   >>      removes fd from fdset 0 and qemu_close() -> close(33) => double close
>>>   >
>>>   > IMHO this is user error.
>>>   >
>>>   > You've given the FD to QEMU and told it to use it for migration.
>>>   >
>>>   > Once you've done that you should not be calling remove-fd.
>>>   >
>>>   > remove-fd should only be used in some error code path. ie if you
>>>   > have called add-fd, and then some failure means you've decided you
>>>   > can't invoke 'migrate-incoming'. Now you should call "remove-fd".
>>>   >
>>>   > AFAIK, we have never documented that 'add-fd' must be paired
>>>   > with 'remove-fd'.
>>>   >
>>>   > IOW, I think this "fix" is in fact not a fix - it is instead
>>>   > changing the semantics of when "remove-fd" should be used.
>>>   >
>>>
>>>   I think it might be user's error but fd is still in cur_mon->fds (in getfd case)
>>>   or in mon_fdsets (in add-fd case). So if fd is still exists in the lists why
>>>   user can't close/remove it?
>>>
>>>   So, there are 2 valid options:
>>>   1) fd is still valid after migration (dup)
>>
>>  If we do this then existing mgmt apps using QEMU who don't expect to need
>>  to call remove-fd are going to be leaking FDs forever.
>
> Ok, so what do you think about modifying qemu_close to remove fd from these two
> lists? Currently it tryes to remove fd only from mon_fdsets->dup_fds.
>
> Regards,
> Yury
>
>>>   2) fd is closed but also removed from the appropriate list
>>
>>  monitor_get_fd currently leaves the FD on the list.
>>
>>  if all current users of that API are closing the FD
>>  themselves, then we could change that method to remove
>>  it from the list.
>>
>>  Either way the requirements in this area are pooly documented
>>  both from QEMU's internal POV and from mgmt app public QMP pov.
>>
>>  Regards,
>>  Daniel
>>  --
>>  |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
>>  |: https://libvirt.org -o- https://fstop138.berrange.com :|
>>  |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] migration: Fix handling fd protocol
Posted by Daniel P. Berrangé 5 years ago
On Mon, Apr 15, 2019 at 12:50:08PM +0300, Yury Kotov wrote:
> Hi,
> 
> Just to clarify. I see two possible solutions:
> 
> 1) Since the migration code doesn't receive fd, it isn't responsible for
> closing it. So, it may be better to use migrate_fd_param for both
> incoming/outgoing and add dupping for migrate_fd_param. Thus, clients must
> close the fd themselves. But existing clients will have a leak.

We can't break existing clients in this way as they are correctly
using the monitor with its current semantics.

> 2) If we don't duplicate fd, then at least we should remove fd from
> the corresponding list. Therefore, the solution is to fix qemu_close to find
> the list and remove fd from it. But qemu_close is currently consistent with
> qemu_open (which opens/dups fd), so adding additional logic might not be
> a very good idea.

qemu_close is not appropriate place to deal with something speciifc
to the montor.

> I don't see any other solution, but I might miss something.
> What do you think?

All callers of monitor_get_fd() will close() the FD they get back.
Thus monitor_get_fd() should remove it from the list when it returns
it, and we should add API docs to monitor_get_fd() to explain this.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [Qemu-devel] [PATCH] migration: Fix handling fd protocol
Posted by Yury Kotov 5 years ago
15.04.2019, 13:11, "Daniel P. Berrangé" <berrange@redhat.com>:
> On Mon, Apr 15, 2019 at 12:50:08PM +0300, Yury Kotov wrote:
>>  Hi,
>>
>>  Just to clarify. I see two possible solutions:
>>
>>  1) Since the migration code doesn't receive fd, it isn't responsible for
>>  closing it. So, it may be better to use migrate_fd_param for both
>>  incoming/outgoing and add dupping for migrate_fd_param. Thus, clients must
>>  close the fd themselves. But existing clients will have a leak.
>
> We can't break existing clients in this way as they are correctly
> using the monitor with its current semantics.
>
>>  2) If we don't duplicate fd, then at least we should remove fd from
>>  the corresponding list. Therefore, the solution is to fix qemu_close to find
>>  the list and remove fd from it. But qemu_close is currently consistent with
>>  qemu_open (which opens/dups fd), so adding additional logic might not be
>>  a very good idea.
>
> qemu_close is not appropriate place to deal with something speciifc
> to the montor.
>
>>  I don't see any other solution, but I might miss something.
>>  What do you think?
>
> All callers of monitor_get_fd() will close() the FD they get back.
> Thus monitor_get_fd() should remove it from the list when it returns
> it, and we should add API docs to monitor_get_fd() to explain this.
>
Ok, it sounds reasonable. But monitor_get_fd is only about outgoing migration.
But what about the incoming migration? It doesn't use monitor_get_fd but just
converts input string to int and use it as fd.

Regards,
Yury


Re: [Qemu-devel] [PATCH] migration: Fix handling fd protocol
Posted by Yury Kotov 5 years ago
15.04.2019, 13:17, "Yury Kotov" <yury-kotov@yandex-team.ru>:
> 15.04.2019, 13:11, "Daniel P. Berrangé" <berrange@redhat.com>:
>>  On Mon, Apr 15, 2019 at 12:50:08PM +0300, Yury Kotov wrote:
>>>   Hi,
>>>
>>>   Just to clarify. I see two possible solutions:
>>>
>>>   1) Since the migration code doesn't receive fd, it isn't responsible for
>>>   closing it. So, it may be better to use migrate_fd_param for both
>>>   incoming/outgoing and add dupping for migrate_fd_param. Thus, clients must
>>>   close the fd themselves. But existing clients will have a leak.
>>
>>  We can't break existing clients in this way as they are correctly
>>  using the monitor with its current semantics.
>>
>>>   2) If we don't duplicate fd, then at least we should remove fd from
>>>   the corresponding list. Therefore, the solution is to fix qemu_close to find
>>>   the list and remove fd from it. But qemu_close is currently consistent with
>>>   qemu_open (which opens/dups fd), so adding additional logic might not be
>>>   a very good idea.
>>
>>  qemu_close is not appropriate place to deal with something speciifc
>>  to the montor.
>>

But qemu_close is already deal with monitor:
It uses monitor_fdset_dup_fd_find & monitor_fdset_dup_fd_remove to find and
remove fd from monitor's dup_fds list.

>>>   I don't see any other solution, but I might miss something.
>>>   What do you think?
>>
>>  All callers of monitor_get_fd() will close() the FD they get back.
>>  Thus monitor_get_fd() should remove it from the list when it returns
>>  it, and we should add API docs to monitor_get_fd() to explain this.
>
> Ok, it sounds reasonable. But monitor_get_fd is only about outgoing migration.
> But what about the incoming migration? It doesn't use monitor_get_fd but just
> converts input string to int and use it as fd.
>

Regards,
Yury

Re: [Qemu-devel] [PATCH] migration: Fix handling fd protocol
Posted by Daniel P. Berrangé 5 years ago
On Mon, Apr 15, 2019 at 01:17:06PM +0300, Yury Kotov wrote:
> 15.04.2019, 13:11, "Daniel P. Berrangé" <berrange@redhat.com>:
> > On Mon, Apr 15, 2019 at 12:50:08PM +0300, Yury Kotov wrote:
> >>  Hi,
> >>
> >>  Just to clarify. I see two possible solutions:
> >>
> >>  1) Since the migration code doesn't receive fd, it isn't responsible for
> >>  closing it. So, it may be better to use migrate_fd_param for both
> >>  incoming/outgoing and add dupping for migrate_fd_param. Thus, clients must
> >>  close the fd themselves. But existing clients will have a leak.
> >
> > We can't break existing clients in this way as they are correctly
> > using the monitor with its current semantics.
> >
> >>  2) If we don't duplicate fd, then at least we should remove fd from
> >>  the corresponding list. Therefore, the solution is to fix qemu_close to find
> >>  the list and remove fd from it. But qemu_close is currently consistent with
> >>  qemu_open (which opens/dups fd), so adding additional logic might not be
> >>  a very good idea.
> >
> > qemu_close is not appropriate place to deal with something speciifc
> > to the montor.
> >
> >>  I don't see any other solution, but I might miss something.
> >>  What do you think?
> >
> > All callers of monitor_get_fd() will close() the FD they get back.
> > Thus monitor_get_fd() should remove it from the list when it returns
> > it, and we should add API docs to monitor_get_fd() to explain this.
> >
> Ok, it sounds reasonable. But monitor_get_fd is only about outgoing migration.
> But what about the incoming migration? It doesn't use monitor_get_fd but just
> converts input string to int and use it as fd.

The incoming migration expects the FD to be passed into QEMU by the mgmt
app when it is exec'ing the QEMU binary. It doesn't interact with the
monitor at all AFAIR.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|