[PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case

Rao, Lei posted 1 patch 2 years, 4 months ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20211201075427.155702-1-lei.rao@intel.com
Maintainers: Eric Blake <eblake@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>, "Daniel P. Berrangé" <berrange@redhat.com>, Hanna Reitz <hreitz@redhat.com>, Kevin Wolf <kwolf@redhat.com>
block/nbd.c          |  5 +++++
include/io/channel.h | 19 +++++++++++++++++++
io/channel.c         | 22 ++++++++++++++++++++++
3 files changed, 46 insertions(+)
[PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
Posted by Rao, Lei 2 years, 4 months ago
    We found that the QIO channel coroutine could not be awakened in some corner cases during our stress test for COLO.
    The backtrace of the hang is as follows:
        #0  0x00007fad72e24bf6 in __ppoll (fds=0x5563d75861f0, nfds=1, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
        #1  0x00005563d6977c68 in qemu_poll_ns (fds=0x5563d75861f0, nfds=1, timeout=599999697630) at util/qemu-timer.c:348
        #2  0x00005563d697ac44 in aio_poll (ctx=0x5563d755dfa0, blocking=true) at util/aio-posix.c:669
        #3  0x00005563d68ba24f in bdrv_do_drained_begin (bs=0x5563d75ea0c0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:432
        #4  0x00005563d68ba338 in bdrv_drained_begin (bs=0x5563d75ea0c0) at block/io.c:438
        #5  0x00005563d689c6d2 in quorum_del_child (bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block/quorum.c:1063
        #6  0x00005563d684328f in bdrv_del_child (parent_bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block.c:6568
        #7  0x00005563d658499e in qmp_x_blockdev_change (parent=0x5563d80ec4c0 "colo-disk0", has_child=true, child=0x5563d7577d30 "children.1", has_node=false, node=0x0,errp=0x7fff3cf5b960) at blockdev.c:4494
        #8  0x00005563d67e8b4e in qmp_marshal_x_blockdev_change (args=0x7fad6400ada0, ret=0x7fff3cf5b9f8, errp=0x7fff3cf5b9f0) at qapi/qapi-commands-block-core.c:1538
        #9  0x00005563d691cd9e in do_qmp_dispatch (cmds=0x5563d719a4f0 <qmp_commands>, request=0x7fad64009d80, allow_oob=true, errp=0x7fff3cf5ba98)
            at qapi/qmp-dispatch.c:132
        #10 0x00005563d691cfab in qmp_dispatch (cmds=0x5563d719a4f0 <qmp_commands>, request=0x7fad64009d80, allow_oob=true) at qapi/qmp-dispatch.c:175
        #11 0x00005563d67b7787 in monitor_qmp_dispatch (mon=0x5563d7605d40, req=0x7fad64009d80) at monitor/qmp.c:145
        #12 0x00005563d67b7b24 in monitor_qmp_bh_dispatcher (data=0x0) at monitor/qmp.c:234
        #13 0x00005563d69754c2 in aio_bh_call (bh=0x5563d7473230) at util/async.c:89
        #14 0x00005563d697555e in aio_bh_poll (ctx=0x5563d7471da0) at util/async.c:117
        #15 0x00005563d697a423 in aio_dispatch (ctx=0x5563d7471da0) at util/aio-posix.c:459
        #16 0x00005563d6975917 in aio_ctx_dispatch (source=0x5563d7471da0, callback=0x0, user_data=0x0) at util/async.c:260
        #17 0x00007fad730e1fbd in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
        #18 0x00005563d6978cda in glib_pollfds_poll () at util/main-loop.c:219
        #19 0x00005563d6978d58 in os_host_main_loop_wait (timeout=977650) at util/main-loop.c:242
        #20 0x00005563d6978e69 in main_loop_wait (nonblocking=0) at util/main-loop.c:518
        #21 0x00005563d658f5ed in main_loop () at vl.c:1814
        #22 0x00005563d6596bb7 in main (argc=61, argv=0x7fff3cf5c0c8, envp=0x7fff3cf5c2b8) at vl.c:450

    From the call trace, we can see that the QEMU main thread is waiting for in_flight to return to zero, but in_flight never reaches 0.
    After analysis, the root cause is that the NBD coroutine was not awakened. Although this bug was found in the COLO test, I think it is a
    general problem in the QIO module that also affects other modules which depend on QIO, such as NBD. We dumped the following data:
    $2 = {
      in_flight = 2,
      state = NBD_CLIENT_QUIT,
      connect_status = 0,
      connect_err = 0x0,
      wait_in_flight = false,
      requests = {{
          coroutine = 0x0,
          offset = 273077071872,
          receiving = false,
        }, {
          coroutine = 0x7f1164571df0,
          offset = 498792529920,
          receiving = false,
        }, {
          coroutine = 0x0,
          offset = 500663590912,
          receiving = false,
        }, {
          ...
        } },
      reply = {
        simple = {
          magic = 1732535960,
          error = 0,
          handle = 0
        },
        structured = {
          magic = 1732535960,
          flags = 0,
          type = 0,
          handle = 0,
          length = 0
        },
        {
          magic = 1732535960,
          _skip = 0,
          handle = 0
        }
      },
      bs = 0x7f11640e2140,
      reconnect_delay = 0,
      saddr = 0x7f11640e1f80,
      export = 0x7f11640dc560 "parent0",
    }
    From the data, we can see that the NBD coroutine does not exit normally when the NBD server is killed (we kill the Secondary VM in the COLO stress test).
    It therefore never executes in_flight--, so the QEMU main thread hangs here. On further analysis, I found that the NBD coroutine yields
    in nbd_send_request() or qio_channel_write_all() called from nbd_co_send_request(); the yield happens because the kernel returns EAGAIN (under the stress test).
    However, NBD does not know that it yielded there, so nbd_recv_coroutines_wake won't wake it up because req->receiving is false. If the NBD server
    is terminated abnormally at the same time, the G_IO_OUT event will be invalid and the coroutine will never be awakened. In addition, s->ioc will be released
    immediately; if we forcibly wake up the NBD coroutine, accessing ioc->xxx will cause a segmentation fault. Therefore, I add a flag named force_quit to
    the QIOChannel to ensure that QIO exits immediately, and I add the function qio_channel_coroutines_wake to wake the coroutine up.
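    For reference, the wake-up path that misses this coroutine looks roughly like the following (a simplified sketch of the logic
    described above; the exact code in block/nbd.c may differ in detail):

        static void nbd_recv_coroutines_wake(BDRVNBDState *s)
        {
            int i;

            for (i = 0; i < MAX_NBD_REQUESTS; i++) {
                NBDClientRequest *req = &s->requests[i];

                /* Only coroutines parked while receiving a reply are woken here.
                 * A coroutine that yielded inside qio_channel_writev_full_all()
                 * waiting for G_IO_OUT still has req->receiving == false, so it
                 * is skipped and stays asleep once the server is gone. */
                if (req->coroutine && req->receiving) {
                    req->receiving = false;
                    aio_co_wake(req->coroutine);
                }
            }
        }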

Signed-off-by: Lei Rao <lei.rao@intel.com>
Signed-off-by: Zhang Chen <chen.zhang@intel.com>
---
 block/nbd.c          |  5 +++++
 include/io/channel.h | 19 +++++++++++++++++++
 io/channel.c         | 22 ++++++++++++++++++++++
 3 files changed, 46 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 5ef462db1b..5ee4eaaf57 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -208,6 +208,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
     assert(!s->in_flight);
 
     if (s->ioc) {
+        qio_channel_set_force_quit(s->ioc, true);
+        qio_channel_coroutines_wake(s->ioc);
         qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
         yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
                                  nbd_yank, s->bs);
@@ -315,6 +317,7 @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
 
     /* successfully connected */
     s->state = NBD_CLIENT_CONNECTED;
+    qio_channel_set_force_quit(s->ioc, false);
     qemu_co_queue_restart_all(&s->free_sema);
 
     return 0;
@@ -345,6 +348,8 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
 
     /* Finalize previous connection if any */
     if (s->ioc) {
+        qio_channel_set_force_quit(s->ioc, true);
+        qio_channel_coroutines_wake(s->ioc);
         qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
         yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
                                  nbd_yank, s->bs);
diff --git a/include/io/channel.h b/include/io/channel.h
index 88988979f8..bc5af8cdd6 100644
--- a/include/io/channel.h
+++ b/include/io/channel.h
@@ -78,6 +78,7 @@ struct QIOChannel {
     AioContext *ctx;
     Coroutine *read_coroutine;
     Coroutine *write_coroutine;
+    bool force_quit;
 #ifdef _WIN32
     HANDLE event; /* For use with GSource on Win32 */
 #endif
@@ -484,6 +485,24 @@ int qio_channel_set_blocking(QIOChannel *ioc,
                              bool enabled,
                              Error **errp);
 
+/**
+ * qio_channel_set_force_quit:
+ * @ioc: the channel object
+ * @quit: the new flag state
+ *
+ * Set the force_quit flag on the channel
+ */
+void qio_channel_set_force_quit(QIOChannel *ioc, bool quit);
+
+/**
+ * qio_channel_coroutines_wake:
+ * @ioc: the channel object
+ *
+ * Wake up the channel's coroutines so that they can exit cleanly
+ * when the peer has terminated abnormally
+ */
+void qio_channel_coroutines_wake(QIOChannel *ioc);
+
 /**
  * qio_channel_close:
  * @ioc: the channel object
diff --git a/io/channel.c b/io/channel.c
index e8b019dc36..bc1a9e055c 100644
--- a/io/channel.c
+++ b/io/channel.c
@@ -136,6 +136,9 @@ int qio_channel_readv_full_all_eof(QIOChannel *ioc,
         if (len == QIO_CHANNEL_ERR_BLOCK) {
             if (qemu_in_coroutine()) {
                 qio_channel_yield(ioc, G_IO_IN);
+                if (ioc->force_quit) {
+                    goto cleanup;
+                }
             } else {
                 qio_channel_wait(ioc, G_IO_IN);
             }
@@ -242,6 +245,9 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
         if (len == QIO_CHANNEL_ERR_BLOCK) {
             if (qemu_in_coroutine()) {
                 qio_channel_yield(ioc, G_IO_OUT);
+                if (ioc->force_quit) {
+                    goto cleanup;
+                }
             } else {
                 qio_channel_wait(ioc, G_IO_OUT);
             }
@@ -543,6 +549,9 @@ void coroutine_fn qio_channel_yield(QIOChannel *ioc,
     }
     qio_channel_set_aio_fd_handlers(ioc);
     qemu_coroutine_yield();
+    if (ioc->force_quit) {
+        return;
+    }
 
     /* Allow interrupting the operation by reentering the coroutine other than
      * through the aio_fd_handlers. */
@@ -555,6 +564,19 @@ void coroutine_fn qio_channel_yield(QIOChannel *ioc,
     }
 }
 
+void qio_channel_set_force_quit(QIOChannel *ioc, bool quit)
+{
+    ioc->force_quit = quit;
+}
+
+void qio_channel_coroutines_wake(QIOChannel *ioc)
+{
+    if (ioc->read_coroutine) {
+        qio_channel_restart_read(ioc);
+    } else if (ioc->write_coroutine) {
+        qio_channel_restart_write(ioc);
+    }
+}
 
 static gboolean qio_channel_wait_complete(QIOChannel *ioc,
                                           GIOCondition condition,
-- 
2.32.0


Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
Posted by Daniel P. Berrangé 2 years, 4 months ago
On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote:
> [...]
> 
> Signed-off-by: Lei Rao <lei.rao@intel.com>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> ---
>  block/nbd.c          |  5 +++++
>  include/io/channel.h | 19 +++++++++++++++++++
>  io/channel.c         | 22 ++++++++++++++++++++++
>  3 files changed, 46 insertions(+)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 5ef462db1b..5ee4eaaf57 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -208,6 +208,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>      assert(!s->in_flight);
>  
>      if (s->ioc) {
> +        qio_channel_set_force_quit(s->ioc, true);
> +        qio_channel_coroutines_wake(s->ioc);
>          qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);

Calling shutdown here should already cause
qio_channel_readv_all to wake up and break out of its
poll() loop. We shouldn't need to add a second way
to break out of the poll().
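IOW, once the socket has been shut down, the existing loop should
unblock by itself. Roughly (a simplified sketch of the loop in
io/channel.c, not the exact code):

    len = qio_channel_writev_full(ioc, local_iov, nlocal_iov,
                                  fds, nfds, errp);
    if (len == QIO_CHANNEL_ERR_BLOCK) {
        if (qemu_in_coroutine()) {
            /* shutdown() makes the fd poll as writable/hung-up, so the
             * aio fd handler fires and re-enters the coroutine here */
            qio_channel_yield(ioc, G_IO_OUT);
        } else {
            qio_channel_wait(ioc, G_IO_OUT);
        }
    } else if (len < 0) {
        /* a writev on a shut-down socket fails (e.g. EPIPE), so the
         * loop exits with an error instead of hanging */
        return -1;
    }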

>          yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
>                                   nbd_yank, s->bs);
> @@ -315,6 +317,7 @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
>  
>      /* successfully connected */
>      s->state = NBD_CLIENT_CONNECTED;
> +    qio_channel_set_force_quit(s->ioc, false);
>      qemu_co_queue_restart_all(&s->free_sema);
>  
>      return 0;
> @@ -345,6 +348,8 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
>  
>      /* Finalize previous connection if any */
>      if (s->ioc) {
> +        qio_channel_set_force_quit(s->ioc, true);
> +        qio_channel_coroutines_wake(s->ioc);

Isn't this code path just missing a qio_channel_shutdown call or
a qio_channel_close call to make the socket trigger a wakeup from
poll?

>          qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
>          yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
>                                   nbd_yank, s->bs);
> diff --git a/include/io/channel.h b/include/io/channel.h
> index 88988979f8..bc5af8cdd6 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -78,6 +78,7 @@ struct QIOChannel {
>      AioContext *ctx;
>      Coroutine *read_coroutine;
>      Coroutine *write_coroutine;
> +    bool force_quit;
>  #ifdef _WIN32
>      HANDLE event; /* For use with GSource on Win32 */
>  #endif
> @@ -484,6 +485,24 @@ int qio_channel_set_blocking(QIOChannel *ioc,
>                               bool enabled,
>                               Error **errp);
>  
> +/**
> + * qio_channel_force_quit:
> + * @ioc: the channel object
> + * @quit: the new flag state
> + *
> + * Set the new flag state
> + */
> +void qio_channel_set_force_quit(QIOChannel *ioc, bool quit);
> +
> +/**
> + * qio_channel_coroutines_wake:
> + * @ioc: the channel object
> + *
> + * Wake up the coroutines to ensure that they will exit normally
> + * when the server terminated abnormally
> + */
> +void qio_channel_coroutines_wake(QIOChannel *ioc);
> +
>  /**
>   * qio_channel_close:
>   * @ioc: the channel object
> diff --git a/io/channel.c b/io/channel.c
> index e8b019dc36..bc1a9e055c 100644
> --- a/io/channel.c
> +++ b/io/channel.c
> @@ -136,6 +136,9 @@ int qio_channel_readv_full_all_eof(QIOChannel *ioc,
>          if (len == QIO_CHANNEL_ERR_BLOCK) {
>              if (qemu_in_coroutine()) {
>                  qio_channel_yield(ioc, G_IO_IN);
> +                if (ioc->force_quit) {
> +                    goto cleanup;
> +                }
>              } else {
>                  qio_channel_wait(ioc, G_IO_IN);
>              }
> @@ -242,6 +245,9 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
>          if (len == QIO_CHANNEL_ERR_BLOCK) {
>              if (qemu_in_coroutine()) {
>                  qio_channel_yield(ioc, G_IO_OUT);
> +                if (ioc->force_quit) {
> +                    goto cleanup;
> +                }
>              } else {
>                  qio_channel_wait(ioc, G_IO_OUT);
>              }
> @@ -543,6 +549,9 @@ void coroutine_fn qio_channel_yield(QIOChannel *ioc,
>      }
>      qio_channel_set_aio_fd_handlers(ioc);
>      qemu_coroutine_yield();
> +    if (ioc->force_quit) {
> +        return;
> +    }
>  
>      /* Allow interrupting the operation by reentering the coroutine other than
>       * through the aio_fd_handlers. */
> @@ -555,6 +564,19 @@ void coroutine_fn qio_channel_yield(QIOChannel *ioc,
>      }
>  }
>  
> +void qio_channel_set_force_quit(QIOChannel *ioc, bool quit)
> +{
> +    ioc->force_quit = quit;
> +}
> +
> +void qio_channel_coroutines_wake(QIOChannel *ioc)
> +{
> +    if (ioc->read_coroutine) {
> +        qio_channel_restart_read(ioc);
> +    } else if (ioc->write_coroutine) {
> +        qio_channel_restart_write(ioc);
> +    }
> +}

None of this should be needed AFAICT, as the poll/coroutine code
should already break out of poll if the socket is closed, or is
marked shutdown.

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


RE: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
Posted by Rao, Lei 2 years, 4 months ago

-----Original Message-----
From: Daniel P. Berrangé <berrange@redhat.com> 
Sent: Wednesday, December 1, 2021 5:11 PM
To: Rao, Lei <lei.rao@intel.com>
Cc: Zhang, Chen <chen.zhang@intel.com>; eblake@redhat.com; vsementsov@virtuozzo.com; kwolf@redhat.com; hreitz@redhat.com; qemu-block@nongnu.org; qemu-devel@nongnu.org
Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case

On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote:
> [...]
> 
> Signed-off-by: Lei Rao <lei.rao@intel.com>
> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> ---
>  block/nbd.c          |  5 +++++
>  include/io/channel.h | 19 +++++++++++++++++++
>  io/channel.c         | 22 ++++++++++++++++++++++
>  3 files changed, 46 insertions(+)
> 
> diff --git a/block/nbd.c b/block/nbd.c index 5ef462db1b..5ee4eaaf57 
> 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> @@ -208,6 +208,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>      assert(!s->in_flight);
>  
>      if (s->ioc) {
> +        qio_channel_set_force_quit(s->ioc, true);
> +        qio_channel_coroutines_wake(s->ioc);
>          qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, 
> NULL);

Calling shutdown here should already cause qio_channel_readv_all to wake up and break out of its
poll() loop. We shouldn't need to add a second way to break out of the poll().

Calling shutdown can wake up the coroutine that is polling, but I think it's not enough. I tried to forcibly wake up the NBD coroutine,
and it may cause a segmentation fault. The root cause is that the coroutine will continue to access ioc->xxx in qio_channel_yield(), but the ioc has already been released and set to NULL, for example in
nbd_co_do_establish_connection(); I think calling shutdown will have the same result. So I add force_quit: once it is set to true, the coroutine exits immediately without accessing the ioc.

Thanks
Lei

>          yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
>                                   nbd_yank, s->bs); @@ -315,6 +317,7 
> @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState 
> *bs,
>  
>      /* successfully connected */
>      s->state = NBD_CLIENT_CONNECTED;
> +    qio_channel_set_force_quit(s->ioc, false);
>      qemu_co_queue_restart_all(&s->free_sema);
>  
>      return 0;
> @@ -345,6 +348,8 @@ static coroutine_fn void 
> nbd_reconnect_attempt(BDRVNBDState *s)
>  
>      /* Finalize previous connection if any */
>      if (s->ioc) {
> +        qio_channel_set_force_quit(s->ioc, true);
> +        qio_channel_coroutines_wake(s->ioc);

Isn't this code path just missing a qio_channel_shutdown call or a qio_channel_close call to make the socket trigger a wakeup from poll?

I don't think it would solve the bug even if it were added; the reason is as above.
Thanks,
Lei

>          qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
>          yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
>                                   nbd_yank, s->bs); diff --git 
> a/include/io/channel.h b/include/io/channel.h index 
> 88988979f8..bc5af8cdd6 100644
> --- a/include/io/channel.h
> +++ b/include/io/channel.h
> @@ -78,6 +78,7 @@ struct QIOChannel {
>      AioContext *ctx;
>      Coroutine *read_coroutine;
>      Coroutine *write_coroutine;
> +    bool force_quit;
>  #ifdef _WIN32
>      HANDLE event; /* For use with GSource on Win32 */  #endif @@ 
> -484,6 +485,24 @@ int qio_channel_set_blocking(QIOChannel *ioc,
>                               bool enabled,
>                               Error **errp);
>  
> +/**
> + * qio_channel_force_quit:
> + * @ioc: the channel object
> + * @quit: the new flag state
> + *
> + * Set the new flag state
> + */
> +void qio_channel_set_force_quit(QIOChannel *ioc, bool quit);
> +
> +/**
> + * qio_channel_coroutines_wake:
> + * @ioc: the channel object
> + *
> + * Wake up the coroutines to ensure that they will exit normally
> + * when the server terminated abnormally  */ void 
> +qio_channel_coroutines_wake(QIOChannel *ioc);
> +
>  /**
>   * qio_channel_close:
>   * @ioc: the channel object
> diff --git a/io/channel.c b/io/channel.c index e8b019dc36..bc1a9e055c 
> 100644
> --- a/io/channel.c
> +++ b/io/channel.c
> @@ -136,6 +136,9 @@ int qio_channel_readv_full_all_eof(QIOChannel *ioc,
>          if (len == QIO_CHANNEL_ERR_BLOCK) {
>              if (qemu_in_coroutine()) {
>                  qio_channel_yield(ioc, G_IO_IN);
> +                if (ioc->force_quit) {
> +                    goto cleanup;
> +                }
>              } else {
>                  qio_channel_wait(ioc, G_IO_IN);
>              }
> @@ -242,6 +245,9 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
>          if (len == QIO_CHANNEL_ERR_BLOCK) {
>              if (qemu_in_coroutine()) {
>                  qio_channel_yield(ioc, G_IO_OUT);
> +                if (ioc->force_quit) {
> +                    goto cleanup;
> +                }
>              } else {
>                  qio_channel_wait(ioc, G_IO_OUT);
>              }
> @@ -543,6 +549,9 @@ void coroutine_fn qio_channel_yield(QIOChannel *ioc,
>      }
>      qio_channel_set_aio_fd_handlers(ioc);
>      qemu_coroutine_yield();
> +    if (ioc->force_quit) {
> +        return;
> +    }
>  
>      /* Allow interrupting the operation by reentering the coroutine other than
>       * through the aio_fd_handlers. */ @@ -555,6 +564,19 @@ void 
> coroutine_fn qio_channel_yield(QIOChannel *ioc,
>      }
>  }
>  
> +void qio_channel_set_force_quit(QIOChannel *ioc, bool quit) {
> +    ioc->force_quit = quit;
> +}
> +
> +void qio_channel_coroutines_wake(QIOChannel *ioc) {
> +    if (ioc->read_coroutine) {
> +        qio_channel_restart_read(ioc);
> +    } else if (ioc->write_coroutine) {
> +        qio_channel_restart_write(ioc);
> +    }
> +}

None of this should be needed AFAICT, as the poll/coroutine code should already break out of poll if the socket is closed, or is marked shutdown.


Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
Posted by Vladimir Sementsov-Ogievskiy 2 years, 4 months ago
01.12.2021 12:48, Rao, Lei wrote:
> 
> 
> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Wednesday, December 1, 2021 5:11 PM
> To: Rao, Lei <lei.rao@intel.com>
> Cc: Zhang, Chen <chen.zhang@intel.com>; eblake@redhat.com; vsementsov@virtuozzo.com; kwolf@redhat.com; hreitz@redhat.com; qemu-block@nongnu.org; qemu-devel@nongnu.org
> Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
> 
> On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote:
>> [...]

If qio_channel_shutdown() can't kill a request that is in qio_channel_write_all(), that's not the responsibility of the nbd driver; the qio channel layer should take care of it.

Or, you probably need some keep-alive settings set up.

>>
>> Signed-off-by: Lei Rao <lei.rao@intel.com>
>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
>> ---
>>   block/nbd.c          |  5 +++++
>>   include/io/channel.h | 19 +++++++++++++++++++
>>   io/channel.c         | 22 ++++++++++++++++++++++
>>   3 files changed, 46 insertions(+)
>>
>> diff --git a/block/nbd.c b/block/nbd.c index 5ef462db1b..5ee4eaaf57
>> 100644
>> --- a/block/nbd.c
>> +++ b/block/nbd.c
>> @@ -208,6 +208,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>>       assert(!s->in_flight);
>>   
>>       if (s->ioc) {
>> +        qio_channel_set_force_quit(s->ioc, true);
>> +        qio_channel_coroutines_wake(s->ioc);
>>           qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH,
>> NULL);
> 
> Calling shutdown here should already cause qio_channel_readv_all to wake up and break out of its
> poll() loop. We shouldn't need to add a second way to break out of the poll().
> 
> Calling shutdown can wake up the coroutines which is polling. But I think it's not enough. I tried to forcibly wake up the NBD coroutine,
> It may cause segment fault. The root cause is that it will continue to access ioc->xxx in qio_channel_yield(), but the ioc has been released and set it NULL such as in
> nbd_co_do_establish_connection(); I think call shutdown will have the same result. So, I add the force_quit, once set it true, it will immediately exit without accessing IOC.
> 

What do you mean by "the NBD coroutine" and by "forcibly wake up"?

In recent Qemu there is no specific NBD coroutine. Only request coroutines should exist.

Forcibly waking up any coroutine is generally unsafe in Qemu; most of the code is not prepared for this, and a crash is the normal result of such an attempt.

Also, there is a good mechanism for debugging coroutines in gdb:

source scripts/qemu-gdb.py
qemu coroutine <coroutine pointer>

- it will dump the stack trace of a coroutine, which may help.

Also, to find coroutines, look at the bs->tracked_requests list; all requests of bs are in that list, with coroutine pointers in the .co field.
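For example, with the state dump above you could run:

source scripts/qemu-gdb.py
qemu coroutine 0x7f1164571df0

- to see where that request coroutine (requests[1] in the dump) is currently parked.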

> Thanks
> Lei
> 
>>           yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
>>                                    nbd_yank, s->bs); @@ -315,6 +317,7
>> @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState
>> *bs,
>>   
>>       /* successfully connected */
>>       s->state = NBD_CLIENT_CONNECTED;
>> +    qio_channel_set_force_quit(s->ioc, false);
>>       qemu_co_queue_restart_all(&s->free_sema);
>>   
>>       return 0;
>> @@ -345,6 +348,8 @@ static coroutine_fn void
>> nbd_reconnect_attempt(BDRVNBDState *s)
>>   
>>       /* Finalize previous connection if any */
>>       if (s->ioc) {
>> +        qio_channel_set_force_quit(s->ioc, true);
>> +        qio_channel_coroutines_wake(s->ioc);
> 
> Isn't this code path just missing a qio_channel_shutdown call or a qio_channel_close call to make the socket trigger wakeup from poll.
> 
> I don't think it can solve the bug even if it is added, the reason is as above.
> Thanks,
> Lei
> 
>>           qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
>>           yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
>>                                    nbd_yank, s->bs); diff --git
>> a/include/io/channel.h b/include/io/channel.h index
>> 88988979f8..bc5af8cdd6 100644
>> --- a/include/io/channel.h
>> +++ b/include/io/channel.h
>> @@ -78,6 +78,7 @@ struct QIOChannel {
>>       AioContext *ctx;
>>       Coroutine *read_coroutine;
>>       Coroutine *write_coroutine;
>> +    bool force_quit;
>>   #ifdef _WIN32
>>       HANDLE event; /* For use with GSource on Win32 */  #endif @@
>> -484,6 +485,24 @@ int qio_channel_set_blocking(QIOChannel *ioc,
>>                                bool enabled,
>>                                Error **errp);
>>   
>> +/**
>> + * qio_channel_force_quit:
>> + * @ioc: the channel object
>> + * @quit: the new flag state
>> + *
>> + * Set the new flag state
>> + */
>> +void qio_channel_set_force_quit(QIOChannel *ioc, bool quit);
>> +
>> +/**
>> + * qio_channel_coroutines_wake:
>> + * @ioc: the channel object
>> + *
>> + * Wake up the coroutines to ensure that they will exit normally
>> + * when the server terminated abnormally  */ void
>> +qio_channel_coroutines_wake(QIOChannel *ioc);
>> +
>>   /**
>>    * qio_channel_close:
>>    * @ioc: the channel object
>> diff --git a/io/channel.c b/io/channel.c index e8b019dc36..bc1a9e055c
>> 100644
>> --- a/io/channel.c
>> +++ b/io/channel.c
>> @@ -136,6 +136,9 @@ int qio_channel_readv_full_all_eof(QIOChannel *ioc,
>>           if (len == QIO_CHANNEL_ERR_BLOCK) {
>>               if (qemu_in_coroutine()) {
>>                   qio_channel_yield(ioc, G_IO_IN);
>> +                if (ioc->force_quit) {
>> +                    goto cleanup;
>> +                }
>>               } else {
>>                   qio_channel_wait(ioc, G_IO_IN);
>>               }
>> @@ -242,6 +245,9 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
>>           if (len == QIO_CHANNEL_ERR_BLOCK) {
>>               if (qemu_in_coroutine()) {
>>                   qio_channel_yield(ioc, G_IO_OUT);
>> +                if (ioc->force_quit) {
>> +                    goto cleanup;
>> +                }
>>               } else {
>>                   qio_channel_wait(ioc, G_IO_OUT);
>>               }
>> @@ -543,6 +549,9 @@ void coroutine_fn qio_channel_yield(QIOChannel *ioc,
>>       }
>>       qio_channel_set_aio_fd_handlers(ioc);
>>       qemu_coroutine_yield();
>> +    if (ioc->force_quit) {
>> +        return;
>> +    }
>>   
>>       /* Allow interrupting the operation by reentering the coroutine other than
>>        * through the aio_fd_handlers. */ @@ -555,6 +564,19 @@ void
>> coroutine_fn qio_channel_yield(QIOChannel *ioc,
>>       }
>>   }
>>   
>> +void qio_channel_set_force_quit(QIOChannel *ioc, bool quit) {
>> +    ioc->force_quit = quit;
>> +}
>> +
>> +void qio_channel_coroutines_wake(QIOChannel *ioc) {
>> +    if (ioc->read_coroutine) {
>> +        qio_channel_restart_read(ioc);
>> +    } else if (ioc->write_coroutine) {
>> +        qio_channel_restart_write(ioc);
>> +    }
>> +}
> 
> None of this should be needed AFAICT, as the poll/coroutine code should already break out of poll if the socket is closed, or is marked shutdown.
> 
> 
> Regards,
> Daniel
> 


-- 
Best regards,
Vladimir

RE: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
Posted by Rao, Lei 2 years, 4 months ago

-----Original Message-----
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> 
Sent: Wednesday, December 1, 2021 10:27 PM
To: Rao, Lei <lei.rao@intel.com>; Daniel P. Berrangé <berrange@redhat.com>
Cc: Zhang, Chen <chen.zhang@intel.com>; eblake@redhat.com; kwolf@redhat.com; hreitz@redhat.com; qemu-block@nongnu.org; qemu-devel@nongnu.org
Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case

01.12.2021 12:48, Rao, Lei wrote:
> 
> 
> -----Original Message-----
> From: Daniel P. Berrangé <berrange@redhat.com>
> Sent: Wednesday, December 1, 2021 5:11 PM
> To: Rao, Lei <lei.rao@intel.com>
> Cc: Zhang, Chen <chen.zhang@intel.com>; eblake@redhat.com; 
> vsementsov@virtuozzo.com; kwolf@redhat.com; hreitz@redhat.com; 
> qemu-block@nongnu.org; qemu-devel@nongnu.org
> Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure 
> QIO exits cleanly in some corner case
> 
> On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote:
>> [...]

If qio_channel_shutdown() can't kill a request that is in qio_channel_write_all(), that's not the responsibility of the nbd driver; the qio channel layer should take care of it.

Or, you probably need some keep-alive settings set up.

>>
>> Signed-off-by: Lei Rao <lei.rao@intel.com>
>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
>> ---
>>   block/nbd.c          |  5 +++++
>>   include/io/channel.h | 19 +++++++++++++++++++
>>   io/channel.c         | 22 ++++++++++++++++++++++
>>   3 files changed, 46 insertions(+)
>>
>> diff --git a/block/nbd.c b/block/nbd.c index 5ef462db1b..5ee4eaaf57
>> 100644
>> --- a/block/nbd.c
>> +++ b/block/nbd.c
>> @@ -208,6 +208,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>>       assert(!s->in_flight);
>>   
>>       if (s->ioc) {
>> +        qio_channel_set_force_quit(s->ioc, true);
>> +        qio_channel_coroutines_wake(s->ioc);
>>           qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, 
>> NULL);
> 
> Calling shutdown here should already be causing the 
> qio_channel_readv_all to wake up and break out of its
> poll() loop. We shouldn't need to add a second way to break out of the poll().
> 
> Calling shutdown can wake up the coroutines which is polling. But I 
> think it's not enough. I tried to forcibly wake up the NBD coroutine, 
> It may cause segment fault. The root cause is that it will continue to access ioc->xxx in qio_channel_yield(), but the ioc has been released and set it NULL such as in nbd_co_do_establish_connection(); I think call shutdown will have the same result. So, I add the force_quit, once set it true, it will immediately exit without accessing IOC.
> 

What do you mean by "the NBD coroutine" and by "forcibly wake up"?

In recent Qemu there is no specific NBD coroutine. Only request coroutines should exist.

Forcibly waking up any coroutine is generally unsafe in Qemu; most of the code is not prepared for this, and a crash is the normal result of such an attempt.

Also, there is a good mechanism for debugging coroutines in gdb:

source scripts/qemu-gdb.py
qemu coroutine <coroutine pointer>

- it will dump the stack trace of a coroutine, which may help.

Also, to find coroutines, look at the bs->tracked_requests list; all requests of bs are in that list, with coroutine pointers in the .co field.

I am sorry for the unclear description. "The NBD coroutine" means the request coroutines. As for "forcibly wake up": I just set receiving to true before qio_channel_writev_all() in nbd_co_send_request() to ensure that the request coroutines can be awakened by nbd_recv_coroutine_wake_one().
But that was just a test. Only waking up the request coroutine, or shutting down the QIO channel, is not enough because there is a synchronization issue. For example, when the NBD request coroutine is awakened in qio_channel_writev_full_all(), it will continue to access ioc->xxx, but the ioc has been set to NULL in
nbd_co_do_establish_connection(); this causes a segmentation fault, and I have verified it. So how about not yielding in QIO: when the request can't be sent, QIO could return to the caller directly, and the caller decides whether to yield or not?
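Something like this, just as a rough idea to illustrate (hypothetical, not a real patch):

    /* in qio_channel_writev_full_all(), instead of yielding internally: */
    if (len == QIO_CHANNEL_ERR_BLOCK) {
        if (qemu_in_coroutine()) {
            /* let the caller decide how to wait, so block/nbd.c could park
             * the request under its own bookkeeping (e.g. set req->receiving)
             * and be woken by nbd_recv_coroutines_wake() */
            return QIO_CHANNEL_ERR_BLOCK;
        }
        qio_channel_wait(ioc, G_IO_OUT);
    }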

> Thanks
> Lei
> 
>>           yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
>>                                    nbd_yank, s->bs); @@ -315,6 +317,7 
>> @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState
>> *bs,
>>   
>>       /* successfully connected */
>>       s->state = NBD_CLIENT_CONNECTED;
>> +    qio_channel_set_force_quit(s->ioc, false);
>>       qemu_co_queue_restart_all(&s->free_sema);
>>   
>>       return 0;
>> @@ -345,6 +348,8 @@ static coroutine_fn void 
>> nbd_reconnect_attempt(BDRVNBDState *s)
>>   
>>       /* Finalize previous connection if any */
>>       if (s->ioc) {
>> +        qio_channel_set_force_quit(s->ioc, true);
>> +        qio_channel_coroutines_wake(s->ioc);
> 
> Isn't this code path just missing a qio_channel_shutdown call or a qio_channel_close call to make the socket trigger wakeup from poll.
> 
> I don't think it can solve the bug even if it is added, the reason is as above.
> Thanks,
> Lei
> 
>>           qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
>>           yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
>>                                    nbd_yank, s->bs); diff --git 
>> a/include/io/channel.h b/include/io/channel.h index
>> 88988979f8..bc5af8cdd6 100644
>> --- a/include/io/channel.h
>> +++ b/include/io/channel.h
>> @@ -78,6 +78,7 @@ struct QIOChannel {
>>       AioContext *ctx;
>>       Coroutine *read_coroutine;
>>       Coroutine *write_coroutine;
>> +    bool force_quit;
>>   #ifdef _WIN32
>>       HANDLE event; /* For use with GSource on Win32 */  #endif @@
>> -484,6 +485,24 @@ int qio_channel_set_blocking(QIOChannel *ioc,
>>                                bool enabled,
>>                                Error **errp);
>>   
>> +/**
>> + * qio_channel_force_quit:
>> + * @ioc: the channel object
>> + * @quit: the new flag state
>> + *
>> + * Set the new flag state
>> + */
>> +void qio_channel_set_force_quit(QIOChannel *ioc, bool quit);
>> +
>> +/**
>> + * qio_channel_coroutines_wake:
>> + * @ioc: the channel object
>> + *
>> + * Wake up the coroutines to ensure that they will exit normally
>> + * when the server terminated abnormally  */ void 
>> +qio_channel_coroutines_wake(QIOChannel *ioc);
>> +
>>   /**
>>    * qio_channel_close:
>>    * @ioc: the channel object
>> diff --git a/io/channel.c b/io/channel.c index e8b019dc36..bc1a9e055c
>> 100644
>> --- a/io/channel.c
>> +++ b/io/channel.c
>> @@ -136,6 +136,9 @@ int qio_channel_readv_full_all_eof(QIOChannel *ioc,
>>           if (len == QIO_CHANNEL_ERR_BLOCK) {
>>               if (qemu_in_coroutine()) {
>>                   qio_channel_yield(ioc, G_IO_IN);
>> +                if (ioc->force_quit) {
>> +                    goto cleanup;
>> +                }
>>               } else {
>>                   qio_channel_wait(ioc, G_IO_IN);
>>               }
>> @@ -242,6 +245,9 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
>>           if (len == QIO_CHANNEL_ERR_BLOCK) {
>>               if (qemu_in_coroutine()) {
>>                   qio_channel_yield(ioc, G_IO_OUT);
>> +                if (ioc->force_quit) {
>> +                    goto cleanup;
>> +                }
>>               } else {
>>                   qio_channel_wait(ioc, G_IO_OUT);
>>               }
>> @@ -543,6 +549,9 @@ void coroutine_fn qio_channel_yield(QIOChannel *ioc,
>>       }
>>       qio_channel_set_aio_fd_handlers(ioc);
>>       qemu_coroutine_yield();
>> +    if (ioc->force_quit) {
>> +        return;
>> +    }
>>   
>>       /* Allow interrupting the operation by reentering the coroutine other than
>>        * through the aio_fd_handlers. */ @@ -555,6 +564,19 @@ void 
>> coroutine_fn qio_channel_yield(QIOChannel *ioc,
>>       }
>>   }
>>   
>> +void qio_channel_set_force_quit(QIOChannel *ioc, bool quit) {
>> +    ioc->force_quit = quit;
>> +}
>> +
>> +void qio_channel_coroutines_wake(QIOChannel *ioc) {
>> +    if (ioc->read_coroutine) {
>> +        qio_channel_restart_read(ioc);
>> +    } else if (ioc->write_coroutine) {
>> +        qio_channel_restart_write(ioc);
>> +    }
>> +}
> 
> None of this should be needed AFAICT, as the poll/coroutine code shuld already break out of poll if the socket is closed, or is marked shutdown.
> 
> 
> Regards,
> Daniel
> 


--
Best regards,
Vladimir
Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
Posted by Rao, Lei 2 years, 4 months ago
Sorry, resending with correct indentation and quoting.

On 12/1/2021 10:27 PM, Vladimir Sementsov-Ogievskiy wrote:
> 01.12.2021 12:48, Rao, Lei wrote:
>>
>>
>> -----Original Message-----
>> From: Daniel P. Berrangé <berrange@redhat.com>
>> Sent: Wednesday, December 1, 2021 5:11 PM
>> To: Rao, Lei <lei.rao@intel.com>
>> Cc: Zhang, Chen <chen.zhang@intel.com>; eblake@redhat.com; vsementsov@virtuozzo.com; kwolf@redhat.com; hreitz@redhat.com; qemu-block@nongnu.org; qemu-devel@nongnu.org
>> Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
>>
>> On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote:
>>>      We found that the QIO channel coroutine could not be awakened in some corner cases during our stress test for COLO.
>>>      The patch fixes as follow:
>>>          #0  0x00007fad72e24bf6 in __ppoll (fds=0x5563d75861f0, nfds=1, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
>>>          #1  0x00005563d6977c68 in qemu_poll_ns (fds=0x5563d75861f0, nfds=1, timeout=599999697630) at util/qemu-timer.c:348
>>>          #2  0x00005563d697ac44 in aio_poll (ctx=0x5563d755dfa0, blocking=true) at util/aio-posix.c:669
>>>          #3  0x00005563d68ba24f in bdrv_do_drained_begin (bs=0x5563d75ea0c0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:432
>>>          #4  0x00005563d68ba338 in bdrv_drained_begin (bs=0x5563d75ea0c0) at block/io.c:438
>>>          #5  0x00005563d689c6d2 in quorum_del_child (bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block/quorum.c:1063
>>>          #6  0x00005563d684328f in bdrv_del_child (parent_bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block.c:6568
>>>          #7  0x00005563d658499e in qmp_x_blockdev_change (parent=0x5563d80ec4c0 "colo-disk0", has_child=true, child=0x5563d7577d30 "children.1", has_node=false, node=0x0,errp=0x7fff3cf5b960) at blockdev.c:4494
>>>          #8  0x00005563d67e8b4e in qmp_marshal_x_blockdev_change (args=0x7fad6400ada0, ret=0x7fff3cf5b9f8, errp=0x7fff3cf5b9f0) at qapi/qapi-commands-block-core.c:1538
>>>          #9  0x00005563d691cd9e in do_qmp_dispatch (cmds=0x5563d719a4f0 <qmp_commands>, request=0x7fad64009d80, allow_oob=true, errp=0x7fff3cf5ba98)
>>>              at qapi/qmp-dispatch.c:132
>>>          #10 0x00005563d691cfab in qmp_dispatch (cmds=0x5563d719a4f0 <qmp_commands>, request=0x7fad64009d80, allow_oob=true) at qapi/qmp-dispatch.c:175
>>>          #11 0x00005563d67b7787 in monitor_qmp_dispatch (mon=0x5563d7605d40, req=0x7fad64009d80) at monitor/qmp.c:145
>>>          #12 0x00005563d67b7b24 in monitor_qmp_bh_dispatcher (data=0x0) at monitor/qmp.c:234
>>>          #13 0x00005563d69754c2 in aio_bh_call (bh=0x5563d7473230) at util/async.c:89
>>>          #14 0x00005563d697555e in aio_bh_poll (ctx=0x5563d7471da0) at util/async.c:117
>>>          #15 0x00005563d697a423 in aio_dispatch (ctx=0x5563d7471da0) at util/aio-posix.c:459
>>>          #16 0x00005563d6975917 in aio_ctx_dispatch (source=0x5563d7471da0, callback=0x0, user_data=0x0) at util/async.c:260
>>>          #17 0x00007fad730e1fbd in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>>          #18 0x00005563d6978cda in glib_pollfds_poll () at util/main-loop.c:219
>>>          #19 0x00005563d6978d58 in os_host_main_loop_wait (timeout=977650) at util/main-loop.c:242
>>>          #20 0x00005563d6978e69 in main_loop_wait (nonblocking=0) at util/main-loop.c:518
>>>          #21 0x00005563d658f5ed in main_loop () at vl.c:1814
>>>          #22 0x00005563d6596bb7 in main (argc=61, argv=0x7fff3cf5c0c8,
>>> envp=0x7fff3cf5c2b8) at vl.c:450
>>>
>>>      From the call trace, we can see that the QEMU main thread is waiting for the in_flight return to zero. But the in_filght never reaches 0.
>>>      After analysis, the root cause is that the coroutine of NBD was not awakened. Although this bug was found in the COLO test, I think this is a
>>>      universal problem in the QIO module. This issue also affects other modules depending on QIO such as NBD. We dump the following data:
>>>      $2 = {
>>>        in_flight = 2,
>>>        state = NBD_CLIENT_QUIT,
>>>        connect_status = 0,
>>>        connect_err = 0x0,
>>>        wait_in_flight = false,
>>>        requests = {{
>>>            coroutine = 0x0,
>>>            offset = 273077071872,
>>>            receiving = false,
>>>          }, {
>>>            coroutine = 0x7f1164571df0,
>>>            offset = 498792529920,
>>>            receiving = false,
>>>          }, {
>>>            coroutine = 0x0,
>>>            offset = 500663590912,
>>>            receiving = false,
>>>          }, {
>>>            ...
>>>          } },
>>>        reply = {
>>>          simple = {
>>>            magic = 1732535960,
>>>            error = 0,
>>>            handle = 0
>>>          },
>>>          structured = {
>>>            magic = 1732535960,
>>>            flags = 0,
>>>            type = 0,
>>>            handle = 0,
>>>            length = 0
>>>          },
>>>          {
>>>            magic = 1732535960,
>>>            _skip = 0,
>>>            handle = 0
>>>          }
>>>        },
>>>        bs = 0x7f11640e2140,
>>>        reconnect_delay = 0,
>>>        saddr = 0x7f11640e1f80,
>>>        export = 0x7f11640dc560 "parent0",
>>>      }
>>>      From the data, we can see the coroutine of NBD does not exit normally when killing the NBD server(we kill the Secondary VM in the COLO stress test).
>>>      Then it will not execute in_flight--. So, the QEMU main thread will hang here. Further analysis, I found the coroutine of NBD will yield
>>>      in nbd_send_request() or qio_channel_write_all() in nbd_co_send_request. By the way, the yield is due to the kernel return EAGAIN(under the stress test).
>>>      However, NBD didn't know it would yield here. So, the nbd_recv_coroutines_wake won't wake it up because req->receiving is false. if the NBD server
>>>      is terminated abnormally at the same time. The G_IO_OUT will be invalid, the coroutine will never be awakened. In addition, the s->ioc will be released
>>>      immediately. if we force to wake up the coroutine of NBD, access to the ioc->xxx will cause segment fault. Finally, I add a state named force_quit to
>>>      the QIOChannel to ensure that QIO will exit immediately. And I add the function of qio_channel_coroutines_wake to wake it up.
> 
> If qio_channel_shutdown() can't kill request that is in qio_channel_write_all() - it's not a reponsibility of nbd driver, qio channel layer should take care of this..
> 
> Or, you probably need some keep-alive settings set up.
> 
>>>
>>> Signed-off-by: Lei Rao <lei.rao@intel.com>
>>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
>>> ---
>>>   block/nbd.c          |  5 +++++
>>>   include/io/channel.h | 19 +++++++++++++++++++
>>>   io/channel.c         | 22 ++++++++++++++++++++++
>>>   3 files changed, 46 insertions(+)
>>>
>>> diff --git a/block/nbd.c b/block/nbd.c index 5ef462db1b..5ee4eaaf57
>>> 100644
>>> --- a/block/nbd.c
>>> +++ b/block/nbd.c
>>> @@ -208,6 +208,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>>>       assert(!s->in_flight);
>>>       if (s->ioc) {
>>> +        qio_channel_set_force_quit(s->ioc, true);
>>> +        qio_channel_coroutines_wake(s->ioc);
>>>           qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH,
>>> NULL);
>>
>> Calling shutdown here should already be causing the qio_chanel_readv_all to wakeup and break out of its
>> poll() loop. We shouldn't need to add a second way to break out of the poll().
>>
>> Calling shutdown can wake up the coroutines which is polling. But I think it's not enough. I tried to forcibly wake up the NBD coroutine,
>> It may cause segment fault. The root cause is that it will continue to access ioc->xxx in qio_channel_yield(), but the ioc has been released and set it NULL such as in
>> nbd_co_do_establish_connection(); I think call shutdown will have the same result. So, I add the force_quit, once set it true, it will immediately exit without accessing IOC.
>>
> 
> What do you mean by "the NBD coroutine" and by "forcibly wake up"?
> 
> In recent Qemu there is no specific NBD coroutine. Only request coroutines should exist.
> 
> Forcibly waking up any coroutine is generally unsafe in Qemu, most of the code is not prepared for this, crash is normal result for such try.
> 
> Also, there is good mechanism to debug coroutines in gdb:
> 
> source scripts/qemu-gdb.py
> qemu coroutine <coroutine pointer>
> 
> - it will dump stack trace of a coroutine, it may help.
> 
> Also, to find coroutines look at bs->tracked_requests list, all requests of bs are in the list with coroutine pointers in .co field.

I am sorry for the unclear description. By "the NBD coroutine" I mean the request coroutines.

About "the forcibly wake up" I just set the receiving to true before qio_channel_writev_all() in nbd_co_send_request()
to ensure that the request coroutines can be awakened by nbd_recv_coroutine_wake_one(). But that's just a test.

The issue is that only waking up the request coroutine or shutting down the QIO channel is not enough, because there is a synchronization issue.
For example, when the NBD request coroutine is awakened in qio_channel_writev_full_all(), it will continue to access ioc->xxx,
but the ioc has already been set to NULL in nbd_co_do_establish_connection(); this causes a segmentation fault.

Since this is caused by QIO yielding at a point the upper layer is not aware of, how about we don't yield in QIO?
That is, when the request can't be sent, QIO could return to the caller directly, so the caller can decide whether to yield or not.
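
A rough sketch of how that could look on the NBD side (nbd_co_writev_noyield() and nbd_co_wait_for_write() are invented names for illustration only; qio_channel_writev_full() is the existing non-yielding primitive that already reports QIO_CHANNEL_ERR_BLOCK without sleeping):

#include "qemu/osdep.h"
#include "qemu/coroutine.h"
#include "qemu/iov.h"
#include "io/channel.h"

/*
 * Sketch only: the channel layer never yields; the NBD code owns the
 * yield point, so its existing wakeup paths (nbd_recv_coroutines_wake(),
 * channel shutdown) can always reach the sleeping coroutine.
 */
static int coroutine_fn nbd_co_writev_noyield(QIOChannel *ioc,
                                              const struct iovec *iov,
                                              size_t niov)
{
    g_autofree struct iovec *local_iov = g_new(struct iovec, niov);
    struct iovec *cur = local_iov;
    unsigned int nleft = niov;

    iov_copy(local_iov, niov, iov, niov, 0, iov_size(iov, niov));

    while (nleft > 0) {
        ssize_t len = qio_channel_writev_full(ioc, cur, nleft, NULL, 0, NULL);

        if (len == QIO_CHANNEL_ERR_BLOCK) {
            /* Hypothetical NBD-side helper: register wakeup state, then yield. */
            nbd_co_wait_for_write(ioc);
            continue;
        }
        if (len < 0) {
            return -EIO;
        }
        iov_discard_front(&cur, &nleft, len);
    }
    return 0;
}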

Thanks,
Lei

> 
>> Thanks
>> Lei
>>
>>>           yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
>>>                                    nbd_yank, s->bs); @@ -315,6 +317,7
>>> @@ int coroutine_fn nbd_co_do_establish_connection(BlockDriverState
>>> *bs,
>>>       /* successfully connected */
>>>       s->state = NBD_CLIENT_CONNECTED;
>>> +    qio_channel_set_force_quit(s->ioc, false);
>>>       qemu_co_queue_restart_all(&s->free_sema);
>>>       return 0;
>>> @@ -345,6 +348,8 @@ static coroutine_fn void
>>> nbd_reconnect_attempt(BDRVNBDState *s)
>>>       /* Finalize previous connection if any */
>>>       if (s->ioc) {
>>> +        qio_channel_set_force_quit(s->ioc, true);
>>> +        qio_channel_coroutines_wake(s->ioc);
>>
>> Isn't this code path just missing a qio_channel_shutdown call or a qio_channel_close call to make the socket trigger wakeup from poll.
>>
>> I don't think it can solve the bug even if it is added, the reason is as above.
>> Thanks,
>> Lei
>>
>>>           qio_channel_detach_aio_context(QIO_CHANNEL(s->ioc));
>>>           yank_unregister_function(BLOCKDEV_YANK_INSTANCE(s->bs->node_name),
>>>                                    nbd_yank, s->bs); diff --git
>>> a/include/io/channel.h b/include/io/channel.h index
>>> 88988979f8..bc5af8cdd6 100644
>>> --- a/include/io/channel.h
>>> +++ b/include/io/channel.h
>>> @@ -78,6 +78,7 @@ struct QIOChannel {
>>>       AioContext *ctx;
>>>       Coroutine *read_coroutine;
>>>       Coroutine *write_coroutine;
>>> +    bool force_quit;
>>>   #ifdef _WIN32
>>>       HANDLE event; /* For use with GSource on Win32 */  #endif @@
>>> -484,6 +485,24 @@ int qio_channel_set_blocking(QIOChannel *ioc,
>>>                                bool enabled,
>>>                                Error **errp);
>>> +/**
>>> + * qio_channel_force_quit:
>>> + * @ioc: the channel object
>>> + * @quit: the new flag state
>>> + *
>>> + * Set the new flag state
>>> + */
>>> +void qio_channel_set_force_quit(QIOChannel *ioc, bool quit);
>>> +
>>> +/**
>>> + * qio_channel_coroutines_wake:
>>> + * @ioc: the channel object
>>> + *
>>> + * Wake up the coroutines to ensure that they will exit normally
>>> + * when the server terminated abnormally  */ void
>>> +qio_channel_coroutines_wake(QIOChannel *ioc);
>>> +
>>>   /**
>>>    * qio_channel_close:
>>>    * @ioc: the channel object
>>> diff --git a/io/channel.c b/io/channel.c index e8b019dc36..bc1a9e055c
>>> 100644
>>> --- a/io/channel.c
>>> +++ b/io/channel.c
>>> @@ -136,6 +136,9 @@ int qio_channel_readv_full_all_eof(QIOChannel *ioc,
>>>           if (len == QIO_CHANNEL_ERR_BLOCK) {
>>>               if (qemu_in_coroutine()) {
>>>                   qio_channel_yield(ioc, G_IO_IN);
>>> +                if (ioc->force_quit) {
>>> +                    goto cleanup;
>>> +                }
>>>               } else {
>>>                   qio_channel_wait(ioc, G_IO_IN);
>>>               }
>>> @@ -242,6 +245,9 @@ int qio_channel_writev_full_all(QIOChannel *ioc,
>>>           if (len == QIO_CHANNEL_ERR_BLOCK) {
>>>               if (qemu_in_coroutine()) {
>>>                   qio_channel_yield(ioc, G_IO_OUT);
>>> +                if (ioc->force_quit) {
>>> +                    goto cleanup;
>>> +                }
>>>               } else {
>>>                   qio_channel_wait(ioc, G_IO_OUT);
>>>               }
>>> @@ -543,6 +549,9 @@ void coroutine_fn qio_channel_yield(QIOChannel *ioc,
>>>       }
>>>       qio_channel_set_aio_fd_handlers(ioc);
>>>       qemu_coroutine_yield();
>>> +    if (ioc->force_quit) {
>>> +        return;
>>> +    }
>>>       /* Allow interrupting the operation by reentering the coroutine other than
>>>        * through the aio_fd_handlers. */ @@ -555,6 +564,19 @@ void
>>> coroutine_fn qio_channel_yield(QIOChannel *ioc,
>>>       }
>>>   }
>>> +void qio_channel_set_force_quit(QIOChannel *ioc, bool quit) {
>>> +    ioc->force_quit = quit;
>>> +}
>>> +
>>> +void qio_channel_coroutines_wake(QIOChannel *ioc) {
>>> +    if (ioc->read_coroutine) {
>>> +        qio_channel_restart_read(ioc);
>>> +    } else if (ioc->write_coroutine) {
>>> +        qio_channel_restart_write(ioc);
>>> +    }
>>> +}
>>
>> None of this should be needed AFAICT, as the poll/coroutine code shuld already break out of poll if the socket is closed, or is marked shutdown.
>>
>>
>> Regards,
>> Daniel
>>
> 
> 

Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
Posted by Daniel P. Berrangé 2 years, 4 months ago
On Thu, Dec 02, 2021 at 01:14:47PM +0800, Rao, Lei wrote:
> Sorry, resending with correct indentation and quoting.
> 
> On 12/1/2021 10:27 PM, Vladimir Sementsov-Ogievskiy wrote:
> > 01.12.2021 12:48, Rao, Lei wrote:
> > > 
> > > 
> > > -----Original Message-----
> > > From: Daniel P. Berrangé <berrange@redhat.com>
> > > Sent: Wednesday, December 1, 2021 5:11 PM
> > > To: Rao, Lei <lei.rao@intel.com>
> > > Cc: Zhang, Chen <chen.zhang@intel.com>; eblake@redhat.com; vsementsov@virtuozzo.com; kwolf@redhat.com; hreitz@redhat.com; qemu-block@nongnu.org; qemu-devel@nongnu.org
> > > Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
> > > 
> > > On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote:
> > > >      We found that the QIO channel coroutine could not be awakened in some corner cases during our stress test for COLO.
> > > >      The patch fixes as follow:
> > > >          #0  0x00007fad72e24bf6 in __ppoll (fds=0x5563d75861f0, nfds=1, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
> > > >          #1  0x00005563d6977c68 in qemu_poll_ns (fds=0x5563d75861f0, nfds=1, timeout=599999697630) at util/qemu-timer.c:348
> > > >          #2  0x00005563d697ac44 in aio_poll (ctx=0x5563d755dfa0, blocking=true) at util/aio-posix.c:669
> > > >          #3  0x00005563d68ba24f in bdrv_do_drained_begin (bs=0x5563d75ea0c0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:432
> > > >          #4  0x00005563d68ba338 in bdrv_drained_begin (bs=0x5563d75ea0c0) at block/io.c:438
> > > >          #5  0x00005563d689c6d2 in quorum_del_child (bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block/quorum.c:1063
> > > >          #6  0x00005563d684328f in bdrv_del_child (parent_bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block.c:6568
> > > >          #7  0x00005563d658499e in qmp_x_blockdev_change (parent=0x5563d80ec4c0 "colo-disk0", has_child=true, child=0x5563d7577d30 "children.1", has_node=false, node=0x0,errp=0x7fff3cf5b960) at blockdev.c:4494
> > > >          #8  0x00005563d67e8b4e in qmp_marshal_x_blockdev_change (args=0x7fad6400ada0, ret=0x7fff3cf5b9f8, errp=0x7fff3cf5b9f0) at qapi/qapi-commands-block-core.c:1538
> > > >          #9  0x00005563d691cd9e in do_qmp_dispatch (cmds=0x5563d719a4f0 <qmp_commands>, request=0x7fad64009d80, allow_oob=true, errp=0x7fff3cf5ba98)
> > > >              at qapi/qmp-dispatch.c:132
> > > >          #10 0x00005563d691cfab in qmp_dispatch (cmds=0x5563d719a4f0 <qmp_commands>, request=0x7fad64009d80, allow_oob=true) at qapi/qmp-dispatch.c:175
> > > >          #11 0x00005563d67b7787 in monitor_qmp_dispatch (mon=0x5563d7605d40, req=0x7fad64009d80) at monitor/qmp.c:145
> > > >          #12 0x00005563d67b7b24 in monitor_qmp_bh_dispatcher (data=0x0) at monitor/qmp.c:234
> > > >          #13 0x00005563d69754c2 in aio_bh_call (bh=0x5563d7473230) at util/async.c:89
> > > >          #14 0x00005563d697555e in aio_bh_poll (ctx=0x5563d7471da0) at util/async.c:117
> > > >          #15 0x00005563d697a423 in aio_dispatch (ctx=0x5563d7471da0) at util/aio-posix.c:459
> > > >          #16 0x00005563d6975917 in aio_ctx_dispatch (source=0x5563d7471da0, callback=0x0, user_data=0x0) at util/async.c:260
> > > >          #17 0x00007fad730e1fbd in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
> > > >          #18 0x00005563d6978cda in glib_pollfds_poll () at util/main-loop.c:219
> > > >          #19 0x00005563d6978d58 in os_host_main_loop_wait (timeout=977650) at util/main-loop.c:242
> > > >          #20 0x00005563d6978e69 in main_loop_wait (nonblocking=0) at util/main-loop.c:518
> > > >          #21 0x00005563d658f5ed in main_loop () at vl.c:1814
> > > >          #22 0x00005563d6596bb7 in main (argc=61, argv=0x7fff3cf5c0c8,
> > > > envp=0x7fff3cf5c2b8) at vl.c:450
> > > > 
> > > >      From the call trace, we can see that the QEMU main thread is waiting for the in_flight return to zero. But the in_filght never reaches 0.
> > > >      After analysis, the root cause is that the coroutine of NBD was not awakened. Although this bug was found in the COLO test, I think this is a
> > > >      universal problem in the QIO module. This issue also affects other modules depending on QIO such as NBD. We dump the following data:
> > > >      $2 = {
> > > >        in_flight = 2,
> > > >        state = NBD_CLIENT_QUIT,
> > > >        connect_status = 0,
> > > >        connect_err = 0x0,
> > > >        wait_in_flight = false,
> > > >        requests = {{
> > > >            coroutine = 0x0,
> > > >            offset = 273077071872,
> > > >            receiving = false,
> > > >          }, {
> > > >            coroutine = 0x7f1164571df0,
> > > >            offset = 498792529920,
> > > >            receiving = false,
> > > >          }, {
> > > >            coroutine = 0x0,
> > > >            offset = 500663590912,
> > > >            receiving = false,
> > > >          }, {
> > > >            ...
> > > >          } },
> > > >        reply = {
> > > >          simple = {
> > > >            magic = 1732535960,
> > > >            error = 0,
> > > >            handle = 0
> > > >          },
> > > >          structured = {
> > > >            magic = 1732535960,
> > > >            flags = 0,
> > > >            type = 0,
> > > >            handle = 0,
> > > >            length = 0
> > > >          },
> > > >          {
> > > >            magic = 1732535960,
> > > >            _skip = 0,
> > > >            handle = 0
> > > >          }
> > > >        },
> > > >        bs = 0x7f11640e2140,
> > > >        reconnect_delay = 0,
> > > >        saddr = 0x7f11640e1f80,
> > > >        export = 0x7f11640dc560 "parent0",
> > > >      }
> > > >      From the data, we can see the coroutine of NBD does not exit normally when killing the NBD server(we kill the Secondary VM in the COLO stress test).
> > > >      Then it will not execute in_flight--. So, the QEMU main thread will hang here. Further analysis, I found the coroutine of NBD will yield
> > > >      in nbd_send_request() or qio_channel_write_all() in nbd_co_send_request. By the way, the yield is due to the kernel return EAGAIN(under the stress test).
> > > >      However, NBD didn't know it would yield here. So, the nbd_recv_coroutines_wake won't wake it up because req->receiving is false. if the NBD server
> > > >      is terminated abnormally at the same time. The G_IO_OUT will be invalid, the coroutine will never be awakened. In addition, the s->ioc will be released
> > > >      immediately. if we force to wake up the coroutine of NBD, access to the ioc->xxx will cause segment fault. Finally, I add a state named force_quit to
> > > >      the QIOChannel to ensure that QIO will exit immediately. And I add the function of qio_channel_coroutines_wake to wake it up.
> > 
> > If qio_channel_shutdown() can't kill request that is in qio_channel_write_all() - it's not a reponsibility of nbd driver, qio channel layer should take care of this..
> > 
> > Or, you probably need some keep-alive settings set up.
> > 
> > > > 
> > > > Signed-off-by: Lei Rao <lei.rao@intel.com>
> > > > Signed-off-by: Zhang Chen <chen.zhang@intel.com>
> > > > ---
> > > >   block/nbd.c          |  5 +++++
> > > >   include/io/channel.h | 19 +++++++++++++++++++
> > > >   io/channel.c         | 22 ++++++++++++++++++++++
> > > >   3 files changed, 46 insertions(+)
> > > > 
> > > > diff --git a/block/nbd.c b/block/nbd.c index 5ef462db1b..5ee4eaaf57
> > > > 100644
> > > > --- a/block/nbd.c
> > > > +++ b/block/nbd.c
> > > > @@ -208,6 +208,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
> > > >       assert(!s->in_flight);
> > > >       if (s->ioc) {
> > > > +        qio_channel_set_force_quit(s->ioc, true);
> > > > +        qio_channel_coroutines_wake(s->ioc);
> > > >           qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH,
> > > > NULL);
> > > 
> > > Calling shutdown here should already be causing the qio_chanel_readv_all to wakeup and break out of its
> > > poll() loop. We shouldn't need to add a second way to break out of the poll().
> > > 
> > > Calling shutdown can wake up the coroutines which is polling. But I think it's not enough. I tried to forcibly wake up the NBD coroutine,
> > > It may cause segment fault. The root cause is that it will continue to access ioc->xxx in qio_channel_yield(), but the ioc has been released and set it NULL such as in
> > > nbd_co_do_establish_connection(); I think call shutdown will have the same result. So, I add the force_quit, once set it true, it will immediately exit without accessing IOC.
> > > 
> > 
> > What do you mean by "the NBD coroutine" and by "forcibly wake up"?
> > 
> > In recent Qemu there is no specific NBD coroutine. Only request coroutines should exist.
> > 
> > Forcibly waking up any coroutine is generally unsafe in Qemu, most of the code is not prepared for this, crash is normal result for such try.
> > 
> > Also, there is good mechanism to debug coroutines in gdb:
> > 
> > source scripts/qemu-gdb.py
> > qemu coroutine <coroutine pointer>
> > 
> > - it will dump stack trace of a coroutine, it may help.
> > 
> > Also, to find coroutines look at bs->tracked_requests list, all requests of bs are in the list with coroutine pointers in .co field.
> 
> I am sorry for the unclear description. The NBD coroutine means the request coroutines.
> 
> About "the forcibly wake up" I just set the receiving to true before qio_channel_writev_all() in nbd_co_send_request()
> to ensure that the request coroutines can be awakened by nbd_recv_coroutine_wake_one(). But that's just a test.
> 
> The issue is, only waking up the request coroutine or shutdown the QIO is not enough because there will be a synchronization issue.
> For example, When the NBD request coroutine is awakened in qio_channel_writev_full_all(), it will continue to access the ioc->xxx,
> but the ioc has been set to NULL in nbd_co_do_establish_connection(); it will cause segment fault.

That is a serious bug. Nothing is permitted to free the QIOChannel while a
qio_channel_writev_full_all API call (or any other I/O call) is in progress.

If multiple threads are involved, then each needs to be holding a reference
to the QIOChannel to stop it being freed while in use.
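
A minimal sketch of that rule on the caller side (nbd_co_writev_with_ref() is an invented name for illustration; object_ref()/object_unref() are the usual QOM reference helpers):

#include "qemu/osdep.h"
#include "io/channel.h"

/*
 * Sketch only: pin the channel for the duration of the I/O call, so a
 * concurrent teardown path that drops s->ioc cannot free the object
 * while a coroutine is still inside qio_channel_writev_full_all().
 */
static int nbd_co_writev_with_ref(QIOChannel *ioc,
                                  const struct iovec *iov, size_t niov,
                                  Error **errp)
{
    int ret;

    object_ref(OBJECT(ioc));
    ret = qio_channel_writev_full_all(ioc, iov, niov, NULL, 0, errp);
    object_unref(OBJECT(ioc));

    return ret;
}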

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
Posted by Vladimir Sementsov-Ogievskiy 2 years, 4 months ago
02.12.2021 11:53, Daniel P. Berrangé wrote:
> On Thu, Dec 02, 2021 at 01:14:47PM +0800, Rao, Lei wrote:
>> Sorry, resending with correct indentation and quoting.
>>
>> On 12/1/2021 10:27 PM, Vladimir Sementsov-Ogievskiy wrote:
>>> 01.12.2021 12:48, Rao, Lei wrote:
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>>> Sent: Wednesday, December 1, 2021 5:11 PM
>>>> To: Rao, Lei <lei.rao@intel.com>
>>>> Cc: Zhang, Chen <chen.zhang@intel.com>; eblake@redhat.com; vsementsov@virtuozzo.com; kwolf@redhat.com; hreitz@redhat.com; qemu-block@nongnu.org; qemu-devel@nongnu.org
>>>> Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
>>>>
>>>> On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote:
>>>>>       We found that the QIO channel coroutine could not be awakened in some corner cases during our stress test for COLO.
>>>>>       The patch fixes as follow:
>>>>>           #0  0x00007fad72e24bf6 in __ppoll (fds=0x5563d75861f0, nfds=1, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
>>>>>           #1  0x00005563d6977c68 in qemu_poll_ns (fds=0x5563d75861f0, nfds=1, timeout=599999697630) at util/qemu-timer.c:348
>>>>>           #2  0x00005563d697ac44 in aio_poll (ctx=0x5563d755dfa0, blocking=true) at util/aio-posix.c:669
>>>>>           #3  0x00005563d68ba24f in bdrv_do_drained_begin (bs=0x5563d75ea0c0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:432
>>>>>           #4  0x00005563d68ba338 in bdrv_drained_begin (bs=0x5563d75ea0c0) at block/io.c:438
>>>>>           #5  0x00005563d689c6d2 in quorum_del_child (bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block/quorum.c:1063
>>>>>           #6  0x00005563d684328f in bdrv_del_child (parent_bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block.c:6568
>>>>>           #7  0x00005563d658499e in qmp_x_blockdev_change (parent=0x5563d80ec4c0 "colo-disk0", has_child=true, child=0x5563d7577d30 "children.1", has_node=false, node=0x0,errp=0x7fff3cf5b960) at blockdev.c:4494
>>>>>           #8  0x00005563d67e8b4e in qmp_marshal_x_blockdev_change (args=0x7fad6400ada0, ret=0x7fff3cf5b9f8, errp=0x7fff3cf5b9f0) at qapi/qapi-commands-block-core.c:1538
>>>>>           #9  0x00005563d691cd9e in do_qmp_dispatch (cmds=0x5563d719a4f0 <qmp_commands>, request=0x7fad64009d80, allow_oob=true, errp=0x7fff3cf5ba98)
>>>>>               at qapi/qmp-dispatch.c:132
>>>>>           #10 0x00005563d691cfab in qmp_dispatch (cmds=0x5563d719a4f0 <qmp_commands>, request=0x7fad64009d80, allow_oob=true) at qapi/qmp-dispatch.c:175
>>>>>           #11 0x00005563d67b7787 in monitor_qmp_dispatch (mon=0x5563d7605d40, req=0x7fad64009d80) at monitor/qmp.c:145
>>>>>           #12 0x00005563d67b7b24 in monitor_qmp_bh_dispatcher (data=0x0) at monitor/qmp.c:234
>>>>>           #13 0x00005563d69754c2 in aio_bh_call (bh=0x5563d7473230) at util/async.c:89
>>>>>           #14 0x00005563d697555e in aio_bh_poll (ctx=0x5563d7471da0) at util/async.c:117
>>>>>           #15 0x00005563d697a423 in aio_dispatch (ctx=0x5563d7471da0) at util/aio-posix.c:459
>>>>>           #16 0x00005563d6975917 in aio_ctx_dispatch (source=0x5563d7471da0, callback=0x0, user_data=0x0) at util/async.c:260
>>>>>           #17 0x00007fad730e1fbd in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>>>>           #18 0x00005563d6978cda in glib_pollfds_poll () at util/main-loop.c:219
>>>>>           #19 0x00005563d6978d58 in os_host_main_loop_wait (timeout=977650) at util/main-loop.c:242
>>>>>           #20 0x00005563d6978e69 in main_loop_wait (nonblocking=0) at util/main-loop.c:518
>>>>>           #21 0x00005563d658f5ed in main_loop () at vl.c:1814
>>>>>           #22 0x00005563d6596bb7 in main (argc=61, argv=0x7fff3cf5c0c8,
>>>>> envp=0x7fff3cf5c2b8) at vl.c:450
>>>>>
>>>>>       From the call trace, we can see that the QEMU main thread is waiting for the in_flight return to zero. But the in_filght never reaches 0.
>>>>>       After analysis, the root cause is that the coroutine of NBD was not awakened. Although this bug was found in the COLO test, I think this is a
>>>>>       universal problem in the QIO module. This issue also affects other modules depending on QIO such as NBD. We dump the following data:
>>>>>       $2 = {
>>>>>         in_flight = 2,
>>>>>         state = NBD_CLIENT_QUIT,
>>>>>         connect_status = 0,
>>>>>         connect_err = 0x0,
>>>>>         wait_in_flight = false,
>>>>>         requests = {{
>>>>>             coroutine = 0x0,
>>>>>             offset = 273077071872,
>>>>>             receiving = false,
>>>>>           }, {
>>>>>             coroutine = 0x7f1164571df0,
>>>>>             offset = 498792529920,
>>>>>             receiving = false,
>>>>>           }, {
>>>>>             coroutine = 0x0,
>>>>>             offset = 500663590912,
>>>>>             receiving = false,
>>>>>           }, {
>>>>>             ...
>>>>>           } },
>>>>>         reply = {
>>>>>           simple = {
>>>>>             magic = 1732535960,
>>>>>             error = 0,
>>>>>             handle = 0
>>>>>           },
>>>>>           structured = {
>>>>>             magic = 1732535960,
>>>>>             flags = 0,
>>>>>             type = 0,
>>>>>             handle = 0,
>>>>>             length = 0
>>>>>           },
>>>>>           {
>>>>>             magic = 1732535960,
>>>>>             _skip = 0,
>>>>>             handle = 0
>>>>>           }
>>>>>         },
>>>>>         bs = 0x7f11640e2140,
>>>>>         reconnect_delay = 0,
>>>>>         saddr = 0x7f11640e1f80,
>>>>>         export = 0x7f11640dc560 "parent0",
>>>>>       }
>>>>>       From the data, we can see the coroutine of NBD does not exit normally when killing the NBD server(we kill the Secondary VM in the COLO stress test).
>>>>>       Then it will not execute in_flight--. So, the QEMU main thread will hang here. Further analysis, I found the coroutine of NBD will yield
>>>>>       in nbd_send_request() or qio_channel_write_all() in nbd_co_send_request. By the way, the yield is due to the kernel return EAGAIN(under the stress test).
>>>>>       However, NBD didn't know it would yield here. So, the nbd_recv_coroutines_wake won't wake it up because req->receiving is false. if the NBD server
>>>>>       is terminated abnormally at the same time. The G_IO_OUT will be invalid, the coroutine will never be awakened. In addition, the s->ioc will be released
>>>>>       immediately. if we force to wake up the coroutine of NBD, access to the ioc->xxx will cause segment fault. Finally, I add a state named force_quit to
>>>>>       the QIOChannel to ensure that QIO will exit immediately. And I add the function of qio_channel_coroutines_wake to wake it up.
>>>
>>> If qio_channel_shutdown() can't kill request that is in qio_channel_write_all() - it's not a reponsibility of nbd driver, qio channel layer should take care of this..
>>>
>>> Or, you probably need some keep-alive settings set up.
>>>
>>>>>
>>>>> Signed-off-by: Lei Rao <lei.rao@intel.com>
>>>>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
>>>>> ---
>>>>>    block/nbd.c          |  5 +++++
>>>>>    include/io/channel.h | 19 +++++++++++++++++++
>>>>>    io/channel.c         | 22 ++++++++++++++++++++++
>>>>>    3 files changed, 46 insertions(+)
>>>>>
>>>>> diff --git a/block/nbd.c b/block/nbd.c index 5ef462db1b..5ee4eaaf57
>>>>> 100644
>>>>> --- a/block/nbd.c
>>>>> +++ b/block/nbd.c
>>>>> @@ -208,6 +208,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>>>>>        assert(!s->in_flight);
>>>>>        if (s->ioc) {
>>>>> +        qio_channel_set_force_quit(s->ioc, true);
>>>>> +        qio_channel_coroutines_wake(s->ioc);
>>>>>            qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH,
>>>>> NULL);
>>>>
>>>> Calling shutdown here should already be causing the qio_chanel_readv_all to wakeup and break out of its
>>>> poll() loop. We shouldn't need to add a second way to break out of the poll().
>>>>
>>>> Calling shutdown can wake up the coroutines which is polling. But I think it's not enough. I tried to forcibly wake up the NBD coroutine,
>>>> It may cause segment fault. The root cause is that it will continue to access ioc->xxx in qio_channel_yield(), but the ioc has been released and set it NULL such as in
>>>> nbd_co_do_establish_connection(); I think call shutdown will have the same result. So, I add the force_quit, once set it true, it will immediately exit without accessing IOC.
>>>>
>>>
>>> What do you mean by "the NBD coroutine" and by "forcibly wake up"?
>>>
>>> In recent Qemu there is no specific NBD coroutine. Only request coroutines should exist.
>>>
>>> Forcibly waking up any coroutine is generally unsafe in Qemu, most of the code is not prepared for this, crash is normal result for such try.
>>>
>>> Also, there is good mechanism to debug coroutines in gdb:
>>>
>>> source scripts/qemu-gdb.py
>>> qemu coroutine <coroutine pointer>
>>>
>>> - it will dump stack trace of a coroutine, it may help.
>>>
>>> Also, to find coroutines look at bs->tracked_requests list, all requests of bs are in the list with coroutine pointers in .co field.
>>
>> I am sorry for the unclear description. The NBD coroutine means the request coroutines.
>>
>> About "the forcibly wake up" I just set the receiving to true before qio_channel_writev_all() in nbd_co_send_request()
>> to ensure that the request coroutines can be awakened by nbd_recv_coroutine_wake_one(). But that's just a test.

I'm not sure that it's safe. On nbd_channel_error() we call qio_channel_shutdown() - it should wake up everything that sleeps in the qio_channel code. The .receiving field is only for the nbd.c-specific internal yield point.

Whether your change itself could lead to the crash, I'm not sure either.

Still, on normal execution paths there shouldn't be a situation where some qio_channel_writev_full_all() is still executing during nbd_co_do_establish_connection(). If there is, it's a bug in nbd.c, of course.

To check it without changing the .receiving field, you should print all stacks of the NBD request coroutines at the point of the QEMU hang. What are they all doing?

If one of them is in nbd_co_do_establish_connection() and another is still in qio_channel_writev_full_all() - that's a bug.

>>
>> The issue is, only waking up the request coroutine or shutdown the QIO is not enough because there will be a synchronization issue.
>> For example, When the NBD request coroutine is awakened in qio_channel_writev_full_all(), it will continue to access the ioc->xxx,
>> but the ioc has been set to NULL in nbd_co_do_establish_connection(); it will cause segment fault.

Could you provide a call stack of this segmentation fault? And the coroutine stack of the coroutine that is in qio_channel_writev_full_all() at that moment? And the value of s->in_flight?

> 
> That is a serious bug. Nothing is permitted to free the QIOChannel while a
> qio_channel_writev_full_all API call (or any other I/O call is in progress).

I understand this. There are only two paths that call nbd_co_do_establish_connection():

The first path is nbd_co_send_request() -> nbd_reconnect_attempt() -> nbd_co_do_establish_connection().

In that case, in nbd_co_send_request() we should have been waiting until s->in_flight becomes 0, and we are under the mutex. So there should not be any parallel qio_channel_writev_full_all() or any other qio_channel requests.
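
Roughly the shape of that guard (a paraphrase for illustration, not the exact block/nbd.c code):

    qemu_co_mutex_lock(&s->send_mutex);
    while (s->in_flight > 0) {
        /* wait for every other request coroutine to leave the channel */
        qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
    }
    if (s->state != NBD_CLIENT_CONNECTED) {
        /* only now may the connection, and with it s->ioc, be replaced */
        nbd_reconnect_attempt(s);
    }
    qemu_co_mutex_unlock(&s->send_mutex);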

Another path is nbd_open() -> nbd_do_establish_connection(). Here there certainly can't be any requests yet.

> 
> If multiple threads are involved, then each needs to be holding a reference
> the QIOChannel to stop it being freed while in use.
> 



-- 
Best regards,
Vladimir

Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
Posted by Rao, Lei 2 years, 4 months ago

On 12/2/2021 5:54 PM, Vladimir Sementsov-Ogievskiy wrote:
> 02.12.2021 11:53, Daniel P. Berrangé wrote:
>> On Thu, Dec 02, 2021 at 01:14:47PM +0800, Rao, Lei wrote:
>>> Sorry, resending with correct indentation and quoting.
>>>
>>> On 12/1/2021 10:27 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>> 01.12.2021 12:48, Rao, Lei wrote:
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>>>> Sent: Wednesday, December 1, 2021 5:11 PM
>>>>> To: Rao, Lei <lei.rao@intel.com>
>>>>> Cc: Zhang, Chen <chen.zhang@intel.com>; eblake@redhat.com; vsementsov@virtuozzo.com; kwolf@redhat.com; hreitz@redhat.com; qemu-block@nongnu.org; qemu-devel@nongnu.org
>>>>> Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
>>>>>
>>>>> On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote:
>>>>>>       We found that the QIO channel coroutine could not be awakened in some corner cases during our stress test for COLO.
>>>>>>       The patch fixes as follow:
>>>>>>           #0  0x00007fad72e24bf6 in __ppoll (fds=0x5563d75861f0, nfds=1, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
>>>>>>           #1  0x00005563d6977c68 in qemu_poll_ns (fds=0x5563d75861f0, nfds=1, timeout=599999697630) at util/qemu-timer.c:348
>>>>>>           #2  0x00005563d697ac44 in aio_poll (ctx=0x5563d755dfa0, blocking=true) at util/aio-posix.c:669
>>>>>>           #3  0x00005563d68ba24f in bdrv_do_drained_begin (bs=0x5563d75ea0c0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:432
>>>>>>           #4  0x00005563d68ba338 in bdrv_drained_begin (bs=0x5563d75ea0c0) at block/io.c:438
>>>>>>           #5  0x00005563d689c6d2 in quorum_del_child (bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block/quorum.c:1063
>>>>>>           #6  0x00005563d684328f in bdrv_del_child (parent_bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block.c:6568
>>>>>>           #7  0x00005563d658499e in qmp_x_blockdev_change (parent=0x5563d80ec4c0 "colo-disk0", has_child=true, child=0x5563d7577d30 "children.1", has_node=false, node=0x0,errp=0x7fff3cf5b960) at blockdev.c:4494
>>>>>>           #8  0x00005563d67e8b4e in qmp_marshal_x_blockdev_change (args=0x7fad6400ada0, ret=0x7fff3cf5b9f8, errp=0x7fff3cf5b9f0) at qapi/qapi-commands-block-core.c:1538
>>>>>>           #9  0x00005563d691cd9e in do_qmp_dispatch (cmds=0x5563d719a4f0 <qmp_commands>, request=0x7fad64009d80, allow_oob=true, errp=0x7fff3cf5ba98)
>>>>>>               at qapi/qmp-dispatch.c:132
>>>>>>           #10 0x00005563d691cfab in qmp_dispatch (cmds=0x5563d719a4f0 <qmp_commands>, request=0x7fad64009d80, allow_oob=true) at qapi/qmp-dispatch.c:175
>>>>>>           #11 0x00005563d67b7787 in monitor_qmp_dispatch (mon=0x5563d7605d40, req=0x7fad64009d80) at monitor/qmp.c:145
>>>>>>           #12 0x00005563d67b7b24 in monitor_qmp_bh_dispatcher (data=0x0) at monitor/qmp.c:234
>>>>>>           #13 0x00005563d69754c2 in aio_bh_call (bh=0x5563d7473230) at util/async.c:89
>>>>>>           #14 0x00005563d697555e in aio_bh_poll (ctx=0x5563d7471da0) at util/async.c:117
>>>>>>           #15 0x00005563d697a423 in aio_dispatch (ctx=0x5563d7471da0) at util/aio-posix.c:459
>>>>>>           #16 0x00005563d6975917 in aio_ctx_dispatch (source=0x5563d7471da0, callback=0x0, user_data=0x0) at util/async.c:260
>>>>>>           #17 0x00007fad730e1fbd in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>>>>>           #18 0x00005563d6978cda in glib_pollfds_poll () at util/main-loop.c:219
>>>>>>           #19 0x00005563d6978d58 in os_host_main_loop_wait (timeout=977650) at util/main-loop.c:242
>>>>>>           #20 0x00005563d6978e69 in main_loop_wait (nonblocking=0) at util/main-loop.c:518
>>>>>>           #21 0x00005563d658f5ed in main_loop () at vl.c:1814
>>>>>>           #22 0x00005563d6596bb7 in main (argc=61, argv=0x7fff3cf5c0c8,
>>>>>> envp=0x7fff3cf5c2b8) at vl.c:450
>>>>>>
>>>>>>       From the call trace, we can see that the QEMU main thread is waiting for the in_flight return to zero. But the in_filght never reaches 0.
>>>>>>       After analysis, the root cause is that the coroutine of NBD was not awakened. Although this bug was found in the COLO test, I think this is a
>>>>>>       universal problem in the QIO module. This issue also affects other modules depending on QIO such as NBD. We dump the following data:
>>>>>>       $2 = {
>>>>>>         in_flight = 2,
>>>>>>         state = NBD_CLIENT_QUIT,
>>>>>>         connect_status = 0,
>>>>>>         connect_err = 0x0,
>>>>>>         wait_in_flight = false,
>>>>>>         requests = {{
>>>>>>             coroutine = 0x0,
>>>>>>             offset = 273077071872,
>>>>>>             receiving = false,
>>>>>>           }, {
>>>>>>             coroutine = 0x7f1164571df0,
>>>>>>             offset = 498792529920,
>>>>>>             receiving = false,
>>>>>>           }, {
>>>>>>             coroutine = 0x0,
>>>>>>             offset = 500663590912,
>>>>>>             receiving = false,
>>>>>>           }, {
>>>>>>             ...
>>>>>>           } },
>>>>>>         reply = {
>>>>>>           simple = {
>>>>>>             magic = 1732535960,
>>>>>>             error = 0,
>>>>>>             handle = 0
>>>>>>           },
>>>>>>           structured = {
>>>>>>             magic = 1732535960,
>>>>>>             flags = 0,
>>>>>>             type = 0,
>>>>>>             handle = 0,
>>>>>>             length = 0
>>>>>>           },
>>>>>>           {
>>>>>>             magic = 1732535960,
>>>>>>             _skip = 0,
>>>>>>             handle = 0
>>>>>>           }
>>>>>>         },
>>>>>>         bs = 0x7f11640e2140,
>>>>>>         reconnect_delay = 0,
>>>>>>         saddr = 0x7f11640e1f80,
>>>>>>         export = 0x7f11640dc560 "parent0",
>>>>>>       }
>>>>>>       From the data, we can see the coroutine of NBD does not exit normally when killing the NBD server(we kill the Secondary VM in the COLO stress test).
>>>>>>       Then it will not execute in_flight--. So, the QEMU main thread will hang here. Further analysis, I found the coroutine of NBD will yield
>>>>>>       in nbd_send_request() or qio_channel_write_all() in nbd_co_send_request. By the way, the yield is due to the kernel return EAGAIN(under the stress test).
>>>>>>       However, NBD didn't know it would yield here. So, the nbd_recv_coroutines_wake won't wake it up because req->receiving is false. if the NBD server
>>>>>>       is terminated abnormally at the same time. The G_IO_OUT will be invalid, the coroutine will never be awakened. In addition, the s->ioc will be released
>>>>>>       immediately. if we force to wake up the coroutine of NBD, access to the ioc->xxx will cause segment fault. Finally, I add a state named force_quit to
>>>>>>       the QIOChannel to ensure that QIO will exit immediately. And I add the function of qio_channel_coroutines_wake to wake it up.
>>>>
>>>> If qio_channel_shutdown() can't kill request that is in qio_channel_write_all() - it's not a reponsibility of nbd driver, qio channel layer should take care of this..
>>>>
>>>> Or, you probably need some keep-alive settings set up.
>>>>
>>>>>>
>>>>>> Signed-off-by: Lei Rao <lei.rao@intel.com>
>>>>>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
>>>>>> ---
>>>>>>    block/nbd.c          |  5 +++++
>>>>>>    include/io/channel.h | 19 +++++++++++++++++++
>>>>>>    io/channel.c         | 22 ++++++++++++++++++++++
>>>>>>    3 files changed, 46 insertions(+)
>>>>>>
>>>>>> diff --git a/block/nbd.c b/block/nbd.c index 5ef462db1b..5ee4eaaf57
>>>>>> 100644
>>>>>> --- a/block/nbd.c
>>>>>> +++ b/block/nbd.c
>>>>>> @@ -208,6 +208,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>>>>>>        assert(!s->in_flight);
>>>>>>        if (s->ioc) {
>>>>>> +        qio_channel_set_force_quit(s->ioc, true);
>>>>>> +        qio_channel_coroutines_wake(s->ioc);
>>>>>>            qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH,
>>>>>> NULL);
>>>>>
>>>>> Calling shutdown here should already be causing the qio_chanel_readv_all to wakeup and break out of its
>>>>> poll() loop. We shouldn't need to add a second way to break out of the poll().
>>>>>
>>>>> Calling shutdown can wake up the coroutines which is polling. But I think it's not enough. I tried to forcibly wake up the NBD coroutine,
>>>>> It may cause segment fault. The root cause is that it will continue to access ioc->xxx in qio_channel_yield(), but the ioc has been released and set it NULL such as in
>>>>> nbd_co_do_establish_connection(); I think call shutdown will have the same result. So, I add the force_quit, once set it true, it will immediately exit without accessing IOC.
>>>>>
>>>>
>>>> What do you mean by "the NBD coroutine" and by "forcibly wake up"?
>>>>
>>>> In recent Qemu there is no specific NBD coroutine. Only request coroutines should exist.
>>>>
>>>> Forcibly waking up any coroutine is generally unsafe in Qemu, most of the code is not prepared for this, crash is normal result for such try.
>>>>
>>>> Also, there is good mechanism to debug coroutines in gdb:
>>>>
>>>> source scripts/qemu-gdb.py
>>>> qemu coroutine <coroutine pointer>
>>>>
>>>> - it will dump stack trace of a coroutine, it may help.
>>>>
>>>> Also, to find coroutines look at bs->tracked_requests list, all requests of bs are in the list with coroutine pointers in .co field.
>>>
>>> I am sorry for the unclear description. The NBD coroutine means the request coroutines.
>>>
>>> About "the forcibly wake up" I just set the receiving to true before qio_channel_writev_all() in nbd_co_send_request()
>>> to ensure that the request coroutines can be awakened by nbd_recv_coroutine_wake_one(). But that's just a test.
> 
> I'm not sure that it's safe. On nbd_channel_error() we call qio_channel_shutdown() - it should wake up everything that sleep in qio_channel code. The .receiving field is only for nbd.c specific internal yield point.
> 
> Could your change itself lead to the crash I'm not sure too.
> 
> Still, on normal execution paths, there shouldn't be a situation when some qio_channel_writev_full_all() is still executing during nbd_co_do_establish_connection(). If it is - it's a bug in nbd.c of course.
> 
> To check it without changing the .receiving field, you should print all stacks of nbd request coroutines at the point of qemu hang. What they all are doing?
> 
> If one of them is in nbd_co_do_establish_connection() and another is in nbd_co_do_establish_connection()  - that's a bug.

OK, I will try to reproduce this bug again, but this may take some time. Previously, in our stress test it usually took two or three days to hit this issue.

> 
>>>
>>> The issue is, only waking up the request coroutine or shutdown the QIO is not enough because there will be a synchronization issue.
>>> For example, When the NBD request coroutine is awakened in qio_channel_writev_full_all(), it will continue to access the ioc->xxx,
>>> but the ioc has been set to NULL in nbd_co_do_establish_connection(); it will cause segment fault.
> 
> Could you provide a call stack of this segmentation fault? And coroutine stack of coroutine which is in qio_channel_writev_full_all() at the moment. And value of s->in_flight?
> 
>>
>> That is a serious bug. Nothing is permitted to free the QIOChannel while a
>> qio_channel_writev_full_all API call (or any other I/O call is in progress).
> 
> I understand this.. There are only two paths that calls nbd_co_do_establish_connection():
> 
> If it's nbd_co_send_request() -> nbd_reconnect_attempt() -> nbd_co_do_establish_connection()
> 
> Then in nbd_co_send_request() we should have been waiting until s->in_flight becomes 0. And we are under mutex.. So there should not be any parallel qio_channel_writev_full_all() or any other qio_channel requests.
> 
> Another path is nbd_open() -> nbd_do_establish_connection(). Here for sure can't be any requests yet..
> 
>>
>> If multiple threads are involved, then each needs to be holding a reference
>> the QIOChannel to stop it being freed while in use.
>>
> 
> 
> 

Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
Posted by Rao, Lei 2 years, 4 months ago

On 12/3/2021 9:26 AM, Rao, Lei wrote:
> 
> 
> On 12/2/2021 5:54 PM, Vladimir Sementsov-Ogievskiy wrote:
>> 02.12.2021 11:53, Daniel P. Berrangé wrote:
>>> On Thu, Dec 02, 2021 at 01:14:47PM +0800, Rao, Lei wrote:
>>>> Sorry, resending with correct indentation and quoting.
>>>>
>>>> On 12/1/2021 10:27 PM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 01.12.2021 12:48, Rao, Lei wrote:
>>>>>>
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Daniel P. Berrangé <berrange@redhat.com>
>>>>>> Sent: Wednesday, December 1, 2021 5:11 PM
>>>>>> To: Rao, Lei <lei.rao@intel.com>
>>>>>> Cc: Zhang, Chen <chen.zhang@intel.com>; eblake@redhat.com; vsementsov@virtuozzo.com; kwolf@redhat.com; hreitz@redhat.com; qemu-block@nongnu.org; qemu-devel@nongnu.org
>>>>>> Subject: Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
>>>>>>
>>>>>> On Wed, Dec 01, 2021 at 03:54:27PM +0800, Rao, Lei wrote:
>>>>>>>       We found that the QIO channel coroutine could not be awakened in some corner cases during our stress test for COLO.
>>>>>>>       The patch fixes as follow:
>>>>>>>           #0  0x00007fad72e24bf6 in __ppoll (fds=0x5563d75861f0, nfds=1, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
>>>>>>>           #1  0x00005563d6977c68 in qemu_poll_ns (fds=0x5563d75861f0, nfds=1, timeout=599999697630) at util/qemu-timer.c:348
>>>>>>>           #2  0x00005563d697ac44 in aio_poll (ctx=0x5563d755dfa0, blocking=true) at util/aio-posix.c:669
>>>>>>>           #3  0x00005563d68ba24f in bdrv_do_drained_begin (bs=0x5563d75ea0c0, recursive=false, parent=0x0, ignore_bds_parents=false, poll=true) at block/io.c:432
>>>>>>>           #4  0x00005563d68ba338 in bdrv_drained_begin (bs=0x5563d75ea0c0) at block/io.c:438
>>>>>>>           #5  0x00005563d689c6d2 in quorum_del_child (bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block/quorum.c:1063
>>>>>>>           #6  0x00005563d684328f in bdrv_del_child (parent_bs=0x5563d75ea0c0, child=0x5563d7908600, errp=0x7fff3cf5b960) at block.c:6568
>>>>>>>           #7  0x00005563d658499e in qmp_x_blockdev_change (parent=0x5563d80ec4c0 "colo-disk0", has_child=true, child=0x5563d7577d30 "children.1", has_node=false, node=0x0,errp=0x7fff3cf5b960) at blockdev.c:4494
>>>>>>>           #8  0x00005563d67e8b4e in qmp_marshal_x_blockdev_change (args=0x7fad6400ada0, ret=0x7fff3cf5b9f8, errp=0x7fff3cf5b9f0) at qapi/qapi-commands-block-core.c:1538
>>>>>>>           #9  0x00005563d691cd9e in do_qmp_dispatch (cmds=0x5563d719a4f0 <qmp_commands>, request=0x7fad64009d80, allow_oob=true, errp=0x7fff3cf5ba98)
>>>>>>>               at qapi/qmp-dispatch.c:132
>>>>>>>           #10 0x00005563d691cfab in qmp_dispatch (cmds=0x5563d719a4f0 <qmp_commands>, request=0x7fad64009d80, allow_oob=true) at qapi/qmp-dispatch.c:175
>>>>>>>           #11 0x00005563d67b7787 in monitor_qmp_dispatch (mon=0x5563d7605d40, req=0x7fad64009d80) at monitor/qmp.c:145
>>>>>>>           #12 0x00005563d67b7b24 in monitor_qmp_bh_dispatcher (data=0x0) at monitor/qmp.c:234
>>>>>>>           #13 0x00005563d69754c2 in aio_bh_call (bh=0x5563d7473230) at util/async.c:89
>>>>>>>           #14 0x00005563d697555e in aio_bh_poll (ctx=0x5563d7471da0) at util/async.c:117
>>>>>>>           #15 0x00005563d697a423 in aio_dispatch (ctx=0x5563d7471da0) at util/aio-posix.c:459
>>>>>>>           #16 0x00005563d6975917 in aio_ctx_dispatch (source=0x5563d7471da0, callback=0x0, user_data=0x0) at util/async.c:260
>>>>>>>           #17 0x00007fad730e1fbd in g_main_context_dispatch () from /lib/x86_64-linux-gnu/libglib-2.0.so.0
>>>>>>>           #18 0x00005563d6978cda in glib_pollfds_poll () at util/main-loop.c:219
>>>>>>>           #19 0x00005563d6978d58 in os_host_main_loop_wait (timeout=977650) at util/main-loop.c:242
>>>>>>>           #20 0x00005563d6978e69 in main_loop_wait (nonblocking=0) at util/main-loop.c:518
>>>>>>>           #21 0x00005563d658f5ed in main_loop () at vl.c:1814
>>>>>>>           #22 0x00005563d6596bb7 in main (argc=61, argv=0x7fff3cf5c0c8,
>>>>>>> envp=0x7fff3cf5c2b8) at vl.c:450
>>>>>>>
>>>>>>>
>>>>>>> Signed-off-by: Lei Rao <lei.rao@intel.com>
>>>>>>> Signed-off-by: Zhang Chen <chen.zhang@intel.com>
>>>>>>> ---
>>>>>>>    block/nbd.c          |  5 +++++
>>>>>>>    include/io/channel.h | 19 +++++++++++++++++++
>>>>>>>    io/channel.c         | 22 ++++++++++++++++++++++
>>>>>>>    3 files changed, 46 insertions(+)
>>>>>>>
>>>>>>> diff --git a/block/nbd.c b/block/nbd.c index 5ef462db1b..5ee4eaaf57
>>>>>>> 100644
>>>>>>> --- a/block/nbd.c
>>>>>>> +++ b/block/nbd.c
>>>>>>> @@ -208,6 +208,8 @@ static void nbd_teardown_connection(BlockDriverState *bs)
>>>>>>>        assert(!s->in_flight);
>>>>>>>        if (s->ioc) {
>>>>>>> +        qio_channel_set_force_quit(s->ioc, true);
>>>>>>> +        qio_channel_coroutines_wake(s->ioc);
>>>>>>>            qio_channel_shutdown(s->ioc, QIO_CHANNEL_SHUTDOWN_BOTH,
>>>>>>> NULL);
>>>>>>
>>>>>> Calling shutdown here should already be causing the qio_chanel_readv_all to wakeup and break out of its
>>>>>> poll() loop. We shouldn't need to add a second way to break out of the poll().
>>>>>>
>>>>>> Calling shutdown can wake up the coroutines which is polling. But I think it's not enough. I tried to forcibly wake up the NBD coroutine,
>>>>>> It may cause segment fault. The root cause is that it will continue to access ioc->xxx in qio_channel_yield(), but the ioc has been released and set it NULL such as in
>>>>>> nbd_co_do_establish_connection(); I think call shutdown will have the same result. So, I add the force_quit, once set it true, it will immediately exit without accessing IOC.
>>>>>>
>>>>>
>>>>> What do you mean by "the NBD coroutine" and by "forcibly wake up"?
>>>>>
>>>>> In recent Qemu there is no specific NBD coroutine. Only request coroutines should exist.
>>>>>
>>>>> Forcibly waking up any coroutine is generally unsafe in Qemu, most of the code is not prepared for this, crash is normal result for such try.
>>>>>
>>>>> Also, there is good mechanism to debug coroutines in gdb:
>>>>>
>>>>> source scripts/qemu-gdb.py
>>>>> qemu coroutine <coroutine pointer>
>>>>>
>>>>> - it will dump stack trace of a coroutine, it may help.
>>>>>
>>>>> Also, to find coroutines look at bs->tracked_requests list, all requests of bs are in the list with coroutine pointers in .co field.
>>>>
>>>> I am sorry for the unclear description. The NBD coroutine means the request coroutines.
>>>>
>>>> About "the forcibly wake up" I just set the receiving to true before qio_channel_writev_all() in nbd_co_send_request()
>>>> to ensure that the request coroutines can be awakened by nbd_recv_coroutine_wake_one(). But that's just a test.
>>
>> I'm not sure that it's safe. On nbd_channel_error() we call qio_channel_shutdown() - it should wake up everything that sleep in qio_channel code. The .receiving field is only for nbd.c specific internal yield point.
>>
>> Could your change itself lead to the crash I'm not sure too.
>>
>> Still, on normal execution paths, there shouldn't be a situation when some qio_channel_writev_full_all() is still executing during nbd_co_do_establish_connection(). If it is - it's a bug in nbd.c of course.
>>
>> To check it without changing the .receiving field, you should print all stacks of nbd request coroutines at the point of qemu hang. What they all are doing?
>>
>> If one of them is in nbd_co_do_establish_connection() and another is in nbd_co_do_establish_connection()  - that's a bug.
> 
> OK,I will try to reproduce this bug again, but this may take some time. Previously in our stress test, it usually takes two or three days to hit this issue.


The bug can be reproduced in the latest QEMU. But the root cause is different due to QEMU code change.
The gdb stack is as follows:

#0 0x00007fe00c53dbf6 in __ppoll (fds=0x7fdbcc140f40, nfds=2, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
#1 0x000055c67de72a11 in qemu_poll_ns (fds=0x7fdbcc140f40, nfds=2, timeout=-1) at ../util/qemu-timer.c:336
#2 0x000055c67de43fab in fdmon_poll_wait (ctx=0x55c67ea73ca0, ready_list=0x7fdbec9fe730, timeout=-1) at ../util/fdmon-poll.c:80
#3 0x000055c67de4387b in aio_poll (ctx=0x55c67ea73ca0, blocking=true) at ../util/aio-posix.c:607
#4 0x000055c67dd1230a in bdrv_drain_all_begin () at ../block/io.c:675
#5 0x000055c67dd1271a in bdrv_drain_all () at ../block/io.c:726
#6 0x000055c67db3b1a2 in do_vm_stop (state=RUN_STATE_COLO, send_stop=true) at ../softmmu/cpus.c:268
#7 0x000055c67db3bec9 in vm_stop (state=RUN_STATE_COLO) at ../softmmu/cpus.c:658
#8 0x000055c67db3bf9d in vm_stop_force_state (state=RUN_STATE_COLO) at ../softmmu/cpus.c:707
#9 0x000055c67d8159da in colo_incoming_process_checkpoint (mis=0x55c67ed0d550, fb=0x7fdbcc135f40, bioc=0x7fdbcc02d010, errp=0x7fdbec9fe928)at ../migration/colo.c:685
#10 0x000055c67d815e3f in colo_wait_handle_message (mis=0x55c67ed0d550, fb=0x7fdbcc135f40, bioc=0x7fdbcc02d010, errp=0x7fdbec9fe928)at ../migration/colo.c:815
#11 0x000055c67d81604a in colo_process_incoming_thread (opaque=0x55c67ed0d550) at ../migration/colo.c:883
#12 0x000055c67de47e1d in qemu_thread_start (args=0x55c67f1bb380) at ../util/qemu-thread-posix.c:556
#13 0x00007fe00c625609 in start_thread (arg=<optimized out>) at pthread_create.c:477

From the call trace, we can see that the COLO thread is hung in bdrv_drain_all(). Specifically, it is hung in blk_root_drained_poll().

After analysis, the root cause is that, while QEMU is draining I/O, nbd_drained_poll() returns true because some
I/O requests have not completed. This means aio_poll() never gets a chance to return.
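
For reference, the server-side drained-poll hook behaves roughly like the simplified sketch below (paraphrased from nbd/server.c, not the exact upstream code): as long as any client still has requests in flight, the export keeps reporting that it is busy, so the drain loop keeps polling.

    static bool nbd_drained_poll(void *opaque)
    {
        NBDExport *exp = opaque;
        NBDClient *client;

        QTAILQ_FOREACH(client, &exp->clients, next) {
            if (client->nb_requests != 0) {
                /* At least one request is still in flight: keep draining. */
                return true;
            }
        }
        /* No client has outstanding requests: draining can complete. */
        return false;
    }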

The data dump of the NBD client is as follows:

(gdb) p client
$41 = (NBDClient *) 0x5639f19754a0
(gdb) p *client
$42 = {refcount = 3, close_fn = 0x5639ef270bd4 <nbd_blockdev_client_closed>, exp = 0x5639f1d769d0, tlscreds = 0x0, tlsauthz = 0x0, sioc = 0x5639f1d8aa20,
   ioc = 0x5639f1d8aa20, recv_coroutine = 0x5639f183bb10, send_lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0},
     handoff = 0, sequence = 0, holder = 0x0}, send_coroutine = 0x0, read_yielding = false, quiescing = true, next = {tqe_next = 0x0, tqe_circ = {
       tql_next = 0x0, tql_prev = 0x5639f1d76a28}}, nb_requests = 1, closing = false, check_align = 1, structured_reply = true, export_meta = {
     exp = 0x5639f1d769d0, count = 1, base_allocation = true, allocation_depth = false, bitmaps = 0x0}, opt = 7, optlen = 0}
	
We can see the nb_requests = 1. That means there is an IO request not completed on the NBD server side.
And the NBD receive coroutine is as follows:

(gdb) qemu coroutine 0x5639f183bb10
#0  0x00005639ef3fe50b in qemu_coroutine_switch (from_=0x5639f183bb10, to_=0x7fd0f5fff5a0, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:302
#1  0x00005639ef40dfc2 in qemu_coroutine_yield () at ../util/qemu-coroutine.c:193
#2  0x00005639ef24e4db in qio_channel_yield (ioc=0x5639f1d8aa20, condition=G_IO_IN) at ../io/channel.c:545
#3  0x00005639ef24d71a in qio_channel_readv_full_all_eof (ioc=0x5639f1d8aa20, iov=0x7fcc20edcde0, niov=1, fds=0x0, nfds=0x0, errp=0x7fcc20edcf10)
     at ../io/channel.c:138
#4  0x00005639ef24d8df in qio_channel_readv_full_all (ioc=0x5639f1d8aa20, iov=0x7fcc20edcde0, niov=1, fds=0x0, nfds=0x0, errp=0x7fcc20edcf10)
     at ../io/channel.c:202
#5  0x00005639ef24d5ec in qio_channel_readv_all (ioc=0x5639f1d8aa20, iov=0x7fcc20edcde0, niov=1, errp=0x7fcc20edcf10) at ../io/channel.c:103
#6  0x00005639ef24dd1d in qio_channel_read_all (ioc=0x5639f1d8aa20, buf=0x5639f30eb000 "", buflen=19660800, errp=0x7fcc20edcf10) at ../io/channel.c:320
#7  0x00005639ef25ec54 in nbd_read (ioc=0x5639f1d8aa20, buffer=0x5639f30eb000, size=19660800, desc=0x5639ef5ae73a "CMD_WRITE data", errp=0x7fcc20edcf10)
     at /home/workspace/qemu/include/block/nbd.h:361
#8  0x00005639ef264682 in nbd_co_receive_request (req=0x5639f1744a50, request=0x7fcc20edcf30, errp=0x7fcc20edcf10) at ../nbd/server.c:2328
#9  0x00005639ef2652b3 in nbd_trip (opaque=0x5639f19754a0) at ../nbd/server.c:2629
#10 0x00005639ef3fe20a in coroutine_trampoline (i0=-243025136, i1=22073) at ../util/coroutine-ucontext.c:173
#11 0x00007fd52da17660 in __start_context () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91

We can see the NBD server is trying to read data from the NBD client, while the NBD client cannot send the data
in time because the QEMU main thread is busy with other tasks. So the I/O request cannot be completed and
the drain hangs here. The point is that we cannot assume the NBD client behaves like a local block driver;
the network or the NBD client side may introduce delays, etc.

Although this issue happens in a COLO stress test, it looks like a generic problem:
if NBD is involved, any use of bdrv_drain_all() will depend on the NBD client completing all the requests.
However, in general we shouldn't depend on the NBD client side, because all kinds of bad things can happen on the network.

To solve this COLO issue, one option is to slightly change the COLO operation sequence so that the
NBD client drains I/O before the NBD server does.

But as a more generic option, is it possible to prevent the NBD server from polling for I/O in this scenario, or to add a timeout mechanism to nbd_drained_poll()?

Thanks,
Lei

> 
>>
>>>>
>>>> The issue is, only waking up the request coroutine or shutdown the QIO is not enough because there will be a synchronization issue.
>>>> For example, When the NBD request coroutine is awakened in qio_channel_writev_full_all(), it will continue to access the ioc->xxx,
>>>> but the ioc has been set to NULL in nbd_co_do_establish_connection(); it will cause segment fault.
>>
>> Could you provide a call stack of this segmentation fault? And coroutine stack of coroutine which is in qio_channel_writev_full_all() at the moment. And value of s->in_flight?
>>
>>>
>>> That is a serious bug. Nothing is permitted to free the QIOChannel while a
>>> qio_channel_writev_full_all API call (or any other I/O call is in progress).
>>
>> I understand this.. There are only two paths that calls nbd_co_do_establish_connection():
>>
>> If it's nbd_co_send_request() -> nbd_reconnect_attempt() -> nbd_co_do_establish_connection()
>>
>> Then in nbd_co_send_request() we should have been waiting until s->in_flight becomes 0. And we are under mutex.. So there should not be any parallel qio_channel_writev_full_all() or any other qio_channel requests.
>>
>> Another path is nbd_open() -> nbd_do_establish_connection(). Here for sure can't be any requests yet..
>>
>>>
>>> If multiple threads are involved, then each needs to be holding a reference
>>> the QIOChannel to stop it being freed while in us
>>>
>>
>>
>>

Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
Posted by Vladimir Sementsov-Ogievskiy 2 years, 4 months ago
13.12.2021 11:02, Rao, Lei wrote:
> 
> 
> The bug can be reproduced in the latest QEMU. But the root cause is different due to QEMU code change.
> The gdb stack is as follows:
> 
> #0 0x00007fe00c53dbf6 in __ppoll (fds=0x7fdbcc140f40, nfds=2, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
> #1 0x000055c67de72a11 in qemu_poll_ns (fds=0x7fdbcc140f40, nfds=2, timeout=-1) at ../util/qemu-timer.c:336
> #2 0x000055c67de43fab in fdmon_poll_wait (ctx=0x55c67ea73ca0, ready_list=0x7fdbec9fe730, timeout=-1) at ../util/fdmon-poll.c:80
> #3 0x000055c67de4387b in aio_poll (ctx=0x55c67ea73ca0, blocking=true) at ../util/aio-posix.c:607
> #4 0x000055c67dd1230a in bdrv_drain_all_begin () at ../block/io.c:675
> #5 0x000055c67dd1271a in bdrv_drain_all () at ../block/io.c:726
> #6 0x000055c67db3b1a2 in do_vm_stop (state=RUN_STATE_COLO, send_stop=true) at ../softmmu/cpus.c:268
> #7 0x000055c67db3bec9 in vm_stop (state=RUN_STATE_COLO) at ../softmmu/cpus.c:658
> #8 0x000055c67db3bf9d in vm_stop_force_state (state=RUN_STATE_COLO) at ../softmmu/cpus.c:707
> #9 0x000055c67d8159da in colo_incoming_process_checkpoint (mis=0x55c67ed0d550, fb=0x7fdbcc135f40, bioc=0x7fdbcc02d010, errp=0x7fdbec9fe928)at ../migration/colo.c:685
> #10 0x000055c67d815e3f in colo_wait_handle_message (mis=0x55c67ed0d550, fb=0x7fdbcc135f40, bioc=0x7fdbcc02d010, errp=0x7fdbec9fe928)at ../migration/colo.c:815
> #11 0x000055c67d81604a in colo_process_incoming_thread (opaque=0x55c67ed0d550) at ../migration/colo.c:883
> #12 0x000055c67de47e1d in qemu_thread_start (args=0x55c67f1bb380) at ../util/qemu-thread-posix.c:556
> #13 0x00007fe00c625609 in start_thread (arg=<optimized out>) at pthread_create.c:477
> 
> From the call trace, we can see that the COLO thread is hung in bdrv_drain_all(). Specifically, it is hung in blk_root_drained_poll().
> 
> After analysis, the root cause is that, while QEMU is draining I/O, nbd_drained_poll() returns true because some
> I/O requests have not completed. This means aio_poll() never gets a chance to return.
> 
> The data dump of the NBD client is as follows:
> 
> (gdb) p client
> $41 = (NBDClient *) 0x5639f19754a0
> (gdb) p *client
> $42 = {refcount = 3, close_fn = 0x5639ef270bd4 <nbd_blockdev_client_closed>, exp = 0x5639f1d769d0, tlscreds = 0x0, tlsauthz = 0x0, sioc = 0x5639f1d8aa20,
>    ioc = 0x5639f1d8aa20, recv_coroutine = 0x5639f183bb10, send_lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0},
>      handoff = 0, sequence = 0, holder = 0x0}, send_coroutine = 0x0, read_yielding = false, quiescing = true, next = {tqe_next = 0x0, tqe_circ = {
>        tql_next = 0x0, tql_prev = 0x5639f1d76a28}}, nb_requests = 1, closing = false, check_align = 1, structured_reply = true, export_meta = {
>      exp = 0x5639f1d769d0, count = 1, base_allocation = true, allocation_depth = false, bitmaps = 0x0}, opt = 7, optlen = 0}
> 
> We can see the nb_requests = 1. That means there is an IO request not completed on the NBD server side.
> And the NBD receive coroutine is as follows:
> 
> (gdb) qemu coroutine 0x5639f183bb10
> #0  0x00005639ef3fe50b in qemu_coroutine_switch (from_=0x5639f183bb10, to_=0x7fd0f5fff5a0, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:302
> #1  0x00005639ef40dfc2 in qemu_coroutine_yield () at ../util/qemu-coroutine.c:193
> #2  0x00005639ef24e4db in qio_channel_yield (ioc=0x5639f1d8aa20, condition=G_IO_IN) at ../io/channel.c:545
> #3  0x00005639ef24d71a in qio_channel_readv_full_all_eof (ioc=0x5639f1d8aa20, iov=0x7fcc20edcde0, niov=1, fds=0x0, nfds=0x0, errp=0x7fcc20edcf10)
>      at ../io/channel.c:138
> #4  0x00005639ef24d8df in qio_channel_readv_full_all (ioc=0x5639f1d8aa20, iov=0x7fcc20edcde0, niov=1, fds=0x0, nfds=0x0, errp=0x7fcc20edcf10)
>      at ../io/channel.c:202
> #5  0x00005639ef24d5ec in qio_channel_readv_all (ioc=0x5639f1d8aa20, iov=0x7fcc20edcde0, niov=1, errp=0x7fcc20edcf10) at ../io/channel.c:103
> #6  0x00005639ef24dd1d in qio_channel_read_all (ioc=0x5639f1d8aa20, buf=0x5639f30eb000 "", buflen=19660800, errp=0x7fcc20edcf10) at ../io/channel.c:320
> #7  0x00005639ef25ec54 in nbd_read (ioc=0x5639f1d8aa20, buffer=0x5639f30eb000, size=19660800, desc=0x5639ef5ae73a "CMD_WRITE data", errp=0x7fcc20edcf10)
>      at /home/workspace/qemu/include/block/nbd.h:361
> #8  0x00005639ef264682 in nbd_co_receive_request (req=0x5639f1744a50, request=0x7fcc20edcf30, errp=0x7fcc20edcf10) at ../nbd/server.c:2328
> #9  0x00005639ef2652b3 in nbd_trip (opaque=0x5639f19754a0) at ../nbd/server.c:2629
> #10 0x00005639ef3fe20a in coroutine_trampoline (i0=-243025136, i1=22073) at ../util/coroutine-ucontext.c:173
> #11 0x00007fd52da17660 in __start_context () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91


So the client side hangs in bdrv_drain_all_begin(), yes? We see that the server side waits for the data of an NBD_CMD_WRITE; that is normal for the server side.

But what is going on in the client, we don't know. We also need the call stack of the coroutine of the tracked request on the client side. So on the client, you should look at the bs->tracked_requests list, where bs is the BlockDriverState of the nbd node, check the .co field of the request (that is a coroutine), and run the qemu coroutine command on it.
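
For example, something along these lines in gdb, with bs pointing at the NBD node's BlockDriverState (the field names follow the usual QLIST/BdrvTrackedRequest layout, so adjust as needed for your build):

(gdb) p bs->tracked_requests.lh_first
$1 = (BdrvTrackedRequest *) 0x...
(gdb) p bs->tracked_requests.lh_first->co
$2 = (Coroutine *) 0x...
(gdb) qemu coroutine 0x...

If there are several requests, follow the ->list.le_next pointers and dump each .co the same way.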


> 
> We can see the NBD server is trying to read data from the NBD client, while the NBD client cannot send the data
> in time because the QEMU main thread is busy with other tasks. So the I/O request cannot be completed and
> the drain hangs here. The point is that we cannot assume the NBD client behaves like a local block driver;
> the network or the NBD client side may introduce delays, etc.

Yes, but on channel_shutdown it should finish immediately, shouldn't it?

Also, you don't mention any network problems in your environment, so we are not talking about a hard disconnect?

There may still be some bug here that can be fixed.

> 
> Although this issue happens in a COLO stress test, it looks like a generic problem:
> if NBD is involved, any use of bdrv_drain_all() will depend on the NBD client completing all the requests.
> However, in general we shouldn't depend on the NBD client side, because all kinds of bad things can happen on the network.


Yes, bdrv_drain_all() waits for all in-flight requests to finish, and if we have long requests, we can wait for a long time.

For example, if nbd reconnect-delay is set to 30 seconds (I don't think that is your case?) and the server disconnects, bdrv_drain_all() will hang for up to 30 seconds. To avoid this hang when cancelling a block job (assume you have a mirror job and want to cancel it, but would have to wait 30 seconds for the drain), bdrv_cancel_in_flight() was implemented, called from mirror_cancel(), which finally cancels all requests that are waiting for the connection.
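
For reference, that delay is the reconnect-delay option of the NBD blockdev; as an illustration only (host, port and export values elided), it would be configured roughly like:

    -blockdev driver=nbd,node-name=nbd0,server.type=inet,server.host=...,server.port=...,export=...,reconnect-delay=30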

Another example is a hard disconnect of the server: in this case TCP I/O may hang for a long time. To avoid this, you should set up keep-alive properly.
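
For the QEMU side of the socket, that would be the keep-alive flag of the inet address (available in recent QEMU versions); purely as an illustration, with the address details elided:

    -blockdev driver=nbd,node-name=nbd0,server.type=inet,server.host=...,server.port=...,server.keep-alive=on,export=...

The OS-level TCP keep-alive tunables (tcp_keepalive_time and friends) may also need adjusting so a dead peer is detected quickly enough.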

But as I understand it, none of that applies to your case, since the network is OK?

> 
> To solve this COLO issue, one option is, we can slightly change COLO operation sequence to drain IO in the
> a NBD client before the NBD server drains IO.

What do you mean by drain in the NBD server? As I see from the call stacks, the drain is on the client side, and on the server we just serve requests.



-- 
Best regards,
Vladimir

Re: [PATCH] QIO: Add force_quit to the QIOChannel to ensure QIO exits cleanly in some corner case
Posted by Rao, Lei 2 years, 4 months ago

On 12/13/2021 4:38 PM, Vladimir Sementsov-Ogievskiy wrote:
> 13.12.2021 11:02, Rao, Lei wrote:
>>
>>
>> The bug can be reproduced in the latest QEMU. But the root cause is different due to QEMU code change.
>> The gdb stack is as follows:
>>
>> #0 0x00007fe00c53dbf6 in __ppoll (fds=0x7fdbcc140f40, nfds=2, timeout=<optimized out>, sigmask=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:44
>> #1 0x000055c67de72a11 in qemu_poll_ns (fds=0x7fdbcc140f40, nfds=2, timeout=-1) at ../util/qemu-timer.c:336
>> #2 0x000055c67de43fab in fdmon_poll_wait (ctx=0x55c67ea73ca0, ready_list=0x7fdbec9fe730, timeout=-1) at ../util/fdmon-poll.c:80
>> #3 0x000055c67de4387b in aio_poll (ctx=0x55c67ea73ca0, blocking=true) at ../util/aio-posix.c:607
>> #4 0x000055c67dd1230a in bdrv_drain_all_begin () at ../block/io.c:675
>> #5 0x000055c67dd1271a in bdrv_drain_all () at ../block/io.c:726
>> #6 0x000055c67db3b1a2 in do_vm_stop (state=RUN_STATE_COLO, send_stop=true) at ../softmmu/cpus.c:268
>> #7 0x000055c67db3bec9 in vm_stop (state=RUN_STATE_COLO) at ../softmmu/cpus.c:658
>> #8 0x000055c67db3bf9d in vm_stop_force_state (state=RUN_STATE_COLO) at ../softmmu/cpus.c:707
>> #9 0x000055c67d8159da in colo_incoming_process_checkpoint (mis=0x55c67ed0d550, fb=0x7fdbcc135f40, bioc=0x7fdbcc02d010, errp=0x7fdbec9fe928)at ../migration/colo.c:685
>> #10 0x000055c67d815e3f in colo_wait_handle_message (mis=0x55c67ed0d550, fb=0x7fdbcc135f40, bioc=0x7fdbcc02d010, errp=0x7fdbec9fe928)at ../migration/colo.c:815
>> #11 0x000055c67d81604a in colo_process_incoming_thread (opaque=0x55c67ed0d550) at ../migration/colo.c:883
>> #12 0x000055c67de47e1d in qemu_thread_start (args=0x55c67f1bb380) at ../util/qemu-thread-posix.c:556
>> #13 0x00007fe00c625609 in start_thread (arg=<optimized out>) at pthread_create.c:477
>>
>> From the call trace, we can see that the COLO thread is hung in bdrv_drain_all(). Specifically, it is hung in blk_root_drained_poll().
>>
>> After analysis, the root cause is that, while QEMU is draining I/O, nbd_drained_poll() returns true because some
>> I/O requests have not completed. This means aio_poll() never gets a chance to return.
>>
>> The data dump of the NBD client is as follows:
>>
>> (gdb) p client
>> $41 = (NBDClient *) 0x5639f19754a0
>> (gdb) p *client
>> $42 = {refcount = 3, close_fn = 0x5639ef270bd4 <nbd_blockdev_client_closed>, exp = 0x5639f1d769d0, tlscreds = 0x0, tlsauthz = 0x0, sioc = 0x5639f1d8aa20,
>>    ioc = 0x5639f1d8aa20, recv_coroutine = 0x5639f183bb10, send_lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0},
>>      handoff = 0, sequence = 0, holder = 0x0}, send_coroutine = 0x0, read_yielding = false, quiescing = true, next = {tqe_next = 0x0, tqe_circ = {
>>        tql_next = 0x0, tql_prev = 0x5639f1d76a28}}, nb_requests = 1, closing = false, check_align = 1, structured_reply = true, export_meta = {
>>      exp = 0x5639f1d769d0, count = 1, base_allocation = true, allocation_depth = false, bitmaps = 0x0}, opt = 7, optlen = 0}
>>
>> We can see the nb_requests = 1. That means there is an IO request not completed on the NBD server side.
>> And the NBD receive coroutine is as follows:
>>
>> (gdb) qemu coroutine 0x5639f183bb10
>> #0  0x00005639ef3fe50b in qemu_coroutine_switch (from_=0x5639f183bb10, to_=0x7fd0f5fff5a0, action=COROUTINE_YIELD) at ../util/coroutine-ucontext.c:302
>> #1  0x00005639ef40dfc2 in qemu_coroutine_yield () at ../util/qemu-coroutine.c:193
>> #2  0x00005639ef24e4db in qio_channel_yield (ioc=0x5639f1d8aa20, condition=G_IO_IN) at ../io/channel.c:545
>> #3  0x00005639ef24d71a in qio_channel_readv_full_all_eof (ioc=0x5639f1d8aa20, iov=0x7fcc20edcde0, niov=1, fds=0x0, nfds=0x0, errp=0x7fcc20edcf10)
>>      at ../io/channel.c:138
>> #4  0x00005639ef24d8df in qio_channel_readv_full_all (ioc=0x5639f1d8aa20, iov=0x7fcc20edcde0, niov=1, fds=0x0, nfds=0x0, errp=0x7fcc20edcf10)
>>      at ../io/channel.c:202
>> #5  0x00005639ef24d5ec in qio_channel_readv_all (ioc=0x5639f1d8aa20, iov=0x7fcc20edcde0, niov=1, errp=0x7fcc20edcf10) at ../io/channel.c:103
>> #6  0x00005639ef24dd1d in qio_channel_read_all (ioc=0x5639f1d8aa20, buf=0x5639f30eb000 "", buflen=19660800, errp=0x7fcc20edcf10) at ../io/channel.c:320
>> #7  0x00005639ef25ec54 in nbd_read (ioc=0x5639f1d8aa20, buffer=0x5639f30eb000, size=19660800, desc=0x5639ef5ae73a "CMD_WRITE data", errp=0x7fcc20edcf10)
>>      at /home/workspace/qemu/include/block/nbd.h:361
>> #8  0x00005639ef264682 in nbd_co_receive_request (req=0x5639f1744a50, request=0x7fcc20edcf30, errp=0x7fcc20edcf10) at ../nbd/server.c:2328
>> #9  0x00005639ef2652b3 in nbd_trip (opaque=0x5639f19754a0) at ../nbd/server.c:2629
>> #10 0x00005639ef3fe20a in coroutine_trampoline (i0=-243025136, i1=22073) at ../util/coroutine-ucontext.c:173
>> #11 0x00007fd52da17660 in __start_context () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
> 
> 
> So the client side hangs in bdrv_drain_all_begin(), yes? We see that the server side waits for the data of an NBD_CMD_WRITE; that is normal for the server side.
> 
> But what is going on in the client, we don't know. We also need the call stack of the coroutine of the tracked request on the client side. So on the client, you should look at the bs->tracked_requests list, where bs is the BlockDriverState of the nbd node, check the .co field of the request (that is a coroutine), and run the qemu coroutine command on it.
> 
> 
>>
>> We can see the NBD server is trying to read data from the NBD client, while the NBD client cannot send the data
>> in time because the QEMU main thread is busy with other tasks. So the I/O request cannot be completed and
>> the drain hangs here. The point is that we cannot assume the NBD client behaves like a local block driver;
>> the network or the NBD client side may introduce delays, etc.
> 
> Yes, but on channel_shutdown it should finish immediately, shouldn't it?
> 
> Also, you don't mention any network problems in your environment, so we are not talking about a hard disconnect?
> 
> There may still be some bug here that can be fixed.
> 
>>
>> Although this issue happens in a COLO stress test, it looks like a generic problem:
>> if NBD is involved, any use of bdrv_drain_all() will depend on the NBD client completing all the requests.
>> However, in general we shouldn't depend on the NBD client side, because all kinds of bad things can happen on the network.
> 
> 
> Yes, bdrv_drain_all() waits for all in-flight requests to finish, and if we have long requests, we can wait for a long time.
> 
> For example, if nbd reconnect-delay is set to 30 seconds (I don't think that is your case?) and the server disconnects, bdrv_drain_all() will hang for up to 30 seconds. To avoid this hang when cancelling a block job (assume you have a mirror job and want to cancel it, but would have to wait 30 seconds for the drain), bdrv_cancel_in_flight() was implemented, called from mirror_cancel(), which finally cancels all requests that are waiting for the connection.
> 
> Another example is a hard disconnect of the server: in this case TCP I/O may hang for a long time. To avoid this, you should set up keep-alive properly.
> 
> But as I understand it, none of that applies to your case, since the network is OK?

There is no network issue. This is COLO usage: a COLO primary VM and a secondary VM run on different physical nodes. The primary VM constantly syncs disk data to the secondary VM via NBD, where the secondary VM is the NBD server and
the primary VM is the NBD client.

In this specific case, due to some COLO bugs, the primary VM hangs in some scenario, which causes the NBD client to hang, so no data is returned to the NBD server.

We can fix this issue in COLO. However, I think this may also reveal an issue in NBD: the NBD server shouldn't wait forever for the client side in nbd_drained_poll() if nb_requests is not equal to zero. What do you think?
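
As a purely illustrative sketch of the timeout idea (not a tested patch; drain_deadline_ns would be a new, hypothetical field of NBDExport), it could look roughly like this:

    static bool nbd_drained_poll(void *opaque)
    {
        NBDExport *exp = opaque;
        NBDClient *client;
        int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);

        QTAILQ_FOREACH(client, &exp->clients, next) {
            if (client->nb_requests != 0) {
                if (exp->drain_deadline_ns == 0) {
                    /* First poll with requests in flight: arm the timeout. */
                    exp->drain_deadline_ns = now + 30 * NANOSECONDS_PER_SECOND;
                } else if (now > exp->drain_deadline_ns) {
                    /*
                     * The client made no progress in time: shut the channel
                     * down so the pending request fails and nb_requests can
                     * finally drop to zero.
                     */
                    qio_channel_shutdown(client->ioc, QIO_CHANNEL_SHUTDOWN_BOTH, NULL);
                }
                return true;
            }
        }
        exp->drain_deadline_ns = 0;
        return false;
    }

Whether forcibly failing the request from the drain hook is acceptable is exactly the question above.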

Thanks,
Lei

> 
>>
>> To solve this COLO issue, one option is, we can slightly change COLO operation sequence to drain IO in the
>> a NBD client before the NBD server drains IO.
> 
> What do you mean by drain in the NBD server? As I see from the call stacks, the drain is on the client side, and on the server we just serve requests.
> 
> 
>