The user "John Doe" reported a deadlock when attempting to use
qemu-storage-daemon to serve both a base file over NBD, and a qcow2
file with that NBD export as its backing file, from the same process,
even though it worked just fine when there were two q-s-d processes.
The bulk of the NBD server code properly uses coroutines to make
progress in an event-driven manner, but the code for spawning a new
coroutine at the point when listen(2) detects a new client was
hard-coded to use the global GMainContext; in other words, the
callback that triggers nbd_client_new to let the server start the
negotiation sequence with the client requires the main loop to be
making progress. However, the code for bdrv_open of a qcow2 image
with an NBD backing file uses an AIO_WAIT_WHILE nested event loop to
ensure that the entire qcow2 backing chain is either fully loaded or
rejected, without any side effects from the main loop causing unwanted
changes to the disk being loaded (in short, an AioContext represents
the set of actions that are known to be safe while handling block
layer I/O, while excluding any other pending actions in the global
main loop with potentially larger risk of unwanted side effects).
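
To make the starvation concrete, the nested loop has roughly this
shape (a simplified sketch, not the actual AIO_WAIT_WHILE macro from
include/block/aio-wait.h, and the loop condition here is hypothetical):

    /* Sketch: spin the main AioContext until bdrv_open() can finish. */
    while (backing_chain_still_opening) {   /* hypothetical condition */
        /*
         * aio_poll() dispatches handlers registered with this
         * AioContext, but GSources attached only to the default
         * GMainContext are not dispatched here, so a listener watch
         * that lives purely in GLib is starved for the duration.
         */
        aio_poll(qemu_get_aio_context(), true);
    }
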
This creates a classic case of deadlock: the server can't progress to
the point of accept(2)ing the client to write to the NBD socket
because the main loop is being starved until the AIO_WAIT_WHILE
completes the bdrv_open, but the AIO_WAIT_WHILE can't progress because
it is blocked on the client coroutine stuck in a read() of the
expected magic number from the server side of the socket.

Fortunately, the way that AioContext is set up, any callback that is
registered to the global AioContext will also be serviced by the main
loop. So the fix for the deadlock is to alter QIONetListener so that
if it is not being used in an explicit alternative GMainContext, then
it should perform its polling via the global AioContext (which
indirectly still progresses in the default GMainContext) rather than
directly in the default GMainContext. This does not change behavior
for any prior use that did not starve the main loop, but has the
additional benefit that in the bdrv_open case of a nested AioContext
loop, the server's listen/accept handler is no longer starved because
it is now part of the same AioContext loop. From there, since NBD
already uses coroutines for both server and client code, the nested
AioContext loop finishes quickly and opening the qcow2 backing chain
no longer deadlocks.
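
The reason a callback registered with the global AioContext is still
serviced by the ordinary main loop is that the global AioContext
exposes a GSource which main-loop setup attaches to the default
GMainContext; roughly (a sketch of what util/main-loop.c already does,
shown here only to make the claim above concrete):

    AioContext *ctx = qemu_get_aio_context();
    GSource *src = aio_get_g_source(ctx);
    g_source_attach(src, NULL);   /* NULL == the default GMainContext */
    g_source_unref(src);
    /*
     * Hence a handler added via aio_set_fd_handler(ctx, ...) fires both
     * when g_main_loop_run() iterates the default context and when a
     * nested aio_poll(ctx, ...) spins during bdrv_open().
     */
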
The next patch will add a unit test (kept separate to make it easier
to rearrange the series to demonstrate the deadlock without this
patch).

Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
Signed-off-by: Eric Blake <eblake@redhat.com>
---
io/net-listener.c | 53 ++++++++++++++++++++++++++++++++++++++---------
io/trace-events | 4 ++--
2 files changed, 45 insertions(+), 12 deletions(-)

diff --git a/io/net-listener.c b/io/net-listener.c
index ce29bf3c993..9f4e3c0be0c 100644
--- a/io/net-listener.c
+++ b/io/net-listener.c
@@ -23,6 +23,7 @@
#include "io/dns-resolver.h"
#include "qapi/error.h"
#include "qemu/module.h"
+#include "qemu/main-loop.h"
#include "trace.h"

QIONetListener *qio_net_listener_new(void)
@@ -62,6 +63,15 @@ static gboolean qio_net_listener_channel_func(QIOChannel *ioc,
}


+static void qio_net_listener_aio_func(void *opaque)
+{
+ QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(opaque);
+
+ qio_net_listener_channel_func(QIO_CHANNEL(sioc), G_IO_IN,
+ sioc->listener);
+}
+
+
int qio_net_listener_open_sync(QIONetListener *listener,
SocketAddress *addr,
int num,
@@ -117,15 +127,33 @@ qio_net_listener_watch(QIONetListener *listener, size_t i, const char *caller)
return;
}

- trace_qio_net_listener_watch_enabled(listener, listener->io_func, caller);
+ trace_qio_net_listener_watch_enabled(listener, listener->io_func,
+ listener->context, caller);
if (i == 0) {
object_ref(OBJECT(listener));
}
for ( ; i < listener->nsioc; i++) {
- listener->io_source[i] = qio_channel_add_watch_source(
- QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
- qio_net_listener_channel_func,
- listener, NULL, listener->context);
+ if (listener->context) {
+ /*
+ * The user passed a GMainContext with the async callback;
+ * they plan on running their own g_main_loop.
+ */
+ listener->io_source[i] = qio_channel_add_watch_source(
+ QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
+ qio_net_listener_channel_func,
+ listener, NULL, listener->context);
+ } else {
+ /*
+ * The user is fine with the default context. But by doing
+ * it in the main thread's AioContext rather than
+ * specifically in a GMainContext, we can remain
+ * responsive even if another AioContext depends on
+ * connecting to this server.
+ */
+ aio_set_fd_handler(qemu_get_aio_context(), listener->sioc[i]->fd,
+ qio_net_listener_aio_func, NULL, NULL, NULL,
+ listener->sioc[i]);
+ }
}
}

@@ -138,12 +166,17 @@ qio_net_listener_unwatch(QIONetListener *listener, const char *caller)
return;
}

- trace_qio_net_listener_watch_disabled(listener, caller);
+ trace_qio_net_listener_watch_disabled(listener, listener->context, caller);
for (i = 0; i < listener->nsioc; i++) {
- if (listener->io_source[i]) {
- g_source_destroy(listener->io_source[i]);
- g_source_unref(listener->io_source[i]);
- listener->io_source[i] = NULL;
+ if (listener->context) {
+ if (listener->io_source[i]) {
+ g_source_destroy(listener->io_source[i]);
+ g_source_unref(listener->io_source[i]);
+ listener->io_source[i] = NULL;
+ }
+ } else {
+ aio_set_fd_handler(qemu_get_aio_context(), listener->sioc[i]->fd,
+ NULL, NULL, NULL, NULL, NULL);
}
}
object_unref(OBJECT(listener));
diff --git a/io/trace-events b/io/trace-events
index 8cc4cae3a5d..1b01b2d51e6 100644
--- a/io/trace-events
+++ b/io/trace-events
@@ -74,6 +74,6 @@ qio_channel_command_abort(void *ioc, int pid) "Command abort ioc=%p pid=%d"
qio_channel_command_wait(void *ioc, int pid, int ret, int status) "Command abort ioc=%p pid=%d ret=%d status=%d"

# net-listener.c
-qio_net_listener_watch_enabled(void *listener, void *func, const char *extra) "Net listener=%p watch enabled func=%p by %s"
-qio_net_listener_watch_disabled(void *listener, const char *extra) "Net listener=%p watch disabled by %s"
+qio_net_listener_watch_enabled(void *listener, void *func, void *ctx, const char *extra) "Net listener=%p watch enabled func=%p ctx=%p by %s"
+qio_net_listener_watch_disabled(void *listener, void *ctx, const char *extra) "Net listener=%p watch disabled ctx=%p by %s"
qio_net_listener_callback(void *listener, void *func) "Net listener=%p callback forwarding to func=%p"
--
2.51.1

On 03.11.2025 at 21:10, Eric Blake wrote:
> The user "John Doe" reported a deadlock when attempting to use
> qemu-storage-daemon to serve both a base file over NBD, and a qcow2
> file with that NBD export as its backing file, from the same process,
> even though it worked just fine when there were two q-s-d processes.
> [...]
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
> Signed-off-by: Eric Blake <eblake@redhat.com>

This assumes that you'll only ever want to listen on a socket in the
main thread. I understand this is what the NBD server does today even
if you specify an iothread, so despite the limitation the patch is
useful. We just might need to change the interfaces again later.

Kevin

On Mon, Nov 03, 2025 at 02:10:58PM -0600, Eric Blake wrote:
> The user "John Doe" reported a deadlock when attempting to use
> qemu-storage-daemon to serve both a base file over NBD, and a qcow2
> file with that NBD export as its backing file, from the same process,
> even though it worked just fine when there were two q-s-d processes.
> The bulk of the NBD server code properly uses coroutines to make
> progress in an event-driven manner, but the code for spawning a new
> coroutine at the point when listen(2) detects a new client was
> hard-coded to use the global GMainContext; in other words, the
> callback that triggers nbd_client_new to let the server start the
> negotiation sequence with the client requires the main loop to be
> making progress. However, the code for bdrv_open of a qcow2 image
> with an NBD backing file uses an AIO_WAIT_WHILE nested event loop to
> ensure that the entire qcow2 backing chain is either fully loaded or
> rejected, without any side effects from the main loop causing unwanted
> changes to the disk being loaded (in short, an AioContext represents
> the set of actions that are known to be safe while handling block
> layer I/O, while excluding any other pending actions in the global
> main loop with potentially larger risk of unwanted side effects).
>
> This creates a classic case of deadlock: the server can't progress to
> the point of accept(2)ing the client to write to the NBD socket
> because the main loop is being starved until the AIO_WAIT_WHILE
> completes the bdrv_open, but the AIO_WAIT_WHILE can't progress because
> it is blocked on the client coroutine stuck in a read() of the
> expected magic number from the server side of the socket.
>
> Fortunately, the way that AioContext is set up, any callback that is
> registered to the global AioContext will also be serviced by the main
> loop. So the fix for the deadlock is to alter QIONetListener so that
> if it is not being used in an explicit alternative GMainContext, then
> it should perform its polling via the global AioContext (which
> indirectly still progresses in the default GMainContext) rather than
> directly in the default GMainContext. This does not change behavior
> for any prior use that did not starve the main loop, but has the
> additional benefit that in the bdrv_open case of a nested AioContext
> loop, the server's listen/accept handler is no longer starved because
> it is now part of the same AioContext loop. From there, since NBD
> already uses coroutines for both server and client code, the nested
> AioContext loop finishes quickly and opening the qcow2 backing chain
> no longer deadlocks.
>
> The next patch will add a unit test (kept separate to make it easier
> to rearrange the series to demonstrate the deadlock without this
> patch).
>
> Fixes: https://gitlab.com/qemu-project/qemu/-/issues/3169
> Signed-off-by: Eric Blake <eblake@redhat.com>
> ---
> io/net-listener.c | 53 ++++++++++++++++++++++++++++++++++++++---------
> io/trace-events | 4 ++--
> 2 files changed, 45 insertions(+), 12 deletions(-)
>
> diff --git a/io/net-listener.c b/io/net-listener.c
> index ce29bf3c993..9f4e3c0be0c 100644
> --- a/io/net-listener.c
> +++ b/io/net-listener.c
> @@ -23,6 +23,7 @@
> #include "io/dns-resolver.h"
> #include "qapi/error.h"
> #include "qemu/module.h"
> +#include "qemu/main-loop.h"
> #include "trace.h"
>
> QIONetListener *qio_net_listener_new(void)
> @@ -62,6 +63,15 @@ static gboolean qio_net_listener_channel_func(QIOChannel *ioc,
> }
>
>
> +static void qio_net_listener_aio_func(void *opaque)
> +{
> + QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(opaque);
> +
> + qio_net_listener_channel_func(QIO_CHANNEL(sioc), G_IO_IN,
> + sioc->listener);
> +}
> +
> +
> int qio_net_listener_open_sync(QIONetListener *listener,
> SocketAddress *addr,
> int num,
> @@ -117,15 +127,33 @@ qio_net_listener_watch(QIONetListener *listener, size_t i, const char *caller)
> return;
> }
>
> - trace_qio_net_listener_watch_enabled(listener, listener->io_func, caller);
> + trace_qio_net_listener_watch_enabled(listener, listener->io_func,
> + listener->context, caller);
> if (i == 0) {
> object_ref(OBJECT(listener));
> }
> for ( ; i < listener->nsioc; i++) {
> - listener->io_source[i] = qio_channel_add_watch_source(
> - QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
> - qio_net_listener_channel_func,
> - listener, NULL, listener->context);
> + if (listener->context) {
> + /*
> + * The user passed a GMainContext with the async callback;
> + * they plan on running their own g_main_loop.
> + */
> + listener->io_source[i] = qio_channel_add_watch_source(
> + QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
> + qio_net_listener_channel_func,
> + listener, NULL, listener->context);
> + } else {
> + /*
> + * The user is fine with the default context. But by doing
> + * it in the main thread's AioContext rather than
> + * specifically in a GMainContext, we can remain
> + * responsive even if another AioContext depends on
> + * connecting to this server.
> + */
> + aio_set_fd_handler(qemu_get_aio_context(), listener->sioc[i]->fd,
> + qio_net_listener_aio_func, NULL, NULL, NULL,
> + listener->sioc[i]);
> + }
I'm not really happy with the listener directly accessing the 'fd'
fields in the QIOChannelSocket, as compared to the GSource approach
where the underlying transport is not exposed to the caller.
If we want to use an AioContext instead of a GSource, then I think
we need to add a method to either QIOChannelSocket, or QIOChannel
base, as an alternative to the GSource watches, and then have the
listener conditionally use the AioContext APIs.
Also in the QIOChannel base class, we have an io_set_aio_fd_handler()
method that we use elsewhere. Can we perhaps leverage that in some way?
E.g., instead of taking the AioContext code path based on
"if (listener->context)", take the code path based on whether
the QIOChannel has had a call to qio_channel_set_aio_fd_handler()
to register AIO handlers? Maybe that method doesn't quite fit,
but conceptually I would be more comfortable with an approach
that explicitly associates an AioContext with either the
channel or the listener object, rather than this heuristic
of "if (listener->context)".
> }
> }
>
> @@ -138,12 +166,17 @@ qio_net_listener_unwatch(QIONetListener *listener, const char *caller)
> return;
> }
>
> - trace_qio_net_listener_watch_disabled(listener, caller);
> + trace_qio_net_listener_watch_disabled(listener, listener->context, caller);
> for (i = 0; i < listener->nsioc; i++) {
> - if (listener->io_source[i]) {
> - g_source_destroy(listener->io_source[i]);
> - g_source_unref(listener->io_source[i]);
> - listener->io_source[i] = NULL;
> + if (listener->context) {
> + if (listener->io_source[i]) {
> + g_source_destroy(listener->io_source[i]);
> + g_source_unref(listener->io_source[i]);
> + listener->io_source[i] = NULL;
> + }
> + } else {
> + aio_set_fd_handler(qemu_get_aio_context(), listener->sioc[i]->fd,
> + NULL, NULL, NULL, NULL, NULL);
> }
> }
> object_unref(OBJECT(listener));
> diff --git a/io/trace-events b/io/trace-events
> index 8cc4cae3a5d..1b01b2d51e6 100644
> --- a/io/trace-events
> +++ b/io/trace-events
> @@ -74,6 +74,6 @@ qio_channel_command_abort(void *ioc, int pid) "Command abort ioc=%p pid=%d"
> qio_channel_command_wait(void *ioc, int pid, int ret, int status) "Command abort ioc=%p pid=%d ret=%d status=%d"
>
> # net-listener.c
> -qio_net_listener_watch_enabled(void *listener, void *func, const char *extra) "Net listener=%p watch enabled func=%p by %s"
> -qio_net_listener_watch_disabled(void *listener, const char *extra) "Net listener=%p watch disabled by %s"
> +qio_net_listener_watch_enabled(void *listener, void *func, void *ctx, const char *extra) "Net listener=%p watch enabled func=%p ctx=%p by %s"
> +qio_net_listener_watch_disabled(void *listener, void *ctx, const char *extra) "Net listener=%p watch disabled ctx=%p by %s"
> qio_net_listener_callback(void *listener, void *func) "Net listener=%p callback forwarding to func=%p"
> --
> 2.51.1
>
With regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Tue, Nov 04, 2025 at 11:37:21AM +0000, Daniel P. Berrangé wrote:
> On Mon, Nov 03, 2025 at 02:10:58PM -0600, Eric Blake wrote:
> > for ( ; i < listener->nsioc; i++) {
> > - listener->io_source[i] = qio_channel_add_watch_source(
> > - QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
> > - qio_net_listener_channel_func,
> > - listener, NULL, listener->context);
> > + if (listener->context) {
> > + /*
> > + * The user passed a GMainContext with the async callback;
> > + * they plan on running their own g_main_loop.
> > + */
> > + listener->io_source[i] = qio_channel_add_watch_source(
> > + QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
> > + qio_net_listener_channel_func,
> > + listener, NULL, listener->context);
> > + } else {
> > + /*
> > + * The user is fine with the default context. But by doing
> > + * it in the main thread's AioContext rather than
> > + * specifically in a GMainContext, we can remain
> > + * responsive even if another AioContext depends on
> > + * connecting to this server.
> > + */
> > + aio_set_fd_handler(qemu_get_aio_context(), listener->sioc[i]->fd,
> > + qio_net_listener_aio_func, NULL, NULL, NULL,
> > + listener->sioc[i]);
> > + }
>
> I'm not really happy with the listener directly accessing the 'fd'
> fields in the QIOChannelSocket, as compared to the GSource approach
> where the underlying transport is not exposed to the caller.
>
> If we want to use an AioContext instead of a GSource, then I think
> we need to add a method to either QIOChannelSocket, or QIOChannel
> base, as an alternative to the GSource watches, and then have the
> listener conditionally use the AioContext APIs.
>
>
> Also in the QIOChannel base class, we have an io_set_aio_fd_handler()
> method that we use elsewhere. Can we perhaps leverage that in some way?
I will explore that idea for v2.
>
> E.g., instead of taking the AioContext code path based on
> "if (listener->context)", take the code path based on whether
> the QIOChannel has had a call to qio_channel_set_aio_fd_handler()
> to register AIO handlers? Maybe that method doesn't quite fit,
> but conceptually I would be more comfortable with an approach
> that explicitly associates an AioContext with either the
> channel or the listener object, rather than this heuristic
> of "if (listener->context)".
I wonder if qio_channel_set_follow_coroutine_ctx() might be the
trigger point you are thinking of. The NBD code already calls this, but
only AFTER the client has connected. Would having
ioc->follow_coroutine_ctx set be a better witness that the caller
specifically wants the channel behind the NetListener to run in an
AioContext (with the NBD code changed to call follow_coroutine_ctx()
sooner), rather than blindly declaring that all NetListeners get an
AioContext unless they use qio_net_listener_set_client_func_full()?
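
Concretely, the check in qio_net_listener_watch() might then end up
looking something like this (a sketch only, peeking at the private
follow_coroutine_ctx field purely for illustration; nothing here is
settled yet):

    QIOChannel *ioc = QIO_CHANNEL(listener->sioc[i]);

    if (!listener->context && ioc->follow_coroutine_ctx) {
        /* Caller opted in to AioContext handling for this channel. */
        aio_set_fd_handler(qemu_get_aio_context(), listener->sioc[i]->fd,
                           qio_net_listener_aio_func, NULL, NULL, NULL,
                           listener->sioc[i]);
    } else {
        /* Existing GSource path, in whatever GMainContext was given. */
        listener->io_source[i] = qio_channel_add_watch_source(
            QIO_CHANNEL(listener->sioc[i]), G_IO_IN,
            qio_net_listener_channel_func,
            listener, NULL, listener->context);
    }
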
--
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization: qemu.org | libguestfs.org