[PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race

Jie Song posted 1 patch 1 month ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20251111150144.76751-1-mail@jiesong.me
Maintainers: "Marc-André Lureau" <marcandre.lureau@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Markus Armbruster <armbru@redhat.com>
[PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race
Posted by Jie Song 1 month ago
From: Jie Song <songjie_yewu@cmss.chinamobile.com>

When starting a dummy QEMU process with virsh, monitor_init_qmp() enables
IOThread monitoring of the QMP fd by default. However, a race condition
exists during the initialization phase: the IOThread only removes the
main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
which may be delayed under high system load.

This creates a window between monitor_qmp_setup_handlers_bh() and
qio_net_listener_set_client_func_full() where both the main thread and
IOThread are simultaneously monitoring the same fd and processing events.
This race can cause either the main thread or the IOThread to hang and
become unresponsive.

Fix this by proactively cleaning up the listener's IO sources in
monitor_init_qmp() before the IOThread initializes QMP monitoring,
ensuring exclusive fd ownership and eliminating the race condition.

The fix introduces socket_chr_listener_cleanup() to destroy and unref
all existing IO sources on the socket chardev listener, guaranteeing
that no concurrent fd monitoring occurs during the transition to
IOThread handling.

Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com>
---
 chardev/char-socket.c         | 18 ++++++++++++++++++
 include/chardev/char-socket.h |  2 ++
 monitor/qmp.c                 |  6 ++++++
 3 files changed, 26 insertions(+)

diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 62852e3caf..073a9da855 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -656,6 +656,24 @@ static void tcp_chr_telnet_destroy(SocketChardev *s)
     }
 }
 
+void socket_chr_listener_cleanup(Chardev *chr)
+{
+    SocketChardev *s = SOCKET_CHARDEV(chr);
+
+    if (s->listener) {
+        QIONetListener *listener = s->listener;
+        size_t i;
+
+        for (i = 0; i < listener->nsioc; i++) {
+            if (listener->io_source[i]) {
+                g_source_destroy(listener->io_source[i]);
+                g_source_unref(listener->io_source[i]);
+                listener->io_source[i] = NULL;
+            }
+        }
+    }
+}
+
 static void tcp_chr_update_read_handler(Chardev *chr)
 {
     SocketChardev *s = SOCKET_CHARDEV(chr);
diff --git a/include/chardev/char-socket.h b/include/chardev/char-socket.h
index d6d13ad37f..682440c6de 100644
--- a/include/chardev/char-socket.h
+++ b/include/chardev/char-socket.h
@@ -84,4 +84,6 @@ typedef struct SocketChardev SocketChardev;
 DECLARE_INSTANCE_CHECKER(SocketChardev, SOCKET_CHARDEV,
                          TYPE_CHARDEV_SOCKET)
 
+void socket_chr_listener_cleanup(Chardev *chr);
+
 #endif /* CHAR_SOCKET_H */
diff --git a/monitor/qmp.c b/monitor/qmp.c
index cb99a12d94..d9d1fafa70 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -25,6 +25,7 @@
 #include "qemu/osdep.h"
 
 #include "chardev/char-io.h"
+#include "chardev/char-socket.h"
 #include "monitor-internal.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-control.h"
@@ -537,6 +538,11 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
          * e.g. the chardev is in client mode, with wait=on.
          */
         remove_fd_in_watch(chr);
+        /*
+         * Clean up listener IO sources early to prevent racy fd
+         * handling between the main thread and the I/O thread.
+         */
+        socket_chr_listener_cleanup(chr);
         /*
          * We can't call qemu_chr_fe_set_handlers() directly here
          * since chardev might be running in the monitor I/O
-- 
2.43.0
Re: [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race
Posted by Daniel P. Berrangé 1 month ago
On Tue, Nov 11, 2025 at 11:01:44PM +0800, Jie Song wrote:
> @@ -537,6 +538,11 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
>           * e.g. the chardev is in client mode, with wait=on.
>           */
>          remove_fd_in_watch(chr);
> +        /*
> +         * Clean up listener IO sources early to prevent racy fd
> +         * handling between the main thread and the I/O thread.
> +         */
> +        socket_chr_listener_cleanup(chr);

This is unsafe (may crash) because the chardev used by the monitor
may not be a SocketChardev. Having todo back

QMP is already calling 'remove_fd_in_watch' to purge the I/O sources.
So if there is a flaw, I would expect any fix to be entirely in the
chardev code, in a path from remove_fd_in_watch.

>          /*
>           * We can't call qemu_chr_fe_set_handlers() directly here
>           * since chardev might be running in the monitor I/O

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race
Posted by Jie Song 1 month ago
Hi Daniel,

Thank you for your review and valuable feedback.

You're absolutely right about the concerns. Let me clarify the scenario
this patch addresses.

The remove_fd_in_watch() function handles the client-side connection case.
However, when the chardev is configured in server mode
(e.g., -qmp unix:/var/lib/libvirt/qemu/qmp-xxx/qmp.monitor,server=on,wait=off),
there is also a listener that needs cleanup. socket_chr_listener_cleanup()
is specifically intended to handle this server-side listener, to prevent the
race between the main thread and the IOThread monitoring the same listener fd.

I apologize for the unsafe assumption that the chardev would always be a SocketChardev.
You're correct that this could cause crashes with other chardev types. 
To fix this properly, I’m considering a more general design. 
Would the following approach be acceptable?

  1. Add a chr_listener_cleanup callback to the ChardevClass structure.
  2. Implement this callback in SocketChardev.
  3. Register it in char_socket_class_init().
  4. In monitor/qmp.c, call it through the class method:

       remove_fd_in_watch(chr);
       ChardevClass *cc = CHARDEV_GET_CLASS(chr);
       if (cc->chr_listener_cleanup) {
           cc->chr_listener_cleanup(chr);
       }

This would maintain type safety while keeping the fix properly abstracted
at the chardev layer. Would this fix make sense?
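
As a rough illustration of the optional class-callback pattern proposed above, here is a minimal standalone sketch. It does not use QEMU's QOM machinery: DemoChardev, DemoChardevClass, the two class instances and demo_chr_listener_cleanup() are all hypothetical stand-ins, and only the callback name chr_listener_cleanup is taken from the proposal.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoChardev DemoChardev;

/* Hypothetical stand-in for ChardevClass: the cleanup hook is optional. */
typedef struct DemoChardevClass {
    const char *name;
    void (*chr_listener_cleanup)(DemoChardev *chr);
} DemoChardevClass;

/* Hypothetical stand-in for a chardev instance. */
struct DemoChardev {
    const DemoChardevClass *klass;
    int listener_watches;   /* number of active listener watches */
};

/* Socket backend: drops every watch it holds on its listener. */
static void demo_socket_listener_cleanup(DemoChardev *chr)
{
    chr->listener_watches = 0;
}

/* Only the socket class registers the hook; other backends leave it NULL. */
const DemoChardevClass demo_socket_class = {
    "socket", demo_socket_listener_cleanup
};
const DemoChardevClass demo_pty_class = { "pty", NULL };

/*
 * Caller-side dispatch: no downcast to a concrete backend type is needed.
 * Returns 1 if the backend implemented the hook, 0 if it was skipped.
 */
int demo_chr_listener_cleanup(DemoChardev *chr)
{
    assert(chr != NULL && chr->klass != NULL);
    if (!chr->klass->chr_listener_cleanup) {
        return 0;
    }
    chr->klass->chr_listener_cleanup(chr);
    return 1;
}
```

With this shape, a caller can invoke demo_chr_listener_cleanup() on any backend: a socket chardev has its listener watches cleared, while a backend without the hook is left untouched instead of being cast to the wrong type.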

Looking forward to your guidance.

Best regards,
Jie Song 

Re: [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race
Posted by Markus Armbruster 1 month ago
Daniel, is this in your area of expertise?

Jie Song, can you identify the commit that introduced the bug?

Re: [PATCH] monitor/qmp: cleanup socket listener sources early to avoid fd handling race
Posted by Jie Song 1 month ago
> Daniel, is this in your area of expertise?
> 
> Jie Song, can you identify the commit that introduced the bug?

Hi Markus,

Thank you for the question.

The issue is not tied to any specific commit; it arises from the current
initialization flow. Specifically, in scenarios such as virsh starting a
dummy QEMU process, the following command line may trigger the bug:
`/usr/bin/qemu-system-x86_64 -S -no-user-config -nodefaults -nographic -machine
none,accel=tcg -qmp unix:/var/lib/libvirt/qemu/qmp-xxx/qmp.monitor,server=on,wait=off`

We can reproduce this issue using gdb with the following steps:
  1. Pause the I/O thread: run monitor_init_qmp() in the main thread and, before
     aio_bh_schedule_oneshot() is called, suspend the I/O thread (scheduler-locking on).
     This simulates a high-load scenario.
  2. Set a breakpoint at qemu_accept() and let the main thread continue running.
     At this point the main thread is still listening on the corresponding
     chardev (the QMP socket).
  3. Simulate a client connection: use nc -U to connect to the Unix socket.
     The main thread detects the event and hits the breakpoint at qemu_accept().
  4. Resume the I/O thread: switch to the I/O thread and let it run.
     It also reaches the qemu_accept() breakpoint, so both threads end up
     handling the same accept event.

This race causes either the main thread or the IOThread to hang and become unresponsive.

The issue stems from the window between when the main thread sets up the listener watch and
when the IOThread takes over exclusive ownership. Under normal conditions this window is 
very small, but under high load or with specific timing, both threads can end up processing 
events on the same fd simultaneously.
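
The hazard can be shown outside QEMU with a small standalone sketch. This is not QEMU code: demo_double_accept() and the /tmp socket path are hypothetical, and the two "watchers" are run one after the other in a single thread to keep the result deterministic. One client connects to a listening Unix socket, both watchers see the fd as readable, but only the first accept() gets the connection. The listener is made non-blocking here so the demo cannot hang; assuming a blocking listener instead, the second accept() would simply wait for a connection that never arrives.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Two watchers poll the same listening fd, then both try to accept the
 * single pending connection. Returns 0 on success, -1 on setup failure.
 */
int demo_double_accept(int *both_saw_event, int *first_ok, int *second_errno)
{
    struct sockaddr_un addr;
    struct pollfd w1, w2;
    int lfd, cfd, a1, a2, r1, r2;

    assert(both_saw_event && first_ok && second_errno);

    /* Hypothetical socket path, unique per process. */
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    snprintf(addr.sun_path, sizeof(addr.sun_path),
             "/tmp/qmp-race-demo-%ld.sock", (long)getpid());
    unlink(addr.sun_path);

    lfd = socket(AF_UNIX, SOCK_STREAM, 0);
    cfd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (lfd < 0 || cfd < 0 ||
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(lfd, 1) < 0 ||
        fcntl(lfd, F_SETFL, fcntl(lfd, F_GETFL) | O_NONBLOCK) < 0 ||
        connect(cfd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        if (lfd >= 0) close(lfd);
        if (cfd >= 0) close(cfd);
        unlink(addr.sun_path);
        return -1;
    }

    /* Both watchers are told that the listener is readable. */
    w1.fd = lfd; w1.events = POLLIN; w1.revents = 0;
    w2 = w1;
    r1 = poll(&w1, 1, 1000);
    r2 = poll(&w2, 1, 1000);
    *both_saw_event = r1 == 1 && r2 == 1 &&
                      (w1.revents & POLLIN) && (w2.revents & POLLIN);

    /* Only the first accept() finds a pending connection. */
    a1 = accept(lfd, NULL, NULL);
    *first_ok = a1 >= 0;
    a2 = accept(lfd, NULL, NULL);
    *second_errno = a2 < 0 ? errno : 0;

    if (a1 >= 0) close(a1);
    if (a2 >= 0) close(a2);
    close(cfd);
    close(lfd);
    unlink(addr.sun_path);
    return 0;
}
```

The second watcher acted on a readiness event that was already consumed, which is the situation the gdb steps above force between the main thread and the I/O thread.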

I hope this explanation clarifies the issue. 

Best regards,
Jie Song