[PATCH v4] monitor/qmp: cleanup SocketChardev listener sources early to avoid fd handling race

Jie Song posted 1 patch 2 months, 2 weeks ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20251125140706.114197-1-mail@jiesong.me
Maintainers: "Marc-André Lureau" <marcandre.lureau@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, Markus Armbruster <armbru@redhat.com>
[PATCH v4] monitor/qmp: cleanup SocketChardev listener sources early to avoid fd handling race
Posted by Jie Song 2 months, 2 weeks ago
From: Jie Song <songjie_yewu@cmss.chinamobile.com>

When a dummy QEMU process is started by 'virsh version', monitor_init_qmp()
enables IOThread monitoring of the QMP fd by default. However, a race
condition exists during the initialization phase: the IOThread only removes
the main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
which may be delayed under high system load.

This creates a window between monitor_qmp_setup_handlers_bh() and
qio_net_listener_set_client_func_full() where both the main thread and
IOThread are simultaneously monitoring the same fd and processing events.
This race can cause either the main thread or the IOThread to hang and
become unresponsive.

Fix this by proactively cleaning up the listener's IO sources in
monitor_init_qmp() before the IOThread initializes QMP monitoring,
ensuring exclusive fd ownership and eliminating the race condition.

Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
Changes in v4:
- Correct typos and add braces, reviewed by Marc-André Lureau.
- Link to v3:
  https://lists.nongnu.org/archive/html/qemu-devel/2025-11/msg03504.html
- Link to v2:
  https://lists.nongnu.org/archive/html/qemu-devel/2025-11/msg02443.html
- Link to v1:
  https://lists.nongnu.org/archive/html/qemu-devel/2025-11/msg01621.html
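
For reviewers, the ordering change described above can be pictured with a
small standalone model. This is only a sketch and not QEMU code:
ListenerModel, model_set_client_func() and the CTX_* constants are
hypothetical names, and the model assumes that
qio_net_listener_set_client_func_full() drops the listener's existing
sources before attaching new ones to the given context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two GMainContexts involved. */
enum { CTX_MAIN, CTX_IOTHREAD, CTX_COUNT };

/* Toy listener: records which contexts currently poll the listening fd. */
typedef struct {
    bool watched[CTX_COUNT];
} ListenerModel;

/*
 * Models the assumed behaviour of qio_net_listener_set_client_func_full():
 * every existing source is dropped first, and a new one is attached to
 * @ctx only when a client callback is supplied.
 */
static void model_set_client_func(ListenerModel *l, bool has_func, int ctx)
{
    assert(l != NULL && ctx >= 0 && ctx < CTX_COUNT);
    for (int i = 0; i < CTX_COUNT; i++) {
        l->watched[i] = false;
    }
    if (has_func) {
        l->watched[ctx] = true;
    }
}

/* True while the main loop would still dispatch events for the fd. */
static bool main_still_watching(const ListenerModel *l)
{
    return l->watched[CTX_MAIN];
}
```

In this model the unpatched flow leaves main_still_watching() true until
the IOThread gets around to its own model_set_client_func() call. The
patched flow corresponds to calling model_set_client_func(l, false, ...)
from the main thread first, so the flag is already false when the
IOThread takes over.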
---
 chardev/char-io.c         |  8 ++++++++
 chardev/char-socket.c     | 10 ++++++++++
 include/chardev/char-io.h |  2 ++
 include/chardev/char.h    |  2 ++
 monitor/qmp.c             |  5 +++++
 5 files changed, 27 insertions(+)

diff --git a/chardev/char-io.c b/chardev/char-io.c
index 3be17b51ca..beac5cd245 100644
--- a/chardev/char-io.c
+++ b/chardev/char-io.c
@@ -182,3 +182,11 @@ int io_channel_send(QIOChannel *ioc, const void *buf, size_t len)
 {
     return io_channel_send_full(ioc, buf, len, NULL, 0);
 }
+
+void remove_listener_fd_in_watch(Chardev *chr)
+{
+    ChardevClass *cc = CHARDEV_GET_CLASS(chr);
+    if (cc->chr_listener_cleanup) {
+        cc->chr_listener_cleanup(chr);
+    }
+}
diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 26d2f11202..3f45dd2ecd 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -1570,6 +1570,15 @@ char_socket_get_connected(Object *obj, Error **errp)
     return s->state == TCP_CHARDEV_STATE_CONNECTED;
 }
 
+static void tcp_chr_listener_cleanup(Chardev *chr)
+{
+    SocketChardev *s = SOCKET_CHARDEV(chr);
+    if (s->listener) {
+        qio_net_listener_set_client_func_full(s->listener, NULL, NULL,
+                                              NULL, chr->gcontext);
+    }
+}
+
 static void char_socket_class_init(ObjectClass *oc, const void *data)
 {
     ChardevClass *cc = CHARDEV_CLASS(oc);
@@ -1587,6 +1596,7 @@ static void char_socket_class_init(ObjectClass *oc, const void *data)
     cc->chr_add_client = tcp_chr_add_client;
     cc->chr_add_watch = tcp_chr_add_watch;
     cc->chr_update_read_handler = tcp_chr_update_read_handler;
+    cc->chr_listener_cleanup = tcp_chr_listener_cleanup;
 
     object_class_property_add(oc, "addr", "SocketAddress",
                               char_socket_get_addr, NULL,
diff --git a/include/chardev/char-io.h b/include/chardev/char-io.h
index ac379ea70e..540131346d 100644
--- a/include/chardev/char-io.h
+++ b/include/chardev/char-io.h
@@ -43,4 +43,6 @@ int io_channel_send(QIOChannel *ioc, const void *buf, size_t len);
 int io_channel_send_full(QIOChannel *ioc, const void *buf, size_t len,
                          int *fds, size_t nfds);
 
+void remove_listener_fd_in_watch(Chardev *chr);
+
 #endif /* CHAR_IO_H */
diff --git a/include/chardev/char.h b/include/chardev/char.h
index b65e9981c1..192cad67d4 100644
--- a/include/chardev/char.h
+++ b/include/chardev/char.h
@@ -307,6 +307,8 @@ struct ChardevClass {
 
     /* handle various events */
     void (*chr_be_event)(Chardev *s, QEMUChrEvent event);
+
+    void (*chr_listener_cleanup)(Chardev *chr);
 };
 
 Chardev *qemu_chardev_new(const char *id, const char *typename,
diff --git a/monitor/qmp.c b/monitor/qmp.c
index cb99a12d94..7ae070dc8d 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -537,6 +537,11 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
          * e.g. the chardev is in client mode, with wait=on.
          */
         remove_fd_in_watch(chr);
+        /*
+         * Clean up listener IO sources early to prevent racy fd
+         * handling between the main thread and the I/O thread.
+         */
+        remove_listener_fd_in_watch(chr);
         /*
          * We can't call qemu_chr_fe_set_handlers() directly here
          * since chardev might be running in the monitor I/O
-- 
2.43.0


Re: [PATCH v4] monitor/qmp: cleanup SocketChardev listener sources early to avoid fd handling race
Posted by Michael Tokarev 1 month ago
On 11/25/25 17:07, Jie Song wrote:
> From: Jie Song <songjie_yewu@cmss.chinamobile.com>
> 
> When starting a dummy QEMU process with virsh version, monitor_init_qmp()
> enables IOThread monitoring of the QMP fd by default. However, a race
> condition exists during the initialization phase: the IOThread only removes
> the main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
> which may be delayed under high system load.
> 
> This creates a window between monitor_qmp_setup_handlers_bh() and
> qio_net_listener_set_client_func_full() where both the main thread and
> IOThread are simultaneously monitoring the same fd and processing events.
> This race can cause either the main thread or the IOThread to hang and
> become unresponsive.
> 
> Fix this by proactively cleaning up the listener's IO sources in
> monitor_init_qmp() before the IOThread initializes QMP monitoring,
> ensuring exclusive fd ownership and eliminating the race condition.

I'm picking this up for qemu-stable series (10.0.x, 10.1.x, 10.2.x),
for now.  Please let me know if I shouldn't.

Yes, I've seen that this change has breakage potential, too -- let's see
how it works out.

Thanks,

/mjt
Re: [PATCH v4] monitor/qmp: cleanup SocketChardev listener sources early to avoid fd handling race
Posted by Jie Song 3 weeks, 5 days ago
Hi Michael, sorry for the late reply. This message was unfortunately
caught by my mail filter and I only just noticed it.

Regarding applying this change to qemu-stable: this patch only 
uses the existing qio_net_listener_set_client_func_full() helper.
Considering this limited dependency and the fact that the change
is confined to cleaning up the listener sources earlier to avoid
the fd handling race, it should be safe to apply this patch to 
the qemu-stable series.
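
To illustrate why the change is confined, the optional class hook can be
sketched as a standalone toy. The ToyChardev and ToyChardevClass types and
the toy_* functions are hypothetical and only mimic the shape of the patch;
they are not the real QOM machinery.

```c
#include <assert.h>
#include <stddef.h>

typedef struct ToyChardev ToyChardev;

/* Toy class with an optional hook, mirroring chr_listener_cleanup. */
typedef struct {
    void (*listener_cleanup)(ToyChardev *chr); /* may be NULL */
} ToyChardevClass;

struct ToyChardev {
    const ToyChardevClass *klass;
    int listener_sources; /* number of live listener watches */
};

/* Socket-like backend: drops all of its listener watches. */
static void toy_socket_listener_cleanup(ToyChardev *chr)
{
    chr->listener_sources = 0;
}

static const ToyChardevClass toy_socket_class = {
    .listener_cleanup = toy_socket_listener_cleanup,
};

/* Backend that never sets the hook. */
static const ToyChardevClass toy_plain_class = {
    .listener_cleanup = NULL,
};

/* Same shape as remove_listener_fd_in_watch(): call the hook if present. */
static void toy_remove_listener_watch(ToyChardev *chr)
{
    assert(chr != NULL && chr->klass != NULL);
    if (chr->klass->listener_cleanup) {
        chr->klass->listener_cleanup(chr);
    }
}
```

Backends that do not set the hook are left untouched by the wrapper, so in
this sketch only the socket-like backend changes state.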

Thanks for taking a look and for considering it for the stable branches.

> On 11/25/25 17:07, Jie Song wrote:
> > From: Jie Song <songjie_yewu@cmss.chinamobile.com>
> > 
> > When starting a dummy QEMU process with virsh version, monitor_init_qmp()
> > enables IOThread monitoring of the QMP fd by default. However, a race
> > condition exists during the initialization phase: the IOThread only removes
> > the main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
> > which may be delayed under high system load.
> > 
> > This creates a window between monitor_qmp_setup_handlers_bh() and
> > qio_net_listener_set_client_func_full() where both the main thread and
> > IOThread are simultaneously monitoring the same fd and processing events.
> > This race can cause either the main thread or the IOThread to hang and
> > become unresponsive.
> > 
> > Fix this by proactively cleaning up the listener's IO sources in
> > monitor_init_qmp() before the IOThread initializes QMP monitoring,
> > ensuring exclusive fd ownership and eliminating the race condition.
> 
> I'm picking this up for qemu-stable series (10.0.x, 10.1.x, 10.2.x),
> for now.  Please let me know if I shouldn't.
> 
> Yes, I've seen that this change has breakage potential, too -- let's see
> how it works out.
> 
> Thanks,
> 
> /mjt

Best regards,
Jie Song

Re: [PATCH v4] monitor/qmp: cleanup SocketChardev listener sources early to avoid fd handling race
Posted by Markus Armbruster 2 months, 1 week ago
Jie Song, Marc-André, is this bug serious enough and the fix safe enough
to still go into 10.2?

Jie Song <mail@jiesong.me> writes:

> From: Jie Song <songjie_yewu@cmss.chinamobile.com>
>
> When starting a dummy QEMU process with virsh version, monitor_init_qmp()
> enables IOThread monitoring of the QMP fd by default. However, a race
> condition exists during the initialization phase: the IOThread only removes
> the main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
> which may be delayed under high system load.
>
> This creates a window between monitor_qmp_setup_handlers_bh() and
> qio_net_listener_set_client_func_full() where both the main thread and
> IOThread are simultaneously monitoring the same fd and processing events.
> This race can cause either the main thread or the IOThread to hang and
> become unresponsive.
>
> Fix this by proactively cleaning up the listener's IO sources in
> monitor_init_qmp() before the IOThread initializes QMP monitoring,
> ensuring exclusive fd ownership and eliminating the race condition.
>
> Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com>
> Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Re: [PATCH v4] monitor/qmp: cleanup SocketChardev listener sources early to avoid fd handling race
Posted by Jie Song 2 months, 1 week ago
Hi Markus,

> Jie Song, Marc-André, is this bug serious enough and the fix safe enough
> to still go into 10.2?

First, regarding the seriousness of this bug, although the probability of encountering 
it in a production environment is relatively low, it has existed for quite some time.

Secondly, with regard to the safety of this fix, it has been verified successfully
in the test environment. However, it would be better if more people could help to
review it to further ensure its robustness.

> 
> Jie Song <mail@jiesong.me> writes:
> 
> > From: Jie Song <songjie_yewu@cmss.chinamobile.com>
> >
> > When starting a dummy QEMU process with virsh version, monitor_init_qmp()
> > enables IOThread monitoring of the QMP fd by default. However, a race
> > condition exists during the initialization phase: the IOThread only removes
> > the main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
> > which may be delayed under high system load.
> >
> > This creates a window between monitor_qmp_setup_handlers_bh() and
> > qio_net_listener_set_client_func_full() where both the main thread and
> > IOThread are simultaneously monitoring the same fd and processing events.
> > This race can cause either the main thread or the IOThread to hang and
> > become unresponsive.
> >
> > Fix this by proactively cleaning up the listener's IO sources in
> > monitor_init_qmp() before the IOThread initializes QMP monitoring,
> > ensuring exclusive fd ownership and eliminating the race condition.
> >
> > Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com>
> > Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>

Regards,
Jie Song

Re: [PATCH v4] monitor/qmp: cleanup SocketChardev listener sources early to avoid fd handling race
Posted by Markus Armbruster 2 months, 1 week ago
Jie Song <mail@jiesong.me> writes:

> Hi Markus,
>
>> Jie Song, Marc-André, is this bug serious enough and the fix safe enough
>> to still go into 10.2?
>
> First, regarding the seriousness of this bug, although the probability of encountering 
> it in a production environment is relatively low, it has existed for quite some time.
>
> Secondly, with regard to the safety of this fix, it has been verified successfully
> in the test environment. However, it would be better if more people could help to
> review it to further ensure its robustness.

This confirms Marc-André's "too late for 10.2" feeling.

I'll track this patch for 11.0.  More review would be nice, but if we
can get it, I'll get the patch merged early in the development cycle.

Thank you!
Re: [PATCH v4] monitor/qmp: cleanup SocketChardev listener sources early to avoid fd handling race
Posted by Jie Song 2 months, 1 week ago
> Jie Song <mail@jiesong.me> writes:
> 
> > Hi Markus,
> >
> >> Jie Song, Marc-André, is this bug serious enough and the fix safe enough
> >> to still go into 10.2?
> >
> > First, regarding the seriousness of this bug, although the probability of
> > encountering it in a production environment is relatively low, it has
> > existed for quite some time.
> >
> > Secondly, with regard to the safety of this fix, it has been verified
> > successfully in the test environment. However, it would be better if more
> > people could help to review it to further ensure its robustness.
> 
> This confirms Marc-André's "too late for 10.2" feeling.
> 
> I'll track this patch for 11.0.  More review would be nice, but if we
> can get it, I'll get the patch merged early in the development cycle.
> 
> Thank you!

Hi Markus,

Thanks for the update and for tracking this patch for 11.0.

I’ll keep following up on this and will address any comments
or issues that come up.

Thanks again for your support.

Best regards,
Jie Song

Re: [PATCH v4] monitor/qmp: cleanup SocketChardev listener sources early to avoid fd handling race
Posted by Marc-André Lureau 2 months, 1 week ago
Hi Markus

On Mon, Dec 1, 2025 at 10:00 AM Markus Armbruster <armbru@redhat.com> wrote:
>
> Jie Song, Marc-André, is this bug serious enough and the fix safe enough
> to still go into 10.2?
>

My feeling is that it's a bit late for 10.2, as I suppose this bug has
been present for a long time and the risk of regression is high. Also
not enough people reviewed it.

> Jie Song <mail@jiesong.me> writes:
>
> > From: Jie Song <songjie_yewu@cmss.chinamobile.com>
> >
> > When starting a dummy QEMU process with virsh version, monitor_init_qmp()
> > enables IOThread monitoring of the QMP fd by default. However, a race
> > condition exists during the initialization phase: the IOThread only removes
> > the main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
> > which may be delayed under high system load.
> >
> > This creates a window between monitor_qmp_setup_handlers_bh() and
> > qio_net_listener_set_client_func_full() where both the main thread and
> > IOThread are simultaneously monitoring the same fd and processing events.
> > This race can cause either the main thread or the IOThread to hang and
> > become unresponsive.
> >
> > Fix this by proactively cleaning up the listener's IO sources in
> > monitor_init_qmp() before the IOThread initializes QMP monitoring,
> > ensuring exclusive fd ownership and eliminating the race condition.
> >
> > Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com>
> > Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
>


-- 
Marc-André Lureau