chardev/char-io.c | 8 ++++++++ chardev/char-socket.c | 10 ++++++++++ include/chardev/char-io.h | 2 ++ include/chardev/char.h | 2 ++ monitor/qmp.c | 5 +++++ 5 files changed, 27 insertions(+)
From: Jie Song <songjie_yewu@cmss.chinamobile.com>
When starting a dummy QEMU process with virsh version, monitor_init_qmp()
enables IOThread monitoring of the QMP fd by default. However, a race
condition exists during the initialization phase: the IOThread only removes
the main thread's fd watch when it reaches qio_net_listener_set_client_func_full(),
which may be delayed under high system load.
This creates a window between monitor_qmp_setup_handlers_bh() and
qio_net_listener_set_client_func_full() where both the main thread and
IOThread are simultaneously monitoring the same fd and processing events.
This race can cause either the main thread or the IOThread to hang and
become unresponsive.
Fix this by proactively cleaning up the listener's IO sources in
monitor_init_qmp() before the IOThread initializes QMP monitoring,
ensuring exclusive fd ownership and eliminating the race condition.
Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
---
Changes in v4:
- Correct typos and add braces, reviewed by Marc-André Lureau.
- Link to v3:
https://lists.nongnu.org/archive/html/qemu-devel/2025-11/msg03504.html
- Link to v2:
https://lists.nongnu.org/archive/html/qemu-devel/2025-11/msg02443.html
- Link to v1:
https://lists.nongnu.org/archive/html/qemu-devel/2025-11/msg01621.html
---
chardev/char-io.c | 8 ++++++++
chardev/char-socket.c | 10 ++++++++++
include/chardev/char-io.h | 2 ++
include/chardev/char.h | 2 ++
monitor/qmp.c | 5 +++++
5 files changed, 27 insertions(+)
diff --git a/chardev/char-io.c b/chardev/char-io.c
index 3be17b51ca..beac5cd245 100644
--- a/chardev/char-io.c
+++ b/chardev/char-io.c
@@ -182,3 +182,11 @@ int io_channel_send(QIOChannel *ioc, const void *buf, size_t len)
{
return io_channel_send_full(ioc, buf, len, NULL, 0);
}
+
+void remove_listener_fd_in_watch(Chardev *chr)
+{
+ ChardevClass *cc = CHARDEV_GET_CLASS(chr);
+ if (cc->chr_listener_cleanup) {
+ cc->chr_listener_cleanup(chr);
+ }
+}
diff --git a/chardev/char-socket.c b/chardev/char-socket.c
index 26d2f11202..3f45dd2ecd 100644
--- a/chardev/char-socket.c
+++ b/chardev/char-socket.c
@@ -1570,6 +1570,15 @@ char_socket_get_connected(Object *obj, Error **errp)
return s->state == TCP_CHARDEV_STATE_CONNECTED;
}
+static void tcp_chr_listener_cleanup(Chardev *chr)
+{
+ SocketChardev *s = SOCKET_CHARDEV(chr);
+ if (s->listener) {
+ qio_net_listener_set_client_func_full(s->listener, NULL, NULL,
+ NULL, chr->gcontext);
+ }
+}
+
static void char_socket_class_init(ObjectClass *oc, const void *data)
{
ChardevClass *cc = CHARDEV_CLASS(oc);
@@ -1587,6 +1596,7 @@ static void char_socket_class_init(ObjectClass *oc, const void *data)
cc->chr_add_client = tcp_chr_add_client;
cc->chr_add_watch = tcp_chr_add_watch;
cc->chr_update_read_handler = tcp_chr_update_read_handler;
+ cc->chr_listener_cleanup = tcp_chr_listener_cleanup;
object_class_property_add(oc, "addr", "SocketAddress",
char_socket_get_addr, NULL,
diff --git a/include/chardev/char-io.h b/include/chardev/char-io.h
index ac379ea70e..540131346d 100644
--- a/include/chardev/char-io.h
+++ b/include/chardev/char-io.h
@@ -43,4 +43,6 @@ int io_channel_send(QIOChannel *ioc, const void *buf, size_t len);
int io_channel_send_full(QIOChannel *ioc, const void *buf, size_t len,
int *fds, size_t nfds);
+void remove_listener_fd_in_watch(Chardev *chr);
+
#endif /* CHAR_IO_H */
diff --git a/include/chardev/char.h b/include/chardev/char.h
index b65e9981c1..192cad67d4 100644
--- a/include/chardev/char.h
+++ b/include/chardev/char.h
@@ -307,6 +307,8 @@ struct ChardevClass {
/* handle various events */
void (*chr_be_event)(Chardev *s, QEMUChrEvent event);
+
+ void (*chr_listener_cleanup)(Chardev *chr);
};
Chardev *qemu_chardev_new(const char *id, const char *typename,
diff --git a/monitor/qmp.c b/monitor/qmp.c
index cb99a12d94..7ae070dc8d 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -537,6 +537,11 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
* e.g. the chardev is in client mode, with wait=on.
*/
remove_fd_in_watch(chr);
+ /*
+ * Clean up listener IO sources early to prevent racy fd
+ * handling between the main thread and the I/O thread.
+ */
+ remove_listener_fd_in_watch(chr);
/*
* We can't call qemu_chr_fe_set_handlers() directly here
* since chardev might be running in the monitor I/O
--
2.43.0
Jie Song, Marc-André, is this bug serious enough and the fix safe enough to still go into 10.2? Jie Song <mail@jiesong.me> writes: > From: Jie Song <songjie_yewu@cmss.chinamobile.com> > > When starting a dummy QEMU process with virsh version, monitor_init_qmp() > enables IOThread monitoring of the QMP fd by default. However, a race > condition exists during the initialization phase: the IOThread only removes > the main thread's fd watch when it reaches qio_net_listener_set_client_func_full(), > which may be delayed under high system load. > > This creates a window between monitor_qmp_setup_handlers_bh() and > qio_net_listener_set_client_func_full() where both the main thread and > IOThread are simultaneously monitoring the same fd and processing events. > This race can cause either the main thread or the IOThread to hang and > become unresponsive. > > Fix this by proactively cleaning up the listener's IO sources in > monitor_init_qmp() before the IOThread initializes QMP monitoring, > ensuring exclusive fd ownership and eliminating the race condition. > > Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com> > Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Hi Markus, > Jie Song, Marc-André, is this bug serious enough and the fix safe enough > to still go into 10.2? First, regarding the seriousness of this bug, although the probability of encountering it in a production environment is relatively low, it has existed for quite some time. Secondly, with regard to the safety of this fix, it has been verified successfully in the test environment. However, it would be better if more people could help to review it to further ensure its robustness. > > Jie Song <mail@jiesong.me> writes: > > > From: Jie Song <songjie_yewu@cmss.chinamobile.com> > > > > When starting a dummy QEMU process with virsh version, monitor_init_qmp() > > enables IOThread monitoring of the QMP fd by default. However, a race > > condition exists during the initialization phase: the IOThread only removes > > the main thread's fd watch when it reaches qio_net_listener_set_client_func_full(), > > which may be delayed under high system load. > > > > This creates a window between monitor_qmp_setup_handlers_bh() and > > qio_net_listener_set_client_func_full() where both the main thread and > > IOThread are simultaneously monitoring the same fd and processing events. > > This race can cause either the main thread or the IOThread to hang and > > become unresponsive. > > > > Fix this by proactively cleaning up the listener's IO sources in > > monitor_init_qmp() before the IOThread initializes QMP monitoring, > > ensuring exclusive fd ownership and eliminating the race condition. > > > > Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com> > > Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> Regards, Jie Song
Jie Song <mail@jiesong.me> writes: > Hi Markus, > >> Jie Song, Marc-André, is this bug serious enough and the fix safe enough >> to still go into 10.2? > > First, regarding the seriousness of this bug, although the probability of encountering > it in a production environment is relatively low, it has existed for quite some time. > > Secondly, with regard to the safety of this fix, it has been verified successfully > in the test environment. However, it would be better if more people could help to > review it to further ensure its robustness. This confirms Marc-André's "too late for 10.2" feeling. I'll track this patch for 11.0. More review would be nice, but if we can get it, I'll get the patch merged early in the development cycle. Thank you!
> Jie Song <mail@jiesong.me> writes: > > > Hi Markus, > > > >> Jie Song, Marc-André, is this bug serious enough and the fix safe enough > >> to still go into 10.2? > > > > First, regarding the seriousness of this bug, although the probability of > > encountering > > it in a production environment is relatively low, it has existed for quite > > some time. > > > > Secondly, with regard to the safety of this fix, it has been verified > > successfully > > in the test environment. However, it would be better if more people could > > help to > > review it to further ensure its robustness. > > This confirms Marc-André's "too late for 10.2" feeling. > > I'll track this patch for 11.0. More review would be nice, but if we > can get it, I'll get the patch merged early in the development cycle. > > Thank you! Hi Markus, Thanks for the update and for tracking this patch for 11.0. I’ll keep following up on this and will address any comments or issues that come up. Thanks again for your support. Best regards, Jie Song
Hi Markus On Mon, Dec 1, 2025 at 10:00 AM Markus Armbruster <armbru@redhat.com> wrote: > > Jie Song, Marc-André, is this bug serious enough and the fix safe enough > to still go into 10.2? > My feeling is that it's a bit late for 10.2, as I suppose this bug has been present for a long time and the risk of regression is high. Also not enough people reviewed it. > Jie Song <mail@jiesong.me> writes: > > > From: Jie Song <songjie_yewu@cmss.chinamobile.com> > > > > When starting a dummy QEMU process with virsh version, monitor_init_qmp() > > enables IOThread monitoring of the QMP fd by default. However, a race > > condition exists during the initialization phase: the IOThread only removes > > the main thread's fd watch when it reaches qio_net_listener_set_client_func_full(), > > which may be delayed under high system load. > > > > This creates a window between monitor_qmp_setup_handlers_bh() and > > qio_net_listener_set_client_func_full() where both the main thread and > > IOThread are simultaneously monitoring the same fd and processing events. > > This race can cause either the main thread or the IOThread to hang and > > become unresponsive. > > > > Fix this by proactively cleaning up the listener's IO sources in > > monitor_init_qmp() before the IOThread initializes QMP monitoring, > > ensuring exclusive fd ownership and eliminating the race condition. > > > > Signed-off-by: Jie Song <songjie_yewu@cmss.chinamobile.com> > > Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com> > -- Marc-André Lureau
© 2016 - 2025 Red Hat, Inc.