qapi/qmp-dispatch.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
During qemu_cleanup, if a non-coroutine QMP command (e.g., query-commands) is concurrently
received and processed by the mon_iothread, it can lead to a deadlock in monitor_cleanup.
The root cause is a race condition between the main thread's shutdown sequence and the coroutine's dispatching mechanism. When handling a non-coroutine QMP command, qmp_dispatcher_co schedules the actual command execution as a bottom half in iohandler_ctx and then yields. At this suspended point, qmp_dispatcher_co_busy remains true.
Subsequently, the main thread in monitor_cleanup(), sets qmp_dispatcher_co_shutdown, and calls qmp_dispatcher_co_wake(). Since qmp_dispatcher_co_busy is already true, the aio_co_wake is skipped. The main thread then enters the AIO_WAIT_WHILE_UNLOCKED loop, it executes the scheduled BH (do_qmp_dispatch_bh) via aio_poll(iohandler_ctx, false), which attempts to wake up the coroutine, aio_co_wake schedules a new wake-up BH in iohandler_ctx. The main thread then blocks indefinitely in aio_poll(qemu_aio_context, true), while the coroutine's wake-up BH is starved in iohandler_ctx, qmp_dispatcher_co never reaches termination, resulting in a deadlock.
The execution sequence is illustrated below:
IO Thread Main Thread (qemu_aio_context) qmp_dispatcher_co (iohandler_ctx)
| | |
|-- query-commands | |
|-- qmp_dispatcher_co_wake() | |
| (sets busy = true) | |
| | <-- Wakes up in iohandler_ctx --> |
| | |-- qmp_dispatch()
| | |-- Schedules BH (do_qmp_dispatch_bh)
| | |-- qemu_coroutine_yield()
| | [State: Suspended, busy=true]
| [ quit triggered ] |
| |-- monitor_cleanup()
| |-- qmp_dispatcher_co_shutdown = true
| |-- qmp_dispatcher_co_wake()
| | -> Checks busy flag. It's TRUE!
| | -> Skips aio_co_wake().
| |
| |-- AIO_WAIT_WHILE_UNLOCKED:
| | |-- aio_poll(iohandler_ctx, false)
| | | -> Executes do_qmp_dispatch_bh
| | | -> Schedules 'co_schedule_bh' in iohandler_ctx
| | |
| | |-- aio_poll(qemu_aio_context, true)
| | | -> Blocks indefinitely! (Deadlock)
| |
| X (Main thread sleeping) X (Waiting for next iohandler_ctx poll)
To fix this, we add an explicit aio_wait_kick() in do_qmp_dispatch_bh() to break the main loop out of its blocking poll, allowing it to evaluate the loop condition and poll iohandler_ctx.
Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
Signed-off-by: wubo.bob <wubo.bob@bytedance.com>
---
qapi/qmp-dispatch.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 9bb1e6a9f4..e3897d5197 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -128,6 +128,16 @@ static void do_qmp_dispatch_bh(void *opaque)
data->cmd->fn(data->args, data->ret, data->errp);
monitor_set_cur(qemu_coroutine_self(), NULL);
aio_co_wake(data->co);
+
+ /*
+ * If the QMP dispatcher coroutine is waiting to be scheduled
+ * in iohandler_ctx, we must kick the main loop. This ensures
+ * that AIO_WAIT_WHILE_UNLOCKED() in monitor_cleanup() doesn't
+ * block indefinitely waiting for an event in qemu_aio_context,
+ * but actually gets the chance to poll iohandler_ctx and resume
+ * the coroutine.
+ */
+ aio_wait_kick();
}
/*
--
2.32.1 (Apple Git-133)
On 27.03.2026 16:10, hongmianquan wrote: > During qemu_cleanup, if a non-coroutine QMP command (e.g., query-commands) is concurrently > received and processed by the mon_iothread, it can lead to a deadlock in monitor_cleanup. > > The root cause is a race condition between the main thread's shutdown sequence and the coroutine's dispatching mechanism. When handling a non-coroutine QMP command, qmp_dispatcher_co schedules the actual command execution as a bottom half in iohandler_ctx and then yields. At this suspended point, qmp_dispatcher_co_busy remains true. > Subsequently, the main thread in monitor_cleanup(), sets qmp_dispatcher_co_shutdown, and calls qmp_dispatcher_co_wake(). Since qmp_dispatcher_co_busy is already true, the aio_co_wake is skipped. The main thread then enters the AIO_WAIT_WHILE_UNLOCKED loop, it executes the scheduled BH (do_qmp_dispatch_bh) via aio_poll(iohandler_ctx, false), which attempts to wake up the coroutine, aio_co_wake schedules a new wake-up BH in iohandler_ctx. The main thread then blocks indefinitely in aio_poll(qemu_aio_context, true), while the coroutine's wake-up BH is starved in iohandler_ctx, qmp_dispatcher_co never reaches termination, resulting in a deadlock. > > The execution sequence is illustrated below: > > IO Thread Main Thread (qemu_aio_context) qmp_dispatcher_co (iohandler_ctx) > | | | > |-- query-commands | | > |-- qmp_dispatcher_co_wake() | | > | (sets busy = true) | | > | | <-- Wakes up in iohandler_ctx --> | > | | |-- qmp_dispatch() > | | |-- Schedules BH (do_qmp_dispatch_bh) > | | |-- qemu_coroutine_yield() > | | [State: Suspended, busy=true] > | [ quit triggered ] | > | |-- monitor_cleanup() > | |-- qmp_dispatcher_co_shutdown = true > | |-- qmp_dispatcher_co_wake() > | | -> Checks busy flag. It's TRUE! > | | -> Skips aio_co_wake(). > | | > | |-- AIO_WAIT_WHILE_UNLOCKED: > | | |-- aio_poll(iohandler_ctx, false) > | | | -> Executes do_qmp_dispatch_bh > | | | -> Schedules 'co_schedule_bh' in iohandler_ctx > | | | > | | |-- aio_poll(qemu_aio_context, true) > | | | -> Blocks indefinitely! (Deadlock) > | | > | X (Main thread sleeping) X (Waiting for next iohandler_ctx poll) > > To fix this, we add an explicit aio_wait_kick() in do_qmp_dispatch_bh() to break the main loop out of its blocking poll, allowing it to evaluate the loop condition and poll iohandler_ctx. Shouldn't this one too be picked up for the stable series? Sounds like a good candidate to me. Thanks, /mjt
Am 31.03.2026 um 23:38 hat Michael Tokarev geschrieben: > On 27.03.2026 16:10, hongmianquan wrote: > > During qemu_cleanup, if a non-coroutine QMP command (e.g., query-commands) is concurrently > > received and processed by the mon_iothread, it can lead to a deadlock in monitor_cleanup. > > Shouldn't this one too be picked up for the stable series? > Sounds like a good candidate to me. Yes, makes sense, please pick it up. I also agree with the SCSI and IDE patch you picked up. Kevin
"hongmianquan" <hongmianquan@bytedance.com> writes: > During qemu_cleanup, if a non-coroutine QMP command (e.g., query-commands) is concurrently > received and processed by the mon_iothread, it can lead to a deadlock in monitor_cleanup. > > The root cause is a race condition between the main thread's shutdown sequence and the coroutine's dispatching mechanism. When handling a non-coroutine QMP command, qmp_dispatcher_co schedules the actual command execution as a bottom half in iohandler_ctx and then yields. At this suspended point, qmp_dispatcher_co_busy remains true. > Subsequently, the main thread in monitor_cleanup(), sets qmp_dispatcher_co_shutdown, and calls qmp_dispatcher_co_wake(). Since qmp_dispatcher_co_busy is already true, the aio_co_wake is skipped. The main thread then enters the AIO_WAIT_WHILE_UNLOCKED loop, it executes the scheduled BH (do_qmp_dispatch_bh) via aio_poll(iohandler_ctx, false), which attempts to wake up the coroutine, aio_co_wake schedules a new wake-up BH in iohandler_ctx. The main thread then blocks indefinitely in aio_poll(qemu_aio_context, true), while the coroutine's wake-up BH is starved in iohandler_ctx, qmp_dispatcher_co never reaches termination, resulting in a deadlock. > > The execution sequence is illustrated below: > > IO Thread Main Thread (qemu_aio_context) qmp_dispatcher_co (iohandler_ctx) > | | | > |-- query-commands | | > |-- qmp_dispatcher_co_wake() | | > | (sets busy = true) | | > | | <-- Wakes up in iohandler_ctx --> | > | | |-- qmp_dispatch() > | | |-- Schedules BH (do_qmp_dispatch_bh) > | | |-- qemu_coroutine_yield() > | | [State: Suspended, busy=true] > | [ quit triggered ] | > | |-- monitor_cleanup() > | |-- qmp_dispatcher_co_shutdown = true > | |-- qmp_dispatcher_co_wake() > | | -> Checks busy flag. It's TRUE! > | | -> Skips aio_co_wake(). > | | > | |-- AIO_WAIT_WHILE_UNLOCKED: > | | |-- aio_poll(iohandler_ctx, false) > | | | -> Executes do_qmp_dispatch_bh > | | | -> Schedules 'co_schedule_bh' in iohandler_ctx > | | | > | | |-- aio_poll(qemu_aio_context, true) > | | | -> Blocks indefinitely! (Deadlock) > | | > | X (Main thread sleeping) X (Waiting for next iohandler_ctx poll) > > To fix this, we add an explicit aio_wait_kick() in do_qmp_dispatch_bh() to break the main loop out of its blocking poll, allowing it to evaluate the loop condition and poll iohandler_ctx. > > Signed-off-by: hongmianquan <hongmianquan@bytedance.com> > Signed-off-by: wubo.bob <wubo.bob@bytedance.com> Please line-wrap your paragraphs at 70 columns or so. The maintainer accepting the patch may do that for you, to save you a respin. Suggested-by: Kevin Wolf <kwolf@redhat.com> Acked-by: Markus Armbruster <armbru@redhat.com>
Am 31.03.2026 um 15:24 hat Markus Armbruster geschrieben: > "hongmianquan" <hongmianquan@bytedance.com> writes: > > > During qemu_cleanup, if a non-coroutine QMP command (e.g., query-commands) is concurrently > > received and processed by the mon_iothread, it can lead to a deadlock in monitor_cleanup. > > > > The root cause is a race condition between the main thread's shutdown sequence and the coroutine's dispatching mechanism. When handling a non-coroutine QMP command, qmp_dispatcher_co schedules the actual command execution as a bottom half in iohandler_ctx and then yields. At this suspended point, qmp_dispatcher_co_busy remains true. > > Subsequently, the main thread in monitor_cleanup(), sets qmp_dispatcher_co_shutdown, and calls qmp_dispatcher_co_wake(). Since qmp_dispatcher_co_busy is already true, the aio_co_wake is skipped. The main thread then enters the AIO_WAIT_WHILE_UNLOCKED loop, it executes the scheduled BH (do_qmp_dispatch_bh) via aio_poll(iohandler_ctx, false), which attempts to wake up the coroutine, aio_co_wake schedules a new wake-up BH in iohandler_ctx. The main thread then blocks indefinitely in aio_poll(qemu_aio_context, true), while the coroutine's wake-up BH is starved in iohandler_ctx, qmp_dispatcher_co never reaches termination, resulting in a deadlock. > > > > The execution sequence is illustrated below: > > > > IO Thread Main Thread (qemu_aio_context) qmp_dispatcher_co (iohandler_ctx) > > | | | > > |-- query-commands | | > > |-- qmp_dispatcher_co_wake() | | > > | (sets busy = true) | | > > | | <-- Wakes up in iohandler_ctx --> | > > | | |-- qmp_dispatch() > > | | |-- Schedules BH (do_qmp_dispatch_bh) > > | | |-- qemu_coroutine_yield() > > | | [State: Suspended, busy=true] > > | [ quit triggered ] | > > | |-- monitor_cleanup() > > | |-- qmp_dispatcher_co_shutdown = true > > | |-- qmp_dispatcher_co_wake() > > | | -> Checks busy flag. It's TRUE! > > | | -> Skips aio_co_wake(). > > | | > > | |-- AIO_WAIT_WHILE_UNLOCKED: > > | | |-- aio_poll(iohandler_ctx, false) > > | | | -> Executes do_qmp_dispatch_bh > > | | | -> Schedules 'co_schedule_bh' in iohandler_ctx > > | | | > > | | |-- aio_poll(qemu_aio_context, true) > > | | | -> Blocks indefinitely! (Deadlock) > > | | > > | X (Main thread sleeping) X (Waiting for next iohandler_ctx poll) > > > > To fix this, we add an explicit aio_wait_kick() in do_qmp_dispatch_bh() to break the main loop out of its blocking poll, allowing it to evaluate the loop condition and poll iohandler_ctx. > > > > Signed-off-by: hongmianquan <hongmianquan@bytedance.com> > > Signed-off-by: wubo.bob <wubo.bob@bytedance.com> > > Please line-wrap your paragraphs at 70 columns or so. The maintainer > accepting the patch may do that for you, to save you a respin. > > Suggested-by: Kevin Wolf <kwolf@redhat.com> > Acked-by: Markus Armbruster <armbru@redhat.com> Thanks, applied to the block branch. Kevin
© 2016 - 2026 Red Hat, Inc.