[PATCH v2] monitor: Fix deadlock in monitor_cleanup

hongmianquan posted 1 patch 6 days, 1 hour ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20260327131024.51947-1-hongmianquan@bytedance.com
Maintainers: Markus Armbruster <armbru@redhat.com>, Michael Roth <michael.roth@amd.com>
qapi/qmp-dispatch.c | 10 ++++++++++
1 file changed, 10 insertions(+)
[PATCH v2] monitor: Fix deadlock in monitor_cleanup
Posted by hongmianquan 6 days, 1 hour ago
During qemu_cleanup, if a non-coroutine QMP command (e.g., query-commands) is concurrently
received and processed by the mon_iothread, it can lead to a deadlock in monitor_cleanup.

The root cause is a race condition between the main thread's shutdown sequence and the coroutine's dispatching mechanism. When handling a non-coroutine QMP command, qmp_dispatcher_co schedules the actual command execution as a bottom half in iohandler_ctx and then yields. At this suspended point, qmp_dispatcher_co_busy remains true.
Subsequently, the main thread in monitor_cleanup(), sets qmp_dispatcher_co_shutdown, and calls qmp_dispatcher_co_wake(). Since qmp_dispatcher_co_busy is already true, the aio_co_wake is skipped. The main thread then enters the AIO_WAIT_WHILE_UNLOCKED loop, it executes the scheduled BH (do_qmp_dispatch_bh) via aio_poll(iohandler_ctx, false), which attempts to wake up the coroutine, aio_co_wake schedules a new wake-up BH in iohandler_ctx. The main thread then blocks indefinitely in aio_poll(qemu_aio_context, true), while the coroutine's wake-up BH is starved in iohandler_ctx, qmp_dispatcher_co never reaches termination, resulting in a deadlock.

The execution sequence is illustrated below:

 IO Thread                 Main Thread (qemu_aio_context)        qmp_dispatcher_co (iohandler_ctx)
    |                                 |                                        |
    |-- query-commands                |                                        |
    |-- qmp_dispatcher_co_wake()      |                                        |
    |    (sets busy = true)           |                                        |
    |                                 |   <-- Wakes up in iohandler_ctx -->    |
    |                                 |                                        |-- qmp_dispatch()
    |                                 |                                        |-- Schedules BH (do_qmp_dispatch_bh)
    |                                 |                                        |-- qemu_coroutine_yield()
    |                                 |                                            [State: Suspended, busy=true]
    |   [ quit triggered ]            |
    |                                 |-- monitor_cleanup()
    |                                 |-- qmp_dispatcher_co_shutdown = true
    |                                 |-- qmp_dispatcher_co_wake()
    |                                 |    -> Checks busy flag. It's TRUE!
    |                                 |    -> Skips aio_co_wake().
    |                                 |
    |                                 |-- AIO_WAIT_WHILE_UNLOCKED:
    |                                 |   |-- aio_poll(iohandler_ctx, false)
    |                                 |   |    -> Executes do_qmp_dispatch_bh
    |                                 |   |    -> Schedules 'co_schedule_bh' in iohandler_ctx
    |                                 |   |
    |                                 |   |-- aio_poll(qemu_aio_context, true)
    |                                 |   |    -> Blocks indefinitely! (Deadlock)
    |                                 |
    |                                 X (Main thread sleeping)                 X (Waiting for next iohandler_ctx poll)

To fix this, we add an explicit aio_wait_kick() in do_qmp_dispatch_bh() to break the main loop out of its blocking poll, allowing it to evaluate the loop condition and poll iohandler_ctx.

Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
Signed-off-by: wubo.bob <wubo.bob@bytedance.com>
---
 qapi/qmp-dispatch.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 9bb1e6a9f4..e3897d5197 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -128,6 +128,16 @@ static void do_qmp_dispatch_bh(void *opaque)
     data->cmd->fn(data->args, data->ret, data->errp);
     monitor_set_cur(qemu_coroutine_self(), NULL);
     aio_co_wake(data->co);
+
+    /*
+     * If the QMP dispatcher coroutine is waiting to be scheduled
+     * in iohandler_ctx, we must kick the main loop. This ensures
+     * that AIO_WAIT_WHILE_UNLOCKED() in monitor_cleanup() doesn't
+     * block indefinitely waiting for an event in qemu_aio_context,
+     * but actually gets the chance to poll iohandler_ctx and resume
+     * the coroutine.
+     */
+    aio_wait_kick();
 }
 
 /*
-- 
2.32.1 (Apple Git-133)
Re: [PATCH v2] monitor: Fix deadlock in monitor_cleanup
Posted by Michael Tokarev 1 day, 16 hours ago
On 27.03.2026 16:10, hongmianquan wrote:
> During qemu_cleanup, if a non-coroutine QMP command (e.g., query-commands) is concurrently
> received and processed by the mon_iothread, it can lead to a deadlock in monitor_cleanup.
> 
> The root cause is a race condition between the main thread's shutdown sequence and the coroutine's dispatching mechanism. When handling a non-coroutine QMP command, qmp_dispatcher_co schedules the actual command execution as a bottom half in iohandler_ctx and then yields. At this suspended point, qmp_dispatcher_co_busy remains true.
> Subsequently, the main thread in monitor_cleanup(), sets qmp_dispatcher_co_shutdown, and calls qmp_dispatcher_co_wake(). Since qmp_dispatcher_co_busy is already true, the aio_co_wake is skipped. The main thread then enters the AIO_WAIT_WHILE_UNLOCKED loop, it executes the scheduled BH (do_qmp_dispatch_bh) via aio_poll(iohandler_ctx, false), which attempts to wake up the coroutine, aio_co_wake schedules a new wake-up BH in iohandler_ctx. The main thread then blocks indefinitely in aio_poll(qemu_aio_context, true), while the coroutine's wake-up BH is starved in iohandler_ctx, qmp_dispatcher_co never reaches termination, resulting in a deadlock.
> 
> The execution sequence is illustrated below:
> 
>   IO Thread                 Main Thread (qemu_aio_context)        qmp_dispatcher_co (iohandler_ctx)
>      |                                 |                                        |
>      |-- query-commands                |                                        |
>      |-- qmp_dispatcher_co_wake()      |                                        |
>      |    (sets busy = true)           |                                        |
>      |                                 |   <-- Wakes up in iohandler_ctx -->    |
>      |                                 |                                        |-- qmp_dispatch()
>      |                                 |                                        |-- Schedules BH (do_qmp_dispatch_bh)
>      |                                 |                                        |-- qemu_coroutine_yield()
>      |                                 |                                            [State: Suspended, busy=true]
>      |   [ quit triggered ]            |
>      |                                 |-- monitor_cleanup()
>      |                                 |-- qmp_dispatcher_co_shutdown = true
>      |                                 |-- qmp_dispatcher_co_wake()
>      |                                 |    -> Checks busy flag. It's TRUE!
>      |                                 |    -> Skips aio_co_wake().
>      |                                 |
>      |                                 |-- AIO_WAIT_WHILE_UNLOCKED:
>      |                                 |   |-- aio_poll(iohandler_ctx, false)
>      |                                 |   |    -> Executes do_qmp_dispatch_bh
>      |                                 |   |    -> Schedules 'co_schedule_bh' in iohandler_ctx
>      |                                 |   |
>      |                                 |   |-- aio_poll(qemu_aio_context, true)
>      |                                 |   |    -> Blocks indefinitely! (Deadlock)
>      |                                 |
>      |                                 X (Main thread sleeping)                 X (Waiting for next iohandler_ctx poll)
> 
> To fix this, we add an explicit aio_wait_kick() in do_qmp_dispatch_bh() to break the main loop out of its blocking poll, allowing it to evaluate the loop condition and poll iohandler_ctx.

Shouldn't this one too be picked up for the stable series?
Sounds like a good candidate to me.

Thanks,

/mjt
Re: [PATCH v2] monitor: Fix deadlock in monitor_cleanup
Posted by Kevin Wolf 1 day, 5 hours ago
Am 31.03.2026 um 23:38 hat Michael Tokarev geschrieben:
> On 27.03.2026 16:10, hongmianquan wrote:
> > During qemu_cleanup, if a non-coroutine QMP command (e.g., query-commands) is concurrently
> > received and processed by the mon_iothread, it can lead to a deadlock in monitor_cleanup.
> 
> Shouldn't this one too be picked up for the stable series?
> Sounds like a good candidate to me.

Yes, makes sense, please pick it up. I also agree with the SCSI and IDE
patch you picked up.

Kevin
Re: [PATCH v2] monitor: Fix deadlock in monitor_cleanup
Posted by Markus Armbruster 2 days ago
"hongmianquan" <hongmianquan@bytedance.com> writes:

> During qemu_cleanup, if a non-coroutine QMP command (e.g., query-commands) is concurrently
> received and processed by the mon_iothread, it can lead to a deadlock in monitor_cleanup.
>
> The root cause is a race condition between the main thread's shutdown sequence and the coroutine's dispatching mechanism. When handling a non-coroutine QMP command, qmp_dispatcher_co schedules the actual command execution as a bottom half in iohandler_ctx and then yields. At this suspended point, qmp_dispatcher_co_busy remains true.
> Subsequently, the main thread in monitor_cleanup(), sets qmp_dispatcher_co_shutdown, and calls qmp_dispatcher_co_wake(). Since qmp_dispatcher_co_busy is already true, the aio_co_wake is skipped. The main thread then enters the AIO_WAIT_WHILE_UNLOCKED loop, it executes the scheduled BH (do_qmp_dispatch_bh) via aio_poll(iohandler_ctx, false), which attempts to wake up the coroutine, aio_co_wake schedules a new wake-up BH in iohandler_ctx. The main thread then blocks indefinitely in aio_poll(qemu_aio_context, true), while the coroutine's wake-up BH is starved in iohandler_ctx, qmp_dispatcher_co never reaches termination, resulting in a deadlock.
>
> The execution sequence is illustrated below:
>
>  IO Thread                 Main Thread (qemu_aio_context)        qmp_dispatcher_co (iohandler_ctx)
>     |                                 |                                        |
>     |-- query-commands                |                                        |
>     |-- qmp_dispatcher_co_wake()      |                                        |
>     |    (sets busy = true)           |                                        |
>     |                                 |   <-- Wakes up in iohandler_ctx -->    |
>     |                                 |                                        |-- qmp_dispatch()
>     |                                 |                                        |-- Schedules BH (do_qmp_dispatch_bh)
>     |                                 |                                        |-- qemu_coroutine_yield()
>     |                                 |                                            [State: Suspended, busy=true]
>     |   [ quit triggered ]            |
>     |                                 |-- monitor_cleanup()
>     |                                 |-- qmp_dispatcher_co_shutdown = true
>     |                                 |-- qmp_dispatcher_co_wake()
>     |                                 |    -> Checks busy flag. It's TRUE!
>     |                                 |    -> Skips aio_co_wake().
>     |                                 |
>     |                                 |-- AIO_WAIT_WHILE_UNLOCKED:
>     |                                 |   |-- aio_poll(iohandler_ctx, false)
>     |                                 |   |    -> Executes do_qmp_dispatch_bh
>     |                                 |   |    -> Schedules 'co_schedule_bh' in iohandler_ctx
>     |                                 |   |
>     |                                 |   |-- aio_poll(qemu_aio_context, true)
>     |                                 |   |    -> Blocks indefinitely! (Deadlock)
>     |                                 |
>     |                                 X (Main thread sleeping)                 X (Waiting for next iohandler_ctx poll)
>
> To fix this, we add an explicit aio_wait_kick() in do_qmp_dispatch_bh() to break the main loop out of its blocking poll, allowing it to evaluate the loop condition and poll iohandler_ctx.
>
> Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
> Signed-off-by: wubo.bob <wubo.bob@bytedance.com>

Please line-wrap your paragraphs at 70 columns or so.  The maintainer
accepting the patch may do that for you, to save you a respin.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Acked-by: Markus Armbruster <armbru@redhat.com>
Re: [PATCH v2] monitor: Fix deadlock in monitor_cleanup
Posted by Kevin Wolf 2 days ago
Am 31.03.2026 um 15:24 hat Markus Armbruster geschrieben:
> "hongmianquan" <hongmianquan@bytedance.com> writes:
> 
> > During qemu_cleanup, if a non-coroutine QMP command (e.g., query-commands) is concurrently
> > received and processed by the mon_iothread, it can lead to a deadlock in monitor_cleanup.
> >
> > The root cause is a race condition between the main thread's shutdown sequence and the coroutine's dispatching mechanism. When handling a non-coroutine QMP command, qmp_dispatcher_co schedules the actual command execution as a bottom half in iohandler_ctx and then yields. At this suspended point, qmp_dispatcher_co_busy remains true.
> > Subsequently, the main thread in monitor_cleanup(), sets qmp_dispatcher_co_shutdown, and calls qmp_dispatcher_co_wake(). Since qmp_dispatcher_co_busy is already true, the aio_co_wake is skipped. The main thread then enters the AIO_WAIT_WHILE_UNLOCKED loop, it executes the scheduled BH (do_qmp_dispatch_bh) via aio_poll(iohandler_ctx, false), which attempts to wake up the coroutine, aio_co_wake schedules a new wake-up BH in iohandler_ctx. The main thread then blocks indefinitely in aio_poll(qemu_aio_context, true), while the coroutine's wake-up BH is starved in iohandler_ctx, qmp_dispatcher_co never reaches termination, resulting in a deadlock.
> >
> > The execution sequence is illustrated below:
> >
> >  IO Thread                 Main Thread (qemu_aio_context)        qmp_dispatcher_co (iohandler_ctx)
> >     |                                 |                                        |
> >     |-- query-commands                |                                        |
> >     |-- qmp_dispatcher_co_wake()      |                                        |
> >     |    (sets busy = true)           |                                        |
> >     |                                 |   <-- Wakes up in iohandler_ctx -->    |
> >     |                                 |                                        |-- qmp_dispatch()
> >     |                                 |                                        |-- Schedules BH (do_qmp_dispatch_bh)
> >     |                                 |                                        |-- qemu_coroutine_yield()
> >     |                                 |                                            [State: Suspended, busy=true]
> >     |   [ quit triggered ]            |
> >     |                                 |-- monitor_cleanup()
> >     |                                 |-- qmp_dispatcher_co_shutdown = true
> >     |                                 |-- qmp_dispatcher_co_wake()
> >     |                                 |    -> Checks busy flag. It's TRUE!
> >     |                                 |    -> Skips aio_co_wake().
> >     |                                 |
> >     |                                 |-- AIO_WAIT_WHILE_UNLOCKED:
> >     |                                 |   |-- aio_poll(iohandler_ctx, false)
> >     |                                 |   |    -> Executes do_qmp_dispatch_bh
> >     |                                 |   |    -> Schedules 'co_schedule_bh' in iohandler_ctx
> >     |                                 |   |
> >     |                                 |   |-- aio_poll(qemu_aio_context, true)
> >     |                                 |   |    -> Blocks indefinitely! (Deadlock)
> >     |                                 |
> >     |                                 X (Main thread sleeping)                 X (Waiting for next iohandler_ctx poll)
> >
> > To fix this, we add an explicit aio_wait_kick() in do_qmp_dispatch_bh() to break the main loop out of its blocking poll, allowing it to evaluate the loop condition and poll iohandler_ctx.
> >
> > Signed-off-by: hongmianquan <hongmianquan@bytedance.com>
> > Signed-off-by: wubo.bob <wubo.bob@bytedance.com>
> 
> Please line-wrap your paragraphs at 70 columns or so.  The maintainer
> accepting the patch may do that for you, to save you a respin.
> 
> Suggested-by: Kevin Wolf <kwolf@redhat.com>
> Acked-by: Markus Armbruster <armbru@redhat.com>

Thanks, applied to the block branch.

Kevin