[PATCH v3 0/1] coroutine: fix lost wakeup in qemu_co_sleep_wake()

Denis V. Lunev via qemu development posted 1 patch 4 days, 1 hour ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20260610115850.2410566-1-den@openvz.org
Maintainers: Stefan Hajnoczi <stefanha@redhat.com>, Kevin Wolf <kwolf@redhat.com>
Changes since v2
----------------
  * Drop the redundant fast-path check in qemu_co_sleep(); the publish
    cmpxchg already consumes a pending wake (Kevin).
  * Add a deterministic regression test, no threads or timers needed.
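
For readers who have not seen the patch, the idea behind "the publish
cmpxchg already consumes a pending wake" can be sketched as below. This
is a minimal model under stated assumptions, not the code in
util/qemu-coroutine-sleep.c: SleepSlot, the three SLOT_* states,
slot_publish_sleep(), slot_wake() and demo_wake_before_sleep() are
hypothetical names, and C11 <stdatomic.h> stands in for QEMU's own
atomic helpers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical three-state slot; NOT the real QemuCoSleep layout. */
enum { SLOT_IDLE, SLOT_SLEEPING, SLOT_WAKE_PENDING };

typedef struct {
    atomic_int state;
} SleepSlot;

/*
 * Sleeper side: publish "about to sleep" with a single cmpxchg.
 * Returns true if the caller must really park (yield), false if a wake
 * that arrived earlier was consumed and the caller must not sleep.
 */
static bool slot_publish_sleep(SleepSlot *s)
{
    int expected = SLOT_IDLE;

    if (atomic_compare_exchange_strong(&s->state, &expected, SLOT_SLEEPING)) {
        return true;
    }
    /* The cmpxchg failed, so a wake is already pending: consume it. */
    assert(expected == SLOT_WAKE_PENDING);
    atomic_store(&s->state, SLOT_IDLE);
    return false;
}

/*
 * Waker side. Returns true if a parked sleeper was found (the caller
 * would reschedule it), false if the wake was recorded for later.
 */
static bool slot_wake(SleepSlot *s)
{
    int cur = atomic_load(&s->state);

    for (;;) {
        if (cur == SLOT_WAKE_PENDING) {
            return false;               /* a wake is already recorded */
        }
        int next = (cur == SLOT_SLEEPING) ? SLOT_IDLE : SLOT_WAKE_PENDING;
        if (atomic_compare_exchange_weak(&s->state, &cur, next)) {
            return cur == SLOT_SLEEPING;
        }
        /* cur was reloaded by the failed cmpxchg; retry. */
    }
}

/* Deterministic ordering check: wake first, then sleep. No threads. */
static bool demo_wake_before_sleep(void)
{
    SleepSlot s;

    atomic_init(&s.state, SLOT_IDLE);
    bool found_sleeper = slot_wake(&s);       /* nobody is parked yet */
    bool must_park = slot_publish_sleep(&s);  /* consumes the wake */
    return !found_sleeper && !must_park;
}
```

In this model a separate "was I already woken?" check before the
publish step adds nothing, because the failed cmpxchg reports the same
fact. demo_wake_before_sleep() also shows why a regression test for
this kind of bug can be single-threaded: the interesting interleaving
is simply "wake, then sleep", which can be forced by call order.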

Changes since v1
----------------
  * Patch 1 (graph-lock) applied as e3082ab3b3, dropped here.
  * Fix the qemu_co_sleep_wake() primitive instead of working around
    it in qcow2 (Kevin).

Problem
-------

The qemu shutdown / blockdev-close path can deadlock permanently on
upstream master. The main thread enters ppoll(timeout=-1) while holding
the BQL, no other thread has a wake source that points back at it, and
qemu has to be SIGKILLed. The hang has no timeout: it is a hard
deadlock, not a slow operation. RCU, vCPUs and every iothread path that
needs the BQL stall behind it.
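
Why a missing wake source turns into an unbounded hang can be shown
with a small standalone sketch. It is illustrative only and makes
several assumptions: plain poll() is used instead of ppoll() for
portability, a pipe stands in for whatever notifier would normally
wake the main loop, and wait_on_pipe() is a hypothetical helper, not a
QEMU function.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Wait for the read end of a fresh pipe to become readable.
 * Returns the number of ready descriptors reported by poll(), or -1
 * on setup failure.
 *
 * With write_first == 0 and timeout_ms == -1 this function would never
 * return: nothing will ever write to the pipe, and a negative timeout
 * tells poll() to wait indefinitely.
 */
static int wait_on_pipe(int write_first, int timeout_ms)
{
    int fds[2];

    if (pipe(fds) != 0) {
        return -1;
    }
    if (write_first) {
        char byte = 1;
        /* The "wake": make the read end readable before sleeping. */
        if (write(fds[1], &byte, 1) != 1) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }

    struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    close(fds[0]);
    close(fds[1]);
    return ready;
}
```

wait_on_pipe(1, -1) returns as soon as poll() sees the byte, while
wait_on_pipe(0, 50) only returns because of its finite timeout. The
combination the cover letter describes, no wake and an infinite
timeout, is the one case that cannot return on its own.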

The race is exposed in qcow2's cache_clean_timer cancellation path:

  ppoll -> aio_poll -> cache_clean_timer_del_and_wait -> qcow2_close

The race diagram and the exact stale-state read are in the patch's
commit message.

Reproducer
----------

Environment: 4-vCPU VM guest, kernel 6.12.x, upstream master at
de5d8bfd61. On modern bare metal the window is narrow enough that the
hang rarely reproduces; a VM guest under full CPU saturation is what
makes the timing reliable.

    # reproducer
    stress-ng --cpu "$(nproc)" --timeout 0 &
    for r in $(seq 20); do
        timeout 120 ./build/tests/qemu-iotests/check -qcow2 iothreads-create
    done
    kill %1

With `stress-ng --cpu $(nproc)` the race surfaces. With
`stress-ng --cpu $(($(nproc) - 1))` or without a stressor it does
not reproduce reliably across 20 iterations.

The new unit test reproduces the same lost wakeup deterministically,
without a VM or a stressor.

Results
-------

Same guest, 20 iterations of the loop above, master at de5d8bfd61:

  without this patch:  reproduces reliably (qcow2_close in ppoll)
  with this patch:     20/20 PASS

Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Hanna Reitz <hreitz@redhat.com>

Denis V. Lunev (1):
  coroutine: fix lost wakeup in qemu_co_sleep_wake()

 include/qemu/coroutine.h    | 17 +++++++++---
 tests/unit/test-coroutine.c | 53 +++++++++++++++++++++++++++++++++++++
 util/qemu-coroutine-sleep.c | 53 ++++++++++++++++++++++++++-----------
 3 files changed, 104 insertions(+), 19 deletions(-)

-- 
2.53.0