Changes since v2
----------------
* Drop the redundant fast-path check in qemu_co_sleep(); the publish
  cmpxchg already consumes a pending wake (Kevin).
* Add a deterministic regression test, no threads or timers needed.
Changes since v1
----------------
* Patch 1 (graph-lock) applied as e3082ab3b3, dropped here.
* Fix the qemu_co_sleep_wake() primitive instead of working around
  it in qcow2 (Kevin).
Problem
-------
The QEMU shutdown / blockdev-close path can deadlock permanently on
upstream master. The main thread enters ppoll(timeout=-1) while
holding the BQL, no other thread has a wake source that points back
at it, and QEMU has to be SIGKILLed. The hang has no timeout: it is a
hard deadlock, not a slow operation. Everything queued behind the BQL
stalls with it: RCU, the vCPUs and every iothread path that needs the
BQL.
The race is exposed in qcow2's cache_clean_timer cancellation path:
    ppoll -> aio_poll -> cache_clean_timer_del_and_wait -> qcow2_close
The race diagram and the exact stale-state read are in the patch's
commit message.
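As a rough illustration of the lost-wakeup pattern and of why a
publish cmpxchg can consume a pending wake, here is a minimal
standalone sketch using C11 atomics. It is not the code from the
patch: SleepSlot, the SLOT_* states and the slot_*() helpers are
hypothetical names invented for this example, and the real primitive
also has to deal with coroutine scheduling, which is left out here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical states for illustration; not the names used in QEMU. */
enum { SLOT_IDLE, SLOT_SLEEPING, SLOT_WAKE_PENDING };

typedef struct {
    atomic_int state;
} SleepSlot;

/*
 * Sleeper side: publish "I am about to sleep". Returns true if the
 * caller must now yield, false if a wake had already been posted and
 * was consumed here.
 */
static bool slot_publish(SleepSlot *s)
{
    int expected = SLOT_IDLE;

    if (atomic_compare_exchange_strong(&s->state, &expected,
                                       SLOT_SLEEPING)) {
        return true;
    }
    /* The cmpxchg failed, so a wake got in first: consume it. */
    assert(expected == SLOT_WAKE_PENDING);
    atomic_store(&s->state, SLOT_IDLE);
    return false;
}

/*
 * Waker side: returns true if a published sleeper was found and the
 * caller must resume it, false if the wake was only recorded for a
 * later publish.
 */
static bool slot_wake(SleepSlot *s)
{
    return atomic_exchange(&s->state, SLOT_WAKE_PENDING) == SLOT_SLEEPING;
}

/* Sleeper side, after being resumed: return the slot to idle. */
static void slot_resumed(SleepSlot *s)
{
    atomic_store(&s->state, SLOT_IDLE);
}

/* Drive both orderings from one thread, with no timers involved. */
static bool check_orderings(void)
{
    SleepSlot s;
    bool ok = true;

    atomic_init(&s.state, SLOT_IDLE);

    /* Ordering A: the wake arrives before the sleeper publishes. */
    ok = ok && !slot_wake(&s);     /* nobody to resume yet */
    ok = ok && !slot_publish(&s);  /* pending wake consumed, no sleep */

    /* Ordering B: the sleeper publishes first, then is woken. */
    ok = ok && slot_publish(&s);   /* sleeper must yield */
    ok = ok && slot_wake(&s);      /* waker must resume the sleeper */
    slot_resumed(&s);

    return ok && atomic_load(&s.state) == SLOT_IDLE;
}
```

In this sketch a wake that arrives before the publish is recorded in
the state word instead of being dropped, so the sleeper never blocks
on a wake that has already happened. Because both orderings can be
driven from a single thread, a check of this kind needs neither
threads nor timers.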
Reproducer
----------
Environment: 4-vCPU VM guest, kernel 6.12.x, upstream master at
de5d8bfd61. On modern bare metal the window is narrow enough that the
hang rarely reproduces without a VM; a VM guest under full CPU
saturation is what makes the timing reliable.
    # reproducer
    stress-ng --cpu "$(nproc)" --timeout 0 &
    for r in $(seq 20); do
        timeout 120 ./build/tests/qemu-iotests/check -qcow2 iothreads-create
    done
    kill %1
With `stress-ng --cpu $(nproc)` the race surfaces. With
`stress-ng --cpu $(($(nproc) - 1))` or without a stressor it does
not reproduce reliably across 20 iterations.
The new unit test reproduces the same lost wakeup deterministically,
without a VM or a stressor.
Results
-------
Same guest, 20 iterations of the loop above, master at de5d8bfd61:
    without this patch: reproduces reliably (qcow2_close in ppoll)
    with this patch:    20/20 PASS
Signed-off-by: Denis V. Lunev <den@openvz.org>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Hanna Reitz <hreitz@redhat.com>
Denis V. Lunev (1):
  coroutine: fix lost wakeup in qemu_co_sleep_wake()

 include/qemu/coroutine.h    | 17 +++++++++---
 tests/unit/test-coroutine.c | 53 +++++++++++++++++++++++++++++++++++++
 util/qemu-coroutine-sleep.c | 53 ++++++++++++++++++++++++++-----------
 3 files changed, 104 insertions(+), 19 deletions(-)
--
2.53.0