block/linux-aio.c | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+)
Observed in production where a cached-I/O backup path was driven
through aio=native, making io_submit(2) complete synchronously and
closing the recursion cycle. On the supported aio=native + cache=none
+ qcow2 configuration the cycle stays bounded by accident rather than
by construction; this patch bounds it explicitly.
Bisect:

  v8.1.0 (forward edge only)       no crash / 20
  84d61e5f36^                      no crash / 20
  84d61e5f36 (backward edge in)    crash at attempt 17
  v8.2.0                           crash at attempt 4
  master + this patch              no crash / 80
The closing commit is 84d61e5f36 ("virtio: use defer_call() in
virtio_irqfd_notify()").
No iotest: crash rate is 6..17 per 20 on unpatched master; a formal
test would be flaky. The vmdk + aio=native + cache=none shape is
not otherwise exercised by the suite.
--- gen-workload.py -----------------------------------------------
#!/usr/bin/env python3
import random, sys

REGION = 32 * 1024 * 1024
CLUSTER = 64 * 1024
SEED = 0xC0FFEE

def main(out):
    r = random.Random(SEED); ops = []
    for _ in range(10000):
        off = r.randrange(0, REGION - 4096) & ~4095
        ops.append("aio_write -q %d 4k" % off)
    for i in range(10000):
        size, n = ("64k", 65536) if i < 5000 else ("128k", 131072)
        off = r.randrange(0, REGION - n) & ~(CLUSTER - 1)
        ops.append("aio_write -q -z -u %d %s" % (off, size))
    r.shuffle(ops); ops.append("aio_flush")
    open(out, "w").write("\n".join(ops) + "\n")

if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "t.cmds")
-------------------------------------------------------------------
--- repro.sh ------------------------------------------------------
#!/bin/bash
set -u
qimg=$1; qio=$2; label=$3; attempts=${4:-20}
cmds=${5:-$(dirname "$0")/t.cmds}
vmdk=/tmp/t.$label.vmdk; log=/tmp/repro_$label.log
: > "$log"
for i in $(seq 1 "$attempts"); do
    rm -f "$vmdk"
    "$qimg" create -f vmdk "$vmdk" 256M >/dev/null 2>&1
    "$qio" -f vmdk -n --cache=none --aio=native "$vmdk" < "$cmds" \
        >>"$log" 2>&1
    rc=$?
    [ $rc -ge 128 ] && { echo "CRASH attempt $i rc=$rc" >>"$log"; break; }
done
echo "DONE $label rc=$rc attempt=$i" >> "$log"
-------------------------------------------------------------------
python3 gen-workload.py t.cmds
./repro.sh /path/to/qemu-img /path/to/qemu-io test 20
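
The repro script counts any exit status of 128 or more as a crash. This relies on the shell convention that a child killed by signal N is reported as 128 + N. When reading the log, a small helper along these lines can turn the recorded rc into a signal name. This is only a sketch: describe_rc is a hypothetical name, and the text above does not say which signal the crash raises.

```python
import signal


def describe_rc(rc):
    """Map a shell exit status to a signal name, or None for a normal exit."""
    if rc < 128:
        # The process exited on its own; rc is its ordinary exit code.
        return None
    signum = rc - 128
    try:
        return signal.Signals(signum).name
    except ValueError:
        # 128 itself, or a number the platform does not define as a signal.
        return "unknown signal %d" % signum


if __name__ == "__main__":
    for rc in (0, 1, 134, 139):
        print(rc, describe_rc(rc))
```
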
Notes:

  * IOQ_SUBMIT_MAX_DEPTH = 8: a round number with headroom over the
    bounded depth of the supported async-completion path.
  * Per-thread __thread counter, matching util/defer-call.c's storage.
    A per-LinuxAioState field would let multiple devices on one
    thread recurse independently.
Changes from v2:
* moved depth guard to struct qemu_laiocb (suggestion from Stefan)
Changes from v1:
* removed all downstream marks
Denis V. Lunev (1):
block/linux-aio: bound ioq_submit() recursion depth
block/linux-aio.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Hanna Reitz <hreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
--
2.51.0
On Wed, May 20, 2026 at 04:25:02PM +0200, Denis V. Lunev via qemu development wrote:
Thanks, applied to my block tree:
https://gitlab.com/stefanha/qemu/commits/block
Stefan