[PATCH hci-8.0 0/1] block/linux-aio: fix reproducible SIGSEGV from unbounded ioq_submit() recursion

Denis V. Lunev via qemu development posted 1 patch 1 day, 2 hours ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20260420100655.3318452-1-den@openvz.org
Maintainers: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>
block/linux-aio.c | 23 +++++++++++++++++++++++
1 file changed, 23 insertions(+)
The crash was observed in production, where a cached-I/O backup path
was driven through aio=native, so that io_submit(2) completed
synchronously and closed the recursion cycle.  On the supported
aio=native + cache=none + qcow2 configuration the cycle stays bounded
by accident rather than by construction; this patch bounds it
explicitly.

Bisect:

  v8.1.0 (forward edge only)      no crash / 20
  84d61e5f36^                     no crash / 20
  84d61e5f36 (backward edge in)   crash at attempt 17
  v8.2.0                          crash at attempt  4
  master + this patch             no crash / 80

The commit that closes the cycle by adding the backward edge is
84d61e5f36 ("virtio: use defer_call() in virtio_irqfd_notify()").

No iotest is added: the crash rate is 6..17 per 20 on unpatched
master, so a formal test would be flaky.  The vmdk + aio=native +
cache=none shape is not otherwise exercised by the suite.

--- gen-workload.py -----------------------------------------------
#!/usr/bin/env python3
import random, sys
REGION  = 32 * 1024 * 1024
CLUSTER = 64 * 1024
SEED    = 0xC0FFEE
def main(out):
    r = random.Random(SEED); ops = []
    for _ in range(10000):
        off = r.randrange(0, REGION - 4096) & ~4095
        ops.append("aio_write -q %d 4k" % off)
    for i in range(10000):
        size, n = ("64k", 65536) if i < 5000 else ("128k", 131072)
        off = r.randrange(0, REGION - n) & ~(CLUSTER - 1)
        ops.append("aio_write -q -z -u %d %s" % (off, size))
    r.shuffle(ops); ops.append("aio_flush")
    # Close the file deterministically instead of relying on GC.
    with open(out, "w") as f:
        f.write("\n".join(ops) + "\n")
if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "t.cmds")
-------------------------------------------------------------------

--- repro.sh ------------------------------------------------------
#!/bin/bash
set -u
qimg=$1; qio=$2; label=$3; attempts=${4:-20}
cmds=${5:-$(dirname "$0")/t.cmds}
vmdk=/tmp/t.$label.vmdk; log=/tmp/repro_$label.log
: > "$log"
rc=0; i=0   # keep the final echo valid under set -u if the loop runs zero times
for i in $(seq 1 "$attempts"); do
    rm -f "$vmdk"
    "$qimg" create -f vmdk "$vmdk" 256M >/dev/null 2>&1
    "$qio" -f vmdk -n --cache=none --aio=native "$vmdk" < "$cmds" \
        >>"$log" 2>&1
    rc=$?
    [ $rc -ge 128 ] && { echo "CRASH attempt $i rc=$rc" >>"$log"; break; }
done
echo "DONE $label rc=$rc attempt=$i" >> "$log"
-------------------------------------------------------------------

  python3 gen-workload.py t.cmds
  ./repro.sh /path/to/qemu-img /path/to/qemu-io test 20

Notes:

 * IOQ_SUBMIT_MAX_DEPTH = 8.  A round number chosen to leave headroom
   over the bounded depth of the supported async-completion path.
 * The depth counter is a per-thread __thread variable, matching the
   storage used by util/defer-call.c.  A per-LinuxAioState field would
   let multiple devices on one thread recurse independently.

Denis V. Lunev (1):
  block/linux-aio: bound ioq_submit() recursion depth

 block/linux-aio.c | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Kevin Wolf <kwolf@redhat.com>
CC: Hanna Reitz <hreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Paolo Bonzini <pbonzini@redhat.com>
--
2.51.0