[PATCH 3/4] qemu-img: prevent stack overflow in bench by using bottom half

gerben@altlinux.org posted 4 patches 10 months, 2 weeks ago
[PATCH 3/4] qemu-img: prevent stack overflow in bench by using bottom half
Posted by gerben@altlinux.org 10 months, 2 weeks ago
From: Denis Rastyogin <gerben@altlinux.org>

This error was discovered by fuzzing qemu-img.

Previously, new I/O requests were launched synchronously inside the
completion callback `bench_cb`, leading to deep recursion and stack
overflow. This patch moves the launching of new requests to a separate
function `bench_bh`, scheduled via `qemu_bh_schedule` to run in the event
loop context, thus unwinding the stack and preventing overflow.

Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
Signed-off-by: Denis Rastyogin <gerben@altlinux.org>
---
 qemu-img.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/qemu-img.c b/qemu-img.c
index 71c9fe496f..5cbf3d18d7 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4426,6 +4426,7 @@ typedef struct BenchData {
     int in_flight;
     bool in_flush;
     uint64_t offset;
+    QEMUBH *bh;
 } BenchData;
 
 static void bench_undrained_flush_cb(void *opaque, int ret)
@@ -4479,7 +4480,16 @@ static void bench_cb(void *opaque, int ret)
             }
         }
     }
+    if (b->n > b->in_flight && b->in_flight < b->nrreq) {
+        qemu_bh_schedule(b->bh);
+    }
+}
 
+static void bench_bh(void *opaque)
+{
+    BenchData *b = opaque;
+    BlockAIOCB *acb;
+    
     while (b->n > b->in_flight && b->in_flight < b->nrreq) {
         int64_t offset = b->offset;
         /* blk_aio_* might look for completed I/Os and kick bench_cb
@@ -4737,6 +4747,7 @@ static int img_bench(int argc, char **argv)
     }
 
     gettimeofday(&t1, NULL);
+    data.bh = qemu_bh_new(bench_bh, &data);
     bench_cb(&data, 0);
 
     while (data.n > 0) {
@@ -4755,6 +4766,9 @@ out:
     qemu_vfree(data.buf);
     blk_unref(blk);
 
+    if (data.bh) {
+        qemu_bh_delete(data.bh);
+    }
     if (ret) {
         return 1;
     }
-- 
2.42.2
Re: [PATCH 3/4] qemu-img: prevent stack overflow in bench by using bottom half
Posted by Kevin Wolf 9 months, 2 weeks ago
Am 27.03.2025 um 17:24 hat gerben@altlinux.org geschrieben:
> From: Denis Rastyogin <gerben@altlinux.org>
> 
> This error was discovered by fuzzing qemu-img.
> 
> Previously, new I/O requests were launched synchronously inside the
> completion callback `bench_cb`, leading to deep recursion and stack
> overflow. This patch moves the launching of new requests to a separate
> function `bench_bh`, scheduled via `qemu_bh_schedule` to run in the event
> loop context, thus unwinding the stack and preventing overflow.
> 
> Signed-off-by: Vasiliy Kovalev <kovalev@altlinux.org>
> Signed-off-by: Denis Rastyogin <gerben@altlinux.org>

Normally, the callback shouldn't immediately be called, except for
errors, which take a different code path anyway.

Was this tested with the null block driver or with a backend that is
actually relevant in practice?

I wonder if switching qemu-img bench to coroutines would make sense. But
since this is a benchmarking tool, we need to measure the performance
difference from both an additional BH and from switching to coroutines
compared to the current state. In case of doubt, I'd leave this unfixed,
this isn't something that is run in production.

Kevin