The tests/test-bdrv-drain /bdrv-drain/iothread/drain test case does the
following:

1. The preadv coroutine calls aio_bh_schedule_oneshot() and then yields.
2. The one-shot BH executes in another AioContext. All it does is call
   aio_co_wake(preadv_co).
3. The preadv coroutine is re-entered and returns.

There is a race condition in aio_co_wake() where the preadv coroutine
returns and the test case destroys the preadv IOThread. aio_co_wake()
can still be running in the other AioContext and it performs an access
to the freed IOThread AioContext.

Here is the race in aio_co_schedule():

  QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
                            co, co_scheduled_next);
  <-- race: co may execute before we invoke qemu_bh_schedule()!
  qemu_bh_schedule(ctx->co_schedule_bh);

So if co causes ctx to be freed then we're in trouble. Fix this problem
by holding a reference to ctx.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
util/async.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/util/async.c b/util/async.c
index 8d2105729c..4e4c7af51e 100644
--- a/util/async.c
+++ b/util/async.c
@@ -459,9 +459,17 @@ void aio_co_schedule(AioContext *ctx, Coroutine *co)
         abort();
     }
 
+    /* The coroutine might run and release the last ctx reference before we
+     * invoke qemu_bh_schedule(). Take a reference to keep ctx alive until
+     * we're done.
+     */
+    aio_context_ref(ctx);
+
     QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
                               co, co_scheduled_next);
     qemu_bh_schedule(ctx->co_schedule_bh);
+
+    aio_context_unref(ctx);
 }
 
 void aio_co_wake(struct Coroutine *co)
--
2.21.0
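
A minimal single-threaded model of the pattern used by the patch, for
illustration only. Ctx, ctx_new(), ctx_ref(), ctx_unref() and schedule() are
hypothetical stand-ins and not QEMU APIs; the real code uses
aio_context_ref()/aio_context_unref() and the race happens across threads.
Here the ctx_unref() call in the middle of schedule() plays the role of the
coroutine running inside the race window and causing the owner's reference
to be dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for AioContext; not QEMU's real structure. */
typedef struct Ctx {
    int refcnt;     /* plain int: this model is single-threaded */
    int scheduled;  /* stands in for ctx->scheduled_coroutines */
    int kicks;      /* counts stand-in qemu_bh_schedule() calls */
    bool *freed;    /* set to true when the object is released */
} Ctx;

static Ctx *ctx_new(bool *freed)
{
    Ctx *ctx = calloc(1, sizeof(*ctx));
    assert(ctx != NULL);
    ctx->refcnt = 1;            /* the owner's reference */
    ctx->freed = freed;
    *freed = false;
    return ctx;
}

static void ctx_ref(Ctx *ctx)
{
    ctx->refcnt++;
}

static void ctx_unref(Ctx *ctx)
{
    if (--ctx->refcnt == 0) {
        *ctx->freed = true;
        free(ctx);
    }
}

/*
 * Model of aio_co_schedule(). With hold_ref set it follows the patched
 * ordering: ref, publish, kick, unref. Returns true if ctx was still
 * alive when the kick step needed it.
 */
static bool schedule(Ctx *ctx, bool hold_ref)
{
    bool *freed = ctx->freed;   /* captured before ctx can go away */
    bool alive_at_kick;

    if (hold_ref) {
        ctx_ref(ctx);           /* keep ctx alive until we are done */
    }

    ctx->scheduled++;           /* publish the coroutine */

    /* Simulated race window: the coroutine ran and the owner let go. */
    ctx_unref(ctx);

    alive_at_kick = !*freed;
    if (alive_at_kick) {
        ctx->kicks++;           /* stand-in for qemu_bh_schedule() */
    }

    if (hold_ref) {
        ctx_unref(ctx);         /* may release the last reference */
    }
    return alive_at_kick;
}
```

With hold_ref set, schedule() reports that ctx was still alive at the kick
step, and the object is only released by the final ctx_unref(). Without it,
the model reports that ctx was already gone, which is the point where the
unpatched code would touch freed memory.
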

On Tue, Jul 23, 2019 at 8:06 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
> So if co causes ctx to be freed then we're in trouble. Fix this problem
> by holding a reference to ctx.

For QEMU 4.2. I'm not aware of a way to trigger this bug in QEMU
proper. This fix just makes tests/test-bdrv-drain more reliable.

Stefan

On 7/23/19 9:09 PM, Stefan Hajnoczi wrote:
> On Tue, Jul 23, 2019 at 8:06 PM Stefan Hajnoczi <stefanha@redhat.com> wrote:
>> So if co causes ctx to be freed then we're in trouble. Fix this problem
>> by holding a reference to ctx.
>
> For QEMU 4.2. I'm not aware of a way to trigger this bug in QEMU
> proper. This fix just makes tests/test-bdrv-drain more reliable.

This looks harmless for 4.1-rc3.

On 23/07/19 21:06, Stefan Hajnoczi wrote:
> The tests/test-bdrv-drain /bdrv-drain/iothread/drain test case does the
> following:
>
> 1. The preadv coroutine calls aio_bh_schedule_oneshot() and then yields.
> 2. The one-shot BH executes in another AioContext. All it does is call
>    aio_co_wake(preadv_co).
> 3. The preadv coroutine is re-entered and returns.
>
> There is a race condition in aio_co_wake() where the preadv coroutine
> returns and the test case destroys the preadv IOThread. aio_co_wake()
> can still be running in the other AioContext and it performs an access
> to the freed IOThread AioContext.
>
> Here is the race in aio_co_schedule():
>
>   QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
>                             co, co_scheduled_next);
>   <-- race: co may execute before we invoke qemu_bh_schedule()!
>   qemu_bh_schedule(ctx->co_schedule_bh);
>
> So if co causes ctx to be freed then we're in trouble. Fix this problem
> by holding a reference to ctx.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  util/async.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/util/async.c b/util/async.c
> index 8d2105729c..4e4c7af51e 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -459,9 +459,17 @@ void aio_co_schedule(AioContext *ctx, Coroutine *co)
>          abort();
>      }
>
> +    /* The coroutine might run and release the last ctx reference before we
> +     * invoke qemu_bh_schedule(). Take a reference to keep ctx alive until
> +     * we're done.
> +     */
> +    aio_context_ref(ctx);
> +
>      QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
>                                co, co_scheduled_next);
>      qemu_bh_schedule(ctx->co_schedule_bh);
> +
> +    aio_context_unref(ctx);
>  }
>
>  void aio_co_wake(struct Coroutine *co)
>

This must have been painful to debug.

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Paolo

On Tue, Jul 23, 2019 at 08:06:23PM +0100, Stefan Hajnoczi wrote:
> The tests/test-bdrv-drain /bdrv-drain/iothread/drain test case does the
> following:
>
> 1. The preadv coroutine calls aio_bh_schedule_oneshot() and then yields.
> 2. The one-shot BH executes in another AioContext. All it does is call
>    aio_co_wake(preadv_co).
> 3. The preadv coroutine is re-entered and returns.
>
> There is a race condition in aio_co_wake() where the preadv coroutine
> returns and the test case destroys the preadv IOThread. aio_co_wake()
> can still be running in the other AioContext and it performs an access
> to the freed IOThread AioContext.
>
> Here is the race in aio_co_schedule():
>
>   QSLIST_INSERT_HEAD_ATOMIC(&ctx->scheduled_coroutines,
>                             co, co_scheduled_next);
>   <-- race: co may execute before we invoke qemu_bh_schedule()!
>   qemu_bh_schedule(ctx->co_schedule_bh);
>
> So if co causes ctx to be freed then we're in trouble. Fix this problem
> by holding a reference to ctx.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  util/async.c | 8 ++++++++
>  1 file changed, 8 insertions(+)

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan