tests/test-bdrv-drain can hang in tests/iothread.c:iothread_run():
while (!atomic_read(&iothread->stopping)) {
aio_poll(iothread->ctx, true);
}
The iothread_join() function works as follows:
void iothread_join(IOThread *iothread)
{
iothread->stopping = true;
aio_notify(iothread->ctx);
qemu_thread_join(&iothread->thread);
If iothread_run() checks iothread->stopping before the iothread_join()
thread sets stopping to true, then aio_notify() may be optimized away
and iothread_run() hangs forever in aio_poll().
The correct way to change iothread->stopping is from a BH that executes
within iothread_run(). This ensures that iothread->stopping is checked
after we set it to true.
This was already fixed for ./iothread.c (note this is a different source
file!) by commit 2362a28ea11c145e1a13ae79342d76dc118a72a6 ("iothread:
fix iothread_stop() race condition"), but not for tests/iothread.c.
Fixes: 0c330a734b51c177ab8488932ac3b0c4d63a718a
("aio: introduce aio_co_schedule and aio_co_wake")
Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20191003100103.331-1-stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
tests/iothread.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/tests/iothread.c b/tests/iothread.c
index 777d9eea46..13c9fdcd8d 100644
--- a/tests/iothread.c
+++ b/tests/iothread.c
@@ -55,10 +55,16 @@ static void *iothread_run(void *opaque)
return NULL;
}
-void iothread_join(IOThread *iothread)
+static void iothread_stop_bh(void *opaque)
{
+ IOThread *iothread = opaque;
+
iothread->stopping = true;
- aio_notify(iothread->ctx);
+}
+
+void iothread_join(IOThread *iothread)
+{
+ aio_bh_schedule_oneshot(iothread->ctx, iothread_stop_bh, iothread);
qemu_thread_join(&iothread->thread);
qemu_cond_destroy(&iothread->init_done_cond);
qemu_mutex_destroy(&iothread->init_done_lock);
--
2.21.0
On 14/10/19 10:52, Stefan Hajnoczi wrote:
> tests/test-bdrv-drain can hang in tests/iothread.c:iothread_run():
>
> while (!atomic_read(&iothread->stopping)) {
> aio_poll(iothread->ctx, true);
> }
>
> The iothread_join() function works as follows:
>
> void iothread_join(IOThread *iothread)
> {
> iothread->stopping = true;
> aio_notify(iothread->ctx);
> qemu_thread_join(&iothread->thread);
>
> If iothread_run() checks iothread->stopping before the iothread_join()
> thread sets stopping to true, then aio_notify() may be optimized away
> and iothread_run() hangs forever in aio_poll().
>
> The correct way to change iothread->stopping is from a BH that executes
> within iothread_run(). This ensures that iothread->stopping is checked
> after we set it to true.
>
> This was already fixed for ./iothread.c (note this is a different source
> file!) by commit 2362a28ea11c145e1a13ae79342d76dc118a72a6 ("iothread:
> fix iothread_stop() race condition"), but not for tests/iothread.c.
Aha, I did have some kind of dejavu when sending the patch I have just
sent; let's see if this also fixes the test-aio-multithread assertion
failure.
Note that with this change the atomic read of iothread->stopping can go
away; I can send a separate patch later.
Paolo
> Fixes: 0c330a734b51c177ab8488932ac3b0c4d63a718a
> ("aio: introduce aio_co_schedule and aio_co_wake")
> Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> Message-Id: <20191003100103.331-1-stefanha@redhat.com>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
> tests/iothread.c | 10 ++++++++--
> 1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/tests/iothread.c b/tests/iothread.c
> index 777d9eea46..13c9fdcd8d 100644
> --- a/tests/iothread.c
> +++ b/tests/iothread.c
> @@ -55,10 +55,16 @@ static void *iothread_run(void *opaque)
> return NULL;
> }
>
> -void iothread_join(IOThread *iothread)
> +static void iothread_stop_bh(void *opaque)
> {
> + IOThread *iothread = opaque;
> +
> iothread->stopping = true;
> - aio_notify(iothread->ctx);
> +}
> +
> +void iothread_join(IOThread *iothread)
> +{
> + aio_bh_schedule_oneshot(iothread->ctx, iothread_stop_bh, iothread);
> qemu_thread_join(&iothread->thread);
> qemu_cond_destroy(&iothread->init_done_cond);
> qemu_mutex_destroy(&iothread->init_done_lock);
>
On Mon, Oct 14, 2019 at 01:11:41PM +0200, Paolo Bonzini wrote:
> On 14/10/19 10:52, Stefan Hajnoczi wrote:
> > tests/test-bdrv-drain can hang in tests/iothread.c:iothread_run():
> >
> > while (!atomic_read(&iothread->stopping)) {
> > aio_poll(iothread->ctx, true);
> > }
> >
> > The iothread_join() function works as follows:
> >
> > void iothread_join(IOThread *iothread)
> > {
> > iothread->stopping = true;
> > aio_notify(iothread->ctx);
> > qemu_thread_join(&iothread->thread);
> >
> > If iothread_run() checks iothread->stopping before the iothread_join()
> > thread sets stopping to true, then aio_notify() may be optimized away
> > and iothread_run() hangs forever in aio_poll().
> >
> > The correct way to change iothread->stopping is from a BH that executes
> > within iothread_run(). This ensures that iothread->stopping is checked
> > after we set it to true.
> >
> > This was already fixed for ./iothread.c (note this is a different source
> > file!) by commit 2362a28ea11c145e1a13ae79342d76dc118a72a6 ("iothread:
> > fix iothread_stop() race condition"), but not for tests/iothread.c.
>
> Aha, I did have some kind of dejavu when sending the patch I have just
> sent; let's see if this also fixes the test-aio-multithread assertion
> failure.
>
> Note that with this change the atomic read of iothread->stopping can go
> away; I can send a separate patch later.
Yes, I thought about the atomic_read() later as well.
Stefan
© 2016 - 2026 Red Hat, Inc.