[v2] aio: remove AioContext lock

[PATCH v2 04/14] aio: make aio_context_acquire()/aio_context_release() a no-op

Posted by Stefan Hajnoczi 2 years, 2 months ago

aio_context_acquire()/aio_context_release() has been replaced by
fine-grained locking to protect state shared by multiple threads. The
AioContext lock still plays the role of balancing locking in
AIO_WAIT_WHILE() and many functions in QEMU either require that the
AioContext lock is held or not held for this reason. In other words, the
AioContext lock is purely there for consistency with itself and serves
no real purpose anymore.

Stop actually acquiring/releasing the lock in
aio_context_acquire()/aio_context_release() so that subsequent patches
can remove callers across the codebase incrementally.

I have performed "make check" and qemu-iotests stress tests across
x86-64, ppc64le, and aarch64 to confirm that there are no failures as a
result of eliminating the lock.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Acked-by: Kevin Wolf <kwolf@redhat.com>
---
 util/async.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/util/async.c b/util/async.c
index 8f90ddc304..04ee83d220 100644
--- a/util/async.c
+++ b/util/async.c
@@ -725,12 +725,12 @@ void aio_context_unref(AioContext *ctx)
 
 void aio_context_acquire(AioContext *ctx)
 {
-    qemu_rec_mutex_lock(&ctx->lock);
+    /* TODO remove this function */
 }
 
 void aio_context_release(AioContext *ctx)
 {
-    qemu_rec_mutex_unlock(&ctx->lock);
+    /* TODO remove this function */
 }
 
 QEMU_DEFINE_STATIC_CO_TLS(AioContext *, my_aiocontext)
-- 
2.43.0

Re: [PATCH v2 04/14] aio: make aio_context_acquire()/aio_context_release() a no-op

Posted by Kevin Wolf 2 years, 1 month ago

Am 05.12.2023 um 19:20 hat Stefan Hajnoczi geschrieben:
> aio_context_acquire()/aio_context_release() has been replaced by
> fine-grained locking to protect state shared by multiple threads. The
> AioContext lock still plays the role of balancing locking in
> AIO_WAIT_WHILE() and many functions in QEMU either require that the
> AioContext lock is held or not held for this reason. In other words, the
> AioContext lock is purely there for consistency with itself and serves
> no real purpose anymore.
> 
> Stop actually acquiring/releasing the lock in
> aio_context_acquire()/aio_context_release() so that subsequent patches
> can remove callers across the codebase incrementally.
> 
> I have performed "make check" and qemu-iotests stress tests across
> x86-64, ppc64le, and aarch64 to confirm that there are no failures as a
> result of eliminating the lock.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> Reviewed-by: Eric Blake <eblake@redhat.com>
> Acked-by: Kevin Wolf <kwolf@redhat.com>

I knew why I wasn't confident enough to give a R-b... This crashes
qemu-storage-daemon in the qemu-iotests case graph-changes-while-io.

qemu-storage-daemon: ../nbd/server.c:2542: nbd_co_receive_request: Assertion `client->recv_coroutine == qemu_coroutine_self()' failed.

(gdb) bt
#0  0x00007fdb00529884 in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007fdb004d8afe in raise () from /lib64/libc.so.6
#2  0x00007fdb004c187f in abort () from /lib64/libc.so.6
#3  0x00007fdb004c179b in __assert_fail_base.cold () from /lib64/libc.so.6
#4  0x00007fdb004d1187 in __assert_fail () from /lib64/libc.so.6
#5  0x0000557f9f9534eb in nbd_co_receive_request (errp=0x7fdafc25eec0, request=0x7fdafc25ef10, req=0x7fdaf00159c0) at ../nbd/server.c:2542
#6  nbd_trip (opaque=0x557fa0b33fa0) at ../nbd/server.c:2962
#7  0x0000557f9faa416b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177
#8  0x00007fdb004efe90 in ?? () from /lib64/libc.so.6
#9  0x00007fdafc35f680 in ?? ()
#10 0x0000000000000000 in ?? ()
(gdb) p *client
$2 = {refcount = 4, close_fn = 0x557f9f95dc40 <nbd_blockdev_client_closed>, exp = 0x557fa0b30590, tlscreds = 0x0, tlsauthz = 0x0, sioc = 0x557fa0b33d90, ioc = 0x557fa0b33d90,
  recv_coroutine = 0x7fdaf0015eb0, send_lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0}, handoff = 0, sequence = 0, holder = 0x0},
  send_coroutine = 0x0, read_yielding = false, quiescing = false, next = {tqe_next = 0x0, tqe_circ = {tql_next = 0x0, tql_prev = 0x557fa0b305e8}}, nb_requests = 1, closing = false,
  check_align = 1, mode = NBD_MODE_EXTENDED, contexts = {exp = 0x557fa0b30590, count = 1, base_allocation = true, allocation_depth = false, bitmaps = 0x0}, opt = 7, optlen = 0}
(gdb) p co_tls_current
$3 = (Coroutine *) 0x7fdaf00061d0

Kevin

Re: [PATCH v2 04/14] aio: make aio_context_acquire()/aio_context_release() a no-op

Posted by Kevin Wolf 2 years, 1 month ago

Am 19.12.2023 um 16:28 hat Kevin Wolf geschrieben:
> Am 05.12.2023 um 19:20 hat Stefan Hajnoczi geschrieben:
> > aio_context_acquire()/aio_context_release() has been replaced by
> > fine-grained locking to protect state shared by multiple threads. The
> > AioContext lock still plays the role of balancing locking in
> > AIO_WAIT_WHILE() and many functions in QEMU either require that the
> > AioContext lock is held or not held for this reason. In other words, the
> > AioContext lock is purely there for consistency with itself and serves
> > no real purpose anymore.
> > 
> > Stop actually acquiring/releasing the lock in
> > aio_context_acquire()/aio_context_release() so that subsequent patches
> > can remove callers across the codebase incrementally.
> > 
> > I have performed "make check" and qemu-iotests stress tests across
> > x86-64, ppc64le, and aarch64 to confirm that there are no failures as a
> > result of eliminating the lock.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > Reviewed-by: Eric Blake <eblake@redhat.com>
> > Acked-by: Kevin Wolf <kwolf@redhat.com>
> 
> I knew why I wasn't confident enough to give a R-b... This crashes
> qemu-storage-daemon in the qemu-iotests case graph-changes-while-io.
> 
> qemu-storage-daemon: ../nbd/server.c:2542: nbd_co_receive_request: Assertion `client->recv_coroutine == qemu_coroutine_self()' failed.
> 
> (gdb) bt
> #0  0x00007fdb00529884 in __pthread_kill_implementation () from /lib64/libc.so.6
> #1  0x00007fdb004d8afe in raise () from /lib64/libc.so.6
> #2  0x00007fdb004c187f in abort () from /lib64/libc.so.6
> #3  0x00007fdb004c179b in __assert_fail_base.cold () from /lib64/libc.so.6
> #4  0x00007fdb004d1187 in __assert_fail () from /lib64/libc.so.6
> #5  0x0000557f9f9534eb in nbd_co_receive_request (errp=0x7fdafc25eec0, request=0x7fdafc25ef10, req=0x7fdaf00159c0) at ../nbd/server.c:2542
> #6  nbd_trip (opaque=0x557fa0b33fa0) at ../nbd/server.c:2962
> #7  0x0000557f9faa416b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177
> #8  0x00007fdb004efe90 in ?? () from /lib64/libc.so.6
> #9  0x00007fdafc35f680 in ?? ()
> #10 0x0000000000000000 in ?? ()
> (gdb) p *client
> $2 = {refcount = 4, close_fn = 0x557f9f95dc40 <nbd_blockdev_client_closed>, exp = 0x557fa0b30590, tlscreds = 0x0, tlsauthz = 0x0, sioc = 0x557fa0b33d90, ioc = 0x557fa0b33d90,
>   recv_coroutine = 0x7fdaf0015eb0, send_lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0}, handoff = 0, sequence = 0, holder = 0x0},
>   send_coroutine = 0x0, read_yielding = false, quiescing = false, next = {tqe_next = 0x0, tqe_circ = {tql_next = 0x0, tql_prev = 0x557fa0b305e8}}, nb_requests = 1, closing = false,
>   check_align = 1, mode = NBD_MODE_EXTENDED, contexts = {exp = 0x557fa0b30590, count = 1, base_allocation = true, allocation_depth = false, bitmaps = 0x0}, opt = 7, optlen = 0}
> (gdb) p co_tls_current
> $3 = (Coroutine *) 0x7fdaf00061d0

This one isn't easy to debug...

The first problem here is that two nbd_trip() coroutines are scheduled
in the same iothread, and creating the second one overwrites
client->recv_coroutine, which triggers the assertion in the first one.

This can be fixed by introducing a new mutex in NBDClient and taking it
in nbd_client_receive_next_request() so that there is no race between
checking client->recv_coroutine != NULL and setting it to a new
coroutine. (Not entirely sure why two different threads are doing this,
maybe the main thread reentering in drained_end and the iothread waiting
for the next request?)

However, I'm seeing new assertion failures when I do that:
client->quiescing isn't set in the -EAGAIN case in nbd_trip(). I haven't
really figured out yet where this comes from. Taking the new NBDClient
lock in the drain functions and in nbd_trip() doesn't seem to be enough
to fix it anyway. Or maybe I didn't quite find the right places to take
it.

I'm not sure if the list of NBD clients needs a lock, too, or if it is
only ever accessed in the main thread, but that should be unrelated to
the assertion failures I'm seeing.

Kevin

Re: [PATCH v2 04/14] aio: make aio_context_acquire()/aio_context_release() a no-op

Posted by Stefan Hajnoczi 2 years, 1 month ago

On Tue, 19 Dec 2023 at 13:20, Kevin Wolf <kwolf@redhat.com> wrote:
>
> Am 19.12.2023 um 16:28 hat Kevin Wolf geschrieben:
> > Am 05.12.2023 um 19:20 hat Stefan Hajnoczi geschrieben:
> > > aio_context_acquire()/aio_context_release() has been replaced by
> > > fine-grained locking to protect state shared by multiple threads. The
> > > AioContext lock still plays the role of balancing locking in
> > > AIO_WAIT_WHILE() and many functions in QEMU either require that the
> > > AioContext lock is held or not held for this reason. In other words, the
> > > AioContext lock is purely there for consistency with itself and serves
> > > no real purpose anymore.
> > >
> > > Stop actually acquiring/releasing the lock in
> > > aio_context_acquire()/aio_context_release() so that subsequent patches
> > > can remove callers across the codebase incrementally.
> > >
> > > I have performed "make check" and qemu-iotests stress tests across
> > > x86-64, ppc64le, and aarch64 to confirm that there are no failures as a
> > > result of eliminating the lock.
> > >
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > Reviewed-by: Eric Blake <eblake@redhat.com>
> > > Acked-by: Kevin Wolf <kwolf@redhat.com>
> >
> > I knew why I wasn't confident enough to give a R-b... This crashes
> > qemu-storage-daemon in the qemu-iotests case graph-changes-while-io.
> >
> > qemu-storage-daemon: ../nbd/server.c:2542: nbd_co_receive_request: Assertion `client->recv_coroutine == qemu_coroutine_self()' failed.
> >
> > (gdb) bt
> > #0  0x00007fdb00529884 in __pthread_kill_implementation () from /lib64/libc.so.6
> > #1  0x00007fdb004d8afe in raise () from /lib64/libc.so.6
> > #2  0x00007fdb004c187f in abort () from /lib64/libc.so.6
> > #3  0x00007fdb004c179b in __assert_fail_base.cold () from /lib64/libc.so.6
> > #4  0x00007fdb004d1187 in __assert_fail () from /lib64/libc.so.6
> > #5  0x0000557f9f9534eb in nbd_co_receive_request (errp=0x7fdafc25eec0, request=0x7fdafc25ef10, req=0x7fdaf00159c0) at ../nbd/server.c:2542
> > #6  nbd_trip (opaque=0x557fa0b33fa0) at ../nbd/server.c:2962
> > #7  0x0000557f9faa416b in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at ../util/coroutine-ucontext.c:177
> > #8  0x00007fdb004efe90 in ?? () from /lib64/libc.so.6
> > #9  0x00007fdafc35f680 in ?? ()
> > #10 0x0000000000000000 in ?? ()
> > (gdb) p *client
> > $2 = {refcount = 4, close_fn = 0x557f9f95dc40 <nbd_blockdev_client_closed>, exp = 0x557fa0b30590, tlscreds = 0x0, tlsauthz = 0x0, sioc = 0x557fa0b33d90, ioc = 0x557fa0b33d90,
> >   recv_coroutine = 0x7fdaf0015eb0, send_lock = {locked = 0, ctx = 0x0, from_push = {slh_first = 0x0}, to_pop = {slh_first = 0x0}, handoff = 0, sequence = 0, holder = 0x0},
> >   send_coroutine = 0x0, read_yielding = false, quiescing = false, next = {tqe_next = 0x0, tqe_circ = {tql_next = 0x0, tql_prev = 0x557fa0b305e8}}, nb_requests = 1, closing = false,
> >   check_align = 1, mode = NBD_MODE_EXTENDED, contexts = {exp = 0x557fa0b30590, count = 1, base_allocation = true, allocation_depth = false, bitmaps = 0x0}, opt = 7, optlen = 0}
> > (gdb) p co_tls_current
> > $3 = (Coroutine *) 0x7fdaf00061d0
>
> This one isn't easy to debug...
>
> The first problem here is that two nbd_trip() coroutines are scheduled
> in the same iothread, and creating the second one overwrites
> client->recv_coroutine, which triggers the assertion in the first one.
>
> This can be fixed by introducing a new mutex in NBDClient and taking it
> in nbd_client_receive_next_request() so that there is no race between
> checking client->recv_coroutine != NULL and setting it to a new
> coroutine. (Not entirely sure why two different threads are doing this,
> maybe the main thread reentering in drained_end and the iothread waiting
> for the next request?)
>
> However, I'm seeing new assertion failures when I do that:
> client->quiescing isn't set in the -EAGAIN case in nbd_trip(). I haven't
> really figured out yet where this comes from. Taking the new NBDClient
> lock in the drain functions and in nbd_trip() doesn't seem to be enough
> to fix it anyway. Or maybe I didn't quite find the right places to take
> it.

bdrv_graph_wrlock() -> bdrv_drain_all_begin_nopoll() followed by
bdrv_drain_all_end() causes this issue.

It's a race condition where nbd_trip() in the IOThread sees
client->quiescing == true for a moment but then the drained region
ends before nbd_trip() re-acquires the lock and reaches
assert(client->quiescing).

The "nopoll" part of bdrv_drain_all_begin_nopoll() seems to be the
issue. We cannot assume all requests have quiesced when .drained_end()
is called.

I'm running more tests now to be sure I have a working solution. Will
send patches soon.

Stefan

Re: [PATCH v2 04/14] aio: make aio_context_acquire()/aio_context_release() a no-op

Posted by Stefan Hajnoczi 2 years, 1 month ago

The following hack makes the test pass but there are larger safety
issues that I'll need to look at on Wednesday:

diff --git a/nbd/server.c b/nbd/server.c
index 895cf0a752..cf4b7d5c6d 100644
--- a/nbd/server.c
+++ b/nbd/server.c
@@ -1617,7 +1617,7 @@ static void nbd_drained_begin(void *opaque)
     }
 }

-static void nbd_drained_end(void *opaque)
+static void nbd_resume_clients(void *opaque)
 {
     NBDExport *exp = opaque;
     NBDClient *client;
@@ -1628,6 +1628,15 @@ static void nbd_drained_end(void *opaque)
     }
 }

+static void nbd_drained_end(void *opaque)
+{
+    NBDExport *exp = opaque;
+
+    /* TODO how to make sure exp doesn't go away? */
+    /* TODO what if AioContext changes before this runs? */
+    aio_bh_schedule_oneshot(nbd_export_aio_context(exp),
nbd_resume_clients, exp);
+}
+
 static bool nbd_drained_poll(void *opaque)
 {
     NBDExport *exp = opaque;

Re: [PATCH v2 04/14] aio: make aio_context_acquire()/aio_context_release() a no-op

Posted by Kevin Wolf 2 years, 1 month ago

Am 19.12.2023 um 22:23 hat Stefan Hajnoczi geschrieben:
> The following hack makes the test pass but there are larger safety
> issues that I'll need to look at on Wednesday:

I see, you're taking the same approach as in the SCSI layer: Don't make
things thread-safe, but just always access them from the same thread.

In theory this should be okay, but I'm almost sure that at least
nbd_drained_poll() must then run in the same AioContext, too.

> diff --git a/nbd/server.c b/nbd/server.c
> index 895cf0a752..cf4b7d5c6d 100644
> --- a/nbd/server.c
> +++ b/nbd/server.c
> @@ -1617,7 +1617,7 @@ static void nbd_drained_begin(void *opaque)
>      }
>  }
> 
> -static void nbd_drained_end(void *opaque)
> +static void nbd_resume_clients(void *opaque)
>  {
>      NBDExport *exp = opaque;
>      NBDClient *client;
> @@ -1628,6 +1628,15 @@ static void nbd_drained_end(void *opaque)
>      }
>  }
> 
> +static void nbd_drained_end(void *opaque)
> +{
> +    NBDExport *exp = opaque;
> +
> +    /* TODO how to make sure exp doesn't go away? */

blk_exp_ref()?

> +    /* TODO what if AioContext changes before this runs? */
> +    aio_bh_schedule_oneshot(nbd_export_aio_context(exp),
> nbd_resume_clients, exp);

We could increase client->nb_requests if we change it to be accessed
atomically. Then nbd_drained_poll() will make any AioContext change wait
for the BH.

Or maybe aio_wait_bh_oneshot() would already solve both problems?

> +}
> +

Kevin

Re: [PATCH v2 04/14] aio: make aio_context_acquire()/aio_context_release() a no-op

Posted by Stefan Hajnoczi 2 years, 1 month ago

On Wed, 20 Dec 2023 at 04:32, Kevin Wolf <kwolf@redhat.com> wrote:
>
> Am 19.12.2023 um 22:23 hat Stefan Hajnoczi geschrieben:
> > The following hack makes the test pass but there are larger safety
> > issues that I'll need to look at on Wednesday:
>
> I see, you're taking the same approach as in the SCSI layer: Don't make
> things thread-safe, but just always access them from the same thread.

Yes, but it feels like a hack to me. You pointed out that other parts
also don't look thread-safe (e.g. the clients list) and I agree. I've
started annotating the code and will try to come up with a full fix
today.

Stefan