[Qemu-devel] [PATCH v2 08/17] block: Add missing locking in bdrv_co_drain_bh_cb()

Kevin Wolf posted 17 patches 7 years, 1 month ago
There is a newer version of this series
Posted by Kevin Wolf 7 years, 1 month ago
bdrv_do_drained_begin/end() assume that they are called with the
AioContext lock of bs held. If we call drain functions from a coroutine
with the AioContext lock held, we yield and schedule a BH to move out of
coroutine context. This means that the lock for the home context of the
coroutine is released and must be re-acquired in the bottom half.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/qemu/coroutine.h |  5 +++++
 block/io.c               | 15 +++++++++++++++
 util/qemu-coroutine.c    |  5 +++++
 3 files changed, 25 insertions(+)

diff --git a/include/qemu/coroutine.h b/include/qemu/coroutine.h
index 6f8a487041..9801e7f5a4 100644
--- a/include/qemu/coroutine.h
+++ b/include/qemu/coroutine.h
@@ -90,6 +90,11 @@ void qemu_aio_coroutine_enter(AioContext *ctx, Coroutine *co);
 void coroutine_fn qemu_coroutine_yield(void);
 
 /**
+ * Get the AioContext of the given coroutine
+ */
+AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co);
+
+/**
  * Get the currently executing coroutine
  */
 Coroutine *coroutine_fn qemu_coroutine_self(void);
diff --git a/block/io.c b/block/io.c
index 7100344c7b..914ba78f1a 100644
--- a/block/io.c
+++ b/block/io.c
@@ -288,6 +288,18 @@ static void bdrv_co_drain_bh_cb(void *opaque)
     BlockDriverState *bs = data->bs;
 
     if (bs) {
+        AioContext *ctx = bdrv_get_aio_context(bs);
+        AioContext *co_ctx = qemu_coroutine_get_aio_context(co);
+
+        /*
+         * When the coroutine yielded, the lock for its home context was
+         * released, so we need to re-acquire it here. If it explicitly
+         * acquired a different context, the lock is still held and we don't
+         * want to lock it a second time (or AIO_WAIT_WHILE() would hang).
+         */
+        if (ctx == co_ctx) {
+            aio_context_acquire(ctx);
+        }
         bdrv_dec_in_flight(bs);
         if (data->begin) {
             bdrv_do_drained_begin(bs, data->recursive, data->parent,
@@ -296,6 +308,9 @@ static void bdrv_co_drain_bh_cb(void *opaque)
             bdrv_do_drained_end(bs, data->recursive, data->parent,
                                 data->ignore_bds_parents);
         }
+        if (ctx == co_ctx) {
+            aio_context_release(ctx);
+        }
     } else {
         assert(data->begin);
         bdrv_drain_all_begin();
diff --git a/util/qemu-coroutine.c b/util/qemu-coroutine.c
index 1ba4191b84..2295928d33 100644
--- a/util/qemu-coroutine.c
+++ b/util/qemu-coroutine.c
@@ -198,3 +198,8 @@ bool qemu_coroutine_entered(Coroutine *co)
 {
     return co->caller;
 }
+
+AioContext *coroutine_fn qemu_coroutine_get_aio_context(Coroutine *co)
+{
+    return co->ctx;
+}
-- 
2.13.6
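
The conditional locking in the block/io.c hunk above can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions: ToyContext, toy_acquire(), toy_release() and toy_drain_bh() are hypothetical names that do not exist in QEMU, and the AioContext lock is reduced to a counter recording how many times it is currently held.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for AioContext (not a QEMU type): it only counts
 * how many times the recursive lock is currently held.
 */
typedef struct ToyContext {
    int lock_depth;
} ToyContext;

static void toy_acquire(ToyContext *ctx)
{
    ctx->lock_depth++;
}

static void toy_release(ToyContext *ctx)
{
    assert(ctx->lock_depth > 0);
    ctx->lock_depth--;
}

/*
 * Models the locking around the drain call in the bottom half.
 * Returns the lock depth of bs_ctx that the drain code would observe.
 * 'conditional' selects between the ctx == co_ctx check from the patch
 * (true) and an unconditional acquire (false).
 */
static int toy_drain_bh(ToyContext *bs_ctx, ToyContext *co_ctx,
                        bool conditional)
{
    bool take_lock = !conditional || bs_ctx == co_ctx;
    int depth_seen;

    if (take_lock) {
        toy_acquire(bs_ctx);
    }
    depth_seen = bs_ctx->lock_depth;    /* the drain would run here */
    if (take_lock) {
        toy_release(bs_ctx);
    }
    return depth_seen;
}
```

With a home context whose lock was dropped on yield (lock_depth 0), toy_drain_bh(&home, &home, true) observes a depth of 1. With a different context that the coroutine acquired explicitly (lock_depth 1), the conditional variant also observes 1, while the unconditional variant observes 2, which is the double locking the code comment in the patch warns about.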


Re: [Qemu-devel] [PATCH v2 08/17] block: Add missing locking in bdrv_co_drain_bh_cb()
Posted by Paolo Bonzini 7 years, 1 month ago
On 13/09/2018 14:52, Kevin Wolf wrote:
> bdrv_do_drained_begin/end() assume that they are called with the
> AioContext lock of bs held. If we call drain functions from a coroutine
> with the AioContext lock held, we yield and schedule a BH to move out of
> coroutine context. This means that the lock for the home context of the
> coroutine is released and must be re-acquired in the bottom half.

What exactly needs the lock, is it bdrv_drain_invoke?

Would it make sense to always do release/acquire in bdrv_drain, and
always do acquire/release in bdrv_drain_invoke?  (Conditional locking is
tricky...).

Thanks,

Paolo

Re: [Qemu-devel] [PATCH v2 08/17] block: Add missing locking in bdrv_co_drain_bh_cb()
Posted by Kevin Wolf 7 years, 1 month ago
On 13.09.2018 at 17:17, Paolo Bonzini wrote:
> On 13/09/2018 14:52, Kevin Wolf wrote:
> > bdrv_do_drained_begin/end() assume that they are called with the
> > AioContext lock of bs held. If we call drain functions from a coroutine
> > with the AioContext lock held, we yield and schedule a BH to move out of
> > coroutine context. This means that the lock for the home context of the
> > coroutine is released and must be re-acquired in the bottom half.
> 
> What exactly needs the lock, is it bdrv_drain_invoke?
> 
> Would it make sense to always do release/acquire in bdrv_drain, and
> always do acquire/release in bdrv_drain_invoke?  (Conditional locking is
> tricky...).

The thing that made it obvious was an aio_poll() call around which we
want to release the lock temporarily, and if you don't hold it, you get
a crash. This aio_poll() has actually disappeared in v2, and I'm not
sure if AIO_WAIT_WHILE() can hit it, but I think locking is still right.

I'm not sure what data structures are actually protected by it, but the
simple rule as documented for bdrv_co_drain() has always been to hold
the AioContext lock of bs when you call bdrv_drain(bs), so this patch
just obeys it.

Kevin
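
The "release the lock temporarily around a poll" pattern described in this reply can be sketched as a toy model. The names ToyLock, ToyPollResult and toy_poll_unlocked() are hypothetical and are not QEMU code. The model assumes a recursive lock represented by a depth counter, and it returns an error code where real code would crash or hang.

```c
#include <assert.h>

/* Hypothetical outcomes of polling with the lock temporarily released. */
typedef enum ToyPollResult {
    TOY_POLL_NOT_HELD,      /* caller did not hold the lock at all */
    TOY_POLL_PROGRESS,      /* lock fully dropped while polling */
    TOY_POLL_STILL_HELD     /* lock was held twice, others stay blocked */
} ToyPollResult;

/* Hypothetical recursive lock: only tracks how often it is held. */
typedef struct ToyLock {
    int depth;
} ToyLock;

/*
 * Releases the lock once, "polls", and re-acquires it. The result says
 * whether other threads could have taken the lock during the poll.
 */
static ToyPollResult toy_poll_unlocked(ToyLock *lock)
{
    ToyPollResult result;

    if (lock->depth == 0) {
        /* Nothing to release: the caller broke the locking rule. */
        return TOY_POLL_NOT_HELD;
    }

    lock->depth--;                      /* temporary release */
    result = lock->depth == 0 ? TOY_POLL_PROGRESS : TOY_POLL_STILL_HELD;
    lock->depth++;                      /* re-acquire before returning */
    return result;
}
```

In this model a depth of 0 corresponds to the missing-lock case mentioned above, a depth of 1 is the expected state, and a depth of 2 corresponds to the situation the patch avoids by not locking a second time: a single release leaves the lock held, so nothing else can make progress during the poll.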

Re: [Qemu-devel] [PATCH v2 08/17] block: Add missing locking in bdrv_co_drain_bh_cb()
Posted by Max Reitz 7 years, 1 month ago
On 13.09.18 14:52, Kevin Wolf wrote:
> bdrv_do_drained_begin/end() assume that they are called with the
> AioContext lock of bs held. If we call drain functions from a coroutine
> with the AioContext lock held, we yield and schedule a BH to move out of
> coroutine context. This means that the lock for the home context of the
> coroutine is released and must be re-acquired in the bottom half.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  include/qemu/coroutine.h |  5 +++++
>  block/io.c               | 15 +++++++++++++++
>  util/qemu-coroutine.c    |  5 +++++
>  3 files changed, 25 insertions(+)

I suppose the coroutine lock you mean is the one aio_co_enter() takes?

If so:

Reviewed-by: Max Reitz <mreitz@redhat.com>

(Sorry, I still really don't have much understanding of how coroutines,
bottom halves, and all nice things like them work.)