[PATCH v2 09/24] block: move drain outside of bdrv_try_change_aio_context()

Fiona Ebner posted 24 patches 5 months, 4 weeks ago
Maintainers: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>, John Snow <jsnow@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>, Ari Sundholm <ari@tuxera.com>, Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>, Alberto Garcia <berto@igalia.com>, Wen Congyang <wencongyang2@huawei.com>, Xie Changlong <xiechanglong.d@gmail.com>
There is a newer version of this series
[PATCH v2 09/24] block: move drain outside of bdrv_try_change_aio_context()
Posted by Fiona Ebner 5 months, 4 weeks ago
This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".

Convert the function to a _locked() version that has to be called with
the graph lock held and add a convenience wrapper that has to be
called with the graph unlocked, which drains and takes the lock
itself. Since bdrv_try_change_aio_context() is global state code, the
wrapper is too.

Callers are adapted to use the appropriate variant. In the
test_set_aio_context() unit test, prior drains can be removed, because
draining already happens inside the new wrapper.

Note that bdrv_attach_child_common_abort(), bdrv_attach_child_common()
and bdrv_root_unref_child() hold the graph lock and are not actually
allowed to drain either. This will be addressed in the following
commits.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

Changes in v2:
* Split up into smaller pieces, flesh out commit messages.

 block.c                            | 58 ++++++++++++++++++++++--------
 blockdev.c                         | 15 +++++---
 include/block/block-global-state.h |  8 +++--
 tests/unit/test-bdrv-drain.c       |  4 ---
 4 files changed, 60 insertions(+), 25 deletions(-)

diff --git a/block.c b/block.c
index 7148618504..5281ad1313 100644
--- a/block.c
+++ b/block.c
@@ -3028,7 +3028,10 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
     bdrv_replace_child_noperm(s->child, NULL);
 
     if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
-        bdrv_try_change_aio_context(bs, s->old_child_ctx, NULL, &error_abort);
+        bdrv_drain_all_begin();
+        bdrv_try_change_aio_context_locked(bs, s->old_child_ctx, NULL,
+                                           &error_abort);
+        bdrv_drain_all_end();
     }
 
     if (bdrv_child_get_parent_aio_context(s->child) != s->old_parent_ctx) {
@@ -3115,8 +3118,10 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
     parent_ctx = bdrv_child_get_parent_aio_context(new_child);
     if (child_ctx != parent_ctx) {
         Error *local_err = NULL;
-        int ret = bdrv_try_change_aio_context(child_bs, parent_ctx, NULL,
-                                              &local_err);
+        bdrv_drain_all_begin();
+        int ret = bdrv_try_change_aio_context_locked(child_bs, parent_ctx, NULL,
+                                                     &local_err);
+        bdrv_drain_all_end();
 
         if (ret < 0 && child_class->change_aio_ctx) {
             Transaction *aio_ctx_tran = tran_new();
@@ -3319,8 +3324,10 @@ void bdrv_root_unref_child(BdrvChild *child)
          * When the parent requiring a non-default AioContext is removed, the
          * node moves back to the main AioContext
          */
-        bdrv_try_change_aio_context(child_bs, qemu_get_aio_context(), NULL,
-                                    NULL);
+        bdrv_drain_all_begin();
+        bdrv_try_change_aio_context_locked(child_bs, qemu_get_aio_context(),
+                                           NULL, NULL);
+        bdrv_drain_all_end();
     }
 
     bdrv_schedule_unref(child_bs);
@@ -7697,9 +7704,13 @@ static bool bdrv_change_aio_context(BlockDriverState *bs, AioContext *ctx,
  *
  * If ignore_child is not NULL, that child (and its subgraph) will not
  * be touched.
+ *
+ * Called with the graph lock held.
+ *
+ * Called while all bs are drained.
  */
-int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
-                                BdrvChild *ignore_child, Error **errp)
+int bdrv_try_change_aio_context_locked(BlockDriverState *bs, AioContext *ctx,
+                                       BdrvChild *ignore_child, Error **errp)
 {
     Transaction *tran;
     GHashTable *visited;
@@ -7708,20 +7719,16 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
 
     /*
      * Recursion phase: go through all nodes of the graph.
-     * Take care of checking that all nodes support changing AioContext
-     * and drain them, building a linear list of callbacks to run if everything
-     * is successful (the transaction itself).
+     * Take care of checking that all nodes support changing AioContext,
+     * building a linear list of callbacks to run if everything is successful
+     * (the transaction itself).
      */
     tran = tran_new();
     visited = g_hash_table_new(NULL, NULL);
     if (ignore_child) {
         g_hash_table_add(visited, ignore_child);
     }
-    bdrv_drain_all_begin();
-    bdrv_graph_rdlock_main_loop();
     ret = bdrv_change_aio_context(bs, ctx, visited, tran, errp);
-    bdrv_graph_rdunlock_main_loop();
-    bdrv_drain_all_end();
     g_hash_table_destroy(visited);
 
     /*
@@ -7741,6 +7748,29 @@ int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
     return 0;
 }
 
+/*
+ * Change bs's and recursively all of its parents' and children's AioContext
+ * to the given new context, returning an error if that isn't possible.
+ *
+ * If ignore_child is not NULL, that child (and its subgraph) will not
+ * be touched.
+ */
+int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+                                BdrvChild *ignore_child, Error **errp)
+{
+    int ret;
+
+    GLOBAL_STATE_CODE();
+
+    bdrv_drain_all_begin();
+    bdrv_graph_rdlock_main_loop();
+    ret = bdrv_try_change_aio_context_locked(bs, ctx, ignore_child, errp);
+    bdrv_graph_rdunlock_main_loop();
+    bdrv_drain_all_end();
+
+    return ret;
+}
+
 void bdrv_add_aio_context_notifier(BlockDriverState *bs,
         void (*attached_aio_context)(AioContext *new_context, void *opaque),
         void (*detach_aio_context)(void *opaque), void *opaque)
diff --git a/blockdev.c b/blockdev.c
index 3982f9776b..750beba41f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3601,12 +3601,13 @@ void qmp_x_blockdev_set_iothread(const char *node_name, StrOrNull *iothread,
     AioContext *new_context;
     BlockDriverState *bs;
 
-    GRAPH_RDLOCK_GUARD_MAINLOOP();
+    bdrv_drain_all_begin();
+    bdrv_graph_rdlock_main_loop();
 
     bs = bdrv_find_node(node_name);
     if (!bs) {
         error_setg(errp, "Failed to find node with node-name='%s'", node_name);
-        return;
+        goto out;
     }
 
     /* Protects against accidents. */
@@ -3614,14 +3615,14 @@ void qmp_x_blockdev_set_iothread(const char *node_name, StrOrNull *iothread,
         error_setg(errp, "Node %s is associated with a BlockBackend and could "
                          "be in use (use force=true to override this check)",
                          node_name);
-        return;
+        goto out;
     }
 
     if (iothread->type == QTYPE_QSTRING) {
         IOThread *obj = iothread_by_id(iothread->u.s);
         if (!obj) {
             error_setg(errp, "Cannot find iothread %s", iothread->u.s);
-            return;
+            goto out;
         }
 
         new_context = iothread_get_aio_context(obj);
@@ -3629,7 +3630,11 @@ void qmp_x_blockdev_set_iothread(const char *node_name, StrOrNull *iothread,
         new_context = qemu_get_aio_context();
     }
 
-    bdrv_try_change_aio_context(bs, new_context, NULL, errp);
+    bdrv_try_change_aio_context_locked(bs, new_context, NULL, errp);
+
+out:
+    bdrv_graph_rdunlock_main_loop();
+    bdrv_drain_all_end();
 }
 
 QemuOptsList qemu_common_drive_opts = {
diff --git a/include/block/block-global-state.h b/include/block/block-global-state.h
index aad160956a..91f249b5ad 100644
--- a/include/block/block-global-state.h
+++ b/include/block/block-global-state.h
@@ -278,8 +278,12 @@ bool GRAPH_RDLOCK
 bdrv_child_change_aio_context(BdrvChild *c, AioContext *ctx,
                               GHashTable *visited, Transaction *tran,
                               Error **errp);
-int bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
-                                BdrvChild *ignore_child, Error **errp);
+int GRAPH_UNLOCKED
+bdrv_try_change_aio_context(BlockDriverState *bs, AioContext *ctx,
+                            BdrvChild *ignore_child, Error **errp);
+int GRAPH_RDLOCK
+bdrv_try_change_aio_context_locked(BlockDriverState *bs, AioContext *ctx,
+                                   BdrvChild *ignore_child, Error **errp);
 
 int GRAPH_RDLOCK bdrv_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz);
 int bdrv_probe_geometry(BlockDriverState *bs, HDGeometry *geo);
diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 290cd2a70e..3185f3f429 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1396,14 +1396,10 @@ static void test_set_aio_context(void)
     bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
                               &error_abort);
 
-    bdrv_drained_begin(bs);
     bdrv_try_change_aio_context(bs, ctx_a, NULL, &error_abort);
-    bdrv_drained_end(bs);
 
-    bdrv_drained_begin(bs);
     bdrv_try_change_aio_context(bs, ctx_b, NULL, &error_abort);
     bdrv_try_change_aio_context(bs, qemu_get_aio_context(), NULL, &error_abort);
-    bdrv_drained_end(bs);
 
     bdrv_unref(bs);
     iothread_join(a);
-- 
2.39.5
Re: [PATCH v2 09/24] block: move drain outside of bdrv_try_change_aio_context()
Posted by Kevin Wolf 5 months, 4 weeks ago
Am 20.05.2025 um 12:29 hat Fiona Ebner geschrieben:
> This is part of resolving the deadlock mentioned in commit "block:
> move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
> 
> Convert the function to a _locked() version that has to be called with
> the graph lock held and add a convenience wrapper that has to be
> called with the graph unlocked, which drains and takes the lock
> itself. Since bdrv_try_change_aio_context() is global state code, the
> wrapper is too.
> 
> Callers are adapted to use the appropriate variant. In the
> test_set_aio_context() unit test, prior drains can be removed, because
> draining already happens inside the new wrapper.

I'm not sure what's the difference between qmp_x_blockdev_set_iothread()
which is converted to _locked() and e.g. qmp_blockdev_mirror().

The reason I could see in qmp_x_blockdev_set_iothread() is that (as we
discussed in the RFC version of this series) draining can change the
graph and that could in theory invalidate bs. But the same is true for
other functions in blockdev.c.

If it's just that qmp_x_blockdev_set_iothread() was easy to fix and the
others are more complex, that's fine with me, but maybe worth mentioning
in the commit message.

> Note that bdrv_attach_child_common_abort(), bdrv_attach_child_common()
> and bdrv_root_unref_child() hold the graph lock and are not actually
> allowed to drain either. This will be addressed in the following
> commits.
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Re: [PATCH v2 09/24] block: move drain outside of bdrv_try_change_aio_context()
Posted by Fiona Ebner 5 months, 3 weeks ago
Am 21.05.25 um 18:36 schrieb Kevin Wolf:
> Am 20.05.2025 um 12:29 hat Fiona Ebner geschrieben:
>> This is part of resolving the deadlock mentioned in commit "block:
>> move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
>>
>> Convert the function to a _locked() version that has to be called with
>> the graph lock held and add a convenience wrapper that has to be
>> called with the graph unlocked, which drains and takes the lock
>> itself. Since bdrv_try_change_aio_context() is global state code, the
>> wrapper is too.
>>
>> Callers are adapted to use the appropriate variant. In the
>> test_set_aio_context() unit test, prior drains can be removed, because
>> draining already happens inside the new wrapper.
> 
> I'm not sure what's the difference between qmp_x_blockdev_set_iothread()
> which is converted to _locked() and e.g. qmp_blockdev_mirror().

For deciding on the variant, I only looked at whether the caller holds
the graph lock or not.

> The reason I could see in qmp_x_blockdev_set_iothread() is that (as we
> discussed in the RFC version of this series) draining can change the
> graph and that could in theory invalidate bs. But the same is true for
> other functions in blockdev.c.

Right, my patches are not enough to address that kind of problem.

> If it's just that qmp_x_blockdev_set_iothread() was easy to fix and the
> others are more complex, that's fine with me, but maybe worth mentioning
> in the commit message.

Yes, I can mention something in the commit message, maybe:

Functions like qmp_blockdev_mirror() query the nodes to act on before
draining and locking. In theory, draining could invalidate those nodes.
This kind of issue is not addressed by these commits.

Best Regards,
Fiona
Re: [PATCH v2 09/24] block: move drain outside of bdrv_try_change_aio_context()
Posted by Kevin Wolf 5 months, 3 weeks ago
Am 22.05.2025 um 11:56 hat Fiona Ebner geschrieben:
> Am 21.05.25 um 18:36 schrieb Kevin Wolf:
> > Am 20.05.2025 um 12:29 hat Fiona Ebner geschrieben:
> >> This is part of resolving the deadlock mentioned in commit "block:
> >> move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
> >>
> >> Convert the function to a _locked() version that has to be called with
> >> the graph lock held and add a convenience wrapper that has to be
> >> called with the graph unlocked, which drains and takes the lock
> >> itself. Since bdrv_try_change_aio_context() is global state code, the
> >> wrapper is too.
> >>
> >> Callers are adapted to use the appropriate variant. In the
> >> test_set_aio_context() unit test, prior drains can be removed, because
> >> draining already happens inside the new wrapper.
> > 
> > I'm not sure what's the difference between qmp_x_blockdev_set_iothread()
> > which is converted to _locked() and e.g. qmp_blockdev_mirror().
> 
> For deciding on the variant, I only looked at whether the caller holds
> the graph lock or not.

Oh, I see. I was thinking much too complicated once again.

Maybe just add "the appropriate variant, depending on whether the caller
already holds the lock"?

> > The reason I could see in qmp_x_blockdev_set_iothread() is that (as we
> > discussed in the RFC version of this series) draining can change the
> > graph and that could in theory invalidate bs. But the same is true for
> > other functions in blockdev.c.
> 
> Right, my patches are not enough to address that kind of problem.
> 
> > If it's just that qmp_x_blockdev_set_iothread() was easy to fix and the
> > others are more complex, that's fine with me, but maybe worth mentioning
> > in the commit message.
> 
> Yes, I can mention something in the commit message, maybe:
> 
> Functions like qmp_blockdev_mirror() query the nodes to act on before
> draining and locking. In theory, draining could invalidate those nodes.
> This kind of issue is not addressed by these commits.

Doesn't hurt, but it's also not really needed.

Kevin