[PATCH v3 10/24] block: move drain outside of bdrv_attach_child_common(_abort)()

Fiona Ebner posted 24 patches 5 months, 3 weeks ago
Maintainers: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>, John Snow <jsnow@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>, Ari Sundholm <ari@tuxera.com>, Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>, Alberto Garcia <berto@igalia.com>, Wen Congyang <wencongyang2@huawei.com>, Xie Changlong <xiechanglong.d@gmail.com>
There is a newer version of this series
[PATCH v3 10/24] block: move drain outside of bdrv_attach_child_common(_abort)()
Posted by Fiona Ebner 5 months, 3 weeks ago
This is part of resolving the deadlock mentioned in commit "block:
move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".

The function bdrv_attach_child_common_abort() is used only as the
abort callback in bdrv_attach_child_common_drv transactions, so the
tran_finalize() calls of such transactions need to be in drained
sections too.

All code paths are covered:
The bdrv_attach_child_common_drv transactions are only used in
bdrv_attach_child_common(), so it is enough to check callers of
bdrv_attach_child_common() following the transactions.

bdrv_attach_child_common() is called by:
1. bdrv_attach_child_noperm(), which does not finalize the
   transaction yet.
2. bdrv_root_attach_child(), where a drained section is introduced.

bdrv_attach_child_noperm() is called by:
1. bdrv_attach_child(), where a drained section is introduced.
2. bdrv_set_file_or_backing_noperm(), which does not finalize the
   transaction yet.
3. bdrv_append(), where a drained section is introduced.

bdrv_set_file_or_backing_noperm() is called by:
1. bdrv_set_backing_hd_drained(), where a drained section is
   introduced.
2. bdrv_reopen_parse_file_or_backing(), which does not finalize the
   transaction yet. Draining the old child bs currently happens under
   the graph lock there. This is replaced with an assertion, because
   the drain will be moved further up to the caller.

bdrv_reopen_parse_file_or_backing() is called by:
1. bdrv_reopen_prepare(), which does not finalize the transaction yet.

bdrv_reopen_prepare() is called by:
1. bdrv_reopen_multiple(), which does finalize the transaction. It is
   called after bdrv_reopen_queue(), which starts a drained section.
   The drained section ends, when bdrv_reopen_queue_free() is called
   at the end of bdrv_reopen_multiple().

This resolves all code paths.

The functions bdrv_set_backing_hd_drained(), bdrv_attach_child() and
bdrv_root_attach_child() run under the graph lock, so they are not
actually allowed to drain. This will be addressed in the following
commits.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---

No changes in v3.

 block.c | 27 +++++++++++----------------
 1 file changed, 11 insertions(+), 16 deletions(-)

diff --git a/block.c b/block.c
index 3aaacabf7f..64db71e38d 100644
--- a/block.c
+++ b/block.c
@@ -3028,10 +3028,8 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
     bdrv_replace_child_noperm(s->child, NULL);
 
     if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
-        bdrv_drain_all_begin();
         bdrv_try_change_aio_context_locked(bs, s->old_child_ctx, NULL,
                                            &error_abort);
-        bdrv_drain_all_end();
     }
 
     if (bdrv_child_get_parent_aio_context(s->child) != s->old_parent_ctx) {
@@ -3043,10 +3041,8 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
 
         /* No need to visit `child`, because it has been detached already */
         visited = g_hash_table_new(NULL, NULL);
-        bdrv_drain_all_begin();
         ret = s->child->klass->change_aio_ctx(s->child, s->old_parent_ctx,
                                               visited, tran, &error_abort);
-        bdrv_drain_all_end();
         g_hash_table_destroy(visited);
 
         /* transaction is supposed to always succeed */
@@ -3118,10 +3114,8 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
     parent_ctx = bdrv_child_get_parent_aio_context(new_child);
     if (child_ctx != parent_ctx) {
         Error *local_err = NULL;
-        bdrv_drain_all_begin();
         int ret = bdrv_try_change_aio_context_locked(child_bs, parent_ctx, NULL,
                                                      &local_err);
-        bdrv_drain_all_end();
 
         if (ret < 0 && child_class->change_aio_ctx) {
             Transaction *aio_ctx_tran = tran_new();
@@ -3129,11 +3123,9 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
             bool ret_child;
 
             g_hash_table_add(visited, new_child);
-            bdrv_drain_all_begin();
             ret_child = child_class->change_aio_ctx(new_child, child_ctx,
                                                     visited, aio_ctx_tran,
                                                     NULL);
-            bdrv_drain_all_end();
             if (ret_child == true) {
                 error_free(local_err);
                 ret = 0;
@@ -3244,6 +3236,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
 
     GLOBAL_STATE_CODE();
 
+    bdrv_drain_all_begin();
     child = bdrv_attach_child_common(child_bs, child_name, child_class,
                                    child_role, perm, shared_perm, opaque,
                                    tran, errp);
@@ -3256,6 +3249,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
 
 out:
     tran_finalize(tran, ret);
+    bdrv_drain_all_end();
 
     bdrv_schedule_unref(child_bs);
 
@@ -3283,6 +3277,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 
     GLOBAL_STATE_CODE();
 
+    bdrv_drain_all_begin();
     child = bdrv_attach_child_noperm(parent_bs, child_bs, child_name,
                                      child_class, child_role, tran, errp);
     if (!child) {
@@ -3297,6 +3292,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 
 out:
     tran_finalize(tran, ret);
+    bdrv_drain_all_end();
 
     bdrv_schedule_unref(child_bs);
 
@@ -3573,6 +3569,7 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
         assert(bs->backing->bs->quiesce_counter > 0);
     }
 
+    bdrv_drain_all_begin();
     ret = bdrv_set_file_or_backing_noperm(bs, backing_hd, true, tran, errp);
     if (ret < 0) {
         goto out;
@@ -3581,6 +3578,7 @@ int bdrv_set_backing_hd_drained(BlockDriverState *bs,
     ret = bdrv_refresh_perms(bs, tran, errp);
 out:
     tran_finalize(tran, ret);
+    bdrv_drain_all_end();
     return ret;
 }
 
@@ -4721,6 +4719,8 @@ int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only,
  * Return 0 on success, otherwise return < 0 and set @errp.
  *
  * @reopen_state->bs can move to a different AioContext in this function.
+ *
+ * The old child bs must be drained.
  */
 static int GRAPH_UNLOCKED
 bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
@@ -4814,7 +4814,7 @@ bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
 
     if (old_child_bs) {
         bdrv_ref(old_child_bs);
-        bdrv_drained_begin(old_child_bs);
+        assert(old_child_bs->quiesce_counter > 0);
     }
 
     bdrv_graph_rdunlock_main_loop();
@@ -4826,7 +4826,6 @@ bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
     bdrv_graph_wrunlock();
 
     if (old_child_bs) {
-        bdrv_drained_end(old_child_bs);
         bdrv_unref(old_child_bs);
     }
 
@@ -5501,9 +5500,7 @@ int bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top,
     assert(!bs_new->backing);
     bdrv_graph_rdunlock_main_loop();
 
-    bdrv_drained_begin(bs_top);
-    bdrv_drained_begin(bs_new);
-
+    bdrv_drain_all_begin();
     bdrv_graph_wrlock();
 
     child = bdrv_attach_child_noperm(bs_new, bs_top, "backing",
@@ -5525,9 +5522,7 @@ out:
 
     bdrv_refresh_limits(bs_top, NULL, NULL);
     bdrv_graph_wrunlock();
-
-    bdrv_drained_end(bs_top);
-    bdrv_drained_end(bs_new);
+    bdrv_drain_all_end();
 
     return ret;
 }
-- 
2.39.5
Re: [PATCH v3 10/24] block: move drain outside of bdrv_attach_child_common(_abort)()
Posted by Kevin Wolf 5 months, 3 weeks ago
Am 26.05.2025 um 15:21 hat Fiona Ebner geschrieben:
> This is part of resolving the deadlock mentioned in commit "block:
> move draining out of bdrv_change_aio_context() and mark GRAPH_RDLOCK".
> 
> The function bdrv_attach_child_common_abort() is used only as the
> abort callback in bdrv_attach_child_common_drv transactions, so the
> tran_finalize() calls of such transactions need to be in drained
> sections too.
> 
> All code paths are covered:
> The bdrv_attach_child_common_drv transactions are only used in
> bdrv_attach_child_common(), so it is enough to check callers of
> bdrv_attach_child_common() following the transactions.
> 
> bdrv_attach_child_common() is called by:
> 1. bdrv_attach_child_noperm(), which does not finalize the
>    transaction yet.
> 2. bdrv_root_attach_child(), where a drained section is introduced.
> 
> bdrv_attach_child_noperm() is called by:
> 1. bdrv_attach_child(), where a drained section is introduced.
> 2. bdrv_set_file_or_backing_noperm(), which does not finalize the
>    transaction yet.
> 3. bdrv_append(), where a drained section is introduced.
> 
> bdrv_set_file_or_backing_noperm() is called by:
> 1. bdrv_set_backing_hd_drained(), where a drained section is
>    introduced.
> 2. bdrv_reopen_parse_file_or_backing(), which does not finalize the
>    transaction yet. Draining the old child bs currently happens under
>    the graph lock there. This is replaced with an assertion, because
>    the drain will be moved further up to the caller.
> 
> bdrv_reopen_parse_file_or_backing() is called by:
> 1. bdrv_reopen_prepare(), which does not finalize the transaction yet.
> 
> bdrv_reopen_prepare() is called by:
> 1. bdrv_reopen_multiple(), which does finalize the transaction. It is
>    called after bdrv_reopen_queue(), which starts a drained section.
>    The drained section ends, when bdrv_reopen_queue_free() is called
>    at the end of bdrv_reopen_multiple().
> 
> This resolves all code paths.
> 
> The functions bdrv_set_backing_hd_drained(), bdrv_attach_child() and
> bdrv_root_attach_child() run under the graph lock, so they are not
> actually allowed to drain. This will be addressed in the following
> commits.
> 
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> 
> No changes in v3.
> 
>  block.c | 27 +++++++++++----------------
>  1 file changed, 11 insertions(+), 16 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 3aaacabf7f..64db71e38d 100644
> --- a/block.c
> +++ b/block.c
> @@ -3028,10 +3028,8 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
>      bdrv_replace_child_noperm(s->child, NULL);
>  
>      if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
> -        bdrv_drain_all_begin();
>          bdrv_try_change_aio_context_locked(bs, s->old_child_ctx, NULL,
>                                             &error_abort);
> -        bdrv_drain_all_end();
>      }
>  
>      if (bdrv_child_get_parent_aio_context(s->child) != s->old_parent_ctx) {
> @@ -3043,10 +3041,8 @@ static void GRAPH_WRLOCK bdrv_attach_child_common_abort(void *opaque)
>  
>          /* No need to visit `child`, because it has been detached already */
>          visited = g_hash_table_new(NULL, NULL);
> -        bdrv_drain_all_begin();
>          ret = s->child->klass->change_aio_ctx(s->child, s->old_parent_ctx,
>                                                visited, tran, &error_abort);
> -        bdrv_drain_all_end();
>          g_hash_table_destroy(visited);
>  
>          /* transaction is supposed to always succeed */
> @@ -3118,10 +3114,8 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
>      parent_ctx = bdrv_child_get_parent_aio_context(new_child);
>      if (child_ctx != parent_ctx) {
>          Error *local_err = NULL;
> -        bdrv_drain_all_begin();
>          int ret = bdrv_try_change_aio_context_locked(child_bs, parent_ctx, NULL,
>                                                       &local_err);
> -        bdrv_drain_all_end();
>  
>          if (ret < 0 && child_class->change_aio_ctx) {
>              Transaction *aio_ctx_tran = tran_new();
> @@ -3129,11 +3123,9 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
>              bool ret_child;
>  
>              g_hash_table_add(visited, new_child);
> -            bdrv_drain_all_begin();
>              ret_child = child_class->change_aio_ctx(new_child, child_ctx,
>                                                      visited, aio_ctx_tran,
>                                                      NULL);
> -            bdrv_drain_all_end();
>              if (ret_child == true) {
>                  error_free(local_err);
>                  ret = 0;

Should we mention in the function comment for bdrv_attach_child_common()
that all block nodes must be drained from before this functoin is called
until after the transaction is finalized?

A similar note would probably be good for all the other functions you
mention in the commit message that don't finalize the transaction yet so
that we convert them in this same patch.

> @@ -4721,6 +4719,8 @@ int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only,
>   * Return 0 on success, otherwise return < 0 and set @errp.
>   *
>   * @reopen_state->bs can move to a different AioContext in this function.
> + *
> + * The old child bs must be drained.
>   */
>  static int GRAPH_UNLOCKED
>  bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,

Only the old child or all nodes?

bdrv_try_change_aio_context_locked() is documented as "Called while all
bs are drained." and we call it indirectly here (which will be more
obvious when you add the comments as suggested above).

Apart from the comments, the actual code looks fine to me.

Kevin
Re: [PATCH v3 10/24] block: move drain outside of bdrv_attach_child_common(_abort)()
Posted by Fiona Ebner 5 months, 3 weeks ago
Am 27.05.25 um 17:23 schrieb Kevin Wolf:
> Am 26.05.2025 um 15:21 hat Fiona Ebner geschrieben:
>> @@ -3129,11 +3123,9 @@ bdrv_attach_child_common(BlockDriverState *child_bs,
>>              bool ret_child;
>>  
>>              g_hash_table_add(visited, new_child);
>> -            bdrv_drain_all_begin();
>>              ret_child = child_class->change_aio_ctx(new_child, child_ctx,
>>                                                      visited, aio_ctx_tran,
>>                                                      NULL);
>> -            bdrv_drain_all_end();
>>              if (ret_child == true) {
>>                  error_free(local_err);
>>                  ret = 0;
> 
> Should we mention in the function comment for bdrv_attach_child_common()
> that all block nodes must be drained from before this functoin is called
> until after the transaction is finalized?
> 
> A similar note would probably be good for all the other functions you
> mention in the commit message that don't finalize the transaction yet so
> that we convert them in this same patch.
> 
>> @@ -4721,6 +4719,8 @@ int bdrv_reopen_set_read_only(BlockDriverState *bs, bool read_only,
>>   * Return 0 on success, otherwise return < 0 and set @errp.
>>   *
>>   * @reopen_state->bs can move to a different AioContext in this function.
>> + *
>> + * The old child bs must be drained.
>>   */
>>  static int GRAPH_UNLOCKED
>>  bdrv_reopen_parse_file_or_backing(BDRVReopenState *reopen_state,
> 
> Only the old child or all nodes?
> 
> bdrv_try_change_aio_context_locked() is documented as "Called while all
> bs are drained." and we call it indirectly here (which will be more
> obvious when you add the comments as suggested above).

Yes, it needs to be all nodes. I'll try to document the requirement for
all affected functions in v4 (also what you mentioned in the replies to
the other patches).

Best Regards,
Fiona