[v2] block: do not drain while holding the graph lock

[PATCH v2 00/24] block: do not drain while holding the graph lock

Fiona Ebner posted 24 patches 8 months, 3 weeks ago

Diff against v3 v4
Download series mbox

Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20250520103012.424311-1-f.ebner@proxmox.com

Maintainers: Kevin Wolf <kwolf@redhat.com>, Hanna Reitz <hreitz@redhat.com>, John Snow <jsnow@redhat.com>, Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>, Ari Sundholm <ari@tuxera.com>, Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>, Alberto Garcia <berto@igalia.com>, Wen Congyang <wencongyang2@huawei.com>, Xie Changlong <xiechanglong.d@gmail.com>

There is a newer version of this series

block.c                                       | 156 ++++++++++--------
block/backup.c                                |   2 +-
block/blklogwrites.c                          |   4 +-
block/blkverify.c                             |   2 +-
block/block-backend.c                         |  10 +-
block/commit.c                                |   2 +-
block/graph-lock.c                            |  40 ++++-
block/io.c                                    |   8 +-
block/mirror.c                                |   2 +-
block/qcow2.c                                 |   2 +-
block/quorum.c                                |   4 +-
block/replication.c                           |   4 +-
block/snapshot.c                              |  28 ++--
block/stream.c                                |   8 +-
block/vmdk.c                                  |  10 +-
blockdev.c                                    |  78 +++++++--
blockjob.c                                    |  10 +-
include/block/block-global-state.h            |  15 +-
include/block/block-io.h                      |   2 +-
include/block/block_int-common.h              |   8 +-
include/block/graph-lock.h                    |  11 ++
qemu-img.c                                    |   2 +
.../qemu-iotests/tests/graph-changes-while-io | 102 +++++++++++-
.../tests/graph-changes-while-io.out          |   4 +-
tests/unit/test-bdrv-drain.c                  |  22 +--
tests/unit/test-bdrv-graph-mod.c              |  10 +-
26 files changed, 373 insertions(+), 173 deletions(-)

Expand all Fold all

[PATCH v2 00/24] block: do not drain while holding the graph lock

Posted by Fiona Ebner 8 months, 3 weeks ago

Previous discussion [0].

Changes in v2:
* Split the big patch moving the drain outside of
  bdrv_change_aio_context(), mark functions along the way with graph
  lock annotations.
* In {internal,external}_snapshot_action, check that associated bs did
  not change after drain and re-acquiring the lock.
* Improve error handling using goto where appropriate.
* Add bdrv_graph_wrlock_drained() convenience wrapper rather than
  adding a flag argument.
* Don't use atomics to access bs->quiesce_counter field, add a patch
  to adapt the two existing places that used atomics.
* Re-use 'top' image for graph-changes-while-io test case and rename
  the other image to 'mid'. Remove the image files after the test.
* Use "must be" instead of "needs to be" in documentation, use single
  line comments where possible.
* Remove yet another outdated comment.
* I did not add Kevin's R-b for the patch marking bdrv_drained_begin()
  GRAPH_RDLOCK, as the earlier patches/preconditions changed.

This series is an attempt to fix a deadlock issue reported by Andrey
here [1].

bdrv_drained_begin() polls and is not allowed to be called with the
block graph lock held. Mark the function as GRAPH_UNLOCKED.

This alone does not catch the issue reported by Andrey, because there
is a bdrv_graph_rdunlock_main_loop() before bdrv_drained_begin() in
the function bdrv_change_aio_context(). That unlock is of course
ineffective if the exclusive lock is held, but it prevents TSA from
finding the issue.

Thus the bdrv_drained_begin() call from inside
bdrv_change_aio_context() needs to be moved up the call stack before
acquiring the locks. This is the bulk of the series.

Granular draining is not trivially possible, because many of the
affected functions can recursively call themselves.

In place where bdrv_drained_begin() calls were removed, assertions
are added, checking the quiesced_counter to ensure that the nodes
already got drained further up in the call stack.

NOTE:
there are pre-existing test failures on current master, e.g. '240' for
all formats, '295 296 inactive-node-nbd luks-detached-header' for luks
and 'mirror-sparse' for raw. For me, the failures do not change after
this series.

[0]: https://lore.kernel.org/qemu-devel/20250508140936.3344485-1-f.ebner@proxmox.com/
[1]: https://lore.kernel.org/qemu-devel/73839c04-7616-407e-b057-80ca69e63f51@virtuozzo.com/

Andrey Drobyshev (1):
  iotests/graph-changes-while-io: add test case with removal of lower
    snapshot

Fiona Ebner (23):
  block: remove outdated comments about AioContext locking
  block: move drain outside of read-locked bdrv_reopen_queue_child()
  block/snapshot: move drain outside of read-locked
    bdrv_snapshot_delete()
  block: move drain outside of read-locked bdrv_inactivate_recurse()
  block: mark bdrv_parent_change_aio_context() GRAPH_RDLOCK
  block: mark change_aio_ctx() callback and instances as
    GRAPH_RDLOCK(_PTR)
  block: mark bdrv_child_change_aio_context() GRAPH_RDLOCK
  block: move drain outside of bdrv_change_aio_context() and mark
    GRAPH_RDLOCK
  block: move drain outside of bdrv_try_change_aio_context()
  block: move drain outside of bdrv_attach_child_common(_abort)()
  block: move drain outside of bdrv_set_backing_hd_drained()
  block: move drain outside of bdrv_root_attach_child()
  block: move drain outside of bdrv_attach_child()
  block: move drain outside of quorum_add_child()
  block: move drain outside of bdrv_root_unref_child()
  block: move drain outside of quorum_del_child()
  blockdev: drain while unlocked in internal_snapshot_action()
  blockdev: drain while unlocked in external_snapshot_action()
  block: mark bdrv_drained_begin() as GRAPH_UNLOCKED
  iotests/graph-changes-while-io: remove image file after test
  block/io: remove duplicate GLOBAL_STATE_CODE() in
    bdrv_do_drained_end()
  block: never use atomics to access bs->quiesce_counter
  block: add bdrv_graph_wrlock_drained() convenience wrapper

 block.c                                       | 156 ++++++++++--------
 block/backup.c                                |   2 +-
 block/blklogwrites.c                          |   4 +-
 block/blkverify.c                             |   2 +-
 block/block-backend.c                         |  10 +-
 block/commit.c                                |   2 +-
 block/graph-lock.c                            |  40 ++++-
 block/io.c                                    |   8 +-
 block/mirror.c                                |   2 +-
 block/qcow2.c                                 |   2 +-
 block/quorum.c                                |   4 +-
 block/replication.c                           |   4 +-
 block/snapshot.c                              |  28 ++--
 block/stream.c                                |   8 +-
 block/vmdk.c                                  |  10 +-
 blockdev.c                                    |  78 +++++++--
 blockjob.c                                    |  10 +-
 include/block/block-global-state.h            |  15 +-
 include/block/block-io.h                      |   2 +-
 include/block/block_int-common.h              |   8 +-
 include/block/graph-lock.h                    |  11 ++
 qemu-img.c                                    |   2 +
 .../qemu-iotests/tests/graph-changes-while-io | 102 +++++++++++-
 .../tests/graph-changes-while-io.out          |   4 +-
 tests/unit/test-bdrv-drain.c                  |  22 +--
 tests/unit/test-bdrv-graph-mod.c              |  10 +-
 26 files changed, 373 insertions(+), 173 deletions(-)

-- 
2.39.5

Re: [PATCH v2 00/24] block: do not drain while holding the graph lock

Posted by Eric Blake 8 months, 3 weeks ago

On Tue, May 20, 2025 at 12:29:48PM +0200, Fiona Ebner wrote:
> Previous discussion [0].
> 

> NOTE:
> there are pre-existing test failures on current master, e.g. '240' for
> all formats, '295 296 inactive-node-nbd luks-detached-header' for luks
> and 'mirror-sparse' for raw. For me, the failures do not change after
> this series.

I have not yet reproduced the mirror-sparse failure locally; I suspect
the culprit may be a difference in filesystems and how well (or not)
SEEK_HOLE and FALLOC_FL_PUNCH_HOLE work on different filesystems.

Can I get more details on the failure as you see it, so I can work on
skipping the test when the filesystem does not support the test at
hand?  (It might be better to reply in context to the pull request
that introduced the file [2])

[2] https://lists.gnu.org/archive/html/qemu-devel/2025-05/msg03623.html

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org