[PULL 00/28] Block layer patches

Kevin Wolf posted 28 patches 7 months, 3 weeks ago
Only 27 patches received!
There is a newer version of this series
[PULL 00/28] Block layer patches
Posted by Kevin Wolf 7 months, 3 weeks ago
The following changes since commit 005ad32358f12fe9313a4a01918a55e60d4f39e5:

  Merge tag 'pull-tpm-2023-09-12-3' of https://github.com/stefanberger/qemu-tpm into staging (2023-09-13 13:41:57 -0400)

are available in the Git repository at:

  https://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to 5d96864b73225ee61b0dad7e928f0cddf14270fc:

  block-coroutine-wrapper: use qemu_get_current_aio_context() (2023-09-15 15:49:14 +0200)

----------------------------------------------------------------
Block layer patches

- Graph locking part 4 (node management)
- qemu-img map: report compressed data blocks
- block-backend: process I/O in the current AioContext

----------------------------------------------------------------
Andrey Drobyshev via (2):
      block: add BDRV_BLOCK_COMPRESSED flag for bdrv_block_status()
      qemu-img: map: report compressed data blocks

Kevin Wolf (21):
      block: Remove unused BlockReopenQueueEntry.perms_checked
      preallocate: Factor out preallocate_truncate_to_real_size()
      preallocate: Don't poll during permission updates
      block: Take AioContext lock for bdrv_append() more consistently
      block: Introduce bdrv_schedule_unref()
      block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions
      block-coroutine-wrapper: Allow arbitrary parameter names
      block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK
      block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK
      block: Mark bdrv_attach_child_common() GRAPH_WRLOCK
      block: Call transaction callbacks with lock held
      block: Mark bdrv_attach_child() GRAPH_WRLOCK
      block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK
      block: Mark bdrv_get_cumulative_perm() and callers GRAPH_RDLOCK
      block: Mark bdrv_child_perm() GRAPH_RDLOCK
      block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK
      block: Take graph rdlock in bdrv_drop_intermediate()
      block: Take graph rdlock in bdrv_change_aio_context()
      block: Mark bdrv_root_unref_child() GRAPH_WRLOCK
      block: Mark bdrv_unref_child() GRAPH_WRLOCK
      block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK

Stefan Hajnoczi (5):
      block: remove AIOCBInfo->get_aio_context()
      test-bdrv-drain: avoid race with BH in IOThread drain test
      block-backend: process I/O in the current AioContext
      block-backend: process zoned requests in the current AioContext
      block-coroutine-wrapper: use qemu_get_current_aio_context()

 qapi/block-core.json                             |   6 +-
 include/block/aio.h                              |   1 -
 include/block/block-common.h                     |   7 +
 include/block/block-global-state.h               |  32 +-
 include/block/block-io.h                         |   1 -
 include/block/block_int-common.h                 |  34 +-
 include/block/block_int-global-state.h           |  14 +-
 include/sysemu/block-backend-global-state.h      |   4 +-
 block.c                                          | 348 +++++++---
 block/blklogwrites.c                             |   4 +
 block/blkverify.c                                |   2 +
 block/block-backend.c                            |  64 +-
 block/copy-before-write.c                        |  10 +-
 block/crypto.c                                   |   6 +-
 block/graph-lock.c                               |  26 +-
 block/io.c                                       |  23 +-
 block/mirror.c                                   |   8 +
 block/preallocate.c                              | 133 ++--
 block/qcow.c                                     |   5 +-
 block/qcow2.c                                    |   7 +-
 block/quorum.c                                   |  23 +-
 block/replication.c                              |   9 +
 block/snapshot.c                                 |   2 +
 block/stream.c                                   |  20 +-
 block/vmdk.c                                     |  15 +
 blockdev.c                                       |  23 +-
 blockjob.c                                       |   2 +
 hw/nvme/ctrl.c                                   |   7 -
 qemu-img.c                                       |   8 +-
 softmmu/dma-helpers.c                            |   8 -
 tests/unit/test-bdrv-drain.c                     |  31 +-
 tests/unit/test-bdrv-graph-mod.c                 |  20 +
 tests/unit/test-block-iothread.c                 |   3 +
 util/thread-pool.c                               |   8 -
 scripts/block-coroutine-wrapper.py               |  24 +-
 tests/qemu-iotests/051.pc.out                    |   6 +-
 tests/qemu-iotests/122.out                       |  84 +--
 tests/qemu-iotests/146.out                       | 780 +++++++++++------------
 tests/qemu-iotests/154.out                       | 194 +++---
 tests/qemu-iotests/179.out                       | 178 +++---
 tests/qemu-iotests/209.out                       |   4 +-
 tests/qemu-iotests/221.out                       |  16 +-
 tests/qemu-iotests/223.out                       |  60 +-
 tests/qemu-iotests/241.out                       |  10 +-
 tests/qemu-iotests/244.out                       |  24 +-
 tests/qemu-iotests/252.out                       |  10 +-
 tests/qemu-iotests/253.out                       |  20 +-
 tests/qemu-iotests/274.out                       |  48 +-
 tests/qemu-iotests/tests/nbd-qemu-allocation.out |  16 +-
 tests/qemu-iotests/tests/qemu-img-bitmaps.out    |  24 +-
 50 files changed, 1376 insertions(+), 1036 deletions(-)
Re: [PULL 00/28] Block layer patches
Posted by Stefan Hajnoczi 7 months, 2 weeks ago
Hi Kevin,
The following CI failure looks like it is related to this pull
request. Please take a look:
https://gitlab.com/qemu-project/qemu/-/jobs/5112083994

▶ 823/840 qcow2 iothreads-commit-active FAIL
823/840 qemu:block / io-qcow2-iothreads-commit-active ERROR 6.16s exit status 1
>>> MALLOC_PERTURB_=184 PYTHON=/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/pyvenv/bin/python3 /home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/pyvenv/bin/python3 /home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/../tests/qemu-iotests/check -tap -qcow2 iothreads-commit-active --source-dir /home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/tests/qemu-iotests --build-dir /home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/tests/qemu-iotests
――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
stderr:
--- /home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/tests/qemu-iotests/tests/iothreads-commit-active.out
+++ /home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/build/scratch/qcow2-file-iothreads-commit-active/iothreads-commit-active.out.bad
@@ -18,6 +18,35 @@
{"execute": "job-complete", "arguments": {"id": "job1"}}
{"return": {}}
{"data": {"device": "job1", "len": 131072, "offset": 131072, "speed":
0, "type": "commit"}, "event": "BLOCK_JOB_READY", "timestamp":
{"microseconds": "USECS", "seconds": "SECS"}}
-{"data": {"device": "job1", "len": 131072, "offset": 131072, "speed":
0, "type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp":
{"microseconds": "USECS", "seconds": "SECS"}}
-{"execute": "job-dismiss", "arguments": {"id": "job1"}}
-{"return": {}}
+Traceback (most recent call last):
+ File "/home/gitlab-runner/builds/E8PpwMky/0/qemu-project/qemu/python/qemu/qmp/events.py",
line 557, in get
+ return await self._queue.get()
+ File "/usr/lib/python3.10/asyncio/queues.py", line 159, in get
+ await getter
+asyncio.exceptions.CancelledError
+
+During handling of the above exception, another exception occurred:
+
+Traceback (most recent call last):
+ File "/usr/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
+ return fut.result()
+asyncio.exceptions.CancelledError
+
+The above exception was the direct cause of the following exception:

On Fri, 15 Sept 2023 at 10:45, Kevin Wolf <kwolf@redhat.com> wrote:
>
> The following changes since commit 005ad32358f12fe9313a4a01918a55e60d4f39e5:
>
>   Merge tag 'pull-tpm-2023-09-12-3' of https://github.com/stefanberger/qemu-tpm into staging (2023-09-13 13:41:57 -0400)
>
> are available in the Git repository at:
>
>   https://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to 5d96864b73225ee61b0dad7e928f0cddf14270fc:
>
>   block-coroutine-wrapper: use qemu_get_current_aio_context() (2023-09-15 15:49:14 +0200)
>
> ----------------------------------------------------------------
> Block layer patches
>
> - Graph locking part 4 (node management)
> - qemu-img map: report compressed data blocks
> - block-backend: process I/O in the current AioContext
>
> ----------------------------------------------------------------
> Andrey Drobyshev via (2):
>       block: add BDRV_BLOCK_COMPRESSED flag for bdrv_block_status()
>       qemu-img: map: report compressed data blocks
>
> Kevin Wolf (21):
>       block: Remove unused BlockReopenQueueEntry.perms_checked
>       preallocate: Factor out preallocate_truncate_to_real_size()
>       preallocate: Don't poll during permission updates
>       block: Take AioContext lock for bdrv_append() more consistently
>       block: Introduce bdrv_schedule_unref()
>       block-coroutine-wrapper: Add no_co_wrapper_bdrv_wrlock functions
>       block-coroutine-wrapper: Allow arbitrary parameter names
>       block: Mark bdrv_replace_child_noperm() GRAPH_WRLOCK
>       block: Mark bdrv_replace_child_tran() GRAPH_WRLOCK
>       block: Mark bdrv_attach_child_common() GRAPH_WRLOCK
>       block: Call transaction callbacks with lock held
>       block: Mark bdrv_attach_child() GRAPH_WRLOCK
>       block: Mark bdrv_parent_perms_conflict() and callers GRAPH_RDLOCK
>       block: Mark bdrv_get_cumulative_perm() and callers GRAPH_RDLOCK
>       block: Mark bdrv_child_perm() GRAPH_RDLOCK
>       block: Mark bdrv_parent_cb_change_media() GRAPH_RDLOCK
>       block: Take graph rdlock in bdrv_drop_intermediate()
>       block: Take graph rdlock in bdrv_change_aio_context()
>       block: Mark bdrv_root_unref_child() GRAPH_WRLOCK
>       block: Mark bdrv_unref_child() GRAPH_WRLOCK
>       block: Mark bdrv_add/del_child() and caller GRAPH_WRLOCK
>
> Stefan Hajnoczi (5):
>       block: remove AIOCBInfo->get_aio_context()
>       test-bdrv-drain: avoid race with BH in IOThread drain test
>       block-backend: process I/O in the current AioContext
>       block-backend: process zoned requests in the current AioContext
>       block-coroutine-wrapper: use qemu_get_current_aio_context()
>
>  qapi/block-core.json                             |   6 +-
>  include/block/aio.h                              |   1 -
>  include/block/block-common.h                     |   7 +
>  include/block/block-global-state.h               |  32 +-
>  include/block/block-io.h                         |   1 -
>  include/block/block_int-common.h                 |  34 +-
>  include/block/block_int-global-state.h           |  14 +-
>  include/sysemu/block-backend-global-state.h      |   4 +-
>  block.c                                          | 348 +++++++---
>  block/blklogwrites.c                             |   4 +
>  block/blkverify.c                                |   2 +
>  block/block-backend.c                            |  64 +-
>  block/copy-before-write.c                        |  10 +-
>  block/crypto.c                                   |   6 +-
>  block/graph-lock.c                               |  26 +-
>  block/io.c                                       |  23 +-
>  block/mirror.c                                   |   8 +
>  block/preallocate.c                              | 133 ++--
>  block/qcow.c                                     |   5 +-
>  block/qcow2.c                                    |   7 +-
>  block/quorum.c                                   |  23 +-
>  block/replication.c                              |   9 +
>  block/snapshot.c                                 |   2 +
>  block/stream.c                                   |  20 +-
>  block/vmdk.c                                     |  15 +
>  blockdev.c                                       |  23 +-
>  blockjob.c                                       |   2 +
>  hw/nvme/ctrl.c                                   |   7 -
>  qemu-img.c                                       |   8 +-
>  softmmu/dma-helpers.c                            |   8 -
>  tests/unit/test-bdrv-drain.c                     |  31 +-
>  tests/unit/test-bdrv-graph-mod.c                 |  20 +
>  tests/unit/test-block-iothread.c                 |   3 +
>  util/thread-pool.c                               |   8 -
>  scripts/block-coroutine-wrapper.py               |  24 +-
>  tests/qemu-iotests/051.pc.out                    |   6 +-
>  tests/qemu-iotests/122.out                       |  84 +--
>  tests/qemu-iotests/146.out                       | 780 +++++++++++------------
>  tests/qemu-iotests/154.out                       | 194 +++---
>  tests/qemu-iotests/179.out                       | 178 +++---
>  tests/qemu-iotests/209.out                       |   4 +-
>  tests/qemu-iotests/221.out                       |  16 +-
>  tests/qemu-iotests/223.out                       |  60 +-
>  tests/qemu-iotests/241.out                       |  10 +-
>  tests/qemu-iotests/244.out                       |  24 +-
>  tests/qemu-iotests/252.out                       |  10 +-
>  tests/qemu-iotests/253.out                       |  20 +-
>  tests/qemu-iotests/274.out                       |  48 +-
>  tests/qemu-iotests/tests/nbd-qemu-allocation.out |  16 +-
>  tests/qemu-iotests/tests/qemu-img-bitmaps.out    |  24 +-
>  50 files changed, 1376 insertions(+), 1036 deletions(-)
>
>
Re: [PULL 00/28] Block layer patches
Posted by Stefan Hajnoczi 7 months, 2 weeks ago
Hi Kevin,
I believe that my own commit "block-coroutine-wrapper: use
qemu_get_current_aio_context()" breaks this test. The failure is
non-deterministic (happens about 1 out of 4 runs).

It seems the job hangs and the test times out in vm.run_job('job1', wait=5.0).

I haven't debugged it yet but wanted to share this information to save
some time. Tomorrow I'll investigate further.

Stefan
Re: [PULL 00/28] Block layer patches
Posted by Kevin Wolf 7 months, 2 weeks ago
On 18.09.2023 at 20:56, Stefan Hajnoczi wrote:
> Hi Kevin,
> I believe that my own commit "block-coroutine-wrapper: use
> qemu_get_current_aio_context()" breaks this test. The failure is
> non-deterministic (happens about 1 out of 4 runs).
> 
> It seems the job hangs and the test times out in vm.run_job('job1', wait=5.0).
> 
> I haven't debugged it yet but wanted to share this information to save
> some time. Tomorrow I'll investigate further.

Yes, it's relatively easily reproducible if I run the test in a loop,
and I can't seem to reproduce it without the last patch. Should I
unstage the full series again, or do you think that the last patch is
really optional this time?

However, I'm unsure how the stack traces I'm seeing are related to your
patch. Maybe it just made an existing bug more likely to be triggered?

What I'm seeing is that the reader lock is held by an iothread that is
waiting for its AioContext lock to make progress:

Thread 3 (Thread 0x7f811e9346c0 (LWP 26390) "qemu-system-x86"):
#0  0x00007f81250aaf80 in __lll_lock_wait () at /lib64/libc.so.6
#1  0x00007f81250b149a in pthread_mutex_lock@@GLIBC_2.2.5 () at /lib64/libc.so.6
#2  0x000055b7b170967e in qemu_mutex_lock_impl (mutex=0x55b7b34e3080, file=0x55b7b199e1f7 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:94
#3  0x000055b7b1709953 in qemu_rec_mutex_lock_impl (mutex=0x55b7b34e3080, file=0x55b7b199e1f7 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:149
#4  0x000055b7b1728318 in aio_context_acquire (ctx=0x55b7b34e3020) at ../util/async.c:728
#5  0x000055b7b1727c49 in co_schedule_bh_cb (opaque=0x55b7b34e3020) at ../util/async.c:565
#6  0x000055b7b1726f1c in aio_bh_call (bh=0x55b7b34e2e70) at ../util/async.c:169
#7  0x000055b7b17270ee in aio_bh_poll (ctx=0x55b7b34e3020) at ../util/async.c:216
#8  0x000055b7b170351d in aio_poll (ctx=0x55b7b34e3020, blocking=true) at ../util/aio-posix.c:722
#9  0x000055b7b1518604 in iothread_run (opaque=0x55b7b2904460) at ../iothread.c:63
#10 0x000055b7b170a955 in qemu_thread_start (args=0x55b7b34e36b0) at ../util/qemu-thread-posix.c:541
#11 0x00007f81250ae15d in start_thread () at /lib64/libc.so.6
#12 0x00007f812512fc00 in clone3 () at /lib64/libc.so.6

On the other hand, the main thread wants to acquire the writer lock,
but it holds the AioContext lock of the iothread (it takes it in
job_prepare_locked()):

Thread 1 (Thread 0x7f811f4b7b00 (LWP 26388) "qemu-system-x86"):
#0  0x00007f8125122356 in ppoll () at /lib64/libc.so.6
#1  0x000055b7b172eae0 in qemu_poll_ns (fds=0x55b7b34ec910, nfds=1, timeout=-1) at ../util/qemu-timer.c:339
#2  0x000055b7b1704ebd in fdmon_poll_wait (ctx=0x55b7b3269210, ready_list=0x7ffc90b05680, timeout=-1) at ../util/fdmon-poll.c:79
#3  0x000055b7b1703284 in aio_poll (ctx=0x55b7b3269210, blocking=true) at ../util/aio-posix.c:670
#4  0x000055b7b1567c3b in bdrv_graph_wrlock (bs=0x0) at ../block/graph-lock.c:145
#5  0x000055b7b1554c1c in blk_remove_bs (blk=0x55b7b4425800) at ../block/block-backend.c:916
#6  0x000055b7b1554779 in blk_delete (blk=0x55b7b4425800) at ../block/block-backend.c:497
#7  0x000055b7b1554133 in blk_unref (blk=0x55b7b4425800) at ../block/block-backend.c:557
#8  0x000055b7b157a149 in mirror_exit_common (job=0x55b7b4419000) at ../block/mirror.c:696
#9  0x000055b7b1577015 in mirror_prepare (job=0x55b7b4419000) at ../block/mirror.c:807
#10 0x000055b7b153a1a7 in job_prepare_locked (job=0x55b7b4419000) at ../job.c:988
#11 0x000055b7b153a0d9 in job_txn_apply_locked (job=0x55b7b4419000, fn=0x55b7b153a110 <job_prepare_locked>) at ../job.c:191
#12 0x000055b7b1538b6d in job_do_finalize_locked (job=0x55b7b4419000) at ../job.c:1011
#13 0x000055b7b153a886 in job_completed_txn_success_locked (job=0x55b7b4419000) at ../job.c:1068
#14 0x000055b7b1539372 in job_completed_locked (job=0x55b7b4419000) at ../job.c:1082
#15 0x000055b7b153a71b in job_exit (opaque=0x55b7b4419000) at ../job.c:1103
#16 0x000055b7b1726f1c in aio_bh_call (bh=0x7f8110005470) at ../util/async.c:169
#17 0x000055b7b17270ee in aio_bh_poll (ctx=0x55b7b3269210) at ../util/async.c:216
#18 0x000055b7b1702c05 in aio_dispatch (ctx=0x55b7b3269210) at ../util/aio-posix.c:423
#19 0x000055b7b1728a14 in aio_ctx_dispatch (source=0x55b7b3269210, callback=0x0, user_data=0x0) at ../util/async.c:358
#20 0x00007f8126c31c7f in g_main_dispatch (context=0x55b7b3269720) at ../glib/gmain.c:3454
#21 g_main_context_dispatch (context=0x55b7b3269720) at ../glib/gmain.c:4172
#22 0x000055b7b1729c98 in glib_pollfds_poll () at ../util/main-loop.c:290
#23 0x000055b7b1729572 in os_host_main_loop_wait (timeout=27462700) at ../util/main-loop.c:313
#24 0x000055b7b1729452 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
#25 0x000055b7b119a1eb in qemu_main_loop () at ../softmmu/runstate.c:772
#26 0x000055b7b14c102d in qemu_default_main () at ../softmmu/main.c:37
#27 0x000055b7b14c1068 in main (argc=44, argv=0x7ffc90b05d58) at ../softmmu/main.c:48

At first I thought we just need to look into the AioContext locking in
job completion and drop it in the right places.

But in fact, first of all, blk_remove_bs() needs to make up its mind
whether it wants the caller to hold the AioContext lock or not, and
document that, because it currently calls both bdrv_drained_begin()
(which requires holding the AioContext lock) and bdrv_graph_wrlock(NULL)
(which forbids it).
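
In code, the conflict looks roughly like this (not the real function
body, just the two calls with their locking expectations):

    void blk_remove_bs(BlockBackend *blk)
    {
        BlockDriverState *bs = blk_bs(blk);

        /* requires the caller to hold bs's AioContext lock */
        bdrv_drained_begin(bs);

        /* ... */

        /* polls and therefore forbids holding any AioContext lock;
         * this is where the main thread hangs in the backtrace above */
        bdrv_graph_wrlock(NULL);

        /* ... */
    }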

If we could fully get rid of the AioContext lock (as we originally
stated as a goal), that would automatically solve this kind of
deadlock.

Kevin
Re: [PULL 00/28] Block layer patches
Posted by Stefan Hajnoczi 7 months, 2 weeks ago
On Tue, 19 Sept 2023 at 06:26, Kevin Wolf <kwolf@redhat.com> wrote:
> On 18.09.2023 at 20:56, Stefan Hajnoczi wrote:
> If we could fully get rid of the AioContext lock (as we originally
> stated as a goal), that would automatically solve this kind of
> deadlocks.

Grepping for "ctx locked", "context acquired", etc does not bring up a
lot of comments describing variables that are protected by the
AioContext lock.

However, there are at least hundreds of functions that assume they are
called with the AioContext lock held.

There are a few strategies:

Top-down
--------
Shorten AioContext lock critical sections to cover only the APIs that need them.
Then push the lock down into the API and repeat at the next lower level until
aio_context_acquire() + AIO_WAIT_WHILE() + aio_context_release() can be
replaced with AIO_WAIT_WHILE_UNLOCKED().
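
For illustration, the end state this is aiming for in a caller
(bdrv_do_foo() is a made-up placeholder):

    /* today: the caller wraps the polling API in the AioContext lock */
    aio_context_acquire(ctx);
    bdrv_do_foo(bs);        /* uses AIO_WAIT_WHILE(ctx, ...) internally */
    aio_context_release(ctx);

    /* goal: the lock has been pushed down and finally dropped, because
     * the wait no longer needs it */
    bdrv_do_foo(bs);        /* uses AIO_WAIT_WHILE_UNLOCKED(ctx, ...) */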

Bottom-up
---------
Switch AIO_WAIT_WHILE() to aio_context_release() + AIO_WAIT_WHILE_UNLOCKED() +
aio_context_acquire(). Then move the lock up into callers and repeat at the
next higher level until aio_context_acquire() + aio_context_release() cancel
each other out.
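
The first step would look something like this inside an existing
AIO_WAIT_WHILE() user (the condition is made up):

    /* before: the caller must hold ctx's AioContext lock */
    AIO_WAIT_WHILE(ctx, !job->done);

    /* after: drop the lock only around the wait, so callers that still
     * hold ctx keep working; later the release/acquire pair is pushed
     * up into the callers until it cancels out */
    aio_context_release(ctx);
    AIO_WAIT_WHILE_UNLOCKED(ctx, !job->done);
    aio_context_acquire(ctx);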

Big bang
--------
Remove aio_context_acquire/release() and fix tests until they pass.

I think top-down is safer than bottom-up, because bottom-up is more
likely to cause issues with callers that do not tolerate temporarily
dropping the lock.

The big bang approach is only reasonable if the AioContext lock is no
longer used to protect variables (which we don't know for sure because
that requires auditing every line of code).

My concern with the top-down approach is that so much code needs to be
audited and the conversions are temporary steps (it's almost a waste
of time for maintainers to review them).

I'm tempted to go for the big bang approach but also don't want to
introduce a slew of new race conditions. :/

Stefan
Re: [PULL 00/28] Block layer patches
Posted by Stefan Hajnoczi 7 months, 2 weeks ago
On Tue, 19 Sept 2023 at 06:26, Kevin Wolf <kwolf@redhat.com> wrote:
>
> On 18.09.2023 at 20:56, Stefan Hajnoczi wrote:
> > Hi Kevin,
> > I believe that my own commit "block-coroutine-wrapper: use
> > qemu_get_current_aio_context()" breaks this test. The failure is
> > non-deterministic (happens about 1 out of 4 runs).
> >
> > It seems the job hangs and the test times out in vm.run_job('job1', wait=5.0).
> >
> > I haven't debugged it yet but wanted to share this information to save
> > some time. Tomorrow I'll investigate further.
>
> Yes, it's relatively easily reproducible if I run the test in a loop,
> and I can't seem to reproduce it without the last patch. Should I
> unstage the full series again, or do you think that the last patch is
> really optional this time?
>
> However, I'm unsure how the stack traces I'm seeing are related to your
> patch. Maybe it just made an existing bug more likely to be triggered?
>
> What I'm seeing is that the reader lock is held by an iothread that is
> waiting for its AioContext lock to make progress:
>
> Thread 3 (Thread 0x7f811e9346c0 (LWP 26390) "qemu-system-x86"):
> #0  0x00007f81250aaf80 in __lll_lock_wait () at /lib64/libc.so.6
> #1  0x00007f81250b149a in pthread_mutex_lock@@GLIBC_2.2.5 () at /lib64/libc.so.6
> #2  0x000055b7b170967e in qemu_mutex_lock_impl (mutex=0x55b7b34e3080, file=0x55b7b199e1f7 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:94
> #3  0x000055b7b1709953 in qemu_rec_mutex_lock_impl (mutex=0x55b7b34e3080, file=0x55b7b199e1f7 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:149
> #4  0x000055b7b1728318 in aio_context_acquire (ctx=0x55b7b34e3020) at ../util/async.c:728
> #5  0x000055b7b1727c49 in co_schedule_bh_cb (opaque=0x55b7b34e3020) at ../util/async.c:565
> #6  0x000055b7b1726f1c in aio_bh_call (bh=0x55b7b34e2e70) at ../util/async.c:169
> #7  0x000055b7b17270ee in aio_bh_poll (ctx=0x55b7b34e3020) at ../util/async.c:216
> #8  0x000055b7b170351d in aio_poll (ctx=0x55b7b34e3020, blocking=true) at ../util/aio-posix.c:722
> #9  0x000055b7b1518604 in iothread_run (opaque=0x55b7b2904460) at ../iothread.c:63
> #10 0x000055b7b170a955 in qemu_thread_start (args=0x55b7b34e36b0) at ../util/qemu-thread-posix.c:541
> #11 0x00007f81250ae15d in start_thread () at /lib64/libc.so.6
> #12 0x00007f812512fc00 in clone3 () at /lib64/libc.so.6
>
> On the other hand, the main thread wants to acquire the writer lock,
> but it holds the AioContext lock of the iothread (it takes it in
> job_prepare_locked()):
>
> Thread 1 (Thread 0x7f811f4b7b00 (LWP 26388) "qemu-system-x86"):
> #0  0x00007f8125122356 in ppoll () at /lib64/libc.so.6
> #1  0x000055b7b172eae0 in qemu_poll_ns (fds=0x55b7b34ec910, nfds=1, timeout=-1) at ../util/qemu-timer.c:339
> #2  0x000055b7b1704ebd in fdmon_poll_wait (ctx=0x55b7b3269210, ready_list=0x7ffc90b05680, timeout=-1) at ../util/fdmon-poll.c:79
> #3  0x000055b7b1703284 in aio_poll (ctx=0x55b7b3269210, blocking=true) at ../util/aio-posix.c:670
> #4  0x000055b7b1567c3b in bdrv_graph_wrlock (bs=0x0) at ../block/graph-lock.c:145
> #5  0x000055b7b1554c1c in blk_remove_bs (blk=0x55b7b4425800) at ../block/block-backend.c:916
> #6  0x000055b7b1554779 in blk_delete (blk=0x55b7b4425800) at ../block/block-backend.c:497
> #7  0x000055b7b1554133 in blk_unref (blk=0x55b7b4425800) at ../block/block-backend.c:557
> #8  0x000055b7b157a149 in mirror_exit_common (job=0x55b7b4419000) at ../block/mirror.c:696
> #9  0x000055b7b1577015 in mirror_prepare (job=0x55b7b4419000) at ../block/mirror.c:807
> #10 0x000055b7b153a1a7 in job_prepare_locked (job=0x55b7b4419000) at ../job.c:988
> #11 0x000055b7b153a0d9 in job_txn_apply_locked (job=0x55b7b4419000, fn=0x55b7b153a110 <job_prepare_locked>) at ../job.c:191
> #12 0x000055b7b1538b6d in job_do_finalize_locked (job=0x55b7b4419000) at ../job.c:1011
> #13 0x000055b7b153a886 in job_completed_txn_success_locked (job=0x55b7b4419000) at ../job.c:1068
> #14 0x000055b7b1539372 in job_completed_locked (job=0x55b7b4419000) at ../job.c:1082
> #15 0x000055b7b153a71b in job_exit (opaque=0x55b7b4419000) at ../job.c:1103
> #16 0x000055b7b1726f1c in aio_bh_call (bh=0x7f8110005470) at ../util/async.c:169
> #17 0x000055b7b17270ee in aio_bh_poll (ctx=0x55b7b3269210) at ../util/async.c:216
> #18 0x000055b7b1702c05 in aio_dispatch (ctx=0x55b7b3269210) at ../util/aio-posix.c:423
> #19 0x000055b7b1728a14 in aio_ctx_dispatch (source=0x55b7b3269210, callback=0x0, user_data=0x0) at ../util/async.c:358
> #20 0x00007f8126c31c7f in g_main_dispatch (context=0x55b7b3269720) at ../glib/gmain.c:3454
> #21 g_main_context_dispatch (context=0x55b7b3269720) at ../glib/gmain.c:4172
> #22 0x000055b7b1729c98 in glib_pollfds_poll () at ../util/main-loop.c:290
> #23 0x000055b7b1729572 in os_host_main_loop_wait (timeout=27462700) at ../util/main-loop.c:313
> #24 0x000055b7b1729452 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
> #25 0x000055b7b119a1eb in qemu_main_loop () at ../softmmu/runstate.c:772
> #26 0x000055b7b14c102d in qemu_default_main () at ../softmmu/main.c:37
> #27 0x000055b7b14c1068 in main (argc=44, argv=0x7ffc90b05d58) at ../softmmu/main.c:48
>
> At first I thought we just need to look into the AioContext locking in
> job completion and drop it in the right places.
>
> But in fact, first of all, blk_remove_bs() needs to make up its mind if
> it wants the caller to hold the AioContext or not and document that.
> Because it calls both bdrv_drained_begin() (which requires holding the
> AioContext lock) and bdrv_graph_wrlock(NULL) (which forbids it).

I was surprised to see bdrv_graph_wrlock(NULL) called. The AioContext
lock is held across aio_poll() and that's why the hang occurs.

I'll make blk_remove_bs() consistent with respect to the AioContext
lock so this test failure can be fixed.

> If we could fully get rid of the AioContext lock (as we originally
> stated as a goal), that would automatically solve this kind of
> deadlocks.

Although it's not an indication that it's already safe to remove the
AioContext lock, I commented out the lock/unlock in
aio_context_acquire/release() and found that "make check" passes. Only
graph-changes-while-io fails when I run "qemu-iotests -qcow2".
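
What I mean by "commented out" is roughly this change in
util/async.c:

    void aio_context_acquire(AioContext *ctx)
    {
        /* qemu_rec_mutex_lock(&ctx->lock); */
    }

    void aio_context_release(AioContext *ctx)
    {
        /* qemu_rec_mutex_unlock(&ctx->lock); */
    }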

I think the main thing keeping the AioContext lock around is
AIO_WAIT_WHILE(). The caller must hold the lock and that forces many
APIs to extend this requirement to their caller.

Perhaps more AIO_WAIT_WHILE() instances can now be converted to
AIO_WAIT_WHILE_UNLOCKED().

Stefan
Re: [PULL 00/28] Block layer patches
Posted by Stefan Hajnoczi 7 months, 2 weeks ago
On Tue, 19 Sept 2023 at 06:26, Kevin Wolf <kwolf@redhat.com> wrote:
>
> On 18.09.2023 at 20:56, Stefan Hajnoczi wrote:
> > Hi Kevin,
> > I believe that my own commit "block-coroutine-wrapper: use
> > qemu_get_current_aio_context()" breaks this test. The failure is
> > non-deterministic (happens about 1 out of 4 runs).
> >
> > It seems the job hangs and the test times out in vm.run_job('job1', wait=5.0).
> >
> > I haven't debugged it yet but wanted to share this information to save
> > some time. Tomorrow I'll investigate further.
>
> Yes, it's relatively easily reproducible if I run the test in a loop,
> and I can't seem to reproduce it without the last patch. Should I
> unstage the full series again, or do you think that the last patch is
> really optional this time?

Please drop the last patch. I'm not aware of anything that depends on it.

> However, I'm unsure how the stack traces I'm seeing are related to your
> patch. Maybe it just made an existing bug more likely to be triggered?

I'll share my thoughts once I've looked at the crash today.

Regarding AioContext lock removal: I'll work on that and see what
still depends on the lock.

Stefan

> What I'm seeing is that the reader lock is held by an iothread that is
> waiting for its AioContext lock to make progress:
>
> Thread 3 (Thread 0x7f811e9346c0 (LWP 26390) "qemu-system-x86"):
> #0  0x00007f81250aaf80 in __lll_lock_wait () at /lib64/libc.so.6
> #1  0x00007f81250b149a in pthread_mutex_lock@@GLIBC_2.2.5 () at /lib64/libc.so.6
> #2  0x000055b7b170967e in qemu_mutex_lock_impl (mutex=0x55b7b34e3080, file=0x55b7b199e1f7 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:94
> #3  0x000055b7b1709953 in qemu_rec_mutex_lock_impl (mutex=0x55b7b34e3080, file=0x55b7b199e1f7 "../util/async.c", line=728) at ../util/qemu-thread-posix.c:149
> #4  0x000055b7b1728318 in aio_context_acquire (ctx=0x55b7b34e3020) at ../util/async.c:728
> #5  0x000055b7b1727c49 in co_schedule_bh_cb (opaque=0x55b7b34e3020) at ../util/async.c:565
> #6  0x000055b7b1726f1c in aio_bh_call (bh=0x55b7b34e2e70) at ../util/async.c:169
> #7  0x000055b7b17270ee in aio_bh_poll (ctx=0x55b7b34e3020) at ../util/async.c:216
> #8  0x000055b7b170351d in aio_poll (ctx=0x55b7b34e3020, blocking=true) at ../util/aio-posix.c:722
> #9  0x000055b7b1518604 in iothread_run (opaque=0x55b7b2904460) at ../iothread.c:63
> #10 0x000055b7b170a955 in qemu_thread_start (args=0x55b7b34e36b0) at ../util/qemu-thread-posix.c:541
> #11 0x00007f81250ae15d in start_thread () at /lib64/libc.so.6
> #12 0x00007f812512fc00 in clone3 () at /lib64/libc.so.6
>
> On the other hand, the main thread wants to acquire the writer lock,
> but it holds the AioContext lock of the iothread (it takes it in
> job_prepare_locked()):
>
> Thread 1 (Thread 0x7f811f4b7b00 (LWP 26388) "qemu-system-x86"):
> #0  0x00007f8125122356 in ppoll () at /lib64/libc.so.6
> #1  0x000055b7b172eae0 in qemu_poll_ns (fds=0x55b7b34ec910, nfds=1, timeout=-1) at ../util/qemu-timer.c:339
> #2  0x000055b7b1704ebd in fdmon_poll_wait (ctx=0x55b7b3269210, ready_list=0x7ffc90b05680, timeout=-1) at ../util/fdmon-poll.c:79
> #3  0x000055b7b1703284 in aio_poll (ctx=0x55b7b3269210, blocking=true) at ../util/aio-posix.c:670
> #4  0x000055b7b1567c3b in bdrv_graph_wrlock (bs=0x0) at ../block/graph-lock.c:145
> #5  0x000055b7b1554c1c in blk_remove_bs (blk=0x55b7b4425800) at ../block/block-backend.c:916
> #6  0x000055b7b1554779 in blk_delete (blk=0x55b7b4425800) at ../block/block-backend.c:497
> #7  0x000055b7b1554133 in blk_unref (blk=0x55b7b4425800) at ../block/block-backend.c:557
> #8  0x000055b7b157a149 in mirror_exit_common (job=0x55b7b4419000) at ../block/mirror.c:696
> #9  0x000055b7b1577015 in mirror_prepare (job=0x55b7b4419000) at ../block/mirror.c:807
> #10 0x000055b7b153a1a7 in job_prepare_locked (job=0x55b7b4419000) at ../job.c:988
> #11 0x000055b7b153a0d9 in job_txn_apply_locked (job=0x55b7b4419000, fn=0x55b7b153a110 <job_prepare_locked>) at ../job.c:191
> #12 0x000055b7b1538b6d in job_do_finalize_locked (job=0x55b7b4419000) at ../job.c:1011
> #13 0x000055b7b153a886 in job_completed_txn_success_locked (job=0x55b7b4419000) at ../job.c:1068
> #14 0x000055b7b1539372 in job_completed_locked (job=0x55b7b4419000) at ../job.c:1082
> #15 0x000055b7b153a71b in job_exit (opaque=0x55b7b4419000) at ../job.c:1103
> #16 0x000055b7b1726f1c in aio_bh_call (bh=0x7f8110005470) at ../util/async.c:169
> #17 0x000055b7b17270ee in aio_bh_poll (ctx=0x55b7b3269210) at ../util/async.c:216
> #18 0x000055b7b1702c05 in aio_dispatch (ctx=0x55b7b3269210) at ../util/aio-posix.c:423
> #19 0x000055b7b1728a14 in aio_ctx_dispatch (source=0x55b7b3269210, callback=0x0, user_data=0x0) at ../util/async.c:358
> #20 0x00007f8126c31c7f in g_main_dispatch (context=0x55b7b3269720) at ../glib/gmain.c:3454
> #21 g_main_context_dispatch (context=0x55b7b3269720) at ../glib/gmain.c:4172
> #22 0x000055b7b1729c98 in glib_pollfds_poll () at ../util/main-loop.c:290
> #23 0x000055b7b1729572 in os_host_main_loop_wait (timeout=27462700) at ../util/main-loop.c:313
> #24 0x000055b7b1729452 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:592
> #25 0x000055b7b119a1eb in qemu_main_loop () at ../softmmu/runstate.c:772
> #26 0x000055b7b14c102d in qemu_default_main () at ../softmmu/main.c:37
> #27 0x000055b7b14c1068 in main (argc=44, argv=0x7ffc90b05d58) at ../softmmu/main.c:48
>
> At first I thought we just need to look into the AioContext locking in
> job completion and drop it in the right places.
>
> But in fact, first of all, blk_remove_bs() needs to make up its mind if
> it wants the caller to hold the AioContext or not and document that.
> Because it calls both bdrv_drained_begin() (which requires holding the
> AioContext lock) and bdrv_graph_wrlock(NULL) (which forbids it).
>
> If we could fully get rid of the AioContext lock (as we originally
> stated as a goal), that would automatically solve this kind of
> deadlocks.
>
> Kevin
>