[Qemu-devel] [PULL 00/42] Block patches
Posted by Max Reitz 5 years, 7 months ago
The following changes since commit 506e4a00de01e0b29fa83db5cbbc3d154253b4ea:

  Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-3.1-20180925' into staging (2018-09-25 13:30:45 +0100)

are available in the Git repository at:

  https://git.xanclic.moe/XanClic/qemu.git tags/pull-block-2018-09-25

for you to fetch changes up to 9c76ff9c16be890e70fce30754b096ff9950d1ee:

  Merge remote-tracking branch 'kevin/tags/for-upstream' into block (2018-09-25 16:12:44 +0200)

----------------------------------------------------------------
Block layer patches:
- Drain fixes
- node-name parameters for block-commit
- Refactor block jobs to use transactional callbacks for exiting

----------------------------------------------------------------
Alberto Garcia (2):
      block: Fix use after free error in bdrv_open_inherit()
      qemu-iotests: Test snapshot=on with nonexistent TMPDIR

Fam Zheng (1):
      job: Fix nested aio_poll() hanging in job_txn_apply

John Snow (16):
      block/commit: add block job creation flags
      block/mirror: add block job creation flags
      block/stream: add block job creation flags
      block/commit: refactor commit to use job callbacks
      block/mirror: don't install backing chain on abort
      block/mirror: conservative mirror_exit refactor
      block/stream: refactor stream to use job callbacks
      tests/blockjob: replace Blockjob with Job
      tests/test-blockjob: remove exit callback
      tests/test-blockjob-txn: move .exit to .clean
      jobs: remove .exit callback
      qapi/block-commit: expose new job properties
      qapi/block-mirror: expose new job properties
      qapi/block-stream: expose new job properties
      block/backup: qapi documentation fixup
      blockdev: document transactional shortcomings

Kevin Wolf (21):
      commit: Add top-node/base-node options
      qemu-iotests: Test commit with top-node/base-node
      job: Fix missing locking due to mismerge
      blockjob: Wake up BDS when job becomes idle
      aio-wait: Increase num_waiters even in home thread
      test-bdrv-drain: Drain with block jobs in an I/O thread
      test-blockjob: Acquire AioContext around job_cancel_sync()
      job: Use AIO_WAIT_WHILE() in job_finish_sync()
      test-bdrv-drain: Test AIO_WAIT_WHILE() in completion callback
      block: Add missing locking in bdrv_co_drain_bh_cb()
      block-backend: Add .drained_poll callback
      block-backend: Fix potential double blk_delete()
      block-backend: Decrease in_flight only after callback
      blockjob: Lie better in child_job_drained_poll()
      block: Remove aio_poll() in bdrv_drain_poll variants
      test-bdrv-drain: Test nested poll in bdrv_drain_poll_top_level()
      job: Avoid deadlocks in job_completed_txn_abort()
      test-bdrv-drain: AIO_WAIT_WHILE() in job .commit/.abort
      test-bdrv-drain: Fix outdated comments
      block: Use a single global AioWait
      test-bdrv-drain: Test draining job source child and parent

Max Reitz (1):
      Merge remote-tracking branch 'kevin/tags/for-upstream' into block

Sergio Lopez (2):
      block/linux-aio: acquire AioContext before qemu_laio_process_completions
      util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb

 qapi/block-core.json          | 104 ++++++++++++---
 include/block/aio-wait.h      |  28 ++--
 include/block/block.h         |   6 +-
 include/block/block_int.h     |  18 ++-
 include/block/blockjob.h      |   3 +
 include/qemu/coroutine.h      |   5 +
 include/qemu/job.h            |  23 ++--
 block.c                       |   6 +-
 block/block-backend.c         |  31 +++--
 block/commit.c                |  97 ++++++++------
 block/io.c                    |  30 +++--
 block/linux-aio.c             |   2 +-
 block/mirror.c                |  49 +++++--
 block/stream.c                |  28 ++--
 blockdev.c                    |  84 ++++++++++--
 blockjob.c                    |   9 +-
 hmp.c                         |   5 +-
 job.c                         | 144 +++++++++++----------
 tests/test-bdrv-drain.c       | 294 +++++++++++++++++++++++++++++++++++++++---
 tests/test-blockjob-txn.c     |   4 +-
 tests/test-blockjob.c         | 120 ++++++++---------
 util/aio-wait.c               |  11 +-
 util/async.c                  |   2 +-
 util/qemu-coroutine.c         |   5 +
 tests/qemu-iotests/040        |  52 +++++++-
 tests/qemu-iotests/040.out    |   4 +-
 tests/qemu-iotests/051        |   3 +
 tests/qemu-iotests/051.out    |   3 +
 tests/qemu-iotests/051.pc.out |   3 +
 29 files changed, 856 insertions(+), 317 deletions(-)

-- 
2.17.1


Re: [Qemu-devel] [PULL 00/42] Block patches
Posted by Peter Maydell 5 years, 7 months ago
On 25 September 2018 at 16:14, Max Reitz <mreitz@redhat.com> wrote:
> The following changes since commit 506e4a00de01e0b29fa83db5cbbc3d154253b4ea:
>
>   Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-3.1-20180925' into staging (2018-09-25 13:30:45 +0100)
>
> are available in the Git repository at:
>
>   https://git.xanclic.moe/XanClic/qemu.git tags/pull-block-2018-09-25
>
> for you to fetch changes up to 9c76ff9c16be890e70fce30754b096ff9950d1ee:
>
>   Merge remote-tracking branch 'kevin/tags/for-upstream' into block (2018-09-25 16:12:44 +0200)
>
> ----------------------------------------------------------------
> Block layer patches:
> - Drain fixes
> - node-name parameters for block-commit
> - Refactor block jobs to use transactional callbacks for exiting
>
> ----------------------------------------------------------------

Applied, thanks.

-- PMM

Re: [Qemu-devel] [PULL 00/42] Block patches
Posted by Peter Maydell 5 years, 6 months ago
On 25 September 2018 at 18:09, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 25 September 2018 at 16:14, Max Reitz <mreitz@redhat.com> wrote:
>> The following changes since commit 506e4a00de01e0b29fa83db5cbbc3d154253b4ea:
>>
>>   Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-3.1-20180925' into staging (2018-09-25 13:30:45 +0100)
>>
>> are available in the Git repository at:
>>
>>   https://git.xanclic.moe/XanClic/qemu.git tags/pull-block-2018-09-25
>>
>> for you to fetch changes up to 9c76ff9c16be890e70fce30754b096ff9950d1ee:
>>
>>   Merge remote-tracking branch 'kevin/tags/for-upstream' into block (2018-09-25 16:12:44 +0200)
>>
>> ----------------------------------------------------------------
>> Block layer patches:
>> - Drain fixes
>> - node-name parameters for block-commit
>> - Refactor block jobs to use transactional callbacks for exiting
>>
>> ----------------------------------------------------------------
>
> Applied, thanks.

I'm finding that test-bdrv-drain hangs intermittently on my OSX host.
It hangs while still using CPU rather than stopping completely, but when
I attach with lldb it reports a NULL pointer dereference in one of the
threads, so I have no idea why the whole thing didn't exit with a signal.
Anyway:

manooth$ lldb -p 40546
(lldb) process attach --pid 40546
Process 40546 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
    frame #0: 0x00007fff59f17d82 libsystem_kernel.dylib`__semwait_signal + 10
libsystem_kernel.dylib`__semwait_signal:
->  0x7fff59f17d82 <+10>: jae    0x7fff59f17d8c            ; <+20>
    0x7fff59f17d84 <+12>: movq   %rax, %rdi
    0x7fff59f17d87 <+15>: jmp    0x7fff59f0eb0e            ; cerror
    0x7fff59f17d8c <+20>: retq
  thread #4, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000000000000
error: memory read failed for 0x0
Target 0: (test-bdrv-drain) stopped.

Executable module set to "/Users/pm215/src/qemu-for-merges/build/all/tests/test-bdrv-drain".
Architecture set to: x86_64-apple-macosx.
(lldb) thread backtrace all
warning: could not execute support code to read Objective-C class data in the process. This may reduce the quality of type information available.
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
  * frame #0: 0x00007fff59f17d82 libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x00007fff5a0e3824 libsystem_pthread.dylib`_pthread_join + 626
    frame #2: 0x00000001024a2112 test-bdrv-drain`qemu_thread_join(thread=<unavailable>) at qemu-thread-posix.c:565 [opt]
    frame #3: 0x0000000102465171 test-bdrv-drain`iothread_join(iothread=0x00007f868dd00a90) at iothread.c:62 [opt]
    frame #4: 0x00000001023cbb41 test-bdrv-drain`test_iothread_common(drain_type=BDRV_SUBTREE_DRAIN, drain_thread=<unavailable>) at test-bdrv-drain.c:762 [opt]
    frame #5: 0x0000000102622a47 libglib-2.0.0.dylib`g_test_run_suite_internal + 697
    frame #6: 0x0000000102622c0a libglib-2.0.0.dylib`g_test_run_suite_internal + 1148
    frame #7: 0x0000000102622c0a libglib-2.0.0.dylib`g_test_run_suite_internal + 1148
    frame #8: 0x0000000102622020 libglib-2.0.0.dylib`g_test_run_suite + 121
    frame #9: 0x0000000102621f73 libglib-2.0.0.dylib`g_test_run + 17
    frame #10: 0x00000001023c7cda test-bdrv-drain`main(argc=1, argv=0x00007ffeed839a90) at test-bdrv-drain.c:1606 [opt]
    frame #11: 0x00007fff59dc7015 libdyld.dylib`start + 1
  thread #2
    frame #0: 0x00007fff59f17a16 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff5a0e0589 libsystem_pthread.dylib`_pthread_cond_wait + 732
    frame #2: 0x00000001024a1dec test-bdrv-drain`qemu_event_wait [inlined] qemu_futex_wait(ev=<unavailable>, val=4294967295) at qemu-thread-posix.c:347 [opt]
    frame #3: 0x00000001024a1dcc test-bdrv-drain`qemu_event_wait(ev=0x00000001025029c8) at qemu-thread-posix.c:442 [opt]
    frame #4: 0x00000001024b4c88 test-bdrv-drain`call_rcu_thread(opaque=<unavailable>) at rcu.c:261 [opt]
    frame #5: 0x00007fff5a0df661 libsystem_pthread.dylib`_pthread_body + 340
    frame #6: 0x00007fff5a0df50d libsystem_pthread.dylib`_pthread_start + 377
    frame #7: 0x00007fff5a0debf9 libsystem_pthread.dylib`thread_start + 13
  thread #3
    frame #0: 0x00007fff59f1803a libsystem_kernel.dylib`__sigwait + 10
    frame #1: 0x00007fff5a0e1ad9 libsystem_pthread.dylib`sigwait + 61
    frame #2: 0x000000010249fdcb test-bdrv-drain`sigwait_compat(opaque=0x00007f868dc0efd0) at compatfd.c:36 [opt]
    frame #3: 0x00007fff5a0df661 libsystem_pthread.dylib`_pthread_body + 340
    frame #4: 0x00007fff5a0df50d libsystem_pthread.dylib`_pthread_start + 377
    frame #5: 0x00007fff5a0debf9 libsystem_pthread.dylib`thread_start + 13
  thread #4
    frame #0: 0x0000000000000000
    frame #1: 0x00000001024af579 test-bdrv-drain`notifier_list_notify(list=<unavailable>, data=0x0000000000000000) at notify.c:40 [opt]
    frame #2: 0x00000001024a1f35 test-bdrv-drain`qemu_thread_atexit_run(arg=<unavailable>) at qemu-thread-posix.c:473 [opt]
    frame #3: 0x00007fff5a0e1163 libsystem_pthread.dylib`_pthread_tsd_cleanup + 463
    frame #4: 0x00007fff5a0e0ee9 libsystem_pthread.dylib`_pthread_exit + 79
    frame #5: 0x00007fff5a0df66c libsystem_pthread.dylib`_pthread_body + 351
    frame #6: 0x00007fff5a0df50d libsystem_pthread.dylib`_pthread_start + 377
    frame #7: 0x00007fff5a0debf9 libsystem_pthread.dylib`thread_start + 13
  thread #5
    frame #0: 0x00007fff59f17cf2 libsystem_kernel.dylib`__select + 10
    frame #1: 0x000000010260fb60 libglib-2.0.0.dylib`g_poll + 430
    frame #2: 0x000000010249f7ec test-bdrv-drain`aio_poll(ctx=0x00007f868dc32a10, blocking=true) at aio-posix.c:645 [opt]
    frame #3: 0x00000001024652dd test-bdrv-drain`iothread_run(opaque=0x00007f868dd000a0) at iothread.c:51 [opt]
    frame #4: 0x00007fff5a0df661 libsystem_pthread.dylib`_pthread_body + 340
    frame #5: 0x00007fff5a0df50d libsystem_pthread.dylib`_pthread_start + 377
    frame #6: 0x00007fff5a0debf9 libsystem_pthread.dylib`thread_start + 13

(lldb) thread select 4
* thread #4
    frame #0: 0x0000000000000000
error: memory read failed for 0x0
(lldb) frame select 1
test-bdrv-drain was compiled with optimization - stepping may behave oddly; variables may not be available.
frame #1: 0x00000001024af579 test-bdrv-drain`notifier_list_notify(list=<unavailable>, data=0x0000000000000000) at notify.c:40 [opt]
   37       Notifier *notifier, *next;
   38
   39       QLIST_FOREACH_SAFE(notifier, &list->notifiers, node, next) {
-> 40           notifier->notify(notifier, data);
   41       }
   42   }
   43
(lldb) print notifier
(Notifier *) $0 = 0x0000000000000000

Non-debug build so debug info not very helpful, I'm afraid.

thanks
-- PMM

Re: [Qemu-devel] [PULL 00/42] Block patches
Posted by Peter Maydell 5 years, 6 months ago
On 28 September 2018 at 15:36, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 25 September 2018 at 18:09, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 25 September 2018 at 16:14, Max Reitz <mreitz@redhat.com> wrote:
>>> The following changes since commit 506e4a00de01e0b29fa83db5cbbc3d154253b4ea:
>>>
>>>   Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-3.1-20180925' into staging (2018-09-25 13:30:45 +0100)
>>>
>>> are available in the Git repository at:
>>>
>>>   https://git.xanclic.moe/XanClic/qemu.git tags/pull-block-2018-09-25
>>>
>>> for you to fetch changes up to 9c76ff9c16be890e70fce30754b096ff9950d1ee:
>>>
>>>   Merge remote-tracking branch 'kevin/tags/for-upstream' into block (2018-09-25 16:12:44 +0200)
>>>
>>> ----------------------------------------------------------------
>>> Block layer patches:
>>> - Drain fixes
>>> - node-name parameters for block-commit
>>> - Refactor block jobs to use transactional callbacks for exiting
>>>
>>> ----------------------------------------------------------------
>>
>> Applied, thanks.
>
> I'm finding that test-bdrv-drain hangs intermittently on my OSX host.

Here's a hang from my aarch32 Linux build:
(gdb) thread apply all bt

Thread 4 (Thread 0xe68fe040 (LWP 13198)):
#0  __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
#1  0xec79ec0a in __GI_ppoll (fds=0xe5f015c0, nfds=1, timeout=<optimised out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:50
#2  0x000b28ee in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimised out>, __fds=<optimised out>) at /usr/include/arm-linux-gnueabihf/bits/poll2.h:77
#3  qemu_poll_ns (fds=<optimised out>, nfds=<optimised out>, timeout=<optimised out>) at /home/peter.maydell/qemu/util/qemu-timer.c:322
#4  0x000b3dc4 in aio_poll (ctx=0xe5f00470, blocking=<optimised out>) at /home/peter.maydell/qemu/util/aio-posix.c:645
#5  0x00091230 in iothread_run (opaque=0x149e3a0) at /home/peter.maydell/qemu/tests/iothread.c:51
#6  0xec8035b4 in start_thread (arg=0x0) at pthread_create.c:335
#7  0xec7a4bec in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:89 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 3 (Thread 0xe72e9040 (LWP 13197)):
#0  __libc_do_syscall () at ../sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:46
#1  0xec79ec0a in __GI_ppoll (fds=0xe69016c0, nfds=1, timeout=<optimised out>, timeout@entry=0x0, sigmask=sigmask@entry=0x0) at ../sysdeps/unix/sysv/linux/ppoll.c:50
#2  0x000b28ee in ppoll (__ss=0x0, __timeout=0x0, __nfds=<optimised out>, __fds=<optimised out>) at /usr/include/arm-linux-gnueabihf/bits/poll2.h:77
#3  qemu_poll_ns (fds=<optimised out>, nfds=<optimised out>, timeout=<optimised out>) at /home/peter.maydell/qemu/util/qemu-timer.c:322
#4  0x000b3dc4 in aio_poll (ctx=0xe6900470, blocking=<optimised out>) at /home/peter.maydell/qemu/util/aio-posix.c:645
#5  0x00091230 in iothread_run (opaque=0x149e268) at /home/peter.maydell/qemu/tests/iothread.c:51
#6  0xec8035b4 in start_thread (arg=0x0) at pthread_create.c:335
#7  0xec7a4bec in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:89 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 2 (Thread 0xebd2b040 (LWP 13196)):
#0  syscall () at ../sysdeps/unix/sysv/linux/arm/syscall.S:37
#1  0x000b578e in qemu_futex_wait (val=<optimised out>, f=<optimised out>) at /home/peter.maydell/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x1169c4 <rcu_call_ready_event>) at /home/peter.maydell/qemu/util/qemu-thread-posix.c:442
#3  0x000bfc54 in call_rcu_thread (opaque=<optimised out>) at /home/peter.maydell/qemu/util/rcu.c:261
#4  0xec8035b4 in start_thread (arg=0x0) at pthread_create.c:335
#5  0xec7a4bec in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:89 from /lib/arm-linux-gnueabihf/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thread 1 (Thread 0xebd2e000 (LWP 13194)):
#0  syscall () at ../sysdeps/unix/sysv/linux/arm/syscall.S:37
#1  0x000b578e in qemu_futex_wait (val=<optimised out>, f=<optimised out>) at /home/peter.maydell/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x115644 <done_event>) at /home/peter.maydell/qemu/util/qemu-thread-posix.c:442
#3  0x0001e89c in test_iothread_common (drain_type=drain_type@entry=BDRV_DRAIN_ALL, drain_thread=drain_thread@entry=0) at /home/peter.maydell/qemu/tests/test-bdrv-drain.c:733
#4  0x0001eab2 in test_iothread_drain_all () at /home/peter.maydell/qemu/tests/test-bdrv-drain.c:768
#5  0xeca3894c in ?? () from /lib/arm-linux-gnueabihf/libglib-2.0.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

thanks
-- PMM

Re: [Qemu-devel] [PULL 00/42] Block patches
Posted by Peter Maydell 5 years, 6 months ago
On 28 September 2018 at 15:36, Peter Maydell <peter.maydell@linaro.org> wrote:
> I'm finding that test-bdrv-drain hangs intermittently on my OSX host.

Ping? Between this and test-replication I'm finding that my
parallel build tests for merges are failing about 50% of the
time :-(

If there's no immediate sign of a fix, could we disable these
tests?

thanks
-- PMM

Re: [Qemu-devel] [PULL 00/42] Block patches
Posted by Kevin Wolf 5 years, 6 months ago
On 01.10.2018 at 15:03, Peter Maydell wrote:
> On 28 September 2018 at 15:36, Peter Maydell <peter.maydell@linaro.org> wrote:
> > I'm finding that test-bdrv-drain hangs intermittently on my OSX host.
> 
> Ping? Between this and test-replication I'm finding that my
> parallel build tests for merges are failing about 50% of the
> time :-(

Sorry, there wasn't much more than a weekend between your report and
now.

For the replication one, I think we can just take the AioContext lock in
the test case while we decide how the API should really be used. I'll
prepare a fix for that (and hopefully I'll be able to reproduce the
problem reliably enough to verify the fix).

Max said he could reproduce some hang in test-bdrv-drain (though we
don't know if this has anything to do with your OS X hang, which looked
rather odd) and would look into it, but I don't think we know the
problem yet. I'll try to reproduce that one after fixing the replication
test.

> If there's no immediate sign of a fix, could we disable these tests?

I hope we don't have to, though it depends on your definition of
immediate. :-)

Kevin

Re: [Qemu-devel] [Qemu-block] [PULL 00/42] Block patches
Posted by Kevin Wolf 5 years, 6 months ago
On 01.10.2018 at 16:14, Kevin Wolf wrote:
> On 01.10.2018 at 15:03, Peter Maydell wrote:
> > On 28 September 2018 at 15:36, Peter Maydell <peter.maydell@linaro.org> wrote:
> > > I'm finding that test-bdrv-drain hangs intermittently on my OSX host.
> > 
> > Ping? Between this and test-replication I'm finding that my
> > parallel build tests for merges are failing about 50% of the
> > time :-(
> 
> Sorry, there wasn't much more than a weekend between your report and
> now.
> 
> For the replication one, I think we can just take the AioContext lock in
> the test case while we decide how the API should really be used. I'll
> prepare a fix for that (and hopefully I'll be able to reproduce the
> problem reliably enough to verify the fix).
> 
> Max said he could reproduce some hang in test-bdrv-drain (though we
> don't know if this has anything to do with your OS X hang, which looked
> rather odd) and would look into it, but I don't think we know the
> problem yet. I'll try to reproduce that one after fixing the replication
> test.

So I sent two patches for the two test cases that should fix the bugs
that made the tests fail relatively frequently. I can still reproduce
another hang, which is a bit mysterious to me:

Thread 2 (Thread 3321.3818):
#0  0x00007f2ebbdcc4e9 in syscall () from /lib64/libc.so.6
#1  0x00005594d095690b in qemu_futex_wait (val=<optimized out>, f=<optimized out>) at /home/kwolf/source/qemu/include/qemu/futex.h:29
#2  qemu_event_wait (ev=ev@entry=0x5594d0bff228 <rcu_call_ready_event>) at util/qemu-thread-posix.c:442
#3  0x00005594d0965f58 in call_rcu_thread (opaque=<optimized out>) at util/rcu.c:261
#4  0x00007f2ebc09d36d in start_thread () from /lib64/libpthread.so.0
#5  0x00007f2ebbdd1b4f in clone () from /lib64/libc.so.6

Thread 1 (Thread 3321.3321):
#0  0x00007f2ebc09e89d in pthread_join () from /lib64/libpthread.so.0
#1  0x00005594d0956b6f in qemu_thread_join (thread=thread@entry=0x5594d16bd0b8) at util/qemu-thread-posix.c:565
#2  0x00005594d091f4d9 in iothread_join (iothread=0x5594d16bd0b0) at tests/iothread.c:62
#3  0x00005594d08806cc in test_iothread_common (drain_type=BDRV_DRAIN_ALL, drain_thread=<optimized out>) at tests/test-bdrv-drain.c:763
#4  0x00007f2ebd58e178 in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#5  0x00007f2ebd58e37b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#6  0x00007f2ebd58e37b in g_test_run_suite_internal () from /lib64/libglib-2.0.so.0
#7  0x00007f2ebd58e51b in g_test_run_suite () from /lib64/libglib-2.0.so.0
#8  0x00007f2ebd58e571 in g_test_run () from /lib64/libglib-2.0.so.0
#9  0x00005594d087a534 in main (argc=<optimized out>, argv=<optimized out>) at tests/test-bdrv-drain.c:1606

This pthread_join() is waiting for a thread that doesn't even exist any
more. I caught the bug in rr and can clearly see how the iothread is
notified and terminates, but pthread_join() just doesn't return.

Kevin