[Qemu-devel] [PULL 00/23] Block layer patches

Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20181001171901.11004-1-kwolf@redhat.com
Test docker-clang@ubuntu failed
Test checkpatch passed
There is a newer version of this series
[Qemu-devel] [PULL 00/23] Block layer patches
Posted by Kevin Wolf 6 years, 6 months ago
The following changes since commit 07f426c35eddd79388a23d11cb278600d7e3831d:

  Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20180926' into staging (2018-09-28 18:56:09 +0100)

are available in the git repository at:

  git://repo.or.cz/qemu/kevin.git tags/for-upstream

for you to fetch changes up to dd353157942a59c21da07da5ac8749a871f7c3ed:

  tests/test-bdrv-drain: Fix too late qemu_event_reset() (2018-10-01 19:13:55 +0200)

----------------------------------------------------------------
Block layer patches:

- qcow2 cache option default changes (Linux: 32 MB maximum, limited by
  whatever cache size the specific image can actually make use of;
  default cache-clean-interval of 10 minutes)
- reopen: Allow specifying unchanged child node references, and changing
  a few generic options (discard, detect-zeroes)
- Fix werror/rerror defaults for -device drive=<node-name>
- Test case fixes

----------------------------------------------------------------
Alberto Garcia (9):
      qemu-io: Fix writethrough check in reopen
      file-posix: x-check-cache-dropped should default to false on reopen
      block: Remove child references from bs->{options,explicit_options}
      block: Don't look for child references in append_open_options()
      block: Allow child references on reopen
      block: Forbid trying to change unsupported options during reopen
      file-posix: Forbid trying to change unsupported options during reopen
      block: Allow changing 'discard' on reopen
      block: Allow changing 'detect-zeroes' on reopen

Fam Zheng (1):
      file-posix: Include filename in locking error message

Kevin Wolf (3):
      block-backend: Set werror/rerror defaults in blk_new()
      test-replication: Lock AioContext around blk_unref()
      tests/test-bdrv-drain: Fix too late qemu_event_reset()

Leonid Bloch (10):
      qcow2: Options' documentation fixes
      include: Add a lookup table of sizes
      qcow2: Make sizes more humanly readable
      qcow2: Avoid duplication in setting the refcount cache size
      qcow2: Assign the L2 cache relatively to the image size
      qcow2: Increase the default upper limit on the L2 cache size
      qcow2: Resize the cache upon image resizing
      qcow2: Set the default cache-clean-interval to 10 minutes
      qcow2: Explicit number replaced by a constant
      qcow2: Fix cache-clean-interval documentation

 qapi/block-core.json       |   4 +-
 docs/qcow2-cache.txt       |  59 ++++++++++++--------
 block/qcow2.h              |  19 ++++---
 include/block/block.h      |   1 +
 include/qemu/units.h       |  55 ++++++++++++++++++
 block.c                    | 135 +++++++++++++++++++++++++++++----------------
 block/block-backend.c      |   3 +
 block/file-posix.c         |  19 +++++--
 block/qcow2.c              |  43 +++++++++------
 qemu-io-cmds.c             |   2 +-
 tests/test-bdrv-drain.c    |   4 +-
 tests/test-replication.c   |  11 ++++
 qemu-options.hx            |  12 ++--
 tests/qemu-iotests/067.out |   1 +
 tests/qemu-iotests/137     |   8 ++-
 tests/qemu-iotests/137.out |   4 +-
 tests/qemu-iotests/153.out |  76 ++++++++++++-------------
 tests/qemu-iotests/182.out |   2 +-
 18 files changed, 307 insertions(+), 151 deletions(-)

Re: [Qemu-devel] [PULL 00/23] Block layer patches
Posted by Peter Maydell 6 years, 6 months ago
On 1 October 2018 at 18:18, Kevin Wolf <kwolf@redhat.com> wrote:
> The following changes since commit 07f426c35eddd79388a23d11cb278600d7e3831d:
>
>   Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20180926' into staging (2018-09-28 18:56:09 +0100)
>
> are available in the git repository at:
>
>   git://repo.or.cz/qemu/kevin.git tags/for-upstream
>
> for you to fetch changes up to dd353157942a59c21da07da5ac8749a871f7c3ed:
>
>   tests/test-bdrv-drain: Fix too late qemu_event_reset() (2018-10-01 19:13:55 +0200)
>
> ----------------------------------------------------------------
> Block layer patches:
>
> - qcow2 cache option default changes (Linux: 32 MB maximum, limited by
>   whatever cache size the specific image can actually make use of;
>   default cache-clean-interval of 10 minutes)
> - reopen: Allow specifying unchanged child node references, and changing
>   a few generic options (discard, detect-zeroes)
> - Fix werror/rerror defaults for -device drive=<node-name>
> - Test case fixes

I still got a hang on OSX on test-bdrv-drain, but I've applied
this anyway, since hopefully it fixes the other intermittent
failure and may reduce the likelihood of the test-bdrv-drain one.

thanks
-- PMM

Re: [Qemu-devel] [PULL 00/23] Block layer patches
Posted by Peter Maydell 6 years, 6 months ago
On 2 October 2018 at 09:06, Peter Maydell <peter.maydell@linaro.org> wrote:
> I still got a hang on OSX on test-bdrv-drain, but I've applied
> this anyway, since hopefully it fixes the other intermittent
> failure and may reduce the likelihood of the test-bdrv-drain one.

OSX seems to fail test-bdrv-drain fairly frequently. Here's
a backtrace from a debug build. When run under the debugger
it seems to stop with a NULL pointer failure in notifier_list_notify();
when not run under the debugger it seems to hang eating CPU...

/bdrv-drain/iothread/drain_subtree: Process 77283 stopped
* thread #12, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000000000000
error: memory read failed for 0x0
Target 1: (test-bdrv-drain) stopped.
(lldb) bt
* thread #12, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000000000000
    frame #1: 0x000000010016524f test-bdrv-drain`notifier_list_notify(list=0x0000700008501e50, data=0x0000000000000000) at notify.c:40
    frame #2: 0x0000000100150c92 test-bdrv-drain`qemu_thread_atexit_run(arg=0x0000000100b24f88) at qemu-thread-posix.c:473
    frame #3: 0x00007fff5a0e1163 libsystem_pthread.dylib`_pthread_tsd_cleanup + 463
    frame #4: 0x00007fff5a0e0ee9 libsystem_pthread.dylib`_pthread_exit + 79
    frame #5: 0x00007fff5a0df66c libsystem_pthread.dylib`_pthread_body + 351
    frame #6: 0x00007fff5a0df50d libsystem_pthread.dylib`_pthread_start + 377
    frame #7: 0x00007fff5a0debf9 libsystem_pthread.dylib`thread_start + 13
(lldb) info thread
error: 'info' is not a valid command.
error: Unrecognized command 'info'.
(lldb) thread backtrace all
  thread #1, queue = 'com.apple.main-thread'
    frame #0: 0x00007fff59f17d82 libsystem_kernel.dylib`__semwait_signal + 10
    frame #1: 0x00007fff5a0e3824 libsystem_pthread.dylib`_pthread_join + 626
    frame #2: 0x0000000100150f2a test-bdrv-drain`qemu_thread_join(thread=0x0000000103001058) at qemu-thread-posix.c:565
    frame #3: 0x00000001000f6d70 test-bdrv-drain`iothread_join(iothread=0x0000000103001050) at iothread.c:62
    frame #4: 0x000000010000a9a0 test-bdrv-drain`test_iothread_common(drain_type=BDRV_SUBTREE_DRAIN, drain_thread=1) at test-bdrv-drain.c:762
    frame #5: 0x000000010000789f test-bdrv-drain`test_iothread_drain_subtree at test-bdrv-drain.c:781
    frame #6: 0x00000001003aea47 libglib-2.0.0.dylib`g_test_run_suite_internal + 697
    frame #7: 0x00000001003aec0a libglib-2.0.0.dylib`g_test_run_suite_internal + 1148
    frame #8: 0x00000001003aec0a libglib-2.0.0.dylib`g_test_run_suite_internal + 1148
    frame #9: 0x00000001003ae020 libglib-2.0.0.dylib`g_test_run_suite + 121
    frame #10: 0x00000001003adf73 libglib-2.0.0.dylib`g_test_run + 17
    frame #11: 0x0000000100001dd0 test-bdrv-drain`main(argc=1, argv=0x00007ffeefbffa70) at test-bdrv-drain.c:1606
    frame #12: 0x00007fff59dc7015 libdyld.dylib`start + 1
  thread #2
    frame #0: 0x00007fff59f17a16 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff5a0e0589 libsystem_pthread.dylib`_pthread_cond_wait + 732
    frame #2: 0x0000000100150b5e test-bdrv-drain`qemu_futex_wait(ev=0x00000001001bbad8, val=4294967295) at qemu-thread-posix.c:347
    frame #3: 0x0000000100150acd test-bdrv-drain`qemu_event_wait(ev=0x00000001001bbad8) at qemu-thread-posix.c:442
    frame #4: 0x000000010016ca82 test-bdrv-drain`call_rcu_thread(opaque=0x0000000000000000) at rcu.c:261
    frame #5: 0x0000000100150e76 test-bdrv-drain`qemu_thread_start(args=0x0000000100b1dfb0) at qemu-thread-posix.c:504
    frame #6: 0x00007fff5a0df661 libsystem_pthread.dylib`_pthread_body + 340
    frame #7: 0x00007fff5a0df50d libsystem_pthread.dylib`_pthread_start + 377
    frame #8: 0x00007fff5a0debf9 libsystem_pthread.dylib`thread_start + 13
  thread #3
    frame #0: 0x00007fff59f1803a libsystem_kernel.dylib`__sigwait + 10
    frame #1: 0x00007fff5a0e1ad9 libsystem_pthread.dylib`sigwait + 61
    frame #2: 0x000000010014d781 test-bdrv-drain`sigwait_compat(opaque=0x0000000100b027d0) at compatfd.c:36
    frame #3: 0x0000000100150e76 test-bdrv-drain`qemu_thread_start(args=0x0000000100b1e560) at qemu-thread-posix.c:504
    frame #4: 0x00007fff5a0df661 libsystem_pthread.dylib`_pthread_body + 340
    frame #5: 0x00007fff5a0df50d libsystem_pthread.dylib`_pthread_start + 377
    frame #6: 0x00007fff5a0debf9 libsystem_pthread.dylib`thread_start + 13
* thread #12, stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x0000000000000000
    frame #1: 0x000000010016524f test-bdrv-drain`notifier_list_notify(list=0x0000700008501e50, data=0x0000000000000000) at notify.c:40
    frame #2: 0x0000000100150c92 test-bdrv-drain`qemu_thread_atexit_run(arg=0x0000000100b24f88) at qemu-thread-posix.c:473
    frame #3: 0x00007fff5a0e1163 libsystem_pthread.dylib`_pthread_tsd_cleanup + 463
    frame #4: 0x00007fff5a0e0ee9 libsystem_pthread.dylib`_pthread_exit + 79
    frame #5: 0x00007fff5a0df66c libsystem_pthread.dylib`_pthread_body + 351
    frame #6: 0x00007fff5a0df50d libsystem_pthread.dylib`_pthread_start + 377
    frame #7: 0x00007fff5a0debf9 libsystem_pthread.dylib`thread_start + 13
  thread #13
    frame #0: 0x00007fff59f17cf2 libsystem_kernel.dylib`__select + 10
    frame #1: 0x000000010039bb60 libglib-2.0.0.dylib`g_poll + 430
    frame #2: 0x0000000100149d7b test-bdrv-drain`qemu_poll_ns(fds=0x0000000100b25570, nfds=1, timeout=-1) at qemu-timer.c:337
    frame #3: 0x000000010014c609 test-bdrv-drain`aio_poll(ctx=0x0000000100b26330, blocking=true) at aio-posix.c:645
    frame #4: 0x00000001000f700f test-bdrv-drain`iothread_run(opaque=0x0000000100a03620) at iothread.c:51
    frame #5: 0x0000000100150e76 test-bdrv-drain`qemu_thread_start(args=0x0000000100a05240) at qemu-thread-posix.c:504
    frame #6: 0x00007fff5a0df661 libsystem_pthread.dylib`_pthread_body + 340
    frame #7: 0x00007fff5a0df50d libsystem_pthread.dylib`_pthread_start + 377
    frame #8: 0x00007fff5a0debf9 libsystem_pthread.dylib`thread_start + 13


As far as I can tell it always fails with
/bdrv-drain/iothread/drain_subtree, but this test
doesn't fail if we just run it alone, so something
earlier in the test is setting it up to go wrong.

I don't entirely understand what's going on with the
union in qemu_thread_atexit_run() (this seems to be
Paolo's code from a few years back), but the pointer
passed to qemu_thread_atexit_run() is a pointer to
zeroed memory:

(lldb) memory read -c 32 arg
0x100a25558: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x100a25568: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

which, when interpreted as a list head, means that
the iteration through the list gets a node with
NULLs in all its fields, and we end up calling NULL.

thanks
-- PMM