[Qemu-devel] [PATCH v2 0/9] blockdev: fix QMP 'transaction' with IOThreads

Stefan Hajnoczi posted 9 patches 6 years, 4 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20171206144550.22295-1-stefanha@redhat.com
Test checkpatch passed
Test docker passed
Test ppc passed
Test s390x passed
qapi/block-core.json       |  36 +++++++
include/sysemu/iothread.h  |   1 +
blockdev.c                 | 258 +++++++++++++++++++++++++++++++++------------
iothread.c                 |   7 ++
tests/qemu-iotests/202     |  95 +++++++++++++++++
tests/qemu-iotests/202.out |  11 ++
tests/qemu-iotests/group   |   1 +
7 files changed, 339 insertions(+), 70 deletions(-)
create mode 100755 tests/qemu-iotests/202
create mode 100644 tests/qemu-iotests/202.out
[Qemu-devel] [PATCH v2 0/9] blockdev: fix QMP 'transaction' with IOThreads
Posted by Stefan Hajnoczi 6 years, 4 months ago
v2:
 * Use StrOrNull for x-blockdev-set-iothread iothread argument [eblake]

(This is for QEMU 2.12 because this bug is not -rc4 critical)

Previously AioContext was held across QMP 'transaction' in an attempt to
achieve bdrv_drained_begin/end() semantics.  Nowadays we have
bdrv_drained_begin/end() and the AioContext lock just protects state.
Therefore there is no reason to hold AioContext across
.prepare/.commit/.abort/.clean() anymore.

Besides being cleanup-worthy, holding AioContext also exposes a bug:
BDRV_POLL_WHILE() doesn't support recursive AioContext locking and will hang if
depth > 1.  This is easy to trigger by submitting a transaction with 2 actions
that touch two drives assigned to an IOThread.  The IOThread's AioContext will
be locked twice and BDRV_POLL_WHILE() will hang.

BDRV_POLL_WHILE() is best fixed by eliminating the AioContext lock entirely in
favor of fine-grained locking.  I discussed some fixes for BDRV_POLL_WHILE()
with Paolo but we came to the conclusion that it will just add complexity when
we really want to stop using AioContext locking.

Summary:

 * Patch 1 fixes missing AioContext lock protection

 * Patches 2-6 clean up excessive AioContext locked regions in QMP
   'transaction' to solve the hang

 * Patches 7-9 add a qemu-iotests test case and the necessary infrastructure

Stefan Hajnoczi (9):
  blockdev: hold AioContext for bdrv_unref() in
    external_snapshot_clean()
  block: don't keep AioContext acquired after
    external_snapshot_prepare()
  block: don't keep AioContext acquired after drive_backup_prepare()
  block: don't keep AioContext acquired after blockdev_backup_prepare()
  block: don't keep AioContext acquired after
    internal_snapshot_prepare()
  block: drop unused BlockDirtyBitmapState->aio_context field
  iothread: add iothread_by_id() API
  blockdev: add x-blockdev-set-iothread testing command
  qemu-iotests: add 202 external snapshots IOThread test

 qapi/block-core.json       |  36 +++++++
 include/sysemu/iothread.h  |   1 +
 blockdev.c                 | 258 +++++++++++++++++++++++++++++++++------------
 iothread.c                 |   7 ++
 tests/qemu-iotests/202     |  95 +++++++++++++++++
 tests/qemu-iotests/202.out |  11 ++
 tests/qemu-iotests/group   |   1 +
 7 files changed, 339 insertions(+), 70 deletions(-)
 create mode 100755 tests/qemu-iotests/202
 create mode 100644 tests/qemu-iotests/202.out

-- 
2.14.3


Re: [Qemu-devel] [Qemu-block] [PATCH v2 0/9] blockdev: fix QMP 'transaction' with IOThreads
Posted by Stefan Hajnoczi 6 years, 4 months ago
On Wed, Dec 06, 2017 at 02:45:41PM +0000, Stefan Hajnoczi wrote:
> v2:
>  * Use StrOrNull for x-blockdev-set-iothread iothread argument [eblake]

Thanks for the reviews, Eric.

I will wait for one more block layer person (Kevin, Paolo, or Fam?) to
review before merging.

Stefan
Re: [Qemu-devel] [PATCH v2 0/9] blockdev: fix QMP 'transaction' with IOThreads
Posted by Paolo Bonzini 6 years, 4 months ago
On 06/12/2017 15:45, Stefan Hajnoczi wrote:
> v2:
>  * Use StrOrNull for x-blockdev-set-iothread iothread argument [eblake]
> 
> (This is for QEMU 2.12 because this bug is not -rc4 critical)
> 
> Previously AioContext was held across QMP 'transaction' in an attempt to
> achieve bdrv_drained_begin/end() semantics.  Nowadays we have
> bdrv_drained_begin/end() and the AioContext lock just protects state.
> Therefore there is no reason to hold AioContext across
> .prepare/.commit/.abort/.clean() anymore.
> 
> Besides being cleanup-worthy, holding AioContext also exposes a bug:
> BDRV_POLL_WHILE() doesn't support recursive AioContext locking and will hang if
> depth > 1.  This is easy to trigger by submitting a transaction with 2 actions
> that touch two drives assigned to an IOThread.  The IOThread's AioContext will
> be locked twice and BDRV_POLL_WHILE() will hang.
> 
> BDRV_POLL_WHILE() is best fixed by eliminating the AioContext lock entirely in
> favor of fine-grained locking.  I discussed some fixes for BDRV_POLL_WHILE()
> with Paolo but we came to the conclusion that it will just add complexity when
> we really want to stop using AioContext locking.
> 
> Summary:
> 
>  * Patch 1 fixes missing AioContext lock protection
> 
>  * Patches 2-6 clean up excessive AioContext locked regions in QMP
>    'transaction' to solve the hang
> 
>  * Patches 7-9 add a qemu-iotests test case and the necessary infrastructure
> 
> Stefan Hajnoczi (9):
>   blockdev: hold AioContext for bdrv_unref() in
>     external_snapshot_clean()
>   block: don't keep AioContext acquired after
>     external_snapshot_prepare()
>   block: don't keep AioContext acquired after drive_backup_prepare()
>   block: don't keep AioContext acquired after blockdev_backup_prepare()
>   block: don't keep AioContext acquired after
>     internal_snapshot_prepare()
>   block: drop unused BlockDirtyBitmapState->aio_context field
>   iothread: add iothread_by_id() API
>   blockdev: add x-blockdev-set-iothread testing command
>   qemu-iotests: add 202 external snapshots IOThread test
> 
>  qapi/block-core.json       |  36 +++++++
>  include/sysemu/iothread.h  |   1 +
>  blockdev.c                 | 258 +++++++++++++++++++++++++++++++++------------
>  iothread.c                 |   7 ++
>  tests/qemu-iotests/202     |  95 +++++++++++++++++
>  tests/qemu-iotests/202.out |  11 ++
>  tests/qemu-iotests/group   |   1 +
>  7 files changed, 339 insertions(+), 70 deletions(-)
>  create mode 100755 tests/qemu-iotests/202
>  create mode 100644 tests/qemu-iotests/202.out
> 

Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>

Re: [Qemu-devel] [PATCH v2 0/9] blockdev: fix QMP 'transaction' with IOThreads
Posted by Stefan Hajnoczi 6 years, 4 months ago
On Wed, Dec 06, 2017 at 02:45:41PM +0000, Stefan Hajnoczi wrote:
> v2:
>  * Use StrOrNull for x-blockdev-set-iothread iothread argument [eblake]
> 
> (This is for QEMU 2.12 because this bug is not -rc4 critical)
> 
> Previously AioContext was held across QMP 'transaction' in an attempt to
> achieve bdrv_drained_begin/end() semantics.  Nowadays we have
> bdrv_drained_begin/end() and the AioContext lock just protects state.
> Therefore there is no reason to hold AioContext across
> .prepare/.commit/.abort/.clean() anymore.
> 
> Besides being cleanup-worthy, holding AioContext also exposes a bug:
> BDRV_POLL_WHILE() doesn't support recursive AioContext locking and will hang if
> depth > 1.  This is easy to trigger by submitting a transaction with 2 actions
> that touch two drives assigned to an IOThread.  The IOThread's AioContext will
> be locked twice and BDRV_POLL_WHILE() will hang.
> 
> BDRV_POLL_WHILE() is best fixed by eliminating the AioContext lock entirely in
> favor of fine-grained locking.  I discussed some fixes for BDRV_POLL_WHILE()
> with Paolo but we came to the conclusion that it will just add complexity when
> we really want to stop using AioContext locking.
> 
> Summary:
> 
>  * Patch 1 fixes missing AioContext lock protection
> 
>  * Patches 2-6 clean up excessive AioContext locked regions in QMP
>    'transaction' to solve the hang
> 
>  * Patches 7-9 add a qemu-iotests test case and the necessary infrastructure
> 
> Stefan Hajnoczi (9):
>   blockdev: hold AioContext for bdrv_unref() in
>     external_snapshot_clean()
>   block: don't keep AioContext acquired after
>     external_snapshot_prepare()
>   block: don't keep AioContext acquired after drive_backup_prepare()
>   block: don't keep AioContext acquired after blockdev_backup_prepare()
>   block: don't keep AioContext acquired after
>     internal_snapshot_prepare()
>   block: drop unused BlockDirtyBitmapState->aio_context field
>   iothread: add iothread_by_id() API
>   blockdev: add x-blockdev-set-iothread testing command
>   qemu-iotests: add 202 external snapshots IOThread test
> 
>  qapi/block-core.json       |  36 +++++++
>  include/sysemu/iothread.h  |   1 +
>  blockdev.c                 | 258 +++++++++++++++++++++++++++++++++------------
>  iothread.c                 |   7 ++
>  tests/qemu-iotests/202     |  95 +++++++++++++++++
>  tests/qemu-iotests/202.out |  11 ++
>  tests/qemu-iotests/group   |   1 +
>  7 files changed, 339 insertions(+), 70 deletions(-)
>  create mode 100755 tests/qemu-iotests/202
>  create mode 100644 tests/qemu-iotests/202.out
> 
> -- 
> 2.14.3
> 

Thanks, applied to my block-next tree:
https://github.com/stefanha/qemu/commits/block-next

Stefan
Re: [Qemu-devel] [PATCH v2 0/9] blockdev: fix QMP 'transaction' with IOThreads
Posted by Kevin Wolf 6 years, 4 months ago
Am 06.12.2017 um 15:45 hat Stefan Hajnoczi geschrieben:
> v2:
>  * Use StrOrNull for x-blockdev-set-iothread iothread argument [eblake]
> 
> (This is for QEMU 2.12 because this bug is not -rc4 critical)
> 
> Previously AioContext was held across QMP 'transaction' in an attempt to
> achieve bdrv_drained_begin/end() semantics.  Nowadays we have
> bdrv_drained_begin/end() and the AioContext lock just protects state.
> Therefore there is no reason to hold AioContext across
> .prepare/.commit/.abort/.clean() anymore.
> 
> Besides being cleanup-worthy, holding AioContext also exposes a bug:
> BDRV_POLL_WHILE() doesn't support recursive AioContext locking and will hang if
> depth > 1.  This is easy to trigger by submitting a transaction with 2 actions
> that touch two drives assigned to an IOThread.  The IOThread's AioContext will
> be locked twice and BDRV_POLL_WHILE() will hang.
> 
> BDRV_POLL_WHILE() is best fixed by eliminating the AioContext lock entirely in
> favor of fine-grained locking.  I discussed some fixes for BDRV_POLL_WHILE()
> with Paolo but we came to the conclusion that it will just add complexity when
> we really want to stop using AioContext locking.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

The only thing that I see that could use some improvements is the test
case, which tests only a small part of the bugs that are fixed by this
series. This doesn't invalidate any of the review, but I think without
full test coverage, it's incomplete.

Kevin