[Qemu-devel] [PATCH for 2.9 v3 00/10] block: Fixes regarding dataplane and management operations

Fam Zheng posted 10 patches 7 years ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20170410150542.30376-1-famz@redhat.com
Test checkpatch passed
Test docker passed
Test s390x passed
block.c                    | 17 +++++++++++++++--
block/block-backend.c      |  4 ++--
block/io.c                 | 34 ++++++++++++++++++----------------
blockjob.c                 |  4 ++--
include/block/aio.h        | 18 ++++++++++++++++++
include/block/block.h      | 27 +++++++++++++++++++++++++++
include/qemu/coroutine.h   |  5 +++++
qemu-io-cmds.c             |  2 +-
tests/qemu-iotests/109.out | 10 +++++-----
tests/test-blockjob-txn.c  |  6 +++++-
util/async.c               | 14 +++++++++++++-
util/qemu-coroutine.c      | 11 ++++++++---
util/trace-events          |  2 +-
13 files changed, 120 insertions(+), 34 deletions(-)
[Qemu-devel] [PATCH for 2.9 v3 00/10] block: Fixes regarding dataplane and management operations
Posted by Fam Zheng 7 years ago
v3: Respin the unmerged changes from v2 and include one new fix:

    (Yes, it is a big series for the last -rc, and I personally prefer the v2
    approach for the 4-9 part of the problem, which is much more mechanical.)

    - 1, 2 are redoing previous patch 4, using bdrv_parent_drained_begin/end.
      [Kevin]
      Also fix the ordering against aio_context_release. [Stefan]
    - 3 is unchanged from patch 6 in v2.
    - 4-9 are reworking of patch 5 following Paolo's suggestion, which allowed
      better patch split.
    - 10 is finding of a latent bug, which is revealed by patch 9.

v2: - Drop patch 4 in v1. A second thought made me feel neither it nor Kevin's
      suggestion to move the BH process to bdrv_drain_recurse/BDRV_POLL_WHILE
      is a complete fix. So leave it for a separate patch.
    - Add rev-by to patches 1, 3, 4.
    - Split from patch 1 in v1 and add patch 2, for the new assertions. [Kevin]
    - Rewrite patch 5. Fix block job's co when a BDS is moved to a different
      aio context. [Kevin]
    - Add patch 6.

Crashes are reported on dataplane devices when doing snapshot and commit under
guest I/O.

With this series, Ed's test case '176' now passes:

    https://github.com/skyportsystems/qemu-1/commits/eswierk-iotests-2.9

Fam Zheng (10):
  block: Make bdrv_parent_drained_begin/end public
  block: Quiesce old aio context during bdrv_set_aio_context
  tests/block-job-txn: Don't start block job before adding to txn
  coroutine: Extract qemu_aio_coroutine_enter
  async: Introduce aio_co_enter and aio_co_enter_if_inactive
  block: Introduce bdrv_coroutine_enter and *_if_inactive
  blockjob: Use bdrv_coroutine_enter to start coroutine
  qemu-io-cmds: Use bdrv_coroutine_enter
  block: Use bdrv_coroutine_enter to start I/O coroutines
  block: Fix bdrv_co_flush early return

 block.c                    | 17 +++++++++++++++--
 block/block-backend.c      |  4 ++--
 block/io.c                 | 34 ++++++++++++++++++----------------
 blockjob.c                 |  4 ++--
 include/block/aio.h        | 18 ++++++++++++++++++
 include/block/block.h      | 27 +++++++++++++++++++++++++++
 include/qemu/coroutine.h   |  5 +++++
 qemu-io-cmds.c             |  2 +-
 tests/qemu-iotests/109.out | 10 +++++-----
 tests/test-blockjob-txn.c  |  6 +++++-
 util/async.c               | 14 +++++++++++++-
 util/qemu-coroutine.c      | 11 ++++++++---
 util/trace-events          |  2 +-
 13 files changed, 120 insertions(+), 34 deletions(-)

-- 
2.9.3


Re: [Qemu-devel] [PATCH for 2.9 v3 00/10] block: Fixes regarding dataplane and management operations
Posted by Stefan Hajnoczi 7 years ago
On Mon, Apr 10, 2017 at 11:05:32PM +0800, Fam Zheng wrote:
> v3: Respin the unmerged changes from v2 and include one new fix:
> 
>     (Yes, it is a big series for the last -rc, and I personally prefer the v2
>     approach for the 4-9 part of the problem, which is much more mechanical.)
> 
>     - 1, 2 are redoing previous patch 4, using bdrv_parent_drained_begin/end.
>       [Kevin]
>       Also fix the ordering against aio_context_release. [Stefan]
>     - 3 is unchanged from patch 6 in v2.
>     - 4-9 are reworking of patch 5 following Paolo's suggestion, which allowed
>       better patch split.
>     - 10 is finding of a latent bug, which is revealed by patch 9.
> 
> v2: - Drop patch 4 in v1. A second thought made me feel neither it nor Kevin's
>       suggestion to move the BH process to bdrv_drain_recurse/BDRV_POLL_WHILE
>       is a complete fix. So leave it for a separate patch.
>     - Add rev-by to patches 1, 3, 4.
>     - Split from patch 1 in v1 and add patch 2, for the new assertions. [Kevin]
>     - Rewrite patch 5. Fix block job's co when a BDS is moved to a different
>       aio context. [Kevin]
>     - Add patch 6.
> 
> Crashes are reported on dataplane devices when doing snapshot and commit under
> guest I/O.
> 
> With this series, Ed's test case '176' now passes:
> 
>     https://github.com/skyportsystems/qemu-1/commits/eswierk-iotests-2.9
> 
> Fam Zheng (10):
>   block: Make bdrv_parent_drained_begin/end public
>   block: Quiesce old aio context during bdrv_set_aio_context
>   tests/block-job-txn: Don't start block job before adding to txn
>   coroutine: Extract qemu_aio_coroutine_enter
>   async: Introduce aio_co_enter and aio_co_enter_if_inactive
>   block: Introduce bdrv_coroutine_enter and *_if_inactive
>   blockjob: Use bdrv_coroutine_enter to start coroutine
>   qemu-io-cmds: Use bdrv_coroutine_enter
>   block: Use bdrv_coroutine_enter to start I/O coroutines
>   block: Fix bdrv_co_flush early return
> 
>  block.c                    | 17 +++++++++++++++--
>  block/block-backend.c      |  4 ++--
>  block/io.c                 | 34 ++++++++++++++++++----------------
>  blockjob.c                 |  4 ++--
>  include/block/aio.h        | 18 ++++++++++++++++++
>  include/block/block.h      | 27 +++++++++++++++++++++++++++
>  include/qemu/coroutine.h   |  5 +++++
>  qemu-io-cmds.c             |  2 +-
>  tests/qemu-iotests/109.out | 10 +++++-----
>  tests/test-blockjob-txn.c  |  6 +++++-
>  util/async.c               | 14 +++++++++++++-
>  util/qemu-coroutine.c      | 11 ++++++++---
>  util/trace-events          |  2 +-
>  13 files changed, 120 insertions(+), 34 deletions(-)

We need a fix for 2.9.  Interactions between AioContexts, coroutines,
and blockjobs are complex and brittle.  That is a pre-existing problem
though and hopefully we can simplify abstractions in 2.10.

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Re: [Qemu-devel] [PATCH for 2.9 v3 00/10] block: Fixes regarding dataplane and management operations
Posted by Kevin Wolf 7 years ago
Am 10.04.2017 um 17:05 hat Fam Zheng geschrieben:
> v3: Respin the unmerged changes from v2 and include one new fix:
> 
>     (Yes, it is a big series for the last -rc, and I personally prefer the v2
>     approach for the 4-9 part of the problem, which is much more mechanical.)
> 
>     - 1, 2 are redoing previous patch 4, using bdrv_parent_drained_begin/end.
>       [Kevin]
>       Also fix the ordering against aio_context_release. [Stefan]
>     - 3 is unchanged from patch 6 in v2.
>     - 4-9 are reworking of patch 5 following Paolo's suggestion, which allowed
>       better patch split.
>     - 10 is finding of a latent bug, which is revealed by patch 9.
> 
> v2: - Drop patch 4 in v1. A second thought made me feel neither it nor Kevin's
>       suggestion to move the BH process to bdrv_drain_recurse/BDRV_POLL_WHILE
>       is a complete fix. So leave it for a separate patch.
>     - Add rev-by to patches 1, 3, 4.
>     - Split from patch 1 in v1 and add patch 2, for the new assertions. [Kevin]
>     - Rewrite patch 5. Fix block job's co when a BDS is moved to a different
>       aio context. [Kevin]
>     - Add patch 6.
> 
> Crashes are reported on dataplane devices when doing snapshot and commit under
> guest I/O.
> 
> With this series, Ed's test case '176' now passes:
> 
>     https://github.com/skyportsystems/qemu-1/commits/eswierk-iotests-2.9

I had only two points for this series. The first is that it adds unused
functions, which doesn't hurt (but I might just send a PATCH 11/10 to
remove them again). The second one is that some sheepdog code is
suspicious, but if anything it just means that this series is incomplete,
so not a show stopper either.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

Re: [Qemu-devel] [PATCH for 2.9 v3 00/10] block: Fixes regarding dataplane and management operations
Posted by Fam Zheng 7 years ago
On Tue, 04/11 13:05, Kevin Wolf wrote:
> Am 10.04.2017 um 17:05 hat Fam Zheng geschrieben:
> > v3: Respin the unmerged changes from v2 and include one new fix:
> > 
> >     (Yes, it is a big series for the last -rc, and I personally prefer the v2
> >     approach for the 4-9 part of the problem, which is much more mechanical.)
> > 
> >     - 1, 2 are redoing previous patch 4, using bdrv_parent_drained_begin/end.
> >       [Kevin]
> >       Also fix the ordering against aio_context_release. [Stefan]
> >     - 3 is unchanged from patch 6 in v2.
> >     - 4-9 are reworking of patch 5 following Paolo's suggestion, which allowed
> >       better patch split.
> >     - 10 is finding of a latent bug, which is revealed by patch 9.
> > 
> > v2: - Drop patch 4 in v1. A second thought made me feel neither it nor Kevin's
> >       suggestion to move the BH process to bdrv_drain_recurse/BDRV_POLL_WHILE
> >       is a complete fix. So leave it for a separate patch.
> >     - Add rev-by to patches 1, 3, 4.
> >     - Split from patch 1 in v1 and add patch 2, for the new assertions. [Kevin]
> >     - Rewrite patch 5. Fix block job's co when a BDS is moved to a different
> >       aio context. [Kevin]
> >     - Add patch 6.
> > 
> > Crashes are reported on dataplane devices when doing snapshot and commit under
> > guest I/O.
> > 
> > With this series, Ed's test case '176' now passes:
> > 
> >     https://github.com/skyportsystems/qemu-1/commits/eswierk-iotests-2.9
> 
> I had only two points for this series. The first is that it adds unused
> functions, which doesn't hurt (but I might just send a PATCH 11/10 to
> remove them again). The second one is that some sheepdog code is
> suspicious, but if anything it just means that this series is incomplete,
> so not a show stopper either.
> 
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>

Thanks, I'll squash in the removal patch and send a pull request for 2.9.

Fam