[Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections

Max Reitz posted 9 patches 6 years, 4 months ago
Test s390x passed
Test checkpatch passed
Test asan passed
Test docker-mingw@fedora passed
Test docker-clang@ubuntu passed
Test FreeBSD passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20190619152603.5937-1-mreitz@redhat.com
Maintainers: Fam Zheng <fam@euphon.net>, Stefan Hajnoczi <stefanha@redhat.com>, Max Reitz <mreitz@redhat.com>, Kevin Wolf <kwolf@redhat.com>
There is a newer version of this series
include/block/block.h      |  22 +++++-
include/block/block_int.h  |  23 ++++++
block.c                    |  24 +++---
block/io.c                 | 155 ++++++++++++++++++++++++++++++-------
python/qemu/__init__.py    |   5 +-
tests/qemu-iotests/040     |  40 +++++++++-
tests/qemu-iotests/040.out |   4 +-
tests/qemu-iotests/255     |   2 +-
8 files changed, 231 insertions(+), 44 deletions(-)
[Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections
Posted by Max Reitz 6 years, 4 months ago
Hi,

This is v2 to “block: Keep track of parent quiescing”.

Please read this cover letter, because I’m very unsure about the design
in this series and I’d appreciate some comments.

As Kevin wrote in his reply to that series, the actual problem is that
bdrv_drain_invoke() polls on every node whenever ending a drain.  This
may cause graph changes, which is actually forbidden.

To solve that problem, this series makes the drain code construct a list
of undrain operations that have been initiated, and then polls all of
them on the root level once graph changes are acceptable.

Note that I don’t like this list concept very much, so I’m open to
alternatives.

Furthermore, all BdrvChildRoles with BDS parents have a broken
.drained_end() implementation.  The documentation clearly states that
this function is not allowed to poll, but it does.  So this series
changes it to a variant (using the new code) that does not poll.

There is a catch, which may actually be a problem, I don’t know: The new
variant of that .drained_end() does not poll at all, never.  As
described above, now every bdrv_drain_invoke() returns an object that
describes when it will be done and which can thus be polled for.  These
objects are just discarded when using BdrvChildRole.drained_end().  That
does not feel quite right.  It would probably be more correct to let
BdrvChildRole.drained_end() return these objects so the top level
bdrv_drained_end() can poll for their completion.

I decided not to do this, for two reasons:
(1) Doing so would spill the “list of objects to poll for” design to
    places outside of block/io.c.  I don’t like the design very much as
    it is, but I can live with it as long as it’s constrained to the
    core drain code in block/io.c.
    This is made worse by the fact that currently, those objects are of
    type BdrvCoDrainData.  But it shouldn’t be a problem to add a new
    type that is externally visible (we only need the AioContext and
    whether bdrv_drain_invoke_entry() is done).

(2) It seems to work as it is.

The alternative would be to add the same GSList ** parameter to
BdrvChildRole.drained_end() that I added in the core drain code in patch
2, and then let the .drained_end() implementation fill that with objects
to poll for.  (Which would be accomplished by adding a frontend to
bdrv_do_drained_end() that lets bdrv_child_cb_drained_poll() pass the
parameter through.)

Opinions?


And then we have bdrv_replace_child_noperm(), which actually wants a
polling BdrvChildRole.drained_end().  So this series adds
BdrvChildRole.drained_end_unquiesce(), which takes that role (if there
is any polling to do).

Note that if I implemented the alternative described above
(.drained_end() gets said GSList ** parameter), a
.drained_end_unquiesce() wouldn’t be necessary.
bdrv_parent_drained_end_single() could just poll the list returned by
.drained_end() by itself.


Finally, patches 1, 8, and 9 are unmodified from v1.


git backport-diff against v1:

Key:
[----] : patches are identical
[####] : number of functional differences between upstream/downstream patch
[down] : patch is downstream-only
The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively

001/9:[----] [--] 'block: Introduce BdrvChild.parent_quiesce_counter'
002/9:[down] 'block: Add @data_objs to bdrv_drain_invoke()'
003/9:[down] 'block: Add bdrv_poll_drain_data_objs()'
004/9:[down] 'block: Move polling out of bdrv_drain_invoke()'
005/9:[down] 'block: Add @poll to bdrv_parent_drained_end_single()'
006/9:[down] 'block: Add bdrv_drained_end_no_poll()'
007/9:[down] 'block: Fix BDS children's .drained_end()'
008/9:[----] [--] 'iotests: Add @has_quit to vm.shutdown()'
009/9:[----] [--] 'iotests: Test commit with a filter on the chain'


Max Reitz (9):
  block: Introduce BdrvChild.parent_quiesce_counter
  block: Add @data_objs to bdrv_drain_invoke()
  block: Add bdrv_poll_drain_data_objs()
  block: Move polling out of bdrv_drain_invoke()
  block: Add @poll to bdrv_parent_drained_end_single()
  block: Add bdrv_drained_end_no_poll()
  block: Fix BDS children's .drained_end()
  iotests: Add @has_quit to vm.shutdown()
  iotests: Test commit with a filter on the chain

 include/block/block.h      |  22 +++++-
 include/block/block_int.h  |  23 ++++++
 block.c                    |  24 +++---
 block/io.c                 | 155 ++++++++++++++++++++++++++++++-------
 python/qemu/__init__.py    |   5 +-
 tests/qemu-iotests/040     |  40 +++++++++-
 tests/qemu-iotests/040.out |   4 +-
 tests/qemu-iotests/255     |   2 +-
 8 files changed, 231 insertions(+), 44 deletions(-)

-- 
2.21.0


Re: [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections
Posted by Kevin Wolf 6 years, 3 months ago
Am 19.06.2019 um 17:25 hat Max Reitz geschrieben:
> Hi,
> 
> This is v2 to “block: Keep track of parent quiescing”.
> 
> Please read this cover letter, because I’m very unsure about the design
> in this series and I’d appreciate some comments.
> 
> As Kevin wrote in his reply to that series, the actual problem is that
> bdrv_drain_invoke() polls on every node whenever ending a drain.  This
> may cause graph changes, which is actually forbidden.
> 
> To solve that problem, this series makes the drain code construct a list
> of undrain operations that have been initiated, and then polls all of
> them on the root level once graph changes are acceptable.
> 
> Note that I don’t like this list concept very much, so I’m open to
> alternatives.

So drain_end is different from drain_begin in that it wants to wait only
for all bdrv_drain_invoke() calls to complete, but not for other
requests that are in flight. Makes sense.

Though instead of managing a whole list, wouldn't a counter suffice?

> Furthermore, all BdrvChildRoles with BDS parents have a broken
> .drained_end() implementation.  The documentation clearly states that
> this function is not allowed to poll, but it does.  So this series
> changes it to a variant (using the new code) that does not poll.
> 
> There is a catch, which may actually be a problem, I don’t know: The new
> variant of that .drained_end() does not poll at all, never.  As
> described above, now every bdrv_drain_invoke() returns an object that
> describes when it will be done and which can thus be polled for.  These
> objects are just discarded when using BdrvChildRole.drained_end().  That
> does not feel quite right.  It would probably be more correct to let
> BdrvChildRole.drained_end() return these objects so the top level
> bdrv_drained_end() can poll for their completion.
> 
> I decided not to do this, for two reasons:
> (1) Doing so would spill the “list of objects to poll for” design to
>     places outside of block/io.c.  I don’t like the design very much as
>     it is, but I can live with it as long as it’s constrained to the
>     core drain code in block/io.c.
>     This is made worse by the fact that currently, those objects are of
>     type BdrvCoDrainData.  But it shouldn’t be a problem to add a new
>     type that is externally visible (we only need the AioContext and
>     whether bdrv_drain_invoke_entry() is done).
> 
> (2) It seems to work as it is.
> 
> The alternative would be to add the same GSList ** parameter to
> BdrvChildRole.drained_end() that I added in the core drain code in patch
> 2, and then let the .drained_end() implementation fill that with objects
> to poll for.  (Which would be accomplished by adding a frontend to
> bdrv_do_drained_end() that lets bdrv_child_cb_drained_poll() pass the
> parameter through.)
> 
> Opinions?

I think I would add an int* to BdrvChildRole.drained_end() so that we
can just increase the counter whereever we need to.

> And then we have bdrv_replace_child_noperm(), which actually wants a
> polling BdrvChildRole.drained_end().  So this series adds
> BdrvChildRole.drained_end_unquiesce(), which takes that role (if there
> is any polling to do).
> 
> Note that if I implemented the alternative described above
> (.drained_end() gets said GSList ** parameter), a
> .drained_end_unquiesce() wouldn’t be necessary.
> bdrv_parent_drained_end_single() could just poll the list returned by
> .drained_end() by itself.

The split between .drained_end/.drained_end_unquiesce feels wrong. It
shouldn't be the job of the BdrvChildRole to worry about this. Polling
should be handled inside bdrv_parent_drained_end_single(), like we do in
bdrv_parent_drained_begin_single(), so that the BdrvChildRole never has
to poll.

> Finally, patches 1, 8, and 9 are unmodified from v1.
> [...]
> 
>  include/block/block.h      |  22 +++++-
>  include/block/block_int.h  |  23 ++++++
>  block.c                    |  24 +++---
>  block/io.c                 | 155 ++++++++++++++++++++++++++++++-------
>  python/qemu/__init__.py    |   5 +-
>  tests/qemu-iotests/040     |  40 +++++++++-
>  tests/qemu-iotests/040.out |   4 +-
>  tests/qemu-iotests/255     |   2 +-
>  8 files changed, 231 insertions(+), 44 deletions(-)

I feel this series should add something to tests/test-bdrv-drain.c, too.
qemu-iotests can only test high-level block job commands that happen to
trigger the bug today, but that code may change in the future. Unit
tests allow us to test the problematic cases more directly.

Kevin

Re: [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections
Posted by Max Reitz 6 years, 3 months ago
On 16.07.19 16:40, Kevin Wolf wrote:
> Am 19.06.2019 um 17:25 hat Max Reitz geschrieben:
>> Hi,
>>
>> This is v2 to “block: Keep track of parent quiescing”.
>>
>> Please read this cover letter, because I’m very unsure about the design
>> in this series and I’d appreciate some comments.
>>
>> As Kevin wrote in his reply to that series, the actual problem is that
>> bdrv_drain_invoke() polls on every node whenever ending a drain.  This
>> may cause graph changes, which is actually forbidden.
>>
>> To solve that problem, this series makes the drain code construct a list
>> of undrain operations that have been initiated, and then polls all of
>> them on the root level once graph changes are acceptable.
>>
>> Note that I don’t like this list concept very much, so I’m open to
>> alternatives.
> 
> So drain_end is different from drain_begin in that it wants to wait only
> for all bdrv_drain_invoke() calls to complete, but not for other
> requests that are in flight. Makes sense.
> 
> Though instead of managing a whole list, wouldn't a counter suffice?
> 
>> Furthermore, all BdrvChildRoles with BDS parents have a broken
>> .drained_end() implementation.  The documentation clearly states that
>> this function is not allowed to poll, but it does.  So this series
>> changes it to a variant (using the new code) that does not poll.
>>
>> There is a catch, which may actually be a problem, I don’t know: The new
>> variant of that .drained_end() does not poll at all, never.  As
>> described above, now every bdrv_drain_invoke() returns an object that
>> describes when it will be done and which can thus be polled for.  These
>> objects are just discarded when using BdrvChildRole.drained_end().  That
>> does not feel quite right.  It would probably be more correct to let
>> BdrvChildRole.drained_end() return these objects so the top level
>> bdrv_drained_end() can poll for their completion.
>>
>> I decided not to do this, for two reasons:
>> (1) Doing so would spill the “list of objects to poll for” design to
>>     places outside of block/io.c.  I don’t like the design very much as
>>     it is, but I can live with it as long as it’s constrained to the
>>     core drain code in block/io.c.
>>     This is made worse by the fact that currently, those objects are of
>>     type BdrvCoDrainData.  But it shouldn’t be a problem to add a new
>>     type that is externally visible (we only need the AioContext and
>>     whether bdrv_drain_invoke_entry() is done).
>>
>> (2) It seems to work as it is.
>>
>> The alternative would be to add the same GSList ** parameter to
>> BdrvChildRole.drained_end() that I added in the core drain code in patch
>> 2, and then let the .drained_end() implementation fill that with objects
>> to poll for.  (Which would be accomplished by adding a frontend to
>> bdrv_do_drained_end() that lets bdrv_child_cb_drained_poll() pass the
>> parameter through.)
>>
>> Opinions?
> 
> I think I would add an int* to BdrvChildRole.drained_end() so that we
> can just increase the counter whereever we need to.

So you mean just polling the @bs for which a caller gave poll=true until
the counter reaches 0?  I’ll try, sounds good (if I can get it to work).

>> And then we have bdrv_replace_child_noperm(), which actually wants a
>> polling BdrvChildRole.drained_end().  So this series adds
>> BdrvChildRole.drained_end_unquiesce(), which takes that role (if there
>> is any polling to do).
>>
>> Note that if I implemented the alternative described above
>> (.drained_end() gets said GSList ** parameter), a
>> .drained_end_unquiesce() wouldn’t be necessary.
>> bdrv_parent_drained_end_single() could just poll the list returned by
>> .drained_end() by itself.
> 
> The split between .drained_end/.drained_end_unquiesce feels wrong. It
> shouldn't be the job of the BdrvChildRole to worry about this. Polling
> should be handled inside bdrv_parent_drained_end_single(), like we do in
> bdrv_parent_drained_begin_single(), so that the BdrvChildRole never has
> to poll.

If it’s just an int pointer, there’s no point in having two functions, I
suppose.

Thanks for you suggestion!

>> Finally, patches 1, 8, and 9 are unmodified from v1.
>> [...]
>>
>>  include/block/block.h      |  22 +++++-
>>  include/block/block_int.h  |  23 ++++++
>>  block.c                    |  24 +++---
>>  block/io.c                 | 155 ++++++++++++++++++++++++++++++-------
>>  python/qemu/__init__.py    |   5 +-
>>  tests/qemu-iotests/040     |  40 +++++++++-
>>  tests/qemu-iotests/040.out |   4 +-
>>  tests/qemu-iotests/255     |   2 +-
>>  8 files changed, 231 insertions(+), 44 deletions(-)
> 
> I feel this series should add something to tests/test-bdrv-drain.c, too.
> qemu-iotests can only test high-level block job commands that happen to
> trigger the bug today, but that code may change in the future. Unit
> tests allow us to test the problematic cases more directly.

Well, I’m glad if test-bdrv-drain just keeps working. :-)

I’ll try.

Max

Re: [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections
Posted by Kevin Wolf 6 years, 3 months ago
Am 16.07.2019 um 18:24 hat Max Reitz geschrieben:
> On 16.07.19 16:40, Kevin Wolf wrote:
> > Am 19.06.2019 um 17:25 hat Max Reitz geschrieben:
> >> Hi,
> >>
> >> This is v2 to “block: Keep track of parent quiescing”.
> >>
> >> Please read this cover letter, because I’m very unsure about the design
> >> in this series and I’d appreciate some comments.
> >>
> >> As Kevin wrote in his reply to that series, the actual problem is that
> >> bdrv_drain_invoke() polls on every node whenever ending a drain.  This
> >> may cause graph changes, which is actually forbidden.
> >>
> >> To solve that problem, this series makes the drain code construct a list
> >> of undrain operations that have been initiated, and then polls all of
> >> them on the root level once graph changes are acceptable.
> >>
> >> Note that I don’t like this list concept very much, so I’m open to
> >> alternatives.
> > 
> > So drain_end is different from drain_begin in that it wants to wait only
> > for all bdrv_drain_invoke() calls to complete, but not for other
> > requests that are in flight. Makes sense.
> > 
> > Though instead of managing a whole list, wouldn't a counter suffice?
> > 
> >> Furthermore, all BdrvChildRoles with BDS parents have a broken
> >> .drained_end() implementation.  The documentation clearly states that
> >> this function is not allowed to poll, but it does.  So this series
> >> changes it to a variant (using the new code) that does not poll.
> >>
> >> There is a catch, which may actually be a problem, I don’t know: The new
> >> variant of that .drained_end() does not poll at all, never.  As
> >> described above, now every bdrv_drain_invoke() returns an object that
> >> describes when it will be done and which can thus be polled for.  These
> >> objects are just discarded when using BdrvChildRole.drained_end().  That
> >> does not feel quite right.  It would probably be more correct to let
> >> BdrvChildRole.drained_end() return these objects so the top level
> >> bdrv_drained_end() can poll for their completion.
> >>
> >> I decided not to do this, for two reasons:
> >> (1) Doing so would spill the “list of objects to poll for” design to
> >>     places outside of block/io.c.  I don’t like the design very much as
> >>     it is, but I can live with it as long as it’s constrained to the
> >>     core drain code in block/io.c.
> >>     This is made worse by the fact that currently, those objects are of
> >>     type BdrvCoDrainData.  But it shouldn’t be a problem to add a new
> >>     type that is externally visible (we only need the AioContext and
> >>     whether bdrv_drain_invoke_entry() is done).
> >>
> >> (2) It seems to work as it is.
> >>
> >> The alternative would be to add the same GSList ** parameter to
> >> BdrvChildRole.drained_end() that I added in the core drain code in patch
> >> 2, and then let the .drained_end() implementation fill that with objects
> >> to poll for.  (Which would be accomplished by adding a frontend to
> >> bdrv_do_drained_end() that lets bdrv_child_cb_drained_poll() pass the
> >> parameter through.)
> >>
> >> Opinions?
> > 
> > I think I would add an int* to BdrvChildRole.drained_end() so that we
> > can just increase the counter whereever we need to.
> 
> So you mean just polling the @bs for which a caller gave poll=true until
> the counter reaches 0?  I’ll try, sounds good (if I can get it to work).

Yes, that's what I have in mind.

We expect graph changes to happen during the polling, but I think the
caller is responsible for making sure that the top-level @bs stays
around, so we don't need to be extra careful here.

Also, bdrv_drain_invoke() is always called in the same AioContext as the
top-level drain operation, so the whole aio_context_acquire/release
stuff from this series should become unnecessary, and we don't need
atomics to access the counter either.

So I think this should really simplify the series a lot.

Kevin
Re: [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections
Posted by Max Reitz 6 years, 3 months ago
On 16.07.19 18:37, Kevin Wolf wrote:
> Am 16.07.2019 um 18:24 hat Max Reitz geschrieben:
>> On 16.07.19 16:40, Kevin Wolf wrote:
>>> Am 19.06.2019 um 17:25 hat Max Reitz geschrieben:
>>>> Hi,
>>>>
>>>> This is v2 to “block: Keep track of parent quiescing”.
>>>>
>>>> Please read this cover letter, because I’m very unsure about the design
>>>> in this series and I’d appreciate some comments.
>>>>
>>>> As Kevin wrote in his reply to that series, the actual problem is that
>>>> bdrv_drain_invoke() polls on every node whenever ending a drain.  This
>>>> may cause graph changes, which is actually forbidden.
>>>>
>>>> To solve that problem, this series makes the drain code construct a list
>>>> of undrain operations that have been initiated, and then polls all of
>>>> them on the root level once graph changes are acceptable.
>>>>
>>>> Note that I don’t like this list concept very much, so I’m open to
>>>> alternatives.
>>>
>>> So drain_end is different from drain_begin in that it wants to wait only
>>> for all bdrv_drain_invoke() calls to complete, but not for other
>>> requests that are in flight. Makes sense.
>>>
>>> Though instead of managing a whole list, wouldn't a counter suffice?
>>>
>>>> Furthermore, all BdrvChildRoles with BDS parents have a broken
>>>> .drained_end() implementation.  The documentation clearly states that
>>>> this function is not allowed to poll, but it does.  So this series
>>>> changes it to a variant (using the new code) that does not poll.
>>>>
>>>> There is a catch, which may actually be a problem, I don’t know: The new
>>>> variant of that .drained_end() does not poll at all, never.  As
>>>> described above, now every bdrv_drain_invoke() returns an object that
>>>> describes when it will be done and which can thus be polled for.  These
>>>> objects are just discarded when using BdrvChildRole.drained_end().  That
>>>> does not feel quite right.  It would probably be more correct to let
>>>> BdrvChildRole.drained_end() return these objects so the top level
>>>> bdrv_drained_end() can poll for their completion.
>>>>
>>>> I decided not to do this, for two reasons:
>>>> (1) Doing so would spill the “list of objects to poll for” design to
>>>>     places outside of block/io.c.  I don’t like the design very much as
>>>>     it is, but I can live with it as long as it’s constrained to the
>>>>     core drain code in block/io.c.
>>>>     This is made worse by the fact that currently, those objects are of
>>>>     type BdrvCoDrainData.  But it shouldn’t be a problem to add a new
>>>>     type that is externally visible (we only need the AioContext and
>>>>     whether bdrv_drain_invoke_entry() is done).
>>>>
>>>> (2) It seems to work as it is.
>>>>
>>>> The alternative would be to add the same GSList ** parameter to
>>>> BdrvChildRole.drained_end() that I added in the core drain code in patch
>>>> 2, and then let the .drained_end() implementation fill that with objects
>>>> to poll for.  (Which would be accomplished by adding a frontend to
>>>> bdrv_do_drained_end() that lets bdrv_child_cb_drained_poll() pass the
>>>> parameter through.)
>>>>
>>>> Opinions?
>>>
>>> I think I would add an int* to BdrvChildRole.drained_end() so that we
>>> can just increase the counter whereever we need to.
>>
>> So you mean just polling the @bs for which a caller gave poll=true until
>> the counter reaches 0?  I’ll try, sounds good (if I can get it to work).
> 
> Yes, that's what I have in mind.
> 
> We expect graph changes to happen during the polling, but I think the
> caller is responsible for making sure that the top-level @bs stays
> around, so we don't need to be extra careful here.
> 
> Also, bdrv_drain_invoke() is always called in the same AioContext as the
> top-level drain operation, so the whole aio_context_acquire/release
> stuff from this series should become unnecessary, and we don't need
> atomics to access the counter either.
> 
> So I think this should really simplify the series a lot.

Hm.  Unfortunately, not all nodes in a chain always have the same
AioContext.

I think they generally should, but there is at least one exception:
bdrv_set_aio_context*() itself.  bdrv_set_aio_context_ignore() drains
the node, then puts other members of the subgraph into the same
AioContext, then itself.

Now say this reaches the bottom node.  That node will not recurse
anywhere else, but only change its own AioContext, in a drained section.
 So when that section ends, the bottom node will be in a different
AioContext than the other nodes.

So, er, well.  I have three ideas:

(1) Skip the polling on the top level drained_end if the node still has
another quiesce_counter on it.  Sounds a bit too error-prone to me.

(2) Drop the drained sections in bdrv_set_aio_context_ignore().  Instead
require the root caller to have the whole subtree drained.  That way,
drained_end will never be invoked while the subtree has different
AioContexts.

(3) I need a list after all (one that only contains AioContexts, but still).


I like (3) as little as I did in this series.  (1) seems wrong.  I’ll
try (2) first.

Max

Re: [Qemu-devel] [PATCH v2 0/9] block: Delay poll when ending drained sections
Posted by Max Reitz 6 years, 3 months ago
Ping – as this series fixes an abort and a segfault, I would appreciate
reviews.

(Head over to “Fixes for concurrent block jobs” for even more fixes for
aborts and segfaults.)

On 19.06.19 17:25, Max Reitz wrote:
> Hi,
> 
> This is v2 to “block: Keep track of parent quiescing”.
> 
> Please read this cover letter, because I’m very unsure about the design
> in this series and I’d appreciate some comments.
> 
> As Kevin wrote in his reply to that series, the actual problem is that
> bdrv_drain_invoke() polls on every node whenever ending a drain.  This
> may cause graph changes, which is actually forbidden.
> 
> To solve that problem, this series makes the drain code construct a list
> of undrain operations that have been initiated, and then polls all of
> them on the root level once graph changes are acceptable.
> 
> Note that I don’t like this list concept very much, so I’m open to
> alternatives.
> 
> Furthermore, all BdrvChildRoles with BDS parents have a broken
> .drained_end() implementation.  The documentation clearly states that
> this function is not allowed to poll, but it does.  So this series
> changes it to a variant (using the new code) that does not poll.
> 
> There is a catch, which may actually be a problem, I don’t know: The new
> variant of that .drained_end() does not poll at all, never.  As
> described above, now every bdrv_drain_invoke() returns an object that
> describes when it will be done and which can thus be polled for.  These
> objects are just discarded when using BdrvChildRole.drained_end().  That
> does not feel quite right.  It would probably be more correct to let
> BdrvChildRole.drained_end() return these objects so the top level
> bdrv_drained_end() can poll for their completion.
> 
> I decided not to do this, for two reasons:
> (1) Doing so would spill the “list of objects to poll for” design to
>     places outside of block/io.c.  I don’t like the design very much as
>     it is, but I can live with it as long as it’s constrained to the
>     core drain code in block/io.c.
>     This is made worse by the fact that currently, those objects are of
>     type BdrvCoDrainData.  But it shouldn’t be a problem to add a new
>     type that is externally visible (we only need the AioContext and
>     whether bdrv_drain_invoke_entry() is done).
> 
> (2) It seems to work as it is.
> 
> The alternative would be to add the same GSList ** parameter to
> BdrvChildRole.drained_end() that I added in the core drain code in patch
> 2, and then let the .drained_end() implementation fill that with objects
> to poll for.  (Which would be accomplished by adding a frontend to
> bdrv_do_drained_end() that lets bdrv_child_cb_drained_poll() pass the
> parameter through.)
> 
> Opinions?
> 
> 
> And then we have bdrv_replace_child_noperm(), which actually wants a
> polling BdrvChildRole.drained_end().  So this series adds
> BdrvChildRole.drained_end_unquiesce(), which takes that role (if there
> is any polling to do).
> 
> Note that if I implemented the alternative described above
> (.drained_end() gets said GSList ** parameter), a
> .drained_end_unquiesce() wouldn’t be necessary.
> bdrv_parent_drained_end_single() could just poll the list returned by
> .drained_end() by itself.
> 
> 
> Finally, patches 1, 8, and 9 are unmodified from v1.
> 
> 
> git backport-diff against v1:
> 
> Key:
> [----] : patches are identical
> [####] : number of functional differences between upstream/downstream patch
> [down] : patch is downstream-only
> The flags [FC] indicate (F)unctional and (C)ontextual differences, respectively
> 
> 001/9:[----] [--] 'block: Introduce BdrvChild.parent_quiesce_counter'
> 002/9:[down] 'block: Add @data_objs to bdrv_drain_invoke()'
> 003/9:[down] 'block: Add bdrv_poll_drain_data_objs()'
> 004/9:[down] 'block: Move polling out of bdrv_drain_invoke()'
> 005/9:[down] 'block: Add @poll to bdrv_parent_drained_end_single()'
> 006/9:[down] 'block: Add bdrv_drained_end_no_poll()'
> 007/9:[down] 'block: Fix BDS children's .drained_end()'
> 008/9:[----] [--] 'iotests: Add @has_quit to vm.shutdown()'
> 009/9:[----] [--] 'iotests: Test commit with a filter on the chain'
> 
> 
> Max Reitz (9):
>   block: Introduce BdrvChild.parent_quiesce_counter
>   block: Add @data_objs to bdrv_drain_invoke()
>   block: Add bdrv_poll_drain_data_objs()
>   block: Move polling out of bdrv_drain_invoke()
>   block: Add @poll to bdrv_parent_drained_end_single()
>   block: Add bdrv_drained_end_no_poll()
>   block: Fix BDS children's .drained_end()
>   iotests: Add @has_quit to vm.shutdown()
>   iotests: Test commit with a filter on the chain
> 
>  include/block/block.h      |  22 +++++-
>  include/block/block_int.h  |  23 ++++++
>  block.c                    |  24 +++---
>  block/io.c                 | 155 ++++++++++++++++++++++++++++++-------
>  python/qemu/__init__.py    |   5 +-
>  tests/qemu-iotests/040     |  40 +++++++++-
>  tests/qemu-iotests/040.out |   4 +-
>  tests/qemu-iotests/255     |   2 +-
>  8 files changed, 231 insertions(+), 44 deletions(-)
>