[Qemu-devel] [PATCH 0/1] block: Workaround for the iotests errors

Fam Zheng posted 1 patch 6 years, 4 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20171123175747.2309-1-famz@redhat.com
Test checkpatch passed
Test docker passed
Test ppc passed
Test s390x passed
block/io.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
[Qemu-devel] [PATCH 0/1] block: Workaround for the iotests errors
Posted by Fam Zheng 6 years, 4 months ago
Jeff's block job patch made the latent drain bug visible, and I find that this
patch, which also makes some sense by itself, can hide it again. :) With it
applied we are at least back to a state where patchew's iotests (make
docker-test-block@fedora) can pass.

The real bug is that bs's parent list changes in the middle of
bdrv_parent_drained_end(). One drained_end call before mirror_exit() has
already done one blk_root_drained_end(); a second drained_end on an updated
parent node can then do the same blk_root_drained_end() again, leaving it
unbalanced with blk_root_drained_begin(). This is shown by the following three
backtraces, captured with rr from a crashed "qemu-img commit", essentially the
same as in the failed iotest 020:

* Backtrace 1, where drain begins:

(rr) bt

* Backtrace 2, in the early phase of bdrv_parent_drained_end(), before
  mirror_exit() happened:

(rr) bt

* Backtrace 3, in a later phase of the same bdrv_parent_drained_end(), after
  mirror_exit() which changed the node graph:

(rr) bt

IMO we should rethink bdrv_parent_drained_begin/end to avoid such complications
and maybe in the long term get rid of the nested BDRV_POLL_WHILE() if possible.
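
For reference, the loop in question in block/io.c currently looks like this;
the comment marks the point where the parent list can change under the
iteration:

void bdrv_parent_drained_end(BlockDriverState *bs)
{
    BdrvChild *c;

    QLIST_FOREACH(c, &bs->parents, next_parent) {
        if (c->role->drained_end) {
            /* This callback can complete the mirror job (mirror_exit()),
             * which rewrites bs->parents while we are still iterating it. */
            c->role->drained_end(c);
        }
    }
}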

It's late for me so I'm posting the patch anyway in case we could use it for
-rc3.

Note this doesn't fix the hanging 056, which I haven't debugged yet.

Fam

Fam Zheng (1):
  block: Don't poll for drain end

 block/io.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

-- 
2.14.3


Re: [Qemu-devel] [PATCH 0/1] block: Workaround for the iotests errors
Posted by Jeff Cody 6 years, 4 months ago
On Fri, Nov 24, 2017 at 01:57:46AM +0800, Fam Zheng wrote:
> Jeff's block job patch made the latent drain bug visible, and I find that this
> patch, which also makes some sense by itself, can hide it again. :) With it
> applied we are at least back to a state where patchew's iotests (make
> docker-test-block@fedora) can pass.
> 

Unfortunately, I am still seeing segfaults and aborts even with this patch.
For instance, on tests 097, 141, and 176.

> The real bug is that bs's parent list changes in the middle of
> bdrv_parent_drained_end(). One drained_end call before mirror_exit() has
> already done one blk_root_drained_end(); a second drained_end on an updated
> parent node can then do the same blk_root_drained_end() again, leaving it
> unbalanced with blk_root_drained_begin(). This is shown by the following three
> backtraces, captured with rr from a crashed "qemu-img commit", essentially the
> same as in the failed iotest 020:
> 
> * Backtrace 1, where drain begins:
> 
> (rr) bt
> 
> * Backtrace 2, in the early phase of bdrv_parent_drained_end(), before
>   mirror_exit() happened:
> 
> (rr) bt
> 
> * Backtrace 3, in a later phase of the same bdrv_parent_drained_end(), after
>   mirror_exit() which changed the node graph:
> 
> (rr) bt
> 
> IMO we should rethink bdrv_parent_drained_begin/end to avoid such complications
> and maybe in the long term get rid of the nested BDRV_POLL_WHILE() if possible.
> 
> It's late for me so I'm posting the patch anyway in case we could use it for
> -rc3.
> 
> Note this doesn't fix the hanging 056, which I haven't debugged yet.
> 
> Fam
> 
> Fam Zheng (1):
>   block: Don't poll for drain end
> 
>  block/io.c | 8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> -- 
> 2.14.3
> 

Re: [Qemu-devel] [PATCH 0/1] block: Workaround for the iotests errors
Posted by Fam Zheng 6 years, 4 months ago
On Fri, 11/24 01:12, Jeff Cody wrote:
> On Fri, Nov 24, 2017 at 01:57:46AM +0800, Fam Zheng wrote:
> > Jeff's block job patch made the latent drain bug visible, and I find that this
> > patch, which also makes some sense by itself, can hide it again. :) With it
> > applied we are at least back to a state where patchew's iotests (make
> > docker-test-block@fedora) can pass.
> > 
> 
> Unfortunately, I am still seeing segfaults and aborts even with this patch.
> For instance, on tests 097, 141, and 176.

OK, so the graph change during bdrv_parent_drained_begin/end still bites. We'll
need more code to make it safe.

Fam

Re: [Qemu-devel] [PATCH 0/1] block: Workaround for the iotests errors
Posted by Kevin Wolf 6 years, 4 months ago
On 23.11.2017 at 18:57, Fam Zheng wrote:
> Jeff's block job patch made the latent drain bug visible, and I find that this
> patch, which also makes some sense by itself, can hide it again. :) With it
> applied we are at least back to a state where patchew's iotests (make
> docker-test-block@fedora) can pass.
> 
> The real bug is that bs's parent list changes in the middle of
> bdrv_parent_drained_end(). One drained_end call before mirror_exit() has
> already done one blk_root_drained_end(); a second drained_end on an updated
> parent node can then do the same blk_root_drained_end() again, leaving it
> unbalanced with blk_root_drained_begin(). This is shown by the following three
> backtraces, captured with rr from a crashed "qemu-img commit", essentially the
> same as in the failed iotest 020:
> 
> * Backtrace 1, where drain begins:
> 
> (rr) bt
> 
> * Backtrace 2, in the early phase of bdrv_parent_drained_end(), before
>   mirror_exit() happened:
> 
> (rr) bt
> 
> * Backtrace 3, in a later phase of the same bdrv_parent_drained_end(), after
>   mirror_exit() which changed the node graph:
> 
> (rr) bt
> 
> IMO we should rethink bdrv_parent_drained_begin/end to avoid such complications
> and maybe in the long term get rid of the nested BDRV_POLL_WHILE() if possible.

Maybe the backtraces would help me understand the problem if they were
actually there. :-)

Kevin

Re: [Qemu-devel] [PATCH 0/1] block: Workaround for the iotests errors
Posted by Fam Zheng 6 years, 4 months ago
On Fri, 11/24 17:39, Kevin Wolf wrote:
> Maybe the backtraces would help me understand the problem if they were
> actually there. :-)

Ouch, looks like git-tag(1) doesn't like to store # lines in the message.
Unfortunately I rebooted the laptop and didn't save the vim buffer, thinking it
was already in the message. But from another reply I think your understanding
matches how I interpret the backtraces.

Fam

Re: [Qemu-devel] [PATCH 0/1] block: Workaround for the iotests errors
Posted by Kevin Wolf 6 years, 4 months ago
On 23.11.2017 at 18:57, Fam Zheng wrote:
> Jeff's block job patch made the latent drain bug visible, and I find that this
> patch, which also makes some sense by itself, can hide it again. :) With it
> applied we are at least back to a state where patchew's iotests (make
> docker-test-block@fedora) can pass.
> 
> The real bug is that bs's parent list changes in the middle of
> bdrv_parent_drained_end(). One drained_end call before mirror_exit() has
> already done one blk_root_drained_end(); a second drained_end on an updated
> parent node can then do the same blk_root_drained_end() again, leaving it
> unbalanced with blk_root_drained_begin(). This is shown by the following three
> backtraces, captured with rr from a crashed "qemu-img commit", essentially the
> same as in the failed iotest 020:

My conclusion about what really happens in 020 is that we have a graph like
this:

                             mirror target BB --+
                                                |
                                                v
    qemu-img BB -> mirror_top_bs -> overlay -> base

bdrv_drained_end(base) results in it being available for requests again,
so it calls bdrv_parent_drained_end() for overlay. While draining
itself, the mirror job completes and changes the BdrvChild between
mirror_top_bs and overlay (which is currently being drained) to point to
base instead. After returning, QLIST_FOREACH() continues to iterate the
parents of base instead of those of overlay, resulting in a second
blk_drained_end() for the mirror target BB.

This instance can be fixed relatively easily (see below) by using
QLIST_FOREACH_SAFE() instead.
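
QLIST_FOREACH_SAFE() only saves the next pointer before the loop body runs,
i.e. roughly (from include/qemu/queue.h):

#define QLIST_FOREACH_SAFE(var, head, field, next_var)                 \
        for ((var) = ((head)->lh_first);                               \
             (var) && ((next_var) = ((var)->field.le_next), 1);        \
             (var) = (next_var))

So it tolerates the callback unlinking the current child, but nothing beyond
that.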

However, I'm not sure if all problems with the graph change can be
solved this way and whether we can really allow graph changes while
iterating the graph for bdrv_drained_begin/end. Not allowing it would
require some more serious changes to the block jobs that delay their
completion until after bdrv_drain_end() has finished (not sure how to
even get a callback at that point...)

And the test cases that Jeff mentions still fail with this patch, too.
But at least it doesn't only make failure less likely by reducing the
window for a race condition, but seems to attack a real problem.

Kevin


diff --git a/block/io.c b/block/io.c
index 4fdf93a014..6773926fc1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -42,9 +42,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 
 void bdrv_parent_drained_begin(BlockDriverState *bs)
 {
-    BdrvChild *c;
+    BdrvChild *c, *next;
 
-    QLIST_FOREACH(c, &bs->parents, next_parent) {
+    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
         if (c->role->drained_begin) {
             c->role->drained_begin(c);
         }
@@ -53,9 +53,9 @@ void bdrv_parent_drained_begin(BlockDriverState *bs)
 
 void bdrv_parent_drained_end(BlockDriverState *bs)
 {
-    BdrvChild *c;
+    BdrvChild *c, *next;
 
-    QLIST_FOREACH(c, &bs->parents, next_parent) {
+    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
         if (c->role->drained_end) {
             c->role->drained_end(c);
         }


Re: [Qemu-devel] [Qemu-block] [PATCH 0/1] block: Workaround for the iotests errors
Posted by John Snow 6 years, 4 months ago

On 11/27/2017 06:29 PM, Kevin Wolf wrote:
> On 23.11.2017 at 18:57, Fam Zheng wrote:
>> Jeff's block job patch made the latent drain bug visible, and I find that this
>> patch, which also makes some sense by itself, can hide it again. :) With it
>> applied we are at least back to a state where patchew's iotests (make
>> docker-test-block@fedora) can pass.
>>
>> The real bug is that bs's parent list changes in the middle of
>> bdrv_parent_drained_end(). One drained_end call before mirror_exit() has
>> already done one blk_root_drained_end(); a second drained_end on an updated
>> parent node can then do the same blk_root_drained_end() again, leaving it
>> unbalanced with blk_root_drained_begin(). This is shown by the following three
>> backtraces, captured with rr from a crashed "qemu-img commit", essentially the
>> same as in the failed iotest 020:
> 
> My conclusion about what really happens in 020 is that we have a graph like
> this:
> 
>                              mirror target BB --+
>                                                 |
>                                                 v
>     qemu-img BB -> mirror_top_bs -> overlay -> base
> 
> bdrv_drained_end(base) results in it being available for requests again,
> so it calls bdrv_parent_drained_end() for overlay. While draining
> itself, the mirror job completes and changes the BdrvChild between
> mirror_top_bs and overlay (which is currently being drained) to point to
> base instead. After returning, QLIST_FOREACH() continues to iterate the
> parents of base instead of those of overlay, resulting in a second
> blk_drained_end() for the mirror target BB.
> 
> This instance can be fixed relatively easily (see below) by using
> QLIST_FOREACH_SAFE() instead.
> 
> However, I'm not sure if all problems with the graph change can be
> solved this way and whether we can really allow graph changes while
> iterating the graph for bdrv_drained_begin/end. Not allowing it would
> require some more serious changes to the block jobs that delay their
> completion until after bdrv_drain_end() has finished (not sure how to
> even get a callback at that point...)
> 
> And the test cases that Jeff mentions still fail with this patch, too.
> But at least it doesn't only make failure less likely by reducing the
> window for a race condition, but seems to attack a real problem.
> 
> Kevin
> 
> 
> diff --git a/block/io.c b/block/io.c
> index 4fdf93a014..6773926fc1 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -42,9 +42,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>  
>  void bdrv_parent_drained_begin(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>  
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_begin) {
>              c->role->drained_begin(c);
>          }
> @@ -53,9 +53,9 @@ void bdrv_parent_drained_begin(BlockDriverState *bs)
>  
>  void bdrv_parent_drained_end(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>  
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_end) {
>              c->role->drained_end(c);
>          }
> 
> 

With this patch applied to 5e19aed5, I see the following failures (still?):

raw:
	055 (pause_job timeouts)
	109 (apparently a discrepancy over whether busy should be true/false.)

qcow2:
	056 (hang),
	087 (lacking crypto, normal for me)
	141 (unexpected timeout/hang)
	176 (SIGSEGV)
	188 (lacking crypto, normal for me)
	189 (lacking crypto, normal for me)
	198 (lacking crypto, I guess this is normal now)


I'm on my way to the gym for the evening; I will try to investigate
later tonight. I'm not worried about 087, 188, 189, or 198.

Anyway, as this micro-patchlet does fix observable problems with iotest 020:

Tested-by: John Snow <jsnow@redhat.com>

Re: [Qemu-devel] [PATCH 0/1] block: Workaround for the iotests errors
Posted by Jeff Cody 6 years, 4 months ago
On Tue, Nov 28, 2017 at 12:29:09AM +0100, Kevin Wolf wrote:
> On 23.11.2017 at 18:57, Fam Zheng wrote:
> > Jeff's block job patch made the latent drain bug visible, and I find that this
> > patch, which also makes some sense by itself, can hide it again. :) With it
> > applied we are at least back to a state where patchew's iotests (make
> > docker-test-block@fedora) can pass.
> > 
> > The real bug is that bs's parent list changes in the middle of
> > bdrv_parent_drained_end(). One drained_end call before mirror_exit() has
> > already done one blk_root_drained_end(); a second drained_end on an updated
> > parent node can then do the same blk_root_drained_end() again, leaving it
> > unbalanced with blk_root_drained_begin(). This is shown by the following three
> > backtraces, captured with rr from a crashed "qemu-img commit", essentially the
> > same as in the failed iotest 020:
> 
> My conclusion about what really happens in 020 is that we have a graph like
> this:
> 
>                              mirror target BB --+
>                                                 |
>                                                 v
>     qemu-img BB -> mirror_top_bs -> overlay -> base
> 
> bdrv_drained_end(base) results in it being available for requests again,
> so it calls bdrv_parent_drained_end() for overlay. While draining
> itself, the mirror job completes and changes the BdrvChild between
> mirror_top_bs and overlay (which is currently being drained) to point to
> base instead. After returning, QLIST_FOREACH() continues to iterate the
> parents of base instead of those of overlay, resulting in a second
> blk_drained_end() for the mirror target BB.
> 
> This instance can be fixed relatively easily (see below) by using
> QLIST_FOREACH_SAFE() instead.
> 
> However, I'm not sure if all problems with the graph change can be
> solved this way and whether we can really allow graph changes while
> iterating the graph for bdrv_drained_begin/end. Not allowing it would
> require some more serious changes to the block jobs that delay their
> completion until after bdrv_drain_end() has finished (not sure how to
> even get a callback at that point...)
> 

That is at least part of what is causing the segfaults that I am still
seeing (after your patch):

We enter bdrv_drain_recurse(), and the BDS has been reaped:


Thread 1 "qemu-img" received signal SIGSEGV, Segmentation fault.
0x000000010014b56e in qemu_mutex_unlock (mutex=0x76767676767676d6) at util/qemu-thread-posix.c:92
92          assert(mutex->initialized);

#0  0x000000010014b56e in qemu_mutex_unlock (mutex=0x76767676767676d6) at util/qemu-thread-posix.c:92
#1  0x00000001001450bf in aio_context_release (ctx=0x7676767676767676) at util/async.c:507
#2  0x000000010009d5c7 in bdrv_drain_recurse (bs=0x100843270, begin=false) at block/io.c:201
#3  0x000000010009d949 in bdrv_drained_end (bs=0x100843270) at block/io.c:297
#4  0x000000010002e705 in bdrv_child_cb_drained_end (child=0x100870c40) at block.c:822
#5  0x000000010009cdc6 in bdrv_parent_drained_end (bs=0x100863b10) at block/io.c:60
#6  0x000000010009d938 in bdrv_drained_end (bs=0x100863b10) at block/io.c:296
#7  0x000000010009d9d5 in bdrv_drain (bs=0x100863b10) at block/io.c:322
#8  0x000000010008c39c in blk_drain (blk=0x100881220) at block/block-backend.c:1523
#9  0x000000010009a85e in mirror_drain (job=0x10088df50) at block/mirror.c:996
#10 0x0000000100037eca in block_job_drain (job=0x10088df50) at blockjob.c:187
#11 0x000000010003856d in block_job_finish_sync (job=0x10088df50, finish=0x100038926 <block_job_complete>, errp=0x7fffffffd1d8) at blockjob.c:378
#12 0x0000000100038b80 in block_job_complete_sync (job=0x10088df50, errp=0x7fffffffd1d8) at blockjob.c:544
#13 0x0000000100023de5 in run_block_job (job=0x10088df50, errp=0x7fffffffd1d8) at qemu-img.c:872
#14 0x0000000100024305 in img_commit (argc=3, argv=0x7fffffffd390) at qemu-img.c:1034
#15 0x000000010002ccd1 in main (argc=3, argv=0x7fffffffd390) at qemu-img.c:4763


(gdb) f 2
#2  0x000000010009d5c7 in bdrv_drain_recurse (bs=0x100843270, begin=false) at block/io.c:201
201         waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
(gdb) list
196
197         /* Ensure any pending metadata writes are submitted to bs->file.  */
198         bdrv_drain_invoke(bs, begin);
199
200         /* Wait for drained requests to finish */
201         waited = BDRV_POLL_WHILE(bs, atomic_read(&bs->in_flight) > 0);
202
203         QLIST_FOREACH_SAFE(child, &bs->children, next, tmp) {
204             BlockDriverState *bs = child->bs;
205             bool in_main_loop =


(gdb) p *bs
$1 = {
  open_flags = 8835232, 
  read_only = true, 
  encrypted = false, 
  sg = false, 
  probed = false, 
  force_share = 56, 
  implicit = 59, 
  drv = 0x0, 
  opaque = 0x0, 
  aio_context = 0x7676767676767676, 
  aio_notifiers = {
    lh_first = 0x7676767676767676
  }, 
  [...]




> And the test cases that Jeff mentions still fail with this patch, too.
> But at least it doesn't only make failure less likely by reducing the
> window for a race condition, but seems to attack a real problem.
> 
> Kevin
> 
> 
> diff --git a/block/io.c b/block/io.c
> index 4fdf93a014..6773926fc1 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -42,9 +42,9 @@ static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
>  
>  void bdrv_parent_drained_begin(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>  
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_begin) {
>              c->role->drained_begin(c);
>          }
> @@ -53,9 +53,9 @@ void bdrv_parent_drained_begin(BlockDriverState *bs)
>  
>  void bdrv_parent_drained_end(BlockDriverState *bs)
>  {
> -    BdrvChild *c;
> +    BdrvChild *c, *next;
>  
> -    QLIST_FOREACH(c, &bs->parents, next_parent) {
> +    QLIST_FOREACH_SAFE(c, &bs->parents, next_parent, next) {
>          if (c->role->drained_end) {
>              c->role->drained_end(c);
>          }
>

Re: [Qemu-devel] [PATCH 0/1] block: Workaround for the iotests errors
Posted by Kevin Wolf 6 years, 4 months ago
On 28.11.2017 at 06:43, Jeff Cody wrote:
> On Tue, Nov 28, 2017 at 12:29:09AM +0100, Kevin Wolf wrote:
> > On 23.11.2017 at 18:57, Fam Zheng wrote:
> > > Jeff's block job patch made the latent drain bug visible, and I find that this
> > > patch, which also makes some sense by itself, can hide it again. :) With it
> > > applied we are at least back to a state where patchew's iotests (make
> > > docker-test-block@fedora) can pass.
> > > 
> > > The real bug is that bs's parent list changes in the middle of
> > > bdrv_parent_drained_end(). One drained_end call before mirror_exit() has
> > > already done one blk_root_drained_end(); a second drained_end on an updated
> > > parent node can then do the same blk_root_drained_end() again, leaving it
> > > unbalanced with blk_root_drained_begin(). This is shown by the following three
> > > backtraces, captured with rr from a crashed "qemu-img commit", essentially the
> > > same as in the failed iotest 020:
> > 
> > My conclusion about what really happens in 020 is that we have a graph like
> > this:
> > 
> >                              mirror target BB --+
> >                                                 |
> >                                                 v
> >     qemu-img BB -> mirror_top_bs -> overlay -> base
> > 
> > bdrv_drained_end(base) results in it being available for requests again,
> > so it calls bdrv_parent_drained_end() for overlay. While draining
> > itself, the mirror job completes and changes the BdrvChild between
> > mirror_top_bs and overlay (which is currently being drained) to point to
> > base instead. After returning, QLIST_FOREACH() continues to iterate the
> > parents of base instead of those of overlay, resulting in a second
> > blk_drained_end() for the mirror target BB.
> > 
> > This instance can be fixed relatively easily (see below) by using
> > QLIST_FOREACH_SAFE() instead.
> > 
> > However, I'm not sure if all problems with the graph change can be
> > solved this way and whether we can really allow graph changes while
> > iterating the graph for bdrv_drained_begin/end. Not allowing it would
> > require some more serious changes to the block jobs that delay their
> > completion until after bdrv_drain_end() has finished (not sure how to
> > even get a callback at that point...)
> > 
> 
> That is at least part of what is causing the segfaults that I am still
> seeing (after your patch):
> 
> We enter bdrv_drain_recurse(), and the BDS has been reaped:

Not sure which test case this is referring to, probably 097 as that's
the next one in your list?

Anyway, test cases 097 and 176 can be fixed for me by keeping some
additional references. This quick fix is probably not quite correct
according to the comment in bdrv_drain_recurse() because bdrv_ref/unref
are only allowed in the main loop thread.

Also, case 141 is still failing.

Kevin


diff --git a/include/block/block_int.h b/include/block/block_int.h
index a5482775ec..c8bdf3648a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -604,6 +604,7 @@ struct BlockDriverState {
     bool probed;    /* if true, format was probed rather than specified */
     bool force_share; /* if true, always allow all shared permissions */
     bool implicit;  /* if true, this filter node was automatically inserted */
+    bool closing;
 
     BlockDriver *drv; /* NULL means no media */
     void *opaque;
diff --git a/block.c b/block.c
index 9a1a0d1e73..7419e28a3b 100644
--- a/block.c
+++ b/block.c
@@ -3392,6 +3392,13 @@ static void bdrv_delete(BlockDriverState *bs)
     assert(bdrv_op_blocker_is_empty(bs));
     assert(!bs->refcnt);
 
+    /* An additional ref/unref pair during the shutdown (e.g. while draining
+     * the requests of this node) should not cause infinite recursion. */
+    if (bs->closing) {
+        return;
+    }
+    bs->closing = true;
+
     bdrv_close(bs);
 
     /* remove from list, if necessary */
diff --git a/block/io.c b/block/io.c
index 6773926fc1..6589246c48 100644
--- a/block/io.c
+++ b/block/io.c
@@ -194,6 +194,8 @@ static bool bdrv_drain_recurse(BlockDriverState *bs, bool begin)
     BdrvChild *child, *tmp;
     bool waited;
 
+    bdrv_ref(bs);
+
     /* Ensure any pending metadata writes are submitted to bs->file.  */
     bdrv_drain_invoke(bs, begin);
 
@@ -221,6 +223,8 @@ static bool bdrv_drain_recurse(BlockDriverState *bs, bool begin)
         }
     }
 
+    bdrv_unref(bs);
+
     return waited;
 }
 
@@ -293,9 +297,13 @@ void bdrv_drained_end(BlockDriverState *bs)
         return;
     }
 
+    /* bdrv_parent_drained_end() and bdrv_drain_recurse() may cause bs to be
+     * deleted if we don't keep an extra reference */
+    bdrv_ref(bs);
     bdrv_parent_drained_end(bs);
     bdrv_drain_recurse(bs, false);
     aio_enable_external(bdrv_get_aio_context(bs));
+    bdrv_unref(bs);
 }
 
 /*

Re: [Qemu-devel] [Qemu-block] [PATCH 0/1] block: Workaround for the iotests errors
Posted by Kevin Wolf 6 years, 4 months ago
On 28.11.2017 at 12:42, Kevin Wolf wrote:
> On 28.11.2017 at 06:43, Jeff Cody wrote:
> > On Tue, Nov 28, 2017 at 12:29:09AM +0100, Kevin Wolf wrote:
> > > On 23.11.2017 at 18:57, Fam Zheng wrote:
> > > > Jeff's block job patch made the latent drain bug visible, and I find that this
> > > > patch, which also makes some sense by itself, can hide it again. :) With it
> > > > applied we are at least back to a state where patchew's iotests (make
> > > > docker-test-block@fedora) can pass.
> > > > 
> > > > The real bug is that bs's parent list changes in the middle of
> > > > bdrv_parent_drained_end(). One drained_end call before mirror_exit() has
> > > > already done one blk_root_drained_end(); a second drained_end on an updated
> > > > parent node can then do the same blk_root_drained_end() again, leaving it
> > > > unbalanced with blk_root_drained_begin(). This is shown by the following three
> > > > backtraces, captured with rr from a crashed "qemu-img commit", essentially the
> > > > same as in the failed iotest 020:
> > > 
> > > My conclusion about what really happens in 020 is that we have a graph like
> > > this:
> > > 
> > >                              mirror target BB --+
> > >                                                 |
> > >                                                 v
> > >     qemu-img BB -> mirror_top_bs -> overlay -> base
> > > 
> > > bdrv_drained_end(base) results in it being available for requests again,
> > > so it calls bdrv_parent_drained_end() for overlay. While draining
> > > itself, the mirror job completes and changes the BdrvChild between
> > > mirror_top_bs and overlay (which is currently being drained) to point to
> > > base instead. After returning, QLIST_FOREACH() continues to iterate the
> > > parents of base instead of those of overlay, resulting in a second
> > > blk_drained_end() for the mirror target BB.
> > > 
> > > This instance can be fixed relatively easily (see below) by using
> > > QLIST_FOREACH_SAFE() instead.
> > > 
> > > However, I'm not sure if all problems with the graph change can be
> > > solved this way and whether we can really allow graph changes while
> > > iterating the graph for bdrv_drained_begin/end. Not allowing it would
> > > require some more serious changes to the block jobs that delay their
> > > completion until after bdrv_drain_end() has finished (not sure how to
> > > even get a callback at that point...)
> > > 
> > 
> > That is at least part of what is causing the segfaults that I am still
> > seeing (after your patch):
> > 
> > We enter bdrv_drain_recurse(), and the BDS has been reaped:
> 
> Not sure which test case this is referring to, probably 097 as that's
> the next one in your list?
> 
> Anyway, test cases 097 and 176 can be fixed for me by keeping some
> additional references. This quick fix is probably not quite correct
> according to the comment in bdrv_drain_recurse() because bdrv_ref/unref
> are only allowed in the main loop thread.
> 
> Also, case 141 is still failing.

As for 141, this one just hangs now because the test case sets speed=1
and so the job throttling decides to sleep for a few hours. We used to
interrupt block_job_sleep_ns(), but now we don't any more. I think we need
to allow this again.
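
To put a number on it (the chunk size below is just an assumed example, the
real value depends on the job): the rate limit charges bytes against speed=1
byte per second, so even a small chunk turns into hours of sleep.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Illustration only: speed=1 byte/s as set by the test; the chunk size
     * is an assumed example, not what the job actually uses. */
    uint64_t speed = 1;              /* bytes per second */
    uint64_t chunk = 16 * 1024;      /* assumed chunk size in bytes */
    uint64_t delay = chunk / speed;  /* seconds the rate limit wants to sleep */

    printf("sleep for %" PRIu64 " seconds (~%.1f hours)\n",
           delay, delay / 3600.0);
    return 0;
}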

Kevin