[v1] block: Attempt on fixing 030-reported errors

[PATCH 0/7] block: Attempt on fixing 030-reported errors

Posted by Hanna Reitz 2 years, 6 months ago

Hi,

I’ve tried to investigate what causes the iotest 030 to fail.  Here’s
what I found:

(1) stream_prepare() gets the base node by looking up the node below
    above_base.  It then invokes bdrv_cor_filter_drop(), before we
    actually use the base node.
    bdrv_cor_filter_drop() modifies the block graph, which means
    draining, which means other parties might modify the graph, too.
    Therefore, afterwards, the node below above_base might be completely
    different, and the base node we got before might already be gone.

(2) bdrv_replace_child_noperm() can set BdrvChild.bs to NULL.  That’s
    problematic, because most of our code cannot deal with BdrvChild
    objects whose .bs pointer is NULL.  We assume that such objects are
    immediately removed from the BDS.children list, and that they won’t
    appear under bs->backing or bs->file (i.e. that those pointers are
    immediately NULLed when bs->{backing,file}->bs is NULLed).
    After setting BdrvChild.bs to NULL, bdrv_replace_child_noperm() may
    invoke bdrv_parent_drained_end_single() on the BdrvChild.
    Therefore, other code is run at that point, and it might not be
    ready to encounter something like
    `bs->backing != NULL && bs->backing->bs == NULL`.

(3) 030 in one case launches four stream jobs concurrently, all with
    speed=1024.  It then unthrottles them one after another, but the
    problem is that if one job finishes, the jobs above it will be
    advanced by a step (which is actually 512k); so since we unthrottle
    bottom to top, it’s possible that all jobs below the top job are
    finished before we get to unthrottle the top job.  This will advance
    the top job so far (3 * 512k + 512k = 2M) that it actually finishes
    despite still being throttled.  Attempting to unthrottle it then
    throws an error.
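
As an illustration of (1), here is a toy model of the ordering problem.
It is only a sketch: Node, drop_filter_with_side_effect() and the two
prepare_*() helpers are hypothetical stand-ins, not QEMU code, and the
"side effect" simply assumes that some other party removes the node
below above_base while the graph is being modified.

```python
class Node:
    """Hypothetical stand-in for a block node (not a BlockDriverState)."""
    def __init__(self, name, backing=None):
        self.name = name
        self.backing = backing
        self.deleted = False


def drop_filter_with_side_effect(above_base):
    """Stand-in for a graph modification that drains.

    Assumption: while draining, another party removes the node below
    above_base and links in the node below that one instead.
    """
    old = above_base.backing
    above_base.backing = old.backing
    old.deleted = True


def prepare_lookup_first(above_base):
    base = above_base.backing              # looked up too early
    drop_filter_with_side_effect(above_base)
    return base                            # may be stale by now


def prepare_lookup_last(above_base):
    drop_filter_with_side_effect(above_base)
    return above_base.backing              # looked up once the graph settled


def make_chain():
    bottom = Node("bottom")
    base = Node("base", backing=bottom)
    return Node("above_base", backing=base)


stale = prepare_lookup_first(make_chain())
fresh = prepare_lookup_last(make_chain())
print(stale.name, stale.deleted)   # base True
print(fresh.name, fresh.deleted)   # bottom False
```

Looking the node up only after the modification returns whatever is
below above_base at that point, which is the idea behind reordering
the two steps.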


Here’s how I think we can solve these problems:

(1) Invoke bdrv_cor_filter_drop() first, then get the base node
    afterwards, when the graph will no longer change.
    Implemented in patch 1.

(2A) bdrv_replace_child_noperm() should immediately set bs->file or
     bs->backing to NULL when it sets bs->{file,backing}->bs to NULL.
     It should also immediately remove any BdrvChild with .bs == NULL
     from the parent’s BDS.children list.
     Implemented in patches 2 through 6.

(2B) Alternatively, we could always keep the whole subgraph drained
     while we manipulate it.  Then, the bdrv_parent_drained_end_single()
     in bdrv_replace_child_noperm() wouldn’t do anything.
     To fix 030, we would need to add a drained section to
     stream_prepare(): Namely we’d need to drain the subgraph below the
     COR filter node.
     This would be a much simpler solution, but I don’t feel like it’s
     the right one.

(3) Just unthrottle the jobs from top to bottom instead of bottom to
    top.
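
The effect of the unthrottling order in (3) can be checked with a small
worst-case model that compares both orders.  This is a sketch under
assumptions, not the iotest itself: unthrottle_all() is a hypothetical
helper, the job length of 2M is inferred from the arithmetic above, and
every job is assumed to finish immediately after being unthrottled.

```python
STEP = 512 * 1024     # one step, as described above
LENGTH = 4 * STEP     # assumed job length (2M), inferred from the arithmetic


def unthrottle_all(order, n_jobs=4):
    """Worst case: every job finishes right after being unthrottled.

    Jobs are indexed from bottom (0) to top (n_jobs - 1).  Returns the
    jobs that had already finished when we tried to unthrottle them.
    """
    progress = [STEP] * n_jobs    # each throttled job has done one step
    finished = [False] * n_jobs
    errors = []
    for job in order:
        if finished[job]:
            errors.append(job)    # unthrottling a finished job fails
            continue
        # The unthrottled job runs to completion ...
        progress[job] = LENGTH
        finished[job] = True
        # ... which advances every unfinished job above it by one step.
        for above in range(job + 1, n_jobs):
            if not finished[above]:
                progress[above] += STEP
                if progress[above] >= LENGTH:
                    finished[above] = True
    return errors


bottom_to_top = unthrottle_all([0, 1, 2, 3])
top_to_bottom = unthrottle_all([3, 2, 1, 0])
print(bottom_to_top)   # [3]
print(top_to_bottom)   # []
```

In this model, only the bottom-to-top order lets the top job reach 2M
while still throttled; starting from the top never advances a job that
has yet to be unthrottled.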


As you can see, I’m not sure which of 2A or 2B is the right solution.  I
decided to investigate both: 2A was much more complicated, but seemed
like the right thing to do; 2B is much simpler, but doesn’t feel as
right.  Therefore, I decided to go with 2A in this first version of this
series.
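
To make the idea behind 2A more concrete, here is a toy model of the
invariant.  It is a sketch only: Parent, Child, detach() and
replace_child() are hypothetical names, and saw_dangling_child() stands
in for whatever other code might run from the drained_end call in the
middle of the replacement.

```python
class Parent:
    """Hypothetical stand-in for a node with a children list."""
    def __init__(self):
        self.children = []
        self.backing = None


class Child:
    """Hypothetical stand-in for a BdrvChild edge."""
    def __init__(self, parent, bs):
        self.parent = parent
        self.bs = bs


def detach(child):
    """Remove the edge from the parent's list and clear its pointer."""
    parent = child.parent
    parent.children.remove(child)
    if parent.backing is child:
        parent.backing = None


def saw_dangling_child(parent):
    """What code running in the middle of the replacement could observe."""
    in_list = any(c.bs is None for c in parent.children)
    in_ptr = parent.backing is not None and parent.backing.bs is None
    return in_list or in_ptr


def replace_child(child, new_bs, detach_first):
    parent = child.parent
    child.bs = new_bs
    if new_bs is None and detach_first:
        detach(child)                        # 2A: clean up immediately
    observed = saw_dangling_child(parent)    # other code runs here
    if new_bs is None and not detach_first:
        detach(child)                        # cleanup that comes too late
    return observed


def make_backing_child():
    parent = Parent()
    child = Child(parent, bs="base")
    parent.children.append(child)
    parent.backing = child
    return child


late = replace_child(make_backing_child(), None, detach_first=False)
early = replace_child(make_backing_child(), None, detach_first=True)
print(late, early)   # True False
```

With the late cleanup, the intermediate code sees
`bs->backing != NULL && bs->backing->bs == NULL`; with the immediate
cleanup, that state is never visible to it.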


Hanna Reitz (7):
  stream: Traverse graph after modification
  block: Manipulate children list in .attach/.detach
  block: Unite remove_empty_child and child_free
  block: Drop detached child from ignore list
  block: Pass BdrvChild ** to replace_child_noperm
  block: Let replace_child_noperm free children
  iotests/030: Unthrottle parallel jobs in reverse

 block.c                | 178 +++++++++++++++++++++++++++++------------
 block/stream.c         |   7 +-
 tests/qemu-iotests/030 |  11 ++-
 3 files changed, 144 insertions(+), 52 deletions(-)

-- 
2.33.1

Re: [PATCH 0/7] block: Attempt on fixing 030-reported errors

Posted by Kevin Wolf 2 years, 6 months ago

On 04.11.2021 at 11:38, Hanna Reitz wrote:
> Hanna Reitz (7):
>   stream: Traverse graph after modification
>   block: Manipulate children list in .attach/.detach
>   block: Unite remove_empty_child and child_free
>   block: Drop detached child from ignore list
>   block: Pass BdrvChild ** to replace_child_noperm
>   block: Let replace_child_noperm free children
>   iotests/030: Unthrottle parallel jobs in reverse

Now I know that I don't aspire to a new career as a full time borrow
checker. :-)

Patches 1-4:
Reviewed-by: Kevin Wolf <kwolf@redhat.com>

Re: [PATCH 0/7] block: Attempt on fixing 030-reported errors

Posted by Kevin Wolf 2 years, 6 months ago

On 04.11.2021 at 11:38, Hanna Reitz wrote:
> (2A) bdrv_replace_child_noperm() should immediately set bs->file or
>      bs->backing to NULL when it sets bs->{file,backing}->bs to NULL.
>      It should also immediately remove any BdrvChild with .bs == NULL
>      from the parent’s BDS.children list.
>      Implemented in patches 2 through 6.
> 
> (2B) Alternatively, we could always keep the whole subgraph drained
>      while we manipulate it.  Then, the bdrv_parent_drained_end_single()
>      in bdrv_replace_child_noperm() wouldn’t do anything.
>      To fix 030, we would need to add a drained section to
>      stream_prepare(): Namely we’d need to drain the subgraph below the
>      COR filter node.
>      This would be a much simpler solution, but I don’t feel like it’s
>      the right one.

> As you can see, I’m not sure which of 2A or 2B is the right solution.  I
> decided to investigate both: 2A was much more complicated, but seemed
> like the right thing to do; 2B is much simpler, but doesn’t feel as
> right.  Therefore, I decided to go with 2A in this first version of this
> series.

I haven't looked at the patches yet, but if I understand correctly the
choice you're presenting here is between protecting code from accessing
invalid state and not creating the invalid state in the first place. I
agree that the latter is preferable as long as it doesn't make things so
complicated that we would be willing to accept the higher risk of
breakage in the former. If it's doable in five patches, it's probably
not complicated enough to make such compromises.

Kevin

Re: [PATCH 0/7] block: Attempt on fixing 030-reported errors

Posted by Hanna Reitz 2 years, 6 months ago

On 04.11.21 12:58, Kevin Wolf wrote:
> On 04.11.2021 at 11:38, Hanna Reitz wrote:
>> (2A) bdrv_replace_child_noperm() should immediately set bs->file or
>>       bs->backing to NULL when it sets bs->{file,backing}->bs to NULL.
>>       It should also immediately remove any BdrvChild with .bs == NULL
>>       from the parent’s BDS.children list.
>>       Implemented in patches 2 through 6.
>>
>> (2B) Alternatively, we could always keep the whole subgraph drained
>>       while we manipulate it.  Then, the bdrv_parent_drained_end_single()
>>       in bdrv_replace_child_noperm() wouldn’t do anything.
>>       To fix 030, we would need to add a drained section to
>>       stream_prepare(): Namely we’d need to drain the subgraph below the
>>       COR filter node.
>>       This would be a much simpler solution, but I don’t feel like it’s
>>       the right one.
>> As you can see, I’m not sure which of 2A or 2B is the right solution.  I
>> decided to investigate both: 2A was much more complicated, but seemed
>> like the right thing to do; 2B is much simpler, but doesn’t feel as
>> right.  Therefore, I decided to go with 2A in this first version of this
>> series.
> I haven't looked at the patches yet, but if I understand correctly the
> choice you're presenting here is between protecting code from accessing
> invalid state and not creating the invalid state in the first place.

Yes, that’s right.

> I agree that the latter is preferable as long as it doesn't make things
> so complicated that we would be willing to accept the higher risk of
> breakage in the former.

No, I don’t think it’s too complicated.  Just not as simple as a
drained_begin + drained_end.

> If it's doable in five patches, it's probably
> not complicated enough to make such compromises.

Without the clean-up patches that are patches 3 and 4, it would be 
doable in even fewer patches. :)

Hanna