[Qemu-devel] [PATCH v2 0/2] mirror dead-lock

Vladimir Sementsov-Ogievskiy posted 2 patches 5 years, 4 months ago
Test asan passed
Test checkpatch passed
Test docker-quick@centos7 passed
Test docker-mingw@fedora passed
Test docker-clang@ubuntu passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20181129101801.6421-1-vsementsov@virtuozzo.com
block/mirror.c             | 13 ++++-----
tests/qemu-iotests/235     | 59 ++++++++++++++++++++++++++++++++++++++
tests/qemu-iotests/235.out |  1 +
tests/qemu-iotests/group   |  1 +
4 files changed, 66 insertions(+), 8 deletions(-)
create mode 100755 tests/qemu-iotests/235
create mode 100644 tests/qemu-iotests/235.out
[Qemu-devel] [PATCH v2 0/2] mirror dead-lock
Posted by Vladimir Sementsov-Ogievskiy 5 years, 4 months ago
Hi all!

v2: add fix:)

We've faced the following mirror bug:

Just run mirror on qcow2 image more than 1G, and qemu is in dead lock.

Dead lock described in 01, in short, we have extra aio_context_acquire
and aio_context_release around blk_aio_pwritev in mirror_read_complete.
So, write may yield to the main loop, and aio context is acquired. Main
loop than hangs on trying to lock BQL, which is locked by cpu thread,
and the cpu thread hangs on trying to acquire aio context.

Hm, now the thing looks fixed, by I still have a questions:

Is it a common thing, that we can't yield inside
aio_context_acquire/release ?

Was commit b9e413dd3756 
"block: explicitly acquire aiocontext in aio callbacks that need it"
wrong? Why it added these acquire/release, when it is written in 
multiple-iothreads.txt, that "Side note: the best way to schedule a function
call across threads is to call aio_bh_schedule_oneshot().  No acquire/release
or locking is needed."

Can someone in short describe, what BQL and aio context lock means, what they
protect, and haw they should cooperate?

Vladimir Sementsov-Ogievskiy (2):
  mirror: fix dead-lock
  iotests: simple mirror test with kvm on 1G image

 block/mirror.c             | 13 ++++-----
 tests/qemu-iotests/235     | 59 ++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/235.out |  1 +
 tests/qemu-iotests/group   |  1 +
 4 files changed, 66 insertions(+), 8 deletions(-)
 create mode 100755 tests/qemu-iotests/235
 create mode 100644 tests/qemu-iotests/235.out

-- 
2.18.0


Re: [Qemu-devel] [Qemu-devel for-3.1?] [PATCH v2 0/2] mirror dead-lock
Posted by Eric Blake 5 years, 4 months ago
On 11/29/18 4:17 AM, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> v2: add fix:)
> 
> We've faced the following mirror bug:
> 
> Just run mirror on qcow2 image more than 1G, and qemu is in dead lock.

How long has the bug been present?  Based on patch 1 mentioning commit 
2e1990b26e5 (in 3.0), I'm guessing that this bug is not a regression 
specific to 3.1?  If so, then maybe we're better off deferring this to 
4.0 just relying on qemu-stable (already cc'd); on the other hand, if it 
is an easy-to-trigger deadlock on something the user is likely to do, 
can we justify it being severe enough to warrant -rc4?

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [Qemu-devel for-3.1?] [PATCH v2 0/2] mirror dead-lock
Posted by Max Reitz 5 years, 4 months ago
On 29.11.18 15:14, Eric Blake wrote:
> On 11/29/18 4:17 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> v2: add fix:)
>>
>> We've faced the following mirror bug:
>>
>> Just run mirror on qcow2 image more than 1G, and qemu is in dead lock.
> 
> How long has the bug been present?  Based on patch 1 mentioning commit
> 2e1990b26e5 (in 3.0), I'm guessing that this bug is not a regression
> specific to 3.1?  If so, then maybe we're better off deferring this to
> 4.0 just relying on qemu-stable (already cc'd); on the other hand, if it
> is an easy-to-trigger deadlock on something the user is likely to do,
> can we justify it being severe enough to warrant -rc4?

The test does indeed break on 3.0.0 (and passes on 2.12.0).  But mirror
just not working from non-trivial qcow2 images?  To me, that is a reason
to delay the release, even though it is not a regression.

Max

Re: [Qemu-devel] [PATCH v2 0/2] mirror dead-lock
Posted by Max Reitz 5 years, 4 months ago
On 29.11.18 11:17, Vladimir Sementsov-Ogievskiy wrote:
> Hi all!
> 
> v2: add fix:)
> 
> We've faced the following mirror bug:
> 
> Just run mirror on qcow2 image more than 1G, and qemu is in dead lock.

So because apparently there is going to be an rc4 anyway (like basically
always...), I'd really like to bring this fix into it, unless there are
any objections from anyone (though all of you are more than welcome to
explicitly agree, too :-)).

Do you have any plans for the iotest?  Right now, I'd rather just take
patch 1 as-is and add the test later, but then again, adding a patch for
rc4 without a test is not so nice either, I suppose.

Max

Re: [Qemu-devel] [PATCH v2 0/2] mirror dead-lock
Posted by Vladimir Sementsov-Ogievskiy 5 years, 4 months ago
03.12.2018 16:59, Max Reitz wrote:
> On 29.11.18 11:17, Vladimir Sementsov-Ogievskiy wrote:
>> Hi all!
>>
>> v2: add fix:)
>>
>> We've faced the following mirror bug:
>>
>> Just run mirror on qcow2 image more than 1G, and qemu is in dead lock.
> 
> So because apparently there is going to be an rc4 anyway (like basically
> always...), I'd really like to bring this fix into it, unless there are
> any objections from anyone (though all of you are more than welcome to
> explicitly agree, too :-)).
> 
> Do you have any plans for the iotest?  Right now, I'd rather just take
> patch 1 as-is and add the test later, but then again, adding a patch for
> rc4 without a test is not so nice either, I suppose.
> 
> Max
> 

I think, everything is better with test:) I can't say that I really like your
additions, because it's anyway a kind of cheating, less real-life, but on the
other hand, as I understand, allocating a lot of disk space in iotests is a bad
thing too.

May be it should be a kind of parameter, with default to your variant, something
like ./check --big-disk-allocations-allowed :). But let's commit at least the
test with your additions.

-- 
Best regards,
Vladimir
Re: [Qemu-devel] [PATCH v2 0/2] mirror dead-lock
Posted by Max Reitz 5 years, 4 months ago
On 03.12.18 15:13, Vladimir Sementsov-Ogievskiy wrote:
> 03.12.2018 16:59, Max Reitz wrote:
>> On 29.11.18 11:17, Vladimir Sementsov-Ogievskiy wrote:
>>> Hi all!
>>>
>>> v2: add fix:)
>>>
>>> We've faced the following mirror bug:
>>>
>>> Just run mirror on qcow2 image more than 1G, and qemu is in dead lock.
>>
>> So because apparently there is going to be an rc4 anyway (like basically
>> always...), I'd really like to bring this fix into it, unless there are
>> any objections from anyone (though all of you are more than welcome to
>> explicitly agree, too :-)).
>>
>> Do you have any plans for the iotest?  Right now, I'd rather just take
>> patch 1 as-is and add the test later, but then again, adding a patch for
>> rc4 without a test is not so nice either, I suppose.
>>
>> Max
>>
> 
> I think, everything is better with test:) I can't say that I really like your
> additions, because it's anyway a kind of cheating, less real-life,

You mean like all iotests? ;-)

iotests usually only test a single specific thing and not a full
real-life case.

>                                                                    but on the
> other hand, as I understand, allocating a lot of disk space in iotests is a bad
> thing too.
> 
> May be it should be a kind of parameter, with default to your variant, something
> like ./check --big-disk-allocations-allowed :).

As I said, we would need to add a new group (e.g. "big-disk-allocation")
and then probably disable that group in check by default.  You could run
those tests with ./check -g big-disk-allocation.

> But let's commit at least the test with your additions.

I mean, we can also add both tests.  But I should say that your version
did not fail on tmpfs before this fix, and I usually run tests on tmpfs,
so...  It wouldn't be very indicative of the issue for me.

Max

Re: [Qemu-devel] [PATCH v2 0/2] mirror dead-lock
Posted by Eric Blake 5 years, 4 months ago
On 12/3/18 8:26 AM, Max Reitz wrote:

>>> So because apparently there is going to be an rc4 anyway (like basically
>>> always...), I'd really like to bring this fix into it, unless there are
>>> any objections from anyone (though all of you are more than welcome to
>>> explicitly agree, too :-)).

I agree with fixing this in -rc4.

>>>
>>> Do you have any plans for the iotest?  Right now, I'd rather just take
>>> patch 1 as-is and add the test later, but then again, adding a patch for
>>> rc4 without a test is not so nice either, I suppose.

...


>>
>> May be it should be a kind of parameter, with default to your variant, something
>> like ./check --big-disk-allocations-allowed :).
> 
> As I said, we would need to add a new group (e.g. "big-disk-allocation")
> and then probably disable that group in check by default.  You could run
> those tests with ./check -g big-disk-allocation.
> 
>> But let's commit at least the test with your additions.
> 
> I mean, we can also add both tests.  But I should say that your version
> did not fail on tmpfs before this fix, and I usually run tests on tmpfs,
> so...  It wouldn't be very indicative of the issue for me.

My take - patch 1 for fixing the bug, plus Max's patch 3 for quickly 
testing the bug for 3.1. Vladimir's patch 2 should defer to 4.0, if we 
want it at all, since we're still debating about whether we need it (and 
even how we would spell it in iotests to run or not run it under 
particular setups).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH v2 0/2] mirror dead-lock
Posted by Peter Maydell 5 years, 4 months ago
On Mon, 3 Dec 2018 at 14:03, Max Reitz <mreitz@redhat.com> wrote:
>
> On 29.11.18 11:17, Vladimir Sementsov-Ogievskiy wrote:
> > Hi all!
> >
> > v2: add fix:)
> >
> > We've faced the following mirror bug:
> >
> > Just run mirror on qcow2 image more than 1G, and qemu is in dead lock.
>
> So because apparently there is going to be an rc4 anyway (like basically
> always...), I'd really like to bring this fix into it, unless there are
> any objections from anyone (though all of you are more than welcome to
> explicitly agree, too :-)).

Discussion on IRC seemed to lean in favour of putting this fix into
rc4 as well. Could somebody (not sure if this would be you or Kevin)
send a pullrequest with it in today, please ?

thanks
-- PMM

Re: [Qemu-devel] [PATCH v2 0/2] mirror dead-lock
Posted by Kevin Wolf 5 years, 4 months ago
Am 29.11.2018 um 11:17 hat Vladimir Sementsov-Ogievskiy geschrieben:
> Hi all!
> 
> v2: add fix:)
> 
> We've faced the following mirror bug:
> 
> Just run mirror on qcow2 image more than 1G, and qemu is in dead lock.
> 
> Dead lock described in 01, in short, we have extra aio_context_acquire
> and aio_context_release around blk_aio_pwritev in mirror_read_complete.
> So, write may yield to the main loop, and aio context is acquired. Main
> loop than hangs on trying to lock BQL, which is locked by cpu thread,
> and the cpu thread hangs on trying to acquire aio context.
> 
> Hm, now the thing looks fixed, by I still have a questions:
> 
> Is it a common thing, that we can't yield inside
> aio_context_acquire/release ?
> 
> Was commit b9e413dd3756 
> "block: explicitly acquire aiocontext in aio callbacks that need it"
> wrong? Why it added these acquire/release, when it is written in 
> multiple-iothreads.txt, that "Side note: the best way to schedule a function
> call across threads is to call aio_bh_schedule_oneshot().  No acquire/release
> or locking is needed."
> 
> Can someone in short describe, what BQL and aio context lock means, what they
> protect, and haw they should cooperate?

Thanks, applied patch 1 to the block branch. (We'll use Max' v3 for the
test case.)

Kevin