Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1367369#c8 It's possible to wedge QEMU if the guest tries to reset a virtio-pci device as QEMU is also using the drive for a blockjob. This patchset aims to allow us to safely pause/resume jobs attached to individual nodes in a manner similar to how bdrv_drain_all_begin/end do. John Snow (3): blockjob: add block_job_start_shim block-backend: add drained_begin / drained_end ops blockjob: add devops to blockjob backends block/block-backend.c | 24 ++++++++++++++++-- blockjob.c | 55 +++++++++++++++++++++++++++++++++--------- include/sysemu/block-backend.h | 8 ++++++ 3 files changed, 73 insertions(+), 14 deletions(-) -- 2.9.3
Hi,
This series seems to have some coding style problems. See output below for
more information:
Type: series
Subject: [Qemu-devel] [PATCH v2 0/3] block: pause block jobs for bdrv_drain_begin/end
Message-id: 20170316212351.13797-1-jsnow@redhat.com
=== TEST SCRIPT BEGIN ===
#!/bin/bash
BASE=base
n=1
total=$(git log --oneline $BASE.. | wc -l)
failed=0
# Useful git options
git config --local diff.renamelimit 0
git config --local diff.renames True
commits="$(git log --format=%H --reverse $BASE..)"
for c in $commits; do
echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
failed=1
echo
fi
n=$((n+1))
done
exit $failed
=== TEST SCRIPT END ===
Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
* [new tag] patchew/20170316212351.13797-1-jsnow@redhat.com -> patchew/20170316212351.13797-1-jsnow@redhat.com
Switched to a new branch 'test'
1cca6f3 blockjob: add devops to blockjob backends
864d906 block-backend: add drained_begin / drained_end ops
5e4f22d blockjob: add block_job_start_shim
=== OUTPUT BEGIN ===
Checking PATCH 1/3: blockjob: add block_job_start_shim...
Checking PATCH 2/3: block-backend: add drained_begin / drained_end ops...
ERROR: suspect code indent for conditional statements (8, 14)
#70: FILE: block/block-backend.c:1903:
+ if (blk->dev_ops && blk->dev_ops->drained_end) {
+ blk->dev_ops->drained_end(blk->dev_opaque);
total: 1 errors, 0 warnings, 67 lines checked
Your patch has style problems, please review. If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
Checking PATCH 3/3: blockjob: add devops to blockjob backends...
=== OUTPUT END ===
Test command exited with code: 1
---
Email generated automatically by Patchew [http://patchew.org/].
Please send your feedback to patchew-devel@freelists.org
ping, is this the only issue? Any feedback? If this can hit 2.9 that
would be good.
--js
On 03/16/2017 05:28 PM, no-reply@patchew.org wrote:
> Hi,
>
> This series seems to have some coding style problems. See output below for
> more information:
>
> Type: series
> Subject: [Qemu-devel] [PATCH v2 0/3] block: pause block jobs for bdrv_drain_begin/end
> Message-id: 20170316212351.13797-1-jsnow@redhat.com
>
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
>
> BASE=base
> n=1
> total=$(git log --oneline $BASE.. | wc -l)
> failed=0
>
> # Useful git options
> git config --local diff.renamelimit 0
> git config --local diff.renames True
>
> commits="$(git log --format=%H --reverse $BASE..)"
> for c in $commits; do
> echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
> if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
> failed=1
> echo
> fi
> n=$((n+1))
> done
>
> exit $failed
> === TEST SCRIPT END ===
>
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> From https://github.com/patchew-project/qemu
> * [new tag] patchew/20170316212351.13797-1-jsnow@redhat.com -> patchew/20170316212351.13797-1-jsnow@redhat.com
> Switched to a new branch 'test'
> 1cca6f3 blockjob: add devops to blockjob backends
> 864d906 block-backend: add drained_begin / drained_end ops
> 5e4f22d blockjob: add block_job_start_shim
>
> === OUTPUT BEGIN ===
> Checking PATCH 1/3: blockjob: add block_job_start_shim...
> Checking PATCH 2/3: block-backend: add drained_begin / drained_end ops...
> ERROR: suspect code indent for conditional statements (8, 14)
> #70: FILE: block/block-backend.c:1903:
> + if (blk->dev_ops && blk->dev_ops->drained_end) {
> + blk->dev_ops->drained_end(blk->dev_opaque);
>
> total: 1 errors, 0 warnings, 67 lines checked
>
> Your patch has style problems, please review. If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
>
> Checking PATCH 3/3: blockjob: add devops to blockjob backends...
> === OUTPUT END ===
>
> Test command exited with code: 1
>
>
> ---
> Email generated automatically by Patchew [http://patchew.org/].
> Please send your feedback to patchew-devel@freelists.org
>
On Wed, Mar 22, 2017 at 11:37:15AM -0400, John Snow wrote:
> ping, is this the only issue? Any feedback? If this can hit 2.9 that
> would be good.
>
The series looks fine to me, and I can patch up the nit from patchew when
applying. But do you happen to have a qemu-iotest for this case, or is it
not very feasible to create one?
>
> On 03/16/2017 05:28 PM, no-reply@patchew.org wrote:
> > Hi,
> >
> > This series seems to have some coding style problems. See output below for
> > more information:
> >
> > Type: series
> > Subject: [Qemu-devel] [PATCH v2 0/3] block: pause block jobs for bdrv_drain_begin/end
> > Message-id: 20170316212351.13797-1-jsnow@redhat.com
> >
> > === TEST SCRIPT BEGIN ===
> > #!/bin/bash
> >
> > BASE=base
> > n=1
> > total=$(git log --oneline $BASE.. | wc -l)
> > failed=0
> >
> > # Useful git options
> > git config --local diff.renamelimit 0
> > git config --local diff.renames True
> >
> > commits="$(git log --format=%H --reverse $BASE..)"
> > for c in $commits; do
> > echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
> > if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
> > failed=1
> > echo
> > fi
> > n=$((n+1))
> > done
> >
> > exit $failed
> > === TEST SCRIPT END ===
> >
> > Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> > From https://github.com/patchew-project/qemu
> > * [new tag] patchew/20170316212351.13797-1-jsnow@redhat.com -> patchew/20170316212351.13797-1-jsnow@redhat.com
> > Switched to a new branch 'test'
> > 1cca6f3 blockjob: add devops to blockjob backends
> > 864d906 block-backend: add drained_begin / drained_end ops
> > 5e4f22d blockjob: add block_job_start_shim
> >
> > === OUTPUT BEGIN ===
> > Checking PATCH 1/3: blockjob: add block_job_start_shim...
> > Checking PATCH 2/3: block-backend: add drained_begin / drained_end ops...
> > ERROR: suspect code indent for conditional statements (8, 14)
> > #70: FILE: block/block-backend.c:1903:
> > + if (blk->dev_ops && blk->dev_ops->drained_end) {
> > + blk->dev_ops->drained_end(blk->dev_opaque);
> >
> > total: 1 errors, 0 warnings, 67 lines checked
> >
> > Your patch has style problems, please review. If any of these errors
> > are false positives report them to the maintainer, see
> > CHECKPATCH in MAINTAINERS.
> >
> > Checking PATCH 3/3: blockjob: add devops to blockjob backends...
> > === OUTPUT END ===
> >
> > Test command exited with code: 1
> >
> >
> > ---
> > Email generated automatically by Patchew [http://patchew.org/].
> > Please send your feedback to patchew-devel@freelists.org
> >
On 03/22/2017 12:01 PM, Jeff Cody wrote:
> On Wed, Mar 22, 2017 at 11:37:15AM -0400, John Snow wrote:
>> ping, is this the only issue? Any feedback? If this can hit 2.9 that
>> would be good.
>>
>
> The series looks fine to me, and I can patch up the nit from patchew when
> applying. But do you happen to have a qemu-iotest for this case, or is it
> not very feasible to create one?
>
I might need a hint from Paolo on how; my reproducer ATM is to literally
boot a Fedora VM and issue reboots/QMP commands and manually observe a
hang. (Which is pretty subjective ...)
An iotest version would probably involve using the qtest socket to issue
a PCI reset of some sort inbetween QMP commands as necessary, but
testing for a hang in iotests seems race-prone. I have no idea how long
this hang would last on other machines, for instance.
There might be a more fool-proof automated testing method, but at the
second I'm drawing a blank.
>>
>> On 03/16/2017 05:28 PM, no-reply@patchew.org wrote:
>>> Hi,
>>>
>>> This series seems to have some coding style problems. See output below for
>>> more information:
>>>
>>> Type: series
>>> Subject: [Qemu-devel] [PATCH v2 0/3] block: pause block jobs for bdrv_drain_begin/end
>>> Message-id: 20170316212351.13797-1-jsnow@redhat.com
>>>
>>> === TEST SCRIPT BEGIN ===
>>> #!/bin/bash
>>>
>>> BASE=base
>>> n=1
>>> total=$(git log --oneline $BASE.. | wc -l)
>>> failed=0
>>>
>>> # Useful git options
>>> git config --local diff.renamelimit 0
>>> git config --local diff.renames True
>>>
>>> commits="$(git log --format=%H --reverse $BASE..)"
>>> for c in $commits; do
>>> echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
>>> if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
>>> failed=1
>>> echo
>>> fi
>>> n=$((n+1))
>>> done
>>>
>>> exit $failed
>>> === TEST SCRIPT END ===
>>>
>>> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
>>> From https://github.com/patchew-project/qemu
>>> * [new tag] patchew/20170316212351.13797-1-jsnow@redhat.com -> patchew/20170316212351.13797-1-jsnow@redhat.com
>>> Switched to a new branch 'test'
>>> 1cca6f3 blockjob: add devops to blockjob backends
>>> 864d906 block-backend: add drained_begin / drained_end ops
>>> 5e4f22d blockjob: add block_job_start_shim
>>>
>>> === OUTPUT BEGIN ===
>>> Checking PATCH 1/3: blockjob: add block_job_start_shim...
>>> Checking PATCH 2/3: block-backend: add drained_begin / drained_end ops...
>>> ERROR: suspect code indent for conditional statements (8, 14)
>>> #70: FILE: block/block-backend.c:1903:
>>> + if (blk->dev_ops && blk->dev_ops->drained_end) {
>>> + blk->dev_ops->drained_end(blk->dev_opaque);
>>>
>>> total: 1 errors, 0 warnings, 67 lines checked
>>>
>>> Your patch has style problems, please review. If any of these errors
>>> are false positives report them to the maintainer, see
>>> CHECKPATCH in MAINTAINERS.
>>>
>>> Checking PATCH 3/3: blockjob: add devops to blockjob backends...
>>> === OUTPUT END ===
>>>
>>> Test command exited with code: 1
>>>
>>>
>>> ---
>>> Email generated automatically by Patchew [http://patchew.org/].
>>> Please send your feedback to patchew-devel@freelists.org
>>>
On Wed, Mar 22, 2017 at 12:05:26PM -0400, John Snow wrote:
>
>
> On 03/22/2017 12:01 PM, Jeff Cody wrote:
> > On Wed, Mar 22, 2017 at 11:37:15AM -0400, John Snow wrote:
> >> ping, is this the only issue? Any feedback? If this can hit 2.9 that
> >> would be good.
> >>
> >
> > The series looks fine to me, and I can patch up the nit from patchew when
> > applying. But do you happen to have a qemu-iotest for this case, or is it
> > not very feasible to create one?
> >
>
> I might need a hint from Paolo on how; my reproducer ATM is to literally
> boot a Fedora VM and issue reboots/QMP commands and manually observe a
> hang. (Which is pretty subjective ...)
>
> An iotest version would probably involve using the qtest socket to issue
> a PCI reset of some sort inbetween QMP commands as necessary, but
> testing for a hang in iotests seems race-prone. I have no idea how long
> this hang would last on other machines, for instance.
>
> There might be a more fool-proof automated testing method, but at the
> second I'm drawing a blank.
>
OK, I figured as much. We'll chalk that up as "not very feasible", at least
for the moment :)
> >>
> >> On 03/16/2017 05:28 PM, no-reply@patchew.org wrote:
> >>> Hi,
> >>>
> >>> This series seems to have some coding style problems. See output below for
> >>> more information:
> >>>
> >>> Type: series
> >>> Subject: [Qemu-devel] [PATCH v2 0/3] block: pause block jobs for bdrv_drain_begin/end
> >>> Message-id: 20170316212351.13797-1-jsnow@redhat.com
> >>>
> >>> === TEST SCRIPT BEGIN ===
> >>> #!/bin/bash
> >>>
> >>> BASE=base
> >>> n=1
> >>> total=$(git log --oneline $BASE.. | wc -l)
> >>> failed=0
> >>>
> >>> # Useful git options
> >>> git config --local diff.renamelimit 0
> >>> git config --local diff.renames True
> >>>
> >>> commits="$(git log --format=%H --reverse $BASE..)"
> >>> for c in $commits; do
> >>> echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
> >>> if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
> >>> failed=1
> >>> echo
> >>> fi
> >>> n=$((n+1))
> >>> done
> >>>
> >>> exit $failed
> >>> === TEST SCRIPT END ===
> >>>
> >>> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> >>> From https://github.com/patchew-project/qemu
> >>> * [new tag] patchew/20170316212351.13797-1-jsnow@redhat.com -> patchew/20170316212351.13797-1-jsnow@redhat.com
> >>> Switched to a new branch 'test'
> >>> 1cca6f3 blockjob: add devops to blockjob backends
> >>> 864d906 block-backend: add drained_begin / drained_end ops
> >>> 5e4f22d blockjob: add block_job_start_shim
> >>>
> >>> === OUTPUT BEGIN ===
> >>> Checking PATCH 1/3: blockjob: add block_job_start_shim...
> >>> Checking PATCH 2/3: block-backend: add drained_begin / drained_end ops...
> >>> ERROR: suspect code indent for conditional statements (8, 14)
> >>> #70: FILE: block/block-backend.c:1903:
> >>> + if (blk->dev_ops && blk->dev_ops->drained_end) {
> >>> + blk->dev_ops->drained_end(blk->dev_opaque);
> >>>
> >>> total: 1 errors, 0 warnings, 67 lines checked
> >>>
> >>> Your patch has style problems, please review. If any of these errors
> >>> are false positives report them to the maintainer, see
> >>> CHECKPATCH in MAINTAINERS.
> >>>
> >>> Checking PATCH 3/3: blockjob: add devops to blockjob backends...
> >>> === OUTPUT END ===
> >>>
> >>> Test command exited with code: 1
> >>>
> >>>
> >>> ---
> >>> Email generated automatically by Patchew [http://patchew.org/].
> >>> Please send your feedback to patchew-devel@freelists.org
> >>>
On Thu, Mar 16, 2017 at 02:28:47PM -0700, no-reply@patchew.org wrote:
> Hi,
>
> This series seems to have some coding style problems. See output below for
> more information:
>
> Type: series
> Subject: [Qemu-devel] [PATCH v2 0/3] block: pause block jobs for bdrv_drain_begin/end
> Message-id: 20170316212351.13797-1-jsnow@redhat.com
>
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
>
> BASE=base
> n=1
> total=$(git log --oneline $BASE.. | wc -l)
> failed=0
>
> # Useful git options
> git config --local diff.renamelimit 0
> git config --local diff.renames True
>
> commits="$(git log --format=%H --reverse $BASE..)"
> for c in $commits; do
> echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..."
> if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then
> failed=1
> echo
> fi
> n=$((n+1))
> done
>
> exit $failed
> === TEST SCRIPT END ===
>
> Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
> From https://github.com/patchew-project/qemu
> * [new tag] patchew/20170316212351.13797-1-jsnow@redhat.com -> patchew/20170316212351.13797-1-jsnow@redhat.com
> Switched to a new branch 'test'
> 1cca6f3 blockjob: add devops to blockjob backends
> 864d906 block-backend: add drained_begin / drained_end ops
> 5e4f22d blockjob: add block_job_start_shim
>
> === OUTPUT BEGIN ===
> Checking PATCH 1/3: blockjob: add block_job_start_shim...
> Checking PATCH 2/3: block-backend: add drained_begin / drained_end ops...
> ERROR: suspect code indent for conditional statements (8, 14)
> #70: FILE: block/block-backend.c:1903:
> + if (blk->dev_ops && blk->dev_ops->drained_end) {
> + blk->dev_ops->drained_end(blk->dev_opaque);
>
> total: 1 errors, 0 warnings, 67 lines checked
>
> Your patch has style problems, please review. If any of these errors
> are false positives report them to the maintainer, see
> CHECKPATCH in MAINTAINERS.
>
> Checking PATCH 3/3: blockjob: add devops to blockjob backends...
> === OUTPUT END ===
>
> Test command exited with code: 1
>
>
> ---
> Email generated automatically by Patchew [http://patchew.org/].
> Please send your feedback to patchew-devel@freelists.org
Fixed the patchew nit, and:
Thanks,
Applied to my block branch:
git://github.com/codyprime/qemu-kvm-jtc.git block
-Jeff
On Thu, Mar 16, 2017 at 9:23 PM, John Snow <jsnow@redhat.com> wrote: > Reference: https://bugzilla.redhat.com/show_bug.cgi?id=1367369#c8 > > It's possible to wedge QEMU if the guest tries to reset a virtio-pci > device as QEMU is also using the drive for a blockjob. This patchset > aims to allow us to safely pause/resume jobs attached to individual > nodes in a manner similar to how bdrv_drain_all_begin/end do. Weird, I thought the 0 nanosecond sleep that block jobs do in their main loop allows aio_poll() loops to finish. Maybe this bug is a regression? I've CCed Fam because I think he has also tackled this issue in the past. Stefan
On 23/03/2017 18:44, Stefan Hajnoczi wrote:
>> It's possible to wedge QEMU if the guest tries to reset a virtio-pci
>> device as QEMU is also using the drive for a blockjob. This patchset
>> aims to allow us to safely pause/resume jobs attached to individual
>> nodes in a manner similar to how bdrv_drain_all_begin/end do.
>
> Weird, I thought the 0 nanosecond sleep that block jobs do in their
> main loop allows aio_poll() loops to finish.
The 0 nanosecond sleep is now done in the BDS AioContext rather than in
the "non-aio_poll-aware" main loop:
commit 0b9caf9b3166c8deb3c4f3a774c2384b069dc29c
Author: Fam Zheng <famz@redhat.com>
Date: Tue Aug 26 15:15:43 2014 +0800
coroutine: Drop co_sleep_ns
block_job_sleep_ns is the only user. Since we are moving towards
AioContext aware code, it's better to use the explicit version and drop
the old one.
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> I've CCed Fam because I think he has also tackled this issue in the past.
Yes, he fixed the same issue for bdrv_drain_all.
Paolo
On Thu, Mar 23, 2017 at 06:57:14PM +0100, Paolo Bonzini wrote: > > > On 23/03/2017 18:44, Stefan Hajnoczi wrote: > >> It's possible to wedge QEMU if the guest tries to reset a virtio-pci > >> device as QEMU is also using the drive for a blockjob. This patchset > >> aims to allow us to safely pause/resume jobs attached to individual > >> nodes in a manner similar to how bdrv_drain_all_begin/end do. > > > > Weird, I thought the 0 nanosecond sleep that block jobs do in their > > main loop allows aio_poll() loops to finish. > > The 0 nanosecond sleep is now done in the BDS AioContext rather than in > the "non-aio_poll-aware" main loop: > > commit 0b9caf9b3166c8deb3c4f3a774c2384b069dc29c > Author: Fam Zheng <famz@redhat.com> > Date: Tue Aug 26 15:15:43 2014 +0800 > > coroutine: Drop co_sleep_ns > > block_job_sleep_ns is the only user. Since we are moving towards > AioContext aware code, it's better to use the explicit version and drop > the old one. > > Signed-off-by: Fam Zheng <famz@redhat.com> > Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> But we hold the AioContext lock and are calling aio_poll(), so I would expect our loop to terminate. The blockjob coroutine should still be leaving this little gap in activity during which the aio_poll() loop finishes. Stefan
On Fri, 03/24 15:27, Stefan Hajnoczi wrote: > On Thu, Mar 23, 2017 at 06:57:14PM +0100, Paolo Bonzini wrote: > > > > > > On 23/03/2017 18:44, Stefan Hajnoczi wrote: > > >> It's possible to wedge QEMU if the guest tries to reset a virtio-pci > > >> device as QEMU is also using the drive for a blockjob. This patchset > > >> aims to allow us to safely pause/resume jobs attached to individual > > >> nodes in a manner similar to how bdrv_drain_all_begin/end do. > > > > > > Weird, I thought the 0 nanosecond sleep that block jobs do in their > > > main loop allows aio_poll() loops to finish. > > > > The 0 nanosecond sleep is now done in the BDS AioContext rather than in > > the "non-aio_poll-aware" main loop: > > > > commit 0b9caf9b3166c8deb3c4f3a774c2384b069dc29c > > Author: Fam Zheng <famz@redhat.com> > > Date: Tue Aug 26 15:15:43 2014 +0800 > > > > coroutine: Drop co_sleep_ns > > > > block_job_sleep_ns is the only user. Since we are moving towards > > AioContext aware code, it's better to use the explicit version and drop > > the old one. > > > > Signed-off-by: Fam Zheng <famz@redhat.com> > > Reviewed-by: Benoît Canet <benoit.canet@nodalink.com> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> > > But we hold the AioContext lock and are calling aio_poll(), so I would > expect our loop to terminate. The blockjob coroutine should still be > leaving this little gap in activity during which the aio_poll() loop > finishes. I am not sure, but at each gap, aio_poll() is free to fire the 0 nanosecond sleep timer cb already, which will generate more I/O. The correct thing is, like in this series, set the pause flag. Fam
© 2016 - 2026 Red Hat, Inc.