[Qemu-devel] [PATCH] block/mirror: fix fail to cancel when VM has heavy BLK IO

Liang Li posted 1 patch 6 years, 3 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20180124061728.GA99621@localhost
Test checkpatch passed
Test docker-build@min-glib passed
Test docker-mingw@fedora passed
Test docker-quick@centos6 passed
Test ppc passed
Test s390x passed
block/mirror.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
[Qemu-devel] [PATCH] block/mirror: fix fail to cancel when VM has heavy BLK IO
Posted by Liang Li 6 years, 3 months ago
We found that when doing drive mirror to a low speed shared storage,
if there was heavy BLK IO write workload in VM after the 'ready' event,
drive mirror block job can't be canceled immediately, it would keep
running until the heavy BLK IO workload stopped in the VM. This patch
fixed this issue.

Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Jeff Cody <jcody@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Signed-off-by: Huaitong Han <hanhuaitong@didichuxing.com>
Signed-off-by: Liang Li <liliangleo@didichuxing.com>
---
 block/mirror.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/block/mirror.c b/block/mirror.c
index c9badc1..3bc49a5 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -869,11 +869,9 @@ static void coroutine_fn mirror_run(void *opaque)
 
         ret = 0;
         trace_mirror_before_sleep(s, cnt, s->synced, delay_ns);
-        if (!s->synced) {
-            block_job_sleep_ns(&s->common, delay_ns);
-            if (block_job_is_cancelled(&s->common)) {
-                break;
-            }
+
+        if (block_job_is_cancelled(&s->common)) {
+            break;
         } else if (!should_complete) {
             delay_ns = (s->in_flight == 0 && cnt == 0 ? SLICE_TIME : 0);
             block_job_sleep_ns(&s->common, delay_ns);
@@ -887,7 +885,7 @@ immediate_exit:
          * or it was cancelled prematurely so that we do not guarantee that
          * the target is a copy of the source.
          */
-        assert(ret < 0 || (!s->synced && block_job_is_cancelled(&s->common)));
+        assert(ret < 0 || block_job_is_cancelled(&s->common));
         assert(need_drain);
         mirror_wait_for_all_io(s);
     }
-- 
1.8.3.1


Re: [Qemu-devel] [PATCH] block/mirror: fix fail to cancel when VM has heavy BLK IO
Posted by Eric Blake 6 years, 3 months ago
On 01/24/2018 12:17 AM, Liang Li wrote:
> We found that when doing drive mirror to a low speed shared storage,
> if there was heavy BLK IO write workload in VM after the 'ready' event,
> drive mirror block job can't be canceled immediately, it would keep
> running until the heavy BLK IO workload stopped in the VM. This patch
> fixed this issue.

I think you are breaking semantics here.  Libvirt relies on
'block-job-cancel' after the 'ready' event to be a clean point-in-time
snapshot, but that is only possible if there is no out-of-order pending
I/O at the time the action takes place.  Breaking in the middle of the
loop, without using bdrv_drain(), risks leaving an inconsistent copy of
data in the mirror not corresponding to any point-in-time on the source.

There's ongoing work on adding async mirroring; this may be a better
solution to the issue you are seeing.

https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg05419.html

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH] block/mirror: fix fail to cancel when VM has heavy BLK IO
Posted by Liang Li 6 years, 2 months ago
On Wed, Jan 24, 2018 at 01:16:39PM -0600, Eric Blake wrote:
> On 01/24/2018 12:17 AM, Liang Li wrote:
> > We found that when doing drive mirror to a low speed shared storage,
> > if there was heavy BLK IO write workload in VM after the 'ready' event,
> > drive mirror block job can't be canceled immediately, it would keep
> > running until the heavy BLK IO workload stopped in the VM. This patch
> > fixed this issue.
> 
> I think you are breaking semantics here.  Libvirt relies on
> 'block-job-cancel' after the 'ready' event to be a clean point-in-time
> snapshot, but that is only possible if there is no out-of-order pending
> I/O at the time the action takes place.  Breaking in the middle of the
> loop, without using bdrv_drain(), risks leaving an inconsistent copy of
> data in the mirror not corresponding to any point-in-time on the source.
> 
> There's ongoing work on adding async mirroring; this may be a better
> solution to the issue you are seeing.
> 
> https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg05419.html
> 
Hi Eric,

Thinks for your information, I didn't know libvirt depends on 'block-job-cancel'
for some of the block related operations.

It's seems a new interface should provided by qemu for use case that just
for aborting block job and don't care abort the mirror data integrality, and
libvirt can make use of this new interface.

Do you think this is the right direction?

Liang
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
> 



Re: [Qemu-devel] [PATCH] block/mirror: fix fail to cancel when VM has heavy BLK IO
Posted by Eric Blake 6 years, 2 months ago
On 01/24/2018 10:59 PM, Liang Li wrote:
>>
>> There's ongoing work on adding async mirroring; this may be a better
>> solution to the issue you are seeing.
>>
>> https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg05419.html
>>
> Hi Eric,
> 
> Thinks for your information, I didn't know libvirt depends on 'block-job-cancel'
> for some of the block related operations.
> 
> It's seems a new interface should provided by qemu for use case that just
> for aborting block job and don't care abort the mirror data integrality, and
> libvirt can make use of this new interface.
> 
> Do you think this is the right direction?

I don't know if it is better to wait for the new async mirroring code to
land, or to just propose a new QMP command that can force-quit an
ongoing mirror in the READY state, but you are correct that the only
safe way to do it is by adding a new command (or a new optional flag to
the existing block-job-cancel command).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH] block/mirror: fix fail to cancel when VM has heavy BLK IO
Posted by Liang Li 6 years, 2 months ago
On Thu, Jan 25, 2018 at 08:48:22AM -0600, Eric Blake wrote:
> On 01/24/2018 10:59 PM, Liang Li wrote:
> >>
> >> There's ongoing work on adding async mirroring; this may be a better
> >> solution to the issue you are seeing.
> >>
> >> https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg05419.html
> >>
> > Hi Eric,
> > 
> > Thinks for your information, I didn't know libvirt depends on 'block-job-cancel'
> > for some of the block related operations.
> > 
> > It's seems a new interface should provided by qemu for use case that just
> > for aborting block job and don't care abort the mirror data integrality, and
> > libvirt can make use of this new interface.
> > 
> > Do you think this is the right direction?
> 
> I don't know if it is better to wait for the new async mirroring code to
> land, or to just propose a new QMP command that can force-quit an
> ongoing mirror in the READY state, but you are correct that the only
> safe way to do it is by adding a new command (or a new optional flag to
> the existing block-job-cancel command).
> 

Active sync does not conflict with the new QMP command, no need to wait.
The current QMP command is:

{ 'command': 'block-job-cancel', 'data': { 'device': 'str', '*force': 'bool' } }

'force' has other meaning which is not used by libvirt, for the change, there
are 3 options:

a. Now that 'force' is not used by libvirt and it current semantic is not very useful,
we can change it's semantic to force-quit without syncing.

b. change 'force' from bool to flag, and bit 0 is used for it's original meaning.

c. add another bool parameter.


which is the best one?

Thanks!

Liang 


> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
> 




Re: [Qemu-devel] [PATCH] block/mirror: fix fail to cancel when VM has heavy BLK IO
Posted by Eric Blake 6 years, 2 months ago
On 01/26/2018 12:46 AM, Liang Li wrote:
> The current QMP command is:
> 
> { 'command': 'block-job-cancel', 'data': { 'device': 'str', '*force': 'bool' } }
> 
> 'force' has other meaning which is not used by libvirt, for the change, there
> are 3 options:
> 
> a. Now that 'force' is not used by libvirt and it current semantic is not very useful,
> we can change it's semantic to force-quit without syncing.

The current semantics are:

# @force: whether to allow cancellation of a paused job (default
#         false).  Since 1.3.

You are right that libvirt is not using it at the moment; but that
doesn't tell us whether someone else is using it.  On the other hand, it
is a fairly easy argument to make that "a job which is paused is not
complete, so forcing it to cancel means an unclean image left behind",
which can then be reformulated as "the force flag says to cancel
immediately, whether the job is paused or has pending data, and thus
leave an unclean image behind".  In other words, I don't think it is too
bad to just tidy up the wording, and allow the existing 'force':true
parameter to be enabled to quit a job that won't converge.

> 
> b. change 'force' from bool to flag, and bit 0 is used for it's original meaning.

Not possible.  You can't change from 'force':true to 'force':1 in JSON,
at least not without rewriting the command to use an alternate that
accepts both bool and int (actually, I seem to recall that we tightened
QAPI to not permit alternates that might be ambiguous when parsed by
QemuOpts, which may mean that is not even possible - although I haven't
tried to see if it works or gives an error).

> 
> c. add another bool parameter.

Also doable, if we are concerned that existing semantics of 'force'
affecting only paused jobs must be preserved.

> 
> 
> which is the best one?

1 is slightly less code, but 3 is more conservative.  I'd be okay with
option 1 if no one else can provide a reason why it would break something.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [Qemu-devel] [PATCH] block/mirror: fix fail to cancel when VM has heavy BLK IO
Posted by Liang Li 6 years, 2 months ago
On Fri, Jan 26, 2018 at 08:04:08AM -0600, Eric Blake wrote:
> On 01/26/2018 12:46 AM, Liang Li wrote:
> > The current QMP command is:
> > 
> > { 'command': 'block-job-cancel', 'data': { 'device': 'str', '*force': 'bool' } }
> > 
> > 'force' has other meaning which is not used by libvirt, for the change, there
> > are 3 options:
> > 
> > a. Now that 'force' is not used by libvirt and it current semantic is not very useful,
> > we can change it's semantic to force-quit without syncing.
> 
> The current semantics are:
> 
> # @force: whether to allow cancellation of a paused job (default
> #         false).  Since 1.3.
> 
> You are right that libvirt is not using it at the moment; but that
> doesn't tell us whether someone else is using it.  On the other hand, it
> is a fairly easy argument to make that "a job which is paused is not
> complete, so forcing it to cancel means an unclean image left behind",
> which can then be reformulated as "the force flag says to cancel
> immediately, whether the job is paused or has pending data, and thus
> leave an unclean image behind".  In other words, I don't think it is too
> bad to just tidy up the wording, and allow the existing 'force':true
> parameter to be enabled to quit a job that won't converge.
> 
> > 
> > b. change 'force' from bool to flag, and bit 0 is used for it's original meaning.
> 
> Not possible.  You can't change from 'force':true to 'force':1 in JSON,
> at least not without rewriting the command to use an alternate that
> accepts both bool and int (actually, I seem to recall that we tightened
> QAPI to not permit alternates that might be ambiguous when parsed by
> QemuOpts, which may mean that is not even possible - although I haven't
> tried to see if it works or gives an error).
> 
> > 
> > c. add another bool parameter.
> 
> Also doable, if we are concerned that existing semantics of 'force'
> affecting only paused jobs must be preserved.
> 
> > 
> > 
> > which is the best one?
> 
> 1 is slightly less code, but 3 is more conservative.  I'd be okay with
> option 1 if no one else can provide a reason why it would break something.
> 

OK. I will send a patch based on the first option.

Thanks!

Liang
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.           +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org
> 




Re: [Qemu-devel] [Qemu-block] [PATCH] block/mirror: fix fail to cancel when VM has heavy BLK IO
Posted by John Snow 6 years, 2 months ago

On 01/24/2018 02:16 PM, Eric Blake wrote:
> On 01/24/2018 12:17 AM, Liang Li wrote:
>> We found that when doing drive mirror to a low speed shared storage,
>> if there was heavy BLK IO write workload in VM after the 'ready' event,
>> drive mirror block job can't be canceled immediately, it would keep
>> running until the heavy BLK IO workload stopped in the VM. This patch
>> fixed this issue.
> 
> I think you are breaking semantics here.  Libvirt relies on
> 'block-job-cancel' after the 'ready' event to be a clean point-in-time
> snapshot, but that is only possible if there is no out-of-order pending
> I/O at the time the action takes place.  Breaking in the middle of the
> loop, without using bdrv_drain(), risks leaving an inconsistent copy of
> data in the mirror not corresponding to any point-in-time on the source.
> 
> There's ongoing work on adding async mirroring; this may be a better
> solution to the issue you are seeing.
> 
> https://lists.gnu.org/archive/html/qemu-devel/2018-01/msg05419.html
> 

Sounds like another point for the idea of using a "completion mode" in a
2.0 API instead of treating "cancel" like a valid way of completing a job.

(Kevin: If you're taking this up, it would be *very* nice to have jobs
have an option via job-set-property or some such command that allows us
to change our desired completion mode on the fly, which frees up cancel
to be simply a cancel.)

--js