[PATCH] qemu: blockjob: Avoid spurious log errors when cancelling a shallow copy with reused images

Peter Krempa posted 1 patch 2 years, 2 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/f77c3cc35ca56ecf1899402e9cec6dca849bc771.1645548938.git.pkrempa@redhat.com
Test syntax-check failed
src/qemu/qemu_blockjob.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
[PATCH] qemu: blockjob: Avoid spurious log errors when cancelling a shallow copy with reused images
Posted by Peter Krempa 2 years, 2 months ago
In case when a user starts a block copy operation with
VIR_DOMAIN_BLOCK_COPY_SHALLOW and VIR_DOMAIN_BLOCK_COPY_REUSE_EXT and
both the reused image and the original disk have a backing image libvirt
specifically does not insert the backing image until after the job is
asked to be completed via virBlockJobAbort with
VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT.

This is so that management applications can copy the backing image on
the background.

Now when a user aborts the block job instead of cancelling it we'd
ignore the fact that we didn't insert the backing image yet and the
cancellation would result into a 'blockdev-del' of a invalid node name
and thus an 'error' severity entry in the log.

To solve this issue we use the same conditions when the backing image
addition is avoided to remove the internal state for them prior to the
call to unplug the mirror destination.

Reported-by: Kashyap Chamarthy <kchamart@redhat.com>
Signed-off-by: Peter Krempa <pkrempa@redhat.com>
---
 src/qemu/qemu_blockjob.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c
index 726df95067..2442865b9b 100644
--- a/src/qemu/qemu_blockjob.c
+++ b/src/qemu/qemu_blockjob.c
@@ -1406,6 +1406,8 @@ qemuBlockJobProcessEventConcludedCopyAbort(virQEMUDriver *driver,
                                            qemuBlockJobData *job,
                                            qemuDomainAsyncJob asyncJob)
 {
+    qemuDomainObjPrivate *priv = vm->privateData;
+
     VIR_DEBUG("copy job '%s' on VM '%s' aborted", job->name, vm->def->name);

     /* mirror may be NULL for copy job corresponding to migration */
@@ -1413,6 +1415,21 @@ qemuBlockJobProcessEventConcludedCopyAbort(virQEMUDriver *driver,
         !job->disk->mirror)
         return;

+    if (!job->jobflagsmissing) {
+        bool shallow = job->jobflags & VIR_DOMAIN_BLOCK_COPY_SHALLOW;
+        bool reuse = job->jobflags & VIR_DOMAIN_BLOCK_COPY_REUSE_EXT;
+
+        /* In the special case of a shallow copy with reused image we don't
+         * hotplug the full chain when QEMU_CAPS_BLOCKDEV_SNAPSHOT_ALLOW_WRITE_ONLY
+         * is supported. Attempting to delete it would thus result in spurious
+         * errors as we'd attempt to blockdev-del images which were not added
+         * yet */
+        if (reuse && shallow &&
+            virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKDEV_SNAPSHOT_ALLOW_WRITE_ONLY) &&
+            virStorageSourceHasBacking(job->disk->mirror))
+            g_clear_pointer(&job->disk->mirror->backingStore, virObjectUnref);
+    }
+
     /* activeWrite bitmap is removed automatically here */
     qemuBlockJobEventProcessConcludedRemoveChain(driver, vm, asyncJob, job->disk->mirror);
     g_clear_pointer(&job->disk->mirror, virObjectUnref);
-- 
2.35.1

Re: [PATCH] qemu: blockjob: Avoid spurious log errors when cancelling a shallow copy with reused images
Posted by Ján Tomko 2 years, 2 months ago
On a Tuesday in 2022, Peter Krempa wrote:
>In case when a user starts a block copy operation with
>VIR_DOMAIN_BLOCK_COPY_SHALLOW and VIR_DOMAIN_BLOCK_COPY_REUSE_EXT and
>both the reused image and the original disk have a backing image libvirt
>specifically does not insert the backing image until after the job is
>asked to be completed via virBlockJobAbort with
>VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT.
>
>This is so that management applications can copy the backing image on
>the background.
>
>Now when a user aborts the block job instead of cancelling it we'd
>ignore the fact that we didn't insert the backing image yet and the
>cancellation would result into a 'blockdev-del' of a invalid node name
>and thus an 'error' severity entry in the log.
>
>To solve this issue we use the same conditions when the backing image
>addition is avoided to remove the internal state for them prior to the
>call to unplug the mirror destination.
>
>Reported-by: Kashyap Chamarthy <kchamart@redhat.com>
>Signed-off-by: Peter Krempa <pkrempa@redhat.com>
>---
> src/qemu/qemu_blockjob.c | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>

Reviewed-by: Ján Tomko <jtomko@redhat.com>

Jano
Re: [PATCH] qemu: blockjob: Avoid spurious log errors when cancelling a shallow copy with reused images
Posted by Kashyap Chamarthy 2 years, 2 months ago
On Tue, Feb 22, 2022 at 05:55:38PM +0100, Peter Krempa wrote:
> In case when a user starts a block copy operation with
> VIR_DOMAIN_BLOCK_COPY_SHALLOW and VIR_DOMAIN_BLOCK_COPY_REUSE_EXT and
> both the reused image and the original disk have a backing image libvirt
> specifically does not insert the backing image until after the job is
> asked to be completed via virBlockJobAbort with
> VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT.

Thanks for the quick patch!

First, to refresh my memory (and for others reading the thread), these
are the two main uses of combining '--reuse-external' and '--shallow'
flags with "blockcopy":

(a) You can control what the backing file is, as long as the top of that
    backing file chain has identical guest-visible contents to what the
    original chain had in its backing file. 

(b) The new backing file can have a _different_ backing chain depth or
    even different format.

            * * *

Now to construct a reproducer with `virsh`.   Peter, tell me what I got
wrong :-)

(1) Let's start with this image chain (overlays are always of qcow2
    format):

        base.raw <-- overlay1 <-- overlay2 (live QEMU).

    With the goal of copying the above into the below chain (note, here
    we're flattening both base.raw and overlay1 into a single file,
    "flat-o-b1")

         flat-o-b1.raw  <--  copy (live QEMU)

(2) Make a *raw* variant of "base <-- overlay1", call it "flat-b-o1.raw"
    (i.e. flattened version of combined base and overlay1):

    $ qemu-img convert -f qcow2 -O raw overlay1.qcow2 \
        flat-b-o1.raw

(3) Then, create an empty QCOW2 file to create "flat-b-o1 <-- copy (empty)":

    $ qemu-img create -f qcow2 \
        -o backing_file=flat-b-o1.raw,backing_fmt=raw copy.qcow2

(4) *Then* perform the `blockcopy --reuse-external --shallow`:

    $ virsh blockcopy \
        --domain vm1 vda ./copy.qcow2 \
        --wait --verbose --reuse-external --shallow \
        --finish

(5) And then pivot the job, so that live QEMU now points to copy

    $ virsh blockjob --pivot

And the final result, we get the "goal" chain in step (1).  

    src:  base <-- overlay1 <-- overlay2
                ==               ==
    dst:    flat-o-b1       <--  copy (live QEMU)


Am I missing anything else?

> This is so that management applications can copy the backing image on
> the background.
> 
> Now when a user aborts the block job instead of cancelling it we'd
> ignore the fact that we didn't insert the backing image yet and the
> cancellation would result into a 'blockdev-del' of a invalid node name
> and thus an 'error' severity entry in the log.
> 
> To solve this issue we use the same conditions when the backing image
> addition is avoided to remove the internal state for them prior to the
> call to unplug the mirror destination.
> 
> Reported-by: Kashyap Chamarthy <kchamart@redhat.com>
> Signed-off-by: Peter Krempa <pkrempa@redhat.com>
> ---
>  src/qemu/qemu_blockjob.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/src/qemu/qemu_blockjob.c b/src/qemu/qemu_blockjob.c
> index 726df95067..2442865b9b 100644
> --- a/src/qemu/qemu_blockjob.c
> +++ b/src/qemu/qemu_blockjob.c
> @@ -1406,6 +1406,8 @@ qemuBlockJobProcessEventConcludedCopyAbort(virQEMUDriver *driver,
>                                             qemuBlockJobData *job,
>                                             qemuDomainAsyncJob asyncJob)
>  {
> +    qemuDomainObjPrivate *priv = vm->privateData;
> +
>      VIR_DEBUG("copy job '%s' on VM '%s' aborted", job->name, vm->def->name);
> 
>      /* mirror may be NULL for copy job corresponding to migration */
> @@ -1413,6 +1415,21 @@ qemuBlockJobProcessEventConcludedCopyAbort(virQEMUDriver *driver,
>          !job->disk->mirror)
>          return;
> 
> +    if (!job->jobflagsmissing) {
> +        bool shallow = job->jobflags & VIR_DOMAIN_BLOCK_COPY_SHALLOW;
> +        bool reuse = job->jobflags & VIR_DOMAIN_BLOCK_COPY_REUSE_EXT;
> +
> +        /* In the special case of a shallow copy with reused image we don't
> +         * hotplug the full chain when QEMU_CAPS_BLOCKDEV_SNAPSHOT_ALLOW_WRITE_ONLY
> +         * is supported. Attempting to delete it would thus result in spurious
> +         * errors as we'd attempt to blockdev-del images which were not added
> +         * yet */
> +        if (reuse && shallow &&
> +            virQEMUCapsGet(priv->qemuCaps, QEMU_CAPS_BLOCKDEV_SNAPSHOT_ALLOW_WRITE_ONLY) &&
> +            virStorageSourceHasBacking(job->disk->mirror))
> +            g_clear_pointer(&job->disk->mirror->backingStore, virObjectUnref);
> +    }
> +
>      /* activeWrite bitmap is removed automatically here */
>      qemuBlockJobEventProcessConcludedRemoveChain(driver, vm, asyncJob, job->disk->mirror);
>      g_clear_pointer(&job->disk->mirror, virObjectUnref);
> -- 
> 2.35.1
> 

-- 
/kashyap

Re: [PATCH] qemu: blockjob: Avoid spurious log errors when cancelling a shallow copy with reused images
Posted by Peter Krempa 2 years, 2 months ago
On Wed, Feb 23, 2022 at 14:17:40 +0100, Kashyap Chamarthy wrote:
> On Tue, Feb 22, 2022 at 05:55:38PM +0100, Peter Krempa wrote:
> > In case when a user starts a block copy operation with
> > VIR_DOMAIN_BLOCK_COPY_SHALLOW and VIR_DOMAIN_BLOCK_COPY_REUSE_EXT and
> > both the reused image and the original disk have a backing image libvirt
> > specifically does not insert the backing image until after the job is
> > asked to be completed via virBlockJobAbort with
> > VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT.
> 
> Thanks for the quick patch!
> 
> First, to refresh my memory (and for others reading the thread), these
> are the two main uses of combining '--reuse-external' and '--shallow'
> flags with "blockcopy":
> 
> (a) You can control what the backing file is, as long as the top of that
>     backing file chain has identical guest-visible contents to what the
>     original chain had in its backing file. 
> 
> (b) The new backing file can have a _different_ backing chain depth or
>     even different format.
> 
>             * * *
> 
> Now to construct a reproducer with `virsh`.   Peter, tell me what I got
> wrong :-)
> 
> (1) Let's start with this image chain (overlays are always of qcow2
>     format):
> 
>         base.raw <-- overlay1 <-- overlay2 (live QEMU).
> 
>     With the goal of copying the above into the below chain (note, here
>     we're flattening both base.raw and overlay1 into a single file,
>     "flat-o-b1")
> 
>          flat-o-b1.raw  <--  copy (live QEMU)
> 
> (2) Make a *raw* variant of "base <-- overlay1", call it "flat-b-o1.raw"
>     (i.e. flattened version of combined base and overlay1):
> 
>     $ qemu-img convert -f qcow2 -O raw overlay1.qcow2 \
>         flat-b-o1.raw
> 
> (3) Then, create an empty QCOW2 file to create "flat-b-o1 <-- copy (empty)":
> 
>     $ qemu-img create -f qcow2 \
>         -o backing_file=flat-b-o1.raw,backing_fmt=raw copy.qcow2
> 
> (4) *Then* perform the `blockcopy --reuse-external --shallow`:
> 
>     $ virsh blockcopy \
>         --domain vm1 vda ./copy.qcow2 \
>         --wait --verbose --reuse-external --shallow \
>         --finish

If you are working on a persistent domain/vm use --transient-job to
avoid the need to undefine it.

Also note that '--finish' is equivalent to aborting the job after it
reaches synchronized phase ...


> 
> (5) And then pivot the job, so that live QEMU now points to copy
> 
>     $ virsh blockjob --pivot


... so this will actually say that there is no blockjob, because the
previous step actually cancelled it.


> 
> And the final result, we get the "goal" chain in step (1).  
> 
>     src:  base <-- overlay1 <-- overlay2
>                 ==               ==
>     dst:    flat-o-b1       <--  copy (live QEMU)
> 
> 
> Am I missing anything else?

Now the bug is that when you _cancel_ the job we attempt to blockdev-del
some images which were not blockdev-added, and thus create some spurious
log entries, but the rest of the code behaves correctly.

So you'll only see the failure this is fixing in the logs.

Re: [PATCH] qemu: blockjob: Avoid spurious log errors when cancelling a shallow copy with reused images
Posted by Kashyap Chamarthy 2 years, 2 months ago
On Wed, Feb 23, 2022 at 02:33:04PM +0100, Peter Krempa wrote:
> On Wed, Feb 23, 2022 at 14:17:40 +0100, Kashyap Chamarthy wrote:
> > On Tue, Feb 22, 2022 at 05:55:38PM +0100, Peter Krempa wrote:

[...]

> > Now to construct a reproducer with `virsh`.   Peter, tell me what I got
> > wrong :-)
> > 
> > (1) Let's start with this image chain (overlays are always of qcow2
> >     format):
> > 
> >         base.raw <-- overlay1 <-- overlay2 (live QEMU).
> > 
> >     With the goal of copying the above into the below chain (note, here
> >     we're flattening both base.raw and overlay1 into a single file,
> >     "flat-o-b1")
> > 
> >          flat-o-b1.raw  <--  copy (live QEMU)
> > 
> > (2) Make a *raw* variant of "base <-- overlay1", call it "flat-b-o1.raw"
> >     (i.e. flattened version of combined base and overlay1):
> > 
> >     $ qemu-img convert -f qcow2 -O raw overlay1.qcow2 \
> >         flat-b-o1.raw
> > 
> > (3) Then, create an empty QCOW2 file to create "flat-b-o1 <-- copy (empty)":
> > 
> >     $ qemu-img create -f qcow2 \
> >         -o backing_file=flat-b-o1.raw,backing_fmt=raw copy.qcow2
> > 
> > (4) *Then* perform the `blockcopy --reuse-external --shallow`:
> > 
> >     $ virsh blockcopy \
> >         --domain vm1 vda ./copy.qcow2 \
> >         --wait --verbose --reuse-external --shallow \
> >         --finish
> 
> If you are working on a persistent domain/vm use --transient-job to
> avoid the need to undefine it.

Yeah, makes sense.

> Also note that '--finish' is equivalent to aborting the job after it
> reaches synchronized phase ...

Damn, I thought I _removed_ finish, but still forgot it despite making a
mental note.  Thanks for catching htat.

> > (5) And then pivot the job, so that live QEMU now points to copy
> > 
> >     $ virsh blockjob --pivot
> 
> 
> ... so this will actually say that there is no blockjob, because the
> previous step actually cancelled it.

Yeah, indeed.

> > 
> > And the final result, we get the "goal" chain in step (1).  
> > 
> >     src:  base <-- overlay1 <-- overlay2
> >                 ==               ==
> >     dst:    flat-o-b1       <--  copy (live QEMU)
> > 
> > 
> > Am I missing anything else?
> 
> Now the bug is that when you _cancel_ the job we attempt to blockdev-del
> some images which were not blockdev-added, and thus create some spurious
> log entries, but the rest of the code behaves correctly.

Yep, understood.

> So you'll only see the failure this is fixing in the logs.

Noted.  Thanks for the review.

-- 
/kashyap