[Qemu-devel] [RFC PATCH for-2.9] block: Ignore guest dev permissions during incoming migration

Kevin Wolf posted 1 patch 7 years ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/1491320156-4629-1-git-send-email-kwolf@redhat.com
Test checkpatch passed
Test docker passed
Test s390x passed
block/block-backend.c | 40 +++++++++++++++++++++++++++++++++++++++-
include/block/block.h |  2 ++
migration/migration.c |  8 ++++++++
qmp.c                 |  6 ++++++
4 files changed, 55 insertions(+), 1 deletion(-)
[Qemu-devel] [RFC PATCH for-2.9] block: Ignore guest dev permissions during incoming migration
Posted by Kevin Wolf 7 years ago
Usually guest devices don't like other writers to the same image, so
they use blk_set_perm() to prevent this from happening. In the migration
phase before the VM is actually running, though, they don't have a
problem with writes to the image. On the other hand, storage migration
needs to be able to write to the image in this phase, so the restrictive
blk_set_perm() call of qdev devices breaks it.
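
For illustration only, the conflicting request from a guest device typically
looks something like this (a rough sketch; the exact permission bits differ
per device, and example_device_realize() is a made-up name, not part of this
patch):

    static void example_device_realize(BlockBackend *blk, Error **errp)
    {
        /* Take read/write access and allow other users nothing beyond
         * "unchanged" writes, so a concurrent writer such as an NBD server
         * or a block job is refused. */
        if (blk_set_perm(blk, BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE,
                         BLK_PERM_WRITE_UNCHANGED, errp) < 0) {
            return;
        }
    }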

This patch flags all BlockBackends with a qdev device as
blk->disable_perm during incoming migration, which means that the
requested permissions are stored in the BlockBackend, but not actually
applied to its root node yet.

Once migration has finished and the VM should be resumed, the
permissions are applied. If they cannot be applied (e.g. because the NBD
server used for block migration hasn't been shut down), resuming the VM
fails.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/block-backend.c | 40 +++++++++++++++++++++++++++++++++++++++-
 include/block/block.h |  2 ++
 migration/migration.c |  8 ++++++++
 qmp.c                 |  6 ++++++
 4 files changed, 55 insertions(+), 1 deletion(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 0b63773..f817040 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -61,6 +61,7 @@ struct BlockBackend {
 
     uint64_t perm;
     uint64_t shared_perm;
+    bool disable_perm;
 
     bool allow_write_beyond_eof;
 
@@ -578,7 +579,7 @@ int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
 {
     int ret;
 
-    if (blk->root) {
+    if (blk->root && !blk->disable_perm) {
         ret = bdrv_child_try_set_perm(blk->root, perm, shared_perm, errp);
         if (ret < 0) {
             return ret;
@@ -597,15 +598,52 @@ void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm)
     *shared_perm = blk->shared_perm;
 }
 
+/*
+ * Notifies the user of all BlockBackends that migration has completed. qdev
+ * devices can tighten their permissions in response (specifically revoke
+ * shared write permissions that we needed for storage migration).
+ *
+ * If an error is returned, the VM cannot be allowed to be resumed.
+ */
+void blk_resume_after_migration(Error **errp)
+{
+    BlockBackend *blk;
+    Error *local_err = NULL;
+
+    for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
+        if (!blk->disable_perm) {
+            continue;
+        }
+
+        blk->disable_perm = false;
+
+        blk_set_perm(blk, blk->perm, blk->shared_perm, &local_err);
+        if (local_err) {
+            error_propagate(errp, local_err);
+            blk->disable_perm = true;
+            return;
+        }
+    }
+}
+
 static int blk_do_attach_dev(BlockBackend *blk, void *dev)
 {
     if (blk->dev) {
         return -EBUSY;
     }
+
+    /* While migration is still incoming, we don't need to apply the
+     * permissions of guest device BlockBackends. We might still have a block
+     * job or NBD server writing to the image for storage migration. */
+    if (runstate_check(RUN_STATE_INMIGRATE)) {
+        blk->disable_perm = true;
+    }
+
     blk_ref(blk);
     blk->dev = dev;
     blk->legacy_dev = false;
     blk_iostatus_reset(blk);
+
     return 0;
 }
 
diff --git a/include/block/block.h b/include/block/block.h
index 5149260..3e09222 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -366,6 +366,8 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
 void bdrv_invalidate_cache_all(Error **errp);
 int bdrv_inactivate_all(void);
 
+void blk_resume_after_migration(Error **errp);
+
 /* Ensure contents are flushed to disk.  */
 int bdrv_flush(BlockDriverState *bs);
 int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
diff --git a/migration/migration.c b/migration/migration.c
index 54060f7..ad4036f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -349,6 +349,14 @@ static void process_incoming_migration_bh(void *opaque)
         exit(EXIT_FAILURE);
     }
 
+    /* If we get an error here, just don't restart the VM yet. */
+    blk_resume_after_migration(&local_err);
+    if (local_err) {
+        error_free(local_err);
+        local_err = NULL;
+        autostart = false;
+    }
+
     /*
      * This must happen after all error conditions are dealt with and
      * we're sure the VM is going to be running on this host.
diff --git a/qmp.c b/qmp.c
index fa82b59..a744e44 100644
--- a/qmp.c
+++ b/qmp.c
@@ -207,6 +207,12 @@ void qmp_cont(Error **errp)
         }
     }
 
+    blk_resume_after_migration(&local_err);
+    if (local_err) {
+        error_propagate(errp, local_err);
+        return;
+    }
+
     if (runstate_check(RUN_STATE_INMIGRATE)) {
         autostart = 1;
     } else {
-- 
1.8.3.1


Re: [Qemu-devel] [RFC PATCH for-2.9] block: Ignore guest dev permissions during incoming migration
Posted by Max Reitz 7 years ago
On 04.04.2017 17:35, Kevin Wolf wrote:
> Usually guest devices don't like other writers to the same image, so
> they use blk_set_perm() to prevent this from happening. In the migration
> phase before the VM is actually running, though, they don't have a
> problem with writes to the image. On the other hand, storage migration
> needs to be able to write to the image in this phase, so the restrictive
> blk_set_perm() call of qdev devices breaks it.
> 
> This patch flags all BlockBackends with a qdev device as
> blk->disable_perm during incoming migration, which means that the
> requested permissions are stored in the BlockBackend, but not actually
> applied to its root node yet.
> 
> Once migration has finished and the VM should be resumed, the
> permissions are applied. If they cannot be applied (e.g. because the NBD
> server used for block migration hasn't been shut down), resuming the VM
> fails.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/block-backend.c | 40 +++++++++++++++++++++++++++++++++++++++-
>  include/block/block.h |  2 ++
>  migration/migration.c |  8 ++++++++
>  qmp.c                 |  6 ++++++
>  4 files changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 0b63773..f817040 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c

[...]

> @@ -597,15 +598,52 @@ void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm)
>      *shared_perm = blk->shared_perm;
>  }
>  
> +/*
> + * Notifies the user of all BlockBackends that migration has completed. qdev
> + * devices can tighten their permissions in response (specifically revoke
> + * shared write permissions that we needed for storage migration).
> + *
> + * If an error is returned, the VM cannot be allowed to be resumed.
> + */
> +void blk_resume_after_migration(Error **errp)
> +{
> +    BlockBackend *blk;
> +    Error *local_err = NULL;
> +
> +    for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {

Shouldn't we use blk_all_next()?

Trusting you that silently disabling autostart is something the upper
layers can deal with, the rest looks good to me.

(The only other runtime changes of autostart apart from stop/cont appear
to be in blockdev_init() (if (bdrv_key_required()), but I don't think
that can happen anymore) and in migration/colo.c (which enables it and
emits an error message).)

Max

> +        if (!blk->disable_perm) {
> +            continue;
> +        }
> +
> +        blk->disable_perm = false;
> +
> +        blk_set_perm(blk, blk->perm, blk->shared_perm, &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            blk->disable_perm = true;
> +            return;
> +        }
> +    }
> +}
> +

Re: [Qemu-devel] [RFC PATCH for-2.9] block: Ignore guest dev permissions during incoming migration
Posted by Kevin Wolf 7 years ago
Am 05.04.2017 um 15:22 hat Max Reitz geschrieben:
> On 04.04.2017 17:35, Kevin Wolf wrote:
> > Usually guest devices don't like other writers to the same image, so
> > they use blk_set_perm() to prevent this from happening. In the migration
> > phase before the VM is actually running, though, they don't have a
> > problem with writes to the image. On the other hand, storage migration
> > needs to be able to write to the image in this phase, so the restrictive
> > blk_set_perm() call of qdev devices breaks it.
> > 
> > This patch flags all BlockBackends with a qdev device as
> > blk->disable_perm during incoming migration, which means that the
> > requested permissions are stored in the BlockBackend, but not actually
> > applied to its root node yet.
> > 
> > Once migration has finished and the VM should be resumed, the
> > permissions are applied. If they cannot be applied (e.g. because the NBD
> > server used for block migration hasn't been shut down), resuming the VM
> > fails.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/block-backend.c | 40 +++++++++++++++++++++++++++++++++++++++-
> >  include/block/block.h |  2 ++
> >  migration/migration.c |  8 ++++++++
> >  qmp.c                 |  6 ++++++
> >  4 files changed, 55 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/block-backend.c b/block/block-backend.c
> > index 0b63773..f817040 100644
> > --- a/block/block-backend.c
> > +++ b/block/block-backend.c
> 
> [...]
> 
> > @@ -597,15 +598,52 @@ void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm)
> >      *shared_perm = blk->shared_perm;
> >  }
> >  
> > +/*
> > + * Notifies the user of all BlockBackends that migration has completed. qdev
> > + * devices can tighten their permissions in response (specifically revoke
> > + * shared write permissions that we needed for storage migration).
> > + *
> > + * If an error is returned, the VM cannot be allowed to be resumed.
> > + */
> > +void blk_resume_after_migration(Error **errp)
> > +{
> > +    BlockBackend *blk;
> > +    Error *local_err = NULL;
> > +
> > +    for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
> 
> Shouldn't we use blk_all_next()?

Good catch, thanks.

At first I added it into the loop in qmp_cont() and then copied it here
without noticing that resetting the iostatus is really only needed for
monitor-owned BBs at the moment, but this one is different.

Of course, as soon as we improve query-block to work reasonably well
with -device drive=<node-name>, qmp_cont() needs to use blk_all_next(),
too.
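
Concretely, only the iterator would change for v2, roughly like this (a
sketch, not yet posted as a patch):

    for (blk = blk_all_next(NULL); blk; blk = blk_all_next(blk)) {
        /* blk_all_next() also visits the anonymous BlockBackends owned by
         * qdev devices, which is exactly the set we need here; blk_next()
         * only returns the monitor-owned (named) ones. */
        if (!blk->disable_perm) {
            continue;
        }

        blk->disable_perm = false;

        blk_set_perm(blk, blk->perm, blk->shared_perm, &local_err);
        if (local_err) {
            error_propagate(errp, local_err);
            blk->disable_perm = true;
            return;
        }
    }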

> Trusting you that silently disabling autostart is something the upper
> layers can deal with, the rest looks good to me.
> 
> (The only other runtime changes of autostart apart from stop/cont appear
> to be in blockdev_init() (if (bdrv_key_required()), but I don't think
> that can happen anymore) and in migration/colo.c (which enables it and
> emits an error message).)

I think in practice libvirt always sets -S on the destination anyway.

Kevin
Re: [Qemu-devel] [RFC PATCH for-2.9] block: Ignore guest dev permissions during incoming migration
Posted by Daniel P. Berrange 7 years ago
On Thu, Apr 06, 2017 at 01:22:56PM +0200, Kevin Wolf wrote:
> Am 05.04.2017 um 15:22 hat Max Reitz geschrieben:
> > On 04.04.2017 17:35, Kevin Wolf wrote:
> > > Usually guest devices don't like other writers to the same image, so
> > > they use blk_set_perm() to prevent this from happening. In the migration
> > > phase before the VM is actually running, though, they don't have a
> > > problem with writes to the image. On the other hand, storage migration
> > > needs to be able to write to the image in this phase, so the restrictive
> > > blk_set_perm() call of qdev devices breaks it.
> > > 
> > > This patch flags all BlockBackends with a qdev device as
> > > blk->disable_perm during incoming migration, which means that the
> > > requested permissions are stored in the BlockBackend, but not actually
> > > applied to its root node yet.
> > > 
> > > Once migration has finished and the VM should be resumed, the
> > > permissions are applied. If they cannot be applied (e.g. because the NBD
> > > server used for block migration hasn't been shut down), resuming the VM
> > > fails.
> > > 
> > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > > ---
> > >  block/block-backend.c | 40 +++++++++++++++++++++++++++++++++++++++-
> > >  include/block/block.h |  2 ++
> > >  migration/migration.c |  8 ++++++++
> > >  qmp.c                 |  6 ++++++
> > >  4 files changed, 55 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/block/block-backend.c b/block/block-backend.c
> > > index 0b63773..f817040 100644
> > > --- a/block/block-backend.c
> > > +++ b/block/block-backend.c
> > 
> > [...]
> > 
> > > @@ -597,15 +598,52 @@ void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm)
> > >      *shared_perm = blk->shared_perm;
> > >  }
> > >  
> > > +/*
> > > + * Notifies the user of all BlockBackends that migration has completed. qdev
> > > + * devices can tighten their permissions in response (specifically revoke
> > > + * shared write permissions that we needed for storage migration).
> > > + *
> > > + * If an error is returned, the VM cannot be allowed to be resumed.
> > > + */
> > > +void blk_resume_after_migration(Error **errp)
> > > +{
> > > +    BlockBackend *blk;
> > > +    Error *local_err = NULL;
> > > +
> > > +    for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
> > 
> > Shouldn't we use blk_all_next()?
> 
> Good catch, thanks.
> 
> At first I added it into the loop in qmp_cont() and then copied it here
> without noticing that resetting the iostatus is really only needed for
> monitor-owned BBs at the moment, but this one is different.
> 
> Of course, as soon as we improve query-block to work reasonably well
> with -device drive=<node-name>, qmp_cont() needs to use blk_all_next(),
> too.
> 
> > Trusting you that silently disabling autostart is something the upper
> > layers can deal with, the rest looks good to me.
> > 
> > (The only other runtime changes of autostart apart from stop/cont appear
> > to be in blockdev_init() (if (bdrv_key_required()), but I don't think
> > that can happen anymore) and in migration/colo.c (which enables it and
> > emits an error message).)
> 
> I think in practice libvirt always sets -S on the destination anyway.

Libvirt uses -S for *all* QEMU instances it starts, regardless of use of
migration, since we need to do things after qemu starts, but before CPUs
are run.


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://entangle-photo.org       -o-    http://search.cpan.org/~danberr/ :|

Re: [Qemu-devel] [RFC PATCH for-2.9] block: Ignore guest dev permissions during incoming migration
Posted by Kevin Wolf 7 years ago
Am 04.04.2017 um 17:35 hat Kevin Wolf geschrieben:
> Usually guest devices don't like other writers to the same image, so
> they use blk_set_perm() to prevent this from happening. In the migration
> phase before the VM is actually running, though, they don't have a
> problem with writes to the image. On the other hand, storage migration
> needs to be able to write to the image in this phase, so the restrictive
> blk_set_perm() call of qdev devices breaks it.
> 
> This patch flags all BlockBackends with a qdev device as
> blk->disable_perm during incoming migration, which means that the
> requested permissions are stored in the BlockBackend, but not actually
> applied to its root node yet.
> 
> Once migration has finished and the VM should be resumed, the
> permissions are applied. If they cannot be applied (e.g. because the NBD
> server used for block migration hasn't been shut down), resuming the VM
> fails.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Ciprian, can you give this patch a try and report back whether it fixes
the storage migration bug you encountered?

Kevin

Re: [Qemu-devel] [Qemu-block] [RFC PATCH for-2.9] block: Ignore guest dev permissions during incoming migration
Posted by Kashyap Chamarthy 7 years ago
On Tue, Apr 04, 2017 at 05:35:56PM +0200, Kevin Wolf wrote:
> Usually guest devices don't like other writers to the same image, so
> they use blk_set_perm() to prevent this from happening. In the migration
> phase before the VM is actually running, though, they don't have a
> problem with writes to the image. On the other hand, storage migration
> needs to be able to write to the image in this phase, so the restrictive
> blk_set_perm() call of qdev devices breaks it.
> 
> This patch flags all BlockBackends with a qdev device as
> blk->disable_perm during incoming migration, which means that the
> requested permissions are stored in the BlockBackend, but not actually
> applied to its root node yet.
> 
> Once migration has finished and the VM should be resumed, the
> permissions are applied. If they cannot be applied (e.g. because the NBD
> server used for block migration hasn't been shut down), resuming the VM
> fails.

So I have an environment with a patched QEMU built with your fix, to test
it with the libvirt APIs; however, there's a libvirt bug that I just
discovered which makes NBD-based live storage migration fail:

    https://www.redhat.com/archives/libvir-list/2017-April/msg00350.html
    -- NBD-based storage migration fails with "error: invalid argument:
    monitor must not be NULL"

Meanwhile, I'm about to test it with plain QMP.

> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/block-backend.c | 40 +++++++++++++++++++++++++++++++++++++++-
>  include/block/block.h |  2 ++
>  migration/migration.c |  8 ++++++++
>  qmp.c                 |  6 ++++++
>  4 files changed, 55 insertions(+), 1 deletion(-)
> 
> diff --git a/block/block-backend.c b/block/block-backend.c
> index 0b63773..f817040 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -61,6 +61,7 @@ struct BlockBackend {
>  
>      uint64_t perm;
>      uint64_t shared_perm;
> +    bool disable_perm;
>  
>      bool allow_write_beyond_eof;
>  
> @@ -578,7 +579,7 @@ int blk_set_perm(BlockBackend *blk, uint64_t perm, uint64_t shared_perm,
>  {
>      int ret;
>  
> -    if (blk->root) {
> +    if (blk->root && !blk->disable_perm) {
>          ret = bdrv_child_try_set_perm(blk->root, perm, shared_perm, errp);
>          if (ret < 0) {
>              return ret;
> @@ -597,15 +598,52 @@ void blk_get_perm(BlockBackend *blk, uint64_t *perm, uint64_t *shared_perm)
>      *shared_perm = blk->shared_perm;
>  }
>  
> +/*
> + * Notifies the user of all BlockBackends that migration has completed. qdev
> + * devices can tighten their permissions in response (specifically revoke
> + * shared write permissions that we needed for storage migration).
> + *
> + * If an error is returned, the VM cannot be allowed to be resumed.
> + */
> +void blk_resume_after_migration(Error **errp)
> +{
> +    BlockBackend *blk;
> +    Error *local_err = NULL;
> +
> +    for (blk = blk_next(NULL); blk; blk = blk_next(blk)) {
> +        if (!blk->disable_perm) {
> +            continue;
> +        }
> +
> +        blk->disable_perm = false;
> +
> +        blk_set_perm(blk, blk->perm, blk->shared_perm, &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            blk->disable_perm = true;
> +            return;
> +        }
> +    }
> +}
> +
>  static int blk_do_attach_dev(BlockBackend *blk, void *dev)
>  {
>      if (blk->dev) {
>          return -EBUSY;
>      }
> +
> +    /* While migration is still incoming, we don't need to apply the
> +     * permissions of guest device BlockBackends. We might still have a block
> +     * job or NBD server writing to the image for storage migration. */
> +    if (runstate_check(RUN_STATE_INMIGRATE)) {
> +        blk->disable_perm = true;
> +    }
> +
>      blk_ref(blk);
>      blk->dev = dev;
>      blk->legacy_dev = false;
>      blk_iostatus_reset(blk);
> +
>      return 0;
>  }
>  
> diff --git a/include/block/block.h b/include/block/block.h
> index 5149260..3e09222 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -366,6 +366,8 @@ void bdrv_invalidate_cache(BlockDriverState *bs, Error **errp);
>  void bdrv_invalidate_cache_all(Error **errp);
>  int bdrv_inactivate_all(void);
>  
> +void blk_resume_after_migration(Error **errp);
> +
>  /* Ensure contents are flushed to disk.  */
>  int bdrv_flush(BlockDriverState *bs);
>  int coroutine_fn bdrv_co_flush(BlockDriverState *bs);
> diff --git a/migration/migration.c b/migration/migration.c
> index 54060f7..ad4036f 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -349,6 +349,14 @@ static void process_incoming_migration_bh(void *opaque)
>          exit(EXIT_FAILURE);
>      }
>  
> +    /* If we get an error here, just don't restart the VM yet. */
> +    blk_resume_after_migration(&local_err);
> +    if (local_err) {
> +        error_free(local_err);
> +        local_err = NULL;
> +        autostart = false;
> +    }
> +
>      /*
>       * This must happen after all error conditions are dealt with and
>       * we're sure the VM is going to be running on this host.
> diff --git a/qmp.c b/qmp.c
> index fa82b59..a744e44 100644
> --- a/qmp.c
> +++ b/qmp.c
> @@ -207,6 +207,12 @@ void qmp_cont(Error **errp)
>          }
>      }
>  
> +    blk_resume_after_migration(&local_err);
> +    if (local_err) {
> +        error_propagate(errp, local_err);
> +        return;
> +    }
> +
>      if (runstate_check(RUN_STATE_INMIGRATE)) {
>          autostart = 1;
>      } else {
> -- 
> 1.8.3.1
> 
> 

-- 
/kashyap

Re: [Qemu-devel] [Qemu-block] [RFC PATCH for-2.9] block: Ignore guest dev permissions during incoming migration
Posted by Kashyap Chamarthy 7 years ago
On Tue, Apr 04, 2017 at 05:35:56PM +0200, Kevin Wolf wrote:
> Usually guest devices don't like other writers to the same image, so
> they use blk_set_perm() to prevent this from happening. In the migration
> phase before the VM is actually running, though, they don't have a
> problem with writes to the image. On the other hand, storage migration
> needs to be able to write to the image in this phase, so the restrictive
> blk_set_perm() call of qdev devices breaks it.
> 
> This patch flags all BlockBackends with a qdev device as
> blk->disable_perm during incoming migration, which means that the
> requested permissions are stored in the BlockBackend, but not actually
> applied to its root node yet.
> 
> Once migration has finished and the VM should be resumed, the
> permissions are applied. If they cannot be applied (e.g. because the NBD
> server used for block migration hasn't been shut down), resuming the VM
> fails.
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/block-backend.c | 40 +++++++++++++++++++++++++++++++++++++++-
>  include/block/block.h |  2 ++
>  migration/migration.c |  8 ++++++++
>  qmp.c                 |  6 ++++++
>  4 files changed, 55 insertions(+), 1 deletion(-)

With your fix applied, I no longer see the original error ("error:
internal error: unable to execute QEMU command 'nbd-server-add':
Conflicts with use by drive-virtio-disk0 as 'root', which does not allow
'write' on #block163"), and I can export a disk via `nbd-server-add`
with the 'writable' flag.

However, with this fix, running `drive-mirror` seg-faults the source QEMU.

Reproducer:

(1) On destination QEMU

$ /home/stack/build/build-qemu/x86_64-softmmu/qemu-system-x86_64 \
    -display none -nodefconfig -nodefaults -m 512 \
    -blockdev node-name=bar,driver=qcow2,file.driver=file,file.filename=./dst-disk.qcow2 \
    -serial unix:/tmp/monitor,server,nowait \
    -incoming tcp:localhost:6666 -qmp stdio
{"QMP": {"version": {"qemu": {"micro": 93, "minor": 8, "major": 2}, "package": " (v2.9.0-rc3-3-g3a8624b)"}, "capabilities": []}}
{ "execute": "qmp_capabilities" }
{"return": {}}
{ "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet","data": { "host": "localhost", "port": "3333" } } } }
{"return": {}}
{ "execute": "nbd-server-add", "arguments": { "device": "bar","writable": true } }
{"return": {}}
/home/stack/src/qemu/nbd/server.c:nbd_receive_request():L706: read failed


(2) On source QEMU:

$  /home/stack/build/build-qemu/x86_64-softmmu/qemu-system-x86_64 \
    -display none -nodefconfig -nodefaults -m 512 \
    -device virtio-scsi-pci,id=scsi -device virtio-serial-pci \
    -blockdev node-name=foo,driver=qcow2,file.driver=file,file.filename=./cirros-0.3.5.qcow2 \
    -qmp stdio
{"QMP": {"version": {"qemu": {"micro": 93, "minor": 8, "major": 2}, "package": " (v2.9.0-rc3-3-g3a8624b)"}, "capabilities": []}}
{ "execute": "qmp_capabilities" }
{"return": {}}
{ "execute": "drive-mirror", "arguments": { "device": "foo", "target": "nbd:localhost:3333:exportname=bar", "sync": "full","format": "raw", "mode": "existing" } }
Segmentation fault (core dumped)


[...]

-- 
/kashyap

Re: [Qemu-devel] [Qemu-block] [RFC PATCH for-2.9] block: Ignore guest dev permissions during incoming migration
Posted by Kashyap Chamarthy 7 years ago
On Thu, Apr 06, 2017 at 07:26:15PM +0200, Kashyap Chamarthy wrote:
> On Tue, Apr 04, 2017 at 05:35:56PM +0200, Kevin Wolf wrote:
> > Usually guest devices don't like other writers to the same image, so
> > they use blk_set_perm() to prevent this from happening. In the migration
> > phase before the VM is actually running, though, they don't have a
> > problem with writes to the image. On the other hand, storage migration
> > needs to be able to write to the image in this phase, so the restrictive
> > blk_set_perm() call of qdev devices breaks it.
> > 
> > This patch flags all BlockBackends with a qdev device as
> > blk->disable_perm during incoming migration, which means that the
> > requested permissions are stored in the BlockBackend, but not actually
> > applied to its root node yet.
> > 
> > Once migration has finished and the VM should be resumed, the
> > permissions are applied. If they cannot be applied (e.g. because the NBD
> > server used for block migration hasn't been shut down), resuming the VM
> > fails.
> > 
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> > ---
> >  block/block-backend.c | 40 +++++++++++++++++++++++++++++++++++++++-
> >  include/block/block.h |  2 ++
> >  migration/migration.c |  8 ++++++++
> >  qmp.c                 |  6 ++++++
> >  4 files changed, 55 insertions(+), 1 deletion(-)
> 
> With your fix applied, I no longer see the original error ("error:
> internal error: unable to execute QEMU command 'nbd-server-add':
> Conflicts with use by drive-virtio-disk0 as 'root', which does not allow
> 'write' on #block163"), and I can export a disk via `nbd-server-add`
> with the 'writable' flag.
> 
> However, with this fix, running `drive-mirror` seg-faults the source QEMU.

TL;DR: Kevin / Max pointed out that the segmentation fault should be fixed
by supplying a 'job-id' parameter to `drive-mirror`.  (For the longer
version, refer below.)

So:

	Tested-by: Kashyap Chamarthy <kchamart@redhat.com>

> Reproducer:
> 
> (1) On destination QEMU
> 
> $ /home/stack/build/build-qemu/x86_64-softmmu/qemu-system-x86_64 \
>     -display none -nodefconfig -nodefaults -m 512 \
>     -blockdev node-name=bar,driver=qcow2,file.driver=file,file.filename=./dst-disk.qcow2 \
>     -serial unix:/tmp/monitor,server,nowait \
>     -incoming tcp:localhost:6666 -qmp stdio
> {"QMP": {"version": {"qemu": {"micro": 93, "minor": 8, "major": 2}, "package": " (v2.9.0-rc3-3-g3a8624b)"}, "capabilities": []}}
> { "execute": "qmp_capabilities" }
> {"return": {}}
> { "execute": "nbd-server-start", "arguments": { "addr": { "type": "inet","data": { "host": "localhost", "port": "3333" } } } }
> {"return": {}}
> { "execute": "nbd-server-add", "arguments": { "device": "bar","writable": true } }
> {"return": {}}
> /home/stack/src/qemu/nbd/server.c:nbd_receive_request():L706: read failed
> 
> 
> (2) On source QEMU:
> 
> $  /home/stack/build/build-qemu/x86_64-softmmu/qemu-system-x86_64 \
>     -display none -nodefconfig -nodefaults -m 512 \
>     -device virtio-scsi-pci,id=scsi -device virtio-serial-pci \
>     -blockdev node-name=foo,driver=qcow2,file.driver=file,file.filename=./cirros-0.3.5.qcow2 \
>     -qmp stdio
> {"QMP": {"version": {"qemu": {"micro": 93, "minor": 8, "major": 2}, "package": " (v2.9.0-rc3-3-g3a8624b)"}, "capabilities": []}}
> { "execute": "qmp_capabilities" }
> {"return": {}}
> { "execute": "drive-mirror", "arguments": { "device": "foo", "target": "nbd:localhost:3333:exportname=bar", "sync": "full","format": "raw", "mode": "existing" } }
> Segmentation fault (core dumped)

Okay, there was a missing piece: it was pointed out to me on IRC that the
`drive-mirror` QMP command was missing the 'job-id' parameter (introduced
by commit 71aa986 - "mirror: Add 'job-id' parameter to 'blockdev-mirror'
and 'drive-mirror'").  Kevin tells me that this parameter is mandatory
_if_ you're using 'node-name' (which I was, in my '-blockdev' command
line above).

And as Max points out, the above should be caught by his:

	https://lists.nongnu.org/archive/html/qemu-block/2017-04/msg00059.html --
	[PATCH for-2.9 0/2] block/mirror: Fix use-after-free

And sure enough, adding the 'job-id' to `drive-mirror` on the source will
let `drive-mirror` proceed successfully:

[...]
{ "execute": "drive-mirror", "arguments": { "device": "foo", "job-id": "job-0", "target": "nbd:localhost:3333:exportname=bar", "sync": "full","format": "raw", "mode": "existing" } }
{"return": {}}
{"timestamp": {"seconds": 1491498318, "microseconds": 435816}, "event": "BLOCK_JOB_READY", "data": {"device": "job-0", "len": 41126400, "offset": 41126400, "speed": 0, "type": "mirror"}}


Issue a `block-job-complete` (or `block-job-cancel`) to end the running
mirror job:

[...]
{"execute": "block-job-complete", "arguments": {"device": "job-0"} }
{"return": {}}
{"timestamp": {"seconds": 1491500183, "microseconds": 768746}, "event": "BLOCK_JOB_COMPLETED", "data": {"device": "job-0", "len": 41126400, "offset": 41126400, "speed": 0, "type": "mirror"}}

Then, stop the NBD server on the destination QEMU:

[...]
{"execute":"nbd-server-stop"}
{"return": {}}


* * *

Just for completeness' sake: if you try to stop the NBD server on the
destination _without_ gracefully completing or cancelling the
`drive-mirror` job on the source, you see the below:

[...]
{"execute":"nbd-server-stop"}
{"return": {}}
/home/stack/src/qemu/nbd/server.c:nbd_receive_request():L706: read failed
[...]

Kevin and Max (thanks!) are aware of this from IRC, and are investigating
it.


-- 
/kashyap