[Qemu-devel] [PATCH] migration, xen: Fix block image lock issue on live migration

Anthony PERARD posted 1 patch 6 years, 6 months ago
Patches applied successfully
git fetch https://github.com/patchew-project/qemu tags/patchew/20171002163058.15651-1-anthony.perard@citrix.com
Test checkpatch passed
Test docker passed
Test s390x passed
There is a newer version of this series
[Qemu-devel] [PATCH] migration, xen: Fix block image lock issue on live migration
Posted by Anthony PERARD 6 years, 6 months ago
When doing a live migration of a Xen guest with libxl, the images for
block devices are locked by the original QEMU process, and this prevents
the QEMU at the destination from taking the lock, so the migration fails.

From QEMU's point of view, once the RAM of a domain is migrated, there are
two QMP commands, "stop" then "xen-save-devices-state", at which point a
new QEMU is spawned at the destination.

Release locks in "xen-save-devices-state" so the destination can take
them.

Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
---
CCing libxl maintainers:
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
 migration/savevm.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index 4a88228614..69d904c179 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2263,6 +2263,20 @@ void qmp_xen_save_devices_state(const char *filename, Error **errp)
     qemu_fclose(f);
     if (ret < 0) {
         error_setg(errp, QERR_IO_ERROR);
+    } else {
+        /* libxl calls the QMP command "stop" before calling
+         * "xen-save-devices-state" and in case of migration failure, libxl
+         * would call "cont".
+         * So call bdrv_inactivate_all (release locks) here to let the other
+         * side of the migration take control of the images.
+         */
+        if (!saved_vm_running) {
+            ret = bdrv_inactivate_all();
+            if (ret) {
+                error_setg(errp, "%s: bdrv_inactivate_all() failed (%d)",
+                           __func__, ret);
+            }
+        }
     }
 
  the_end:
-- 
Anthony PERARD
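
A minimal sketch of the condition the hunk above adds, modelled outside of QEMU. The helper name should_release_image_locks() and its parameters are hypothetical: save_ret stands in for `ret` after the device state has been written, and vm_was_running stands in for `saved_vm_running`. Nothing here calls into QEMU or the block layer.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the branch added by the patch (hypothetical helper, not a
 * QEMU function).
 *
 * save_ret       - result of writing the device state, negative on error
 * vm_was_running - whether the guest was running when the command started
 */
static bool should_release_image_locks(int save_ret, bool vm_was_running)
{
    if (save_ret < 0) {
        /* Saving failed: only the I/O error is reported, images stay active. */
        return false;
    }
    /* Only a guest already paused by "stop" hands its images over. */
    return !vm_was_running;
}
```

Under this model, a guest that was still running when "xen-save-devices-state" was issued keeps its images active, which matches the `if (!saved_vm_running)` guard in the patch.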


Re: [Qemu-devel] [PATCH] migration, xen: Fix block image lock issue on live migration
Posted by Dr. David Alan Gilbert 6 years, 6 months ago
Adding in kwolf;  it looks sane to me; Kevin?
If I'm reading this right, this is just after the device state save.

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

Re: [Qemu-devel] [PATCH] migration, xen: Fix block image lock issue on live migration
Posted by Kevin Wolf 6 years, 6 months ago
Am 02.10.2017 um 21:18 hat Dr. David Alan Gilbert geschrieben:
> Adding in kwolf;  it looks sane to me; Kevin?
> If I'm reading this right, this is just after the device state save.

Is this actual migration? Because the code looks more like it's copied
and adapted from the snapshot code rather than from the actual migration
code.

If Xen doesn't use the standard mechanisms, I don't know what they need
to do. Snapshots don't need to inactivate images, but migration does.
Compared to the normal migration path, this looks very simplistic, so I
wouldn't be surprised if there was more wrong than just file locking.

This looks like it could work as a hack to the problem at hand. Whether
it is a proper solution, I can't say without investing a lot more time.

Kevin


Re: [Qemu-devel] [PATCH] migration, xen: Fix block image lock issue on live migration
Posted by Anthony PERARD 6 years, 5 months ago
On Wed, Oct 04, 2017 at 03:03:49PM +0200, Kevin Wolf wrote:
> Am 02.10.2017 um 21:18 hat Dr. David Alan Gilbert geschrieben:
> > Adding in kwolf;  it looks sane to me; Kevin?
> > If I'm reading this right, this is just after the device state save.
> 
> Is this actual migration? Because the code looks more like it's copied
> and adapted from the snapshot code rather than from the actual migration
> code.

Well, the Xen toolstack takes care of the migration; we only need to
save the device states from QEMU, which I guess is similar to a snapshot.

> If Xen doesn't use the standard mechanisms, I don't know what they need
> to do. Snapshots don't need to inactivate images, but migration does.
> Compared to the normal migration path, this looks very simplistic, so I
> wouldn't be surprised if there was more wrong than just file locking.

I realize now that if one wanted to take a snapshot of a running
Xen guest, this xen-save-devices-state QMP command would be called as
well.

So I can see a few options to better handle snapshots, we could:
- Add a new parameter to xen-save-devices-state, "live_migration", which
  could default to 'true' so older versions of Xen will still work.
- Create a new QMP command whose sole purpose is to call
  bdrv_inactivate_all; I don't know what else this command would have to
  do.
- or just take this patch.

Thanks.

> This looks like it could work as a hack to the problem at hand. Whether
> it is a proper solution, I can't say without investing a lot more time.


-- 
Anthony PERARD
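
A minimal sketch of the first option listed above, assuming the new argument is optional and is passed to the handler as a "present" flag plus a value. The function name resolve_live_migration() and both parameter names are hypothetical; the thread does not say how the argument would be wired up, nor whether the `saved_vm_running` check would remain, so the sketch leaves that check out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether images should be inactivated when
 * the caller may or may not have supplied a "live_migration" argument.
 *
 * has_live_migration - true if the caller passed the argument at all
 * live_migration     - the value passed (ignored when absent)
 */
static bool resolve_live_migration(bool has_live_migration, bool live_migration)
{
    /* Argument absent (an older toolstack): default to 'true'. */
    if (!has_live_migration) {
        return true;
    }
    /* Argument present: a snapshot would pass false to keep its locks. */
    return live_migration;
}
```

With this default, a toolstack that never sends the argument gets the migration behaviour, and only a caller that explicitly passes false (for example when taking a snapshot) keeps the images active.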

Re: [Qemu-devel] [Xen-devel] [PATCH] migration, xen: Fix block image lock issue on live migration
Posted by Roger Pau Monné 6 years, 6 months ago
On Mon, Oct 02, 2017 at 04:30:58PM +0000, Anthony PERARD wrote:
> When doing a live migration of a Xen guest with libxl, the images for
> block devices are locked by the original QEMU process, and this prevents
> the QEMU at the destination from taking the lock, so the migration fails.
> 
> From QEMU's point of view, once the RAM of a domain is migrated, there are
> two QMP commands, "stop" then "xen-save-devices-state", at which point a
> new QEMU is spawned at the destination.
> 
> Release locks in "xen-save-devices-state" so the destination can take
> them.

What happens if the migration fails on the destination? Will QEMU pick
the lock again when resuming on the source in this case?

Thanks, Roger.

Re: [Qemu-devel] [Xen-devel] [PATCH] migration, xen: Fix block image lock issue on live migration
Posted by Anthony PERARD 6 years, 6 months ago
On Tue, Oct 03, 2017 at 12:33:37PM +0100, Roger Pau Monné wrote:
> On Mon, Oct 02, 2017 at 04:30:58PM +0000, Anthony PERARD wrote:
> > When doing a live migration of a Xen guest with libxl, the images for
> > block devices are locked by the original QEMU process, and this prevents
> > the QEMU at the destination from taking the lock, so the migration fails.
> > 
> > From QEMU's point of view, once the RAM of a domain is migrated, there are
> > two QMP commands, "stop" then "xen-save-devices-state", at which point a
> > new QEMU is spawned at the destination.
> > 
> > Release locks in "xen-save-devices-state" so the destination can take
> > them.
> 
> What happens if the migration fails on the destination? Will QEMU pick
> the lock again when resuming on the source in this case?

Yes, calling the QMP command "cont" to resume the activity makes QEMU
take the lock again, and libxl would call "cont". (I don't think you can
pick this kind of lock ;-). )

-- 
Anthony PERARD
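
A toy model of the lock handoff described in this exchange, assuming a single exclusive lock per image. Every name in it (struct image, image_lock(), source_save_devices_state(), source_cont()) is hypothetical, and it does not reproduce how QEMU's block layer actually locks image files; it only illustrates the ordering: the source releases on "xen-save-devices-state", the destination can then take the lock, and "cont" on the source takes it back if the destination gave up.

```c
#include <assert.h>
#include <stdbool.h>

/* Who currently holds the (single, exclusive) lock on an image. */
enum owner { OWNER_NONE, OWNER_SOURCE, OWNER_DEST };

struct image {
    enum owner lock;
};

/* Try to take the lock; fails if the other side still holds it. */
static bool image_lock(struct image *img, enum owner who)
{
    if (img->lock != OWNER_NONE && img->lock != who) {
        return false;
    }
    img->lock = who;
    return true;
}

/* Release the lock; only the current holder may do so. */
static void image_unlock(struct image *img, enum owner who)
{
    assert(img->lock == who);
    img->lock = OWNER_NONE;
}

/* Models the patched "xen-save-devices-state" on the source side. */
static void source_save_devices_state(struct image *img)
{
    image_unlock(img, OWNER_SOURCE);
}

/* Models "cont" on the source after a failed migration. */
static bool source_cont(struct image *img)
{
    return image_lock(img, OWNER_SOURCE);
}
```

In this model source_cont() only succeeds once the destination no longer holds the lock, so the recovery path relies on the destination QEMU having released the images (for instance by exiting) before libxl issues "cont".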