When doing a live migration of a Xen guest with libxl, the images for
block devices are locked by the original QEMU process; this prevents
the QEMU at the destination from taking the lock, and the migration
fails.

From QEMU's point of view, once the RAM of a domain is migrated, there
are two QMP commands, "stop" then "xen-save-devices-state", at which
point a new QEMU is spawned at the destination.

Release the locks in "xen-save-devices-state" so the destination can
take them.
Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
---
CCing libxl maintainers:
CC: Ian Jackson <ian.jackson@eu.citrix.com>
CC: Wei Liu <wei.liu2@citrix.com>
---
migration/savevm.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/migration/savevm.c b/migration/savevm.c
index 4a88228614..69d904c179 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -2263,6 +2263,20 @@ void qmp_xen_save_devices_state(const char *filename, Error **errp)
     qemu_fclose(f);
     if (ret < 0) {
         error_setg(errp, QERR_IO_ERROR);
+    } else {
+        /* libxl calls the QMP command "stop" before calling
+         * "xen-save-devices-state", and in case of migration failure,
+         * libxl would call "cont".
+         * So call bdrv_inactivate_all() (which releases the locks) here
+         * to let the other side take control of the images.
+         */
+        if (!saved_vm_running) {
+            ret = bdrv_inactivate_all();
+            if (ret) {
+                error_setg(errp, "%s: bdrv_inactivate_all() failed (%d)",
+                           __func__, ret);
+            }
+        }
     }

 the_end:
--
Anthony PERARD
Adding in kwolf; it looks sane to me; Kevin?
If I'm reading this right, this is just after the device state save.

Dave

* Anthony PERARD (anthony.perard@citrix.com) wrote:
> When doing a live migration of a Xen guest with libxl, the images for
> block devices are locked by the original QEMU process, and this prevent
> the QEMU at the destination to take the lock and the migration fail.
>
> From QEMU point of view, once the RAM of a domain is migrated, there is
> two QMP commands, "stop" then "xen-save-devices-state", at which point a
> new QEMU is spawned at the destination.
>
> Release locks in "xen-save-devices-state" so the destination can takes
> them.
>
> Signed-off-by: Anthony PERARD <anthony.perard@citrix.com>
[...]
> --
> Anthony PERARD
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 02.10.2017 at 21:18, Dr. David Alan Gilbert wrote:
> Adding in kwolf; it looks sane to me; Kevin?
> If I'm reading this right, this is just after the device state save.

Is this actual migration? Because the code looks more like it's copied
and adapted from the snapshot code rather than from the actual migration
code.

If Xen doesn't use the standard mechanisms, I don't know what they need
to do. Snapshots don't need to inactivate images, but migration does.
Compared to the normal migration path, this looks very simplistic, so I
wouldn't be surprised if there was more wrong than just file locking.

This looks like it could work as a hack to the problem at hand. Whether
it is a proper solution, I can't say without investing a lot more time.

Kevin

> * Anthony PERARD (anthony.perard@citrix.com) wrote:
> > When doing a live migration of a Xen guest with libxl, the images for
[...]
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Wed, Oct 04, 2017 at 03:03:49PM +0200, Kevin Wolf wrote:
> On 02.10.2017 at 21:18, Dr. David Alan Gilbert wrote:
> > Adding in kwolf; it looks sane to me; Kevin?
> > If I'm reading this right, this is just after the device state save.
>
> Is this actual migration? Because the code looks more like it's copied
> and adapted from the snapshot code rather than from the actual migration
> code.

Well, the Xen tool stack takes care of the migration; we only need to
save the device states from QEMU, I guess similar to a snapshot.

> If Xen doesn't use the standard mechanisms, I don't know what they need
> to do. Snapshots don't need to inactivate images, but migration does.
> Compared to the normal migration path, this looks very simplistic, so I
> wouldn't be surprised if there was more wrong than just file locking.

I realize now that if one wanted to take a snapshot of a running Xen
guest, this xen-save-devices-state QMP command would be called as well.

So I can see a few options to better handle snapshots; we could:
- Add a new parameter to xen-save-devices-state, "live_migration", which
  could default to 'true' so older versions of Xen will still work.
- Create a new QMP command whose sole purpose is to call
  bdrv_inactivate_all; I don't know what else this command would have to
  do.
- Or just take this patch.

Thanks.

> This looks like it could work as a hack to the problem at hand. Whether
> it is a proper solution, I can't say without investing a lot more time.

--
Anthony PERARD
On Mon, Oct 02, 2017 at 04:30:58PM +0000, Anthony PERARD wrote:
> When doing a live migration of a Xen guest with libxl, the images for
> block devices are locked by the original QEMU process, and this prevent
> the QEMU at the destination to take the lock and the migration fail.
>
> From QEMU point of view, once the RAM of a domain is migrated, there is
> two QMP commands, "stop" then "xen-save-devices-state", at which point a
> new QEMU is spawned at the destination.
>
> Release locks in "xen-save-devices-state" so the destination can takes
> them.

What happens if the migration fails on the destination? Will QEMU pick
the lock again when resuming on the source in this case?

Thanks, Roger.
On Tue, Oct 03, 2017 at 12:33:37PM +0100, Roger Pau Monné wrote:
> On Mon, Oct 02, 2017 at 04:30:58PM +0000, Anthony PERARD wrote:
> > When doing a live migration of a Xen guest with libxl, the images for
> > block devices are locked by the original QEMU process, and this prevent
> > the QEMU at the destination to take the lock and the migration fail.
[...]
>
> What happens if the migration fails on the destination? Will QEMU pick
> the lock again when resuming on the source in this case?

Yes, calling the QMP command "cont" to resume the activity makes QEMU
take the lock again, and libxl would call "cont".

(I don't think you can pick this kind of lock ;-). )

--
Anthony PERARD