[PATCH V8 06/39] cpr: reboot mode

Steve Sistare posted 39 patches 3 years, 7 months ago
Maintainers: Stefano Stabellini <sstabellini@kernel.org>, Anthony Perard <anthony.perard@citrix.com>, Paul Durrant <paul@xen.org>, David Hildenbrand <david@redhat.com>, Igor Mammedov <imammedo@redhat.com>, "Marc-André Lureau" <marcandre.lureau@redhat.com>, Paolo Bonzini <pbonzini@redhat.com>, "Alex Bennée" <alex.bennee@linaro.org>, "Philippe Mathieu-Daudé" <f4bug@amsat.org>, "Dr. David Alan Gilbert" <dgilbert@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Alex Williamson <alex.williamson@redhat.com>, Steve Sistare <steven.sistare@oracle.com>, Mark Kanda <mark.kanda@oracle.com>, Peter Xu <peterx@redhat.com>, Juan Quintela <quintela@redhat.com>, Markus Armbruster <armbru@redhat.com>, Michael Roth <michael.roth@amd.com>, John Snow <jsnow@redhat.com>, Cleber Rosa <crosa@redhat.com>, Beraldo Leal <bleal@redhat.com>, Eric Blake <eblake@redhat.com>, Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>, Wainer dos Santos Moschetta <wainersm@redhat.com>, Stefan Hajnoczi <stefanha@redhat.com>, Stefan Weil <sw@weilnetz.de>
There is a newer version of this series
[PATCH V8 06/39] cpr: reboot mode
Posted by Steve Sistare 3 years, 7 months ago
Provide the cpr-save and cpr-load functions for live update.  These save and
restore VM state, with minimal guest pause time, so that qemu may be updated
to a new version in between.

cpr-save stops the VM and saves vmstate to an ordinary file.  It supports
any type of guest image and block device, but the caller must not modify
guest block devices between cpr-save and cpr-load.

cpr-save supports several modes, the first of which is reboot. In this mode
the caller invokes cpr-save and then terminates qemu.  The caller may then
update the host kernel and system software and reboot.  The caller resumes
the guest by running qemu with the same arguments as the original process
and invoking cpr-load.  To use this mode, guest ram must be mapped to a
persistent shared memory file such as /dev/dax0.0 or /dev/shm PKRAM.

The reboot mode supports vfio devices if the caller first suspends the
guest, such as by issuing guest-suspend-ram to the qemu guest agent.  The
guest drivers' suspend methods flush outstanding requests and re-initialize
the devices, and thus there is no device state to save and restore.

cpr-load loads state from the file.  If the VM was running at cpr-save time
then VM execution resumes.  If the VM was suspended at cpr-save time, then
the caller must issue a system_wakeup command to resume.

cpr-save syntax:
  { 'enum': 'CprMode', 'data': [ 'none', 'reboot' ] }
  { 'command': 'cpr-save', 'data': { 'filename': 'str', 'mode': 'CprMode' }}

cpr-load syntax:
  { 'command': 'cpr-load', 'data': { 'filename': 'str', 'mode': 'CprMode' }}
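
As a rough illustration of the syntax above, the following sketch formats
the QMP request a management tool might send for either command.  The
"execute"/"arguments" envelope is the usual QMP wire format; the helper
name cpr_format_request and the example path are hypothetical and are not
part of this patch.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of the patch: format one QMP request for
 * cpr-save or cpr-load into buf.
 * Returns the number of characters written, or -1 if buf is too small.
 * The filename is not JSON-escaped here; a real client must escape it.
 */
int cpr_format_request(char *buf, size_t size, const char *command,
                       const char *filename, const char *mode)
{
    int n = snprintf(buf, size,
                     "{\"execute\": \"%s\", \"arguments\": "
                     "{\"filename\": \"%s\", \"mode\": \"%s\"}}",
                     command, filename, mode);

    /* snprintf reports the length it wanted; reject truncated output. */
    if (n < 0 || (size_t)n >= size) {
        return -1;
    }
    return n;
}
```

For reboot mode, a caller would send the cpr-save request, issue quit,
reboot the host, start qemu with the same arguments plus -S, and then send
the cpr-load request with the same filename and mode.  If the guest was
suspended at cpr-save time, a system_wakeup command follows.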

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
 MAINTAINERS             |   8 ++++
 include/migration/cpr.h |  16 +++++++
 migration/cpr.c         | 116 ++++++++++++++++++++++++++++++++++++++++++++++++
 migration/meson.build   |   1 +
 qapi/cpr.json           |  62 ++++++++++++++++++++++++++
 qapi/meson.build        |   1 +
 qapi/qapi-schema.json   |   1 +
 softmmu/runstate.c      |   1 +
 8 files changed, 206 insertions(+)
 create mode 100644 include/migration/cpr.h
 create mode 100644 migration/cpr.c
 create mode 100644 qapi/cpr.json

diff --git a/MAINTAINERS b/MAINTAINERS
index 4cf6174..9273891 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3152,6 +3152,14 @@ F: net/filter-rewriter.c
 F: net/filter-mirror.c
 F: tests/qtest/test-filter*
 
+CPR
+M: Steve Sistare <steven.sistare@oracle.com>
+M: Mark Kanda <mark.kanda@oracle.com>
+S: Maintained
+F: include/migration/cpr.h
+F: migration/cpr.c
+F: qapi/cpr.json
+
 Record/replay
 M: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
 R: Paolo Bonzini <pbonzini@redhat.com>
diff --git a/include/migration/cpr.h b/include/migration/cpr.h
new file mode 100644
index 0000000..1b6c82f
--- /dev/null
+++ b/include/migration/cpr.h
@@ -0,0 +1,16 @@
+/*
+ * Copyright (c) 2021, 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef MIGRATION_CPR_H
+#define MIGRATION_CPR_H
+
+#include "qapi/qapi-types-cpr.h"
+
+void cpr_set_mode(CprMode mode);
+CprMode cpr_get_mode(void);
+
+#endif
diff --git a/migration/cpr.c b/migration/cpr.c
new file mode 100644
index 0000000..24b0bcc
--- /dev/null
+++ b/migration/cpr.c
@@ -0,0 +1,116 @@
+/*
+ * Copyright (c) 2021, 2022 Oracle and/or its affiliates.
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "migration/cpr.h"
+#include "migration/global_state.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-cpr.h"
+#include "qemu-file-channel.h"
+#include "qemu-file.h"
+#include "savevm.h"
+#include "sysemu/cpu-timers.h"
+#include "sysemu/runstate.h"
+#include "sysemu/sysemu.h"
+
+static CprMode cpr_mode = CPR_MODE_NONE;
+
+CprMode cpr_get_mode(void)
+{
+    return cpr_mode;
+}
+
+void cpr_set_mode(CprMode mode)
+{
+    cpr_mode = mode;
+}
+
+void qmp_cpr_save(const char *filename, CprMode mode, Error **errp)
+{
+    int ret;
+    QEMUFile *f;
+    int saved_vm_running = runstate_is_running();
+
+    if (global_state_store()) {
+        error_setg(errp, "Error saving global state");
+        return;
+    }
+
+    f = qemu_fopen_file(filename, O_CREAT | O_WRONLY | O_TRUNC, 0600,
+                        "cpr-save", errp);
+    if (!f) {
+        return;
+    }
+
+    if (runstate_check(RUN_STATE_SUSPENDED)) {
+        /* Update timers_state before saving.  Suspend did not do so. */
+        cpu_disable_ticks();
+    }
+    vm_stop(RUN_STATE_SAVE_VM);
+
+    cpr_set_mode(mode);
+    ret = qemu_save_device_state(f);
+    qemu_fclose(f);
+    if (ret < 0) {
+        error_setg(errp, "Error %d while saving VM state", ret);
+        goto err;
+    }
+
+    return;
+
+err:
+    if (saved_vm_running) {
+        vm_start();
+    }
+    cpr_set_mode(CPR_MODE_NONE);
+}
+
+void qmp_cpr_load(const char *filename, CprMode mode, Error **errp)
+{
+    QEMUFile *f;
+    int ret;
+    RunState state;
+
+    if (runstate_is_running()) {
+        error_setg(errp, "cpr-load called for a running VM");
+        return;
+    }
+
+    f = qemu_fopen_file(filename, O_RDONLY, 0, "cpr-load", errp);
+    if (!f) {
+        return;
+    }
+
+    if (qemu_get_be32(f) != QEMU_VM_FILE_MAGIC ||
+        qemu_get_be32(f) != QEMU_VM_FILE_VERSION) {
+        error_setg(errp, "%s is not a vmstate file", filename);
+        qemu_fclose(f);
+        return;
+    }
+
+    cpr_set_mode(mode);
+    ret = qemu_load_device_state(f);
+    qemu_fclose(f);
+    if (ret < 0) {
+        error_setg(errp, "Error %d while loading VM state", ret);
+        goto out;
+    }
+
+    state = global_state_get_runstate();
+    if (state == RUN_STATE_RUNNING) {
+        vm_start();
+    } else {
+        runstate_set(state);
+        if (runstate_check(RUN_STATE_SUSPENDED)) {
+            /* Force vm_start to be called later. */
+            qemu_system_start_on_wakeup_request();
+        }
+    }
+
+out:
+    cpr_set_mode(CPR_MODE_NONE);
+}
diff --git a/migration/meson.build b/migration/meson.build
index 6880b61..76fcfdb 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -15,6 +15,7 @@ softmmu_ss.add(files(
   'channel.c',
   'colo-failover.c',
   'colo.c',
+  'cpr.c',
   'exec.c',
   'fd.c',
   'global_state.c',
diff --git a/qapi/cpr.json b/qapi/cpr.json
new file mode 100644
index 0000000..bdaabcb
--- /dev/null
+++ b/qapi/cpr.json
@@ -0,0 +1,62 @@
+# -*- Mode: Python -*-
+#
+# Copyright (c) 2021, 2022 Oracle and/or its affiliates.
+#
+# This work is licensed under the terms of the GNU GPL, version 2.
+# See the COPYING file in the top-level directory.
+
+##
+# = CPR - CheckPoint and Restart
+##
+
+{ 'include': 'common.json' }
+
+##
+# @CprMode:
+#
+# @reboot: checkpoint can be cpr-load'ed after a host reboot.
+#
+# Since: 7.1
+##
+{ 'enum': 'CprMode',
+  'data': [ 'none', 'reboot' ] }
+
+##
+# @cpr-save:
+#
+# Pause the VCPUs, and create a checkpoint of the virtual machine device state
+# in @filename.  Unlike snapshot-save, this command completes synchronously,
+# saves state to an ordinary file, does not save guest block device blocks,
+# and does not require that guest RAM be saved in the file.  The caller must
+# not modify guest block devices between cpr-save and cpr-load.
+#
+# If @mode is 'reboot', the checkpoint remains valid after a host reboot.
+# The guest RAM memory-backend should be shared and non-volatile across
+# reboot, else it will be saved to the file.  To resume from the checkpoint,
+# issue the quit command, reboot the system, start qemu using the same
+# arguments plus -S, and issue the cpr-load command.
+#
+# @filename: name of checkpoint file
+# @mode: @CprMode mode
+#
+# Since: 7.1
+##
+{ 'command': 'cpr-save',
+  'data': { 'filename': 'str',
+            'mode': 'CprMode' } }
+
+##
+# @cpr-load:
+#
+# Load a virtual machine from the checkpoint file @filename that was created
+# earlier by the cpr-save command, and continue the VCPUs.  @mode must match
+# the mode specified for cpr-save.
+#
+# @filename: name of checkpoint file
+# @mode: @CprMode mode
+#
+# Since: 7.1
+##
+{ 'command': 'cpr-load',
+  'data': { 'filename': 'str',
+            'mode': 'CprMode' } }
diff --git a/qapi/meson.build b/qapi/meson.build
index 656ef0e..d9ab29d 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -30,6 +30,7 @@ qapi_all_modules = [
   'common',
   'compat',
   'control',
+  'cpr',
   'crypto',
   'dump',
   'error',
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 4912b97..001d790 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -77,6 +77,7 @@
 { 'include': 'ui.json' }
 { 'include': 'authz.json' }
 { 'include': 'migration.json' }
+{ 'include': 'cpr.json' }
 { 'include': 'transaction.json' }
 { 'include': 'trace.json' }
 { 'include': 'compat.json' }
diff --git a/softmmu/runstate.c b/softmmu/runstate.c
index 9b27d74..cfd6aa9 100644
--- a/softmmu/runstate.c
+++ b/softmmu/runstate.c
@@ -116,6 +116,7 @@ static const RunStateTransition runstate_transitions_def[] = {
     { RUN_STATE_PRELAUNCH, RUN_STATE_FINISH_MIGRATE },
     { RUN_STATE_PRELAUNCH, RUN_STATE_INMIGRATE },
     { RUN_STATE_PRELAUNCH, RUN_STATE_SUSPENDED },
+    { RUN_STATE_PRELAUNCH, RUN_STATE_PAUSED },
 
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_RUNNING },
     { RUN_STATE_FINISH_MIGRATE, RUN_STATE_PAUSED },
-- 
1.8.3.1
Re: [PATCH V8 06/39] cpr: reboot mode
Posted by Daniel P. Berrangé 3 years, 7 months ago
On Wed, Jun 15, 2022 at 07:51:53AM -0700, Steve Sistare wrote:
> Provide the cpr-save and cpr-load functions for live update.  These save and
> restore VM state, with minimal guest pause time, so that qemu may be updated
> to a new version in between.
> 
> cpr-save stops the VM and saves vmstate to an ordinary file.  It supports
> any type of guest image and block device, but the caller must not modify
> guest block devices between cpr-save and cpr-load.
> 
> cpr-save supports several modes, the first of which is reboot. In this mode
> the caller invokes cpr-save and then terminates qemu.  The caller may then
> update the host kernel and system software and reboot.  The caller resumes
> the guest by running qemu with the same arguments as the original process
> and invoking cpr-load.  To use this mode, guest ram must be mapped to a
> persistent shared memory file such as /dev/dax0.0 or /dev/shm PKRAM.
> 
> The reboot mode supports vfio devices if the caller first suspends the
> guest, such as by issuing guest-suspend-ram to the qemu guest agent.  The
> guest drivers' suspend methods flush outstanding requests and re-initialize
> the devices, and thus there is no device state to save and restore.
> 
> cpr-load loads state from the file.  If the VM was running at cpr-save time
> then VM execution resumes.  If the VM was suspended at cpr-save time, then
> the caller must issue a system_wakeup command to resume.
> 
> cpr-save syntax:
>   { 'enum': 'CprMode', 'data': [ 'reboot' ] }
>   { 'command': 'cpr-save', 'data': { 'filename': 'str', 'mode': 'CprMode' }}
> 
> cpr-load syntax:
>   { 'command': 'cpr-load', 'data': { 'filename': 'str', 'mode': 'CprMode' }}

I'm still a little unsure if this direction for QAPI exposure is the
best, or whether we should instead leverage the migration commands.

I'm particularly concerned that we might regret having an API that
is designed only around storage in local files/blockdevs. The
migration layer has flexibility to use many protocols which has
been useful in the past to be able to offload work to an external
process. For example, libvirt uses migrate-to-fd so it can use
a helper that adds O_DIRECT support such that we avoid trashing
the host I/O cache for save/restore.

At the same time though, the migrate APIs don't currently support
a plain "file" protocol. This was because historically we needed
the QEMUFile to support O_NONBLOCK and this fails with plain
files or block devices, so QEMU threads could get blocked. For
the save side this doesn't matter so much, as QEMU now has the
outgoing migrate channels in blocking mode; only the incoming
side uses non-blocking mode.  We could add a plain "file" protocol
to migration if we clearly document its limitations, and indeed
I've suggested we do that for another unrelated bit of work
for libvirt's VM save/restore functionality.
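
To make the O_NONBLOCK point concrete, here is a small standalone sketch
(plain POSIX, not QEMU code; the function names and the temp file path are
made up for illustration).  It assumes a POSIX host such as Linux.  An
empty pipe in non-blocking mode makes read() fail with EAGAIN, whereas a
regular file accepts the flag but ignores it: read() never returns EAGAIN
and simply waits on the underlying I/O if it has to.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Set O_NONBLOCK on an already open descriptor. */
static void set_nonblocking(int fd)
{
    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL) | O_NONBLOCK);
}

/*
 * Read from an empty non-blocking pipe.  Returns the errno reported by
 * read() (EAGAIN is expected), 0 if read() did not fail, or -1 if the
 * pipe could not be created.
 */
int nonblocking_pipe_read_errno(void)
{
    int fds[2];
    char c;
    int err = 0;

    if (pipe(fds) < 0) {
        return -1;
    }
    set_nonblocking(fds[0]);
    if (read(fds[0], &c, 1) < 0) {
        err = errno;
    }
    close(fds[0]);
    close(fds[1]);
    return err;
}

/*
 * Read from an empty regular file with O_NONBLOCK set.  Returns the
 * result of read(): 0 (end of file) is expected, not -1 with EAGAIN.
 * Returns -2 if the temporary file could not be created.
 */
long nonblocking_file_read_result(void)
{
    char path[] = "/tmp/cpr-nonblock-XXXXXX";   /* illustrative path */
    char c;
    long n;
    int fd = mkstemp(path);

    if (fd < 0) {
        return -2;
    }
    unlink(path);
    set_nonblocking(fd);
    n = (long)read(fd, &c, 1);
    close(fd);
    return n;
}
```

So a thread that relies on EAGAIN to yield will get it from a pipe or
socket, but with a plain file it just sits in read() or write() until the
I/O completes.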


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH V8 06/39] cpr: reboot mode
Posted by Steven Sistare 3 years, 7 months ago
On 6/16/2022 7:10 AM, Daniel P. Berrangé wrote:
> On Wed, Jun 15, 2022 at 07:51:53AM -0700, Steve Sistare wrote:
>> Provide the cpr-save and cpr-load functions for live update.  These save and
>> restore VM state, with minimal guest pause time, so that qemu may be updated
>> to a new version in between.
>>
>> cpr-save stops the VM and saves vmstate to an ordinary file.  It supports
>> any type of guest image and block device, but the caller must not modify
>> guest block devices between cpr-save and cpr-load.
>>
>> cpr-save supports several modes, the first of which is reboot. In this mode
>> the caller invokes cpr-save and then terminates qemu.  The caller may then
>> update the host kernel and system software and reboot.  The caller resumes
>> the guest by running qemu with the same arguments as the original process
>> and invoking cpr-load.  To use this mode, guest ram must be mapped to a
>> persistent shared memory file such as /dev/dax0.0 or /dev/shm PKRAM.
>>
>> The reboot mode supports vfio devices if the caller first suspends the
>> guest, such as by issuing guest-suspend-ram to the qemu guest agent.  The
>> guest drivers' suspend methods flush outstanding requests and re-initialize
>> the devices, and thus there is no device state to save and restore.
>>
>> cpr-load loads state from the file.  If the VM was running at cpr-save time
>> then VM execution resumes.  If the VM was suspended at cpr-save time, then
>> the caller must issue a system_wakeup command to resume.
>>
>> cpr-save syntax:
>>   { 'enum': 'CprMode', 'data': [ 'reboot' ] }
>>   { 'command': 'cpr-save', 'data': { 'filename': 'str', 'mode': 'CprMode' }}
>>
>> cpr-load syntax:
>>   { 'command': 'cpr-load', 'data': { 'filename': 'str', 'mode': 'CprMode' }}
> 
> I'm still a little unsure if this direction for QAPI exposure is the
> best, or whether we should instead leverage the migration commands.
> 
> I particularly concerned that we might regret having an API that
> is designed only around storage in local files/blockdevs. The
> migration layer has flexibility to use many protocols which has
> been useful in the past to be able to offload work to an external
> process. For example, libvirt uses migrate-to-fd so it can use
> a helper that adds O_DIRECT support such that we avoid trashing
> the host I/O cache for save/restore.
> 
> At the same time though, the migrate APIs don't currently support
> a plain "file" protocol. This was because historically we needed
> the QEMUFile to support O_NONBLOCK and this fails with plain
> files or block devices, so QEMU threads could get blocked. For
> the save side this doesn't matter so much, as QEMU now has the
> outgoing migrate channels in blocking mode, only the incoming
> side use non-blocking.  We could add a plain "file" protocol
> to migration if we clearly document its limitations, and indeed
> I've suggested we do that for another unrelated bit of work
> for libvirts VM save/restore functionality.

OK, I'll give it a try:
  - delete cpr-save, cpr-load, and cpr-exec
  - add file uri
  - add argv to MigrationParameters for the execv call.

- Steve