[PATCH V1 00/26] Live update: cpr-exec

Steve Sistare posted 26 patches 2 weeks, 4 days ago
Maintainers: Stefano Stabellini <sstabellini@kernel.org>, Anthony PERARD <anthony@xenproject.org>, Paul Durrant <paul@xen.org>, David Hildenbrand <david@redhat.com>, Igor Mammedov <imammedo@redhat.com>, "Dr. David Alan Gilbert" <dave@treblig.org>, Eduardo Habkost <eduardo@habkost.net>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, "Philippe Mathieu-Daudé" <philmd@linaro.org>, Yanan Wang <wangyanan55@huawei.com>, Paolo Bonzini <pbonzini@redhat.com>, "Daniel P. Berrangé" <berrange@redhat.com>, "Michael S. Tsirkin" <mst@redhat.com>, Alex Williamson <alex.williamson@redhat.com>, "Cédric Le Goater" <clg@redhat.com>, Richard Henderson <richard.henderson@linaro.org>, Peter Xu <peterx@redhat.com>, Fabiano Rosas <farosas@suse.de>, Eric Blake <eblake@redhat.com>, Markus Armbruster <armbru@redhat.com>, Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>, Thomas Huth <thuth@redhat.com>, Ilya Leoshkevich <iii@linux.ibm.com>, Stefan Weil <sw@weilnetz.de>
accel/xen/xen-all.c            |   5 +
backends/hostmem-epc.c         |  12 +-
hmp-commands.hx                |   2 +-
hw/core/machine.c              |  22 +++
hw/core/qdev.c                 |   1 +
hw/intc/apic_common.c          |   2 +-
hw/vfio/migration.c            |   3 +-
include/exec/cpu-common.h      |   3 +-
include/exec/memory.h          |  15 ++
include/exec/ramblock.h        |  10 +-
include/hw/boards.h            |   1 +
include/migration/blocker.h    |   7 +
include/migration/cpr.h        |  14 ++
include/migration/misc.h       |  11 ++
include/migration/vmstate.h    | 133 +++++++++++++++-
include/qemu/osdep.h           |   9 ++
include/sysemu/runstate.h      |   3 +
include/sysemu/seccomp.h       |   1 +
include/sysemu/sysemu.h        |   1 -
migration/cpr.c                | 131 ++++++++++++++++
migration/meson.build          |   3 +
migration/migration-hmp-cmds.c |  50 +++++-
migration/migration.c          |  48 +++++-
migration/migration.h          |   5 +-
migration/options.c            |  13 ++
migration/precreate.c          | 139 +++++++++++++++++
migration/ram.c                |  16 +-
migration/savevm.c             | 306 +++++++++++++++++++++++++++++-------
migration/savevm.h             |   3 +
migration/trace-events         |   7 +
migration/vmstate-factory.c    |  78 ++++++++++
migration/vmstate-types.c      |  24 +++
migration/vmstate.c            |   3 +-
qapi/migration.json            |  48 +++++-
qemu-options.hx                |  22 ++-
replay/replay.c                |   6 +
stubs/migr-blocker.c           |   5 +
stubs/vmstate.c                |  13 ++
system/globals.c               |   1 -
system/memory.c                |  19 ++-
system/physmem.c               | 346 +++++++++++++++++++++++++++--------------
system/qemu-seccomp.c          |  10 +-
system/runstate.c              |  29 ++++
system/trace-events            |   4 +
system/vl.c                    |  26 +++-
target/s390x/cpu_models.c      |   4 +-
util/oslib-posix.c             |   9 ++
util/oslib-win32.c             |   4 +
48 files changed, 1417 insertions(+), 210 deletions(-)
create mode 100644 include/migration/cpr.h
create mode 100644 migration/cpr.c
create mode 100644 migration/precreate.c
create mode 100644 migration/vmstate-factory.c
[PATCH V1 00/26] Live update: cpr-exec
Posted by Steve Sistare 2 weeks, 4 days ago
This patch series adds the live migration cpr-exec mode.  In this mode, QEMU
stops the VM, writes VM state to the migration URI, and directly exec's a
new version of QEMU on the same host, replacing the original process while
retaining its PID.  Guest RAM is preserved in place, albeit with new virtual
addresses.  The user completes the migration by specifying the -incoming
option, and by issuing the migrate-incoming command if necessary.  This
saves and restores VM state, with minimal guest pause time, so that QEMU may
be updated to a new version in between.

The new interfaces are:
  * cpr-exec (MigMode migration parameter)
  * cpr-exec-args (migration parameter)
  * memfd-alloc=on (command-line option for -machine)
  * only-migratable-modes (command-line argument)

The caller sets the mode parameter before invoking the migrate command.

Arguments for the new QEMU process are taken from the cpr-exec-args parameter.
The first argument should be the path of a new QEMU binary, or a prefix
command that exec's the new QEMU binary, and the arguments should include
the -incoming option.
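
For management software that drives QEMU over QMP rather than HMP, the
two steps above (set the parameters, then migrate) might be sketched as
follows.  This is only an illustration: the helper name
cpr_exec_commands is hypothetical, the argv shown is a placeholder for
a real command line, and passing cpr-exec-args as a list of strings is
an assumption about the QMP encoding, since this letter only shows the
HMP form.

```python
import json


def cpr_exec_commands(new_qemu_argv, uri):
    """Build the QMP commands for a cpr-exec migration (hypothetical helper).

    new_qemu_argv: argv for the new QEMU; must contain -incoming.
    uri: migration URI that the outgoing side writes VM state to.
    """
    if "-incoming" not in new_qemu_argv:
        raise ValueError("cpr-exec-args must include the -incoming option")
    return [
        {
            "execute": "migrate-set-parameters",
            "arguments": {
                "mode": "cpr-exec",
                # Assumption: the parameter is a list of strings in QMP.
                "cpr-exec-args": list(new_qemu_argv),
            },
        },
        {"execute": "migrate", "arguments": {"uri": uri}},
    ]


if __name__ == "__main__":
    # Placeholder argv; a real one repeats the full VM configuration.
    argv = ["qemu-kvm", "-incoming", "file:vm.state"]
    for cmd in cpr_exec_commands(argv, "file:vm.state"):
        print(json.dumps(cmd))
```

The check for -incoming mirrors the requirement stated above; it does
not validate anything else about the new command line.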

Memory backend objects must have the share=on attribute, and must be mmap'able
in the new QEMU process.  For example, memory-backend-file is acceptable,
but memory-backend-ram is not.

QEMU must be started with the '-machine memfd-alloc=on' option.  This causes
implicit RAM blocks (those not explicitly described by a memory-backend
object) to be allocated by mmap'ing a memfd.  Examples include VGA, ROM,
and even guest RAM when it is specified without reference to a
memory-backend object.  The memfds are kept open across exec, their values
are saved in vmstate which is retrieved after exec, and they are re-mmap'd.
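
The underlying mechanism can be demonstrated outside QEMU.  The sketch
below is not QEMU code: it is a small Linux-only Python script (names
such as CHILD and run_demo are made up for the illustration) that
allocates "RAM" from a memfd, clears close-on-exec, exec's a new copy
of itself, and re-mmaps the same descriptor.  The descriptor number is
passed on the command line, standing in for the value that cpr-exec
saves in vmstate.

```python
import os
import subprocess
import sys
import tempfile

# Script run as a separate process so that it can exec itself.
CHILD = '''\
import mmap, os, sys

SIZE = 4096
if len(sys.argv) == 1:
    # "Outgoing" side: allocate RAM from a memfd and fill it.
    fd = os.memfd_create("ram0")      # Linux only; close-on-exec by default
    os.ftruncate(fd, SIZE)
    with mmap.mmap(fd, SIZE) as ram:
        ram[:5] = b"guest"
    os.set_inheritable(fd, True)      # clear FD_CLOEXEC so fd survives exec
    print(os.getpid(), flush=True)
    # Replace the process image; the fd number stands in for saved vmstate.
    os.execv(sys.executable, [sys.executable, sys.argv[0], str(fd)])
else:
    # "Incoming" side: same PID, new address space; re-mmap the fd.
    fd = int(sys.argv[1])
    with mmap.mmap(fd, SIZE) as ram:
        print(os.getpid(), ram[:5].decode(), flush=True)
'''


def run_demo():
    """Run CHILD and return [pid_before_exec, pid_after_exec, contents]."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(CHILD)
        path = f.name
    try:
        out = subprocess.run([sys.executable, path], capture_output=True,
                             text=True, check=True)
    finally:
        os.unlink(path)
    return out.stdout.split()


if __name__ == "__main__":
    print(run_demo())
```

The two PIDs match and the contents written before exec are visible
after it, even though the mapping itself was torn down by exec and
recreated at whatever address the new process chose.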

The '-only-migratable-modes cpr-exec' option guarantees that the
configuration supports cpr-exec.  QEMU will exit at start time if not.

Example:

In this example, we simply restart the same version of QEMU, but in
a real scenario one would set a new QEMU binary path in cpr-exec-args.

  # qemu-kvm -monitor stdio
  -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on
  -m 4G -machine memfd-alloc=on ...

  QEMU 9.1.50 monitor - type 'help' for more information
  (qemu) info status
  VM status: running
  (qemu) migrate_set_parameter mode cpr-exec
  (qemu) migrate_set_parameter cpr-exec-args qemu-kvm ... -incoming file:vm.state
  (qemu) migrate -d file:vm.state
  (qemu) QEMU 9.1.50 monitor - type 'help' for more information
  (qemu) info status
  VM status: running

cpr-exec mode preserves attributes of outgoing devices that must be known
before the device is created on the incoming side, such as the memfd descriptor
number, but currently the migration stream is read after all devices are
created.  To solve this problem, I add two VMStateDescription options:
precreate and factory.  precreate objects are saved to their own migration
stream, distinct from the main stream, and are read early by incoming QEMU,
before devices are created.  Factory objects are allocated on demand, without
relying on a pre-registered object's opaque address, which is necessary
because the devices to which the state will apply have not been created yet
and hence have not registered an opaque address to receive the state.
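
As a conceptual model only (this is not the VMStateDescription API, and
every name in it is invented for the illustration), the ordering
problem and its solution can be sketched like this: precreate state is
loaded into a name-indexed store before any device exists, and each
object is allocated on demand and claimed later by whichever code
creates the matching RAM block or device.

```python
from typing import Dict, Iterable, Optional, Tuple


class FactoryStore:
    """Holds state objects that were loaded before any device exists."""

    def __init__(self) -> None:
        self._objects: Dict[str, dict] = {}

    def load_precreate_stream(self,
                              records: Iterable[Tuple[str, dict]]) -> None:
        # Each object is allocated on demand and indexed by name, because
        # no device has registered a destination address for it yet.
        for name, state in records:
            self._objects[name] = dict(state)

    def claim(self, name: str) -> Optional[dict]:
        # Called later, while a device or RAM block is being created.
        return self._objects.pop(name, None)


def create_ram_block(name: str, size: int, store: FactoryStore) -> dict:
    saved = store.claim(name)
    if saved is not None:
        # Reuse the descriptor preserved across exec instead of allocating.
        return {"name": name, "size": size, "fd": saved["fd"],
                "preserved": True}
    # Nothing was preserved under this name: a fresh allocation is needed.
    return {"name": name, "size": size, "fd": None, "preserved": False}


if __name__ == "__main__":
    store = FactoryStore()
    store.load_precreate_stream([("pc.ram", {"fd": 7})])  # read early
    print(create_ram_block("pc.ram", 4096, store))        # created later
```

The block names and the descriptor number in the usage lines are
arbitrary sample values.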

This patch series implements a minimal version of cpr-exec.  Future series
will add support for:
  * vfio
  * chardevs without loss of connectivity
  * vhost
  * fine-grained seccomp controls
  * hostmem-memfd
  * cpr-exec migration test


Steve Sistare (26):
  oslib: qemu_clear_cloexec
  vl: helper to request re-exec
  migration: SAVEVM_FOREACH
  migration: delete unused parameter mis
  migration: precreate vmstate
  migration: precreate vmstate for exec
  migration: VMStateId
  migration: vmstate_info_void_ptr
  migration: vmstate_register_named
  migration: vmstate_unregister_named
  migration: vmstate_register at init time
  migration: vmstate factory object
  physmem: ram_block_create
  physmem: hoist guest_memfd creation
  physmem: hoist host memory allocation
  physmem: set ram block idstr earlier
  machine: memfd-alloc option
  migration: cpr-exec-args parameter
  physmem: preserve ram blocks for cpr
  migration: cpr-exec mode
  migration: migrate_add_blocker_mode
  migration: ram block cpr-exec blockers
  migration: misc cpr-exec blockers
  seccomp: cpr-exec blocker
  migration: fix mismatched GPAs during cpr-exec
  migration: only-migratable-modes

-- 
1.8.3.1
cpr-exec doc (was Re: [PATCH V1 00/26] Live update: cpr-exec)
Posted by Steven Sistare 2 weeks, 1 day ago
On 4/29/2024 11:55 AM, Steve Sistare wrote:
> This patch series adds the live migration cpr-exec mode.

Here is the text I plan to add to docs/devel/migration/CPR.rst.  It is
premature for me to submit this as a patch, because it includes all
the functionality I plan to add in this and future series, but it may
help you while reviewing this series.

- Steve

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

cpr-exec mode
---------------

In this mode, QEMU stops the VM, writes VM state to the migration
URI, and directly exec's a new version of QEMU on the same host,
replacing the original process while retaining its PID.  Guest RAM is
preserved in place, albeit with new virtual addresses.  The user
completes the migration by specifying the ``-incoming`` option, and
by issuing the ``migrate-incoming`` command if necessary; see details
below.

This mode supports vfio devices by preserving device descriptors and
hence kernel state across the exec, even for devices that do not
support live migration, and preserves tap and vhost descriptors.

cpr-exec also preserves descriptors for a subset of chardevs,
including socket, file, parallel, pipe, serial, pty, stdio, and null.
chardevs that support cpr-exec have the QEMU_CHAR_FEATURE_CPR feature set in
the Chardev object.  The client side of a preserved chardev sees no
loss of connectivity during cpr-exec.  More chardevs could be
preserved with additional development.

All chardevs have a ``reopen-on-cpr`` option which causes the chardev
to be closed and reopened during cpr-exec.  This can be set to allow
cpr-exec when the configuration includes a chardev (such as vc) that
does not have QEMU_CHAR_FEATURE_CPR.

Because the old and new QEMU instances are not active concurrently,
the URI cannot be a type that streams data from one instance to the
other.

Usage
^^^^^

Arguments for the new QEMU process are taken from the
``cpr-exec-args`` parameter.  The first argument should be the
path of a new QEMU binary, or a prefix command that exec's the
new QEMU binary, and the arguments should include the ``-incoming``
option.

Memory backend objects must have the ``share=on`` attribute, and
must be mmap'able in the new QEMU process.  For example,
memory-backend-file is acceptable, but memory-backend-ram is
not.

The VM must be started with the ``-machine memfd-alloc=on``
option.  This causes implicit RAM blocks (those not explicitly
described by a memory-backend object) to be allocated by
mmap'ing a memfd.  Examples include VGA, ROM, and even guest
RAM when it is specified without reference to a
memory-backend object.

Add the ``-only-migratable-modes cpr-exec`` option to guarantee that
the configuration supports cpr-exec.  QEMU will exit at start time
if not.

Outgoing:
   * Set the migration mode parameter to ``cpr-exec``.
   * Set the ``cpr-exec-args`` parameter.
   * Issue the ``migrate`` command.  It is recommended that the URI be
     a ``file`` type, but one can use other types such as ``exec``,
     provided the command captures all the data from the outgoing side,
     and provides all the data to the incoming side.

Incoming:
   * You do not need to explicitly start new QEMU.  It is started as
     a side effect of the migrate command above.
   * If the VM was running when the outgoing ``migrate`` command was
     issued, then QEMU automatically resumes VM execution.

Example 1: incoming URI
^^^^^^^^^^^^^^^^^^^^^^^

In these examples, we simply restart the same version of QEMU, but in
a real scenario one would set a new QEMU binary path in cpr-exec-args.

::

   # qemu-kvm -monitor stdio
   -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on -m 4G
   -machine memfd-alloc=on
   ...

   QEMU 9.1.50 monitor - type 'help' for more information
   (qemu) info status
   VM status: running
   (qemu) migrate_set_parameter mode cpr-exec
   (qemu) migrate_set_parameter cpr-exec-args qemu-kvm ... -incoming file:vm.state
   (qemu) migrate -d file:vm.state
   (qemu) QEMU 9.1.50 monitor - type 'help' for more information
   (qemu) info status
   VM status: running

Example 2: incoming defer
^^^^^^^^^^^^^^^^^^^^^^^^^
::

   # qemu-kvm -monitor stdio
   -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on -m 4G
   -machine memfd-alloc=on
   ...

   QEMU 9.1.50 monitor - type 'help' for more information
   (qemu) info status
   VM status: running
   (qemu) migrate_set_parameter mode cpr-exec
   (qemu) migrate_set_parameter cpr-exec-args qemu-kvm ... -incoming defer
   (qemu) migrate -d file:vm.state
   (qemu) QEMU 9.1.50 monitor - type 'help' for more information
   (qemu) info status
   VM status: paused (inmigrate)
   (qemu) migrate_incoming file:vm.state
   (qemu) info status
   VM status: running


Caveats
^^^^^^^

cpr-exec mode may not be used with postcopy, background-snapshot,
or COLO.

cpr-exec mode requires permission to use the exec system call, which
is denied by certain sandbox options, such as spawn.  Use finer-grained
controls to allow exec, e.g.:
``-sandbox on,fork=deny,ns=deny,exec=allow``

::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
Re: cpr-exec doc (was Re: [PATCH V1 00/26] Live update: cpr-exec)
Posted by Peter Xu 2 weeks, 1 day ago
On Thu, May 02, 2024 at 12:13:17PM -0400, Steven Sistare wrote:
> On 4/29/2024 11:55 AM, Steve Sistare wrote:
> > This patch series adds the live migration cpr-exec mode.
> 
> Here is the text I plan to add to docs/devel/migration/CPR.rst.  It is
> premature for me to submit this as a patch, because it includes all
> the functionality I plan to add in this and future series, but it may
> help you while reviewing this series.

I haven't reached this series at all yet but thanks for sending this,
definitely helpful for reviews.  I almost tried to ask for it. :)

I don't think it's an issue to send doc updates without full
implementations ready.  We can still mark things as TBD even in doc IMHO,
and it may help to provide a better picture of the whole thing if e.g. this
series only implemented part of them, to either reviewers or users (for the
latter, if the partially impl feature can already be consumed).

Thanks,

-- 
Peter Xu