Hi Steve
On Tue, Jul 6, 2021 at 8:58 PM Steve Sistare <steven.sistare@oracle.com>
wrote:
> Provide the cprsave, cprexec, and cprload commands for live update. These
> save and restore VM state, with minimal guest pause time, so that qemu may
> be updated to a new version in between.
>
> cprsave stops the VM and saves vmstate to an ordinary file. It supports any
> type of guest image and block device, but the caller must not modify guest
> block devices between cprsave and cprload. It supports two modes: reboot
> and restart.
>
> In reboot mode, the caller invokes cprsave and then terminates qemu.
> The caller may then update the host kernel and system software and reboot.
> The caller resumes the guest by running qemu with the same arguments as the
> original process and invoking cprload. To use this mode, guest ram must be
> mapped to a persistent shared memory file such as /dev/dax0.0, or /dev/shm
> PKRAM as proposed in
> https://lore.kernel.org/lkml/1617140178-8773-1-git-send-email-anthony.yznaga@oracle.com
> .
>
> The reboot mode supports vfio devices if the caller first suspends the
> guest, such as by issuing guest-suspend-ram to the qemu guest agent. The
> guest drivers' suspend methods flush outstanding requests and re-initialize
> the devices, and thus there is no device state to save and restore.
>
> Restart mode preserves the guest VM across a restart of the qemu process.
> After cprsave, the caller passes qemu command-line arguments to cprexec,
> which directly exec's the new qemu binary. The arguments must include -S
> so new qemu starts in a paused state and waits for the cprload command.
> The restart mode supports vfio devices by preserving the vfio container,
> group, device, and event descriptors across the qemu re-exec, and by
> updating DMA mapping virtual addresses using VFIO_DMA_UNMAP_FLAG_VADDR and
> VFIO_DMA_MAP_FLAG_VADDR as defined in
> https://lore.kernel.org/kvm/1611939252-7240-1-git-send-email-steven.sistare@oracle.com/
> and integrated in Linux kernel 5.12.
>
> To use the restart mode, qemu must be started with the memfd-alloc option,
> which allocates guest ram using memfd_create. The memfd's are saved to
> the environment and kept open across exec, after which they are found from
> the environment and re-mmap'd. Hence guest ram is preserved in place,
> albeit with new virtual addresses in the qemu process.
>
> The caller resumes the guest by invoking cprload, which loads state from
> the file. If the VM was running at cprsave time, then VM execution
> resumes. If the VM was suspended at cprsave time (reboot mode), then the
> caller must issue a system_wakeup command to resume.
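
The restart-mode sequence and the resume rule described above can be
sketched as two small helpers. This is an illustration only: the function
names are hypothetical, and the command syntax is copied from the HMP
examples in this cover letter, not from the QMP schema.

```python
from typing import List


def restart_sequence(state_file: str, new_argv: List[str]) -> List[str]:
    """Return the HMP commands for a restart-mode update, in order."""
    # The new qemu must start paused so that it waits for cprload.
    if "-S" not in new_argv:
        raise ValueError("new qemu arguments must include -S")
    return [
        f"cprsave {state_file} restart",
        "cprexec " + " ".join(new_argv),
        f"cprload {state_file}",
    ]


def resume_commands(state_file: str, was_suspended: bool) -> List[str]:
    """Return the HMP commands that resume the guest in the new qemu."""
    cmds = [f"cprload {state_file}"]
    # A guest that was suspended at cprsave time needs an explicit wakeup.
    if was_suspended:
        cmds.append("system_wakeup")
    return cmds
```
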
>
> The first patches add reboot mode:
> - qemu_ram_volatile
> - cpr: reboot mode
> - cpr: QMP interfaces for reboot
> - cpr: HMP interfaces for reboot
>
> The next patches add restart mode:
> - as_flat_walk
> - oslib: qemu_clr_cloexec
> - machine: memfd-alloc option
> - vl: add helper to request re-exec
> - string to strList
> - util: env var helpers
> - cpr: restart mode
> - cpr: QMP interfaces for restart
> - cpr: HMP interfaces for restart
>
> The next patches add vfio support for restart mode:
> - pci: export functions for cpr
> - vfio-pci: refactor for cpr
> - vfio-pci: cpr part 1
> - vfio-pci: cpr part 2
>
> The next patches preserve various descriptor-based backend devices across
> cprexec:
> - vhost: reset vhost devices upon cprsave
> - hostmem-memfd: cpr support
> - chardev: cpr framework
> - chardev: cpr for simple devices
> - chardev: cpr for pty
> - chardev: cpr for sockets
> - cpr: only-cpr-capable option
> - simplify savevm
>
> Here is an example of updating qemu from v4.2.0 to v4.2.1 using
> restart mode. The software update is performed while the guest is
> running to minimize downtime.
>
> window 1                                        | window 2
>                                                 |
> # qemu-system-x86_64 ...                        |
> QEMU 4.2.0 monitor - type 'help' ...            |
> (qemu) info status                              |
> VM status: running                              |
>                                                 | # yum update qemu
> (qemu) cprsave /tmp/qemu.sav restart            |
> (qemu) cprexec qemu-system-x86_64 -S ...        |
> QEMU 4.2.1 monitor - type 'help' ...            |
> (qemu) info status                              |
> VM status: paused (prelaunch)                   |
> (qemu) cprload /tmp/qemu.sav                    |
> (qemu) info status                              |
> VM status: running                              |
>
>
> Here is an example of updating the host kernel using reboot mode.
>
> window 1                                        | window 2
>                                                 |
> # qemu-system-x86_64 ...mem-path=/dev/dax0.0 ...|
> QEMU 4.2.1 monitor - type 'help' ...            |
> (qemu) info status                              |
> VM status: running                              |
>                                                 | # yum update kernel-uek
> (qemu) cprsave /tmp/qemu.sav reboot             |
> (qemu) quit                                     |
>                                                 |
> # systemctl kexec                               |
> kexec_core: Starting new kernel                 |
> ...                                             |
>                                                 |
> # qemu-system-x86_64 -S mem-path=/dev/dax0.0 ...|
> QEMU 4.2.1 monitor - type 'help' ...            |
> (qemu) info status                              |
> VM status: paused (prelaunch)                   |
> (qemu) cprload /tmp/qemu.sav                    |
> (qemu) info status                              |
> VM status: running                              |
>
> Changes from V1 to V2:
> - revert vmstate infrastructure changes
> - refactor cpr functions into new files
> - delete MADV_DOEXEC and use memfd + VFIO_DMA_UNMAP_FLAG_SUSPEND to
> preserve memory.
> - add framework to filter chardev's that support cpr
> - save and restore vfio eventfd's
> - modify cprinfo QMP interface
> - incorporate misc review feedback
> - remove unrelated and unneeded patches
> - refactor all patches into a shorter and easier to review series
>
> Changes from V2 to V3:
> - rebase to qemu 6.0.0
> - use final definition of vfio ioctls (VFIO_DMA_UNMAP_FLAG_VADDR etc)
> - change memfd-alloc to a machine option
> - Use qio_channel_socket_new_fd instead of adding
> qio_channel_socket_new_fd
> - close monitor socket during cpr
> - fix a few unreported bugs
> - support memory-backend-memfd
>
> Changes from V3 to V4:
> - split reboot mode into separate patches
> - add cprexec command
> - delete QEMU_START_FREEZE, argv_main, and /usr/bin/qemu-exec
> - add more checks for vfio and cpr compatibility, and recover after
> errors
> - save vfio pci config in vmstate
> - rename {setenv,getenv}_event_fd to {save,load}_event_fd
> - use qemu_strtol
> - change 6.0 references to 6.1
> - use strerror(), use EXIT_FAILURE, remove period from error messages
> - distribute MAINTAINERS additions to each patch
>
> Steve Sistare (21):
> qemu_ram_volatile
> cpr: reboot mode
> as_flat_walk
> oslib: qemu_clr_cloexec
> machine: memfd-alloc option
> vl: add helper to request re-exec
> string to strList
> util: env var helpers
> cpr: restart mode
> cpr: QMP interfaces for restart
> cpr: HMP interfaces for restart
> pci: export functions for cpr
> vfio-pci: refactor for cpr
> vfio-pci: cpr part 1
> vfio-pci: cpr part 2
> hostmem-memfd: cpr support
> chardev: cpr framework
> chardev: cpr for simple devices
> chardev: cpr for pty
> cpr: only-cpr-capable option
> simplify savevm
>
> Mark Kanda, Steve Sistare (4):
> cpr: QMP interfaces for reboot
> cpr: HMP interfaces for reboot
> vhost: reset vhost devices upon cprsave
> chardev: cpr for sockets
>
> MAINTAINERS | 12 +++
> backends/hostmem-memfd.c | 21 ++--
> chardev/char-mux.c | 1 +
> chardev/char-null.c | 1 +
> chardev/char-pty.c | 15 ++-
> chardev/char-serial.c | 1 +
> chardev/char-socket.c | 35 +++++++
> chardev/char-stdio.c | 8 ++
> chardev/char.c | 41 +++++++-
> gdbstub.c | 1 +
> hmp-commands.hx | 62 ++++++++++++
> hw/core/machine.c | 19 ++++
> hw/pci/msix.c | 20 ++--
> hw/pci/pci.c | 7 +-
> hw/vfio/common.c | 78 ++++++++++++--
> hw/vfio/cpr.c | 154 ++++++++++++++++++++++++++++
> hw/vfio/meson.build | 1 +
> hw/vfio/pci.c | 230 +++++++++++++++++++++++++++++++++++++++---
> hw/vfio/trace-events | 1 +
> hw/virtio/vhost.c | 11 ++
> include/chardev/char.h | 6 ++
> include/exec/memory.h | 25 +++++
> include/hw/boards.h | 1 +
> include/hw/pci/msix.h | 5 +
> include/hw/pci/pci.h | 2 +
> include/hw/vfio/vfio-common.h | 8 ++
> include/hw/virtio/vhost.h | 1 +
> include/migration/cpr.h | 20 ++++
> include/monitor/hmp.h | 4 +
> include/qemu/env.h | 23 +++++
> include/qemu/osdep.h | 1 +
> include/sysemu/runstate.h | 2 +
> include/sysemu/sysemu.h | 1 +
> linux-headers/linux/vfio.h | 27 +++++
> migration/cpr.c | 195 +++++++++++++++++++++++++++++++++++
> migration/meson.build | 1 +
> migration/migration.c | 5 +
> migration/savevm.c | 21 ++--
> migration/savevm.h | 2 +
> monitor/hmp-cmds.c | 75 ++++++++++++--
> monitor/hmp.c | 3 +
> monitor/qmp-cmds.c | 36 +++++++
> monitor/qmp.c | 3 +
> qapi/char.json | 5 +-
> qapi/cpr.json | 88 ++++++++++++++++
> qapi/meson.build | 1 +
> qapi/qapi-schema.json | 1 +
> qemu-options.hx | 39 ++++++-
> softmmu/globals.c | 1 +
> softmmu/memory.c | 48 +++++++++
> softmmu/physmem.c | 49 +++++++--
> softmmu/runstate.c | 58 ++++++++++-
> softmmu/vl.c | 14 ++-
> stubs/cpr.c | 3 +
> stubs/meson.build | 1 +
> trace-events | 1 +
> util/env.c | 95 +++++++++++++++++
> util/meson.build | 1 +
> util/oslib-posix.c | 9 ++
> util/oslib-win32.c | 4 +
> util/qemu-config.c | 4 +
> 61 files changed, 1525 insertions(+), 83 deletions(-)
> create mode 100644 hw/vfio/cpr.c
> create mode 100644 include/migration/cpr.h
> create mode 100644 include/qemu/env.h
> create mode 100644 migration/cpr.c
> create mode 100644 qapi/cpr.json
> create mode 100644 stubs/cpr.c
> create mode 100644 util/env.c
>
> --
> 1.8.3.1
>
>
>
This series doesn't apply on master. Could you rebase and resend?
Thanks
--
Marc-André Lureau