block/monitor/block-hmp-cmds.c | 7 +- block/snapshot.c | 167 +++++++++++++++++--------- include/block/snapshot.h | 19 +-- include/migration/snapshot.h | 10 +- migration/savevm.c | 210 +++++++++++++++++++++++++++------ monitor/hmp-cmds.c | 11 +- qapi/job.json | 9 +- qapi/migration.json | 112 ++++++++++++++++++ replay/replay-snapshot.c | 4 +- softmmu/vl.c | 2 +- tests/qemu-iotests/267.out | 14 +-- tests/qemu-iotests/310 | 125 ++++++++++++++++++++ tests/qemu-iotests/310.out | 0 tests/qemu-iotests/group | 1 + 14 files changed, 562 insertions(+), 129 deletions(-) create mode 100755 tests/qemu-iotests/310 create mode 100644 tests/qemu-iotests/310.out
A followup to: v1: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg00866.html When QMP was first introduced some 10+ years ago now, the snapshot related commands (savevm/loadvm/delvm) were not converted. This was primarily because their implementation causes blocking of the thread running the monitor commands. This was (and still is) considered undesirable behaviour both in HMP and QMP. In theory someone was supposed to fix this flaw at some point in the past 10 years and bring them into the QMP world. Sadly, thus far it hasn't happened as people always had more important things to work on. Enterprise apps were much more interested in external snapshots than internal snapshots as they have many more features. Meanwhile users still want to use internal snapshots as there is a certainly simplicity in having everything self-contained in one image, even though it has limitations. Thus the apps that end up executing the savevm/loadvm/delvm via the "human-monitor-command" QMP command. IOW, the problematic blocking behaviour that was one of the reasons for not having savevm/loadvm/delvm in QMP is experienced by applications regardless. By not portting the commands to QMP due to one design flaw, we've forced apps and users to suffer from other design flaws of HMP ( bad error reporting, strong type checking of args, no introspection) for an additional 10 years. This feels rather sub-optimal :-( In practice users don't appear to care strongly about the fact that these commands block the VM while they run. I might have seen one bug report about it, but it certainly isn't something that comes up as a frequent topic except among us QEMU maintainers. Users do care about having access to the snapshot feature. Where I am seeing frequent complaints is wrt the use of OVMF combined with snapshots which has some serious pain points. This is getting worse as the push to ditch legacy BIOS in favour of UEFI gain momentum both across OS vendors and mgmt apps. Solving it requires new parameters to the commands, but doing this in HMP is super unappealing. After 10 years, I think it is time for us to be a little pragmatic about our handling of snapshots commands. My desire is that libvirt should never use "human-monitor-command" under any circumstances, because of the inherant flaws in HMP as a protocol for machine consumption. Thus in this series I'm proposing a fairly direct mapping of the existing HMP commands for savevm/loadvm/delvm into QMP as a first step. This does not solve the blocking thread problem, but it does put in a place a design using the jobs framework which can facilitate solving it later. It does also solve the error reporting, type checking and introspection problems inherant to HMP. So we're winning on 3 out of the 4 problems, and pushed apps to a QMP design that will let us solve the last remaining problem. With a QMP variant, we reasonably deal with the problems related to OVMF: - The logic to pick which disk to store the vmstate in is not satsifactory. The first block driver state cannot be assumed to be the root disk image, it might be OVMF varstore and we don't want to store vmstate in there. - The logic to decide which disks must be snapshotted is hardwired to all disks which are writable Again with OVMF there might be a writable varstore, but this can be raw rather than qcow2 format, and thus unable to be snapshotted. While users might wish to snapshot their varstore, in some/many/most cases it is entirely uneccessary. Users are blocked from snapshotting their VM though due to this varstore. These are solved by adding two parameters to the commands. The first is a block device node name that identifies the image to store vmstate in, and the second is a list of node names to include for the snapshots. If the list of nodes isn't given, it falls back to the historical behaviour of using all disks matching some undocumented criteria. In the block code I've only dealt with node names for block devices, as IIUC, this is all that libvirt should need in the -blockdev world it now lives in. IOW, I've made not attempt to cope with people wanting to use these QMP commands in combination with -drive args. I've done some minimal work in libvirt to start to make use of the new commands to validate their functionality, but this isn't finished yet. My ultimate goal is to make the GNOME Boxes maintainer happy again by having internal snapshots work with OVMF: https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5e= f45c5f64048f16a6e HELP NEEDED: this series starts to implement the approach that Kevin suggested wrto use of generic jobs. When I try to actually run the code though it crashes: ERROR:/home/berrange/src/virt/qemu/softmmu/cpus.c:1788:qemu_mutex_unlock_ioth= read: assertion failed: (qemu_mutex_iothread_locked()) Bail out! ERROR:/home/berrange/src/virt/qemu/softmmu/cpus.c:1788:qemu_mutex_u= nlock_iothread: assertion failed: (qemu_mutex_iothread_locked()) Obviously I've missed something related to locking, but I've no idea what, so I'm sending this v2 simply as a way to solicit suggestions for what I've messed up. You can reproduce with I/O tests using "check -qcow2 310" and it gave a stack: Thread 5 (Thread 0x7fffe6e4c700 (LWP 3399011)): #0 futex_wait_cancelable (private=0, expected=0, futex_word=0x5555566a9fd8) at ../sysdeps/nptl/futex-internal.h:183 #1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x555556227160 <qemu_global_mutex>, cond=0x5555566a9fb0) at pthread_cond_wait.c:508 #2 __pthread_cond_wait (cond=cond@entry=0x5555566a9fb0, mutex=mutex@entry=0x555556227160 <qemu_global_mutex>) at pthread_cond_wait.c:638 #3 0x0000555555ceb6cb in qemu_cond_wait_impl (cond=0x5555566a9fb0, mutex=0x555556227160 <qemu_global_mutex>, file=0x555555d44198 "/home/berrange/src/virt/qemu/softmmu/cpus.c", line=1145) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:174 #4 0x0000555555931974 in qemu_wait_io_event (cpu=cpu@entry=0x555556685050) at /home/berrange/src/virt/qemu/softmmu/cpus.c:1145 #5 0x0000555555933a89 in qemu_dummy_cpu_thread_fn (arg=arg@entry=0x555556685050) at /home/berrange/src/virt/qemu/softmmu/cpus.c:1241 #6 0x0000555555ceb049 in qemu_thread_start (args=0x7fffe6e476f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521 #7 0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477 #8 0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Thread 4 (Thread 0x7fffe764d700 (LWP 3399010)): #0 0x00007ffff4effb6f in __GI___poll (fds=0x7fffdc006ec0, nfds=3, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29 #1 0x00007ffff7c1aace in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0 #2 0x00007ffff7c1ae53 in g_main_loop_run () at /lib64/libglib-2.0.so.0 #3 0x00005555559a9d81 in iothread_run (opaque=opaque@entry=0x55555632f200) at /home/berrange/src/virt/qemu/iothread.c:82 #4 0x0000555555ceb049 in qemu_thread_start (args=0x7fffe76486f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521 #5 0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477 #6 0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Thread 3 (Thread 0x7fffe7e4e700 (LWP 3399009)): #0 0x00007ffff4fe5c58 in futex_abstimed_wait_cancelable (private=0, abstime=0x7fffe7e49650, clockid=0, expected=0, futex_word=0x5555562bf888) at ../sysdeps/nptl/futex-internal.h:320 #1 do_futex_wait (sem=sem@entry=0x5555562bf888, abstime=abstime@entry=0x7fffe7e49650, clockid=0) at sem_waitcommon.c:112 #2 0x00007ffff4fe5d83 in __new_sem_wait_slow (sem=sem@entry=0x5555562bf888, abstime=abstime@entry=0x7fffe7e49650, clockid=0) at sem_waitcommon.c:184 #3 0x00007ffff4fe5e12 in sem_timedwait (sem=sem@entry=0x5555562bf888, abstime=abstime@entry=0x7fffe7e49650) at sem_timedwait.c:40 #4 0x0000555555cebbdf in qemu_sem_timedwait (sem=sem@entry=0x5555562bf888, ms=ms@entry=10000) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:307 #5 0x0000555555d03fa4 in worker_thread (opaque=opaque@entry=0x5555562bf810) at /home/berrange/src/virt/qemu/util/thread-pool.c:91 #6 0x0000555555ceb049 in qemu_thread_start (args=0x7fffe7e496f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521 #7 0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477 #8 0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Thread 2 (Thread 0x7fffe8750700 (LWP 3399008)): #0 0x00007ffff4ed1871 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7fffe874b670, rem=0x7fffe874b680) at ../sysdeps/unix/sysv/linux/--Type <RET> for more, q to quit, c to continue without paging-- clock_nanosleep.c:48 #1 0x00007ffff4ed71c7 in __GI___nanosleep (requested_time=<optimized out>, remaining=<optimized out>) at nanosleep.c:27 #2 0x00007ffff7c462f7 in g_usleep () at /lib64/libglib-2.0.so.0 #3 0x0000555555cf3fd0 in call_rcu_thread (opaque=opaque@entry=0x0) at /home/berrange/src/virt/qemu/util/rcu.c:250 #4 0x0000555555ceb049 in qemu_thread_start (args=0x7fffe874b6f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521 #5 0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477 #6 0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 Thread 1 (Thread 0x7fffe88abec0 (LWP 3398996)): #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007ffff4e2e895 in __GI_abort () at abort.c:79 #2 0x00007ffff7be5b8c in g_assertion_message_expr.cold () at /lib64/libglib-2.0.so.0 #3 0x00007ffff7c43a1f in g_assertion_message_expr () at /lib64/libglib-2.0.so.0 #4 0x0000555555932da0 in qemu_mutex_unlock_iothread () at /home/berrange/src/virt/qemu/softmmu/cpus.c:1788 #5 qemu_mutex_unlock_iothread () at /home/berrange/src/virt/qemu/softmmu/cpus.c:1786 #6 0x0000555555cfeceb in os_host_main_loop_wait (timeout=26359275747000) at /home/berrange/src/virt/qemu/util/main-loop.c:232 #7 main_loop_wait (nonblocking=nonblocking@entry=0) at /home/berrange/src/virt/qemu/util/main-loop.c:516 #8 0x0000555555941f27 in qemu_main_loop () at /home/berrange/src/virt/qemu/softmmu/vl.c:1676 #9 0x000055555581a18e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/berrange/src/virt/qemu/softmmu/main.c:49 (gdb) Changed in v2: - Use new command names "snapshot-{load,save,delete}" to make it clear that these are different from the "savevm|loadvm|delvm" as they use the Job framework - Use an include list for block devs, not an exclude list Daniel P. Berrang=C3=A9 (6): migration: improve error reporting of block driver state name block: push error reporting into bdrv_all_*_snapshot functions migration: stop returning errno from load_snapshot() block: add ability to specify list of blockdevs during snapshot block: allow specifying name of block device for vmstate storage migration: introduce snapshot-{save,load,delete} QMP commands block/monitor/block-hmp-cmds.c | 7 +- block/snapshot.c | 167 +++++++++++++++++--------- include/block/snapshot.h | 19 +-- include/migration/snapshot.h | 10 +- migration/savevm.c | 210 +++++++++++++++++++++++++++------ monitor/hmp-cmds.c | 11 +- qapi/job.json | 9 +- qapi/migration.json | 112 ++++++++++++++++++ replay/replay-snapshot.c | 4 +- softmmu/vl.c | 2 +- tests/qemu-iotests/267.out | 14 +-- tests/qemu-iotests/310 | 125 ++++++++++++++++++++ tests/qemu-iotests/310.out | 0 tests/qemu-iotests/group | 1 + 14 files changed, 562 insertions(+), 129 deletions(-) create mode 100755 tests/qemu-iotests/310 create mode 100644 tests/qemu-iotests/310.out --=20 2.26.2
On Mon, Jul 27, 2020 at 04:08:37PM +0100, Daniel P. Berrangé wrote: > A followup to: > > v1: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg00866.html snip > HELP NEEDED: this series starts to implement the approach that Kevin > suggested wrto use of generic jobs. > > When I try to actually run the code though it crashes: > > ERROR:/home/berrange/src/virt/qemu/softmmu/cpus.c:1788:qemu_mutex_unlock_ioth= > read: assertion failed: (qemu_mutex_iothread_locked()) > Bail out! ERROR:/home/berrange/src/virt/qemu/softmmu/cpus.c:1788:qemu_mutex_u= > nlock_iothread: assertion failed: (qemu_mutex_iothread_locked()) > > Obviously I've missed something related to locking, but I've no idea > what, so I'm sending this v2 simply as a way to solicit suggestions > for what I've messed up. What I've found is qmp_snapshot_save() is the QMP handler and runs in the main thread, so iothread lock is held. This calls job_create() which ends up invoking snapshot_save_job_run in a background coroutine, but IIUC iothread lock is still held when the coroutine starts. This then invokes save_snapshot() which invokes qemu_savevm_state This calls qemu_mutex_unlock_iothread() and then qemu_savevm_state_setup(). Eventually something in the qcow2 code triggers qemu_coroutine_yield() so control goes back to the main event loop thread. The problem is that the iothread lock has been released, but the main event loop thread is still expecting it to be held. I've no idea how to go about solving this problem. The save_snapshot() code, as written today, needs to run serialized with everything else, but because the job framework has used a coroutine to run it, we can switch back to the main event thread at any time. I don't know how to force save_snapshot() to be serialized when using the generic job framework. > > You can reproduce with I/O tests using "check -qcow2 310" and it > gave a stack: > > Thread 5 (Thread 0x7fffe6e4c700 (LWP 3399011)): > #0 futex_wait_cancelable (private=0, expected=0, futex_word=0x5555566a9fd8) at ../sysdeps/nptl/futex-internal.h:183 > #1 __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x555556227160 <qemu_global_mutex>, cond=0x5555566a9fb0) at pthread_cond_wait.c:508 > #2 __pthread_cond_wait (cond=cond@entry=0x5555566a9fb0, mutex=mutex@entry=0x555556227160 <qemu_global_mutex>) at pthread_cond_wait.c:638 > #3 0x0000555555ceb6cb in qemu_cond_wait_impl (cond=0x5555566a9fb0, mutex=0x555556227160 <qemu_global_mutex>, file=0x555555d44198 "/home/berrange/src/virt/qemu/softmmu/cpus.c", line=1145) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:174 > #4 0x0000555555931974 in qemu_wait_io_event (cpu=cpu@entry=0x555556685050) at /home/berrange/src/virt/qemu/softmmu/cpus.c:1145 > #5 0x0000555555933a89 in qemu_dummy_cpu_thread_fn (arg=arg@entry=0x555556685050) at /home/berrange/src/virt/qemu/softmmu/cpus.c:1241 > #6 0x0000555555ceb049 in qemu_thread_start (args=0x7fffe6e476f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521 > #7 0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477 > #8 0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 > > Thread 4 (Thread 0x7fffe764d700 (LWP 3399010)): > #0 0x00007ffff4effb6f in __GI___poll (fds=0x7fffdc006ec0, nfds=3, timeout=-1) at ../sysdeps/unix/sysv/linux/poll.c:29 > #1 0x00007ffff7c1aace in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0 > #2 0x00007ffff7c1ae53 in g_main_loop_run () at /lib64/libglib-2.0.so.0 > #3 0x00005555559a9d81 in iothread_run (opaque=opaque@entry=0x55555632f200) at /home/berrange/src/virt/qemu/iothread.c:82 > #4 0x0000555555ceb049 in qemu_thread_start (args=0x7fffe76486f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521 > #5 0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477 > #6 0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 > > Thread 3 (Thread 0x7fffe7e4e700 (LWP 3399009)): > #0 0x00007ffff4fe5c58 in futex_abstimed_wait_cancelable (private=0, abstime=0x7fffe7e49650, clockid=0, expected=0, futex_word=0x5555562bf888) at ../sysdeps/nptl/futex-internal.h:320 > #1 do_futex_wait (sem=sem@entry=0x5555562bf888, abstime=abstime@entry=0x7fffe7e49650, clockid=0) at sem_waitcommon.c:112 > #2 0x00007ffff4fe5d83 in __new_sem_wait_slow (sem=sem@entry=0x5555562bf888, abstime=abstime@entry=0x7fffe7e49650, clockid=0) at sem_waitcommon.c:184 > #3 0x00007ffff4fe5e12 in sem_timedwait (sem=sem@entry=0x5555562bf888, abstime=abstime@entry=0x7fffe7e49650) at sem_timedwait.c:40 > #4 0x0000555555cebbdf in qemu_sem_timedwait (sem=sem@entry=0x5555562bf888, ms=ms@entry=10000) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:307 > #5 0x0000555555d03fa4 in worker_thread (opaque=opaque@entry=0x5555562bf810) at /home/berrange/src/virt/qemu/util/thread-pool.c:91 > #6 0x0000555555ceb049 in qemu_thread_start (args=0x7fffe7e496f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521 > #7 0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477 > #8 0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 > > Thread 2 (Thread 0x7fffe8750700 (LWP 3399008)): > #0 0x00007ffff4ed1871 in __GI___clock_nanosleep (clock_id=clock_id@entry=0, flags=flags@entry=0, req=0x7fffe874b670, rem=0x7fffe874b680) at ../sysdeps/unix/sysv/linux/--Type <RET> for more, q to quit, c to continue without paging-- > clock_nanosleep.c:48 > #1 0x00007ffff4ed71c7 in __GI___nanosleep (requested_time=<optimized out>, remaining=<optimized out>) at nanosleep.c:27 > #2 0x00007ffff7c462f7 in g_usleep () at /lib64/libglib-2.0.so.0 > #3 0x0000555555cf3fd0 in call_rcu_thread (opaque=opaque@entry=0x0) at /home/berrange/src/virt/qemu/util/rcu.c:250 > #4 0x0000555555ceb049 in qemu_thread_start (args=0x7fffe874b6f0) at /home/berrange/src/virt/qemu/util/qemu-thread-posix.c:521 > #5 0x00007ffff4fdc432 in start_thread (arg=<optimized out>) at pthread_create.c:477 > #6 0x00007ffff4f0a9d3 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 > > Thread 1 (Thread 0x7fffe88abec0 (LWP 3398996)): > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 > #1 0x00007ffff4e2e895 in __GI_abort () at abort.c:79 > #2 0x00007ffff7be5b8c in g_assertion_message_expr.cold () at /lib64/libglib-2.0.so.0 > #3 0x00007ffff7c43a1f in g_assertion_message_expr () at /lib64/libglib-2.0.so.0 > #4 0x0000555555932da0 in qemu_mutex_unlock_iothread () at /home/berrange/src/virt/qemu/softmmu/cpus.c:1788 > #5 qemu_mutex_unlock_iothread () at /home/berrange/src/virt/qemu/softmmu/cpus.c:1786 > #6 0x0000555555cfeceb in os_host_main_loop_wait (timeout=26359275747000) at /home/berrange/src/virt/qemu/util/main-loop.c:232 > #7 main_loop_wait (nonblocking=nonblocking@entry=0) at /home/berrange/src/virt/qemu/util/main-loop.c:516 > #8 0x0000555555941f27 in qemu_main_loop () at /home/berrange/src/virt/qemu/softmmu/vl.c:1676 > #9 0x000055555581a18e in main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /home/berrange/src/virt/qemu/softmmu/main.c:49 > (gdb) The coroutine trace to go with this is (gdb) qemu coroutine 0x5555575b86b0 #0 qemu_coroutine_switch (from_=<optimized out>, to_=<optimized out>, action=action@entry=COROUTINE_YIELD) at /home/berrange/src/virt/qemu/util/coroutine-ucontext.c:303 #1 0x0000555555cfbccd in qemu_coroutine_yield () at /home/berrange/src/virt/qemu/util/qemu-coroutine.c:193 #2 0x0000555555d00d49 in thread_pool_submit_co (pool=0x5555562b8810, func=func@entry=0x555555c3a2d0 <handle_aiocb_rw>, arg=arg@entry=0x7fffe61ff200) at /home/berrange/src/virt/qemu/util/thread-pool.c:288 #3 0x0000555555c3c4db in raw_thread_pool_submit (arg=0x7fffe61ff200, func=0x555555c3a2d0 <handle_aiocb_rw>, bs=0x55555653c2d0) at /home/berrange/src/virt/qemu/block/file-posix.c:1975 #4 raw_co_prw (bs=0x55555653c2d0, offset=131072, bytes=65536, qiov=0x7fffe61ff560, type=1) at /home/berrange/src/virt/qemu/block/file-posix.c:2022 #5 0x0000555555c43131 in bdrv_driver_preadv (bs=0x55555653c2d0, offset=131072, bytes=65536, qiov=0x7fffe61ff560, qiov_offset=<optimized out>, flags=0) at /home/berrange/src/virt/qemu/block/io.c:1139 #6 0x0000555555c47484 in bdrv_aligned_preadv (req=req@entry=0x7fffe61ff440, offset=131072, bytes=65536, align=<optimized out>, qiov=0x7fffe61ff560, qiov_offset=0, flags=0, child=0x5555562b2a80, child=<optimized out>) at /home/berrange/src/virt/qemu/block/io.c:1515 #7 0x0000555555c48955 in bdrv_co_preadv_part (child=0x5555562b2a80, offset=<optimized out>, bytes=<optimized out>, qiov=<optimized out>, qiov_offset=<optimized out>, flags=0) at /home/berrange/src/virt/qemu/block/io.c:1757 #8 0x0000555555c48ef4 in bdrv_run_co (opaque=0x7fffe61ff540, entry=0x555555c48a60 <bdrv_rw_co_entry>, bs=0x55555653c2d0) at /home/berrange/src/virt/qemu/block/io.c:915 #9 bdrv_prwv_co (flags=0, is_write=false, qiov=0x7fffe61ff560, offset=131072, child=<optimized out>) at /home/berrange/src/virt/qemu/block/io.c:966 #10 bdrv_preadv (qiov=0x7fffe61ff560, offset=131072, child=<optimized out>) at /home/berrange/src/virt/qemu/block/io.c:1024 #11 bdrv_pread (child=<optimized out>, offset=offset@entry=131072, buf=<optimized out>, bytes=<optimized out>) at /home/berrange/src/virt/qemu/block/io.c:1041 #12 0x0000555555c1fa2a in qcow2_cache_do_get (bs=0x555556542e00, c=0x55555654b130, offset=131072, table=0x7fffe61ff630, read_from_disk=<optimized out>) at /home/berrange/src/virt/qemu/block/qcow2-cache.c:49 #13 0x0000555555c14206 in qcow2_get_refcount (bs=bs@entry=0x555556542e00, cluster_index=0, refcount=refcount@entry=0x7fffe61ff670) at /home/berrange/src/virt/qemu/block/qcow2-refcount.c:271 #14 0x0000555555c1450d in alloc_clusters_noref (bs=bs@entry=0x555556542e00, size=140737054242416, size@entry=24, max=max@entry=72057594037927935) at /home/berrange/src/virt/qemu/block/qcow2-refcount.c:981 #15 0x0000555555c15534 in qcow2_alloc_clusters (bs=bs@entry=0x555556542e00, size=size@entry=24) at /home/berrange/src/virt/qemu/block/qcow2-refcount.c:1013 #16 0x0000555555c1a1f3 in qcow2_grow_l1_table (bs=0x555556542e00, min_size=min_size@entry=3, exact_size=exact_size@entry=false) at /home/berrange/src/virt/qemu/block/qcow2-cluster.c:139 #17 0x0000555555c1a983 in get_cluster_table (bs=bs@entry=0x555556542e00, offset=offset@entry=1073741824, new_l2_slice=new_l2_slice@entry=0x7fffe61ff8b0, new_l2_index=new_l2_index@entry=0x7fffe61ff8a8) at /home/berrange/src/virt/qemu/block/qcow2-cluster.c:688 #18 0x0000555555c1bee9 in handle_copied (m=0x7fffe61ff968, bytes=<synthetic pointer>, host_offset=<synthetic pointer>, guest_offset=1073741824, bs=0x555556542e00) at /home/berrange/src/virt/qemu/block/qcow2-cluster.c:1185 #19 qcow2_alloc_cluster_offset (bs=bs@entry=0x555556542e00, offset=offset@entry=1073741824, bytes=bytes@entry=0x7fffe61ff95c, host_offset=host_offset@entry=0x7fffe61ff960, m=m@entry=0x7fffe61ff968) at /home/berrange/src/virt/qemu/block/qcow2-cluster.c:1570 --Type <RET> for more, q to quit, c to continue without paging-- #20 0x0000555555c0adbc in qcow2_co_pwritev_part (bs=0x555556542e00, offset=1073741824, bytes=281, qiov=0x7fffe61ffa30, qiov_offset=0, flags=<optimized out>) at /home/berrange/src/virt/qemu/block/qcow2.c:2577 #21 0x0000555555c439f1 in bdrv_co_rw_vmstate (bs=0x555556542e00, qiov=<optimized out>, pos=<optimized out>, is_read=<optimized out>) at /home/berrange/src/virt/qemu/block/io.c:2660 #22 0x0000555555c45ecc in bdrv_co_rw_vmstate_entry (opaque=0x7fffe61ff9f0) at /home/berrange/src/virt/qemu/block/io.c:2674 #23 bdrv_run_co (opaque=0x7fffe61ff9f0, entry=0x555555c43a50 <bdrv_co_rw_vmstate_entry>, bs=0x555556542e00) at /home/berrange/src/virt/qemu/block/io.c:915 #24 bdrv_rw_vmstate (is_read=false, pos=0, qiov=0x7fffe61ff9e0, bs=0x555556542e00) at /home/berrange/src/virt/qemu/block/io.c:2688 #25 bdrv_writev_vmstate (bs=bs@entry=0x555556542e00, qiov=qiov@entry=0x7fffe61ffa30, pos=pos@entry=0) at /home/berrange/src/virt/qemu/block/io.c:2707 #26 0x0000555555b43ac8 in block_writev_buffer (opaque=0x555556542e00, iov=<optimized out>, iovcnt=<optimized out>, pos=0, errp=<optimized out>) at /home/berrange/src/virt/qemu/migration/savevm.c:139 #27 0x0000555555b4e698 in qemu_fflush (f=f@entry=0x5555566ae980) at /home/berrange/src/virt/qemu/migration/qemu-file.c:232 #28 0x000055555586bdeb in ram_save_setup (f=0x5555566ae980, opaque=<optimized out>) at /home/berrange/src/virt/qemu/migration/ram.c:2513 #29 0x0000555555b45bd1 in qemu_savevm_state_setup (f=f@entry=0x5555566ae980) at /home/berrange/src/virt/qemu/migration/savevm.c:1178 #30 0x0000555555b48713 in qemu_savevm_state (errp=0x5555575ba280, f=0x5555566ae980) at /home/berrange/src/virt/qemu/migration/savevm.c:1535 #31 save_snapshot (name=<optimized out>, vmstate=<optimized out>, devices=0x555556e50ba0, errp=errp@entry=0x5555575ba280) at /home/berrange/src/virt/qemu/migration/savevm.c:2744 #32 0x0000555555b48a8b in snapshot_save_job_run (job=0x5555575ba200, errp=0x5555575ba280) at /home/berrange/src/virt/qemu/migration/savevm.c:3016 #33 0x0000555555bef632 in job_co_entry (opaque=0x5555575ba200) at /home/berrange/src/virt/qemu/job.c:908 #34 0x0000555555cf74a3 in coroutine_trampoline (i0=<optimized out>, i1=<optimized out>) at /home/berrange/src/virt/qemu/util/coroutine-ucontext.c:173 #35 0x00007ffff4e5a250 in __start_context () at ../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91 #36 0x00007fffffffc6a0 in () #37 0x0000000000000000 in () Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Sorry for taking so long to reply. Daniel P. Berrangé <berrange@redhat.com> writes: > A followup to: > > v1: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg00866.html > > When QMP was first introduced some 10+ years ago now, the snapshot > related commands (savevm/loadvm/delvm) were not converted. This was > primarily because their implementation causes blocking of the thread > running the monitor commands. This was (and still is) considered > undesirable behaviour both in HMP and QMP. One of several reasons. > In theory someone was supposed to fix this flaw at some point in the > past 10 years and bring them into the QMP world. Sadly, thus far it > hasn't happened as people always had more important things to work > on. Enterprise apps were much more interested in external snapshots > than internal snapshots as they have many more features. Several attempts have been made to bring the functionality to QMP. Sadly, they went nowhere. I posted an analysis of the issues in reply to one of the more serious attempts: Message-ID: <87lh7l783q.fsf@blackfin.pond.sub.org> https://lists.nongnu.org/archive/html/qemu-devel/2016-01/msg03593.html I'd like to hear your take on it. I append the relevant part for your convenience. Perhaps your code is already close to what I describe there. I'm interested in where it falls short. > Meanwhile users still want to use internal snapshots as there is > a certainly simplicity in having everything self-contained in one > image, even though it has limitations. Thus the apps that end up > executing the savevm/loadvm/delvm via the "human-monitor-command" > QMP command. > > IOW, the problematic blocking behaviour that was one of the reasons > for not having savevm/loadvm/delvm in QMP is experienced by applications > regardless. By not portting the commands to QMP due to one design flaw, > we've forced apps and users to suffer from other design flaws of HMP ( > bad error reporting, strong type checking of args, no introspection) for > an additional 10 years. This feels rather sub-optimal :-( > > In practice users don't appear to care strongly about the fact that these > commands block the VM while they run. I might have seen one bug report > about it, but it certainly isn't something that comes up as a frequent > topic except among us QEMU maintainers. Users do care about having > access to the snapshot feature. > > Where I am seeing frequent complaints is wrt the use of OVMF combined > with snapshots which has some serious pain points. This is getting worse > as the push to ditch legacy BIOS in favour of UEFI gain momentum both > across OS vendors and mgmt apps. Solving it requires new parameters to > the commands, but doing this in HMP is super unappealing. > > After 10 years, I think it is time for us to be a little pragmatic about > our handling of snapshots commands. My desire is that libvirt should never > use "human-monitor-command" under any circumstances, because of the > inherant flaws in HMP as a protocol for machine consumption. > > Thus in this series I'm proposing a fairly direct mapping of the existing > HMP commands for savevm/loadvm/delvm into QMP as a first step. This does > not solve the blocking thread problem, but it does put in a place a > design using the jobs framework which can facilitate solving it later. > It does also solve the error reporting, type checking and introspection > problems inherant to HMP. So we're winning on 3 out of the 4 problems, > and pushed apps to a QMP design that will let us solve the last > remaining problem. > > With a QMP variant, we reasonably deal with the problems related to OVMF: > > - The logic to pick which disk to store the vmstate in is not > satsifactory. > > The first block driver state cannot be assumed to be the root disk > image, it might be OVMF varstore and we don't want to store vmstate > in there. Yes, this is one of the issues. Glad you're addressing it. > - The logic to decide which disks must be snapshotted is hardwired > to all disks which are writable > > Again with OVMF there might be a writable varstore, but this can be > raw rather than qcow2 format, and thus unable to be snapshotted. > While users might wish to snapshot their varstore, in some/many/most > cases it is entirely uneccessary. Users are blocked from snapshotting > their VM though due to this varstore. Another one. Glad again. > These are solved by adding two parameters to the commands. The first is > a block device node name that identifies the image to store vmstate in, > and the second is a list of node names to include for the snapshots. > If the list of nodes isn't given, it falls back to the historical > behaviour of using all disks matching some undocumented criteria. > > In the block code I've only dealt with node names for block devices, as > IIUC, this is all that libvirt should need in the -blockdev world it now > lives in. IOW, I've made not attempt to cope with people wanting to use > these QMP commands in combination with -drive args. > > I've done some minimal work in libvirt to start to make use of the new > commands to validate their functionality, but this isn't finished yet. > > My ultimate goal is to make the GNOME Boxes maintainer happy again by > having internal snapshots work with OVMF: > > https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5e= > f45c5f64048f16a6e > > HELP NEEDED: this series starts to implement the approach that Kevin > suggested wrto use of generic jobs. Does this mean you're trying to use the jobs infrastructure? > When I try to actually run the code though it crashes: > > ERROR:/home/berrange/src/virt/qemu/softmmu/cpus.c:1788:qemu_mutex_unlock_ioth= > read: assertion failed: (qemu_mutex_iothread_locked()) > Bail out! ERROR:/home/berrange/src/virt/qemu/softmmu/cpus.c:1788:qemu_mutex_u= > nlock_iothread: assertion failed: (qemu_mutex_iothread_locked()) > > Obviously I've missed something related to locking, but I've no idea > what, so I'm sending this v2 simply as a way to solicit suggestions > for what I've messed up. > > You can reproduce with I/O tests using "check -qcow2 310" and it > gave a stack: [...] Relevant part of Message-ID: <87lh7l783q.fsf@blackfin.pond.sub.org> If we can't make a sane QMP interface, I'd rather have no QMP interface. However, I believe we *can* make a sane QMP interface if we put in the design work. The design work must start with a review of what we're trying to accomplish, and how to fit it into the rest of the system. Here's my attempt. Since my knowledge on snapshots is rather superficial, I'm cc'ing Kevin for additional snapshot expertise. Kevin, please correct me when I talk nonsense. I'm further cc'ing Eric and Peter for the management layer perspective. A point-in-time snapshot of a system consists of a snapshot of its machine state and snapshots of its storage. All the snapshots need to be made at the same point in time for the result to be consistent. Snapshots of read-only storage carry no information and are commonly omitted. Isolated storage snapshots can make sense, but snapshotting the machine state without also snapshotting the machine's storage doesn't sound useful to me. Both storage and machine state snapshots come in two flavours: internal and external. External ones can be made with any block backend, but internal storage snapshots work only with certain formats, notably qcow2. QMP supports both kinds of storage snapshots. Both kinds of storage snapshots need exclusive access while they work. They're relatively quick, but the delay could be noticable for large internal snapshots, and perhaps for external snapshots on really slow storage. Internal machine state snapshots are currently only available via HMP's savevm, which integrates internal machine state and storage snapshots. This is non-live, i.e. the guest is stopped while the snapshot gets saved. I figure we could make it live if we really wanted to. Another instance of the emerging background job concept. On the implementation level, QCOW2 can't currently store a machine state snapshot without also storing a storage snapshot. I guess we could change this if we really wanted to. External machine state snapshots are basically migrate to file. Supported by QMP. Live migration to file is possible, but currently wastes space, because memory dirtied during migration gets saved multiple times. Could be fixed either by making migration update previously saved memory instead of appending (beware, random I/O), or by compacting the file afterwards. Non-live migration to file doesn't waste space that way. To take multiple *consistent* snapshots, you have to bundle them up in a transaction. Transactions currently support only *storage* snapshots, though. Work-around for external machine state snapshot: migrate to file (possibly live), leaving the guest sopped on completion, take storage snapshots, resume guest. You can combine internal and external storage snapshots with an external machine state snapshot to get a mixed system snapshot. You currently can't do that with an internal machine state snapshot: the only way to take one is HMP savevm, which insists on internally snapshotting all writable storage, and doesn't transact together with external storage snapshots. Except for the case "purely internal snapshot with just one writable storage device", a system snapshot consists of multiple parts stored in separate files. Tying the parts together is a management problem. QEMU provides rudimentary management for purely internal snapshots, but it's flawed: missing storage isn't detected, and additional storage can creep in if snapshot tags or IDs happen to match. I guess managing the parts is better left to the management layer. I figure a fully general QMP interface would let you perform a system snapshot by combining storage snapshots of either kind with either kind of machine state snapshot. We already have most of the building blocks: we can take external and internal storage snapshots, and combine them in transactions. What's missing is transactionable machine state snapshots. We know how to work around it for external machine state snapshots (see above), but a transaction-based solution might be nicer. Any solution for internal machine state snapshots in QMP should at least try to fit into this. Some restrictions are okay. For instance, we probably need to restrict internal machine state snapshots to piggyback on an internal storage snapshot for now, so we don't have to dig up QCOW2 just to get QMP support. We can talk about more convenient interfaces for common special cases, but I feel we need to design for the general case. We don't have to implement the completely general case right away, though. As long as we know where we want to go, incremental steps towards that goal are fine. Can we create a transactionable QMP command to take an internal machine state snapshot? This would be like HMP savevm with the following differences: * Separate parameters for tag and ID. I'll have none of this overloading nonsense in QMP. * Specify the destination block device. I'll have none of this "pick a device in some magic, undocumented way" in QMP. * Leave alone the other block devices. Adding transaction actions to snapshot all the writable block devices to get a full system snapshot is the user's responsibility. Limitations: * No live internal machine snapshot, yet. * The storage device taking the internal snapshot must also be internally snapshot for now. In fact, the command does both (tolerable wart). Open questions: * Do we want the QMP command to delete existing snapshots with conflicting tag / ID, like HMP savevm does? Or do we want it to fail the transaction? * Do we need transactions for switching to a system snapshot, too? Opinions?
On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote: > Sorry for taking so long to reply. > > Daniel P. Berrangé <berrange@redhat.com> writes: > > > A followup to: > > > > v1: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg00866.html > > > > When QMP was first introduced some 10+ years ago now, the snapshot > > related commands (savevm/loadvm/delvm) were not converted. This was > > primarily because their implementation causes blocking of the thread > > running the monitor commands. This was (and still is) considered > > undesirable behaviour both in HMP and QMP. > > One of several reasons. > > > In theory someone was supposed to fix this flaw at some point in the > > past 10 years and bring them into the QMP world. Sadly, thus far it > > hasn't happened as people always had more important things to work > > on. Enterprise apps were much more interested in external snapshots > > than internal snapshots as they have many more features. > > Several attempts have been made to bring the functionality to QMP. > Sadly, they went nowhere. > > I posted an analysis of the issues in reply to one of the more serious > attempts: > > Message-ID: <87lh7l783q.fsf@blackfin.pond.sub.org> > https://lists.nongnu.org/archive/html/qemu-devel/2016-01/msg03593.html > > I'd like to hear your take on it. I append the relevant part for your > convenience. Perhaps your code is already close to what I describe > there. I'm interested in where it falls short. > > > Meanwhile users still want to use internal snapshots as there is > > a certainly simplicity in having everything self-contained in one > > image, even though it has limitations. Thus the apps that end up > > executing the savevm/loadvm/delvm via the "human-monitor-command" > > QMP command. > > > > IOW, the problematic blocking behaviour that was one of the reasons > > for not having savevm/loadvm/delvm in QMP is experienced by applications > > regardless. By not portting the commands to QMP due to one design flaw, > > we've forced apps and users to suffer from other design flaws of HMP ( > > bad error reporting, strong type checking of args, no introspection) for > > an additional 10 years. This feels rather sub-optimal :-( > > > > In practice users don't appear to care strongly about the fact that these > > commands block the VM while they run. I might have seen one bug report > > about it, but it certainly isn't something that comes up as a frequent > > topic except among us QEMU maintainers. Users do care about having > > access to the snapshot feature. > > > > Where I am seeing frequent complaints is wrt the use of OVMF combined > > with snapshots which has some serious pain points. This is getting worse > > as the push to ditch legacy BIOS in favour of UEFI gain momentum both > > across OS vendors and mgmt apps. Solving it requires new parameters to > > the commands, but doing this in HMP is super unappealing. > > > > After 10 years, I think it is time for us to be a little pragmatic about > > our handling of snapshots commands. My desire is that libvirt should never > > use "human-monitor-command" under any circumstances, because of the > > inherant flaws in HMP as a protocol for machine consumption. > > > > Thus in this series I'm proposing a fairly direct mapping of the existing > > HMP commands for savevm/loadvm/delvm into QMP as a first step. This does > > not solve the blocking thread problem, but it does put in a place a > > design using the jobs framework which can facilitate solving it later. > > It does also solve the error reporting, type checking and introspection > > problems inherant to HMP. So we're winning on 3 out of the 4 problems, > > and pushed apps to a QMP design that will let us solve the last > > remaining problem. > > > > With a QMP variant, we reasonably deal with the problems related to OVMF: > > > > - The logic to pick which disk to store the vmstate in is not > > satsifactory. > > > > The first block driver state cannot be assumed to be the root disk > > image, it might be OVMF varstore and we don't want to store vmstate > > in there. > > Yes, this is one of the issues. Glad you're addressing it. > > > - The logic to decide which disks must be snapshotted is hardwired > > to all disks which are writable > > > > Again with OVMF there might be a writable varstore, but this can be > > raw rather than qcow2 format, and thus unable to be snapshotted. > > While users might wish to snapshot their varstore, in some/many/most > > cases it is entirely uneccessary. Users are blocked from snapshotting > > their VM though due to this varstore. > > Another one. Glad again. > > > These are solved by adding two parameters to the commands. The first is > > a block device node name that identifies the image to store vmstate in, > > and the second is a list of node names to include for the snapshots. > > If the list of nodes isn't given, it falls back to the historical > > behaviour of using all disks matching some undocumented criteria. > > > > In the block code I've only dealt with node names for block devices, as > > IIUC, this is all that libvirt should need in the -blockdev world it now > > lives in. IOW, I've made not attempt to cope with people wanting to use > > these QMP commands in combination with -drive args. > > > > I've done some minimal work in libvirt to start to make use of the new > > commands to validate their functionality, but this isn't finished yet. > > > > My ultimate goal is to make the GNOME Boxes maintainer happy again by > > having internal snapshots work with OVMF: > > > > https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5e= > > f45c5f64048f16a6e > > > > HELP NEEDED: this series starts to implement the approach that Kevin > > suggested wrto use of generic jobs. > > Does this mean you're trying to use the jobs infrastructure? Yes, this is working now. > Relevant part of Message-ID: <87lh7l783q.fsf@blackfin.pond.sub.org> > > If we can't make a sane QMP interface, I'd rather have no QMP interface. I strongly disagree with this. This absolutist position is why we have made zero progress in 10+ years, leaving mgmt apps suffering with HMP passthrough, as described above. This is a prime example of perfect being the enemy of good. > However, I believe we *can* make a sane QMP interface if we put in the > design work. We should make a credible attempt at QMP design, but if perfection isn't practical, we should none the less do *something* in QMP, even if we find we need to deprecate & replace it later. > The design work must start with a review of what we're trying to > accomplish, and how to fit it into the rest of the system. Here's my > attempt. Since my knowledge on snapshots is rather superficial, I'm > cc'ing Kevin for additional snapshot expertise. Kevin, please correct > me when I talk nonsense. I'm further cc'ing Eric and Peter for the > management layer perspective. The things I'm trying to accomplish as listed in the text above. Primarily this is about fixing the ability to snapshot guests where QEMU's heuristics for picking block devices is broken. ie explicitly list the disks to snapshot and which to store vmstate in. Converting from HMP to QMP is esentially an enabler, because adding new args to existing savevm/loadvm HMP commands is just too horrible to contemplate, as mgmt apps have no sane way to probe HMP. That's what QMP is for. Essentially the goal is to fix the inability to use internal snapshots with UEFI based VMs that is causing immediate pain for mgmt apps due to increased need to support UEFI. > A point-in-time snapshot of a system consists of a snapshot of its > machine state and snapshots of its storage. All the snapshots need to > be made at the same point in time for the result to be consistent. > > Snapshots of read-only storage carry no information and are commonly > omitted. > > Isolated storage snapshots can make sense, but snapshotting the machine > state without also snapshotting the machine's storage doesn't sound > useful to me. > > Both storage and machine state snapshots come in two flavours: internal > and external. > > External ones can be made with any block backend, but internal storage > snapshots work only with certain formats, notably qcow2. QMP supports > both kinds of storage snapshots. > > Both kinds of storage snapshots need exclusive access while they work. > They're relatively quick, but the delay could be noticable for large > internal snapshots, and perhaps for external snapshots on really slow > storage. > > Internal machine state snapshots are currently only available via HMP's > savevm, which integrates internal machine state and storage snapshots. > This is non-live, i.e. the guest is stopped while the snapshot gets > saved. I figure we could make it live if we really wanted to. Another > instance of the emerging background job concept. > > On the implementation level, QCOW2 can't currently store a machine state > snapshot without also storing a storage snapshot. I guess we could > change this if we really wanted to. > > External machine state snapshots are basically migrate to file. > Supported by QMP. > > Live migration to file is possible, but currently wastes space, because > memory dirtied during migration gets saved multiple times. Could be > fixed either by making migration update previously saved memory instead > of appending (beware, random I/O), or by compacting the file afterwards. > > Non-live migration to file doesn't waste space that way. > > To take multiple *consistent* snapshots, you have to bundle them up in a > transaction. Transactions currently support only *storage* snapshots, > though. > > Work-around for external machine state snapshot: migrate to file > (possibly live), leaving the guest sopped on completion, take storage > snapshots, resume guest. > > You can combine internal and external storage snapshots with an external > machine state snapshot to get a mixed system snapshot. > > You currently can't do that with an internal machine state snapshot: the > only way to take one is HMP savevm, which insists on internally > snapshotting all writable storage, and doesn't transact together with > external storage snapshots. > > Except for the case "purely internal snapshot with just one writable > storage device", a system snapshot consists of multiple parts stored in > separate files. Tying the parts together is a management problem. QEMU > provides rudimentary management for purely internal snapshots, but it's > flawed: missing storage isn't detected, and additional storage can creep > in if snapshot tags or IDs happen to match. I guess managing the parts > is better left to the management layer. > > I figure a fully general QMP interface would let you perform a system > snapshot by combining storage snapshots of either kind with either kind > of machine state snapshot. > > We already have most of the building blocks: we can take external and > internal storage snapshots, and combine them in transactions. > > What's missing is transactionable machine state snapshots. > > We know how to work around it for external machine state snapshots (see > above), but a transaction-based solution might be nicer. > > Any solution for internal machine state snapshots in QMP should at least > try to fit into this. Some restrictions are okay. For instance, we > probably need to restrict internal machine state snapshots to piggyback > on an internal storage snapshot for now, so we don't have to dig up > QCOW2 just to get QMP support. From the POV of practicality, making a design that unifies internal and external snapshots is something I'm considering out of scope. It increases the design time burden, as well as implementation burden. On my side, improving internal snapshots is a "spare time" project, not something I can justify spending weeks or months on. My goal is to implement something that is achievable in a short amount of time that gets us out of the hole we've been in for 10 years. Minimal refactoring of the internal snapshot code aside from fixing the critical limitations we have today around choice of disks to snapshot. If someone later wants to come up with a grand unified design for everything that's fine, we can deprecate the new QMP commands I'm proposing now. > We can talk about more convenient interfaces for common special cases, > but I feel we need to design for the general case. We don't have to > implement the completely general case right away, though. As long as we > know where we want to go, incremental steps towards that goal are fine. > > Can we create a transactionable QMP command to take an internal machine > state snapshot? > > This would be like HMP savevm with the following differences: > > * Separate parameters for tag and ID. I'll have none of this > overloading nonsense in QMP. > > * Specify the destination block device. I'll have none of this "pick a > device in some magic, undocumented way" in QMP. > > * Leave alone the other block devices. Adding transaction actions to > snapshot all the writable block devices to get a full system snapshot > is the user's responsibility. > > Limitations: > > * No live internal machine snapshot, yet. > > * The storage device taking the internal snapshot must also be > internally snapshot for now. In fact, the command does both > (tolerable wart). > > Open questions: > > * Do we want the QMP command to delete existing snapshots with > conflicting tag / ID, like HMP savevm does? Or do we want it to fail > the transaction? The intent is for the QMP commands to operate exclusively on 'tags', and never consider "ID". > * Do we need transactions for switching to a system snapshot, too? > > Opinions? Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
On Wed, Aug 26, 2020 at 07:28:24PM +0100, Daniel P. Berrangé wrote: > On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote: > > Open questions: > > > > * Do we want the QMP command to delete existing snapshots with > > conflicting tag / ID, like HMP savevm does? Or do we want it to fail > > the transaction? > > The intent is for the QMP commands to operate exclusively on > 'tags', and never consider "ID". I forgot that even HMP ignores "ID" now and works exclusively in terms of tags since: commit 6ca080453ea403959ccde661030ca16264acc181 Author: Daniel Henrique Barboza <danielhb413@gmail.com> Date: Wed Nov 7 11:09:58 2018 -0200 block/snapshot.c: eliminate use of ID input in snapshot operations Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Daniel P. Berrangé <berrange@redhat.com> writes: > On Wed, Aug 26, 2020 at 07:28:24PM +0100, Daniel P. Berrangé wrote: >> On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote: >> > Open questions: >> > >> > * Do we want the QMP command to delete existing snapshots with >> > conflicting tag / ID, like HMP savevm does? Or do we want it to fail >> > the transaction? >> >> The intent is for the QMP commands to operate exclusively on >> 'tags', and never consider "ID". > > I forgot that even HMP ignores "ID" now and works exclusively in terms > of tags since: > > > commit 6ca080453ea403959ccde661030ca16264acc181 > Author: Daniel Henrique Barboza <danielhb413@gmail.com> > Date: Wed Nov 7 11:09:58 2018 -0200 > > block/snapshot.c: eliminate use of ID input in snapshot operations Almost a year after I sent the memo I quoted. It's an incompatible change, but nobody complained, and I'm glad we got this issue out of the way.
Am 27.08.2020 um 13:06 hat Markus Armbruster geschrieben: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Wed, Aug 26, 2020 at 07:28:24PM +0100, Daniel P. Berrangé wrote: > >> On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote: > >> > Open questions: > >> > > >> > * Do we want the QMP command to delete existing snapshots with > >> > conflicting tag / ID, like HMP savevm does? Or do we want it to fail > >> > the transaction? > >> > >> The intent is for the QMP commands to operate exclusively on > >> 'tags', and never consider "ID". > > > > I forgot that even HMP ignores "ID" now and works exclusively in terms > > of tags since: > > > > > > commit 6ca080453ea403959ccde661030ca16264acc181 > > Author: Daniel Henrique Barboza <danielhb413@gmail.com> > > Date: Wed Nov 7 11:09:58 2018 -0200 > > > > block/snapshot.c: eliminate use of ID input in snapshot operations > > Almost a year after I sent the memo I quoted. It's an incompatible > change, but nobody complained, and I'm glad we got this issue out of the > way. FWIW, I would have ignored any complaint about incompatible changes in HMP. It's not supposed to be a stable API, but UI. Kevin
Kevin Wolf <kwolf@redhat.com> writes: > Am 27.08.2020 um 13:06 hat Markus Armbruster geschrieben: >> Daniel P. Berrangé <berrange@redhat.com> writes: >> >> > On Wed, Aug 26, 2020 at 07:28:24PM +0100, Daniel P. Berrangé wrote: >> >> On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote: >> >> > Open questions: >> >> > >> >> > * Do we want the QMP command to delete existing snapshots with >> >> > conflicting tag / ID, like HMP savevm does? Or do we want it to fail >> >> > the transaction? >> >> >> >> The intent is for the QMP commands to operate exclusively on >> >> 'tags', and never consider "ID". >> > >> > I forgot that even HMP ignores "ID" now and works exclusively in terms >> > of tags since: >> > >> > >> > commit 6ca080453ea403959ccde661030ca16264acc181 >> > Author: Daniel Henrique Barboza <danielhb413@gmail.com> >> > Date: Wed Nov 7 11:09:58 2018 -0200 >> > >> > block/snapshot.c: eliminate use of ID input in snapshot operations >> >> Almost a year after I sent the memo I quoted. It's an incompatible >> change, but nobody complained, and I'm glad we got this issue out of the >> way. > > FWIW, I would have ignored any complaint about incompatible changes in > HMP. It's not supposed to be a stable API, but UI. The iffy part is actually the loss of ability to access snapshots that lack a name. Complaints about that would have been valid, I think. Fortunately, there have been none.
Am 28.08.2020 um 08:20 hat Markus Armbruster geschrieben: > Kevin Wolf <kwolf@redhat.com> writes: > > > Am 27.08.2020 um 13:06 hat Markus Armbruster geschrieben: > >> Daniel P. Berrangé <berrange@redhat.com> writes: > >> > >> > On Wed, Aug 26, 2020 at 07:28:24PM +0100, Daniel P. Berrangé wrote: > >> >> On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote: > >> >> > Open questions: > >> >> > > >> >> > * Do we want the QMP command to delete existing snapshots with > >> >> > conflicting tag / ID, like HMP savevm does? Or do we want it to fail > >> >> > the transaction? > >> >> > >> >> The intent is for the QMP commands to operate exclusively on > >> >> 'tags', and never consider "ID". > >> > > >> > I forgot that even HMP ignores "ID" now and works exclusively in terms > >> > of tags since: > >> > > >> > > >> > commit 6ca080453ea403959ccde661030ca16264acc181 > >> > Author: Daniel Henrique Barboza <danielhb413@gmail.com> > >> > Date: Wed Nov 7 11:09:58 2018 -0200 > >> > > >> > block/snapshot.c: eliminate use of ID input in snapshot operations > >> > >> Almost a year after I sent the memo I quoted. It's an incompatible > >> change, but nobody complained, and I'm glad we got this issue out of the > >> way. > > > > FWIW, I would have ignored any complaint about incompatible changes in > > HMP. It's not supposed to be a stable API, but UI. > > The iffy part is actually the loss of ability to access snapshots that > lack a name. Complaints about that would have been valid, I think. > Fortunately, there have been none. 'loadvm ""' should do the trick for these. The same way as you have to use with 'savevm' to create them in non-prehistoric versions of QEMU. We stopped creating snapshots with empty names by default in 0.14, so they are probably not very relevant any more. (Versioned machine types go back "only" to 1.0, so good luck loading a snapshot from an older version. And I wouldn't bet money either on a 1.0 snapshot still working with that machine type...) Kevin
Kevin Wolf <kwolf@redhat.com> writes: > Am 28.08.2020 um 08:20 hat Markus Armbruster geschrieben: >> Kevin Wolf <kwolf@redhat.com> writes: >> >> > Am 27.08.2020 um 13:06 hat Markus Armbruster geschrieben: >> >> Daniel P. Berrangé <berrange@redhat.com> writes: >> >> >> >> > On Wed, Aug 26, 2020 at 07:28:24PM +0100, Daniel P. Berrangé wrote: >> >> >> On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote: >> >> >> > Open questions: >> >> >> > >> >> >> > * Do we want the QMP command to delete existing snapshots with >> >> >> > conflicting tag / ID, like HMP savevm does? Or do we want it to fail >> >> >> > the transaction? >> >> >> >> >> >> The intent is for the QMP commands to operate exclusively on >> >> >> 'tags', and never consider "ID". >> >> > >> >> > I forgot that even HMP ignores "ID" now and works exclusively in terms >> >> > of tags since: >> >> > >> >> > >> >> > commit 6ca080453ea403959ccde661030ca16264acc181 >> >> > Author: Daniel Henrique Barboza <danielhb413@gmail.com> >> >> > Date: Wed Nov 7 11:09:58 2018 -0200 >> >> > >> >> > block/snapshot.c: eliminate use of ID input in snapshot operations >> >> >> >> Almost a year after I sent the memo I quoted. It's an incompatible >> >> change, but nobody complained, and I'm glad we got this issue out of the >> >> way. >> > >> > FWIW, I would have ignored any complaint about incompatible changes in >> > HMP. It's not supposed to be a stable API, but UI. >> >> The iffy part is actually the loss of ability to access snapshots that >> lack a name. Complaints about that would have been valid, I think. >> Fortunately, there have been none. > > 'loadvm ""' should do the trick for these. As long as you have at most one. > The same way as you have to > use with 'savevm' to create them in non-prehistoric versions of QEMU. > We stopped creating snapshots with empty names by default in 0.14, so > they are probably not very relevant any more. (Versioned machine types > go back "only" to 1.0, so good luck loading a snapshot from an older > version. And I wouldn't bet money either on a 1.0 snapshot still working > with that machine type...) No argument.
Daniel P. Berrangé <berrange@redhat.com> writes: > On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote: >> Sorry for taking so long to reply. >> >> Daniel P. Berrangé <berrange@redhat.com> writes: >> >> > A followup to: >> > >> > v1: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg00866.html >> > >> > When QMP was first introduced some 10+ years ago now, the snapshot >> > related commands (savevm/loadvm/delvm) were not converted. This was >> > primarily because their implementation causes blocking of the thread >> > running the monitor commands. This was (and still is) considered >> > undesirable behaviour both in HMP and QMP. >> >> One of several reasons. >> >> > In theory someone was supposed to fix this flaw at some point in the >> > past 10 years and bring them into the QMP world. Sadly, thus far it >> > hasn't happened as people always had more important things to work >> > on. Enterprise apps were much more interested in external snapshots >> > than internal snapshots as they have many more features. >> >> Several attempts have been made to bring the functionality to QMP. >> Sadly, they went nowhere. >> >> I posted an analysis of the issues in reply to one of the more serious >> attempts: >> >> Message-ID: <87lh7l783q.fsf@blackfin.pond.sub.org> >> https://lists.nongnu.org/archive/html/qemu-devel/2016-01/msg03593.html >> >> I'd like to hear your take on it. I append the relevant part for your >> convenience. Perhaps your code is already close to what I describe >> there. I'm interested in where it falls short. >> >> > Meanwhile users still want to use internal snapshots as there is >> > a certainly simplicity in having everything self-contained in one >> > image, even though it has limitations. Thus the apps that end up >> > executing the savevm/loadvm/delvm via the "human-monitor-command" >> > QMP command. >> > >> > IOW, the problematic blocking behaviour that was one of the reasons >> > for not having savevm/loadvm/delvm in QMP is experienced by applications >> > regardless. By not portting the commands to QMP due to one design flaw, >> > we've forced apps and users to suffer from other design flaws of HMP ( >> > bad error reporting, strong type checking of args, no introspection) for >> > an additional 10 years. This feels rather sub-optimal :-( >> > >> > In practice users don't appear to care strongly about the fact that these >> > commands block the VM while they run. I might have seen one bug report >> > about it, but it certainly isn't something that comes up as a frequent >> > topic except among us QEMU maintainers. Users do care about having >> > access to the snapshot feature. >> > >> > Where I am seeing frequent complaints is wrt the use of OVMF combined >> > with snapshots which has some serious pain points. This is getting worse >> > as the push to ditch legacy BIOS in favour of UEFI gain momentum both >> > across OS vendors and mgmt apps. Solving it requires new parameters to >> > the commands, but doing this in HMP is super unappealing. >> > >> > After 10 years, I think it is time for us to be a little pragmatic about >> > our handling of snapshots commands. My desire is that libvirt should never >> > use "human-monitor-command" under any circumstances, because of the >> > inherant flaws in HMP as a protocol for machine consumption. >> > >> > Thus in this series I'm proposing a fairly direct mapping of the existing >> > HMP commands for savevm/loadvm/delvm into QMP as a first step. This does >> > not solve the blocking thread problem, but it does put in a place a >> > design using the jobs framework which can facilitate solving it later. >> > It does also solve the error reporting, type checking and introspection >> > problems inherant to HMP. So we're winning on 3 out of the 4 problems, >> > and pushed apps to a QMP design that will let us solve the last >> > remaining problem. >> > >> > With a QMP variant, we reasonably deal with the problems related to OVMF: >> > >> > - The logic to pick which disk to store the vmstate in is not >> > satsifactory. >> > >> > The first block driver state cannot be assumed to be the root disk >> > image, it might be OVMF varstore and we don't want to store vmstate >> > in there. >> >> Yes, this is one of the issues. Glad you're addressing it. >> >> > - The logic to decide which disks must be snapshotted is hardwired >> > to all disks which are writable >> > >> > Again with OVMF there might be a writable varstore, but this can be >> > raw rather than qcow2 format, and thus unable to be snapshotted. >> > While users might wish to snapshot their varstore, in some/many/most >> > cases it is entirely uneccessary. Users are blocked from snapshotting >> > their VM though due to this varstore. >> >> Another one. Glad again. >> >> > These are solved by adding two parameters to the commands. The first is >> > a block device node name that identifies the image to store vmstate in, >> > and the second is a list of node names to include for the snapshots. >> > If the list of nodes isn't given, it falls back to the historical >> > behaviour of using all disks matching some undocumented criteria. >> > >> > In the block code I've only dealt with node names for block devices, as >> > IIUC, this is all that libvirt should need in the -blockdev world it now >> > lives in. IOW, I've made not attempt to cope with people wanting to use >> > these QMP commands in combination with -drive args. >> > >> > I've done some minimal work in libvirt to start to make use of the new >> > commands to validate their functionality, but this isn't finished yet. >> > >> > My ultimate goal is to make the GNOME Boxes maintainer happy again by >> > having internal snapshots work with OVMF: >> > >> > https://gitlab.gnome.org/GNOME/gnome-boxes/-/commit/c486da262f6566326fbcb5e= >> > f45c5f64048f16a6e >> > >> > HELP NEEDED: this series starts to implement the approach that Kevin >> > suggested wrto use of generic jobs. >> >> Does this mean you're trying to use the jobs infrastructure? > > Yes, this is working now. > > >> Relevant part of Message-ID: <87lh7l783q.fsf@blackfin.pond.sub.org> >> >> If we can't make a sane QMP interface, I'd rather have no QMP interface. > > I strongly disagree with this. This absolutist position is why we > have made zero progress in 10+ years, leaving mgmt apps suffering > with HMP passthrough, as described above. > > This is a prime example of perfect being the enemy of good. You seem to use a different version of 'sane' than I do :) For me, a 'sane QMP interface' does not have to be perfect. It should * solve a problem people have (check) * be non-magical (clearly feasible; just a bit more work than "slap QMP around the existing --- and less than sane --- HMP commands") * be as general as we can make it, given the resources (this is a matter of thinking through alternatives and picking a reasonable one) * allow compatible evolution towards a more perfect interface (this is a matter of thinking ahead, to reduce the risk of boxing ourselves into a corner) >> However, I believe we *can* make a sane QMP interface if we put in the >> design work. > > We should make a credible attempt at QMP design, but if perfection > isn't practical, we should none the less do *something* in QMP, even > if we find we need to deprecate & replace it later. All the prior iterations failed at "attempt", i.e. long before coming anywhere near "credible". >> The design work must start with a review of what we're trying to >> accomplish, and how to fit it into the rest of the system. Here's my >> attempt. Since my knowledge on snapshots is rather superficial, I'm >> cc'ing Kevin for additional snapshot expertise. Kevin, please correct >> me when I talk nonsense. I'm further cc'ing Eric and Peter for the >> management layer perspective. > > The things I'm trying to accomplish as listed in the text above. > Primarily this is about fixing the ability to snapshot guests where > QEMU's heuristics for picking block devices is broken. ie explicitly > list the disks to snapshot and which to store vmstate in. > > Converting from HMP to QMP is esentially an enabler, because adding > new args to existing savevm/loadvm HMP commands is just too horrible > to contemplate, as mgmt apps have no sane way to probe HMP. That's > what QMP is for. > > Essentially the goal is to fix the inability to use internal snapshots > with UEFI based VMs that is causing immediate pain for mgmt apps due > to increased need to support UEFI. I think understand your goal. Now try to understand mine: I want to keep QMP supportable. I'm willing to accept less than perfect additions, but not when it's less than perfect just because people can't be bothered to think about any other solution than the first that comes to mind, in this case a 1:1 port of the existing HMP commands. You've clearly put in some thought. That's why I'm not sending you the canned "the exact same thing has been tried before, I said no then because of $reasons, the reasons haven't changed, and neither has my no" letter. >> A point-in-time snapshot of a system consists of a snapshot of its >> machine state and snapshots of its storage. All the snapshots need to >> be made at the same point in time for the result to be consistent. >> >> Snapshots of read-only storage carry no information and are commonly >> omitted. >> >> Isolated storage snapshots can make sense, but snapshotting the machine >> state without also snapshotting the machine's storage doesn't sound >> useful to me. >> >> Both storage and machine state snapshots come in two flavours: internal >> and external. >> >> External ones can be made with any block backend, but internal storage >> snapshots work only with certain formats, notably qcow2. QMP supports >> both kinds of storage snapshots. >> >> Both kinds of storage snapshots need exclusive access while they work. >> They're relatively quick, but the delay could be noticable for large >> internal snapshots, and perhaps for external snapshots on really slow >> storage. >> >> Internal machine state snapshots are currently only available via HMP's >> savevm, which integrates internal machine state and storage snapshots. >> This is non-live, i.e. the guest is stopped while the snapshot gets >> saved. I figure we could make it live if we really wanted to. Another >> instance of the emerging background job concept. >> >> On the implementation level, QCOW2 can't currently store a machine state >> snapshot without also storing a storage snapshot. I guess we could >> change this if we really wanted to. >> >> External machine state snapshots are basically migrate to file. >> Supported by QMP. >> >> Live migration to file is possible, but currently wastes space, because >> memory dirtied during migration gets saved multiple times. Could be >> fixed either by making migration update previously saved memory instead >> of appending (beware, random I/O), or by compacting the file afterwards. >> >> Non-live migration to file doesn't waste space that way. >> >> To take multiple *consistent* snapshots, you have to bundle them up in a >> transaction. Transactions currently support only *storage* snapshots, >> though. >> >> Work-around for external machine state snapshot: migrate to file >> (possibly live), leaving the guest sopped on completion, take storage >> snapshots, resume guest. >> >> You can combine internal and external storage snapshots with an external >> machine state snapshot to get a mixed system snapshot. >> >> You currently can't do that with an internal machine state snapshot: the >> only way to take one is HMP savevm, which insists on internally >> snapshotting all writable storage, and doesn't transact together with >> external storage snapshots. >> >> Except for the case "purely internal snapshot with just one writable >> storage device", a system snapshot consists of multiple parts stored in >> separate files. Tying the parts together is a management problem. QEMU >> provides rudimentary management for purely internal snapshots, but it's >> flawed: missing storage isn't detected, and additional storage can creep >> in if snapshot tags or IDs happen to match. I guess managing the parts >> is better left to the management layer. >> >> I figure a fully general QMP interface would let you perform a system >> snapshot by combining storage snapshots of either kind with either kind >> of machine state snapshot. >> >> We already have most of the building blocks: we can take external and >> internal storage snapshots, and combine them in transactions. >> >> What's missing is transactionable machine state snapshots. >> >> We know how to work around it for external machine state snapshots (see >> above), but a transaction-based solution might be nicer. >> >> Any solution for internal machine state snapshots in QMP should at least >> try to fit into this. Some restrictions are okay. For instance, we >> probably need to restrict internal machine state snapshots to piggyback >> on an internal storage snapshot for now, so we don't have to dig up >> QCOW2 just to get QMP support. > > From the POV of practicality, making a design that unifies internal > and external snapshots is something I'm considering out of scope. > It increases the design time burden, as well as implementation burden. > On my side, improving internal snapshots is a "spare time" project, > not something I can justify spending weeks or months on. I'm not demanding a solution that unifies internal and external snapshots. I'm asking for a bit of serious thought on an interface that could compatibly evolve there. Hours, not weeks or months. > My goal is to implement something that is achievable in a short > amount of time that gets us out of the hole we've been in for 10 > years. Minimal refactoring of the internal snapshot code aside > from fixing the critical limitations we have today around choice > of disks to snapshot. > > If someone later wants to come up with a grand unified design > for everything that's fine, we can deprecate the new QMP commands > I'm proposing now. Failing at coming up with an interface that has a reasonable chance to be future-proof is okay. Not even trying is not okay. Having not tried for 10+ years doesn't make it okay. I believe I already did much of the design work back in 2016. Now I'd like you to engage with it a bit deeper than just "perfect is the enemy of good; now take my patches, please". Specifically, I'd like you to think about monolothic snapshot command (internal snapshots only by design) vs. transaction of individual snapshot commands (design is not restricted to internal snapshots, but we may want to accept implementation restrictions). We already have transactionable individual storage snapshot commands. What's missing is a transactionable machine state snapshot command. One restriction I'd readily accept at this time is "the machine state snapshot must write to a QCOW2 that is also internally snapshot in the same transaction". Now explain to me why this is impractical. >> We can talk about more convenient interfaces for common special cases, >> but I feel we need to design for the general case. We don't have to >> implement the completely general case right away, though. As long as we >> know where we want to go, incremental steps towards that goal are fine. >> >> Can we create a transactionable QMP command to take an internal machine >> state snapshot? >> >> This would be like HMP savevm with the following differences: >> >> * Separate parameters for tag and ID. I'll have none of this >> overloading nonsense in QMP. >> >> * Specify the destination block device. I'll have none of this "pick a >> device in some magic, undocumented way" in QMP. >> >> * Leave alone the other block devices. Adding transaction actions to >> snapshot all the writable block devices to get a full system snapshot >> is the user's responsibility. >> >> Limitations: >> >> * No live internal machine snapshot, yet. >> >> * The storage device taking the internal snapshot must also be >> internally snapshot for now. In fact, the command does both >> (tolerable wart). >> >> Open questions: >> >> * Do we want the QMP command to delete existing snapshots with >> conflicting tag / ID, like HMP savevm does? Or do we want it to fail >> the transaction? > > The intent is for the QMP commands to operate exclusively on > 'tags', and never consider "ID". > >> * Do we need transactions for switching to a system snapshot, too? >> >> Opinions? > > > > Regards, > Daniel
On Thu, Aug 27, 2020 at 01:04:43PM +0200, Markus Armbruster wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote: > > From the POV of practicality, making a design that unifies internal > > and external snapshots is something I'm considering out of scope. > > It increases the design time burden, as well as implementation burden. > > On my side, improving internal snapshots is a "spare time" project, > > not something I can justify spending weeks or months on. > > I'm not demanding a solution that unifies internal and external > snapshots. I'm asking for a bit of serious thought on an interface that > could compatibly evolve there. Hours, not weeks or months. > > > My goal is to implement something that is achievable in a short > > amount of time that gets us out of the hole we've been in for 10 > > years. Minimal refactoring of the internal snapshot code aside > > from fixing the critical limitations we have today around choice > > of disks to snapshot. > > > > If someone later wants to come up with a grand unified design > > for everything that's fine, we can deprecate the new QMP commands > > I'm proposing now. > > Failing at coming up with an interface that has a reasonable chance to > be future-proof is okay. > > Not even trying is not okay. This was raised in my v1 posting: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01346.html but the conclusion was that it was a non-trivial amount of extra implementation work > Specifically, I'd like you to think about monolothic snapshot command > (internal snapshots only by design) vs. transaction of individual > snapshot commands (design is not restricted to internal snapshots, but > we may want to accept implementation restrictions). > > We already have transactionable individual storage snapshot commands. > What's missing is a transactionable machine state snapshot command. At a high level I consider what I've proposed as being higher level syntax sugar vs a more generic future impl based on multiple commands for snapshotting disk & state. I don't think I'd claim that it will evolve to become the design you're suggesting here, as they are designs from different levels in the stack. IOW, I think the would ultimately just exist in parallel. I don't think that's a real problem from a maint POV, as the large burden from the monolithic snapshot code is not the HMP/QMP interface, but rather the guts of the internal impl in the migration/savevm.c and block/snapshot.c files. That code will exist for as long as the HMP commands exist, and adding a QMP interface doesn't make that situation worse unless we were intending to drop the existing HMP commands. If we did change our minds though, we can just deprecate the QMP command at any time we like. > One restriction I'd readily accept at this time is "the machine state > snapshot must write to a QCOW2 that is also internally snapshot in the > same transaction". > > Now explain to me why this is impractical. The issues were described by Kevin here: https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg02057.html Assuming the migration impl is actually possible to solve, there is still the question of actually writing it. That's a non-trivial amount of work someone has to find time for. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Daniel P. Berrangé <berrange@redhat.com> writes: > On Thu, Aug 27, 2020 at 01:04:43PM +0200, Markus Armbruster wrote: >> Daniel P. Berrangé <berrange@redhat.com> writes: >> >> > On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote: >> > From the POV of practicality, making a design that unifies internal >> > and external snapshots is something I'm considering out of scope. >> > It increases the design time burden, as well as implementation burden. >> > On my side, improving internal snapshots is a "spare time" project, >> > not something I can justify spending weeks or months on. >> >> I'm not demanding a solution that unifies internal and external >> snapshots. I'm asking for a bit of serious thought on an interface that >> could compatibly evolve there. Hours, not weeks or months. >> >> > My goal is to implement something that is achievable in a short >> > amount of time that gets us out of the hole we've been in for 10 >> > years. Minimal refactoring of the internal snapshot code aside >> > from fixing the critical limitations we have today around choice >> > of disks to snapshot. >> > >> > If someone later wants to come up with a grand unified design >> > for everything that's fine, we can deprecate the new QMP commands >> > I'm proposing now. >> >> Failing at coming up with an interface that has a reasonable chance to >> be future-proof is okay. >> >> Not even trying is not okay. > > This was raised in my v1 posting: > > https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01346.html > > but the conclusion was that it was a non-trivial amount of extra > implementation work Thanks for the pointer. I've now read that review thread. >> Specifically, I'd like you to think about monolothic snapshot command >> (internal snapshots only by design) vs. transaction of individual >> snapshot commands (design is not restricted to internal snapshots, but >> we may want to accept implementation restrictions). >> >> We already have transactionable individual storage snapshot commands. >> What's missing is a transactionable machine state snapshot command. > > At a high level I consider what I've proposed as being higher level > syntax sugar vs a more generic future impl based on multiple commands > for snapshotting disk & state. I don't think I'd claim that it will > evolve to become the design you're suggesting here, as they are designs > from different levels in the stack. IOW, I think the would ultimately > just exist in parallel. I don't think that's a real problem from a > maint POV, as the large burden from the monolithic snapshot code is > not the HMP/QMP interface, but rather the guts of the internal > impl in the migration/savevm.c and block/snapshot.c files. That code > will exist for as long as the HMP commands exist, and adding a QMP > interface doesn't make that situation worse unless we were intending > to drop the existing HMP commands. If we did change our minds though, > we can just deprecate the QMP command at any time we like. > > >> One restriction I'd readily accept at this time is "the machine state >> snapshot must write to a QCOW2 that is also internally snapshot in the >> same transaction". >> >> Now explain to me why this is impractical. > > The issues were described by Kevin here: > > https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg02057.html > > Assuming the migration impl is actually possible to solve, there is > still the question of actually writing it. That's a non-trivial > amount of work someone has to find time for. Kevin explained how the transactionable machine state snapshot command should be made non-blocking: post-copy. I don't dispute that creating such a post-copy snapshot is a non-trivial task. It is out of reach for you and me. I didn't actually ask for it, though. You argue that providing a blocking snapshot in QMP is better than nothing, and good enough for quite a few applications. I agree! I blocked prior attempts at porting HMP's savevm/loadvm to QMP not because they were blocking, but because they stuck to the HMP interface, and the HMP interface is bonkers. I would accept the restriction "snapshotting machine state is blocking, i.e. it stops the machine." As I wrote in 2016, "Limitations: No live internal machine snapshot, yet." Aside: unless I'm mistaken, taking an internal block device snapshot is also blocking, but unlike taking a machine state snapshot, it's fast enough for the blocking not to matter. That's the "sync" in blockdev-snapshot-internal-sync. I asked you to consider the interface design I proposed back in 2016. You point out above that your interface is more high-level, and could be turned into sugar for a low level interface. If true, this means your proposal doesn't box us into a corner, which is good. Let me elaborate a bit on the desugaring, just to make sure we're on the same page. Please correct me where I'm talking nonsense. snapshot-save creates job that snapshots a set of block devices and the machine state. The snapshots are consistent, i.e. they are all taken at the same point in time. The block device snapshots are all internal. The machine state snapshot is saved together with one of the (internal) block device snapshots. This is basically a transaction of blockdev-snapshot-internal-sync (which exists) plus machine-snapshot-internal-sync (which doesn't exist) wrapped in a job. Likweise for snapshot-load, except there not even the command for block snapshots exists. I doubt creating the (transactionable, but blocking) low-level commands is actually out of reach. It's a matter of adding interfaces to parts of the code you already got working. I'm not demanding you do that, though. As I said, my chief concerns are keeping "bonkers" out of QMP, and not boxing us in needlessly.
On Tue, Sep 01, 2020 at 03:22:24PM +0200, Markus Armbruster wrote: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Thu, Aug 27, 2020 at 01:04:43PM +0200, Markus Armbruster wrote: > >> Daniel P. Berrangé <berrange@redhat.com> writes: > >> > >> > On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote: > >> > From the POV of practicality, making a design that unifies internal > >> > and external snapshots is something I'm considering out of scope. > >> > It increases the design time burden, as well as implementation burden. > >> > On my side, improving internal snapshots is a "spare time" project, > >> > not something I can justify spending weeks or months on. > >> > >> I'm not demanding a solution that unifies internal and external > >> snapshots. I'm asking for a bit of serious thought on an interface that > >> could compatibly evolve there. Hours, not weeks or months. > >> > >> > My goal is to implement something that is achievable in a short > >> > amount of time that gets us out of the hole we've been in for 10 > >> > years. Minimal refactoring of the internal snapshot code aside > >> > from fixing the critical limitations we have today around choice > >> > of disks to snapshot. > >> > > >> > If someone later wants to come up with a grand unified design > >> > for everything that's fine, we can deprecate the new QMP commands > >> > I'm proposing now. > >> > >> Failing at coming up with an interface that has a reasonable chance to > >> be future-proof is okay. > >> > >> Not even trying is not okay. > > > > This was raised in my v1 posting: > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01346.html > > > > but the conclusion was that it was a non-trivial amount of extra > > implementation work > > Thanks for the pointer. I've now read that review thread. > > >> Specifically, I'd like you to think about monolothic snapshot command > >> (internal snapshots only by design) vs. transaction of individual > >> snapshot commands (design is not restricted to internal snapshots, but > >> we may want to accept implementation restrictions). > >> > >> We already have transactionable individual storage snapshot commands. > >> What's missing is a transactionable machine state snapshot command. > > > > At a high level I consider what I've proposed as being higher level > > syntax sugar vs a more generic future impl based on multiple commands > > for snapshotting disk & state. I don't think I'd claim that it will > > evolve to become the design you're suggesting here, as they are designs > > from different levels in the stack. IOW, I think the would ultimately > > just exist in parallel. I don't think that's a real problem from a > > maint POV, as the large burden from the monolithic snapshot code is > > not the HMP/QMP interface, but rather the guts of the internal > > impl in the migration/savevm.c and block/snapshot.c files. That code > > will exist for as long as the HMP commands exist, and adding a QMP > > interface doesn't make that situation worse unless we were intending > > to drop the existing HMP commands. If we did change our minds though, > > we can just deprecate the QMP command at any time we like. > > > > > >> One restriction I'd readily accept at this time is "the machine state > >> snapshot must write to a QCOW2 that is also internally snapshot in the > >> same transaction". > >> > >> Now explain to me why this is impractical. > > > > The issues were described by Kevin here: > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg02057.html > > > > Assuming the migration impl is actually possible to solve, there is > > still the question of actually writing it. That's a non-trivial > > amount of work someone has to find time for. > > Kevin explained how the transactionable machine state snapshot command > should be made non-blocking: post-copy. > > I don't dispute that creating such a post-copy snapshot is a non-trivial > task. It is out of reach for you and me. I didn't actually ask for it, > though. > > You argue that providing a blocking snapshot in QMP is better than > nothing, and good enough for quite a few applications. I agree! I > blocked prior attempts at porting HMP's savevm/loadvm to QMP not because > they were blocking, but because they stuck to the HMP interface, and the > HMP interface is bonkers. I would accept the restriction "snapshotting > machine state is blocking, i.e. it stops the machine." As I wrote in > 2016, "Limitations: No live internal machine snapshot, yet." FYI, when I documented the new QAPI commands I implemented, i choose to *not* say that the snapshot is blocking. Instead I said: # Applications should not assume that the snapshot load is complete # when this command returns. Completion is indicated by the job # status. Clients can wait for the JOB_STATUS_CHANGE event. If the # job aborts, errors can be obtained via the 'query-jobs' command, # though. The idea was that if we fix these QAPI commands to not block in future, we want to minimize the risk of breaking clients, by discouraging them from assuming the impl will always be blocking in future. IOW they should assume the commands are asynchronous right now, even though they are not. > Aside: unless I'm mistaken, taking an internal block device snapshot is > also blocking, but unlike taking a machine state snapshot, it's fast > enough for the blocking not to matter. That's the "sync" in > blockdev-snapshot-internal-sync. > Let me elaborate a bit on the desugaring, just to make sure we're on the > same page. Please correct me where I'm talking nonsense. > > snapshot-save creates job that snapshots a set of block devices and the > machine state. The snapshots are consistent, i.e. they are all taken at > the same point in time. The block device snapshots are all internal. > The machine state snapshot is saved together with one of the (internal) > block device snapshots. Yep > This is basically a transaction of blockdev-snapshot-internal-sync > (which exists) plus machine-snapshot-internal-sync (which doesn't exist) > wrapped in a job. Yes, except it isn't clear to me whether you can separate out the functionality into two separate commands. It might be neccessary to save the vmstate at the same time as the disk snapshot. In such a case, instead of machine-snapshot-internal-sync, we might end up having a "include vmstate" bool option to blockdev-snapshot-internal-sync Either way though, we'd end up with a series of commands inside a transaction. > Likweise or snapshot-load, except there not even the command for block > snapshots exists. > > I doubt creating the (transactionable, but blocking) low-level commands > is actually out of reach. It's a matter of adding interfaces to parts > of the code you already got working. If we're splitting it up into one command per disk, and another command for vmstate, then it will require refactoring the current migration/savevm.c and block/snapshot.c code AFAICT, because its all written around the idea of processing all disks at the same time. > I'm not demanding you do that, though. As I said, my chief concerns are > keeping "bonkers" out of QMP, and not boxing us in needlessly. Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
Am 01.09.2020 um 15:22 hat Markus Armbruster geschrieben: > Daniel P. Berrangé <berrange@redhat.com> writes: > > > On Thu, Aug 27, 2020 at 01:04:43PM +0200, Markus Armbruster wrote: > >> Daniel P. Berrangé <berrange@redhat.com> writes: > >> > >> > On Wed, Aug 26, 2020 at 05:52:06PM +0200, Markus Armbruster wrote: > >> > From the POV of practicality, making a design that unifies internal > >> > and external snapshots is something I'm considering out of scope. > >> > It increases the design time burden, as well as implementation burden. > >> > On my side, improving internal snapshots is a "spare time" project, > >> > not something I can justify spending weeks or months on. > >> > >> I'm not demanding a solution that unifies internal and external > >> snapshots. I'm asking for a bit of serious thought on an interface that > >> could compatibly evolve there. Hours, not weeks or months. > >> > >> > My goal is to implement something that is achievable in a short > >> > amount of time that gets us out of the hole we've been in for 10 > >> > years. Minimal refactoring of the internal snapshot code aside > >> > from fixing the critical limitations we have today around choice > >> > of disks to snapshot. > >> > > >> > If someone later wants to come up with a grand unified design > >> > for everything that's fine, we can deprecate the new QMP commands > >> > I'm proposing now. > >> > >> Failing at coming up with an interface that has a reasonable chance to > >> be future-proof is okay. > >> > >> Not even trying is not okay. > > > > This was raised in my v1 posting: > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg01346.html > > > > but the conclusion was that it was a non-trivial amount of extra > > implementation work > > Thanks for the pointer. I've now read that review thread. > > >> Specifically, I'd like you to think about monolothic snapshot command > >> (internal snapshots only by design) vs. transaction of individual > >> snapshot commands (design is not restricted to internal snapshots, but > >> we may want to accept implementation restrictions). > >> > >> We already have transactionable individual storage snapshot commands. > >> What's missing is a transactionable machine state snapshot command. > > > > At a high level I consider what I've proposed as being higher level > > syntax sugar vs a more generic future impl based on multiple commands > > for snapshotting disk & state. I don't think I'd claim that it will > > evolve to become the design you're suggesting here, as they are designs > > from different levels in the stack. IOW, I think the would ultimately > > just exist in parallel. I don't think that's a real problem from a > > maint POV, as the large burden from the monolithic snapshot code is > > not the HMP/QMP interface, but rather the guts of the internal > > impl in the migration/savevm.c and block/snapshot.c files. That code > > will exist for as long as the HMP commands exist, and adding a QMP > > interface doesn't make that situation worse unless we were intending > > to drop the existing HMP commands. If we did change our minds though, > > we can just deprecate the QMP command at any time we like. > > > > > >> One restriction I'd readily accept at this time is "the machine state > >> snapshot must write to a QCOW2 that is also internally snapshot in the > >> same transaction". > >> > >> Now explain to me why this is impractical. > > > > The issues were described by Kevin here: > > > > https://lists.gnu.org/archive/html/qemu-devel/2020-07/msg02057.html > > > > Assuming the migration impl is actually possible to solve, there is > > still the question of actually writing it. That's a non-trivial > > amount of work someone has to find time for. > > Kevin explained how the transactionable machine state snapshot command > should be made non-blocking: post-copy. > > I don't dispute that creating such a post-copy snapshot is a non-trivial > task. It is out of reach for you and me. I didn't actually ask for it, > though. > > You argue that providing a blocking snapshot in QMP is better than > nothing, and good enough for quite a few applications. I agree! I > blocked prior attempts at porting HMP's savevm/loadvm to QMP not because > they were blocking, but because they stuck to the HMP interface, and the > HMP interface is bonkers. I would accept the restriction "snapshotting > machine state is blocking, i.e. it stops the machine." As I wrote in > 2016, "Limitations: No live internal machine snapshot, yet." > > Aside: unless I'm mistaken, taking an internal block device snapshot is > also blocking, but unlike taking a machine state snapshot, it's fast > enough for the blocking not to matter. That's the "sync" in > blockdev-snapshot-internal-sync. > > I asked you to consider the interface design I proposed back in 2016. > You point out above that your interface is more high-level, and could be > turned into sugar for a low level interface. > > If true, this means your proposal doesn't box us into a corner, which is > good. > > Let me elaborate a bit on the desugaring, just to make sure we're on the > same page. Please correct me where I'm talking nonsense. > > snapshot-save creates job that snapshots a set of block devices and the > machine state. The snapshots are consistent, i.e. they are all taken at > the same point in time. The block device snapshots are all internal. > The machine state snapshot is saved together with one of the (internal) > block device snapshots. > > This is basically a transaction of blockdev-snapshot-internal-sync > (which exists) plus machine-snapshot-internal-sync (which doesn't exist) > wrapped in a job. > > Likweise for snapshot-load, except there not even the command for block > snapshots exists. > > I doubt creating the (transactionable, but blocking) low-level commands > is actually out of reach. It's a matter of adding interfaces to parts > of the code you already got working. > > I'm not demanding you do that, though. As I said, my chief concerns are > keeping "bonkers" out of QMP, and not boxing us in needlessly. I doubt this is as easy as it may seem at the first sight. To remind everyone, the way internal snapshots with VM state work today is: 1. Write VM state to VM state area inside the image (in qcow2, this is essentially the same as disk content, except at offsets higher than the image size). 2. Create the qcow2 snapshot, which means that the current disk content (including the VM state at higher offsets) becomes COW and the snapshotted copy becomes read-ony. 3. Discard the VM state in the active layer again, we don't need it there. The implication is that either 1. has to be completed before 2. can happen, or that 2. must be able to write into an already taken snapshot rather than to the active layer (which in turn implies that the snapshot must have completed). So a naive implementation with a transaction might not give the right result. It's not just independent operations, but some ordering between them is required. I feel having a single block job that covers both parts gives us more flexibility in how we want to synchronise writing the VM state and taking the disk snapshot - or in your words, avoids boxing us into a corner. Kevin
© 2016 - 2024 Red Hat, Inc.