1) What's this

When the migration capability 'bypass-shared-memory'
is set, shared memory regions are skipped during migration.

This is the key feature that enables several useful features
for qemu, such as qemu-local-migration, qemu-live-update,
extremely-fast-save-restore, vm-template, vm-fast-live-clone,
yet-another-post-copy-migration, etc.

The philosophy behind this feature, and the advanced features
built on top of it, is that part of the memory management is
separated out from qemu, so that other toolkits
such as libvirt, kata-containers (https://github.com/kata-containers),
runv (https://github.com/hyperhq/runv/), or several cooperating
qemu processes can directly access the memory, manage it, and
provide features on top of it.
2) Status in the real world

hyperhq (http://hyper.sh http://hypercontainer.io/)
introduced the feature vm-template (vm-fast-live-clone)
to the hyper container several years ago, and it works well
(see https://github.com/hyperhq/runv/pull/297).

With vm-template, containers (VMs) can be started in 130ms
and each container (VM) saves about 80M of memory, so hyper
containers are as fast and as high-density as normal containers.

The kata-containers project (https://github.com/kata-containers),
which was launched by hyper, intel and friends and which descended
from runv (and clear-container), should have this feature enabled.
Unfortunately, due to code conflicts between runv & cc,
the feature is temporarily disabled; it is being brought
back by the hyper and intel teams.
3) How to use it and bring up the advanced features.

On the current qemu command line, shared memory has
to be configured via a memory backend object.

a) feature: qemu-local-migration, qemu-live-update

Put the mem-path on tmpfs and set share=on for it when
starting the vm. Example:
-object \
memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory,share=on \
-numa node,nodeid=0,cpus=0-7,memdev=mem

When you want to migrate the vm locally (for example after fixing
a security bug in the qemu binary), start a new qemu with the same
command line plus -incoming, then migrate the vm from the old qemu
to the new qemu with the migration capability 'bypass-shared-memory'
set. The migration transfers the device state *ONLY*; the memory
remains the original memory backed by the tmpfs file.
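For example, the capability can be set and the local migration
started with QMP commands like these (a minimal sketch; the unix
socket path is only illustrative, and the destination qemu is
assumed to have been started with the matching -incoming uri):

{ "execute": "migrate-set-capabilities",
  "arguments": { "capabilities": [
    { "capability": "bypass-shared-memory", "state": true } ] } }
{ "execute": "migrate",
  "arguments": { "uri": "unix:/var/run/qemu-local-migrate.sock" } }

The HMP equivalent is "migrate_set_capability bypass-shared-memory on"
followed by "migrate -d unix:/var/run/qemu-local-migrate.sock".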
b) feature: extremely-fast-save-restore

The same as above, except that the mem-path is on a persistent
file system.
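A rough sketch of the save/restore flow with HMP commands (the state
file path is only an example; the memory lives in the mem-path file,
so only the device state goes through the migration stream):

(qemu) migrate_set_capability bypass-shared-memory on
(qemu) stop
(qemu) migrate "exec:cat > /persistent/fs/device-state"

To restore, start a new qemu with the same command line plus
-incoming "exec:cat /persistent/fs/device-state".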
c) feature: vm-template, vm-fast-live-clone

The template vm is started as in a), paused when the guest reaches
the template point (for example, when the guest app is ready), and
then saved. (The qemu process of the template can be killed at this
point, because only the memory file and the device state file, both
in tmpfs, are needed.)

Then one or more VMs can be launched based on the template vm state.
The new VMs are started without "share=on", so they all share the
initial memory from the memory file and save a lot of memory.
All the new VMs start from the template point, so the guest app can
get to work quickly; see the sketch below.
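A minimal sketch of launching one clone from the template state
(paths and sizes are only illustrative; with share=off the clone's
writes stay private to the new qemu and the template memory file is
left unmodified):

-object \
memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory,share=off \
-numa node,nodeid=0,cpus=0-7,memdev=mem \
-incoming "exec:cat /dev/shm/device-state"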
A new VM booted from a template vm can't become a template again;
if you need this unusual chained-template feature, you can write
a cloneable-tmpfs kernel module for it.

The libvirt toolkit can't manage vm-template currently; in
hyperhq/runv we use a qemu wrapper script to do it. I hope someone
adds a "libvirt managed template" feature to libvirt.
d) feature: yet-another-post-copy-migration

This is a possible feature, though no toolkit can do it well yet.
Using an nbd server/client on the memory file is barely workable
but inconvenient. A special feature for tmpfs might be needed to
fully realize it.
Nobody needs yet another post-copy migration method,
but it is possible should some crazy person ever want it.
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Sebastien Boeuf <sebastien.boeuf@intel.com>
Cc: James O. D. Hunt <james.o.hunt@intel.com>
Cc: Xu Wang <gnawux@gmail.com>
Cc: Peng Tao <bergwolf@gmail.com>
Cc: Xiao Guangrong <xiaoguangrong@tencent.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
---
Changes in V3:
rebased on upstream master
update the available version of the capability to
v2.13
Changes in V2:
rebased on 2.11.1
migration/migration.c | 13 +++++++++++++
migration/migration.h | 1 +
migration/ram.c | 26 +++++++++++++++++---------
qapi/migration.json | 6 +++++-
4 files changed, 36 insertions(+), 10 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 52a5092add..c5a3591bc7 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1509,6 +1509,19 @@ bool migrate_release_ram(void)
return s->enabled_capabilities[MIGRATION_CAPABILITY_RELEASE_RAM];
}
+bool migrate_bypass_shared_memory(void)
+{
+ MigrationState *s;
+
+ /* it is not workable with postcopy yet. */
+ if (migrate_postcopy_ram())
+ return false;
+
+ s = migrate_get_current();
+
+ return s->enabled_capabilities[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY];
+}
+
bool migrate_postcopy_ram(void)
{
MigrationState *s;
diff --git a/migration/migration.h b/migration/migration.h
index 8d2f320c48..cfd2513ef0 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -206,6 +206,7 @@ MigrationState *migrate_get_current(void);
bool migrate_postcopy(void);
+bool migrate_bypass_shared_memory(void);
bool migrate_release_ram(void);
bool migrate_postcopy_ram(void);
bool migrate_zero_blocks(void);
diff --git a/migration/ram.c b/migration/ram.c
index 0e90efa092..6881ec1d80 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -780,6 +780,10 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
unsigned long *bitmap = rb->bmap;
unsigned long next;
+ /* when this ramblock is requested bypassing */
+ if (!bitmap)
+ return size;
+
if (rs->ram_bulk_stage && start > 0) {
next = start + 1;
} else {
@@ -850,7 +854,9 @@ static void migration_bitmap_sync(RAMState *rs)
qemu_mutex_lock(&rs->bitmap_mutex);
rcu_read_lock();
RAMBLOCK_FOREACH(block) {
- migration_bitmap_sync_range(rs, block, 0, block->used_length);
+ if (!migrate_bypass_shared_memory() || !qemu_ram_is_shared(block)) {
+ migration_bitmap_sync_range(rs, block, 0, block->used_length);
+ }
}
rcu_read_unlock();
qemu_mutex_unlock(&rs->bitmap_mutex);
@@ -2132,18 +2138,12 @@ static int ram_state_init(RAMState **rsp)
qemu_mutex_init(&(*rsp)->src_page_req_mutex);
QSIMPLEQ_INIT(&(*rsp)->src_page_requests);
- /*
- * Count the total number of pages used by ram blocks not including any
- * gaps due to alignment or unplugs.
- */
- (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
-
ram_state_reset(*rsp);
return 0;
}
-static void ram_list_init_bitmaps(void)
+static void ram_list_init_bitmaps(RAMState *rs)
{
RAMBlock *block;
unsigned long pages;
@@ -2151,9 +2151,17 @@ static void ram_list_init_bitmaps(void)
/* Skip setting bitmap if there is no RAM */
if (ram_bytes_total()) {
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+ if (migrate_bypass_shared_memory() && qemu_ram_is_shared(block)) {
+ continue;
+ }
pages = block->max_length >> TARGET_PAGE_BITS;
block->bmap = bitmap_new(pages);
bitmap_set(block->bmap, 0, pages);
+ /*
+ * Count the total number of pages used by ram blocks not
+ * including any gaps due to alignment or unplugs.
+ */
+ rs->migration_dirty_pages += pages;
if (migrate_postcopy_ram()) {
block->unsentmap = bitmap_new(pages);
bitmap_set(block->unsentmap, 0, pages);
@@ -2169,7 +2177,7 @@ static void ram_init_bitmaps(RAMState *rs)
qemu_mutex_lock_ramlist();
rcu_read_lock();
- ram_list_init_bitmaps();
+ ram_list_init_bitmaps(rs);
memory_global_dirty_log_start();
migration_bitmap_sync(rs);
diff --git a/qapi/migration.json b/qapi/migration.json
index 9d0bf82cf4..45326480bd 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -357,13 +357,17 @@
# @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps.
# (since 2.12)
#
+# @bypass-shared-memory: the shared memory region will be bypassed on migration.
+# This feature allows the memory region to be reused by new qemu(s)
+# or be migrated separately. (since 2.13)
+#
# Since: 1.2
##
{ 'enum': 'MigrationCapability',
'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
'block', 'return-path', 'pause-before-switchover', 'x-multifd',
- 'dirty-bitmaps' ] }
+ 'dirty-bitmaps', 'bypass-shared-memory' ] }
##
# @MigrationCapabilityStatus:
--
2.14.3 (Apple Git-98)
Hi, This series seems to have some coding style problems. See output below for more information: Type: series Message-id: 20180401084848.36725-1-jiangshanlai@gmail.com Subject: [Qemu-devel] [PATCH V3] migration: add capability to bypass the shared memory === TEST SCRIPT BEGIN === #!/bin/bash BASE=base n=1 total=$(git log --oneline $BASE.. | wc -l) failed=0 git config --local diff.renamelimit 0 git config --local diff.renames True git config --local diff.algorithm histogram commits="$(git log --format=%H --reverse $BASE..)" for c in $commits; do echo "Checking PATCH $n/$total: $(git log -n 1 --format=%s $c)..." if ! git show $c --format=email | ./scripts/checkpatch.pl --mailback -; then failed=1 echo fi n=$((n+1)) done exit $failed === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 From https://github.com/patchew-project/qemu * [new tag] patchew/20180401084848.36725-1-jiangshanlai@gmail.com -> patchew/20180401084848.36725-1-jiangshanlai@gmail.com Switched to a new branch 'test' 8886edc9cf migration: add capability to bypass the shared memory === OUTPUT BEGIN === Checking PATCH 1/1: migration: add capability to bypass the shared memory... ERROR: braces {} are necessary for all arms of this statement #118: FILE: migration/migration.c:1525: + if (migrate_postcopy_ram()) [...] ERROR: braces {} are necessary for all arms of this statement #150: FILE: migration/ram.c:784: + if (!bitmap) [...] total: 2 errors, 0 warnings, 108 lines checked Your patch has style problems, please review. If any of these errors are false positives report them to the maintainer, see CHECKPATCH in MAINTAINERS. === OUTPUT END === Test command exited with code: 1 --- Email generated automatically by Patchew [http://patchew.org/]. Please send your feedback to patchew-devel@redhat.com
Hi, This series failed docker-mingw@fedora build test. Please find the testing commands and their output below. If you have Docker installed, you can probably reproduce it locally. Type: series Message-id: 20180401084848.36725-1-jiangshanlai@gmail.com Subject: [Qemu-devel] [PATCH V3] migration: add capability to bypass the shared memory === TEST SCRIPT BEGIN === #!/bin/bash set -e git submodule update --init dtc # Let docker tests dump environment info export SHOW_ENV=1 export J=8 time make docker-test-mingw@fedora === TEST SCRIPT END === Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384 Switched to a new branch 'test' 8886edc9cf migration: add capability to bypass the shared memory === OUTPUT BEGIN === Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc' Cloning into '/var/tmp/patchew-tester-tmp-_154viaz/src/dtc'... Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42' BUILD fedora make[1]: Entering directory '/var/tmp/patchew-tester-tmp-_154viaz/src' GEN /var/tmp/patchew-tester-tmp-_154viaz/src/docker-src.2018-04-01-04.54.41.32446/qemu.tar Cloning into '/var/tmp/patchew-tester-tmp-_154viaz/src/docker-src.2018-04-01-04.54.41.32446/qemu.tar.vroot'... done. Checking out files: 100% (6066/6066), done. Your branch is up-to-date with 'origin/test'. 
Submodule 'dtc' (git://git.qemu-project.org/dtc.git) registered for path 'dtc' Cloning into '/var/tmp/patchew-tester-tmp-_154viaz/src/docker-src.2018-04-01-04.54.41.32446/qemu.tar.vroot/dtc'... Submodule path 'dtc': checked out 'e54388015af1fb4bf04d0bca99caba1074d9cc42' Submodule 'ui/keycodemapdb' (git://git.qemu.org/keycodemapdb.git) registered for path 'ui/keycodemapdb' Cloning into '/var/tmp/patchew-tester-tmp-_154viaz/src/docker-src.2018-04-01-04.54.41.32446/qemu.tar.vroot/ui/keycodemapdb'... Submodule path 'ui/keycodemapdb': checked out '6b3d716e2b6472eb7189d3220552280ef3d832ce' COPY RUNNER RUN test-mingw in qemu:fedora Packages installed: PyYAML-3.12-5.fc27.x86_64 SDL-devel-1.2.15-29.fc27.x86_64 bc-1.07.1-3.fc27.x86_64 bison-3.0.4-8.fc27.x86_64 bzip2-1.0.6-24.fc27.x86_64 ccache-3.3.6-1.fc27.x86_64 clang-5.0.1-3.fc27.x86_64 findutils-4.6.0-16.fc27.x86_64 flex-2.6.1-5.fc27.x86_64 gcc-7.3.1-5.fc27.x86_64 gcc-c++-7.3.1-5.fc27.x86_64 gettext-0.19.8.1-12.fc27.x86_64 git-2.14.3-3.fc27.x86_64 glib2-devel-2.54.3-2.fc27.x86_64 hostname-3.18-4.fc27.x86_64 libaio-devel-0.3.110-9.fc27.x86_64 libasan-7.3.1-5.fc27.x86_64 libfdt-devel-1.4.6-1.fc27.x86_64 libubsan-7.3.1-5.fc27.x86_64 llvm-5.0.1-3.fc27.x86_64 make-4.2.1-4.fc27.x86_64 mingw32-SDL-1.2.15-9.fc27.noarch mingw32-bzip2-1.0.6-9.fc27.noarch mingw32-curl-7.54.1-2.fc27.noarch mingw32-glib2-2.54.1-1.fc27.noarch mingw32-gmp-6.1.2-2.fc27.noarch mingw32-gnutls-3.5.13-2.fc27.noarch mingw32-gtk2-2.24.31-4.fc27.noarch mingw32-gtk3-3.22.16-1.fc27.noarch mingw32-libjpeg-turbo-1.5.1-3.fc27.noarch mingw32-libpng-1.6.29-2.fc27.noarch mingw32-libssh2-1.8.0-3.fc27.noarch mingw32-libtasn1-4.13-1.fc27.noarch mingw32-nettle-3.3-3.fc27.noarch mingw32-pixman-0.34.0-3.fc27.noarch mingw32-pkg-config-0.28-9.fc27.x86_64 mingw64-SDL-1.2.15-9.fc27.noarch mingw64-bzip2-1.0.6-9.fc27.noarch mingw64-curl-7.54.1-2.fc27.noarch mingw64-glib2-2.54.1-1.fc27.noarch mingw64-gmp-6.1.2-2.fc27.noarch mingw64-gnutls-3.5.13-2.fc27.noarch mingw64-gtk2-2.24.31-4.fc27.noarch mingw64-gtk3-3.22.16-1.fc27.noarch mingw64-libjpeg-turbo-1.5.1-3.fc27.noarch mingw64-libpng-1.6.29-2.fc27.noarch mingw64-libssh2-1.8.0-3.fc27.noarch mingw64-libtasn1-4.13-1.fc27.noarch mingw64-nettle-3.3-3.fc27.noarch mingw64-pixman-0.34.0-3.fc27.noarch mingw64-pkg-config-0.28-9.fc27.x86_64 nettle-devel-3.4-1.fc27.x86_64 perl-5.26.1-403.fc27.x86_64 pixman-devel-0.34.0-4.fc27.x86_64 python3-3.6.2-13.fc27.x86_64 sparse-0.5.1-2.fc27.x86_64 tar-1.29-7.fc27.x86_64 which-2.21-4.fc27.x86_64 zlib-devel-1.2.11-4.fc27.x86_64 Environment variables: TARGET_LIST= PACKAGES=ccache gettext git tar PyYAML sparse flex bison python3 bzip2 hostname glib2-devel pixman-devel zlib-devel SDL-devel libfdt-devel gcc gcc-c++ llvm clang make perl which bc findutils libaio-devel nettle-devel libasan libubsan mingw32-pixman mingw32-glib2 mingw32-gmp mingw32-SDL mingw32-pkg-config mingw32-gtk2 mingw32-gtk3 mingw32-gnutls mingw32-nettle mingw32-libtasn1 mingw32-libjpeg-turbo mingw32-libpng mingw32-curl mingw32-libssh2 mingw32-bzip2 mingw64-pixman mingw64-glib2 mingw64-gmp mingw64-SDL mingw64-pkg-config mingw64-gtk2 mingw64-gtk3 mingw64-gnutls mingw64-nettle mingw64-libtasn1 mingw64-libjpeg-turbo mingw64-libpng mingw64-curl mingw64-libssh2 mingw64-bzip2 J=8 V= HOSTNAME=0a5568de7c6a DEBUG= SHOW_ENV=1 PWD=/ HOME=/root CCACHE_DIR=/var/tmp/ccache DISTTAG=f27container QEMU_CONFIGURE_OPTS=--python=/usr/bin/python3 FGC=f27 TEST_DIR=/tmp/qemu-test SHLVL=1 FEATURES=mingw clang pyyaml asan dtc 
PATH=/usr/lib/ccache:/usr/lib64/ccache:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin MAKEFLAGS= -j8 EXTRA_CONFIGURE_OPTS= _=/usr/bin/env Configure options: --enable-werror --target-list=x86_64-softmmu,aarch64-softmmu --prefix=/tmp/qemu-test/install --python=/usr/bin/python3 --cross-prefix=x86_64-w64-mingw32- --enable-trace-backends=simple --enable-gnutls --enable-nettle --enable-curl --enable-vnc --enable-bzip2 --enable-guest-agent --with-sdlabi=1.2 --with-gtkabi=2.0 Install prefix /tmp/qemu-test/install BIOS directory /tmp/qemu-test/install firmware path /tmp/qemu-test/install/share/qemu-firmware binary directory /tmp/qemu-test/install library directory /tmp/qemu-test/install/lib module directory /tmp/qemu-test/install/lib libexec directory /tmp/qemu-test/install/libexec include directory /tmp/qemu-test/install/include config directory /tmp/qemu-test/install local state directory queried at runtime Windows SDK no Source path /tmp/qemu-test/src GIT binary git GIT submodules C compiler x86_64-w64-mingw32-gcc Host C compiler cc C++ compiler x86_64-w64-mingw32-g++ Objective-C compiler clang ARFLAGS rv CFLAGS -O2 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=2 -g QEMU_CFLAGS -I/usr/x86_64-w64-mingw32/sys-root/mingw/include/pixman-1 -I$(SRC_PATH)/dtc/libfdt -Werror -DHAS_LIBSSH2_SFTP_FSYNC -mms-bitfields -I/usr/x86_64-w64-mingw32/sys-root/mingw/include/glib-2.0 -I/usr/x86_64-w64-mingw32/sys-root/mingw/lib/glib-2.0/include -I/usr/x86_64-w64-mingw32/sys-root/mingw/include -m64 -mcx16 -mthreads -D__USE_MINGW_ANSI_STDIO=1 -DWIN32_LEAN_AND_MEAN -DWINVER=0x501 -D_GNU_SOURCE -D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE -Wstrict-prototypes -Wredundant-decls -Wall -Wundef -Wwrite-strings -Wmissing-prototypes -fno-strict-aliasing -fno-common -fwrapv -Wexpansion-to-defined -Wendif-labels -Wno-shift-negative-value -Wno-missing-include-dirs -Wempty-body -Wnested-externs -Wformat-security -Wformat-y2k -Winit-self -Wignored-qualifiers -Wold-style-declaration -Wold-style-definition -Wtype-limits -fstack-protector-strong -I/usr/x86_64-w64-mingw32/sys-root/mingw/include -I/usr/x86_64-w64-mingw32/sys-root/mingw/include/p11-kit-1 -I/usr/x86_64-w64-mingw32/sys-root/mingw/include -I/usr/x86_64-w64-mingw32/sys-root/mingw/include -I/usr/x86_64-w64-mingw32/sys-root/mingw/include/libpng16 LDFLAGS -Wl,--nxcompat -Wl,--no-seh -Wl,--dynamicbase -Wl,--warn-common -m64 -g make make install install python /usr/bin/python3 -B smbd /usr/sbin/smbd module support no host CPU x86_64 host big endian no target list x86_64-softmmu aarch64-softmmu gprof enabled no sparse enabled no strip binaries yes profiler no static build no SDL support yes (1.2.15) GTK support yes (2.24.31) GTK GL support no VTE support no TLS priority NORMAL GNUTLS support yes GNUTLS rnd yes libgcrypt no libgcrypt kdf no nettle yes (3.3) nettle kdf yes libtasn1 yes curses support no virgl support no curl support yes mingw32 support yes Audio drivers dsound Block whitelist (rw) Block whitelist (ro) VirtFS support no Multipath support no VNC support yes VNC SASL support no VNC JPEG support yes VNC PNG support yes xen support no brlapi support no bluez support no Documentation no PIE no vde support no netmap support no Linux AIO support no ATTR/XATTR support no Install blobs yes KVM support no HAX support yes HVF support no WHPX support no TCG support yes TCG debug enabled no TCG interpreter no malloc trim support no RDMA support no fdt support yes membarrier no preadv support no fdatasync no madvise no posix_madvise no posix_memalign no libcap-ng support no 
vhost-net support no vhost-crypto support no vhost-scsi support no vhost-vsock support no vhost-user support no Trace backends simple Trace output file trace-<pid> spice support no rbd support no xfsctl support no smartcard support no libusb no usb net redir no OpenGL support no OpenGL dmabufs no libiscsi support no libnfs support no build guest agent yes QGA VSS support no QGA w32 disk info yes QGA MSI support no seccomp support no coroutine backend win32 coroutine pool yes debug stack usage no crypto afalg no GlusterFS support no gcov gcov gcov enabled no TPM support yes libssh2 support yes TPM passthrough no TPM emulator no QOM debugging yes Live block migration yes lzo support no snappy support no bzip2 support yes NUMA host support no libxml2 no tcmalloc support no jemalloc support no avx2 optimization yes replication support yes VxHS block device no capstone no WARNING: Use of GTK 2.0 is deprecated and will be removed in WARNING: future releases. Please switch to using GTK 3.0 WARNING: Use of SDL 1.2 is deprecated and will be removed in WARNING: future releases. Please switch to using SDL 2.0 mkdir -p dtc/libfdt mkdir -p dtc/tests GEN x86_64-softmmu/config-devices.mak.tmp GEN config-host.h GEN aarch64-softmmu/config-devices.mak.tmp GEN qemu-options.def GEN qapi-gen GEN trace/generated-tcg-tracers.h GEN trace/generated-helpers-wrappers.h GEN trace/generated-helpers.h GEN trace/generated-helpers.c GEN aarch64-softmmu/config-devices.mak GEN module_block.h GEN x86_64-softmmu/config-devices.mak GEN ui/input-keymap-atset1-to-qcode.c GEN ui/input-keymap-linux-to-qcode.c GEN ui/input-keymap-qcode-to-atset3.c GEN ui/input-keymap-qcode-to-atset2.c GEN ui/input-keymap-qcode-to-atset1.c GEN ui/input-keymap-qcode-to-linux.c GEN ui/input-keymap-qcode-to-qnum.c GEN ui/input-keymap-qcode-to-sun.c GEN ui/input-keymap-qnum-to-qcode.c GEN ui/input-keymap-win32-to-qcode.c GEN ui/input-keymap-usb-to-qcode.c GEN ui/input-keymap-x11-to-qcode.c GEN ui/input-keymap-xorgevdev-to-qcode.c GEN ui/input-keymap-xorgkbd-to-qcode.c GEN ui/input-keymap-xorgxquartz-to-qcode.c GEN ui/input-keymap-xorgxwin-to-qcode.c GEN trace-root.h GEN tests/test-qapi-gen GEN util/trace.h GEN crypto/trace.h GEN io/trace.h GEN migration/trace.h GEN block/trace.h GEN chardev/trace.h GEN hw/block/trace.h GEN hw/block/dataplane/trace.h GEN hw/char/trace.h GEN hw/intc/trace.h GEN hw/net/trace.h GEN hw/rdma/trace.h GEN hw/rdma/vmw/trace.h GEN hw/virtio/trace.h GEN hw/audio/trace.h GEN hw/misc/trace.h GEN hw/misc/macio/trace.h GEN hw/usb/trace.h GEN hw/scsi/trace.h GEN hw/nvram/trace.h GEN hw/display/trace.h GEN hw/input/trace.h GEN hw/timer/trace.h GEN hw/dma/trace.h GEN hw/sparc/trace.h GEN hw/sparc64/trace.h GEN hw/sd/trace.h GEN hw/isa/trace.h GEN hw/mem/trace.h GEN hw/i386/trace.h GEN hw/i386/xen/trace.h GEN hw/9pfs/trace.h GEN hw/ppc/trace.h GEN hw/pci/trace.h GEN hw/pci-host/trace.h GEN hw/s390x/trace.h GEN hw/vfio/trace.h GEN hw/acpi/trace.h GEN hw/arm/trace.h GEN hw/alpha/trace.h GEN hw/hppa/trace.h GEN hw/xen/trace.h GEN hw/ide/trace.h GEN hw/tpm/trace.h GEN ui/trace.h GEN audio/trace.h GEN net/trace.h GEN target/arm/trace.h GEN target/i386/trace.h GEN target/mips/trace.h GEN target/sparc/trace.h GEN target/s390x/trace.h GEN target/ppc/trace.h GEN qom/trace.h GEN linux-user/trace.h GEN qapi/trace.h GEN accel/tcg/trace.h GEN accel/kvm/trace.h GEN nbd/trace.h GEN scsi/trace.h GEN trace-root.c GEN util/trace.c GEN crypto/trace.c GEN io/trace.c GEN migration/trace.c GEN block/trace.c GEN chardev/trace.c GEN hw/block/trace.c GEN 
hw/block/dataplane/trace.c GEN hw/char/trace.c GEN hw/intc/trace.c GEN hw/net/trace.c GEN hw/rdma/trace.c GEN hw/rdma/vmw/trace.c GEN hw/virtio/trace.c GEN hw/audio/trace.c GEN hw/misc/trace.c GEN hw/misc/macio/trace.c GEN hw/usb/trace.c GEN hw/scsi/trace.c GEN hw/nvram/trace.c GEN hw/display/trace.c GEN hw/input/trace.c GEN hw/timer/trace.c GEN hw/dma/trace.c GEN hw/sparc/trace.c GEN hw/sparc64/trace.c GEN hw/sd/trace.c GEN hw/isa/trace.c GEN hw/mem/trace.c GEN hw/i386/trace.c GEN hw/i386/xen/trace.c GEN hw/9pfs/trace.c GEN hw/ppc/trace.c GEN hw/pci/trace.c GEN hw/pci-host/trace.c GEN hw/s390x/trace.c GEN hw/vfio/trace.c GEN hw/acpi/trace.c GEN hw/arm/trace.c GEN hw/alpha/trace.c GEN hw/hppa/trace.c GEN hw/xen/trace.c GEN hw/ide/trace.c GEN hw/tpm/trace.c GEN ui/trace.c GEN audio/trace.c GEN net/trace.c GEN target/arm/trace.c GEN target/i386/trace.c GEN target/mips/trace.c GEN target/sparc/trace.c GEN target/s390x/trace.c GEN target/ppc/trace.c GEN qom/trace.c GEN linux-user/trace.c GEN qapi/trace.c GEN accel/tcg/trace.c GEN accel/kvm/trace.c GEN nbd/trace.c GEN scsi/trace.c GEN config-all-devices.mak DEP /tmp/qemu-test/src/dtc/tests/dumptrees.c DEP /tmp/qemu-test/src/dtc/tests/trees.S DEP /tmp/qemu-test/src/dtc/tests/testutils.c DEP /tmp/qemu-test/src/dtc/tests/value-labels.c DEP /tmp/qemu-test/src/dtc/tests/asm_tree_dump.c DEP /tmp/qemu-test/src/dtc/tests/truncated_property.c DEP /tmp/qemu-test/src/dtc/tests/check_path.c DEP /tmp/qemu-test/src/dtc/tests/overlay_bad_fixup.c DEP /tmp/qemu-test/src/dtc/tests/overlay.c DEP /tmp/qemu-test/src/dtc/tests/subnode_iterate.c DEP /tmp/qemu-test/src/dtc/tests/property_iterate.c DEP /tmp/qemu-test/src/dtc/tests/integer-expressions.c DEP /tmp/qemu-test/src/dtc/tests/utilfdt_test.c DEP /tmp/qemu-test/src/dtc/tests/path_offset_aliases.c DEP /tmp/qemu-test/src/dtc/tests/add_subnode_with_nops.c DEP /tmp/qemu-test/src/dtc/tests/dtbs_equal_unordered.c DEP /tmp/qemu-test/src/dtc/tests/dtb_reverse.c DEP /tmp/qemu-test/src/dtc/tests/dtbs_equal_ordered.c DEP /tmp/qemu-test/src/dtc/tests/extra-terminating-null.c DEP /tmp/qemu-test/src/dtc/tests/incbin.c DEP /tmp/qemu-test/src/dtc/tests/boot-cpuid.c DEP /tmp/qemu-test/src/dtc/tests/phandle_format.c DEP /tmp/qemu-test/src/dtc/tests/path-references.c DEP /tmp/qemu-test/src/dtc/tests/references.c DEP /tmp/qemu-test/src/dtc/tests/string_escapes.c DEP /tmp/qemu-test/src/dtc/tests/propname_escapes.c DEP /tmp/qemu-test/src/dtc/tests/appendprop2.c DEP /tmp/qemu-test/src/dtc/tests/appendprop1.c DEP /tmp/qemu-test/src/dtc/tests/del_node.c DEP /tmp/qemu-test/src/dtc/tests/del_property.c DEP /tmp/qemu-test/src/dtc/tests/setprop.c DEP /tmp/qemu-test/src/dtc/tests/rw_tree1.c DEP /tmp/qemu-test/src/dtc/tests/set_name.c DEP /tmp/qemu-test/src/dtc/tests/open_pack.c DEP /tmp/qemu-test/src/dtc/tests/nopulate.c DEP /tmp/qemu-test/src/dtc/tests/mangle-layout.c DEP /tmp/qemu-test/src/dtc/tests/move_and_save.c DEP /tmp/qemu-test/src/dtc/tests/sw_tree1.c DEP /tmp/qemu-test/src/dtc/tests/nop_node.c DEP /tmp/qemu-test/src/dtc/tests/nop_property.c DEP /tmp/qemu-test/src/dtc/tests/setprop_inplace.c DEP /tmp/qemu-test/src/dtc/tests/stringlist.c DEP /tmp/qemu-test/src/dtc/tests/addr_size_cells.c DEP /tmp/qemu-test/src/dtc/tests/notfound.c DEP /tmp/qemu-test/src/dtc/tests/sized_cells.c DEP /tmp/qemu-test/src/dtc/tests/char_literal.c DEP /tmp/qemu-test/src/dtc/tests/get_alias.c DEP /tmp/qemu-test/src/dtc/tests/node_offset_by_compatible.c DEP /tmp/qemu-test/src/dtc/tests/node_check_compatible.c DEP 
/tmp/qemu-test/src/dtc/tests/node_offset_by_phandle.c DEP /tmp/qemu-test/src/dtc/tests/node_offset_by_prop_value.c DEP /tmp/qemu-test/src/dtc/tests/parent_offset.c DEP /tmp/qemu-test/src/dtc/tests/supernode_atdepth_offset.c DEP /tmp/qemu-test/src/dtc/tests/get_path.c DEP /tmp/qemu-test/src/dtc/tests/get_phandle.c DEP /tmp/qemu-test/src/dtc/tests/getprop.c DEP /tmp/qemu-test/src/dtc/tests/get_name.c DEP /tmp/qemu-test/src/dtc/tests/path_offset.c DEP /tmp/qemu-test/src/dtc/tests/subnode_offset.c DEP /tmp/qemu-test/src/dtc/tests/find_property.c DEP /tmp/qemu-test/src/dtc/tests/root_node.c DEP /tmp/qemu-test/src/dtc/tests/get_mem_rsv.c DEP /tmp/qemu-test/src/dtc/libfdt/fdt_overlay.c DEP /tmp/qemu-test/src/dtc/libfdt/fdt_addresses.c DEP /tmp/qemu-test/src/dtc/libfdt/fdt_empty_tree.c DEP /tmp/qemu-test/src/dtc/libfdt/fdt_strerror.c DEP /tmp/qemu-test/src/dtc/libfdt/fdt_rw.c DEP /tmp/qemu-test/src/dtc/libfdt/fdt_sw.c DEP /tmp/qemu-test/src/dtc/libfdt/fdt_wip.c DEP /tmp/qemu-test/src/dtc/libfdt/fdt_ro.c DEP /tmp/qemu-test/src/dtc/libfdt/fdt.c DEP /tmp/qemu-test/src/dtc/util.c DEP /tmp/qemu-test/src/dtc/fdtoverlay.c DEP /tmp/qemu-test/src/dtc/fdtput.c DEP /tmp/qemu-test/src/dtc/fdtget.c DEP /tmp/qemu-test/src/dtc/fdtdump.c LEX convert-dtsv0-lexer.lex.c DEP /tmp/qemu-test/src/dtc/srcpos.c BISON dtc-parser.tab.c LEX dtc-lexer.lex.c DEP /tmp/qemu-test/src/dtc/treesource.c DEP /tmp/qemu-test/src/dtc/livetree.c DEP /tmp/qemu-test/src/dtc/fstree.c DEP /tmp/qemu-test/src/dtc/flattree.c DEP /tmp/qemu-test/src/dtc/dtc.c DEP /tmp/qemu-test/src/dtc/data.c DEP /tmp/qemu-test/src/dtc/checks.c DEP convert-dtsv0-lexer.lex.c DEP dtc-parser.tab.c DEP dtc-lexer.lex.c CHK version_gen.h UPD version_gen.h DEP /tmp/qemu-test/src/dtc/util.c CC libfdt/fdt.o CC libfdt/fdt_ro.o CC libfdt/fdt_wip.o CC libfdt/fdt_empty_tree.o CC libfdt/fdt_sw.o CC libfdt/fdt_strerror.o CC libfdt/fdt_addresses.o CC libfdt/fdt_rw.o CC libfdt/fdt_overlay.o AR libfdt/libfdt.a x86_64-w64-mingw32-ar: creating libfdt/libfdt.a a - libfdt/fdt.o a - libfdt/fdt_ro.o a - libfdt/fdt_wip.o a - libfdt/fdt_sw.o a - libfdt/fdt_rw.o a - libfdt/fdt_strerror.o a - libfdt/fdt_empty_tree.o a - libfdt/fdt_addresses.o a - libfdt/fdt_overlay.o RC version.o mkdir -p dtc/libfdt mkdir -p dtc/tests GEN qga/qapi-generated/qapi-gen CC qapi/qapi-types.o CC qapi/qapi-builtin-types.o CC qapi/qapi-types-block-core.o CC qapi/qapi-types-char.o CC qapi/qapi-types-common.o CC qapi/qapi-types-crypto.o CC qapi/qapi-types-block.o CC qapi/qapi-types-introspect.o CC qapi/qapi-types-migration.o CC qapi/qapi-types-misc.o CC qapi/qapi-types-net.o CC qapi/qapi-types-rocker.o CC qapi/qapi-types-run-state.o CC qapi/qapi-types-sockets.o CC qapi/qapi-types-tpm.o CC qapi/qapi-types-trace.o CC qapi/qapi-types-transaction.o CC qapi/qapi-types-ui.o CC qapi/qapi-builtin-visit.o CC qapi/qapi-visit.o CC qapi/qapi-visit-block-core.o CC qapi/qapi-visit-block.o CC qapi/qapi-visit-char.o CC qapi/qapi-visit-common.o CC qapi/qapi-visit-crypto.o CC qapi/qapi-visit-introspect.o CC qapi/qapi-visit-migration.o CC qapi/qapi-visit-misc.o CC qapi/qapi-visit-net.o CC qapi/qapi-visit-rocker.o CC qapi/qapi-visit-run-state.o CC qapi/qapi-visit-sockets.o CC qapi/qapi-visit-tpm.o CC qapi/qapi-visit-trace.o CC qapi/qapi-visit-transaction.o CC qapi/qapi-visit-ui.o CC qapi/qapi-events.o CC qapi/qapi-events-block-core.o CC qapi/qapi-events-block.o CC qapi/qapi-events-char.o make: *** [/tmp/qemu-test/src/rules.mak:66: qapi/qapi-types.o] Error 1 make: *** Waiting for unfinished jobs.... 
Traceback (most recent call last): File "./tests/docker/docker.py", line 407, in <module> sys.exit(main()) File "./tests/docker/docker.py", line 404, in main return args.cmdobj.run(args, argv) File "./tests/docker/docker.py", line 261, in run return Docker().run(argv, args.keep, quiet=args.quiet) File "./tests/docker/docker.py", line 229, in run quiet=quiet) File "./tests/docker/docker.py", line 147, in _do_check return subprocess.check_call(self._command + cmd, **kwargs) File "/usr/lib64/python2.7/subprocess.py", line 186, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['docker', 'run', '--label', 'com.qemu.instance.uuid=624ffe64358a11e8aeea52540069c830', '-u', '0', '--security-opt', 'seccomp=unconfined', '--rm', '--net=none', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=8', '-e', 'DEBUG=', '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/root/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-_154viaz/src/docker-src.2018-04-01-04.54.41.32446:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2 make[1]: *** [tests/docker/Makefile.include:129: docker-run] Error 1 make[1]: Leaving directory '/var/tmp/patchew-tester-tmp-_154viaz/src' make: *** [tests/docker/Makefile.include:163: docker-run-test-mingw@fedora] Error 2 real 2m16.109s user 0m9.531s sys 0m8.757s === OUTPUT END === Test command exited with code: 2 --- Email generated automatically by Patchew [http://patchew.org/]. Please send your feedback to patchew-devel@redhat.com
1) What's this

When the migration capability 'bypass-shared-memory'
is set, shared memory regions are skipped during migration.

This is the key feature that enables several useful features
for qemu, such as qemu-local-migration, qemu-live-update,
extremely-fast-save-restore, vm-template, vm-fast-live-clone,
yet-another-post-copy-migration, etc.

The philosophy behind this feature, and the advanced features
built on top of it, is that part of the memory management is
separated out from qemu, so that other toolkits
such as libvirt, kata-containers (https://github.com/kata-containers),
runv (https://github.com/hyperhq/runv/), or several cooperating
qemu processes can directly access the memory, manage it, and
provide features on top of it.
2) Status in the real world

hyperhq (http://hyper.sh http://hypercontainer.io/)
introduced the feature vm-template (vm-fast-live-clone)
to the hyper container several years ago, and it works well
(see https://github.com/hyperhq/runv/pull/297).

With vm-template, containers (VMs) can be started in 130ms
and each container (VM) saves about 80M of memory, so hyper
containers are as fast and as high-density as normal containers.

The kata-containers project (https://github.com/kata-containers),
which was launched by hyper, intel and friends and which descended
from runv (and clear-container), should have this feature enabled.
Unfortunately, due to code conflicts between runv & cc,
the feature is temporarily disabled; it is being brought
back by the hyper and intel teams.
3) How to use it and bring up the advanced features.

On the current qemu command line, shared memory has
to be configured via a memory backend object.

a) feature: qemu-local-migration, qemu-live-update

Put the mem-path on tmpfs and set share=on for it when
starting the vm. Example:
-object \
memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory,share=on \
-numa node,nodeid=0,cpus=0-7,memdev=mem

When you want to migrate the vm locally (for example after fixing
a security bug in the qemu binary), start a new qemu with the same
command line plus -incoming, then migrate the vm from the old qemu
to the new qemu with the migration capability 'bypass-shared-memory'
set. The migration transfers the device state *ONLY*; the memory
remains the original memory backed by the tmpfs file.
b) feature: extremely-fast-save-restore

The same as above, except that the mem-path is on a persistent
file system.
c) feature: vm-template, vm-fast-live-clone

The template vm is started as in a), paused when the guest reaches
the template point (for example, when the guest app is ready), and
then saved. (The qemu process of the template can be killed at this
point, because only the memory file and the device state file, both
in tmpfs, are needed.)

Then one or more VMs can be launched based on the template vm state.
The new VMs are started without "share=on", so they all share the
initial memory from the memory file and save a lot of memory.
All the new VMs start from the template point, so the guest app can
get to work quickly.

A new VM booted from a template vm can't become a template again;
if you need this unusual chained-template feature, you can write
a cloneable-tmpfs kernel module for it.

The libvirt toolkit can't manage vm-template currently; in
hyperhq/runv we use a qemu wrapper script to do it. I hope someone
adds a "libvirt managed template" feature to libvirt.
d) feature: yet-another-post-copy-migration

This is a possible feature, though no toolkit can do it well yet.
Using an nbd server/client on the memory file is barely workable
but inconvenient. A special feature for tmpfs might be needed to
fully realize it.
Nobody needs yet another post-copy migration method,
but it is possible should some crazy person ever want it.
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Sebastien Boeuf <sebastien.boeuf@intel.com>
Cc: James O. D. Hunt <james.o.hunt@intel.com>
Cc: Xu Wang <gnawux@gmail.com>
Cc: Peng Tao <bergwolf@gmail.com>
Cc: Xiao Guangrong <xiaoguangrong@tencent.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
---
Changes in V4:
fixes checkpatch.pl errors
Changes in V3:
rebased on upstream master
update the available version of the capability to
v2.13
Changes in V2:
rebased on 2.11.1
migration/migration.c | 14 ++++++++++++++
migration/migration.h | 1 +
migration/ram.c | 27 ++++++++++++++++++---------
qapi/migration.json | 6 +++++-
4 files changed, 38 insertions(+), 10 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 52a5092add..6a63102d7f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1509,6 +1509,20 @@ bool migrate_release_ram(void)
return s->enabled_capabilities[MIGRATION_CAPABILITY_RELEASE_RAM];
}
+bool migrate_bypass_shared_memory(void)
+{
+ MigrationState *s;
+
+ /* it is not workable with postcopy yet. */
+ if (migrate_postcopy_ram()) {
+ return false;
+ }
+
+ s = migrate_get_current();
+
+ return s->enabled_capabilities[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY];
+}
+
bool migrate_postcopy_ram(void)
{
MigrationState *s;
diff --git a/migration/migration.h b/migration/migration.h
index 8d2f320c48..cfd2513ef0 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -206,6 +206,7 @@ MigrationState *migrate_get_current(void);
bool migrate_postcopy(void);
+bool migrate_bypass_shared_memory(void);
bool migrate_release_ram(void);
bool migrate_postcopy_ram(void);
bool migrate_zero_blocks(void);
diff --git a/migration/ram.c b/migration/ram.c
index 0e90efa092..bca170c386 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -780,6 +780,11 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
unsigned long *bitmap = rb->bmap;
unsigned long next;
+ /* when this ramblock is requested bypassing */
+ if (!bitmap) {
+ return size;
+ }
+
if (rs->ram_bulk_stage && start > 0) {
next = start + 1;
} else {
@@ -850,7 +855,9 @@ static void migration_bitmap_sync(RAMState *rs)
qemu_mutex_lock(&rs->bitmap_mutex);
rcu_read_lock();
RAMBLOCK_FOREACH(block) {
- migration_bitmap_sync_range(rs, block, 0, block->used_length);
+ if (!migrate_bypass_shared_memory() || !qemu_ram_is_shared(block)) {
+ migration_bitmap_sync_range(rs, block, 0, block->used_length);
+ }
}
rcu_read_unlock();
qemu_mutex_unlock(&rs->bitmap_mutex);
@@ -2132,18 +2139,12 @@ static int ram_state_init(RAMState **rsp)
qemu_mutex_init(&(*rsp)->src_page_req_mutex);
QSIMPLEQ_INIT(&(*rsp)->src_page_requests);
- /*
- * Count the total number of pages used by ram blocks not including any
- * gaps due to alignment or unplugs.
- */
- (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
-
ram_state_reset(*rsp);
return 0;
}
-static void ram_list_init_bitmaps(void)
+static void ram_list_init_bitmaps(RAMState *rs)
{
RAMBlock *block;
unsigned long pages;
@@ -2151,9 +2152,17 @@ static void ram_list_init_bitmaps(void)
/* Skip setting bitmap if there is no RAM */
if (ram_bytes_total()) {
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+ if (migrate_bypass_shared_memory() && qemu_ram_is_shared(block)) {
+ continue;
+ }
pages = block->max_length >> TARGET_PAGE_BITS;
block->bmap = bitmap_new(pages);
bitmap_set(block->bmap, 0, pages);
+ /*
+ * Count the total number of pages used by ram blocks not
+ * including any gaps due to alignment or unplugs.
+ */
+ rs->migration_dirty_pages += pages;
if (migrate_postcopy_ram()) {
block->unsentmap = bitmap_new(pages);
bitmap_set(block->unsentmap, 0, pages);
@@ -2169,7 +2178,7 @@ static void ram_init_bitmaps(RAMState *rs)
qemu_mutex_lock_ramlist();
rcu_read_lock();
- ram_list_init_bitmaps();
+ ram_list_init_bitmaps(rs);
memory_global_dirty_log_start();
migration_bitmap_sync(rs);
diff --git a/qapi/migration.json b/qapi/migration.json
index 9d0bf82cf4..45326480bd 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -357,13 +357,17 @@
# @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps.
# (since 2.12)
#
+# @bypass-shared-memory: the shared memory region will be bypassed on migration.
+# This feature allows the memory region to be reused by new qemu(s)
+# or be migrated separately. (since 2.13)
+#
# Since: 1.2
##
{ 'enum': 'MigrationCapability',
'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
'block', 'return-path', 'pause-before-switchover', 'x-multifd',
- 'dirty-bitmaps' ] }
+ 'dirty-bitmaps', 'bypass-shared-memory' ] }
##
# @MigrationCapabilityStatus:
--
2.14.3 (Apple Git-98)
On 04/04/2018 07:47 PM, Lai Jiangshan wrote: > 1) What's this > > When the migration capability 'bypass-shared-memory' > is set, the shared memory will be bypassed when migration. > > It is the key feature to enable several excellent features for > the qemu, such as qemu-local-migration, qemu-live-update, > extremely-fast-save-restore, vm-template, vm-fast-live-clone, > yet-another-post-copy-migration, etc.. > > The philosophy behind this key feature, including the resulting > advanced key features, is that a part of the memory management > is separated out from the qemu, and let the other toolkits > such as libvirt, kata-containers (https://github.com/kata-containers) > runv(https://github.com/hyperhq/runv/) or some multiple cooperative > qemu commands directly access to it, manage it, provide features on it. > > 2) Status in real world > > The hyperhq(http://hyper.sh http://hypercontainer.io/) > introduced the feature vm-template(vm-fast-live-clone) > to the hyper container for several years, it works perfect. > (see https://github.com/hyperhq/runv/pull/297). > > The feature vm-template makes the containers(VMs) can > be started in 130ms and save 80M memory for every > container(VM). So that the hyper containers are fast > and high-density as normal containers. > > kata-containers project (https://github.com/kata-containers) > which was launched by hyper, intel and friends and which descended > from runv (and clear-container) should have this feature enabled. > Unfortunately, due to the code confliction between runv&cc, > this feature was temporary disabled and it is being brought > back by hyper and intel team. > > 3) How to use and bring up advanced features. > > In current qemu command line, shared memory has > to be configured via memory-object. > > a) feature: qemu-local-migration, qemu-live-update > Set the mem-path on the tmpfs and set share=on for it when > start the vm. example: > -object \ > memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory,share=on \ > -numa node,nodeid=0,cpus=0-7,memdev=mem > > when you want to migrate the vm locally (after fixed a security bug > of the qemu-binary, or other reason), you can start a new qemu with > the same command line and -incoming, then you can migrate the > vm from the old qemu to the new qemu with the migration capability > 'bypass-shared-memory' set. The migration will migrate the device-state > *ONLY*, the memory is the origin memory backed by tmpfs file. > > b) feature: extremely-fast-save-restore > the same above, but the mem-path is on the persistent file system. > > c) feature: vm-template, vm-fast-live-clone > the template vm is started as 1), and paused when the guest reaches > the template point(example: the guest app is ready), then the template > vm is saved. (the qemu process of the template can be killed now, because > we need only the memory and the device state files (in tmpfs)). > > Then we can launch one or multiple VMs base on the template vm states, > the new VMs are started without the “share=on”, all the new VMs share > the initial memory from the memory file, they save a lot of memory. > all the new VMs start from the template point, the guest app can go to > work quickly. > > The new VM booted from template vm can’t become template again, > if you need this unusual chained-template feature, you can write > a cloneable-tmpfs kernel module for it. > > The libvirt toolkit can’t manage vm-template currently, in the > hyperhq/runv, we use qemu wrapper script to do it. 
I hope someone add > “libvrit managed template” feature to libvirt. > > d) feature: yet-another-post-copy-migration > It is a possible feature, no toolkit can do it well now. > Using nbd server/client on the memory file is reluctantly Ok but > inconvenient. A special feature for tmpfs might be needed to > fully complete this feature. > No one need yet another post copy migration method, > but it is possible when some crazy man need it. Excellent work. :) It's a brilliant feature that can improve our production a lot. Reviewed-by: Xiao Guangrong <xiaoguangrong@tencent.com>
Hi, * Lai Jiangshan (jiangshanlai@gmail.com) wrote: > 1) What's this > > When the migration capability 'bypass-shared-memory' > is set, the shared memory will be bypassed when migration. > > It is the key feature to enable several excellent features for > the qemu, such as qemu-local-migration, qemu-live-update, > extremely-fast-save-restore, vm-template, vm-fast-live-clone, > yet-another-post-copy-migration, etc.. > > The philosophy behind this key feature, including the resulting > advanced key features, is that a part of the memory management > is separated out from the qemu, and let the other toolkits > such as libvirt, kata-containers (https://github.com/kata-containers) > runv(https://github.com/hyperhq/runv/) or some multiple cooperative > qemu commands directly access to it, manage it, provide features on it. > > 2) Status in real world > > The hyperhq(http://hyper.sh http://hypercontainer.io/) > introduced the feature vm-template(vm-fast-live-clone) > to the hyper container for several years, it works perfect. > (see https://github.com/hyperhq/runv/pull/297). > > The feature vm-template makes the containers(VMs) can > be started in 130ms and save 80M memory for every > container(VM). So that the hyper containers are fast > and high-density as normal containers. > > kata-containers project (https://github.com/kata-containers) > which was launched by hyper, intel and friends and which descended > from runv (and clear-container) should have this feature enabled. > Unfortunately, due to the code confliction between runv&cc, > this feature was temporary disabled and it is being brought > back by hyper and intel team. > > 3) How to use and bring up advanced features. > > In current qemu command line, shared memory has > to be configured via memory-object. > > a) feature: qemu-local-migration, qemu-live-update > Set the mem-path on the tmpfs and set share=on for it when > start the vm. example: > -object \ > memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory,share=on \ > -numa node,nodeid=0,cpus=0-7,memdev=mem > > when you want to migrate the vm locally (after fixed a security bug > of the qemu-binary, or other reason), you can start a new qemu with > the same command line and -incoming, then you can migrate the > vm from the old qemu to the new qemu with the migration capability > 'bypass-shared-memory' set. The migration will migrate the device-state > *ONLY*, the memory is the origin memory backed by tmpfs file. > > b) feature: extremely-fast-save-restore > the same above, but the mem-path is on the persistent file system. > > c) feature: vm-template, vm-fast-live-clone > the template vm is started as 1), and paused when the guest reaches > the template point(example: the guest app is ready), then the template > vm is saved. (the qemu process of the template can be killed now, because > we need only the memory and the device state files (in tmpfs)). > > Then we can launch one or multiple VMs base on the template vm states, > the new VMs are started without the “share=on”, all the new VMs share > the initial memory from the memory file, they save a lot of memory. > all the new VMs start from the template point, the guest app can go to > work quickly. How do you handle the storage in this case, or giving each VM it's own MAC address? > The new VM booted from template vm can’t become template again, > if you need this unusual chained-template feature, you can write > a cloneable-tmpfs kernel module for it. 
> > The libvirt toolkit can’t manage vm-template currently, in the > hyperhq/runv, we use qemu wrapper script to do it. I hope someone add > “libvrit managed template” feature to libvirt. > d) feature: yet-another-post-copy-migration > It is a possible feature, no toolkit can do it well now. > Using nbd server/client on the memory file is reluctantly Ok but > inconvenient. A special feature for tmpfs might be needed to > fully complete this feature. > No one need yet another post copy migration method, > but it is possible when some crazy man need it. As the crazy person who did the existing postcopy; one is enough! Some minor fix requests below, but this looks nice and simple. Shared memory is interesting because tehre are lots of different uses; e.g. your uses, but also vhost-user which is sharing for a completely different reason. > Cc: Samuel Ortiz <sameo@linux.intel.com> > Cc: Sebastien Boeuf <sebastien.boeuf@intel.com> > Cc: James O. D. Hunt <james.o.hunt@intel.com> > Cc: Xu Wang <gnawux@gmail.com> > Cc: Peng Tao <bergwolf@gmail.com> > Cc: Xiao Guangrong <xiaoguangrong@tencent.com> > Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com> > Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> > --- > > Changes in V4: > fixes checkpatch.pl errors > > Changes in V3: > rebased on upstream master > update the available version of the capability to > v2.13 > > Changes in V2: > rebased on 2.11.1 > > migration/migration.c | 14 ++++++++++++++ > migration/migration.h | 1 + > migration/ram.c | 27 ++++++++++++++++++--------- > qapi/migration.json | 6 +++++- > 4 files changed, 38 insertions(+), 10 deletions(-) > > diff --git a/migration/migration.c b/migration/migration.c > index 52a5092add..6a63102d7f 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -1509,6 +1509,20 @@ bool migrate_release_ram(void) > return s->enabled_capabilities[MIGRATION_CAPABILITY_RELEASE_RAM]; > } > > +bool migrate_bypass_shared_memory(void) > +{ > + MigrationState *s; > + > + /* it is not workable with postcopy yet. */ > + if (migrate_postcopy_ram()) { > + return false; > + } Please change this to work in the same way as the check for postcopy+compress in migration.c migrate_caps_check. 
> + s = migrate_get_current(); > + > + return s->enabled_capabilities[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY]; > +} > + > bool migrate_postcopy_ram(void) > { > MigrationState *s; > diff --git a/migration/migration.h b/migration/migration.h > index 8d2f320c48..cfd2513ef0 100644 > --- a/migration/migration.h > +++ b/migration/migration.h > @@ -206,6 +206,7 @@ MigrationState *migrate_get_current(void); > > bool migrate_postcopy(void); > > +bool migrate_bypass_shared_memory(void); > bool migrate_release_ram(void); > bool migrate_postcopy_ram(void); > bool migrate_zero_blocks(void); > diff --git a/migration/ram.c b/migration/ram.c > index 0e90efa092..bca170c386 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -780,6 +780,11 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, > unsigned long *bitmap = rb->bmap; > unsigned long next; > > + /* when this ramblock is requested bypassing */ > + if (!bitmap) { > + return size; > + } > + > if (rs->ram_bulk_stage && start > 0) { > next = start + 1; > } else { > @@ -850,7 +855,9 @@ static void migration_bitmap_sync(RAMState *rs) > qemu_mutex_lock(&rs->bitmap_mutex); > rcu_read_lock(); > RAMBLOCK_FOREACH(block) { > - migration_bitmap_sync_range(rs, block, 0, block->used_length); > + if (!migrate_bypass_shared_memory() || !qemu_ram_is_shared(block)) { > + migration_bitmap_sync_range(rs, block, 0, block->used_length); > + } > } > rcu_read_unlock(); > qemu_mutex_unlock(&rs->bitmap_mutex); > @@ -2132,18 +2139,12 @@ static int ram_state_init(RAMState **rsp) > qemu_mutex_init(&(*rsp)->src_page_req_mutex); > QSIMPLEQ_INIT(&(*rsp)->src_page_requests); > > - /* > - * Count the total number of pages used by ram blocks not including any > - * gaps due to alignment or unplugs. > - */ > - (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS; > - > ram_state_reset(*rsp); > > return 0; > } > > -static void ram_list_init_bitmaps(void) > +static void ram_list_init_bitmaps(RAMState *rs) > { > RAMBlock *block; > unsigned long pages; > @@ -2151,9 +2152,17 @@ static void ram_list_init_bitmaps(void) > /* Skip setting bitmap if there is no RAM */ > if (ram_bytes_total()) { I think you need to add here a : rs->migration_dirty_pages = 0; I don't see anywhere else that initialises it, and there is the case of a migration that fails, followed by a 2nd attempt. > QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { > + if (migrate_bypass_shared_memory() && qemu_ram_is_shared(block)) { > + continue; > + } > pages = block->max_length >> TARGET_PAGE_BITS; > block->bmap = bitmap_new(pages); > bitmap_set(block->bmap, 0, pages); > + /* > + * Count the total number of pages used by ram blocks not > + * including any gaps due to alignment or unplugs. > + */ > + rs->migration_dirty_pages += pages; > if (migrate_postcopy_ram()) { > block->unsentmap = bitmap_new(pages); > bitmap_set(block->unsentmap, 0, pages); > @@ -2169,7 +2178,7 @@ static void ram_init_bitmaps(RAMState *rs) > qemu_mutex_lock_ramlist(); > rcu_read_lock(); > > - ram_list_init_bitmaps(); > + ram_list_init_bitmaps(rs); > memory_global_dirty_log_start(); > migration_bitmap_sync(rs); > > diff --git a/qapi/migration.json b/qapi/migration.json > index 9d0bf82cf4..45326480bd 100644 > --- a/qapi/migration.json > +++ b/qapi/migration.json > @@ -357,13 +357,17 @@ > # @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps. > # (since 2.12) > # > +# @bypass-shared-memory: the shared memory region will be bypassed on migration. 
> +# This feature allows the memory region to be reused by new qemu(s) > +# or be migrated separately. (since 2.13) > +# > # Since: 1.2 > ## > { 'enum': 'MigrationCapability', > 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', > 'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram', > 'block', 'return-path', 'pause-before-switchover', 'x-multifd', > - 'dirty-bitmaps' ] } > + 'dirty-bitmaps', 'bypass-shared-memory' ] } > > ## > # @MigrationCapabilityStatus: > -- > 2.14.3 (Apple Git-98) > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On Tue, Apr 10, 2018 at 1:30 AM, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: > Hi, > > * Lai Jiangshan (jiangshanlai@gmail.com) wrote: >> 1) What's this >> >> When the migration capability 'bypass-shared-memory' >> is set, the shared memory will be bypassed when migration. >> >> It is the key feature to enable several excellent features for >> the qemu, such as qemu-local-migration, qemu-live-update, >> extremely-fast-save-restore, vm-template, vm-fast-live-clone, >> yet-another-post-copy-migration, etc.. >> >> The philosophy behind this key feature, including the resulting >> advanced key features, is that a part of the memory management >> is separated out from the qemu, and let the other toolkits >> such as libvirt, kata-containers (https://github.com/kata-containers) >> runv(https://github.com/hyperhq/runv/) or some multiple cooperative >> qemu commands directly access to it, manage it, provide features on it. >> >> 2) Status in real world >> >> The hyperhq(http://hyper.sh http://hypercontainer.io/) >> introduced the feature vm-template(vm-fast-live-clone) >> to the hyper container for several years, it works perfect. >> (see https://github.com/hyperhq/runv/pull/297). >> >> The feature vm-template makes the containers(VMs) can >> be started in 130ms and save 80M memory for every >> container(VM). So that the hyper containers are fast >> and high-density as normal containers. >> >> kata-containers project (https://github.com/kata-containers) >> which was launched by hyper, intel and friends and which descended >> from runv (and clear-container) should have this feature enabled. >> Unfortunately, due to the code confliction between runv&cc, >> this feature was temporary disabled and it is being brought >> back by hyper and intel team. >> >> 3) How to use and bring up advanced features. >> >> In current qemu command line, shared memory has >> to be configured via memory-object. >> >> a) feature: qemu-local-migration, qemu-live-update >> Set the mem-path on the tmpfs and set share=on for it when >> start the vm. example: >> -object \ >> memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory,share=on \ >> -numa node,nodeid=0,cpus=0-7,memdev=mem >> >> when you want to migrate the vm locally (after fixed a security bug >> of the qemu-binary, or other reason), you can start a new qemu with >> the same command line and -incoming, then you can migrate the >> vm from the old qemu to the new qemu with the migration capability >> 'bypass-shared-memory' set. The migration will migrate the device-state >> *ONLY*, the memory is the origin memory backed by tmpfs file. >> >> b) feature: extremely-fast-save-restore >> the same above, but the mem-path is on the persistent file system. >> >> c) feature: vm-template, vm-fast-live-clone >> the template vm is started as 1), and paused when the guest reaches >> the template point(example: the guest app is ready), then the template >> vm is saved. (the qemu process of the template can be killed now, because >> we need only the memory and the device state files (in tmpfs)). >> >> Then we can launch one or multiple VMs base on the template vm states, >> the new VMs are started without the “share=on”, all the new VMs share >> the initial memory from the memory file, they save a lot of memory. >> all the new VMs start from the template point, the guest app can go to >> work quickly. > > How do you handle the storage in this case, or giving each VM it's own > MAC address? 
The user or the upper layer tools can copy/clone the storage (on xfs,btrfs,ceph...). The user or the upper layer tools can handle the interface MAC itself while this patch just focus on memory. hyper/runv clone the vm before the interfaces are inserted. vm-template are often used along with hotplugging. > >> The new VM booted from template vm can’t become template again, >> if you need this unusual chained-template feature, you can write >> a cloneable-tmpfs kernel module for it. >> >> The libvirt toolkit can’t manage vm-template currently, in the >> hyperhq/runv, we use qemu wrapper script to do it. I hope someone add >> “libvrit managed template” feature to libvirt. > >> d) feature: yet-another-post-copy-migration >> It is a possible feature, no toolkit can do it well now. >> Using nbd server/client on the memory file is reluctantly Ok but >> inconvenient. A special feature for tmpfs might be needed to >> fully complete this feature. >> No one need yet another post copy migration method, >> but it is possible when some crazy man need it. > > As the crazy person who did the existing postcopy; one is enough! > Very true. This part of comments just shows how much potentials there are for such a simple migration capability. > Some minor fix requests below, but this looks nice and simple. > Will do soon. Thank for your review. > Shared memory is interesting because tehre are lots of different uses; > e.g. your uses, but also vhost-user which is sharing for a completely > different reason. > >> Cc: Samuel Ortiz <sameo@linux.intel.com> >> Cc: Sebastien Boeuf <sebastien.boeuf@intel.com> >> Cc: James O. D. Hunt <james.o.hunt@intel.com> >> Cc: Xu Wang <gnawux@gmail.com> >> Cc: Peng Tao <bergwolf@gmail.com> >> Cc: Xiao Guangrong <xiaoguangrong@tencent.com> >> Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com> >> Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> >> --- >> >> Changes in V4: >> fixes checkpatch.pl errors >> >> Changes in V3: >> rebased on upstream master >> update the available version of the capability to >> v2.13 >> >> Changes in V2: >> rebased on 2.11.1 >> >> migration/migration.c | 14 ++++++++++++++ >> migration/migration.h | 1 + >> migration/ram.c | 27 ++++++++++++++++++--------- >> qapi/migration.json | 6 +++++- >> 4 files changed, 38 insertions(+), 10 deletions(-) >> >> diff --git a/migration/migration.c b/migration/migration.c >> index 52a5092add..6a63102d7f 100644 >> --- a/migration/migration.c >> +++ b/migration/migration.c >> @@ -1509,6 +1509,20 @@ bool migrate_release_ram(void) >> return s->enabled_capabilities[MIGRATION_CAPABILITY_RELEASE_RAM]; >> } >> >> +bool migrate_bypass_shared_memory(void) >> +{ >> + MigrationState *s; >> + >> + /* it is not workable with postcopy yet. */ >> + if (migrate_postcopy_ram()) { >> + return false; >> + } > > Please change this to work in the same way as the check for > postcopy+compress in migration.c migrate_caps_check. 
> >> + s = migrate_get_current(); >> + >> + return s->enabled_capabilities[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY]; >> +} >> + >> bool migrate_postcopy_ram(void) >> { >> MigrationState *s; >> diff --git a/migration/migration.h b/migration/migration.h >> index 8d2f320c48..cfd2513ef0 100644 >> --- a/migration/migration.h >> +++ b/migration/migration.h >> @@ -206,6 +206,7 @@ MigrationState *migrate_get_current(void); >> >> bool migrate_postcopy(void); >> >> +bool migrate_bypass_shared_memory(void); >> bool migrate_release_ram(void); >> bool migrate_postcopy_ram(void); >> bool migrate_zero_blocks(void); >> diff --git a/migration/ram.c b/migration/ram.c >> index 0e90efa092..bca170c386 100644 >> --- a/migration/ram.c >> +++ b/migration/ram.c >> @@ -780,6 +780,11 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, >> unsigned long *bitmap = rb->bmap; >> unsigned long next; >> >> + /* when this ramblock is requested bypassing */ >> + if (!bitmap) { >> + return size; >> + } >> + >> if (rs->ram_bulk_stage && start > 0) { >> next = start + 1; >> } else { >> @@ -850,7 +855,9 @@ static void migration_bitmap_sync(RAMState *rs) >> qemu_mutex_lock(&rs->bitmap_mutex); >> rcu_read_lock(); >> RAMBLOCK_FOREACH(block) { >> - migration_bitmap_sync_range(rs, block, 0, block->used_length); >> + if (!migrate_bypass_shared_memory() || !qemu_ram_is_shared(block)) { >> + migration_bitmap_sync_range(rs, block, 0, block->used_length); >> + } >> } >> rcu_read_unlock(); >> qemu_mutex_unlock(&rs->bitmap_mutex); >> @@ -2132,18 +2139,12 @@ static int ram_state_init(RAMState **rsp) >> qemu_mutex_init(&(*rsp)->src_page_req_mutex); >> QSIMPLEQ_INIT(&(*rsp)->src_page_requests); >> >> - /* >> - * Count the total number of pages used by ram blocks not including any >> - * gaps due to alignment or unplugs. >> - */ >> - (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS; >> - >> ram_state_reset(*rsp); >> >> return 0; >> } >> >> -static void ram_list_init_bitmaps(void) >> +static void ram_list_init_bitmaps(RAMState *rs) >> { >> RAMBlock *block; >> unsigned long pages; >> @@ -2151,9 +2152,17 @@ static void ram_list_init_bitmaps(void) >> /* Skip setting bitmap if there is no RAM */ >> if (ram_bytes_total()) { > > I think you need to add here a : > rs->migration_dirty_pages = 0; > > I don't see anywhere else that initialises it, and there is the case of > a migration that fails, followed by a 2nd attempt. > >> QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { >> + if (migrate_bypass_shared_memory() && qemu_ram_is_shared(block)) { >> + continue; >> + } >> pages = block->max_length >> TARGET_PAGE_BITS; >> block->bmap = bitmap_new(pages); >> bitmap_set(block->bmap, 0, pages); >> + /* >> + * Count the total number of pages used by ram blocks not >> + * including any gaps due to alignment or unplugs. >> + */ >> + rs->migration_dirty_pages += pages; >> if (migrate_postcopy_ram()) { >> block->unsentmap = bitmap_new(pages); >> bitmap_set(block->unsentmap, 0, pages); >> @@ -2169,7 +2178,7 @@ static void ram_init_bitmaps(RAMState *rs) >> qemu_mutex_lock_ramlist(); >> rcu_read_lock(); >> >> - ram_list_init_bitmaps(); >> + ram_list_init_bitmaps(rs); >> memory_global_dirty_log_start(); >> migration_bitmap_sync(rs); >> >> diff --git a/qapi/migration.json b/qapi/migration.json >> index 9d0bf82cf4..45326480bd 100644 >> --- a/qapi/migration.json >> +++ b/qapi/migration.json >> @@ -357,13 +357,17 @@ >> # @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps. 
>> # (since 2.12) >> # >> +# @bypass-shared-memory: the shared memory region will be bypassed on migration. >> +# This feature allows the memory region to be reused by new qemu(s) >> +# or be migrated separately. (since 2.13) >> +# >> # Since: 1.2 >> ## >> { 'enum': 'MigrationCapability', >> 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', >> 'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram', >> 'block', 'return-path', 'pause-before-switchover', 'x-multifd', >> - 'dirty-bitmaps' ] } >> + 'dirty-bitmaps', 'bypass-shared-memory' ] } >> >> ## >> # @MigrationCapabilityStatus: >> -- >> 2.14.3 (Apple Git-98) >> > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
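For reference, the existing postcopy/compress conflict check that Dave points to in
migrate_caps_check() has roughly the following shape (paraphrased from memory, not
quoted from this thread); the V5 patch below follows the same pattern for the bypass
capability:

    /* sketch of the existing pattern in migrate_caps_check():
     * mutually incompatible capabilities are rejected up front */
    if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM] &&
        cap_list[MIGRATION_CAPABILITY_COMPRESS]) {
        error_setg(errp, "Postcopy is not currently compatible "
                   "with compression");
        return false;
    }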
1) What's this
When the migration capability 'bypass-shared-memory'
is set, the shared memory will be bypassed when migration.
It is the key feature to enable several excellent features for
the qemu, such as qemu-local-migration, qemu-live-update,
extremely-fast-save-restore, vm-template, vm-fast-live-clone,
yet-another-post-copy-migration, etc..
The philosophy behind this key feature, including the resulting
advanced key features, is that a part of the memory management
is separated out from the qemu, and let the other toolkits
such as libvirt, kata-containers (https://github.com/kata-containers)
runv(https://github.com/hyperhq/runv/) or some multiple cooperative
qemu commands directly access to it, manage it, provide features on it.
2) Status in real world
The hyperhq(http://hyper.sh http://hypercontainer.io/)
introduced the feature vm-template(vm-fast-live-clone)
to the hyper container for several years, it works perfect.
(see https://github.com/hyperhq/runv/pull/297).
The feature vm-template makes the containers(VMs) can
be started in 130ms and save 80M memory for every
container(VM). So that the hyper containers are fast
and high-density as normal containers.
kata-containers project (https://github.com/kata-containers)
which was launched by hyper, intel and friends and which descended
from runv (and clear-container) should have this feature enabled.
Unfortunately, due to the code confliction between runv&cc,
this feature was temporary disabled and it is being brought
back by hyper and intel team.
3) How to use and bring up advanced features.
In current qemu command line, shared memory has
to be configured via memory-object.
a) feature: qemu-local-migration, qemu-live-update
Set the mem-path on the tmpfs and set share=on for it when
start the vm. example:
-object \
memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory,share=on \
-numa node,nodeid=0,cpus=0-7,memdev=mem
when you want to migrate the vm locally (after fixed a security bug
of the qemu-binary, or other reason), you can start a new qemu with
the same command line and -incoming, then you can migrate the
vm from the old qemu to the new qemu with the migration capability
'bypass-shared-memory' set. The migration will migrate the device-state
*ONLY*, the memory is the origin memory backed by tmpfs file.
b) feature: extremely-fast-save-restore
the same above, but the mem-path is on the persistent file system.
c) feature: vm-template, vm-fast-live-clone
the template vm is started as 1), and paused when the guest reaches
the template point(example: the guest app is ready), then the template
vm is saved. (the qemu process of the template can be killed now, because
we need only the memory and the device state files (in tmpfs)).
Then we can launch one or multiple VMs base on the template vm states,
the new VMs are started without the “share=on”, all the new VMs share
the initial memory from the memory file, they save a lot of memory.
all the new VMs start from the template point, the guest app can go to
work quickly.
The new VM booted from template vm can’t become template again,
if you need this unusual chained-template feature, you can write
a cloneable-tmpfs kernel module for it.
The libvirt toolkit can’t manage vm-template currently, in the
hyperhq/runv, we use qemu wrapper script to do it. I hope someone add
“libvrit managed template” feature to libvirt.
d) feature: yet-another-post-copy-migration
It is a possible feature, though no toolkit implements it well yet.
Using an nbd server/client on the memory file is workable but
inconvenient; a dedicated tmpfs feature would probably be needed to
complete it properly.
Nobody needs yet another post-copy migration method today,
but it becomes possible should anyone ever want it.
Cc: Juan Quintela <quintela@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Cc: Markus Armbruster <armbru@redhat.com>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Sebastien Boeuf <sebastien.boeuf@intel.com>
Cc: James O. D. Hunt <james.o.hunt@intel.com>
Cc: Xu Wang <gnawux@gmail.com>
Cc: Peng Tao <bergwolf@gmail.com>
Cc: Xiao Guangrong <xiaoguangrong@tencent.com>
Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com>
Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com>
---
Changes in V5:
check capability conflict in migrate_caps_check()
Changes in V4:
fixes checkpatch.pl errors
Changes in V3:
rebased on upstream master
update the available version of the capability to
v2.13
Changes in V2:
rebased on 2.11.1
migration/migration.c | 22 ++++++++++++++++++++++
migration/migration.h | 1 +
migration/ram.c | 27 ++++++++++++++++++---------
qapi/migration.json | 6 +++++-
4 files changed, 46 insertions(+), 10 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 52a5092add..110b40f6d4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -736,6 +736,19 @@ static bool migrate_caps_check(bool *cap_list,
return false;
}
+ if (cap_list[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY]) {
+ /* Bypass and postcopy are quite conflicting ways
+ * to get memory in the destination. And there
+ * is not code to discriminate the differences and
+ * handle the conflicts currently. It should be possible
+ * to fix, but it is generally useless when both ways
+ * are used together.
+ */
+ error_setg(errp, "Bypass is not currently compatible "
+ "with postcopy");
+ return false;
+ }
+
/* This check is reasonably expensive, so only when it's being
* set the first time, also it's only the destination that needs
* special support.
@@ -1509,6 +1522,15 @@ bool migrate_release_ram(void)
return s->enabled_capabilities[MIGRATION_CAPABILITY_RELEASE_RAM];
}
+bool migrate_bypass_shared_memory(void)
+{
+ MigrationState *s;
+
+ s = migrate_get_current();
+
+ return s->enabled_capabilities[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY];
+}
+
bool migrate_postcopy_ram(void)
{
MigrationState *s;
diff --git a/migration/migration.h b/migration/migration.h
index 8d2f320c48..cfd2513ef0 100644
--- a/migration/migration.h
+++ b/migration/migration.h
@@ -206,6 +206,7 @@ MigrationState *migrate_get_current(void);
bool migrate_postcopy(void);
+bool migrate_bypass_shared_memory(void);
bool migrate_release_ram(void);
bool migrate_postcopy_ram(void);
bool migrate_zero_blocks(void);
diff --git a/migration/ram.c b/migration/ram.c
index 0e90efa092..bca170c386 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -780,6 +780,11 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb,
unsigned long *bitmap = rb->bmap;
unsigned long next;
+ /* when this ramblock is requested bypassing */
+ if (!bitmap) {
+ return size;
+ }
+
if (rs->ram_bulk_stage && start > 0) {
next = start + 1;
} else {
@@ -850,7 +855,9 @@ static void migration_bitmap_sync(RAMState *rs)
qemu_mutex_lock(&rs->bitmap_mutex);
rcu_read_lock();
RAMBLOCK_FOREACH(block) {
- migration_bitmap_sync_range(rs, block, 0, block->used_length);
+ if (!migrate_bypass_shared_memory() || !qemu_ram_is_shared(block)) {
+ migration_bitmap_sync_range(rs, block, 0, block->used_length);
+ }
}
rcu_read_unlock();
qemu_mutex_unlock(&rs->bitmap_mutex);
@@ -2132,18 +2139,12 @@ static int ram_state_init(RAMState **rsp)
qemu_mutex_init(&(*rsp)->src_page_req_mutex);
QSIMPLEQ_INIT(&(*rsp)->src_page_requests);
- /*
- * Count the total number of pages used by ram blocks not including any
- * gaps due to alignment or unplugs.
- */
- (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS;
-
ram_state_reset(*rsp);
return 0;
}
-static void ram_list_init_bitmaps(void)
+static void ram_list_init_bitmaps(RAMState *rs)
{
RAMBlock *block;
unsigned long pages;
@@ -2151,9 +2152,17 @@ static void ram_list_init_bitmaps(void)
/* Skip setting bitmap if there is no RAM */
if (ram_bytes_total()) {
QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
+ if (migrate_bypass_shared_memory() && qemu_ram_is_shared(block)) {
+ continue;
+ }
pages = block->max_length >> TARGET_PAGE_BITS;
block->bmap = bitmap_new(pages);
bitmap_set(block->bmap, 0, pages);
+ /*
+ * Count the total number of pages used by ram blocks not
+ * including any gaps due to alignment or unplugs.
+ */
+ rs->migration_dirty_pages += pages;
if (migrate_postcopy_ram()) {
block->unsentmap = bitmap_new(pages);
bitmap_set(block->unsentmap, 0, pages);
@@ -2169,7 +2178,7 @@ static void ram_init_bitmaps(RAMState *rs)
qemu_mutex_lock_ramlist();
rcu_read_lock();
- ram_list_init_bitmaps();
+ ram_list_init_bitmaps(rs);
memory_global_dirty_log_start();
migration_bitmap_sync(rs);
diff --git a/qapi/migration.json b/qapi/migration.json
index 9d0bf82cf4..45326480bd 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -357,13 +357,17 @@
# @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps.
# (since 2.12)
#
+# @bypass-shared-memory: the shared memory region will be bypassed on migration.
+# This feature allows the memory region to be reused by new qemu(s)
+# or be migrated separately. (since 2.13)
+#
# Since: 1.2
##
{ 'enum': 'MigrationCapability',
'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks',
'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram',
'block', 'return-path', 'pause-before-switchover', 'x-multifd',
- 'dirty-bitmaps' ] }
+ 'dirty-bitmaps', 'bypass-shared-memory' ] }
##
# @MigrationCapabilityStatus:
--
2.15.1 (Apple Git-101)
* Lai Jiangshan (jiangshanlai@gmail.com) wrote: > 1) What's this > > When the migration capability 'bypass-shared-memory' > is set, the shared memory will be bypassed when migration. > > It is the key feature to enable several excellent features for > the qemu, such as qemu-local-migration, qemu-live-update, > extremely-fast-save-restore, vm-template, vm-fast-live-clone, > yet-another-post-copy-migration, etc.. > > The philosophy behind this key feature, including the resulting > advanced key features, is that a part of the memory management > is separated out from the qemu, and let the other toolkits > such as libvirt, kata-containers (https://github.com/kata-containers) > runv(https://github.com/hyperhq/runv/) or some multiple cooperative > qemu commands directly access to it, manage it, provide features on it. > > 2) Status in real world > > The hyperhq(http://hyper.sh http://hypercontainer.io/) > introduced the feature vm-template(vm-fast-live-clone) > to the hyper container for several years, it works perfect. > (see https://github.com/hyperhq/runv/pull/297). > > The feature vm-template makes the containers(VMs) can > be started in 130ms and save 80M memory for every > container(VM). So that the hyper containers are fast > and high-density as normal containers. > > kata-containers project (https://github.com/kata-containers) > which was launched by hyper, intel and friends and which descended > from runv (and clear-container) should have this feature enabled. > Unfortunately, due to the code confliction between runv&cc, > this feature was temporary disabled and it is being brought > back by hyper and intel team. > 3) How to use and bring up advanced features. > > In current qemu command line, shared memory has > to be configured via memory-object. > > a) feature: qemu-local-migration, qemu-live-update > Set the mem-path on the tmpfs and set share=on for it when > start the vm. example: > -object \ > memory-backend-file,id=mem,size=128M,mem-path=/dev/shm/memory,share=on \ > -numa node,nodeid=0,cpus=0-7,memdev=mem > > when you want to migrate the vm locally (after fixed a security bug > of the qemu-binary, or other reason), you can start a new qemu with > the same command line and -incoming, then you can migrate the > vm from the old qemu to the new qemu with the migration capability > 'bypass-shared-memory' set. The migration will migrate the device-state > *ONLY*, the memory is the origin memory backed by tmpfs file. > > b) feature: extremely-fast-save-restore > the same above, but the mem-path is on the persistent file system. > > c) feature: vm-template, vm-fast-live-clone > the template vm is started as 1), and paused when the guest reaches > the template point(example: the guest app is ready), then the template > vm is saved. (the qemu process of the template can be killed now, because > we need only the memory and the device state files (in tmpfs)). > > Then we can launch one or multiple VMs base on the template vm states, > the new VMs are started without the “share=on”, all the new VMs share > the initial memory from the memory file, they save a lot of memory. > all the new VMs start from the template point, the guest app can go to > work quickly. > > The new VM booted from template vm can’t become template again, > if you need this unusual chained-template feature, you can write > a cloneable-tmpfs kernel module for it. > I've just tried doing something similar with this patch; it's really interesting. 
I used LVM snapshotting for the RAM: cd /dev/shm fallocate -l 20G backingfile losetup -f ./backingfile pvcreate /dev/loop0 vgcreate ram /dev/loop0 lvcreate -L4G -nram1 ram /dev/loop0 qemu -M pc,accel=kvm -m 4G -object memory-backend-file,id=mem,size=4G,mem-path=/dev/ram/ram1,share=on -numa node,memdev=mem -vnc :0 -drive file=my.qcow2,id=d,cache=none -monitor stdio boot the VM, and do a : migrate_set_capability bypass-shared-memory on migrate_set_speed 10G migrate "exec:cat > migstream1" q then: lvcreate -n ramsnap1 -s ram/ram1 -L4G qemu -M pc,accel=kvm -m 4G -object memory-backend-file,id=mem,size=4G,mem-path=/dev/ram/ramsnap1,share=on -numa node,memdev=mem -vnc :0 -drive file=my.qcow2,id=d,cache=none -monitor stdio -snapshot -incoming "exec:cat migstream1" lvcreate -n ramsnap2 -s ram/ram1 -L4G qemu -M pc,accel=kvm -m 4G -object memory-backend-file,id=mem,size=4G,mem-path=/dev/ram/ramsnap2,share=on -numa node,memdev=mem -vnc :1 -drive file=my.qcow2,id=d,cache=none -monitor stdio -snapshot -incoming "exec:cat migstream1" and I've got two separate instances of qemu restored from that stream. It seems to work; I wonder if we ever need things like msync() or similar? I've not tried creating a 2nd template with this. > The libvirt toolkit can’t manage vm-template currently, in the > hyperhq/runv, we use qemu wrapper script to do it. I hope someone add > “libvrit managed template” feature to libvirt. > > d) feature: yet-another-post-copy-migration > It is a possible feature, no toolkit can do it well now. > Using nbd server/client on the memory file is reluctantly Ok but > inconvenient. A special feature for tmpfs might be needed to > fully complete this feature. > No one need yet another post copy migration method, > but it is possible when some crazy man need it. > > Cc: Juan Quintela <quintela@redhat.com> > Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com> > Cc: Eric Blake <eblake@redhat.com> > Cc: Markus Armbruster <armbru@redhat.com> > Cc: Samuel Ortiz <sameo@linux.intel.com> > Cc: Sebastien Boeuf <sebastien.boeuf@intel.com> > Cc: James O. D. Hunt <james.o.hunt@intel.com> > Cc: Xu Wang <gnawux@gmail.com> > Cc: Peng Tao <bergwolf@gmail.com> > Cc: Xiao Guangrong <xiaoguangrong@tencent.com> > Cc: Xiao Guangrong <xiaoguangrong.eric@gmail.com> > Signed-off-by: Lai Jiangshan <jiangshanlai@gmail.com> > --- > > Changes in V5: > check cappability conflict in migrate_caps_check() > > Changes in V4: > fixes checkpatch.pl errors > > Changes in V3: > rebased on upstream master > update the available version of the capability to > v2.13 > > Changes in V2: > rebased on 2.11.1 > > migration/migration.c | 22 ++++++++++++++++++++++ > migration/migration.h | 1 + > migration/ram.c | 27 ++++++++++++++++++--------- > qapi/migration.json | 6 +++++- > 4 files changed, 46 insertions(+), 10 deletions(-) > > diff --git a/migration/migration.c b/migration/migration.c > index 52a5092add..110b40f6d4 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -736,6 +736,19 @@ static bool migrate_caps_check(bool *cap_list, > return false; > } > > + if (cap_list[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY]) { > + /* Bypass and postcopy are quite conflicting ways > + * to get memory in the destination. And there > + * is not code to discriminate the differences and > + * handle the conflicts currently. It should be possible > + * to fix, but it is generally useless when both ways > + * are used together. 
> + */ > + error_setg(errp, "Bypass is not currently compatible " > + "with postcopy"); > + return false; > + } > + Good. > /* This check is reasonably expensive, so only when it's being > * set the first time, also it's only the destination that needs > * special support. > @@ -1509,6 +1522,15 @@ bool migrate_release_ram(void) > return s->enabled_capabilities[MIGRATION_CAPABILITY_RELEASE_RAM]; > } > > +bool migrate_bypass_shared_memory(void) > +{ > + MigrationState *s; > + > + s = migrate_get_current(); > + > + return s->enabled_capabilities[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY]; > +} > + > bool migrate_postcopy_ram(void) > { > MigrationState *s; > diff --git a/migration/migration.h b/migration/migration.h > index 8d2f320c48..cfd2513ef0 100644 > --- a/migration/migration.h > +++ b/migration/migration.h > @@ -206,6 +206,7 @@ MigrationState *migrate_get_current(void); > > bool migrate_postcopy(void); > > +bool migrate_bypass_shared_memory(void); > bool migrate_release_ram(void); > bool migrate_postcopy_ram(void); > bool migrate_zero_blocks(void); > diff --git a/migration/ram.c b/migration/ram.c > index 0e90efa092..bca170c386 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -780,6 +780,11 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, > unsigned long *bitmap = rb->bmap; > unsigned long next; > > + /* when this ramblock is requested bypassing */ > + if (!bitmap) { > + return size; > + } > + > if (rs->ram_bulk_stage && start > 0) { > next = start + 1; > } else { > @@ -850,7 +855,9 @@ static void migration_bitmap_sync(RAMState *rs) > qemu_mutex_lock(&rs->bitmap_mutex); > rcu_read_lock(); > RAMBLOCK_FOREACH(block) { > - migration_bitmap_sync_range(rs, block, 0, block->used_length); > + if (!migrate_bypass_shared_memory() || !qemu_ram_is_shared(block)) { > + migration_bitmap_sync_range(rs, block, 0, block->used_length); > + } > } > rcu_read_unlock(); > qemu_mutex_unlock(&rs->bitmap_mutex); > @@ -2132,18 +2139,12 @@ static int ram_state_init(RAMState **rsp) > qemu_mutex_init(&(*rsp)->src_page_req_mutex); > QSIMPLEQ_INIT(&(*rsp)->src_page_requests); > > - /* > - * Count the total number of pages used by ram blocks not including any > - * gaps due to alignment or unplugs. > - */ > - (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS; > - > ram_state_reset(*rsp); > > return 0; > } > > -static void ram_list_init_bitmaps(void) > +static void ram_list_init_bitmaps(RAMState *rs) > { > RAMBlock *block; > unsigned long pages; > @@ -2151,9 +2152,17 @@ static void ram_list_init_bitmaps(void) > /* Skip setting bitmap if there is no RAM */ > if (ram_bytes_total()) { > QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { > + if (migrate_bypass_shared_memory() && qemu_ram_is_shared(block)) { > + continue; > + } > pages = block->max_length >> TARGET_PAGE_BITS; > block->bmap = bitmap_new(pages); > bitmap_set(block->bmap, 0, pages); > + /* > + * Count the total number of pages used by ram blocks not > + * including any gaps due to alignment or unplugs. > + */ > + rs->migration_dirty_pages += pages; > if (migrate_postcopy_ram()) { > block->unsentmap = bitmap_new(pages); > bitmap_set(block->unsentmap, 0, pages); Can you please rework this to combine with Cédric Le Goater's 'discard non-migratable RAMBlocks' - it's quite similar to what you're trying to do but for a different reason; If you look at the v2 from April 13, I think you can just find somewhere to clear the RAM_MIGRATABLE flag. 
One thing I noticed; in my world I've got some code that checks if we ever do a RAM iteration, don't find any dirty blocks but then still have migration_dirty_pages being none-0; and with this patch I'm seeing that check trigger: ram_find_and_save_block: no page found, yet dirty_pages=480 it doesn't seem to trigger without the patch. Dave > @@ -2169,7 +2178,7 @@ static void ram_init_bitmaps(RAMState *rs) > qemu_mutex_lock_ramlist(); > rcu_read_lock(); > > - ram_list_init_bitmaps(); > + ram_list_init_bitmaps(rs); > memory_global_dirty_log_start(); > migration_bitmap_sync(rs); > > diff --git a/qapi/migration.json b/qapi/migration.json > index 9d0bf82cf4..45326480bd 100644 > --- a/qapi/migration.json > +++ b/qapi/migration.json > @@ -357,13 +357,17 @@ > # @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps. > # (since 2.12) > # > +# @bypass-shared-memory: the shared memory region will be bypassed on migration. > +# This feature allows the memory region to be reused by new qemu(s) > +# or be migrated separately. (since 2.13) > +# > # Since: 1.2 > ## > { 'enum': 'MigrationCapability', > 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', > 'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram', > 'block', 'return-path', 'pause-before-switchover', 'x-multifd', > - 'dirty-bitmaps' ] } > + 'dirty-bitmaps', 'bypass-shared-memory' ] } > > ## > # @MigrationCapabilityStatus: > -- > 2.15.1 (Apple Git-101) > -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
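On the msync() question raised above: if flushing did turn out to be necessary before
snapshotting the backing device outside QEMU, the call involved would look roughly like
the hypothetical helper below. This is only a sketch of the idea, not part of the patch;
the helper name and its callers are invented for illustration.

    #include <sys/mman.h>
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: flush a shared, file-backed guest RAM mapping
     * to its backing store before an external snapshot (such as the LVM
     * snapshot in the recipe above) is taken.  'host' and 'length' would
     * come from the RAMBlock's mapping. */
    static int flush_shared_ram(void *host, size_t length)
    {
        if (msync(host, length, MS_SYNC) < 0) {
            fprintf(stderr, "msync: %s\n", strerror(errno));
            return -1;
        }
        return 0;
    }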
On Fri, Apr 20, 2018 at 12:38 AM, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: >> -static void ram_list_init_bitmaps(void) >> +static void ram_list_init_bitmaps(RAMState *rs) >> { >> RAMBlock *block; >> unsigned long pages; >> @@ -2151,9 +2152,17 @@ static void ram_list_init_bitmaps(void) >> /* Skip setting bitmap if there is no RAM */ >> if (ram_bytes_total()) { >> QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { >> + if (migrate_bypass_shared_memory() && qemu_ram_is_shared(block)) { >> + continue; >> + } >> pages = block->max_length >> TARGET_PAGE_BITS; >> block->bmap = bitmap_new(pages); >> bitmap_set(block->bmap, 0, pages); >> + /* >> + * Count the total number of pages used by ram blocks not >> + * including any gaps due to alignment or unplugs. >> + */ >> + rs->migration_dirty_pages += pages; >> if (migrate_postcopy_ram()) { >> block->unsentmap = bitmap_new(pages); >> bitmap_set(block->unsentmap, 0, pages); > > Can you please rework this to combine with Cédric Le Goater's > 'discard non-migratable RAMBlocks' - it's quite similar to what you're > trying to do but for a different reason; If you look at the v2 from > April 13, I think you can just find somewhere to clear the > RAM_MIGRATABLE flag. Hello Dave: It seems we need to add new qmp/hmp command to clear/add RAM_MIGRATABLE flag which is overkill for such a simple feature. Please point out if there is any simple way to do so. And this kind of memory is not "un-MIGRATABLE", the user just decided not to migrate it/them for one of the migrations. But they are always MIGRATABLE regardless the user migrate them or not. So clearing/setting the flag may cause confusion in this case. What do you think? Bypassing is an option for every migration. For the same vm instance, the user might migrate it out multiple times. He wants to bypass shared memory in some migrations and do the normal migrations in other times. So it is better that Bypassing is an option or capability of migration instead of ramblock. I don't insist on avoiding using RAM_MIGRATABLE. Thanks, Lai > > One thing I noticed; in my world I've got some code that checks if we > ever do a RAM iteration, don't find any dirty blocks but then still have > migration_dirty_pages being none-0; and with this patch I'm seeing that > check trigger: > > ram_find_and_save_block: no page found, yet dirty_pages=480 > > it doesn't seem to trigger without the patch. Does initializing the migration_dirty_pages as you suggested help? > > Dave > >> @@ -2169,7 +2178,7 @@ static void ram_init_bitmaps(RAMState *rs) >> qemu_mutex_lock_ramlist(); >> rcu_read_lock(); >> >> - ram_list_init_bitmaps(); >> + ram_list_init_bitmaps(rs); >> memory_global_dirty_log_start(); >> migration_bitmap_sync(rs); >> >> diff --git a/qapi/migration.json b/qapi/migration.json >> index 9d0bf82cf4..45326480bd 100644 >> --- a/qapi/migration.json >> +++ b/qapi/migration.json >> @@ -357,13 +357,17 @@ >> # @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps. >> # (since 2.12) >> # >> +# @bypass-shared-memory: the shared memory region will be bypassed on migration. >> +# This feature allows the memory region to be reused by new qemu(s) >> +# or be migrated separately. 
(since 2.13) >> +# >> # Since: 1.2 >> ## >> { 'enum': 'MigrationCapability', >> 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', >> 'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram', >> 'block', 'return-path', 'pause-before-switchover', 'x-multifd', >> - 'dirty-bitmaps' ] } >> + 'dirty-bitmaps', 'bypass-shared-memory' ] } >> >> ## >> # @MigrationCapabilityStatus: >> -- >> 2.15.1 (Apple Git-101) >> > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
* Lai Jiangshan (jiangshanlai@gmail.com) wrote: > On Fri, Apr 20, 2018 at 12:38 AM, Dr. David Alan Gilbert > <dgilbert@redhat.com> wrote: > > >> -static void ram_list_init_bitmaps(void) > >> +static void ram_list_init_bitmaps(RAMState *rs) > >> { > >> RAMBlock *block; > >> unsigned long pages; > >> @@ -2151,9 +2152,17 @@ static void ram_list_init_bitmaps(void) > >> /* Skip setting bitmap if there is no RAM */ > >> if (ram_bytes_total()) { > >> QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { > >> + if (migrate_bypass_shared_memory() && qemu_ram_is_shared(block)) { > >> + continue; > >> + } > >> pages = block->max_length >> TARGET_PAGE_BITS; > >> block->bmap = bitmap_new(pages); > >> bitmap_set(block->bmap, 0, pages); > >> + /* > >> + * Count the total number of pages used by ram blocks not > >> + * including any gaps due to alignment or unplugs. > >> + */ > >> + rs->migration_dirty_pages += pages; > >> if (migrate_postcopy_ram()) { > >> block->unsentmap = bitmap_new(pages); > >> bitmap_set(block->unsentmap, 0, pages); > > > > Can you please rework this to combine with Cédric Le Goater's > > 'discard non-migratable RAMBlocks' - it's quite similar to what you're > > trying to do but for a different reason; If you look at the v2 from > > April 13, I think you can just find somewhere to clear the > > RAM_MIGRATABLE flag. > > Hello Dave: > > It seems we need to add new qmp/hmp command to clear/add > RAM_MIGRATABLE flag which is overkill for such a simple feature. > Please point out if there is any simple way to do so. I'm fine with you still using a capability to enable/disable it - I think that part of your patch is fine; but then I think you just need to check that capability somewhere in Cedric's code; perhaps in his qemu_ram_is_migratable? > And this kind of memory is not "un-MIGRATABLE", the user > just decided not to migrate it/them for one of the migrations. > But they are always MIGRATABLE regardless the user migrate > them or not. So clearing/setting the flag may > cause confusion in this case. What do you think? The 'RAM_MIGRATABLE' is just an internal name for the flag; it's not seen by the user; it's as good a name as any. > Bypassing is an option for every migration. For the > same vm instance, the user might migrate it out > multiple times. He wants to bypass shared memory > in some migrations and do the normal migrations in > other times. So it is better that Bypassing is an option > or capability of migration instead of ramblock. > > I don't insist on avoiding using RAM_MIGRATABLE. and so it might be best for you not to change the flag, just to add to qemu_ram_is_migratable. > Thanks, > Lai > > > > > One thing I noticed; in my world I've got some code that checks if we > > ever do a RAM iteration, don't find any dirty blocks but then still have > > migration_dirty_pages being none-0; and with this patch I'm seeing that > > check trigger: > > > > ram_find_and_save_block: no page found, yet dirty_pages=480 > > > > it doesn't seem to trigger without the patch. > > Does initializing the migration_dirty_pages as you suggested help? I've not had a chance to try yet; here is the debug patch I've got: @@ -1594,6 +1594,13 @@ static int ram_find_and_save_block(RAMState *rs, bool last_stage) } } while (!pages && again); + if (!pages && !again && pss.complete_round && rs->migration_dirty_pages) + { + /* Should make this fail migration ? 
*/ + fprintf(stderr, "%s: no page found, yet dirty_pages=%"PRIu64"\n", + __func__, rs->migration_dirty_pages); + } + rs->last_seen_block = pss.block; rs->last_page = pss.page; Dave > > > > Dave > > > >> @@ -2169,7 +2178,7 @@ static void ram_init_bitmaps(RAMState *rs) > >> qemu_mutex_lock_ramlist(); > >> rcu_read_lock(); > >> > >> - ram_list_init_bitmaps(); > >> + ram_list_init_bitmaps(rs); > >> memory_global_dirty_log_start(); > >> migration_bitmap_sync(rs); > >> > >> diff --git a/qapi/migration.json b/qapi/migration.json > >> index 9d0bf82cf4..45326480bd 100644 > >> --- a/qapi/migration.json > >> +++ b/qapi/migration.json > >> @@ -357,13 +357,17 @@ > >> # @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps. > >> # (since 2.12) > >> # > >> +# @bypass-shared-memory: the shared memory region will be bypassed on migration. > >> +# This feature allows the memory region to be reused by new qemu(s) > >> +# or be migrated separately. (since 2.13) > >> +# > >> # Since: 1.2 > >> ## > >> { 'enum': 'MigrationCapability', > >> 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', > >> 'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram', > >> 'block', 'return-path', 'pause-before-switchover', 'x-multifd', > >> - 'dirty-bitmaps' ] } > >> + 'dirty-bitmaps', 'bypass-shared-memory' ] } > >> > >> ## > >> # @MigrationCapabilityStatus: > >> -- > >> 2.15.1 (Apple Git-101) > >> > > -- > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
On 04/26/2018 09:05 PM, Dr. David Alan Gilbert wrote: >>> Can you please rework this to combine with Cédric Le Goater's >>> 'discard non-migratable RAMBlocks' - it's quite similar to what you're >>> trying to do but for a different reason; If you look at the v2 from >>> April 13, I think you can just find somewhere to clear the >>> RAM_MIGRATABLE flag. >> Hello Dave: >> >> It seems we need to add new qmp/hmp command to clear/add >> RAM_MIGRATABLE flag which is overkill for such a simple feature. >> Please point out if there is any simple way to do so. > I'm fine with you still using a capability to enable/disable it - I > think that part of your patch is fine; but then I think you just > need to check that capability somewhere in Cedric's code; perhaps in his > qemu_ram_is_migratable? > I have a v3 for this patch but it only adds an error_report(). Working on the v2 should be fine. Thanks, C.
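A minimal sketch of the direction being discussed, assuming Cédric's
qemu_ram_is_migratable() helper from the 'discard non-migratable RAMBlocks' series is
available; the wrapper name below is hypothetical and only illustrates where the
capability check could live:

    /* Hypothetical wrapper: RAM_MIGRATABLE stays untouched, the bypass
     * capability merely decides whether an otherwise migratable shared
     * block is sent during this particular migration. */
    static bool ramblock_is_sent_this_migration(RAMBlock *block)
    {
        if (!qemu_ram_is_migratable(block)) {
            return false;
        }
        return !(migrate_bypass_shared_memory() && qemu_ram_is_shared(block));
    }

ram_list_init_bitmaps() and migration_bitmap_sync() could then both filter on this one
predicate instead of open-coding the capability check twice.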
On Mon, Apr 16, 2018 at 11:00:11PM +0800, Lai Jiangshan wrote: > > migration/migration.c | 22 ++++++++++++++++++++++ > migration/migration.h | 1 + > migration/ram.c | 27 ++++++++++++++++++--------- > qapi/migration.json | 6 +++++- > 4 files changed, 46 insertions(+), 10 deletions(-) > > diff --git a/migration/migration.c b/migration/migration.c > index 52a5092add..110b40f6d4 100644 > --- a/migration/migration.c > +++ b/migration/migration.c > @@ -736,6 +736,19 @@ static bool migrate_caps_check(bool *cap_list, > return false; > } > > + if (cap_list[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY]) { > + /* Bypass and postcopy are quite conflicting ways > + * to get memory in the destination. And there > + * is not code to discriminate the differences and > + * handle the conflicts currently. It should be possible > + * to fix, but it is generally useless when both ways > + * are used together. > + */ > + error_setg(errp, "Bypass is not currently compatible " > + "with postcopy"); > + return false; > + } > + > /* This check is reasonably expensive, so only when it's being > * set the first time, also it's only the destination that needs > * special support. > @@ -1509,6 +1522,15 @@ bool migrate_release_ram(void) > return s->enabled_capabilities[MIGRATION_CAPABILITY_RELEASE_RAM]; > } > > +bool migrate_bypass_shared_memory(void) > +{ > + MigrationState *s; > + > + s = migrate_get_current(); > + > + return s->enabled_capabilities[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY]; > +} > + > bool migrate_postcopy_ram(void) > { > MigrationState *s; > diff --git a/migration/migration.h b/migration/migration.h > index 8d2f320c48..cfd2513ef0 100644 > --- a/migration/migration.h > +++ b/migration/migration.h > @@ -206,6 +206,7 @@ MigrationState *migrate_get_current(void); > > bool migrate_postcopy(void); > > +bool migrate_bypass_shared_memory(void); > bool migrate_release_ram(void); > bool migrate_postcopy_ram(void); > bool migrate_zero_blocks(void); > diff --git a/migration/ram.c b/migration/ram.c > index 0e90efa092..bca170c386 100644 > --- a/migration/ram.c > +++ b/migration/ram.c > @@ -780,6 +780,11 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, > unsigned long *bitmap = rb->bmap; > unsigned long next; > > + /* when this ramblock is requested bypassing */ > + if (!bitmap) { > + return size; > + } > + > if (rs->ram_bulk_stage && start > 0) { > next = start + 1; > } else { > @@ -850,7 +855,9 @@ static void migration_bitmap_sync(RAMState *rs) > qemu_mutex_lock(&rs->bitmap_mutex); > rcu_read_lock(); > RAMBLOCK_FOREACH(block) { > - migration_bitmap_sync_range(rs, block, 0, block->used_length); > + if (!migrate_bypass_shared_memory() || !qemu_ram_is_shared(block)) { > + migration_bitmap_sync_range(rs, block, 0, block->used_length); > + } > } > rcu_read_unlock(); > qemu_mutex_unlock(&rs->bitmap_mutex); > @@ -2132,18 +2139,12 @@ static int ram_state_init(RAMState **rsp) > qemu_mutex_init(&(*rsp)->src_page_req_mutex); > QSIMPLEQ_INIT(&(*rsp)->src_page_requests); > > - /* > - * Count the total number of pages used by ram blocks not including any > - * gaps due to alignment or unplugs. 
> - */ > - (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS; > - > ram_state_reset(*rsp); > > return 0; > } > > -static void ram_list_init_bitmaps(void) > +static void ram_list_init_bitmaps(RAMState *rs) > { > RAMBlock *block; > unsigned long pages; > @@ -2151,9 +2152,17 @@ static void ram_list_init_bitmaps(void) > /* Skip setting bitmap if there is no RAM */ > if (ram_bytes_total()) { > QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { > + if (migrate_bypass_shared_memory() && qemu_ram_is_shared(block)) { > + continue; > + } > pages = block->max_length >> TARGET_PAGE_BITS; > block->bmap = bitmap_new(pages); > bitmap_set(block->bmap, 0, pages); > + /* > + * Count the total number of pages used by ram blocks not > + * including any gaps due to alignment or unplugs. > + */ > + rs->migration_dirty_pages += pages; Hi Jiangshan, I think you should use 'block->used_length >> TARGET_PAGE_BITS' instead of pages here. As I have said before, we should skip dirty logging the related operations of the shared memory to speed up the live migration process, and more important, skipping dirty log can avoid splitting the EPT entry from 2M/1G to 4K if transparent hugpage is used, and thus avoid performance degradation after migration. Some other things we should pay attention to is that some virtio devices may change the vring status when the source qemu process exit, we have some find in the previous version of QEMU, e.g. 2.6. thanks! Liang > if (migrate_postcopy_ram()) { > block->unsentmap = bitmap_new(pages); > bitmap_set(block->unsentmap, 0, pages); > @@ -2169,7 +2178,7 @@ static void ram_init_bitmaps(RAMState *rs) > qemu_mutex_lock_ramlist(); > rcu_read_lock(); > > - ram_list_init_bitmaps(); > + ram_list_init_bitmaps(rs); > memory_global_dirty_log_start(); > migration_bitmap_sync(rs); > > diff --git a/qapi/migration.json b/qapi/migration.json > index 9d0bf82cf4..45326480bd 100644 > --- a/qapi/migration.json > +++ b/qapi/migration.json > @@ -357,13 +357,17 @@ > # @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps. > # (since 2.12) > # > +# @bypass-shared-memory: the shared memory region will be bypassed on migration. > +# This feature allows the memory region to be reused by new qemu(s) > +# or be migrated separately. (since 2.13) > +# > # Since: 1.2 > ## > { 'enum': 'MigrationCapability', > 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', > 'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram', > 'block', 'return-path', 'pause-before-switchover', 'x-multifd', > - 'dirty-bitmaps' ] } > + 'dirty-bitmaps', 'bypass-shared-memory' ] } > > ## > # @MigrationCapabilityStatus: > -- > 2.15.1 (Apple Git-101) > >
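Liang's suggestion, expressed as a sketch of the loop body from the hunk above: keep
sizing the bitmap from max_length, but account dirty pages from used_length so unused
tail pages are not counted. This is only an illustration of the review comment, not the
posted patch:

    if (migrate_bypass_shared_memory() && qemu_ram_is_shared(block)) {
        continue;
    }
    pages = block->max_length >> TARGET_PAGE_BITS;
    block->bmap = bitmap_new(pages);
    bitmap_set(block->bmap, 0, pages);
    /* Count only the pages covered by used_length, excluding any gaps
     * due to alignment or unplugs (per Liang's review comment). */
    rs->migration_dirty_pages += block->used_length >> TARGET_PAGE_BITS;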
On Tue, Apr 10, 2018 at 1:30 AM, Dr. David Alan Gilbert <dgilbert@redhat.com> wrote: >> >> +bool migrate_bypass_shared_memory(void) >> +{ >> + MigrationState *s; >> + >> + /* it is not workable with postcopy yet. */ >> + if (migrate_postcopy_ram()) { >> + return false; >> + } > > Please change this to work in the same way as the check for > postcopy+compress in migration.c migrate_caps_check. done in V5. > >> + s = migrate_get_current(); >> + >> + return s->enabled_capabilities[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY]; >> +} >> + >> bool migrate_postcopy_ram(void) >> { >> MigrationState *s; >> diff --git a/migration/migration.h b/migration/migration.h >> index 8d2f320c48..cfd2513ef0 100644 >> --- a/migration/migration.h >> +++ b/migration/migration.h >> @@ -206,6 +206,7 @@ MigrationState *migrate_get_current(void); >> >> bool migrate_postcopy(void); >> >> +bool migrate_bypass_shared_memory(void); >> bool migrate_release_ram(void); >> bool migrate_postcopy_ram(void); >> bool migrate_zero_blocks(void); >> diff --git a/migration/ram.c b/migration/ram.c >> index 0e90efa092..bca170c386 100644 >> --- a/migration/ram.c >> +++ b/migration/ram.c >> @@ -780,6 +780,11 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, >> unsigned long *bitmap = rb->bmap; >> unsigned long next; >> >> + /* when this ramblock is requested bypassing */ >> + if (!bitmap) { >> + return size; >> + } >> + >> if (rs->ram_bulk_stage && start > 0) { >> next = start + 1; >> } else { >> @@ -850,7 +855,9 @@ static void migration_bitmap_sync(RAMState *rs) >> qemu_mutex_lock(&rs->bitmap_mutex); >> rcu_read_lock(); >> RAMBLOCK_FOREACH(block) { >> - migration_bitmap_sync_range(rs, block, 0, block->used_length); >> + if (!migrate_bypass_shared_memory() || !qemu_ram_is_shared(block)) { >> + migration_bitmap_sync_range(rs, block, 0, block->used_length); >> + } >> } >> rcu_read_unlock(); >> qemu_mutex_unlock(&rs->bitmap_mutex); >> @@ -2132,18 +2139,12 @@ static int ram_state_init(RAMState **rsp) >> qemu_mutex_init(&(*rsp)->src_page_req_mutex); >> QSIMPLEQ_INIT(&(*rsp)->src_page_requests); >> >> - /* >> - * Count the total number of pages used by ram blocks not including any >> - * gaps due to alignment or unplugs. >> - */ >> - (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS; >> - >> ram_state_reset(*rsp); >> >> return 0; >> } >> >> -static void ram_list_init_bitmaps(void) >> +static void ram_list_init_bitmaps(RAMState *rs) >> { >> RAMBlock *block; >> unsigned long pages; >> @@ -2151,9 +2152,17 @@ static void ram_list_init_bitmaps(void) >> /* Skip setting bitmap if there is no RAM */ >> if (ram_bytes_total()) { > > I think you need to add here a : > rs->migration_dirty_pages = 0; In ram_state_init(), *rsp = g_try_new0(RAMState, 1); so the state is always reset. > > I don't see anywhere else that initialises it, and there is the case of > a migration that fails, followed by a 2nd attempt. > >> QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { >> + if (migrate_bypass_shared_memory() && qemu_ram_is_shared(block)) { >> + continue; >> + } >> pages = block->max_length >> TARGET_PAGE_BITS; >> block->bmap = bitmap_new(pages); >> bitmap_set(block->bmap, 0, pages); >> + /* >> + * Count the total number of pages used by ram blocks not >> + * including any gaps due to alignment or unplugs. 
>> + */ >> + rs->migration_dirty_pages += pages; >> if (migrate_postcopy_ram()) { >> block->unsentmap = bitmap_new(pages); >> bitmap_set(block->unsentmap, 0, pages); >> @@ -2169,7 +2178,7 @@ static void ram_init_bitmaps(RAMState *rs) >> qemu_mutex_lock_ramlist(); >> rcu_read_lock(); >> >> - ram_list_init_bitmaps(); >> + ram_list_init_bitmaps(rs); >> memory_global_dirty_log_start(); >> migration_bitmap_sync(rs); >> >> diff --git a/qapi/migration.json b/qapi/migration.json >> index 9d0bf82cf4..45326480bd 100644 >> --- a/qapi/migration.json >> +++ b/qapi/migration.json >> @@ -357,13 +357,17 @@ >> # @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps. >> # (since 2.12) >> # >> +# @bypass-shared-memory: the shared memory region will be bypassed on migration. >> +# This feature allows the memory region to be reused by new qemu(s) >> +# or be migrated separately. (since 2.13) >> +# >> # Since: 1.2 >> ## >> { 'enum': 'MigrationCapability', >> 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', >> 'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram', >> 'block', 'return-path', 'pause-before-switchover', 'x-multifd', >> - 'dirty-bitmaps' ] } >> + 'dirty-bitmaps', 'bypass-shared-memory' ] } >> >> ## >> # @MigrationCapabilityStatus: >> -- >> 2.14.3 (Apple Git-98) >> > -- > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
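Lai's point about the reset, spelled out: ram_state_init() allocates the state with
g_try_new0(), which zero-fills the structure, so migration_dirty_pages is already 0 at
the start of every setup, including a second migration attempt. Roughly (a sketch based
on the allocation quoted above, error handling abbreviated):

    *rsp = g_try_new0(RAMState, 1);   /* zero-filled allocation */
    if (!*rsp) {
        return -1;                    /* caller handles the failure */
    }
    /* (*rsp)->migration_dirty_pages == 0 here; ram_list_init_bitmaps()
     * then adds the page count of each block that will be migrated. */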
* Lai Jiangshan (jiangshanlai@gmail.com) wrote: > On Tue, Apr 10, 2018 at 1:30 AM, Dr. David Alan Gilbert > <dgilbert@redhat.com> wrote: > > >> > >> +bool migrate_bypass_shared_memory(void) > >> +{ > >> + MigrationState *s; > >> + > >> + /* it is not workable with postcopy yet. */ > >> + if (migrate_postcopy_ram()) { > >> + return false; > >> + } > > > > Please change this to work in the same way as the check for > > postcopy+compress in migration.c migrate_caps_check. > > > done in V5. > > > > >> + s = migrate_get_current(); > >> + > >> + return s->enabled_capabilities[MIGRATION_CAPABILITY_BYPASS_SHARED_MEMORY]; > >> +} > >> + > >> bool migrate_postcopy_ram(void) > >> { > >> MigrationState *s; > >> diff --git a/migration/migration.h b/migration/migration.h > >> index 8d2f320c48..cfd2513ef0 100644 > >> --- a/migration/migration.h > >> +++ b/migration/migration.h > >> @@ -206,6 +206,7 @@ MigrationState *migrate_get_current(void); > >> > >> bool migrate_postcopy(void); > >> > >> +bool migrate_bypass_shared_memory(void); > >> bool migrate_release_ram(void); > >> bool migrate_postcopy_ram(void); > >> bool migrate_zero_blocks(void); > >> diff --git a/migration/ram.c b/migration/ram.c > >> index 0e90efa092..bca170c386 100644 > >> --- a/migration/ram.c > >> +++ b/migration/ram.c > >> @@ -780,6 +780,11 @@ unsigned long migration_bitmap_find_dirty(RAMState *rs, RAMBlock *rb, > >> unsigned long *bitmap = rb->bmap; > >> unsigned long next; > >> > >> + /* when this ramblock is requested bypassing */ > >> + if (!bitmap) { > >> + return size; > >> + } > >> + > >> if (rs->ram_bulk_stage && start > 0) { > >> next = start + 1; > >> } else { > >> @@ -850,7 +855,9 @@ static void migration_bitmap_sync(RAMState *rs) > >> qemu_mutex_lock(&rs->bitmap_mutex); > >> rcu_read_lock(); > >> RAMBLOCK_FOREACH(block) { > >> - migration_bitmap_sync_range(rs, block, 0, block->used_length); > >> + if (!migrate_bypass_shared_memory() || !qemu_ram_is_shared(block)) { > >> + migration_bitmap_sync_range(rs, block, 0, block->used_length); > >> + } > >> } > >> rcu_read_unlock(); > >> qemu_mutex_unlock(&rs->bitmap_mutex); > >> @@ -2132,18 +2139,12 @@ static int ram_state_init(RAMState **rsp) > >> qemu_mutex_init(&(*rsp)->src_page_req_mutex); > >> QSIMPLEQ_INIT(&(*rsp)->src_page_requests); > >> > >> - /* > >> - * Count the total number of pages used by ram blocks not including any > >> - * gaps due to alignment or unplugs. > >> - */ > >> - (*rsp)->migration_dirty_pages = ram_bytes_total() >> TARGET_PAGE_BITS; > >> - > >> ram_state_reset(*rsp); > >> > >> return 0; > >> } > >> > >> -static void ram_list_init_bitmaps(void) > >> +static void ram_list_init_bitmaps(RAMState *rs) > >> { > >> RAMBlock *block; > >> unsigned long pages; > >> @@ -2151,9 +2152,17 @@ static void ram_list_init_bitmaps(void) > >> /* Skip setting bitmap if there is no RAM */ > >> if (ram_bytes_total()) { > > > > I think you need to add here a : > > rs->migration_dirty_pages = 0; > > In ram_state_init(), > *rsp = g_try_new0(RAMState, 1); > so the state is always reset. Ah, you're right. Dave > > > > I don't see anywhere else that initialises it, and there is the case of > > a migration that fails, followed by a 2nd attempt. 
> > > >> QLIST_FOREACH_RCU(block, &ram_list.blocks, next) { > >> + if (migrate_bypass_shared_memory() && qemu_ram_is_shared(block)) { > >> + continue; > >> + } > >> pages = block->max_length >> TARGET_PAGE_BITS; > >> block->bmap = bitmap_new(pages); > >> bitmap_set(block->bmap, 0, pages); > >> + /* > >> + * Count the total number of pages used by ram blocks not > >> + * including any gaps due to alignment or unplugs. > >> + */ > >> + rs->migration_dirty_pages += pages; > >> if (migrate_postcopy_ram()) { > >> block->unsentmap = bitmap_new(pages); > >> bitmap_set(block->unsentmap, 0, pages); > >> @@ -2169,7 +2178,7 @@ static void ram_init_bitmaps(RAMState *rs) > >> qemu_mutex_lock_ramlist(); > >> rcu_read_lock(); > >> > >> - ram_list_init_bitmaps(); > >> + ram_list_init_bitmaps(rs); > >> memory_global_dirty_log_start(); > >> migration_bitmap_sync(rs); > >> > >> diff --git a/qapi/migration.json b/qapi/migration.json > >> index 9d0bf82cf4..45326480bd 100644 > >> --- a/qapi/migration.json > >> +++ b/qapi/migration.json > >> @@ -357,13 +357,17 @@ > >> # @dirty-bitmaps: If enabled, QEMU will migrate named dirty bitmaps. > >> # (since 2.12) > >> # > >> +# @bypass-shared-memory: the shared memory region will be bypassed on migration. > >> +# This feature allows the memory region to be reused by new qemu(s) > >> +# or be migrated separately. (since 2.13) > >> +# > >> # Since: 1.2 > >> ## > >> { 'enum': 'MigrationCapability', > >> 'data': ['xbzrle', 'rdma-pin-all', 'auto-converge', 'zero-blocks', > >> 'compress', 'events', 'postcopy-ram', 'x-colo', 'release-ram', > >> 'block', 'return-path', 'pause-before-switchover', 'x-multifd', > >> - 'dirty-bitmaps' ] } > >> + 'dirty-bitmaps', 'bypass-shared-memory' ] } > >> > >> ## > >> # @MigrationCapabilityStatus: > >> -- > >> 2.14.3 (Apple Git-98) > >> > > -- > > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK -- Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK