[PATCH 10/20] qemu: Add support for mapped-ram on save

Posted by Jim Fehlig via Devel 3 months, 2 weeks ago
Introduce support for QEMU's new mapped-ram stream format [1].
mapped-ram is enabled by default if the underlying QEMU advertises
the mapped-ram migration capability. It can be disabled by changing
the 'save_image_version' setting in qemu.conf to version '2'.
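
For example, disabling mapped-ram could look like the following in
qemu.conf (a sketch; only the 'save_image_version' setting is taken
from the description above):

  # /etc/libvirt/qemu.conf
  save_image_version = 2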

To use mapped-ram with QEMU:
- The 'mapped-ram' migration capability must be set to true
- The 'multifd' migration capability must be set to true and
  the 'multifd-channels' migration parameter must be set to 1
  (a QMP sketch of these settings follows the example below)
- QEMU must be provided an fdset containing the migration fd
- The 'migrate' qmp command is invoked with a URI referencing the
  fdset and an offset at which to start writing the data stream, e.g.

  {"execute":"migrate",
   "arguments":{"detach":true,"resume":false,
                "uri":"file:/dev/fdset/0,offset=0x11921"}}

The mapped-ram stream, in conjunction with direct IO and multifd
support provided by subsequent patches, can significantly improve
the time required to save VM memory state. The following tables
compare mapped-ram with the existing, sequential save stream. In
all cases, the save and restore operations are to/from a block
device comprised of two NVMe disks in RAID0 configuration with
xfs (~8600MiB/s). The values in the 'save time' and 'restore time'
columns were scraped from the 'real' time reported by time(1). The
'Size' and 'Blocks' columns were provided by the corresponding
outputs of stat(1).

VM: 32G RAM, 1 vcpu, idle (shortly after boot)

                       | save    | restore |
                       | time    | time    | Size         | Blocks
-----------------------+---------+---------+--------------+--------
legacy                 | 6.193s  | 4.399s  | 985744812    | 1925288
-----------------------+---------+---------+--------------+--------
mapped-ram             | 5.109s  | 1.176s  | 34368554354  | 1774472
-----------------------+---------+---------+--------------+--------
legacy + direct IO     | 5.725s  | 4.512s  | 985765251    | 1925328
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO | 4.627s  | 1.490s  | 34368554354  | 1774304
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO |         |         |              |
 + multifd-channels=8  | 4.421s  | 0.845s  | 34368554318  | 1774312
-------------------------------------------------------------------

VM: 32G RAM, 30G dirty, 1 vcpu in tight loop dirtying memory

                       | save    | restore |
                       | time    | time    | Size         | Blocks
-----------------------+---------+---------+--------------+---------
legacy                 | 25.800s | 14.332s | 33154309983  | 64754512
-----------------------+---------+---------+--------------+---------
mapped-ram             | 18.742s | 15.027s | 34368559228  | 64617160
-----------------------+---------+---------+--------------+---------
legacy + direct IO     | 13.115s | 18.050s | 33154310496  | 64754520
-----------------------+---------+---------+--------------+---------
mapped-ram + direct IO | 13.623s | 15.959s | 34368557392  | 64662040
-----------------------+---------+---------+--------------+---------
mapped-ram + direct IO |         |         |              |
 + multifd-channels=8  | 6.994s  | 6.470s  | 34368554980  | 64665776
--------------------------------------------------------------------

As can be seen from the tables, one caveat of mapped-ram is that the
logical file size of a saved image is basically equivalent to the VM
memory size. Note, however, that mapped-ram typically uses fewer blocks
on disk.

Another caveat of mapped-ram is the requirement for a seekable file
descriptor, which currently makes it incompatible with libvirt's
support for save image compression. Also note that the mapped-ram
stream is incompatible with the existing stream format, hence
mapped-ram cannot be used to restore an image saved with the existing
format, and vice versa.

[1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapped-ram.rst?ref_type=heads

Signed-off-by: Jim Fehlig <jfehlig@suse.com>
---
 src/qemu/qemu_driver.c    |  20 ++++--
 src/qemu/qemu_migration.c | 139 ++++++++++++++++++++++++++------------
 src/qemu/qemu_migration.h |   4 +-
 src/qemu/qemu_monitor.c   |  36 ++++++++++
 src/qemu/qemu_monitor.h   |   4 ++
 src/qemu/qemu_saveimage.c |  43 +++++++++---
 src/qemu/qemu_saveimage.h |   2 +
 src/qemu/qemu_snapshot.c  |   9 ++-
 8 files changed, 195 insertions(+), 62 deletions(-)

diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 6c0d3e097c..977763e5d8 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -2617,6 +2617,8 @@ qemuDomainSaveInternal(virQEMUDriver *driver,
     qemuDomainObjPrivate *priv = vm->privateData;
     virQEMUSaveData *data = NULL;
     g_autoptr(qemuDomainSaveCookie) cookie = NULL;
+    g_autoptr(qemuMigrationParams) saveParams = NULL;
+    bool mappedRam = false;
 
     if (virDomainObjBeginAsyncJob(vm, VIR_ASYNC_JOB_SAVE,
                                   VIR_DOMAIN_JOB_OPERATION_SAVE, flags) < 0)
@@ -2683,12 +2685,17 @@ qemuDomainSaveInternal(virQEMUDriver *driver,
     if (!(cookie = qemuDomainSaveCookieNew(vm)))
         goto endjob;
 
-    if (!(data = virQEMUSaveDataNew(driver, xml, cookie, was_running, compressed)))
+
+    if (!(data = virQEMUSaveDataNew(driver, vm, xml, cookie, was_running, compressed)))
         goto endjob;
     xml = NULL;
 
+    mappedRam = data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM;
+    if (!(saveParams = qemuMigrationParamsForSave(mappedRam)))
+        goto endjob;
+
     ret = qemuSaveImageCreate(driver, vm, path, data, compressor,
-                              flags, VIR_ASYNC_JOB_SAVE);
+                              saveParams, flags, VIR_ASYNC_JOB_SAVE);
     if (ret < 0)
         goto endjob;
 
@@ -3122,6 +3129,8 @@ doCoreDump(virQEMUDriver *driver,
                          memory_dump_format) < 0)
             goto cleanup;
     } else {
+        g_autoptr(qemuMigrationParams) dump_params = NULL;
+
         if (dumpformat != VIR_DOMAIN_CORE_DUMP_FORMAT_RAW) {
             virReportError(VIR_ERR_OPERATION_UNSUPPORTED, "%s",
                            _("kdump-compressed format is only supported with memory-only dump"));
@@ -3131,8 +3140,11 @@ doCoreDump(virQEMUDriver *driver,
         if (!qemuMigrationSrcIsAllowed(vm, false, VIR_ASYNC_JOB_DUMP, 0))
             goto cleanup;
 
-        if (qemuMigrationSrcToFile(driver, vm, fd, compressor,
-                                   VIR_ASYNC_JOB_DUMP) < 0)
+        if (!(dump_params = qemuMigrationParamsNew()))
+            goto cleanup;
+
+        if (qemuMigrationSrcToFile(driver, vm, &fd, compressor,
+                                   dump_params, dump_flags, VIR_ASYNC_JOB_DUMP) < 0)
             goto cleanup;
     }
 
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index c1bdaa273b..3ab5e2fa30 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -6944,46 +6944,17 @@ qemuMigrationProcessUnattended(virQEMUDriver *driver,
 }
 
 
-/* Helper function called while vm is active.  */
-int
-qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm,
-                       int fd,
-                       virCommand *compressor,
-                       virDomainAsyncJob asyncJob)
+static int
+qemuMigrationSrcToLegacyFile(virQEMUDriver *driver,
+                             virDomainObj *vm,
+                             int fd,
+                             virCommand *compressor,
+                             virDomainAsyncJob asyncJob)
 {
     qemuDomainObjPrivate *priv = vm->privateData;
-    int rc;
     int ret = -1;
     int pipeFD[2] = { -1, -1 };
-    unsigned long saveMigBandwidth = priv->migMaxBandwidth;
     char *errbuf = NULL;
-    virErrorPtr orig_err = NULL;
-    g_autoptr(qemuMigrationParams) migParams = NULL;
-
-    if (qemuMigrationSetDBusVMState(driver, vm) < 0)
-        return -1;
-
-    /* Increase migration bandwidth to unlimited since target is a file.
-     * Failure to change migration speed is not fatal. */
-    if (!(migParams = qemuMigrationParamsForSave(false)))
-        return -1;
-
-    if (qemuMigrationParamsSetULL(migParams,
-                                  QEMU_MIGRATION_PARAM_MAX_BANDWIDTH,
-                                  QEMU_DOMAIN_MIG_BANDWIDTH_MAX * 1024 * 1024) < 0)
-        return -1;
-
-    if (qemuMigrationParamsApply(vm, asyncJob, migParams, 0) < 0)
-        return -1;
-
-    priv->migMaxBandwidth = QEMU_DOMAIN_MIG_BANDWIDTH_MAX;
-
-    if (!virDomainObjIsActive(vm)) {
-        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
-                       _("guest unexpectedly quit"));
-        /* nothing to tear down */
-        return -1;
-    }
 
     if (compressor && virPipe(pipeFD) < 0)
         return -1;
@@ -7000,7 +6971,7 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm,
         goto cleanup;
 
     if (!compressor) {
-        rc = qemuMonitorMigrateToFd(priv->mon, 0, fd);
+        ret = qemuMonitorMigrateToFd(priv->mon, 0, fd);
     } else {
         virCommandSetInputFD(compressor, pipeFD[0]);
         virCommandSetOutputFD(compressor, &fd);
@@ -7016,12 +6987,99 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm,
             qemuDomainObjExitMonitor(vm);
             goto cleanup;
         }
-        rc = qemuMonitorMigrateToFd(priv->mon, 0, pipeFD[1]);
+        ret = qemuMonitorMigrateToFd(priv->mon, 0, pipeFD[1]);
         if (VIR_CLOSE(pipeFD[0]) < 0 ||
             VIR_CLOSE(pipeFD[1]) < 0)
             VIR_WARN("failed to close intermediate pipe");
     }
     qemuDomainObjExitMonitor(vm);
+
+ cleanup:
+    VIR_FORCE_CLOSE(pipeFD[0]);
+    VIR_FORCE_CLOSE(pipeFD[1]);
+
+    if (errbuf) {
+        VIR_DEBUG("Compression binary stderr: %s", NULLSTR(errbuf));
+        VIR_FREE(errbuf);
+    }
+
+    return ret;
+}
+
+
+static int
+qemuMigrationSrcToMappedFile(virQEMUDriver *driver,
+                             virDomainObj *vm,
+                             int *fd,
+                             unsigned int flags,
+                             virDomainAsyncJob asyncJob)
+{
+    int ret;
+
+    /* mapped-ram does not support directIO */
+    if ((flags & VIR_DOMAIN_SAVE_BYPASS_CACHE)) {
+        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                       _("bypass cache unsupported by this system"));
+        return -1;
+    }
+
+    if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, *fd) < 0)
+        return -1;
+
+    if (qemuDomainObjEnterMonitorAsync(vm, asyncJob) < 0)
+        return -1;
+
+    ret = qemuMonitorMigrateToFdSet(vm, 0, fd);
+    qemuDomainObjExitMonitor(vm);
+    return ret;
+}
+
+
+/* Helper function called while vm is active.  */
+int
+qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm,
+                       int *fd,
+                       virCommand *compressor,
+                       qemuMigrationParams *migParams,
+                       unsigned int flags,
+                       virDomainAsyncJob asyncJob)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    int rc;
+    int ret = -1;
+    unsigned long saveMigBandwidth = priv->migMaxBandwidth;
+    virErrorPtr orig_err = NULL;
+
+    if (qemuMigrationSetDBusVMState(driver, vm) < 0)
+        return -1;
+
+    /* Increase migration bandwidth to unlimited since target is a file.
+     * Failure to change migration speed is not fatal. */
+    if (migParams &&
+        qemuMigrationParamsSetULL(migParams,
+                                  QEMU_MIGRATION_PARAM_MAX_BANDWIDTH,
+                                  QEMU_DOMAIN_MIG_BANDWIDTH_MAX * 1024 * 1024) < 0)
+        return -1;
+
+    if (qemuMigrationParamsApply(vm, asyncJob, migParams, 0) < 0)
+        return -1;
+
+    priv->migMaxBandwidth = QEMU_DOMAIN_MIG_BANDWIDTH_MAX;
+
+    if (!virDomainObjIsActive(vm)) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("guest unexpectedly quit"));
+        /* nothing to tear down */
+        return -1;
+    }
+
+    if (migParams &&
+        qemuMigrationParamsCapEnabled(migParams, QEMU_MIGRATION_CAP_MAPPED_RAM)) {
+        rc = qemuMigrationSrcToMappedFile(driver, vm, fd, flags, asyncJob);
+    } else {
+        rc = qemuMigrationSrcToLegacyFile(driver, vm, *fd, compressor, asyncJob);
+    }
+
     if (rc < 0)
         goto cleanup;
 
@@ -7057,13 +7115,6 @@ qemuMigrationSrcToFile(virQEMUDriver *driver, virDomainObj *vm,
         priv->migMaxBandwidth = saveMigBandwidth;
     }
 
-    VIR_FORCE_CLOSE(pipeFD[0]);
-    VIR_FORCE_CLOSE(pipeFD[1]);
-    if (errbuf) {
-        VIR_DEBUG("Compression binary stderr: %s", NULLSTR(errbuf));
-        VIR_FREE(errbuf);
-    }
-
     virErrorRestore(&orig_err);
 
     return ret;
diff --git a/src/qemu/qemu_migration.h b/src/qemu/qemu_migration.h
index ed62fd4a91..039a926e69 100644
--- a/src/qemu/qemu_migration.h
+++ b/src/qemu/qemu_migration.h
@@ -236,8 +236,10 @@ qemuMigrationSrcIsAllowed(virDomainObj *vm,
 int
 qemuMigrationSrcToFile(virQEMUDriver *driver,
                        virDomainObj *vm,
-                       int fd,
+                       int *fd,
                        virCommand *compressor,
+                       qemuMigrationParams *migParams,
+                       unsigned int flags,
                        virDomainAsyncJob asyncJob)
     ATTRIBUTE_NONNULL(1) ATTRIBUTE_NONNULL(2) G_GNUC_WARN_UNUSED_RESULT;
 
diff --git a/src/qemu/qemu_monitor.c b/src/qemu/qemu_monitor.c
index b1c0c6a064..9a454a1d08 100644
--- a/src/qemu/qemu_monitor.c
+++ b/src/qemu/qemu_monitor.c
@@ -2237,6 +2237,42 @@ qemuMonitorMigrateToFd(qemuMonitor *mon,
 }
 
 
+int
+qemuMonitorMigrateToFdSet(virDomainObj *vm,
+                          unsigned int flags,
+                          int *fd)
+{
+    qemuDomainObjPrivate *priv = vm->privateData;
+    qemuMonitor *mon = priv->mon;
+    off_t offset;
+    g_autoptr(qemuFDPass) fdPassMigrate = NULL;
+    unsigned int setId;
+    g_autofree char *uri = NULL;
+    int ret;
+
+    VIR_DEBUG("fd=%d flags=0x%x", *fd, flags);
+
+    QEMU_CHECK_MONITOR(mon);
+
+    if ((offset = lseek(*fd, 0, SEEK_CUR)) == -1) {
+        virReportSystemError(errno,
+                             "%s", _("failed to seek on file descriptor"));
+        return -1;
+    }
+
+    fdPassMigrate = qemuFDPassNew("migrate", priv);
+    qemuFDPassAddFD(fdPassMigrate, fd, "-buffered-fd");
+    if (qemuFDPassTransferMonitor(fdPassMigrate, mon) < 0)
+        return -1;
+
+    qemuFDPassGetId(fdPassMigrate, &setId);
+    uri = g_strdup_printf("file:/dev/fdset/%u,offset=%#lx", setId, offset);
+    ret = qemuMonitorJSONMigrate(mon, flags, uri);
+
+    return ret;
+}
+
+
 int
 qemuMonitorMigrateToHost(qemuMonitor *mon,
                          unsigned int flags,
diff --git a/src/qemu/qemu_monitor.h b/src/qemu/qemu_monitor.h
index 76c859a888..ebacdf110e 100644
--- a/src/qemu/qemu_monitor.h
+++ b/src/qemu/qemu_monitor.h
@@ -844,6 +844,10 @@ int qemuMonitorMigrateToFd(qemuMonitor *mon,
                            unsigned int flags,
                            int fd);
 
+int qemuMonitorMigrateToFdSet(virDomainObj *vm,
+                              unsigned int flags,
+                              int *fd);
+
 int qemuMonitorMigrateToHost(qemuMonitor *mon,
                              unsigned int flags,
                              const char *protocol,
diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
index 6f2ce40124..98a1ad638d 100644
--- a/src/qemu/qemu_saveimage.c
+++ b/src/qemu/qemu_saveimage.c
@@ -96,6 +96,7 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(virQEMUSaveData, virQEMUSaveDataFree);
  */
 virQEMUSaveData *
 virQEMUSaveDataNew(virQEMUDriver *driver,
+                   virDomainObj *vm,
                    char *domXML,
                    qemuDomainSaveCookie *cookieObj,
                    bool running,
@@ -115,6 +116,19 @@ virQEMUSaveDataNew(virQEMUDriver *driver,
     header = &data->header;
     memcpy(header->magic, QEMU_SAVE_PARTIAL, sizeof(header->magic));
     header->version = cfg->saveImageVersion;
+
+    /* Enable mapped-ram feature if available and save version >= 3 */
+    if (header->version >= QEMU_SAVE_VERSION &&
+        qemuMigrationCapsGet(vm, QEMU_MIGRATION_CAP_MAPPED_RAM)) {
+        if (compressed != QEMU_SAVE_FORMAT_RAW) {
+            virReportError(VIR_ERR_OPERATION_FAILED,
+                           _("compression is not supported with save image version %1$u"),
+                           header->version);
+            goto error;
+        }
+        header->features |= QEMU_SAVE_FEATURE_MAPPED_RAM;
+    }
+
     header->was_running = running ? 1 : 0;
     header->compressed = compressed;
 
@@ -366,6 +380,7 @@ qemuSaveImageCreateFd(virQEMUDriver *driver,
                       virDomainObj *vm,
                       const char *path,
                       virFileWrapperFd *wrapperFd,
+                      bool mappedRam,
                       bool *needUnlink,
                       unsigned int flags)
 {
@@ -375,7 +390,7 @@ qemuSaveImageCreateFd(virQEMUDriver *driver,
     int directFlag = 0;
     unsigned int wrapperFlags = VIR_FILE_WRAPPER_NON_BLOCKING;
 
-    if (flags & VIR_DOMAIN_SAVE_BYPASS_CACHE) {
+    if (!mappedRam && flags & VIR_DOMAIN_SAVE_BYPASS_CACHE) {
         wrapperFlags |= VIR_FILE_WRAPPER_BYPASS_CACHE;
         directFlag = virFileDirectFdFlag();
         if (directFlag < 0) {
@@ -395,7 +410,7 @@ qemuSaveImageCreateFd(virQEMUDriver *driver,
     if (qemuSecuritySetImageFDLabel(driver->securityManager, vm->def, fd) < 0)
         return -1;
 
-    if (!(wrapperFd = virFileWrapperFdNew(&fd, path, wrapperFlags)))
+    if (!mappedRam && !(wrapperFd = virFileWrapperFdNew(&fd, path, wrapperFlags)))
         return -1;
 
     ret = fd;
@@ -414,6 +429,7 @@ qemuSaveImageCreate(virQEMUDriver *driver,
                     const char *path,
                     virQEMUSaveData *data,
                     virCommand *compressor,
+                    qemuMigrationParams *saveParams,
                     unsigned int flags,
                     virDomainAsyncJob asyncJob)
 {
@@ -422,9 +438,10 @@ qemuSaveImageCreate(virQEMUDriver *driver,
     int ret = -1;
     int fd = -1;
     virFileWrapperFd *wrapperFd = NULL;
+    bool mappedRam = data->header.features & QEMU_SAVE_FEATURE_MAPPED_RAM;
 
     /* Obtain the file handle.  */
-    fd = qemuSaveImageCreateFd(driver, vm, path, wrapperFd, &needUnlink, flags);
+    fd = qemuSaveImageCreateFd(driver, vm, path, wrapperFd, mappedRam, &needUnlink, flags);
 
     if (fd < 0)
         goto cleanup;
@@ -433,7 +450,7 @@ qemuSaveImageCreate(virQEMUDriver *driver,
         goto cleanup;
 
     /* Perform the migration */
-    if (qemuMigrationSrcToFile(driver, vm, fd, compressor, asyncJob) < 0)
+    if (qemuMigrationSrcToFile(driver, vm, &fd, compressor, saveParams, flags, asyncJob) < 0)
         goto cleanup;
 
     /* Touch up file header to mark image complete. */
@@ -441,14 +458,18 @@ qemuSaveImageCreate(virQEMUDriver *driver,
     /* Reopen the file to touch up the header, since we aren't set
      * up to seek backwards on wrapperFd.  The reopened fd will
      * trigger a single page of file system cache pollution, but
-     * that's acceptable.  */
-    if (VIR_CLOSE(fd) < 0) {
-        virReportSystemError(errno, _("unable to close %1$s"), path);
-        goto cleanup;
-    }
+     * that's acceptable.
+     * If using mapped-ram, the fd was passed to qemu, so no need
+     * to close it.  */
+    if (!mappedRam) {
+        if (VIR_CLOSE(fd) < 0) {
+            virReportSystemError(errno, _("unable to close %1$s"), path);
+            goto cleanup;
+        }
 
-    if (qemuDomainFileWrapperFDClose(vm, wrapperFd) < 0)
-        goto cleanup;
+        if (qemuDomainFileWrapperFDClose(vm, wrapperFd) < 0)
+            goto cleanup;
+    }
 
     if ((fd = qemuDomainOpenFile(cfg, vm->def, path, O_WRONLY, NULL)) < 0 ||
         virQEMUSaveDataFinish(data, &fd, path) < 0)
diff --git a/src/qemu/qemu_saveimage.h b/src/qemu/qemu_saveimage.h
index 3cee846f14..eb2835f11b 100644
--- a/src/qemu/qemu_saveimage.h
+++ b/src/qemu/qemu_saveimage.h
@@ -114,6 +114,7 @@ qemuSaveImageCreate(virQEMUDriver *driver,
                     const char *path,
                     virQEMUSaveData *data,
                     virCommand *compressor,
+                    qemuMigrationParams *saveParams,
                     unsigned int flags,
                     virDomainAsyncJob asyncJob);
 
@@ -124,6 +125,7 @@ virQEMUSaveDataWrite(virQEMUSaveData *data,
 
 virQEMUSaveData *
 virQEMUSaveDataNew(virQEMUDriver *driver,
+                   virDomainObj *vm,
                    char *domXML,
                    qemuDomainSaveCookie *cookieObj,
                    bool running,
diff --git a/src/qemu/qemu_snapshot.c b/src/qemu/qemu_snapshot.c
index 1d75208814..9833fb6206 100644
--- a/src/qemu/qemu_snapshot.c
+++ b/src/qemu/qemu_snapshot.c
@@ -1367,6 +1367,8 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver,
 
     /* do the memory snapshot if necessary */
     if (memory) {
+        g_autoptr(qemuMigrationParams) snap_params = NULL;
+
         /* check if migration is possible */
         if (!qemuMigrationSrcIsAllowed(vm, false, VIR_ASYNC_JOB_SNAPSHOT, 0))
             goto cleanup;
@@ -1390,7 +1392,7 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver,
             !(snapdef->cookie = (virObject *) qemuDomainSaveCookieNew(vm)))
             goto cleanup;
 
-        if (!(data = virQEMUSaveDataNew(driver, xml,
+        if (!(data = virQEMUSaveDataNew(driver, vm, xml,
                                         (qemuDomainSaveCookie *) snapdef->cookie,
                                         resume, compressed)))
             goto cleanup;
@@ -1398,8 +1400,11 @@ qemuSnapshotCreateActiveExternal(virQEMUDriver *driver,
 
         memory_existing = virFileExists(snapdef->memorysnapshotfile);
 
+        if (!(snap_params = qemuMigrationParamsNew()))
+            goto cleanup;
+
         if ((ret = qemuSaveImageCreate(driver, vm, snapdef->memorysnapshotfile,
-                                       data, compressor, 0,
+                                       data, compressor, snap_params, 0,
                                        VIR_ASYNC_JOB_SNAPSHOT)) < 0)
             goto cleanup;
 
-- 
2.35.3
Re: [PATCH 10/20] qemu: Add support for mapped-ram on save
Posted by Daniel P. Berrangé 1 month, 2 weeks ago
On Thu, Aug 08, 2024 at 05:38:03PM -0600, Jim Fehlig via Devel wrote:
> Introduce support for QEMU's new mapped-ram stream format [1].
> mapped-ram is enabled by default if the underlying QEMU advertises
> the mapped-ram migration capability. It can be disabled by changing
> the 'save_image_version' setting in qemu.conf to version '2'.
> 
> To use mapped-ram with QEMU:
> - The 'mapped-ram' migration capability must be set to true
> - The 'multifd' migration capability must be set to true and
>   the 'multifd-channels' migration parameter must set to 1
> - QEMU must be provided an fdset containing the migration fd
> - The 'migrate' qmp command is invoked with a URI referencing the
>   fdset and an offset where to start writing the data stream, e.g.
> 
>   {"execute":"migrate",
>    "arguments":{"detach":true,"resume":false,
>                 "uri":"file:/dev/fdset/0,offset=0x11921"}}
> 
> The mapped-ram stream, in conjunction with direct IO and multifd
> support provided by subsequent patches, can significantly improve
> the time required to save VM memory state. The following tables
> compare mapped-ram with the existing, sequential save stream. In
> all cases, the save and restore operations are to/from a block
> device comprised of two NVMe disks in RAID0 configuration with
> xfs (~8600MiB/s). The values in the 'save time' and 'restore time'
> columns were scraped from the 'real' time reported by time(1). The
> 'Size' and 'Blocks' columns were provided by the corresponding
> outputs of stat(1).
> 
> VM: 32G RAM, 1 vcpu, idle (shortly after boot)
> 
>                        | save    | restore |
> 		       | time    | time    | Size         | Blocks
> -----------------------+---------+---------+--------------+--------
> legacy                 | 6.193s  | 4.399s  | 985744812    | 1925288
> -----------------------+---------+---------+--------------+--------
> mapped-ram             | 5.109s  | 1.176s  | 34368554354  | 1774472

I'm surprised by the restore time speed up, as I didn't think
mapped-ram should make any perf difference without direct IO
and multifd.

> -----------------------+---------+---------+--------------+--------
> legacy + direct IO     | 5.725s  | 4.512s  | 985765251    | 1925328
> -----------------------+---------+---------+--------------+--------
> mapped-ram + direct IO | 4.627s  | 1.490s  | 34368554354  | 1774304

Still somewhat surprised by the speed up on restore here too

> -----------------------+---------+---------+--------------+--------
> mapped-ram + direct IO |         |         |              |
>  + multifd-channels=8  | 4.421s  | 0.845s  | 34368554318  | 1774312
> -------------------------------------------------------------------
> 
> VM: 32G RAM, 30G dirty, 1 vcpu in tight loop dirtying memory
> 
>                        | save    | restore |
> 		       | time    | time    | Size         | Blocks
> -----------------------+---------+---------+--------------+---------
> legacy                 | 25.800s | 14.332s | 33154309983  | 64754512
> -----------------------+---------+---------+--------------+---------
> mapped-ram             | 18.742s | 15.027s | 34368559228  | 64617160
> -----------------------+---------+---------+--------------+---------
> legacy + direct IO     | 13.115s | 18.050s | 33154310496  | 64754520
> -----------------------+---------+---------+--------------+---------
> mapped-ram + direct IO | 13.623s | 15.959s | 34368557392  | 64662040

These figures make more sense with restore time matching save time
more or less.

> -----------------------+-------- +---------+--------------+---------
> mapped-ram + direct IO |         |         |              |
>  + multifd-channels=8  | 6.994s  | 6.470s  | 34368554980  | 64665776
> --------------------------------------------------------------------
> 
> As can be seen from the tables, one caveat of mapped-ram is the logical
> file size of a saved image is basically equivalent to the VM memory size.
> Note however that mapped-ram typically uses fewer blocks on disk.
> 
> Another caveat of mapped-ram is the requirement for a seekable file
> descriptor, which currently makes it incompatible with libvirt's
> support for save image compression. Also note the mapped-ram stream
> is incompatible with the existing stream format, hence mapped-ram
> cannot be used to restore an image saved with the existing format
> and vice versa.
> 
> [1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapped-ram.rst?ref_type=heads
> 
> Signed-off-by: Jim Fehlig <jfehlig@suse.com>
> ---
>  src/qemu/qemu_driver.c    |  20 ++++--
>  src/qemu/qemu_migration.c | 139 ++++++++++++++++++++++++++------------
>  src/qemu/qemu_migration.h |   4 +-
>  src/qemu/qemu_monitor.c   |  36 ++++++++++
>  src/qemu/qemu_monitor.h   |   4 ++
>  src/qemu/qemu_saveimage.c |  43 +++++++++---
>  src/qemu/qemu_saveimage.h |   2 +
>  src/qemu/qemu_snapshot.c  |   9 ++-
>  8 files changed, 195 insertions(+), 62 deletions(-)
> 



> diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
> index 6f2ce40124..98a1ad638d 100644
> --- a/src/qemu/qemu_saveimage.c
> +++ b/src/qemu/qemu_saveimage.c
> @@ -96,6 +96,7 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(virQEMUSaveData, virQEMUSaveDataFree);
>   */
>  virQEMUSaveData *
>  virQEMUSaveDataNew(virQEMUDriver *driver,
> +                   virDomainObj *vm,
>                     char *domXML,
>                     qemuDomainSaveCookie *cookieObj,
>                     bool running,
> @@ -115,6 +116,19 @@ virQEMUSaveDataNew(virQEMUDriver *driver,
>      header = &data->header;
>      memcpy(header->magic, QEMU_SAVE_PARTIAL, sizeof(header->magic));
>      header->version = cfg->saveImageVersion;
> +
> +    /* Enable mapped-ram feature if available and save version >= 3 */
> +    if (header->version >= QEMU_SAVE_VERSION &&
> +        qemuMigrationCapsGet(vm, QEMU_MIGRATION_CAP_MAPPED_RAM)) {
> +        if (compressed != QEMU_SAVE_FORMAT_RAW) {
> +            virReportError(VIR_ERR_OPERATION_FAILED,
> +                           _("compression is not supported with save image version %1$u"),
> +                           header->version);
> +            goto error;
> +        }
> +        header->features |= QEMU_SAVE_FEATURE_MAPPED_RAM;
> +    }

If the QEMU we're using doesn't have CAP_MAPPED_RAM, then I think
we should NOT default to Version 3 save images, as that's creating
a backcompat problem for zero user benefit.

This suggests that in qemu_conf.c, we should initialize the
default value to '0', and then in this code, if we see
version 0 we should pick either 2 or 3 depending on mapped
ram.
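
A minimal sketch of that idea, reusing the names visible in the hunk
above (treating a configured version of 0 as "unspecified"; the exact
placement inside virQEMUSaveDataNew() is an assumption):

    header->version = cfg->saveImageVersion;
    if (header->version == 0) {
        /* no version configured explicitly: prefer v3 only when the
         * binary can actually use mapped-ram, otherwise stay on v2 */
        if (qemuMigrationCapsGet(vm, QEMU_MIGRATION_CAP_MAPPED_RAM))
            header->version = 3;
        else
            header->version = 2;
    }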

> +
>      header->was_running = running ? 1 : 0;
>      header->compressed = compressed;
>  

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH 10/20] qemu: Add support for mapped-ram on save
Posted by Fabiano Rosas 1 month, 2 weeks ago
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Aug 08, 2024 at 05:38:03PM -0600, Jim Fehlig via Devel wrote:
>> Introduce support for QEMU's new mapped-ram stream format [1].
>> mapped-ram is enabled by default if the underlying QEMU advertises
>> the mapped-ram migration capability. It can be disabled by changing
>> the 'save_image_version' setting in qemu.conf to version '2'.
>> 
>> To use mapped-ram with QEMU:
>> - The 'mapped-ram' migration capability must be set to true
>> - The 'multifd' migration capability must be set to true and
>>   the 'multifd-channels' migration parameter must set to 1
>> - QEMU must be provided an fdset containing the migration fd
>> - The 'migrate' qmp command is invoked with a URI referencing the
>>   fdset and an offset where to start writing the data stream, e.g.
>> 
>>   {"execute":"migrate",
>>    "arguments":{"detach":true,"resume":false,
>>                 "uri":"file:/dev/fdset/0,offset=0x11921"}}
>> 
>> The mapped-ram stream, in conjunction with direct IO and multifd
>> support provided by subsequent patches, can significantly improve
>> the time required to save VM memory state. The following tables
>> compare mapped-ram with the existing, sequential save stream. In
>> all cases, the save and restore operations are to/from a block
>> device comprised of two NVMe disks in RAID0 configuration with
>> xfs (~8600MiB/s). The values in the 'save time' and 'restore time'
>> columns were scraped from the 'real' time reported by time(1). The
>> 'Size' and 'Blocks' columns were provided by the corresponding
>> outputs of stat(1).
>> 
>> VM: 32G RAM, 1 vcpu, idle (shortly after boot)
>> 
>>                        | save    | restore |
>> 		       | time    | time    | Size         | Blocks
>> -----------------------+---------+---------+--------------+--------
>> legacy                 | 6.193s  | 4.399s  | 985744812    | 1925288
>> -----------------------+---------+---------+--------------+--------
>> mapped-ram             | 5.109s  | 1.176s  | 34368554354  | 1774472
>
> I'm surprised by the restore time speed up, as I didn't think
> mapped-ram should make any perf difference without direct IO
> and multifd.
>
>> -----------------------+---------+---------+--------------+--------
>> legacy + direct IO     | 5.725s  | 4.512s  | 985765251    | 1925328
>> -----------------------+---------+---------+--------------+--------
>> mapped-ram + direct IO | 4.627s  | 1.490s  | 34368554354  | 1774304
>
> Still somewhat surprised by the speed up on restore here too

Hmm, I'm thinking this might be caused by zero page handling. The non
mapped-ram path has an extra buffer_is_zero() and memset() of the hva
page.

Now, is it an issue that mapped-ram skips that memset? I assume guest
memory will always be clear at the start of migration. There won't be a
situation where the destination VM starts with memory already
dirty... *and* the save file is also different, otherwise it wouldn't
make any difference.
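
For reference, the legacy zero-page handling on the load side is
roughly the following (a paraphrase from memory, not the exact QEMU
code); mapped-ram skips both the check and the memset for such pages:

    /* a "zero page" record is replayed by clearing the destination
     * page, unless the host page already reads back as zero */
    if (!buffer_is_zero(host, TARGET_PAGE_SIZE))
        memset(host, 0, TARGET_PAGE_SIZE);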

>
>> -----------------------+---------+---------+--------------+--------
>> mapped-ram + direct IO |         |         |              |
>>  + multifd-channels=8  | 4.421s  | 0.845s  | 34368554318  | 1774312
>> -------------------------------------------------------------------
>> 
>> VM: 32G RAM, 30G dirty, 1 vcpu in tight loop dirtying memory
>> 
>>                        | save    | restore |
>> 		       | time    | time    | Size         | Blocks
>> -----------------------+---------+---------+--------------+---------
>> legacy                 | 25.800s | 14.332s | 33154309983  | 64754512
>> -----------------------+---------+---------+--------------+---------
>> mapped-ram             | 18.742s | 15.027s | 34368559228  | 64617160
>> -----------------------+---------+---------+--------------+---------
>> legacy + direct IO     | 13.115s | 18.050s | 33154310496  | 64754520
>> -----------------------+---------+---------+--------------+---------
>> mapped-ram + direct IO | 13.623s | 15.959s | 34368557392  | 64662040
>
> These figures make more sense with restore time matching save time
> more or less.
>
>> -----------------------+-------- +---------+--------------+---------
>> mapped-ram + direct IO |         |         |              |
>>  + multifd-channels=8  | 6.994s  | 6.470s  | 34368554980  | 64665776
>> --------------------------------------------------------------------
>> 
>> As can be seen from the tables, one caveat of mapped-ram is the logical
>> file size of a saved image is basically equivalent to the VM memory size.
>> Note however that mapped-ram typically uses fewer blocks on disk.
>> 
>> Another caveat of mapped-ram is the requirement for a seekable file
>> descriptor, which currently makes it incompatible with libvirt's
>> support for save image compression. Also note the mapped-ram stream
>> is incompatible with the existing stream format, hence mapped-ram
>> cannot be used to restore an image saved with the existing format
>> and vice versa.
>> 
>> [1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapped-ram.rst?ref_type=heads
>> 
>> Signed-off-by: Jim Fehlig <jfehlig@suse.com>
>> ---
>>  src/qemu/qemu_driver.c    |  20 ++++--
>>  src/qemu/qemu_migration.c | 139 ++++++++++++++++++++++++++------------
>>  src/qemu/qemu_migration.h |   4 +-
>>  src/qemu/qemu_monitor.c   |  36 ++++++++++
>>  src/qemu/qemu_monitor.h   |   4 ++
>>  src/qemu/qemu_saveimage.c |  43 +++++++++---
>>  src/qemu/qemu_saveimage.h |   2 +
>>  src/qemu/qemu_snapshot.c  |   9 ++-
>>  8 files changed, 195 insertions(+), 62 deletions(-)
>> 
>
>
>
>> diff --git a/src/qemu/qemu_saveimage.c b/src/qemu/qemu_saveimage.c
>> index 6f2ce40124..98a1ad638d 100644
>> --- a/src/qemu/qemu_saveimage.c
>> +++ b/src/qemu/qemu_saveimage.c
>> @@ -96,6 +96,7 @@ G_DEFINE_AUTOPTR_CLEANUP_FUNC(virQEMUSaveData, virQEMUSaveDataFree);
>>   */
>>  virQEMUSaveData *
>>  virQEMUSaveDataNew(virQEMUDriver *driver,
>> +                   virDomainObj *vm,
>>                     char *domXML,
>>                     qemuDomainSaveCookie *cookieObj,
>>                     bool running,
>> @@ -115,6 +116,19 @@ virQEMUSaveDataNew(virQEMUDriver *driver,
>>      header = &data->header;
>>      memcpy(header->magic, QEMU_SAVE_PARTIAL, sizeof(header->magic));
>>      header->version = cfg->saveImageVersion;
>> +
>> +    /* Enable mapped-ram feature if available and save version >= 3 */
>> +    if (header->version >= QEMU_SAVE_VERSION &&
>> +        qemuMigrationCapsGet(vm, QEMU_MIGRATION_CAP_MAPPED_RAM)) {
>> +        if (compressed != QEMU_SAVE_FORMAT_RAW) {
>> +            virReportError(VIR_ERR_OPERATION_FAILED,
>> +                           _("compression is not supported with save image version %1$u"),
>> +                           header->version);
>> +            goto error;
>> +        }
>> +        header->features |= QEMU_SAVE_FEATURE_MAPPED_RAM;
>> +    }
>
> If the QEMU we're usnig doesnt have CAP_MAPPED_RAM, then I think
> we should NOT default to Version 3 save images, as that's creating
> a backcompat problem for zero user benefit.
>
> This suggests that in qemu_conf.c, we should initialize the
> default value to '0', and then in this code, if we see
> version 0 we should pick either 2 or 3 depending on mapped
> ram.
>
>> +
>>      header->was_running = running ? 1 : 0;
>>      header->compressed = compressed;
>>  
>
> With regards,
> Daniel
Re: [PATCH 10/20] qemu: Add support for mapped-ram on save
Posted by Daniel P. Berrangé 1 month, 1 week ago
On Thu, Oct 10, 2024 at 12:06:51PM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Thu, Aug 08, 2024 at 05:38:03PM -0600, Jim Fehlig via Devel wrote:
> >> Introduce support for QEMU's new mapped-ram stream format [1].
> >> mapped-ram is enabled by default if the underlying QEMU advertises
> >> the mapped-ram migration capability. It can be disabled by changing
> >> the 'save_image_version' setting in qemu.conf to version '2'.
> >> 
> >> To use mapped-ram with QEMU:
> >> - The 'mapped-ram' migration capability must be set to true
> >> - The 'multifd' migration capability must be set to true and
> >>   the 'multifd-channels' migration parameter must set to 1
> >> - QEMU must be provided an fdset containing the migration fd
> >> - The 'migrate' qmp command is invoked with a URI referencing the
> >>   fdset and an offset where to start writing the data stream, e.g.
> >> 
> >>   {"execute":"migrate",
> >>    "arguments":{"detach":true,"resume":false,
> >>                 "uri":"file:/dev/fdset/0,offset=0x11921"}}
> >> 
> >> The mapped-ram stream, in conjunction with direct IO and multifd
> >> support provided by subsequent patches, can significantly improve
> >> the time required to save VM memory state. The following tables
> >> compare mapped-ram with the existing, sequential save stream. In
> >> all cases, the save and restore operations are to/from a block
> >> device comprised of two NVMe disks in RAID0 configuration with
> >> xfs (~8600MiB/s). The values in the 'save time' and 'restore time'
> >> columns were scraped from the 'real' time reported by time(1). The
> >> 'Size' and 'Blocks' columns were provided by the corresponding
> >> outputs of stat(1).
> >> 
> >> VM: 32G RAM, 1 vcpu, idle (shortly after boot)
> >> 
> >>                        | save    | restore |
> >> 		       | time    | time    | Size         | Blocks
> >> -----------------------+---------+---------+--------------+--------
> >> legacy                 | 6.193s  | 4.399s  | 985744812    | 1925288
> >> -----------------------+---------+---------+--------------+--------
> >> mapped-ram             | 5.109s  | 1.176s  | 34368554354  | 1774472
> >
> > I'm surprised by the restore time speed up, as I didn't think
> > mapped-ram should make any perf difference without direct IO
> > and multifd.
> >
> >> -----------------------+---------+---------+--------------+--------
> >> legacy + direct IO     | 5.725s  | 4.512s  | 985765251    | 1925328
> >> -----------------------+---------+---------+--------------+--------
> >> mapped-ram + direct IO | 4.627s  | 1.490s  | 34368554354  | 1774304
> >
> > Still somewhat surprised by the speed up on restore here too
> 
> Hmm, I'm thinking this might be caused by zero page handling. The non
> mapped-ram path has an extra buffer_is_zero() and memset() of the hva
> page.
> 
> Now, is it an issue that mapped-ram skips that memset? I assume guest
> memory will always be clear at the start of migration. There won't be a
> situation where the destination VM starts with memory already
> dirty... *and* the save file is also different, otherwise it wouldn't
> make any difference.

Consider the snapshot use case. You're running the VM, so memory
has arbitrary contents; now you restore to a saved snapshot. QEMU
remains running this whole time and you can't assume initial
memory is zeroed. Surely we need the memset?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH 10/20] qemu: Add support for mapped-ram on save
Posted by Fabiano Rosas 1 month, 1 week ago
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Oct 10, 2024 at 12:06:51PM -0300, Fabiano Rosas wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > On Thu, Aug 08, 2024 at 05:38:03PM -0600, Jim Fehlig via Devel wrote:
>> >> Introduce support for QEMU's new mapped-ram stream format [1].
>> >> mapped-ram is enabled by default if the underlying QEMU advertises
>> >> the mapped-ram migration capability. It can be disabled by changing
>> >> the 'save_image_version' setting in qemu.conf to version '2'.
>> >> 
>> >> To use mapped-ram with QEMU:
>> >> - The 'mapped-ram' migration capability must be set to true
>> >> - The 'multifd' migration capability must be set to true and
>> >>   the 'multifd-channels' migration parameter must set to 1
>> >> - QEMU must be provided an fdset containing the migration fd
>> >> - The 'migrate' qmp command is invoked with a URI referencing the
>> >>   fdset and an offset where to start writing the data stream, e.g.
>> >> 
>> >>   {"execute":"migrate",
>> >>    "arguments":{"detach":true,"resume":false,
>> >>                 "uri":"file:/dev/fdset/0,offset=0x11921"}}
>> >> 
>> >> The mapped-ram stream, in conjunction with direct IO and multifd
>> >> support provided by subsequent patches, can significantly improve
>> >> the time required to save VM memory state. The following tables
>> >> compare mapped-ram with the existing, sequential save stream. In
>> >> all cases, the save and restore operations are to/from a block
>> >> device comprised of two NVMe disks in RAID0 configuration with
>> >> xfs (~8600MiB/s). The values in the 'save time' and 'restore time'
>> >> columns were scraped from the 'real' time reported by time(1). The
>> >> 'Size' and 'Blocks' columns were provided by the corresponding
>> >> outputs of stat(1).
>> >> 
>> >> VM: 32G RAM, 1 vcpu, idle (shortly after boot)
>> >> 
>> >>                        | save    | restore |
>> >> 		       | time    | time    | Size         | Blocks
>> >> -----------------------+---------+---------+--------------+--------
>> >> legacy                 | 6.193s  | 4.399s  | 985744812    | 1925288
>> >> -----------------------+---------+---------+--------------+--------
>> >> mapped-ram             | 5.109s  | 1.176s  | 34368554354  | 1774472
>> >
>> > I'm surprised by the restore time speed up, as I didn't think
>> > mapped-ram should make any perf difference without direct IO
>> > and multifd.
>> >
>> >> -----------------------+---------+---------+--------------+--------
>> >> legacy + direct IO     | 5.725s  | 4.512s  | 985765251    | 1925328
>> >> -----------------------+---------+---------+--------------+--------
>> >> mapped-ram + direct IO | 4.627s  | 1.490s  | 34368554354  | 1774304
>> >
>> > Still somewhat surprised by the speed up on restore here too
>> 
>> Hmm, I'm thinking this might be caused by zero page handling. The non
>> mapped-ram path has an extra buffer_is_zero() and memset() of the hva
>> page.
>> 
>> Now, is it an issue that mapped-ram skips that memset? I assume guest
>> memory will always be clear at the start of migration. There won't be a
>> situation where the destination VM starts with memory already
>> dirty... *and* the save file is also different, otherwise it wouldn't
>> make any difference.
>
> Consider the snapshot use case. You're running the VM, so memory
> has arbitrary contents, now you restore to a saved snapshot. QEMU
> remains running this whole time and you can't assume initial
> memory is zeroed. Surely we need the memset ?

Hmm, I probably have a big gap in my knowledge here, but savevm doesn't
hook into file migration, so there's no way to load a snapshot with
mapped-ram that I know of. Is this something that libvirt enables
somehow? There would be no -incoming on the cmdline.

>
> With regards,
> Daniel
Re: [PATCH 10/20] qemu: Add support for mapped-ram on save
Posted by Daniel P. Berrangé 1 month, 1 week ago
On Thu, Oct 10, 2024 at 02:52:56PM -0300, Fabiano Rosas wrote:
> Daniel P. Berrangé <berrange@redhat.com> writes:
> 
> > On Thu, Oct 10, 2024 at 12:06:51PM -0300, Fabiano Rosas wrote:
> >> Daniel P. Berrangé <berrange@redhat.com> writes:
> >> 
> >> > On Thu, Aug 08, 2024 at 05:38:03PM -0600, Jim Fehlig via Devel wrote:
> >> >> Introduce support for QEMU's new mapped-ram stream format [1].
> >> >> mapped-ram is enabled by default if the underlying QEMU advertises
> >> >> the mapped-ram migration capability. It can be disabled by changing
> >> >> the 'save_image_version' setting in qemu.conf to version '2'.
> >> >> 
> >> >> To use mapped-ram with QEMU:
> >> >> - The 'mapped-ram' migration capability must be set to true
> >> >> - The 'multifd' migration capability must be set to true and
> >> >>   the 'multifd-channels' migration parameter must set to 1
> >> >> - QEMU must be provided an fdset containing the migration fd
> >> >> - The 'migrate' qmp command is invoked with a URI referencing the
> >> >>   fdset and an offset where to start writing the data stream, e.g.
> >> >> 
> >> >>   {"execute":"migrate",
> >> >>    "arguments":{"detach":true,"resume":false,
> >> >>                 "uri":"file:/dev/fdset/0,offset=0x11921"}}
> >> >> 
> >> >> The mapped-ram stream, in conjunction with direct IO and multifd
> >> >> support provided by subsequent patches, can significantly improve
> >> >> the time required to save VM memory state. The following tables
> >> >> compare mapped-ram with the existing, sequential save stream. In
> >> >> all cases, the save and restore operations are to/from a block
> >> >> device comprised of two NVMe disks in RAID0 configuration with
> >> >> xfs (~8600MiB/s). The values in the 'save time' and 'restore time'
> >> >> columns were scraped from the 'real' time reported by time(1). The
> >> >> 'Size' and 'Blocks' columns were provided by the corresponding
> >> >> outputs of stat(1).
> >> >> 
> >> >> VM: 32G RAM, 1 vcpu, idle (shortly after boot)
> >> >> 
> >> >>                        | save    | restore |
> >> >> 		       | time    | time    | Size         | Blocks
> >> >> -----------------------+---------+---------+--------------+--------
> >> >> legacy                 | 6.193s  | 4.399s  | 985744812    | 1925288
> >> >> -----------------------+---------+---------+--------------+--------
> >> >> mapped-ram             | 5.109s  | 1.176s  | 34368554354  | 1774472
> >> >
> >> > I'm surprised by the restore time speed up, as I didn't think
> >> > mapped-ram should make any perf difference without direct IO
> >> > and multifd.
> >> >
> >> >> -----------------------+---------+---------+--------------+--------
> >> >> legacy + direct IO     | 5.725s  | 4.512s  | 985765251    | 1925328
> >> >> -----------------------+---------+---------+--------------+--------
> >> >> mapped-ram + direct IO | 4.627s  | 1.490s  | 34368554354  | 1774304
> >> >
> >> > Still somewhat surprised by the speed up on restore here too
> >> 
> >> Hmm, I'm thinking this might be caused by zero page handling. The non
> >> mapped-ram path has an extra buffer_is_zero() and memset() of the hva
> >> page.
> >> 
> >> Now, is it an issue that mapped-ram skips that memset? I assume guest
> >> memory will always be clear at the start of migration. There won't be a
> >> situation where the destination VM starts with memory already
> >> dirty... *and* the save file is also different, otherwise it wouldn't
> >> make any difference.
> >
> > Consider the snapshot use case. You're running the VM, so memory
> > has arbitrary contents, now you restore to a saved snapshot. QEMU
> > remains running this whole time and you can't assume initial
> > memory is zeroed. Surely we need the memset ?
> 
> Hmm, I probably have a big gap on my knowledge here, but savevm doesn't
> hook into file migration, so there's no way to load a snapshot with
> mapped-ram that I know of. Is this something that libvirt enables
> somehow? There would be no -incoming on the cmdline.

Oops, yes, I always forget savevm is off in its own little world.

Upstream we've been talking about making savevm a facade around the
'migrate' command, but no one has ever made a PoC.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH 10/20] qemu: Add support for mapped-ram on save
Posted by Fabiano Rosas 1 month, 1 week ago
Daniel P. Berrangé <berrange@redhat.com> writes:

> On Thu, Oct 10, 2024 at 02:52:56PM -0300, Fabiano Rosas wrote:
>> Daniel P. Berrangé <berrange@redhat.com> writes:
>> 
>> > On Thu, Oct 10, 2024 at 12:06:51PM -0300, Fabiano Rosas wrote:
>> >> Daniel P. Berrangé <berrange@redhat.com> writes:
>> >> 
>> >> > On Thu, Aug 08, 2024 at 05:38:03PM -0600, Jim Fehlig via Devel wrote:
>> >> >> Introduce support for QEMU's new mapped-ram stream format [1].
>> >> >> mapped-ram is enabled by default if the underlying QEMU advertises
>> >> >> the mapped-ram migration capability. It can be disabled by changing
>> >> >> the 'save_image_version' setting in qemu.conf to version '2'.
>> >> >> 
>> >> >> To use mapped-ram with QEMU:
>> >> >> - The 'mapped-ram' migration capability must be set to true
>> >> >> - The 'multifd' migration capability must be set to true and
>> >> >>   the 'multifd-channels' migration parameter must set to 1
>> >> >> - QEMU must be provided an fdset containing the migration fd
>> >> >> - The 'migrate' qmp command is invoked with a URI referencing the
>> >> >>   fdset and an offset where to start writing the data stream, e.g.
>> >> >> 
>> >> >>   {"execute":"migrate",
>> >> >>    "arguments":{"detach":true,"resume":false,
>> >> >>                 "uri":"file:/dev/fdset/0,offset=0x11921"}}
>> >> >> 
>> >> >> The mapped-ram stream, in conjunction with direct IO and multifd
>> >> >> support provided by subsequent patches, can significantly improve
>> >> >> the time required to save VM memory state. The following tables
>> >> >> compare mapped-ram with the existing, sequential save stream. In
>> >> >> all cases, the save and restore operations are to/from a block
>> >> >> device comprised of two NVMe disks in RAID0 configuration with
>> >> >> xfs (~8600MiB/s). The values in the 'save time' and 'restore time'
>> >> >> columns were scraped from the 'real' time reported by time(1). The
>> >> >> 'Size' and 'Blocks' columns were provided by the corresponding
>> >> >> outputs of stat(1).
>> >> >> 
>> >> >> VM: 32G RAM, 1 vcpu, idle (shortly after boot)
>> >> >> 
>> >> >>                        | save    | restore |
>> >> >> 		       | time    | time    | Size         | Blocks
>> >> >> -----------------------+---------+---------+--------------+--------
>> >> >> legacy                 | 6.193s  | 4.399s  | 985744812    | 1925288
>> >> >> -----------------------+---------+---------+--------------+--------
>> >> >> mapped-ram             | 5.109s  | 1.176s  | 34368554354  | 1774472
>> >> >
>> >> > I'm surprised by the restore time speed up, as I didn't think
>> >> > mapped-ram should make any perf difference without direct IO
>> >> > and multifd.
>> >> >
>> >> >> -----------------------+---------+---------+--------------+--------
>> >> >> legacy + direct IO     | 5.725s  | 4.512s  | 985765251    | 1925328
>> >> >> -----------------------+---------+---------+--------------+--------
>> >> >> mapped-ram + direct IO | 4.627s  | 1.490s  | 34368554354  | 1774304
>> >> >
>> >> > Still somewhat surprised by the speed up on restore here too
>> >> 
>> >> Hmm, I'm thinking this might be caused by zero page handling. The non
>> >> mapped-ram path has an extra buffer_is_zero() and memset() of the hva
>> >> page.
>> >> 
>> >> Now, is it an issue that mapped-ram skips that memset? I assume guest
>> >> memory will always be clear at the start of migration. There won't be a
>> >> situation where the destination VM starts with memory already
>> >> dirty... *and* the save file is also different, otherwise it wouldn't
>> >> make any difference.
>> >
>> > Consider the snapshot use case. You're running the VM, so memory
>> > has arbitrary contents, now you restore to a saved snapshot. QEMU
>> > remains running this whole time and you can't assume initial
>> > memory is zeroed. Surely we need the memset ?
>> 
>> Hmm, I probably have a big gap on my knowledge here, but savevm doesn't
>> hook into file migration, so there's no way to load a snapshot with
>> mapped-ram that I know of. Is this something that libvirt enables
>> somehow? There would be no -incoming on the cmdline.
>
> Opps, yes, i always forget savevm is off in its own little world.
>
> Upstream we've talking about making savevm be a facade around the
> 'migrate' command, but no one has ever made a PoC.

Yeah, that would be nice. Once I learn how the data ends up in the qcow2
image, maybe I can look into adding a new 'snapshot' migration mode to
QEMU.

>
> With regards,
> Daniel