[PATCH 00/20] qemu: support mapped-ram+directio+multifd

Jim Fehlig via Devel posted 20 patches 1 year, 6 months ago
There is a newer version of this series
docs/manpages/virsh.rst            |   9 +-
include/libvirt/libvirt-domain.h   |  13 ++
src/libvirt-domain.c               |  52 +++++--
src/qemu/libvirtd_qemu.aug         |   1 +
src/qemu/qemu.conf.in              |   6 +
src/qemu/qemu_conf.c               |  16 +++
src/qemu/qemu_conf.h               |   5 +
src/qemu/qemu_driver.c             | 104 +++++++++-----
src/qemu/qemu_fd.c                 |  18 +++
src/qemu/qemu_fd.h                 |   3 +
src/qemu/qemu_migration.c          | 192 +++++++++++++++++--------
src/qemu/qemu_migration.h          |   9 +-
src/qemu/qemu_migration_params.c   |  86 ++++++++++++
src/qemu/qemu_migration_params.h   |  17 +++
src/qemu/qemu_monitor.c            |  39 ++++++
src/qemu/qemu_monitor.h            |   5 +
src/qemu/qemu_process.c            | 120 +++++++++++-----
src/qemu/qemu_process.h            |  19 ++-
src/qemu/qemu_saveimage.c          | 216 ++++++++++++++++++++---------
src/qemu/qemu_saveimage.h          |  35 +++--
src/qemu/qemu_snapshot.c           |  26 ++--
src/qemu/test_libvirtd_qemu.aug.in |   1 +
tools/virsh-domain.c               |  79 +++++++++--
23 files changed, 827 insertions(+), 244 deletions(-)
[PATCH 00/20] qemu: support mapped-ram+directio+multifd
Posted by Jim Fehlig via Devel 1 year, 6 months ago
This series is essentially V1 of a prior RFC [1] to support QEMU's
mapped-ram stream format [2] and migration capability. Along with
supporting mapped-ram, it implements a design approach we discussed
for supporting parallel save/restore [3]. In summary, the approach is

1. Add mapped-ram migration capability
2. Steal an element from save header 'unused' for a 'features' variable
   and bump save version to 3.
3. Add /etc/libvirt/qemu.conf knob for the save format version,
   defaulting to latest v3
4. Use v3 (aka mapped-ram) by default
5. Use mapped-ram with BYPASS_CACHE for v3, old approach for v2
6. include: Define constants for parallel save/restore
7. qemu: Add support for parallel save. Implies mapped-ram, reject if v2
8. qemu: Add support for parallel restore. Implies mapped-ram.
   Reject if v2
9. tools: add parallel parameter to virsh save command
10. tools: add parallel parameter to virsh restore command

With this series, saving and restoring using mapped-ram is enabled by
default if the underlying QEMU advertises the mapped-ram migration
capability. It can be disabled by changing the 'save_image_version'
setting in qemu.conf.

To use mapped-ram with QEMU:
- The 'mapped-ram' migration capability must be set to true
- The 'multifd' migration capability must be set to true and
  the 'multifd-channels' migration parameter must be set to a
  value >= 1
- QEMU must be provided an fdset containing the migration fd(s)
- The 'migrate' QMP command is invoked with a URI referencing the fdset
  and an offset at which to start reading or writing the data stream, e.g.
    
    {"execute":"migrate",
     "arguments":{"detach":true,"resume":false,
                  "uri":"file:/dev/fdset/0,offset=0x11921"}}
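
A minimal sketch of the command sequence implied by the list above,
assuming the standard QMP commands 'migrate-set-capabilities' and
'migrate-set-parameters' are used to set the capabilities and the
channel count. The helper name mapped_ram_qmp_commands is hypothetical,
and passing the fd(s) into the fdset is left out because it needs a
live QMP socket. It only builds and prints the JSON:

```python
import json


def mapped_ram_qmp_commands(fdset_id, offset, channels=1):
    """Build, in order, the QMP commands for a mapped-ram save to an fdset.

    Hypothetical helper for illustration; it does not talk to QEMU.
    """
    if channels < 1:
        raise ValueError("multifd-channels must be >= 1")
    return [
        # Enable the two capabilities named in the requirements list.
        {"execute": "migrate-set-capabilities",
         "arguments": {"capabilities": [
             {"capability": "mapped-ram", "state": True},
             {"capability": "multifd", "state": True},
         ]}},
        # Set the number of multifd channels.
        {"execute": "migrate-set-parameters",
         "arguments": {"multifd-channels": channels}},
        # Start the migration to the fdset at the given byte offset.
        {"execute": "migrate",
         "arguments": {"detach": True, "resume": False,
                       "uri": f"file:/dev/fdset/{fdset_id},offset={offset:#x}"}},
    ]


for cmd in mapped_ram_qmp_commands(fdset_id=0, offset=0x11921, channels=8):
    print(json.dumps(cmd))
```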
    
The mapped-ram stream, in conjunction with direct IO and multifd, can
significantly improve the time required to save VM memory state. The
following tables compare mapped-ram with the existing, sequential save
stream. In all cases, the save and restore operations are to/from a
block device composed of two NVMe disks in a RAID0 configuration,
formatted with XFS (~8600 MiB/s). The values in the 'save time' and
'restore time' columns are the 'real' time reported by time(1). The
'Size' and 'Blocks' columns are taken from the corresponding output
of stat(1).

VM: 32G RAM, 1 vcpu, idle (shortly after boot)

                       | save    | restore |              |
                       | time    | time    | Size         | Blocks
-----------------------+---------+---------+--------------+--------
legacy                 | 6.193s  | 4.399s  | 985744812    | 1925288
-----------------------+---------+---------+--------------+--------
mapped-ram             | 5.109s  | 1.176s  | 34368554354  | 1774472
-----------------------+---------+---------+--------------+--------
legacy + direct IO     | 5.725s  | 4.512s  | 985765251    | 1925328
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO | 4.627s  | 1.490s  | 34368554354  | 1774304
-----------------------+---------+---------+--------------+--------
mapped-ram + direct IO |         |         |              |
 + multifd-channels=8  | 4.421s  | 0.845s  | 34368554318  | 1774312
-----------------------+---------+---------+--------------+--------

VM: 32G RAM, 30G dirty, 1 vcpu in tight loop dirtying memory

                       | save    | restore |              |
                       | time    | time    | Size         | Blocks
-----------------------+---------+---------+--------------+---------
legacy                 | 25.800s | 14.332s | 33154309983  | 64754512
-----------------------+---------+---------+--------------+---------
mapped-ram             | 18.742s | 15.027s | 34368559228  | 64617160
-----------------------+---------+---------+--------------+---------
legacy + direct IO     | 13.115s | 18.050s | 33154310496  | 64754520
-----------------------+---------+---------+--------------+---------
mapped-ram + direct IO | 13.623s | 15.959s | 34368557392  | 64662040
-----------------------+---------+---------+--------------+---------
mapped-ram + direct IO |         |         |              |
 + multifd-channels=8  | 6.994s  | 6.470s  | 34368554980  | 64665776
-----------------------+---------+---------+--------------+---------
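
To put the 30G-dirty numbers in perspective, the short calculation
below divides the legacy times by the times of the fastest
configuration (mapped-ram + direct IO + multifd-channels=8). The times
are copied from the table above; the helper name speedup is only for
illustration. It works out to roughly 3.7x for save and 2.2x for
restore:

```python
# Times in seconds, copied from the 30G-dirty table.
legacy = {"save": 25.800, "restore": 14.332}
mapped_dio_multifd8 = {"save": 6.994, "restore": 6.470}


def speedup(old_seconds, new_seconds):
    """Return how many times faster the new time is than the old one."""
    return old_seconds / new_seconds


for op in ("save", "restore"):
    print(f"{op}: {speedup(legacy[op], mapped_dio_multifd8[op]):.2f}x")
```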

As can be seen from the tables, one caveat of mapped-ram is that the logical
file size of a saved image is roughly equal to the VM memory size, even for
a mostly idle VM. Note, however, that mapped-ram typically uses fewer blocks
on disk, as it did in every configuration above.
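
The gap between logical size and on-disk usage can be checked from the
idle-VM rows. The sketch below assumes stat(1) reports 'Blocks' in
512-byte units, which is consistent with the legacy row, where
blocks * 512 is nearly the same as the file size. The mapped-ram image
is about 32 GiB logically but only about 0.9 GB is allocated:

```python
BLOCK_UNIT = 512  # assumption: stat(1) counts blocks in 512-byte units


def allocated_bytes(blocks):
    """Convert a stat(1) block count into bytes allocated on disk."""
    return blocks * BLOCK_UNIT


# (logical size in bytes, blocks) from the idle-VM table.
rows = {
    "legacy": (985744812, 1925288),
    "mapped-ram": (34368554354, 1774472),
}

for name, (size, blocks) in rows.items():
    alloc = allocated_bytes(blocks)
    print(f"{name}: logical={size} allocated={alloc} ratio={alloc / size:.3f}")
```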

Support for mapped-ram+direct-io only recently landed in upstream QEMU
and will first appear in the 9.1 release, which may complicate merging
support in libvirt. Specifically, I'm not sure how to detect whether the
combination is supported by QEMU. Suggestions welcome.

Similar to the RFC, V1 ignores compression. libvirt currently supports
compression by connecting the output of QEMU's save stream to the specified
compression program via a pipe. This approach is incompatible with mapped-ram
since the fd provided to QEMU must be seekable. In general, we can consider
mapped-ram and compression incompatible and document that they cannot be
used together.
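
The seekability constraint is easy to demonstrate outside of QEMU. This
is a minimal sketch, not libvirt code: it probes a regular file and a
pipe with lseek(2), which fails with ESPIPE on a pipe. The helper name
is_seekable is hypothetical:

```python
import errno
import os
import tempfile


def is_seekable(fd):
    """Return True if lseek() works on fd, False if it fails with ESPIPE."""
    try:
        os.lseek(fd, 0, os.SEEK_CUR)
        return True
    except OSError as exc:
        if exc.errno == errno.ESPIPE:
            return False
        raise


read_end, write_end = os.pipe()
try:
    with tempfile.TemporaryFile() as regular:
        print("regular file seekable:", is_seekable(regular.fileno()))
    print("pipe seekable:", is_seekable(write_end))
finally:
    os.close(read_end)
    os.close(write_end)
```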

[1] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/EF6YS5YIPYF2JXFMSKP6OLEJ2XWXJ3XW/

[2] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapped-ram.rst?ref_type=heads

[3] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/K4BDDJDMJ22XMJEFAUE323H5S5E47VQX/

Claudio Fontana (2):
  include: Define constants for parallel save/restore
  tools: add parallel parameter to virsh restore command

Jim Fehlig (17):
  lib: virDomainSaveParams: Ensure absolute save path
  qemu_fd: Add function to retrieve fdset ID
  qemu: Add function to check capability in migration params
  qemu: Add function to get bool value from migration params
  qemu: Add mapped-ram migration capability
  qemu: Add function to get migration params for save
  qemu: QEMU_SAVE_VERSION: Bump to version 3
  qemu: conf: Add setting for save image version
  qemu: Add helper function for creating save image fd
  qemu: Add support for mapped-ram on save
  qemu: Decompose qemuSaveImageOpen
  qemu: Move creation of qemuProcessIncomingDef struct
  qemu: Apply migration parameters in qemuMigrationDstRun
  qemu: Add support for mapped-ram on restore
  qemu: Support O_DIRECT with mapped-ram on save
  qemu: Support O_DIRECT with mapped-ram on restore
  qemu: Add support for parallel save and restore

Li Zhang (1):
  tools: add parallel parameter to virsh save command

-- 
2.35.3
Re: [PATCH 00/20] qemu: support mapped-ram+directio+multifd
Posted by Jim Fehlig via Devel 1 year, 1 month ago
Happy new year everyone! One resolution I have is to revive this topic and 
strive to get the feature merged :-).

On 8/8/24 17:37, Jim Fehlig wrote:
> This series is essentially V1 of a prior RFC [1] to support QEMU's
> mapped-ram stream format [2] and migration capability. Along with
> supporting mapped-ram, it implements a design approach we discussed
> for supporting parallel save/restore [3]. In summary, the approach is
> 
> 1. Add mapped-ram migration capability
> 2. Steal an element from save header 'unused' for a 'features' variable
>     and bump save version to 3.

IIRC, the work stalled while looking for agreement on this part of the approach. 
It's implemented in patch 7, and there I asked about using the 'format' field of 
SaveImageHeader instead of introducing another field:

https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/NV4E4O2STJQU7F52RFHWLE52C42NX75E/

In fact, after looking again with fresher eyes, I'm wondering if adding a new 
'format' is enough? I.e., add a new element to virQEMUSaveFormat enum. An older 
libvirt would refuse to process an image saved with the new format. This should 
allow us to avoid bumping the save version, which in turn avoids the version 
control knob in qemu.conf. Defaulting to mapped-ram would be a matter of 
changing the existing 'save_image_format' from 'raw' to 'sparse' (or whatever we 
want to call it).
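
A minimal sketch of why a new format value is enough to keep older
readers away from such images. The enum names and numeric values here
are illustrative assumptions, not libvirt's actual virQEMUSaveFormat
definition, and old_reader_open is a hypothetical stand-in for the
header check:

```python
from enum import IntEnum


class OldSaveFormat(IntEnum):
    """Formats known to an older reader (illustrative values)."""
    RAW = 0
    GZIP = 1


class NewSaveFormat(IntEnum):
    """Same formats plus a hypothetical new mapped-ram format."""
    RAW = 0
    GZIP = 1
    SPARSE = 2


def old_reader_open(format_value):
    """Accept a header's format field only if the old reader knows it."""
    try:
        return OldSaveFormat(format_value)
    except ValueError:
        raise ValueError(
            f"unsupported save image format {int(format_value)}") from None


print(old_reader_open(NewSaveFormat.RAW).name)
try:
    old_reader_open(NewSaveFormat.SPARSE)
except ValueError as exc:
    print("older reader refuses:", exc)
```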

Does this seem reasonable? Have I forgotten anything since this work stalled?

Regards,
Jim

Re: [PATCH 00/20] qemu: support mapped-ram+directio+multifd
Posted by Jim Fehlig via Devel 1 year ago
On 1/8/25 16:38, Jim Fehlig wrote:
> Happy new year everyone! One resolution I have is to revive this topic and 
> strive to get the feature merged :-).
> 
> On 8/8/24 17:37, Jim Fehlig wrote:
>> This series is essentially V1 of a prior RFC [1] to support QEMU's
>> mapped-ram stream format [2] and migration capability. Along with
>> supporting mapped-ram, it implements a design approach we discussed
>> for supporting parallel save/restore [3]. In summary, the approach is
>>
>> 1. Add mapped-ram migration capability
>> 2. Steal an element from save header 'unused' for a 'features' variable
>>     and bump save version to 3.
> 
> IIRC, the work stalled while looking for agreement on this part of the approach. 
> It's implemented in patch 7, and there I asked about using the 'format' field of 
> SaveImageHeader instead of introducing another field:
> 
> https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/NV4E4O2STJQU7F52RFHWLE52C42NX75E/
> 
> In fact, after looking again with fresher eyes, I'm wondering if adding a new 
> 'format' is enough? I.e., add a new element to virQEMUSaveFormat enum. An older 
> libvirt would refuse to process an image saved with the new format. This should 
> allow us to avoid bumping the save version, which in turn avoids the version 
> control knob in qemu.conf. Defaulting to mapped-ram would be a matter of 
> changing the existing 'save_image_format' from 'raw' to 'sparse' (or whatever we 
> want to call it).
> 
> Does this seem reasonable? Have I forgotten anything since this work stalled?

FYI, I've been adjusting the series according to this proposal. Looks good so 
far. I won't be able to work on it tomorrow, but should have an updated patchset 
posted soon.

Regards,
Jim