[PATCH RFC 0/9] qemu: Support mapped-ram migration capability

[PATCH RFC 0/9] qemu: Support mapped-ram migration capability
Posted by Jim Fehlig via Devel 2 months, 3 weeks ago
This series is an RFC for support of QEMU's mapped-ram migration
capability [1] for saving and restoring VMs. It implements the first
part of the design approach we discussed for supporting parallel
save/restore [2]. In summary, the approach is:

1. Add mapped-ram migration capability
2. Steal an element from save header 'unused' for a 'features' variable
   and bump save version to 3.
3. Add /etc/libvirt/qemu.conf knob for the save format version,
   defaulting to latest v3
4. Use v3 (aka mapped-ram) by default
5. Use mapped-ram with BYPASS_CACHE for v3, old approach for v2
6. include: Define constants for parallel save/restore
7. qemu: Add support for parallel save. Implies mapped-ram, reject if v2
8. qemu: Add support for parallel restore. Implies mapped-ram.
   Reject if v2
9. tools: add parallel parameter to virsh save command
10. tools: add parallel parameter to virsh restore command

This series implements 1-5, with the BYPASS_CACHE support in patches 8
and 9 being quite hacky. They are included to discuss approaches to make
them less hacky. See the patches for details.

The QEMU mapped-ram capability currently does not support directio.
Fabiano is working on that now [3]. This complicates merging support
in libvirt. I don't think it's reasonable to enable mapped-ram by
default when BYPASS_CACHE cannot be supported. Should we wait until
the mapped-ram directio support is merged in QEMU before supporting
mapped-ram in libvirt?

For the moment, compression is ignored in the new save version.
Currently, libvirt connects the output of QEMU's save stream to the
specified compression program via a pipe. This approach is incompatible
with mapped-ram since the fd provided to QEMU must be seekable. One
option is to reopen and compress the saved image after the actual save
operation has completed. This has the downside of requiring the iohelper
to handle BYPASS_CACHE, which would preclude us from removing it
sometime in the future. Other suggestions are much welcomed.
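
For illustration, a minimal standalone C sketch (not part of this series)
of the seekability problem: mapped-ram writes RAM pages at fixed file
offsets, so the fd handed to QEMU must support lseek(), which a pipe to a
compressor does not:

  #include <errno.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      int fds[2];

      if (pipe(fds) < 0)
          return 1;

      /* a pipe fd rejects seeking, so it cannot back a mapped-ram save */
      if (lseek(fds[1], 4096, SEEK_SET) < 0 && errno == ESPIPE)
          printf("pipe fd is not seekable (ESPIPE)\n");

      close(fds[0]);
      close(fds[1]);
      return 0;
  }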

Note the logical file size of mapped-ram saved images is slightly
larger than guest RAM size, so the files are often much larger than the
files produced by the existing, sequential format. However, the number of blocks
actually written to disk is often lower with mapped-ram saved images. E.g. a saved
image from a 30G, freshly booted, idle guest results in the following
'Size' and 'Blocks' values reported by stat(1)

                 Size         Blocks
sequential     998595770      1950392
mapped-ram     34368584225    1800456

With the same guest running a workload that dirties memory

                 Size         Blocks
sequential     33173330615    64791672
mapped-ram     34368578210    64706944
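
For reference, the 'Size' and 'Blocks' values above come from stat(1); the
same check can be done programmatically. A small illustrative C sketch
(st_blocks is in 512-byte units, so Blocks * 512 is the space actually
allocated for the sparse mapped-ram image):

  #include <stdio.h>
  #include <sys/stat.h>

  int main(int argc, char **argv)
  {
      struct stat sb;

      if (argc < 2 || stat(argv[1], &sb) < 0) {
          perror("stat");
          return 1;
      }

      /* logical length vs. blocks actually allocated on disk */
      printf("Size: %lld  Blocks: %lld  (allocated: %lld bytes)\n",
             (long long)sb.st_size, (long long)sb.st_blocks,
             (long long)sb.st_blocks * 512);
      return 0;
  }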

Thanks for any comments on this RFC!

[1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapped-ram.rst?ref_type=heads
[2] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/K4BDDJDMJ22XMJEFAUE323H5S5E47VQX/
[3] https://mail.gnu.org/archive/html/qemu-devel/2024-05/msg04432.html

Jim Fehlig (9):
  qemu: Enable mapped-ram migration capability
  qemu_fd: Add function to retrieve fdset ID
  qemu: Add function to get migration params for save
  qemu: Add a 'features' element to save image header and bump version
  qemu: conf: Add setting for save image version
  qemu: Add support for mapped-ram on save
  qemu: Enable mapped-ram on restore
  qemu: Support O_DIRECT with mapped-ram on save
  qemu: Support O_DIRECT with mapped-ram on restore

 src/qemu/libvirtd_qemu.aug         |   1 +
 src/qemu/qemu.conf.in              |   6 +
 src/qemu/qemu_conf.c               |   8 ++
 src/qemu/qemu_conf.h               |   1 +
 src/qemu/qemu_driver.c             |  25 ++--
 src/qemu/qemu_fd.c                 |  18 +++
 src/qemu/qemu_fd.h                 |   3 +
 src/qemu/qemu_migration.c          |  99 ++++++++++++++-
 src/qemu/qemu_migration.h          |  11 +-
 src/qemu/qemu_migration_params.c   |  20 +++
 src/qemu/qemu_migration_params.h   |   4 +
 src/qemu/qemu_monitor.c            |  40 ++++++
 src/qemu/qemu_monitor.h            |   5 +
 src/qemu/qemu_process.c            |  63 +++++++---
 src/qemu/qemu_process.h            |  16 ++-
 src/qemu/qemu_saveimage.c          | 187 +++++++++++++++++++++++------
 src/qemu/qemu_saveimage.h          |  20 ++-
 src/qemu/qemu_snapshot.c           |  12 +-
 src/qemu/test_libvirtd_qemu.aug.in |   1 +
 19 files changed, 455 insertions(+), 85 deletions(-)

-- 
2.44.0
Re: [PATCH RFC 0/9] qemu: Support mapped-ram migration capability
Posted by Martin Kletzander 1 month ago
On Thu, Jun 13, 2024 at 04:43:14PM -0600, Jim Fehlig via Devel wrote:
>This series is a RFC for support of QEMU's mapped-ram migration
>capability [1] for saving and restoring VMs. It implements the first
>part of the design approach we discussed for supporting parallel
>save/restore [2]. In summary, the approach is
>
>1. Add mapped-ram migration capability
>2. Steal an element from save header 'unused' for a 'features' variable
>   and bump save version to 3.
>3. Add /etc/libvirt/qemu.conf knob for the save format version,
>   defaulting to latest v3
>4. Use v3 (aka mapped-ram) by default
>5. Use mapped-ram with BYPASS_CACHE for v3, old approach for v2
>6. include: Define constants for parallel save/restore
>7. qemu: Add support for parallel save. Implies mapped-ram, reject if v2
>8. qemu: Add support for parallel restore. Implies mapped-ram.
>   Reject if v2
>9. tools: add parallel parameter to virsh save command
>10. tools: add parallel parameter to virsh restore command
>
>This series implements 1-5, with the BYPASS_CACHE support in patches 8
>and 9 being quite hacky. They are included to discuss approaches to make
>them less hacky. See the patches for details.
>

They might seem a tiny bit hacky, but it's not that big of a deal I think.

You could eliminate two conditions by making the first FD always
non-direct (as in either there is no BYPASS_CACHE or it's already
wrapped by the I/O helper), but it would complicate other things in the
code and would get even more hairy IMHO.

>The QEMU mapped-ram capability currently does not support directio.
>Fabino is working on that now [3]. This complicates merging support
>in libvirt. I don't think it's reasonable to enable mapped-ram by
>default when BYPASS_CACHE cannot be supported. Should we wait until
>the mapped-ram directio support is merged in QEMU before supporting
>mapped-ram in libvirt?
>

By the time I looked at this series the direct-io work had already gone
in, but there is still the need for the second descriptor to do some
unaligned I/O.

From the QEMU patches I'm not sure whether you also need to set the
direct-io migration capability/flag when migrating to an fdset.  Maybe
that's needed for migration into a file directly.

>For the moment, compression is ignored in the new save version.
>Currently, libvirt connects the output of QEMU's save stream to the
>specified compression program via a pipe. This approach is incompatible
>with mapped-ram since the fd provided to QEMU must be seekable. One
>option is to reopen and compress the saved image after the actual save
>operation has completed. This has the downside of requiring the iohelper
>to handle BYPASS_CACHE, which would preclude us from removing it
>sometime in the future. Other suggestions much welcomed.
>

I was wondering whether it would make sense to use user-space block I/O,
but we'd have to use some compression on a block-by-block basis and
since you need to be able to compress each write separately, that means
you might just save a few bytes here and there.  And on top of that you'd
have to compress each individual block and that block needs to be
allocated as a whole, so no space would be saved at all.  So that does
not make sense unless there is some new format.

And compression after the save is finished is in my opinion kind of
pointless.  You don't save time and you only save disk space _after_ the
compression step is done.  Not to mention you'd have to uncompress it
again before starting QEMU from it.  I'd be fine with making users
choose between compression and mapped-ram, at least for now.  They can
compress the resulting file on their own.

>Note the logical file size of mapped-ram saved images is slightly
>larger than guest RAM size, so the files are often much larger than the
>files produced by the existing, sequential format. However, actual blocks
>written to disk is often lower with mapped-ram saved images. E.g. a saved
>image from a 30G, freshly booted, idle guest results in the following
>'Size' and 'Blocks' values reported by stat(1)
>
>                 Size         Blocks
>sequential     998595770      1950392
>mapped-ram     34368584225    1800456
>
>With the same guest running a workload that dirties memory
>
>                 Size         Blocks
>sequential     33173330615    64791672
>mapped-ram     34368578210    64706944
>

That's fine and even better.  It saves space, the only thing that
everyone needs to keep in mind is to treat it as a sparse file.

The other option for the space saving would be to consolidate the
streamed changes in the resulting file, but for little to no gain.
The mapped-ram is better.

>Thanks for any comments on this RFC!
>
>[1] https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapped-ram.rst?ref_type=heads
>[2] https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/K4BDDJDMJ22XMJEFAUE323H5S5E47VQX/
>[3] https://mail.gnu.org/archive/html/qemu-devel/2024-05/msg04432.html

Some more teeny tiny comments in two patches will follow.

Martin

>
>Jim Fehlig (9):
>  qemu: Enable mapped-ram migration capability
>  qemu_fd: Add function to retrieve fdset ID
>  qemu: Add function to get migration params for save
>  qemu: Add a 'features' element to save image header and bump version
>  qemu: conf: Add setting for save image version
>  qemu: Add support for mapped-ram on save
>  qemu: Enable mapped-ram on restore
>  qemu: Support O_DIRECT with mapped-ram on save
>  qemu: Support O_DIRECT with mapped-ram on restore
>
> src/qemu/libvirtd_qemu.aug         |   1 +
> src/qemu/qemu.conf.in              |   6 +
> src/qemu/qemu_conf.c               |   8 ++
> src/qemu/qemu_conf.h               |   1 +
> src/qemu/qemu_driver.c             |  25 ++--
> src/qemu/qemu_fd.c                 |  18 +++
> src/qemu/qemu_fd.h                 |   3 +
> src/qemu/qemu_migration.c          |  99 ++++++++++++++-
> src/qemu/qemu_migration.h          |  11 +-
> src/qemu/qemu_migration_params.c   |  20 +++
> src/qemu/qemu_migration_params.h   |   4 +
> src/qemu/qemu_monitor.c            |  40 ++++++
> src/qemu/qemu_monitor.h            |   5 +
> src/qemu/qemu_process.c            |  63 +++++++---
> src/qemu/qemu_process.h            |  16 ++-
> src/qemu/qemu_saveimage.c          | 187 +++++++++++++++++++++++------
> src/qemu/qemu_saveimage.h          |  20 ++-
> src/qemu/qemu_snapshot.c           |  12 +-
> src/qemu/test_libvirtd_qemu.aug.in |   1 +
> 19 files changed, 455 insertions(+), 85 deletions(-)
>
>-- 
>2.44.0
>
Re: [PATCH RFC 0/9] qemu: Support mapped-ram migration capability
Posted by Jim Fehlig via Devel 1 month ago
Hi Martin,

On 8/7/24 06:32, Martin Kletzander wrote:
> On Thu, Jun 13, 2024 at 04:43:14PM -0600, Jim Fehlig via Devel wrote:
>> This series is a RFC for support of QEMU's mapped-ram migration
>> capability [1] for saving and restoring VMs. It implements the first
>> part of the design approach we discussed for supporting parallel
>> save/restore [2]. In summary, the approach is
>>
>> 1. Add mapped-ram migration capability
>> 2. Steal an element from save header 'unused' for a 'features' variable
>>   and bump save version to 3.
>> 3. Add /etc/libvirt/qemu.conf knob for the save format version,
>>   defaulting to latest v3
>> 4. Use v3 (aka mapped-ram) by default
>> 5. Use mapped-ram with BYPASS_CACHE for v3, old approach for v2
>> 6. include: Define constants for parallel save/restore
>> 7. qemu: Add support for parallel save. Implies mapped-ram, reject if v2
>> 8. qemu: Add support for parallel restore. Implies mapped-ram.
>>   Reject if v2
>> 9. tools: add parallel parameter to virsh save command
>> 10. tools: add parallel parameter to virsh restore command
>>
>> This series implements 1-5, with the BYPASS_CACHE support in patches 8
>> and 9 being quite hacky. They are included to discuss approaches to make
>> them less hacky. See the patches for details.
>>
> 
> They might seem tiny bit hacky, but it's not that big of a deal I think.

In the meantime I've been working on a V1 that's a bit less hacky :-).

> You could eliminate two conditions by making the first FD always
> non-direct (as in either there is no BYPASS_CACHE or it's already
> wrapped by the I/O helper), but it would complicate other things in the
> code and would get even more hairy IMHO.

In V1, the first FD is always buffered. The directio one, if needed, is opened a 
bit later in the save process.

> 
>> The QEMU mapped-ram capability currently does not support directio.
>> Fabino is working on that now [3]. This complicates merging support
>> in libvirt. I don't think it's reasonable to enable mapped-ram by
>> default when BYPASS_CACHE cannot be supported. Should we wait until
>> the mapped-ram directio support is merged in QEMU before supporting
>> mapped-ram in libvirt?
>>
> 
> By the time I looked at this series the direct-io work has already went
> in, but there is still the need for the second descriptor to do some
> unaligned I/O.

Correct. However, ATM I don't know how to detect if qemu supports the direct-io 
migration parameter.

> 
>  From the QEMU patches I'm not sure whether you also need to set the
> direct-io migration capability/flag when migrating to an fdset.  Maybe
> that's needed for migration into a file directly.

The direct-io migration parameter must be set to 'true' and QEMU must be
provided a second FD with O_DIRECT set.
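
As a rough sketch (not code from this series) of what those two descriptors
look like: the same image is opened twice, once buffered for the small
unaligned writes and once with O_DIRECT for the page-aligned RAM writes.
The helper name and flags below are illustrative only:

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <unistd.h>

  /* Hypothetical helper: one buffered fd for unaligned metadata I/O and
   * one O_DIRECT fd for the aligned mapped-ram page writes. Both would be
   * placed in the same fdset handed to QEMU, with the direct-io migration
   * parameter enabled. */
  static int
  open_save_fds(const char *path, int *buf_fd, int *direct_fd)
  {
      if ((*buf_fd = open(path, O_CREAT | O_WRONLY, 0600)) < 0)
          return -1;

      if ((*direct_fd = open(path, O_WRONLY | O_DIRECT)) < 0) {
          close(*buf_fd);
          return -1;
      }

      return 0;
  }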

> 
>> For the moment, compression is ignored in the new save version.
>> Currently, libvirt connects the output of QEMU's save stream to the
>> specified compression program via a pipe. This approach is incompatible
>> with mapped-ram since the fd provided to QEMU must be seekable. One
>> option is to reopen and compress the saved image after the actual save
>> operation has completed. This has the downside of requiring the iohelper
>> to handle BYPASS_CACHE, which would preclude us from removing it
>> sometime in the future. Other suggestions much welcomed.
>>
> 
> I was wondering whether it would make sense to use user-space block I/O,
> but we'd have to use some compression on a block-by-block basis and
> since you need to be able to compress each write separately, that means
> you might just save few bytes here and there.  And on top of that you'd
> have to compress each individual block and that block needs to be
> allocated as a whole, so no space would be saved at all.  So that does
> not make sense unless there is some new format.
> 
> And compression after the save is finished is in my opinion kind of
> pointless.  You don't save time and you only save disk space _after_ the
> compression step is done.  Not to mention you'd have to uncompress it
> again before starting QEMU from it.  I'd be fine with making users
> choose between compression and mapped-ram, at least for now.  They can
> compress the resulting file on their own.

I'd prefer to make compression incompatible with mapped-ram too. Sounds like 
Daniel is agreeable to that as well.

Regards,
Jim

> 
>> Note the logical file size of mapped-ram saved images is slightly
>> larger than guest RAM size, so the files are often much larger than the
>> files produced by the existing, sequential format. However, actual blocks
>> written to disk is often lower with mapped-ram saved images. E.g. a saved
>> image from a 30G, freshly booted, idle guest results in the following
>> 'Size' and 'Blocks' values reported by stat(1)
>>
>>                 Size         Blocks
>> sequential     998595770      1950392
>> mapped-ram     34368584225    1800456
>>
>> With the same guest running a workload that dirties memory
>>
>>                 Size         Blocks
>> sequential     33173330615    64791672
>> mapped-ram     34368578210    64706944
>>
> 
> That's fine and even better.  It saves space, the only thing that
> everyone needs to keep in mind is to treat it as a sparse file.
> 
> The other option for the space saving would be to consolidate the
> streamed changes in the resulting file, but for little to no gain.
> The mapped-ram is better.
> 
>> Thanks for any comments on this RFC!
>>
>> [1] 
>> https://gitlab.com/qemu-project/qemu/-/blob/master/docs/devel/migration/mapped-ram.rst?ref_type=heads
>> [2] 
>> https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/K4BDDJDMJ22XMJEFAUE323H5S5E47VQX/
>> [3] https://mail.gnu.org/archive/html/qemu-devel/2024-05/msg04432.html
> 
> Some more teeny tiny comments in two patches will follow.
> 
> Martin
> 
>>
>> Jim Fehlig (9):
>>  qemu: Enable mapped-ram migration capability
>>  qemu_fd: Add function to retrieve fdset ID
>>  qemu: Add function to get migration params for save
>>  qemu: Add a 'features' element to save image header and bump version
>>  qemu: conf: Add setting for save image version
>>  qemu: Add support for mapped-ram on save
>>  qemu: Enable mapped-ram on restore
>>  qemu: Support O_DIRECT with mapped-ram on save
>>  qemu: Support O_DIRECT with mapped-ram on restore
>>
>> src/qemu/libvirtd_qemu.aug         |   1 +
>> src/qemu/qemu.conf.in              |   6 +
>> src/qemu/qemu_conf.c               |   8 ++
>> src/qemu/qemu_conf.h               |   1 +
>> src/qemu/qemu_driver.c             |  25 ++--
>> src/qemu/qemu_fd.c                 |  18 +++
>> src/qemu/qemu_fd.h                 |   3 +
>> src/qemu/qemu_migration.c          |  99 ++++++++++++++-
>> src/qemu/qemu_migration.h          |  11 +-
>> src/qemu/qemu_migration_params.c   |  20 +++
>> src/qemu/qemu_migration_params.h   |   4 +
>> src/qemu/qemu_monitor.c            |  40 ++++++
>> src/qemu/qemu_monitor.h            |   5 +
>> src/qemu/qemu_process.c            |  63 +++++++---
>> src/qemu/qemu_process.h            |  16 ++-
>> src/qemu/qemu_saveimage.c          | 187 +++++++++++++++++++++++------
>> src/qemu/qemu_saveimage.h          |  20 ++-
>> src/qemu/qemu_snapshot.c           |  12 +-
>> src/qemu/test_libvirtd_qemu.aug.in |   1 +
>> 19 files changed, 455 insertions(+), 85 deletions(-)
>>
>> -- 
>> 2.44.0
>>
Re: [PATCH RFC 0/9] qemu: Support mapped-ram migration capability
Posted by Daniel P. Berrangé 1 month ago
On Wed, Aug 07, 2024 at 02:32:57PM +0200, Martin Kletzander wrote:
> On Thu, Jun 13, 2024 at 04:43:14PM -0600, Jim Fehlig via Devel wrote:
> > This series is a RFC for support of QEMU's mapped-ram migration
> > capability [1] for saving and restoring VMs. It implements the first
> > part of the design approach we discussed for supporting parallel
> > save/restore [2]. In summary, the approach is
> > 
> > 1. Add mapped-ram migration capability
> > 2. Steal an element from save header 'unused' for a 'features' variable
> >   and bump save version to 3.
> > 3. Add /etc/libvirt/qemu.conf knob for the save format version,
> >   defaulting to latest v3
> > 4. Use v3 (aka mapped-ram) by default
> > 5. Use mapped-ram with BYPASS_CACHE for v3, old approach for v2
> > 6. include: Define constants for parallel save/restore
> > 7. qemu: Add support for parallel save. Implies mapped-ram, reject if v2
> > 8. qemu: Add support for parallel restore. Implies mapped-ram.
> >   Reject if v2
> > 9. tools: add parallel parameter to virsh save command
> > 10. tools: add parallel parameter to virsh restore command
> > 
> > This series implements 1-5, with the BYPASS_CACHE support in patches 8
> > and 9 being quite hacky. They are included to discuss approaches to make
> > them less hacky. See the patches for details.
> > 
> 
> They might seem tiny bit hacky, but it's not that big of a deal I think.
> 
> You could eliminate two conditions by making the first FD always
> non-direct (as in either there is no BYPASS_CACHE or it's already
> wrapped by the I/O helper), but it would complicate other things in the
> code and would get even more hairy IMHO.
> 
> > The QEMU mapped-ram capability currently does not support directio.
> > Fabino is working on that now [3]. This complicates merging support
> > in libvirt. I don't think it's reasonable to enable mapped-ram by
> > default when BYPASS_CACHE cannot be supported. Should we wait until
> > the mapped-ram directio support is merged in QEMU before supporting
> > mapped-ram in libvirt?
> > 
> 
> By the time I looked at this series the direct-io work has already went
> in, but there is still the need for the second descriptor to do some
> unaligned I/O.
> 
> From the QEMU patches I'm not sure whether you also need to set the
> direct-io migration capability/flag when migrating to an fdset.  Maybe
> that's needed for migration into a file directly.
> 
> > For the moment, compression is ignored in the new save version.
> > Currently, libvirt connects the output of QEMU's save stream to the
> > specified compression program via a pipe. This approach is incompatible
> > with mapped-ram since the fd provided to QEMU must be seekable. One
> > option is to reopen and compress the saved image after the actual save
> > operation has completed. This has the downside of requiring the iohelper
> > to handle BYPASS_CACHE, which would preclude us from removing it
> > sometime in the future. Other suggestions much welcomed.
> > 
> 
> I was wondering whether it would make sense to use user-space block I/O,
> but we'd have to use some compression on a block-by-block basis and
> since you need to be able to compress each write separately, that means
> you might just save few bytes here and there.  And on top of that you'd
> have to compress each individual block and that block needs to be
> allocated as a whole, so no space would be saved at all.  So that does
> not make sense unless there is some new format.
> 
> And compression after the save is finished is in my opinion kind of
> pointless.  You don't save time and you only save disk space _after_ the
> compression step is done.  Not to mention you'd have to uncompress it
> again before starting QEMU from it.  I'd be fine with making users
> choose between compression and mapped-ram, at least for now.  They can
> compress the resulting file on their own.

That argument for compressing on their own applies to the existing
code too. The reason we want it in libvirt is that we make it
'just work' without requiring apps to have a login shell to the
hypervisor to run commands out of band.

So basically it depends on whether disk space is more important than
overall wallclock time. It might still be worthwhile if the use of
multifd with mapped-ram massively reduces the overall save duration
and we also had a parallelized compression tool we were spawning,
e.g. xz can be told to use all CPU threads.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH RFC 0/9] qemu: Support mapped-ram migration capability
Posted by Jim Fehlig via Devel 1 month ago
On 8/7/24 09:49, Daniel P. Berrangé wrote:
> On Wed, Aug 07, 2024 at 02:32:57PM +0200, Martin Kletzander wrote:
>> On Thu, Jun 13, 2024 at 04:43:14PM -0600, Jim Fehlig via Devel wrote:
>>> This series is a RFC for support of QEMU's mapped-ram migration
>>> capability [1] for saving and restoring VMs. It implements the first
>>> part of the design approach we discussed for supporting parallel
>>> save/restore [2]. In summary, the approach is
>>>
>>> 1. Add mapped-ram migration capability
>>> 2. Steal an element from save header 'unused' for a 'features' variable
>>>    and bump save version to 3.
>>> 3. Add /etc/libvirt/qemu.conf knob for the save format version,
>>>    defaulting to latest v3
>>> 4. Use v3 (aka mapped-ram) by default
>>> 5. Use mapped-ram with BYPASS_CACHE for v3, old approach for v2
>>> 6. include: Define constants for parallel save/restore
>>> 7. qemu: Add support for parallel save. Implies mapped-ram, reject if v2
>>> 8. qemu: Add support for parallel restore. Implies mapped-ram.
>>>    Reject if v2
>>> 9. tools: add parallel parameter to virsh save command
>>> 10. tools: add parallel parameter to virsh restore command
>>>
>>> This series implements 1-5, with the BYPASS_CACHE support in patches 8
>>> and 9 being quite hacky. They are included to discuss approaches to make
>>> them less hacky. See the patches for details.
>>>
>>
>> They might seem tiny bit hacky, but it's not that big of a deal I think.
>>
>> You could eliminate two conditions by making the first FD always
>> non-direct (as in either there is no BYPASS_CACHE or it's already
>> wrapped by the I/O helper), but it would complicate other things in the
>> code and would get even more hairy IMHO.
>>
>>> The QEMU mapped-ram capability currently does not support directio.
>>> Fabino is working on that now [3]. This complicates merging support
>>> in libvirt. I don't think it's reasonable to enable mapped-ram by
>>> default when BYPASS_CACHE cannot be supported. Should we wait until
>>> the mapped-ram directio support is merged in QEMU before supporting
>>> mapped-ram in libvirt?
>>>
>>
>> By the time I looked at this series the direct-io work has already went
>> in, but there is still the need for the second descriptor to do some
>> unaligned I/O.
>>
>>  From the QEMU patches I'm not sure whether you also need to set the
>> direct-io migration capability/flag when migrating to an fdset.  Maybe
>> that's needed for migration into a file directly.
>>
>>> For the moment, compression is ignored in the new save version.
>>> Currently, libvirt connects the output of QEMU's save stream to the
>>> specified compression program via a pipe. This approach is incompatible
>>> with mapped-ram since the fd provided to QEMU must be seekable. One
>>> option is to reopen and compress the saved image after the actual save
>>> operation has completed. This has the downside of requiring the iohelper
>>> to handle BYPASS_CACHE, which would preclude us from removing it
>>> sometime in the future. Other suggestions much welcomed.
>>>
>>
>> I was wondering whether it would make sense to use user-space block I/O,
>> but we'd have to use some compression on a block-by-block basis and
>> since you need to be able to compress each write separately, that means
>> you might just save few bytes here and there.  And on top of that you'd
>> have to compress each individual block and that block needs to be
>> allocated as a whole, so no space would be saved at all.  So that does
>> not make sense unless there is some new format.
>>
>> And compression after the save is finished is in my opinion kind of
>> pointless.  You don't save time and you only save disk space _after_ the
>> compression step is done.  Not to mention you'd have to uncompress it
>> again before starting QEMU from it.  I'd be fine with making users
>> choose between compression and mapped-ram, at least for now.  They can
>> compress the resulting file on their own.
> 
> That argument for compressing on their own applies to the existing
> code too. The reason we want it in libvirt is that we make it
> 'just work' without requiring apps to have a login shell to the
> hypervisor to run commands out of band.
> 
> So basically it depends whether disk space is more important than
> overall wallclock time. It might still be worthwhile if the use of
> multifd with mapped-ram massively reduces the overall save duration
> and we also had a parallelized compression tool we were spawning.
> eg xz can be told to use all CPU thrfads.

mapped-ram+direct-io+multifd is quite an improvement over the current 
save/restore mechanism. The following table shows some save/restore stats from a 
guest with 32G RAM, 30G dirty, 1 vcpu in tight loop dirtying memory.

                       | save    | restore |
                       | time    | time    | Size         | Blocks
-----------------------+---------+---------+--------------+---------
legacy                 | 25.800s | 14.332s | 33154309983  | 64754512
-----------------------+---------+---------+--------------+---------
mapped-ram             | 18.742s | 15.027s | 34368559228  | 64617160
-----------------------+---------+---------+--------------+---------
legacy + direct IO     | 13.115s | 18.050s | 33154310496  | 64754520
-----------------------+---------+---------+--------------+---------
mapped-ram + direct IO | 13.623s | 15.959s | 34368557392  | 64662040
-----------------------+---------+---------+--------------+---------
mapped-ram + direct IO |         |         |              |
  + multifd-channels=8 | 6.994s  | 6.470s  | 34368554980  | 64665776
-----------------------+---------+---------+--------------+---------

In all cases, the save and restore operations are to/from a block device 
comprised of two NVMe disks in RAID0 configuration with xfs (~8600MiB/s). The 
values in the 'save time' and 'restore time' columns were scraped from the 
'real' time reported by time(1). The 'Size' and 'Blocks' columns were provided 
by the corresponding outputs of stat(1).

Regards,
Jim
Re: [PATCH RFC 0/9] qemu: Support mapped-ram migration capability
Posted by Daniel P. Berrangé 1 month ago
On Thu, Jun 13, 2024 at 04:43:14PM -0600, Jim Fehlig via Devel wrote:
> This series is a RFC for support of QEMU's mapped-ram migration
> capability [1] for saving and restoring VMs. It implements the first
> part of the design approach we discussed for supporting parallel
> save/restore [2]. In summary, the approach is
> 
> 1. Add mapped-ram migration capability
> 2. Steal an element from save header 'unused' for a 'features' variable
>    and bump save version to 3.
> 3. Add /etc/libvirt/qemu.conf knob for the save format version,
>    defaulting to latest v3
> 4. Use v3 (aka mapped-ram) by default
> 5. Use mapped-ram with BYPASS_CACHE for v3, old approach for v2
> 6. include: Define constants for parallel save/restore
> 7. qemu: Add support for parallel save. Implies mapped-ram, reject if v2
> 8. qemu: Add support for parallel restore. Implies mapped-ram.
>    Reject if v2
> 9. tools: add parallel parameter to virsh save command
> 10. tools: add parallel parameter to virsh restore command
> 
> This series implements 1-5, with the BYPASS_CACHE support in patches 8
> and 9 being quite hacky. They are included to discuss approaches to make
> them less hacky. See the patches for details.

I'm a little on the edge about the user interface for the choice
of formats - whether the version number knob in qemu.conf is the
ideal approach or not.

> The QEMU mapped-ram capability currently does not support directio.
> Fabino is working on that now [3]. This complicates merging support
> in libvirt. I don't think it's reasonable to enable mapped-ram by
> default when BYPASS_CACHE cannot be supported. Should we wait until
> the mapped-ram directio support is merged in QEMU before supporting
> mapped-ram in libvirt?
> 
> For the moment, compression is ignored in the new save version.
> Currently, libvirt connects the output of QEMU's save stream to the
> specified compression program via a pipe. This approach is incompatible
> with mapped-ram since the fd provided to QEMU must be seekable. One
> option is to reopen and compress the saved image after the actual save
> operation has completed. This has the downside of requiring the iohelper
> to handle BYPASS_CACHE, which would preclude us from removing it
> sometime in the future. Other suggestions much welcomed.

Going back to the original motivation for mapped-ram. The first key
factor was that it will make it more viable to use multi-fd for
parallelized saving and restoring, as it lets threads write concurrently
without needing synchronization. The predictable worst case file size
when the VM is live & dirtying memory was an added benefit.

With the streaming format, we can compress as we stream, so adding
compression burns more CPU, but this is parallelized with the QEMU
saving, so the wallclock time doesn't increase as badly.

With mapped-ram, if we want to compress, we need to wait for the
image to be fully written, as we can be re-writing regions if the
guest is live. IOW, compression cannot be parallelized with the
file writing. So compression will add significant wallclock time,
as well as having the transient extra disk usage penalty.

We can optimize to avoid the extra disk penalty by using discard
on the intermediate file every time we read a chunk, e.g. this
loop:

  while () {
       read 1 MB from save file
       discard 1MB from save file
       write 1 MB to compressor pipe
  }

We can't avoid the wallclock penalty from compression, and as you
say, we still need the iohelper chained after the compressor to
get O_DIRECT.
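
A minimal C sketch of that loop, assuming Linux fallocate() with
FALLOC_FL_PUNCH_HOLE is available on the filesystem holding the save file;
chunk size and error/partial-write handling are simplified:

  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdlib.h>
  #include <unistd.h>

  #define CHUNK (1024 * 1024)

  /* Read the save file in 1 MB chunks, feed each chunk to the compressor
   * pipe, then punch a hole over the region just read so the intermediate
   * file never keeps much data allocated. */
  static int
  drain_to_compressor(int save_fd, int pipe_fd)
  {
      char *buf = malloc(CHUNK);
      off_t off = 0;
      ssize_t n;
      int ret = -1;

      if (!buf)
          return -1;

      while ((n = pread(save_fd, buf, CHUNK, off)) > 0) {
          if (write(pipe_fd, buf, n) != n)   /* simplified: no partial writes */
              goto cleanup;
          if (fallocate(save_fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                        off, n) < 0)
              goto cleanup;
          off += n;
      }
      ret = n < 0 ? -1 : 0;

   cleanup:
      free(buf);
      return ret;
  }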

> Note the logical file size of mapped-ram saved images is slightly
> larger than guest RAM size, so the files are often much larger than the
> files produced by the existing, sequential format. However, actual blocks
> written to disk is often lower with mapped-ram saved images. E.g. a saved
> image from a 30G, freshly booted, idle guest results in the following
> 'Size' and 'Blocks' values reported by stat(1)
> 
>                  Size         Blocks
> sequential     998595770      1950392
> mapped-ram     34368584225    1800456
> 
> With the same guest running a workload that dirties memory
> 
>                  Size         Blocks
> sequential     33173330615    64791672
> mapped-ram     34368578210    64706944

I'm a little concerned this difference between sparse and non-sparse
formats could trip up existing applications that are using libvirt.

e.g., openstack uses the save/restore to file facilities and IIUC can
upload the saved file to its image storage (glance). Experience tells
us that applications will often not handle sparseness correctly if
they have never been expecting it.


Overall I'm wondering if we need to give a direct choice to mgmt
apps.

We added the save/restore variants that accept virTypedParameters,
so we could define a VIR_DOMAIN_SAVE_PARAM_FORMAT, which accepts
'stream' and 'mapped' as options. This choice would then influence
whether we save in v2 or v3 format.  On restore we don't need a
parameter as we just probe the on disk format.
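
A sketch of how a management app might use that; virDomainSaveParams() and
the typed-parameter helpers are existing libvirt API, while the "format"
parameter name and its 'mapped' value are only the proposal above:

  #include <libvirt/libvirt.h>

  /* Hypothetical: request a mapped-ram save via typed parameters. */
  static int
  save_mapped(virDomainPtr dom, const char *path)
  {
      virTypedParameterPtr params = NULL;
      int nparams = 0;
      int maxparams = 0;
      int ret = -1;

      if (virTypedParamsAddString(&params, &nparams, &maxparams,
                                  VIR_DOMAIN_SAVE_PARAM_FILE, path) < 0 ||
          virTypedParamsAddString(&params, &nparams, &maxparams,
                                  "format", "mapped") < 0)   /* proposed name */
          goto cleanup;

      ret = virDomainSaveParams(dom, params, nparams, 0);

   cleanup:
      virTypedParamsFree(params, nparams);
      return ret;
  }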

As a documentation task we can then say that compression is
incompatible with 'mapped'.

Annoyingly we already have a 'save_image_format' in qemu.conf though,
taking 'raw', 'zstd', 'lzop', etc to choose the compression type.
So we have a terminology clash.

We probably should have exposed this compression choice in the API
too, via a VIR_DOMAIN_SAVE_PARAM_COMPRESSION typed parameter, taking
values 'default', 'raw', 'zstd', 'lzop', etc, where 'default' (or
omitted) means honour the qemu.conf setting and all others override
qemu.conf

Might suggest we name the new API parameter VIR_DOMAIN_SAVE_PARAM_LAYOUT
for the stream vs mapped choice?

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH RFC 0/9] qemu: Support mapped-ram migration capability
Posted by Jim Fehlig via Devel 3 weeks, 5 days ago
On 8/7/24 09:45, Daniel P. Berrangé wrote:
> On Thu, Jun 13, 2024 at 04:43:14PM -0600, Jim Fehlig via Devel wrote:
>> This series is a RFC for support of QEMU's mapped-ram migration
>> capability [1] for saving and restoring VMs. It implements the first
>> part of the design approach we discussed for supporting parallel
>> save/restore [2]. In summary, the approach is
>>
>> 1. Add mapped-ram migration capability
>> 2. Steal an element from save header 'unused' for a 'features' variable
>>     and bump save version to 3.
>> 3. Add /etc/libvirt/qemu.conf knob for the save format version,
>>     defaulting to latest v3
>> 4. Use v3 (aka mapped-ram) by default
>> 5. Use mapped-ram with BYPASS_CACHE for v3, old approach for v2
>> 6. include: Define constants for parallel save/restore
>> 7. qemu: Add support for parallel save. Implies mapped-ram, reject if v2
>> 8. qemu: Add support for parallel restore. Implies mapped-ram.
>>     Reject if v2
>> 9. tools: add parallel parameter to virsh save command
>> 10. tools: add parallel parameter to virsh restore command
>>
>> This series implements 1-5, with the BYPASS_CACHE support in patches 8
>> and 9 being quite hacky. They are included to discuss approaches to make
>> them less hacky. See the patches for details.
> 
> I'm a little on the edge about the user interface for the choice
> of formats - whether the version number knob in qemu.conf is the
> ideal approach or not.
> 
>> The QEMU mapped-ram capability currently does not support directio.
>> Fabino is working on that now [3]. This complicates merging support
>> in libvirt. I don't think it's reasonable to enable mapped-ram by
>> default when BYPASS_CACHE cannot be supported. Should we wait until
>> the mapped-ram directio support is merged in QEMU before supporting
>> mapped-ram in libvirt?
>>
>> For the moment, compression is ignored in the new save version.
>> Currently, libvirt connects the output of QEMU's save stream to the
>> specified compression program via a pipe. This approach is incompatible
>> with mapped-ram since the fd provided to QEMU must be seekable. One
>> option is to reopen and compress the saved image after the actual save
>> operation has completed. This has the downside of requiring the iohelper
>> to handle BYPASS_CACHE, which would preclude us from removing it
>> sometime in the future. Other suggestions much welcomed.
> 
> Going back to the original motivation for mapped-ram. The first key
> factor was that it will make it more viable to use multi-fd for
> parallelized saving and restoring, as it lets threads write concurrently
> without needing synchronization. The predictable worst case file size
> when the VM is live & dirtying memory, was an added benefit.
> 
> The the streaming format, we can compress as we stream, so adding
> compression burns more CPU, but this is parallelized with the QEMU
> saving, so the wallclock time doesn't increase as badly.
> 
> With mapped-ram, if we want to compress, we need to wait for the
> image to be fully written, as we can be re-writing regions if the
> guest is live. IOW, compression cannot be parallelized with the
> file writing. So compression will add significant wallclock time,
> as well as having the transient extra disk usage penalty.
> 
> We can optimize to avoid the extra disk penalty by using discard
> on the intermediate file every time we read a chunk. eg this
> loop:
> 
>    while () {
>         read 1 MB from save file
>         discard 1MB from save file
>         write 1 MB to compressor pipe
>    }
> 
> We can't avoid the wallclock penalty from compression, and as you
> say, we still need the iohelper chained after the compressor to
> get O_DIRECT.
> 
>> Note the logical file size of mapped-ram saved images is slightly
>> larger than guest RAM size, so the files are often much larger than the
>> files produced by the existing, sequential format. However, actual blocks
>> written to disk is often lower with mapped-ram saved images. E.g. a saved
>> image from a 30G, freshly booted, idle guest results in the following
>> 'Size' and 'Blocks' values reported by stat(1)
>>
>>                   Size         Blocks
>> sequential     998595770      1950392
>> mapped-ram     34368584225    1800456
>>
>> With the same guest running a workload that dirties memory
>>
>>                   Size         Blocks
>> sequential     33173330615    64791672
>> mapped-ram     34368578210    64706944
> 
> I'm a little concerned this difference between sparse and non-sparse
> formats could trip up existing applications that are using libvirt.
> 
> eg, openstack uses the save/restore to file facilities and IIUC can
> upload the saved file to its image storage (glance). Experiance tells
> us that applications will often not handle sparseness correctly if
> they have never been expecting it.
> 
> 
> Overall I'm wondering if we need to give a direct choice to mgmt
> apps.
> 
> We added a the save/restore variants that accept virTypedParameters,
> so we could define a VIR_DOMAIN_SAVE_PARAM_FORMAT, which accepts
> 'stream' and 'mapped' as options. This choice would then influence
> whether we save in v2 or v3 format.  On restore we don't need a
> parameter as we just probe the on disk format.
> 
> As a documentation task we can then save that compression is
> incompatible with 'mapped'.
> 
> Annoyingly we already have a 'save_image_formt' in qemu.conf though
> taking  'raw', 'zstd', 'lzop', etc to choose the compression type.
> So we have a terminology clash.

Thinking about this more, and your previous idea about abusing the header 
'compressed' field [1], I'm beginning to warm to the idea of adding a new item 
to the list currently accepted by save_image_format. I'm also warming to 
Fabiano's suggestion to call the new item 'sparse' :-).

AFAICT, from a user perspective, save_image_format does not imply compression. 
The implication primarily stems from variable and function naming in the code 
:-). The current documentation of save_image_format may slightly imply 
compression, but we can easily improve that. The setting already accepts 'raw' 
for the existing (non-compressed) stream format. Any opinions on adding 'sparse' 
as another supported save_image_format?
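
If that approach is taken, the user-facing knob would just be a new value
for the existing setting, along the lines of (hypothetical, pending the
outcome of this discussion):

  # /etc/libvirt/qemu.conf
  save_image_format = "sparse"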

Regards,
Jim

[1] 
https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/PP3XCRF2DW4ZQC7NJUHNL4RHNDJ3PFKS/
Re: [PATCH RFC 0/9] qemu: Support mapped-ram migration capability
Posted by Jim Fehlig via Devel 3 weeks, 1 day ago
On 8/12/24 17:16, Jim Fehlig wrote:
> On 8/7/24 09:45, Daniel P. Berrangé wrote:
>> Annoyingly we already have a 'save_image_formt' in qemu.conf though
>> taking  'raw', 'zstd', 'lzop', etc to choose the compression type.
>> So we have a terminology clash.
> 
> Thinking about this more, and your previous idea about abusing the header 
> 'compressed' field [1], I'm beginning to warm to the idea of adding a new item 
> to the list currently accepted by save_image_format. I'm also warming to 
> Fabiano's suggestion to call the new item 'sparse' :-).
> 
> AFAICT, from a user perspective, save_image_format does not imply compression. 
> The implication primarily stems from variable and function naming in the code 
> :-). The current documentation of save_image_format may slightly imply 
> compression, but we can easily improve that. The setting already accepts 'raw' 
> for the existing (non-compressed) stream format.

E.g. something like this small series

https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/BXJVV3BZ6KTVRFWNP6ZFQZFKBSNTN77W/

Regards,
Jim

> 
> [1] 
> https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/message/PP3XCRF2DW4ZQC7NJUHNL4RHNDJ3PFKS/
> 
Re: [PATCH RFC 0/9] qemu: Support mapped-ram migration capability
Posted by Jim Fehlig via Devel 1 month ago
On 8/7/24 09:45, Daniel P. Berrangé wrote:
> On Thu, Jun 13, 2024 at 04:43:14PM -0600, Jim Fehlig via Devel wrote:
>> This series is a RFC for support of QEMU's mapped-ram migration
>> capability [1] for saving and restoring VMs. It implements the first
>> part of the design approach we discussed for supporting parallel
>> save/restore [2]. In summary, the approach is
>>
>> 1. Add mapped-ram migration capability
>> 2. Steal an element from save header 'unused' for a 'features' variable
>>     and bump save version to 3.
>> 3. Add /etc/libvirt/qemu.conf knob for the save format version,
>>     defaulting to latest v3
>> 4. Use v3 (aka mapped-ram) by default
>> 5. Use mapped-ram with BYPASS_CACHE for v3, old approach for v2
>> 6. include: Define constants for parallel save/restore
>> 7. qemu: Add support for parallel save. Implies mapped-ram, reject if v2
>> 8. qemu: Add support for parallel restore. Implies mapped-ram.
>>     Reject if v2
>> 9. tools: add parallel parameter to virsh save command
>> 10. tools: add parallel parameter to virsh restore command
>>
>> This series implements 1-5, with the BYPASS_CACHE support in patches 8
>> and 9 being quite hacky. They are included to discuss approaches to make
>> them less hacky. See the patches for details.
> 
> I'm a little on the edge about the user interface for the choice
> of formats - whether the version number knob in qemu.conf is the
> ideal approach or not.

Yeah, I'm not fond of the idea either after seeing it in code and experimenting 
with the result.

> 
>> The QEMU mapped-ram capability currently does not support directio.
>> Fabino is working on that now [3]. This complicates merging support
>> in libvirt. I don't think it's reasonable to enable mapped-ram by
>> default when BYPASS_CACHE cannot be supported. Should we wait until
>> the mapped-ram directio support is merged in QEMU before supporting
>> mapped-ram in libvirt?
>>
>> For the moment, compression is ignored in the new save version.
>> Currently, libvirt connects the output of QEMU's save stream to the
>> specified compression program via a pipe. This approach is incompatible
>> with mapped-ram since the fd provided to QEMU must be seekable. One
>> option is to reopen and compress the saved image after the actual save
>> operation has completed. This has the downside of requiring the iohelper
>> to handle BYPASS_CACHE, which would preclude us from removing it
>> sometime in the future. Other suggestions much welcomed.
> 
> Going back to the original motivation for mapped-ram. The first key
> factor was that it will make it more viable to use multi-fd for
> parallelized saving and restoring, as it lets threads write concurrently
> without needing synchronization. The predictable worst case file size
> when the VM is live & dirtying memory, was an added benefit.

Correct. The main motivation for mapped-ram is parallelizing save/restore. To 
that end, we could base the whole feature on whether parallel is requested. If 
so, we save with mapped-ram (failing if not supported by underlying qemu) and 
don't support compression. If parallel is not requested, use the existing
save/restore implementation.

> 
> The the streaming format, we can compress as we stream, so adding
> compression burns more CPU, but this is parallelized with the QEMU
> saving, so the wallclock time doesn't increase as badly.
> 
> With mapped-ram, if we want to compress, we need to wait for the
> image to be fully written, as we can be re-writing regions if the
> guest is live. IOW, compression cannot be parallelized with the
> file writing. So compression will add significant wallclock time,
> as well as having the transient extra disk usage penalty.
> 
> We can optimize to avoid the extra disk penalty by using discard
> on the intermediate file every time we read a chunk. eg this
> loop:
> 
>    while () {
>         read 1 MB from save file
>         discard 1MB from save file
>         write 1 MB to compressor pipe
>    }
> 
> We can't avoid the wallclock penalty from compression, and as you
> say, we still need the iohelper chained after the compressor to
> get O_DIRECT.
> 
>> Note the logical file size of mapped-ram saved images is slightly
>> larger than guest RAM size, so the files are often much larger than the
>> files produced by the existing, sequential format. However, actual blocks
>> written to disk is often lower with mapped-ram saved images. E.g. a saved
>> image from a 30G, freshly booted, idle guest results in the following
>> 'Size' and 'Blocks' values reported by stat(1)
>>
>>                   Size         Blocks
>> sequential     998595770      1950392
>> mapped-ram     34368584225    1800456
>>
>> With the same guest running a workload that dirties memory
>>
>>                   Size         Blocks
>> sequential     33173330615    64791672
>> mapped-ram     34368578210    64706944
> 
> I'm a little concerned this difference between sparse and non-sparse
> formats could trip up existing applications that are using libvirt.
> 
> eg, openstack uses the save/restore to file facilities and IIUC can
> upload the saved file to its image storage (glance). Experiance tells
> us that applications will often not handle sparseness correctly if
> they have never been expecting it.

We can avoid these cases if mapped-ram is only used in conjunction with 
parallel. To make use of the parallel feature, users may need to adjust their 
tooling. Any opinions on that?

> 
> Overall I'm wondering if we need to give a direct choice to mgmt
> apps.
> 
> We added a the save/restore variants that accept virTypedParameters,
> so we could define a VIR_DOMAIN_SAVE_PARAM_FORMAT, which accepts
> 'stream' and 'mapped' as options. This choice would then influence
> whether we save in v2 or v3 format.  On restore we don't need a
> parameter as we just probe the on disk format.

I worry these options will confuse users. IMO we'd need a basic description of 
the stream formats, along with caveats of file size, lack of compression 
support, etc. It seems like quite a bit for the average user to absorb.

> 
> As a documentation task we can then save that compression is
> incompatible with 'mapped'.
> 
> Annoyingly we already have a 'save_image_formt' in qemu.conf though
> taking  'raw', 'zstd', 'lzop', etc to choose the compression type.
> So we have a terminology clash.
> 
> We probably should have exposed this compression choice in the API
> too, via a VIR_DOMAIN_SAVE_PARAM_COMPRESSION typed parameter, taking
> values 'default', 'raw', 'zstd', 'lzop', etc, where 'default' (or
> omitted) means honour the qemu.conf setting and all others override
> qemu.conf

Agree with that suggestion. And VIR_DOMAIN_SAVE_PARAM_COMPRESSION would be easy 
to describe and clear to the average user :-).

> 
> Might suggest we name the new API parameter VR_DOMAIN_SAVE_PARAM_LAYOUT
> for stream vs mapped choice ?

These stream formats seem like an implementation detail we should attempt to 
hide from the user.

Regards,
Jim
Re: [PATCH RFC 0/9] qemu: Support mapped-ram migration capability
Posted by Daniel P. Berrangé 1 month ago
On Wed, Aug 07, 2024 at 12:04:18PM -0600, Jim Fehlig wrote:
> On 8/7/24 09:45, Daniel P. Berrangé wrote:
> > On Thu, Jun 13, 2024 at 04:43:14PM -0600, Jim Fehlig via Devel wrote:
> > > The QEMU mapped-ram capability currently does not support directio.
> > > Fabino is working on that now [3]. This complicates merging support
> > > in libvirt. I don't think it's reasonable to enable mapped-ram by
> > > default when BYPASS_CACHE cannot be supported. Should we wait until
> > > the mapped-ram directio support is merged in QEMU before supporting
> > > mapped-ram in libvirt?
> > > 
> > > For the moment, compression is ignored in the new save version.
> > > Currently, libvirt connects the output of QEMU's save stream to the
> > > specified compression program via a pipe. This approach is incompatible
> > > with mapped-ram since the fd provided to QEMU must be seekable. One
> > > option is to reopen and compress the saved image after the actual save
> > > operation has completed. This has the downside of requiring the iohelper
> > > to handle BYPASS_CACHE, which would preclude us from removing it
> > > sometime in the future. Other suggestions much welcomed.
> > 
> > Going back to the original motivation for mapped-ram. The first key
> > factor was that it will make it more viable to use multi-fd for
> > parallelized saving and restoring, as it lets threads write concurrently
> > without needing synchronization. The predictable worst case file size
> > when the VM is live & dirtying memory, was an added benefit.
> 
> Correct. The main motivation for mapped-ram is parallelizing save/restore.
> To that end, we could base the whole feature on whether parallel is
> requested. If so, we save with mapped-ram (failing if not supported by
> underlying qemu) and don't support compression. If parallel not requested,
> use existing save/restore implementation.

We had discussed that as a strategy previously and thought it
was a good idea.

Your other reply to me though makes me re-consider. You're
showing that /without/ O_DIRECT and multifd, mapped-ram
is already a big win over the legacy method. The scale of that
improvement is somewhat surprising to me and, if reliably so,
that is compelling to want all the time.


> > Overall I'm wondering if we need to give a direct choice to mgmt
> > apps.
> > 
> > We added a the save/restore variants that accept virTypedParameters,
> > so we could define a VIR_DOMAIN_SAVE_PARAM_FORMAT, which accepts
> > 'stream' and 'mapped' as options. This choice would then influence
> > whether we save in v2 or v3 format.  On restore we don't need a
> > parameter as we just probe the on disk format.
> 
> I worry these options will confuse users. IMO we'd need a basic description
> of the stream formats, along with caveats of file size, lack of compression
> support, etc. It seems like quite a bit for the average user to absorb.
> 
> > 
> > As a documentation task we can then save that compression is
> > incompatible with 'mapped'.
> > 
> > Annoyingly we already have a 'save_image_formt' in qemu.conf though
> > taking  'raw', 'zstd', 'lzop', etc to choose the compression type.
> > So we have a terminology clash.
> > 
> > We probably should have exposed this compression choice in the API
> > too, via a VIR_DOMAIN_SAVE_PARAM_COMPRESSION typed parameter, taking
> > values 'default', 'raw', 'zstd', 'lzop', etc, where 'default' (or
> > omitted) means honour the qemu.conf setting and all others override
> > qemu.conf
> 
> Agree with that suggestion. And VIR_DOMAIN_SAVE_PARAM_COMPRESSION would be
> easy to describe and clear to average user :-).
> 
> > 
> > Might suggest we name the new API parameter VR_DOMAIN_SAVE_PARAM_LAYOUT
> > for stream vs mapped choice ?
> 
> These stream formats seem like an implementation detail we should attempt to
> hide from the user.

Yeah it is rather an impl detail, but perhaps it is an impl detail we
cannot avoid exposing users to. After all they need to be aware of
use of sparseness in order to handle the files without ballooning up
their size.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH RFC 0/9] qemu: Support mapped-ram migration capability
Posted by Jim Fehlig via Devel 1 month ago
On 8/7/24 12:39, Daniel P. Berrangé wrote:
> On Wed, Aug 07, 2024 at 12:04:18PM -0600, Jim Fehlig wrote:
>> On 8/7/24 09:45, Daniel P. Berrangé wrote:
>>> On Thu, Jun 13, 2024 at 04:43:14PM -0600, Jim Fehlig via Devel wrote:
>>>> The QEMU mapped-ram capability currently does not support directio.
>>>> Fabino is working on that now [3]. This complicates merging support
>>>> in libvirt. I don't think it's reasonable to enable mapped-ram by
>>>> default when BYPASS_CACHE cannot be supported. Should we wait until
>>>> the mapped-ram directio support is merged in QEMU before supporting
>>>> mapped-ram in libvirt?
>>>>
>>>> For the moment, compression is ignored in the new save version.
>>>> Currently, libvirt connects the output of QEMU's save stream to the
>>>> specified compression program via a pipe. This approach is incompatible
>>>> with mapped-ram since the fd provided to QEMU must be seekable. One
>>>> option is to reopen and compress the saved image after the actual save
>>>> operation has completed. This has the downside of requiring the iohelper
>>>> to handle BYPASS_CACHE, which would preclude us from removing it
>>>> sometime in the future. Other suggestions much welcomed.
>>>
>>> Going back to the original motivation for mapped-ram. The first key
>>> factor was that it will make it more viable to use multi-fd for
>>> parallelized saving and restoring, as it lets threads write concurrently
>>> without needing synchronization. The predictable worst case file size
>>> when the VM is live & dirtying memory was an added benefit.
>>
>> Correct. The main motivation for mapped-ram is parallelizing save/restore.
>> To that end, we could base the whole feature on whether parallel is
>> requested. If so, we save with mapped-ram (failing if not supported by
>> underlying qemu) and don't support compression. If parallel not requested,
>> use existing save/restore implementation.
> 
> We had discussed that as a strategy previously and thought it
> was a good idea.
> 
> Your other reply to me, though, makes me reconsider. You're
> showing that /without/ O_DIRECT and multifd, mapped-ram is
> already a big win over the legacy method. The scale of that
> improvement is somewhat surprising to me and, if it is reliably
> so, that is compelling to want all the time.

How do we have it all of the time without the sparseness issues you describe?
Seems it will always be opt-in, unless using the new parallel option.

> 
>>> Overall I'm wondering if we need to give a direct choice to mgmt
>>> apps.
>>>
>>> We added the save/restore variants that accept virTypedParameters,
>>> so we could define a VIR_DOMAIN_SAVE_PARAM_FORMAT, which accepts
>>> 'stream' and 'mapped' as options. This choice would then influence
>>> whether we save in v2 or v3 format.  On restore we don't need a
>>> parameter as we just probe the on disk format.
>>
>> I worry these options will confuse users. IMO we'd need a basic description
>> of the stream formats, along with caveats of file size, lack of compression
>> support, etc. It seems like quite a bit for the average user to absorb.
>>
>>>
>>> As a documentation task we can then say that compression is
>>> incompatible with 'mapped'.
>>>
>>> Annoyingly we already have a 'save_image_format' in qemu.conf though,
>>> taking 'raw', 'zstd', 'lzop', etc. to choose the compression type.
>>> So we have a terminology clash.
>>>
>>> We probably should have exposed this compression choice in the API
>>> too, via a VIR_DOMAIN_SAVE_PARAM_COMPRESSION typed parameter, taking
>>> values 'default', 'raw', 'zstd', 'lzop', etc, where 'default' (or
>>> omitted) means honour the qemu.conf setting and all others override
>>> qemu.conf
>>
>> Agree with that suggestion. And VIR_DOMAIN_SAVE_PARAM_COMPRESSION would be
>> easy to describe and clear to the average user :-).
>>
>>>
>>> Might suggest we name the new API parameter VIR_DOMAIN_SAVE_PARAM_LAYOUT
>>> for the stream vs mapped choice?
>>
>> These stream formats seem like an implementation detail we should attempt to
>> hide from the user.
> 
> Yeah, it is rather an impl detail, but perhaps it is an impl detail we
> cannot avoid exposing users to. After all, they need to be aware of
> the use of sparseness in order to handle the files without ballooning
> up their size.

As for a name for the new parameter, I can't think of anything better than your 
suggestion of VIR_DOMAIN_SAVE_PARAM_LAYOUT. I much prefer _FORMAT, but yeah, 
that ship has sailed.

Before working on the API parameter, I'll summarize my understanding of the 
options for supporting mapped-ram:

1. No user control. If qemu supports mapped-ram, use it
2. Control at driver level
3. Control at the API level

We don't want 1 due to incompatibilities with compression, sparseness, etc. 
Experience with the compression feature has shown that 2 is inflexible. Control 
at the API level is preferred for maximum user flexibility.
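
For comparison, option 2 would amount to another qemu.conf knob next to
the existing compression one, roughly along these lines (only
save_image_format exists today; the version knob name below is purely
illustrative):

  # Existing knob: compression used for save images ('raw' = none)
  #save_image_format = "raw"

  # Illustrative only: driver-level selection of the save image
  # version, where v3 would imply mapped-ram
  #save_image_version = 3

That is simple to deploy, but like the compression setting it forces a
single choice per host, which is exactly the inflexibility noted above.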

Regards,
Jim
Re: [PATCH RFC 0/9] qemu: Support mapped-ram migration capability
Posted by Jim Fehlig via Devel 1 month ago
On 8/8/24 17:46, Jim Fehlig wrote:
> On 8/7/24 12:39, Daniel P. Berrangé wrote:
>> On Wed, Aug 07, 2024 at 12:04:18PM -0600, Jim Fehlig wrote:
>>> On 8/7/24 09:45, Daniel P. Berrangé wrote:
>>>> On Thu, Jun 13, 2024 at 04:43:14PM -0600, Jim Fehlig via Devel wrote:
>>>>> The QEMU mapped-ram capability currently does not support directio.
>>>>> Fabiano is working on that now [3]. This complicates merging support
>>>>> in libvirt. I don't think it's reasonable to enable mapped-ram by
>>>>> default when BYPASS_CACHE cannot be supported. Should we wait until
>>>>> the mapped-ram directio support is merged in QEMU before supporting
>>>>> mapped-ram in libvirt?
>>>>>
>>>>> For the moment, compression is ignored in the new save version.
>>>>> Currently, libvirt connects the output of QEMU's save stream to the
>>>>> specified compression program via a pipe. This approach is incompatible
>>>>> with mapped-ram since the fd provided to QEMU must be seekable. One
>>>>> option is to reopen and compress the saved image after the actual save
>>>>> operation has completed. This has the downside of requiring the iohelper
>>>>> to handle BYPASS_CACHE, which would preclude us from removing it
>>>>> sometime in the future. Other suggestions much welcomed.
>>>>
>>>> Going back to the original motivation for mapped-ram. The first key
>>>> factor was that it will make it more viable to use multi-fd for
>>>> parallelized saving and restoring, as it lets threads write concurrently
>>>> without needing synchronization. The predictable worst case file size
>>>> when the VM is live & dirtying memory was an added benefit.
>>>
>>> Correct. The main motivation for mapped-ram is parallelizing save/restore.
>>> To that end, we could base the whole feature on whether parallel is
>>> requested. If so, we save with mapped-ram (failing if not supported by
>>> underlying qemu) and don't support compression. If parallel not requested,
>>> use existing save/restore implementation.
>>
>> We had discussed that as a strategy previously and thought it
>> was a good idea.
>>
>> Your other reply to me, though, makes me reconsider. You're
>> showing that /without/ O_DIRECT and multifd, mapped-ram is
>> already a big win over the legacy method. The scale of that
>> improvement is somewhat surprising to me and, if it is reliably
>> so, that is compelling to want all the time.
> 
> How do we have it all of the time without the sparseness issues you describe?
> Seems it will always be opt-in, unless using the new parallel option.
> 
>>
>>>> Overall I'm wondering if we need to give a direct choice to mgmt
>>>> apps.
>>>>
>>>> We added the save/restore variants that accept virTypedParameters,
>>>> so we could define a VIR_DOMAIN_SAVE_PARAM_FORMAT, which accepts
>>>> 'stream' and 'mapped' as options. This choice would then influence
>>>> whether we save in v2 or v3 format.  On restore we don't need a
>>>> parameter as we just probe the on disk format.
>>>
>>> I worry these options will confuse users. IMO we'd need a basic description
>>> of the stream formats, along with caveats of file size, lack of compression
>>> support, etc. It seems like quite a bit for the average user to absorb.
>>>
>>>>
>>>> As a documentation task we can then say that compression is
>>>> incompatible with 'mapped'.
>>>>
>>>> Annoyingly we already have a 'save_image_format' in qemu.conf though,
>>>> taking 'raw', 'zstd', 'lzop', etc. to choose the compression type.
>>>> So we have a terminology clash.
>>>>
>>>> We probably should have exposed this compression choice in the API
>>>> too, via a VIR_DOMAIN_SAVE_PARAM_COMPRESSION typed parameter, taking
>>>> values 'default', 'raw', 'zstd', 'lzop', etc, where 'default' (or
>>>> omitted) means honour the qemu.conf setting and all others override
>>>> qemu.conf
>>>
>>> Agree with that suggestion. And VIR_DOMAIN_SAVE_PARAM_COMPRESSION would be
>>> easy to describe and clear to the average user :-).
>>>
>>>>
>>>> Might suggest we name the new API parameter VIR_DOMAIN_SAVE_PARAM_LAYOUT
>>>> for the stream vs mapped choice?
>>>
>>> These stream formats seem like an implementation detail we should attempt to
>>> hide from the user.
>>
>> Yeah, it is rather an impl detail, but perhaps it is an impl detail we
>> cannot avoid exposing users to. After all, they need to be aware of
>> the use of sparseness in order to handle the files without ballooning
>> up their size.
> 
> As for a name for the new parameter, I can't think of anything better than your 
> suggestion of VIR_DOMAIN_SAVE_PARAM_LAYOUT. I much prefer _FORMAT, but yeah, 
> that ship has sailed.
> 
> Before working on the API parameter, I'll summarize my understanding of the 
> options for supporting mapped-ram:
> 
> 1. No user control. If qemu supports mapped-ram, use it
> 2. Control at driver level

Forgot to mention: I sent the implementation for this option since it was ready
anyhow:

https://lists.libvirt.org/archives/list/devel@lists.libvirt.org/thread/MNBHGQH7PKV4RXQZXLPAGMOTNEVR3JVS/

Regards,
Jim