Hi,

In this v3:

Added support for the "file:/dev/fdset/" syntax to receive multiple
file descriptors. This allows the management layer to open the
migration file beforehand and pass the file descriptors to QEMU. We
need more than one fd to be able to use O_DIRECT concurrently with
unaligned writes.

Dropped the auto-pause capability. That discussion was kind of
stuck. We can revisit optimizations for non-live scenarios once the
series is more mature/merged.

Changed the multifd incoming side to use a more generic data structure
instead of MultiFDPages_t. This allows multifd to restore the ram
using larger chunks.

The rest are minor changes, I have noted them in the patches
themselves.

Thanks

CI run: https://gitlab.com/farosas/qemu/-/pipelines/1086786947

v2: https://lore.kernel.org/r/20231023203608.26370-1-farosas@suse.de
v1: https://lore.kernel.org/r/20230330180336.2791-1-farosas@suse.de

Fabiano Rosas (24):
  io: fsync before closing a file channel
  migration/ram: Introduce 'fixed-ram' migration capability
  migration: Add fixed-ram URI compatibility check
  migration/ram: Add incoming 'fixed-ram' migration
  migration/multifd: Allow multifd without packets
  migration/multifd: Allow QIOTask error reporting without an object
  migration/multifd: Add outgoing QIOChannelFile support
  migration/multifd: Add incoming QIOChannelFile support
  io: Add a pwritev/preadv version that takes a discontiguous iovec
  multifd: Rename MultiFDSendParams::data to compress_data
  migration/multifd: Decouple recv method from pages
  migration/multifd: Allow receiving pages without packets
  migration/ram: Ignore multifd flush when doing fixed-ram migration
  migration/multifd: Support outgoing fixed-ram stream format
  migration/multifd: Support incoming fixed-ram stream format
  tests/qtest: Add a multifd + fixed-ram migration test
  migration: Add direct-io parameter
  tests/qtest: Add a test for migration with direct-io and multifd
  monitor: Honor QMP request for fd removal immediately
  monitor: Extract fdset fd flags comparison into a function
  monitor: fdset: Match against O_DIRECT
  docs/devel/migration.rst: Document the file transport
  migration: Add support for fdset with multifd + file
  tests/qtest: Add a test for fixed-ram with passing of fds

Nikolay Borisov (6):
  io: add and implement QIO_CHANNEL_FEATURE_SEEKABLE for channel file
  io: Add generic pwritev/preadv interface
  io: implement io_pwritev/preadv for QIOChannelFile
  migration/qemu-file: add utility methods for working with seekable channels
  migration/ram: Add outgoing 'fixed-ram' migration
  tests/qtest: migration-test: Add tests for fixed-ram file-based migration

 docs/devel/migration.rst            |  43 ++++
 include/exec/ramblock.h             |   8 +
 include/io/channel.h                | 109 ++++++++
 include/migration/qemu-file-types.h |   2 +
 include/qemu/bitops.h               |  13 +
 include/qemu/osdep.h                |   2 +
 io/channel-file.c                   |  69 +++++
 io/channel.c                        | 128 ++++++++++
 migration/file.c                    | 191 +++++++++++++-
 migration/file.h                    |   5 +
 migration/migration-hmp-cmds.c      |  11 +
 migration/migration.c               |  38 ++-
 migration/multifd-zlib.c            |  22 +-
 migration/multifd-zstd.c            |  22 +-
 migration/multifd.c                 | 376 ++++++++++++++++++++++------
 migration/multifd.h                 |  30 ++-
 migration/options.c                 |  70 ++++++
 migration/options.h                 |   4 +
 migration/qemu-file.c               |  82 ++++++
 migration/qemu-file.h               |   7 +-
 migration/ram.c                     | 291 ++++++++++++++++++++-
 migration/ram.h                     |   1 +
 migration/savevm.c                  |   1 +
 monitor/fds.c                       |  27 +-
 qapi/migration.json                 |  24 +-
 tests/qtest/migration-helpers.c     |  42 ++++
 tests/qtest/migration-helpers.h     |   1 +
 tests/qtest/migration-test.c        | 206 +++++++++++++++
 util/osdep.c                        |   9 +
 29 files changed, 1686 insertions(+), 148 deletions(-)

--
2.35.3
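The cover letter's note that more than one fd is needed to use O_DIRECT
together with unaligned writes can be illustrated with plain POSIX:
O_DIRECT requires the buffer, length and file offset to be block-aligned,
so small unaligned pieces such as stream headers have to go through a
second, buffered descriptor opened on the same file. The sketch below is
an illustration only, with made-up offsets and header bytes, not the
series' code or on-disk format.

```c
/*
 * Illustration only: why fixed-ram + O_DIRECT wants two descriptors on
 * the same file. Aligned page data goes through an O_DIRECT fd, while
 * small unaligned headers go through a regular (buffered) fd. The
 * offsets, sizes and header bytes here are hypothetical.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define ALIGN 4096 /* assume a 4k filesystem block size */

int main(void)
{
    int fd_hdr = open("vm.img", O_WRONLY | O_CREAT, 0600);
    int fd_data = open("vm.img", O_WRONLY | O_DIRECT);

    if (fd_hdr < 0 || fd_data < 0) {
        return 1;
    }

    /* Unaligned metadata: fine on the buffered fd, would fail with O_DIRECT. */
    const char hdr[] = "QEMU VM header (placeholder)";
    (void)pwrite(fd_hdr, hdr, sizeof(hdr), 0);

    /* Page data: buffer, length and file offset all aligned for O_DIRECT. */
    void *page;
    if (posix_memalign(&page, ALIGN, ALIGN)) {
        return 1;
    }
    memset(page, 0xaa, ALIGN);
    (void)pwrite(fd_data, page, ALIGN, 1 * ALIGN); /* hypothetical page slot */

    free(page);
    close(fd_data);
    close(fd_hdr);
    return 0;
}
```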
On Mon, Nov 27, 2023 at 05:25:42PM -0300, Fabiano Rosas wrote:
> Hi,
>
> In this v3:
>
> Added support for the "file:/dev/fdset/" syntax to receive multiple
> file descriptors. This allows the management layer to open the
> migration file beforehand and pass the file descriptors to QEMU. We
> need more than one fd to be able to use O_DIRECT concurrently with
> unaligned writes.
>
> Dropped the auto-pause capability. That discussion was kind of
> stuck. We can revisit optimizations for non-live scenarios once the
> series is more mature/merged.
>
> Changed the multifd incoming side to use a more generic data structure
> instead of MultiFDPages_t. This allows multifd to restore the ram
> using larger chunks.
>
> The rest are minor changes, I have noted them in the patches
> themselves.

Fabiano,

Could you always keep a section around in the cover letter (and also in
the upcoming doc file fixed-ram.rst) on the benefits of this feature?

Please bear with me - I may start by asking silly questions.

I thought it was about "keeping the snapshot file small". But when I was
thinking about the use case, IIUC fixed-ram migration should always
suggest that the user stop the VM before migration starts, and if the VM
is stopped the resulting image shouldn't be large either.

Or is it about performance only? What did I miss?

Thanks,

--
Peter Xu
Peter Xu <peterx@redhat.com> writes:

> On Mon, Nov 27, 2023 at 05:25:42PM -0300, Fabiano Rosas wrote:
>> [...]
>
> Fabiano,
>
> Could you always keep a section around in the cover letter (and also in
> the upcoming doc file fixed-ram.rst) on the benefits of this feature?
>
> Please bear with me - I may start by asking silly questions.

That's fine. Ask away!

> I thought it was about "keeping the snapshot file small". But when I was
> thinking about the use case, IIUC fixed-ram migration should always
> suggest that the user stop the VM before migration starts, and if the VM
> is stopped the resulting image shouldn't be large either.
>
> Or is it about performance only? What did I miss?

Performance is the main benefit, because fixed-ram enables the use of
multifd for file migration, which would otherwise not be
parallelizable. Using multifd has been the direction for a while, as you
know, so it makes sense.

A fast file migration is desirable because it could be used for
snapshots with a stopped VM and also to replace the "exec:cat" hack
(this last one I found out about recently; Juan mentioned it in this
thread: https://lore.kernel.org/r/87cyx5ty26.fsf@secure.mitica).

The size aspect is just an interesting property, not necessarily a
reason. It's about having the file bounded by the RAM size, so a running
guest does not produce a continuously growing file. This is in contrast
with previous experiments (libvirt code) that used a proxy to put
multifd-produced data into a file.

I'll add this^ information in a more organized manner to the docs and
cover letter. Let me know what else I need to clarify.

Some notes about fixed-ram by itself:

This series also enables fixed-ram without multifd, which would only
take advantage of the size property. That is not part of our end goal,
which is to have multifd + fixed-ram, but I kept it nonetheless because
it helps to debug/reason about the fixed-ram format without conflating
matters with multifd.

Fixed-ram without multifd also allows the file migration to take
advantage of direct I/O, because the data portion of the file (the
pages) will be written with alignment. This version of the series does
not yet support it, but I have a simple patch for the next version.

I also had a - perhaps naive - idea that we could merge the io code +
fixed-ram first, to expedite things, and later bring in the multifd and
direct-io enhancements, but the review process ended up not being that
modular.
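A rough sketch of the property described above, i.e. the file being
bounded by the RAM size: with fixed-ram, a page's location in the file is
a fixed function of its position in RAM, so re-sending a dirty page
overwrites the same slot instead of appending, and parallel multifd
writers can target disjoint offsets without coordination. This is plain
POSIX with hypothetical layout constants, not the actual stream format
defined by the series.

```c
/*
 * Sketch of the "fixed offset per page" idea behind fixed-ram.
 * PAGES_OFFSET and the single-region assumption are made up for
 * illustration; the real layout is defined by the series.
 */
#include <stdint.h>
#include <unistd.h>

#define PAGE_SIZE    4096
#define PAGES_OFFSET (1 * 1024 * 1024) /* hypothetical: page data at 1 MiB */

/* The file offset of a page is a pure function of its index in RAM. */
static off_t page_file_offset(uint64_t page_index)
{
    return PAGES_OFFSET + (off_t)page_index * PAGE_SIZE;
}

/*
 * Re-sending a dirty page during a live pass overwrites its slot rather
 * than growing the file, so the file size stays bounded by the RAM size.
 * Because slots are disjoint and offsets explicit, multiple multifd
 * threads can call this concurrently on the same fd.
 */
ssize_t save_page(int fd, uint64_t page_index, const void *host_page)
{
    return pwrite(fd, host_page, PAGE_SIZE, page_file_offset(page_index));
}
```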
On Thu, Jan 11, 2024 at 03:38:31PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> [...]
>
> Performance is the main benefit, because fixed-ram enables the use of
> multifd for file migration, which would otherwise not be
> parallelizable. Using multifd has been the direction for a while, as you
> know, so it makes sense.
>
> A fast file migration is desirable because it could be used for
> snapshots with a stopped VM and also to replace the "exec:cat" hack
> (this last one I found out about recently; Juan mentioned it in this
> thread: https://lore.kernel.org/r/87cyx5ty26.fsf@secure.mitica).

I dug through the history again and started to remember the "live"
migration case for fixed-ram. IIUC that is what Dan mentioned in the
email below regarding the "virDomainSnapshotXXX" use case:

https://lore.kernel.org/all/ZD7MRGQ+4QsDBtKR@redhat.com/

So IIUC "stopped VM" is not always the use case?

If you agree with this, we need to document these two use cases clearly
in the doc update:

  - "Migrate a VM to file, then destroy the VM"

    The documents should suggest stopping the VM first before triggering
    such a migration in this use case.

  - "Take a live snapshot of the VM"

    It would be ideal if there were a portable interface to synchronously
    track dirtying of guest pages, but we don't have one...

    So fixed-ram seems to be the portable solution for taking a live
    snapshot across platforms, as long as async dirty tracking is
    supported on that OS (aka KVM_GET_DIRTY_LOG). If async tracking is
    not supported, the snapshot cannot be done live on that OS, and one
    needs to use "snapshot-save".

    For this one, IMHO it would be good to mention (from the QEMU
    perspective) the existence of background-snapshot, even though
    libvirt didn't support it for some reason. Currently
    background-snapshot lacks multi-threading (and O_DIRECT), though, so
    it may be less performant than fixed-ram. However, with all those
    features in place I believe it would be even more performant. Please
    consider mentioning this in some detail.

> The size aspect is just an interesting property, not necessarily a
> reason.

See above on the 2nd "live" use case of fixed-ram. I think size still
matters in that case, because that one cannot stop the VM vCPUs.

> It's about having the file bounded by the RAM size, so a running
> guest does not produce a continuously growing file. This is in contrast
> with previous experiments (libvirt code) that used a proxy to put
> multifd-produced data into a file.
>
> I'll add this^ information in a more organized manner to the docs and
> cover letter. Let me know what else I need to clarify.

Thanks.

> Some notes about fixed-ram by itself:
>
> This series also enables fixed-ram without multifd, which would only
> take advantage of the size property. That is not part of our end goal,
> which is to have multifd + fixed-ram, but I kept it nonetheless because
> it helps to debug/reason about the fixed-ram format without conflating
> matters with multifd.

Yes, makes sense.

> Fixed-ram without multifd also allows the file migration to take
> advantage of direct I/O, because the data portion of the file (the
> pages) will be written with alignment. This version of the series does
> not yet support it, but I have a simple patch for the next version.
>
> I also had a - perhaps naive - idea that we could merge the io code +
> fixed-ram first, to expedite things, and later bring in the multifd and
> direct-io enhancements, but the review process ended up not being that
> modular.

What's the review process issue you're talking about?

If you can split the series, that'll help merging for sure, as far as
I'm concerned. IIRC there's complexity in passing the O_DIRECT fds
around, and I'm not sure whether that chunk can be put last, similarly
to splitting out the multifd bits.

One thing I just noticed is that fixed-ram seems to always be preferred
for "file:" migrations. Then can we already imply fixed-ram for "file:"
URIs?

I'm even thinking whether we can make it the default and drop the
fixed-ram capability: fixed-ram won't work with anything besides file,
and file won't make much sense without using offsets / fixed-ram.
There's at least one problem: we have already released 8.2 with "file:",
so it could break users already using "file:" there. I'm wondering
whether that'll be worthwhile if we can drop the (seemingly redundant..)
capability. What do you think?

--
Peter Xu
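The point that fixed-ram only makes sense for a seekable destination is
what the "Add fixed-ram URI compatibility check" patch in the series
addresses. The sketch below shows the shape such a check could take; the
type and function names are hypothetical, not the actual QEMU code.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the migration transport; not real QEMU types. */
typedef enum {
    TRANSPORT_SOCKET,
    TRANSPORT_EXEC,
    TRANSPORT_FD,
    TRANSPORT_FILE,
} Transport;

/*
 * fixed-ram writes pages at fixed offsets, so it needs a destination
 * that can be written at arbitrary positions. A plain file qualifies;
 * sockets and exec pipes do not.
 */
bool fixed_ram_transport_compatible(bool fixed_ram_cap, Transport t,
                                    char *errbuf, size_t errlen)
{
    if (!fixed_ram_cap) {
        return true;
    }
    if (t != TRANSPORT_FILE) {
        snprintf(errbuf, errlen,
                 "fixed-ram requires a seekable destination (file:)");
        return false;
    }
    return true;
}
```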
Peter Xu <peterx@redhat.com> writes:

> On Thu, Jan 11, 2024 at 03:38:31PM -0300, Fabiano Rosas wrote:
>> [...]
>
> I dug through the history again and started to remember the "live"
> migration case for fixed-ram. IIUC that is what Dan mentioned in the
> email below regarding the "virDomainSnapshotXXX" use case:
>
> https://lore.kernel.org/all/ZD7MRGQ+4QsDBtKR@redhat.com/
>
> So IIUC "stopped VM" is not always the use case?
>
> If you agree with this, we need to document these two use cases clearly
> in the doc update:
>
>   - "Migrate a VM to file, then destroy the VM"
>   - "Take a live snapshot of the VM"
>
> [...]

I'll include these in some form in the docs update.

> [...]
>
> What's the review process issue you're talking about?

No issue per se. I'm just mentioning that I split the series in a
certain way and no one seemed to notice. =)

Basically everything up until patch 10/30 is one chunk that is mostly
separate from multifd support (patches 11-22/30) and direct-io + fdset
(patches 23-30/30).

> If you can split the series, that'll help merging for sure, as far as
> I'm concerned. IIRC there's complexity in passing the O_DIRECT fds
> around, and I'm not sure whether that chunk can be put last, similarly
> to splitting out the multifd bits.

The logical sequence for merging in my view would be:

1 - file: support - Steven already did that
2 - file: + fixed-ram
2a- file: + fixed-ram + direct-io (optional, I will send a patch in v4)
3 - file: + fixed-ram + multifd
4 - file: + fixed-ram + multifd + direct-io (here we get the full
    perf. benefits)
5 - file:/dev/fdset + fixed-ram + multifd + direct-io (here we can go
    enable libvirt support)

> One thing I just noticed is that fixed-ram seems to always be preferred
> for "file:" migrations. Then can we already imply fixed-ram for "file:"
> URIs?

The file URI alone is good to replace the exec:cat trick. We'll need it
once we deprecate exec:, to be able to do debugging of the stream.

> I'm even thinking whether we can make it the default and drop the
> fixed-ram capability: fixed-ram won't work with anything besides file,
> and file won't make much sense without using offsets / fixed-ram.
> There's at least one problem: we have already released 8.2 with "file:",
> so it could break users already using "file:" there. I'm wondering
> whether that'll be worthwhile if we can drop the (seemingly redundant..)
> capability. What do you think?
On Mon, Jan 15, 2024 at 04:45:15PM -0300, Fabiano Rosas wrote:
> Peter Xu <peterx@redhat.com> writes:
> [...]
>
> I'll include these in some form in the docs update.

Thanks. Fixed-ram should also get its own file once the doc series is
applied. I'll try to prepare a pull this week so both fixed-ram and cpr
should hopefully have a place to hold their own files under
docs/devel/migration/.

PS: just in case it doesn't land that soon, feel free to fetch the
migration-next branch of my github.com/peterx/qemu repo; I only put
things there if they pass at least one round of CI, so the content
should be relatively stable, even though not fully guaranteed.

> [...]
>
>> What's the review process issue you're talking about?
>
> No issue per se. I'm just mentioning that I split the series in a
> certain way and no one seemed to notice. =)

Oh :)

> Basically everything up until patch 10/30 is one chunk that is mostly
> separate from multifd support (patches 11-22/30) and direct-io + fdset
> (patches 23-30/30).

You can describe these in the cover letter. Personally I can always
merge the initial M patches out of N when they're ready; there'll be
quite a few iochannel ones in the first batch, though, so I'll check
with Dan on how the 1st chunk should go in once it's ready.

>> If you can split the series, that'll help merging for sure, as far as
>> I'm concerned. IIRC there's complexity in passing the O_DIRECT fds
>> around, and I'm not sure whether that chunk can be put last, similarly
>> to splitting out the multifd bits.
>
> The logical sequence for merging in my view would be:
>
> 1 - file: support - Steven already did that
> 2 - file: + fixed-ram
> 2a- file: + fixed-ram + direct-io (optional, I will send a patch in v4)
> 3 - file: + fixed-ram + multifd
> 4 - file: + fixed-ram + multifd + direct-io (here we get the full
>     perf. benefits)
> 5 - file:/dev/fdset + fixed-ram + multifd + direct-io (here we can go
>     enable libvirt support)

Sounds good. Such planning is IMHO fine to put into a TODO section of
devel/migration/fixed-ram.rst if you want, especially since you already
plan to post separate series. So you can start with the .rst file with
the whole design in it; we can merge it first. You then remove the TODOs
as the patchsets go in. Your call on how to do it.

>> One thing I just noticed is that fixed-ram seems to always be preferred
>> for "file:" migrations. Then can we already imply fixed-ram for "file:"
>> URIs?
>
> The file URI alone is good to replace the exec:cat trick. We'll need it
> once we deprecate exec:, to be able to do debugging of the stream.

I didn't follow up much on Juan's previous plan to deprecate exec. But
yeah, anyway, let's start with that cap.

--
Peter Xu
On Mon, Jan 15, 2024 at 02:22:47PM +0800, Peter Xu wrote:
> On Thu, Jan 11, 2024 at 03:38:31PM -0300, Fabiano Rosas wrote:
>> [...]
>
> [...]
>
> One thing I just noticed is that fixed-ram seems to always be preferred
> for "file:" migrations. Then can we already imply fixed-ram for "file:"
> URIs?
>
> I'm even thinking whether we can make it the default and drop the
> fixed-ram capability: fixed-ram won't work with anything besides file,
> and file won't make much sense without using offsets / fixed-ram.
> There's at least one problem: we have already released 8.2 with "file:",
> so it could break users already using "file:" there. I'm wondering
> whether that'll be worthwhile if we can drop the (seemingly redundant..)
> capability. What do you think?

The 'fd' protocol should support 'fixed-ram' too, if passed a seekable
FD.

The 'file' protocol should be able to create save images compatible with
older QEMU too, IMHO.

With regards,
Daniel

--
|: https://berrange.com      -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org       -o- https://fstop138.berrange.com           :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange    :|
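The point about 'fd:' working when handed a seekable descriptor ties back
to the QIO_CHANNEL_FEATURE_SEEKABLE patch at the start of the series. A
minimal sketch of how an inherited fd could be probed for seekability, in
plain POSIX rather than the actual QIOChannel code:

```c
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * pwritev/preadv at arbitrary offsets only work if the fd refers to
 * something seekable (typically a regular file); pipes and sockets
 * fail lseek() with ESPIPE.
 */
bool fd_is_seekable(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode)) {
        return true;
    }
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}
```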
On Mon, Jan 15, 2024 at 08:11:40AM +0000, Daniel P. Berrangé wrote:
> On Mon, Jan 15, 2024 at 02:22:47PM +0800, Peter Xu wrote:
>> [...]
>>
>> I'm even thinking whether we can make it the default and drop the
>> fixed-ram capability: fixed-ram won't work with anything besides file,
>> and file won't make much sense without using offsets / fixed-ram. [...]
>
> The 'fd' protocol should support 'fixed-ram' too, if passed a seekable
> FD.

Ah ok, then the cap is still needed.

> The 'file' protocol should be able to create save images compatible with
> older QEMU too, IMHO.

This is less of a concern, IMHO, but indeed if we have the cap anyway
then it makes sense to do so.

Thanks,

--
Peter Xu