Hi,

This series improves a CPR workflow for large guests where guest RAM is
shared/external and preserved in place. With shared RAM and x-ignore-shared
enabled, the migration stream skips guest RAM pages and transfers only
non-RAM VM state (vmstate). This avoids copying guest RAM to the on-disk
migration URI, which can significantly reduce checkpoint/restore downtime
for multi-GB guests.

In the LUO/KHO update flow [1], a LUO agent coordinates a host kexec reboot
while keeping VM RAM content intact. LUO creates the guest RAM backing as a
memfd and passes it to QEMU via -add-fd on the initial launch, so that
memory-backend-file can use it as the shared RAM backing. On update, LUO
checkpoints QEMU, reboots the host kernel via kexec, and then re-launches QEMU
to restore vmstate while reusing the same preserved memfd-backed RAM FD.
Today LUO only supports handing off guest RAM via memfd [2].

To re-attach that preserved RAM backing without reopening non-persistent
paths, QEMU needs to let memory-backend-file consume the pre-opened FD using
mem-path=/dev/fdset/<id>. However, memory-backend-file currently uses
open()/creat() directly, so /dev/fdset/<id> cannot be resolved through the
fdset mechanism, making this workflow impossible.

This series allows /dev/fdset/<id> for file-backed RAM, documents the setup,
and adds qtests to validate that x-ignore-shared keeps RAM transfer minimal
in the cpr-reboot path.

[1]: https://docs.kernel.org/next/core-api/liveupdate.html
[2]: https://docs.kernel.org/mm/memfd_preservation.html

Li Chen (3):
  system/physmem: allow /dev/fdset for file-backed RAM
  docs: CPR: document shared RAM with x-ignore-shared
  tests/qtest: cpr-reboot: check ignore-shared transfer

 docs/devel/migration/CPR.rst      | 17 ++++++++--
 system/physmem.c                  | 17 ++++++++--
 tests/qtest/migration/cpr-tests.c | 56 ++++++++++++++++++++++++++++++-
 3 files changed, 84 insertions(+), 6 deletions(-)

--
2.52.0

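
For illustration, the launcher side of the flow described in the cover
letter might look roughly like the sketch below. It creates a memfd sized
for guest RAM and formats the two QEMU option values that would hand it
over. The helper names, the object id "ram0" and the fd set number are
assumptions made for this example; mem-path=/dev/fdset/<id> on
memory-backend-file is the behaviour this series proposes, not something
current QEMU is claimed to accept.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create an anonymous memfd sized for guest RAM. MFD_CLOEXEC is deliberately
 * not set, so the descriptor stays open across exec() of QEMU.
 * Returns the fd, or -1 on error. (Hypothetical helper.)
 */
static int create_guest_ram_memfd(const char *name, off_t size)
{
    int fd;

    assert(name != NULL);
    fd = memfd_create(name, 0);
    if (fd < 0) {
        return -1;
    }
    if (ftruncate(fd, size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Format the value of QEMU's -add-fd option: fd=<fd>,set=<set_id>. */
static int format_add_fd_arg(char *buf, size_t cap, int fd, int set_id)
{
    int n = snprintf(buf, cap, "fd=%d,set=%d", fd, set_id);

    return (n > 0 && (size_t)n < cap) ? 0 : -1;
}

/*
 * Format the value of -object for a shared, file-backed RAM backend whose
 * mem-path points at the fd set (the usage proposed by this series).
 * The size is written as a plain byte count.
 */
static int format_ram_backend_arg(char *buf, size_t cap, const char *id,
                                  unsigned long long size, int set_id)
{
    int n;

    if (id == NULL || strlen(id) == 0) {
        return -1;
    }
    n = snprintf(buf, cap,
                 "memory-backend-file,id=%s,size=%llu,"
                 "mem-path=/dev/fdset/%d,share=on",
                 id, size, set_id);

    return (n > 0 && (size_t)n < cap) ? 0 : -1;
}
```

A launcher would pass the first string after -add-fd and the second after
-object, then exec QEMU with the memfd still open. On restore it would do
the same with the preserved memfd instead of a freshly created one.
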
Hi Li,

Thanks for the series. I have a few comments regarding the interaction
with the LUO agent and the broader architecture.

> In the LUO/KHO update flow [1], a LUO agent coordinates a host kexec reboot
> while keeping VM RAM content intact. LUO creates the guest RAM backing as a
> memfd and passes it to QEMU via -add-fd on the initial launch, so that

I am currently working on the LUO agent (LUOD) documentation, and the
intended behavior differs slightly from this description.

Rather than the agent simply passing FDs, the design involves LUOD
opening /dev/liveupdate and creating a UDS socket (e.g., at
/run/luod/liveupdate.sock). Live-update-capable clients connect to
this socket, create a LUO session, and preserve their resources into
that session. Once all clients notify LUOD that they are ready, LUOD
performs the kexec reboot while keeping /dev/liveupdate open, ensuring
the LUO sessions survive.

After the reboot, the restarted clients reconnect to LUOD via UDS and
request their preserved sessions. The sessions are passed as FDs back
to the clients. From those sessions, the clients retrieve the
resources, restore them, and resume suspended operations. Finally, the
clients "finish" their sessions, fully reclaiming ownership of the
preserved resources from LUO back to userspace.

> memory-backend-file can use it as the shared RAM backing. On update, LUO
> checkpoints QEMU, reboots the host kernel via kexec, and then re-launches QEMU
> to restore vmstate while reusing the same preserved memfd-backed RAM FD.
> Today LUO only supports handing off guest RAM via memfd [2].
>
> To re-attach that preserved RAM backing without reopening non-persistent
> paths, QEMU needs to let memory-backend-file consume the pre-opened FD using
> mem-path=/dev/fdset/<id>. However, memory-backend-file currently uses
> open()/creat() directly, so /dev/fdset/<id> cannot be resolved through the
> fdset mechanism, making this workflow impossible.
>
> This series allows /dev/fdset/<id> for file-backed RAM, documents the setup,
> and adds qtests to validate that x-ignore-shared keeps RAM transfer minimal
> in the cpr-reboot path.

Sessions become particularly relevant when we look beyond guest RAM.
For complex resources like VFIO or IOMMU, simply passing the FD is
insufficient. These resources require specific ioctls to be issued by
clients after the FD is retrieved, but before vCPUs are resumed or the
session is "finished".

Pasha

Hi Pasha,

Thanks for the detailed clarification, that's very helpful.

---- On Mon, 29 Dec 2025 23:50:35 +0800 Pasha Tatashin <pasha.tatashin@soleen.com> wrote ---
> Hi Li,
>
> Thanks for the series. I have a few comments regarding the interaction
> with the LUO agent and the broader architecture.
>
> > In the LUO/KHO update flow [1], a LUO agent coordinates a host kexec reboot
> > while keeping VM RAM content intact. LUO creates the guest RAM backing as a
> > memfd and passes it to QEMU via -add-fd on the initial launch, so that
>
> I am currently working on the LUO agent (LUOD) documentation, and the
> intended behavior differs slightly from this description.
>
> Rather than the agent simply passing FDs, the design involves LUOD
> opening /dev/liveupdate and creating a UDS socket (e.g., at
> /run/luod/liveupdate.sock). Live-update-capable clients connect to
> this socket, create a LUO session, and preserve their resources into
> that session. Once all clients notify LUOD that they are ready, LUOD
> performs the kexec reboot while keeping /dev/liveupdate open, ensuring
> the LUO sessions survive.
>
> After the reboot, the restarted clients reconnect to LUOD via UDS and
> request their preserved sessions. The sessions are passed as FDs back
> to the clients. From those sessions, the clients retrieve the
> resources, restore them, and resume suspended operations. Finally, the
> clients "finish" their sessions, fully reclaiming ownership of the
> preserved resources from LUO back to userspace.

Ack, and sorry that my cover letter over-simplified the design.

> > memory-backend-file can use it as the shared RAM backing. On update, LUO
> > checkpoints QEMU, reboots the host kernel via kexec, and then re-launches QEMU
> > to restore vmstate while reusing the same preserved memfd-backed RAM FD.
> > Today LUO only supports handing off guest RAM via memfd [2].
> >
> > To re-attach that preserved RAM backing without reopening non-persistent
> > paths, QEMU needs to let memory-backend-file consume the pre-opened FD using
> > mem-path=/dev/fdset/<id>. However, memory-backend-file currently uses
> > open()/creat() directly, so /dev/fdset/<id> cannot be resolved through the
> > fdset mechanism, making this workflow impossible.
> >
> > This series allows /dev/fdset/<id> for file-backed RAM, documents the setup,
> > and adds qtests to validate that x-ignore-shared keeps RAM transfer minimal
> > in the cpr-reboot path.
>
> Sessions become particularly relevant when we look beyond guest RAM.
> For complex resources like VFIO or IOMMU, simply passing the FD is
> insufficient. These resources require specific ioctls to be issued by
> clients after the FD is retrieved, but before vCPUs are resumed or the
> session is "finished".

One question: in the intended LUOD/session architecture, do you still expect
clients (or a wrapper) to inject the retrieved guest RAM memfd FD into QEMU
via -add-fd + /dev/fdset/<id> at QEMU startup? (as this patchset does)

Or is the longer-term plan that QEMU itself connects to LUOD over UDS, retrieves
the session/resource FD directly, and consumes it as the RAM backing without
going through -add-fd?

If the latter, I think we'd likely need a QEMU-side interface to
consume an inherited/pre-opened FD for RAM backing during early machine init.

Regards,

Li

On Tue, Dec 30, 2025 at 3:30 AM Li Chen <me@linux.beauty> wrote:
>
> Hi Pasha,
>
> Thanks for the detailed clarification, that's very helpful.
>
> ---- On Mon, 29 Dec 2025 23:50:35 +0800 Pasha Tatashin <pasha.tatashin@soleen.com> wrote ---
> > Hi Li,
> >
> > Thanks for the series. I have a few comments regarding the interaction
> > with the LUO agent and the broader architecture.
> >
> > > In the LUO/KHO update flow [1], a LUO agent coordinates a host kexec reboot
> > > while keeping VM RAM content intact. LUO creates the guest RAM backing as a
> > > memfd and passes it to QEMU via -add-fd on the initial launch, so that
> >
> > I am currently working on the LUO agent (LUOD) documentation, and the
> > intended behavior differs slightly from this description.
> >
> > Rather than the agent simply passing FDs, the design involves LUOD
> > opening /dev/liveupdate and creating a UDS socket (e.g., at
> > /run/luod/liveupdate.sock). Live-update-capable clients connect to
> > this socket, create a LUO session, and preserve their resources into
> > that session. Once all clients notify LUOD that they are ready, LUOD
> > performs the kexec reboot while keeping /dev/liveupdate open, ensuring
> > the LUO sessions survive.
> >
> > After the reboot, the restarted clients reconnect to LUOD via UDS and
> > request their preserved sessions. The sessions are passed as FDs back
> > to the clients. From those sessions, the clients retrieve the
> > resources, restore them, and resume suspended operations. Finally, the
> > clients "finish" their sessions, fully reclaiming ownership of the
> > preserved resources from LUO back to userspace.
>
> Ack, and sorry that my cover letter over-simplified the design.
>
> > > memory-backend-file can use it as the shared RAM backing. On update, LUO
> > > checkpoints QEMU, reboots the host kernel via kexec, and then re-launches QEMU
> > > to restore vmstate while reusing the same preserved memfd-backed RAM FD.
> > > Today LUO only supports handing off guest RAM via memfd [2].
> > >
> > > To re-attach that preserved RAM backing without reopening non-persistent
> > > paths, QEMU needs to let memory-backend-file consume the pre-opened FD using
> > > mem-path=/dev/fdset/<id>. However, memory-backend-file currently uses
> > > open()/creat() directly, so /dev/fdset/<id> cannot be resolved through the
> > > fdset mechanism, making this workflow impossible.
> > >
> > > This series allows /dev/fdset/<id> for file-backed RAM, documents the setup,
> > > and adds qtests to validate that x-ignore-shared keeps RAM transfer minimal
> > > in the cpr-reboot path.
> >
> > Sessions become particularly relevant when we look beyond guest RAM.
> > For complex resources like VFIO or IOMMU, simply passing the FD is
> > insufficient. These resources require specific ioctls to be issued by
> > clients after the FD is retrieved, but before vCPUs are resumed or the
> > session is "finished".
>
> One question: in the intended LUOD/session architecture, do you still expect
> clients (or a wrapper) to inject the retrieved guest RAM memfd FD into QEMU
> via -add-fd + /dev/fdset/<id> at QEMU startup? (as this patchset does)
>
> Or is the longer-term plan that QEMU itself connects to LUOD over UDS, retrieves
> the session/resource FD directly, and consumes it as the RAM backing without
> going through -add-fd?

The latter. I expect QEMU and other VMMs to use the LUO session FDs
that they receive from LUOD to retrieve and restore the preserved
resources (memfd+iommufd+vfiofd) and resume the suspended VMs.

>
> If the latter, I think we'd likely need a QEMU-side interface to
> consume an inherited/pre-opened FD for RAM backing during early machine init.
>
> Regards,
>
> Li
>

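
As background for the "session FDs received from LUOD" step: the thread
does not specify the LUOD wire protocol, so the sketch below only shows
the generic POSIX mechanism for passing an open descriptor over a Unix
domain socket (SCM_RIGHTS ancillary data). Assuming LUOD uses this
mechanism is a guess on the part of this example, and send_fd()/recv_fd()
are hypothetical helper names.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer large enough for one int, aligned for struct cmsghdr. */
union fd_cmsg {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Send one open descriptor over a connected AF_UNIX socket. 0 on success. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';  /* at least one byte of real data must accompany it */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor sent by send_fd(). Returns the new fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd = -1;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The received descriptor is a new number in the receiving process that
refers to the same open file description, which is why a VMM could keep
using a session obtained this way after the sender closes its copy.
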
On Tue, Dec 30, 2025 at 11:42:09AM -0500, Pasha Tatashin wrote:
> > One question: in the intended LUOD/session architecture, do you still expect
> > clients (or a wrapper) to inject the retrieved guest RAM memfd FD into QEMU
> > via -add-fd + /dev/fdset/<id> at QEMU startup? (as this patchset does)
> >
> > Or is the longer-term plan that QEMU itself connects to LUOD over UDS, retrieves
> > the session/resource FD directly, and consumes it as the RAM backing without
> > going through -add-fd?
>
> The latter. I expect QEMU and other VMMs to use the LUO session FDs
> that they receive from LUOD to retrieve and restore the preserved
> resources (memfd+iommufd+vfiofd) and resume the suspended VMs.

Right, and we shouldn't have something like /dev/fdset/ID at all. The VMM
has to know it is doing a LUO restoration and sequence the restoration of
all the stuff the preserving VMM put into LUO. The general case is far
more complex than just opening a memfd.

Jason