[PATCH 0/3] CPR: shared RAM with /dev/fdset for LUO kexec reboot

Li Chen posted 3 patches 1 month, 1 week ago
[PATCH 0/3] CPR: shared RAM with /dev/fdset for LUO kexec reboot
Posted by Li Chen 1 month, 1 week ago
Hi,

This series improves a CPR workflow for large guests where guest RAM is
shared/external and preserved in place. With shared RAM and x-ignore-shared
enabled, the migration stream skips guest RAM pages and transfers only
non-RAM VM state (vmstate). This avoids copying guest RAM to the on-disk
migration URI, which can significantly reduce checkpoint/restore downtime for
multi-GB guests.
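
As a rough illustration of the saving, the following toy model counts the
bytes that would land in the migration stream with and without
x-ignore-shared. It is a sketch, not QEMU code: the RamBlock type, the
bytes_in_stream() helper, the block sizes and the vmstate size are all
assumptions made up for the example.

```python
from dataclasses import dataclass

GiB = 1 << 30
MiB = 1 << 20


@dataclass
class RamBlock:
    name: str
    size: int      # bytes
    shared: bool   # True if backed by a shared mapping (e.g. a memfd)


def bytes_in_stream(blocks, vmstate_bytes, ignore_shared):
    """Toy model: non-RAM vmstate plus every RAM block that is not skipped."""
    ram = sum(b.size for b in blocks if not (ignore_shared and b.shared))
    return vmstate_bytes + ram


# Hypothetical guest: 16 GiB of shared main RAM plus a small private block.
blocks = [
    RamBlock("pc.ram", 16 * GiB, shared=True),
    RamBlock("vga.vram", 16 * MiB, shared=False),
]
vmstate = 2 * MiB  # assumed size of the non-RAM device state

full = bytes_in_stream(blocks, vmstate, ignore_shared=False)
lean = bytes_in_stream(blocks, vmstate, ignore_shared=True)
print(f"without ignore-shared: {full // MiB} MiB, with: {lean // MiB} MiB")
```

In this model only the private block and the device state remain in the
stream once shared RAM is skipped.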

In the LUO/KHO update flow [1], a LUO agent coordinates a host kexec reboot
while keeping VM RAM content intact. LUO creates the guest RAM backing as a
memfd and passes it to QEMU via -add-fd on the initial launch, so that
memory-backend-file can use it as the shared RAM backing. On update, LUO
checkpoints QEMU, reboots the host kernel via kexec, and then re-launches QEMU
to restore vmstate while reusing the same preserved memfd-backed RAM FD.
Today LUO only supports handing off guest RAM via memfd [2].

To re-attach that preserved RAM backing without reopening non-persistent
paths, QEMU needs to let memory-backend-file consume the pre-opened FD using
mem-path=/dev/fdset/<id>. However, memory-backend-file currently uses
open()/creat() directly, so /dev/fdset/<id> cannot be resolved through the
fdset mechanism, making this workflow impossible.
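
The resolution step that is missing can be modelled in a few lines. This is
a simplified sketch rather than QEMU's implementation: parse_fdset_id(),
open_backing() and the fdsets dictionary are hypothetical names, and an
unlinked temporary file stands in for the preserved memfd.

```python
import os
import re
import tempfile

_FDSET_RE = re.compile(r"^/dev/fdset/(\d+)$")


def parse_fdset_id(path):
    """Return the numeric fdset id, or None for an ordinary path."""
    m = _FDSET_RE.match(path)
    return int(m.group(1)) if m else None


def open_backing(path, fdsets, flags=os.O_RDWR):
    """Resolve /dev/fdset/<id> to a dup of a pre-opened fd, else open(path)."""
    fdset_id = parse_fdset_id(path)
    if fdset_id is None:
        return os.open(path, flags)      # ordinary path: plain open()
    if fdset_id not in fdsets:
        raise FileNotFoundError(path)
    return os.dup(fdsets[fdset_id])      # never touches the filesystem


# Stand-in for the inherited RAM fd: a file with no remaining path.
inherited_fd, tmp_path = tempfile.mkstemp()
os.unlink(tmp_path)
os.pwrite(inherited_fd, b"guest", 0)

fdsets = {1: inherited_fd}               # roughly what -add-fd would set up
ram_fd = open_backing("/dev/fdset/1", fdsets)
data = os.pread(ram_fd, 5, 0)            # same contents, different fd number
os.close(ram_fd)
os.close(inherited_fd)
```

A caller that goes straight to open() only ever takes the first branch,
which is why a backing that has no path left cannot be re-attached.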

This series allows /dev/fdset/<id> for file-backed RAM, documents the setup,
and adds qtests to validate that x-ignore-shared keeps RAM transfer minimal
in the cpr-reboot path.

[1]: https://docs.kernel.org/next/core-api/liveupdate.html
[2]: https://docs.kernel.org/mm/memfd_preservation.html

Li Chen (3):
  system/physmem: allow /dev/fdset for file-backed RAM
  docs: CPR: document shared RAM with x-ignore-shared
  tests/qtest: cpr-reboot: check ignore-shared transfer

 docs/devel/migration/CPR.rst      | 17 ++++++++--
 system/physmem.c                  | 17 ++++++++--
 tests/qtest/migration/cpr-tests.c | 56 ++++++++++++++++++++++++++++++-
 3 files changed, 84 insertions(+), 6 deletions(-)

-- 
2.52.0
Re: [PATCH 0/3] CPR: shared RAM with /dev/fdset for LUO kexec reboot
Posted by Pasha Tatashin 1 month, 1 week ago
Hi Li,

Thanks for the series. I have a few comments regarding the interaction
with the LUO agent and the broader architecture.

> In the LUO/KHO update flow [1], a LUO agent coordinates a host kexec reboot
> while keeping VM RAM content intact. LUO creates the guest RAM backing as a
> memfd and passes it to QEMU via -add-fd on the initial launch, so that

I am currently working on the LUO agent (LUOD) documentation, and the
intended behavior differs slightly from this description.

Rather than the agent simply passing FDs, the design involves LUOD
opening /dev/liveupdate and creating a UDS socket (e.g., at
/run/luod/liveupdate.sock). Live-update-capable clients connect to
this socket, create a LUO session, and preserve their resources into
that session. Once all clients notify LUOD that they are ready, LUOD
performs the kexec reboot while keeping /dev/liveupdate open, ensuring
the LUO sessions survive.

After the reboot, the restarted clients reconnect to LUOD via UDS and
request their preserved sessions. The sessions are passed as FDs back
to the clients. From those sessions, the clients retrieve the
resources, restore them, and resume suspended operations. Finally, the
clients "finish" their sessions, fully reclaiming ownership of the
preserved resources from LUO back to userspace.

> memory-backend-file can use it as the shared RAM backing. On update, LUO
> checkpoints QEMU, reboots the host kernel via kexec, and then re-launches QEMU
> to restore vmstate while reusing the same preserved memfd-backed RAM FD.
> Today LUO only supports handing off guest RAM via memfd [2].
>
> To re-attach that preserved RAM backing without reopening non-persistent
> paths, QEMU needs to let memory-backend-file consume the pre-opened FD using
> mem-path=/dev/fdset/<id>. However, memory-backend-file currently uses
> open()/creat() directly, so /dev/fdset/<id> cannot be resolved through the
> fdset mechanism, making this workflow impossible.
>
> This series allows /dev/fdset/<id> for file-backed RAM, documents the setup,
> and adds qtests to validate that x-ignore-shared keeps RAM transfer minimal
> in the cpr-reboot path.

Sessions become particularly relevant when we look beyond guest RAM.
For complex resources like VFIO or IOMMU, simply passing the FD is
insufficient. These resources require specific ioctls to be issued by
clients after the FD is retrieved, but before vCPUs are resumed or the
session is "finished".
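
The ordering constraint can be written down as a small guard object. This is
a sketch of the sequencing only: the RestoreSequence class and its method
names are hypothetical, and restore() is a placeholder for the
resource-specific ioctls, which are not modelled here.

```python
class RestoreSequence:
    """Allow resume/finish only after every retrieved resource is restored."""

    def __init__(self, resources):
        self.unrestored = set(resources)   # retrieved, not yet restored
        self.log = []

    def restore(self, name):
        # Placeholder for the resource-specific ioctls issued on the fd.
        self.unrestored.remove(name)
        self.log.append("restore:" + name)

    def _require_all_restored(self, step):
        if self.unrestored:
            pending = sorted(self.unrestored)
            raise RuntimeError(f"{step} called before restoring {pending}")

    def resume_vcpus(self):
        self._require_all_restored("resume_vcpus")
        self.log.append("resume_vcpus")

    def finish(self):
        self._require_all_restored("finish")
        self.log.append("finish")


seq = RestoreSequence(["memfd", "iommufd", "vfio"])
seq.restore("memfd")

try:
    seq.resume_vcpus()                     # too early: two resources pending
    early_resume_rejected = False
except RuntimeError:
    early_resume_rejected = True

seq.restore("iommufd")
seq.restore("vfio")
seq.resume_vcpus()
seq.finish()
print(seq.log)
```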

Pasha
Re: [PATCH 0/3] CPR: shared RAM with /dev/fdset for LUO kexec reboot
Posted by Li Chen 1 month, 1 week ago
Hi Pasha,

Thanks for the detailed clarification, that's very helpful.

On Mon, 29 Dec 2025 23:50:35 +0800, Pasha Tatashin <pasha.tatashin@soleen.com> wrote:
 > Hi Li,
 > 
 > Thanks for the series. I have a few comments regarding the interaction
 > with the LUO agent and the broader architecture.
 > 
 > > In the LUO/KHO update flow [1], a LUO agent coordinates a host kexec reboot
 > > while keeping VM RAM content intact. LUO creates the guest RAM backing as a
 > > memfd and passes it to QEMU via -add-fd on the initial launch, so that
 > 
 > I am currently working on the LUO agent (LUOD) documentation, and the
 > intended behavior differs slightly from this description.
 > 
 > Rather than the agent simply passing FDs, the design involves LUOD
 > opening /dev/liveupdate and creating a UDS socket (e.g., at
 > /run/luod/liveupdate.sock). Live-update-capable clients connect to
 > this socket, create a LUO session, and preserve their resources into
 > that session. Once all clients notify LUOD that they are ready, LUOD
 > performs the kexec reboot while keeping /dev/liveupdate open, ensuring
 > the LUO sessions survive.
 > 
 > After the reboot, the restarted clients reconnect to LUOD via UDS and
 > request their preserved sessions. The sessions are passed as FDs back
 > to the clients. From those sessions, the clients retrieve the
 > resources, restore them, and resume suspended operations. Finally, the
 > clients "finish" their sessions, fully reclaiming ownership of the
 > preserved resources from LUO back to userspace.

Ack, and sorry that my cover letter text over-simplified the design.

 > > memory-backend-file can use it as the shared RAM backing. On update, LUO
 > > checkpoints QEMU, reboots the host kernel via kexec, and then re-launches QEMU
 > > to restore vmstate while reusing the same preserved memfd-backed RAM FD.
 > > Today LUO only supports handing off guest RAM via memfd [2].
 > >
 > > To re-attach that preserved RAM backing without reopening non-persistent
 > > paths, QEMU needs to let memory-backend-file consume the pre-opened FD using
 > > mem-path=/dev/fdset/<id>. However, memory-backend-file currently uses
 > > open()/creat() directly, so /dev/fdset/<id> cannot be resolved through the
 > > fdset mechanism, making this workflow impossible.
 > >
 > > This series allows /dev/fdset/<id> for file-backed RAM, documents the setup,
 > > and adds qtests to validate that x-ignore-shared keeps RAM transfer minimal
 > > in the cpr-reboot path.
 > 
 > Sessions become particularly relevant when we look beyond guest RAM.
 > For complex resources like VFIO or IOMMU, simply passing the FD is
 > insufficient. These resources require specific ioctls to be issued by
 > clients after the FD is retrieved, but before vCPUs are resumed or the
 > session is "finished".

One question: in the intended LUOD/session architecture, do you still expect
clients (or a wrapper) to inject the retrieved guest RAM memfd FD into QEMU
via -add-fd + /dev/fdset/<id> at QEMU startup? (as this patchset does)

Or is the longer-term plan that QEMU itself connects to LUOD over UDS, retrieves
the session/resource FD directly, and consumes it as the RAM backing without
going through -add-fd? 

If the latter, I think we'd likely need a QEMU-side interface to
consume an inherited/pre-opened FD for RAM backing during early machine init.

Regards,

Li
Re: [PATCH 0/3] CPR: shared RAM with /dev/fdset for LUO kexec reboot
Posted by Pasha Tatashin 1 month, 1 week ago
On Tue, Dec 30, 2025 at 3:30 AM Li Chen <me@linux.beauty> wrote:
>
> Hi Pasha,
>
> Thanks for the detailed clarification, that's very helpful.
>
>  ---- On Mon, 29 Dec 2025 23:50:35 +0800  Pasha Tatashin <pasha.tatashin@soleen.com> wrote ---
>  > Hi Li,
>  >
>  > Thanks for the series. I have a few comments regarding the interaction
>  > with the LUO agent and the broader architecture.
>  >
>  > > In the LUO/KHO update flow [1], a LUO agent coordinates a host kexec reboot
>  > > while keeping VM RAM content intact. LUO creates the guest RAM backing as a
>  > > memfd and passes it to QEMU via -add-fd on the initial launch, so that
>  >
>  > I am currently working on the LUO agent (LUOD) documentation, and the
>  > intended behavior differs slightly from this description.
>  >
>  > Rather than the agent simply passing FDs, the design involves LUOD
>  > opening /dev/liveupdate and creating a UDS socket (e.g., at
>  > /run/luod/liveupdate.sock). Live-update-capable clients connect to
>  > this socket, create a LUO session, and preserve their resources into
>  > that session. Once all clients notify LUOD that they are ready, LUOD
>  > performs the kexec reboot while keeping /dev/liveupdate open, ensuring
>  > the LUO sessions survive.
>  >
>  > After the reboot, the restarted clients reconnect to LUOD via UDS and
>  > request their preserved sessions. The sessions are passed as FDs back
>  > to the clients. From those sessions, the clients retrieve the
>  > resources, restore them, and resume suspended operations. Finally, the
>  > clients "finish" their sessions, fully reclaiming ownership of the
>  > preserved resources from LUO back to userspace.
>
> Ack, and sorry that my cover letter text over-simplified the design.
>
>  > > memory-backend-file can use it as the shared RAM backing. On update, LUO
>  > > checkpoints QEMU, reboots the host kernel via kexec, and then re-launches QEMU
>  > > to restore vmstate while reusing the same preserved memfd-backed RAM FD.
>  > > Today LUO only supports handing off guest RAM via memfd [2].
>  > >
>  > > To re-attach that preserved RAM backing without reopening non-persistent
>  > > paths, QEMU needs to let memory-backend-file consume the pre-opened FD using
>  > > mem-path=/dev/fdset/<id>. However, memory-backend-file currently uses
>  > > open()/creat() directly, so /dev/fdset/<id> cannot be resolved through the
>  > > fdset mechanism, making this workflow impossible.
>  > >
>  > > This series allows /dev/fdset/<id> for file-backed RAM, documents the setup,
>  > > and adds qtests to validate that x-ignore-shared keeps RAM transfer minimal
>  > > in the cpr-reboot path.
>  >
>  > Sessions become particularly relevant when we look beyond guest RAM.
>  > For complex resources like VFIO or IOMMU, simply passing the FD is
>  > insufficient. These resources require specific ioctls to be issued by
>  > clients after the FD is retrieved, but before vCPUs are resumed or the
>  > session is "finished".
>
> One question: in the intended LUOD/session architecture, do you still expect
> clients (or a wrapper) to inject the retrieved guest RAM memfd FD into QEMU
> via -add-fd + /dev/fdset/<id> at QEMU startup? (as this patchset does)
>
> Or is the longer-term plan that QEMU itself connects to LUOD over UDS, retrieves
> the session/resource FD directly, and consumes it as the RAM backing without
> going through -add-fd?

The latter. I expect QEMU and other VMMs to use the LUO session FDs
that they receive from LUOD to retrieve and restore the preserved
resources (memfd, iommufd, vfiofd) and then resume the suspended VMs.

>
> If the latter, I think we'd likely need a QEMU-side interface to
> consume an inherited/pre-opened FD for RAM backing during early machine init.
>
> Regards,
>
> Li
>
Re: [PATCH 0/3] CPR: shared RAM with /dev/fdset for LUO kexec reboot
Posted by Jason Gunthorpe 1 month, 1 week ago
On Tue, Dec 30, 2025 at 11:42:09AM -0500, Pasha Tatashin wrote:
> > One question: in the intended LUOD/session architecture, do you still expect
> > clients (or a wrapper) to inject the retrieved guest RAM memfd FD into QEMU
> > via -add-fd + /dev/fdset/<id> at QEMU startup? (as this patchset does)
> >
> > Or is the longer-term plan that QEMU itself connects to LUOD over UDS, retrieves
> > the session/resource FD directly, and consumes it as the RAM backing without
> > going through -add-fd?
> 
> The latter. I expect QEMU and other VMMs to use the LUO session FDs
> that they receive from LUOD to retrieve and restore the preserved
> resources (memfd, iommufd, vfiofd) and then resume the suspended VMs.

Right, and we shouldn't have something like /dev/fdset/ID at all.

The VMM has to know it is doing a LUO restoration and sequence the
restoration of all the stuff the preserving VMM put into LUO.

The general case is far more complex than just opening a memfd.

Jason