This series introduces the Live Update Orchestrator (LUO), a kernel
subsystem designed to facilitate live kernel updates with minimal
downtime, particularly in cloud deployments aiming to update without
fully disrupting running virtual machines.

This series builds upon the KHO (Kexec HandOver) framework by adding
programmatic control over KHO's lifecycle and leveraging KHO for
persisting LUO's own metadata across the kexec boundary. The git branch
for this series can be found at:

https://github.com/googleprodkernel/linux-liveupdate/tree/luo/v3

Changelog from v2:
- Addressed comments from Mike Rapoport and Jason Gunthorpe
- Only one user agent (LiveupdateD) can open /dev/liveupdate
- Release all preserved resources if /dev/liveupdate closes
  before reboot.
- With the above changes, sessions are not needed, and should be
  maintained by the user-agent itself, so removed support for
  sessions.
- Added support for changing per-FD state (i.e. some FDs can be
  prepared or finished before the global transition).
- All IOCTLs now follow iommufd/fwctl extendable design.
- Replaced locks with guards
- Added a callback for registered subsystems to be notified
  during boot: ops->boot().
- Removed args from callbacks, instead use container_of() to
  carry context specific data (see luo_selftests.c for example).
- Removed patches for luolib, they are going to be introduced in
  a separate repository.

What is Live Update?
Live Update is a kexec-based reboot process where selected kernel
resources (memory, file descriptors, and eventually devices) are kept
operational or their state preserved across a kernel transition. For
certain resources, DMA and interrupt activity might continue with
minimal interruption during the kernel reboot.

LUO provides a framework for coordinating live updates. It features:

State Machine: Manages the live update process through states:
NORMAL, PREPARED, FROZEN, UPDATED.

KHO Integration:

LUO programmatically drives KHO's finalization and abort sequences.
KHO's debugfs interface is now optional, configured via
CONFIG_KEXEC_HANDOVER_DEBUG.

LUO preserves its own metadata via KHO's kho_add_subtree() and
kho_preserve_phys() mechanisms.

Subsystem Participation: A callback API liveupdate_register_subsystem()
allows kernel subsystems (e.g., KVM, IOMMU, VFIO, PCI) to register
handlers for LUO events (PREPARE, FREEZE, FINISH, CANCEL) and persist a
u64 payload via the LUO FDT.

File Descriptor Preservation: Infrastructure
(liveupdate_register_filesystem, luo_register_file, luo_retrieve_file)
to allow specific types of file descriptors (e.g., memfd, vfio) to be
preserved and restored.

Handlers for specific file types can be registered to manage their
preservation and restoration, storing a u64 payload in the LUO FDT.

User-space Interface:

ioctl (/dev/liveupdate): The primary control interface for
triggering LUO state transitions (prepare, freeze, finish, cancel)
and managing the preservation/restoration of file descriptors.
Access requires CAP_SYS_ADMIN.

sysfs (/sys/kernel/liveupdate/state): A read-only interface for
monitoring the current LUO state. This allows userspace services to
track progress and coordinate actions.

Selftests: Includes kernel-side hooks and userspace selftests to
verify core LUO functionality, particularly subsystem registration and
basic state transitions.

LUO State Machine and Events:

NORMAL:   Default operational state.
PREPARED: Initial preparation complete after LIVEUPDATE_PREPARE
          event. Subsystems have saved initial state.
FROZEN:   Final "blackout window" state after LIVEUPDATE_FREEZE
          event, just before kexec. Workloads must be suspended.
UPDATED:  Next kernel has booted via live update. Awaiting restoration
          and LIVEUPDATE_FINISH.

Events:
LIVEUPDATE_PREPARE: Prepare for reboot, serialize state.
LIVEUPDATE_FREEZE:  Final opportunity to save state before kexec.
LIVEUPDATE_FINISH:  Post-reboot cleanup in the next kernel.
LIVEUPDATE_CANCEL:  Abort prepare or freeze, revert changes.
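
The states and events listed above can be sketched as a small transition
function. This is only an illustration, not the code in luo_core.c: the
enum and function names are hypothetical, and two points are assumptions
read out of the event descriptions rather than confirmed by the series:
that FINISH returns the machine to NORMAL, and that CANCEL is accepted
from both PREPARED and FROZEN. The FROZEN to UPDATED step happens across
the kexec boundary and is not modeled as an event here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names mirroring the states in the cover letter. */
enum luo_state { LUO_NORMAL, LUO_PREPARED, LUO_FROZEN, LUO_UPDATED };

/* Hypothetical names mirroring the LIVEUPDATE_* events. */
enum luo_event { LUO_EV_PREPARE, LUO_EV_FREEZE, LUO_EV_FINISH, LUO_EV_CANCEL };

/*
 * Apply one event. Returns true and updates *state when the event is
 * valid in the current state, otherwise leaves *state untouched.
 */
static bool luo_sketch_step(enum luo_state *state, enum luo_event ev)
{
	assert(state != NULL);

	switch (ev) {
	case LUO_EV_PREPARE:	/* NORMAL -> PREPARED */
		if (*state != LUO_NORMAL)
			return false;
		*state = LUO_PREPARED;
		return true;
	case LUO_EV_FREEZE:	/* PREPARED -> FROZEN */
		if (*state != LUO_PREPARED)
			return false;
		*state = LUO_FROZEN;
		return true;
	case LUO_EV_CANCEL:	/* assumption: PREPARED or FROZEN -> NORMAL */
		if (*state != LUO_PREPARED && *state != LUO_FROZEN)
			return false;
		*state = LUO_NORMAL;
		return true;
	case LUO_EV_FINISH:	/* assumption: UPDATED -> NORMAL */
		if (*state != LUO_UPDATED)
			return false;
		*state = LUO_NORMAL;
		return true;
	}
	return false;
}
```

A caller would start from LUO_NORMAL and feed events in order; an
out-of-order event such as FREEZE before PREPARE is simply rejected.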

v2: https://lore.kernel.org/all/20250723144649.1696299-1-pasha.tatashin@soleen.com
v1: https://lore.kernel.org/all/20250625231838.1897085-1-pasha.tatashin@soleen.com
RFC v2: https://lore.kernel.org/all/20250515182322.117840-1-pasha.tatashin@soleen.com
RFC v1: https://lore.kernel.org/all/20250320024011.2995837-1-pasha.tatashin@soleen.com

Changyuan Lyu (1):
  kho: add interfaces to unpreserve folios and physical memory ranges

Mike Rapoport (Microsoft) (1):
  kho: drop notifiers

Pasha Tatashin (23):
  kho: init new_physxa->phys_bits to fix lockdep
  kho: mm: Don't allow deferred struct page with KHO
  kho: warn if KHO is disabled due to an error
  kho: allow to drive kho from within kernel
  kho: make debugfs interface optional
  kho: don't unpreserve memory during abort
  liveupdate: kho: move to kernel/liveupdate
  liveupdate: luo_core: luo_ioctl: Live Update Orchestrator
  liveupdate: luo_core: integrate with KHO
  liveupdate: luo_subsystems: add subsystem registration
  liveupdate: luo_subsystems: implement subsystem callbacks
  liveupdate: luo_files: add infrastructure for FDs
  liveupdate: luo_files: implement file systems callbacks
  liveupdate: luo_ioctl: add userpsace interface
  liveupdate: luo_files: luo_ioctl: Unregister all FDs on device close
  liveupdate: luo_files: luo_ioctl: Add ioctls for per-file state
    management
  liveupdate: luo_sysfs: add sysfs state monitoring
  reboot: call liveupdate_reboot() before kexec
  kho: move kho debugfs directory to liveupdate
  liveupdate: add selftests for subsystems un/registration
  selftests/liveupdate: add subsystem/state tests
  docs: add luo documentation
  MAINTAINERS: add liveupdate entry

Pratyush Yadav (5):
  mm: shmem: use SHMEM_F_* flags instead of VM_* flags
  mm: shmem: allow freezing inode mapping
  mm: shmem: export some functions to internal.h
  luo: allow preserving memfd
  docs: add documentation for memfd preservation via LUO

 .../ABI/testing/sysfs-kernel-liveupdate       |   51 +
 Documentation/admin-guide/index.rst           |    1 +
 Documentation/admin-guide/liveupdate.rst      |   16 +
 Documentation/core-api/index.rst              |    1 +
 Documentation/core-api/kho/concepts.rst       |    2 +-
 Documentation/core-api/liveupdate.rst         |   57 +
 Documentation/mm/index.rst                    |    1 +
 Documentation/mm/memfd_preservation.rst       |  138 +++
 Documentation/userspace-api/index.rst         |    1 +
 .../userspace-api/ioctl/ioctl-number.rst      |    2 +
 Documentation/userspace-api/liveupdate.rst    |   25 +
 MAINTAINERS                                   |   19 +-
 include/linux/kexec_handover.h                |   53 +-
 include/linux/liveupdate.h                    |  203 ++++
 include/linux/shmem_fs.h                      |   23 +
 include/uapi/linux/liveupdate.h               |  399 +++++++
 init/Kconfig                                  |    2 +
 kernel/Kconfig.kexec                          |   14 -
 kernel/Makefile                               |    2 +-
 kernel/liveupdate/Kconfig                     |   90 ++
 kernel/liveupdate/Makefile                    |   17 +
 kernel/{ => liveupdate}/kexec_handover.c      |  554 ++++-----
 kernel/liveupdate/kexec_handover_debug.c      |  222 ++++
 kernel/liveupdate/kexec_handover_internal.h   |   45 +
 kernel/liveupdate/luo_core.c                  |  517 +++++++++
 kernel/liveupdate/luo_files.c                 | 1033 +++++++++++++++++
 kernel/liveupdate/luo_internal.h              |   60 +
 kernel/liveupdate/luo_ioctl.c                 |  297 +++++
 kernel/liveupdate/luo_selftests.c             |  345 ++++++
 kernel/liveupdate/luo_selftests.h             |   84 ++
 kernel/liveupdate/luo_subsystems.c            |  452 ++++++++
 kernel/liveupdate/luo_sysfs.c                 |   92 ++
 kernel/reboot.c                               |    4 +
 mm/Makefile                                   |    1 +
 mm/internal.h                                 |    6 +
 mm/memblock.c                                 |   56 +-
 mm/memfd_luo.c                                |  507 ++++++++
 mm/shmem.c                                    |   52 +-
 tools/testing/selftests/Makefile              |    1 +
 tools/testing/selftests/liveupdate/.gitignore |    1 +
 tools/testing/selftests/liveupdate/Makefile   |    7 +
 tools/testing/selftests/liveupdate/config     |    6 +
 .../testing/selftests/liveupdate/liveupdate.c |  406 +++++++
 43 files changed, 5448 insertions(+), 417 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-liveupdate
 create mode 100644 Documentation/admin-guide/liveupdate.rst
 create mode 100644 Documentation/core-api/liveupdate.rst
 create mode 100644 Documentation/mm/memfd_preservation.rst
 create mode 100644 Documentation/userspace-api/liveupdate.rst
 create mode 100644 include/linux/liveupdate.h
 create mode 100644 include/uapi/linux/liveupdate.h
 create mode 100644 kernel/liveupdate/Kconfig
 create mode 100644 kernel/liveupdate/Makefile
 rename kernel/{ => liveupdate}/kexec_handover.c (74%)
 create mode 100644 kernel/liveupdate/kexec_handover_debug.c
 create mode 100644 kernel/liveupdate/kexec_handover_internal.h
 create mode 100644 kernel/liveupdate/luo_core.c
 create mode 100644 kernel/liveupdate/luo_files.c
 create mode 100644 kernel/liveupdate/luo_internal.h
 create mode 100644 kernel/liveupdate/luo_ioctl.c
 create mode 100644 kernel/liveupdate/luo_selftests.c
 create mode 100644 kernel/liveupdate/luo_selftests.h
 create mode 100644 kernel/liveupdate/luo_subsystems.c
 create mode 100644 kernel/liveupdate/luo_sysfs.c
 create mode 100644 mm/memfd_luo.c
 create mode 100644 tools/testing/selftests/liveupdate/.gitignore
 create mode 100644 tools/testing/selftests/liveupdate/Makefile
 create mode 100644 tools/testing/selftests/liveupdate/config
 create mode 100644 tools/testing/selftests/liveupdate/liveupdate.c

-- 
2.50.1.565.gc32cd1483b-goog

On 07.08.25 03:44, Pasha Tatashin wrote:
> [...]
>
> Pratyush Yadav (5):
>    mm: shmem: use SHMEM_F_* flags instead of VM_* flags
>    mm: shmem: allow freezing inode mapping
>    mm: shmem: export some functions to internal.h
>    luo: allow preserving memfd
>    docs: add documentation for memfd preservation via LUO

It's not clear from the description why these mm shmem changes are
buried in this patch set. It's not even described above in the patch
description.

I suggest sending that part out separately, so Hugh actually spots this.
(is he even CC'ed?)

-- 
Cheers,

David / dhildenb

On Fri, Aug 8, 2025 at 12:07 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 07.08.25 03:44, Pasha Tatashin wrote:
> > [...]
> >
> > Pratyush Yadav (5):
> >    mm: shmem: use SHMEM_F_* flags instead of VM_* flags
> >    mm: shmem: allow freezing inode mapping
> >    mm: shmem: export some functions to internal.h
> >    luo: allow preserving memfd
> >    docs: add documentation for memfd preservation via LUO
>
> It's not clear from the description why these mm shmem changes are
> buried in this patch set. It's not even described above in the patch
> description.

Hi David,

Yes, I should update the cover letter to include the memfd preservation
work.

> I suggest sending that part out separately, so Hugh actually spots this.
> (is he even CC'ed?)

+cc hughd@google.com

While the MM list is CCed, you are right, I have not specifically CCed
the shmem maintainers. This will be fixed in the next revision.

Thank you,
Pasha

On Fri, Aug 08 2025, David Hildenbrand wrote:

> On 07.08.25 03:44, Pasha Tatashin wrote:
>> [...]
>>
>> Pratyush Yadav (5):
>>    mm: shmem: use SHMEM_F_* flags instead of VM_* flags
>>    mm: shmem: allow freezing inode mapping
>>    mm: shmem: export some functions to internal.h
>>    luo: allow preserving memfd
>>    docs: add documentation for memfd preservation via LUO
>
> It's not clear from the description why these mm shmem changes are buried in
> this patch set. It's not even described above in the patch description.

Patches 26-30 describe the shmem changes in more detail, but you're
right, it should be mentioned in the cover as well.

The idea is, LUO is used to preserve kernel resources across kexec. One
of the most fundamental resources the kernel has is memory. Since LUO
does preservation based on file descriptors, memfd is the way to attach
an FD to memory. So we went with memfd as the first user of LUO. memfd
can be backed by shmem or hugetlb, but currently only shmem is
supported. We do plan to support hugetlb as well in the future.

The idea is to keep the serialization/live update logic out of the way
of the main subsystem. So we decided to keep the logic out in a separate
file (memfd_luo.c).

>
> I suggest sending that part out separately, so Hugh actually spots this.
> (is he even CC'ed?)

Hmm, none of the shmem maintainers are included. I wonder why. The
patches do touch shmem.c and shmem_fs.h so the MAINTAINERS entry for
"TMPFS (SHMEM FILESYSTEM)" should have been hit. My guess is that the
shmem changes weren't part of the original RFC so perhaps Pasha forgot
to update the To/Cc list since then?

Either way, I've added Hugh and Baolin to this email. Hugh, Baolin, you
can find the shmem related patches at [0][1][2][3][4].

Pasha, can you please add them for later versions as well?

And now that I think about it, I suppose patch 29 should also add
memfd_luo.c under the SHMEM MAINTAINERS entry.


[0] https://lore.kernel.org/lkml/20250807014442.3829950-27-pasha.tatashin@soleen.com/
[1] https://lore.kernel.org/lkml/20250807014442.3829950-28-pasha.tatashin@soleen.com/
[2] https://lore.kernel.org/lkml/20250807014442.3829950-29-pasha.tatashin@soleen.com/
[3] https://lore.kernel.org/lkml/20250807014442.3829950-30-pasha.tatashin@soleen.com/
[4] https://lore.kernel.org/lkml/20250807014442.3829950-31-pasha.tatashin@soleen.com/

-- 
Regards,
Pratyush Yadav
> And now that I think about it, I suppose patch 29 should also add
> memfd_luo.c under the SHMEM MAINTAINERS entry.

Right, let's update this in the next revision.

Thanks,
Pasha
Hi Pasha,

On Thu, Aug 07 2025, Pasha Tatashin wrote:

> This series introduces the LUO, a kernel subsystem designed to
> facilitate live kernel updates with minimal downtime,
> particularly in cloud deployments aiming to update without fully
> disrupting running virtual machines.
>
> This series builds upon KHO framework by adding programmatic
> control over KHO's lifecycle and leveraging KHO for persisting LUO's
> own metadata across the kexec boundary. The git branch for this series
> can be found at:
>
> https://github.com/googleprodkernel/linux-liveupdate/tree/luo/v3
>
> Changelog from v2:
> - Addressed comments from Mike Rapoport and Jason Gunthorpe
> - Only one user agent (LiveupdateD) can open /dev/liveupdate
> - With the above changes, sessions are not needed, and should be
>   maintained by the user-agent itself, so removed support for
>   sessions.

If all the FDs are restored in the agent's context, this assigns all the
resources to the agent. For example, if the agent restores a memfd, all
the memory gets charged to the agent's cgroup, and the client gets none
of it. This makes it impossible to do any kind of resource limits.

This was one of the advantages of being able to pass around sessions
instead of FDs. The agent can pass on the right session to the right
client, and then the client does the restore, getting all the resources
charged to it.

If we don't allow this, I think we will make LUO/LiveupdateD unsuitable
for many kinds of workloads. Do you have any ideas on how to do proper
resource attribution with the current patches? If not, then perhaps we
should reconsider this change?

[...]

-- 
Regards,
Pratyush Yadav
> > https://github.com/googleprodkernel/linux-liveupdate/tree/luo/v3 > > > > Changelog from v2: > > - Addressed comments from Mike Rapoport and Jason Gunthorpe > > - Only one user agent (LiveupdateD) can open /dev/liveupdate > > - With the above changes, sessions are not needed, and should be > > maintained by the user-agent itself, so removed support for > > sessions. > > If all the FDs are restored in the agent's context, this assigns all the > resources to the agent. For example, if the agent restores a memfd, all > the memory gets charged to the agent's cgroup, and the client gets none > of it. This makes it impossible to do any kind of resource limits. > > This was one of the advantages of being able to pass around sessions > instead of FDs. The agent can pass on the right session to the right > client, and then the client does the restore, getting all the resources > charged to it. > > If we don't allow this, I think we will make LUO/LiveupdateD unsuitable > for many kinds of workloads. Do you have any ideas on how to do proper > resource attribution with the current patches? If not, then perhaps we > should reconsider this change? Hi Pratyush, That's an excellent point, and you're right that we must have a solution for correct resource charging. I'd prefer to keep the session logic in the userspace agent (luod https://tinyurl.com/luoddesign). For the charging problem, I believe there's a clear path forward with the current ioctl-based API. The design of the ioctl commands (with a size field in each struct) is intentionally extensible. In a follow-up patch, we can extend the liveupdate_ioctl_fd_restore struct to include a target pid field. The luod agent, would then be able to restore an FD on behalf of a client and instruct the kernel to charge the associated resources to that client's PID. This keeps the responsibilities clean: luod manages sessions and authorization, while the kernel provides the specific mechanism for resource attribution. 

I agree this is a must-have feature, but I think it can be cleanly
added on top of the current foundation.

Pasha

> [...]
>
> --
> Regards,
> Pratyush Yadav
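
A minimal userspace sketch of the size-field extensibility mentioned above, assuming the usual convention that new fields are only appended and that unknown trailing bytes must be zero. The struct names, the `extra` field, and `demo_copy_sized()` are hypothetical and are not taken from the series; the real uAPI structs live in include/uapi/linux/liveupdate.h.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical v1 layout: what an older agent was built against. */
struct demo_restore_v1 {
	uint32_t size;   /* sizeof() of the struct the caller was built with */
	uint32_t flags;
	uint64_t token;
};

/* Hypothetical v2 layout: v1 plus one field appended at the end. */
struct demo_restore_v2 {
	uint32_t size;
	uint32_t flags;
	uint64_t token;
	uint64_t extra;  /* illustrative appended field; 0 means "absent" */
};

/*
 * Copy a caller-supplied struct of usize bytes into a ksize-byte buffer.
 * Shorter (older) structs are zero-extended. Longer (newer) structs are
 * accepted only if the bytes the receiver does not know about are zero.
 */
static int demo_copy_sized(void *dst, size_t ksize,
			   const void *src, size_t usize)
{
	const unsigned char *p = src;
	size_t n = usize < ksize ? usize : ksize;
	size_t i;

	assert(dst && src);
	if (usize < sizeof(uint32_t))
		return -EINVAL;	/* too small to even hold the size field */
	for (i = ksize; i < usize; i++)
		if (p[i])
			return -E2BIG;	/* caller set a field we cannot honor */
	memset(dst, 0, ksize);
	memcpy(dst, src, n);
	return 0;
}
```

With this convention an old caller passing the v1 struct to a receiver that knows v2 sees `extra` read back as zero, while a new caller that sets `extra` against an old receiver gets an error instead of having the field silently ignored.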
On Tue, Aug 26, 2025 at 01:54:31PM +0000, Pasha Tatashin wrote: > > > https://github.com/googleprodkernel/linux-liveupdate/tree/luo/v3 > > > > > > Changelog from v2: > > > - Addressed comments from Mike Rapoport and Jason Gunthorpe > > > - Only one user agent (LiveupdateD) can open /dev/liveupdate > > > - With the above changes, sessions are not needed, and should be > > > maintained by the user-agent itself, so removed support for > > > sessions. > > > > If all the FDs are restored in the agent's context, this assigns all the > > resources to the agent. For example, if the agent restores a memfd, all > > the memory gets charged to the agent's cgroup, and the client gets none > > of it. This makes it impossible to do any kind of resource limits. > > > > This was one of the advantages of being able to pass around sessions > > instead of FDs. The agent can pass on the right session to the right > > client, and then the client does the restore, getting all the resources > > charged to it. > > > > If we don't allow this, I think we will make LUO/LiveupdateD unsuitable > > for many kinds of workloads. Do you have any ideas on how to do proper > > resource attribution with the current patches? If not, then perhaps we > > should reconsider this change? > > Hi Pratyush, > > That's an excellent point, and you're right that we must have a > solution for correct resource charging. > > I'd prefer to keep the session logic in the userspace agent (luod > https://tinyurl.com/luoddesign). > > For the charging problem, I believe there's a clear path forward with > the current ioctl-based API. The design of the ioctl commands (with a > size field in each struct) is intentionally extensible. In a follow-up > patch, we can extend the liveupdate_ioctl_fd_restore struct to include > a target pid field. The luod agent, would then be able to restore an > FD on behalf of a client and instruct the kernel to charge the > associated resources to that client's PID. 

This wasn't quite the idea though..

The session sub-FDs were intended to be passed directly to other
processes through unix sockets and fd passing so they could run their
own ioctls in their own context for both save and restore. The ioctls
available on the sessions should be specifically narrowed to be safe
for this.

I can understand not implementing session FDs in the first version,
but when session FDs are available they should work like this and
solve the namespace/cgroup/etc issues.

Passing some PID in an ioctl is not a great idea...

Jason
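
The fd passing referred to above is the standard SCM_RIGHTS mechanism on AF_UNIX sockets. A minimal sketch follows, assuming a connected socket between agent and client; `send_fd()` and `recv_fd()` are hypothetical helper names, and the descriptor being passed is just any open fd, since session FDs do not exist in this series.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_cmsg {
	char buf[CMSG_SPACE(sizeof(int))];
	struct cmsghdr align;
};

/* Send one open descriptor over a connected AF_UNIX socket. */
static int send_fd(int sock, int fd)
{
	char byte = 'F';	/* at least one byte of real data is required */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(sock >= 0 && fd >= 0);
	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new fd number or -1 on error. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd = -1;

	assert(sock >= 0);
	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	cmsg = CMSG_FIRSTHDR(&msg);
	if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}
```

The receiver ends up with its own descriptor referring to the same open file, so any ioctl it issues on that descriptor runs with the receiver as current.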
On Tue, Aug 26, 2025 at 2:24 PM Jason Gunthorpe <jgg@nvidia.com> wrote: > > On Tue, Aug 26, 2025 at 01:54:31PM +0000, Pasha Tatashin wrote: > > > > https://github.com/googleprodkernel/linux-liveupdate/tree/luo/v3 > > > > > > > > Changelog from v2: > > > > - Addressed comments from Mike Rapoport and Jason Gunthorpe > > > > - Only one user agent (LiveupdateD) can open /dev/liveupdate > > > > - With the above changes, sessions are not needed, and should be > > > > maintained by the user-agent itself, so removed support for > > > > sessions. > > > > > > If all the FDs are restored in the agent's context, this assigns all the > > > resources to the agent. For example, if the agent restores a memfd, all > > > the memory gets charged to the agent's cgroup, and the client gets none > > > of it. This makes it impossible to do any kind of resource limits. > > > > > > This was one of the advantages of being able to pass around sessions > > > instead of FDs. The agent can pass on the right session to the right > > > client, and then the client does the restore, getting all the resources > > > charged to it. > > > > > > If we don't allow this, I think we will make LUO/LiveupdateD unsuitable > > > for many kinds of workloads. Do you have any ideas on how to do proper > > > resource attribution with the current patches? If not, then perhaps we > > > should reconsider this change? > > > > Hi Pratyush, > > > > That's an excellent point, and you're right that we must have a > > solution for correct resource charging. > > > > I'd prefer to keep the session logic in the userspace agent (luod > > https://tinyurl.com/luoddesign). > > > > For the charging problem, I believe there's a clear path forward with > > the current ioctl-based API. The design of the ioctl commands (with a > > size field in each struct) is intentionally extensible. In a follow-up > > patch, we can extend the liveupdate_ioctl_fd_restore struct to include > > a target pid field. 
> > The luod agent, would then be able to restore an
> > FD on behalf of a client and instruct the kernel to charge the
> > associated resources to that client's PID.
>
> This wasn't quite the idea though..
>
> The sessions sub FD were intended to be passed directly to other
> processes through unix sockets and fd passing so they could run their
> own ioctls in their own context for both save and restore. The ioctls
> available on the sessions should be specifically narrowed to be safe
> for this.
>
> I can understand not implementing session FDs in the first version,
> but when sessions FD are available they should work like this and
> solve the namespace/cgroup/etc issues.
>
> Passing some PID in an ioctl is not a great idea...

Hi Jason,

I'm trying to understand the drawbacks of the PID-based approach.
Could you elaborate on why passing a PID in the RESTORE_FD ioctl is
not a good idea?

From my perspective, luod would have a live, open socket to the client
process requesting the restore. It can use SO_PEERCRED to securely
identify the client's PID at that moment. The flow would be:

1. Client connects and resumes its session with luod.
2. Client requests to restore TOKEN_X.
3. luod verifies the client owns TOKEN_X for its session.
4. luod calls the RESTORE_FD ioctl, telling the kernel: "Please
   restore TOKEN_X and charge the resources to PID Y (which I just
   verified is on the other end of this socket)."
5. The kernel performs the action.
6. luod receives the new FD from the kernel and passes it back to the
   client over the socket.

In this flow, the client isn't providing an arbitrary PID; the trusted
luod agent is providing the PID of a process it has an active
connection with.

The idea was to let luod handle the session/security story, and the
kernel handle the core preservation mechanism. Adding sessions to the
kernel delegates the management and part of the security model to
the kernel. I am not sure that is necessary; what can be cleanly
managed in userspace should stay in userspace.

Thanks,
Pasha

> Jason
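
A minimal sketch of the SO_PEERCRED lookup named in the flow above, assuming a connected AF_UNIX socket on Linux. `peer_pid()` is a hypothetical helper name. The credentials the kernel returns are those recorded when the socket was connected (or when socketpair(2) was called), so the value is a snapshot and not a live handle on the process.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* struct ucred is a GNU/Linux extension */
#endif
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Return the PID recorded for the peer of a connected AF_UNIX socket,
 * or -1 if the credentials cannot be read.
 */
static pid_t peer_pid(int sock)
{
	struct ucred cred;
	socklen_t len = sizeof(cred);

	assert(sock >= 0);
	if (getsockopt(sock, SOL_SOCKET, SO_PEERCRED, &cred, &len) != 0)
		return -1;
	if (len != sizeof(cred))
		return -1;
	return cred.pid;
}
```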
On Tue, Aug 26, 2025 at 03:02:13PM +0000, Pasha Tatashin wrote:

> I'm trying to understand the drawbacks of the PID-based approach.
> Could you elaborate on why passing a PID in the RESTORE_FD ioctl is
> not a good idea?

It will be a major invasive change all over the place in the kernel
to change things that assume current to do something else. We should
try to avoid this.

> In this flow, the client isn't providing an arbitrary PID; the trusted
> luod agent is providing the PID of a process it has an active
> connection with.

PIDs are a wobbly thing, you can never really trust them unless they are
in a pidfd.

> The idea was to let luod handle the session/security story, and the
> kernel handle the core preservation mechanism. Adding sessions to the
> kernel, delegates the management and part of the security model into
> the kernel. I am not sure if it is necessary, what can be cleanly
> managed in userspace should stay in userspace.

session fds were an update imagined to allow the kernel to partition
things so the session FD itself could be shared with other processes.

I think in the calls the idea was it was reasonable to start without
session fds at all, but in this case we shouldn't be mucking with
pids or current.

Since it seems that is important it should be addressed by issuing the
restore ioctl inside the correct process context, that is a much
easier thing to delegate to the kernel than trying to deal with
spoofing current/etc.

Jason
On Tue, Aug 26, 2025 at 3:13 PM Jason Gunthorpe <jgg@nvidia.com> wrote:
>
> On Tue, Aug 26, 2025 at 03:02:13PM +0000, Pasha Tatashin wrote:
> > I'm trying to understand the drawbacks of the PID-based approach.
> > Could you elaborate on why passing a PID in the RESTORE_FD ioctl is
> > not a good idea?
>
> It will be a major invasive change all over the place in the kernel
> to change things that assume current to do something else. We should
> try to avoid this.
>
> > In this flow, the client isn't providing an arbitrary PID; the trusted
> > luod agent is providing the PID of a process it has an active
> > connection with.
>
> PIDs are wobbly thing, you can never really trust them unless they are
> in a pidfd.

Makes sense, using a PID by value is fragile due to reuse. Luod would
acquire a pidfd for the client process from its socket connection and
pass that pidfd to the kernel in the RESTORE_FD ioctl. The kernel
would then be operating on a stable, secure handle to the target
process.

> > The idea was to let luod handle the session/security story, and the
> > kernel handle the core preservation mechanism. Adding sessions to the
> > kernel, delegates the management and part of the security model into
> > the kernel. I am not sure if it is necessary, what can be cleanly
> > managed in userspace should stay in userspace.
>
> session fds were an update imagined to allow the kernel to partition
> things the session FD it self could be shared with other processes.

I understand the model you're proposing: luod acts as a factory,
issuing session FDs that are then passed to clients, allowing them to
perform restore operations within their own context.

While we can certainly extend the design to support that, I am still
trying to determine if it's strictly necessary, especially if the same
outcome (correct resource attribution) can be achieved with less
kernel complexity. My primary concern is that functionality that can
be cleanly managed in userspace should remain there.

> I think in the calls the idea was it was reasonable to start without
> sessions fds at all, but in this case we shouldn't be mucking with
> pids or current.

The existing interface, with the addition of passing a pidfd, provides
the necessary flexibility without being invasive. The change would be
localized to the new code that performs the FD retrieval and wouldn't
involve spoofing current or making widespread changes.

For example, to handle cgroup charging for a memfd, the flow inside
memfd_luo_retrieve() would look something like this:

    task = get_pid_task(target_pid, PIDTYPE_PID);
    mm = get_task_mm(task);
    // ...
    folio = kho_restore_folio(phys);
    // Charge to the target mm, not 'current->mm'
    mem_cgroup_charge(folio, mm, ...);
    mmput(mm);
    put_task_struct(task);

This approach seems quite contained, and does not modify the existing
interfaces. It avoids the need for the kernel to manage the entire
session state and its associated security model.

Pasha
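
A minimal userspace sketch of what holding a pidfd looks like, assuming a kernel and headers that provide the pidfd_open and pidfd_send_signal system calls. The `demo_` helpers are hypothetical names. The sketch opens the pidfd from a PID value, which still leaves a window for PID reuse between learning the PID and opening the pidfd; newer kernels also offer a SO_PEERPIDFD socket option, which is not used here so the sketch builds against older headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for syscall() */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper: older C libraries do not provide pidfd_open(). */
static int demo_pidfd_open(pid_t pid)
{
	assert(pid > 0);
	return (int)syscall(SYS_pidfd_open, pid, 0);
}

/*
 * Probe the process behind a pidfd with signal 0, which performs the
 * existence and permission checks without delivering anything.
 * Returns 1 if the probe succeeds, 0 otherwise.
 */
static int demo_pidfd_alive(int pidfd)
{
	return syscall(SYS_pidfd_send_signal, pidfd, 0, NULL, 0) == 0;
}
```

Unlike a numeric PID, the descriptor keeps referring to the same process for as long as it is held open, even after the numeric PID has been recycled.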
On Tue, Aug 26, 2025 at 04:10:31PM +0000, Pasha Tatashin wrote:
>
> > I think in the calls the idea was it was reasonable to start without
> > sessions fds at all, but in this case we shouldn't be mucking with
> > pids or current.
>
> The existing interface, with the addition of passing a pidfd, provides
> the necessary flexibility without being invasive. The change would be
> localized to the new code that performs the FD retrieval and wouldn't
> involve spoofing current or making widespread changes.
> For example, to handle cgroup charging for a memfd, the flow inside
> memfd_luo_retrieve() would look something like this:
>
> task = get_pid_task(target_pid, PIDTYPE_PID);
> mm = get_task_mm(task);
> // ...
> folio = kho_restore_folio(phys);
> // Charge to the target mm, not 'current->mm'
> mem_cgroup_charge(folio, mm, ...);
> mmput(mm);
> put_task_struct(task);

Except it doesn't work like that in all places, iommufd for example
uses GFP_KERNEL_ACCOUNT which relies on current.

How you fix that when current is the wrong cgroup, I have no idea if
it is even possible.

Jason
> > The existing interface, with the addition of passing a pidfd, provides
> > the necessary flexibility without being invasive. The change would be
> > localized to the new code that performs the FD retrieval and wouldn't
> > involve spoofing current or making widespread changes.
> > For example, to handle cgroup charging for a memfd, the flow inside
> > memfd_luo_retrieve() would look something like this:
> >
> > task = get_pid_task(target_pid, PIDTYPE_PID);
> > mm = get_task_mm(task);
> > // ...
> > folio = kho_restore_folio(phys);
> > // Charge to the target mm, not 'current->mm'
> > mem_cgroup_charge(folio, mm, ...);
> > mmput(mm);
> > put_task_struct(task);
>
> Except it doesn't work like that in all places, iommufd for example
> uses GFP_KERNEL_ACCOUNT which relies on current.

That's a good point. For kernel allocations, I don't see a clean way
to account for a different process.

We should not be doing major allocations during the retrieval process
itself. Ideally, the kernel would restore an FD using only the
preserved folio data (that we can cleanly charge), and then let the
user process perform any subsequent actions that might cause new
kernel memory allocations. However, I can see how that might not be
practical for all handlers.

Perhaps we should add session extensions to the kernel as a follow-up
after this series lands. We would also need to rewrite the luod design
accordingly to move some of the session logic into the kernel.

Thank you,
Pasha
On Tue, Aug 26 2025, Pasha Tatashin wrote:

>> > The existing interface, with the addition of passing a pidfd, provides
>> > the necessary flexibility without being invasive. The change would be
>> > localized to the new code that performs the FD retrieval and wouldn't
>> > involve spoofing current or making widespread changes.
>> > For example, to handle cgroup charging for a memfd, the flow inside
>> > memfd_luo_retrieve() would look something like this:
>> >
>> > task = get_pid_task(target_pid, PIDTYPE_PID);
>> > mm = get_task_mm(task);
>> > // ...
>> > folio = kho_restore_folio(phys);
>> > // Charge to the target mm, not 'current->mm'
>> > mem_cgroup_charge(folio, mm, ...);
>> > mmput(mm);
>> > put_task_struct(task);
>> >
>> > This approach seems quite contained, and does not modify the existing
>> > interfaces. It avoids the need for the kernel to manage the entire
>> > session state and its associated security model.

Even with sessions, I don't think the kernel has to deal with the
security model. /dev/liveupdate can still be single-open only, with only
luod getting access to it. Then the kernel just hands over sessions to
luod (maybe with a new ioctl LIVEUPDATE_IOCTL_CREATE_SESSION), and luod
takes care of the security model and lifecycle. If luod crashes and
loses its handle to /dev/liveupdate, all the sessions associated with it
go away too.

Essentially, the sessions from kernel perspective would just be a
container to group different resources together. I think this adds a
small bit of complexity on the session management and serialization
side, but I think it will save complexity on participating subsystems.

>> Except it doesn't work like that in all places, iommufd for example
>> uses GFP_KERNEL_ACCOUNT which relies on current.
>
> That's a good point. For kernel allocations, I don't see a clean way
> to account for a different process.
>
> We should not be doing major allocations during the retrieval process
> itself. Ideally, the kernel would restore an FD using only the
> preserved folio data (that we can cleanly charge), and then let the
> user process perform any subsequent actions that might cause new
> kernel memory allocations. However, I can see how that might not be
> practical for all handlers.
>
> Perhaps, we should add session extensions to the kernel as follow-up
> after this series lands, we would also need to rewrite luod design
> accordingly to move some of the sessions logic into the kernel.

I know the KHO is supposed to not be backwards compatible yet. What is
the goal for the LUO APIs? Are they also not backwards compatible? If
not, I think we should also consider how sessions will play into
backwards compatibility. For example, once we add sessions, what happens
to the older versions of luod that directly call preserve or unpreserve?

-- 
Regards,
Pratyush Yadav
On Tue, Aug 26, 2025 at 05:03:59PM +0000, Pasha Tatashin wrote:

> Perhaps, we should add session extensions to the kernel as follow-up
> after this series lands, we would also need to rewrite luod design
> accordingly to move some of the sessions logic into the kernel.

This is what I imagined at least..

I wouldn't even try to do anything with pid if it can't solve the
whole problem.

Jason