Live update is a mechanism to support updating a hypervisor in a way
that has limited impact on running virtual machines. This is done by
pausing/serialising running VMs, kexec-ing into a new kernel, starting
new VMM processes and then deserialising/resuming the VMs so that they
continue running from where they were. When the VMs have DMA devices
assigned to them, the IOMMU state and page tables need to be persisted
so that DMA transactions can continue across kexec.

Currently there is no mechanism in Linux to keep DMA running across
kexec and to pick up handles to the original DMA mappings after kexec.
We are looking for a path to support this capability, which is
necessary for live update. This RFC patch series sketches out a
potential solution, and we are looking for feedback on the approach
and the userspace interfaces.

This RFC is intended to serve as the basis for discussion at a Linux
Plumbers Conf 2024 session on iommu persistence:
https://lpc.events/event/18/contributions/1686/
... and at a BoF session on memory persistence in general:
https://lpc.events/event/18/contributions/1970/
Please join those to further this discussion.

The concept is as follows:
IOMMUFDs are marked as persistent via a new option. When a struct
iommu_domain is allocated by an iommufd, that iommu_domain also gets
marked as persistent. Before kexec, iommufd serialises the metadata for
all persistent domains to the KHO device tree blob. Similarly, the IOMMU
platform driver marks all of the page table pages as persistent. After
kexec, the persistent IOMMUFDs are exposed to userspace in sysfs, and
fresh IOMMUFD objects are built up from the data that was passed across
in KHO. Userspace can open these sysfs files to get a handle on the
IOMMUFD again. iommufd ensures that only persistent memory can be mapped
into persistent address spaces.
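
The persistence rules described above can be modelled in a few lines of
C. This is a hypothetical, userspace-only sketch: the model_* types and
functions are invented for illustration and are not the structures or
functions in this series, and -EINVAL is an assumed error code. It shows
the persistent flag propagating from an iommufd to the domains it
allocates, and a map operation refusing non-persistent memory in a
persistent address space.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical, heavily simplified stand-ins for the kernel objects.
 * Only the "persistent" property is modelled.
 */
struct model_iommufd {
	bool persistent;	/* set by userspace via the new option */
};

struct model_domain {
	bool persistent;	/* inherited from the owning iommufd */
};

struct model_file {
	bool persistent;	/* e.g. a guestmemfs file */
};

/* A domain allocated by a persistent iommufd is marked persistent too. */
static struct model_domain model_domain_alloc(const struct model_iommufd *ictx)
{
	struct model_domain dom = { .persistent = ictx->persistent };

	return dom;
}

/*
 * Refuse to map non-persistent memory into a persistent address space:
 * that memory would not survive kexec while the IOMMU still points at it.
 * -EINVAL is an assumed error code for this sketch.
 */
static int model_map(const struct model_domain *dom,
		     const struct model_file *file)
{
	assert(dom && file);
	if (dom->persistent && !file->persistent)
		return -EINVAL;
	return 0;
}
```

With a persistent iommufd, model_domain_alloc() returns a persistent
domain; model_map() then succeeds for a guestmemfs-backed file and fails
for anonymous memory. Checking at map time catches a violation when the
mapping is created rather than after kexec.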

This depends on KHO as the foundational framework for passing data
across kexec and for marking pages as persistent:
https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com/
It also depends on guestmemfs as the provider of persistent guest RAM to
be mapped into the IOMMU:
https://lore.kernel.org/all/20240805093245.889357-1-jgowans@amazon.com/

The code is not a complete solution; it is more of a sketch to show the
moving parts and proposed interfaces, so that we can gather feedback and
have the discussion. Only a small portion of the IOMMUFD object and the
dmar_domains is serialised; the full set of data that must be serialised
to produce a fully working deserialised object still needs to be
identified.
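
One way to keep serialisation extensible while the set of fields is
still being worked out is to version each serialised record. This is a
hypothetical sketch: a flat byte buffer stands in for the KHO device
tree blob, and the model_domain_rec fields are illustrative guesses at
per-domain state, not the layout used in this series. Copying the raw
struct assumes both kernels agree on its layout, which is what the
version check guards against.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_REC_VERSION 1u

/* Hypothetical subset of per-domain state that must survive kexec. */
struct model_domain_rec {
	uint32_t version;	/* bumped whenever the layout changes */
	uint32_t iommufd_id;	/* persistent iommufd owning this domain */
	uint64_t pgd_phys;	/* physical address of the top-level page table */
	uint32_t domain_id;	/* hardware domain ID */
	uint32_t reserved;	/* explicit padding so no bytes are undefined */
};

/* Copy rec into buf; returns the number of bytes written or -ENOSPC. */
static int model_serialise(const struct model_domain_rec *rec,
			   uint8_t *buf, size_t len)
{
	assert(rec && buf);
	if (len < sizeof(*rec))
		return -ENOSPC;
	memcpy(buf, rec, sizeof(*rec));
	return (int)sizeof(*rec);
}

/* Rebuild rec from buf; rejects truncated input and unknown versions. */
static int model_deserialise(struct model_domain_rec *rec,
			     const uint8_t *buf, size_t len)
{
	struct model_domain_rec tmp;

	assert(rec && buf);
	if (len < sizeof(tmp))
		return -EINVAL;
	memcpy(&tmp, buf, sizeof(tmp));
	if (tmp.version != MODEL_REC_VERSION)
		return -EINVAL;
	*rec = tmp;
	return 0;
}
```

A round trip (serialise before kexec, deserialise after) should
reproduce every field. As more of the IOMMUFD and dmar_domain state is
covered, growing the record would mean bumping MODEL_REC_VERSION, so a
mismatched kernel refuses the data instead of misinterpreting it.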

Sign-offs are omitted to make it clear that this is not for merging yet.

Adding maintainers of the IOMMU drivers, iommufd, kexec and KVM (as this
is designed specifically for live update of a hypervisor), as well as
others who have previously engaged with the topic of memory persistence.

James Gowans (13):
  iommufd: Support marking and tracking persistent iommufds
  iommufd: Add plumbing for KHO (de)serialise
  iommu/intel: zap context table entries on kexec
  iommu: Support marking domains as persistent on alloc
  iommufd: Serialise persisted iommufds and ioas
  iommufd: Expose persistent iommufd IDs in sysfs
  iommufd: Re-hydrate a usable iommufd ctx from sysfs
  intel-iommu: Add serialise and deserialise boilerplate
  intel-iommu: Serialise dmar_domain on KHO activate
  intel-iommu: Re-hydrate persistent domains after kexec
  iommu: Add callback to restore persisted iommu_domain
  iommufd, guestmemfs: Ensure persistent file used for persistent DMA
  iommufd, guestmemfs: Pin files when mapped for persistent DMA

 drivers/iommu/amd/iommu.c                   |   4 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   3 +-
 drivers/iommu/intel/Makefile                |   1 +
 drivers/iommu/intel/dmar.c                  |   1 +
 drivers/iommu/intel/iommu.c                 |  84 ++++++--
 drivers/iommu/intel/iommu.h                 |  31 +++
 drivers/iommu/intel/serialise.c             | 174 ++++++++++++++++
 drivers/iommu/iommufd/Makefile              |   1 +
 drivers/iommu/iommufd/hw_pagetable.c        |   5 +-
 drivers/iommu/iommufd/io_pagetable.c        |   2 +-
 drivers/iommu/iommufd/ioas.c                |  26 +++
 drivers/iommu/iommufd/iommufd_private.h     |  34 ++++
 drivers/iommu/iommufd/main.c                |  75 ++++++-
 drivers/iommu/iommufd/selftest.c            |   1 +
 drivers/iommu/iommufd/serialise.c           | 213 ++++++++++++++++++++
 fs/guestmemfs/file.c                        |  25 +++
 fs/guestmemfs/guestmemfs.h                  |   1 +
 fs/guestmemfs/inode.c                       |   4 +
 include/linux/guestmemfs.h                  |  15 ++
 include/linux/iommu.h                       |  16 +-
 include/uapi/linux/iommufd.h                |   5 +
 21 files changed, 698 insertions(+), 23 deletions(-)
 create mode 100644 drivers/iommu/intel/serialise.c
 create mode 100644 drivers/iommu/iommufd/serialise.c