[RFC PATCH 00/13] Support iommu(fd) persistence for live update

Posted by James Gowans 2 months, 2 weeks ago
Live update is a mechanism to support updating a hypervisor in a way
that has limited impact on running virtual machines. This is done by
pausing/serialising running VMs, kexec-ing into a new kernel, starting
new VMM processes and then deserialising/resuming the VMs so that they
continue running from where they were. When the VMs have DMA devices
assigned to them, the IOMMU state and page tables need to be persisted
so that DMA transactions can continue across kexec.

Currently Linux has no mechanism to keep DMA running across kexec and
to pick up handles to the original DMA mappings after kexec. We are
looking for a path to support this capability, which is necessary for
live update.  This RFC patch series sketches out a potential solution,
and we are looking for feedback on the approach and userspace
interfaces.

This RFC is intended to serve as the discussion ground for a Linux
Plumbers Conf 2024 session on iommu persistence:
https://lpc.events/event/18/contributions/1686/
... and a BoF session on memory persistence in general:
https://lpc.events/event/18/contributions/1970/
Please join those to further this discussion.

The concept is as follows:
IOMMUFDs are marked as persistent via a new option.  When a struct
iommu_domain is allocated by an iommufd, that iommu_domain also gets
marked as persistent.  Before kexec, iommufd serialises the metadata for
all persistent domains to the KHO device tree blob. Similarly, the IOMMU
platform driver marks all of the page table pages as persistent.  After
kexec, the persistent IOMMUFDs are exposed to userspace in sysfs.  Fresh
IOMMUFD objects are built up from the data which was passed across in
KHO. Userspace can open these sysfs files to get a handle on the IOMMUFD
again. Iommufd ensures that only persistent memory can be mapped into
persistent address spaces.
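
As a rough illustration of the flow above, here is a minimal userspace
model in C. It is not kernel code and is not taken from the patches:
every sketch_* name, the flat record array standing in for the KHO
device tree blob, and the use of 0 to mean "not persistent" are
assumptions made purely for illustration. It only shows the intended
ordering: mark the iommufd, let domains inherit the flag at allocation,
serialise the persistent domains, rebuild a fresh object from the
records, and refuse non-persistent memory in a persistent domain.

```c
/* Userspace model only; no name below exists in the kernel or the patches. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_DOMAINS 8

struct sketch_domain {
	int id;
	bool persistent;	/* copied from the owning iommufd at allocation */
};

struct sketch_iommufd {
	int persistent_id;	/* 0 means "not persistent" in this model */
	struct sketch_domain domains[SKETCH_MAX_DOMAINS];
	size_t nr_domains;
};

/* Stand-in for the metadata that would be handed across kexec. */
struct sketch_record {
	int persistent_id;
	int domain_id;
};

/* Mark an iommufd as persistent under a caller-chosen, non-zero ID. */
static void sketch_set_persistent(struct sketch_iommufd *fd, int persistent_id)
{
	assert(persistent_id != 0);
	fd->persistent_id = persistent_id;
}

/* Allocate a domain; it inherits persistence from its iommufd. */
static struct sketch_domain *sketch_alloc_domain(struct sketch_iommufd *fd,
						 int id)
{
	struct sketch_domain *d;

	if (fd->nr_domains == SKETCH_MAX_DOMAINS)
		return NULL;
	d = &fd->domains[fd->nr_domains++];
	d->id = id;
	d->persistent = fd->persistent_id != 0;
	return d;
}

/* Before kexec: emit one record per persistent domain, up to cap. */
static size_t sketch_serialise(const struct sketch_iommufd *fd,
			       struct sketch_record *out, size_t cap)
{
	size_t n = 0;

	for (size_t i = 0; i < fd->nr_domains && n < cap; i++) {
		if (!fd->domains[i].persistent)
			continue;
		out[n].persistent_id = fd->persistent_id;
		out[n].domain_id = fd->domains[i].id;
		n++;
	}
	return n;
}

/* After kexec: build a fresh object from the records matching one ID. */
static void sketch_restore(struct sketch_iommufd *fd, int persistent_id,
			   const struct sketch_record *in, size_t n)
{
	sketch_set_persistent(fd, persistent_id);
	fd->nr_domains = 0;
	for (size_t i = 0; i < n; i++)
		if (in[i].persistent_id == persistent_id)
			sketch_alloc_domain(fd, in[i].domain_id);
}

/* Only persistent memory may be mapped into a persistent domain. */
static bool sketch_may_map(const struct sketch_domain *d, bool mem_persistent)
{
	return !d->persistent || mem_persistent;
}
```

In this model a non-persistent iommufd serialises to zero records, so
nothing about it survives the simulated kexec. The real series would
need to carry far more state per domain than a single ID.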

This depends on KHO as the foundational framework for passing data
across kexec and for marking pages as persistent:
https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com/

It also depends on guestmemfs as the provider of persistent guest RAM to
be mapped into the IOMMU:
https://lore.kernel.org/all/20240805093245.889357-1-jgowans@amazon.com/

The code is not a complete solution; it is more of a sketch to show the
moving parts and proposed interfaces to gather feedback and have the
discussion. Only a small portion of the IOMMUFD object and dmar_domains
are serialised; it is necessary to figure out all the data which needs
to be serialised to have a fully working deserialised object.

Sign-offs are omitted to make it clear that this is not for merging yet.

Adding maintainers from IOMMU drivers, iommufd, kexec and KVM (seeing as
this is designed for live update of a hypervisor specifically), and
others who have engaged with the topic of memory persistence previously.

James Gowans (13):
  iommufd: Support marking and tracking persistent iommufds
  iommufd: Add plumbing for KHO (de)serialise
  iommu/intel: zap context table entries on kexec
  iommu: Support marking domains as persistent on alloc
  iommufd: Serialise persisted iommufds and ioas
  iommufd: Expose persistent iommufd IDs in sysfs
  iommufd: Re-hydrate a usable iommufd ctx from sysfs
  intel-iommu: Add serialise and deserialise boilerplate
  intel-iommu: Serialise dmar_domain on KHO activate
  intel-iommu: Re-hydrate persistent domains after kexec
  iommu: Add callback to restore persisted iommu_domain
  iommufd, guestmemfs: Ensure persistent file used for persistent DMA
  iommufd, guestmemfs: Pin files when mapped for persistent DMA

 drivers/iommu/amd/iommu.c                   |   4 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c |   3 +-
 drivers/iommu/intel/Makefile                |   1 +
 drivers/iommu/intel/dmar.c                  |   1 +
 drivers/iommu/intel/iommu.c                 |  84 ++++++--
 drivers/iommu/intel/iommu.h                 |  31 +++
 drivers/iommu/intel/serialise.c             | 174 ++++++++++++++++
 drivers/iommu/iommufd/Makefile              |   1 +
 drivers/iommu/iommufd/hw_pagetable.c        |   5 +-
 drivers/iommu/iommufd/io_pagetable.c        |   2 +-
 drivers/iommu/iommufd/ioas.c                |  26 +++
 drivers/iommu/iommufd/iommufd_private.h     |  34 ++++
 drivers/iommu/iommufd/main.c                |  75 ++++++-
 drivers/iommu/iommufd/selftest.c            |   1 +
 drivers/iommu/iommufd/serialise.c           | 213 ++++++++++++++++++++
 fs/guestmemfs/file.c                        |  25 +++
 fs/guestmemfs/guestmemfs.h                  |   1 +
 fs/guestmemfs/inode.c                       |   4 +
 include/linux/guestmemfs.h                  |  15 ++
 include/linux/iommu.h                       |  16 +-
 include/uapi/linux/iommufd.h                |   5 +
 21 files changed, 698 insertions(+), 23 deletions(-)
 create mode 100644 drivers/iommu/intel/serialise.c
 create mode 100644 drivers/iommu/iommufd/serialise.c

-- 
2.34.1