RE: [RFC PATCH v4 00/24] vfio: Adopt iommufd

Duan, Zhenzhong posted 24 patches 9 months ago
There is a newer version of this series
RE: [RFC PATCH v4 00/24] vfio: Adopt iommufd
Posted by Duan, Zhenzhong 9 months ago
Ping, any comments or suggestions are appreciated.

Thanks
Zhenzhong

>-----Original Message-----
>From: Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Sent: Wednesday, July 12, 2023 3:25 PM
>To: qemu-devel@nongnu.org
>Cc: alex.williamson@redhat.com; clg@redhat.com; jgg@nvidia.com;
>nicolinc@nvidia.com; eric.auger@redhat.com; peterx@redhat.com;
>jasonwang@redhat.com; Tian, Kevin <kevin.tian@intel.com>; Liu, Yi L
><yi.l.liu@intel.com>; Sun, Yi Y <yi.y.sun@intel.com>; Peng, Chao P
><chao.p.peng@intel.com>; Duan, Zhenzhong <zhenzhong.duan@intel.com>
>Subject: [RFC PATCH v4 00/24] vfio: Adopt iommufd
>
>With the introduction of iommufd, the Linux kernel provides a generic
>interface for userspace drivers to propagate their DMA mappings to the
>kernel for assigned devices. This series ports the VFIO devices onto
>the /dev/iommu uapi and lets it coexist with the legacy implementation.
>
>This QEMU integration is the result of collaborative work between
>Yi Liu, Yi Sun, Nicolin Chen and Eric Auger.
>
>At the QEMU level, interactions with /dev/iommu are abstracted by a new
>iommufd object (compiled in with the CONFIG_IOMMUFD option).
>
>Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be
>linked with an iommufd object. In this series, only the vfio-pci device
>gains this capability (other VFIO devices are not yet ready).
>
>It gets a new optional parameter named iommufd, which allows passing
>an iommufd object:
>
>    -object iommufd,id=iommufd0
>    -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0
>
>Note that /dev/iommu and the vfio cdev can be opened externally by a
>management layer. In that case the fds are passed:
>
>    -object iommufd,id=iommufd0,fd=22
>    -device vfio-pci,iommufd=iommufd0,fd=23
>
>If the fd parameter is not passed, the fd is opened by QEMU.
>See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html
>for a detailed discussion of this requirement.
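
A minimal sketch of the management-layer side of fd passing, assuming the layer has already opened /dev/iommu and the vfio cdev with open(2) (without O_CLOEXEC, so the descriptors survive exec) and only needs to build the two option strings shown above. The helper name format_fd_args() and the fixed id "iommufd0" are assumptions for illustration; they are not part of QEMU or of any management tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format the "-object" and "-device" option values
 * for descriptors that were opened by the management layer.
 * Returns 0 on success, -1 if either output buffer is too small.
 */
static int format_fd_args(int iommu_fd, int cdev_fd,
                          char *obj, size_t obj_len,
                          char *dev, size_t dev_len)
{
    int n;

    /* Both descriptors must already be valid in the calling process. */
    assert(iommu_fd >= 0 && cdev_fd >= 0);

    n = snprintf(obj, obj_len, "iommufd,id=iommufd0,fd=%d", iommu_fd);
    if (n < 0 || (size_t)n >= obj_len) {
        return -1;
    }

    n = snprintf(dev, dev_len, "vfio-pci,iommufd=iommufd0,fd=%d", cdev_fd);
    if (n < 0 || (size_t)n >= dev_len) {
        return -1;
    }
    return 0;
}
```

The two strings would then follow "-object" and "-device" in the argument vector handed to exec(); the descriptor numbers are only meaningful in the QEMU process if they are inherited unchanged.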
>
>If no iommufd option is passed to the vfio-pci device, iommufd is not
>used and the end-user gets the behavior based on the legacy vfio iommu
>interfaces:
>
>    -device vfio-pci,host=0000:02:00.0
>
>While the legacy kernel interface is group-centric, the new iommufd
>interface is device-centric, relying on device fd and iommufd.
>
>To support both interfaces in the QEMU VFIO device we reworked the vfio
>container abstraction so that the generic VFIO code can use either
>backend.
>
>The VFIOContainer object becomes a base object derived into
>a) the legacy VFIO container and
>b) the new iommufd based container.
>
>The base object implements generic code, such as the memory_listener
>and address space management, whereas the derived objects implement
>callbacks specific to each backend (BE), legacy or iommufd. Indeed,
>each backend has its own way to set up a secure context and its own
>DMA management interface. The diagram below shows how it looks
>with both BEs.
>
>                    VFIO                           AddressSpace/Memory
>    +-------+  +----------+  +-----+  +-----+
>    |  pci  |  | platform |  |  ap |  | ccw |
>    +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
>        |           |           |        |        |   AddressSpace       |
>        |           |           |        |        +------------+---------+
>    +---V-----------V-----------V--------V----+               /
>    |           VFIOAddressSpace              | <------------+
>    |                  |                      |  MemoryListener
>    |          VFIOContainer list             |
>    +-------+----------------------------+----+
>            |                            |
>            |                            |
>    +-------V------+            +--------V----------+
>    |   iommufd    |            |    vfio legacy    |
>    |  container   |            |     container     |
>    +-------+------+            +--------+----------+
>            |                            |
>            | /dev/iommu                 | /dev/vfio/vfio
>            | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
>Userspace   |                            |
>============+============================+===========================
>Kernel      |  device fd                 |
>            +---------------+            | group/container fd
>            | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
>            |  ATTACH_IOAS) |            | device fd
>            |               |            |
>            |       +-------V------------V-----------------+
>    iommufd |       |                vfio                  |
>(map/unmap  |       +---------+--------------------+-------+
>ioas_copy)  |                 |                    | map/unmap
>            |                 |                    |
>     +------V------+    +-----V------+      +------V--------+
>     |iommufd core |    |  device    |      |  vfio iommu   |
>     +-------------+    +------------+      +---------------+
>
>[Secure Context setup]
>- iommufd BE: uses device fd and iommufd to set up secure context
>              (bind_iommufd, attach_ioas)
>- vfio legacy BE: uses group fd and container fd to set up secure context
>                  (set_container, set_iommu)
>[Device access]
>- iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
>- vfio legacy BE: device fd is retrieved from group fd ioctl
>[DMA Mapping flow]
>1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
>2. VFIO populates DMA map/unmap via the container BEs
>   *) iommufd BE: uses iommufd
>   *) vfio legacy BE: uses container fd
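
The base/derived container split and the DMA mapping flow above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the series: every type and function name here (Container, ContainerOps, AddressSpaceSketch, region_add, and so on) is hypothetical, and the backend callbacks only count mappings where real code would issue map/unmap ioctls on the container fd or the iommufd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct Container Container;

/* Callbacks each backend provides (hypothetical, trimmed to DMA only). */
typedef struct ContainerOps {
    int (*dma_map)(Container *c, uint64_t iova, uint64_t size, void *vaddr);
    int (*dma_unmap)(Container *c, uint64_t iova, uint64_t size);
} ContainerOps;

/* Base object: state shared by both backends. */
struct Container {
    const ContainerOps *ops;
    Container *next;          /* link in the address space's container list */
};

typedef struct LegacyContainer {
    Container base;           /* must stay first so the downcast is valid */
    int container_fd;         /* would be the /dev/vfio/vfio fd */
    unsigned nr_maps;
} LegacyContainer;

typedef struct IommufdContainer {
    Container base;           /* must stay first so the downcast is valid */
    int iommufd;              /* would be the /dev/iommu fd */
    unsigned nr_maps;
} IommufdContainer;

static int legacy_dma_map(Container *c, uint64_t iova, uint64_t size, void *va)
{
    LegacyContainer *l = (LegacyContainer *)c;
    (void)iova; (void)size; (void)va;
    l->nr_maps++;             /* stand-in for a map ioctl on container_fd */
    return 0;
}

static int legacy_dma_unmap(Container *c, uint64_t iova, uint64_t size)
{
    LegacyContainer *l = (LegacyContainer *)c;
    (void)iova; (void)size;
    l->nr_maps--;             /* stand-in for an unmap ioctl */
    return 0;
}

static int iommufd_dma_map(Container *c, uint64_t iova, uint64_t size, void *va)
{
    IommufdContainer *i = (IommufdContainer *)c;
    (void)iova; (void)size; (void)va;
    i->nr_maps++;             /* stand-in for a map ioctl on iommufd */
    return 0;
}

static int iommufd_dma_unmap(Container *c, uint64_t iova, uint64_t size)
{
    IommufdContainer *i = (IommufdContainer *)c;
    (void)iova; (void)size;
    i->nr_maps--;             /* stand-in for an unmap ioctl */
    return 0;
}

static const ContainerOps legacy_ops = { legacy_dma_map, legacy_dma_unmap };
static const ContainerOps iommufd_ops = { iommufd_dma_map, iommufd_dma_unmap };

/* Address space holding the container list, as in the diagram. */
typedef struct AddressSpaceSketch {
    Container *containers;
} AddressSpaceSketch;

/* Step 1 and 2: a region is added, generic code maps it via every BE. */
static int region_add(AddressSpaceSketch *as, uint64_t iova, uint64_t size,
                      void *vaddr)
{
    assert(as != NULL);
    for (Container *c = as->containers; c; c = c->next) {
        int ret = c->ops->dma_map(c, iova, size, vaddr);
        if (ret) {
            return ret;
        }
    }
    return 0;
}

/* A region is removed, generic code unmaps it via every BE. */
static void region_del(AddressSpaceSketch *as, uint64_t iova, uint64_t size)
{
    assert(as != NULL);
    for (Container *c = as->containers; c; c = c->next) {
        c->ops->dma_unmap(c, iova, size);
    }
}
```

The point of the sketch is that region_add() and region_del() never need to know which backend sits behind a container; only the ops table differs.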
>
>This series depends on Yi's kernel series:
>"[PATCH v14 00/26] Add vfio_device cdev for iommufd support"
>https://lore.kernel.org/all/20230711025928.6438-1-yi.l.liu@intel.com/
>and
>"[PATCH v9 00/10] Enhance vfio PCI hot reset for vfio cdev device"
>https://lore.kernel.org/kvm/20230711023126.5531-1-yi.l.liu@intel.com/
>
>which can be found at:
>https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v14
>
>This qemu series can be found at:
>https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_rfcv4
>
>Tests done:
>- PCI devices were tested
>- platform, ccw and ap were only compile-tested
>- FD passing and hot reset, with some tricks
>- device hotplug test with legacy and iommufd backends (limited tests)
>- vIOMMU test run for both legacy and iommufd backends (limited tests)
>
>
>Given some iommufd kernel limitations, the iommufd backend is
>not yet fully on par with the legacy backend w.r.t. features like:
>- p2p mappings (you will see related error traces)
>- live migration
>- etc.
>
>About TODOs in rfcv3:
>- Add DMA alias check for iommufd BE (group level)
>attach_ioas will fail for an aliased device, so I think that's not a problem.
>
>- Make pci.c BE agnostic. Needs a kernel change as well to fix the
>  VFIO_DEVICE_PCI_HOT_RESET gap
>I didn't make pci.c fully group agnostic because PCI device reset is a
>device-scope operation; forcing it into a container-scope callback
>isn't a good idea. Instead I added iommufd code in pci.c and fixed the
>VFIO_DEVICE_PCI_HOT_RESET gap there.
>
>- Cleanup the VFIODevice fields as it's used in both backends
>- Replace list with g_tree
>This TODO is not viable because the iterator callback depends on the
>list element.
>
>- Add locks
>I think it's not necessary as the BQL already ensures that.
>
>base-commit: 887cba855b
>
>Change log:
>v3 -> v4:
>- rebase on top of v8.0.3
>- Add one patch from Yi which is about vfio device add in kvm
>- Remove IOAS_COPY optimization and focus on functions in this patchset
>- Fix wrong name issue reported and fix suggested by Matthew
>- Fix compilation issue reported and fix suggested by Nicolin
>- Use query_dirty_bitmap callback to replace get_dirty_bitmap for better
>granularity
>- Add dev_iter_next() callback to avoid adding so many callbacks
>  at container scope, add VFIODevice.hwpt to support that
>- Restore all functions back to common from container whenever possible,
>  mainly migration and reset related functions
>- Add --enable/disable-iommufd config option, enabled by default on Linux
>- Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next
>- Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device
>- vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove
>redundant code
>- Add FD passing support for vfio device backed by IOMMUFD
>- Fix hot unplug resource leak issue in vfio_legacy_detach_device()
>- Fix FD leak in vfio_get_devicefd()
>
>v3: https://lists.nongnu.org/archive/html/qemu-devel/2023-01/msg07189.html
>
>v2 -> v3:
>- rebase on top of v7.2.0
>- Fix the compilation with CONFIG_IOMMUFD unset by using true classes for
>  VFIO backends
>- Fix use after free in error path, reported by Alister
>- Split common.c in several steps to ease the review
>
>v1 -> v2:
>- remove the first three patches of rfcv1
>- add open cdev helper suggested by Jason
>- remove the QOMification of the VFIOContainer and simply use standard ops
>(David)
>- add "-object iommufd" suggested by Alex
>
>v1: https://lore.kernel.org/qemu-devel/20220414104710.28534-1-yi.l.liu@intel.com/
>
>Thanks,
>Yi, Yi, Eric, Zhenzhong
>
>Eric Auger (9):
>  scripts/update-linux-headers: Add iommufd.h
>  vfio/common: Introduce vfio_container_add|del_section_window()
>  vfio/container: Introduce vfio_[attach/detach]_device
>  vfio/platform: Use vfio_[attach/detach]_device
>  vfio/ap: Use vfio_[attach/detach]_device
>  vfio/ccw: Use vfio_[attach/detach]_device
>  vfio/container-base: Introduce [attach/detach]_device container
>    callbacks
>  backends/iommufd: Introduce the iommufd object
>  vfio/as: Allow the selection of a given iommu backend
>
>Yi Liu (6):
>  vfio/common: Move IOMMU agnostic helpers to a separate file
>  vfio/common: Move legacy VFIO backend code into separate container.c
>  vfio/common: Rename into as.c
>  vfio: Add base container
>  util/char_dev: Add open_cdev()
>  vfio/iommufd: Implement the iommufd backend
>
>Zhenzhong Duan (9):
>  Update linux-header per VFIO device cdev v14
>  vfio/common: Extract out vfio_kvm_device_[add/del]_fd
>  vfio/common: Add a vfio device iterator
>  vfio/common: Refactor vfio_viommu_preset() to be group agnostic
>  vfio/as: Simplify vfio_viommu_preset()
>  Add iommufd configure option
>  vfio/as: Add vfio device iterator callback for iommufd
>  vfio/pci: Adapt vfio pci hot reset support with iommufd BE
>  vfio/iommufd: Make vfio cdev pre-openable by passing a file handle
>
> MAINTAINERS                           |   13 +
> backends/Kconfig                      |    4 +
> backends/iommufd.c                    |  268 +++
> backends/meson.build                  |    3 +
> backends/trace-events                 |   12 +
> hw/vfio/ap.c                          |   66 +-
> hw/vfio/as.c                          | 1555 +++++++++++++
> hw/vfio/ccw.c                         |  122 +-
> hw/vfio/common.c                      | 3078 -------------------------
> hw/vfio/container-base.c              |  146 ++
> hw/vfio/container.c                   | 1218 ++++++++++
> hw/vfio/helpers.c                     |  598 +++++
> hw/vfio/iommufd.c                     |  546 +++++
> hw/vfio/meson.build                   |    8 +-
> hw/vfio/pci.c                         |  354 ++-
> hw/vfio/platform.c                    |   43 +-
> hw/vfio/spapr.c                       |   22 +-
> hw/vfio/trace-events                  |   16 +-
> include/hw/vfio/vfio-common.h         |  109 +-
> include/hw/vfio/vfio-container-base.h |  158 ++
> include/qemu/char_dev.h               |   16 +
> include/sysemu/iommufd.h              |   47 +
> linux-headers/linux/iommufd.h         |  347 +++
> linux-headers/linux/kvm.h             |   13 +-
> linux-headers/linux/vfio.h            |  142 +-
> meson.build                           |    6 +
> meson_options.txt                     |    2 +
> qapi/qom.json                         |   18 +-
> qemu-options.hx                       |   13 +
> scripts/meson-buildoptions.sh         |    3 +
> scripts/update-linux-headers.sh       |    3 +-
> util/chardev_open.c                   |   61 +
> util/meson.build                      |    1 +
> 33 files changed, 5601 insertions(+), 3410 deletions(-)
> create mode 100644 backends/iommufd.c
> create mode 100644 hw/vfio/as.c
> delete mode 100644 hw/vfio/common.c
> create mode 100644 hw/vfio/container-base.c
> create mode 100644 hw/vfio/container.c
> create mode 100644 hw/vfio/helpers.c
> create mode 100644 hw/vfio/iommufd.c
> create mode 100644 include/hw/vfio/vfio-container-base.h
> create mode 100644 include/qemu/char_dev.h
> create mode 100644 include/sysemu/iommufd.h
> create mode 100644 linux-headers/linux/iommufd.h
> create mode 100644 util/chardev_open.c
>
>--
>2.34.1
Re: [RFC PATCH v4 00/24] vfio: Adopt iommufd
Posted by Nicolin Chen 9 months ago
On Tue, Aug 01, 2023 at 08:28:01AM +0000, Duan, Zhenzhong wrote:
 
> Ping, any comments or suggestions are appreciated.

Zhenzhong, I'd love to, yet haven't got the chance to go through
this series. I think that most of us are quite occupied at this
moment by the kernel side of the changes.

I plan to take a close look and run some tests next week.

Thanks
Nicolin
RE: [RFC PATCH v4 00/24] vfio: Adopt iommufd
Posted by Duan, Zhenzhong 9 months ago
>-----Original Message-----
>From: Nicolin Chen <nicolinc@nvidia.com>
>Subject: Re: [RFC PATCH v4 00/24] vfio: Adopt iommufd
>
>On Tue, Aug 01, 2023 at 08:28:01AM +0000, Duan, Zhenzhong wrote:
>
>> Ping, any comments or suggestions are appreciated.
>
>Zhenzhong, I'd love to, yet haven't got the chance to go through
>this series. I think that most of us are quite occupied at this
>moment by the kernel side of the changes.
Oh, I see.

>
>I plan to take a close look and run some tests next week.
Much appreciated, thanks Nicolin.

BRs,
Zhenzhong