MAINTAINERS | 10 + qapi/qom.json | 19 + hw/vfio/pci.h | 6 + include/hw/vfio/vfio-common.h | 26 +- include/hw/vfio/vfio-container-base.h | 15 +- include/qemu/chardev_open.h | 16 + include/sysemu/iommufd.h | 44 ++ backends/iommufd.c | 228 ++++++++++ hw/vfio/ap.c | 29 +- hw/vfio/ccw.c | 31 +- hw/vfio/common.c | 24 +- hw/vfio/container-base.c | 6 +- hw/vfio/container.c | 208 ++++++++- hw/vfio/helpers.c | 44 ++ hw/vfio/iommufd.c | 630 ++++++++++++++++++++++++++ hw/vfio/pci.c | 212 ++------- hw/vfio/platform.c | 38 +- util/chardev_open.c | 81 ++++ backends/Kconfig | 4 + backends/meson.build | 1 + backends/trace-events | 10 + hw/arm/Kconfig | 1 + hw/i386/Kconfig | 1 + hw/s390x/Kconfig | 1 + hw/vfio/meson.build | 3 + hw/vfio/trace-events | 11 + qemu-options.hx | 12 + util/meson.build | 1 + 28 files changed, 1493 insertions(+), 219 deletions(-) create mode 100644 include/qemu/chardev_open.h create mode 100644 include/sysemu/iommufd.h create mode 100644 backends/iommufd.c create mode 100644 hw/vfio/iommufd.c create mode 100644 util/chardev_open.c
Hi, Thanks all for giving guides and comments on previous series, this is the remaining part of the iommufd support. Based on Cédric's suggestion, replace old config method for IOMMUFD with Kconfig. Based on Jason's suggestion, drop the implementation of manually allocating hwpt and switch to IOAS attach/detach. Beside current test, we also tested mdev with mtty for better cover range. PATCH 1: Introduce iommufd object PATCH 2-9: add IOMMUFD container and cdev support PATCH 10-17: fd passing for cdev and linking to IOMMUFD PATCH 18: make VFIOContainerBase parameter const PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86 We have done wide test with different combinations, e.g: - PCI device were tested - FD passing and hot reset with some trick. - device hotplug test with legacy and iommufd backends - with or without vIOMMU for legacy and iommufd backends - divices linked to different iommufds - VFIO migration with a E800 net card(no dirty sync support) passthrough - platform, ccw and ap were only compile-tested due to environment limit - test mdev pass through with mtty and mix with real device and different BE Given some iommufd kernel limitations, the iommufd backend is not yet fully on par with the legacy backend w.r.t. features like: - p2p mappings (you will see related error traces) - dirty page sync - and etc. qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v6 Based on vfio-next, commit id: 1a22fb936e -------------------------------------------------------------------------- Below are some background and graph about the design: With the introduction of iommufd, the Linux kernel provides a generic interface for userspace drivers to propagate their DMA mappings to kernel for assigned devices. This series does the porting of the VFIO devices onto the /dev/iommu uapi and let it coexist with the legacy implementation. At QEMU level, interactions with the /dev/iommu are abstracted by a new iommufd object (compiled in with the CONFIG_IOMMUFD option). Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be linked with an iommufd object. In this series, the vfio-pci device is granted with such capability (other VFIO devices are not yet ready): It gets a new optional parameter named iommufd which allows to pass an iommufd object: -object iommufd,id=iommufd0 -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 Note the /dev/iommu and vfio cdev can be externally opened by a management layer. In such a case the fd is passed: -object iommufd,id=iommufd0,fd=22 -device vfio-pci,iommufd=iommufd0,fd=23 If the fd parameter is not passed, the fd is opened by QEMU. See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html for detailed discuss on this requirement. If no iommufd option is passed to the vfio-pci device, iommufd is not used and the end-user gets the behavior based on the legacy vfio iommu interfaces: -device vfio-pci,host=0000:02:00.0 While the legacy kernel interface is group-centric, the new iommufd interface is device-centric, relying on device fd and iommufd. To support both interfaces in the QEMU VFIO device we reworked the vfio container abstraction so that the generic VFIO code can use either backend. The VFIOContainer object becomes a base object derived into a) the legacy VFIO container and b) the new iommufd based container. The base object implements generic code such as code related to memory_listener and address space management whereas the derived objects implement callbacks specific to either BE, legacy and iommufd. Indeed each backend has its own way to setup secure context and dma management interface. The below diagram shows how it looks like with both BEs. VFIO AddressSpace/Memory +-------+ +----------+ +-----+ +-----+ | pci | | platform | | ap | | ccw | +---+---+ +----+-----+ +--+--+ +--+--+ +----------------------+ | | | | | AddressSpace | | | | | +------------+---------+ +---V-----------V-----------V--------V----+ / | VFIOAddressSpace | <------------+ | | | MemoryListener | VFIOContainer list | +-------+----------------------------+----+ | | | | +-------V------+ +--------V----------+ | iommufd | | vfio legacy | | container | | container | +-------+------+ +--------+----------+ | | | /dev/iommu | /dev/vfio/vfio | /dev/vfio/devices/vfioX | /dev/vfio/$group_id Userspace | | ============+============================+=========================== Kernel | device fd | +---------------+ | group/container fd | (BIND_IOMMUFD | | (SET_CONTAINER/SET_IOMMU) | ATTACH_IOAS) | | device fd | | | | +-------V------------V-----------------+ iommufd | | vfio | (map/unmap | +---------+--------------------+-------+ ioas_copy) | | | map/unmap | | | +------V------+ +-----V------+ +------V--------+ | iommfd core | | device | | vfio iommu | +-------------+ +------------+ +---------------+ [Secure Context setup] - iommufd BE: uses device fd and iommufd to setup secure context (bind_iommufd, attach_ioas) - vfio legacy BE: uses group fd and container fd to setup secure context (set_container, set_iommu) [Device access] - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX - vfio legacy BE: device fd is retrieved from group fd ioctl [DMA Mapping flow] 1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener 2. VFIO populates DMA map/unmap via the container BEs *) iommufd BE: uses iommufd *) vfio legacy BE: uses container fd Changelog: v6: - simplify CONFIG_IOMMUFD checking code further (Cédric) - check iommufd_cdev_kvm_device_add return value (Cédric) - dirrectory -> directory (Cédric) - propagate iommufd_cdev_get_info_iova_range err and print as warning (Cédric) - introduce a helper vfio_device_set_fd (Cédric) - Move #include "sysemu/iommufd.h" in platform.c (Cédric) - simplify iommufd backend uAPI, remove alloc_hwpt, get/put_ioas - Dare to keep Matthew's RB as related change is minor v5: - Change to use Kconfig for CONFIG_IOMMUFD and drop stub file (Cédric) - Add (uintptr_t) to info->allowed_iovas (Cédric) - Switch to IOAS attach/detach and hide hwpt (Jason) - move chardev_open.[h|c] under the IOMMUFD entry (Cédric) - Move vfio_legacy_pci_hot_reset into container.c (Cédric) - Add missed pgsizes initialization in vfio_get_info_iova_range - split linking iommufd patch into three to be cleaner - Fix comments on PCI BAR unmap v4: - add CONFIG_IOMMUFD check for IOMMUFDProperties (Markus) - add doc for default case without fd (Markus) - Fix build issue reported by Markus and Cédric - Simply use SPDX identifier in new file (Cédric) - make vfio_container_init/destroy helper a seperate patch (Cédric) - make vrdl_list movement a seperate patch (Cédric) - add const for some callback parameters (Cédric) - add g_assert in VFIOIOMMUOps callback (Cédric) - introduce pci_hot_reset callback (Cédric) - remove VFIOIOMMUSpaprOps (Cédric) - initialize g_autofree to NULL (Cédric) - adjust func name prefix and trace event in iommufd.c (Cédric) - add RB v3: - Rename base container as VFIOContainerBase and legacy container as container (Cédric) - Drop VFIO_IOMMU_BACKEND_OPS class and use struct instead (Cédric) - Cleanup container.c by introducing spapr backend and move spapr code out (Cédric) - Introduce vfio_iommu_spapr_ops (Cédric) - Add doc of iommufd in qom.json and have iommufd member sorted (Markus) - patch19 and patch21 are splitted to two parts to facilitate review v2: - patch "vfio: Add base container" in v1 is split into patch1-15 per Cédric - add fd passing to platform/ap/ccw vfio device - add (uintptr_t) cast in iommufd_backend_map_dma() per Cédric - rename char_dev.h to chardev_open.h for same naming scheme per Daniel - add full copyright per Daniel and Jason Note changelog below are from full IOMMUFD series: v1: - Alloc hwpt instead of using auto hwpt - elaborate iommufd code per Nicolin - consolidate two patches and drop as.c - typo error fix and function rename rfcv4: - rebase on top of v8.0.3 - Add one patch from Yi which is about vfio device add in kvm - Remove IOAS_COPY optimization and focus on functions in this patchset - Fix wrong name issue reported and fix suggested by Matthew - Fix compilation issue reported and fix sugggsted by Nicolin - Use query_dirty_bitmap callback to replace get_dirty_bitmap for better granularity - Add dev_iter_next() callback to avoid adding so many callback at container scope, add VFIODevice.hwpt to support that - Restore all functions back to common from container whenever possible, mainly migration and reset related functions - Add --enable/disable-iommufd config option, enabled by default in linux - Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next - Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device - vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove redundant code - Add FD passing support for vfio device backed by IOMMUFD - Fix hot unplug resource leak issue in vfio_legacy_detach_device() - Fix FD leak in vfio_get_devicefd() rfcv3: - rebase on top of v7.2.0 - Fix the compilation with CONFIG_IOMMUFD unset by using true classes for VFIO backends - Fix use after free in error path, reported by Alister - Split common.c in several steps to ease the review rfcv2: - remove the first three patches of rfcv1 - add open cdev helper suggested by Jason - remove the QOMification of the VFIOContainer and simply use standard ops (David) - add "-object iommufd" suggested by Alex Thanks Zhenzhong Cédric Le Goater (3): hw/arm: Activate IOMMUFD for virt machines kconfig: Activate IOMMUFD for s390x machines hw/i386: Activate IOMMUFD for q35 machines Eric Auger (2): backends/iommufd: Introduce the iommufd object vfio/pci: Allow the selection of a given iommu backend Yi Liu (2): util/char_dev: Add open_cdev() vfio/iommufd: Implement the iommufd backend Zhenzhong Duan (14): vfio/common: return early if space isn't empty vfio/iommufd: Relax assert check for iommufd backend vfio/iommufd: Add support for iova_ranges and pgsizes vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info vfio/pci: Introduce a vfio pci hot reset interface vfio/iommufd: Enable pci hot reset through iommufd cdev interface vfio/pci: Make vfio cdev pre-openable by passing a file handle vfio/platform: Allow the selection of a given iommu backend vfio/platform: Make vfio cdev pre-openable by passing a file handle vfio/ap: Allow the selection of a given iommu backend vfio/ap: Make vfio cdev pre-openable by passing a file handle vfio/ccw: Allow the selection of a given iommu backend vfio/ccw: Make vfio cdev pre-openable by passing a file handle vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps callbacks MAINTAINERS | 10 + qapi/qom.json | 19 + hw/vfio/pci.h | 6 + include/hw/vfio/vfio-common.h | 26 +- include/hw/vfio/vfio-container-base.h | 15 +- include/qemu/chardev_open.h | 16 + include/sysemu/iommufd.h | 44 ++ backends/iommufd.c | 228 ++++++++++ hw/vfio/ap.c | 29 +- hw/vfio/ccw.c | 31 +- hw/vfio/common.c | 24 +- hw/vfio/container-base.c | 6 +- hw/vfio/container.c | 208 ++++++++- hw/vfio/helpers.c | 44 ++ hw/vfio/iommufd.c | 630 ++++++++++++++++++++++++++ hw/vfio/pci.c | 212 ++------- hw/vfio/platform.c | 38 +- util/chardev_open.c | 81 ++++ backends/Kconfig | 4 + backends/meson.build | 1 + backends/trace-events | 10 + hw/arm/Kconfig | 1 + hw/i386/Kconfig | 1 + hw/s390x/Kconfig | 1 + hw/vfio/meson.build | 3 + hw/vfio/trace-events | 11 + qemu-options.hx | 12 + util/meson.build | 1 + 28 files changed, 1493 insertions(+), 219 deletions(-) create mode 100644 include/qemu/chardev_open.h create mode 100644 include/sysemu/iommufd.h create mode 100644 backends/iommufd.c create mode 100644 hw/vfio/iommufd.c create mode 100644 util/chardev_open.c -- 2.34.1
Hello Zhenzhong, On 11/14/23 11:09, Zhenzhong Duan wrote: > Hi, > > Thanks all for giving guides and comments on previous series, this is > the remaining part of the iommufd support. > > Based on Cédric's suggestion, replace old config method for IOMMUFD > with Kconfig. > > Based on Jason's suggestion, drop the implementation of manually > allocating hwpt and switch to IOAS attach/detach. > > Beside current test, we also tested mdev with mtty for better cover range. > > PATCH 1: Introduce iommufd object > PATCH 2-9: add IOMMUFD container and cdev support > PATCH 10-17: fd passing for cdev and linking to IOMMUFD > PATCH 18: make VFIOContainerBase parameter const > PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86 > > > We have done wide test with different combinations, e.g: > - PCI device were tested > - FD passing and hot reset with some trick. > - device hotplug test with legacy and iommufd backends > - with or without vIOMMU for legacy and iommufd backends > - divices linked to different iommufds > - VFIO migration with a E800 net card(no dirty sync support) passthrough > - platform, ccw and ap were only compile-tested due to environment limit > - test mdev pass through with mtty and mix with real device and different BE > > Given some iommufd kernel limitations, the iommufd backend is > not yet fully on par with the legacy backend w.r.t. features like: > - p2p mappings (you will see related error traces) > - dirty page sync > - and etc. > > > qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v6 > Based on vfio-next, commit id: 1a22fb936e I just had a few comments that I addressed myself in : https://github.com/legoater/qemu/commits/vfio-8.2 Please take a look and I will possibly merge the modified v6 in vfio-next after a few days or next week when people are back from LPC. I would like to propose this series for an early merge in QEMU 9.0. So, we have a few weeks to polish and test a bit more. I didn't get feedback from ARM for instance (that's a message in a bottle for Eric :). My last request would be to take some of the documentation, below in this email, and include it in QEMU. It can be done later. Thanks, C. > > -------------------------------------------------------------------------- > > Below are some background and graph about the design: > > With the introduction of iommufd, the Linux kernel provides a generic > interface for userspace drivers to propagate their DMA mappings to kernel > for assigned devices. This series does the porting of the VFIO devices > onto the /dev/iommu uapi and let it coexist with the legacy implementation. > > At QEMU level, interactions with the /dev/iommu are abstracted by a new > iommufd object (compiled in with the CONFIG_IOMMUFD option). > > Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be > linked with an iommufd object. In this series, the vfio-pci device is > granted with such capability (other VFIO devices are not yet ready): > > It gets a new optional parameter named iommufd which allows to pass > an iommufd object: > > -object iommufd,id=iommufd0 > -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 > > Note the /dev/iommu and vfio cdev can be externally opened by a > management layer. In such a case the fd is passed: > > -object iommufd,id=iommufd0,fd=22 > -device vfio-pci,iommufd=iommufd0,fd=23 > > If the fd parameter is not passed, the fd is opened by QEMU. > See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html > for detailed discuss on this requirement. > > If no iommufd option is passed to the vfio-pci device, iommufd is not > used and the end-user gets the behavior based on the legacy vfio iommu > interfaces: > > -device vfio-pci,host=0000:02:00.0 > > While the legacy kernel interface is group-centric, the new iommufd > interface is device-centric, relying on device fd and iommufd. > > To support both interfaces in the QEMU VFIO device we reworked the vfio > container abstraction so that the generic VFIO code can use either > backend. > > The VFIOContainer object becomes a base object derived into > a) the legacy VFIO container and > b) the new iommufd based container. > > The base object implements generic code such as code related to > memory_listener and address space management whereas the derived > objects implement callbacks specific to either BE, legacy and > iommufd. Indeed each backend has its own way to setup secure context > and dma management interface. The below diagram shows how it looks > like with both BEs. > > VFIO AddressSpace/Memory > +-------+ +----------+ +-----+ +-----+ > | pci | | platform | | ap | | ccw | > +---+---+ +----+-----+ +--+--+ +--+--+ +----------------------+ > | | | | | AddressSpace | > | | | | +------------+---------+ > +---V-----------V-----------V--------V----+ / > | VFIOAddressSpace | <------------+ > | | | MemoryListener > | VFIOContainer list | > +-------+----------------------------+----+ > | | > | | > +-------V------+ +--------V----------+ > | iommufd | | vfio legacy | > | container | | container | > +-------+------+ +--------+----------+ > | | > | /dev/iommu | /dev/vfio/vfio > | /dev/vfio/devices/vfioX | /dev/vfio/$group_id > Userspace | | > ============+============================+=========================== > Kernel | device fd | > +---------------+ | group/container fd > | (BIND_IOMMUFD | | (SET_CONTAINER/SET_IOMMU) > | ATTACH_IOAS) | | device fd > | | | > | +-------V------------V-----------------+ > iommufd | | vfio | > (map/unmap | +---------+--------------------+-------+ > ioas_copy) | | | map/unmap > | | | > +------V------+ +-----V------+ +------V--------+ > | iommfd core | | device | | vfio iommu | > +-------------+ +------------+ +---------------+ > > [Secure Context setup] > - iommufd BE: uses device fd and iommufd to setup secure context > (bind_iommufd, attach_ioas) > - vfio legacy BE: uses group fd and container fd to setup secure context > (set_container, set_iommu) > [Device access] > - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX > - vfio legacy BE: device fd is retrieved from group fd ioctl > [DMA Mapping flow] > 1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener > 2. VFIO populates DMA map/unmap via the container BEs > *) iommufd BE: uses iommufd > *) vfio legacy BE: uses container fd > > > Changelog: > v6: > - simplify CONFIG_IOMMUFD checking code further (Cédric) > - check iommufd_cdev_kvm_device_add return value (Cédric) > - dirrectory -> directory (Cédric) > - propagate iommufd_cdev_get_info_iova_range err and print as warning (Cédric) > - introduce a helper vfio_device_set_fd (Cédric) > - Move #include "sysemu/iommufd.h" in platform.c (Cédric) > - simplify iommufd backend uAPI, remove alloc_hwpt, get/put_ioas > - Dare to keep Matthew's RB as related change is minor > > v5: > - Change to use Kconfig for CONFIG_IOMMUFD and drop stub file (Cédric) > - Add (uintptr_t) to info->allowed_iovas (Cédric) > - Switch to IOAS attach/detach and hide hwpt (Jason) > - move chardev_open.[h|c] under the IOMMUFD entry (Cédric) > - Move vfio_legacy_pci_hot_reset into container.c (Cédric) > - Add missed pgsizes initialization in vfio_get_info_iova_range > - split linking iommufd patch into three to be cleaner > - Fix comments on PCI BAR unmap > > v4: > - add CONFIG_IOMMUFD check for IOMMUFDProperties (Markus) > - add doc for default case without fd (Markus) > - Fix build issue reported by Markus and Cédric > - Simply use SPDX identifier in new file (Cédric) > - make vfio_container_init/destroy helper a seperate patch (Cédric) > - make vrdl_list movement a seperate patch (Cédric) > - add const for some callback parameters (Cédric) > - add g_assert in VFIOIOMMUOps callback (Cédric) > - introduce pci_hot_reset callback (Cédric) > - remove VFIOIOMMUSpaprOps (Cédric) > - initialize g_autofree to NULL (Cédric) > - adjust func name prefix and trace event in iommufd.c (Cédric) > - add RB > > v3: > - Rename base container as VFIOContainerBase and legacy container as container (Cédric) > - Drop VFIO_IOMMU_BACKEND_OPS class and use struct instead (Cédric) > - Cleanup container.c by introducing spapr backend and move spapr code out (Cédric) > - Introduce vfio_iommu_spapr_ops (Cédric) > - Add doc of iommufd in qom.json and have iommufd member sorted (Markus) > - patch19 and patch21 are splitted to two parts to facilitate review > > v2: > - patch "vfio: Add base container" in v1 is split into patch1-15 per Cédric > - add fd passing to platform/ap/ccw vfio device > - add (uintptr_t) cast in iommufd_backend_map_dma() per Cédric > - rename char_dev.h to chardev_open.h for same naming scheme per Daniel > - add full copyright per Daniel and Jason > > > Note changelog below are from full IOMMUFD series: > > v1: > - Alloc hwpt instead of using auto hwpt > - elaborate iommufd code per Nicolin > - consolidate two patches and drop as.c > - typo error fix and function rename > > rfcv4: > - rebase on top of v8.0.3 > - Add one patch from Yi which is about vfio device add in kvm > - Remove IOAS_COPY optimization and focus on functions in this patchset > - Fix wrong name issue reported and fix suggested by Matthew > - Fix compilation issue reported and fix sugggsted by Nicolin > - Use query_dirty_bitmap callback to replace get_dirty_bitmap for better > granularity > - Add dev_iter_next() callback to avoid adding so many callback > at container scope, add VFIODevice.hwpt to support that > - Restore all functions back to common from container whenever possible, > mainly migration and reset related functions > - Add --enable/disable-iommufd config option, enabled by default in linux > - Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next > - Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device > - vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove > redundant code > - Add FD passing support for vfio device backed by IOMMUFD > - Fix hot unplug resource leak issue in vfio_legacy_detach_device() > - Fix FD leak in vfio_get_devicefd() > > rfcv3: > - rebase on top of v7.2.0 > - Fix the compilation with CONFIG_IOMMUFD unset by using true classes for > VFIO backends > - Fix use after free in error path, reported by Alister > - Split common.c in several steps to ease the review > > rfcv2: > - remove the first three patches of rfcv1 > - add open cdev helper suggested by Jason > - remove the QOMification of the VFIOContainer and simply use standard ops > (David) > - add "-object iommufd" suggested by Alex > > Thanks > Zhenzhong > > > Cédric Le Goater (3): > hw/arm: Activate IOMMUFD for virt machines > kconfig: Activate IOMMUFD for s390x machines > hw/i386: Activate IOMMUFD for q35 machines > > Eric Auger (2): > backends/iommufd: Introduce the iommufd object > vfio/pci: Allow the selection of a given iommu backend > > Yi Liu (2): > util/char_dev: Add open_cdev() > vfio/iommufd: Implement the iommufd backend > > Zhenzhong Duan (14): > vfio/common: return early if space isn't empty > vfio/iommufd: Relax assert check for iommufd backend > vfio/iommufd: Add support for iova_ranges and pgsizes > vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info > vfio/pci: Introduce a vfio pci hot reset interface > vfio/iommufd: Enable pci hot reset through iommufd cdev interface > vfio/pci: Make vfio cdev pre-openable by passing a file handle > vfio/platform: Allow the selection of a given iommu backend > vfio/platform: Make vfio cdev pre-openable by passing a file handle > vfio/ap: Allow the selection of a given iommu backend > vfio/ap: Make vfio cdev pre-openable by passing a file handle > vfio/ccw: Allow the selection of a given iommu backend > vfio/ccw: Make vfio cdev pre-openable by passing a file handle > vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps > callbacks > > MAINTAINERS | 10 + > qapi/qom.json | 19 + > hw/vfio/pci.h | 6 + > include/hw/vfio/vfio-common.h | 26 +- > include/hw/vfio/vfio-container-base.h | 15 +- > include/qemu/chardev_open.h | 16 + > include/sysemu/iommufd.h | 44 ++ > backends/iommufd.c | 228 ++++++++++ > hw/vfio/ap.c | 29 +- > hw/vfio/ccw.c | 31 +- > hw/vfio/common.c | 24 +- > hw/vfio/container-base.c | 6 +- > hw/vfio/container.c | 208 ++++++++- > hw/vfio/helpers.c | 44 ++ > hw/vfio/iommufd.c | 630 ++++++++++++++++++++++++++ > hw/vfio/pci.c | 212 ++------- > hw/vfio/platform.c | 38 +- > util/chardev_open.c | 81 ++++ > backends/Kconfig | 4 + > backends/meson.build | 1 + > backends/trace-events | 10 + > hw/arm/Kconfig | 1 + > hw/i386/Kconfig | 1 + > hw/s390x/Kconfig | 1 + > hw/vfio/meson.build | 3 + > hw/vfio/trace-events | 11 + > qemu-options.hx | 12 + > util/meson.build | 1 + > 28 files changed, 1493 insertions(+), 219 deletions(-) > create mode 100644 include/qemu/chardev_open.h > create mode 100644 include/sysemu/iommufd.h > create mode 100644 backends/iommufd.c > create mode 100644 hw/vfio/iommufd.c > create mode 100644 util/chardev_open.c >
Hi Cédric, >-----Original Message----- >From: Cédric Le Goater <clg@redhat.com> >Sent: Tuesday, November 14, 2023 10:52 PM >Subject: Re: [PATCH v6 00/21] vfio: Adopt iommufd > >Hello Zhenzhong, > >On 11/14/23 11:09, Zhenzhong Duan wrote: >> Hi, >> >> Thanks all for giving guides and comments on previous series, this is >> the remaining part of the iommufd support. >> >> Based on Cédric's suggestion, replace old config method for IOMMUFD >> with Kconfig. >> >> Based on Jason's suggestion, drop the implementation of manually >> allocating hwpt and switch to IOAS attach/detach. >> >> Beside current test, we also tested mdev with mtty for better cover range. >> >> PATCH 1: Introduce iommufd object >> PATCH 2-9: add IOMMUFD container and cdev support >> PATCH 10-17: fd passing for cdev and linking to IOMMUFD >> PATCH 18: make VFIOContainerBase parameter const >> PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86 >> >> >> We have done wide test with different combinations, e.g: >> - PCI device were tested >> - FD passing and hot reset with some trick. >> - device hotplug test with legacy and iommufd backends >> - with or without vIOMMU for legacy and iommufd backends >> - divices linked to different iommufds >> - VFIO migration with a E800 net card(no dirty sync support) passthrough >> - platform, ccw and ap were only compile-tested due to environment limit >> - test mdev pass through with mtty and mix with real device and different BE >> >> Given some iommufd kernel limitations, the iommufd backend is >> not yet fully on par with the legacy backend w.r.t. features like: >> - p2p mappings (you will see related error traces) >> - dirty page sync >> - and etc. >> >> >> qemu code: >https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v6 >> Based on vfio-next, commit id: 1a22fb936e > >I just had a few comments that I addressed myself in : > > https://github.com/legoater/qemu/commits/vfio-8.2 > >Please take a look and I will possibly merge the modified v6 in vfio-next >after a few days or next week when people are back from LPC. Good for me, only a minor suggestion to use OBJECT_DECLARE_TYPE. > >I would like to propose this series for an early merge in QEMU 9.0. So, we >have a few weeks to polish and test a bit more. I didn't get feedback from >ARM for instance (that's a message in a bottle for Eric :). Glad to know, thanks very much for your active review and guidance since my first version😊 I'll do more test recent days. > >My last request would be to take some of the documentation, below in this >email, and include it in QEMU. It can be done later. Sure, will do. BRs. Zhenzhong
Hi Zhenzhong, On 11/14/23 11:09, Zhenzhong Duan wrote: > Hi, > > Thanks all for giving guides and comments on previous series, this is > the remaining part of the iommufd support. > > Based on Cédric's suggestion, replace old config method for IOMMUFD > with Kconfig. > > Based on Jason's suggestion, drop the implementation of manually > allocating hwpt and switch to IOAS attach/detach. > > Beside current test, we also tested mdev with mtty for better cover range. > > PATCH 1: Introduce iommufd object > PATCH 2-9: add IOMMUFD container and cdev support > PATCH 10-17: fd passing for cdev and linking to IOMMUFD > PATCH 18: make VFIOContainerBase parameter const > PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86 > > > We have done wide test with different combinations, e.g: > - PCI device were tested > - FD passing and hot reset with some trick. > - device hotplug test with legacy and iommufd backends > - with or without vIOMMU for legacy and iommufd backends > - divices linked to different iommufds > - VFIO migration with a E800 net card(no dirty sync support) passthrough > - platform, ccw and ap were only compile-tested due to environment limit > - test mdev pass through with mtty and mix with real device and different BE > > Given some iommufd kernel limitations, the iommufd backend is > not yet fully on par with the legacy backend w.r.t. features like: > - p2p mappings (you will see related error traces) > - dirty page sync > - and etc. Feel free to add my T-b: Tested-by: Eric Auger <eric.auger@redhat.com> Thanks Eric > > > qemu code: https://github.com/yiliu1765/qemu/commits/zhenzhong/iommufd_cdev_v6 > Based on vfio-next, commit id: 1a22fb936e > > -------------------------------------------------------------------------- > > Below are some background and graph about the design: > > With the introduction of iommufd, the Linux kernel provides a generic > interface for userspace drivers to propagate their DMA mappings to kernel > for assigned devices. This series does the porting of the VFIO devices > onto the /dev/iommu uapi and let it coexist with the legacy implementation. > > At QEMU level, interactions with the /dev/iommu are abstracted by a new > iommufd object (compiled in with the CONFIG_IOMMUFD option). > > Any QEMU device (e.g. vfio device) wishing to use /dev/iommu must be > linked with an iommufd object. In this series, the vfio-pci device is > granted with such capability (other VFIO devices are not yet ready): > > It gets a new optional parameter named iommufd which allows to pass > an iommufd object: > > -object iommufd,id=iommufd0 > -device vfio-pci,host=0000:02:00.0,iommufd=iommufd0 > > Note the /dev/iommu and vfio cdev can be externally opened by a > management layer. In such a case the fd is passed: > > -object iommufd,id=iommufd0,fd=22 > -device vfio-pci,iommufd=iommufd0,fd=23 > > If the fd parameter is not passed, the fd is opened by QEMU. > See https://www.mail-archive.com/qemu-devel@nongnu.org/msg937155.html > for detailed discuss on this requirement. > > If no iommufd option is passed to the vfio-pci device, iommufd is not > used and the end-user gets the behavior based on the legacy vfio iommu > interfaces: > > -device vfio-pci,host=0000:02:00.0 > > While the legacy kernel interface is group-centric, the new iommufd > interface is device-centric, relying on device fd and iommufd. > > To support both interfaces in the QEMU VFIO device we reworked the vfio > container abstraction so that the generic VFIO code can use either > backend. > > The VFIOContainer object becomes a base object derived into > a) the legacy VFIO container and > b) the new iommufd based container. > > The base object implements generic code such as code related to > memory_listener and address space management whereas the derived > objects implement callbacks specific to either BE, legacy and > iommufd. Indeed each backend has its own way to setup secure context > and dma management interface. The below diagram shows how it looks > like with both BEs. > > VFIO AddressSpace/Memory > +-------+ +----------+ +-----+ +-----+ > | pci | | platform | | ap | | ccw | > +---+---+ +----+-----+ +--+--+ +--+--+ +----------------------+ > | | | | | AddressSpace | > | | | | +------------+---------+ > +---V-----------V-----------V--------V----+ / > | VFIOAddressSpace | <------------+ > | | | MemoryListener > | VFIOContainer list | > +-------+----------------------------+----+ > | | > | | > +-------V------+ +--------V----------+ > | iommufd | | vfio legacy | > | container | | container | > +-------+------+ +--------+----------+ > | | > | /dev/iommu | /dev/vfio/vfio > | /dev/vfio/devices/vfioX | /dev/vfio/$group_id > Userspace | | > ============+============================+=========================== > Kernel | device fd | > +---------------+ | group/container fd > | (BIND_IOMMUFD | | (SET_CONTAINER/SET_IOMMU) > | ATTACH_IOAS) | | device fd > | | | > | +-------V------------V-----------------+ > iommufd | | vfio | > (map/unmap | +---------+--------------------+-------+ > ioas_copy) | | | map/unmap > | | | > +------V------+ +-----V------+ +------V--------+ > | iommfd core | | device | | vfio iommu | > +-------------+ +------------+ +---------------+ > > [Secure Context setup] > - iommufd BE: uses device fd and iommufd to setup secure context > (bind_iommufd, attach_ioas) > - vfio legacy BE: uses group fd and container fd to setup secure context > (set_container, set_iommu) > [Device access] > - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX > - vfio legacy BE: device fd is retrieved from group fd ioctl > [DMA Mapping flow] > 1. VFIOAddressSpace receives MemoryRegion add/del via MemoryListener > 2. VFIO populates DMA map/unmap via the container BEs > *) iommufd BE: uses iommufd > *) vfio legacy BE: uses container fd > > > Changelog: > v6: > - simplify CONFIG_IOMMUFD checking code further (Cédric) > - check iommufd_cdev_kvm_device_add return value (Cédric) > - dirrectory -> directory (Cédric) > - propagate iommufd_cdev_get_info_iova_range err and print as warning (Cédric) > - introduce a helper vfio_device_set_fd (Cédric) > - Move #include "sysemu/iommufd.h" in platform.c (Cédric) > - simplify iommufd backend uAPI, remove alloc_hwpt, get/put_ioas > - Dare to keep Matthew's RB as related change is minor > > v5: > - Change to use Kconfig for CONFIG_IOMMUFD and drop stub file (Cédric) > - Add (uintptr_t) to info->allowed_iovas (Cédric) > - Switch to IOAS attach/detach and hide hwpt (Jason) > - move chardev_open.[h|c] under the IOMMUFD entry (Cédric) > - Move vfio_legacy_pci_hot_reset into container.c (Cédric) > - Add missed pgsizes initialization in vfio_get_info_iova_range > - split linking iommufd patch into three to be cleaner > - Fix comments on PCI BAR unmap > > v4: > - add CONFIG_IOMMUFD check for IOMMUFDProperties (Markus) > - add doc for default case without fd (Markus) > - Fix build issue reported by Markus and Cédric > - Simply use SPDX identifier in new file (Cédric) > - make vfio_container_init/destroy helper a seperate patch (Cédric) > - make vrdl_list movement a seperate patch (Cédric) > - add const for some callback parameters (Cédric) > - add g_assert in VFIOIOMMUOps callback (Cédric) > - introduce pci_hot_reset callback (Cédric) > - remove VFIOIOMMUSpaprOps (Cédric) > - initialize g_autofree to NULL (Cédric) > - adjust func name prefix and trace event in iommufd.c (Cédric) > - add RB > > v3: > - Rename base container as VFIOContainerBase and legacy container as container (Cédric) > - Drop VFIO_IOMMU_BACKEND_OPS class and use struct instead (Cédric) > - Cleanup container.c by introducing spapr backend and move spapr code out (Cédric) > - Introduce vfio_iommu_spapr_ops (Cédric) > - Add doc of iommufd in qom.json and have iommufd member sorted (Markus) > - patch19 and patch21 are splitted to two parts to facilitate review > > v2: > - patch "vfio: Add base container" in v1 is split into patch1-15 per Cédric > - add fd passing to platform/ap/ccw vfio device > - add (uintptr_t) cast in iommufd_backend_map_dma() per Cédric > - rename char_dev.h to chardev_open.h for same naming scheme per Daniel > - add full copyright per Daniel and Jason > > > Note changelog below are from full IOMMUFD series: > > v1: > - Alloc hwpt instead of using auto hwpt > - elaborate iommufd code per Nicolin > - consolidate two patches and drop as.c > - typo error fix and function rename > > rfcv4: > - rebase on top of v8.0.3 > - Add one patch from Yi which is about vfio device add in kvm > - Remove IOAS_COPY optimization and focus on functions in this patchset > - Fix wrong name issue reported and fix suggested by Matthew > - Fix compilation issue reported and fix sugggsted by Nicolin > - Use query_dirty_bitmap callback to replace get_dirty_bitmap for better > granularity > - Add dev_iter_next() callback to avoid adding so many callback > at container scope, add VFIODevice.hwpt to support that > - Restore all functions back to common from container whenever possible, > mainly migration and reset related functions > - Add --enable/disable-iommufd config option, enabled by default in linux > - Remove VFIODevice.hwpt_next as it's redundant with VFIODevice.next > - Adapt new VFIO_DEVICE_PCI_HOT_RESET uAPI for IOMMUFD backed device > - vfio_kvm_device_add/del_group call vfio_kvm_device_add/del_fd to remove > redundant code > - Add FD passing support for vfio device backed by IOMMUFD > - Fix hot unplug resource leak issue in vfio_legacy_detach_device() > - Fix FD leak in vfio_get_devicefd() > > rfcv3: > - rebase on top of v7.2.0 > - Fix the compilation with CONFIG_IOMMUFD unset by using true classes for > VFIO backends > - Fix use after free in error path, reported by Alister > - Split common.c in several steps to ease the review > > rfcv2: > - remove the first three patches of rfcv1 > - add open cdev helper suggested by Jason > - remove the QOMification of the VFIOContainer and simply use standard ops > (David) > - add "-object iommufd" suggested by Alex > > Thanks > Zhenzhong > > > Cédric Le Goater (3): > hw/arm: Activate IOMMUFD for virt machines > kconfig: Activate IOMMUFD for s390x machines > hw/i386: Activate IOMMUFD for q35 machines > > Eric Auger (2): > backends/iommufd: Introduce the iommufd object > vfio/pci: Allow the selection of a given iommu backend > > Yi Liu (2): > util/char_dev: Add open_cdev() > vfio/iommufd: Implement the iommufd backend > > Zhenzhong Duan (14): > vfio/common: return early if space isn't empty > vfio/iommufd: Relax assert check for iommufd backend > vfio/iommufd: Add support for iova_ranges and pgsizes > vfio/pci: Extract out a helper vfio_pci_get_pci_hot_reset_info > vfio/pci: Introduce a vfio pci hot reset interface > vfio/iommufd: Enable pci hot reset through iommufd cdev interface > vfio/pci: Make vfio cdev pre-openable by passing a file handle > vfio/platform: Allow the selection of a given iommu backend > vfio/platform: Make vfio cdev pre-openable by passing a file handle > vfio/ap: Allow the selection of a given iommu backend > vfio/ap: Make vfio cdev pre-openable by passing a file handle > vfio/ccw: Allow the selection of a given iommu backend > vfio/ccw: Make vfio cdev pre-openable by passing a file handle > vfio: Make VFIOContainerBase poiner parameter const in VFIOIOMMUOps > callbacks > > MAINTAINERS | 10 + > qapi/qom.json | 19 + > hw/vfio/pci.h | 6 + > include/hw/vfio/vfio-common.h | 26 +- > include/hw/vfio/vfio-container-base.h | 15 +- > include/qemu/chardev_open.h | 16 + > include/sysemu/iommufd.h | 44 ++ > backends/iommufd.c | 228 ++++++++++ > hw/vfio/ap.c | 29 +- > hw/vfio/ccw.c | 31 +- > hw/vfio/common.c | 24 +- > hw/vfio/container-base.c | 6 +- > hw/vfio/container.c | 208 ++++++++- > hw/vfio/helpers.c | 44 ++ > hw/vfio/iommufd.c | 630 ++++++++++++++++++++++++++ > hw/vfio/pci.c | 212 ++------- > hw/vfio/platform.c | 38 +- > util/chardev_open.c | 81 ++++ > backends/Kconfig | 4 + > backends/meson.build | 1 + > backends/trace-events | 10 + > hw/arm/Kconfig | 1 + > hw/i386/Kconfig | 1 + > hw/s390x/Kconfig | 1 + > hw/vfio/meson.build | 3 + > hw/vfio/trace-events | 11 + > qemu-options.hx | 12 + > util/meson.build | 1 + > 28 files changed, 1493 insertions(+), 219 deletions(-) > create mode 100644 include/qemu/chardev_open.h > create mode 100644 include/sysemu/iommufd.h > create mode 100644 backends/iommufd.c > create mode 100644 hw/vfio/iommufd.c > create mode 100644 util/chardev_open.c >
>-----Original Message----- >From: Eric Auger <eric.auger@redhat.com> >Sent: Monday, November 20, 2023 5:15 PM >Subject: Re: [PATCH v6 00/21] vfio: Adopt iommufd > >Hi Zhenzhong, > >On 11/14/23 11:09, Zhenzhong Duan wrote: >> Hi, >> >> Thanks all for giving guides and comments on previous series, this is >> the remaining part of the iommufd support. >> >> Based on Cédric's suggestion, replace old config method for IOMMUFD >> with Kconfig. >> >> Based on Jason's suggestion, drop the implementation of manually >> allocating hwpt and switch to IOAS attach/detach. >> >> Beside current test, we also tested mdev with mtty for better cover range. >> >> PATCH 1: Introduce iommufd object >> PATCH 2-9: add IOMMUFD container and cdev support >> PATCH 10-17: fd passing for cdev and linking to IOMMUFD >> PATCH 18: make VFIOContainerBase parameter const >> PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86 >> >> >> We have done wide test with different combinations, e.g: >> - PCI device were tested >> - FD passing and hot reset with some trick. >> - device hotplug test with legacy and iommufd backends >> - with or without vIOMMU for legacy and iommufd backends >> - divices linked to different iommufds >> - VFIO migration with a E800 net card(no dirty sync support) passthrough >> - platform, ccw and ap were only compile-tested due to environment limit >> - test mdev pass through with mtty and mix with real device and different BE >> >> Given some iommufd kernel limitations, the iommufd backend is >> not yet fully on par with the legacy backend w.r.t. features like: >> - p2p mappings (you will see related error traces) >> - dirty page sync >> - and etc. > >Feel free to add my T-b: >Tested-by: Eric Auger <eric.auger@redhat.com> Thanks Eric, you mean all the patches or arm part? BRs. Zhenzhong
Hi Zhengzhong On 11/20/23 11:09, Duan, Zhenzhong wrote: > >> -----Original Message----- >> From: Eric Auger <eric.auger@redhat.com> >> Sent: Monday, November 20, 2023 5:15 PM >> Subject: Re: [PATCH v6 00/21] vfio: Adopt iommufd >> >> Hi Zhenzhong, >> >> On 11/14/23 11:09, Zhenzhong Duan wrote: >>> Hi, >>> >>> Thanks all for giving guides and comments on previous series, this is >>> the remaining part of the iommufd support. >>> >>> Based on Cédric's suggestion, replace old config method for IOMMUFD >>> with Kconfig. >>> >>> Based on Jason's suggestion, drop the implementation of manually >>> allocating hwpt and switch to IOAS attach/detach. >>> >>> Beside current test, we also tested mdev with mtty for better cover range. >>> >>> PATCH 1: Introduce iommufd object >>> PATCH 2-9: add IOMMUFD container and cdev support >>> PATCH 10-17: fd passing for cdev and linking to IOMMUFD >>> PATCH 18: make VFIOContainerBase parameter const >>> PATCH 19-21: Compile out for IOMMUFD for arm, s390x and x86 >>> >>> >>> We have done wide test with different combinations, e.g: >>> - PCI device were tested >>> - FD passing and hot reset with some trick. >>> - device hotplug test with legacy and iommufd backends >>> - with or without vIOMMU for legacy and iommufd backends >>> - divices linked to different iommufds >>> - VFIO migration with a E800 net card(no dirty sync support) passthrough >>> - platform, ccw and ap were only compile-tested due to environment limit >>> - test mdev pass through with mtty and mix with real device and different BE >>> >>> Given some iommufd kernel limitations, the iommufd backend is >>> not yet fully on par with the legacy backend w.r.t. features like: >>> - p2p mappings (you will see related error traces) >>> - dirty page sync >>> - and etc. >> Feel free to add my T-b: >> Tested-by: Eric Auger <eric.auger@redhat.com> > Thanks Eric, you mean all the patches or arm part? Yeah sorry I failed to give details. I have tested on ARM with vfio-pci atm. So all the generic patches + ARM virt / PCI specific ones. As for VFIO-PLATFORM I need to resurrect a new environment because I have some trouble with AMD overdrive which do not expose iommu groups atm. I need to figure this out or create a new vfio-platform environment to test. Working on it ... Thanks Eric > > BRs. > Zhenzhong
© 2016 - 2024 Red Hat, Inc.