>-----Original Message----- >From: Joao Martins <joao.m.martins@oracle.com> >Subject: Re: [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking > >On 11/07/2024 08:41, Cédric Le Goater wrote: >> Hello Joao, >> >> On 7/8/24 4:34 PM, Joao Martins wrote: >>> This small series adds support for IOMMU dirty tracking support via the >>> IOMMUFD backend. The hardware capability is available on most recent >x86 >>> hardware. The series is divided organized as follows: >>> >>> * Patch 1: Fixes a regression into mdev support with IOMMUFD. This >>> one is independent of the series but happened to cross it >>> while testing mdev with this series >>> >>> * Patch 2: Adds a support to iommufd_get_device_info() for capabilities >>> >>> * Patches 3 - 7: IOMMUFD backend support for dirty tracking; >>> >>> Introduce auto domains -- Patch 3 goes into more detail, but the gist is >that >>> we will find and attach a device to a compatible IOMMU domain, or >allocate a new >>> hardware pagetable *or* rely on kernel IOAS attach (for mdevs). >Afterwards the >>> workflow is relatively simple: >>> >>> 1) Probe device and allow dirty tracking in the HWPT >>> 2) Toggling dirty tracking on/off >>> 3) Read-and-clear of Dirty IOVAs >>> >>> The heuristics selected for (1) were to always request the HWPT for >>> dirty tracking if supported, or rely on device dirty page tracking. This >>> is a little simplistic and we aren't necessarily utilizing IOMMU dirty >>> tracking even if we ask during hwpt allocation. >>> >>> The unmap case is deferred until further vIOMMU support with migration >>> is added[3] which will then introduce the usage of >>> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP >ioctl in the >>> dma unmap bitmap flow. >>> >>> * Patches 8-10: Don't block live migration where there's no VF dirty >>> tracker, considering that we have IOMMU dirty tracking. >>> >>> Comments and feedback appreciated. >>> >>> Cheers, >>> Joao >>> >>> P.S. Suggest linux-next (or future v6.11) as hypervisor kernel as there's >>> some bugs fixed there with regards to IOMMU hugepage dirty tracking. >>> >>> Changes since RFCv2[4]: >>> * Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING >even if >>> we end up not actually toggling dirty tracking. (Avihai) >>> * Fix error handling widely in auto domains logic and all patches (Avihai) >>> * Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong) >>> * New patches 1 and 2 taking into consideration previous comments. >>> * Store hwpt::flags to know if we have dirty tracking (Avihai) >>> * New patch 8, that allows to query dirty tracking support after >>> provisioning. This is a cleaner way to check IOMMU dirty tracking support >>> when vfio::migration is iniitalized, as opposed to RFCv2 via device caps. >>> device caps way is still used because at vfio attach we aren't yet with >>> a fully initialized migration state. >>> * Adopt error propagation in query,set dirty tracking >>> * Misc improvements overall broadly and Avihai >>> * Drop hugepages as it's a bit unrelated; I can pursue that patch >>> * separately. The main motivation is to provide a way to test >>> without hugepages similar to what >vfio_type1_iommu.disable_hugepages=1 >>> does. >>> >>> Changes since RFCv1[2]: >>> * Remove intel/amd dirty tracking emulation enabling >>> * Remove the dirtyrate improvement for VF/IOMMU dirty tracking >>> [Will pursue these two in separate series] >>> * Introduce auto domains support >>> * Enforce dirty tracking following the IOMMUFD UAPI for this >>> * Add support for toggling hugepages in IOMMUFD >>> * Auto enable support when VF supports migration to use IOMMU >>> when it doesn't have VF dirty tracking >>> * Add a parameter to toggle VF dirty tracking >>> >>> [0] >>> https://lore.kernel.org/qemu-devel/20240201072818.327930-1- >zhenzhong.duan@intel.com/ >>> [1] >>> https://lore.kernel.org/qemu-devel/20240201072818.327930-10- >zhenzhong.duan@intel.com/ >>> [2] >>> https://lore.kernel.org/qemu-devel/20220428211351.3897-1- >joao.m.martins@oracle.com/ >>> [3] >>> https://lore.kernel.org/qemu-devel/20230622214845.3980-1- >joao.m.martins@oracle.com/ >>> [4] >>> https://lore.kernel.org/qemu-devel/20240212135643.5858-1- >joao.m.martins@oracle.com/ >>> >>> Joao Martins (10): >>> vfio/iommufd: don't fail to realize on IOMMU_GET_HW_INFO failure >>> backends/iommufd: Extend iommufd_backend_get_device_info() to >fetch HW >>> capabilities >>> vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() >>> vfio/iommufd: Introduce auto domain creation >>> vfio/iommufd: Probe and request hwpt dirty tracking capability >>> vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking >support >>> vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap >support >>> vfio/iommufd: Parse hw_caps and store dirty tracking support >>> vfio/migration: Don't block migration device dirty tracking is >unsupported >>> vfio/common: Allow disabling device dirty page tracking >>> >>> include/hw/vfio/vfio-common.h | 11 ++ >>> include/sysemu/host_iommu_device.h | 2 + >>> include/sysemu/iommufd.h | 12 +- >>> backends/iommufd.c | 81 ++++++++++- >>> hw/vfio/common.c | 3 + >>> hw/vfio/iommufd.c | 217 +++++++++++++++++++++++++++-- >>> hw/vfio/migration.c | 7 +- >>> hw/vfio/pci.c | 3 + >>> backends/trace-events | 3 + >>> 9 files changed, 325 insertions(+), 14 deletions(-) >> >> >> I am a bit confused with all the inline proposals. Would you mind >> resending a v4 please ? >> > >Yeap, I'll send it out today, or worst case tomorrow morning. > >> Regarding my comments on error handling, >> >> The error should be set in case of failure, which means a routine >> can not return 'false' or '-errno' and not setting 'Error **' >> parameter at the same time. >> >> If the returned value needs to be interpreted in some ways, for a >> retry or any reason, then it makes sense to use an int, else please >> use a bool. This is to avoid random negative values being interpreted >> as an errno when they are not. >> >OK, I'll retain the Error* creation even when expecting to test the errno. > >> With VFIO migration support, low level errors (from the adapter FW >> through the VFIO PCI variant driver) now reach to the core migration >> subsystem. It is preferable to propagate this error, possibly literal, >> to the VMM, monitor or libvirt. It's not fully symmetric today because >> the log_global_stop handler for dirty tracking enablement is not >> addressed. Anyhow, an effort on error reporting needs to be made and >> any use of error_report() in a low level function is a sign for >> improvement. >> >Gotcha. My earlier comment was mostly that it sounded like there was no >place >for returning -errno, but it seems it's not that binary and the Error* is the >thing that really matters here. > >> I think it would have value to probe early the host IOMMU device for >> its HW features. If the results were cached in the HostIOMMUDevice >> struct, it would then remove unnecessary and redundant calls to the >> host kernel and avoid error handling in complex code paths. I hope >> this is feasible. I haven't looked closely tbh. >> >OK, I'll post in this series what I had inline[0], as that's what I did. > >[0] >https://lore.kernel.org/qemu-devel/4e85db04-fbaa-4a6b-b133- >59170c471e24@oracle.com/ > >The gotcha in my opinion is that I cache IOMMUFD specific data returned by >the >GET_HW_INFO ioctl inside a new HostIOMMUDeviceCaps::iommufd. The >reason being >that vfio_device_get_aw_bits() has a hidden assumption that the container >is >already populated with the list of allowed iova ranges, which is not true for >the first device. So rather than have partial set of caps initialized, I >essentially ended up with fetching the raw caps and store them, and serialize >caps into named features (e.g. caps::aw_bits) in >HostIOMMUDevice::realize(). Another way is to call vfio_device_get_aw_bits() and return its result directly in get_cap(), then no need to initialize caps::aw_bits. This way host IOMMU device can be moved ahead as Cédric suggested. Thanks Zhenzhong > >> We are reaching soft freeze, in ~10 days. There is a chance this >> proposal could make it for 9.1. >> >> Thanks, >> >> C. >>
On 11/07/2024 11:22, Duan, Zhenzhong wrote: > > >> -----Original Message----- >> From: Joao Martins <joao.m.martins@oracle.com> >> Subject: Re: [PATCH v3 00/10] hw/vfio: IOMMUFD Dirty Tracking >> >> On 11/07/2024 08:41, Cédric Le Goater wrote: >>> Hello Joao, >>> >>> On 7/8/24 4:34 PM, Joao Martins wrote: >>>> This small series adds support for IOMMU dirty tracking support via the >>>> IOMMUFD backend. The hardware capability is available on most recent >> x86 >>>> hardware. The series is divided organized as follows: >>>> >>>> * Patch 1: Fixes a regression into mdev support with IOMMUFD. This >>>> one is independent of the series but happened to cross it >>>> while testing mdev with this series >>>> >>>> * Patch 2: Adds a support to iommufd_get_device_info() for capabilities >>>> >>>> * Patches 3 - 7: IOMMUFD backend support for dirty tracking; >>>> >>>> Introduce auto domains -- Patch 3 goes into more detail, but the gist is >> that >>>> we will find and attach a device to a compatible IOMMU domain, or >> allocate a new >>>> hardware pagetable *or* rely on kernel IOAS attach (for mdevs). >> Afterwards the >>>> workflow is relatively simple: >>>> >>>> 1) Probe device and allow dirty tracking in the HWPT >>>> 2) Toggling dirty tracking on/off >>>> 3) Read-and-clear of Dirty IOVAs >>>> >>>> The heuristics selected for (1) were to always request the HWPT for >>>> dirty tracking if supported, or rely on device dirty page tracking. This >>>> is a little simplistic and we aren't necessarily utilizing IOMMU dirty >>>> tracking even if we ask during hwpt allocation. >>>> >>>> The unmap case is deferred until further vIOMMU support with migration >>>> is added[3] which will then introduce the usage of >>>> IOMMU_HWPT_GET_DIRTY_BITMAP_NO_CLEAR in GET_DIRTY_BITMAP >> ioctl in the >>>> dma unmap bitmap flow. >>>> >>>> * Patches 8-10: Don't block live migration where there's no VF dirty >>>> tracker, considering that we have IOMMU dirty tracking. >>>> >>>> Comments and feedback appreciated. >>>> >>>> Cheers, >>>> Joao >>>> >>>> P.S. Suggest linux-next (or future v6.11) as hypervisor kernel as there's >>>> some bugs fixed there with regards to IOMMU hugepage dirty tracking. >>>> >>>> Changes since RFCv2[4]: >>>> * Always allocate hwpt with IOMMU_HWPT_ALLOC_DIRTY_TRACKING >> even if >>>> we end up not actually toggling dirty tracking. (Avihai) >>>> * Fix error handling widely in auto domains logic and all patches (Avihai) >>>> * Reuse iommufd_backend_get_device_info() for capabilities (Zhenzhong) >>>> * New patches 1 and 2 taking into consideration previous comments. >>>> * Store hwpt::flags to know if we have dirty tracking (Avihai) >>>> * New patch 8, that allows to query dirty tracking support after >>>> provisioning. This is a cleaner way to check IOMMU dirty tracking support >>>> when vfio::migration is iniitalized, as opposed to RFCv2 via device caps. >>>> device caps way is still used because at vfio attach we aren't yet with >>>> a fully initialized migration state. >>>> * Adopt error propagation in query,set dirty tracking >>>> * Misc improvements overall broadly and Avihai >>>> * Drop hugepages as it's a bit unrelated; I can pursue that patch >>>> * separately. The main motivation is to provide a way to test >>>> without hugepages similar to what >> vfio_type1_iommu.disable_hugepages=1 >>>> does. >>>> >>>> Changes since RFCv1[2]: >>>> * Remove intel/amd dirty tracking emulation enabling >>>> * Remove the dirtyrate improvement for VF/IOMMU dirty tracking >>>> [Will pursue these two in separate series] >>>> * Introduce auto domains support >>>> * Enforce dirty tracking following the IOMMUFD UAPI for this >>>> * Add support for toggling hugepages in IOMMUFD >>>> * Auto enable support when VF supports migration to use IOMMU >>>> when it doesn't have VF dirty tracking >>>> * Add a parameter to toggle VF dirty tracking >>>> >>>> [0] >>>> https://lore.kernel.org/qemu-devel/20240201072818.327930-1- >> zhenzhong.duan@intel.com/ >>>> [1] >>>> https://lore.kernel.org/qemu-devel/20240201072818.327930-10- >> zhenzhong.duan@intel.com/ >>>> [2] >>>> https://lore.kernel.org/qemu-devel/20220428211351.3897-1- >> joao.m.martins@oracle.com/ >>>> [3] >>>> https://lore.kernel.org/qemu-devel/20230622214845.3980-1- >> joao.m.martins@oracle.com/ >>>> [4] >>>> https://lore.kernel.org/qemu-devel/20240212135643.5858-1- >> joao.m.martins@oracle.com/ >>>> >>>> Joao Martins (10): >>>> vfio/iommufd: don't fail to realize on IOMMU_GET_HW_INFO failure >>>> backends/iommufd: Extend iommufd_backend_get_device_info() to >> fetch HW >>>> capabilities >>>> vfio/iommufd: Return errno in iommufd_cdev_attach_ioas_hwpt() >>>> vfio/iommufd: Introduce auto domain creation >>>> vfio/iommufd: Probe and request hwpt dirty tracking capability >>>> vfio/iommufd: Implement VFIOIOMMUClass::set_dirty_tracking >> support >>>> vfio/iommufd: Implement VFIOIOMMUClass::query_dirty_bitmap >> support >>>> vfio/iommufd: Parse hw_caps and store dirty tracking support >>>> vfio/migration: Don't block migration device dirty tracking is >> unsupported >>>> vfio/common: Allow disabling device dirty page tracking >>>> >>>> include/hw/vfio/vfio-common.h | 11 ++ >>>> include/sysemu/host_iommu_device.h | 2 + >>>> include/sysemu/iommufd.h | 12 +- >>>> backends/iommufd.c | 81 ++++++++++- >>>> hw/vfio/common.c | 3 + >>>> hw/vfio/iommufd.c | 217 +++++++++++++++++++++++++++-- >>>> hw/vfio/migration.c | 7 +- >>>> hw/vfio/pci.c | 3 + >>>> backends/trace-events | 3 + >>>> 9 files changed, 325 insertions(+), 14 deletions(-) >>> >>> >>> I am a bit confused with all the inline proposals. Would you mind >>> resending a v4 please ? >>> >> >> Yeap, I'll send it out today, or worst case tomorrow morning. >> >>> Regarding my comments on error handling, >>> >>> The error should be set in case of failure, which means a routine >>> can not return 'false' or '-errno' and not setting 'Error **' >>> parameter at the same time. >>> >>> If the returned value needs to be interpreted in some ways, for a >>> retry or any reason, then it makes sense to use an int, else please >>> use a bool. This is to avoid random negative values being interpreted >>> as an errno when they are not. >>> >> OK, I'll retain the Error* creation even when expecting to test the errno. >> >>> With VFIO migration support, low level errors (from the adapter FW >>> through the VFIO PCI variant driver) now reach to the core migration >>> subsystem. It is preferable to propagate this error, possibly literal, >>> to the VMM, monitor or libvirt. It's not fully symmetric today because >>> the log_global_stop handler for dirty tracking enablement is not >>> addressed. Anyhow, an effort on error reporting needs to be made and >>> any use of error_report() in a low level function is a sign for >>> improvement. >>> >> Gotcha. My earlier comment was mostly that it sounded like there was no >> place >> for returning -errno, but it seems it's not that binary and the Error* is the >> thing that really matters here. >> >>> I think it would have value to probe early the host IOMMU device for >>> its HW features. If the results were cached in the HostIOMMUDevice >>> struct, it would then remove unnecessary and redundant calls to the >>> host kernel and avoid error handling in complex code paths. I hope >>> this is feasible. I haven't looked closely tbh. >>> >> OK, I'll post in this series what I had inline[0], as that's what I did. >> >> [0] >> https://lore.kernel.org/qemu-devel/4e85db04-fbaa-4a6b-b133- >> 59170c471e24@oracle.com/ >> >> The gotcha in my opinion is that I cache IOMMUFD specific data returned by >> the >> GET_HW_INFO ioctl inside a new HostIOMMUDeviceCaps::iommufd. The >> reason being >> that vfio_device_get_aw_bits() has a hidden assumption that the container >> is >> already populated with the list of allowed iova ranges, which is not true for >> the first device. So rather than have partial set of caps initialized, I >> essentially ended up with fetching the raw caps and store them, and serialize >> caps into named features (e.g. caps::aw_bits) in >> HostIOMMUDevice::realize(). > > Another way is to call vfio_device_get_aw_bits() and return its result directly > in get_cap(), then no need to initialize caps::aw_bits. > This way host IOMMU device can be moved ahead as Cédric suggested. Oh, yes, that's a great alternative. Let me adopt that instead and we don't need to make so huge changes structure wise.
© 2016 - 2024 Red Hat, Inc.