MAINTAINERS | 2 + hw/i386/intel_iommu_internal.h | 8 ++ include/hw/i386/intel_iommu.h | 3 + include/hw/pci/pci.h | 38 ++++- include/hw/vfio/vfio-common.h | 26 ++++ include/hw/vfio/vfio-container-base.h | 3 + include/qemu/range.h | 11 ++ include/sysemu/host_iommu_device.h | 95 +++++++++++++ include/sysemu/iommufd.h | 34 +++++ backends/host_iommu_device.c | 59 ++++++++ backends/iommufd.c | 75 +++++++--- hw/i386/intel_iommu.c | 197 ++++++++++++++++++++------ hw/pci/pci.c | 75 +++++++++- hw/vfio/common.c | 18 ++- hw/vfio/container.c | 49 ++++++- hw/vfio/iommufd.c | 52 ++++++- hw/vfio/pci.c | 20 ++- backends/Kconfig | 5 + backends/meson.build | 1 + 19 files changed, 701 insertions(+), 70 deletions(-) create mode 100644 include/sysemu/host_iommu_device.h create mode 100644 backends/host_iommu_device.c
Hi, The most important change in this version is instroducing a common HostIOMMUDeviceCaps structure in HostIOMMUDevice and a new interface between vIOMMU and HostIOMMUDevice. HostIOMMUDeviceClass::realize() is introduced to initialize HostIOMMUDeviceCaps and other fields of HostIOMMUDevice variants. HostIOMMUDeviceClass::check_cap() is introduced to query host IOMMU device capabilities. After the change, part2 is only 3 patches, so merge it with part1 to be a single prerequisite series, same for changelog. If anyone doesn't like that, I can split again. The class tree is as below: HostIOMMUDevice | .caps | .realize() | .check_cap() | .-----------------------------------------------. | | | HostIOMMUDeviceLegacyVFIO {HostIOMMUDeviceLegacyVDPA} HostIOMMUDeviceIOMMUFD | .vdev | {.vdev} | .iommufd | .devid | [.ioas_id] | [.attach_hwpt()] | [.detach_hwpt()] | .----------------------. | | HostIOMMUDeviceIOMMUFDVFIO {HostIOMMUDeviceIOMMUFDVDPA} | .vdev | {.vdev} * The attributes in [] will be implemented in nesting series. * The classes in {} will be implemented in future. * .vdev in different class points to different agent device, * i.e., for VFIO it points to VFIODevice. PATCH1-4: Introduce HostIOMMUDevice and its sub classes PATCH5-11: Introduce HostIOMMUDeviceCaps, implement .realize() and .check_cap() handler PATCH12-16: Create HostIOMMUDevice instance and pass to vIOMMU PATCH17-19: Implement compatibility check between host IOMMU and vIOMMU(intel_iommu) Qemu code can be found at: https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_preq_v3 Besides the compatibility check in this series, in nesting series, this host IOMMU device is extended for much wider usage. For anyone interested on the nesting series, here is the link: https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_rfcv2 Thanks Zhenzhong Changelog: v3: - refine declaration and doc for HostIOMMUDevice (Cédric, Philippe) - introduce HostIOMMUDeviceCaps, .realize() and .check_cap() (Cédric) - introduce helper range_get_last_bit() for range operation (Cédric) - separate pci_device_get_iommu_bus_devfn() in a prereq patch (Cédric) - replace HIOD_ abbreviation with HOST_IOMMU_DEVICE_ (Cédric) - add header in include/sysemu/iommufd.h (Cédric) v2: - use QOM to abstract host IOMMU device and its sub-classes (Cédric) - move host IOMMU device creation in attach_device() (Cédric) - refine pci_device_set/unset_iommu_device doc further (Eric) - define host IOMMU info format of different backend - implement get_host_iommu_info() for different backend (Cédric) - drop cap/ecap update logic (MST) - check aw-bits from get_host_iommu_info() in legacy mode v1: - use HostIOMMUDevice handle instead of union in VFIODevice (Eric) - change host_iommu_device_init to host_iommu_device_create - allocate HostIOMMUDevice in host_iommu_device_create callback and set the VFIODevice base_hdev handle (Eric) - refine pci_device_set/unset_iommu_device doc (Eric) - use HostIOMMUDevice handle instead of union in VTDHostIOMMUDevice (Eric) - convert HostIOMMUDevice to sub object pointer in vtd_check_hdev rfcv2: - introduce common abstract HostIOMMUDevice and sub struct for different BEs (Eric, Cédric) - remove iommufd_device.[ch] (Cédric) - remove duplicate iommufd/devid define from VFIODevice (Eric) - drop the p in aliased_pbus and aliased_pdevfn (Eric) - assert devfn and iommu_bus in pci_device_get_iommu_bus_devfn (Cédric, Eric) - use errp in iommufd_device_get_info (Eric) - split and simplify cap/ecap check/sync code in intel_iommu.c (Cédric) - move VTDHostIOMMUDevice declaration to intel_iommu_internal.h (Cédric) - make '(vtd->cap_reg >> 16) & 0x3fULL' a MACRO and add missed '+1' (Cédric) - block migration if vIOMMU cap/ecap updated based on host IOMMU cap/ecap - add R-B Yi Liu (2): hw/pci: Introduce pci_device_[set|unset]_iommu_device() intel_iommu: Implement [set|unset]_iommu_device() callbacks Zhenzhong Duan (17): backends: Introduce HostIOMMUDevice abstract vfio/container: Introduce HostIOMMUDeviceLegacyVFIO device backends/iommufd: Introduce abstract HostIOMMUDeviceIOMMUFD device vfio/iommufd: Introduce HostIOMMUDeviceIOMMUFDVFIO device backends/host_iommu_device: Introduce HostIOMMUDeviceCaps range: Introduce range_get_last_bit() vfio/container: Implement HostIOMMUDeviceClass::realize() handler backends/iommufd: Introduce helper function iommufd_backend_get_device_info() vfio/iommufd: Implement HostIOMMUDeviceClass::realize() handler vfio/container: Implement HostIOMMUDeviceClass::check_cap() handler backends/iommufd: Implement HostIOMMUDeviceClass::check_cap() handler vfio: Introduce VFIOIOMMUClass::hiod_typename attribute vfio: Create host IOMMU device instance hw/pci: Introduce helper function pci_device_get_iommu_bus_devfn() vfio/pci: Pass HostIOMMUDevice to vIOMMU intel_iommu: Extract out vtd_cap_init() to initialize cap/ecap intel_iommu: Check compatibility with host IOMMU capabilities MAINTAINERS | 2 + hw/i386/intel_iommu_internal.h | 8 ++ include/hw/i386/intel_iommu.h | 3 + include/hw/pci/pci.h | 38 ++++- include/hw/vfio/vfio-common.h | 26 ++++ include/hw/vfio/vfio-container-base.h | 3 + include/qemu/range.h | 11 ++ include/sysemu/host_iommu_device.h | 95 +++++++++++++ include/sysemu/iommufd.h | 34 +++++ backends/host_iommu_device.c | 59 ++++++++ backends/iommufd.c | 75 +++++++--- hw/i386/intel_iommu.c | 197 ++++++++++++++++++++------ hw/pci/pci.c | 75 +++++++++- hw/vfio/common.c | 18 ++- hw/vfio/container.c | 49 ++++++- hw/vfio/iommufd.c | 52 ++++++- hw/vfio/pci.c | 20 ++- backends/Kconfig | 5 + backends/meson.build | 1 + 19 files changed, 701 insertions(+), 70 deletions(-) create mode 100644 include/sysemu/host_iommu_device.h create mode 100644 backends/host_iommu_device.c -- 2.34.1
On 4/29/24 08:50, Zhenzhong Duan wrote: > Hi, > > The most important change in this version is instroducing a common > HostIOMMUDeviceCaps structure in HostIOMMUDevice and a new interface > between vIOMMU and HostIOMMUDevice. > > HostIOMMUDeviceClass::realize() is introduced to initialize > HostIOMMUDeviceCaps and other fields of HostIOMMUDevice variants. > > HostIOMMUDeviceClass::check_cap() is introduced to query host IOMMU > device capabilities. > > After the change, part2 is only 3 patches, so merge it with part1 to be > a single prerequisite series, same for changelog. If anyone doesn't like > that, I can split again. > > The class tree is as below: > > HostIOMMUDevice > | .caps > | .realize() > | .check_cap() > | > .-----------------------------------------------. > | | | > HostIOMMUDeviceLegacyVFIO {HostIOMMUDeviceLegacyVDPA} HostIOMMUDeviceIOMMUFD > | .vdev | {.vdev} | .iommufd > | .devid > | [.ioas_id] > | [.attach_hwpt()] > | [.detach_hwpt()] > | > .----------------------. > | | > HostIOMMUDeviceIOMMUFDVFIO {HostIOMMUDeviceIOMMUFDVDPA} > | .vdev | {.vdev} > > * The attributes in [] will be implemented in nesting series. > * The classes in {} will be implemented in future. > * .vdev in different class points to different agent device, > * i.e., for VFIO it points to VFIODevice. > > PATCH1-4: Introduce HostIOMMUDevice and its sub classes > PATCH5-11: Introduce HostIOMMUDeviceCaps, implement .realize() and .check_cap() handler > PATCH12-16: Create HostIOMMUDevice instance and pass to vIOMMU > PATCH17-19: Implement compatibility check between host IOMMU and vIOMMU(intel_iommu) > > Qemu code can be found at: > https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_preq_v3 > > Besides the compatibility check in this series, in nesting series, this > host IOMMU device is extended for much wider usage. For anyone interested > on the nesting series, here is the link: > https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_rfcv2 v4 should be a good candidate, we will need feedback from the vIOMMU maintainers though. However, have you considered another/complementary approach which would be to create an host IOMMU (iommufd) backend object and a vIOMMU device object together for each vfio-pci device being plugged in the machine ? Something like, -device pcie-root-port,port=23,chassis=8,id=pci.8,bus=pcie.0 \ -object iommufd,id=iommufd1 \ -device intel-iommu,intremap=on,device-iotlb=on,caching-mode=on,iommufd=iommufd1 \ -device vfio-pci,host=0000:08:10.0,bus=pci.1,iommufd=iommufd0 The vIOMMU device would be linked to the host IOMMU (iommufd) backend object at realize time and it would simplify the discovery of the host IOMMU properties. The implementation would be more straight forward. That said, I didn't study deeply what needs to be done. The vIOMMU implementation is not ready yet to support multiple instances and some massaging is needed to change that first. Thanks, C.
Hi Cédric, >-----Original Message----- >From: Cédric Le Goater <clg@redhat.com> >Sent: Friday, May 3, 2024 10:04 PM >To: Duan, Zhenzhong <zhenzhong.duan@intel.com>; qemu- >devel@nongnu.org >Cc: alex.williamson@redhat.com; eric.auger@redhat.com; mst@redhat.com; >peterx@redhat.com; jasowang@redhat.com; jgg@nvidia.com; >nicolinc@nvidia.com; joao.m.martins@oracle.com; Tian, Kevin ><kevin.tian@intel.com>; Liu, Yi L <yi.l.liu@intel.com>; Peng, Chao P ><chao.p.peng@intel.com> >Subject: Re: [PATCH v3 00/19] Add a host IOMMU device abstraction to >check with vIOMMU > >On 4/29/24 08:50, Zhenzhong Duan wrote: >> Hi, >> >> The most important change in this version is instroducing a common >> HostIOMMUDeviceCaps structure in HostIOMMUDevice and a new >interface >> between vIOMMU and HostIOMMUDevice. >> >> HostIOMMUDeviceClass::realize() is introduced to initialize >> HostIOMMUDeviceCaps and other fields of HostIOMMUDevice variants. >> >> HostIOMMUDeviceClass::check_cap() is introduced to query host IOMMU >> device capabilities. >> >> After the change, part2 is only 3 patches, so merge it with part1 to be >> a single prerequisite series, same for changelog. If anyone doesn't like >> that, I can split again. >> >> The class tree is as below: >> >> HostIOMMUDevice >> | .caps >> | .realize() >> | .check_cap() >> | >> .-----------------------------------------------. >> | | | >> HostIOMMUDeviceLegacyVFIO {HostIOMMUDeviceLegacyVDPA} >HostIOMMUDeviceIOMMUFD >> | .vdev | {.vdev} | .iommufd >> | .devid >> | [.ioas_id] >> | [.attach_hwpt()] >> | [.detach_hwpt()] >> | >> .----------------------. >> | | >> HostIOMMUDeviceIOMMUFDVFIO >{HostIOMMUDeviceIOMMUFDVDPA} >> | .vdev | {.vdev} >> >> * The attributes in [] will be implemented in nesting series. >> * The classes in {} will be implemented in future. >> * .vdev in different class points to different agent device, >> * i.e., for VFIO it points to VFIODevice. >> >> PATCH1-4: Introduce HostIOMMUDevice and its sub classes >> PATCH5-11: Introduce HostIOMMUDeviceCaps, implement .realize() >and .check_cap() handler >> PATCH12-16: Create HostIOMMUDevice instance and pass to vIOMMU >> PATCH17-19: Implement compatibility check between host IOMMU and >vIOMMU(intel_iommu) >> >> Qemu code can be found at: >> >https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_pre >q_v3 >> >> Besides the compatibility check in this series, in nesting series, this >> host IOMMU device is extended for much wider usage. For anyone >interested >> on the nesting series, here is the link: >> >https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_rfc >v2 > > >v4 should be a good candidate, we will need feedback from the vIOMMU >maintainers though. > >However, have you considered another/complementary approach which >would be to create an host IOMMU (iommufd) backend object and a >vIOMMU >device object together for each vfio-pci device being plugged in the >machine ? I did consider about a single iommufd instance for qemu and finally chose to support multiple iommufd instances, reason below: I was taking iommufd as a backend of VFIO device not a backend of vIOMMU. So there is an iommufd property linked to iommufd instances. We do support multiple iommufd instances in nesting series just as we do in cdev series, such as: -device intel-iommu,caching-mode=on,dma-drain=on,device-iotlb=on,x-scalable-mode=modern -object iommufd,id=iommufd0 -device vfio-pci,host=6f:01.0,id=vfio0,bus=root0,iommufd=iommufd0 -object iommufd,id=iommufd1 -device vfio-pci,host=6f:01.1,id=vfio1,bus=root1,iommufd=iommufd1 -device vfio-pci,host=6f:01.2,id=vfio2,bus=root2 Adding iommufd property to vIOMMU will limit the whole qemu to use only one iommufd instance, it's also confusing if there is also vfio device with legacy backend. I'm not clear how useful multiple iommufd instances support are. One possible benefit is for security? It may bring a slightly fine-grained isolation in kernel. We can discuss this further, if it's unnecessary to support multiple iommufd instances in nesting series, I can change to single iommufd instance support and add an iommufd property for vIOMMU just as you suggested. Thanks Zhenzhong > >Something like, > > -device pcie-root-port,port=23,chassis=8,id=pci.8,bus=pcie.0 \ > -object iommufd,id=iommufd1 \ > -device intel-iommu,intremap=on,device-iotlb=on,caching- >mode=on,iommufd=iommufd1 \ > -device vfio-pci,host=0000:08:10.0,bus=pci.1,iommufd=iommufd0 > >The vIOMMU device would be linked to the host IOMMU (iommufd) backend >object at realize time and it would simplify the discovery of the host >IOMMU properties. The implementation would be more straight forward. > >That said, I didn't study deeply what needs to be done. The vIOMMU >implementation is not ready yet to support multiple instances and some >massaging is needed to change that first.
On Mon, May 06, 2024 at 02:30:47AM +0000, Duan, Zhenzhong wrote: > I'm not clear how useful multiple iommufd instances support are. > One possible benefit is for security? It may bring a slightly fine-grained > isolation in kernel. No. I don't think there is any usecase, it is only harmful. Jason
>-----Original Message----- >From: Jason Gunthorpe <jgg@nvidia.com> >Subject: Re: [PATCH v3 00/19] Add a host IOMMU device abstraction to >check with vIOMMU > >On Mon, May 06, 2024 at 02:30:47AM +0000, Duan, Zhenzhong wrote: > >> I'm not clear how useful multiple iommufd instances support are. >> One possible benefit is for security? It may bring a slightly fine-grained >> isolation in kernel. > >No. I don't think there is any usecase, it is only harmful. OK, so we need to limit QEMU to only one iommufd instance. In cdev series, we support mix of legacy and iommufd backend and multiple iommufd backend instances for flexibility. We need to make a choice to have this limitation only for nesting series or globally(including cdev). May I ask what harmfulness we may have? Thanks Zhenzhong
On Tue, May 07, 2024 at 02:24:30AM +0000, Duan, Zhenzhong wrote: > >On Mon, May 06, 2024 at 02:30:47AM +0000, Duan, Zhenzhong wrote: > > > >> I'm not clear how useful multiple iommufd instances support are. > >> One possible benefit is for security? It may bring a slightly fine-grained > >> isolation in kernel. > > > >No. I don't think there is any usecase, it is only harmful. > > OK, so we need to limit QEMU to only one iommufd instance. I don't know about limit, but you don't need to do extra stuff to make it work. The main issue will be to get all the viommu instances to share the same iommufd IOAS for the guest physical mapping. Otherwise each viommu should be largely unware of the others sharing (or not) a iommufd. If you can structure things properly it probably doesn't need a hard limit, it will just work worse. Jason
>-----Original Message----- >From: Jason Gunthorpe <jgg@nvidia.com> >Subject: Re: [PATCH v3 00/19] Add a host IOMMU device abstraction to >check with vIOMMU > >On Tue, May 07, 2024 at 02:24:30AM +0000, Duan, Zhenzhong wrote: >> >On Mon, May 06, 2024 at 02:30:47AM +0000, Duan, Zhenzhong wrote: >> > >> >> I'm not clear how useful multiple iommufd instances support are. >> >> One possible benefit is for security? It may bring a slightly fine-grained >> >> isolation in kernel. >> > >> >No. I don't think there is any usecase, it is only harmful. >> >> OK, so we need to limit QEMU to only one iommufd instance. > >I don't know about limit, but you don't need to do extra stuff to make >it work. > >The main issue will be to get all the viommu instances to share the >same iommufd IOAS for the guest physical mapping. Otherwise each >viommu should be largely unware of the others sharing (or not) a >iommufd. I see. > >If you can structure things properly it probably doesn't need a hard >limit, it will just work worse. OK, thanks for clarify. The extra code to support multiple instances in intel_iommu is trivial. So I'd like to keep this flexibility to user just like cdev. User can configure QEMU cmdline to use one IOMMUFD instance easily whenever they want. Thanks Zhenzhong
On Fri, May 03, 2024 at 04:04:25PM +0200, Cédric Le Goater wrote: > However, have you considered another/complementary approach which > would be to create an host IOMMU (iommufd) backend object and a vIOMMU > device object together for each vfio-pci device being plugged in the > machine ? > > Something like, > -device pcie-root-port,port=23,chassis=8,id=pci.8,bus=pcie.0 \ > -object iommufd,id=iommufd1 \ > -device intel-iommu,intremap=on,device-iotlb=on,caching-mode=on,iommufd=iommufd1 \ > -device vfio-pci,host=0000:08:10.0,bus=pci.1,iommufd=iommufd0 ? The main point of this is to have a single iommufd FD open in qemu. Not multiple. Would you achieve this with a iommufd0 and iommufd1 ? Jason
On 5/3/24 16:10, Jason Gunthorpe wrote: > On Fri, May 03, 2024 at 04:04:25PM +0200, Cédric Le Goater wrote: >> However, have you considered another/complementary approach which >> would be to create an host IOMMU (iommufd) backend object and a vIOMMU >> device object together for each vfio-pci device being plugged in the >> machine ? >> >> Something like, >> -device pcie-root-port,port=23,chassis=8,id=pci.8,bus=pcie.0 \ >> -object iommufd,id=iommufd1 \ >> -device intel-iommu,intremap=on,device-iotlb=on,caching-mode=on,iommufd=iommufd1 \ >> -device vfio-pci,host=0000:08:10.0,bus=pci.1,iommufd=iommufd0 > > ? The main point of this is to have a single iommufd FD open in > qemu. Not multiple. oups. The above example should have the same IOMMUFD object device instance: iommufd0. This is bogus copy-paste of a command line with multiple vfio-pci devices, each using its own IOMMUFD object device instance. That's where the idea comes from. Sorry for the noise. C.
© 2016 - 2024 Red Hat, Inc.