This applies on top of vfio-next:
https://github.com/legoater/qemu/, vfio-next branch

On x86, when assigning VFIO-PCI devices protected with virtio-iommu,
we encounter the case where the guest tries to map IOVAs beyond 48b
whereas the physical VT-d IOMMU only supports 48b. This ends up with
VFIO_MAP_DMA failures at QEMU level because, at kernel level, the
vfio_iommu_iova_dma_valid() check returns false in vfio_dma_do_map().

This is due to the fact that the virtio-iommu currently unconditionally
exposes an IOVA range of 64b through its config input range fields.

This series removes this assumption by retrieving the usable IOVA
regions through the VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE UAPI when
a VFIO device is attached. This info is communicated to the
virtio-iommu memory region and transformed into the inverse info, i.e.
the host reserved IOVA regions. The latter are then combined with the
reserved IOVA regions set through the virtio-iommu reserved-regions
property. That way, the guest virtio-iommu driver, unchanged, is
able to probe the whole set of reserved regions and prevent any IOVA
belonging to those ranges from being used, achieving the original goal.

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu/tree/vfio-next-iommu_geometry-v3

History:

v2 -> v3:
- rebase on top of vfio-next (including the iommufd prereq)
- take into account that the IOVA range info capability may not be
  offered by an old kernel and use nr_iovas = -1 to encode that [Alex]
- use GList * everywhere instead of arrays (in range_inverse_array),
  with the benefit that it sorts the ranges retrieved from the kernel,
  which are not guaranteed to be sorted.
  Rework the tests accordingly [Alex]
- Make sure the resv_regions GList is built before the probe() [Jean]:
  the per-device list is first populated with the property resv regions
  on IOMMUDevice creation and then rebuilt on set_iova()
- Add a warning if set_iova builds a valid list after probe was
  called [Jean]
- Build host windows on top of the valid IOVA ranges if this info can
  be retrieved from the kernel. As many windows are created as there
  are valid ranges

v1 -> v2:
- Remove "[PATCH 12/13] virtio-iommu: Resize memory region according
  to the max iova info", which causes way too much trouble: it triggers
  a coredump in vhost and causes duplication of IOMMU notifiers leading
  to EEXIST vfio_dma_map errors, ... This looks like bad usage of the
  memory API, so I prefer removing it from this series. As a
  consequence, I was also obliged to remove the vfio_find_hostwin()
  check in the IOMMU case.
- Let range_inverse_array() take low/high args instead of hardcoding
  0, UINT64_MAX, which complicated both the algorithm and the tests.
- Move the range function descriptions into the header.
- Check that if set_iova_ranges is called several times, the new resv
  regions are included in the previous ones

Eric Auger (13):
  memory: Let ReservedRegion use Range
  memory: Introduce memory_region_iommu_set_iova_ranges
  vfio: Collect container iova range info
  virtio-iommu: Rename reserved_regions into prop_resv_regions
  range: Make range_compare() public
  util/reserved-region: Add new ReservedRegion helpers
  virtio-iommu: Introduce per IOMMUDevice reserved regions
  range: Introduce range_inverse_array()
  virtio-iommu: Record whether a probe request has been issued
  virtio-iommu: Implement set_iova_ranges() callback
  virtio-iommu: Consolidate host reserved regions and property set ones
  test: Add some tests for range and resv-mem helpers
  vfio: Remove 64-bit IOVA address space assumption

 include/exec/memory.h            |  34 +++-
 include/hw/vfio/vfio-common.h    |   2 +
 include/hw/virtio/virtio-iommu.h |   7 +-
 include/qemu/range.h             |  14 ++
 include/qemu/reserved-region.h   |  32 ++++
 hw/core/qdev-properties-system.c |   9 +-
 hw/vfio/common.c                 |  23 ++-
 hw/vfio/container.c              |  67 ++++++-
 hw/virtio/virtio-iommu-pci.c     |   8 +-
 hw/virtio/virtio-iommu.c         | 155 +++++++++++++--
 system/memory.c                  |  13 ++
 tests/unit/test-resv-mem.c       | 318 +++++++++++++++++++++++++++++++
 util/range.c                     |  61 +++++-
 util/reserved-region.c           |  91 +++++++++
 hw/virtio/trace-events           |   1 +
 tests/unit/meson.build           |   1 +
 util/meson.build                 |   1 +
 17 files changed, 791 insertions(+), 46 deletions(-)
 create mode 100644 include/qemu/reserved-region.h
 create mode 100644 tests/unit/test-resv-mem.c
 create mode 100644 util/reserved-region.c

--
2.41.0
On Wed, Oct 11, 2023 at 07:52:16PM +0200, Eric Auger wrote:
> This applies on top of vfio-next:
> https://github.com/legoater/qemu/, vfio-next branch

virtio things make sense

Reviewed-by: Michael S. Tsirkin <mst@redhat.com>

let me know how you want to merge all this.

> On x86, when assigning VFIO-PCI devices protected with virtio-iommu
> we encounter the case where the guest tries to map IOVAs beyond 48b
> whereas the physical VTD IOMMU only supports 48b. This ends up with
> VFIO_MAP_DMA failures at qemu level because at kernel level,
> vfio_iommu_iova_dma_valid() check returns false on vfio_map_do_map().
>
> [...]
On 10/18/23 15:37, Michael S. Tsirkin wrote:
> On Wed, Oct 11, 2023 at 07:52:16PM +0200, Eric Auger wrote:
>> This applies on top of vfio-next:
>> https://github.com/legoater/qemu/, vfio-next branch
>
> virtio things make sense
>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>
> let me know how you want to merge all this.

Michael,

I will grab the series if that's OK.

Thanks,

C.
Hi Cédric,

On 10/19/23 13:07, Cédric Le Goater wrote:
> On 10/18/23 15:37, Michael S. Tsirkin wrote:
>> On Wed, Oct 11, 2023 at 07:52:16PM +0200, Eric Auger wrote:
>>> This applies on top of vfio-next:
>>> https://github.com/legoater/qemu/, vfio-next branch
>>
>> virtio things make sense
>>
>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>
>> let me know how you want to merge all this.
>
> Michael,
>
> I will grab the series if that's OK.

I have just sent a v4 taking into account Alex's suggestions, collecting
Michael's and Alex's R-b, and YangHang's T-b.
This should be ready to go if you don't have any other comments and if
it survives your non-regression tests ;-)

Eric

>
> Thanks,
>
> C.
>
Hello Eric,

On 10/19/23 15:51, Eric Auger wrote:
> Hi Cédric,
>
> On 10/19/23 13:07, Cédric Le Goater wrote:
>> On 10/18/23 15:37, Michael S. Tsirkin wrote:
>>> virtio things make sense
>>>
>>> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>>>
>>> let me know how you want to merge all this.
>>
>> Michael,
>>
>> I will grab the series if that's OK.
>
> I have just sent a v4 taking into account Alex's suggestions, collecting
> Michael's and Alex's R-b, and YangHang's T-b.
> This should be ready to go if you don't have any other comments and if
> this survives your non regression tests ;-)

Sure. I did a simple fix in patch 2 for a documentation breakage.
No need to resend. I should apply it in the morning.

Cheers,

C.
On Thu, Oct 19, 2023 at 01:07:41PM +0200, Cédric Le Goater wrote:
> On 10/18/23 15:37, Michael S. Tsirkin wrote:
> > On Wed, Oct 11, 2023 at 07:52:16PM +0200, Eric Auger wrote:
> > > This applies on top of vfio-next:
> > > https://github.com/legoater/qemu/, vfio-next branch
> >
> > virtio things make sense
> >
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> >
> > let me know how you want to merge all this.
>
> Michael,
>
> I will grab the series if that's OK.

fine by me

> Thanks,
>
> C.
The original issue I found: after starting a VM which has two ice PFs
and a virtio-iommu device, qemu-kvm and the VM guest dmesg throw lots
of duplicate VFIO_MAP_DMA errors.

After testing with Eric's build, the original issue is gone and the
Tier1 regression test against the ice PF and virtio-iommu device
passes as well.

Tested-by: Yanghang Liu <yanghliu@redhat.com>

On Wed, Oct 18, 2023 at 9:45 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Wed, Oct 11, 2023 at 07:52:16PM +0200, Eric Auger wrote:
> > This applies on top of vfio-next:
> > https://github.com/legoater/qemu/, vfio-next branch
>
> virtio things make sense
>
> Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
>
> let me know how you want to merge all this.
>
> [...]
Hi Yanghang,

On 10/19/23 11:07, YangHang Liu wrote:
> The original issue I found : After starting a VM which has two ice PFs
> and a virtio-iommu device, qemu-kvm and VM guest dmesg throw lots of
> duplicate VFIO_MAP_DMA errors
>
> After testing with Eric's build, the original issue is gone and the
> Tier1 regression test against ice PF and virtio iommu device gets PASS
> as well.
>
> Tested-by: Yanghang Liu <yanghliu@redhat.com>

Thank you for testing!

Eric

> [...]