This is v8 of the vt-d vfio enablement series.
v8:
- remove patches 1-9 since merged already
- add David's r-b for all the patches
- add Aviv's s-o-b in the last patch
- rename iommu to iommu_dmar [Jason]
- rename last patch subject to "remote IOTLB" [Jason]
- pick up jason's two patches to fix vhost breakage
- let vhost leverage the new IOMMU notifier interface
v7:
- for the two traces patches: Change subjects. Remove vtd_err() and
vtd_err_nonzero_rsvd() tracers, instead using standalone trace for
each of the places. Don't remove any DPRINTF() if there is no
replacement. [Jason]
- add r-b and a-b for Alex/David/Jason.
- in patch "intel_iommu: renaming gpa to iova where proper", convert
one more place where I missed [Jason]
- fix the place where I should use "~0ULL" not "~0" [Jason]
- squash patch 16 into 18 [Jason]
v6:
- do unmap in all cases when replay [Jason]
- do global replay even if context entry is invalidated [Jason]
- when iommu reset, send unmap to all registered notifiers [Jason]
- use rcu read lock to protect the whole vfio_iommu_map_notify()
[Alex, Paolo]
v5:
- fix patch 4 subject too long, and error spelling [Eric]
- add ack-by for alex in patch 1 [Alex]
- squashing patch 19/20 into patch 18 [Jason]
- fix comments in vtd_page_walk() [Jason]
- remove all error_report() [Jason]
- add comment for patch 18, mentioning that it enables vhost without
ATS as well [Jason]
- remove skipped debug thing during page walk [Jason]
- remove duplicated page walk trace [Jason]
- some tunings in vtd_address_space_unmap(), to provide correct iova
and addr_mask. For this, I tuned this patch as well a bit:
"memory: add section range info for IOMMU notifier"
to loosen the range check
v4:
- convert all error_report()s into traces (in the two patches that did
that)
- rebased to Jason's DMAR series (master + one more patch:
"[PATCH V4 net-next] vhost_net: device IOTLB support")
- let vhost use the new api iommu_notifier_init() so it won't break
vhost dmar [Jason]
- touch commit message of the patch:
"intel_iommu: provide its own replay() callback"
old replay is not a dead loop, but it will just consume lots of time
[Jason]
- add comment for patch:
"intel_iommu: do replay when context invalidate"
telling why replay won't be a problem even without CM=1 [Jason]
- remove a useless comment line [Jason]
- remove dmar_enabled parameter for vtd_switch_address_space() and
vtd_switch_address_space_all() [Mst, Jason]
- merged the vfio patches in, to support unmap of big ranges at the
beginning ("[PATCH RFC 0/3] vfio: allow to notify unmap for very big
region")
- using caching_mode instead of cache_mode_enabled, and "caching-mode"
instead of "cache-mode" [Kevin]
- when receiving a context entry invalidation, we unmap the entire
region first, then replay [Alex]
- fix commit message for patch:
"intel_iommu: simplify irq region translation" [Kevin]
- handle domain/global invalidation, and notify where proper [Jason,
Kevin]
v3:
- fix style error reported by patchew
- fix comment in domain switch patch: use "IOMMU address space" rather
than "IOMMU region" [Kevin]
- add ack-by for Paolo in patch:
"memory: add section range info for IOMMU notifier"
(this was collected separately, outside this thread)
- remove 3 patches which are merged already (from Jason)
- rebase to master b6c0897
v2:
- change comment for "end" parameter in vtd_page_walk() [Tianyu]
- change comment for "a iova" to "an iova" [Yi]
- fix fault printed val for GPA address in vtd_page_walk_level (debug
only)
- rebased to master (rather than Aviv's v6 series) and merged Aviv's
series v6: picked patch 1 (as patch 1 in this series), dropped patch
2, re-wrote patch 3 (as patch 17 of this series).
- picked up two more bugfix patches from Jason's DMAR series
- picked up the following patch as well:
"[PATCH v3] intel_iommu: allow dynamic switch of IOMMU region"
This RFC series is a re-work of Aviv B.D.'s vfio enablement series
with vt-d:
https://lists.gnu.org/archive/html/qemu-devel/2016-11/msg01452.html
Aviv has done a great job there, and what is still missing there is
mostly the following:
(1) VFIO gets duplicated IOTLB notifications because the VT-d IOMMU
memory region is split.
(2) VT-d still doesn't provide a correct replay() mechanism (e.g.,
when the IOMMU domain switches, things will break).
This series should solve the above two issues.
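As a rough picture of what "replay" means in point (2), here is a toy model in C. It is only a sketch: Mapping, Notifier, count_mapping() and replay_to() are hypothetical names made up for illustration and are not the QEMU structures or functions touched by this series. The idea shown is that mappings which already exist are delivered to one specific listener, so a listener that attaches late (or whose domain changed) can catch up without other listeners being notified again.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All types below are hypothetical stand-ins for this sketch. */
typedef struct {
    uint64_t iova;  /* address as seen by the device */
    uint64_t gpa;   /* guest physical address it translates to */
} Mapping;

typedef struct Notifier {
    void (*notify)(struct Notifier *n, const Mapping *m);
    size_t seen;    /* how many mappings this notifier has been told about */
} Notifier;

/* Example callback: just count the mappings that were delivered. */
static inline void count_mapping(Notifier *n, const Mapping *m)
{
    (void)m;
    n->seen++;
}

/* Deliver every existing mapping to one notifier only. */
static inline void replay_to(Notifier *n, const Mapping *maps, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        n->notify(n, &maps[i]);
    }
}
```

In this model, calling replay_to() for one notifier leaves every other notifier's state untouched.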
Online repo:
https://github.com/xzpeter/qemu/tree/vtd-vfio-enablement-v8
I would be glad to hear any review comments on the above patches.
=========
Test Done
=========
Build test passed for x86_64/arm/ppc64.
Lightly tested on x86_64 by assigning two PCI devices to a single VM,
booted with:
bin=x86_64-softmmu/qemu-system-x86_64
$bin -M q35,accel=kvm,kernel-irqchip=split -m 1G \
-device intel-iommu,intremap=on,eim=off,caching-mode=on \
-netdev user,id=net0,hostfwd=tcp::5555-:22 \
-device virtio-net-pci,netdev=net0 \
-device vfio-pci,host=03:00.0 \
-device vfio-pci,host=02:00.0 \
-trace events=".trace.vfio" \
/var/lib/libvirt/images/vm1.qcow2
pxdev:bin [vtd-vfio-enablement]# cat .trace.vfio
vtd_page_walk*
vtd_replay*
vtd_inv_desc*
Then, in the guest, run the following tool:
https://github.com/xzpeter/clibs/blob/master/gpl/userspace/vfio-bind-group/vfio-bind-group.c
With parameter:
./vfio-bind-group 00:03.0 00:04.0
Checking the host-side trace log, I can see pages being replayed and
mapped in the 00:04.0 device address space, like:
...
vtd_replay_ce_valid replay valid context device 00:04.00 hi 0x401 lo 0x38fe1001
vtd_page_walk Page walk for ce (0x401, 0x38fe1001) iova range 0x0 - 0x8000000000
vtd_page_walk_level Page walk (base=0x38fe1000, level=3) iova range 0x0 - 0x8000000000
vtd_page_walk_level Page walk (base=0x35d31000, level=2) iova range 0x0 - 0x40000000
vtd_page_walk_level Page walk (base=0x34979000, level=1) iova range 0x0 - 0x200000
vtd_page_walk_one Page walk detected map level 0x1 iova 0x0 -> gpa 0x22dc3000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x1000 -> gpa 0x22e25000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x2000 -> gpa 0x22e12000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x3000 -> gpa 0x22e2d000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x4000 -> gpa 0x12a49000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x5000 -> gpa 0x129bb000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x6000 -> gpa 0x128db000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x7000 -> gpa 0x12a80000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x8000 -> gpa 0x12a7e000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0x9000 -> gpa 0x12b22000 mask 0xfff perm 3
vtd_page_walk_one Page walk detected map level 0x1 iova 0xa000 -> gpa 0x12b41000 mask 0xfff perm 3
...
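The iova ranges printed at each level of the walk above follow from plain page-table arithmetic. The sketch below reproduces them, assuming 4 KiB base pages (matching "mask 0xfff"), 512 entries per table, and levels counted from the leaf upwards as the trace output suggests. The helper names are hypothetical and are not taken from intel_iommu.c.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_4K   12  /* 4 KiB base pages, matching "mask 0xfff" */
#define LEVEL_BITS      9   /* 512 entries per page-table level */

/* Bytes of IOVA space mapped by ONE entry at the given level (1 = leaf). */
static inline uint64_t entry_span(unsigned level)
{
    assert(level >= 1 && level <= 4);
    return 1ULL << (PAGE_SHIFT_4K + LEVEL_BITS * (level - 1));
}

/* Bytes of IOVA space covered by a WHOLE table at the given level. */
static inline uint64_t table_span(unsigned level)
{
    return entry_span(level) << LEVEL_BITS;
}

/* Index of the entry that translates iova within a table at this level. */
static inline unsigned entry_index(uint64_t iova, unsigned level)
{
    return (unsigned)((iova / entry_span(level)) & ((1u << LEVEL_BITS) - 1));
}
```

With these definitions, table_span(3), table_span(2) and table_span(1) give 0x8000000000, 0x40000000 and 0x200000, the three range ends shown in the vtd_page_walk_level lines.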
=========
Todo List
=========
- error reporting for the assigned devices (as Tianyu has mentioned)
- per-domain address-space: A better solution in the future may be to
maintain one address space per IOMMU domain in the guest (so
multiple devices can share the same address space if they share
the same IOMMU domain in the guest), rather than one address space
per device (which is the current implementation of vt-d). However
that's a step further than this series, and let's see whether we can
first provide a workable version of device assignment with vt-d
protection.
- no need to notify IOTLB (psi/gsi/global) invalidations to devices
that have ATS enabled
- investigate the case where the guest maps a page while the mask
covers already-mapped pages (e.g. map 12k-16k first, then map 0-12k)
- coalesce unmap during page walk (currently, we send it once per
page)
- when doing PSI for unmap, can we send one notification directly
instead of walking over the page table?
- more to come...
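Two of the items above can be made concrete with a small sketch. It assumes that an invalidation is described by a naturally aligned power-of-two block (an address mask), which is my reading of the "mask contains existing mapped pages" item, and that unmaps arrive sorted by iova. Range, covering_mask() and coalesce_ranges() are hypothetical names for illustration only, not code from this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range type for this sketch; not a QEMU structure. */
typedef struct {
    uint64_t iova;  /* start of the range */
    uint64_t size;  /* length in bytes */
} Range;

/*
 * Smallest mask m = 2^k - 1 (at least one 4 KiB page) such that the
 * naturally aligned block containing start also contains the last byte
 * of [start, start + size).
 */
static inline uint64_t covering_mask(uint64_t start, uint64_t size)
{
    uint64_t last = start + size - 1;
    uint64_t mask = 0xfff;

    assert(size > 0);
    while ((start & ~mask) != (last & ~mask)) {
        mask = (mask << 1) | 1;     /* double the block size */
    }
    return mask;
}

/*
 * Merge adjacent ranges in place. Input must be sorted by iova and
 * non-overlapping; returns the number of ranges left in r[].
 */
static inline size_t coalesce_ranges(Range *r, size_t n)
{
    size_t out = 0;

    for (size_t i = 0; i < n; i++) {
        if (out > 0 && r[out - 1].iova + r[out - 1].size == r[i].iova) {
            r[out - 1].size += r[i].size;   /* extends the previous range */
        } else {
            r[out++] = r[i];                /* starts a new range */
        }
    }
    return out;
}
```

For the example in the list, covering_mask(0, 0x3000) is 0x3fff: describing 0-12k with a single mask takes a 16k block, which also covers the page at 12k-16k that was mapped earlier. A coalesced range would presumably still have to be expressed through such masks before being handed to a notifier, which this sketch does not attempt.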
Thanks,
Jason Wang (1):
intel_iommu: use the correct memory region for device IOTLB
notification
Peter Xu (8):
memory: add section range info for IOMMU notifier
memory: provide IOMMU_NOTIFIER_FOREACH macro
memory: provide iommu_replay_all()
memory: introduce memory_region_notify_one()
memory: add MemoryRegionIOMMUOps.replay() callback
intel_iommu: provide its own replay() callback
intel_iommu: allow dynamic switch of IOMMU region
intel_iommu: enable remote IOTLB
hw/i386/intel_iommu.c | 442 +++++++++++++++++++++++++++++++++++++++--
hw/i386/intel_iommu_internal.h | 1 +
hw/i386/trace-events | 10 +-
hw/vfio/common.c | 12 +-
hw/virtio/vhost.c | 10 +-
include/exec/memory.h | 49 ++++-
include/hw/i386/intel_iommu.h | 10 +
memory.c | 52 ++++-
8 files changed, 552 insertions(+), 34 deletions(-)
--
2.7.4
On Thu, Apr 06, 2017 at 03:08:35PM +0800, Peter Xu wrote:
> This is v8 of vt-d vfio enablement series.
>
> v8
> - remove patches 1-9 since merged already
> - add David's r-b for all the patches
> - add Aviv's s-o-b in the last patch
> - rename iommu to iommu_dmar [Jason]
> - rename last patch subject to "remote IOTLB" [Jason]
> - pick up jason's two patches to fix vhost breakage
I only see one (6/9) - is a patch missing or misattributed?
> - let vhost leverage the new IOMMU notifier interface
Which patch does this?
On Thu, Apr 06, 2017 at 02:53:46PM +0300, Michael S. Tsirkin wrote:
> On Thu, Apr 06, 2017 at 03:08:35PM +0800, Peter Xu wrote:
> > This is v8 of vt-d vfio enablement series.
> >
> > v8
> > - remove patches 1-9 since merged already
> > - add David's r-b for all the patches
> > - add Aviv's s-o-b in the last patch
> > - rename iommu to iommu_dmar [Jason]
> > - rename last patch subject to "remote IOTLB" [Jason]
> > - pick up jason's two patches to fix vhost breakage
>
> I only see one (6/9) - is a patch missing or misattributed?
Oh sorry, I should have said "jason's one patch". The other patch has
already been merged upstream, which is 375f74f47 ("vhost: generalize
iommu memory region").
>
> > - let vhost leverage the new IOMMU notifier interface
>
> Which patch does this?
The first one ("memory: add section range info for IOMMU notifier").
Thanks,
-- peterx
On Thu, Apr 06, 2017 at 03:08:35PM +0800, Peter Xu wrote:
> This is v8 of vt-d vfio enablement series.
>
> v8
> - remove patches 1-9 since merged already
> - add David's r-b for all the patches
> - add Aviv's s-o-b in the last patch
> - rename iommu to iommu_dmar [Jason]
> - rename last patch subject to "remote IOTLB" [Jason]
> - pick up jason's two patches to fix vhost breakage
> - let vhost leverage the new IOMMU notifier interface
Looks good to me except I'm still wondering why there's a single patch
by Jason. Marcel - you mentioned that you intend to try to open and
maintain pci-next. Would you like to pick this one up? Otherwise, pls
repost after 2.10.
On Thu, Apr 06, 2017 at 03:00:28PM +0300, Michael S. Tsirkin wrote:
> On Thu, Apr 06, 2017 at 03:08:35PM +0800, Peter Xu wrote:
> > This is v8 of vt-d vfio enablement series.
> >
> > v8
> > - remove patches 1-9 since merged already
> > - add David's r-b for all the patches
> > - add Aviv's s-o-b in the last patch
> > - rename iommu to iommu_dmar [Jason]
> > - rename last patch subject to "remote IOTLB" [Jason]
> > - pick up jason's two patches to fix vhost breakage
> > - let vhost leverage the new IOMMU notifier interface
>
> Looks good to me except I'm still wondering why there's a single patch
> by Jason.
Answered in the other thread (the other one has been picked up
already).
> Marcel - you mentioned that you intend to try to open and
> maintain pci-next. Would you like to pick this one up? Otherwise, pls
> repost after 2.10.
Thanks,
-- peterx