This series implements the emulation code for ARM SMMUv3.
This is the continuation of Prem's work [2].
This v5 mainly brings VFIO integration in DT mode. On guest kernel
side, this requires a quirk [1] to force TLB invalidation on map.
The following changes are also notable:
- fixes the SMMU_CMDQ_CONS offset
- adds the dma-coherent dt property, which fixes the unhandled command
opcode bug.
- implements block PTE
The smmu is instantiated when passing the smmu option to machvirt:
"-M virt-2.10,smmu"
As I haven't yet split the code into easily reviewable patches,
I don't expect deep reviews at this stage. Also, the implementation may
be largely sub-optimal.
Tested Use Cases:
- booted a guest in dt and acpi mode with an iommu_platform
virtio-net-pci device (using dma ops). Tested with the following
guest combinations: 4K page - 39 bit VA, 4K - 48b, 64K - 39b,
64K - 48b.
- booted a guest (featuring [1]) with passthrough'ed PCIe devices:
- AMD Overdrive and igbvf passthrough (using gsi direct mapping)
- Cavium ThunderX and ixgbevf passthrough (using KVM MSI routing)
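For the four guest combinations listed above, the level at which a stage 1
page table walk starts follows from the standard VMSAv8-64 arithmetic: each
level resolves (granule bits - 3) address bits because descriptors are 8 bytes
wide. The sketch below only illustrates that arithmetic; the function names
are hypothetical and it is not taken from smmu_page_walk_64 in this series.

```python
# Illustrative only: VMSAv8-64 lookup-level arithmetic for the tested
# page size / VA size combinations. Function names are hypothetical.

def walk_levels(va_bits: int, granule_bits: int) -> int:
    """Number of lookup levels needed to resolve va_bits of input address."""
    bits_per_level = granule_bits - 3       # 8-byte descriptors per table
    remaining = va_bits - granule_bits      # bits left after the page offset
    return -(-remaining // bits_per_level)  # ceiling division


def initial_level(va_bits: int, granule_bits: int) -> int:
    """Level the walk starts at; the last level is always level 3."""
    return 4 - walk_levels(va_bits, granule_bits)


# The four combinations from the tested use cases: (granule bits, VA bits).
for granule_bits, va_bits in [(12, 39), (12, 48), (16, 39), (16, 48)]:
    page_kib = (1 << granule_bits) // 1024
    print(f"{page_kib}K page, {va_bits}-bit VA: "
          f"walk starts at level {initial_level(va_bits, granule_bits)}")
```

Under these assumptions the 4K/39b and 64K/48b guests both start at level 1,
while 4K/48b starts at level 0 and 64K/39b at level 2.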
Unfortunately I have not yet been able to run DPDK testpmd on the guest side.
The problem I see is that the user space driver dma-maps a huge area,
which causes a flood of CMDQ_OP_TLBI_NH_VA commands (tlbi-on-map):
one command is sent for each page even though the dma-map covers a
huge page. I will work on this issue for the next version.
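To give an order of magnitude for the problem described above, here is a
rough sketch that counts per-page invalidation commands for a single dma-map.
The helper name and the 2 GiB map size are assumptions chosen for
illustration; the only thing taken from the text is that one
CMDQ_OP_TLBI_NH_VA command is issued per page.

```python
# Illustrative only: how many per-page TLBI commands one dma-map would
# trigger if each page is invalidated individually (tlbi-on-map).

def tlbi_commands_per_map(map_size: int, page_size: int) -> int:
    """Number of per-page invalidations needed to cover map_size bytes."""
    return -(-map_size // page_size)  # ceiling division


GIB = 1 << 30
map_size = 2 * GIB  # assumed size of the dma-mapped area

for page_size in (4 * 1024, 64 * 1024):
    count = tlbi_commands_per_map(map_size, page_size)
    print(f"{page_size // 1024}K pages: {count} TLBI commands")
```

With these assumed sizes a 4K guest would issue 524288 commands for one map
and a 64K guest 32768, which is why invalidating by range or by huge page
would help.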
Known limitations:
- no VMSAv8-32 support
- no nested stage support (S1 + S2)
- no support for HYP mappings
- fine-grained register emulation, commands, interrupts and errors were
not thoroughly tested. Handling is sufficient to run the use cases
described above though.
Best Regards
Eric
This series can be found at:
v5: https://github.com/eauger/qemu/tree/v2.9-SMMU-v5
v4: https://github.com/eauger/qemu/tree/v2.9-SMMU-v4
References:
[1] [RFC 0/2] arm-smmu-v3 tlbi-on-map option
[2] Prem's last iteration:
- https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg03531.html
History:
v4 -> v5:
- initial_level now part of SMMUTransCfg
- smmu_page_walk_64 takes into account the max input size
- implement sys->iommu_ops.replay and sys->iommu_ops.notify_flag_changed
- smmuv3_translate: bug fix: don't walk on bypass
- smmu_update_qreg: fix PROD index update
- I did not yet address Peter's comments as the code is not mature enough
to be split into sub patches.
v3 -> v4 [Eric]:
- page table walk rewritten to allow scan of the page table within a
range of IOVA. This prepares for VFIO integration and replay.
- configuration parsing partially reworked.
- do not advertise unsupported/untested features: S2, S1 + S2, HYP,
PRI, ATS, ..
- added ACPI table generation
- migrated to dynamic traces
- mingw compilation fix
v2 -> v3 [Eric]:
- rebased on 2.9
- mostly code and patch reorganization to ease the review process
- optional patches removed. They may be handled separately. I am currently
working on ACPI enablement.
- optional instantiation of the smmu in mach-virt
- removed [2/9] (fdt functions) since not mandated
- start splitting main patch into base and derived object
- no new functional feature added
v1 -> v2 [Prem]:
- Adopted review comments from Eric Auger
- Make SMMU_DPRINTF to internally call qemu_log
(since translation requests are too many, we need control
on the type of log we want)
- SMMUTransCfg modified to suit simplicity
- Change RegInfo to uint64 register array
- Code cleanup
- Test cleanups
- Reshuffled patches
v0 -> v1 [Prem]:
- As per SMMUv3 spec 16.0 (only is_ste_consistant() is noticeable)
- Reworked register access/update logic
- Factored out translation code for
- single point bug fix
- sharing/removal in future
- (optional) Unit tests added, with PCI test device
- S1 with 4k/64k, S1+S2 with 4k/64k
- (S1 or S2) only can be verified by Linux 4.7 driver
- (optional) Preliminary ACPI support
v0 [Prem]:
- Implements SMMUv3 spec 11.0
- Supported for PCIe devices,
- Command Queue and Event Queue supported
- LPAE only, S1 is supported and Tested, S2 not tested
- BE mode Translation not supported
- IRQ support (legacy, no MSI)
- Tested with DPDK and e1000
Eric Auger (5):
hw/arm/smmu-common: smmu base class
hw/arm/virt: Add 2.10 machine type
hw/arm/virt: Add tlbi-on-map property to the smmuv3 node
target/arm/kvm: Translate the MSI doorbell in kvm_arch_fixup_msi_route
hw/arm/smmuv3: VFIO integration
Prem Mallappa (3):
hw/arm/smmuv3: smmuv3 emulation model
hw/arm/virt: Add SMMUv3 to the virt board
hw/arm/virt-acpi-build: Add smmuv3 node in IORT table
default-configs/aarch64-softmmu.mak | 1 +
hw/arm/Makefile.objs | 1 +
hw/arm/smmu-common.c | 474 +++++++++++++
hw/arm/smmu-internal.h | 89 +++
hw/arm/smmuv3-internal.h | 651 ++++++++++++++++++
hw/arm/smmuv3.c | 1256 +++++++++++++++++++++++++++++++++++
hw/arm/trace-events | 54 ++
hw/arm/virt-acpi-build.c | 56 +-
hw/arm/virt.c | 111 +++-
include/hw/acpi/acpi-defs.h | 15 +
include/hw/arm/smmu-common.h | 127 ++++
include/hw/arm/smmuv3.h | 87 +++
include/hw/arm/virt.h | 5 +
target/arm/kvm.c | 28 +
target/arm/trace-events | 3 +
15 files changed, 2949 insertions(+), 9 deletions(-)
create mode 100644 hw/arm/smmu-common.c
create mode 100644 hw/arm/smmu-internal.h
create mode 100644 hw/arm/smmuv3-internal.h
create mode 100644 hw/arm/smmuv3.c
create mode 100644 include/hw/arm/smmu-common.h
create mode 100644 include/hw/arm/smmuv3.h
--
2.5.5
Hi Eric,

With the fixes from the comments I made, I was able to run a VM with
virtio-blk-pci and virtio-net-pci devices. I have tried vhost-net as well,
but I am seeing outgoing packet payloads corrupted from the host's
perspective: tcpdump on the host tun i/f shows zeroes in the packet payload.
However, tcpdump in the VM shows that the packets are fine. Have you seen
anything like that? Packets incoming to the VM are fine though.

I will keep debugging on my side too.

Thanks,
Tomasz

On 09.07.2017 22:51, Eric Auger wrote:
> [...]
Hi Eric,

Just letting you know that I am facing another issue with the following
setup:

1. host (4.12 kernel & 64K page) and VM (4.12 kernel & 64K page)
2. QEMU + -netdev type=tap,ifname=tap,id=net0 -device
   virtio-net-pci,netdev=net0,iommu_platform,disable-modern=off,disable-legacy=on
3. On VM, I allocate some huge pages and run DPDK testpmd app:

# echo 4 > /sys/kernel/mm/hugepages/hugepages-524288kB/nr_hugepages
# ./dpdk/usertools/dpdk-devbind.py -b vfio-pci 0000:00:02.0
# ./dpdk/build/app/testpmd -l 0-13 -n 4 -w 0000:00:02.0 -- --disable-hw-vlan-filter --disable-rss -i
EAL: Detected 14 lcore(s)
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:00:02.0 on NUMA socket -1
EAL: probe driver: 1af4:1041 net_virtio
EAL: using IOMMU type 1 (Type 1)
EAL: iommu_map_dma vaddr ffff20000000 size 80000000 iova 120000000
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (0)
EAL: Can't write to PCI bar (0) : offset (4)
EAL: Can't write to PCI bar (0) : offset (14)
EAL: Can't write to PCI bar (0) : offset (e)
EAL: Can't read from PCI bar (0) : offset (c)
EAL: Requested device 0000:00:02.0 cannot be used
EAL: No probed ethernet devices
Interactive-mode selected
USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=251456, size=2176, socket=0

When VM uses *4K pages* the same setup works fine. I will work on this
but please let me know in case you already know what is going on.

Thanks,
Tomasz

On 09.07.2017 22:51, Eric Auger wrote:
> [...]
Hi Tomasz,

On 01/08/2017 13:01, Tomasz Nowicki wrote:
> Just letting you know that I am facing another issue with the following
> setup:
> [...]
> When VM uses *4K pages* the same setup works fine. I will work on this
> but please let me know in case you already know what is going on.

No, I did not face that one. I was able to launch testpmd without such an
early message. However, I assigned an igbvf device to the guest and then
to DPDK. I've never tested your config.

However, as stated in my cover letter, at the moment DPDK is not working
for me because of storms of tlbi-on-maps. I intend to work on this as
soon as I get some bandwidth, sorry.

Thanks

Eric
Hi Eric,

On 01.08.2017 15:07, Auger Eric wrote:
> [...]
> However as stated in my cover letter at the moment DPDK is not working
> for me because of storms of tlbi-on-maps. I intend to work on this as
> soon as get some bandwidth, sorry.

I found the reason for the failure.

QEMU creates BARs for the VIRTIO PCI device. Their size depends on what
is necessary for the VIRTIO protocol. In my case the BAR is 16K, which
is too small to be mmapable for a kernel with 64K pages:
vfio_pci_enable() -> vfio_pci_probe_mmaps() ->
here the guest kernel checks whether the BAR size is smaller than the
current PAGE_SIZE and, if so, clears the VFIO_REGION_INFO_FLAG_MMAP flag,
which prevents the BAR from being mmapped later on. I added -device
virtio-net-pci,...,page-per-vq=on to enlarge the BAR size to 8M and now
testpmd works fine. I wonder how the same setup works with e.g. Intel or
AMD IOMMU.

Thanks,
Tomasz
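The size comparison described in the message above can be modelled in a few
lines. This is a deliberately simplified sketch, assuming the only condition
is "BAR size >= PAGE_SIZE"; the real vfio_pci_probe_mmaps() in the kernel has
further conditions that are not modelled here, and the function name below is
hypothetical.

```python
# Simplified model of the BAR-size check described above. It assumes the
# only criterion is that the BAR is at least one guest page in size.

KIB = 1024
MIB = 1024 * KIB


def bar_is_mmapable(bar_size: int, page_size: int) -> bool:
    """True if a BAR of bar_size bytes covers at least one full page."""
    return bar_size >= page_size


# Cases reported in the thread: 16K BAR, then 8M BAR with page-per-vq=on.
cases = [
    ("16K BAR, 4K pages", 16 * KIB, 4 * KIB),
    ("16K BAR, 64K pages", 16 * KIB, 64 * KIB),
    ("8M BAR, 64K pages", 8 * MIB, 64 * KIB),
]
for label, bar_size, page_size in cases:
    print(f"{label}: mmapable = {bar_is_mmapable(bar_size, page_size)}")
```

Under that assumption only the 16K BAR with a 64K-page guest is rejected,
which matches the reported behaviour: the 4K-page guest works, and enlarging
the BAR to 8M fixes the 64K-page guest.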
Hi Tomasz,

On 03/08/2017 12:11, Tomasz Nowicki wrote:
> I found what was the reason of failure.
> [...]
> I added -device virtio-net-pci,...,page-per-vq=on to
> enlarge BAR size to 8M and now testpmd works fine. I wonder how the same
> setup is working with e.g. Intel or AMD IOMMU.

Hum OK. Yet another thing to investigate! Thank you for your efforts, and
excellent news overall. Preparing a rebase ...

Thanks

Eric