[Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support

Eric Auger posted 8 patches 8 years, 3 months ago
git fetch https://github.com/patchew-project/qemu tags/patchew/1499633493-19865-1-git-send-email-eric.auger@redhat.com
There is a newer version of this series
[Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support
Posted by Eric Auger 8 years, 3 months ago
This series implements the emulation code for ARM SMMUv3.
This is the continuation of Prem's work [2].

This v5 mainly brings VFIO integration in DT mode. On guest kernel
side, this requires a quirk [1] to force TLB invalidation on map.

The following changes are also notable:
- fix the SMMU_CMDQ_CONS offset
- add the dma-coherent DT property, which fixes the unhandled command
  opcode bug
- implement block PTEs

The smmu is instantiated when passing the smmu option to machvirt:
"-M virt-2.10,smmu"

As I have not yet split the code into easily reviewable patches,
I don't expect deep reviews at this stage. The implementation may
also be largely sub-optimal.

Tested Use Cases:
- booted a guest in dt and acpi mode with an iommu_platform
  virtio-net-pci device (using dma ops). Tested with the following
  guest combinations: 4K page - 39 bit VA, 4K - 48b, 64K - 39b,
  64K - 48b.
- booted a guest (featuring [1]) with passed-through PCIe devices:
  - AMD Overdrive and igbvf passthrough (using gsi direct mapping)
  - Cavium ThunderX and ixgbevf passthrough (using KVM MSI routing)

Unfortunately I have not yet been able to run DPDK testpmd on the guest
side. The problem I see is that the user space driver dma-maps a huge
area, and with tlbi-on-map this causes a flood of CMDQ_OP_TLBI_NH_VA
commands: one is sent for each page even though the dma-map covers a
huge page. I will work on this issue for the next version.
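
To give a rough idea of the scale of the problem, here is a small sketch that counts invalidation commands for a mapping. It is only a model: the function name, the 2 GiB mapping size and the 512 MiB block size used in the note below are assumptions for illustration, not values taken from the series.

```c
#include <stdint.h>

/*
 * Hypothetical model: one TLBI command is issued per unit of
 * (1 << unit_shift) bytes covered by the mapping, rounding up.
 */
uint64_t tlbi_sketch_cmd_count(uint64_t map_size, unsigned unit_shift)
{
    uint64_t unit = 1ULL << unit_shift;

    return (map_size + unit - 1) >> unit_shift;
}
```

For an assumed 2 GiB mapping (0x80000000 bytes), invalidating per 4K page gives 524288 commands and per 64K page gives 32768, whereas one command per 512 MiB block would give only 4.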

Known limitations:
- no VMSAv8-32 support
- no nested stage support (S1 + S2)
- no support for HYP mappings
- fine-grained register emulation, commands, interrupts and errors were
  not thoroughly tested. Handling is nevertheless sufficient to run the
  use cases described above.

Best Regards

Eric

This series can be found at:
v5: https://github.com/eauger/qemu/tree/v2.9-SMMU-v5
v4: https://github.com/eauger/qemu/tree/v2.9-SMMU-v4

References:
[1] [RFC 0/2] arm-smmu-v3 tlbi-on-map option
[2] Prem's last iteration:
- https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg03531.html

History:
v4 -> v5:
- initial_level now part of SMMUTransCfg
- smmu_page_walk_64 takes into account the max input size
- implement sys->iommu_ops.replay and sys->iommu_ops.notify_flag_changed
- smmuv3_translate: bug fix: don't walk on bypass
- smmu_update_qreg: fix PROD index update
- I did not yet address Peter's comments as the code is not mature enough
  to be split into sub patches.

v3 -> v4 [Eric]:
- page table walk rewritten to allow scan of the page table within a
  range of IOVA. This prepares for VFIO integration and replay.
- configuration parsing partially reworked.
- do not advertise unsupported/untested features: S2, S1 + S2, HYP,
  PRI, ATS, ..
- added ACPI table generation
- migrated to dynamic traces
- mingw compilation fix

v2 -> v3 [Eric]:
- rebased on 2.9
- mostly code and patch reorganization to ease the review process
- optional patches removed. They may be handled separately. I am currently
  working on ACPI enablement.
- optional instantiation of the smmu in mach-virt
- removed [2/9] (fdt functions) since not mandated
- start splitting main patch into base and derived object
- no new functional feature added

v1 -> v2 [Prem]:
- Adopted review comments from Eric Auger
        - Make SMMU_DPRINTF to internally call qemu_log
            (since translation requests are too many, we need control
             on the type of log we want)
        - SMMUTransCfg modified for simplicity
        - Change RegInfo to uint64 register array
        - Code cleanup
        - Test cleanups
- Reshuffled patches

v0 -> v1 [Prem]:
- As per SMMUv3 spec 16.0 (only is_ste_consistant() is noticeable)
- Reworked register access/update logic
- Factored out translation code for
        - single point bug fix
        - sharing/removal in future
- (optional) Unit tests added, with PCI test device
        - S1 with 4k/64k, S1+S2 with 4k/64k
        - (S1 or S2) only can be verified by Linux 4.7 driver
        - (optional) Preliminary ACPI support

v0 [Prem]:
- Implements SMMUv3 spec 11.0
- Supported for PCIe devices
- Command Queue and Event Queue supported
- LPAE only, S1 is supported and Tested, S2 not tested
- BE mode Translation not supported
- IRQ support (legacy, no MSI)
- Tested with DPDK and e1000


Eric Auger (5):
  hw/arm/smmu-common: smmu base class
  hw/arm/virt: Add 2.10 machine type
  hw/arm/virt: Add tlbi-on-map property to the smmuv3 node
  target/arm/kvm: Translate the MSI doorbell in kvm_arch_fixup_msi_route
  hw/arm/smmuv3: VFIO integration

Prem Mallappa (3):
  hw/arm/smmuv3: smmuv3 emulation model
  hw/arm/virt: Add SMMUv3 to the virt board
  hw/arm/virt-acpi-build: Add smmuv3 node in IORT table

 default-configs/aarch64-softmmu.mak |    1 +
 hw/arm/Makefile.objs                |    1 +
 hw/arm/smmu-common.c                |  474 +++++++++++++
 hw/arm/smmu-internal.h              |   89 +++
 hw/arm/smmuv3-internal.h            |  651 ++++++++++++++++++
 hw/arm/smmuv3.c                     | 1256 +++++++++++++++++++++++++++++++++++
 hw/arm/trace-events                 |   54 ++
 hw/arm/virt-acpi-build.c            |   56 +-
 hw/arm/virt.c                       |  111 +++-
 include/hw/acpi/acpi-defs.h         |   15 +
 include/hw/arm/smmu-common.h        |  127 ++++
 include/hw/arm/smmuv3.h             |   87 +++
 include/hw/arm/virt.h               |    5 +
 target/arm/kvm.c                    |   28 +
 target/arm/trace-events             |    3 +
 15 files changed, 2949 insertions(+), 9 deletions(-)
 create mode 100644 hw/arm/smmu-common.c
 create mode 100644 hw/arm/smmu-internal.h
 create mode 100644 hw/arm/smmuv3-internal.h
 create mode 100644 hw/arm/smmuv3.c
 create mode 100644 include/hw/arm/smmu-common.h
 create mode 100644 include/hw/arm/smmuv3.h

-- 
2.5.5


Re: [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support
Posted by Tomasz Nowicki 8 years, 3 months ago
Hi Eric,

With the fixes from my review comments applied, I was able to run a VM
with virtio-blk-pci and virtio-net-pci devices.

I have tried vhost-net as well, but outgoing packet payloads are
corrupted from the host's perspective: tcpdump on the host tun interface
shows zeroes in the packet payload, whereas tcpdump in the VM shows that
the packets are fine. Have you seen anything like that? Packets incoming
to the VM are fine though. I will keep debugging on my side too.

Thanks,
Tomasz

On 09.07.2017 22:51, Eric Auger wrote:
> [...]


Re: [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support
Posted by Tomasz Nowicki 8 years, 3 months ago
Hi Eric,

Just letting you know that I am facing another issue with the following
setup:
1. host (4.12 kernel & 64K page) and VM (4.12 kernel & 64K page)
2. QEMU + -netdev type=tap,ifname=tap,id=net0 -device virtio-net-pci,netdev=net0,iommu_platform,disable-modern=off,disable-legacy=on
3. On the VM, I allocate some huge pages and run the DPDK testpmd app:
# echo 4 > /sys/kernel/mm/hugepages/hugepages-524288kB/nr_hugepages
# ./dpdk/usertools/dpdk-devbind.py -b vfio-pci  0000:00:02.0
# ./dpdk/build/app/testpmd -l 0-13 -n 4 -w 0000:00:02.0 -- --disable-hw-vlan-filter --disable-rss -i
EAL: Detected 14 lcore(s)
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: PCI device 0000:00:02.0 on NUMA socket -1
EAL:   probe driver: 1af4:1041 net_virtio
EAL:   using IOMMU type 1 (Type 1)
EAL: iommu_map_dma vaddr ffff20000000 size 80000000 iova 120000000
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (12)
EAL: Can't write to PCI bar (0) : offset (12)
EAL: Can't read from PCI bar (0) : offset (0)
EAL: Can't write to PCI bar (0) : offset (4)
EAL: Can't write to PCI bar (0) : offset (14)
EAL: Can't write to PCI bar (0) : offset (e)
EAL: Can't read from PCI bar (0) : offset (c)
EAL: Requested device 0000:00:02.0 cannot be used
EAL: No probed ethernet devices
Interactive-mode selected
USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=251456, size=2176, socket=0

When the VM uses *4K pages* the same setup works fine. I will work on
this, but please let me know in case you already know what is going on.
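
As a sanity check on the log above, the iommu_map_dma size of 0x80000000 is consistent with the four 524288 kB huge pages reserved by the echo command. A minimal sketch of that arithmetic in C follows; the function name is made up for illustration and is not part of DPDK or the kernel.

```c
#include <stdint.h>

/*
 * Hypothetical helper: total bytes in a huge page pool, given the
 * page count and the per-page size in kB as named in sysfs.
 */
uint64_t hugepage_sketch_pool_bytes(uint64_t nr_pages, uint64_t page_size_kb)
{
    return nr_pages * page_size_kb * 1024ULL;
}
```

Calling it with 4 pages of 524288 kB yields 2147483648 bytes, i.e. 0x80000000 (2 GiB).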

Thanks,
Tomasz


On 09.07.2017 22:51, Eric Auger wrote:
> [...]

Re: [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support
Posted by Auger Eric 8 years, 3 months ago
Hi Tomasz,
On 01/08/2017 13:01, Tomasz Nowicki wrote:
> Hi Eric,
> 
> Just letting you know that I am facing another issue with the following
> setup:
> 1. host (4.12 kernel & 64K page) and VM (4.12 kernel & 64K page)
> 2. QEMU + -netdev type=tap,ifname=tap,id=net0 -device
> virtio-net-pci,netdev=net0,iommu_platform,disable-modern=off,disable-legacy=on
> 
> 2. On VM, I allocate some huge pages and run DPDK testpmd app:
> # echo 4 > /sys/kernel/mm/hugepages/hugepages-524288kB/nr_hugepages
> # ./dpdk/usertools/dpdk-devbind.py -b vfio-pci  0000:00:02.0
> # ./dpdk/build/app/testpmd -l 0-13 -n 4 -w 0000:00:02.0 --
> --disable-hw-vlan-filter --disable-rss -i
> EAL: Detected 14 lcore(s)
> EAL: Probing VFIO support...
> EAL: VFIO support initialized
> EAL: PCI device 0000:00:02.0 on NUMA socket -1
> EAL:   probe driver: 1af4:1041 net_virtio
> EAL:   using IOMMU type 1 (Type 1)
> EAL: iommu_map_dma vaddr ffff20000000 size 80000000 iova 120000000
> EAL: Can't write to PCI bar (0) : offset (12)
> EAL: Can't read from PCI bar (0) : offset (12)
> EAL: Can't read from PCI bar (0) : offset (12)
> EAL: Can't write to PCI bar (0) : offset (12)
> EAL: Can't read from PCI bar (0) : offset (12)
> EAL: Can't write to PCI bar (0) : offset (12)
> EAL: Can't read from PCI bar (0) : offset (0)
> EAL: Can't write to PCI bar (0) : offset (4)
> EAL: Can't write to PCI bar (0) : offset (14)
> EAL: Can't write to PCI bar (0) : offset (e)
> EAL: Can't read from PCI bar (0) : offset (c)
> EAL: Requested device 0000:00:02.0 cannot be used
> EAL: No probed ethernet devices
> Interactive-mode selected
> USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=251456, size=2176,
> socket=0
> 
> When VM uses *4K pages* the same setup works fine. I will work on this
> but please let me know in case you already know what is going on.

No, I did not face that one. I was able to launch testpmd without such
early messages. However, I assigned an igbvf device to the guest and then
to DPDK. I've never tested your config.

However, as stated in my cover letter, DPDK is not working for me at the
moment because of storms of tlbi-on-map invalidations. I intend to work
on this as soon as I get some bandwidth, sorry.

Thanks

Eric
> 
> Thanks,
> Tomasz
> 
> 
> On 09.07.2017 22:51, Eric Auger wrote:
>> [...]

Re: [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support
Posted by Tomasz Nowicki 8 years, 3 months ago
Hi Eric,

On 01.08.2017 15:07, Auger Eric wrote:
> Hi Tomasz,
> On 01/08/2017 13:01, Tomasz Nowicki wrote:
>> Hi Eric,
>>
>> Just letting you know that I am facing another issue with the following
>> setup:
>> 1. host (4.12 kernel & 64K page) and VM (4.12 kernel & 64K page)
>> 2. QEMU + -netdev type=tap,ifname=tap,id=net0 -device
>> virtio-net-pci,netdev=net0,iommu_platform,disable-modern=off,disable-legacy=on
>>
>> 2. On VM, I allocate some huge pages and run DPDK testpmd app:
>> # echo 4 > /sys/kernel/mm/hugepages/hugepages-524288kB/nr_hugepages
>> # ./dpdk/usertools/dpdk-devbind.py -b vfio-pci  0000:00:02.0
>> # ./dpdk/build/app/testpmd -l 0-13 -n 4 -w 0000:00:02.0 --
>> --disable-hw-vlan-filter --disable-rss -i
>> EAL: Detected 14 lcore(s)
>> EAL: Probing VFIO support...
>> EAL: VFIO support initialized
>> EAL: PCI device 0000:00:02.0 on NUMA socket -1
>> EAL:   probe driver: 1af4:1041 net_virtio
>> EAL:   using IOMMU type 1 (Type 1)
>> EAL: iommu_map_dma vaddr ffff20000000 size 80000000 iova 120000000
>> EAL: Can't write to PCI bar (0) : offset (12)
>> EAL: Can't read from PCI bar (0) : offset (12)
>> EAL: Can't read from PCI bar (0) : offset (12)
>> EAL: Can't write to PCI bar (0) : offset (12)
>> EAL: Can't read from PCI bar (0) : offset (12)
>> EAL: Can't write to PCI bar (0) : offset (12)
>> EAL: Can't read from PCI bar (0) : offset (0)
>> EAL: Can't write to PCI bar (0) : offset (4)
>> EAL: Can't write to PCI bar (0) : offset (14)
>> EAL: Can't write to PCI bar (0) : offset (e)
>> EAL: Can't read from PCI bar (0) : offset (c)
>> EAL: Requested device 0000:00:02.0 cannot be used
>> EAL: No probed ethernet devices
>> Interactive-mode selected
>> USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=251456, size=2176,
>> socket=0
>>
>> When VM uses *4K pages* the same setup works fine. I will work on this
>> but please let me know in case you already know what is going on.
> 
> No I did not face that one. I was able to launch testpmd without such
> early message. However I assigned an igbvf device to the guest and then
> to DPDK. I've never tested your config.
> 
> However, as stated in my cover letter, at the moment DPDK is not working
> for me because of storms of tlbi-on-maps. I intend to work on this as
> soon as I get some bandwidth, sorry.

I found the reason for the failure.

QEMU creates the BARs for the VIRTIO PCI device, and their size depends on
what the VIRTIO protocol needs. In my case the BAR is 16K, which is too
small to be mmappable on a kernel with 64K pages:
vfio_pci_enable() -> vfio_pci_probe_mmaps() ->
here the guest kernel checks whether the BAR size is smaller than the
current PAGE_SIZE and, if so, clears the VFIO_REGION_INFO_FLAG_MMAP flag,
which prevents the BAR from being mmapped later on. I added
-device virtio-net-pci,...,page-per-vq=on to enlarge the BAR to 8M, and now
testpmd works fine. I wonder how the same setup works with e.g. Intel or
AMD IOMMU.
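
To make the size check concrete, here is a minimal sketch in C. It is a
simplified model of the comparison described above, not the kernel code:
bar_is_mmappable() is a hypothetical helper, and the real
vfio_pci_probe_mmaps() checks other conditions as well (for example, that
the BAR is a memory resource).

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (assumption, not a kernel or QEMU function):
 * models only the "BAR size versus PAGE_SIZE" comparison.
 * A BAR smaller than one page is treated as not mmappable.
 */
bool bar_is_mmappable(uint64_t bar_size, uint64_t page_size)
{
    /* A zero-sized BAR can never be mapped. */
    if (bar_size == 0) {
        return false;
    }
    /* The BAR must cover at least one full page. */
    return bar_size >= page_size;
}
```

With the values from this thread, bar_is_mmappable(16 * 1024, 64 * 1024)
is false (16K BAR, 64K pages), while the same BAR with 4K pages, or an 8M
BAR with 64K pages, passes the check.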

Thanks,
Tomasz

Re: [Qemu-devel] [RFC v5 0/8] ARM SMMUv3 Emulation Support
Posted by Auger Eric 8 years, 3 months ago
Hi Tomasz,

On 03/08/2017 12:11, Tomasz Nowicki wrote:
> Hi Eric,
> 
> On 01.08.2017 15:07, Auger Eric wrote:
>> Hi Tomasz,
>> On 01/08/2017 13:01, Tomasz Nowicki wrote:
>>> Hi Eric,
>>>
>>> Just letting you know that I am facing another issue with the following
>>> setup:
>>> 1. host (4.12 kernel & 64K page) and VM (4.12 kernel & 64K page)
>>> 2. QEMU + -netdev type=tap,ifname=tap,id=net0 -device
>>> virtio-net-pci,netdev=net0,iommu_platform,disable-modern=off,disable-legacy=on
>>>
>>>
>>> 3. On VM, I allocate some huge pages and run DPDK testpmd app:
>>> # echo 4 > /sys/kernel/mm/hugepages/hugepages-524288kB/nr_hugepages
>>> # ./dpdk/usertools/dpdk-devbind.py -b vfio-pci  0000:00:02.0
>>> # ./dpdk/build/app/testpmd -l 0-13 -n 4 -w 0000:00:02.0 --
>>> --disable-hw-vlan-filter --disable-rss -i
>>> EAL: Detected 14 lcore(s)
>>> EAL: Probing VFIO support...
>>> EAL: VFIO support initialized
>>> EAL: PCI device 0000:00:02.0 on NUMA socket -1
>>> EAL:   probe driver: 1af4:1041 net_virtio
>>> EAL:   using IOMMU type 1 (Type 1)
>>> EAL: iommu_map_dma vaddr ffff20000000 size 80000000 iova 120000000
>>> EAL: Can't write to PCI bar (0) : offset (12)
>>> EAL: Can't read from PCI bar (0) : offset (12)
>>> EAL: Can't read from PCI bar (0) : offset (12)
>>> EAL: Can't write to PCI bar (0) : offset (12)
>>> EAL: Can't read from PCI bar (0) : offset (12)
>>> EAL: Can't write to PCI bar (0) : offset (12)
>>> EAL: Can't read from PCI bar (0) : offset (0)
>>> EAL: Can't write to PCI bar (0) : offset (4)
>>> EAL: Can't write to PCI bar (0) : offset (14)
>>> EAL: Can't write to PCI bar (0) : offset (e)
>>> EAL: Can't read from PCI bar (0) : offset (c)
>>> EAL: Requested device 0000:00:02.0 cannot be used
>>> EAL: No probed ethernet devices
>>> Interactive-mode selected
>>> USER1: create a new mbuf pool <mbuf_pool_socket_0>: n=251456, size=2176,
>>> socket=0
>>>
>>> When VM uses *4K pages* the same setup works fine. I will work on this
>>> but please let me know in case you already know what is going on.
>>
>> No I did not face that one. I was able to launch testpmd without such
>> early message. However I assigned an igbvf device to the guest and then
>> to DPDK. I've never tested your config.
>>
>> However, as stated in my cover letter, at the moment DPDK is not working
>> for me because of storms of tlbi-on-maps. I intend to work on this as
>> soon as I get some bandwidth, sorry.
> 
> I found the reason for the failure.
> 
> QEMU creates the BARs for the VIRTIO PCI device, and their size depends on
> what the VIRTIO protocol needs. In my case the BAR is 16K, which is too
> small to be mmappable on a kernel with 64K pages:
> vfio_pci_enable() -> vfio_pci_probe_mmaps() ->
> here the guest kernel checks whether the BAR size is smaller than the
> current PAGE_SIZE and, if so, clears the VFIO_REGION_INFO_FLAG_MMAP flag,
> which prevents the BAR from being mmapped later on. I added
> -device virtio-net-pci,...,page-per-vq=on to enlarge the BAR to 8M, and now
> testpmd works fine. I wonder how the same setup works with e.g. Intel or
> AMD IOMMU.
Hum, OK. Yet another thing to investigate! Thank you for your efforts; this
is excellent news overall. Preparing a rebase ...

Thanks

Eric
> 
> Thanks,
> Tomasz
>