[PATCH 0/9] SMMUv3.2 Range-based TLB Invalidation Support

Eric Auger posted 9 patches 6 months, 1 week ago
git fetch https://github.com/patchew-project/qemu tags/patchew/20200521091059.9453-1-eric.auger@redhat.com
Maintainers: Peter Maydell <peter.maydell@linaro.org>, Eric Auger <eric.auger@redhat.com>

Posted by Eric Auger 6 months, 1 week ago
SMMUv3.2 adds support for range-based TLB invalidation and a
level hint. When this feature is supported, the SMMUv3 driver
is allowed to send TLB invalidations for a range of IOVAs instead
of using page-based invalidation.

Implementing this feature in the virtual SMMUv3 device is
required for the DPDK-in-guest use case: DPDK uses hugepage
buffers and the guest sends invalidations for blocks. Without
this feature, a guest invalidation of a 1GB block, for instance,
translates into a storm of page invalidations. Each of them
is trapped by the VMM and cascaded down to the physical IOMMU.
This completely stalls the execution. This integration issue
was initially reported in [1].
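
To give a sense of the size of that storm, here is a minimal sketch
of the arithmetic. It assumes a 4KB translation granule, which the
text above does not state, and page_inval_count() is a hypothetical
helper, not a function from this series.

```c
#include <stdint.h>

/*
 * Number of page-based invalidation commands needed to cover one
 * block when each command invalidates a single granule-sized page.
 * Hypothetical helper for illustration only.
 */
static uint64_t page_inval_count(uint64_t block_size, uint64_t granule_size)
{
    return block_size / granule_size;
}
```

With a 1GB block and a 4KB granule, page_inval_count(1ULL << 30, 4096)
gives 262144 commands, each of which would be trapped by the VMM.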

SMMUv3.2 specifies additional parameters for the NH_VA and NH_VAA
stage 1 invalidation commands, so we can support those extensions.
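
As an illustration of those parameters, the sketch below computes
the size of the invalidated range from the TG, SCALE and NUM fields,
following my reading of the SMMUv3.2 specification:
Range = ((NUM + 1) * 2^SCALE) * granule size, with TG encoding the
granule (1 = 4KB, 2 = 16KB, 3 = 64KB). The function names are
hypothetical and do not come from this series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * log2 of the translation granule size for a TG encoding.
 * Assumed encoding: 1 -> 4KB (12), 2 -> 16KB (14), 3 -> 64KB (16).
 * TG == 0 means "not a range invalidation" and is rejected here.
 */
static unsigned tg_to_granule_shift(unsigned tg)
{
    assert(tg >= 1 && tg <= 3);
    return 10 + 2 * tg;
}

/* Size in bytes of the range covered by one range invalidation. */
static uint64_t range_inval_size(unsigned tg, unsigned scale, unsigned num)
{
    /* Number of granule-sized pages: (NUM + 1) * 2^SCALE */
    uint64_t num_pages = (uint64_t)(num + 1) << scale;

    return num_pages << tg_to_granule_shift(tg);
}
```

For instance, with TG = 1, SCALE = 13 and NUM = 31, the range is
32 * 8192 * 4KB = 1GB, so a single command can cover what would
otherwise take a long series of page invalidations.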

Patches [1, 3] are cleanup patches.
Patches [4, 6] change the implementation of the vSMMUv3 IOTLB.
   This IOTLB is a minimalist implementation that skips the
   page table walk when a matching entry is found in the TLB.
   Previously, entries were page mappings only. Now they can
   also be blocks.
Patches [7, 9] bring support for range invalidation.
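
To illustrate what block entries mean for a lookup, here is a toy
sketch, not the code from this series: the TLB is a plain array,
and every name in it (ToyTlbEntry, toy_tlb_lookup(), ...) is
hypothetical. The idea is that an IOVA may be covered by an entry of
any level, so the lookup tries each level from the starting level
down to level 3, masking the IOVA with that level's block size.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy TLB entry: maps a block or page at a given table level. */
typedef struct {
    int valid;
    uint64_t iova;    /* base IOVA, aligned to the entry size */
    unsigned level;   /* 0..3, level 3 being a page mapping */
    uint64_t pa;      /* base output address, same alignment */
} ToyTlbEntry;

/* log2 of the size mapped by one entry at 'level'. */
static unsigned level_shift(unsigned granule_shift, unsigned level)
{
    /* Each level resolves (granule_shift - 3) bits of the IOVA. */
    return granule_shift + (3 - level) * (granule_shift - 3);
}

/* Return the entry covering 'iova', or NULL on a TLB miss. */
static const ToyTlbEntry *toy_tlb_lookup(const ToyTlbEntry *tlb, size_t n,
                                         uint64_t iova,
                                         unsigned granule_shift,
                                         unsigned start_level)
{
    for (unsigned level = start_level; level <= 3; level++) {
        uint64_t mask = (1ULL << level_shift(granule_shift, level)) - 1;
        uint64_t base = iova & ~mask;

        for (size_t i = 0; i < n; i++) {
            if (tlb[i].valid && tlb[i].level == level &&
                tlb[i].iova == base) {
                return &tlb[i];
            }
        }
    }
    return NULL;
}

/* Combine the entry's base address with the offset inside the block. */
static uint64_t toy_translate(const ToyTlbEntry *e, uint64_t iova,
                              unsigned granule_shift)
{
    uint64_t mask = (1ULL << level_shift(granule_shift, e->level)) - 1;

    return e->pa | (iova & mask);
}
```

With a 4KB granule, one level 2 entry covers 2MB, so a single entry
serves any IOVA in that block where 512 page entries would otherwise
be needed.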

Supporting block mappings in the IOTLB looks sensible in terms of
TLB entry consumption. However, looking at virtio/vhost device usage,
without block mapping and without range invalidation (< 5.7 kernels),
it may be less performant. For recent guest kernels supporting
range invalidation [2], the performance should be similar.

Best Regards

Eric

This series can be found at:
https://github.com/eauger/qemu.git
branch: v5.0.0-smmuv3-ril-v1

References:
[1] [RFC v2 4/4] iommu/arm-smmu-v3: add CMD_TLBI_NH_VA_AM command
for iova range invalidation
(https://lists.linuxfoundation.org/pipermail/iommu/2017-August/023679.html)

[2] 5.7+ kernels featuring
6a481a95d4c1 iommu/arm-smmu-v3: Add SMMUv3.2 range invalidation support

Eric Auger (9):
  hw/arm/smmu-common: Factorize some code in smmu_ptw_64()
  hw/arm/smmu-common: Add IOTLB helpers
  hw/arm/smmu: Simplify the IOTLB key format
  hw/arm/smmu: Introduce SMMUTLBEntry for PTW and IOTLB value
  hw/arm/smmuv3: Store the starting level in SMMUTransTableInfo
  hw/arm/smmu-common: Manage IOTLB block entries
  hw/arm/smmuv3: Introduce smmuv3_s1_range_inval() helper
  hw/arm/smmuv3: Get prepared for range invalidation
  hw/arm/smmuv3: Advertise SMMUv3.2 range invalidation

 hw/arm/smmu-internal.h       |  14 +++
 hw/arm/smmuv3-internal.h     |   5 +
 include/hw/arm/smmu-common.h |  24 ++--
 hw/arm/smmu-common.c         | 205 +++++++++++++++++++++++------------
 hw/arm/smmuv3.c              | 142 ++++++++++++------------
 hw/arm/trace-events          |  12 +-
 6 files changed, 246 insertions(+), 156 deletions(-)

-- 
2.20.1