[RFC 0/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

Eugenio Pérez posted 1 patch 3 years, 10 months ago
Test FreeBSD passed
Test docker-quick@centos7 passed
Test checkpatch failed
Test docker-mingw@fedora passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20200625191651.5817-1-eperezma@redhat.com
memory.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
[RFC 0/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
Posted by Eugenio Pérez 3 years, 10 months ago
I am able to hit this assertion when a Red Hat 7 guest virtio_net device
raises an "Invalidation" of all the TLB entries. This happens in the
guest's startup if 'intel_iommu=on' argument is passed to the guest
kernel and right IOMMU/ATS devices are declared in qemu's command line.

If we examinate *entry in frame 4 of backtrace:
*entry = {target_as = 0x555556f6c050, iova = 0x0, translated_addr = 0x0,
addr_mask = 0xffffffffffffffff, perm = 0x0}

Which (I think) tries to invalidate all the TLB registers of the device.

Just deleting that assert is enough for the VM to start and communicate
using IOMMU, but maybe a better alternative is possible. We could move
it to the caller functions in other cases than IOMMU invalidation, or
make it conditional only if not invalidating.

Any comment would be appreciated. Thanks!

--

Guest kernel version: kernel-3.10.0-1151.el7.x86_64

Command line:
/home/qemu/x86_64-softmmu/qemu-system-x86_64 -name \
guest=rhel7-test,debug-threads=on -machine \
pc-q35-5.1,accel=kvm,usb=off,dump-guest-core=off,kernel_irqchip=split \
-cpu \
Broadwell,vme=on,ss=on,vmx=on,f16c=on,rdrand=on,hypervisor=on,arat=on,tsc-adjust=on,umip=on,arch-capabilities=on,xsaveopt=on,pdpe1gb=on,abm=on,skip-l1dfl-vmentry=on,rtm=on,hle=on \
-m 8096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid \
d022ecbf-679e-4755-87ce-eb87fc5bbc5d -display none -no-user-config \
-nodefaults -rtc base=utc,driftfix=slew -global \
kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global \
ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on \
-device intel-iommu,intremap=on,device-iotlb=on -device \
pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
-device \
pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 \
-device \
pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 \
-device \
pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 \
-device \
pcie-root-port,port=0xc,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4 \
-device \
pcie-root-port,port=0xd,chassis=6,id=pci.6,bus=pcie.0,addr=0x1.0x5 \
-device \
pcie-root-port,port=0xe,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x6 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 -device \
virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 -drive \
file=/home/virtio-test2.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
-device \
virtio-blk-pci,scsi=off,bus=pci.4,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-netdev tap,id=hostnet0,vhost=on,vhostforce=on -device \
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:0d:1d:f2,bus=pci.1,addr=0x0,iommu_platform=on,ats=on \
-device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 -object \
rng-random,id=objrng0,filename=/dev/urandom -device \
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 -s -msg \
timestamp=on

Full backtrace:

#0 0x00007ffff521370f in raise () at /lib64/libc.so.6
#1 0x00007ffff51fdb25 in abort () at /lib64/libc.so.6
#2 0x00007ffff51fd9f9 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
#3 0x00007ffff520bcc6 in .annobin_assert.c_end () at /lib64/libc.so.6
#4 0x0000555555888171 in memory_region_notify_one (notifier=0x7ffde05dfde8, entry=0x7ffde5dfe200) at /home/qemu/memory.c:1918
#5 0x0000555555888247 in memory_region_notify_iommu (iommu_mr=0x555556f6c0b0, iommu_idx=0, entry=...) at /home/qemu/memory.c:1941
#6 0x0000555555951c8d in vtd_process_device_iotlb_desc (s=0x555557609000, inv_desc=0x7ffde5dfe2d0)
    at /home/qemu/hw/i386/intel_iommu.c:2468
#7 0x0000555555951e6a in vtd_process_inv_desc (s=0x555557609000) at /home/qemu/hw/i386/intel_iommu.c:2531
#8 0x0000555555951fa5 in vtd_fetch_inv_desc (s=0x555557609000) at /home/qemu/hw/i386/intel_iommu.c:2563
#9 0x00005555559520e5 in vtd_handle_iqt_write (s=0x555557609000) at /home/qemu/hw/i386/intel_iommu.c:2590
#10 0x0000555555952b45 in vtd_mem_write (opaque=0x555557609000, addr=136, val=2688, size=4) at /home/qemu/hw/i386/intel_iommu.c:2837
#11 0x0000555555883e17 in memory_region_write_accessor
    (mr=0x555557609330, addr=136, value=0x7ffde5dfe478, size=4, shift=0, mask=4294967295, attrs=...) at /home/qemu/memory.c:483
#12 0x000055555588401d in access_with_adjusted_size
    (addr=136, value=0x7ffde5dfe478, size=4, access_size_min=4, access_size_max=8, access_fn=
    0x555555883d38 <memory_region_write_accessor>, mr=0x555557609330, attrs=...) at /home/qemu/memory.c:544
#13 0x0000555555886f37 in memory_region_dispatch_write (mr=0x555557609330, addr=136, data=2688, op=MO_32, attrs=...)
    at /home/qemu/memory.c:1476
#14 0x0000555555827a03 in flatview_write_continue
    (fv=0x7ffde00935d0, addr=4275634312, attrs=..., ptr=0x7ffff7ff0028, len=4, addr1=136, l=4, mr=0x555557609330) at /home/qemu/exec.c:3146
#15 0x0000555555827b48 in flatview_write (fv=0x7ffde00935d0, addr=4275634312, attrs=..., buf=0x7ffff7ff0028, len=4)
    at /home/qemu/exec.c:3186
#16 0x0000555555827e9d in address_space_write
    (as=0x5555567ca640 <address_space_memory>, addr=4275634312, attrs=..., buf=0x7ffff7ff0028, len=4) at /home/qemu/exec.c:3277
#17 0x0000555555827f0a in address_space_rw
    (as=0x5555567ca640 <address_space_memory>, addr=4275634312, attrs=..., buf=0x7ffff7ff0028, len=4, is_write=true)
    at /home/qemu/exec.c:3287
#18 0x000055555589b633 in kvm_cpu_exec (cpu=0x555556b65640) at /home/qemu/accel/kvm/kvm-all.c:2511
#19 0x0000555555876ba8 in qemu_kvm_cpu_thread_fn (arg=0x555556b65640) at /home/qemu/cpus.c:1284
#20 0x0000555555dafff1 in qemu_thread_start (args=0x555556b8c3b0) at util/qemu-thread-posix.c:521
#21 0x00007ffff55a62de in start_thread () at /lib64/libpthread.so.0
#22 0x00007ffff52d7e83 in clone () at /lib64/libc.so.6

--

Eugenio Pérez (1):
  memory: Delete assertion in memory_region_unregister_iommu_notifier

 memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.18.1


Re: [RFC 0/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
Posted by no-reply@patchew.org 3 years, 10 months ago
Patchew URL: https://patchew.org/QEMU/20200625191651.5817-1-eperezma@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [RFC 0/1] memory: Delete assertion in memory_region_unregister_iommu_notifier
Type: series
Message-id: 20200625191651.5817-1-eperezma@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

From https://github.com/patchew-project/qemu
 * [new tag]         patchew/20200625191651.5817-1-eperezma@redhat.com -> patchew/20200625191651.5817-1-eperezma@redhat.com
Switched to a new branch 'test'
3dcb120 memory: Delete assertion in memory_region_unregister_iommu_notifier

=== OUTPUT BEGIN ===
ERROR: do not use C99 // comments
#24: FILE: memory.c:1918:
+    // assert(entry->iova >= notifier->start && entry_end <= notifier->end);

total: 1 errors, 0 warnings, 8 lines checked

Commit 3dcb1204223a (memory: Delete assertion in memory_region_unregister_iommu_notifier) has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200625191651.5817-1-eperezma@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
[PATCH 0/5] memory: Skip assertion in memory_region_unregister_iommu_notifier
Posted by Eugenio Pérez 3 years, 7 months ago
I am able to hit this assertion when a Red Hat 7 guest virtio_net device
raises an "Invalidation" of all the TLB entries. This happens in the
guest's startup if 'intel_iommu=on' argument is passed to the guest
kernel and right IOMMU/ATS devices are declared in qemu's command line.

Command line:
/home/qemu/x86_64-softmmu/qemu-system-x86_64 -name \
guest=rhel7-test,debug-threads=on -machine \
pc-q35-5.1,accel=kvm,usb=off,dump-guest-core=off,kernel_irqchip=split \
-cpu \
Broadwell,vme=on,ss=on,vmx=on,f16c=on,rdrand=on,hypervisor=on,arat=on,tsc-adjust=on,umip=on,arch-capabilities=on,xsaveopt=on,pdpe1gb=on,abm=on,skip-l1dfl-vmentry=on,rtm=on,hle=on \
-m 8096 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid \
d022ecbf-679e-4755-87ce-eb87fc5bbc5d -display none -no-user-config \
-nodefaults -rtc base=utc,driftfix=slew -global \
kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global \
ICH9-LPC.disable_s3=1 -global ICH9-LPC.disable_s4=1 -boot strict=on \
-device intel-iommu,intremap=on,device-iotlb=on -device \
pcie-root-port,port=0x8,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x1 \
-device \
pcie-root-port,port=0x9,chassis=2,id=pci.2,bus=pcie.0,addr=0x1.0x1 \
-device \
pcie-root-port,port=0xa,chassis=3,id=pci.3,bus=pcie.0,addr=0x1.0x2 \
-device \
pcie-root-port,port=0xb,chassis=4,id=pci.4,bus=pcie.0,addr=0x1.0x3 \
-device \
pcie-root-port,port=0xc,chassis=5,id=pci.5,bus=pcie.0,addr=0x1.0x4 \
-device \
pcie-root-port,port=0xd,chassis=6,id=pci.6,bus=pcie.0,addr=0x1.0x5 \
-device \
pcie-root-port,port=0xe,chassis=7,id=pci.7,bus=pcie.0,addr=0x1.0x6 \
-device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.2,addr=0x0 -device \
virtio-serial-pci,id=virtio-serial0,bus=pci.3,addr=0x0 -drive \
file=/home/virtio-test2.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
-device \
virtio-blk-pci,scsi=off,bus=pci.4,addr=0x0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
-netdev tap,id=hostnet0,vhost=on,vhostforce=on -device \
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:0d:1d:f2,bus=pci.1,addr=0x0,iommu_platform=on,ats=on \
-device virtio-balloon-pci,id=balloon0,bus=pci.5,addr=0x0 -object \
rng-random,id=objrng0,filename=/dev/urandom -device \
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.6,addr=0x0 -s -msg \
timestamp=on

Full backtrace:
 #0  0x00007ffff521370f in raise () at /lib64/libc.so.6
 #1  0x00007ffff51fdb25 in abort () at /lib64/libc.so.6
 #2  0x00007ffff51fd9f9 in _nl_load_domain.cold.0 () at /lib64/libc.so.6
 #3  0x00007ffff520bcc6 in .annobin_assert.c_end () at /lib64/libc.so.6
 #4  0x0000555555888171 in memory_region_notify_one (notifier=0x7ffde0487fa8,
                                                    entry=0x7ffde5dfe200)
                          at /home/qemu/memory.c:1918
 #5  0x0000555555888247 in memory_region_notify_iommu (iommu_mr=0x555556f6c0b0,
                                                      iommu_idx=0, entry=...)
                          at /home/qemu/memory.c:1941
 #6  0x0000555555951c8d in vtd_process_device_iotlb_desc (s=0x555557609000,
                                                       inv_desc=0x7ffde5dfe2d0)
                          at /home/qemu/hw/i386/intel_iommu.c:2468
 #7  0x0000555555951e6a in vtd_process_inv_desc (s=0x555557609000)
                          at /home/qemu/hw/i386/intel_iommu.c:2531
 #8  0x0000555555951fa5 in vtd_fetch_inv_desc (s=0x555557609000)
                          at /home/qemu/hw/i386/intel_iommu.c:2563
 #9  0x00005555559520e5 in vtd_handle_iqt_write (s=0x555557609000)
                          at /home/qemu/hw/i386/intel_iommu.c:2590
 #10 0x0000555555952b45 in vtd_mem_write (opaque=0x555557609000, addr=136,
                                         val=2688, size=4)
                          at /home/qemu/hw/i386/intel_iommu.c:2837
 #11 0x0000555555883e17 in memory_region_write_accessor (mr=0x555557609330,
                                                        addr=136,
                                                        value=0x7ffde5dfe478,
                                                        size=4,
                                                        shift=0,
                                                        mask=4294967295,
                                                        attrs=...)
                         at /home/qemu/memory.c:483
 #12 0x000055555588401d in access_with_adjusted_size (addr=136,
                       value=0x7ffde5dfe478,
                       size=4,
                       access_size_min=4,
                       access_size_max=8,
                       access_fn=0x555555883d38 <memory_region_write_accessor>,
                       mr=0x555557609330,
                       attrs=...)
                       at /home/qemu/memory.c:544
 #13 0x0000555555886f37 in memory_region_dispatch_write (mr=0x555557609330,
                                                       addr=136,
                                                       data=2688,
                                                       op=MO_32,
                                                       attrs=...)
                         at /home/qemu/memory.c:1476
 #14 0x0000555555827a03 in flatview_write_continue (fv=0x7ffdd8503150,
                                                   addr=4275634312,
                                                   attrs=...,
                                                   ptr=0x7ffff7ff0028,
                                                   len=4,
                                                   addr1=136,
                                                   l=4,
                                                   mr=0x555557609330)
                          at /home/qemu/exec.c:3146
 #15 0x0000555555827b48 in flatview_write (fv=0x7ffdd8503150,
                                          addr=4275634312,
                                          attrs=...,
                                          buf=0x7ffff7ff0028,
                                          len=4)
                          at /home/qemu/exec.c:3186
 #16 0x0000555555827e9d in address_space_write (
                                      as=0x5555567ca640 <address_space_memory>,
                                      addr=4275634312,
                                      attrs=...,
                                      buf=0x7ffff7ff0028,
                                      len=4)
                          at /home/qemu/exec.c:3277
 #17 0x0000555555827f0a in address_space_rw (
                                      as=0x5555567ca640 <address_space_memory>,
                                      addr=4275634312,
                                      attrs=...,
                                      buf=0x7ffff7ff0028,
                                      len=4,
                                      is_write=true)
                          at /home/qemu/exec.c:3287
 #18 0x000055555589b633 in kvm_cpu_exec (cpu=0x555556b65640)
                               at /home/qemu/accel/kvm/kvm-all.c:2511
 #19 0x0000555555876ba8 in qemu_kvm_cpu_thread_fn (arg=0x555556b65640)
                               at /home/qemu/cpus.c:1284
 #20 0x0000555555dafff1 in qemu_thread_start (args=0x555556b8c3b0)
                               at util/qemu-thread-posix.c:521
 #21 0x00007ffff55a62de in start_thread () at /lib64/libpthread.so.0
 #22 0x00007ffff52d7e83 in clone () at /lib64/libc.so.6

(gdb) frame 4
 #4  0x0000555555888171 in memory_region_notify_one
                      (notifier=0x7ffde0487fa8, entry=0x7ffde5dfe200)
                      at /home/qemu/memory.c:1918
1918        assert(entry->iova >= notifier->start && entry_end <=
notifier->end);
(gdb) p *entry
$1 = {target_as = 0x555556f6c050, iova = 0, translated_addr = 0,
addr_mask = 18446744073709551615, perm = IOMMU_NONE}
--

Tested with vhost-net and qemu driver, host<->guest communication.

v1: * IOMMU_NOTIFIER_ALL now includes IOMMU_NOTIFIER_DEVIOTLB_EVENTS
      also. VFIO IOMMU notifier will register for all events (as before
      of the patching)
    * Cosmetic changes, like:
      - Expand commit messages
      - Better naming and checks
      - Fix alignment issues
      - Avoid an already present casting from `void *`

RFC v8: Fix use of "tmp" notification in memory.c:memory_region_notify_iommu_one

v7: Add IOMMUTLBNotification, and move introduced "type" from
    IOMMUTLBEntry to the former.

v6: Introduce "type" field for IOMMUTLBEntry. Fill in all uses.
    Update tests reports with more fine-tuning (CPU, RPS/XPS tunning).

v5: Skip regular IOTLB notifications in dev_iotlb notifiers

v4: Rename IOMMU_NOTIFIER_IOTLB -> IOMMU_NOTIFIER_DEVIOTLB.
    Make vhost-net notifier just IOMMU_NOTIFIER_DEVIOTLB, not
    IOMMU_NOTIFIER_UNMAP

v3: Skip the assertion in case notifier is a IOTLB one, since they can manage
    arbitrary ranges. Using a flag in the notifier for now, as Peter suggested.

v2: Actually delete assertion instead of just commenting out using C99

Eugenio Pérez (5):
  memory: Rename memory_region_notify_one to
    memory_region_notify_iommu_one
  memory: Add IOMMUTLBEvent
  memory: Add IOMMU_NOTIFIER_DEVIOTLB_UNMAP IOMMUTLBNotificationType
  intel_iommu: Skip page walking on device iotlb invalidations
  memory: Skip bad range assertion if notifier is DEVIOTLB_UNMAP type

 include/exec/memory.h | 40 ++++++++++---------
 hw/arm/smmu-common.c  | 13 +++---
 hw/arm/smmuv3.c       | 13 +++---
 hw/i386/intel_iommu.c | 92 +++++++++++++++++++++++++------------------
 hw/misc/tz-mpc.c      | 32 ++++++++-------
 hw/ppc/spapr_iommu.c  | 15 +++----
 hw/virtio/vhost.c     |  2 +-
 softmmu/memory.c      | 31 +++++++++------
 8 files changed, 135 insertions(+), 103 deletions(-)

-- 
2.18.1


[PATCH 1/5] memory: Rename memory_region_notify_one to memory_region_notify_iommu_one
Posted by Eugenio Pérez 3 years, 7 months ago
Previous name didn't reflect the iommu operation.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Auger <eric.auger@redhat.com>
---
 include/exec/memory.h | 6 +++---
 hw/arm/smmu-common.c  | 2 +-
 hw/arm/smmuv3.c       | 2 +-
 hw/i386/intel_iommu.c | 4 ++--
 softmmu/memory.c      | 6 +++---
 5 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 0cfe987ab4..22c5f564d1 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -226,7 +226,7 @@ enum IOMMUMemoryRegionAttr {
  * The IOMMU implementation must use the IOMMU notifier infrastructure
  * to report whenever mappings are changed, by calling
  * memory_region_notify_iommu() (or, if necessary, by calling
- * memory_region_notify_one() for each registered notifier).
+ * memory_region_notify_iommu_one() for each registered notifier).
  *
  * Conceptually an IOMMU provides a mapping from input address
  * to an output TLB entry. If the IOMMU is aware of memory transaction
@@ -1274,7 +1274,7 @@ void memory_region_notify_iommu(IOMMUMemoryRegion *iommu_mr,
                                 IOMMUTLBEntry entry);
 
 /**
- * memory_region_notify_one: notify a change in an IOMMU translation
+ * memory_region_notify_iommu_one: notify a change in an IOMMU translation
  *                           entry to a single notifier
  *
  * This works just like memory_region_notify_iommu(), but it only
@@ -1285,7 +1285,7 @@ void memory_region_notify_iommu(IOMMUMemoryRegion *iommu_mr,
  *         replaces all old entries for the same virtual I/O address range.
  *         Deleted entries have .@perm == 0.
  */
-void memory_region_notify_one(IOMMUNotifier *notifier,
+void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
                               IOMMUTLBEntry *entry);
 
 /**
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 3838db1395..88d2c454f0 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -472,7 +472,7 @@ static void smmu_unmap_notifier_range(IOMMUNotifier *n)
     entry.perm = IOMMU_NONE;
     entry.addr_mask = n->end - n->start;
 
-    memory_region_notify_one(n, &entry);
+    memory_region_notify_iommu_one(n, &entry);
 }
 
 /* Unmap all notifiers attached to @mr */
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 0122700e72..0a893ae918 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -827,7 +827,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
     entry.addr_mask = num_pages * (1 << granule) - 1;
     entry.perm = IOMMU_NONE;
 
-    memory_region_notify_one(n, &entry);
+    memory_region_notify_iommu_one(n, &entry);
 }
 
 /* invalidate an asid/iova range tuple in all mr's */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 5284bb68b6..2ad6b9d796 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3498,7 +3498,7 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
         /* This field is meaningless for unmap */
         entry.translated_addr = 0;
 
-        memory_region_notify_one(n, &entry);
+        memory_region_notify_iommu_one(n, &entry);
 
         start += mask;
         remain -= mask;
@@ -3536,7 +3536,7 @@ static void vtd_address_space_refresh_all(IntelIOMMUState *s)
 
 static int vtd_replay_hook(IOMMUTLBEntry *entry, void *private)
 {
-    memory_region_notify_one((IOMMUNotifier *)private, entry);
+    memory_region_notify_iommu_one((IOMMUNotifier *)private, entry);
     return 0;
 }
 
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 70b93104e8..961c25b42f 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1890,8 +1890,8 @@ void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
     memory_region_update_iommu_notify_flags(iommu_mr, NULL);
 }
 
-void memory_region_notify_one(IOMMUNotifier *notifier,
-                              IOMMUTLBEntry *entry)
+void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
+                                    IOMMUTLBEntry *entry)
 {
     IOMMUNotifierFlag request_flags;
     hwaddr entry_end = entry->iova + entry->addr_mask;
@@ -1927,7 +1927,7 @@ void memory_region_notify_iommu(IOMMUMemoryRegion *iommu_mr,
 
     IOMMU_NOTIFIER_FOREACH(iommu_notifier, iommu_mr) {
         if (iommu_notifier->iommu_idx == iommu_idx) {
-            memory_region_notify_one(iommu_notifier, &entry);
+            memory_region_notify_iommu_one(iommu_notifier, &entry);
         }
     }
 }
-- 
2.18.1


[PATCH 2/5] memory: Add IOMMUTLBEvent
Posted by Eugenio Pérez 3 years, 7 months ago
This way we can tell between regular IOMMUTLBEntry (entry of IOMMU
hardware) and notifications.

In the notifications, we set explicitly if it is a MAPs or an UNMAP,
instead of trusting in entry permissions to differentiate them.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
 include/exec/memory.h | 27 +++++++------
 hw/arm/smmu-common.c  | 13 ++++---
 hw/arm/smmuv3.c       | 13 ++++---
 hw/i386/intel_iommu.c | 88 ++++++++++++++++++++++++-------------------
 hw/misc/tz-mpc.c      | 32 +++++++++-------
 hw/ppc/spapr_iommu.c  | 15 ++++----
 softmmu/memory.c      | 20 +++++-----
 7 files changed, 111 insertions(+), 97 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 22c5f564d1..ab8d1ac9c9 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -106,6 +106,11 @@ struct IOMMUNotifier {
 };
 typedef struct IOMMUNotifier IOMMUNotifier;
 
+typedef struct IOMMUTLBEvent {
+    IOMMUNotifierFlag type;
+    IOMMUTLBEntry entry;
+} IOMMUTLBEvent;
+
 /* RAM is pre-allocated and passed into qemu_ram_alloc_from_ptr */
 #define RAM_PREALLOC   (1 << 0)
 
@@ -1254,24 +1259,18 @@ uint64_t memory_region_iommu_get_min_page_size(IOMMUMemoryRegion *iommu_mr);
 /**
  * memory_region_notify_iommu: notify a change in an IOMMU translation entry.
  *
- * The notification type will be decided by entry.perm bits:
- *
- * - For UNMAP (cache invalidation) notifies: set entry.perm to IOMMU_NONE.
- * - For MAP (newly added entry) notifies: set entry.perm to the
- *   permission of the page (which is definitely !IOMMU_NONE).
- *
  * Note: for any IOMMU implementation, an in-place mapping change
  * should be notified with an UNMAP followed by a MAP.
  *
  * @iommu_mr: the memory region that was changed
  * @iommu_idx: the IOMMU index for the translation table which has changed
- * @entry: the new entry in the IOMMU translation table.  The entry
- *         replaces all old entries for the same virtual I/O address range.
- *         Deleted entries have .@perm == 0.
+ * @event: TLB event with the new entry in the IOMMU translation table.
+ *         The entry replaces all old entries for the same virtual I/O address
+ *         range.
  */
 void memory_region_notify_iommu(IOMMUMemoryRegion *iommu_mr,
                                 int iommu_idx,
-                                IOMMUTLBEntry entry);
+                                IOMMUTLBEvent event);
 
 /**
  * memory_region_notify_iommu_one: notify a change in an IOMMU translation
@@ -1281,12 +1280,12 @@ void memory_region_notify_iommu(IOMMUMemoryRegion *iommu_mr,
  * notifies a specific notifier, not all of them.
  *
  * @notifier: the notifier to be notified
- * @entry: the new entry in the IOMMU translation table.  The entry
- *         replaces all old entries for the same virtual I/O address range.
- *         Deleted entries have .@perm == 0.
+ * @event: TLB event with the new entry in the IOMMU translation table.
+ *         The entry replaces all old entries for the same virtual I/O address
+ *         range.
  */
 void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
-                              IOMMUTLBEntry *entry);
+                                    IOMMUTLBEvent *event);
 
 /**
  * memory_region_register_iommu_notifier: register a notifier for changes to
diff --git a/hw/arm/smmu-common.c b/hw/arm/smmu-common.c
index 88d2c454f0..405d5c5325 100644
--- a/hw/arm/smmu-common.c
+++ b/hw/arm/smmu-common.c
@@ -465,14 +465,15 @@ IOMMUMemoryRegion *smmu_iommu_mr(SMMUState *s, uint32_t sid)
 /* Unmap the whole notifier's range */
 static void smmu_unmap_notifier_range(IOMMUNotifier *n)
 {
-    IOMMUTLBEntry entry;
+    IOMMUTLBEvent event;
 
-    entry.target_as = &address_space_memory;
-    entry.iova = n->start;
-    entry.perm = IOMMU_NONE;
-    entry.addr_mask = n->end - n->start;
+    event.type = IOMMU_NOTIFIER_UNMAP;
+    event.entry.target_as = &address_space_memory;
+    event.entry.iova = n->start;
+    event.entry.perm = IOMMU_NONE;
+    event.entry.addr_mask = n->end - n->start;
 
-    memory_region_notify_iommu_one(n, &entry);
+    memory_region_notify_iommu_one(n, &event);
 }
 
 /* Unmap all notifiers attached to @mr */
diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 0a893ae918..62b0e289ca 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -799,7 +799,7 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
                                uint8_t tg, uint64_t num_pages)
 {
     SMMUDevice *sdev = container_of(mr, SMMUDevice, iommu);
-    IOMMUTLBEntry entry;
+    IOMMUTLBEvent event;
     uint8_t granule = tg;
 
     if (!tg) {
@@ -822,12 +822,13 @@ static void smmuv3_notify_iova(IOMMUMemoryRegion *mr,
         granule = tt->granule_sz;
     }
 
-    entry.target_as = &address_space_memory;
-    entry.iova = iova;
-    entry.addr_mask = num_pages * (1 << granule) - 1;
-    entry.perm = IOMMU_NONE;
+    event.type = IOMMU_NOTIFIER_UNMAP;
+    event.entry.target_as = &address_space_memory;
+    event.entry.iova = iova;
+    event.entry.addr_mask = num_pages * (1 << granule) - 1;
+    event.entry.perm = IOMMU_NONE;
 
-    memory_region_notify_iommu_one(n, &entry);
+    memory_region_notify_iommu_one(n, &event);
 }
 
 /* invalidate an asid/iova range tuple in all mr's */
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 2ad6b9d796..19661bba3e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1073,7 +1073,7 @@ static int vtd_iova_to_slpte(IntelIOMMUState *s, VTDContextEntry *ce,
     }
 }
 
-typedef int (*vtd_page_walk_hook)(IOMMUTLBEntry *entry, void *private);
+typedef int (*vtd_page_walk_hook)(IOMMUTLBEvent *event, void *private);
 
 /**
  * Constant information used during page walking
@@ -1094,11 +1094,12 @@ typedef struct {
     uint16_t domain_id;
 } vtd_page_walk_info;
 
-static int vtd_page_walk_one(IOMMUTLBEntry *entry, vtd_page_walk_info *info)
+static int vtd_page_walk_one(IOMMUTLBEvent *event, vtd_page_walk_info *info)
 {
     VTDAddressSpace *as = info->as;
     vtd_page_walk_hook hook_fn = info->hook_fn;
     void *private = info->private;
+    IOMMUTLBEntry *entry = &event->entry;
     DMAMap target = {
         .iova = entry->iova,
         .size = entry->addr_mask,
@@ -1107,7 +1108,7 @@ static int vtd_page_walk_one(IOMMUTLBEntry *entry, vtd_page_walk_info *info)
     };
     DMAMap *mapped = iova_tree_find(as->iova_tree, &target);
 
-    if (entry->perm == IOMMU_NONE && !info->notify_unmap) {
+    if (event->type == IOMMU_NOTIFIER_UNMAP && !info->notify_unmap) {
         trace_vtd_page_walk_one_skip_unmap(entry->iova, entry->addr_mask);
         return 0;
     }
@@ -1115,7 +1116,7 @@ static int vtd_page_walk_one(IOMMUTLBEntry *entry, vtd_page_walk_info *info)
     assert(hook_fn);
 
     /* Update local IOVA mapped ranges */
-    if (entry->perm) {
+    if (event->type == IOMMU_NOTIFIER_MAP) {
         if (mapped) {
             /* If it's exactly the same translation, skip */
             if (!memcmp(mapped, &target, sizeof(target))) {
@@ -1141,19 +1142,21 @@ static int vtd_page_walk_one(IOMMUTLBEntry *entry, vtd_page_walk_info *info)
                 int ret;
 
                 /* Emulate an UNMAP */
+                event->type = IOMMU_NOTIFIER_UNMAP;
                 entry->perm = IOMMU_NONE;
                 trace_vtd_page_walk_one(info->domain_id,
                                         entry->iova,
                                         entry->translated_addr,
                                         entry->addr_mask,
                                         entry->perm);
-                ret = hook_fn(entry, private);
+                ret = hook_fn(event, private);
                 if (ret) {
                     return ret;
                 }
                 /* Drop any existing mapping */
                 iova_tree_remove(as->iova_tree, &target);
-                /* Recover the correct permission */
+                /* Recover the correct type */
+                event->type = IOMMU_NOTIFIER_MAP;
                 entry->perm = cache_perm;
             }
         }
@@ -1170,7 +1173,7 @@ static int vtd_page_walk_one(IOMMUTLBEntry *entry, vtd_page_walk_info *info)
     trace_vtd_page_walk_one(info->domain_id, entry->iova,
                             entry->translated_addr, entry->addr_mask,
                             entry->perm);
-    return hook_fn(entry, private);
+    return hook_fn(event, private);
 }
 
 /**
@@ -1191,7 +1194,7 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
     uint32_t offset;
     uint64_t slpte;
     uint64_t subpage_size, subpage_mask;
-    IOMMUTLBEntry entry;
+    IOMMUTLBEvent event;
     uint64_t iova = start;
     uint64_t iova_next;
     int ret = 0;
@@ -1245,13 +1248,15 @@ static int vtd_page_walk_level(dma_addr_t addr, uint64_t start,
              *
              * In either case, we send an IOTLB notification down.
              */
-            entry.target_as = &address_space_memory;
-            entry.iova = iova & subpage_mask;
-            entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
-            entry.addr_mask = ~subpage_mask;
+            event.entry.target_as = &address_space_memory;
+            event.entry.iova = iova & subpage_mask;
+            event.entry.perm = IOMMU_ACCESS_FLAG(read_cur, write_cur);
+            event.entry.addr_mask = ~subpage_mask;
             /* NOTE: this is only meaningful if entry_valid == true */
-            entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw);
-            ret = vtd_page_walk_one(&entry, info);
+            event.entry.translated_addr = vtd_get_slpte_addr(slpte, info->aw);
+            event.type = event.entry.perm ? IOMMU_NOTIFIER_MAP :
+                                            IOMMU_NOTIFIER_UNMAP;
+            ret = vtd_page_walk_one(&event, info);
         }
 
         if (ret < 0) {
@@ -1430,10 +1435,10 @@ static int vtd_dev_to_context_entry(IntelIOMMUState *s, uint8_t bus_num,
     return 0;
 }
 
-static int vtd_sync_shadow_page_hook(IOMMUTLBEntry *entry,
+static int vtd_sync_shadow_page_hook(IOMMUTLBEvent *event,
                                      void *private)
 {
-    memory_region_notify_iommu((IOMMUMemoryRegion *)private, 0, *entry);
+    memory_region_notify_iommu(private, 0, *event);
     return 0;
 }
 
@@ -1993,14 +1998,17 @@ static void vtd_iotlb_page_invalidate_notify(IntelIOMMUState *s,
                  * page tables.  We just deliver the PSI down to
                  * invalidate caches.
                  */
-                IOMMUTLBEntry entry = {
-                    .target_as = &address_space_memory,
-                    .iova = addr,
-                    .translated_addr = 0,
-                    .addr_mask = size - 1,
-                    .perm = IOMMU_NONE,
+                IOMMUTLBEvent event = {
+                    .type = IOMMU_NOTIFIER_UNMAP,
+                    .entry = {
+                        .target_as = &address_space_memory,
+                        .iova = addr,
+                        .translated_addr = 0,
+                        .addr_mask = size - 1,
+                        .perm = IOMMU_NONE,
+                    },
                 };
-                memory_region_notify_iommu(&vtd_as->iommu, 0, entry);
+                memory_region_notify_iommu(&vtd_as->iommu, 0, event);
             }
         }
     }
@@ -2412,7 +2420,7 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
                                           VTDInvDesc *inv_desc)
 {
     VTDAddressSpace *vtd_dev_as;
-    IOMMUTLBEntry entry;
+    IOMMUTLBEvent event;
     struct VTDBus *vtd_bus;
     hwaddr addr;
     uint64_t sz;
@@ -2460,12 +2468,13 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
         sz = VTD_PAGE_SIZE;
     }
 
-    entry.target_as = &vtd_dev_as->as;
-    entry.addr_mask = sz - 1;
-    entry.iova = addr;
-    entry.perm = IOMMU_NONE;
-    entry.translated_addr = 0;
-    memory_region_notify_iommu(&vtd_dev_as->iommu, 0, entry);
+    event.type = IOMMU_NOTIFIER_UNMAP;
+    event.entry.target_as = &vtd_dev_as->as;
+    event.entry.addr_mask = sz - 1;
+    event.entry.iova = addr;
+    event.entry.perm = IOMMU_NONE;
+    event.entry.translated_addr = 0;
+    memory_region_notify_iommu(&vtd_dev_as->iommu, 0, event);
 
 done:
     return true;
@@ -3486,19 +3495,20 @@ static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n)
     size = remain = end - start + 1;
 
     while (remain >= VTD_PAGE_SIZE) {
-        IOMMUTLBEntry entry;
+        IOMMUTLBEvent event;
         uint64_t mask = get_naturally_aligned_size(start, remain, s->aw_bits);
 
         assert(mask);
 
-        entry.iova = start;
-        entry.addr_mask = mask - 1;
-        entry.target_as = &address_space_memory;
-        entry.perm = IOMMU_NONE;
+        event.type = IOMMU_NOTIFIER_UNMAP;
+        event.entry.iova = start;
+        event.entry.addr_mask = mask - 1;
+        event.entry.target_as = &address_space_memory;
+        event.entry.perm = IOMMU_NONE;
         /* This field is meaningless for unmap */
-        entry.translated_addr = 0;
+        event.entry.translated_addr = 0;
 
-        memory_region_notify_iommu_one(n, &entry);
+        memory_region_notify_iommu_one(n, &event);
 
         start += mask;
         remain -= mask;
@@ -3534,9 +3544,9 @@ static void vtd_address_space_refresh_all(IntelIOMMUState *s)
     vtd_switch_address_space_all(s);
 }
 
-static int vtd_replay_hook(IOMMUTLBEntry *entry, void *private)
+static int vtd_replay_hook(IOMMUTLBEvent *event, void *private)
 {
-    memory_region_notify_iommu_one((IOMMUNotifier *)private, entry);
+    memory_region_notify_iommu_one(private, event);
     return 0;
 }
 
diff --git a/hw/misc/tz-mpc.c b/hw/misc/tz-mpc.c
index 98f151237f..30481e1c90 100644
--- a/hw/misc/tz-mpc.c
+++ b/hw/misc/tz-mpc.c
@@ -82,8 +82,10 @@ static void tz_mpc_iommu_notify(TZMPC *s, uint32_t lutidx,
     /* Called when the LUT word at lutidx has changed from oldlut to newlut;
      * must call the IOMMU notifiers for the changed blocks.
      */
-    IOMMUTLBEntry entry = {
-        .addr_mask = s->blocksize - 1,
+    IOMMUTLBEvent event = {
+        .entry = {
+            .addr_mask = s->blocksize - 1,
+        }
     };
     hwaddr addr = lutidx * s->blocksize * 32;
     int i;
@@ -100,26 +102,28 @@ static void tz_mpc_iommu_notify(TZMPC *s, uint32_t lutidx,
         block_is_ns = newlut & (1 << i);
 
         trace_tz_mpc_iommu_notify(addr);
-        entry.iova = addr;
-        entry.translated_addr = addr;
+        event.entry.iova = addr;
+        event.entry.translated_addr = addr;
 
-        entry.perm = IOMMU_NONE;
-        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_S, entry);
-        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_NS, entry);
+        event.type = IOMMU_NOTIFIER_UNMAP;
+        event.entry.perm = IOMMU_NONE;
+        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_S, event);
+        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_NS, event);
 
-        entry.perm = IOMMU_RW;
+        event.type = IOMMU_NOTIFIER_MAP;
+        event.entry.perm = IOMMU_RW;
         if (block_is_ns) {
-            entry.target_as = &s->blocked_io_as;
+            event.entry.target_as = &s->blocked_io_as;
         } else {
-            entry.target_as = &s->downstream_as;
+            event.entry.target_as = &s->downstream_as;
         }
-        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_S, entry);
+        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_S, event);
         if (block_is_ns) {
-            entry.target_as = &s->downstream_as;
+            event.entry.target_as = &s->downstream_as;
         } else {
-            entry.target_as = &s->blocked_io_as;
+            event.entry.target_as = &s->blocked_io_as;
         }
-        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_NS, entry);
+        memory_region_notify_iommu(&s->upstream, IOMMU_IDX_NS, event);
     }
 }
 
diff --git a/hw/ppc/spapr_iommu.c b/hw/ppc/spapr_iommu.c
index 0fecabc135..8bc3cff05d 100644
--- a/hw/ppc/spapr_iommu.c
+++ b/hw/ppc/spapr_iommu.c
@@ -445,7 +445,7 @@ static void spapr_tce_reset(DeviceState *dev)
 static target_ulong put_tce_emu(SpaprTceTable *tcet, target_ulong ioba,
                                 target_ulong tce)
 {
-    IOMMUTLBEntry entry;
+    IOMMUTLBEvent event;
     hwaddr page_mask = IOMMU_PAGE_MASK(tcet->page_shift);
     unsigned long index = (ioba - tcet->bus_offset) >> tcet->page_shift;
 
@@ -457,12 +457,13 @@ static target_ulong put_tce_emu(SpaprTceTable *tcet, target_ulong ioba,
 
     tcet->table[index] = tce;
 
-    entry.target_as = &address_space_memory,
-    entry.iova = (ioba - tcet->bus_offset) & page_mask;
-    entry.translated_addr = tce & page_mask;
-    entry.addr_mask = ~page_mask;
-    entry.perm = spapr_tce_iommu_access_flags(tce);
-    memory_region_notify_iommu(&tcet->iommu, 0, entry);
+    event.entry.target_as = &address_space_memory,
+    event.entry.iova = (ioba - tcet->bus_offset) & page_mask;
+    event.entry.translated_addr = tce & page_mask;
+    event.entry.addr_mask = ~page_mask;
+    event.entry.perm = spapr_tce_iommu_access_flags(tce);
+    event.type = entry.perm ? IOMMU_NOTIFIER_MAP : IOMMU_NOTIFIER_UNMAP;
+    memory_region_notify_iommu(&tcet->iommu, 0, event);
 
     return H_SUCCESS;
 }
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 961c25b42f..8694fc7cf7 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1891,11 +1891,15 @@ void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
 }
 
 void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
-                                    IOMMUTLBEntry *entry)
+                                    IOMMUTLBEvent *event)
 {
-    IOMMUNotifierFlag request_flags;
+    IOMMUTLBEntry *entry = &event->entry;
     hwaddr entry_end = entry->iova + entry->addr_mask;
 
+    if (event->type == IOMMU_NOTIFIER_UNMAP) {
+        assert(entry->perm == IOMMU_NONE);
+    }
+
     /*
      * Skip the notification if the notification does not overlap
      * with registered range.
@@ -1906,20 +1910,14 @@ void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
 
     assert(entry->iova >= notifier->start && entry_end <= notifier->end);
 
-    if (entry->perm & IOMMU_RW) {
-        request_flags = IOMMU_NOTIFIER_MAP;
-    } else {
-        request_flags = IOMMU_NOTIFIER_UNMAP;
-    }
-
-    if (notifier->notifier_flags & request_flags) {
+    if (event->type & notifier->notifier_flags) {
         notifier->notify(notifier, entry);
     }
 }
 
 void memory_region_notify_iommu(IOMMUMemoryRegion *iommu_mr,
                                 int iommu_idx,
-                                IOMMUTLBEntry entry)
+                                IOMMUTLBEvent event)
 {
     IOMMUNotifier *iommu_notifier;
 
@@ -1927,7 +1925,7 @@ void memory_region_notify_iommu(IOMMUMemoryRegion *iommu_mr,
 
     IOMMU_NOTIFIER_FOREACH(iommu_notifier, iommu_mr) {
         if (iommu_notifier->iommu_idx == iommu_idx) {
-            memory_region_notify_iommu_one(iommu_notifier, &entry);
+            memory_region_notify_iommu_one(iommu_notifier, &event);
         }
     }
 }
-- 
2.18.1


[PATCH 3/5] memory: Add IOMMU_NOTIFIER_DEVIOTLB_UNMAP IOMMUTLBNotificationType
Posted by Eugenio Pérez 3 years, 7 months ago
This allows us to differentiate between regular IOMMU map/unmap events
and DEVIOTLB unmap. Doing so, notifiers that only need device IOTLB
invalidations will not receive regular IOMMU unmappings.

Adapt intel and vhost to use it.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
 include/exec/memory.h | 7 ++++++-
 hw/i386/intel_iommu.c | 2 +-
 hw/virtio/vhost.c     | 2 +-
 3 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index ab8d1ac9c9..77afeda60c 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -87,9 +87,14 @@ typedef enum {
     IOMMU_NOTIFIER_UNMAP = 0x1,
     /* Notify entry changes (newly created entries) */
     IOMMU_NOTIFIER_MAP = 0x2,
+    /* Notify changes on device IOTLB entries */
+    IOMMU_NOTIFIER_DEVIOTLB_UNMAP = 0x04,
 } IOMMUNotifierFlag;
 
-#define IOMMU_NOTIFIER_ALL (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
+#define IOMMU_NOTIFIER_IOTLB_EVENTS (IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP)
+#define IOMMU_NOTIFIER_DEVIOTLB_EVENTS IOMMU_NOTIFIER_DEVIOTLB_UNMAP
+#define IOMMU_NOTIFIER_ALL (IOMMU_NOTIFIER_IOTLB_EVENTS | \
+                            IOMMU_NOTIFIER_DEVIOTLB_EVENTS)
 
 struct IOMMUNotifier;
 typedef void (*IOMMUNotify)(struct IOMMUNotifier *notifier,
diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index 19661bba3e..ab6833d5a0 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -2468,7 +2468,7 @@ static bool vtd_process_device_iotlb_desc(IntelIOMMUState *s,
         sz = VTD_PAGE_SIZE;
     }
 
-    event.type = IOMMU_NOTIFIER_UNMAP;
+    event.type = IOMMU_NOTIFIER_DEVIOTLB_UNMAP;
     event.entry.target_as = &vtd_dev_as->as;
     event.entry.addr_mask = sz - 1;
     event.entry.iova = addr;
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 1a1384e7a6..1bf5bbc82e 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -729,7 +729,7 @@ static void vhost_iommu_region_add(MemoryListener *listener,
     iommu_idx = memory_region_iommu_attrs_to_index(iommu_mr,
                                                    MEMTXATTRS_UNSPECIFIED);
     iommu_notifier_init(&iommu->n, vhost_iommu_unmap_notify,
-                        IOMMU_NOTIFIER_UNMAP,
+                        IOMMU_NOTIFIER_DEVIOTLB_UNMAP,
                         section->offset_within_region,
                         int128_get64(end),
                         iommu_idx);
-- 
2.18.1


[PATCH 4/5] intel_iommu: Skip page walking on device iotlb invalidations
Posted by Eugenio Pérez 3 years, 7 months ago
Although they didn't reach the notifier because of the filtering in
memory_region_notify_iommu_one, the vt-d was still splitting huge
memory invalidations in chunks. Skipping it.

This improves performance in case of netperf with vhost-net:
* TCP_STREAM: From 1923.6Mbit/s to 2175.13Mbit/s (13%)
* TCP_RR: From 8464.73 trans/s to 8932.703333 trans/s (5.5%)
* UDP_RR: From 8562.08 trans/s to 9005.62/s (5.1%)
* UDP_STREAM: No change observed (insignificant 0.1% improvement)

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
---
 hw/i386/intel_iommu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index ab6833d5a0..fbbda0c87e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -1478,6 +1478,10 @@ static int vtd_sync_shadow_page_table(VTDAddressSpace *vtd_as)
     VTDContextEntry ce;
     IOMMUNotifier *n;
 
+    if (!(vtd_as->iommu.iommu_notify_flags & IOMMU_NOTIFIER_IOTLB_EVENTS)) {
+        return 0;
+    }
+
     ret = vtd_dev_to_context_entry(vtd_as->iommu_state,
                                    pci_bus_num(vtd_as->bus),
                                    vtd_as->devfn, &ce);
-- 
2.18.1


Re: [PATCH 4/5] intel_iommu: Skip page walking on device iotlb invalidations
Posted by Peter Xu 3 years, 7 months ago
On Thu, Sep 03, 2020 at 06:14:45PM +0200, Eugenio Pérez wrote:
> Although they didn't reach the notifier because of the filtering in
> memory_region_notify_iommu_one, the vt-d was still splitting huge
> memory invalidations in chunks. Skipping it.
> 
> This improves performance in case of netperf with vhost-net:
> * TCP_STREAM: From 1923.6Mbit/s to 2175.13Mbit/s (13%)
> * TCP_RR: From 8464.73 trans/s to 8932.703333 trans/s (5.5%)
> * UDP_RR: From 8562.08 trans/s to 9005.62/s (5.1%)
> * UDP_STREAM: No change observed (insignificant 0.1% improvement)
> 
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>

Reviewed-by: Peter Xu <peterx@redhat.com>

-- 
Peter Xu


[PATCH 5/5] memory: Skip bad range assertion if notifier is DEVIOTLB_UNMAP type
Posted by Eugenio Pérez 3 years, 7 months ago
Device IOTLB invalidations can unmap arbitrary ranges, eiter outside of
the memory region or even [0, ~0ULL] for all the space. The assertion
could be hit by a guest, and rhel7 guest effectively hit it.

Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
Reviewed-by: Peter Xu <peterx@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
---
 softmmu/memory.c | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index 8694fc7cf7..e723fcbaa1 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -1895,6 +1895,7 @@ void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
 {
     IOMMUTLBEntry *entry = &event->entry;
     hwaddr entry_end = entry->iova + entry->addr_mask;
+    IOMMUTLBEntry tmp = *entry;
 
     if (event->type == IOMMU_NOTIFIER_UNMAP) {
         assert(entry->perm == IOMMU_NONE);
@@ -1908,10 +1909,18 @@ void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
         return;
     }
 
-    assert(entry->iova >= notifier->start && entry_end <= notifier->end);
+    if (notifier->notifier_flags & IOMMU_NOTIFIER_DEVIOTLB_UNMAP) {
+        /* Crop (iova, addr_mask) to range */
+        tmp.iova = MAX(tmp.iova, notifier->start);
+        tmp.addr_mask = MIN(entry_end, notifier->end) - tmp.iova;
+        /* Confirm no underflow */
+        assert(MIN(entry_end, notifier->end) >= tmp.iova);
+    } else {
+        assert(entry->iova >= notifier->start && entry_end <= notifier->end);
+    }
 
     if (event->type & notifier->notifier_flags) {
-        notifier->notify(notifier, entry);
+        notifier->notify(notifier, &tmp);
     }
 }
 
-- 
2.18.1


Re: [PATCH 5/5] memory: Skip bad range assertion if notifier is DEVIOTLB_UNMAP type
Posted by Jason Wang 3 years, 7 months ago
On 2020/9/4 上午12:14, Eugenio Pérez wrote:
> Device IOTLB invalidations can unmap arbitrary ranges, eiter outside of
> the memory region or even [0, ~0ULL] for all the space. The assertion
> could be hit by a guest, and rhel7 guest effectively hit it.
>
> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> Reviewed-by: Peter Xu <peterx@redhat.com>
> Reviewed-by: Juan Quintela <quintela@redhat.com>
> ---
>   softmmu/memory.c | 13 +++++++++++--
>   1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/softmmu/memory.c b/softmmu/memory.c
> index 8694fc7cf7..e723fcbaa1 100644
> --- a/softmmu/memory.c
> +++ b/softmmu/memory.c
> @@ -1895,6 +1895,7 @@ void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
>   {
>       IOMMUTLBEntry *entry = &event->entry;
>       hwaddr entry_end = entry->iova + entry->addr_mask;
> +    IOMMUTLBEntry tmp = *entry;
>   
>       if (event->type == IOMMU_NOTIFIER_UNMAP) {
>           assert(entry->perm == IOMMU_NONE);
> @@ -1908,10 +1909,18 @@ void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
>           return;
>       }
>   
> -    assert(entry->iova >= notifier->start && entry_end <= notifier->end);
> +    if (notifier->notifier_flags & IOMMU_NOTIFIER_DEVIOTLB_UNMAP) {
> +        /* Crop (iova, addr_mask) to range */
> +        tmp.iova = MAX(tmp.iova, notifier->start);
> +        tmp.addr_mask = MIN(entry_end, notifier->end) - tmp.iova;
> +        /* Confirm no underflow */
> +        assert(MIN(entry_end, notifier->end) >= tmp.iova);


It's still not clear to me why we need such assert. Consider 
notifier->end is the possible IOVA range but not possible device IOTLB 
invalidation range (e.g it allows [0, ULLONG_MAX]).

Thanks


> +    } else {
> +        assert(entry->iova >= notifier->start && entry_end <= notifier->end);
> +    }
>   
>       if (event->type & notifier->notifier_flags) {
> -        notifier->notify(notifier, entry);
> +        notifier->notify(notifier, &tmp);
>       }
>   }
>   


Re: [PATCH 5/5] memory: Skip bad range assertion if notifier is DEVIOTLB_UNMAP type
Posted by Eugenio Perez Martin 3 years, 7 months ago
On Fri, Sep 4, 2020 at 6:34 AM Jason Wang <jasowang@redhat.com> wrote:
>
>
> On 2020/9/4 上午12:14, Eugenio Pérez wrote:
> > Device IOTLB invalidations can unmap arbitrary ranges, eiter outside of
> > the memory region or even [0, ~0ULL] for all the space. The assertion
> > could be hit by a guest, and rhel7 guest effectively hit it.
> >
> > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > Reviewed-by: Peter Xu <peterx@redhat.com>
> > Reviewed-by: Juan Quintela <quintela@redhat.com>
> > ---
> >   softmmu/memory.c | 13 +++++++++++--
> >   1 file changed, 11 insertions(+), 2 deletions(-)
> >
> > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > index 8694fc7cf7..e723fcbaa1 100644
> > --- a/softmmu/memory.c
> > +++ b/softmmu/memory.c
> > @@ -1895,6 +1895,7 @@ void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
> >   {
> >       IOMMUTLBEntry *entry = &event->entry;
> >       hwaddr entry_end = entry->iova + entry->addr_mask;
> > +    IOMMUTLBEntry tmp = *entry;
> >
> >       if (event->type == IOMMU_NOTIFIER_UNMAP) {
> >           assert(entry->perm == IOMMU_NONE);
> > @@ -1908,10 +1909,18 @@ void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
> >           return;
> >       }
> >
> > -    assert(entry->iova >= notifier->start && entry_end <= notifier->end);
> > +    if (notifier->notifier_flags & IOMMU_NOTIFIER_DEVIOTLB_UNMAP) {
> > +        /* Crop (iova, addr_mask) to range */
> > +        tmp.iova = MAX(tmp.iova, notifier->start);
> > +        tmp.addr_mask = MIN(entry_end, notifier->end) - tmp.iova;
> > +        /* Confirm no underflow */
> > +        assert(MIN(entry_end, notifier->end) >= tmp.iova);
>
>
> It's still not clear to me why we need such assert. Consider
> notifier->end is the possible IOVA range but not possible device IOTLB
> invalidation range (e.g it allows [0, ULLONG_MAX]).
>
> Thanks
>

As far as I understood the device should admit that out of bounds
notifications in that case,
and the assert just makes sure that there was no underflow in
tmp.addr_mask, i.e., that something
very wrong that should never happen in production happened.

Peter, would you mind to confirm/correct it?

Is there anything else needed to pull this patch?

Thanks!

>
> > +    } else {
> > +        assert(entry->iova >= notifier->start && entry_end <= notifier->end);
> > +    }
> >
> >       if (event->type & notifier->notifier_flags) {
> > -        notifier->notify(notifier, entry);
> > +        notifier->notify(notifier, &tmp);
> >       }
> >   }
> >
>


Re: [PATCH 5/5] memory: Skip bad range assertion if notifier is DEVIOTLB_UNMAP type
Posted by Peter Xu 3 years, 7 months ago
On Mon, Sep 28, 2020 at 11:05:01AM +0200, Eugenio Perez Martin wrote:
> On Fri, Sep 4, 2020 at 6:34 AM Jason Wang <jasowang@redhat.com> wrote:
> >
> >
> > On 2020/9/4 上午12:14, Eugenio Pérez wrote:
> > > Device IOTLB invalidations can unmap arbitrary ranges, eiter outside of
> > > the memory region or even [0, ~0ULL] for all the space. The assertion
> > > could be hit by a guest, and rhel7 guest effectively hit it.
> > >
> > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > Reviewed-by: Peter Xu <peterx@redhat.com>
> > > Reviewed-by: Juan Quintela <quintela@redhat.com>
> > > ---
> > >   softmmu/memory.c | 13 +++++++++++--
> > >   1 file changed, 11 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > > index 8694fc7cf7..e723fcbaa1 100644
> > > --- a/softmmu/memory.c
> > > +++ b/softmmu/memory.c
> > > @@ -1895,6 +1895,7 @@ void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
> > >   {
> > >       IOMMUTLBEntry *entry = &event->entry;
> > >       hwaddr entry_end = entry->iova + entry->addr_mask;
> > > +    IOMMUTLBEntry tmp = *entry;
> > >
> > >       if (event->type == IOMMU_NOTIFIER_UNMAP) {
> > >           assert(entry->perm == IOMMU_NONE);
> > > @@ -1908,10 +1909,18 @@ void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
> > >           return;
> > >       }
> > >
> > > -    assert(entry->iova >= notifier->start && entry_end <= notifier->end);
> > > +    if (notifier->notifier_flags & IOMMU_NOTIFIER_DEVIOTLB_UNMAP) {
> > > +        /* Crop (iova, addr_mask) to range */
> > > +        tmp.iova = MAX(tmp.iova, notifier->start);
> > > +        tmp.addr_mask = MIN(entry_end, notifier->end) - tmp.iova;
> > > +        /* Confirm no underflow */
> > > +        assert(MIN(entry_end, notifier->end) >= tmp.iova);
> >
> >
> > It's still not clear to me why we need such assert. Consider
> > notifier->end is the possible IOVA range but not possible device IOTLB
> > invalidation range (e.g it allows [0, ULLONG_MAX]).
> >
> > Thanks
> >
> 
> As far as I understood the device should admit that out of bounds
> notifications in that case,
> and the assert just makes sure that there was no underflow in
> tmp.addr_mask, i.e., that something
> very wrong that should never happen in production happened.
> 
> Peter, would you mind to confirm/correct it?

I think Jason is right - since we have checked at the entry that the two
regions cross over each other:

    /*
     * Skip the notification if the notification does not overlap
     * with registered range.
     */
    if (notifier->start > entry_end || notifier->end < entry->iova) {
        return;
    }

Then I don't see how this assertion can fail any more.

But imho not a big problem either, and it shouldn't hurt to even keep the
assertion of above isn't that straightforward.

> 
> Is there anything else needed to pull this patch?

I didn't post a pull for this only because I shouldn't :) - the plan was that
all vt-d patches will still go via Michael's tree, iiuc.  Though at least to me
I think this series is acceptable for merging.

Though it would always be good too if Jason would still like to review it.

Jason, what's your opinion?

Thanks,

-- 
Peter Xu


Re: [PATCH 5/5] memory: Skip bad range assertion if notifier is DEVIOTLB_UNMAP type
Posted by Michael S. Tsirkin 3 years, 6 months ago
On Mon, Sep 28, 2020 at 01:48:57PM -0400, Peter Xu wrote:
> On Mon, Sep 28, 2020 at 11:05:01AM +0200, Eugenio Perez Martin wrote:
> > On Fri, Sep 4, 2020 at 6:34 AM Jason Wang <jasowang@redhat.com> wrote:
> > >
> > >
> > > On 2020/9/4 上午12:14, Eugenio Pérez wrote:
> > > > Device IOTLB invalidations can unmap arbitrary ranges, eiter outside of
> > > > the memory region or even [0, ~0ULL] for all the space. The assertion
> > > > could be hit by a guest, and rhel7 guest effectively hit it.
> > > >
> > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > Reviewed-by: Peter Xu <peterx@redhat.com>
> > > > Reviewed-by: Juan Quintela <quintela@redhat.com>
> > > > ---
> > > >   softmmu/memory.c | 13 +++++++++++--
> > > >   1 file changed, 11 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > > > index 8694fc7cf7..e723fcbaa1 100644
> > > > --- a/softmmu/memory.c
> > > > +++ b/softmmu/memory.c
> > > > @@ -1895,6 +1895,7 @@ void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
> > > >   {
> > > >       IOMMUTLBEntry *entry = &event->entry;
> > > >       hwaddr entry_end = entry->iova + entry->addr_mask;
> > > > +    IOMMUTLBEntry tmp = *entry;
> > > >
> > > >       if (event->type == IOMMU_NOTIFIER_UNMAP) {
> > > >           assert(entry->perm == IOMMU_NONE);
> > > > @@ -1908,10 +1909,18 @@ void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
> > > >           return;
> > > >       }
> > > >
> > > > -    assert(entry->iova >= notifier->start && entry_end <= notifier->end);
> > > > +    if (notifier->notifier_flags & IOMMU_NOTIFIER_DEVIOTLB_UNMAP) {
> > > > +        /* Crop (iova, addr_mask) to range */
> > > > +        tmp.iova = MAX(tmp.iova, notifier->start);
> > > > +        tmp.addr_mask = MIN(entry_end, notifier->end) - tmp.iova;
> > > > +        /* Confirm no underflow */
> > > > +        assert(MIN(entry_end, notifier->end) >= tmp.iova);
> > >
> > >
> > > It's still not clear to me why we need such assert. Consider
> > > notifier->end is the possible IOVA range but not possible device IOTLB
> > > invalidation range (e.g it allows [0, ULLONG_MAX]).
> > >
> > > Thanks
> > >
> > 
> > As far as I understood the device should admit that out of bounds
> > notifications in that case,
> > and the assert just makes sure that there was no underflow in
> > tmp.addr_mask, i.e., that something
> > very wrong that should never happen in production happened.
> > 
> > Peter, would you mind to confirm/correct it?
> 
> I think Jason is right - since we have checked at the entry that the two
> regions cross over each other:
> 
>     /*
>      * Skip the notification if the notification does not overlap
>      * with registered range.
>      */
>     if (notifier->start > entry_end || notifier->end < entry->iova) {
>         return;
>     }
> 
> Then I don't see how this assertion can fail any more.
> 
> But imho not a big problem either, and it shouldn't hurt to even keep the
> assertion of above isn't that straightforward.
> 
> > 
> > Is there anything else needed to pull this patch?
> 
> I didn't post a pull for this only because I shouldn't :) - the plan was that
> all vt-d patches will still go via Michael's tree, iiuc.  Though at least to me
> I think this series is acceptable for merging.

Sure, that's ok.

Eugenio, you sent patch 0 as a response to another series, which
made me miss the series. Pls don't do that in the future.

Looks like Jason reviewed the series - Jason, is that right? -
so I'd like his ack if possible.


> Though it would always be good too if Jason would still like to review it.
> 
> Jason, what's your opinion?
> 
> Thanks,
> 
> -- 
> Peter Xu


Re: [PATCH 5/5] memory: Skip bad range assertion if notifier is DEVIOTLB_UNMAP type
Posted by Jason Wang 3 years, 6 months ago
On 2020/10/4 上午1:38, Michael S. Tsirkin wrote:
> On Mon, Sep 28, 2020 at 01:48:57PM -0400, Peter Xu wrote:
>> On Mon, Sep 28, 2020 at 11:05:01AM +0200, Eugenio Perez Martin wrote:
>>> On Fri, Sep 4, 2020 at 6:34 AM Jason Wang <jasowang@redhat.com> wrote:
>>>>
>>>> On 2020/9/4 上午12:14, Eugenio Pérez wrote:
>>>>> Device IOTLB invalidations can unmap arbitrary ranges, eiter outside of
>>>>> the memory region or even [0, ~0ULL] for all the space. The assertion
>>>>> could be hit by a guest, and rhel7 guest effectively hit it.
>>>>>
>>>>> Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
>>>>> Reviewed-by: Peter Xu <peterx@redhat.com>
>>>>> Reviewed-by: Juan Quintela <quintela@redhat.com>
>>>>> ---
>>>>>    softmmu/memory.c | 13 +++++++++++--
>>>>>    1 file changed, 11 insertions(+), 2 deletions(-)
>>>>>
>>>>> diff --git a/softmmu/memory.c b/softmmu/memory.c
>>>>> index 8694fc7cf7..e723fcbaa1 100644
>>>>> --- a/softmmu/memory.c
>>>>> +++ b/softmmu/memory.c
>>>>> @@ -1895,6 +1895,7 @@ void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
>>>>>    {
>>>>>        IOMMUTLBEntry *entry = &event->entry;
>>>>>        hwaddr entry_end = entry->iova + entry->addr_mask;
>>>>> +    IOMMUTLBEntry tmp = *entry;
>>>>>
>>>>>        if (event->type == IOMMU_NOTIFIER_UNMAP) {
>>>>>            assert(entry->perm == IOMMU_NONE);
>>>>> @@ -1908,10 +1909,18 @@ void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
>>>>>            return;
>>>>>        }
>>>>>
>>>>> -    assert(entry->iova >= notifier->start && entry_end <= notifier->end);
>>>>> +    if (notifier->notifier_flags & IOMMU_NOTIFIER_DEVIOTLB_UNMAP) {
>>>>> +        /* Crop (iova, addr_mask) to range */
>>>>> +        tmp.iova = MAX(tmp.iova, notifier->start);
>>>>> +        tmp.addr_mask = MIN(entry_end, notifier->end) - tmp.iova;
>>>>> +        /* Confirm no underflow */
>>>>> +        assert(MIN(entry_end, notifier->end) >= tmp.iova);
>>>>
>>>> It's still not clear to me why we need such assert. Consider
>>>> notifier->end is the possible IOVA range but not possible device IOTLB
>>>> invalidation range (e.g it allows [0, ULLONG_MAX]).
>>>>
>>>> Thanks
>>>>
>>> As far as I understood the device should admit that out of bounds
>>> notifications in that case,
>>> and the assert just makes sure that there was no underflow in
>>> tmp.addr_mask, i.e., that something
>>> very wrong that should never happen in production happened.
>>>
>>> Peter, would you mind to confirm/correct it?
>> I think Jason is right - since we have checked at the entry that the two
>> regions cross over each other:
>>
>>      /*
>>       * Skip the notification if the notification does not overlap
>>       * with registered range.
>>       */
>>      if (notifier->start > entry_end || notifier->end < entry->iova) {
>>          return;
>>      }
>>
>> Then I don't see how this assertion can fail any more.
>>
>> But imho not a big problem either, and it shouldn't hurt to even keep the
>> assertion of above isn't that straightforward.
>>
>>> Is there anything else needed to pull this patch?
>> I didn't post a pull for this only because I shouldn't :) - the plan was that
>> all vt-d patches will still go via Michael's tree, iiuc.  Though at least to me
>> I think this series is acceptable for merging.
> Sure, that's ok.
>
> Eugenio, you sent patch 0 as a response to another series, which
> made me miss the series. Pls don't do that in the future.
>
> Looks like Jason reviewed the series - Jason, is that right? -
> so I'd like his ack if possible.


Right.

Euenio. If it's possible I would prefer to remove the assertion but it's 
ok it you leave it.

And please repost the series.

Thanks


>
>
>> Though it would always be good too if Jason would still like to review it.
>>
>> Jason, what's your opinion?
>>
>> Thanks,
>>
>> -- 
>> Peter Xu
>


Re: [PATCH 5/5] memory: Skip bad range assertion if notifier is DEVIOTLB_UNMAP type
Posted by Eugenio Perez Martin 3 years, 6 months ago
On Sat, Oct 3, 2020 at 7:38 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 28, 2020 at 01:48:57PM -0400, Peter Xu wrote:
> > On Mon, Sep 28, 2020 at 11:05:01AM +0200, Eugenio Perez Martin wrote:
> > > On Fri, Sep 4, 2020 at 6:34 AM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > >
> > > > On 2020/9/4 上午12:14, Eugenio Pérez wrote:
> > > > > Device IOTLB invalidations can unmap arbitrary ranges, eiter outside of
> > > > > the memory region or even [0, ~0ULL] for all the space. The assertion
> > > > > could be hit by a guest, and rhel7 guest effectively hit it.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez <eperezma@redhat.com>
> > > > > Reviewed-by: Peter Xu <peterx@redhat.com>
> > > > > Reviewed-by: Juan Quintela <quintela@redhat.com>
> > > > > ---
> > > > >   softmmu/memory.c | 13 +++++++++++--
> > > > >   1 file changed, 11 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/softmmu/memory.c b/softmmu/memory.c
> > > > > index 8694fc7cf7..e723fcbaa1 100644
> > > > > --- a/softmmu/memory.c
> > > > > +++ b/softmmu/memory.c
> > > > > @@ -1895,6 +1895,7 @@ void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
> > > > >   {
> > > > >       IOMMUTLBEntry *entry = &event->entry;
> > > > >       hwaddr entry_end = entry->iova + entry->addr_mask;
> > > > > +    IOMMUTLBEntry tmp = *entry;
> > > > >
> > > > >       if (event->type == IOMMU_NOTIFIER_UNMAP) {
> > > > >           assert(entry->perm == IOMMU_NONE);
> > > > > @@ -1908,10 +1909,18 @@ void memory_region_notify_iommu_one(IOMMUNotifier *notifier,
> > > > >           return;
> > > > >       }
> > > > >
> > > > > -    assert(entry->iova >= notifier->start && entry_end <= notifier->end);
> > > > > +    if (notifier->notifier_flags & IOMMU_NOTIFIER_DEVIOTLB_UNMAP) {
> > > > > +        /* Crop (iova, addr_mask) to range */
> > > > > +        tmp.iova = MAX(tmp.iova, notifier->start);
> > > > > +        tmp.addr_mask = MIN(entry_end, notifier->end) - tmp.iova;
> > > > > +        /* Confirm no underflow */
> > > > > +        assert(MIN(entry_end, notifier->end) >= tmp.iova);
> > > >
> > > >
> > > > It's still not clear to me why we need such assert. Consider
> > > > notifier->end is the possible IOVA range but not possible device IOTLB
> > > > invalidation range (e.g it allows [0, ULLONG_MAX]).
> > > >
> > > > Thanks
> > > >
> > >
> > > As far as I understood the device should admit that out of bounds
> > > notifications in that case,
> > > and the assert just makes sure that there was no underflow in
> > > tmp.addr_mask, i.e., that something
> > > very wrong that should never happen in production happened.
> > >
> > > Peter, would you mind to confirm/correct it?
> >
> > I think Jason is right - since we have checked at the entry that the two
> > regions cross over each other:
> >
> >     /*
> >      * Skip the notification if the notification does not overlap
> >      * with registered range.
> >      */
> >     if (notifier->start > entry_end || notifier->end < entry->iova) {
> >         return;
> >     }
> >
> > Then I don't see how this assertion can fail any more.
> >
> > But imho not a big problem either, and it shouldn't hurt to even keep the
> > assertion of above isn't that straightforward.
> >
> > >
> > > Is there anything else needed to pull this patch?
> >
> > I didn't post a pull for this only because I shouldn't :) - the plan was that
> > all vt-d patches will still go via Michael's tree, iiuc.  Though at least to me
> > I think this series is acceptable for merging.
>
> Sure, that's ok.
>
> Eugenio, you sent patch 0 as a response to another series, which
> made me miss the series. Pls don't do that in the future.
>

Sorry, noted for the next time.

Thanks!

> Looks like Jason reviewed the series - Jason, is that right? -
> so I'd like his ack if possible.
>
>
> > Though it would always be good too if Jason would still like to review it.
> >
> > Jason, what's your opinion?
> >
> > Thanks,
> >
> > --
> > Peter Xu
>