[PATCH v2 00/22] intel_iommu: expose Shared Virtual Addressing to VMs

Liu Yi L posted 22 patches 4 years ago
Test docker-mingw@fedora failed
Test docker-quick@centos7 passed
Test checkpatch passed
Test FreeBSD passed
Test asan passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/1585542301-84087-1-git-send-email-yi.l.liu@intel.com
Maintainers: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>, Eric Auger <eric.auger@redhat.com>, Richard Henderson <rth@twiddle.net>, Matthew Rosato <mjrosato@linux.ibm.com>, BALATON Zoltan <balaton@eik.bme.hu>, Halil Pasic <pasic@linux.ibm.com>, Cornelia Huck <cohuck@redhat.com>, Christian Borntraeger <borntraeger@de.ibm.com>, Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>, Helge Deller <deller@gmx.de>, "Michael S. Tsirkin" <mst@redhat.com>, Alex Williamson <alex.williamson@redhat.com>, Peter Maydell <peter.maydell@linaro.org>, David Hildenbrand <david@redhat.com>, Andrey Smirnov <andrew.smirnov@gmail.com>, Paolo Bonzini <pbonzini@redhat.com>, "Hervé Poussineau" <hpoussin@reactos.org>, David Gibson <david@gibson.dropbear.id.au>, Eduardo Habkost <ehabkost@redhat.com>
There is a newer version of this series
[PATCH v2 00/22] intel_iommu: expose Shared Virtual Addressing to VMs
Posted by Liu Yi L 4 years ago
Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
Intel platforms allows address space sharing between device DMA and
applications. SVA can reduce programming complexity and enhance security.

This QEMU series is intended to expose SVA usage to VMs, i.e. sharing
guest application address space with passthrough devices. This is called
vSVA in this series. The whole vSVA enablement requires QEMU/VFIO/IOMMU
changes.

The high-level architecture for SVA virtualization is as below, the key
design of vSVA support is to utilize the dual-stage IOMMU translation (
also known as IOMMU nesting translation) capability in host IOMMU.

    .-------------.  .---------------------------.
    |   vIOMMU    |  | Guest process CR3, FL only|
    |             |  '---------------------------'
    .----------------/
    | PASID Entry |--- PASID cache flush -
    '-------------'                       |
    |             |                       V
    |             |                CR3 in GPA
    '-------------'
Guest
------| Shadow |--------------------------|--------
      v        v                          v
Host
    .-------------.  .----------------------.
    |   pIOMMU    |  | Bind FL for GVA-GPA  |
    |             |  '----------------------'
    .----------------/  |
    | PASID Entry |     V (Nested xlate)
    '----------------\.------------------------------.
    |             |   |SL for GPA-HPA, default domain|
    |             |   '------------------------------'
    '-------------'
Where:
 - FL = First level/stage one page tables
 - SL = Second level/stage two page tables
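
To make the diagram concrete: a DMA address tagged with a PASID is first
translated through the guest-owned first-level tables (GVA->GPA), and the
result is then translated through the host-owned second-level tables
(GPA->HPA). A conceptual C sketch, illustration only, with hypothetical
helper names standing in for the page table walks the hardware performs:

#include <stdint.h>

/* Placeholder walks; real hardware performs these itself. */
static uint64_t walk_first_level(uint64_t gva)  { return gva; } /* guest CR3, FL */
static uint64_t walk_second_level(uint64_t gpa) { return gpa; } /* host SL       */

/* Nested (dual-stage) translation: compose the two walks. */
static uint64_t nested_translate(uint64_t gva)
{
    uint64_t gpa = walk_first_level(gva);   /* stage 1: GVA -> GPA */
    return walk_second_level(gpa);          /* stage 2: GPA -> HPA */
}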

The complete vSVA kernel upstream patches are divided into three phases:
    1. Common APIs and PCI device direct assignment
    2. IOMMU-backed Mediated Device assignment
    3. Page Request Services (PRS) support

This QEMU patchset is aiming for the phase 1 and phase 2. It is based
on the two kernel series below.
[1] [PATCH V10 00/11] Nested Shared Virtual Address (SVA) VT-d support:
https://lkml.org/lkml/2020/3/20/1172
[2] [PATCH v1 0/8] vfio: expose virtual Shared Virtual Addressing to VMs
https://lkml.org/lkml/2020/3/22/116

There are roughly two parts:
 1. Introduce HostIOMMUContext as an abstraction of the host IOMMU. It provides
    explicit methods for vIOMMU emulators to communicate with the host IOMMU,
    e.g. propagate guest page table bindings to the host IOMMU to set up
    dual-stage DMA translation there, and flush the IOMMU IOTLB. (A rough
    sketch of this interface follows this list.)
 2. Set up dual-stage IOMMU translation for the Intel vIOMMU. This includes:
    - Check IOMMU uAPI version compatibility and VFIO Nesting capabilities which
      includes hardware compatibility (stage 1 format) and VFIO_PASID_REQ
      availability. This is preparation for setting up dual-stage DMA translation
      in host IOMMU.
    - Propagate guest PASID allocation and free request to host.
    - Propagate guest page table binding to host to setup dual-stage IOMMU DMA
      translation in host IOMMU.
    - Propagate guest IOMMU cache invalidation to host to ensure iotlb
      correctness.
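
As referenced in part 1 above, here is a minimal sketch of the shape of the
HostIOMMUContext interface. The callback names are assumptions inferred from
the patch subjects (PASID alloc/free, bind stage-1 page table, flush stage-1
cache), not copied from the series; the uAPI structs come from the
linux/iommu.h header this series imports.

#include <stdint.h>
#include <linux/iommu.h>   /* imported by this series */

typedef struct HostIOMMUContext HostIOMMUContext;

/* Ops implemented by VFIO and invoked by vIOMMU emulators. */
typedef struct HostIOMMUOps {
    int (*pasid_alloc)(HostIOMMUContext *ctx, uint32_t min, uint32_t max,
                       uint32_t *pasid);
    int (*pasid_free)(HostIOMMUContext *ctx, uint32_t pasid);
    int (*bind_stage1_pgtbl)(HostIOMMUContext *ctx,
                             struct iommu_gpasid_bind_data *data);
    int (*unbind_stage1_pgtbl)(HostIOMMUContext *ctx,
                               struct iommu_gpasid_bind_data *data);
    int (*flush_stage1_cache)(HostIOMMUContext *ctx,
                              struct iommu_cache_invalidate_info *info);
} HostIOMMUOps;

struct HostIOMMUContext {
    const HostIOMMUOps *ops;  /* per-vfio-container, handed to vIOMMU via PCI */
    uint64_t flags;           /* host capabilities, e.g. PASID allocation     */
};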

The complete QEMU set can be found in below link:
https://github.com/luxis1999/qemu.git: sva_vtd_v10_v2

Complete kernel can be found in:
https://github.com/luxis1999/linux-vsva.git: vsva-linux-5.6-rc6

Tests: basic vSVA functionality test, VM reboot/shutdown/crash, kernel build in
guest, boot VM with vSVA disabled, full compilation with all archs.

Regards,
Yi Liu

Changelog:
	- Patch v1 -> Patch v2:
	  a) Refactor the vfio HostIOMMUContext init code (patch 0008 - 0009 of v1 series)
	  b) Refactor the pasid binding handling (patch 0011 - 0016 of v1 series)
	  Patch v1: https://patchwork.ozlabs.org/cover/1259648/

	- RFC v3.1 -> Patch v1:
	  a) Implement HostIOMMUContext in QOM manner.
	  b) Add pci_set/unset_iommu_context() to register HostIOMMUContext to
	     vIOMMU, thus the lifecycle of HostIOMMUContext is known on the
	     vIOMMU side. This way, the vIOMMU can use the methods provided by
	     the HostIOMMUContext safely.
	  c) Add back patch "[RFC v3 01/25] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps"
	  RFCv3.1: https://patchwork.kernel.org/cover/11397879/

	- RFC v3 -> v3.1:
	  a) Drop IOMMUContext, and rename DualStageIOMMUObject to HostIOMMUContext.
	     HostIOMMUContext is per-vfio-container; it is exposed to the vIOMMU
	     via the PCI layer. VFIO registers a PCIHostIOMMUFunc callback with
	     the PCI layer, through which the vIOMMU can get the HostIOMMUContext
	     instance.
	  b) Check IOMMU uAPI version by VFIO_CHECK_EXTENSION (a probe sketch
	     appears after this changelog)
	  c) Add a check on VFIO_PASID_REQ availability via VFIO_IOMMU_GET_INFO
	  d) Reorder the series: put the vSVA linux header file update at the
	     beginning and the x-scalable-mode option modification at the end of
	     the series.
	  e) Dropped patch "[RFC v3 01/25] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps"
	  RFCv3: https://patchwork.kernel.org/cover/11356033/

	- RFC v2 -> v3:
	  a) Introduce DualStageIOMMUObject to abstract the host IOMMU programming
	  capability. e.g. request PASID from host, setup IOMMU nesting translation
	  on host IOMMU. The pasid_alloc/bind_guest_page_table/iommu_cache_flush
	  operations are moved to be DualStageIOMMUOps. Thus, DualStageIOMMUObject
	  is an abstract layer which provides QEMU vIOMMU emulators with an explicit
	  method to program host IOMMU.
	  b) Compared with RFC v2, the IOMMUContext has also been updated. It is
	  modified to provide an abstraction for vIOMMU emulators. It provides the
	  method for pass-through modules (like VFIO) to communicate with host IOMMU.
	  e.g. tell vIOMMU emulators about the IOMMU nesting capability on host side
	  and report the host IOMMU DMA translation faults to vIOMMU emulators.
	  RFC v2: https://www.spinics.net/lists/kvm/msg198556.html

	- RFC v1 -> v2:
	  Introduce IOMMUContext to abstract the connection between VFIO
	  and vIOMMU emulators, which is a replacement of the PCIPASIDOps
	  in RFC v1. Modify x-scalable-mode to be a string option instead of
	  adding a new option as RFC v1 did. Refined the pasid cache management
	  and addressed the TODOs mentioned in RFC v1.
	  RFC v1: https://patchwork.kernel.org/cover/11033657/
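
For reference, here is roughly what the VFIO_CHECK_EXTENSION probe from
changelog item v3.1 b) looks like. This is a sketch, not the series' code;
VFIO_CHECK_EXTENSION and VFIO_TYPE1_NESTING_IOMMU are existing VFIO uAPI:

#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* True if the container accepts the nesting IOMMU type, i.e. the host
 * IOMMU can do dual-stage translation. */
static bool vfio_container_supports_nesting(int container_fd)
{
    return ioctl(container_fd, VFIO_CHECK_EXTENSION,
                 VFIO_TYPE1_NESTING_IOMMU) > 0;
}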

Eric Auger (1):
  scripts/update-linux-headers: Import iommu.h

Liu Yi L (21):
  header file update VFIO/IOMMU vSVA APIs
  vfio: check VFIO_TYPE1_NESTING_IOMMU support
  hw/iommu: introduce HostIOMMUContext
  hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
  hw/pci: introduce pci_device_set/unset_iommu_context()
  intel_iommu: add set/unset_iommu_context callback
  vfio/common: provide PASID alloc/free hooks
  vfio/common: init HostIOMMUContext per-container
  vfio/pci: set host iommu context to vIOMMU
  intel_iommu: add virtual command capability support
  intel_iommu: process PASID cache invalidation
  intel_iommu: add PASID cache management infrastructure
  vfio: add bind stage-1 page table support
  intel_iommu: bind/unbind guest page table to host
  intel_iommu: replay pasid binds after context cache invalidation
  intel_iommu: do not pass down pasid bind for PASID #0
  vfio: add support for flush iommu stage-1 cache
  intel_iommu: process PASID-based iotlb invalidation
  intel_iommu: propagate PASID-based iotlb invalidation to host
  intel_iommu: process PASID-based Device-TLB invalidation
  intel_iommu: modify x-scalable-mode to be string option

 hw/Makefile.objs                      |    1 +
 hw/alpha/typhoon.c                    |    6 +-
 hw/arm/smmu-common.c                  |    6 +-
 hw/hppa/dino.c                        |    6 +-
 hw/i386/amd_iommu.c                   |    6 +-
 hw/i386/intel_iommu.c                 | 1109 ++++++++++++++++++++++++++++++++-
 hw/i386/intel_iommu_internal.h        |  114 ++++
 hw/i386/trace-events                  |    6 +
 hw/iommu/Makefile.objs                |    1 +
 hw/iommu/host_iommu_context.c         |  161 +++++
 hw/pci-host/designware.c              |    6 +-
 hw/pci-host/pnv_phb3.c                |    6 +-
 hw/pci-host/pnv_phb4.c                |    6 +-
 hw/pci-host/ppce500.c                 |    6 +-
 hw/pci-host/prep.c                    |    6 +-
 hw/pci-host/sabre.c                   |    6 +-
 hw/pci/pci.c                          |   53 +-
 hw/ppc/ppc440_pcix.c                  |    6 +-
 hw/ppc/spapr_pci.c                    |    6 +-
 hw/s390x/s390-pci-bus.c               |    8 +-
 hw/vfio/common.c                      |  260 +++++++-
 hw/vfio/pci.c                         |   13 +
 hw/virtio/virtio-iommu.c              |    6 +-
 include/hw/i386/intel_iommu.h         |   57 +-
 include/hw/iommu/host_iommu_context.h |  116 ++++
 include/hw/pci/pci.h                  |   18 +-
 include/hw/pci/pci_bus.h              |    2 +-
 include/hw/vfio/vfio-common.h         |    4 +
 linux-headers/linux/iommu.h           |  378 +++++++++++
 linux-headers/linux/vfio.h            |  127 ++++
 scripts/update-linux-headers.sh       |    2 +-
 31 files changed, 2463 insertions(+), 45 deletions(-)
 create mode 100644 hw/iommu/Makefile.objs
 create mode 100644 hw/iommu/host_iommu_context.c
 create mode 100644 include/hw/iommu/host_iommu_context.h
 create mode 100644 linux-headers/linux/iommu.h

-- 
2.7.4


Re: [PATCH v2 00/22] intel_iommu: expose Shared Virtual Addressing to VMs
Posted by no-reply@patchew.org 4 years ago
Patchew URL: https://patchew.org/QEMU/1585542301-84087-1-git-send-email-yi.l.liu@intel.com/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

                 from /tmp/qemu-test/src/include/hw/pci/pci_bus.h:4,
                 from /tmp/qemu-test/src/include/hw/pci-host/i440fx.h:15,
                 from /tmp/qemu-test/src/stubs/pci-host-piix.c:2:
/tmp/qemu-test/src/include/hw/iommu/host_iommu_context.h:28:10: fatal error: linux/iommu.h: No such file or directory
 #include <linux/iommu.h>
          ^~~~~~~~~~~~~~~
compilation terminated.
  CC      scsi/pr-manager-stub.o
make: *** [/tmp/qemu-test/src/rules.mak:69: stubs/pci-host-piix.o] Error 1
make: *** Waiting for unfinished jobs....
  CC      block/curl.o
Traceback (most recent call last):
---
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', '--label', 'com.qemu.instance.uuid=a71cba547b0b47ef91f874b42e00f828', '-u', '1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', '/var/tmp/patchew-tester-tmp-enp9m7rr/src/docker-src.2020-03-30-01.38.53.2480:/var/tmp/qemu:z,ro', 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit status 2.
filter=--filter=label=com.qemu.instance.uuid=a71cba547b0b47ef91f874b42e00f828
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-enp9m7rr/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    2m1.872s
user    0m8.422s


The full log is available at
http://patchew.org/logs/1585542301-84087-1-git-send-email-yi.l.liu@intel.com/testing.docker-mingw@fedora/?type=message.
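
For context, the failure is that include/hw/iommu/host_iommu_context.h pulls
in the Linux-only uAPI header unconditionally, which cannot compile for a
mingw (Windows) host. One possible shape of a fix, stated purely as an
assumption (the series may instead restrict the hw/iommu/ objects to Linux
builds):

/* Hypothetical guard in include/hw/iommu/host_iommu_context.h; not the
 * actual fix from the series. CONFIG_LINUX is QEMU's host-OS switch. */
#ifdef CONFIG_LINUX
#include <linux/iommu.h>
#else
/* Opaque declarations so non-Linux hosts still compile. */
struct iommu_gpasid_bind_data;
struct iommu_cache_invalidate_info;
#endif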
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com
Re: [PATCH v2 00/22] intel_iommu: expose Shared Virtual Addressing to VMs
Posted by Peter Xu 4 years ago
On Sun, Mar 29, 2020 at 09:24:39PM -0700, Liu Yi L wrote:
> Tests: basic vSVA functionality test,

Could you elaborate on what the functionality test is?  Does it contain
at least some I/Os going through the SVA-capable device so that the
nested page table is used?  I thought it was a yes, but after noticing
that the BIND message flags seem to be wrong, I really think I should
ask this out loud..

> VM reboot/shutdown/crash,

What's the VM crash test?

> kernel build in
> guest, boot VM with vSVA disabled, full compilation with all archs.

I believe I've said similar things, but...  I'd appreciate it if you
could also smoke-test 2nd-level-only operation with the series applied.

Thanks,

-- 
Peter Xu


RE: [PATCH v2 00/22] intel_iommu: expose Shared Virtual Addressing to VMs
Posted by Liu, Yi L 4 years ago
> From: Peter Xu <peterx@redhat.com>
> Sent: Friday, April 3, 2020 2:13 AM
> To: Liu, Yi L <yi.l.liu@intel.com>
> Subject: Re: [PATCH v2 00/22] intel_iommu: expose Shared Virtual Addressing to
> VMs
> 
> On Sun, Mar 29, 2020 at 09:24:39PM -0700, Liu Yi L wrote:
> > Tests: basic vSVA functionality test,
> 
> Could you elaborate on what the functionality test is?  Does it contain
> at least some I/Os going through the SVA-capable device so that the
> nested page table is used?  I thought it was a yes, but after noticing
> that the BIND message flags seem to be wrong, I really think I should
> ask this out loud..

As just replied, only the SRE bit is used in the verification, so the
problem was not spotted. In my functionality test, I passed through an
SVA-capable device and issued SVA transactions. (A sketch of the bind
flags in question follows.)
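
For reference, the flags in question live in the VT-d part of the bind data
defined by the uAPI header this series imports; below is a sketch of a bind
request that exercises only SRE (Supervisor Request Enable). Field values
are illustrative placeholders, not taken from the test setup:

#include <linux/iommu.h>  /* uAPI header imported by this series */

struct iommu_gpasid_bind_data bind = {
    .version    = IOMMU_GPASID_BIND_VERSION_1,
    .format     = IOMMU_PASID_FORMAT_INTEL_VTD,
    .gpgd       = 0x0,                        /* guest CR3 (GPA), placeholder */
    .hpasid     = 1,                          /* host-allocated PASID         */
    .gpasid     = 1,                          /* guest PASID                  */
    .addr_width = 48,
    .vtd.flags  = IOMMU_SVA_VTD_GPASID_SRE,   /* only SRE exercised here      */
};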

> > VM reboot/shutdown/crash,
> 
> What's the VM crash test?

It's ctrl+c to kill the VM.

> > kernel build in
> > guest, boot VM with vSVA disabled, full compilation with all archs.
> 
> I believe I've said similar things, but...  I'd appreciate it if you
> could also smoke-test 2nd-level-only operation with the series applied.

Yeah, you mean the legacy case; I booted with such a config.

Regards,
Yi Liu
Re: [PATCH v2 00/22] intel_iommu: expose Shared Virtual Addressing to VMs
Posted by Jason Wang 4 years ago
On 2020/3/30 下午12:24, Liu Yi L wrote:
> Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> Intel platforms allows address space sharing between device DMA and
> applications. SVA can reduce programming complexity and enhance security.
>
> This QEMU series is intended to expose SVA usage to VMs, i.e. sharing
> guest application address space with passthrough devices. This is called
> vSVA in this series. The whole vSVA enablement requires QEMU/VFIO/IOMMU
> changes.
>
> The high-level architecture for SVA virtualization is as below, the key
> design of vSVA support is to utilize the dual-stage IOMMU translation (
> also known as IOMMU nesting translation) capability in host IOMMU.
>
>      .-------------.  .---------------------------.
>      |   vIOMMU    |  | Guest process CR3, FL only|
>      |             |  '---------------------------'
>      .----------------/
>      | PASID Entry |--- PASID cache flush -
>      '-------------'                       |
>      |             |                       V
>      |             |                CR3 in GPA
>      '-------------'
> Guest
> ------| Shadow |--------------------------|--------
>        v        v                          v
> Host
>      .-------------.  .----------------------.
>      |   pIOMMU    |  | Bind FL for GVA-GPA  |
>      |             |  '----------------------'
>      .----------------/  |
>      | PASID Entry |     V (Nested xlate)
>      '----------------\.------------------------------.
>      |             |   |SL for GPA-HPA, default domain|
>      |             |   '------------------------------'
>      '-------------'
> Where:
>   - FL = First level/stage one page tables
>   - SL = Second level/stage two page tables
>
> The complete vSVA kernel upstream patches are divided into three phases:
>      1. Common APIs and PCI device direct assignment
>      2. IOMMU-backed Mediated Device assignment
>      3. Page Request Services (PRS) support
>
> This QEMU patchset is aiming for the phase 1 and phase 2. It is based
> on the two kernel series below.
> [1] [PATCH V10 00/11] Nested Shared Virtual Address (SVA) VT-d support:
> https://lkml.org/lkml/2020/3/20/1172
> [2] [PATCH v1 0/8] vfio: expose virtual Shared Virtual Addressing to VMs
> https://lkml.org/lkml/2020/3/22/116
>
> There are roughly two parts:
>   1. Introduce HostIOMMUContext as an abstraction of the host IOMMU. It provides
>      explicit methods for vIOMMU emulators to communicate with the host IOMMU,
>      e.g. propagate guest page table bindings to the host IOMMU to set up
>      dual-stage DMA translation there, and flush the IOMMU IOTLB.
>   2. Set up dual-stage IOMMU translation for the Intel vIOMMU. This includes:
>      - Check IOMMU uAPI version compatibility and VFIO Nesting capabilities which
>        includes hardware compatibility (stage 1 format) and VFIO_PASID_REQ
>        availability. This is preparation for setting up dual-stage DMA translation
>        in host IOMMU.
>      - Propagate guest PASID allocation and free request to host.
>      - Propagate guest page table binding to host to setup dual-stage IOMMU DMA
>        translation in host IOMMU.
>      - Propagate guest IOMMU cache invalidation to host to ensure iotlb
>        correctness.
>
> The complete QEMU set can be found in below link:
> https://github.com/luxis1999/qemu.git: sva_vtd_v10_v2


Hi Yi:

I could not find the branch there.

Thanks


Re: [PATCH v2 00/22] intel_iommu: expose Shared Virtual Addressing to VMs
Posted by Peter Xu 4 years ago
On Thu, Apr 02, 2020 at 04:33:02PM +0800, Jason Wang wrote:
> > The complete QEMU set can be found in below link:
> > https://github.com/luxis1999/qemu.git: sva_vtd_v10_v2
> 
> 
> Hi Yi:
> 
> I could not find the branch there.

Jason,

He typed wrong... It's actually (I found it myself):

https://github.com/luxis1999/qemu/tree/sva_vtd_v10_qemu_v2

-- 
Peter Xu


Re: [PATCH v2 00/22] intel_iommu: expose Shared Virtual Addressing to VMs
Posted by Jason Wang 4 years ago
On 2020/4/2 下午9:46, Peter Xu wrote:
> On Thu, Apr 02, 2020 at 04:33:02PM +0800, Jason Wang wrote:
>>> The complete QEMU set can be found in below link:
>>> https://github.com/luxis1999/qemu.git: sva_vtd_v10_v2
>>
>> Hi Yi:
>>
>> I could not find the branch there.
> Jason,
>
> He typed wrong... It's actually (I found it myself):
>
> https://github.com/luxis1999/qemu/tree/sva_vtd_v10_qemu_v2


Aha, I see.

Thanks


>


RE: [PATCH v2 00/22] intel_iommu: expose Shared Virtual Addressing to VMs
Posted by Liu, Yi L 4 years ago
> From: Peter Xu <peterx@redhat.com>
> Sent: Thursday, April 2, 2020 9:46 PM
> To: Jason Wang <jasowang@redhat.com>
> Subject: Re: [PATCH v2 00/22] intel_iommu: expose Shared Virtual Addressing to
> VMs
> 
> On Thu, Apr 02, 2020 at 04:33:02PM +0800, Jason Wang wrote:
> > > The complete QEMU set can be found in below link:
> > > https://github.com/luxis1999/qemu.git: sva_vtd_v10_v2
> >
> >
> > Hi Yi:
> >
> > I could not find the branch there.
> 
> Jason,
> 
> He typed wrong... It's actually (I found it myself):
> 
> https://github.com/luxis1999/qemu/tree/sva_vtd_v10_qemu_v2
Thanks, really a silly typo.

Regards,
Yi Liu
Re: [PATCH v2 00/22] intel_iommu: expose Shared Virtual Addressing to VMs
Posted by Auger Eric 4 years ago
Hi Yi,

On 3/30/20 6:24 AM, Liu Yi L wrote:
> Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> Intel platforms allows address space sharing between device DMA and
> applications. SVA can reduce programming complexity and enhance security.
> 
> This QEMU series is intended to expose SVA usage to VMs, i.e. sharing
> guest application address space with passthrough devices. This is called
> vSVA in this series. The whole vSVA enablement requires QEMU/VFIO/IOMMU
> changes.
> 
> The high-level architecture for SVA virtualization is as below, the key
> design of vSVA support is to utilize the dual-stage IOMMU translation (
> also known as IOMMU nesting translation) capability in host IOMMU.
> 
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process CR3, FL only|
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush -
>     '-------------'                       |
>     |             |                       V
>     |             |                CR3 in GPA
>     '-------------'
> Guest
> ------| Shadow |--------------------------|--------
>       v        v                          v
> Host
>     .-------------.  .----------------------.
>     |   pIOMMU    |  | Bind FL for GVA-GPA  |
>     |             |  '----------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.------------------------------.
>     |             |   |SL for GPA-HPA, default domain|
>     |             |   '------------------------------'
>     '-------------'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
> 
> The complete vSVA kernel upstream patches are divided into three phases:
>     1. Common APIs and PCI device direct assignment
>     2. IOMMU-backed Mediated Device assignment
>     3. Page Request Services (PRS) support
> 
> This QEMU patchset is aiming for the phase 1 and phase 2. It is based
> on the two kernel series below.
> [1] [PATCH V10 00/11] Nested Shared Virtual Address (SVA) VT-d support:
> https://lkml.org/lkml/2020/3/20/1172
> [2] [PATCH v1 0/8] vfio: expose virtual Shared Virtual Addressing to VMs
> https://lkml.org/lkml/2020/3/22/116
+ [PATCH v2 0/3] IOMMU user API enhancement, right?

I think in general, as long as the kernel dependencies are not resolved,
the QEMU series is supposed to stay in RFC state.

Thanks

Eric
> 
> There are roughly two parts:
>  1. Introduce HostIOMMUContext as an abstraction of the host IOMMU. It provides
>     explicit methods for vIOMMU emulators to communicate with the host IOMMU,
>     e.g. propagate guest page table bindings to the host IOMMU to set up
>     dual-stage DMA translation there, and flush the IOMMU IOTLB.
>  2. Set up dual-stage IOMMU translation for the Intel vIOMMU. This includes:
>     - Check IOMMU uAPI version compatibility and VFIO Nesting capabilities which
>       includes hardware compatibility (stage 1 format) and VFIO_PASID_REQ
>       availability. This is preparation for setting up dual-stage DMA translation
>       in host IOMMU.
>     - Propagate guest PASID allocation and free request to host.
>     - Propagate guest page table binding to host to setup dual-stage IOMMU DMA
>       translation in host IOMMU.
>     - Propagate guest IOMMU cache invalidation to host to ensure iotlb
>       correctness.
> 
> The complete QEMU set can be found in below link:
> https://github.com/luxis1999/qemu.git: sva_vtd_v10_v2
> 
> Complete kernel can be found in:
> https://github.com/luxis1999/linux-vsva.git: vsva-linux-5.6-rc6
> 
> Tests: basic vSVA functionality test, VM reboot/shutdown/crash, kernel build in
> guest, boot VM with vSVA disabled, full compilation with all archs.
> 
> Regards,
> Yi Liu
> 
> Changelog:
> 	- Patch v1 -> Patch v2:
> 	  a) Refactor the vfio HostIOMMUContext init code (patch 0008 - 0009 of v1 series)
> 	  b) Refactor the pasid binding handling (patch 0011 - 0016 of v1 series)
> 	  Patch v1: https://patchwork.ozlabs.org/cover/1259648/
> 
> 	- RFC v3.1 -> Patch v1:
> 	  a) Implement HostIOMMUContext in QOM manner.
> 	  b) Add pci_set/unset_iommu_context() to register HostIOMMUContext to
> 	     vIOMMU, thus the lifecycle of HostIOMMUContext is known on the
> 	     vIOMMU side. This way, the vIOMMU can use the methods provided by
> 	     the HostIOMMUContext safely.
> 	  c) Add back patch "[RFC v3 01/25] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps"
> 	  RFCv3.1: https://patchwork.kernel.org/cover/11397879/
> 
> 	- RFC v3 -> v3.1:
> 	  a) Drop IOMMUContext, and rename DualStageIOMMUObject to HostIOMMUContext.
> 	     HostIOMMUContext is per-vfio-container; it is exposed to the vIOMMU
> 	     via the PCI layer. VFIO registers a PCIHostIOMMUFunc callback with
> 	     the PCI layer, through which the vIOMMU can get the HostIOMMUContext
> 	     instance.
> 	  b) Check IOMMU uAPI version by VFIO_CHECK_EXTENSION
> 	  c) Add a check on VFIO_PASID_REQ availability via VFIO_IOMMU_GET_INFO
> 	  d) Reorder the series: put the vSVA linux header file update at the
> 	     beginning and the x-scalable-mode option modification at the end of
> 	     the series.
> 	  e) Dropped patch "[RFC v3 01/25] hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps"
> 	  RFCv3: https://patchwork.kernel.org/cover/11356033/
> 
> 	- RFC v2 -> v3:
> 	  a) Introduce DualStageIOMMUObject to abstract the host IOMMU programming
> 	  capability. e.g. request PASID from host, setup IOMMU nesting translation
> 	  on host IOMMU. The pasid_alloc/bind_guest_page_table/iommu_cache_flush
> 	  operations are moved to be DualStageIOMMUOps. Thus, DualStageIOMMUObject
> 	  is an abstract layer which provides QEMU vIOMMU emulators with an explicit
> 	  method to program host IOMMU.
> 	  b) Compared with RFC v2, the IOMMUContext has also been updated. It is
> 	  modified to provide an abstraction for vIOMMU emulators. It provides the
> 	  method for pass-through modules (like VFIO) to communicate with host IOMMU.
> 	  e.g. tell vIOMMU emulators about the IOMMU nesting capability on host side
> 	  and report the host IOMMU DMA translation faults to vIOMMU emulators.
> 	  RFC v2: https://www.spinics.net/lists/kvm/msg198556.html
> 
> 	- RFC v1 -> v2:
> 	  Introduce IOMMUContext to abstract the connection between VFIO
> 	  and vIOMMU emulators, which is a replacement of the PCIPASIDOps
> 	  in RFC v1. Modify x-scalable-mode to be a string option instead of
> 	  adding a new option as RFC v1 did. Refined the pasid cache management
> 	  and addressed the TODOs mentioned in RFC v1.
> 	  RFC v1: https://patchwork.kernel.org/cover/11033657/
> 
> Eric Auger (1):
>   scripts/update-linux-headers: Import iommu.h
> 
> Liu Yi L (21):
>   header file update VFIO/IOMMU vSVA APIs
>   vfio: check VFIO_TYPE1_NESTING_IOMMU support
>   hw/iommu: introduce HostIOMMUContext
>   hw/pci: modify pci_setup_iommu() to set PCIIOMMUOps
>   hw/pci: introduce pci_device_set/unset_iommu_context()
>   intel_iommu: add set/unset_iommu_context callback
>   vfio/common: provide PASID alloc/free hooks
>   vfio/common: init HostIOMMUContext per-container
>   vfio/pci: set host iommu context to vIOMMU
>   intel_iommu: add virtual command capability support
>   intel_iommu: process PASID cache invalidation
>   intel_iommu: add PASID cache management infrastructure
>   vfio: add bind stage-1 page table support
>   intel_iommu: bind/unbind guest page table to host
>   intel_iommu: replay pasid binds after context cache invalidation
>   intel_iommu: do not pass down pasid bind for PASID #0
>   vfio: add support for flush iommu stage-1 cache
>   intel_iommu: process PASID-based iotlb invalidation
>   intel_iommu: propagate PASID-based iotlb invalidation to host
>   intel_iommu: process PASID-based Device-TLB invalidation
>   intel_iommu: modify x-scalable-mode to be string option
> 
>  hw/Makefile.objs                      |    1 +
>  hw/alpha/typhoon.c                    |    6 +-
>  hw/arm/smmu-common.c                  |    6 +-
>  hw/hppa/dino.c                        |    6 +-
>  hw/i386/amd_iommu.c                   |    6 +-
>  hw/i386/intel_iommu.c                 | 1109 ++++++++++++++++++++++++++++++++-
>  hw/i386/intel_iommu_internal.h        |  114 ++++
>  hw/i386/trace-events                  |    6 +
>  hw/iommu/Makefile.objs                |    1 +
>  hw/iommu/host_iommu_context.c         |  161 +++++
>  hw/pci-host/designware.c              |    6 +-
>  hw/pci-host/pnv_phb3.c                |    6 +-
>  hw/pci-host/pnv_phb4.c                |    6 +-
>  hw/pci-host/ppce500.c                 |    6 +-
>  hw/pci-host/prep.c                    |    6 +-
>  hw/pci-host/sabre.c                   |    6 +-
>  hw/pci/pci.c                          |   53 +-
>  hw/ppc/ppc440_pcix.c                  |    6 +-
>  hw/ppc/spapr_pci.c                    |    6 +-
>  hw/s390x/s390-pci-bus.c               |    8 +-
>  hw/vfio/common.c                      |  260 +++++++-
>  hw/vfio/pci.c                         |   13 +
>  hw/virtio/virtio-iommu.c              |    6 +-
>  include/hw/i386/intel_iommu.h         |   57 +-
>  include/hw/iommu/host_iommu_context.h |  116 ++++
>  include/hw/pci/pci.h                  |   18 +-
>  include/hw/pci/pci_bus.h              |    2 +-
>  include/hw/vfio/vfio-common.h         |    4 +
>  linux-headers/linux/iommu.h           |  378 +++++++++++
>  linux-headers/linux/vfio.h            |  127 ++++
>  scripts/update-linux-headers.sh       |    2 +-
>  31 files changed, 2463 insertions(+), 45 deletions(-)
>  create mode 100644 hw/iommu/Makefile.objs
>  create mode 100644 hw/iommu/host_iommu_context.c
>  create mode 100644 include/hw/iommu/host_iommu_context.h
>  create mode 100644 linux-headers/linux/iommu.h
> 


Re: [PATCH v2 00/22] intel_iommu: expose Shared Virtual Addressing to VMs
Posted by Peter Xu 4 years ago
On Mon, Mar 30, 2020 at 12:36:23PM +0200, Auger Eric wrote:
> I think in general, as long as the kernel dependencies are not resolved,
> the QEMU series is supposed to stay in RFC state.

Yeah I agree. I think the subject is not extremely important, but we
definitely should wait for the kernel part to be ready before merging
the series.

Side note: I offered quite a few r-bs for the series (and I still plan
to move on reading it this week since there's a new version, and try
to offer more r-bs when I still have some context in my brain-cache),
however they're mostly only for myself to avoid re-reading the whole
series again in the future especially because it's huge... :)

Thanks,

-- 
Peter Xu


RE: [PATCH v2 00/22] intel_iommu: expose Shared Virtual Addressing to VMs
Posted by Liu, Yi L 4 years ago
Hi Eric,

> From: Peter Xu <peterx@redhat.com>
> Sent: Monday, March 30, 2020 10:47 PM
> To: Auger Eric <eric.auger@redhat.com>
> Subject: Re: [PATCH v2 00/22] intel_iommu: expose Shared Virtual Addressing to
> VMs
> 
> On Mon, Mar 30, 2020 at 12:36:23PM +0200, Auger Eric wrote:
> > I think in general, as long as the kernel dependencies are not
> > resolved, the QEMU series is supposed to stay in RFC state.
> 
> Yeah I agree. I think the subject is not extremely important, but we definitely should
> wait for the kernel part to be ready before merging the series.
> 
> Side note: I offered quite a few r-bs for the series (and I still plan to move on
> reading it this week since there's a new version, and try to offer more r-bs when I
> still have some context in my brain-cache), however they're mostly only for myself
> to avoid re-reading the whole series again in the future especially because it's
> huge... :)

Agreed. I'll rename the next version to RFCv6 then. BTW, although there
is a dependency on the kernel side, I think we could still reach agreement
on the interaction mechanism between VFIO and vIOMMU within QEMU. Also,
the VT-d specific changes (e.g. the pasid cache invalidation patches and
the pasid-based iotlb invalidations) can actually be made ready, as they
have no dependency on kernel-side changes. Please help. :-)

Regards,
Yi Liu