From: Alex Williamson
To: qemu-devel@nongnu.org
Cc: eric.auger@redhat.com, peterx@redhat.com, mst@redhat.com
Date: Tue, 26 Mar 2019 16:55:19 -0600
Message-ID: <155364082689.15803.7062874513041742278.stgit@gimli.home>
Subject: [Qemu-devel] [RFC PATCH] pci: Use PCI aliases when determining device IOMMU address space

Conventional PCI buses pre-date requester IDs. An IOMMU cannot use bus
and devfn to distinguish between devices in a conventional PCI topology,
and therefore we cannot assign them separate AddressSpaces. By taking
this requester ID aliasing into account, QEMU better matches bare metal
behavior and restrictions, and enables shared AddressSpace
configurations that are otherwise not possible with guest IOMMU support.

For the latter case, consider any example where an IOMMU group on the
host includes multiple devices:

  $ ls /sys/kernel/iommu_groups/1/devices/
  0000:00:01.0  0000:01:00.0  0000:01:00.1

If we incorporate a vIOMMU into the VM configuration, we're restricted
to assigning only one of the endpoints to the guest, because a second
endpoint will attempt to use a different AddressSpace. VFIO only
supports IOMMU group level granularity at the container level,
preventing this second endpoint from being assigned:

  qemu-system-x86_64 -machine q35... \
  -device intel-iommu,intremap=on \
  -device pcie-root-port,addr=1e.0,id=pcie.1 \
  -device vfio-pci,host=1:00.0,bus=pcie.1,addr=0.0,multifunction=on \
  -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1

  qemu-system-x86_64: -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1: vfio \
  0000:01:00.1: group 1 used in multiple address spaces

However, when QEMU incorporates proper aliasing, we can make use of a
PCIe-to-PCI bridge to mask the requester ID, resulting in a hack that
provides the downstream devices with the same AddressSpace, for example:

  qemu-system-x86_64 -machine q35... \
  -device intel-iommu,intremap=on \
  -device pcie-pci-bridge,addr=1e.0,id=pci.1 \
  -device vfio-pci,host=1:00.0,bus=pci.1,addr=1.0,multifunction=on \
  -device vfio-pci,host=1:00.1,bus=pci.1,addr=1.1

While the utility of this hack may be limited, this AddressSpace
aliasing is the correct behavior for QEMU to emulate bare metal.

Signed-off-by: Alex Williamson
Reviewed-by: Peter Xu
---
 hw/pci/pci.c | 33 +++++++++++++++++++++++++++++++--
 1 file changed, 31 insertions(+), 2 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 35451c1e9987..38467e676f1f 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2594,12 +2594,41 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 {
     PCIBus *bus = pci_get_bus(dev);
     PCIBus *iommu_bus = bus;
+    uint8_t devfn = dev->devfn;
 
     while(iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
-        iommu_bus = pci_get_bus(iommu_bus->parent_dev);
+        PCIBus *parent_bus = pci_get_bus(iommu_bus->parent_dev);
+
+        /*
+         * Determine which requester ID alias should be used for the device
+         * based on the PCI topology.  There are no requester IDs on conventional
+         * PCI buses, therefore we push the alias up to the parent on each non-
+         * express bus.  Which alias we use depends on whether this is a legacy
+         * PCI bridge or PCIe-to-PCI/X bridge as in chapter 2.3 of the PCIe-to-
+         * PCI bridge spec.  Note that we cannot use pci_requester_id() here
+         * because the resulting BDF depends on the secondary bridge register
+         * programming.  We also cannot look up the PCIBus from the bus number
+         * at this point for the iommu_fn.  Also, requester_id_cache is the
+         * alias to the root bus, which is usually, but not necessarily always
+         * where we'll find our iommu_fn.
+         */
+        if (!pci_bus_is_express(iommu_bus)) {
+            PCIDevice *parent = iommu_bus->parent_dev;
+
+            if (pci_is_express(parent) &&
+                pcie_cap_get_type(parent) == PCI_EXP_TYPE_PCI_BRIDGE) {
+                devfn = PCI_DEVFN(0, 0);
+                bus = iommu_bus;
+            } else {
+                devfn = parent->devfn;
+                bus = parent_bus;
+            }
+        }
+
+        iommu_bus = parent_bus;
     }
     if (iommu_bus && iommu_bus->iommu_fn) {
-        return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn);
+        return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
     }
     return &address_space_memory;
 }
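The requester ID discussed in the commit message is the 16-bit value carried by each PCIe request: the bus number in the high byte and devfn (5-bit device, 3-bit function) in the low byte. A minimal C sketch of that packing follows, applied to sysfs-style addresses like the ones in the IOMMU group listing. PCI_DEVFN() uses the same bit layout as the macro of that name in QEMU and Linux; requester_id() and parse_bdf() are hypothetical helpers written only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Same bit layout as the PCI_DEVFN() macro in QEMU and Linux:
 * 5-bit device (slot) number above a 3-bit function number. */
#define PCI_DEVFN(slot, fn) ((((slot) & 0x1f) << 3) | ((fn) & 0x07))

/* Hypothetical helper: the 16-bit requester ID carries the bus number
 * in the high byte and devfn in the low byte. */
static uint16_t requester_id(uint8_t bus, uint8_t slot, uint8_t fn)
{
    assert(slot < 32 && fn < 8);
    return (uint16_t)((bus << 8) | PCI_DEVFN(slot, fn));
}

/* Hypothetical helper: parse a sysfs-style "DDDD:BB:SS.F" address.
 * The domain (segment) is parsed but discarded because it is not part
 * of the requester ID.  Returns 0 on success, -1 on malformed input. */
static int parse_bdf(const char *s, uint16_t *rid)
{
    unsigned int domain, bus, slot, fn;

    if (sscanf(s, "%4x:%2x:%2x.%1x", &domain, &bus, &slot, &fn) != 4 ||
        slot > 0x1f || fn > 0x7) {
        return -1;
    }
    (void)domain;
    *rid = requester_id((uint8_t)bus, (uint8_t)slot, (uint8_t)fn);
    return 0;
}
```

For the three host addresses in that listing this yields 0x0008, 0x0100 and 0x0101, three distinct requester IDs, whereas the patch makes everything behind a PCIe-to-PCI bridge share a single (secondary bus, devfn 00.0) alias.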
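The alias walk in the patch can be exercised outside QEMU with a toy model. The sketch below is a simplification under stated assumptions, not QEMU code: ToyBus and ToyDev are hypothetical stand-ins for PCIBus and PCIDevice, plain booleans replace iommu_fn, pci_bus_is_express(), pci_is_express() and the pcie_cap_get_type() check, and NULL is returned where QEMU would fall back to address_space_memory. Otherwise the loop body follows the patched pci_device_iommu_address_space().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_DEVFN(slot, fn) ((((slot) & 0x1f) << 3) | ((fn) & 0x07))

typedef struct ToyDev ToyDev;

/* Hypothetical stand-in for PCIBus. */
typedef struct ToyBus {
    uint8_t number;      /* bus number, only to make the model readable */
    bool is_express;     /* models pci_bus_is_express() */
    bool has_iommu;      /* models iommu_fn != NULL */
    ToyDev *parent_dev;  /* bridge that owns this bus, NULL on the root bus */
} ToyBus;

/* Hypothetical stand-in for PCIDevice. */
struct ToyDev {
    ToyBus *bus;         /* bus the device sits on */
    uint8_t devfn;
    bool is_express;     /* models pci_is_express() */
    bool pcie_to_pci;    /* models pcie_cap_get_type() == PCI_EXP_TYPE_PCI_BRIDGE */
};

/*
 * Follows the loop in the patched pci_device_iommu_address_space():
 * returns the bus that owns the requester ID alias and stores the alias
 * devfn, or returns NULL if no bus on the path up has an IOMMU.
 */
static ToyBus *toy_iommu_alias(const ToyDev *dev, uint8_t *alias_devfn)
{
    assert(dev != NULL && alias_devfn != NULL);

    ToyBus *bus = dev->bus;
    ToyBus *iommu_bus = bus;
    uint8_t devfn = dev->devfn;

    while (iommu_bus && !iommu_bus->has_iommu && iommu_bus->parent_dev) {
        ToyBus *parent_bus = iommu_bus->parent_dev->bus;

        if (!iommu_bus->is_express) {
            /* Conventional bus: no requester IDs, so the bridge above it
             * supplies the ID seen upstream. */
            const ToyDev *parent = iommu_bus->parent_dev;

            if (parent->is_express && parent->pcie_to_pci) {
                /* PCIe-to-PCI bridge: alias is devfn 00.0 on its secondary bus. */
                devfn = PCI_DEVFN(0, 0);
                bus = iommu_bus;
            } else {
                /* Legacy PCI bridge: alias is the bridge's own devfn. */
                devfn = parent->devfn;
                bus = parent_bus;
            }
        }
        iommu_bus = parent_bus;
    }

    if (iommu_bus == NULL || !iommu_bus->has_iommu) {
        return NULL;
    }
    *alias_devfn = devfn;
    return bus;
}
```

Modeling the pcie-pci-bridge example (bridge at 00:1e.0 on an IOMMU-enabled express root bus, functions at 01.0 and 01.1 on its conventional secondary bus, whose number is assumed here), both functions resolve to the secondary bus with alias devfn 0, i.e. one shared AddressSpace. An endpoint behind a conventional PCI-to-PCI bridge instead resolves to the bridge's own devfn on the parent bus.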