From nobody Sat Apr 20 10:21:57 2024
Date: Wed, 6 Nov 2019 07:35:13 -0500
From: "Michael S. Tsirkin"
To: qemu-devel@nongnu.org
Cc: Peter Maydell, Alex Williamson, Peter Xu
Subject: [PULL 1/3] pci: Use PCI aliases when determining device IOMMU address space
Message-ID: <20191106123407.20997-2-mst@redhat.com>
References: <20191106123407.20997-1-mst@redhat.com>
In-Reply-To: <20191106123407.20997-1-mst@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Alex Williamson

PCIe requester IDs are used by modern IOMMUs to differentiate devices in
order to provide a unique IOVA address space per device.  These requester
IDs are composed of the bus/device/function (BDF) of the requesting device.
Conventional PCI pre-dates this concept and is simply a shared parallel
bus where transactions are claimed by decoding target ranges rather than
the packetized, point-to-point mechanisms of PCI-express.  In order to
interface conventional PCI to PCIe, the PCIe-to-PCI bridge creates and
accepts packetized transactions on behalf of all downstream devices,
using one of two potential forms of a requester ID relating to the bridge
itself or its subordinate bus.
All downstream devices are therefore aliased by the bridge's requester ID
and it's not possible for the IOMMU to create unique IOVA spaces for
devices downstream of such buses.

At least that's how it works on bare metal.  Until now we've ignored this
nuance of vIOMMU support in QEMU, creating a unique AddressSpace per
device regardless of the virtual bus topology.

Aside from simply being true to bare metal behavior, there are aspects of
a shared address space that we can use to our advantage when designing a
VM.  For instance, consider a PCI device assignment scenario where we have
the following IOMMU group on the host system:

$ ls /sys/kernel/iommu_groups/1/devices/
0000:00:01.0  0000:01:00.0  0000:01:00.1

An IOMMU group is considered the smallest set of devices which are fully
DMA isolated from other devices by the IOMMU.  In this case the root port
at 00:01.0 does not guarantee that it prevents peer to peer traffic
between the endpoints on bus 01: and the devices are therefore grouped
together.  VFIO considers an IOMMU group to be the smallest unit of device
ownership and allows only a single shared IOVA space per group due to the
limitations of the isolation.

Therefore, if we attempt to create the following VM, we get an error:

qemu-system-x86_64 -machine q35... \
  -device intel-iommu,intremap=on \
  -device pcie-root-port,addr=1e.0,id=pcie.1 \
  -device vfio-pci,host=1:00.0,bus=pcie.1,addr=0.0,multifunction=on \
  -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1

qemu-system-x86_64: -device vfio-pci,host=1:00.1,bus=pcie.1,addr=0.1: vfio \
0000:01:00.1: group 1 used in multiple address spaces

VFIO only allows a single IOVA space (AddressSpace) for both devices, but
we've placed them into a topology where the vIOMMU expects a separate
AddressSpace for each device.  On bare metal we know that a conventional
PCI bus would provide the sort of aliasing we need here, forcing the IOMMU
to consider these devices to be part of a single shared IOVA space.
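
As a rough illustration of the aliasing rule described above, the requester
ID seen by the IOMMU can be derived by walking from the device toward the
root and substituting an alias at each conventional bus.  This is a
standalone sketch, not QEMU code: the toy_* types, macros and function are
hypothetical stand-ins for PCIBus and PCIDevice, and only the two default
bridge behaviors are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct toy_bus;

/* Hypothetical stand-in for PCIDevice (illustration only). */
struct toy_dev {
    const struct toy_bus *bus;   /* bus the device sits on */
    uint8_t devfn;               /* (slot << 3) | function */
    bool pcie_to_pci_bridge;     /* bridge advertising a PCIe-to-PCI type */
};

/* Hypothetical stand-in for PCIBus (illustration only). */
struct toy_bus {
    uint8_t num;                 /* bus number */
    bool is_express;             /* false for a conventional PCI bus */
    bool has_iommu;              /* true where the IOMMU is attached */
    const struct toy_dev *parent; /* bridge above this bus, NULL at root */
};

#define TOY_DEVFN(slot, fn) ((uint8_t)((((slot) & 0x1f) << 3) | ((fn) & 0x07)))
#define TOY_RID(busnum, devfn) ((uint16_t)(((busnum) << 8) | (devfn)))

/* Return the requester ID the IOMMU would observe for dev. */
uint16_t toy_alias_rid(const struct toy_dev *dev)
{
    assert(dev != NULL && dev->bus != NULL);

    const struct toy_bus *bus = dev->bus;
    const struct toy_bus *walk = dev->bus;
    uint8_t devfn = dev->devfn;

    while (walk && !walk->has_iommu && walk->parent) {
        const struct toy_dev *bridge = walk->parent;

        if (!walk->is_express) {
            if (bridge->pcie_to_pci_bridge) {
                /* PCIe-to-PCI bridge: secondary bus number, devfn 00.0 */
                devfn = TOY_DEVFN(0, 0);
                bus = walk;
            } else {
                /* Other bridges: requester ID of the bridge itself */
                devfn = bridge->devfn;
                bus = bridge->bus;
            }
        }
        walk = bridge->bus;
    }
    return TOY_RID(bus->num, devfn);
}
```

With a PCIe-to-PCI bridge at 00:1e.0 whose secondary bus is 01, functions
01:01.0 and 01:01.1 both resolve to 01:00.0 in this sketch, which is the
property that lets the two functions share one IOVA space.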
The support provided here does the same for QEMU, such that we can create
a conventional PCI topology to expose equivalent AddressSpace sharing
requirements to the VM:

qemu-system-x86_64 -machine q35... \
  -device intel-iommu,intremap=on \
  -device pcie-pci-bridge,addr=1e.0,id=pci.1 \
  -device vfio-pci,host=1:00.0,bus=pci.1,addr=1.0,multifunction=on \
  -device vfio-pci,host=1:00.1,bus=pci.1,addr=1.1

There are pros and cons to this configuration; it's not necessarily
recommended, it's simply a tool we can use to create configurations which
may provide additional functionality in spite of host hardware limitations
or as a benefit to the guest configuration or resource usage.  An
incomplete list of pros and cons:

Cons:
 a) Extended PCI configuration space is unavailable to devices downstream
    of a conventional PCI bus.  The degree to which this is a drawback
    depends on the device and guest drivers.
 b) Applying this topology to devices which are already isolated by the
    host IOMMU (singleton IOMMU groups) will result in devices which
    appear to be non-isolated to the VM (non-singleton groups).  This can
    limit configurations within the guest, such as userspace drivers or
    nested device assignment.

Pros:
 a) QEMU better emulates bare metal.
 b) Configurations as above are now possible.
 c) Host IOMMU resources and VM locked memory requirements are reduced in
    vIOMMU configurations due to shared IOMMU domains on the host and
    avoidance of duplicate locked memory accounting.

Reviewed-by: Peter Xu
Signed-off-by: Alex Williamson
Message-Id: <157187083548.5439.14747141504058604843.stgit@gimli.home>
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 hw/pci/pci.c | 43 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 40 insertions(+), 3 deletions(-)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index c68498c0de..cbc7a32568 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2646,12 +2646,49 @@ AddressSpace *pci_device_iommu_address_space(PCIDevice *dev)
 {
     PCIBus *bus = pci_get_bus(dev);
     PCIBus *iommu_bus = bus;
+    uint8_t devfn = dev->devfn;
 
-    while(iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
-        iommu_bus = pci_get_bus(iommu_bus->parent_dev);
+    while (iommu_bus && !iommu_bus->iommu_fn && iommu_bus->parent_dev) {
+        PCIBus *parent_bus = pci_get_bus(iommu_bus->parent_dev);
+
+        /*
+         * The requester ID of the provided device may be aliased, as seen from
+         * the IOMMU, due to topology limitations.  The IOMMU relies on a
+         * requester ID to provide a unique AddressSpace for devices, but
+         * conventional PCI buses pre-date such concepts.  Instead, the
+         * PCIe-to-PCI bridge creates and accepts transactions on behalf of
+         * downstream devices.  When doing so, all downstream devices are masked
+         * (aliased) behind a single requester ID.  The requester ID used
+         * depends on the format of the bridge devices.  Proper PCIe-to-PCI
+         * bridges, with a PCIe capability indicating such, follow the
+         * guidelines of chapter 2.3 of the PCIe-to-PCI/X bridge specification,
+         * where the bridge uses the secondary bus as the bridge portion of the
+         * requester ID and devfn of 00.0.  For other bridges, typically those
+         * found on the root complex such as the dmi-to-pci-bridge, we follow
+         * the convention of typical bare-metal hardware, which uses the
+         * requester ID of the bridge itself.  There are device specific
+         * exceptions to these rules, but these are the defaults that the
+         * Linux kernel uses when determining DMA aliases itself and believed
+         * to be true for the bare metal equivalents of the devices emulated
+         * in QEMU.
+         */
+        if (!pci_bus_is_express(iommu_bus)) {
+            PCIDevice *parent = iommu_bus->parent_dev;
+
+            if (pci_is_express(parent) &&
+                pcie_cap_get_type(parent) == PCI_EXP_TYPE_PCI_BRIDGE) {
+                devfn = PCI_DEVFN(0, 0);
+                bus = iommu_bus;
+            } else {
+                devfn = parent->devfn;
+                bus = parent_bus;
+            }
+        }
+
+        iommu_bus = parent_bus;
     }
     if (iommu_bus && iommu_bus->iommu_fn) {
-        return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, dev->devfn);
+        return iommu_bus->iommu_fn(bus, iommu_bus->iommu_opaque, devfn);
     }
     return &address_space_memory;
 }
-- 
MST

From nobody Sat Apr 20 10:21:57 2024
Date: Wed, 6 Nov 2019 07:35:18 -0500
From: "Michael S. Tsirkin"
To: qemu-devel@nongnu.org
Cc: Peter Maydell, Eduardo Habkost, Peter Xu, Alex Williamson, Igor Mammedov, Paolo Bonzini, Richard Henderson
Subject: [PULL 2/3] hw/i386: AMD-Vi IVRS DMA alias support
Message-ID: <20191106123407.20997-3-mst@redhat.com>
References: <20191106123407.20997-1-mst@redhat.com>
In-Reply-To: <20191106123407.20997-1-mst@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Alex Williamson

When we account for DMA aliases in the PCI address space, we can no
longer use a single IVHD entry in the IVRS covering all devices.  We
instead need to walk the PCI bus and create alias ranges when we find
a conventional bus.  These alias ranges cannot overlap with a "Select
All" range (as currently implemented), so we also need to enumerate
each device with IVHD entries.

Importantly, the IVHD entries used here include a Device ID, which is
simply the PCI BDF (Bus/Device/Function).  The guest firmware is
responsible for programming bus numbers, so the final revision of this
table depends on the update mechanism (acpi_build_update) to be called
after guest PCI enumeration.

For an example guest configuration of:

-+-[0000:40]---00.0-[41]----00.0  Intel Corporation 82574L Gigabit Network Connection
 \-[0000:00]-+-00.0  Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller
             +-01.0  Device 1234:1111
             +-02.0-[01]----00.0  Intel Corporation 82574L Gigabit Network Connection
             +-02.1-[02]----00.0  Red Hat, Inc. QEMU XHCI Host Controller
             +-02.2-[03]--
             +-02.3-[04]--
             +-02.4-[05]--
             +-02.5-[06-09]----00.0-[07-09]--+-00.0-[08]--
             |                               \-01.0-[09]----00.0  Intel Corporation 82574L Gigabit Network Connection
             +-02.6-[0a-0c]----00.0-[0b-0c]--+-01.0-[0c]--
             |                               \-03.0  Intel Corporation 82540EM Gigabit Ethernet Controller
             +-02.7-[0d]----0e.0  Intel Corporation 82540EM Gigabit Ethernet Controller
             +-03.0  Red Hat, Inc. QEMU PCIe Expander bridge
             +-04.0  Advanced Micro Devices, Inc. [AMD] Device 0020
             +-1f.0  Intel Corporation 82801IB (ICH9) LPC Interface Controller
             +-1f.2  Intel Corporation 82801IR/IO/IH (ICH9R/DO/DH) 6 port SATA Controller [AHCI mode]
             \-1f.3  Intel Corporation 82801I (ICH9 Family) SMBus Controller

Where we have:

00:02.7 PCI bridge: Intel Corporation 82801 PCI Bridge (dmi-to-pci-bridge)
00:03.0 Host bridge: Red Hat, Inc. QEMU PCIe Expander bridge (pcie-expander-bus)
06:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Upstream) (pcie-switch-upstream-port)
07:00.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (pcie-switch-downstream-port)
07:01.0 PCI bridge: Texas Instruments XIO3130 PCI Express Switch (Downstream) (pcie-switch-downstream-port)
0a:00.0 PCI bridge: Red Hat, Inc. Device 000e (pcie-to-pci-bridge)

The following IVRS table is produced:

AMD-Vi: Using IVHD type 0x10
AMD-Vi: device: 00:04.0 cap: 0040 seg: 0 flags: d1 info 0000
AMD-Vi:        mmio-addr: 00000000fed80000
AMD-Vi:   DEV_SELECT              devid: 40:00.0 flags: 00
AMD-Vi:   DEV_SELECT_RANGE_START  devid: 41:00.0 flags: 00
AMD-Vi:   DEV_RANGE_END           devid: 41:1f.7
AMD-Vi:   DEV_SELECT              devid: 00:00.0 flags: 00
AMD-Vi:   DEV_SELECT              devid: 00:01.0 flags: 00
AMD-Vi:   DEV_SELECT              devid: 00:02.0 flags: 00
AMD-Vi:   DEV_SELECT_RANGE_START  devid: 01:00.0 flags: 00
AMD-Vi:   DEV_RANGE_END           devid: 01:1f.7
AMD-Vi:   DEV_SELECT              devid: 00:02.1 flags: 00
AMD-Vi:   DEV_SELECT_RANGE_START  devid: 02:00.0 flags: 00
AMD-Vi:   DEV_RANGE_END           devid: 02:1f.7
AMD-Vi:   DEV_SELECT              devid: 00:02.2 flags: 00
AMD-Vi:   DEV_SELECT_RANGE_START  devid: 03:00.0 flags: 00
AMD-Vi:   DEV_RANGE_END           devid: 03:1f.7
AMD-Vi:   DEV_SELECT              devid: 00:02.3 flags: 00
AMD-Vi:   DEV_SELECT_RANGE_START  devid: 04:00.0 flags: 00
AMD-Vi:   DEV_RANGE_END           devid: 04:1f.7
AMD-Vi:   DEV_SELECT              devid: 00:02.4 flags: 00
AMD-Vi:   DEV_SELECT_RANGE_START  devid: 05:00.0 flags: 00
AMD-Vi:   DEV_RANGE_END           devid: 05:1f.7
AMD-Vi:   DEV_SELECT              devid: 00:02.5 flags: 00
AMD-Vi:   DEV_SELECT              devid: 06:00.0 flags: 00
AMD-Vi:   DEV_SELECT              devid: 07:00.0 flags: 00
AMD-Vi:   DEV_SELECT_RANGE_START  devid: 08:00.0 flags: 00
AMD-Vi:   DEV_RANGE_END           devid: 08:1f.7
AMD-Vi:   DEV_SELECT              devid: 07:01.0 flags: 00
AMD-Vi:   DEV_SELECT_RANGE_START  devid: 09:00.0 flags: 00
AMD-Vi:   DEV_RANGE_END           devid: 09:1f.7
AMD-Vi:   DEV_SELECT              devid: 00:02.6 flags: 00
AMD-Vi:   DEV_SELECT              devid: 0a:00.0 flags: 00
AMD-Vi:   DEV_ALIAS_RANGE         devid: 0b:00.0 flags: 00 devid_to: 0b:00.0
AMD-Vi:   DEV_RANGE_END           devid: 0c:1f.7
AMD-Vi:   DEV_SELECT              devid: 00:02.7 flags: 00
AMD-Vi:   DEV_ALIAS_RANGE         devid: 0d:00.0 flags: 00 devid_to: 00:02.7
AMD-Vi:   DEV_RANGE_END           devid: 0d:1f.7
AMD-Vi:   DEV_SELECT              devid: 00:03.0 flags: 00
AMD-Vi:   DEV_SELECT              devid: 00:04.0 flags: 00
AMD-Vi:   DEV_SELECT              devid: 00:1f.0 flags: 00
AMD-Vi:   DEV_SELECT              devid: 00:1f.2 flags: 00
AMD-Vi:   DEV_SELECT              devid: 00:1f.3 flags: 00

Reviewed-by: Peter Xu
Signed-off-by: Alex Williamson
Message-Id: <157187084880.5439.16700585779699233836.stgit@gimli.home>
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 hw/i386/acpi-build.c | 127 ++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 120 insertions(+), 7 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9dd3dbb16c..dbdbbf59b9 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2518,12 +2518,105 @@ build_dmar_q35(GArray *table_data, BIOSLinker *linker)
  */
 #define IOAPIC_SB_DEVID   (uint64_t)PCI_BUILD_BDF(0, PCI_DEVFN(0x14, 0))
 
+/*
+ * Insert IVHD entry for device and recurse, insert alias, or insert range as
+ * necessary for the PCI topology.
+ */
+static void
+insert_ivhd(PCIBus *bus, PCIDevice *dev, void *opaque)
+{
+    GArray *table_data = opaque;
+    uint32_t entry;
+
+    /* "Select" IVHD entry, type 0x2 */
+    entry = PCI_BUILD_BDF(pci_bus_num(bus), dev->devfn) << 8 | 0x2;
+    build_append_int_noprefix(table_data, entry, 4);
+
+    if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_BRIDGE)) {
+        PCIBus *sec_bus = pci_bridge_get_sec_bus(PCI_BRIDGE(dev));
+        uint8_t sec = pci_bus_num(sec_bus);
+        uint8_t sub = dev->config[PCI_SUBORDINATE_BUS];
+
+        if (pci_bus_is_express(sec_bus)) {
+            /*
+             * Walk the bus if there are subordinates, otherwise use a range
+             * to cover an entire leaf bus.  We could potentially also use a
+             * range for traversed buses, but we'd need to take care not to
+             * create both Select and Range entries covering the same device.
+             * This is easier and potentially more compact.
+             *
+             * An example bare metal system seems to use Select entries for
+             * root ports without a slot (ie. built-ins) and Range entries
+             * when there is a slot.  The same system also only hard-codes
+             * the alias range for an onboard PCIe-to-PCI bridge, apparently
+             * making no effort to support nested bridges.  We attempt to
+             * be more thorough here.
+             */
+            if (sec == sub) { /* leaf bus */
+                /* "Start of Range" IVHD entry, type 0x3 */
+                entry = PCI_BUILD_BDF(sec, PCI_DEVFN(0, 0)) << 8 | 0x3;
+                build_append_int_noprefix(table_data, entry, 4);
+                /* "End of Range" IVHD entry, type 0x4 */
+                entry = PCI_BUILD_BDF(sub, PCI_DEVFN(31, 7)) << 8 | 0x4;
+                build_append_int_noprefix(table_data, entry, 4);
+            } else {
+                pci_for_each_device(sec_bus, sec, insert_ivhd, table_data);
+            }
+        } else {
+            /*
+             * If the secondary bus is conventional, then we need to create an
+             * Alias range for everything downstream.  The range covers the
+             * first devfn on the secondary bus to the last devfn on the
+             * subordinate bus.  The alias target depends on legacy versus
+             * express bridges, just as in pci_device_iommu_address_space().
+             * DeviceIDa vs DeviceIDb as per the AMD IOMMU spec.
+             */
+            uint16_t dev_id_a, dev_id_b;
+
+            dev_id_a = PCI_BUILD_BDF(sec, PCI_DEVFN(0, 0));
+
+            if (pci_is_express(dev) &&
+                pcie_cap_get_type(dev) == PCI_EXP_TYPE_PCI_BRIDGE) {
+                dev_id_b = dev_id_a;
+            } else {
+                dev_id_b = PCI_BUILD_BDF(pci_bus_num(bus), dev->devfn);
+            }
+
+            /* "Alias Start of Range" IVHD entry, type 0x43, 8 bytes */
+            build_append_int_noprefix(table_data, dev_id_a << 8 | 0x43, 4);
+            build_append_int_noprefix(table_data, dev_id_b << 8 | 0x0, 4);
+
+            /* "End of Range" IVHD entry, type 0x4 */
+            entry = PCI_BUILD_BDF(sub, PCI_DEVFN(31, 7)) << 8 | 0x4;
+            build_append_int_noprefix(table_data, entry, 4);
+        }
+    }
+}
+
+/* For all PCI host bridges, walk and insert IVHD entries */
+static int
+ivrs_host_bridges(Object *obj, void *opaque)
+{
+    GArray *ivhd_blob = opaque;
+
+    if (object_dynamic_cast(obj, TYPE_PCI_HOST_BRIDGE)) {
+        PCIBus *bus = PCI_HOST_BRIDGE(obj)->bus;
+
+        if (bus) {
+            pci_for_each_device(bus, pci_bus_num(bus), insert_ivhd, ivhd_blob);
+        }
+    }
+
+    return 0;
+}
+
 static void
 build_amd_iommu(GArray *table_data, BIOSLinker *linker)
 {
-    int ivhd_table_len = 28;
+    int ivhd_table_len = 24;
     int iommu_start = table_data->len;
     AMDVIState *s = AMD_IOMMU_DEVICE(x86_iommu_get_default());
+    GArray *ivhd_blob = g_array_new(false, true, 1);
 
     /* IVRS header */
     acpi_data_push(table_data, sizeof(AcpiTableHeader));
@@ -2544,6 +2637,27 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
                              (1UL << 7),  /* PPRSup       */
                              1);
 
+    /*
+     * A PCI bus walk, for each PCI host bridge, is necessary to create a
+     * complete set of IVHD entries.  Do this into a separate blob so that we
+     * can calculate the total IVRS table length here and then append the new
+     * blob further below.  Fall back to an entry covering all devices, which
+     * is sufficient when no aliases are present.
+     */
+    object_child_foreach_recursive(object_get_root(),
+                                   ivrs_host_bridges, ivhd_blob);
+
+    if (!ivhd_blob->len) {
+        /*
+         *   Type 1 device entry reporting all devices
+         *   These are 4-byte device entries currently reporting the range of
+         *   Refer to Spec - Table 95:IVHD Device Entry Type Codes(4-byte)
+         */
+        build_append_int_noprefix(ivhd_blob, 0x0000001, 4);
+    }
+
+    ivhd_table_len += ivhd_blob->len;
+
     /*
      * When interrupt remapping is supported, we add a special IVHD device
      * for type IO-APIC.
@@ -2551,6 +2665,7 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
     if (x86_iommu_ir_supported(x86_iommu_get_default())) {
         ivhd_table_len += 8;
     }
+
     /* IVHD length */
     build_append_int_noprefix(table_data, ivhd_table_len, 2);
     /* DeviceID */
@@ -2570,12 +2685,10 @@ build_amd_iommu(GArray *table_data, BIOSLinker *linker)
                              (1UL << 2) | /* GTSup       */
                              (1UL << 6),  /* GASup       */
                              4);
-    /*
-     *   Type 1 device entry reporting all devices
-     *   These are 4-byte device entries currently reporting the range of
-     *   Refer to Spec - Table 95:IVHD Device Entry Type Codes(4-byte)
-     */
-    build_append_int_noprefix(table_data, 0x0000001, 4);
+
+    /* IVHD entries as found above */
+    g_array_append_vals(table_data, ivhd_blob->data, ivhd_blob->len);
+    g_array_free(ivhd_blob, TRUE);
 
     /*
      * Add a special IVHD device type.
-- 
MST

From nobody Sat Apr 20 10:21:57 2024
Date: Wed, 6 Nov 2019 07:35:24 -0500
From: "Michael S. Tsirkin"
To: qemu-devel@nongnu.org
Cc: Peter Maydell, Stefan Hajnoczi, Felipe Franciosi
Subject: [PULL 3/3] virtio: notify virtqueue via host notifier when available
Message-ID: <20191106123407.20997-4-mst@redhat.com>
References: <20191106123407.20997-1-mst@redhat.com>
In-Reply-To: <20191106123407.20997-1-mst@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Stefan Hajnoczi

Host notifiers are used in several cases:
1. Traditional ioeventfd where virtqueue notifications are handled in the
   main loop thread.
2. IOThreads (aio_handle_output) where virtqueue notifications are handled
   in an IOThread AioContext.
3. vhost where virtqueue notifications are handled by kernel vhost or
   a vhost-user device backend.

Most virtqueue notifications from the guest use the ioeventfd mechanism,
but there are corner cases where QEMU code calls virtio_queue_notify().
This currently honors the host notifier for the IOThreads
aio_handle_output case, but not for the vhost case.  The result is that
vhost does not receive virtqueue notifications from QEMU when
virtio_queue_notify() is called.
This patch extends virtio_queue_notify() to set the host notifier
whenever it is enabled instead of calling the vq->(aio_)handle_output()
function directly. We track the host notifier state for each virtqueue
separately since some devices may use it only for certain virtqueues.

This fixes the vhost case although it does add a trip through the
eventfd for the traditional ioeventfd case. I don't think it's worth
adding a fast path for the traditional ioeventfd case because calling
virtio_queue_notify() is rare when ioeventfd is enabled.

Reported-by: Felipe Franciosi
Signed-off-by: Stefan Hajnoczi
Message-Id: <20191105140946.165584-1-stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin
Signed-off-by: Michael S. Tsirkin
---
 hw/virtio/virtio-bus.c     | 4 ++++
 hw/virtio/virtio.c         | 9 ++++++++-
 include/hw/virtio/virtio.h | 1 +
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index b2c804292e..d6332d45c3 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -288,6 +288,10 @@ int virtio_bus_set_host_notifier(VirtioBusState *bus, int n, bool assign)
         k->ioeventfd_assign(proxy, notifier, n, false);
     }
 
+    if (r == 0) {
+        virtio_queue_set_host_notifier_enabled(vq, assign);
+    }
+
     return r;
 }
 
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 762df12f4c..04716b5f6c 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -128,6 +128,7 @@ struct VirtQueue
     VirtIODevice *vdev;
     EventNotifier guest_notifier;
     EventNotifier host_notifier;
+    bool host_notifier_enabled;
     QLIST_ENTRY(VirtQueue) node;
 };
 
@@ -2271,7 +2272,7 @@ void virtio_queue_notify(VirtIODevice *vdev, int n)
     }
 
     trace_virtio_queue_notify(vdev, vq - vdev->vq, vq);
-    if (vq->handle_aio_output) {
+    if (vq->host_notifier_enabled) {
         event_notifier_set(&vq->host_notifier);
     } else if (vq->handle_output) {
         vq->handle_output(vdev, vq);
@@ -3145,6 +3146,7 @@ void virtio_init(VirtIODevice *vdev, const char *name,
         vdev->vq[i].vector = VIRTIO_NO_VECTOR;
         vdev->vq[i].vdev = vdev;
         vdev->vq[i].queue_index = i;
+        vdev->vq[i].host_notifier_enabled = false;
     }
 
     vdev->name = name;
@@ -3436,6 +3438,11 @@ EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq)
     return &vq->host_notifier;
 }
 
+void virtio_queue_set_host_notifier_enabled(VirtQueue *vq, bool enabled)
+{
+    vq->host_notifier_enabled = enabled;
+}
+
 int virtio_queue_set_host_notifier_mr(VirtIODevice *vdev, int n,
                                       MemoryRegion *mr, bool assign)
 {
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 3448d67d2a..c32a815303 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -312,6 +312,7 @@ int virtio_device_grab_ioeventfd(VirtIODevice *vdev);
 void virtio_device_release_ioeventfd(VirtIODevice *vdev);
 bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev);
 EventNotifier *virtio_queue_get_host_notifier(VirtQueue *vq);
+void virtio_queue_set_host_notifier_enabled(VirtQueue *vq, bool enabled);
 void virtio_queue_host_notifier_read(EventNotifier *n);
 void virtio_queue_aio_set_host_notifier_handler(VirtQueue *vq, AioContext *ctx,
                                                 VirtIOHandleAIOOutput handle_output);
-- 
MST
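
For readers who want to see the new dispatch rule in isolation, here is a
rough standalone model of the branch that this patch changes in
virtio_queue_notify(). It is only a sketch and not QEMU code: ModelQueue
and every model_* name are hypothetical, and a plain counter stands in
for event_notifier_set(), so no eventfd is involved.

```c
/* Simplified model of the dispatch decision; NOT QEMU code.
 * ModelQueue and all model_* names are made up for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct ModelQueue ModelQueue;
typedef void (*ModelHandler)(ModelQueue *q);

struct ModelQueue {
    bool host_notifier_enabled;  /* mirrors the new per-virtqueue flag */
    unsigned notifier_kicks;     /* stands in for event_notifier_set() */
    unsigned direct_calls;       /* counts inline handler invocations */
    ModelHandler handle_output;  /* may be NULL */
};

/* Stand-in for a device's handle_output callback. */
static void model_handle_output(ModelQueue *q)
{
    q->direct_calls++;
}

/* Models the setter the patch calls when assigning or deassigning
 * the host notifier succeeded (r == 0). */
static void model_set_host_notifier_enabled(ModelQueue *q, bool enabled)
{
    q->host_notifier_enabled = enabled;
}

/* Models the branch changed in virtio_queue_notify(). */
static void model_queue_notify(ModelQueue *q)
{
    if (q->host_notifier_enabled) {
        /* Whoever consumes the notifier (main loop, IOThread, vhost)
         * is the one that gets to see the notification. */
        q->notifier_kicks++;
    } else if (q->handle_output) {
        /* No host notifier assigned: call the handler inline. */
        q->handle_output(q);
    }
}
```

With the flag clear, model_queue_notify() calls the handler inline. Once
the flag is set, it only signals the notifier, which is what allows a
vhost backend to receive the notification.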