From: Alex Williamson
To: qemu-devel@nongnu.org
Date: Thu, 21 Feb 2019 22:33:49 -0700
Message-ID: <155081362929.23160.15724111710078454465.stgit@gimli.home>
In-Reply-To: <155081340903.23160.4034617687275790161.stgit@gimli.home>
References: <155081340903.23160.4034617687275790161.stgit@gimli.home>
Subject: [Qemu-devel] [PULL 1/2] vfio/common: Work around kernel overflow bug in DMA unmap

A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which
adds a test for address space wrap-around in the vfio DMA unmap path.
Unfortunately due to overflow, the kernel detects an unmap of the last
page in the 64-bit address space as a wrap-around.  In QEMU, a Q35
guest with VT-d emulation and guest IOMMU enabled will attempt to make
such an unmap request during VM system reset, triggering an error:

  qemu-kvm: VFIO_UNMAP_DMA: -22
  qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument)

Here the IOVA start address (0xfef00000) and the size parameter
(0xffffffff01100000) add to exactly 2^64, triggering the bug.  A
kernel fix is queued for the Linux v5.0 release to address this.

This patch implements a workaround to retry the unmap, excluding the
final page of the range when we detect an unmap failing which matches
the requirements for this issue.
This is expected to be a safe and
complete workaround as the VT-d address space does not extend to the
full 64-bit space and therefore the last page should never be mapped.

This workaround can be removed once all kernels with this bug are
sufficiently deprecated.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291
Reported-by: Pei Zhang
Debugged-by: Peter Xu
Reviewed-by: Peter Xu
Reviewed-by: Cornelia Huck
Signed-off-by: Alex Williamson
---
 hw/vfio/common.c     | 20 +++++++++++++++++++-
 hw/vfio/trace-events |  1 +
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4262b80c4450..9c3796e7db43 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -220,7 +220,25 @@ static int vfio_dma_unmap(VFIOContainer *container,
         .size = size,
     };
 
-    if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
+    while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
+        /*
+         * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
+         * v4.15) where an overflow in its wrap-around check prevents us from
+         * unmapping the last page of the address space.  Test for the error
+         * condition and re-try the unmap excluding the last page.  The
+         * expectation is that we've never mapped the last page anyway and this
+         * unmap request comes via vIOMMU support which also makes it unlikely
+         * that this page is used.  This bug was introduced well after type1 v2
+         * support was introduced, so we shouldn't need to test for v1.  A fix
+         * is queued for kernel v5.0 so this workaround can be removed once
+         * affected kernels are sufficiently deprecated.
+         */
+        if (errno == EINVAL && unmap.size && !(unmap.iova + unmap.size) &&
+            container->iommu_type == VFIO_TYPE1v2_IOMMU) {
+            trace_vfio_dma_unmap_overflow_workaround();
+            unmap.size -= 1ULL << ctz64(container->pgsizes);
+            continue;
+        }
         error_report("VFIO_UNMAP_DMA: %d", -errno);
         return -errno;
     }
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index f41ca96160bf..ed2f333ad726 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -110,6 +110,7 @@ vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps e
 vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries"
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
+vfio_dma_unmap_overflow_workaround(void) ""
 
 # hw/vfio/platform.c
 vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"

From: Alex Williamson
To: qemu-devel@nongnu.org
Date: Thu, 21 Feb 2019 22:33:57 -0700
Message-ID: <155081363698.23160.8018986798435862933.stgit@gimli.home>
In-Reply-To: <155081340903.23160.4034617687275790161.stgit@gimli.home>
References: <155081340903.23160.4034617687275790161.stgit@gimli.home>
Subject: [Qemu-devel] [PULL 2/2] hw/vfio/common: Refactor container initialization

From: Eric Auger

We introduce the vfio_init_container() helper. It computes
the highest usable iommu type and then sets the container and
the iommu type.

Its usage in vfio_connect_container() makes the code ready for
addition of new iommu types.

Signed-off-by: Eric Auger
Reviewed-by: Greg Kurz
Signed-off-by: Alex Williamson
---
 hw/vfio/common.c | 114 +++++++++++++++++++++++++++++++++--------------------
 1 file changed, 70 insertions(+), 44 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 9c3796e7db43..df2b4721bffb 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -1054,6 +1054,60 @@ static void vfio_put_address_space(VFIOAddressSpace *space)
     }
 }
 
+/*
+ * vfio_get_iommu_type - selects the richest iommu_type (v2 first)
+ */
+static int vfio_get_iommu_type(VFIOContainer *container,
+                               Error **errp)
+{
+    int iommu_types[] = { VFIO_TYPE1v2_IOMMU, VFIO_TYPE1_IOMMU,
+                          VFIO_SPAPR_TCE_v2_IOMMU, VFIO_SPAPR_TCE_IOMMU };
+    int i;
+
+    for (i = 0; i < ARRAY_SIZE(iommu_types); i++) {
+        if (ioctl(container->fd, VFIO_CHECK_EXTENSION, iommu_types[i])) {
+            return iommu_types[i];
+        }
+    }
+    error_setg(errp, "No available IOMMU models");
+    return -EINVAL;
+}
+
+static int vfio_init_container(VFIOContainer *container, int group_fd,
+                               Error **errp)
+{
+    int iommu_type, ret;
+
+    iommu_type = vfio_get_iommu_type(container, errp);
+    if (iommu_type < 0) {
+        return iommu_type;
+    }
+
+    ret = ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &container->fd);
+    if (ret) {
+        error_setg_errno(errp, errno, "Failed to set group container");
+        return -errno;
+    }
+
+    while (ioctl(container->fd, VFIO_SET_IOMMU, iommu_type)) {
+        if (iommu_type == VFIO_SPAPR_TCE_v2_IOMMU) {
+            /*
+             * On sPAPR, despite the IOMMU subdriver always advertises v1 and
+             * v2, the running platform may not support v2 and there is no
+             * way to guess it until an IOMMU group gets added to the container.
+             * So in case it fails with v2, try v1 as a fallback.
+             */
+            iommu_type = VFIO_SPAPR_TCE_IOMMU;
+            continue;
+        }
+        error_setg_errno(errp, errno, "Failed to set iommu for container");
+        return -errno;
+    }
+
+    container->iommu_type = iommu_type;
+    return 0;
+}
+
 static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
                                   Error **errp)
 {
@@ -1119,25 +1173,17 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
     container->fd = fd;
     QLIST_INIT(&container->giommu_list);
     QLIST_INIT(&container->hostwin_list);
-    if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1_IOMMU) ||
-        ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU)) {
-        bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_TYPE1v2_IOMMU);
-        struct vfio_iommu_type1_info info;
 
-        ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
-        if (ret) {
-            error_setg_errno(errp, errno, "failed to set group container");
-            ret = -errno;
-            goto free_container_exit;
-        }
+    ret = vfio_init_container(container, group->fd, errp);
+    if (ret) {
+        goto free_container_exit;
+    }
 
-        container->iommu_type = v2 ? VFIO_TYPE1v2_IOMMU : VFIO_TYPE1_IOMMU;
-        ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type);
-        if (ret) {
-            error_setg_errno(errp, errno, "failed to set iommu for container");
-            ret = -errno;
-            goto free_container_exit;
-        }
+    switch (container->iommu_type) {
+    case VFIO_TYPE1v2_IOMMU:
+    case VFIO_TYPE1_IOMMU:
+    {
+        struct vfio_iommu_type1_info info;
 
         /*
          * FIXME: This assumes that a Type1 IOMMU can map any 64-bit
@@ -1155,30 +1201,13 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
         }
         vfio_host_win_add(container, 0, (hwaddr)-1, info.iova_pgsizes);
         container->pgsizes = info.iova_pgsizes;
-    } else if (ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_IOMMU) ||
-               ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU)) {
+        break;
+    }
+    case VFIO_SPAPR_TCE_v2_IOMMU:
+    case VFIO_SPAPR_TCE_IOMMU:
+    {
         struct vfio_iommu_spapr_tce_info info;
-        bool v2 = !!ioctl(fd, VFIO_CHECK_EXTENSION, VFIO_SPAPR_TCE_v2_IOMMU);
-
-        ret = ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &fd);
-        if (ret) {
-            error_setg_errno(errp, errno, "failed to set group container");
-            ret = -errno;
-            goto free_container_exit;
-        }
-        container->iommu_type =
-            v2 ? VFIO_SPAPR_TCE_v2_IOMMU : VFIO_SPAPR_TCE_IOMMU;
-        ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type);
-        if (ret) {
-            container->iommu_type = VFIO_SPAPR_TCE_IOMMU;
-            v2 = false;
-            ret = ioctl(fd, VFIO_SET_IOMMU, container->iommu_type);
-        }
-        if (ret) {
-            error_setg_errno(errp, errno, "failed to set iommu for container");
-            ret = -errno;
-            goto free_container_exit;
-        }
+        bool v2 = container->iommu_type == VFIO_SPAPR_TCE_v2_IOMMU;
 
         /*
          * The host kernel code implementing VFIO_IOMMU_DISABLE is called
@@ -1240,10 +1269,7 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
                               info.dma32_window_size - 1,
                               0x1000);
         }
-    } else {
-        error_setg(errp, "No available IOMMU models");
-        ret = -EINVAL;
-        goto free_container_exit;
+    }
     }
 
     vfio_kvm_device_add_group(group);
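
A note on the arithmetic behind the workaround in patch 1/2: the following is a minimal standalone sketch, not kernel or QEMU source. The names model_buggy_unmap(), min_page_size() and unmap_with_workaround() are hypothetical, and the wrap-around test is only an assumed simplification of the kernel check described in the commit message. It shows why a range ending exactly at 2^64 is rejected and why dropping the last page lets the retry through.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical model of the affected check (an assumption, not kernel code):
 * reject a range when iova + size, computed modulo 2^64, is below iova.
 * A range ending exactly at 2^64 wraps to 0 and is therefore rejected.
 */
static int model_buggy_unmap(uint64_t iova, uint64_t size)
{
    if (size == 0 || iova + size < iova) {
        return -EINVAL;
    }
    return 0;
}

/* Smallest supported page size: the lowest set bit of the page-size bitmap. */
static uint64_t min_page_size(uint64_t pgsizes)
{
    return pgsizes & (~pgsizes + 1);
}

/*
 * Same retry shape as the patch: on EINVAL for a range that ends exactly at
 * 2^64, shrink the range by one page and try again. Assumes iova and size
 * are page aligned. *retried reports whether the workaround path was taken.
 */
static int unmap_with_workaround(uint64_t iova, uint64_t size,
                                 uint64_t pgsizes, int *retried)
{
    uint64_t page = min_page_size(pgsizes);
    int ret;

    *retried = 0;
    while ((ret = model_buggy_unmap(iova, size)) != 0) {
        if (ret == -EINVAL && size && page && iova + size == 0) {
            size -= page;      /* exclude the last page of the address space */
            *retried = 1;
            continue;
        }
        return ret;
    }
    return 0;
}

#ifdef UNMAP_DEMO_MAIN
int main(void)
{
    int retried;
    /* Values taken from the error message quoted in the commit message. */
    int ret = unmap_with_workaround(0xfef00000ULL, 0xffffffff01100000ULL,
                                    0x1000ULL, &retried);

    assert(ret == 0);
    printf("ret=%d retried=%d\n", ret, retried);
    return 0;
}
#endif
```

Building with -DUNMAP_DEMO_MAIN gives a small program that feeds in the values from the reported failure. The 4 KiB page size passed as pgsizes is an assumption for the demo; QEMU takes the real bitmap from container->pgsizes.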