From: Alex Williamson
To: alex.williamson@redhat.com
Cc: qemu-devel@nongnu.org, peterx@redhat.com
Date: Wed, 09 Jan 2019 16:10:51 -0700
Message-ID: <154707542737.22183.7160770678781819267.stgit@gimli.home>
Subject: [Qemu-devel] [PATCH] vfio/common: Work around kernel overflow bug in DMA unmap

A kernel bug was introduced in v4.15 via commit 71a7d3d78e3c which
adds a test for address space wrap-around in the vfio DMA unmap path.
Unfortunately due to overflow, the kernel detects an unmap of the last
page in the 64-bit address space as a wrap-around.  In QEMU, a Q35
guest with VT-d emulation and guest IOMMU enabled will attempt to make
such an unmap request during VM system reset, triggering an error:

  qemu-kvm: VFIO_UNMAP_DMA: -22
  qemu-kvm: vfio_dma_unmap(0x561f059948f0, 0xfef00000, 0xffffffff01100000) = -22 (Invalid argument)

Here the IOVA start address (0xfef00000) and the size parameter
(0xffffffff01100000) add to exactly 2^64, triggering the bug.  A
kernel fix is queued for the Linux v5.0 release to address this.

This patch implements a workaround to retry the unmap, excluding the
final page of the range when we detect an unmap failing which matches
the requirements for this issue.  This is expected to be a safe and
complete workaround as the VT-d address space does not extend to the
full 64-bit space and therefore the last page should never be mapped.
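
For illustration only, the arithmetic behind the failing request can be
sketched in standalone C.  The helper names below are hypothetical and
are not part of QEMU or the kernel; the page size passed in is an
assumption of the caller, whereas the patch itself derives it from
container->pgsizes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true when the non-empty range [iova, iova + size)
 * ends exactly at 2^64, i.e. the unsigned 64-bit sum wraps around to zero.
 */
static bool range_ends_at_2_64(uint64_t iova, uint64_t size)
{
    return size != 0 && (uint64_t)(iova + size) == 0;
}

/*
 * Hypothetical helper: the size to retry with once the final page has
 * been excluded from the range.  page_size must be a power of two.
 */
static uint64_t size_without_last_page(uint64_t size, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    assert(size >= page_size);
    return size - page_size;
}
```

With the values from the log above, range_ends_at_2_64(0xfef00000,
0xffffffff01100000) is true.  Assuming a 4KiB page, the reduced size is
0xffffffff010ff000, the range then ends at 0xfffffffffffff000, and the
sum no longer wraps.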

This workaround can be removed once all kernels with this bug are
sufficiently deprecated.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=1662291
Reported-by: Pei Zhang
Debugged-by: Peter Xu
Signed-off-by: Alex Williamson
Reviewed-by: Cornelia Huck
Reviewed-by: Peter Xu
---
 hw/vfio/common.c     |   20 +++++++++++++++++++-
 hw/vfio/trace-events |    1 +
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 7c185e5a2e79..820b839057c6 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -220,7 +220,25 @@ static int vfio_dma_unmap(VFIOContainer *container,
         .size = size,
     };
 
-    if (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
+    while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
+        /*
+         * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
+         * v4.15) where an overflow in its wrap-around check prevents us from
+         * unmapping the last page of the address space.  Test for the error
+         * condition and re-try the unmap excluding the last page.  The
+         * expectation is that we've never mapped the last page anyway and this
+         * unmap request comes via vIOMMU support which also makes it unlikely
+         * that this page is used.  This bug was introduced well after type1 v2
+         * support was introduced, so we shouldn't need to test for v1.  A fix
+         * is queued for kernel v5.0 so this workaround can be removed once
+         * affected kernels are sufficiently deprecated.
+         */
+        if (errno == EINVAL && unmap.size && !(unmap.iova + unmap.size) &&
+            container->iommu_type == VFIO_TYPE1v2_IOMMU) {
+            trace_vfio_dma_unmap_overflow_workaround();
+            unmap.size -= 1ULL << ctz64(container->pgsizes);
+            continue;
+        }
         error_report("VFIO_UNMAP_DMA: %d", -errno);
         return -errno;
     }
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index a85e8662eadb..a002c6af2dda 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -110,6 +110,7 @@ vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps e
 vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries"
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
+vfio_dma_unmap_overflow_workaround(void) ""
 
 # hw/vfio/platform.c
 vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"