From: Alex Mastro
Date: Fri, 10 Oct 2025 00:38:41 -0700
Subject: [PATCH v3 3/3] vfio/type1: handle DMA map/unmap up to the addressable limit
Message-ID: <20251010-fix-unmap-v3-3-306c724d6998@fb.com>
References: <20251010-fix-unmap-v3-0-306c724d6998@fb.com>
In-Reply-To: <20251010-fix-unmap-v3-0-306c724d6998@fb.com>
To: Alex Williamson
CC: Jason Gunthorpe, Alejandro Jimenez, Alex Mastro
X-Mailing-List: linux-kernel@vger.kernel.org

Handle DMA map/unmap operations up to the addressable limit by comparing
against inclusive end-of-range limits, and changing iteration to perform
relative traversals across range sizes, rather than absolute traversals
across addresses.

vfio_link_dma inserts a zero-sized vfio_dma into the rb-tree, and is only
used for that purpose, so discard the size from consideration for the
insertion point.

Signed-off-by: Alex Mastro
---
 drivers/vfio/vfio_iommu_type1.c | 77 ++++++++++++++++++++++-------------------
 1 file changed, 42 insertions(+), 35 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 15aab95d9b8d..567cbab8dfd3 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -166,12 +166,14 @@ static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
 {
 	struct rb_node *node = iommu->dma_list.rb_node;
 
+	WARN_ON(!size);
+
 	while (node) {
 		struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
 
-		if (start + size <= dma->iova)
+		if (start + size - 1 < dma->iova)
 			node = node->rb_left;
-		else if (start >= dma->iova + dma->size)
+		else if (start > dma->iova + dma->size - 1)
 			node = node->rb_right;
 		else
 			return dma;
@@ -181,16 +183,19 @@ static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
 }
 
 static struct rb_node *vfio_find_dma_first_node(struct vfio_iommu *iommu,
-						dma_addr_t start, u64 size)
+						dma_addr_t start,
+						dma_addr_t end)
 {
 	struct rb_node *res = NULL;
 	struct rb_node *node = iommu->dma_list.rb_node;
 	struct vfio_dma *dma_res = NULL;
 
+	WARN_ON(end < start);
+
 	while (node) {
 		struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
 
-		if (start < dma->iova + dma->size) {
+		if (start <= dma->iova + dma->size - 1) {
 			res = node;
 			dma_res = dma;
 			if (start >= dma->iova)
@@ -200,7 +205,7 @@ static struct rb_node *vfio_find_dma_first_node(struct vfio_iommu *iommu,
 			node = node->rb_right;
 		}
 	}
-	if (res && size && dma_res->iova >= start + size)
+	if (res && dma_res->iova > end)
 		res = NULL;
 	return res;
 }
@@ -210,11 +215,13 @@ static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
 	struct rb_node **link = &iommu->dma_list.rb_node, *parent = NULL;
 	struct vfio_dma *dma;
 
+	WARN_ON(new->size != 0);
+
 	while (*link) {
 		parent = *link;
 		dma = rb_entry(parent, struct vfio_dma, node);
 
-		if (new->iova + new->size <= dma->iova)
+		if (new->iova <= dma->iova)
 			link = &(*link)->rb_left;
 		else
 			link = &(*link)->rb_right;
@@ -1078,12 +1085,12 @@ static size_t unmap_unpin_slow(struct vfio_domain *domain,
 static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 			     bool do_accounting)
 {
-	dma_addr_t iova = dma->iova, end = dma->iova + dma->size;
 	struct vfio_domain *domain, *d;
 	LIST_HEAD(unmapped_region_list);
 	struct iommu_iotlb_gather iotlb_gather;
 	int unmapped_region_cnt = 0;
 	long unlocked = 0;
+	size_t pos = 0;
 
 	if (!dma->size)
 		return 0;
@@ -1107,13 +1114,14 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 	}
 
 	iommu_iotlb_gather_init(&iotlb_gather);
-	while (iova < end) {
+	while (pos < dma->size) {
 		size_t unmapped, len;
 		phys_addr_t phys, next;
+		dma_addr_t iova = dma->iova + pos;
 
 		phys = iommu_iova_to_phys(domain->domain, iova);
 		if (WARN_ON(!phys)) {
-			iova += PAGE_SIZE;
+			pos += PAGE_SIZE;
 			continue;
 		}
 
@@ -1122,7 +1130,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 		 * may require hardware cache flushing, try to find the
 		 * largest contiguous physical memory chunk to unmap.
 		 */
-		for (len = PAGE_SIZE; iova + len < end; len += PAGE_SIZE) {
+		for (len = PAGE_SIZE; pos + len < dma->size; len += PAGE_SIZE) {
 			next = iommu_iova_to_phys(domain->domain, iova + len);
 			if (next != phys + len)
 				break;
@@ -1143,7 +1151,7 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
 				break;
 		}
 
-		iova += unmapped;
+		pos += unmapped;
 	}
 
 	dma->iommu_mapped = false;
@@ -1235,7 +1243,7 @@ static int update_user_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 }
 
 static int vfio_iova_dirty_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
-				  dma_addr_t iova, size_t size, size_t pgsize)
+				  dma_addr_t iova, dma_addr_t iova_end, size_t pgsize)
 {
 	struct vfio_dma *dma;
 	struct rb_node *n;
@@ -1252,8 +1260,8 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 	if (dma && dma->iova != iova)
 		return -EINVAL;
 
-	dma = vfio_find_dma(iommu, iova + size - 1, 0);
-	if (dma && dma->iova + dma->size != iova + size)
+	dma = vfio_find_dma(iommu, iova_end, 1);
+	if (dma && dma->iova + dma->size - 1 != iova_end)
 		return -EINVAL;
 
 	for (n = rb_first(&iommu->dma_list); n; n = rb_next(n)) {
@@ -1262,7 +1270,7 @@ static int vfio_iova_dirty_bitmap(u64 __user *bitmap, struct vfio_iommu *iommu,
 		if (dma->iova < iova)
 			continue;
 
-		if (dma->iova > iova + size - 1)
+		if (dma->iova > iova_end)
 			break;
 
 		ret = update_user_bitmap(bitmap, iommu, dma, iova, pgsize);
@@ -1350,7 +1358,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 	if (unmap_all) {
 		if (iova || size)
 			goto unlock;
-		size = U64_MAX;
+		iova_end = U64_MAX;
 	} else {
 		if (!size || size & (pgsize - 1) || size > SIZE_MAX)
 			goto unlock;
@@ -1405,17 +1413,17 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 		if (dma && dma->iova != iova)
 			goto unlock;
 
-		dma = vfio_find_dma(iommu, iova_end, 0);
-		if (dma && dma->iova + dma->size != iova + size)
+		dma = vfio_find_dma(iommu, iova_end, 1);
+		if (dma && dma->iova + dma->size - 1 != iova_end)
 			goto unlock;
 	}
 
 	ret = 0;
-	n = first_n = vfio_find_dma_first_node(iommu, iova, size);
+	n = first_n = vfio_find_dma_first_node(iommu, iova, iova_end);
 
 	while (n) {
 		dma = rb_entry(n, struct vfio_dma, node);
-		if (dma->iova >= iova + size)
+		if (dma->iova > iova_end)
 			break;
 
 		if (!iommu->v2 && iova > dma->iova)
@@ -1747,12 +1755,12 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 
 	for (; n; n = rb_next(n)) {
 		struct vfio_dma *dma;
-		dma_addr_t iova;
+		size_t pos = 0;
 
 		dma = rb_entry(n, struct vfio_dma, node);
-		iova = dma->iova;
 
-		while (iova < dma->iova + dma->size) {
+		while (pos < dma->size) {
+			dma_addr_t iova = dma->iova + pos;
 			phys_addr_t phys;
 			size_t size;
 
@@ -1768,14 +1776,14 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 				phys = iommu_iova_to_phys(d->domain, iova);
 
 				if (WARN_ON(!phys)) {
-					iova += PAGE_SIZE;
+					pos += PAGE_SIZE;
 					continue;
 				}
 
 				size = PAGE_SIZE;
 				p = phys + size;
 				i = iova + size;
-				while (i < dma->iova + dma->size &&
+				while (pos + size < dma->size &&
 				       p == iommu_iova_to_phys(d->domain, i)) {
 					size += PAGE_SIZE;
 					p += PAGE_SIZE;
@@ -1783,9 +1791,8 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 				}
 			} else {
 				unsigned long pfn;
-				unsigned long vaddr = dma->vaddr +
-						     (iova - dma->iova);
-				size_t n = dma->iova + dma->size - iova;
+				unsigned long vaddr = dma->vaddr + pos;
+				size_t n = dma->size - pos;
 				long npage;
 
 				npage = vfio_pin_pages_remote(dma, vaddr,
@@ -1816,7 +1823,7 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 				goto unwind;
 			}
 
-			iova += size;
+			pos += size;
 		}
 	}
 
@@ -1833,29 +1840,29 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 unwind:
 	for (; n; n = rb_prev(n)) {
 		struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
-		dma_addr_t iova;
+		size_t pos = 0;
 
 		if (dma->iommu_mapped) {
 			iommu_unmap(domain->domain, dma->iova, dma->size);
 			continue;
 		}
 
-		iova = dma->iova;
-		while (iova < dma->iova + dma->size) {
+		while (pos < dma->size) {
+			dma_addr_t iova = dma->iova + pos;
 			phys_addr_t phys, p;
 			size_t size;
 			dma_addr_t i;
 
 			phys = iommu_iova_to_phys(domain->domain, iova);
 			if (!phys) {
-				iova += PAGE_SIZE;
+				pos += PAGE_SIZE;
 				continue;
 			}
 
 			size = PAGE_SIZE;
 			p = phys + size;
 			i = iova + size;
-			while (i < dma->iova + dma->size &&
+			while (pos + size < dma->size &&
 			       p == iommu_iova_to_phys(domain->domain, i)) {
 				size += PAGE_SIZE;
 				p += PAGE_SIZE;
@@ -2988,7 +2995,7 @@ static int vfio_iommu_type1_dirty_pages(struct vfio_iommu *iommu,
 		if (iommu->dirty_page_tracking)
 			ret = vfio_iova_dirty_bitmap(range.bitmap.data,
 						     iommu, range.iova,
-						     range.size,
+						     range_end,
 						     range.bitmap.pgsize);
 		else
 			ret = -EINVAL;
-- 
2.47.3
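
Illustration (not part of the patch): the commit message names two
techniques, inclusive end-of-range comparison and relative traversal. The
userspace sketch below shows why both matter for a range that ends at the
top of a 64-bit address space. struct range and the four helper functions
are hypothetical names invented for this example; they only mirror the
arithmetic pattern in the hunks above and are not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the iova/size pair held by a struct vfio_dma. */
struct range {
	uint64_t iova;
	uint64_t size;	/* assumed non-zero and a multiple of the page size */
};

/*
 * Exclusive-end comparison: iova + size is one past the last byte, which
 * wraps to 0 when the range ends at UINT64_MAX.
 */
static bool overlaps_exclusive(struct range a, struct range b)
{
	return a.iova < b.iova + b.size && b.iova < a.iova + a.size;
}

/*
 * Inclusive-end comparison: iova + size - 1 is the last byte itself and
 * stays representable for any non-empty range that fits in 64 bits.
 */
static bool overlaps_inclusive(struct range a, struct range b)
{
	assert(a.size && b.size);	/* same intent as WARN_ON(!size) */
	return a.iova <= b.iova + b.size - 1 && b.iova <= a.iova + a.size - 1;
}

/* Absolute traversal: the loop bound is an address, so it can wrap to 0. */
static uint64_t count_pages_absolute(struct range r, uint64_t pgsize)
{
	uint64_t pages = 0;
	uint64_t end = r.iova + r.size;

	for (uint64_t iova = r.iova; iova < end; iova += pgsize)
		pages++;
	return pages;
}

/* Relative traversal: the loop bound is a size, so no address can wrap. */
static uint64_t count_pages_relative(struct range r, uint64_t pgsize)
{
	uint64_t pages = 0;

	for (uint64_t pos = 0; pos < r.size; pos += pgsize)
		pages++;
	return pages;
}
```

For a one-page range at iova 0xfffffffffffff000 with size 0x1000, the
exclusive comparison reports that the range does not even overlap itself
and the absolute loop visits no pages, while the inclusive comparison and
the relative loop behave as expected.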