From nobody Mon May 25 04:33:47 2026 Received: from shelob.surriel.com (shelob.surriel.com [96.67.55.147]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 7258134A76E; Mon, 18 May 2026 18:07:20 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=96.67.55.147 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779127642; cv=none; b=DX/eRnpMY2Uc75GRCQb4zEWVhXnBYvLoPwRsqTfup+knLnu3D5/nzc3h8SNE3MCSEh6f4uOEiiOEmaP2QqOwietYNJqjZTovnwVbtrEmiPIHFDrjAFZDdmp94nrycIV/I3GX9NfjDXlC458Ommq8QWph94zaAlTiVoB+7914F/Q= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779127642; c=relaxed/simple; bh=ICpLJ1jbodL8Hr68Dy9epDUS9DgFK7swpvUXqykeaLc=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=tCO738he8jCkaFW9G5H/qxiIusShB7sNBxvs+YUTWvKtuLgWVC6QiIMLePf8S31AbljxhQ6G/RteyIXL6c8xq4AHc6J/lA4RAnXcZoLXHphvlfHuNDB1kdF0AeZ2cU14ufqlFQs/iNBJV4fderW/9PpBfx7vfDQoN1W5CRRDOWI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=surriel.com; spf=pass smtp.mailfrom=surriel.com; dkim=pass (2048-bit key) header.d=surriel.com header.i=@surriel.com header.b=chtc/2Bb; arc=none smtp.client-ip=96.67.55.147 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=surriel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=surriel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=surriel.com header.i=@surriel.com header.b="chtc/2Bb" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=surriel.com ; s=mail; h=Content-Transfer-Encoding:MIME-Version:References:In-Reply-To: Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-Type:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=ZkcoBzaa8HqfFWOSJraMioCgNPa7yXrTo90NvjAzCwk=; b=chtc/2Bb4n9v89arDFL8GDoATo VzxJgQ2z47UdYxM5d0PfWovVUdxpS72CRUsH5WSt6X7txqE7J88UT7q8FnUcewVlFOhBMIoQbDJfo 9CxgOrtTJ90WTPHAMuSAHWqu8ShKogc7jyMpjVKhD5DeZ7STWKsr/CrwOcEjKYlz6CmywXkUKxHFv KtpdCJNYMf7hKGHRQb6t6uHESUa9ss777c6D08P64T4u5MGf1m7eB5fGQbjdYknLz2L7Rr8uvjOyr BjNQUEFCSMNFqZCibqcL19BQv2fE4jjlQFjQFcmj3w9GLD7QShF1OB7ynM8GN+u1Q0vJT7p/1LTht 8lRZPx3Q==; Received: from fangorn.home.surriel.com ([10.0.13.7]) by shelob.surriel.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.97.1) (envelope-from ) id 1wP2Lb-0000000006g-3RK9; Mon, 18 May 2026 14:05:47 -0400 From: Rik van Riel To: linux-kernel@vger.kernel.org Cc: robin.murphy@arm.com, joro@8bytes.org, will@kernel.org, iommu@lists.linux.dev, jgg@ziepe.ca, kyle@mcmartin.ca, kernel-team@meta.com, Rik van Riel , Claude Opus , Rik van Riel Subject: [PATCH 1/4] iova: switch to augmented rbtree for log(n) allocation Date: Mon, 18 May 2026 14:05:04 -0400 Message-ID: <20260518180545.2286608-2-riel@surriel.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260518180545.2286608-1-riel@surriel.com> References: <20260518180545.2286608-1-riel@surriel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Rik van Riel The IOVA allocator's __alloc_and_insert_iova_range() walks the rbtree backwards via rb_prev() to find a free gap, which is O(n) on the number of allocated ranges. Under heavy fragmentation (e.g. iommu_iova caches holding hundreds of thousands of ranges) this can take 20+ seconds and trigger softlockups. Switch to an augmented rbtree that tracks two new fields per iova node: gap_to_prev - the free gap (in pfns) between this node and its in-order predecessor. __subtree_max_gap - max gap_to_prev over this node's subtree. The augmented invariant is maintained via RB_DECLARE_CALLBACKS_MAX from rbtree_augmented.h, plus explicit propagate() calls on insertion (where both the new node and its successor get a fresh gap_to_prev) and on deletion (where the successor's gap_to_prev grows). The new __iova_search_free_gap() does a reverse in-order walk (right->node->left) to prefer high addresses (preserving the existing top-down allocation behaviour) and prunes any subtree whose __subtree_max_gap is smaller than the requested size. The walk uses parent pointers to traverse back up the tree. For the typical IOMMU workload (uniform DMA mask per domain, so limit_pfn never binds against a candidate gap) and unaligned or small-alignment allocations, the search is O(log n). The alloc_iova_fast() size_aligned path can still visit O(n) borderline-sized gaps when alignment is large relative to typical gap size; this is a known trade-off and would require an alignment-aware augmentation to address. The cached_node / cached32_node fields and their update helpers are left in place but no longer consulted on the alloc path; a follow-up can strip them. The fast-fail max32_alloc_size optimization is preserved. Assisted-by: Claude Opus Signed-off-by: Rik van Riel --- drivers/iommu/iova.c | 225 ++++++++++++++++++++++++++++--------------- include/linux/iova.h | 2 + 2 files changed, 147 insertions(+), 80 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 021daf6528de..5c813dd42c04 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -13,6 +13,7 @@ #include #include #include +#include =20 /* The anchor node sits above the top of the usable address space */ #define IOVA_ANCHOR ~0UL @@ -34,6 +35,15 @@ static struct iova *to_iova(struct rb_node *node) return rb_entry(node, struct iova, node); } =20 +static inline unsigned long iova_gap_value(struct iova *iova) +{ + return iova->gap_to_prev; +} + +RB_DECLARE_CALLBACKS_MAX(static, iova_gap_callbacks, + struct iova, node, unsigned long, __subtree_max_gap, + iova_gap_value) + void init_iova_domain(struct iova_domain *iovad, unsigned long granule, unsigned long start_pfn) @@ -54,20 +64,13 @@ init_iova_domain(struct iova_domain *iovad, unsigned lo= ng granule, iovad->dma_32bit_pfn =3D 1UL << (32 - iova_shift(iovad)); iovad->max32_alloc_size =3D iovad->dma_32bit_pfn; iovad->anchor.pfn_lo =3D iovad->anchor.pfn_hi =3D IOVA_ANCHOR; + iovad->anchor.gap_to_prev =3D IOVA_ANCHOR; + iovad->anchor.__subtree_max_gap =3D IOVA_ANCHOR; rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node); rb_insert_color(&iovad->anchor.node, &iovad->rbroot); } EXPORT_SYMBOL_GPL(init_iova_domain); =20 -static struct rb_node * -__get_cached_rbnode(struct iova_domain *iovad, unsigned long limit_pfn) -{ - if (limit_pfn <=3D iovad->dma_32bit_pfn) - return iovad->cached32_node; - - return iovad->cached_node; -} - static void __cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new) { @@ -96,49 +99,13 @@ __cached_rbnode_delete_update(struct iova_domain *iovad= , struct iova *free) iovad->cached_node =3D rb_next(&free->node); } =20 -static struct rb_node *iova_find_limit(struct iova_domain *iovad, unsigned= long limit_pfn) -{ - struct rb_node *node, *next; - /* - * Ideally what we'd like to judge here is whether limit_pfn is close - * enough to the highest-allocated IOVA that starting the allocation - * walk from the anchor node will be quicker than this initial work to - * find an exact starting point (especially if that ends up being the - * anchor node anyway). This is an incredibly crude approximation which - * only really helps the most likely case, but is at least trivially easy. - */ - if (limit_pfn > iovad->dma_32bit_pfn) - return &iovad->anchor.node; - - node =3D iovad->rbroot.rb_node; - while (to_iova(node)->pfn_hi < limit_pfn) - node =3D node->rb_right; - -search_left: - while (node->rb_left && to_iova(node->rb_left)->pfn_lo >=3D limit_pfn) - node =3D node->rb_left; - - if (!node->rb_left) - return node; - - next =3D node->rb_left; - while (next->rb_right) { - next =3D next->rb_right; - if (to_iova(next)->pfn_lo >=3D limit_pfn) { - node =3D next; - goto search_left; - } - } - - return node; -} - /* Insert the iova into domain rbtree by holding writer lock */ static void iova_insert_rbtree(struct rb_root *root, struct iova *iova, struct rb_node *start) { struct rb_node **new, *parent =3D NULL; + struct rb_node *prev_node, *next_node; =20 new =3D (start) ? &start : &(root->rb_node); /* Figure out where to put new node */ @@ -156,62 +123,123 @@ iova_insert_rbtree(struct rb_root *root, struct iova= *iova, return; } } - /* Add new node and rebalance tree. */ + rb_link_node(&iova->node, parent, new); - rb_insert_color(&iova->node, root); + + prev_node =3D rb_prev(&iova->node); + if (prev_node) + iova->gap_to_prev =3D iova->pfn_lo - to_iova(prev_node)->pfn_hi - 1; + else + iova->gap_to_prev =3D iova->pfn_lo; + iova->__subtree_max_gap =3D iova->gap_to_prev; + + next_node =3D rb_next(&iova->node); + if (next_node) + to_iova(next_node)->gap_to_prev =3D + to_iova(next_node)->pfn_lo - iova->pfn_hi - 1; + + if (parent) + iova_gap_callbacks.propagate(parent, NULL); + if (next_node) + iova_gap_callbacks.propagate(next_node, NULL); + + rb_insert_augmented(&iova->node, root, &iova_gap_callbacks); +} + +/* + * Search the augmented rbtree for the highest-addressed free gap of at le= ast + * @size pages, with the allocation fitting below @limit_pfn and at or abo= ve + * @start_pfn. Returns the node whose gap_to_prev is used, or NULL. + * + * Reverse in-order walk (right->node->left) with subtree pruning. Uses + * parent pointers to traverse back up the tree instead of an explicit sta= ck, + * following the same strategy as drm_mm.c's DECLARE_NEXT_HOLE_ADDR. + */ +static bool iova_subtree_usable(struct rb_node *node, unsigned long size) +{ + return node && to_iova(node)->__subtree_max_gap >=3D size; +} + +static struct rb_node * +__iova_search_free_gap(struct rb_node *root_node, unsigned long size, + unsigned long limit_pfn, unsigned long start_pfn, + unsigned long align_mask, unsigned long *new_pfn) +{ + struct rb_node *node; + + if (!iova_subtree_usable(root_node, size)) + return NULL; + + node =3D root_node; + while (iova_subtree_usable(node->rb_right, size)) + node =3D node->rb_right; + + for (;;) { + struct iova *iova =3D to_iova(node); + + if (iova->gap_to_prev >=3D size) { + unsigned long gap_lo =3D iova->pfn_lo - iova->gap_to_prev; + unsigned long gap_hi =3D iova->pfn_lo - 1; + unsigned long pfn; + + if (gap_hi >=3D limit_pfn) + gap_hi =3D limit_pfn - 1; + if (gap_hi >=3D gap_lo && gap_hi - gap_lo + 1 >=3D size) { + pfn =3D (gap_hi - size + 1) & align_mask; + if (pfn >=3D gap_lo && pfn >=3D start_pfn) { + *new_pfn =3D pfn; + return node; + } + } + } + + if (iova_subtree_usable(node->rb_left, size)) { + node =3D node->rb_left; + while (iova_subtree_usable(node->rb_right, size)) + node =3D node->rb_right; + } else { + struct rb_node *parent; + + while ((parent =3D rb_parent(node)) && + node =3D=3D parent->rb_left) + node =3D parent; + if (!parent) + return NULL; + node =3D parent; + } + } } =20 static int __alloc_and_insert_iova_range(struct iova_domain *iovad, unsigned long size, unsigned long limit_pfn, struct iova *new, bool size_aligned) { - struct rb_node *curr, *prev; - struct iova *curr_iova; unsigned long flags; - unsigned long new_pfn, retry_pfn; + unsigned long new_pfn; unsigned long align_mask =3D ~0UL; - unsigned long high_pfn =3D limit_pfn, low_pfn =3D iovad->start_pfn; + struct rb_node *gap_node; =20 if (size_aligned) align_mask <<=3D fls_long(size - 1); =20 - /* Walk the tree backwards */ spin_lock_irqsave(&iovad->iova_rbtree_lock, flags); if (limit_pfn <=3D iovad->dma_32bit_pfn && size >=3D iovad->max32_alloc_size) goto iova32_full; =20 - curr =3D __get_cached_rbnode(iovad, limit_pfn); - curr_iova =3D to_iova(curr); - retry_pfn =3D curr_iova->pfn_hi; - -retry: - do { - high_pfn =3D min(high_pfn, curr_iova->pfn_lo); - new_pfn =3D (high_pfn - size) & align_mask; - prev =3D curr; - curr =3D rb_prev(curr); - curr_iova =3D to_iova(curr); - } while (curr && new_pfn <=3D curr_iova->pfn_hi && new_pfn >=3D low_pfn); - - if (high_pfn < size || new_pfn < low_pfn) { - if (low_pfn =3D=3D iovad->start_pfn && retry_pfn < limit_pfn) { - high_pfn =3D limit_pfn; - low_pfn =3D retry_pfn + 1; - curr =3D iova_find_limit(iovad, limit_pfn); - curr_iova =3D to_iova(curr); - goto retry; - } - iovad->max32_alloc_size =3D size; + gap_node =3D __iova_search_free_gap(iovad->rbroot.rb_node, size, + limit_pfn, iovad->start_pfn, + align_mask, &new_pfn); + if (!gap_node) { + if (limit_pfn <=3D iovad->dma_32bit_pfn) + iovad->max32_alloc_size =3D size; goto iova32_full; } =20 - /* pfn_lo will point to size aligned address if size_aligned is set */ new->pfn_lo =3D new_pfn; - new->pfn_hi =3D new->pfn_lo + size - 1; + new->pfn_hi =3D new_pfn + size - 1; =20 - /* If we have 'prev', it's a valid place to start the insertion. */ - iova_insert_rbtree(&iovad->rbroot, new, prev); + iova_insert_rbtree(&iovad->rbroot, new, gap_node); __cached_rbnode_insert_update(iovad, new); =20 spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags); @@ -295,9 +323,34 @@ private_find_iova(struct iova_domain *iovad, unsigned = long pfn) =20 static void remove_iova(struct iova_domain *iovad, struct iova *iova) { + struct rb_node *next_node; + struct iova *next_iova =3D NULL; + assert_spin_locked(&iovad->iova_rbtree_lock); __cached_rbnode_delete_update(iovad, iova); - rb_erase(&iova->node, &iovad->rbroot); + + next_node =3D rb_next(&iova->node); + if (next_node) { + struct rb_node *prev_node =3D rb_prev(&iova->node); + + next_iova =3D to_iova(next_node); + if (prev_node) + next_iova->gap_to_prev =3D + next_iova->pfn_lo - to_iova(prev_node)->pfn_hi - 1; + else + next_iova->gap_to_prev =3D next_iova->pfn_lo; + /* + * Propagate next_iova's new augmented values to the root BEFORE + * the erase. Otherwise rotations inside rb_erase_augmented may + * copy a stale __subtree_max_gap from next_iova to other nodes, + * leaving ancestors in an inconsistent state that the post-erase + * propagate cannot fully repair (early-termination at matching + * intermediate values). + */ + iova_gap_callbacks.propagate(&next_iova->node, NULL); + } + + rb_erase_augmented(&iova->node, &iovad->rbroot, &iova_gap_callbacks); } =20 /** @@ -527,8 +580,20 @@ reserve_iova(struct iova_domain *iovad, spin_lock_irqsave(&iovad->iova_rbtree_lock, flags); for (node =3D rb_first(&iovad->rbroot); node; node =3D rb_next(node)) { if (__is_range_overlap(node, pfn_lo, pfn_hi)) { + struct rb_node *p; + unsigned long gap; + iova =3D to_iova(node); __adjust_overlap_range(iova, &pfn_lo, &pfn_hi); + p =3D rb_prev(node); + if (p) + gap =3D iova->pfn_lo - to_iova(p)->pfn_hi - 1; + else + gap =3D iova->pfn_lo; + if (iova->gap_to_prev !=3D gap) { + iova->gap_to_prev =3D gap; + iova_gap_callbacks.propagate(node, NULL); + } if ((pfn_lo >=3D iova->pfn_lo) && (pfn_hi <=3D iova->pfn_hi)) goto finish; diff --git a/include/linux/iova.h b/include/linux/iova.h index d2c4fd923efa..52635a72c5c5 100644 --- a/include/linux/iova.h +++ b/include/linux/iova.h @@ -19,6 +19,8 @@ struct iova { struct rb_node node; unsigned long pfn_hi; /* Highest allocated pfn */ unsigned long pfn_lo; /* Lowest allocated pfn */ + unsigned long gap_to_prev; /* Gap (in pfns) to predecessor */ + unsigned long __subtree_max_gap; /* Max gap_to_prev in subtree */ }; =20 =20 --=20 2.52.0 From nobody Mon May 25 04:33:47 2026 Received: from shelob.surriel.com (shelob.surriel.com [96.67.55.147]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 55799351C22; Mon, 18 May 2026 18:06:36 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=96.67.55.147 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779127598; cv=none; b=TcEUzvv4N7JbS534L4Ai9D+ahqS/ubTg1HC/9t4HogEp1vStOKY7uz2X0NxSoT9MDspZ8MMuNQ/nRISwD5pj9kuwD2jQyiJB0cDkEYFSbTZMd+4rUS0XLLmKjclG5xDxOA1wIlvO7ebaKY9+/ff4+JMcohS6bQ8r0OXAJ+6KEDg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779127598; c=relaxed/simple; bh=fKoNlYFQ9i+xwRphioK+E097l6m7eBvvV96x1XXJGRQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=d9G7FMoGOVlU37mFmCA0AuTdU3m87tRKB6Td0J6bYqNMHMyt8+ZXtogqCnJik3BL6yoQUdsuN5crqqV0SS2F+NzGULMufKTCMzdf0CJlo4Ax0VAS7QSvHlyE4KHpK9G+HYz1BOeo9DbPPIJF68yhhNZZbNPJqWc/8QLtQ3slJyw= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=surriel.com; spf=pass smtp.mailfrom=surriel.com; dkim=pass (2048-bit key) header.d=surriel.com header.i=@surriel.com header.b=htIWBgJa; arc=none smtp.client-ip=96.67.55.147 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=surriel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=surriel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=surriel.com header.i=@surriel.com header.b="htIWBgJa" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=surriel.com ; s=mail; h=Content-Transfer-Encoding:MIME-Version:References:In-Reply-To: Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-Type:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=GWvEh8/+ioAp2vdjyqkYVgoQtAgnd/toD6zHWkmNYaY=; b=htIWBgJaE9rCE+UKOQq0+OUPuu NKD0GMDTiaBvVoegu03A5jAYjZdFVZfPASgG0h3X8dWYjISSocl52QZHyu+/fL/ZjCz/NF6KUfBII U/oaMcdqRjjn+GN36opBC/WMSHBNhqDKg3nhuoFs1cOiwhtMMOqBT2PpYyLshwr8Lcuza7Ctn9tQ8 O3pR/yZmOCPT3QoSMGZ1euZNkSFl983I22Jo9XTDkWWati+0MHhxlOR2qZiL4e5uCiR1CXZAq+G2g 7FTW9aZdKsynCqR0pmSNaazxf27+bATuVPn2WDoUx+EM4M54Mc2hY2JxGD29z35AJQJg1lwO65n8S tsxsvTxA==; Received: from fangorn.home.surriel.com ([10.0.13.7]) by shelob.surriel.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.97.1) (envelope-from ) id 1wP2Lb-0000000006g-3XVp; Mon, 18 May 2026 14:05:47 -0400 From: Rik van Riel To: linux-kernel@vger.kernel.org Cc: robin.murphy@arm.com, joro@8bytes.org, will@kernel.org, iommu@lists.linux.dev, jgg@ziepe.ca, kyle@mcmartin.ca, kernel-team@meta.com, Rik van Riel , Claude Opus , Rik van Riel Subject: [PATCH 2/4] iova: drop dead cached_node / cached32_node infrastructure Date: Mon, 18 May 2026 14:05:05 -0400 Message-ID: <20260518180545.2286608-3-riel@surriel.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260518180545.2286608-1-riel@surriel.com> References: <20260518180545.2286608-1-riel@surriel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Rik van Riel After the augmented-rbtree port, cached_node and cached32_node are still maintained on every insert and delete but are never consulted on the allocation path. Drop the fields and their helpers. The one piece of useful work in __cached_rbnode_delete_update() was resetting iovad->max32_alloc_size when an iova in the 32-bit range was freed (so the next 32-bit alloc can retry). That logic is preserved by moving it inline into remove_iova(). No external consumers reference the cached_node fields. Assisted-by: Claude Opus Signed-off-by: Rik van Riel --- drivers/iommu/iova.c | 35 +++-------------------------------- include/linux/iova.h | 2 -- 2 files changed, 3 insertions(+), 34 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 5c813dd42c04..4848a7c202be 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -57,8 +57,6 @@ init_iova_domain(struct iova_domain *iovad, unsigned long= granule, =20 spin_lock_init(&iovad->iova_rbtree_lock); iovad->rbroot =3D RB_ROOT; - iovad->cached_node =3D &iovad->anchor.node; - iovad->cached32_node =3D &iovad->anchor.node; iovad->granule =3D granule; iovad->start_pfn =3D start_pfn; iovad->dma_32bit_pfn =3D 1UL << (32 - iova_shift(iovad)); @@ -71,34 +69,6 @@ init_iova_domain(struct iova_domain *iovad, unsigned lon= g granule, } EXPORT_SYMBOL_GPL(init_iova_domain); =20 -static void -__cached_rbnode_insert_update(struct iova_domain *iovad, struct iova *new) -{ - if (new->pfn_hi < iovad->dma_32bit_pfn) - iovad->cached32_node =3D &new->node; - else - iovad->cached_node =3D &new->node; -} - -static void -__cached_rbnode_delete_update(struct iova_domain *iovad, struct iova *free) -{ - struct iova *cached_iova; - - cached_iova =3D to_iova(iovad->cached32_node); - if (free =3D=3D cached_iova || - (free->pfn_hi < iovad->dma_32bit_pfn && - free->pfn_lo >=3D cached_iova->pfn_lo)) - iovad->cached32_node =3D rb_next(&free->node); - - if (free->pfn_lo < iovad->dma_32bit_pfn) - iovad->max32_alloc_size =3D iovad->dma_32bit_pfn; - - cached_iova =3D to_iova(iovad->cached_node); - if (free->pfn_lo >=3D cached_iova->pfn_lo) - iovad->cached_node =3D rb_next(&free->node); -} - /* Insert the iova into domain rbtree by holding writer lock */ static void iova_insert_rbtree(struct rb_root *root, struct iova *iova, @@ -240,7 +210,6 @@ static int __alloc_and_insert_iova_range(struct iova_do= main *iovad, new->pfn_hi =3D new_pfn + size - 1; =20 iova_insert_rbtree(&iovad->rbroot, new, gap_node); - __cached_rbnode_insert_update(iovad, new); =20 spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags); return 0; @@ -327,7 +296,9 @@ static void remove_iova(struct iova_domain *iovad, stru= ct iova *iova) struct iova *next_iova =3D NULL; =20 assert_spin_locked(&iovad->iova_rbtree_lock); - __cached_rbnode_delete_update(iovad, iova); + + if (iova->pfn_lo < iovad->dma_32bit_pfn) + iovad->max32_alloc_size =3D iovad->dma_32bit_pfn; =20 next_node =3D rb_next(&iova->node); if (next_node) { diff --git a/include/linux/iova.h b/include/linux/iova.h index 52635a72c5c5..3c4cc81e5182 100644 --- a/include/linux/iova.h +++ b/include/linux/iova.h @@ -30,8 +30,6 @@ struct iova_rcache; struct iova_domain { spinlock_t iova_rbtree_lock; /* Lock to protect update of rbtree */ struct rb_root rbroot; /* iova domain rbtree root */ - struct rb_node *cached_node; /* Save last alloced node */ - struct rb_node *cached32_node; /* Save last 32-bit alloced node */ unsigned long granule; /* pfn granularity for this domain */ unsigned long start_pfn; /* Lower limit for this domain */ unsigned long dma_32bit_pfn; --=20 2.52.0 From nobody Mon May 25 04:33:47 2026 Received: from shelob.surriel.com (shelob.surriel.com [96.67.55.147]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 1E8DF38239E; Mon, 18 May 2026 18:07:52 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=96.67.55.147 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779127674; cv=none; b=ilsiOpoGa7YwRYxkU7UMgi7j6+ieCMvQCMzFfJsw9VW2D1+xgy6YL8KLPgMmHLLd8hUeem39yOJhk7nsriAfCKpuxzJt39jKVmKqVKHi0x1Q/icucThVaCs08YIO+Dr1hKv6QdQjPgvfEzI8QVmXiWm7T/921Wr/QKPd/7DjYcE= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779127674; c=relaxed/simple; bh=7QIP3eNuJ9L1j9Ee66vD7NURJ8u2/3muoJrz3Q08GO0=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=Hjq2z0duYJKvi087LGEqvmb/e/4uVX/uqAgF4MYGwnTsOv3yY9AbjrDpfSJuM0um2UD6onhQOxGUsmHFZfBGQ5muCNCag/VuGnvDspu8LvsOj6oeGw49H75Squv+sEGowEty9O+jMQBdF6Qe43AeLgGqOTHBi0bYivHCsX99dNo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=surriel.com; spf=pass smtp.mailfrom=surriel.com; dkim=pass (2048-bit key) header.d=surriel.com header.i=@surriel.com header.b=cQe9HIWO; arc=none smtp.client-ip=96.67.55.147 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=surriel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=surriel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=surriel.com header.i=@surriel.com header.b="cQe9HIWO" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=surriel.com ; s=mail; h=Content-Transfer-Encoding:MIME-Version:References:In-Reply-To: Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-Type:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=mHkvZAKpIOs+cDa+/zKwq5djxx+HCBm5GDTHNqhzcQc=; b=cQe9HIWOyy/Ybd1vlw1ApdELj6 czrn1rHQMRctS3YfZ8V4pvclXYC7vpGDZzBgv4ZQB6h9zwOQqGqIULNJAqd/SW/OBFaVtX1k11Qu2 4c5Z8N+O95SWGTauc2qOTKi7bmhPtIfU58CcuOiGF76x1gg3Y7GOemi0Pjrm9B4msUPGGVIPEx+x/ bjKvrim24CpbeHG6JVTXAh7bPkr518FasK0agj93IWiqvPDI8Oxt0K1MjKYGoh7Xhvt1hPhRvOQKn TgZ5TMmsyqlsv+MdlHr16OIMBfQtu2pU6cJhyACONjHN6V3PUyUvci0LN5EcAxB3n/LeQoJE7QwHa zMErTcUw==; Received: from fangorn.home.surriel.com ([10.0.13.7]) by shelob.surriel.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.97.1) (envelope-from ) id 1wP2Lb-0000000006g-3ckH; Mon, 18 May 2026 14:05:47 -0400 From: Rik van Riel To: linux-kernel@vger.kernel.org Cc: robin.murphy@arm.com, joro@8bytes.org, will@kernel.org, iommu@lists.linux.dev, jgg@ziepe.ca, kyle@mcmartin.ca, kernel-team@meta.com, Rik van Riel , Claude Opus , Rik van Riel Subject: [PATCH 3/4] iova: limit_pfn-aware augmentation for log(n) 32-bit alloc Date: Mon, 18 May 2026 14:05:06 -0400 Message-ID: <20260518180545.2286608-4-riel@surriel.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260518180545.2286608-1-riel@surriel.com> References: <20260518180545.2286608-1-riel@surriel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" From: Rik van Riel The augmented-rbtree port made __iova_search_free_gap O(log n) when limit_pfn doesn't bind any candidate gap, but degrades to O(n) when limit_pfn is small relative to the gaps in the tree. The classic case is a 32-bit DMA allocation on a domain dominated by 64-bit allocations: the augmented invariant __subtree_max_gap is satisfied everywhere (the huge gap between the highest 64-bit allocation and the IOVA_ANCHOR dominates), so pruning never fires, and every node above limit_pfn must be visited and rejected before falling through to the 32-bit region. Add a second augmented field that bounds the search by 32-bit-clamped gap size: clamped_gap32(node) =3D 0 if gap_to_prev =3D=3D 0 0 if gap_lo >=3D dma_32bit_pfn min(gap_hi, dma_32bit_pfn-1) - gap_lo + 1 otherwise __subtree_max_gap32 =3D max(clamped_gap32) over subtree clamped_gap32 is precomputed and stored on each iova whenever gap_to_prev changes (insert, remove, reserve overlap fix-up), so the augmented callbacks remain pure functions of the node and don't need domain context. The compute helper iova_compute_max() updates both __subtree_max_gap and __subtree_max_gap32 in one pass; the hand-rolled propagate / copy / rotate callbacks (replacing the single-field RB_DECLARE_CALLBACKS_MAX expansion) maintain both fields per insert/erase. Pruning in __iova_search_free_gap selects between them based on whether the caller is doing a 32-bit allocation. For 64-bit allocations the behaviour is unchanged. For 32-bit allocations on mixed-DMA-mask domains the search is now O(log n). Assisted-by: Claude Opus Signed-off-by: Rik van Riel --- drivers/iommu/iova.c | 163 ++++++++++++++++++++++++++++++++++++------- include/linux/iova.h | 6 +- 2 files changed, 140 insertions(+), 29 deletions(-) diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 4848a7c202be..3cdee4870efe 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -35,14 +35,97 @@ static struct iova *to_iova(struct rb_node *node) return rb_entry(node, struct iova, node); } =20 -static inline unsigned long iova_gap_value(struct iova *iova) +/* + * Portion of @iova->gap_to_prev that lies strictly below @dma_32bit_pfn, + * i.e. the largest contiguous sub-gap that a 32-bit-restricted allocation + * could possibly use. Maintained alongside gap_to_prev so the augmented + * callbacks can compare against it without needing per-iova access to the + * domain's dma_32bit_pfn. + */ +static unsigned long +iova_compute_clamped_gap32(struct iova *iova, unsigned long dma_32bit_pfn) +{ + unsigned long gap_lo, gap_hi; + + if (iova->gap_to_prev =3D=3D 0) + return 0; + gap_lo =3D iova->pfn_lo - iova->gap_to_prev; + if (gap_lo >=3D dma_32bit_pfn) + return 0; + gap_hi =3D iova->pfn_lo - 1; + if (gap_hi >=3D dma_32bit_pfn) + gap_hi =3D dma_32bit_pfn - 1; + return gap_hi - gap_lo + 1; +} + +/* + * Recompute @node's __subtree_max_gap and __subtree_max_gap32 from its + * own gap fields and its children's subtree maxes. Returns true (for + * propagate's early-termination) if neither value would change. + */ +static bool iova_compute_max(struct iova *node, bool exit) +{ + unsigned long max_gap =3D node->gap_to_prev; + unsigned long max_gap32 =3D node->clamped_gap32; + struct iova *child; + + if (node->node.rb_left) { + child =3D to_iova(node->node.rb_left); + if (child->__subtree_max_gap > max_gap) + max_gap =3D child->__subtree_max_gap; + if (child->__subtree_max_gap32 > max_gap32) + max_gap32 =3D child->__subtree_max_gap32; + } + if (node->node.rb_right) { + child =3D to_iova(node->node.rb_right); + if (child->__subtree_max_gap > max_gap) + max_gap =3D child->__subtree_max_gap; + if (child->__subtree_max_gap32 > max_gap32) + max_gap32 =3D child->__subtree_max_gap32; + } + if (exit && node->__subtree_max_gap =3D=3D max_gap && + node->__subtree_max_gap32 =3D=3D max_gap32) + return true; + node->__subtree_max_gap =3D max_gap; + node->__subtree_max_gap32 =3D max_gap32; + return false; +} + +static void iova_gap_propagate(struct rb_node *rb, struct rb_node *stop) +{ + while (rb !=3D stop) { + struct iova *node =3D to_iova(rb); + + if (iova_compute_max(node, true)) + break; + rb =3D rb_parent(&node->node); + } +} + +static void iova_gap_copy(struct rb_node *rb_old, struct rb_node *rb_new) { - return iova->gap_to_prev; + struct iova *old =3D to_iova(rb_old); + struct iova *new =3D to_iova(rb_new); + + new->__subtree_max_gap =3D old->__subtree_max_gap; + new->__subtree_max_gap32 =3D old->__subtree_max_gap32; } =20 -RB_DECLARE_CALLBACKS_MAX(static, iova_gap_callbacks, - struct iova, node, unsigned long, __subtree_max_gap, - iova_gap_value) +static void iova_gap_rotate(struct rb_node *rb_old, struct rb_node *rb_new) +{ + struct iova *old =3D to_iova(rb_old); + struct iova *new =3D to_iova(rb_new); + + new->__subtree_max_gap =3D old->__subtree_max_gap; + new->__subtree_max_gap32 =3D old->__subtree_max_gap32; + iova_compute_max(old, false); +} + +static const struct rb_augment_callbacks iova_gap_callbacks =3D { + .propagate =3D iova_gap_propagate, + .copy =3D iova_gap_copy, + .rotate =3D iova_gap_rotate, +}; =20 void init_iova_domain(struct iova_domain *iovad, unsigned long granule, @@ -64,6 +147,9 @@ init_iova_domain(struct iova_domain *iovad, unsigned lon= g granule, iovad->anchor.pfn_lo =3D iovad->anchor.pfn_hi =3D IOVA_ANCHOR; iovad->anchor.gap_to_prev =3D IOVA_ANCHOR; iovad->anchor.__subtree_max_gap =3D IOVA_ANCHOR; + iovad->anchor.clamped_gap32 =3D iova_compute_clamped_gap32(&iovad->anchor, + iovad->dma_32bit_pfn); + iovad->anchor.__subtree_max_gap32 =3D iovad->anchor.clamped_gap32; rb_link_node(&iovad->anchor.node, NULL, &iovad->rbroot.rb_node); rb_insert_color(&iovad->anchor.node, &iovad->rbroot); } @@ -71,9 +157,10 @@ EXPORT_SYMBOL_GPL(init_iova_domain); =20 /* Insert the iova into domain rbtree by holding writer lock */ static void -iova_insert_rbtree(struct rb_root *root, struct iova *iova, +iova_insert_rbtree(struct iova_domain *iovad, struct iova *iova, struct rb_node *start) { + struct rb_root *root =3D &iovad->rbroot; struct rb_node **new, *parent =3D NULL; struct rb_node *prev_node, *next_node; =20 @@ -101,12 +188,18 @@ iova_insert_rbtree(struct rb_root *root, struct iova = *iova, iova->gap_to_prev =3D iova->pfn_lo - to_iova(prev_node)->pfn_hi - 1; else iova->gap_to_prev =3D iova->pfn_lo; + iova->clamped_gap32 =3D iova_compute_clamped_gap32(iova, iovad->dma_32bit= _pfn); iova->__subtree_max_gap =3D iova->gap_to_prev; + iova->__subtree_max_gap32 =3D iova->clamped_gap32; =20 next_node =3D rb_next(&iova->node); - if (next_node) - to_iova(next_node)->gap_to_prev =3D - to_iova(next_node)->pfn_lo - iova->pfn_hi - 1; + if (next_node) { + struct iova *next_iova =3D to_iova(next_node); + + next_iova->gap_to_prev =3D next_iova->pfn_lo - iova->pfn_hi - 1; + next_iova->clamped_gap32 =3D iova_compute_clamped_gap32(next_iova, + iovad->dma_32bit_pfn); + } =20 if (parent) iova_gap_callbacks.propagate(parent, NULL); @@ -119,29 +212,40 @@ iova_insert_rbtree(struct rb_root *root, struct iova = *iova, /* * Search the augmented rbtree for the highest-addressed free gap of at le= ast * @size pages, with the allocation fitting below @limit_pfn and at or abo= ve - * @start_pfn. Returns the node whose gap_to_prev is used, or NULL. + * @start_pfn. When @is_32bit, prune by the 32-bit-clamped subtree max so + * that 32-bit-restricted allocations on a domain dominated by high-pfn + * allocations stay O(log n) instead of degrading to O(n). Returns the node + * whose gap_to_prev is used, or NULL. * * Reverse in-order walk (right->node->left) with subtree pruning. Uses * parent pointers to traverse back up the tree instead of an explicit sta= ck, * following the same strategy as drm_mm.c's DECLARE_NEXT_HOLE_ADDR. */ -static bool iova_subtree_usable(struct rb_node *node, unsigned long size) +static bool iova_subtree_usable(struct rb_node *node, unsigned long size, + bool is_32bit) { - return node && to_iova(node)->__subtree_max_gap >=3D size; + struct iova *iova; + + if (!node) + return false; + iova =3D to_iova(node); + return (is_32bit ? iova->__subtree_max_gap32 + : iova->__subtree_max_gap) >=3D size; } =20 static struct rb_node * __iova_search_free_gap(struct rb_node *root_node, unsigned long size, unsigned long limit_pfn, unsigned long start_pfn, - unsigned long align_mask, unsigned long *new_pfn) + unsigned long align_mask, bool is_32bit, + unsigned long *new_pfn) { struct rb_node *node; =20 - if (!iova_subtree_usable(root_node, size)) + if (!iova_subtree_usable(root_node, size, is_32bit)) return NULL; =20 node =3D root_node; - while (iova_subtree_usable(node->rb_right, size)) + while (iova_subtree_usable(node->rb_right, size, is_32bit)) node =3D node->rb_right; =20 for (;;) { @@ -163,9 +267,9 @@ __iova_search_free_gap(struct rb_node *root_node, unsig= ned long size, } } =20 - if (iova_subtree_usable(node->rb_left, size)) { + if (iova_subtree_usable(node->rb_left, size, is_32bit)) { node =3D node->rb_left; - while (iova_subtree_usable(node->rb_right, size)) + while (iova_subtree_usable(node->rb_right, size, is_32bit)) node =3D node->rb_right; } else { struct rb_node *parent; @@ -188,20 +292,21 @@ static int __alloc_and_insert_iova_range(struct iova_= domain *iovad, unsigned long new_pfn; unsigned long align_mask =3D ~0UL; struct rb_node *gap_node; + bool is_32bit; =20 if (size_aligned) align_mask <<=3D fls_long(size - 1); =20 spin_lock_irqsave(&iovad->iova_rbtree_lock, flags); - if (limit_pfn <=3D iovad->dma_32bit_pfn && - size >=3D iovad->max32_alloc_size) + is_32bit =3D limit_pfn <=3D iovad->dma_32bit_pfn; + if (is_32bit && size >=3D iovad->max32_alloc_size) goto iova32_full; =20 gap_node =3D __iova_search_free_gap(iovad->rbroot.rb_node, size, limit_pfn, iovad->start_pfn, - align_mask, &new_pfn); + align_mask, is_32bit, &new_pfn); if (!gap_node) { - if (limit_pfn <=3D iovad->dma_32bit_pfn) + if (is_32bit) iovad->max32_alloc_size =3D size; goto iova32_full; } @@ -209,7 +314,7 @@ static int __alloc_and_insert_iova_range(struct iova_do= main *iovad, new->pfn_lo =3D new_pfn; new->pfn_hi =3D new_pfn + size - 1; =20 - iova_insert_rbtree(&iovad->rbroot, new, gap_node); + iova_insert_rbtree(iovad, new, gap_node); =20 spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags); return 0; @@ -310,13 +415,15 @@ static void remove_iova(struct iova_domain *iovad, st= ruct iova *iova) next_iova->pfn_lo - to_iova(prev_node)->pfn_hi - 1; else next_iova->gap_to_prev =3D next_iova->pfn_lo; + next_iova->clamped_gap32 =3D iova_compute_clamped_gap32(next_iova, + iovad->dma_32bit_pfn); /* * Propagate next_iova's new augmented values to the root BEFORE * the erase. Otherwise rotations inside rb_erase_augmented may - * copy a stale __subtree_max_gap from next_iova to other nodes, - * leaving ancestors in an inconsistent state that the post-erase - * propagate cannot fully repair (early-termination at matching - * intermediate values). + * copy a stale __subtree_max_gap or __subtree_max_gap32 from + * next_iova to other nodes, leaving ancestors in an inconsistent + * state that the post-erase propagate cannot fully repair + * (early-termination at matching intermediate values). */ iova_gap_callbacks.propagate(&next_iova->node, NULL); } @@ -512,7 +619,7 @@ __insert_new_range(struct iova_domain *iovad, =20 iova =3D alloc_and_init_iova(pfn_lo, pfn_hi); if (iova) - iova_insert_rbtree(&iovad->rbroot, iova, NULL); + iova_insert_rbtree(iovad, iova, NULL); =20 return iova; } @@ -563,6 +670,8 @@ reserve_iova(struct iova_domain *iovad, gap =3D iova->pfn_lo; if (iova->gap_to_prev !=3D gap) { iova->gap_to_prev =3D gap; + iova->clamped_gap32 =3D iova_compute_clamped_gap32(iova, + iovad->dma_32bit_pfn); iova_gap_callbacks.propagate(node, NULL); } if ((pfn_lo >=3D iova->pfn_lo) && diff --git a/include/linux/iova.h b/include/linux/iova.h index 3c4cc81e5182..d262c6d88d6c 100644 --- a/include/linux/iova.h +++ b/include/linux/iova.h @@ -19,8 +19,10 @@ struct iova { struct rb_node node; unsigned long pfn_hi; /* Highest allocated pfn */ unsigned long pfn_lo; /* Lowest allocated pfn */ - unsigned long gap_to_prev; /* Gap (in pfns) to predecessor */ - unsigned long __subtree_max_gap; /* Max gap_to_prev in subtree */ + unsigned long gap_to_prev; /* Gap (in pfns) to predecessor */ + unsigned long clamped_gap32; /* gap_to_prev portion below dma_32bit= _pfn */ + unsigned long __subtree_max_gap; /* Max gap_to_prev in subtree */ + unsigned long __subtree_max_gap32; /* Max clamped_gap32 in subtree */ }; =20 =20 --=20 2.52.0 From nobody Mon May 25 04:33:47 2026 Received: from shelob.surriel.com (shelob.surriel.com [96.67.55.147]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6DF3538AC6E; Mon, 18 May 2026 18:08:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=96.67.55.147 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779127701; cv=none; b=jBIwVM2iOUxEmt8NErFbvo6ekiOtvEkK7mMEX+nQurYfU5zKfnHGMq2ZgLj+vYav6LOTIthp285RFsQJbqasLKNQerIt3rTlAlgtSDhPK2iC+T8TEGiV7siYkfiPrcfdLbpaeOTjUk3fJA2Wiq31zMXQN87EkYwApZWZyb7nuvQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1779127701; c=relaxed/simple; bh=1HH61hQLYzbkKKbwTtST54bosEs+O4Xs+rWAPGhx1sQ=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version:Content-Type; b=KDA8g7PogF2DkBULCr6yHp5/oWI5yvan0j6S7CY6Vo5maOIJGLqDtEzO92kEIcDG6f1/z429M/z9RBsOUAfMLGxiZoMjXbCfz9sqoflmTu+mHDt3yXRX1rGy22HV3GAtV63dmzGGmgeWJYFvOIZso5Cl/azVnteRGSMhfs01bFA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=surriel.com; spf=pass smtp.mailfrom=surriel.com; dkim=pass (2048-bit key) header.d=surriel.com header.i=@surriel.com header.b=Rwwh/683; arc=none smtp.client-ip=96.67.55.147 Authentication-Results: smtp.subspace.kernel.org; dmarc=none (p=none dis=none) header.from=surriel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=surriel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=surriel.com header.i=@surriel.com header.b="Rwwh/683" DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=surriel.com ; s=mail; h=Content-Transfer-Encoding:Content-Type:MIME-Version:References: In-Reply-To:Message-ID:Date:Subject:Cc:To:From:Sender:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=iZGFztyXfV+sf4DJsunxOD5ksDnL6RfuXkIYLY3WpoI=; b=Rwwh/683sSiQR8o9DGnF+Ye4EE YSFs+hF+kCvQEq6G+L7r/S8WBNBR2uXYX1qao+FHzITxfvgBvoeDNIcOm+if1VBpg4GsILQiB2jv0 Xc/nMMxX0I/ZQjuyW+ve2XX1fQo+n+arQh0MpGOqMIFP4OEYUu0qr3mTd6VcBSUtjrJJ+ntanXWqo BVK4NOYN5nRiUnST6HKezEKv//bJ6PyjLJgDN7SKCNQr5gWTGbaIvwwZAKWvaDSolfxGA8N7nTUgA EUxkTytQL17fbckcnkSrg+MjUIW7ZCMms00ml1s9U1Y1nso+eNS6ofPYpgQxD5srfSDsZZv3vAooV SzXj5DOA==; Received: from fangorn.home.surriel.com ([10.0.13.7]) by shelob.surriel.com with esmtpsa (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.97.1) (envelope-from ) id 1wP2Lb-0000000006g-3ibV; Mon, 18 May 2026 14:05:47 -0400 From: Rik van Riel To: linux-kernel@vger.kernel.org Cc: robin.murphy@arm.com, joro@8bytes.org, will@kernel.org, iommu@lists.linux.dev, jgg@ziepe.ca, kyle@mcmartin.ca, kernel-team@meta.com, Rik van Riel , Claude Opus , Rik van Riel Subject: [PATCH 4/4] iova: add KUnit test suite Date: Mon, 18 May 2026 14:05:07 -0400 Message-ID: <20260518180545.2286608-5-riel@surriel.com> X-Mailer: git-send-email 2.52.0 In-Reply-To: <20260518180545.2286608-1-riel@surriel.com> References: <20260518180545.2286608-1-riel@surriel.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: quoted-printable From: Rik van Riel Add a kunit suite for the augmented-rbtree IOVA allocator, plus an iova_domain_verify_invariants() helper (compiled only when the test config is enabled) that walks the tree and confirms every node's gap_to_prev, clamped_gap32, __subtree_max_gap, and __subtree_max_gap32 match what recomputation from scratch yields. Test cases: - test_size_aligned: alignment of size_aligned allocs across orders 0..7. - test_top_down_preference: sequential allocs decrease in pfn_lo. - test_reserve_iova: allocs avoid the reserved range. - test_32bit_in_64bit_domain: 1000 64-bit allocs followed by a 32-bit alloc must still find a slot below DMA_BIT_MASK(32) -- exercises the __subtree_max_gap32 augmentation. - test_aligned_in_fragmented: pack size-2 size_aligned allocs, free every other to leave size-2 holes; a fresh size-2 aligned alloc must still succeed and return a 2-aligned pfn. - test_pci_32bit_workaround_pattern: alternate 32-bit-first allocation attempts with 64-bit fallback, mirroring dma-iommu.c. Uses a deliberately small <=3D32-bit limit so the 32-bit region is exhausted partway through and the 64-bit fallback path is genuinely exercised. - test_stress_random: 2048 random alloc/free operations with mixed sizes, alignments, and 32/64-bit limits, checking augmented-rbtree invariants after every operation. Uses a deterministic PRNG so failures reproduce across boots. The suite uses kunit's .init/.exit fixture mechanism for per-test setup and teardown, storing the iova_test_ctx in test->priv. Each test verifies the augmented invariants both during and after the test run so that any sequencing bug in insert / erase / rotate / propagate is caught at the operation that introduced it. Run with: tools/testing/kunit/kunit.py run --kunitconfig=3Ddrivers/iommu Assisted-by: Claude Opus Signed-off-by: Rik van Riel --- drivers/iommu/.kunitconfig | 4 + drivers/iommu/Kconfig | 17 ++ drivers/iommu/Makefile | 1 + drivers/iommu/iova-kunit.c | 314 +++++++++++++++++++++++++++++++++++++ drivers/iommu/iova.c | 90 +++++++++++ include/linux/iova.h | 3 + 6 files changed, 429 insertions(+) create mode 100644 drivers/iommu/.kunitconfig create mode 100644 drivers/iommu/iova-kunit.c diff --git a/drivers/iommu/.kunitconfig b/drivers/iommu/.kunitconfig new file mode 100644 index 000000000000..56b481ecf993 --- /dev/null +++ b/drivers/iommu/.kunitconfig @@ -0,0 +1,4 @@ +CONFIG_KUNIT=3Dy +CONFIG_IOMMU_SUPPORT=3Dy +CONFIG_IOMMU_IOVA=3Dy +CONFIG_IOMMU_IOVA_KUNIT_TEST=3Dy diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index f86262b11416..584cd13356e4 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -3,6 +3,23 @@ config IOMMU_IOVA tristate =20 +config IOMMU_IOVA_KUNIT_TEST + tristate "KUnit tests for the IOVA allocator" if !KUNIT_ALL_TESTS + depends on IOMMU_IOVA && KUNIT + default KUNIT_ALL_TESTS + help + Enable kunit tests for the IOVA allocator. The tests exercise + basic allocation and free, size-aligned allocation, top-down + ordering, the limit_pfn-aware 32-bit augmentation, aligned + allocations in fragmented domains, and randomly-fragmented + stress scenarios. Each test verifies that the augmented + rbtree invariants remain consistent throughout. + + Run with: + tools/testing/kunit/kunit.py run --kunitconfig=3Ddrivers/iommu + + If unsure, say N here. + # IOMMU_API always gets selected by whoever wants it. config IOMMU_API bool diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 0275821f4ef9..6bd7da1cbebd 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -16,6 +16,7 @@ obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) +=3D io-pgtable-arm.o obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE_KUNIT_TEST) +=3D io-pgtable-arm-selftes= ts.o obj-$(CONFIG_IOMMU_IO_PGTABLE_DART) +=3D io-pgtable-dart.o obj-$(CONFIG_IOMMU_IOVA) +=3D iova.o +obj-$(CONFIG_IOMMU_IOVA_KUNIT_TEST) +=3D iova-kunit.o obj-$(CONFIG_OF_IOMMU) +=3D of_iommu.o obj-$(CONFIG_MSM_IOMMU) +=3D msm_iommu.o obj-$(CONFIG_IPMMU_VMSA) +=3D ipmmu-vmsa.o diff --git a/drivers/iommu/iova-kunit.c b/drivers/iommu/iova-kunit.c new file mode 100644 index 000000000000..95eefe66501e --- /dev/null +++ b/drivers/iommu/iova-kunit.c @@ -0,0 +1,314 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * KUnit tests for the IOVA allocator. + * + * Exercises the augmented-rbtree based allocator: basic alloc/free, + * size-aligned allocations, top-down ordering, the limit_pfn-aware + * 32-bit augmentation (relevant for the dma-iommu pci_32bit_workaround + * pattern), the alignment-aware two-phase search, and randomly + * fragmented stress. + * + * Each test verifies that the augmented invariants + * (__subtree_max_gap, __subtree_max_gap32, gap_to_prev, clamped_gap32) + * remain consistent after every batch of operations. + */ +#include +#include +#include +#include + +#define TEST_GRANULE PAGE_SIZE +/* Highest pfn that fits in 32 bits =E2=80=94 triggers the is_32bit alloc = path. */ +#define TEST_LIMIT_32BIT (DMA_BIT_MASK(32) >> PAGE_SHIFT) +/* A 64-bit-ish limit well above dma_32bit_pfn. 1ULL avoids UB on ILP32. */ +#define TEST_LIMIT_64BIT ((1ULL << 36) >> PAGE_SHIFT) +/* + * A small <=3D32-bit limit used by tests that want to actually exhaust the + * 32-bit-restricted region within a tractable number of allocations and + * exercise the 64-bit fallback path. + */ +#define TEST_LIMIT_32BIT_RESTRICTED 1024UL + +struct iova_test_ctx { + struct iova_domain iovad; + bool initialized; +}; + +static int iova_test_init(struct kunit *test) +{ + struct iova_test_ctx *ctx; + int ret; + + ctx =3D kunit_kzalloc(test, sizeof(*ctx), GFP_KERNEL); + if (!ctx) + return -ENOMEM; + test->priv =3D ctx; + + ret =3D iova_cache_get(); + if (ret) + return ret; + + init_iova_domain(&ctx->iovad, TEST_GRANULE, 1); + ret =3D iova_domain_init_rcaches(&ctx->iovad); + if (ret) { + put_iova_domain(&ctx->iovad); + iova_cache_put(); + return ret; + } + ctx->initialized =3D true; + + KUNIT_ASSERT_TRUE(test, iova_domain_verify_invariants(&ctx->iovad)); + return 0; +} + +static void iova_test_exit(struct kunit *test) +{ + struct iova_test_ctx *ctx =3D test->priv; + + if (ctx && ctx->initialized) { + put_iova_domain(&ctx->iovad); + ctx->initialized =3D false; + iova_cache_put(); + } +} + +static void test_size_aligned(struct kunit *test) +{ + struct iova_test_ctx *ctx =3D test->priv; + int order; + + for (order =3D 0; order < 8; ++order) { + unsigned long size =3D 1UL << order; + struct iova *iova =3D alloc_iova(&ctx->iovad, size, + TEST_LIMIT_32BIT, true); + + KUNIT_ASSERT_NOT_NULL(test, iova); + KUNIT_EXPECT_EQ(test, iova->pfn_lo & (size - 1), 0); + KUNIT_EXPECT_EQ(test, iova->pfn_hi - iova->pfn_lo + 1, size); + __free_iova(&ctx->iovad, iova); + KUNIT_EXPECT_TRUE(test, iova_domain_verify_invariants(&ctx->iovad)); + } +} + +static void test_top_down_preference(struct kunit *test) +{ + struct iova_test_ctx *ctx =3D test->priv; + struct iova *iovas[16]; + int i; + + for (i =3D 0; i < ARRAY_SIZE(iovas); ++i) { + iovas[i] =3D alloc_iova(&ctx->iovad, 1, TEST_LIMIT_32BIT, false); + KUNIT_ASSERT_NOT_NULL(test, iovas[i]); + if (i > 0) + KUNIT_EXPECT_LT(test, iovas[i]->pfn_lo, + iovas[i - 1]->pfn_lo); + } + KUNIT_EXPECT_TRUE(test, iova_domain_verify_invariants(&ctx->iovad)); + + for (i =3D 0; i < ARRAY_SIZE(iovas); ++i) + __free_iova(&ctx->iovad, iovas[i]); +} + +static void test_reserve_iova(struct kunit *test) +{ + struct iova_test_ctx *ctx =3D test->priv; + const unsigned long reserve_lo =3D TEST_LIMIT_32BIT / 2; + struct iova *r, *iova; + int i; + + /* Reserve the entire top half through the limit_pfn, inclusive. */ + r =3D reserve_iova(&ctx->iovad, reserve_lo, TEST_LIMIT_32BIT); + KUNIT_ASSERT_NOT_NULL(test, r); + KUNIT_EXPECT_TRUE(test, iova_domain_verify_invariants(&ctx->iovad)); + + /* All allocs must land below the reserved range. */ + for (i =3D 0; i < 100; ++i) { + iova =3D alloc_iova(&ctx->iovad, 1, TEST_LIMIT_32BIT, false); + KUNIT_ASSERT_NOT_NULL(test, iova); + KUNIT_EXPECT_LT(test, iova->pfn_hi, reserve_lo); + } + KUNIT_EXPECT_TRUE(test, iova_domain_verify_invariants(&ctx->iovad)); +} + +/* + * The pci_32bit_workaround scenario: every PCI device's first IOVA + * allocation hits the 32-bit-restricted path before falling back to + * 64-bit. Mix the two and verify the limit_pfn-aware augmentation + * keeps both correct. + */ +static void test_32bit_in_64bit_domain(struct kunit *test) +{ + struct iova_test_ctx *ctx =3D test->priv; + struct iova *iova; + int i; + + /* Fill the high 64-bit space. */ + for (i =3D 0; i < 1000; ++i) { + iova =3D alloc_iova(&ctx->iovad, 1, TEST_LIMIT_64BIT, true); + KUNIT_ASSERT_NOT_NULL(test, iova); + } + KUNIT_EXPECT_TRUE(test, iova_domain_verify_invariants(&ctx->iovad)); + + /* A 32-bit alloc must still find a slot below DMA_BIT_MASK(32). */ + iova =3D alloc_iova(&ctx->iovad, 1, TEST_LIMIT_32BIT, true); + KUNIT_ASSERT_NOT_NULL(test, iova); + KUNIT_EXPECT_LE(test, iova->pfn_hi, TEST_LIMIT_32BIT); + KUNIT_EXPECT_TRUE(test, iova_domain_verify_invariants(&ctx->iovad)); + + __free_iova(&ctx->iovad, iova); +} + +/* + * Aligned allocation in a fragmented domain: pack size-2 size_aligned + * allocations at the top, free every other one to leave size-2 holes, + * then verify a fresh size-2 aligned alloc still succeeds and returns + * a 2-aligned pfn. The augmented-rbtree invariants must remain + * consistent throughout. + */ +static void test_aligned_in_fragmented(struct kunit *test) +{ + struct iova_test_ctx *ctx =3D test->priv; + const int N =3D 64; + struct iova **iovas; + struct iova *iova; + int i; + + iovas =3D kunit_kcalloc(test, N, sizeof(*iovas), GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, iovas); + + for (i =3D 0; i < N; ++i) { + iovas[i] =3D alloc_iova(&ctx->iovad, 2, TEST_LIMIT_32BIT, true); + KUNIT_ASSERT_NOT_NULL(test, iovas[i]); + KUNIT_EXPECT_EQ(test, iovas[i]->pfn_lo & 1, 0); + } + KUNIT_EXPECT_TRUE(test, iova_domain_verify_invariants(&ctx->iovad)); + + /* Free every other to create size-2 gaps interleaved with allocs. */ + for (i =3D 0; i < N; i +=3D 2) { + __free_iova(&ctx->iovad, iovas[i]); + iovas[i] =3D NULL; + } + KUNIT_EXPECT_TRUE(test, iova_domain_verify_invariants(&ctx->iovad)); + + iova =3D alloc_iova(&ctx->iovad, 2, TEST_LIMIT_32BIT, true); + KUNIT_ASSERT_NOT_NULL(test, iova); + KUNIT_EXPECT_EQ(test, iova->pfn_lo & 1, 0); + __free_iova(&ctx->iovad, iova); + KUNIT_EXPECT_TRUE(test, iova_domain_verify_invariants(&ctx->iovad)); + + for (i =3D 0; i < N; ++i) + if (iovas[i]) + __free_iova(&ctx->iovad, iovas[i]); +} + +/* + * Mimic dma-iommu's pci_32bit_workaround pattern: every alloc first + * tries the 32-bit limit; if that fails, retry with the 64-bit limit. + * Use a deliberately small <=3D32-bit limit so the 32-bit region is + * actually exhausted partway through and the 64-bit fallback path is + * exercised. Verifies that the dual-augmented invariant survives the + * rapid switching between is_32bit=3Dtrue and is_32bit=3Dfalse. + */ +static void test_pci_32bit_workaround_pattern(struct kunit *test) +{ + struct iova_test_ctx *ctx =3D test->priv; + int fallback_count =3D 0; + int i; + + for (i =3D 0; i < 500; ++i) { + unsigned long size =3D (i % 4) + 1; + struct iova *iova =3D alloc_iova(&ctx->iovad, size, + TEST_LIMIT_32BIT_RESTRICTED, + true); + + if (!iova) { + iova =3D alloc_iova(&ctx->iovad, size, + TEST_LIMIT_64BIT, true); + fallback_count++; + } + KUNIT_ASSERT_NOT_NULL(test, iova); + } + KUNIT_EXPECT_TRUE(test, iova_domain_verify_invariants(&ctx->iovad)); + KUNIT_EXPECT_GT(test, fallback_count, 0); +} + +/* + * Random alloc/free over many iterations, verifying invariants after + * every operation. Uses a deterministic PRNG so failures reproduce + * across boots. + */ +static void test_stress_random(struct kunit *test) +{ + struct iova_test_ctx *ctx =3D test->priv; + const int N =3D 512; + const int iters =3D 4 * N; + struct iova **iovas; + u32 rng =3D 0xDEADBEEF; + int i; + + iovas =3D kunit_kcalloc(test, N, sizeof(*iovas), GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, iovas); + + for (i =3D 0; i < iters; ++i) { + int slot; + bool use_32bit; + unsigned long limit; + const char *op; + + rng =3D rng * 1103515245 + 12345; + slot =3D (rng >> 8) % N; + rng =3D rng * 1103515245 + 12345; + use_32bit =3D rng & 1; + limit =3D use_32bit ? TEST_LIMIT_32BIT : TEST_LIMIT_64BIT; + + if (iovas[slot]) { + op =3D "free"; + __free_iova(&ctx->iovad, iovas[slot]); + iovas[slot] =3D NULL; + } else { + unsigned long size; + bool aligned; + + rng =3D rng * 1103515245 + 12345; + size =3D 1UL << ((rng >> 8) % 4); + rng =3D rng * 1103515245 + 12345; + aligned =3D rng & 1; + + op =3D "alloc"; + iovas[slot] =3D alloc_iova(&ctx->iovad, size, limit, + aligned); + } + if (!iova_domain_verify_invariants(&ctx->iovad)) { + kunit_info(test, "iter %d slot %d: invariant broken after %s\n", + i, slot, op); + KUNIT_FAIL(test, "verify failed"); + break; + } + } + + for (i =3D 0; i < N; ++i) + if (iovas[i]) + __free_iova(&ctx->iovad, iovas[i]); +} + +static struct kunit_case iova_test_cases[] =3D { + KUNIT_CASE(test_size_aligned), + KUNIT_CASE(test_top_down_preference), + KUNIT_CASE(test_reserve_iova), + KUNIT_CASE(test_32bit_in_64bit_domain), + KUNIT_CASE(test_aligned_in_fragmented), + KUNIT_CASE(test_pci_32bit_workaround_pattern), + KUNIT_CASE(test_stress_random), + {} +}; + +static struct kunit_suite iova_test_suite =3D { + .name =3D "iova", + .init =3D iova_test_init, + .exit =3D iova_test_exit, + .test_cases =3D iova_test_cases, +}; +kunit_test_suite(iova_test_suite); + +MODULE_DESCRIPTION("KUnit tests for the IOVA allocator"); +MODULE_LICENSE("GPL"); diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c index 3cdee4870efe..299c9924fb64 100644 --- a/drivers/iommu/iova.c +++ b/drivers/iommu/iova.c @@ -1150,6 +1150,96 @@ void iova_cache_put(void) } EXPORT_SYMBOL_GPL(iova_cache_put); =20 +#if IS_ENABLED(CONFIG_IOMMU_IOVA_KUNIT_TEST) +/* + * Walk the iova rbtree and verify that every node's gap_to_prev, + * clamped_gap32, __subtree_max_gap, and __subtree_max_gap32 match what + * recomputation from scratch would yield. Returns true on success. + * + * Intended for use by kunit tests to catch invariant corruption from + * insertion / deletion / rotation paths. + */ +struct iova_subtree_maxes { + unsigned long max_gap; + unsigned long max_gap32; +}; + +static struct iova_subtree_maxes +iova_walk_verify(struct rb_node *node, struct iova_domain *iovad, bool *ok) +{ + struct iova_subtree_maxes m =3D { 0, 0 }; + struct iova_subtree_maxes left, right; + struct rb_node *prev_node; + struct iova *iova; + unsigned long expected_gap; + unsigned long expected_clamped; + + if (!node) + return m; + + left =3D iova_walk_verify(node->rb_left, iovad, ok); + right =3D iova_walk_verify(node->rb_right, iovad, ok); + iova =3D to_iova(node); + + prev_node =3D rb_prev(node); + if (prev_node) + expected_gap =3D iova->pfn_lo - to_iova(prev_node)->pfn_hi - 1; + else + expected_gap =3D iova->pfn_lo; + if (iova->gap_to_prev !=3D expected_gap) { + pr_err("iova_verify: pfn_lo=3D0x%lx gap_to_prev=3D%lu expected=3D%lu\n", + iova->pfn_lo, iova->gap_to_prev, expected_gap); + *ok =3D false; + } + + expected_clamped =3D iova_compute_clamped_gap32(iova, iovad->dma_32bit_pf= n); + if (iova->clamped_gap32 !=3D expected_clamped) { + pr_err("iova_verify: pfn_lo=3D0x%lx clamped_gap32=3D%lu expected=3D%lu\n= ", + iova->pfn_lo, iova->clamped_gap32, expected_clamped); + *ok =3D false; + } + + m.max_gap =3D iova->gap_to_prev; + if (left.max_gap > m.max_gap) + m.max_gap =3D left.max_gap; + if (right.max_gap > m.max_gap) + m.max_gap =3D right.max_gap; + + m.max_gap32 =3D iova->clamped_gap32; + if (left.max_gap32 > m.max_gap32) + m.max_gap32 =3D left.max_gap32; + if (right.max_gap32 > m.max_gap32) + m.max_gap32 =3D right.max_gap32; + + if (iova->__subtree_max_gap !=3D m.max_gap) { + pr_err("iova_verify: pfn_lo=3D0x%lx __subtree_max_gap=3D%lu expected=3D%= lu (own=3D%lu left=3D%lu right=3D%lu)\n", + iova->pfn_lo, iova->__subtree_max_gap, m.max_gap, + iova->gap_to_prev, left.max_gap, right.max_gap); + *ok =3D false; + } + if (iova->__subtree_max_gap32 !=3D m.max_gap32) { + pr_err("iova_verify: pfn_lo=3D0x%lx __subtree_max_gap32=3D%lu expected= =3D%lu (own=3D%lu left=3D%lu right=3D%lu)\n", + iova->pfn_lo, iova->__subtree_max_gap32, m.max_gap32, + iova->clamped_gap32, left.max_gap32, right.max_gap32); + *ok =3D false; + } + + return m; +} + +bool iova_domain_verify_invariants(struct iova_domain *iovad) +{ + bool ok =3D true; + unsigned long flags; + + spin_lock_irqsave(&iovad->iova_rbtree_lock, flags); + iova_walk_verify(iovad->rbroot.rb_node, iovad, &ok); + spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags); + return ok; +} +EXPORT_SYMBOL_GPL(iova_domain_verify_invariants); +#endif /* CONFIG_IOMMU_IOVA_KUNIT_TEST */ + MODULE_AUTHOR("Anil S Keshavamurthy "); MODULE_DESCRIPTION("IOMMU I/O Virtual Address management"); MODULE_LICENSE("GPL"); diff --git a/include/linux/iova.h b/include/linux/iova.h index d262c6d88d6c..108676d8ad69 100644 --- a/include/linux/iova.h +++ b/include/linux/iova.h @@ -104,6 +104,9 @@ void init_iova_domain(struct iova_domain *iovad, unsign= ed long granule, int iova_domain_init_rcaches(struct iova_domain *iovad); struct iova *find_iova(struct iova_domain *iovad, unsigned long pfn); void put_iova_domain(struct iova_domain *iovad); +#if IS_ENABLED(CONFIG_IOMMU_IOVA_KUNIT_TEST) +bool iova_domain_verify_invariants(struct iova_domain *iovad); +#endif #else static inline int iova_cache_get(void) { --=20 2.52.0