From nobody Thu Mar  5 04:06:13 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 4094DC43334
	for <linux-kernel@archiver.kernel.org>; Mon, 18 Jul 2022 01:28:43 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S230475AbiGRB2l (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Sun, 17 Jul 2022 21:28:41 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60462 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S230504AbiGRB2i (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Sun, 17 Jul 2022 21:28:38 -0400
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 77ADA13F18
        for <linux-kernel@vger.kernel.org>;
 Sun, 17 Jul 2022 18:28:37 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1658107717; x=1689643717;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=k3ePWk+fUoLUpq8oyDvlEOoSVKymvsGYV9Svvo+EtbE=;
  b=SwLReOzkAVbd9bB+79+oDK0uMAE9xnqwWx5jCoEHSEyiJEhytQC13Xyh
   TD/yyjU5ELaEAiW6E7zzXqX1R1/vEDOKEpOyHLCnqHrMFb6azg7ePmMh+
   EvoOmj5Miv7Lztpiz3i/7VfR55f6Zaz2uiENIel2evSwWjw8PCD0agSX9
   TtvEymPHS9XIgeQJudgBEmKWdHsHiWp7Hf5xxql5kjI09gauqlH+n5U33
   2zQTRppJVBbjbdzWJomiNiELkpUCREXFIYxkMTwpJORrAUuTHn2qiyExW
   iIZwAIbqBaieeChikodEvsXINzvMUfBtJ2rzq/CwVkLIVYpnGsX6Otwy5
   g==;
X-IronPort-AV: E=McAfee;i="6400,9594,10411"; a="283673951"
X-IronPort-AV: E=Sophos;i="5.92,280,1650956400";
   d="scan'208";a="283673951"
Received: from orsmga003.jf.intel.com ([10.7.209.27])
  by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 17 Jul 2022 18:28:37 -0700
X-IronPort-AV: E=Sophos;i="5.92,280,1650956400";
   d="scan'208";a="547294028"
Received: from spr.sh.intel.com ([10.239.53.122])
  by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 17 Jul 2022 18:28:33 -0700
From: Chao Gao <chao.gao@intel.com>
To: linux-kernel@vger.kernel.org, iommu@lists.linux.dev
Cc: dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com,
        rafael.j.wysocki@intel.com, reinette.chatre@intel.com,
        dan.j.williams@intel.com, kirill.shutemov@linux.intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com,
        ilpo.jarvinen@linux.intel.com, ak@linux.intel.com,
        alexander.shishkin@linux.intel.com, Chao Gao <chao.gao@intel.com>
Subject: [RFC v2 1/2] swiotlb: use bitmap to track free slots
Date: Mon, 18 Jul 2022 09:28:17 +0800
Message-Id: <20220718012818.107051-2-chao.gao@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220718012818.107051-1-chao.gao@intel.com>
References: <20220718012818.107051-1-chao.gao@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Currently, each slot tracks the number of contiguous free slots starting
from itself. It helps to quickly check if there are enough contiguous
entries when dealing with an allocation request. But maintaining this
information can lead to some overhead. Specifically, if a slot is
allocated/freed, preceding slots may need to be updated as the number
of contiguous free slots can change. This process may access memory
scattered across multiple cachelines.

To reduce the overhead of maintaining the number of contiguous free
entries, use a global bitmap to track free slots; each bit indicates
whether a slot is available. The number of contiguous free slots can be
calculated by counting the number of consecutive 1s in the bitmap.
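
As an illustration only, the bitmap scheme can be sketched in userspace
C. This is not the kernel code: NSLOTS, slot_bitmap, mark_all_free(),
mark_free(), mark_used() and free_run_length() are hypothetical names,
and the kernel patch uses __set_bit(), __clear_bit() and
find_next_zero_bit() instead.

```c
#include <assert.h>
#include <limits.h>

#define NSLOTS 128	/* hypothetical pool size */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS ((NSLOTS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Stand-in for mem->bitmap: bit X set means slot X is free. */
static unsigned long slot_bitmap[NWORDS];

static void mark_free(unsigned int slot)
{
	assert(slot < NSLOTS);
	slot_bitmap[slot / BITS_PER_WORD] |= 1UL << (slot % BITS_PER_WORD);
}

static void mark_used(unsigned int slot)
{
	assert(slot < NSLOTS);
	slot_bitmap[slot / BITS_PER_WORD] &= ~(1UL << (slot % BITS_PER_WORD));
}

static void mark_all_free(void)
{
	unsigned int i;

	for (i = 0; i < NSLOTS; i++)
		mark_free(i);
}

static int is_free(unsigned int slot)
{
	return (slot_bitmap[slot / BITS_PER_WORD] >>
		(slot % BITS_PER_WORD)) & 1;
}

/* Count consecutive free slots from 'start', looking at most 'limit'. */
static unsigned int free_run_length(unsigned int start, unsigned int limit)
{
	unsigned int n = 0;

	while (n < limit && start + n < NSLOTS && is_free(start + n))
		n++;
	return n;
}
```

In this sketch, allocating or freeing a slot touches a single bit and
never updates neighbouring entries; the cost moves to the lookup, which
has to scan the bits of the candidate range.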

Tests show that the average cost of freeing slots drops by 120 cycles
while the average cost of allocation increases by 20 cycles. Overall,
100 cycles are saved per allocate/free pair.

Signed-off-by: Chao Gao <chao.gao@intel.com>
---
Ilpo, I didn't add your Reviewed-by as many changes were made due to
conflicts during rebasing.
---
 include/linux/swiotlb.h |  6 ++---
 kernel/dma/swiotlb.c    | 60 +++++++++++++++++++----------------------
 2 files changed, 30 insertions(+), 36 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index d3ae03edbbd2..2c8e6f5df610 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -77,8 +77,6 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t ph=
ys,
  *		@end. For default swiotlb, this is command line adjustable via
  *		setup_io_tlb_npages.
  * @used:	The number of used IO TLB block.
- * @list:	The free list describing the number of free entries available
- *		from each index.
  * @orig_addr:	The original address corresponding to a mapped entry.
  * @alloc_size:	Size of the allocated buffer.
  * @debugfs:	The dentry to debugfs.
@@ -87,6 +85,8 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t ph=
ys,
  * @for_alloc:  %true if the pool is used for memory allocation
  * @nareas:  The area number in the pool.
  * @area_nslabs: The slot number in the area.
+ * @bitmap:	The bitmap used to track free entries. 1 in bit X means the sl=
ot
+ *		indexed by X is free.
  */
 struct io_tlb_mem {
 	phys_addr_t start;
@@ -104,8 +104,8 @@ struct io_tlb_mem {
 	struct io_tlb_slot {
 		phys_addr_t orig_addr;
 		size_t alloc_size;
-		unsigned int list;
 	} *slots;
+	unsigned long *bitmap;
 };
 extern struct io_tlb_mem io_tlb_default_mem;
=20
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 70fd73fc357a..e9803a04459e 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -276,7 +276,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *=
mem, phys_addr_t start,
 	}
=20
 	for (i =3D 0; i < mem->nslabs; i++) {
-		mem->slots[i].list =3D IO_TLB_SEGSIZE - io_tlb_offset(i);
+		__set_bit(i, mem->bitmap);
 		mem->slots[i].orig_addr =3D INVALID_PHYS_ADDR;
 		mem->slots[i].alloc_size =3D 0;
 	}
@@ -360,6 +360,11 @@ void __init swiotlb_init_remap(bool addressing_limit, =
unsigned int flags,
 	if (!mem->areas)
 		panic("%s: Failed to allocate mem->areas.\n", __func__);
=20
+	mem->bitmap =3D memblock_alloc(BITS_TO_BYTES(nslabs), SMP_CACHE_BYTES);
+	if (!mem->bitmap)
+		panic("%s: Failed to allocate %lu bytes align=3D0x%x\n",
+		      __func__, BITS_TO_BYTES(nslabs), SMP_CACHE_BYTES);
+
 	swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, flags, false,
 				default_nareas);
=20
@@ -434,6 +439,10 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 	if (!mem->areas)
 		goto error_area;
=20
+	mem->bitmap =3D bitmap_zalloc(nslabs, GFP_KERNEL);
+	if (!mem->bitmap)
+		goto error_bitmap;
+
 	mem->slots =3D (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
 		get_order(array_size(sizeof(*mem->slots), nslabs)));
 	if (!mem->slots)
@@ -448,6 +457,8 @@ int swiotlb_init_late(size_t size, gfp_t gfp_mask,
 	return 0;
=20
 error_slots:
+	bitmap_free(mem->bitmap);
+error_bitmap:
 	free_pages((unsigned long)mem->areas, area_order);
 error_area:
 	free_pages((unsigned long)vstart, order);
@@ -607,7 +618,7 @@ static int swiotlb_do_find_slots(struct io_tlb_mem *mem,
 	unsigned int iotlb_align_mask =3D
 		dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
 	unsigned int nslots =3D nr_slots(alloc_size), stride;
-	unsigned int index, wrap, count =3D 0, i;
+	unsigned int index, wrap, i;
 	unsigned int offset =3D swiotlb_align_offset(dev, orig_addr);
 	unsigned long flags;
 	unsigned int slot_base;
@@ -626,6 +637,9 @@ static int swiotlb_do_find_slots(struct io_tlb_mem *mem,
 		stride =3D max(stride, stride << (PAGE_SHIFT - IO_TLB_SHIFT));
 	stride =3D max(stride, (alloc_align_mask >> IO_TLB_SHIFT) + 1);
=20
+	/* slots shouldn't cross one segment */
+	max_slots =3D min_t(unsigned long, max_slots, IO_TLB_SEGSIZE);
+
 	spin_lock_irqsave(&area->lock, flags);
 	if (unlikely(nslots > mem->area_nslabs - area->used))
 		goto not_found;
@@ -651,7 +665,8 @@ static int swiotlb_do_find_slots(struct io_tlb_mem *mem,
 		if (!iommu_is_span_boundary(slot_index, nslots,
 					    nr_slots(tbl_dma_addr),
 					    max_slots)) {
-			if (mem->slots[slot_index].list >=3D nslots)
+			if (find_next_zero_bit(mem->bitmap, slot_index + nslots,
+					       slot_index) =3D=3D slot_index + nslots)
 				goto found;
 		}
 		index =3D wrap_area_index(mem, index + stride);
@@ -663,14 +678,10 @@ static int swiotlb_do_find_slots(struct io_tlb_mem *m=
em,
=20
 found:
 	for (i =3D slot_index; i < slot_index + nslots; i++) {
-		mem->slots[i].list =3D 0;
+		__clear_bit(i, mem->bitmap);
 		mem->slots[i].alloc_size =3D alloc_size - (offset +
 				((i - slot_index) << IO_TLB_SHIFT));
 	}
-	for (i =3D slot_index - 1;
-	     io_tlb_offset(i) !=3D IO_TLB_SEGSIZE - 1 &&
-	     mem->slots[i].list; i--)
-		mem->slots[i].list =3D ++count;
=20
 	/*
 	 * Update the indices to avoid searching in the next round.
@@ -775,40 +786,20 @@ static void swiotlb_release_slots(struct device *dev,=
 phys_addr_t tlb_addr)
 	int nslots =3D nr_slots(mem->slots[index].alloc_size + offset);
 	int aindex =3D index / mem->area_nslabs;
 	struct io_tlb_area *area =3D &mem->areas[aindex];
-	int count, i;
+	int i;
=20
 	/*
-	 * Return the buffer to the free list by setting the corresponding
-	 * entries to indicate the number of contiguous entries available.
-	 * While returning the entries to the free list, we merge the entries
-	 * with slots below and above the pool being returned.
+	 * Return the slots to swiotlb, updating bitmap to indicate
+	 * corresponding entries are free.
 	 */
 	BUG_ON(aindex >=3D mem->nareas);
-
 	spin_lock_irqsave(&area->lock, flags);
-	if (index + nslots < ALIGN(index + 1, IO_TLB_SEGSIZE))
-		count =3D mem->slots[index + nslots].list;
-	else
-		count =3D 0;
-
-	/*
-	 * Step 1: return the slots to the free list, merging the slots with
-	 * superceeding slots
-	 */
 	for (i =3D index + nslots - 1; i >=3D index; i--) {
-		mem->slots[i].list =3D ++count;
+		__set_bit(i, mem->bitmap);
 		mem->slots[i].orig_addr =3D INVALID_PHYS_ADDR;
 		mem->slots[i].alloc_size =3D 0;
 	}
=20
-	/*
-	 * Step 2: merge the returned slots with the preceding slots, if
-	 * available (non zero)
-	 */
-	for (i =3D index - 1;
-	     io_tlb_offset(i) !=3D IO_TLB_SEGSIZE - 1 && mem->slots[i].list;
-	     i--)
-		mem->slots[i].list =3D ++count;
 	area->used -=3D nslots;
 	spin_unlock_irqrestore(&area->lock, flags);
 }
@@ -980,7 +971,10 @@ static int rmem_swiotlb_device_init(struct reserved_me=
m *rmem,
 			return -ENOMEM;
=20
 		mem->slots =3D kcalloc(nslabs, sizeof(*mem->slots), GFP_KERNEL);
-		if (!mem->slots) {
+		mem->bitmap =3D bitmap_zalloc(nslabs, GFP_KERNEL);
+		if (!mem->slots || !mem->bitmap) {
+			kfree(mem->slots);
+			bitmap_free(mem->bitmap);
 			kfree(mem);
 			return -ENOMEM;
 		}
--=20
2.25.1
From nobody Thu Mar  5 04:06:13 2026
Return-Path: <linux-kernel-owner@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 6DE5CC43334
	for <linux-kernel@archiver.kernel.org>; Mon, 18 Jul 2022 01:28:48 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S231547AbiGRB2r (ORCPT <rfc822;linux-kernel@archiver.kernel.org>);
        Sun, 17 Jul 2022 21:28:47 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:60580 "EHLO
        lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S229680AbiGRB2n (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Sun, 17 Jul 2022 21:28:43 -0400
Received: from mga11.intel.com (mga11.intel.com [192.55.52.93])
        by lindbergh.monkeyblade.net (Postfix) with ESMTPS id CA96D13EB0
        for <linux-kernel@vger.kernel.org>;
 Sun, 17 Jul 2022 18:28:41 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple;
  d=intel.com; i=@intel.com; q=dns/txt; s=Intel;
  t=1658107721; x=1689643721;
  h=from:to:cc:subject:date:message-id:in-reply-to:
   references:mime-version:content-transfer-encoding;
  bh=6UdrsFkMglmSseu9pu04NsQfmdO+tQ2peISIQFw0LIM=;
  b=CU7On30BnTmZyJx8XIOCwlNd+ZrcN+Krvt1cC7g7WgO/hgi9jzbQITOV
   zXN61Tc/8kPs165y7C0n9nvTD+PRb55onqM2kMjczuwwP8MMVjTSXDer9
   ibmSPERvDKqtIP4cAD+vcxTQfYG3ZMbtNxeedIqNp3N6kkSDzwWEvwaCi
   dbiSagrrA1eoEySJaT6QKrswxu4VkLHY+Staoz/ieVnwgc7XoAgKhLhX/
   CZICtRpHe+CChc4unyP8e0diEqWy4/TpJNqjSS8LBCvhISa2rUK7buRTK
   8NyESm2BabQVG8ZjVvmXU9gciq4dLHGVEgnJuBcIqdLeG/iOIH84M4dv3
   w==;
X-IronPort-AV: E=McAfee;i="6400,9594,10411"; a="283673959"
X-IronPort-AV: E=Sophos;i="5.92,280,1650956400";
   d="scan'208";a="283673959"
Received: from orsmga003.jf.intel.com ([10.7.209.27])
  by fmsmga102.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 17 Jul 2022 18:28:41 -0700
X-IronPort-AV: E=Sophos;i="5.92,280,1650956400";
   d="scan'208";a="547294038"
Received: from spr.sh.intel.com ([10.239.53.122])
  by orsmga003-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 17 Jul 2022 18:28:37 -0700
From: Chao Gao <chao.gao@intel.com>
To: linux-kernel@vger.kernel.org, iommu@lists.linux.dev
Cc: dave.hansen@intel.com, len.brown@intel.com, tony.luck@intel.com,
        rafael.j.wysocki@intel.com, reinette.chatre@intel.com,
        dan.j.williams@intel.com, kirill.shutemov@linux.intel.com,
        sathyanarayanan.kuppuswamy@linux.intel.com,
        ilpo.jarvinen@linux.intel.com, ak@linux.intel.com,
        alexander.shishkin@linux.intel.com, Chao Gao <chao.gao@intel.com>
Subject: [RFC v2 2/2] swiotlb: Allocate memory in a cache-friendly way
Date: Mon, 18 Jul 2022 09:28:18 +0800
Message-Id: <20220718012818.107051-3-chao.gao@intel.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20220718012818.107051-1-chao.gao@intel.com>
References: <20220718012818.107051-1-chao.gao@intel.com>
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Precedence: bulk
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
Content-Type: text/plain; charset="utf-8"

Currently, swiotlb uses an index to indicate the starting point of the
next search. The index increases from 0 to the number of slots - 1 and
then wraps around. It is straightforward but not cache-friendly because
the "oldest" slot in a swiotlb area is used first.

Freed slots are probably accessed right before being freed, especially
in the VM case (device backends access them in DMA_TO_DEVICE mode; the
guest accesses them in other DMA modes). Thus, just-freed slots may
still reside in cache, and reusing them can reduce cache misses.

To that end, maintain a list of free slots: freed slots are inserted at
the head, and the search for free slots always starts from the head.
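
The reuse order this gives can be sketched in userspace C. This is a
simplified model, not the kernel code: it hands out one slot at a time
and ignores alignment and contiguity, and struct slot, free_head,
free_list_init(), slot_alloc() and slot_free() are hypothetical names
standing in for the list_head based code in the patch.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 8	/* hypothetical area size */

/* Stand-in for io_tlb_slot; only the free-list linkage is modelled. */
struct slot {
	struct slot *prev, *next;
};

static struct slot slots[NSLOTS];
static struct slot free_head;	/* circular list head, like list_head */

/* Link 's' between two neighbouring nodes. */
static void slot_link(struct slot *s, struct slot *prev, struct slot *next)
{
	s->prev = prev;
	s->next = next;
	prev->next = s;
	next->prev = s;
}

/* Start with every slot on the list in index order (tail insertion). */
static void free_list_init(void)
{
	size_t i;

	free_head.prev = free_head.next = &free_head;
	for (i = 0; i < NSLOTS; i++)
		slot_link(&slots[i], free_head.prev, &free_head);
}

/* Take the slot at the head of the list; return its index or -1. */
static int slot_alloc(void)
{
	struct slot *s = free_head.next;

	if (s == &free_head)
		return -1;
	s->prev->next = s->next;
	s->next->prev = s->prev;
	return (int)(s - slots);
}

/* Return a slot by inserting it at the head, so it is reused first. */
static void slot_free(int index)
{
	assert(index >= 0 && index < NSLOTS);
	slot_link(&slots[index], &free_head, free_head.next);
}
```

After free_list_init(), allocating twice yields slots 0 and 1; freeing
slot 0 and allocating again yields slot 0 rather than slot 2, which is
the cache-friendly behaviour described above.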

With this optimization, network throughput of sending data from host to
guest, measured by iperf3, increases by 7%.

A downside of this patch is that we can no longer use a large stride to
skip unaligned slots when there is an alignment requirement. Currently,
a large stride is used when a) the device has an alignment requirement,
in which case the stride is calculated from that requirement; or b) the
requested size is at least PAGE_SIZE. In case b), for x86 with 4KB page
size, the stride is set to 2.
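
For reference, the stride calculation removed by this patch can be
reproduced as a standalone C sketch. search_stride() and max_uint() are
hypothetical helper names; IO_TLB_SHIFT is 11 (2KB slots) as in swiotlb,
and PAGE_SHIFT is assumed to be 12 (4KB pages).

```c
#include <assert.h>
#include <stddef.h>

#define IO_TLB_SHIFT 11	/* 2KB slots, as in swiotlb */
#define PAGE_SHIFT 12	/* assumption: 4KB pages */
#define PAGE_SIZE (1UL << PAGE_SHIFT)

static unsigned int max_uint(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Mirrors the stride logic that this patch deletes. */
static unsigned int search_stride(unsigned int iotlb_align_mask,
				  unsigned int alloc_align_mask,
				  size_t alloc_size)
{
	unsigned int stride = (iotlb_align_mask >> IO_TLB_SHIFT) + 1;

	assert(PAGE_SHIFT >= IO_TLB_SHIFT);
	/* Allocations of a page or more only consider page-aligned slots. */
	if (alloc_size >= PAGE_SIZE)
		stride = max_uint(stride,
				  stride << (PAGE_SHIFT - IO_TLB_SHIFT));
	return max_uint(stride, (alloc_align_mask >> IO_TLB_SHIFT) + 1);
}
```

With no alignment masks, search_stride(0, 0, PAGE_SIZE) gives 2 and
search_stride(0, 0, 512) gives 1.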

For case a), few devices have an alignment requirement, so the impact is
limited. For case b), this patch probably leads to one (or more if the
page size is larger than 4K) additional lookup; but as the "io_tlb_slot"
structs of free slots are also accessed when freeing slots, they
probably reside in CPU cache as well, so the overhead is almost
negligible.

Suggested-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Chao Gao <chao.gao@intel.com>
---
 include/linux/swiotlb.h |  2 ++
 kernel/dma/swiotlb.c    | 71 +++++++++++++++++------------------------
 2 files changed, 32 insertions(+), 41 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 2c8e6f5df610..335a550aeda5 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -79,6 +79,7 @@ dma_addr_t swiotlb_map(struct device *dev, phys_addr_t ph=
ys,
  * @used:	The number of used IO TLB block.
  * @orig_addr:	The original address corresponding to a mapped entry.
  * @alloc_size:	Size of the allocated buffer.
+ * @node:	Representation of an io_tlb_slot in the per-area free list.
  * @debugfs:	The dentry to debugfs.
  * @late_alloc:	%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
@@ -104,6 +105,7 @@ struct io_tlb_mem {
 	struct io_tlb_slot {
 		phys_addr_t orig_addr;
 		size_t alloc_size;
+		struct list_head node;
 	} *slots;
 	unsigned long *bitmap;
 };
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e9803a04459e..cb04a5c06552 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -78,13 +78,13 @@ static unsigned long default_nareas;
  * This is a single area with a single lock.
  *
  * @used:	The number of used IO TLB block.
- * @index:	The slot index to start searching in this area for next round.
+ * @free_slots: List of free slots.
  * @lock:	The lock to protect the above data structures in the map and
  *		unmap calls.
  */
 struct io_tlb_area {
 	unsigned long used;
-	unsigned int index;
+	struct list_head free_slots;
 	spinlock_t lock;
 };
=20
@@ -258,6 +258,7 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *=
mem, phys_addr_t start,
 		unsigned long nslabs, unsigned int flags,
 		bool late_alloc, unsigned int nareas)
 {
+	int aindex;
 	void *vaddr =3D phys_to_virt(start);
 	unsigned long bytes =3D nslabs << IO_TLB_SHIFT, i;
=20
@@ -272,13 +273,16 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem=
 *mem, phys_addr_t start,
=20
 	for (i =3D 0; i < mem->nareas; i++) {
 		spin_lock_init(&mem->areas[i].lock);
-		mem->areas[i].index =3D 0;
+		INIT_LIST_HEAD(&mem->areas[i].free_slots);
 	}
=20
 	for (i =3D 0; i < mem->nslabs; i++) {
 		__set_bit(i, mem->bitmap);
 		mem->slots[i].orig_addr =3D INVALID_PHYS_ADDR;
 		mem->slots[i].alloc_size =3D 0;
+		aindex =3D i / mem->area_nslabs;
+		list_add_tail(&mem->slots[i].node,
+			      &mem->areas[aindex].free_slots);
 	}
=20
 	/*
@@ -595,13 +599,6 @@ static inline unsigned long get_max_slots(unsigned lon=
g boundary_mask)
 	return nr_slots(boundary_mask + 1);
 }
=20
-static unsigned int wrap_area_index(struct io_tlb_mem *mem, unsigned int i=
ndex)
-{
-	if (index >=3D mem->area_nslabs)
-		return 0;
-	return index;
-}
-
 /*
  * Find a suitable number of IO TLB entries size that will fit this reques=
t and
  * allocate a buffer from that IO TLB pool.
@@ -614,29 +611,19 @@ static int swiotlb_do_find_slots(struct io_tlb_mem *m=
em,
 	unsigned long boundary_mask =3D dma_get_seg_boundary(dev);
 	dma_addr_t tbl_dma_addr =3D
 		phys_to_dma_unencrypted(dev, mem->start) & boundary_mask;
+	dma_addr_t slot_dma_addr;
 	unsigned long max_slots =3D get_max_slots(boundary_mask);
 	unsigned int iotlb_align_mask =3D
 		dma_get_min_align_mask(dev) & ~(IO_TLB_SIZE - 1);
-	unsigned int nslots =3D nr_slots(alloc_size), stride;
-	unsigned int index, wrap, i;
+	unsigned int nslots =3D nr_slots(alloc_size);
+	unsigned int slot_index, i;
 	unsigned int offset =3D swiotlb_align_offset(dev, orig_addr);
 	unsigned long flags;
-	unsigned int slot_base;
-	unsigned int slot_index;
+	struct io_tlb_slot *slot, *tmp;
=20
 	BUG_ON(!nslots);
 	BUG_ON(area_index >=3D mem->nareas);
=20
-	/*
-	 * For mappings with an alignment requirement don't bother looping to
-	 * unaligned slots once we found an aligned one.  For allocations of
-	 * PAGE_SIZE or larger only look for page aligned allocations.
-	 */
-	stride =3D (iotlb_align_mask >> IO_TLB_SHIFT) + 1;
-	if (alloc_size >=3D PAGE_SIZE)
-		stride =3D max(stride, stride << (PAGE_SHIFT - IO_TLB_SHIFT));
-	stride =3D max(stride, (alloc_align_mask >> IO_TLB_SHIFT) + 1);
-
 	/* slots shouldn't cross one segment */
 	max_slots =3D min_t(unsigned long, max_slots, IO_TLB_SEGSIZE);
=20
@@ -644,19 +631,27 @@ static int swiotlb_do_find_slots(struct io_tlb_mem *m=
em,
 	if (unlikely(nslots > mem->area_nslabs - area->used))
 		goto not_found;
=20
-	slot_base =3D area_index * mem->area_nslabs;
-	index =3D wrap =3D wrap_area_index(mem, ALIGN(area->index, stride));
-
-	do {
-		slot_index =3D slot_base + index;
+	list_for_each_entry_safe(slot, tmp, &area->free_slots, node) {
+		slot_index =3D slot - mem->slots;
+		slot_dma_addr =3D slot_addr(tbl_dma_addr, slot_index);
=20
 		if (orig_addr &&
-		    (slot_addr(tbl_dma_addr, slot_index) &
-		     iotlb_align_mask) !=3D (orig_addr & iotlb_align_mask)) {
-			index =3D wrap_area_index(mem, index + 1);
+		    (slot_dma_addr & iotlb_align_mask) !=3D
+			    (orig_addr & iotlb_align_mask)) {
 			continue;
 		}
=20
+		/* Ensure requested alignment is met */
+		if (slot_dma_addr & alloc_align_mask)
+			continue;
+
+		/*
+		 * If requested size is larger than a page, ensure allocated
+		 * memory to be page aligned.
+		 */
+		if (alloc_size >=3D PAGE_SIZE && (slot_dma_addr & ~PAGE_MASK))
+			continue;
+
 		/*
 		 * If we find a slot that indicates we have 'nslots' number of
 		 * contiguous buffers, we allocate the buffers from that slot
@@ -669,8 +664,7 @@ static int swiotlb_do_find_slots(struct io_tlb_mem *mem,
 					       slot_index) =3D=3D slot_index + nslots)
 				goto found;
 		}
-		index =3D wrap_area_index(mem, index + stride);
-	} while (index !=3D wrap);
+	}
=20
 not_found:
 	spin_unlock_irqrestore(&area->lock, flags);
@@ -681,15 +675,9 @@ static int swiotlb_do_find_slots(struct io_tlb_mem *me=
m,
 		__clear_bit(i, mem->bitmap);
 		mem->slots[i].alloc_size =3D alloc_size - (offset +
 				((i - slot_index) << IO_TLB_SHIFT));
+		list_del(&mem->slots[i].node);
 	}
=20
-	/*
-	 * Update the indices to avoid searching in the next round.
-	 */
-	if (index + nslots < mem->area_nslabs)
-		area->index =3D index + nslots;
-	else
-		area->index =3D 0;
 	area->used +=3D nslots;
 	spin_unlock_irqrestore(&area->lock, flags);
 	return slot_index;
@@ -798,6 +786,7 @@ static void swiotlb_release_slots(struct device *dev, p=
hys_addr_t tlb_addr)
 		__set_bit(i, mem->bitmap);
 		mem->slots[i].orig_addr =3D INVALID_PHYS_ADDR;
 		mem->slots[i].alloc_size =3D 0;
+		list_add(&mem->slots[i].node, &mem->areas[aindex].free_slots);
 	}
=20
 	area->used -=3D nslots;
--=20
2.25.1