From nobody Tue Dec 2 00:02:41 2025
From: Nicolin Chen
To:
CC:
Subject: [PATCH v6 3/7] iommu/arm-smmu-v3: Introduce a per-domain arm_smmu_invs array
Date: Tue, 25 Nov 2025 17:10:08 -0800
Message-ID: <8d02cbd9e58fe99f6a7576934d34b440b89e8c9d.1764119291.git.nicolinc@nvidia.com>
X-Mailer: git-send-email 2.43.0
In-Reply-To:
References:
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Jason Gunthorpe

Create a new data structure to hold an array of invalidations that need
to be performed for the domain, based on which masters are attached, to
replace the single smmu pointer and the linked list of masters in the
current design.

Each array entry holds one of the invalidation actions - S1_ASID,
S2_VMID, ATS, or a variant thereof - together with the information
needed to feed invalidation commands to the HW. It is structured so
that multiple SMMUs can participate in the same array, removing one key
limitation of the current system.

To maximize performance, a sorted array is used as the data structure.
It allows grouping SYNCs together to parallelize invalidations. For
instance, it will group all the ATS entries after the ASID/VMID entry,
so they will all be pushed to the PCI devices in parallel with one
SYNC.
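To illustrate the intended batching (a simplified sketch, not part of
this patch: example_flush() and issue_cmds_with_one_sync() are made-up
names standing in for the real command-queue path):

  /*
   * Sketch: walk the sorted array, batching consecutive entries of the
   * same kind on the same SMMU, so that e.g. all the ATS entries that
   * sort behind an ASID/VMID entry are pushed in parallel and covered
   * by a single SYNC, instead of one SYNC per entry.
   */
  static void example_flush(struct arm_smmu_invs *invs)
  {
  	size_t begin = 0, i;

  	for (i = 1; i <= invs->num_invs; i++) {
  		if (i == invs->num_invs ||
  		    invs->inv[i].smmu != invs->inv[begin].smmu ||
  		    arm_smmu_inv_is_ats(&invs->inv[i]) !=
  			    arm_smmu_inv_is_ats(&invs->inv[begin])) {
  			issue_cmds_with_one_sync(&invs->inv[begin],
  						 i - begin);
  			begin = i;
  		}
  	}
  }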
To minimize the locking cost on the invalidation fast path (the reader
side of the invalidation array), the array is managed with RCU.
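The reader side then looks roughly like this (a minimal sketch; only
domain->invs, num_invs and the users refcount come from this patch):

  /* Sketch: lockless walk on the invalidation fast path */
  rcu_read_lock();
  invs = rcu_dereference(smmu_domain->invs);
  for (i = 0; i < READ_ONCE(invs->num_invs); i++) {
  	/* Entries with users==0 are trash awaiting a purge */
  	if (!refcount_read(&invs->inv[i].users))
  		continue;
  	/* ... build and push the command(s) for invs->inv[i] ... */
  }
  rcu_read_unlock();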
Provide a set of APIs to add/delete entries to/from an array, covering
the cannot-fail attach cases, e.g. attaching to arm_smmu_blocked_domain.
Also add kunit coverage for those APIs.
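For instance, a hypothetical attach path could combine the APIs as
below (sketch only; to_merge/to_unref stand in for per-master build
arrays introduced by later patches):

  /* Growing the new domain's array may fail, so it is done first */
  old_invs = rcu_dereference_protected(new_domain->invs, true);
  new_invs = arm_smmu_invs_merge(old_invs, to_merge);
  if (IS_ERR(new_invs))
  	return PTR_ERR(new_invs);
  rcu_assign_pointer(new_domain->invs, new_invs);
  kfree_rcu(old_invs, rcu);

  /* Shrinking the old domain's array must not fail: mark trash in place */
  arm_smmu_invs_unref(rcu_dereference_protected(old_domain->invs, true),
  		      to_unref, NULL);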
Signed-off-by: Jason Gunthorpe
Reviewed-by: Jason Gunthorpe
Co-developed-by: Nicolin Chen
Signed-off-by: Nicolin Chen
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |  97 +++++++
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c  |  92 +++++++
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 256 ++++++++++++++++++
 3 files changed, 445 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index 96a23ca633cb..c6fb84fc9201 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -649,6 +649,92 @@ struct arm_smmu_cmdq_batch {
 	int num;
 };
 
+/*
+ * The order here also determines the sequence in which commands are sent to
+ * the command queue. E.g. TLBI must be done before ATC_INV.
+ */
+enum arm_smmu_inv_type {
+	INV_TYPE_S1_ASID,
+	INV_TYPE_S2_VMID,
+	INV_TYPE_S2_VMID_S1_CLEAR,
+	INV_TYPE_ATS,
+	INV_TYPE_ATS_FULL,
+};
+
+struct arm_smmu_inv {
+	struct arm_smmu_device *smmu;
+	u8 type;
+	u8 size_opcode;
+	u8 nsize_opcode;
+	u32 id;			/* ASID or VMID or SID */
+	union {
+		size_t pgsize;	/* ARM_SMMU_FEAT_RANGE_INV */
+		u32 ssid;	/* INV_TYPE_ATS */
+	};
+
+	refcount_t users;	/* users=0 marks a trash entry to be purged */
+};
+
+static inline bool arm_smmu_inv_is_ats(struct arm_smmu_inv *inv)
+{
+	return inv->type == INV_TYPE_ATS || inv->type == INV_TYPE_ATS_FULL;
+}
+
+/**
+ * struct arm_smmu_invs - Per-domain invalidation array
+ * @max_invs: maximum capacity of the flexible array
+ * @num_invs: number of invalidations in the flexible array. May be smaller
+ *            than @max_invs after a trailing trash entry is excluded, but
+ *            must not be greater than @max_invs
+ * @num_trashes: number of trash entries in the array for arm_smmu_invs_purge().
+ *               Must not be greater than @num_invs
+ * @rwlock: optional rwlock to fence ATS operations
+ * @has_ats: flag if the array contains an INV_TYPE_ATS or INV_TYPE_ATS_FULL
+ * @rcu: rcu head for kfree_rcu()
+ * @inv: flexible invalidation array
+ *
+ * struct arm_smmu_invs is an RCU data structure. During an ->attach_dev
+ * callback, arm_smmu_invs_merge(), arm_smmu_invs_unref() and
+ * arm_smmu_invs_purge() will be used to allocate a new copy of an old array
+ * for addition and deletion in the old domain's and new domain's invs arrays.
+ *
+ * arm_smmu_invs_unref() mutates a given array, by internally reducing the
+ * users counts of the given entries. This exists to support no-fail routines
+ * such as attaching to an IOMMU_DOMAIN_BLOCKED, and can pair with a followup
+ * arm_smmu_invs_purge() call to generate a new clean array.
+ *
+ * A concurrent invalidation thread pushes every invalidation described in the
+ * array into the command queue for each invalidation event. It is designed
+ * like this to optimize the invalidation fast path by avoiding locks.
+ *
+ * A domain can be shared across SMMU instances. When an instance gets
+ * removed, it will delete all the entries that belong to that SMMU instance.
+ * Then, a synchronize_rcu() has to be called to sync the array, to prevent
+ * any concurrent invalidation thread accessing the old array from issuing
+ * commands to the command queue of the removed SMMU instance.
+ */
+struct arm_smmu_invs {
+	size_t max_invs;
+	size_t num_invs;
+	size_t num_trashes;
+	rwlock_t rwlock;
+	bool has_ats;
+	struct rcu_head rcu;
+	struct arm_smmu_inv inv[] __counted_by(max_invs);
+};
+
+static inline struct arm_smmu_invs *arm_smmu_invs_alloc(size_t num_invs)
+{
+	struct arm_smmu_invs *new_invs;
+
+	new_invs = kzalloc(struct_size(new_invs, inv, num_invs), GFP_KERNEL);
+	if (!new_invs)
+		return NULL;
+	new_invs->max_invs = new_invs->num_invs = num_invs;
+	rwlock_init(&new_invs->rwlock);
+	return new_invs;
+}
+
 struct arm_smmu_evtq {
 	struct arm_smmu_queue q;
 	struct iopf_queue *iopf;
@@ -875,6 +961,8 @@ struct arm_smmu_domain {
 
 	struct iommu_domain domain;
 
+	struct arm_smmu_invs __rcu *invs;
+
 	/* List of struct arm_smmu_master_domain */
 	struct list_head devices;
 	spinlock_t devices_lock;
@@ -923,6 +1011,13 @@ void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target,
 void arm_smmu_make_sva_cd(struct arm_smmu_cd *target,
 			  struct arm_smmu_master *master,
 			  struct mm_struct *mm, u16 asid);
+
+struct arm_smmu_invs *arm_smmu_invs_merge(struct arm_smmu_invs *invs,
+					  struct arm_smmu_invs *to_merge);
+void arm_smmu_invs_unref(struct arm_smmu_invs *invs,
+			 struct arm_smmu_invs *to_unref,
+			 void (*free_fn)(struct arm_smmu_inv *inv));
+struct arm_smmu_invs *arm_smmu_invs_purge(struct arm_smmu_invs *invs);
 #endif
 
 struct arm_smmu_master_domain {
@@ -956,6 +1051,8 @@ struct arm_smmu_domain *arm_smmu_domain_alloc(void);
 
 static inline void arm_smmu_domain_free(struct arm_smmu_domain *smmu_domain)
 {
+	/* No concurrency with invalidation is possible at this point */
+	kfree(rcu_dereference_protected(smmu_domain->invs, true));
 	kfree(smmu_domain);
 }
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
index d2671bfd3798..58ec7f2f4335 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-test.c
@@ -567,6 +567,97 @@ static void arm_smmu_v3_write_cd_test_sva_release(struct kunit *test)
 			    NUM_EXPECTED_SYNCS(2));
 }
 
+static void arm_smmu_v3_invs_test_verify(struct kunit *test,
+					 struct arm_smmu_invs *invs, int num,
+					 const int *ids, const int *users)
+{
+	KUNIT_EXPECT_EQ(test, invs->num_invs, num);
+	while (num--) {
+		KUNIT_EXPECT_EQ(test, invs->inv[num].id, ids[num]);
+		KUNIT_EXPECT_EQ(test, refcount_read(&invs->inv[num].users),
+				users[num]);
+	}
+}
+
+static struct arm_smmu_invs invs1 = {
+	.num_invs = 3,
+	.inv = { { .type = INV_TYPE_S2_VMID, .id = 1, },
+		 { .type = INV_TYPE_S2_VMID, .id = 2, },
+		 { .type = INV_TYPE_S2_VMID, .id = 3, }, },
+};
+
+static struct arm_smmu_invs invs2 = {
+	.num_invs = 3,
+	.inv = { { .type = INV_TYPE_S2_VMID, .id = 1, },	/* duplicated */
+		 { .type = INV_TYPE_ATS, .id = 4, },
+		 { .type = INV_TYPE_ATS, .id = 5, }, },
+};
+
+static struct arm_smmu_invs invs3 = {
+	.num_invs = 3,
+	.inv = { { .type = INV_TYPE_S2_VMID, .id = 1, },	/* duplicated */
+		 { .type = INV_TYPE_ATS, .id = 5, },	/* recover a trash */
+		 { .type = INV_TYPE_ATS, .id = 6, }, },
+};
+
+static void arm_smmu_v3_invs_test(struct kunit *test)
+{
+	const int results1[2][3] = { { 1, 2, 3, }, { 1, 1, 1, }, };
+	const int results2[2][5] = { { 1, 2, 3, 4, 5, }, { 2, 1, 1, 1, 1, }, };
+	const int results3[2][3] = { { 1, 2, 3, }, { 1, 1, 1, }, };
+	const int results4[2][5] = { { 1, 2, 3, 5, 6, }, { 2, 1, 1, 1, 1, }, };
+	const int results5[2][5] = { { 1, 2, 3, 5, 6, }, { 1, 0, 0, 1, 1, }, };
+	const int results6[2][3] = { { 1, 5, 6, }, { 1, 1, 1, }, };
+	struct arm_smmu_invs *test_a, *test_b;
+
+	/* New array */
+	test_a = arm_smmu_invs_alloc(0);
+	KUNIT_EXPECT_EQ(test, test_a->num_invs, 0);
+
+	/* Test1: merge invs1 (new array) */
+	test_b = arm_smmu_invs_merge(test_a, &invs1);
+	kfree(test_a);
+	arm_smmu_v3_invs_test_verify(test, test_b, ARRAY_SIZE(results1[0]),
+				     results1[0], results1[1]);
+
+	/* Test2: merge invs2 (new array) */
+	test_a = arm_smmu_invs_merge(test_b, &invs2);
+	kfree(test_b);
+	arm_smmu_v3_invs_test_verify(test, test_a, ARRAY_SIZE(results2[0]),
+				     results2[0], results2[1]);
+
+	/* Test3: unref invs2 (same array) */
+	arm_smmu_invs_unref(test_a, &invs2, NULL);
+	arm_smmu_v3_invs_test_verify(test, test_a, ARRAY_SIZE(results3[0]),
+				     results3[0], results3[1]);
+	KUNIT_EXPECT_EQ(test, test_a->num_trashes, 0);
+
+	/* Test4: merge invs3 (new array) */
+	test_b = arm_smmu_invs_merge(test_a, &invs3);
+	kfree(test_a);
+	arm_smmu_v3_invs_test_verify(test, test_b, ARRAY_SIZE(results4[0]),
+				     results4[0], results4[1]);
+
+	/* Test5: unref invs1 (same array) */
+	arm_smmu_invs_unref(test_b, &invs1, NULL);
+	arm_smmu_v3_invs_test_verify(test, test_b, ARRAY_SIZE(results5[0]),
+				     results5[0], results5[1]);
+	KUNIT_EXPECT_EQ(test, test_b->num_trashes, 2);
+
+	/* Test6: purge test_b (new array) */
+	test_a = arm_smmu_invs_purge(test_b);
+	kfree(test_b);
+	arm_smmu_v3_invs_test_verify(test, test_a, ARRAY_SIZE(results6[0]),
+				     results6[0], results6[1]);
+
+	/* Test7: unref invs3 (same array) */
+	arm_smmu_invs_unref(test_a, &invs3, NULL);
+	KUNIT_EXPECT_EQ(test, test_a->num_invs, 0);
+	KUNIT_EXPECT_EQ(test, test_a->num_trashes, 0);
+
+	kfree(test_a);
+}
+
 static struct kunit_case arm_smmu_v3_test_cases[] = {
 	KUNIT_CASE(arm_smmu_v3_write_ste_test_bypass_to_abort),
 	KUNIT_CASE(arm_smmu_v3_write_ste_test_abort_to_bypass),
@@ -590,6 +681,7 @@ static struct kunit_case arm_smmu_v3_test_cases[] = {
 	KUNIT_CASE(arm_smmu_v3_write_ste_test_s2_to_s1_stall),
 	KUNIT_CASE(arm_smmu_v3_write_cd_test_sva_clear),
 	KUNIT_CASE(arm_smmu_v3_write_cd_test_sva_release),
+	KUNIT_CASE(arm_smmu_v3_invs_test),
 	{},
 };
 
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index e9759e8af0c0..f6bca44c78bf 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -26,6 +26,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1015,6 +1016,253 @@ static void arm_smmu_page_response(struct device *dev, struct iopf_fault *unused
 	 */
 }
 
+/* Invalidation array manipulation functions */
+static inline struct arm_smmu_inv *
+arm_smmu_invs_iter_next(struct arm_smmu_invs *invs, size_t next, size_t *idx)
+{
+	while (true) {
+		if (next >= invs->num_invs) {
+			*idx = next;
+			return NULL;
+		}
+		if (!refcount_read(&invs->inv[next].users)) {
+			next++;
+			continue;
+		}
+		*idx = next;
+		return &invs->inv[next];
+	}
+}
+
+/**
+ * arm_smmu_invs_for_each_entry - Iterate over all non-trash entries in invs
+ * @invs: the base invalidation array
+ * @idx: a stack variable of 'size_t', to store the array index
+ * @cur: a stack variable of 'struct arm_smmu_inv *'
+ */
+#define arm_smmu_invs_for_each_entry(invs, idx, cur)                         \
+	for (cur = arm_smmu_invs_iter_next(invs, 0, &(idx)); cur;            \
+	     cur = arm_smmu_invs_iter_next(invs, idx + 1, &(idx)))
+
+static int arm_smmu_inv_cmp(const struct arm_smmu_inv *inv_l,
+			    const struct arm_smmu_inv *inv_r)
+{
+	if (inv_l->smmu != inv_r->smmu)
+		return cmp_int((uintptr_t)inv_l->smmu, (uintptr_t)inv_r->smmu);
+	if (inv_l->type != inv_r->type)
+		return cmp_int(inv_l->type, inv_r->type);
+	return cmp_int(inv_l->id, inv_r->id);
+}
+
+static inline int arm_smmu_invs_iter_next_cmp(struct arm_smmu_invs *invs_l,
+					      size_t next_l, size_t *idx_l,
+					      struct arm_smmu_invs *invs_r,
+					      size_t next_r, size_t *idx_r)
+{
+	struct arm_smmu_inv *cur_l =
+		arm_smmu_invs_iter_next(invs_l, next_l, idx_l);
+
+	/*
+	 * We have to update idx_r manually, because invs_r cannot use
+	 * arm_smmu_invs_iter_next(): invs_r never sets any users counters.
+	 */
+	*idx_r = next_r;
+
+	/*
+	 * Compare the items of two sorted arrays. If one side is past the end
+	 * of the array, return the other side to let it run out the iteration.
+	 *
+	 * If the left entry is empty, return 1 to pick the right entry.
+	 * If the right entry is empty, return -1 to pick the left entry.
+	 */
+	if (!cur_l)
+		return 1;
+	if (next_r >= invs_r->num_invs)
+		return -1;
+	return arm_smmu_inv_cmp(cur_l, &invs_r->inv[next_r]);
+}
+
+/**
+ * arm_smmu_invs_for_each_cmp - Iterate over two sorted arrays, comparing
+ *                              entries for arm_smmu_invs_merge() or
+ *                              arm_smmu_invs_unref()
+ * @invs_l: the base invalidation array
+ * @idx_l: a stack variable of 'size_t', to store the base array index
+ * @invs_r: the build_invs array as to_merge or to_unref
+ * @idx_r: a stack variable of 'size_t', to store the build_invs index
+ * @cmp: a stack variable of 'int', to store the return value (-1, 0, or 1)
+ */
+#define arm_smmu_invs_for_each_cmp(invs_l, idx_l, invs_r, idx_r, cmp)        \
+	for (idx_l = idx_r = 0,                                               \
+	     cmp = arm_smmu_invs_iter_next_cmp(invs_l, 0, &(idx_l),           \
+					       invs_r, 0, &(idx_r));          \
+	     idx_l < invs_l->num_invs || idx_r < invs_r->num_invs;            \
+	     cmp = arm_smmu_invs_iter_next_cmp(                               \
+		     invs_l, idx_l + (cmp <= 0 ? 1 : 0), &(idx_l),            \
+		     invs_r, idx_r + (cmp >= 0 ? 1 : 0), &(idx_r)))
+
+/**
+ * arm_smmu_invs_merge() - Merge @to_merge into @invs and generate a new array
+ * @invs: the base invalidation array
+ * @to_merge: an array of invalidations to merge
+ *
+ * Return: a newly allocated array on success, or ERR_PTR
+ *
+ * This function must be locked and serialized with arm_smmu_invs_unref() and
+ * arm_smmu_invs_purge(), but takes no lockdep assertion on any lock, for the
+ * sake of the KUNIT test.
+ *
+ * Both @invs and @to_merge must be sorted, to ensure the returned array will
+ * be sorted as well.
+ *
+ * The caller is responsible for freeing @invs and the returned new array.
+ *
+ * Entries marked as trash will be purged in the returned array.
+ */
+VISIBLE_IF_KUNIT
+struct arm_smmu_invs *arm_smmu_invs_merge(struct arm_smmu_invs *invs,
+					  struct arm_smmu_invs *to_merge)
+{
+	struct arm_smmu_invs *new_invs;
+	struct arm_smmu_inv *new;
+	size_t num_invs = 0;
+	size_t i, j;
+	int cmp;
+
+	arm_smmu_invs_for_each_cmp(invs, i, to_merge, j, cmp)
+		num_invs++;
+
+	new_invs = arm_smmu_invs_alloc(num_invs);
+	if (!new_invs)
+		return ERR_PTR(-ENOMEM);
+
+	new = new_invs->inv;
+	arm_smmu_invs_for_each_cmp(invs, i, to_merge, j, cmp) {
+		if (cmp < 0) {
+			*new = invs->inv[i];
+		} else if (cmp == 0) {
+			*new = invs->inv[i];
+			refcount_inc(&new->users);
+		} else {
+			*new = to_merge->inv[j];
+			refcount_set(&new->users, 1);
+		}
+
+		/*
+		 * Check that the new array is sorted. This also validates that
+		 * to_merge is sorted.
+		 */
+		if (new != new_invs->inv)
+			WARN_ON_ONCE(arm_smmu_inv_cmp(new - 1, new) == 1);
+		new++;
+	}
+
+	WARN_ON(new != new_invs->inv + new_invs->num_invs);
+
+	return new_invs;
+}
+EXPORT_SYMBOL_IF_KUNIT(arm_smmu_invs_merge);
+
+/**
+ * arm_smmu_invs_unref() - Find in @invs all the entries in @to_unref, and
+ *                         decrease their user counts without deletions
+ * @invs: the base invalidation array
+ * @to_unref: an array of invalidations to decrease their user counts
+ * @free_fn: A callback function to invoke when an entry's user count reduces
+ *           to 0
+ *
+ * This function will not fail. Any entry that reaches users=0 is marked as
+ * trash and counted in @invs->num_trashes for arm_smmu_invs_purge(). All
+ * trailing trash entries in the array are dropped and the size of the array
+ * is trimmed accordingly. Any trash entry in between remains in @invs until
+ * it is completely deleted by the next arm_smmu_invs_merge() or
+ * arm_smmu_invs_purge() call.
+ *
+ * This function must be locked and serialized with arm_smmu_invs_merge() and
+ * arm_smmu_invs_purge(), but takes no lockdep assertion on any mutex, for the
+ * sake of the KUNIT test.
+ *
+ * Note that the final @invs->num_invs might not reflect the actual number of
+ * invalidations due to trash entries. Any reader should take the read lock to
+ * iterate each entry and check its users counter till the last entry.
+ */
+VISIBLE_IF_KUNIT
+void arm_smmu_invs_unref(struct arm_smmu_invs *invs,
+			 struct arm_smmu_invs *to_unref,
+			 void (*free_fn)(struct arm_smmu_inv *inv))
+{
+	unsigned long flags;
+	size_t num_invs = 0;
+	size_t i, j;
+	int cmp;
+
+	arm_smmu_invs_for_each_cmp(invs, i, to_unref, j, cmp) {
+		if (cmp < 0) {
+			/* not found in to_unref, leave alone */
+			num_invs = i + 1;
+		} else if (cmp == 0) {
+			/* same item */
+			if (!refcount_dec_and_test(&invs->inv[i].users)) {
+				num_invs = i + 1;
+				continue;
+			}
+
+			/* KUNIT test doesn't pass in a free_fn */
+			if (free_fn)
+				free_fn(&invs->inv[i]);
+			invs->num_trashes++;
+		} else {
+			/* item in to_unref is not in invs or already a trash */
+			WARN_ON(true);
+		}
+	}
+
+	/* Exclude any trailing trash */
+	invs->num_trashes -= invs->num_invs - num_invs;
+
+	/* The lock is required to fence concurrent ATS operations. */
+	write_lock_irqsave(&invs->rwlock, flags);
+	WRITE_ONCE(invs->num_invs, num_invs); /* Drop trailing trash entries */
+	write_unlock_irqrestore(&invs->rwlock, flags);
+}
+EXPORT_SYMBOL_IF_KUNIT(arm_smmu_invs_unref);
+
+/**
+ * arm_smmu_invs_purge() - Purge all the trash entries in @invs
+ * @invs: the base invalidation array
+ *
+ * Return: a newly allocated array with all the trash entries removed, NULL if
+ *         there is no trash entry in the array (or on a bug), or an ERR_PTR
+ *         on allocation failure
+ *
+ * This function must be locked and serialized with arm_smmu_invs_merge() and
+ * arm_smmu_invs_unref(), but takes no lockdep assertion on any lock, for the
+ * sake of the KUNIT test.
+ *
+ * The caller is responsible for freeing @invs and the returned new array.
+ */
+VISIBLE_IF_KUNIT
+struct arm_smmu_invs *arm_smmu_invs_purge(struct arm_smmu_invs *invs)
+{
+	struct arm_smmu_invs *new_invs;
+	struct arm_smmu_inv *inv;
+	size_t i, num_invs = 0;
+
+	if (WARN_ON(invs->num_invs < invs->num_trashes))
+		return NULL;
+	if (!invs->num_invs || !invs->num_trashes)
+		return NULL;
+
+	new_invs = arm_smmu_invs_alloc(invs->num_invs - invs->num_trashes);
+	if (!new_invs)
+		return ERR_PTR(-ENOMEM);
+
+	arm_smmu_invs_for_each_entry(invs, i, inv) {
+		new_invs->inv[num_invs] = *inv;
+		num_invs++;
+	}
+
+	WARN_ON(num_invs != new_invs->num_invs);
+	return new_invs;
+}
+EXPORT_SYMBOL_IF_KUNIT(arm_smmu_invs_purge);
+
 /* Context descriptor manipulation functions */
 void arm_smmu_tlb_inv_asid(struct arm_smmu_device *smmu, u16 asid)
 {
@@ -2462,13 +2710,21 @@ static bool arm_smmu_enforce_cache_coherency(struct iommu_domain *domain)
 struct arm_smmu_domain *arm_smmu_domain_alloc(void)
 {
 	struct arm_smmu_domain *smmu_domain;
+	struct arm_smmu_invs *new_invs;
 
 	smmu_domain = kzalloc(sizeof(*smmu_domain), GFP_KERNEL);
 	if (!smmu_domain)
 		return ERR_PTR(-ENOMEM);
 
+	new_invs = arm_smmu_invs_alloc(0);
+	if (!new_invs) {
+		kfree(smmu_domain);
+		return ERR_PTR(-ENOMEM);
+	}
+
 	INIT_LIST_HEAD(&smmu_domain->devices);
 	spin_lock_init(&smmu_domain->devices_lock);
+	rcu_assign_pointer(smmu_domain->invs, new_invs);
 
 	return smmu_domain;
 }
-- 
2.43.0