[PATCH rfcv1 5/8] iommu/arm-smmu-v3: Pre-allocate a per-master invalidation array

Nicolin Chen posted 8 patches 1 month, 3 weeks ago
There is a newer version of this series
Posted by Nicolin Chen 1 month, 3 weeks ago
When a master is moved from an old domain to a new domain, it needs to
build an invalidation array whose entries are deleted from the old
domain's invalidation array and added to the new domain's. This array is
passed via the del_invs and add_invs arguments to arm_smmu_invs_del/add()
respectively.

Since master->num_streams might differ across masters, memory would have
to be allocated when building an add_invs/del_invs array, and that
allocation might fail with -ENOMEM.

On the other hand, an attachment to arm_smmu_blocked_domain must not fail,
so it is best to avoid any memory allocation in that path.

Pre-allocate a fixed-size invalidation array for every master. This array
will be filled dynamically when building an add_invs or del_invs array to
attach or detach an smmu_domain.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h |  1 +
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 10 ++++++++++
 2 files changed, 11 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index d7421b56e3598..0330444bef45f 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -919,6 +919,7 @@ struct arm_smmu_master {
 	struct arm_smmu_device		*smmu;
 	struct device			*dev;
 	struct arm_smmu_stream		*streams;
+	struct arm_smmu_invs		*invs;
 	struct arm_smmu_vmaster		*vmaster; /* use smmu->streams_mutex */
 	/* Locked by the iommu core using the group mutex */
 	struct arm_smmu_ctx_desc_cfg	cd_table;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 73f3b411ff7ef..fb5429d8ebb29 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3723,6 +3723,7 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
 	int i;
 	int ret = 0;
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(master->dev);
+	size_t num_ats = dev_is_pci(master->dev) ? fwspec->num_ids : 0;
 
 	master->streams = kcalloc(fwspec->num_ids, sizeof(*master->streams),
 				  GFP_KERNEL);
@@ -3730,6 +3731,13 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
 		return -ENOMEM;
 	master->num_streams = fwspec->num_ids;
 
+	/* Max possible num_invs: two for ASID/VMIDs and num_ats for ATC_INVs */
+	master->invs = arm_smmu_invs_alloc(2 + num_ats);
+	if (IS_ERR(master->invs)) {
+		kfree(master->streams);
+		return PTR_ERR(master->invs);
+	}
+
 	mutex_lock(&smmu->streams_mutex);
 	for (i = 0; i < fwspec->num_ids; i++) {
 		struct arm_smmu_stream *new_stream = &master->streams[i];
@@ -3767,6 +3775,7 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
 		for (i--; i >= 0; i--)
 			rb_erase(&master->streams[i].node, &smmu->streams);
 		kfree(master->streams);
+		kfree(master->invs);
 	}
 	mutex_unlock(&smmu->streams_mutex);
 
@@ -3788,6 +3797,7 @@ static void arm_smmu_remove_master(struct arm_smmu_master *master)
 	mutex_unlock(&smmu->streams_mutex);
 
 	kfree(master->streams);
+	kfree(master->invs);
 }
 
 static struct iommu_device *arm_smmu_probe_device(struct device *dev)
-- 
2.43.0
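
The idea in the commit message can be modeled in plain userspace C. This is only a sketch under stated assumptions: struct inv_entry, struct invs_array, struct master, master_probe(), master_build_invs() and master_remove() are hypothetical stand-ins, not the driver's types or functions, and calloc()/free() take the place of the kernel allocator. It shows the one property the patch relies on: the only allocation happens at probe time, so the attach-time builder has nothing that can fail with -ENOMEM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; not the driver's structures. */
struct inv_entry {
	unsigned int id;		/* an ASID, a VMID, or a stream ID */
};

struct invs_array {
	size_t max_invs;		/* capacity, fixed at allocation time */
	size_t num_invs;		/* entries currently in use */
	struct inv_entry inv[];		/* flexible array member */
};

struct master {
	struct invs_array *scratch;	/* allocated once, reused per attach */
};

/* Probe-time step: the only place in this model that can run out of memory. */
int master_probe(struct master *m, size_t max_invs)
{
	m->scratch = calloc(1, sizeof(*m->scratch) +
			       max_invs * sizeof(m->scratch->inv[0]));
	if (!m->scratch)
		return -1;
	m->scratch->max_invs = max_invs;
	return 0;
}

/* Attach-time step: refills the scratch array in place, never allocates. */
struct invs_array *master_build_invs(struct master *m, unsigned int asid,
				     const unsigned int *sids, size_t num_sids)
{
	struct invs_array *invs = m->scratch;
	size_t i;

	/* The capacity chosen at probe time must cover every build. */
	assert(1 + num_sids <= invs->max_invs);

	invs->num_invs = 0;
	invs->inv[invs->num_invs++].id = asid;
	for (i = 0; i < num_sids; i++)
		invs->inv[invs->num_invs++].id = sids[i];
	return invs;
}

/* Remove-time step: release the scratch array together with the master. */
void master_remove(struct master *m)
{
	free(m->scratch);
	m->scratch = NULL;
}
```

In this model a caller would run master_probe() once, call master_build_invs() on every attach or detach, and call master_remove() on teardown. A failed allocation can then only surface at probe time, where returning an error is acceptable.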
Re: [PATCH rfcv1 5/8] iommu/arm-smmu-v3: Pre-allocate a per-master invalidation array
Posted by Jason Gunthorpe 1 month, 1 week ago
On Wed, Aug 13, 2025 at 06:25:36PM -0700, Nicolin Chen wrote:
> @@ -3730,6 +3731,13 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
>  		return -ENOMEM;
>  	master->num_streams = fwspec->num_ids;
>  
> +	/* Max possible num_invs: two for ASID/VMIDs and num_ats for ATC_INVs */
> +	master->invs = arm_smmu_invs_alloc(2 + num_ats);
> +	if (IS_ERR(master->invs)) {
> +		kfree(master->streams);
> +		return PTR_ERR(master->invs);
> +	}

This seems like a nice solution, but I would add a comment here that
it is locked by the group mutex, and check if ATS is supported:

	/*
	 * Scratch memory to build the per-domain invalidation list. locked by
	 * the group_mutex. Max possible num_invs: two for ASID/VMIDs and
	 * num_streams for ATC_INVs
	 */
	if (dev_is_pci(master->dev) &&
	    pci_ats_supported(to_pci_dev(master->dev)))
		master->invs = arm_smmu_invs_alloc(2 + master->num_streams);
	else
		master->invs = arm_smmu_invs_alloc(2);

And probably rename it scratch_invs or something to indicate it is
temporary memory.

I'm not sure there is any case where fwspec->num_ids > 1 &&
ats_supported, or at least it should be really rare.

Jason
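
The sizing rule suggested in this reply can be written as a small pure function. This is a sketch only: scratch_invs_capacity() and its boolean parameters are hypothetical, standing in for the dev_is_pci() and pci_ats_supported() checks in the suggestion above so that the arithmetic can be read and checked outside the kernel.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: worst-case number of scratch entries for one master.
 * Two entries cover the ASID/VMID invalidations; one ATC_INV entry per
 * stream is needed only when the device is PCI and supports ATS.
 */
size_t scratch_invs_capacity(bool is_pci, bool ats_supported,
			     size_t num_streams)
{
	size_t max_invs = 2;

	if (is_pci && ats_supported)
		max_invs += num_streams;
	return max_invs;
}
```

With this rule, a PCI master with ATS and a single stream gets three entries, while any master that is not PCI or lacks ATS gets two regardless of its stream count.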
Re: [PATCH rfcv1 5/8] iommu/arm-smmu-v3: Pre-allocate a per-master invalidation array
Posted by Nicolin Chen 4 weeks ago
On Tue, Aug 26, 2025 at 04:56:41PM -0300, Jason Gunthorpe wrote:
> On Wed, Aug 13, 2025 at 06:25:36PM -0700, Nicolin Chen wrote:
> > @@ -3730,6 +3731,13 @@ static int arm_smmu_insert_master(struct arm_smmu_device *smmu,
> >  		return -ENOMEM;
> >  	master->num_streams = fwspec->num_ids;
> >  
> > +	/* Max possible num_invs: two for ASID/VMIDs and num_ats for ATC_INVs */
> > +	master->invs = arm_smmu_invs_alloc(2 + num_ats);
> > +	if (IS_ERR(master->invs)) {
> > +		kfree(master->streams);
> > +		return PTR_ERR(master->invs);
> > +	}
> 
> This seems like a nice solution, but I would add a comment here that
> it is locked by the group mutex, and check if ATS is supported:
> 
> 	/*
> 	 * Scratch memory to build the per-domain invalidation list. locked by
> 	 * the group_mutex. Max possible num_invs: two for ASID/VMIDs and
> 	 * num_streams for ATC_INVs
> 	 */
> 	if (dev_is_pci(master->dev) &&
> 	    pci_ats_supported(to_pci_dev(master->dev)))
> 		master->invs = arm_smmu_invs_alloc(2 + master->num_streams);
> 	else
> 		master->invs = arm_smmu_invs_alloc(2);
> 
> And probably rename it scratch_invs or something to indicate it is
> temporary memory.

I renamed it to master->build_invs:

@@ -919,7 +931,14 @@ struct arm_smmu_master {
 	struct arm_smmu_device		*smmu;
 	struct device			*dev;
 	struct arm_smmu_stream		*streams;
-	struct arm_smmu_invs		*invs;
+	/*
+	 * Scratch memory for a to_merge or to_unref array to build a per-domain
+	 * invalidation array. It'll be pre-allocated with enough entries for all
+	 * possible build scenarios. It can be used by only one caller at a time
+	 * until the arm_smmu_invs_merge/unref() finishes. Must be locked by the
+	 * iommu_group mutex.
+	 */
+	struct arm_smmu_invs		*build_invs;
 	struct arm_smmu_vmaster		*vmaster; /* use smmu->streams_mutex */
 	/* Locked by the iommu core using the group mutex */
 	struct arm_smmu_ctx_desc_cfg	cd_table;

One thing that I noticed is that group mutex alone isn't enough,
because there can be two arm_smmu_build_invs() calls during the
same attach_dev callback. And the second one would overwrite.

Thanks
Nicolin
Re: [PATCH rfcv1 5/8] iommu/arm-smmu-v3: Pre-allocate a per-master invalidation array
Posted by Jason Gunthorpe 3 weeks, 5 days ago
On Sat, Sep 06, 2025 at 12:45:58AM -0700, Nicolin Chen wrote:
 
> One thing that I noticed is that group mutex alone isn't enough,
> because there can be two arm_smmu_build_invs() calls during the
> same attach_dev callback. And the second one would overwrite.

Well, the group mutex is what makes it non-concurrent. Two things using
it sequentially within the same thread are a different issue.

Jason
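
The distinction drawn in this exchange, between concurrent use and sequential reuse in one thread, can be illustrated with a userspace model. It is a sketch under assumptions: struct scratch_invs, build_invs() and consume_invs() are hypothetical names, with consume_invs() standing in for whatever merges or unrefs the built array. No locking appears because the point is that a lock cannot prevent this case: when two builds share one scratch buffer in the same call chain, the second overwrites the first unless the first result has already been consumed.

```c
#include <assert.h>
#include <stddef.h>

#define SCRATCH_MAX 4

/* Hypothetical single scratch buffer owned by one master. */
struct scratch_invs {
	size_t num;
	unsigned int id[SCRATCH_MAX];
};

/* Hypothetical builder: always rewrites the same buffer from the start. */
struct scratch_invs *build_invs(struct scratch_invs *scratch,
				unsigned int domain_id)
{
	scratch->num = 0;
	scratch->id[scratch->num++] = domain_id;
	return scratch;
}

/* Hypothetical consumer: copies the built ids out, returns how many. */
size_t consume_invs(const struct scratch_invs *invs, unsigned int *out,
		    size_t out_len)
{
	size_t i;

	assert(invs->num <= out_len);
	for (i = 0; i < invs->num; i++)
		out[i] = invs->id[i];
	return invs->num;
}
```

Calling build_invs() twice before consuming either result returns the same pointer both times, and only the second domain's entry survives. Building, consuming, and then building again keeps both results intact, which is the ordering a single scratch buffer requires of its caller.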