[PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper

Posted by Nicolin Chen 1 month, 3 weeks ago
There is a need to temporarily attach a PCI device that is under reset to
the blocked domain (i.e. detach it from its previously attached domain),
and then to reattach it back to its previous domain (i.e. detach it from
the blocked domain) after the reset.

During the reset stage, there can be racing attach/detach operations. To
solve this, a per-gdev reset flag will be introduced so that all of the
attach functions bypass the driver-level attach_dev callbacks and only
update the group->domain pointer. The reset recovery procedure will then
attach directly to the cached pointer, so things go back to normal.
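
For illustration, the intended bypass might look roughly like this (a
hypothetical sketch only; the flag name and exact placement come from a
later patch in this series):

	/*
	 * Sketch of the bypass in the core attach path. The flag name
	 * "pending_reset" is hypothetical; a later patch adds the real one.
	 */
	for_each_group_device(group, gdev) {
		/* During reset, skip ops->attach_dev() for this device */
		if (gdev->pending_reset)
			continue;
		ret = __iommu_attach_device(new_domain, gdev->dev);
		if (ret)
			return ret;
	}
	/* Still update the cached pointer so recovery can restore it */
	group->domain = new_domain;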

On the other hand, the iommu_get_domain_for_dev() API always returns the
group->domain pointer, and several IOMMU drivers call this API in their
attach_dev callback functions to get the currently attached domain for a
device, which will be broken for the recovery case mentioned above:
 1. core asks the driver to attach dev from blocked to group->domain
 2. driver attaches dev from group->domain to group->domain

So, iommu_get_domain_for_dev() should check the gdev flag and return the
blocked domain if the flag is set. But a caller of this API may or may not
already hold the group->mutex, making it difficult to take the lock inside
the function.

Introduce a new iommu_get_domain_for_dev_locked() helper for drivers to
use in contexts that are already under the protection of the group->mutex,
e.g. the attach_dev callback functions, and roll the new helper out to all
the existing IOMMU drivers.

Add a lockdep_assert_not_held() to the existing iommu_get_domain_for_dev()
to document that it must only be used outside the group->mutex.

Signed-off-by: Nicolin Chen <nicolinc@nvidia.com>
---
 include/linux/iommu.h                              |  1 +
 .../iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c    |  2 +-
 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c        |  9 +++++----
 drivers/iommu/arm/arm-smmu/qcom_iommu.c            |  2 +-
 drivers/iommu/dma-iommu.c                          |  2 +-
 drivers/iommu/fsl_pamu_domain.c                    |  2 +-
 drivers/iommu/iommu.c                              | 14 ++++++++++++++
 drivers/iommu/ipmmu-vmsa.c                         |  2 +-
 drivers/iommu/msm_iommu.c                          |  2 +-
 drivers/iommu/mtk_iommu.c                          |  2 +-
 drivers/iommu/omap-iommu.c                         |  2 +-
 drivers/iommu/tegra-smmu.c                         |  2 +-
 12 files changed, 29 insertions(+), 13 deletions(-)

diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index c30d12e16473d..61b17883cb0cb 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -909,6 +909,7 @@ extern int iommu_attach_device(struct iommu_domain *domain,
 extern void iommu_detach_device(struct iommu_domain *domain,
 				struct device *dev);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
+struct iommu_domain *iommu_get_domain_for_dev_locked(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
 		     phys_addr_t paddr, size_t size, int prot, gfp_t gfp);
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
index 8cd8929bbfdf8..6d44c97d430b4 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
@@ -145,7 +145,7 @@ static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 	struct arm_smmu_attach_state state = {
 		.master = master,
-		.old_domain = iommu_get_domain_for_dev(dev),
+		.old_domain = iommu_get_domain_for_dev_locked(dev),
 		.ssid = IOMMU_NO_PASID,
 	};
 	struct arm_smmu_ste ste;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 5968043ac8023..3efe51ae37edb 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -3010,7 +3010,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 	struct arm_smmu_device *smmu;
 	struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
 	struct arm_smmu_attach_state state = {
-		.old_domain = iommu_get_domain_for_dev(dev),
+		.old_domain = iommu_get_domain_for_dev_locked(dev),
 		.ssid = IOMMU_NO_PASID,
 	};
 	struct arm_smmu_master *master;
@@ -3124,7 +3124,8 @@ int arm_smmu_set_pasid(struct arm_smmu_master *master,
 		       struct arm_smmu_domain *smmu_domain, ioasid_t pasid,
 		       struct arm_smmu_cd *cd, struct iommu_domain *old)
 {
-	struct iommu_domain *sid_domain = iommu_get_domain_for_dev(master->dev);
+	struct iommu_domain *sid_domain =
+		iommu_get_domain_for_dev_locked(master->dev);
 	struct arm_smmu_attach_state state = {
 		.master = master,
 		.ssid = pasid,
@@ -3190,7 +3191,7 @@ static int arm_smmu_blocking_set_dev_pasid(struct iommu_domain *new_domain,
 	 */
 	if (!arm_smmu_ssids_in_use(&master->cd_table)) {
 		struct iommu_domain *sid_domain =
-			iommu_get_domain_for_dev(master->dev);
+			iommu_get_domain_for_dev_locked(master->dev);
 
 		if (sid_domain->type == IOMMU_DOMAIN_IDENTITY ||
 		    sid_domain->type == IOMMU_DOMAIN_BLOCKED)
@@ -3207,7 +3208,7 @@ static void arm_smmu_attach_dev_ste(struct iommu_domain *domain,
 	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
 	struct arm_smmu_attach_state state = {
 		.master = master,
-		.old_domain = iommu_get_domain_for_dev(dev),
+		.old_domain = iommu_get_domain_for_dev_locked(dev),
 		.ssid = IOMMU_NO_PASID,
 	};
 
diff --git a/drivers/iommu/arm/arm-smmu/qcom_iommu.c b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
index c5be95e560317..c82fbcd28cdde 100644
--- a/drivers/iommu/arm/arm-smmu/qcom_iommu.c
+++ b/drivers/iommu/arm/arm-smmu/qcom_iommu.c
@@ -390,7 +390,7 @@ static int qcom_iommu_attach_dev(struct iommu_domain *domain, struct device *dev
 static int qcom_iommu_identity_attach(struct iommu_domain *identity_domain,
 				      struct device *dev)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_domain_for_dev_locked(dev);
 	struct qcom_iommu_domain *qcom_domain;
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
 	struct qcom_iommu_dev *qcom_iommu = dev_iommu_priv_get(dev);
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ea2ef53bd4fef..99680cdb57265 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -2097,7 +2097,7 @@ EXPORT_SYMBOL_GPL(dma_iova_destroy);
 
 void iommu_setup_dma_ops(struct device *dev)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_domain_for_dev_locked(dev);
 
 	if (dev_is_pci(dev))
 		dev->iommu->pci_32bit_workaround = !iommu_dma_forcedac;
diff --git a/drivers/iommu/fsl_pamu_domain.c b/drivers/iommu/fsl_pamu_domain.c
index 5f08523f97cb9..2b7395c5884ea 100644
--- a/drivers/iommu/fsl_pamu_domain.c
+++ b/drivers/iommu/fsl_pamu_domain.c
@@ -300,7 +300,7 @@ static int fsl_pamu_attach_device(struct iommu_domain *domain,
 static int fsl_pamu_platform_attach(struct iommu_domain *platform_domain,
 				    struct device *dev)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_domain_for_dev_locked(dev);
 	struct fsl_dma_domain *dma_domain;
 	const u32 *prop;
 	int len;
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index e6a66dacce1b8..8c277cc8e9750 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2176,6 +2176,7 @@ void iommu_detach_device(struct iommu_domain *domain, struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_detach_device);
 
+/* Caller can be any general/external function that isn't an IOMMU callback */
 struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
 {
 	/* Caller must be a probed driver on dev */
@@ -2184,10 +2185,23 @@ struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)
 	if (!group)
 		return NULL;
 
+	lockdep_assert_not_held(&group->mutex);
+
 	return group->domain;
 }
 EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev);
 
+/* Caller must be an IOMMU callback/internal function that holds group->mutex */
+struct iommu_domain *iommu_get_domain_for_dev_locked(struct device *dev)
+{
+	struct iommu_group *group = dev->iommu_group;
+
+	lockdep_assert_held(&group->mutex);
+
+	return group->domain;
+}
+EXPORT_SYMBOL_GPL(iommu_get_domain_for_dev_locked);
+
 /*
  * For IOMMU_DOMAIN_DMA implementations which already provide their own
  * guarantees that the group and its default domain are valid and correct.
diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index ffa892f657140..7e8de6e1bf78b 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -639,7 +639,7 @@ static int ipmmu_attach_device(struct iommu_domain *io_domain,
 static int ipmmu_iommu_identity_attach(struct iommu_domain *identity_domain,
 				       struct device *dev)
 {
-	struct iommu_domain *io_domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *io_domain = iommu_get_domain_for_dev_locked(dev);
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
 	struct ipmmu_vmsa_domain *domain;
 	unsigned int i;
diff --git a/drivers/iommu/msm_iommu.c b/drivers/iommu/msm_iommu.c
index 43a61ba021a51..58cc08c8ede8b 100644
--- a/drivers/iommu/msm_iommu.c
+++ b/drivers/iommu/msm_iommu.c
@@ -443,7 +443,7 @@ static int msm_iommu_attach_dev(struct iommu_domain *domain, struct device *dev)
 static int msm_iommu_identity_attach(struct iommu_domain *identity_domain,
 				     struct device *dev)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_domain_for_dev_locked(dev);
 	struct msm_priv *priv;
 	unsigned long flags;
 	struct msm_iommu_dev *iommu;
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 0e0285348d2b8..31b29bcfc7569 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -775,7 +775,7 @@ static int mtk_iommu_attach_device(struct iommu_domain *domain,
 static int mtk_iommu_identity_attach(struct iommu_domain *identity_domain,
 				     struct device *dev)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_domain_for_dev_locked(dev);
 	struct mtk_iommu_data *data = dev_iommu_priv_get(dev);
 
 	if (domain == identity_domain || !domain)
diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index 6fb93927bdb98..881fa5d243a15 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1538,7 +1538,7 @@ static void _omap_iommu_detach_dev(struct omap_iommu_domain *omap_domain,
 static int omap_iommu_identity_attach(struct iommu_domain *identity_domain,
 				      struct device *dev)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_domain_for_dev_locked(dev);
 	struct omap_iommu_domain *omap_domain;
 
 	if (domain == identity_domain || !domain)
diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 36cdd5fbab077..fdec0439bd683 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -526,7 +526,7 @@ static int tegra_smmu_attach_dev(struct iommu_domain *domain,
 static int tegra_smmu_identity_attach(struct iommu_domain *identity_domain,
 				      struct device *dev)
 {
-	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
+	struct iommu_domain *domain = iommu_get_domain_for_dev_locked(dev);
 	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
 	struct tegra_smmu_as *as;
 	struct tegra_smmu *smmu;
-- 
2.43.0
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Jason Gunthorpe 1 month, 2 weeks ago
On Mon, Aug 11, 2025 at 03:59:10PM -0700, Nicolin Chen wrote:

> Introduce a new iommu_get_domain_for_dev_locked() helper to be used by
> those drivers in a context that is already under the protection of the
> group->mutex, e.g. those attach_dev callback functions. And roll out the
> new helper to all the existing IOMMU drivers.

I still don't much care for this; I think it would be better to
approach this problem by passing the old domain to the driver callback:

> @@ -390,7 +390,7 @@ static int qcom_iommu_attach_dev(struct iommu_domain *domain, struct device *dev
>  static int qcom_iommu_identity_attach(struct iommu_domain *identity_domain,
>  				      struct device *dev)
>  {
> -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> +	struct iommu_domain *domain = iommu_get_domain_for_dev_locked(dev);

Because this is a very common pattern in drivers.
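
For example, something along these lines (a rough sketch, not part of
the posted series; the extra "old" parameter and the function are
hypothetical):

	/*
	 * Hypothetical: the core already holds group->mutex and knows the
	 * old domain, so it passes it down instead of the driver looking
	 * it up via iommu_get_domain_for_dev().
	 */
	static int foo_identity_attach(struct iommu_domain *identity_domain,
				       struct device *dev,
				       struct iommu_domain *old)
	{
		if (old == identity_domain || !old)
			return 0;
		/* ... tear down the old translation for dev ... */
		return 0;
	}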

Once that is done we can see what calls to iommu_get_domain_for_dev()
are even left; arguably we should be trying to eliminate this badly
locked thing...

Jason
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Nicolin Chen 1 month, 2 weeks ago
On Mon, Aug 18, 2025 at 11:39:49AM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 11, 2025 at 03:59:10PM -0700, Nicolin Chen wrote:
> 
> > Introduce a new iommu_get_domain_for_dev_locked() helper to be used by
> > those drivers in a context that is already under the protection of the
> > group->mutex, e.g. those attach_dev callback functions. And roll out the
> > new helper to all the existing IOMMU drivers.
> 
> I still don't much care for this, I think it would be better to
> approach this problem by passing the old domain to the driver callback:

Hmm, that was my first attempt before making this patch until...

> > @@ -390,7 +390,7 @@ static int qcom_iommu_attach_dev(struct iommu_domain *domain, struct device *dev
> >  static int qcom_iommu_identity_attach(struct iommu_domain *identity_domain,
> >  				      struct device *dev)
> >  {
> > -	struct iommu_domain *domain = iommu_get_domain_for_dev(dev);
> > +	struct iommu_domain *domain = iommu_get_domain_for_dev_locked(dev);
> 
> Because this is a very common pattern in drivers.
> 
> Once that is done we can see what calls to iommu_get_domain_for_dev()
> are even left,

... I found that in SMMUv3 driver, iommu_get_domain_for_dev() is
used to get the RID domain for an SVA domain:
    arm_smmu_set_pasid()
    arm_smmu_blocking_set_dev_pasid()

These two are already given an "old" (SVA) domain pointer, FWIW.

So, we may change to passing in the old domain as you suggested,
yet we still have to fix the iommu_get_domain_for_dev() in order
to reflect the RID domain correctly for the driver that calls it
(or even potentially) in some group->mutex locked context where
the RID domain might not be naturally passed in.

> arguably we should be trying to eliminate this badly
> locked thing...

Any suggestion?

I see it as inevitable that reading such a per-device iommu-specific
flag would have to take the lock...

Thanks
Nicolin
RE: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Tian, Kevin 1 month, 2 weeks ago
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, August 19, 2025 1:23 AM
> 
> ... I found that in SMMUv3 driver, iommu_get_domain_for_dev() is
> used to get the RID domain for an SVA domain:
>     arm_smmu_set_pasid()
>     arm_smmu_blocking_set_dev_pasid()
> 
> These two are already given an "old" (SVA) domain pointer, FWIW.
> 
> So, we may change to passing in the old domain as you suggested,
> yet we still have to fix the iommu_get_domain_for_dev() in order
> to reflect the RID domain correctly for the driver that calls it
> (or even potentially) in some group->mutex locked context where
> the RID domain might not be naturally passed in.
> 

Out of curiosity.

arm_smmu_blocking_set_dev_pasid()

	/*                              
	 * When the last user of the CD table goes away downgrade the STE back
	 * to a non-cd_table one.
	 */
	if (!arm_smmu_ssids_in_use(&master->cd_table)) {
		struct iommu_domain *sid_domain =
			iommu_get_domain_for_dev(master->dev);

		if (sid_domain->type == IOMMU_DOMAIN_IDENTITY ||
		   sid_domain->type == IOMMU_DOMAIN_BLOCKED)
			sid_domain->ops->attach_dev(sid_domain, dev); 
	}

why can't the downgrade apply to the case where the RID is attached to
a DMA domain?
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Jason Gunthorpe 1 month, 2 weeks ago
On Thu, Aug 21, 2025 at 08:11:05AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Tuesday, August 19, 2025 1:23 AM
> > 
> > ... I found that in SMMUv3 driver, iommu_get_domain_for_dev() is
> > used to get the RID domain for an SVA domain:
> >     arm_smmu_set_pasid()
> >     arm_smmu_blocking_set_dev_pasid()
> > 
> > These two are already given an "old" (SVA) domain pointer, FWIW.
> > 
> > So, we may change to passing in the old domain as you suggested,
> > yet we still have to fix the iommu_get_domain_for_dev() in order
> > to reflect the RID domain correctly for the driver that calls it
> > (or even potentially) in some group->mutex locked context where
> > the RID domain might not be naturally passed in.
> > 
> 
> Out of curiosity.
> 
> arm_smmu_blocking_set_dev_pasid()
> 
> 	/*                              
> 	 * When the last user of the CD table goes away downgrade the STE back
> 	 * to a non-cd_table one.
> 	 */
> 	if (!arm_smmu_ssids_in_use(&master->cd_table)) {
> 		struct iommu_domain *sid_domain =
> 			iommu_get_domain_for_dev(master->dev);
> 
> 		if (sid_domain->type == IOMMU_DOMAIN_IDENTITY ||
> 		   sid_domain->type == IOMMU_DOMAIN_BLOCKED)
> 			sid_domain->ops->attach_dev(sid_domain, dev); 
> 	}
> 
> why cannot downgrade apply to the case where the RID is attached to
> a DMA domain?

If the RID is a PAGING domain then it must be an S1 paging domain and there is
no downgrade possible.

It is impossible for the RID to be an S2 paging domain while ssids are
in use.

Jason
RE: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Tian, Kevin 1 month, 1 week ago
> From: Jason Gunthorpe <jgg@nvidia.com>
> Sent: Thursday, August 21, 2025 9:14 PM
> 
> On Thu, Aug 21, 2025 at 08:11:05AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Tuesday, August 19, 2025 1:23 AM
> > >
> > > ... I found that in SMMUv3 driver, iommu_get_domain_for_dev() is
> > > used to get the RID domain for an SVA domain:
> > >     arm_smmu_set_pasid()
> > >     arm_smmu_blocking_set_dev_pasid()
> > >
> > > These two are already given an "old" (SVA) domain pointer, FWIW.
> > >
> > > So, we may change to passing in the old domain as you suggested,
> > > yet we still have to fix the iommu_get_domain_for_dev() in order
> > > to reflect the RID domain correctly for the driver that calls it
> > > (or even potentially) in some group->mutex locked context where
> > > the RID domain might not be naturally passed in.
> > >
> >
> > Out of curiosity.
> >
> > arm_smmu_blocking_set_dev_pasid()
> >
> > 	/*
> > 	 * When the last user of the CD table goes away downgrade the STE
> back
> > 	 * to a non-cd_table one.
> > 	 */
> > 	if (!arm_smmu_ssids_in_use(&master->cd_table)) {
> > 		struct iommu_domain *sid_domain =
> > 			iommu_get_domain_for_dev(master->dev);
> >
> > 		if (sid_domain->type == IOMMU_DOMAIN_IDENTITY ||
> > 		   sid_domain->type == IOMMU_DOMAIN_BLOCKED)
> > 			sid_domain->ops->attach_dev(sid_domain, dev);
> > 	}
> >
> > why cannot downgrade apply to the case where the RID is attached to
> > a DMA domain?
> 
> If the RID is a PAGING domain then it must be a S1 paging domain and there
> is
> no downgrade possible.
> 
> It is impossible for the RID to be a S2 paging domain while ssids are
> in use.
> 

Thanks for the background. BTW, is it technically impossible or just
not worth the extra software complexity for no value? E.g. if
maintaining two page tables (S1/S2 formats) with the exact same mappings,
does SMMU allow a smooth transition between the two modes w/o breaking
in-flight DMAs? But probably keeping the two page tables in sync during
the transition is already a problem w/o proper locking in map/unmap...
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Jason Gunthorpe 1 month, 1 week ago
On Fri, Aug 22, 2025 at 09:45:28AM +0000, Tian, Kevin wrote:
> Thanks for the background. btw is it technically impossible or just
> not worth of the extra software complexity for no value? 

Unsupported by HW. It cannot mix S2 and S1 formats on
SSIDs. If SSID is in use then all must be S1.

> e.g. if maintaining two page tables (S1/S2 formats) with exact same
> mappings, does SMMU allow smooth transition between two modes w/o
> breaking in-fly DMAs? 

It could and there is SW support for that..

> but probably keeping two page tables in-sync in transition is
> already a problem w/o proper locking in map/unmap...

Yes, plus doubling the memory. It is not worthwhile; there is really
very little advantage to the S2 format.

Jason
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Jason Gunthorpe 1 month, 2 weeks ago
On Mon, Aug 18, 2025 at 10:22:52AM -0700, Nicolin Chen wrote:
> > Because this is a very common pattern in drivers.
> > 
> > Once that is done we can see what calls to iommu_get_domain_for_dev()
> > are even left,
> 
> ... I found that in SMMUv3 driver, iommu_get_domain_for_dev() is
> used to get the RID domain for an SVA domain:
>     arm_smmu_set_pasid()
>     arm_smmu_blocking_set_dev_pasid()
> 
> These two are already given an "old" (SVA) domain pointer, FWIW.
> 
> So, we may change to passing in the old domain as you suggested,
> yet we still have to fix the iommu_get_domain_for_dev() in order
> to reflect the RID domain correctly for the driver that calls it
> (or even potentially) in some group->mutex locked context where
> the RID domain might not be naturally passed in.

It could probably be avoided by keeping track of more information in
the master, but also it is not so bad to use a _locked version here.

> > arguably we should be trying to eliminate this badly
> > locked thing...
> 
> Any suggestion?

Bit by bit... I counted 58 by grep.

Changing attach will get rid of a lot of them.

Then there is stuff like this:

        domain = iommu_get_domain_for_dev(emu->card->dev);
        if (!domain || domain->type == IOMMU_DOMAIN_IDENTITY)
                return;

Which should be more like 
   if (iommu_get_translation_mode(dev) == IDENTITY)

With sensible internal locking
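
A rough sketch of what such a helper could look like (hypothetical body,
assuming it lives in iommu.c where the group internals are visible):

	/* Returns the IOMMU_DOMAIN_* type without leaking the domain pointer */
	unsigned int iommu_get_translation_mode(struct device *dev)
	{
		struct iommu_group *group = dev->iommu_group;
		unsigned int type = IOMMU_DOMAIN_IDENTITY;

		/* No group means no translation, i.e. effectively identity */
		if (!group)
			return type;

		mutex_lock(&group->mutex);
		if (group->domain)
			type = group->domain->type;
		mutex_unlock(&group->mutex);

		return type;
	}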

So that is another bunch. Not sure what will be left after.

Not saying to do all that here, just prefer we move toward that direction.

Jason
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Nicolin Chen 1 month, 2 weeks ago
On Mon, Aug 18, 2025 at 08:42:41PM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 18, 2025 at 10:22:52AM -0700, Nicolin Chen wrote:
> > > Because this is a very common pattern in drivers.
> > > 
> > > Once that is done we can see what calls to iommu_get_domain_for_dev()
> > > are even left,
> > 
> > ... I found that in SMMUv3 driver, iommu_get_domain_for_dev() is
> > used to get the RID domain for an SVA domain:
> >     arm_smmu_set_pasid()
> >     arm_smmu_blocking_set_dev_pasid()
> > 
> > These two are already given an "old" (SVA) domain pointer, FWIW.
> > 
> > So, we may change to passing in the old domain as you suggested,
> > yet we still have to fix the iommu_get_domain_for_dev() in order
> > to reflect the RID domain correctly for the driver that calls it
> > (or even potentially) in some group->mutex locked context where
> > the RID domain might not be naturally passed in.
> 
> It could probably be avoided by keeping track of more information in
> the master, but also it is not so bad to use a _locked version here.

Yes, I've thought about that. The concern is that some other place
someday may want to use iommu_get_domain_for_dev() in similar cases
but would find that it doesn't work. So it would have to duplicate
the domain pointer in its "master" structure.

Overall, having a _locked version feels cleaner to me.

> > > arguably we should be trying to eliminate this badly
> > > locked thing...
> > 
> > Any suggestion?
> 
> Bit by bit.. I counted 58 by grep
> 
> Changing attach will get rid of alot of them
> 
> Then there is stuff like this:
> 
>         domain = iommu_get_domain_for_dev(emu->card->dev);
>         if (!domain || domain->type == IOMMU_DOMAIN_IDENTITY)
>                 return;
> 
> Which should be more like 
>    if (iommu_get_translation_mode(dev) == IDENTITY)
>
> With sensible internal locking

Hmm, I feel this iommu_get_translation_mode() is somewhat the same
as the current iommu_get_domain_for_dev(). It would just return
group->domain->type vs. group->domain, right?

This doesn't have any UAF concern though.

> So that is another bunch. Not sure what will be left after.

I recall that some of the drivers manage their own domains, e.g.
    drivers/gpu/drm/tegra/drm.c

So, they would want more out of the domain pointer than just type.

> Not saying to do all that here, just prefer we move toward that direction.

Yea.. I also think it's a bit difficult to justify the changes in
the non-iommu callers, since they are not affected by any patch in
this series.

Thanks
Nicolin
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Jason Gunthorpe 1 month, 2 weeks ago
On Mon, Aug 18, 2025 at 10:09:11PM -0700, Nicolin Chen wrote:
> Yes, I've thought about that. The concern is that some other place
> someday may want to use iommu_get_domain_for_dev() in similar cases
> but would find that it doesn't work. So it would have to duplicate
> the domain pointer in its "master" structure.
> 
> Overall, having a _locked version feels cleaner to me.

We probably need the locked version, but it just shouldn't be called very
much..

> > With sensible internal locking
> 
> Hmm, I feel this iommu_get_translation_mode() is somewhat the same
> as the current iommu_get_domain_for_dev(). It would just return the
> group->domain->type v.s. group->domain, right?
> 
> This doesn't have any UAF concern though.

Yes, no UAF concern is the point

> > So that is another bunch. Not sure what will be left after.
> 
> I recall that some of the drivers manages their own domains, e.g.
>     drivers/gpu/drm/tegra/drm.c
> 
> So, they would want more out of the domain pointer than just type.

This looks like it wants an 'is currently attached' operation

Jason
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Nicolin Chen 1 month, 2 weeks ago
On Tue, Aug 19, 2025 at 09:52:49AM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 18, 2025 at 10:09:11PM -0700, Nicolin Chen wrote:
> > Yes, I've thought about that. The concern is that some other place
> > someday may want to use iommu_get_domain_for_dev() in similar cases
> > but would find that it doesn't work. So it would have to duplicate
> > the domain pointer in its "master" structure.
> > 
> > Overall, having a _locked version feels cleaner to me.
> 
> We probably need the locked version, but it just shouldn't be called very
> much..

OK. Let's have one patch upgrading the attach_dev to pass in the
old domain pointer (aligning with the SVA version of attach_dev),
and another patch adding the _locked version that then will have
very limited callers.

> > > With sensible internal locking
> > 
> > Hmm, I feel this iommu_get_translation_mode() is somewhat the same
> > as the current iommu_get_domain_for_dev(). It would just return the
> > group->domain->type v.s. group->domain, right?
> > 
> > This doesn't have any UAF concern though.
> 
> Yes, no UAF concern is the point

I see.

> > > So that is another bunch. Not sure what will be left after.
> > 
> > I recall that some of the drivers manages their own domains, e.g.
> >     drivers/gpu/drm/tegra/drm.c
> > 
> > So, they would want more out of the domain pointer than just type.
> 
> This looks like it wants an 'is currently attached' operation

That's certainly one of the common use cases. And I think we could
have an IOMMU_DOMAIN_NONE to fit that into the type helper.

Yet, I also see some other cases that cannot be helped with the
type function. Just listing a few:

1) domain matching (and type)
drivers/gpu/drm/tegra/drm.c:965:        if (domain && domain->type != IOMMU_DOMAIN_IDENTITY &&
drivers/gpu/drm/tegra/drm.c:966:            domain != tegra->domain)
drivers/gpu/drm/tegra/drm.c-967-                return 0;

2) page size
drivers/gpu/drm/arm/malidp_planes.c:307:	mmu_dom = iommu_get_domain_for_dev(mp->base.dev->dev);
drivers/gpu/drm/arm/malidp_planes.c-308-	if (mmu_dom)
drivers/gpu/drm/arm/malidp_planes.c-309-		return mmu_dom->pgsize_bitmap;

3) iommu_iova_to_phys
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:2597:	dom = iommu_get_domain_for_dev(adev->dev);
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2598-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2599-	while (size) {
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2600-		phys_addr_t addr = *pos & PAGE_MASK;
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2601-		loff_t off = *pos & ~PAGE_MASK;
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2602-		size_t bytes = PAGE_SIZE - off;
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2603-		unsigned long pfn;
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2604-		struct page *p;
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2605-		void *ptr;
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2606-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2607-		bytes = min(bytes, size);
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2608-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:2609:		addr = dom ? iommu_iova_to_phys(dom, addr) : addr;

4) map/unmap
drivers/net/ipa/ipa_mem.c:465:  domain = iommu_get_domain_for_dev(dev);
drivers/net/ipa/ipa_mem.c-466-  if (!domain) {
drivers/net/ipa/ipa_mem.c-467-          dev_err(dev, "no IOMMU domain found for IMEM\n");
drivers/net/ipa/ipa_mem.c-468-          return -EINVAL;
drivers/net/ipa/ipa_mem.c-469-  }
drivers/net/ipa/ipa_mem.c-470-
drivers/net/ipa/ipa_mem.c-471-  /* Align the address down and the size up to page boundaries */
drivers/net/ipa/ipa_mem.c-472-  phys = addr & PAGE_MASK;
drivers/net/ipa/ipa_mem.c-473-  size = PAGE_ALIGN(size + addr - phys);
drivers/net/ipa/ipa_mem.c-474-  iova = phys;    /* We just want a direct mapping */
drivers/net/ipa/ipa_mem.c-475-
drivers/net/ipa/ipa_mem.c-476-  ret = iommu_map(domain, iova, phys, size, IOMMU_READ | IOMMU_WRITE,
...
drivers/net/ipa/ipa_mem.c:495:  domain = iommu_get_domain_for_dev(dev);
drivers/net/ipa/ipa_mem.c-496-  if (domain) {
drivers/net/ipa/ipa_mem.c-497-          size_t size;
drivers/net/ipa/ipa_mem.c-498-
drivers/net/ipa/ipa_mem.c-499-          size = iommu_unmap(domain, ipa->imem_iova, ipa->imem_size);


We could probably invent something for cases 1-3. But for case 4,
it feels like iommu_get_domain_for_dev() is the only helper...

Thanks
Nicolin
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Jason Gunthorpe 1 month, 2 weeks ago
On Tue, Aug 19, 2025 at 10:22:20AM -0700, Nicolin Chen wrote:

> Yet, I also see some other cases that cannot be helped with the
> type function. Just listing a few:

Probably several query functions are needed that can be lock safe
 
> 1) domain matching (and type)
> drivers/gpu/drm/tegra/drm.c:965:        if (domain && domain->type != IOMMU_DOMAIN_IDENTITY &&
> drivers/gpu/drm/tegra/drm.c:966:            domain != tegra->domain)
> drivers/gpu/drm/tegra/drm.c-967-                return 0;

is attached
 
> 2) page size
> drivers/gpu/drm/arm/malidp_planes.c:307:	mmu_dom = iommu_get_domain_for_dev(mp->base.dev->dev);
> drivers/gpu/drm/arm/malidp_planes.c-308-	if (mmu_dom)
> drivers/gpu/drm/arm/malidp_planes.c-309-		return mmu_dom->pgsize_bitmap;

return page size bitmap
 
> 3) iommu_iova_to_phys
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:2597:	dom = iommu_get_domain_for_dev(adev->dev);
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2598-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2599-	while (size) {
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2600-		phys_addr_t addr = *pos & PAGE_MASK;
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2601-		loff_t off = *pos & ~PAGE_MASK;
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2602-		size_t bytes = PAGE_SIZE - off;
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2603-		unsigned long pfn;
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2604-		struct page *p;
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2605-		void *ptr;
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2606-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2607-		bytes = min(bytes, size);
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c-2608-
> drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c:2609:		addr = dom ? iommu_iova_to_phys(dom, addr) : addr;

safe iova to phys directly from a struct device
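
Perhaps something like this (hypothetical sketch; such a helper is not
in the tree today and would live in iommu.c):

	phys_addr_t iommu_dev_iova_to_phys(struct device *dev, dma_addr_t iova)
	{
		struct iommu_group *group = dev->iommu_group;
		phys_addr_t phys = 0;

		if (!group)
			return 0;

		/* Hold group->mutex so the domain cannot go away under us */
		mutex_lock(&group->mutex);
		if (group->domain)
			phys = iommu_iova_to_phys(group->domain, iova);
		mutex_unlock(&group->mutex);

		return phys;
	}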
 
> 4) map/unmap
> drivers/net/ipa/ipa_mem.c:465:  domain = iommu_get_domain_for_dev(dev);
> drivers/net/ipa/ipa_mem.c-466-  if (!domain) {
> drivers/net/ipa/ipa_mem.c-467-          dev_err(dev, "no IOMMU domain found for IMEM\n");
> drivers/net/ipa/ipa_mem.c-468-          return -EINVAL;
> drivers/net/ipa/ipa_mem.c-469-  }
> drivers/net/ipa/ipa_mem.c-470-
> drivers/net/ipa/ipa_mem.c-471-  /* Align the address down and the size up to page boundaries */
> drivers/net/ipa/ipa_mem.c-472-  phys = addr & PAGE_MASK;
> drivers/net/ipa/ipa_mem.c-473-  size = PAGE_ALIGN(size + addr - phys);
> drivers/net/ipa/ipa_mem.c-474-  iova = phys;    /* We just want a direct mapping */
> drivers/net/ipa/ipa_mem.c-475-
> drivers/net/ipa/ipa_mem.c-476-  ret = iommu_map(domain, iova, phys, size, IOMMU_READ | IOMMU_WRITE,
> ...
> drivers/net/ipa/ipa_mem.c:495:  domain = iommu_get_domain_for_dev(dev);
> drivers/net/ipa/ipa_mem.c-496-  if (domain) {
> drivers/net/ipa/ipa_mem.c-497-          size_t size;
> drivers/net/ipa/ipa_mem.c-498-
> drivers/net/ipa/ipa_mem.c-499-          size = iommu_unmap(domain, ipa->imem_iova, ipa->imem_size);

Broken! Illegal to call iommu_map on a DMA API domain.

This is exactly the sort of abuse I would like to see made impossible :(

If it really needs something like this then it needs a proper dma api
interface to do it and properly reserve the iova from the allocator.

Jason
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Nicolin Chen 1 month, 2 weeks ago
On Thu, Aug 21, 2025 at 10:13:04AM -0300, Jason Gunthorpe wrote:
> On Tue, Aug 19, 2025 at 10:22:20AM -0700, Nicolin Chen wrote:
> 
> > Yet, I also see some other cases that cannot be helped with the
> > type function. Just listing a few:
> 
> Probably several query functions are needed that can be lock safe
>  
> > 1) domain matching (and type)
> > drivers/gpu/drm/tegra/drm.c:965:        if (domain && domain->type != IOMMU_DOMAIN_IDENTITY &&
> > drivers/gpu/drm/tegra/drm.c:966:            domain != tegra->domain)
> > drivers/gpu/drm/tegra/drm.c-967-                return 0;
> 
> is attached

I should have pasted the full piece:
drivers/gpu/drm/tegra/drm.c-960-	/*
drivers/gpu/drm/tegra/drm.c:961:	 * If the host1x client is already attached to an IOMMU domain that is
drivers/gpu/drm/tegra/drm.c-962-	 * not the shared IOMMU domain, don't try to attach it to a different
drivers/gpu/drm/tegra/drm.c-963-	 * domain. This allows using the IOMMU-backed DMA API.
drivers/gpu/drm/tegra/drm.c-964-	 */
drivers/gpu/drm/tegra/drm.c-965-	if (domain && domain->type != IOMMU_DOMAIN_IDENTITY &&
drivers/gpu/drm/tegra/drm.c-966-	    domain != tegra->domain)

So, the check is two-fold:
1) is attached
2) is the shared IOMMU domain (tegra->domain?)
  
> > 4) map/unmap
> > drivers/net/ipa/ipa_mem.c:465:  domain = iommu_get_domain_for_dev(dev);
> > drivers/net/ipa/ipa_mem.c-466-  if (!domain) {
> > drivers/net/ipa/ipa_mem.c-467-          dev_err(dev, "no IOMMU domain found for IMEM\n");
> > drivers/net/ipa/ipa_mem.c-468-          return -EINVAL;
> > drivers/net/ipa/ipa_mem.c-469-  }
> > drivers/net/ipa/ipa_mem.c-470-
> > drivers/net/ipa/ipa_mem.c-471-  /* Align the address down and the size up to page boundaries */
> > drivers/net/ipa/ipa_mem.c-472-  phys = addr & PAGE_MASK;
> > drivers/net/ipa/ipa_mem.c-473-  size = PAGE_ALIGN(size + addr - phys);
> > drivers/net/ipa/ipa_mem.c-474-  iova = phys;    /* We just want a direct mapping */
> > drivers/net/ipa/ipa_mem.c-475-
> > drivers/net/ipa/ipa_mem.c-476-  ret = iommu_map(domain, iova, phys, size, IOMMU_READ | IOMMU_WRITE,
> > ...
> > drivers/net/ipa/ipa_mem.c:495:  domain = iommu_get_domain_for_dev(dev);
> > drivers/net/ipa/ipa_mem.c-496-  if (domain) {
> > drivers/net/ipa/ipa_mem.c-497-          size_t size;
> > drivers/net/ipa/ipa_mem.c-498-
> > drivers/net/ipa/ipa_mem.c-499-          size = iommu_unmap(domain, ipa->imem_iova, ipa->imem_size);
> 
> Broken! Illegal to call iommu_map on a DMA API domain.
> 
> This is exactly the sort of abuse I would like to see made imposible :(
> 
> If it really needs something like this then it needs a proper dma api
> interface to do it and properly reserve the iova from the allocator.

Yea. This particular case is forcing a direct mapping for a small
piece of memory. So it should probably be described in the Device
Tree vs. the of_match_table data in the driver, so that the OF core
would allocate an IOMMU_RESV_DIRECT.

Overall, I feel this would be a big project yet arguably for a low
reward...

Nicolin
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Jason Gunthorpe 1 month, 2 weeks ago
On Thu, Aug 21, 2025 at 08:22:06AM -0700, Nicolin Chen wrote:
> I should have pasted the full piece:
> drivers/gpu/drm/tegra/drm.c-960-	/*
> drivers/gpu/drm/tegra/drm.c:961:	 * If the host1x client is already attached to an IOMMU domain that is
> drivers/gpu/drm/tegra/drm.c-962-	 * not the shared IOMMU domain, don't try to attach it to a different
> drivers/gpu/drm/tegra/drm.c-963-	 * domain. This allows using the IOMMU-backed DMA API.
> drivers/gpu/drm/tegra/drm.c-964-	 */
> drivers/gpu/drm/tegra/drm.c-965-	if (domain && domain->type != IOMMU_DOMAIN_IDENTITY &&
> drivers/gpu/drm/tegra/drm.c-966-	    domain != tegra->domain)
> 
> So, the check is two-fold:
> 1) is attached
> 2) is the shared IOMMU domain (tegra->domain?)

Yea

 iommu_is_domain_currently_attached(dev, tegra->domain)
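
A rough sketch of what that could look like (hypothetical body, assuming
it lives in iommu.c):

	bool iommu_is_domain_currently_attached(struct device *dev,
						struct iommu_domain *domain)
	{
		struct iommu_group *group = dev->iommu_group;
		bool attached;

		if (!group)
			return false;

		/* Compare under group->mutex; never hand out the pointer */
		mutex_lock(&group->mutex);
		attached = group->domain == domain;
		mutex_unlock(&group->mutex);

		return attached;
	}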

> Overall, I feel this would be a big project yet arguably for a low
> reward..

Indeed, we can drop a FIXME comment there and leave it as the last
user or something perhaps

Jason
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Nicolin Chen 1 month, 1 week ago
On Thu, Aug 21, 2025 at 03:37:04PM -0300, Jason Gunthorpe wrote:
> On Thu, Aug 21, 2025 at 08:22:06AM -0700, Nicolin Chen wrote:
> > I should have pasted the full piece:
> > drivers/gpu/drm/tegra/drm.c-960-	/*
> > drivers/gpu/drm/tegra/drm.c:961:	 * If the host1x client is already attached to an IOMMU domain that is
> > drivers/gpu/drm/tegra/drm.c-962-	 * not the shared IOMMU domain, don't try to attach it to a different
> > drivers/gpu/drm/tegra/drm.c-963-	 * domain. This allows using the IOMMU-backed DMA API.
> > drivers/gpu/drm/tegra/drm.c-964-	 */
> > drivers/gpu/drm/tegra/drm.c-965-	if (domain && domain->type != IOMMU_DOMAIN_IDENTITY &&
> > drivers/gpu/drm/tegra/drm.c-966-	    domain != tegra->domain)
> > 
> > So, the check is two-fold:
> > 1) is attached
> > 2) is the shared IOMMU domain (tegra->domain?)
> 
> Yea
> 
>  iommu_is_domain_currently_attached(dev, tegra->domain)

Ah, yea.

> > Overall, I feel this would be a big project yet arguably for a low
> > reward..
> 
> Indeed, we can drop a FIXME comment there and leave it as the last
> user or something perhaps

I see. We could keep it in the library but discourage new callers.

I will start with the internal cleanup as we discussed, as a part
of this PCI reset series. Then, I will try adding those new helper
functions that we've listed here, as a separate series.

Thanks
Nicolin
RE: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Tian, Kevin 1 month, 2 weeks ago
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Tuesday, August 12, 2025 6:59 AM
> 
> There is a need to attach a PCI device that's under a reset to temporally
> the blocked domain (i.e. detach it from its previously attached domain),
> and then to reattach it back to its previous domain (i.e. detach it from
> the blocked domain) after reset.
> 
> During the reset stage, there can be races from other attach/detachment.
> To solve this, a per-gdev reset flag will be introduced so that all the
> attach functions will bypass the driver-level attach_dev callbacks, but
> only update the group->domain pointer. The reset recovery procedure will
> attach directly to the cached pointer so things will be back to normal.
> 
> On the other hand, the iommu_get_domain_for_dev() API always returns the
> group->domain pointer, and several IOMMMU drivers call this API in their
> attach_dev callback functions to get the currently attached domain for a
> device, which will be broken for the recovery case mentioned above:
>  1. core asks the driver to attach dev from blocked to group->domain
>  2. driver attaches dev from group->domain to group->domain

the 2nd bullet implies that a driver may skip the operation by noting that
the old domain is the same as the new one?

> 
> So, iommu_get_domain_for_dev() should check the gdev flag and return the
> blocked domain if the flag is set. But the caller of this API could hold
> the group->mutex already or not, making it difficult to add the lock.
> 
> Introduce a new iommu_get_domain_for_dev_locked() helper to be used by
> those drivers in a context that is already under the protection of the
> group->mutex, e.g. those attach_dev callback functions. And roll out the
> new helper to all the existing IOMMU drivers.

iommu_get_domain_for_dev() is also called outside of attach_dev
callback functions, e.g. malidp_get_pgsize_bitmap(), and the returned
info derived from the attached domain might be saved in static
structures, e.g.:

	ms->mmu_prefetch_pgsize = malidp_get_pgsize_bitmap(mp);

Would that cause weird issues when the blocking domain is returned,
though one may not expect a reset to happen at that point?

> +/* Caller can be any general/external function that isn't an IOMMU callback
> */
>  struct iommu_domain *iommu_get_domain_for_dev(struct device *dev)

s/can/must/ ?
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Nicolin Chen 1 month, 2 weeks ago
On Fri, Aug 15, 2025 at 08:55:28AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Tuesday, August 12, 2025 6:59 AM
> > On the other hand, the iommu_get_domain_for_dev() API always returns the
> > group->domain pointer, and several IOMMMU drivers call this API in their
> > attach_dev callback functions to get the currently attached domain for a
> > device, which will be broken for the recovery case mentioned above:
> >  1. core asks the driver to attach dev from blocked to group->domain
> >  2. driver attaches dev from group->domain to group->domain
> 
> the 2nd bullet implies that a driver may skip the operation by noting that
> old domain is same as the new one?

Drivers use iommu_get_domain_for_dev() to get the "old domain".
But during a reset, it doesn't return the actual old domain, which
should be the blocked domain; it returns group->domain instead.

The driver needs the actual (blocked) domain in that case because it
handles the requests from iommu_dev_reset_prepare/done().

> > So, iommu_get_domain_for_dev() should check the gdev flag and return the
> > blocked domain if the flag is set. But the caller of this API could hold
> > the group->mutex already or not, making it difficult to add the lock.
> > 
> > Introduce a new iommu_get_domain_for_dev_locked() helper to be used by
> > those drivers in a context that is already under the protection of the
> > group->mutex, e.g. those attach_dev callback functions. And roll out the
> > new helper to all the existing IOMMU drivers.
> 
> iommu_get_domain_for_dev() is also called outside of attach_dev
> callback functions, e.g. malidp_get_pgsize_bitmap(). and the returned
> info according to the attached domain might be saved in static
> structures, e.g.:
> 
> 	ms->mmu_prefetch_pgsize = malidp_get_pgsize_bitmap(mp);
> 
> would that cause weird issues when blocking domain is returned
> though one may not expect reset to happen at that point?

We aren't changing iommu_get_domain_for_dev(). So, it should
be used exclusively by functions that run outside the group->mutex,
like this one, i.e. they should still get the group->domain vs.
the blocked domain.

Thanks
Nicolin
RE: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Tian, Kevin 1 month, 2 weeks ago
> From: Nicolin Chen <nicolinc@nvidia.com>
> Sent: Saturday, August 16, 2025 2:45 AM
> 
> On Fri, Aug 15, 2025 at 08:55:28AM +0000, Tian, Kevin wrote:
> > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > Sent: Tuesday, August 12, 2025 6:59 AM
> > > So, iommu_get_domain_for_dev() should check the gdev flag and return
> the
> > > blocked domain if the flag is set. But the caller of this API could hold
> > > the group->mutex already or not, making it difficult to add the lock.
> > >
> > > Introduce a new iommu_get_domain_for_dev_locked() helper to be used
> by
> > > those drivers in a context that is already under the protection of the
> > > group->mutex, e.g. those attach_dev callback functions. And roll out the
> > > new helper to all the existing IOMMU drivers.
> >
> > iommu_get_domain_for_dev() is also called outside of attach_dev
> > callback functions, e.g. malidp_get_pgsize_bitmap(). and the returned
> > info according to the attached domain might be saved in static
> > structures, e.g.:
> >
> > 	ms->mmu_prefetch_pgsize = malidp_get_pgsize_bitmap(mp);
> >
> > would that cause weird issues when blocking domain is returned
> > though one may not expect reset to happen at that point?
> 
> We aren't changing the iommu_get_domain_for_dev(). So, it should
> be used exclusively for functions that are outside group->mutex,
> like this one, i.e. they should still get the group->domain v.s.
> the blocked domain.
> 

Usually the difference between func() and func_locked() is only about
whether the caller holds a lock. If they mean to return different
domains, we may need better naming (though I don't have a good option
now)...
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Nicolin Chen 1 month, 2 weeks ago
On Thu, Aug 21, 2025 at 08:07:01AM +0000, Tian, Kevin wrote:
> > From: Nicolin Chen <nicolinc@nvidia.com>
> > Sent: Saturday, August 16, 2025 2:45 AM
> > 
> > On Fri, Aug 15, 2025 at 08:55:28AM +0000, Tian, Kevin wrote:
> > > > From: Nicolin Chen <nicolinc@nvidia.com>
> > > > Sent: Tuesday, August 12, 2025 6:59 AM
> > > > So, iommu_get_domain_for_dev() should check the gdev flag and return
> > the
> > > > blocked domain if the flag is set. But the caller of this API could hold
> > > > the group->mutex already or not, making it difficult to add the lock.
> > > >
> > > > Introduce a new iommu_get_domain_for_dev_locked() helper to be used
> > by
> > > > those drivers in a context that is already under the protection of the
> > > > group->mutex, e.g. those attach_dev callback functions. And roll out the
> > > > new helper to all the existing IOMMU drivers.
> > >
> > > iommu_get_domain_for_dev() is also called outside of attach_dev
> > > callback functions, e.g. malidp_get_pgsize_bitmap(). and the returned
> > > info according to the attached domain might be saved in static
> > > structures, e.g.:
> > >
> > > 	ms->mmu_prefetch_pgsize = malidp_get_pgsize_bitmap(mp);
> > >
> > > would that cause weird issues when blocking domain is returned
> > > though one may not expect reset to happen at that point?
> > 
> > We aren't changing the iommu_get_domain_for_dev(). So, it should
> > be used exclusively for functions that are outside group->mutex,
> > like this one, i.e. they should still get the group->domain v.s.
> > the blocked domain.
> > 
> 
> Usually the difference between func() and func_locked() is only about
> whether the caller holds a lock. If they mean to return different
> domains, may need better naming (though I don't have a good option
> now)...

Yea, naming is tricky here :)

Perhaps it could follow a different pattern:
    iommu_dev_get_domain_for_driver(struct device *dev);

With a simple note highlighting that it is to be used by IOMMU drivers
in the iommu callback functions like attach_dev.

Nicolin
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Nicolin Chen 1 month ago
Hi Kevin,

On Thu, Aug 21, 2025 at 07:41:57AM -0700, Nicolin Chen wrote:
> On Thu, Aug 21, 2025 at 08:07:01AM +0000, Tian, Kevin wrote:
> > Usually the difference between func() and func_locked() is only about
> > whether the caller holds a lock. If they mean to return different
> > domains, may need better naming (though I don't have a good option
> > now)...
> 
> Yea, naming is tricky here :)
> 
> Perhaps it could follow a different pattern:
>     iommu_dev_get_domain_for_driver(struct device *dev);
> 
> With a simple note highlighting to be used by IOMMU drivers in the
> iommu callback functions like attach_dev.

I am keeping this "_locked" naming in v4, as there are only two
places calling this new helper and only one of them is a driver.

Please check v4 and see if you still prefer a different name.

Thanks
Nic
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Baolu Lu 1 month, 2 weeks ago
On 8/12/25 06:59, Nicolin Chen wrote:
> There is a need to attach a PCI device that's under a reset to temporally
> the blocked domain (i.e. detach it from its previously attached domain),
> and then to reattach it back to its previous domain (i.e. detach it from
> the blocked domain) after reset.
> 
> During the reset stage, there can be races from other attach/detachment.
> To solve this, a per-gdev reset flag will be introduced so that all the
> attach functions will bypass the driver-level attach_dev callbacks, but
> only update the group->domain pointer. The reset recovery procedure will
> attach directly to the cached pointer so things will be back to normal.
> 
> On the other hand, the iommu_get_domain_for_dev() API always returns the
> group->domain pointer, and several IOMMMU drivers call this API in their
> attach_dev callback functions to get the currently attached domain for a
> device, which will be broken for the recovery case mentioned above:
>   1. core asks the driver to attach dev from blocked to group->domain
>   2. driver attaches dev from group->domain to group->domain
> 
> So, iommu_get_domain_for_dev() should check the gdev flag and return the
> blocked domain if the flag is set. But the caller of this API could hold
> the group->mutex already or not, making it difficult to add the lock.
> 
> Introduce a new iommu_get_domain_for_dev_locked() helper to be used by
> those drivers in a context that is already under the protection of the
> group->mutex, e.g. those attach_dev callback functions. And roll out the
> new helper to all the existing IOMMU drivers.

Given that iommu_group->mutex is transparent to the iommu driver, how
about

/*
  * Called only by iommu drivers in the callback context where
  * group->mutex has already been held by the core.
  */
struct iommu_domain *iommu_get_domain_for_dev_internal(struct device *dev)
{
	...
	lockdep_assert_held(&group->mutex);
	...
}

?

> 
> Add a lockdep_assert_not_held to the existing iommu_get_domain_for_dev()
> to note that it would be only used outside the group->mutex.
> 
> Signed-off-by: Nicolin Chen<nicolinc@nvidia.com>

Thanks,
baolu
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Jason Gunthorpe 1 month, 2 weeks ago
On Fri, Aug 15, 2025 at 01:28:02PM +0800, Baolu Lu wrote:

> Given that iommu_group->mutex is transparent to the iommu driver, how
> about

It is not actually transparent; a lot of drivers are implicitly
assuming that the core single-threads their struct device for a lot of
the ops callbacks, and we exposed the lockdep test to the drivers to
help document this.

Jason
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Baolu Lu 1 month, 2 weeks ago
On 8/18/25 22:40, Jason Gunthorpe wrote:
> On Fri, Aug 15, 2025 at 01:28:02PM +0800, Baolu Lu wrote:
> 
>> Given that iommu_group->mutex is transparent to the iommu driver, how
>> about
> It is not actually transparent, alot of drivers are implicitly
> assuming that the core single threads their struct device for alot of
> the ops callbacks and we exposed the lockdep test to the drivers to
> help the document this.

Fair enough. It's the role of iommu_group_mutex_assert().

Thanks,
baolu
Re: [PATCH v3 3/5] iommu: Add iommu_get_domain_for_dev_locked() helper
Posted by Nicolin Chen 1 month, 2 weeks ago
On Fri, Aug 15, 2025 at 01:28:02PM +0800, Baolu Lu wrote:
> On 8/12/25 06:59, Nicolin Chen wrote:
> Given that iommu_group->mutex is transparent to the iommu driver, how
> about
> 
> /*
>  * Called only by iommu drivers in the callback context where
>  * group->mutex has already been held by the core.
>  */
> struct iommu_domain *iommu_get_domain_for_dev_internal(struct device *dev)
> {
> 	...
> 	lockdep_assert_held(&group->mutex);
> 	...
> }

I thought about the "_internal" tag, but didn't pick it because
we seem to use that for internal functions inside iommu.c?

I don't have a problem using it though, as it can be noted like
in your writing.

Thanks
Nicolin