[PATCH RFC] iommu: fix wrong DMA ops for iommu device

Charan Teja Kalla posted 1 patch 1 year ago
drivers/iommu/iommu.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
[PATCH RFC] iommu: fix wrong DMA ops for iommu device
Posted by Charan Teja Kalla 1 year ago
Race between smmu device probe and iommu device probe is resulting into
wrong DMA ops used for the device.

bus_iommu_probe(P1)                  of_iommu_configure(P2)
(from iommu_device_register)        (from insmod of a driver)
--------------------------         --------------------------
1) probe_iommu_group()
(Fills just dev->iommu_group
under iommu_probe_device_lock)
				2) iommu_probe_device():
				  (acquires the iommu_probe_device_lock,
				   but since dev->iommu_group is already
				   set, it just returns without
				   allocating the domain.)

				3) of_dma_configure_id()->
				   arch_setup_dma_ops() gets called for
				   this device, but returns with the
				   error message:
				   "Failed to set up IOMMU for device;
				    retaining platform DMA ops"

				4) device probe is succeeded where it
				   can now setup iommu mappings using
				   wrong ->dma_ops

5) domain will be allocated later
and fills iommu_group->domain.

Step 3) and 4) denies iommu device from using 'iommu_dma_ops'.

This issue does exists on 6.6 because of commit f188056352bc ("iommu:
Avoid locking/unlocking for iommu_probe_device()") without which 2)
returns only after filling the domain under group->mutex, thus no issue.

With commit b5c58b2fdc42("dma-mapping: direct calls for dma-iommu"),
->iommu_dma_ops is removed altogether and iommu api are directly being
called under "use_dma_iommu(): return dev->dma_iommu". Again,
->dma_iommu flag is set only after domain allocation. So, If
there is a sufficient delay between steps 4) and 5), issue is still
exists but that seems very narrow to me.

I am thinking, setup of the domain before returning from step2) is the
way to fix this issue. Can you please help if you have any suggestions?

Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
---
 drivers/iommu/iommu.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3f1029c0825e..61fc8145f378 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -561,19 +561,43 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
 int iommu_probe_device(struct device *dev)
 {
 	const struct iommu_ops *ops;
+	struct iommu_group *group;
 	int ret;
 
 	mutex_lock(&iommu_probe_device_lock);
 	ret = __iommu_probe_device(dev, NULL);
 	mutex_unlock(&iommu_probe_device_lock);
 	if (ret)
-		return ret;
+		goto err_out;
+
+	group = iommu_group_get(dev);
+	if (!group) {
+		ret = -ENODEV;
+		goto err_release;
+	}
+
+	mutex_lock(&group->mutex);
+	if (!group->default_domain) {
+		ret = iommu_setup_default_domain(group, 0);
+		if (ret)
+			goto err_unlock;
+	}
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
 
 	ops = dev_iommu_ops(dev);
 	if (ops->probe_finalize)
 		ops->probe_finalize(dev);
 
 	return 0;
+
+err_unlock:
+	mutex_unlock(&group->mutex);
+	iommu_group_put(group);
+err_release:
+	iommu_release_device(dev);
+err_out:
+	return ret;
 }
 
 static void __iommu_group_free_device(struct iommu_group *group,
-- 
2.34.1
Re: [PATCH RFC] iommu: fix wrong DMA ops for iommu device
Posted by Charan Teja Kalla 11 months, 2 weeks ago
Just a quick ask if you have any comments for the below. BTW, in
internal tests, when we revert the commit f188056352bc ("iommu: Avoid
locking/unlocking for iommu_probe_device()", it is helping.

I can share more details If the below is not sufficient.

Thanks in advance...

On 12/13/2024 8:34 PM, Charan Teja Kalla wrote:
> Race between smmu device probe and iommu device probe is resulting into
> wrong DMA ops used for the device.
> 
> bus_iommu_probe(P1)                  of_iommu_configure(P2)
> (from iommu_device_register)        (from insmod of a driver)
> --------------------------         --------------------------
> 1) probe_iommu_group()
> (Fills just dev->iommu_group
> under iommu_probe_device_lock)
> 				2) iommu_probe_device():
> 				  (acquires the iommu_probe_device_lock,
> 				   but since dev->iommu_group is already
> 				   set, it just returns without
> 				   allocating the domain.)
> 
> 				3) of_dma_configure_id()->
> 				   arch_setup_dma_ops() gets called for
> 				   this device, but returns with the
> 				   error message:
> 				   "Failed to set up IOMMU for device;
> 				    retaining platform DMA ops"
> 
> 				4) device probe is succeeded where it
> 				   can now setup iommu mappings using
> 				   wrong ->dma_ops
> 
> 5) domain will be allocated later
> and fills iommu_group->domain.
> 
> Step 3) and 4) denies iommu device from using 'iommu_dma_ops'.
> 
> This issue does exists on 6.6 because of commit f188056352bc ("iommu:
> Avoid locking/unlocking for iommu_probe_device()") without which 2)
> returns only after filling the domain under group->mutex, thus no issue.
> 
> With commit b5c58b2fdc42("dma-mapping: direct calls for dma-iommu"),
> ->iommu_dma_ops is removed altogether and iommu api are directly being
> called under "use_dma_iommu(): return dev->dma_iommu". Again,
> ->dma_iommu flag is set only after domain allocation. So, If
> there is a sufficient delay between steps 4) and 5), issue is still
> exists but that seems very narrow to me.
> 
> I am thinking, setup of the domain before returning from step2) is the
> way to fix this issue. Can you please help if you have any suggestions?
> 
> Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
> ---
>  drivers/iommu/iommu.c | 26 +++++++++++++++++++++++++-
>  1 file changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 3f1029c0825e..61fc8145f378 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -561,19 +561,43 @@ static int __iommu_probe_device(struct device *dev, struct list_head *group_list
>  int iommu_probe_device(struct device *dev)
>  {
>  	const struct iommu_ops *ops;
> +	struct iommu_group *group;
>  	int ret;
>  
>  	mutex_lock(&iommu_probe_device_lock);
>  	ret = __iommu_probe_device(dev, NULL);
>  	mutex_unlock(&iommu_probe_device_lock);
>  	if (ret)
> -		return ret;
> +		goto err_out;
> +
> +	group = iommu_group_get(dev);
> +	if (!group) {
> +		ret = -ENODEV;
> +		goto err_release;
> +	}
> +
> +	mutex_lock(&group->mutex);
> +	if (!group->default_domain) {
> +		ret = iommu_setup_default_domain(group, 0);
> +		if (ret)
> +			goto err_unlock;
> +	}
> +	mutex_unlock(&group->mutex);
> +	iommu_group_put(group);
>  
>  	ops = dev_iommu_ops(dev);
>  	if (ops->probe_finalize)
>  		ops->probe_finalize(dev);
>  
>  	return 0;
> +
> +err_unlock:
> +	mutex_unlock(&group->mutex);
> +	iommu_group_put(group);
> +err_release:
> +	iommu_release_device(dev);
> +err_out:
> +	return ret;
>  }
>  
>  static void __iommu_group_free_device(struct iommu_group *group,
Re: [PATCH RFC] iommu: fix wrong DMA ops for iommu device
Posted by Will Deacon 11 months, 2 weeks ago
(Please try not to top-post as it makes the thread hard to follow)

On Thu, Jan 02, 2025 at 04:09:46PM +0530, Charan Teja Kalla wrote:
> On 12/13/2024 8:34 PM, Charan Teja Kalla wrote:
> > Race between smmu device probe and iommu device probe is resulting into
> > wrong DMA ops used for the device.
> > 
> > bus_iommu_probe(P1)                  of_iommu_configure(P2)
> > (from iommu_device_register)        (from insmod of a driver)
> > --------------------------         --------------------------
> > 1) probe_iommu_group()
> > (Fills just dev->iommu_group
> > under iommu_probe_device_lock)
> > 				2) iommu_probe_device():
> > 				  (acquires the iommu_probe_device_lock,
> > 				   but since dev->iommu_group is already
> > 				   set, it just returns without
> > 				   allocating the domain.)
> > 
> > 				3) of_dma_configure_id()->
> > 				   arch_setup_dma_ops() gets called for
> > 				   this device, but returns with the
> > 				   error message:
> > 				   "Failed to set up IOMMU for device;
> > 				    retaining platform DMA ops"
> > 
> > 				4) device probe is succeeded where it
> > 				   can now setup iommu mappings using
> > 				   wrong ->dma_ops
> > 
> > 5) domain will be allocated later
> > and fills iommu_group->domain.
> > 
> > Step 3) and 4) denies iommu device from using 'iommu_dma_ops'.
> > 
> > This issue does exists on 6.6 because of commit f188056352bc ("iommu:
> > Avoid locking/unlocking for iommu_probe_device()") without which 2)
> > returns only after filling the domain under group->mutex, thus no issue.
> > 
> > With commit b5c58b2fdc42("dma-mapping: direct calls for dma-iommu"),
> > ->iommu_dma_ops is removed altogether and iommu api are directly being
> > called under "use_dma_iommu(): return dev->dma_iommu". Again,
> > ->dma_iommu flag is set only after domain allocation. So, If
> > there is a sufficient delay between steps 4) and 5), issue is still
> > exists but that seems very narrow to me.
> > 
> > I am thinking, setup of the domain before returning from step2) is the
> > way to fix this issue. Can you please help if you have any suggestions?
> > 
> > Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
> > ---
> >  drivers/iommu/iommu.c | 26 +++++++++++++++++++++++++-
> >  1 file changed, 25 insertions(+), 1 deletion(-)
>
> Just a quick ask if you have any comments for the below. BTW, in
> internal tests, when we revert the commit f188056352bc ("iommu: Avoid
> locking/unlocking for iommu_probe_device()", it is helping.
> 
> I can share more details If the below is not sufficient.
> 
> Thanks in advance...

Can you reproduce the issue on 6.13-rc5? We bodged some of the probe
code recently:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=229e6ee43d2a160a1592b83aad620d6027084aad

and a better set of fixes are queued in -next.

Will
Re: [PATCH RFC] iommu: fix wrong DMA ops for iommu device
Posted by Charan Teja Kalla 11 months, 1 week ago
Thanks Will for the response!!

On 1/3/2025 9:04 PM, Will Deacon wrote:
> (Please try not to top-post as it makes the thread hard to follow)
> 
> On Thu, Jan 02, 2025 at 04:09:46PM +0530, Charan Teja Kalla wrote:
>> On 12/13/2024 8:34 PM, Charan Teja Kalla wrote:
>>> Race between smmu device probe and iommu device probe is resulting into
>>> wrong DMA ops used for the device.
>>>
>>> bus_iommu_probe(P1)                  of_iommu_configure(P2)
>>> (from iommu_device_register)        (from insmod of a driver)
>>> --------------------------         --------------------------
>>> 1) probe_iommu_group()
>>> (Fills just dev->iommu_group
>>> under iommu_probe_device_lock)
>>> 				2) iommu_probe_device():
>>> 				  (acquires the iommu_probe_device_lock,
>>> 				   but since dev->iommu_group is already
>>> 				   set, it just returns without
>>> 				   allocating the domain.)
>>>
>>> 				3) of_dma_configure_id()->
>>> 				   arch_setup_dma_ops() gets called for
>>> 				   this device, but returns with the
>>> 				   error message:
>>> 				   "Failed to set up IOMMU for device;
>>> 				    retaining platform DMA ops"
>>>
>>> 				4) device probe is succeeded where it
>>> 				   can now setup iommu mappings using
>>> 				   wrong ->dma_ops
>>>
>>> 5) domain will be allocated later
>>> and fills iommu_group->domain.
>>>
>>> Step 3) and 4) denies iommu device from using 'iommu_dma_ops'.
>>>
>>> This issue does exists on 6.6 because of commit f188056352bc ("iommu:
>>> Avoid locking/unlocking for iommu_probe_device()") without which 2)
>>> returns only after filling the domain under group->mutex, thus no issue.
>>>
>>> With commit b5c58b2fdc42("dma-mapping: direct calls for dma-iommu"),
>>> ->iommu_dma_ops is removed altogether and iommu api are directly being
>>> called under "use_dma_iommu(): return dev->dma_iommu". Again,
>>> ->dma_iommu flag is set only after domain allocation. So, If
>>> there is a sufficient delay between steps 4) and 5), issue is still
>>> exists but that seems very narrow to me.
>>>
>>> I am thinking, setup of the domain before returning from step2) is the
>>> way to fix this issue. Can you please help if you have any suggestions?
>>>
>>> Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
>>> ---
>>>  drivers/iommu/iommu.c | 26 +++++++++++++++++++++++++-
>>>  1 file changed, 25 insertions(+), 1 deletion(-)
>>
>> Just a quick ask if you have any comments for the below. BTW, in
>> internal tests, when we revert the commit f188056352bc ("iommu: Avoid
>> locking/unlocking for iommu_probe_device()", it is helping.
>>
>> I can share more details If the below is not sufficient.
>>
>> Thanks in advance...
> 
> Can you reproduce the issue on 6.13-rc5? We bodged some of the probe
> code recently:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=229e6ee43d2a160a1592b83aad620d6027084aad
> 
> and a better set of fixes are queued in -next.

Yes, we did try a better patch from Robin[1] and the issue is still
reproducible.

IIUC, these patches are trying to get valid smmu device, during the
probe of an smmu client device, by traversing proper 'klist_devices'
happening from iommu_init_device(). Please CMIW.

But this bug makes iommu_init_device() to completely gets skipped.

static int __iommu_probe_device(struct device *dev, ...)
{
	/* Device is probed already if in a group */
        if (dev->iommu_group)
                return 0;  --> driver call returns from here.

        ret = iommu_init_device(dev, ops);
        if (ret)
                return ret;


}

Regarding the testing, we work on Android that is on LTS kernels, so, it
may not be possible for us to try the thing on 6.13 and this seems a bug
on LTS 6.6 kernel.

[1]
https://lore.kernel.org/all/0952ca36-c5d9-462a-ab7b-b97154c56919@arm.com/

Thanks
Re: [PATCH RFC] iommu: fix wrong DMA ops for iommu device
Posted by Will Deacon 11 months, 1 week ago
On Mon, Jan 06, 2025 at 06:46:08PM +0530, Charan Teja Kalla wrote:
> On 1/3/2025 9:04 PM, Will Deacon wrote:
> > On Thu, Jan 02, 2025 at 04:09:46PM +0530, Charan Teja Kalla wrote:
> >> On 12/13/2024 8:34 PM, Charan Teja Kalla wrote:
> >>> Race between smmu device probe and iommu device probe is resulting into
> >>> wrong DMA ops used for the device.
> >>>
> >>> bus_iommu_probe(P1)                  of_iommu_configure(P2)
> >>> (from iommu_device_register)        (from insmod of a driver)
> >>> --------------------------         --------------------------
> >>> 1) probe_iommu_group()
> >>> (Fills just dev->iommu_group
> >>> under iommu_probe_device_lock)
> >>> 				2) iommu_probe_device():
> >>> 				  (acquires the iommu_probe_device_lock,
> >>> 				   but since dev->iommu_group is already
> >>> 				   set, it just returns without
> >>> 				   allocating the domain.)
> >>>
> >>> 				3) of_dma_configure_id()->
> >>> 				   arch_setup_dma_ops() gets called for
> >>> 				   this device, but returns with the
> >>> 				   error message:
> >>> 				   "Failed to set up IOMMU for device;
> >>> 				    retaining platform DMA ops"
> >>>
> >>> 				4) device probe is succeeded where it
> >>> 				   can now setup iommu mappings using
> >>> 				   wrong ->dma_ops
> >>>
> >>> 5) domain will be allocated later
> >>> and fills iommu_group->domain.
> >>>
> >>> Step 3) and 4) denies iommu device from using 'iommu_dma_ops'.
> >>>
> >>> This issue does exists on 6.6 because of commit f188056352bc ("iommu:
> >>> Avoid locking/unlocking for iommu_probe_device()") without which 2)
> >>> returns only after filling the domain under group->mutex, thus no issue.
> >>>
> >>> With commit b5c58b2fdc42("dma-mapping: direct calls for dma-iommu"),
> >>> ->iommu_dma_ops is removed altogether and iommu api are directly being
> >>> called under "use_dma_iommu(): return dev->dma_iommu". Again,
> >>> ->dma_iommu flag is set only after domain allocation. So, If
> >>> there is a sufficient delay between steps 4) and 5), issue is still
> >>> exists but that seems very narrow to me.
> >>>
> >>> I am thinking, setup of the domain before returning from step2) is the
> >>> way to fix this issue. Can you please help if you have any suggestions?
> >>>
> >>> Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
> >>> ---
> >>>  drivers/iommu/iommu.c | 26 +++++++++++++++++++++++++-
> >>>  1 file changed, 25 insertions(+), 1 deletion(-)
> >>
> >> Just a quick ask if you have any comments for the below. BTW, in
> >> internal tests, when we revert the commit f188056352bc ("iommu: Avoid
> >> locking/unlocking for iommu_probe_device()", it is helping.
> >>
> >> I can share more details If the below is not sufficient.
> >>
> >> Thanks in advance...
> > 
> > Can you reproduce the issue on 6.13-rc5? We bodged some of the probe
> > code recently:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=229e6ee43d2a160a1592b83aad620d6027084aad
> > 
> > and a better set of fixes are queued in -next.
> 
> Yes, we did try a better patch from Robin[1] and the issue is still
> reproducible.

Did you try the -EPROBE_DEFER hack that got merged?

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=229e6ee43d2a160a1592b83aad620d6027084aad

Otherwise, you can try the stack queued for 6.14 in iommu/linux.git:

46b3df8eb9bd iommu: Manage driver probe deferral better
fcbd62156742 iommu/arm-smmu-v3: Clean up more on probe failure
97cb1fa02726 iommu/arm-smmu: Retire probe deferral workaround
7d835134d4e1 iommu/arm-smmu: Make instance lookup robust

> IIUC, these patches are trying to get valid smmu device, during the
> probe of an smmu client device, by traversing proper 'klist_devices'
> happening from iommu_init_device(). Please CMIW.

Well, they're fixing a race between the IOMMU probing and its client
devices attaching.

> But this bug makes iommu_init_device() to completely gets skipped.
> 
> static int __iommu_probe_device(struct device *dev, ...)
> {
> 	/* Device is probed already if in a group */
>         if (dev->iommu_group)
>                 return 0;  --> driver call returns from here.
> 
>         ret = iommu_init_device(dev, ops);
>         if (ret)
>                 return ret;
> 
> 
> }
> 
> Regarding the testing, we work on Android that is on LTS kernels, so, it
> may not be possible for us to try the thing on 6.13 and this seems a bug
> on LTS 6.6 kernel.

We need to fix 6.13 before we fix 6.6 unless you can show that 6.13 is
unaffected (in which case, we can backport the fix(es)).

Will
Re: [PATCH RFC] iommu: fix wrong DMA ops for iommu device
Posted by Robin Murphy 11 months, 1 week ago
On 07/01/2025 11:31 am, Will Deacon wrote:
> On Mon, Jan 06, 2025 at 06:46:08PM +0530, Charan Teja Kalla wrote:
>> On 1/3/2025 9:04 PM, Will Deacon wrote:
>>> On Thu, Jan 02, 2025 at 04:09:46PM +0530, Charan Teja Kalla wrote:
>>>> On 12/13/2024 8:34 PM, Charan Teja Kalla wrote:
>>>>> Race between smmu device probe and iommu device probe is resulting into
>>>>> wrong DMA ops used for the device.
>>>>>
>>>>> bus_iommu_probe(P1)                  of_iommu_configure(P2)
>>>>> (from iommu_device_register)        (from insmod of a driver)
>>>>> --------------------------         --------------------------
>>>>> 1) probe_iommu_group()
>>>>> (Fills just dev->iommu_group
>>>>> under iommu_probe_device_lock)
>>>>> 				2) iommu_probe_device():
>>>>> 				  (acquires the iommu_probe_device_lock,
>>>>> 				   but since dev->iommu_group is already
>>>>> 				   set, it just returns without
>>>>> 				   allocating the domain.)
>>>>>
>>>>> 				3) of_dma_configure_id()->
>>>>> 				   arch_setup_dma_ops() gets called for
>>>>> 				   this device, but returns with the
>>>>> 				   error message:
>>>>> 				   "Failed to set up IOMMU for device;
>>>>> 				    retaining platform DMA ops"
>>>>>
>>>>> 				4) device probe is succeeded where it
>>>>> 				   can now setup iommu mappings using
>>>>> 				   wrong ->dma_ops
>>>>>
>>>>> 5) domain will be allocated later
>>>>> and fills iommu_group->domain.
>>>>>
>>>>> Step 3) and 4) denies iommu device from using 'iommu_dma_ops'.
>>>>>
>>>>> This issue does exists on 6.6 because of commit f188056352bc ("iommu:
>>>>> Avoid locking/unlocking for iommu_probe_device()") without which 2)
>>>>> returns only after filling the domain under group->mutex, thus no issue.
>>>>>
>>>>> With commit b5c58b2fdc42("dma-mapping: direct calls for dma-iommu"),
>>>>> ->iommu_dma_ops is removed altogether and iommu api are directly being
>>>>> called under "use_dma_iommu(): return dev->dma_iommu". Again,
>>>>> ->dma_iommu flag is set only after domain allocation. So, If
>>>>> there is a sufficient delay between steps 4) and 5), issue is still
>>>>> exists but that seems very narrow to me.
>>>>>
>>>>> I am thinking, setup of the domain before returning from step2) is the
>>>>> way to fix this issue. Can you please help if you have any suggestions?
>>>>>
>>>>> Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
>>>>> ---
>>>>>   drivers/iommu/iommu.c | 26 +++++++++++++++++++++++++-
>>>>>   1 file changed, 25 insertions(+), 1 deletion(-)
>>>>
>>>> Just a quick ask if you have any comments for the below. BTW, in
>>>> internal tests, when we revert the commit f188056352bc ("iommu: Avoid
>>>> locking/unlocking for iommu_probe_device()", it is helping.
>>>>
>>>> I can share more details If the below is not sufficient.
>>>>
>>>> Thanks in advance...
>>>
>>> Can you reproduce the issue on 6.13-rc5? We bodged some of the probe
>>> code recently:
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=229e6ee43d2a160a1592b83aad620d6027084aad
>>>
>>> and a better set of fixes are queued in -next.
>>
>> Yes, we did try a better patch from Robin[1] and the issue is still
>> reproducible.
> 
> Did you try the -EPROBE_DEFER hack that got merged?
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=229e6ee43d2a160a1592b83aad620d6027084aad
> 
> Otherwise, you can try the stack queued for 6.14 in iommu/linux.git:
> 
> 46b3df8eb9bd iommu: Manage driver probe deferral better
> fcbd62156742 iommu/arm-smmu-v3: Clean up more on probe failure
> 97cb1fa02726 iommu/arm-smmu: Retire probe deferral workaround
> 7d835134d4e1 iommu/arm-smmu: Make instance lookup robust
> 
>> IIUC, these patches are trying to get valid smmu device, during the
>> probe of an smmu client device, by traversing proper 'klist_devices'
>> happening from iommu_init_device(). Please CMIW.
> 
> Well, they're fixing a race between the IOMMU probing and its client
> devices attaching.
> 
>> But this bug makes iommu_init_device() to completely gets skipped.
>>
>> static int __iommu_probe_device(struct device *dev, ...)
>> {
>> 	/* Device is probed already if in a group */
>>          if (dev->iommu_group)
>>                  return 0;  --> driver call returns from here.
>>
>>          ret = iommu_init_device(dev, ops);
>>          if (ret)
>>                  return ret;
>>
>>
>> }
>>
>> Regarding the testing, we work on Android that is on LTS kernels, so, it
>> may not be possible for us to try the thing on 6.13 and this seems a bug
>> on LTS 6.6 kernel.
> 
> We need to fix 6.13 before we fix 6.6 unless you can show that 6.13 is
> unaffected (in which case, we can backport the fix(es)).

Certainly the reasoning can't apply to mainline as given, since 
arch_setup_dma_ops() stopped touching IOMMU stuff at all back in 6.10 
(and indeed iommu_dma_ops itself no longer exists since 6.12).

Robin.
Re: [PATCH RFC] iommu: fix wrong DMA ops for iommu device
Posted by Charan Teja Kalla 11 months, 1 week ago
Thanks Will and Robin for the response!!

On 1/8/2025 6:07 PM, Robin Murphy wrote:
> On 07/01/2025 11:31 am, Will Deacon wrote:
>> On Mon, Jan 06, 2025 at 06:46:08PM +0530, Charan Teja Kalla wrote:
>>> On 1/3/2025 9:04 PM, Will Deacon wrote:
>>>> On Thu, Jan 02, 2025 at 04:09:46PM +0530, Charan Teja Kalla wrote:
>>>>> On 12/13/2024 8:34 PM, Charan Teja Kalla wrote:
>>>>>> Race between smmu device probe and iommu device probe is resulting
>>>>>> into
>>>>>> wrong DMA ops used for the device.
>>>>>>
>>>>>> bus_iommu_probe(P1)                  of_iommu_configure(P2)
>>>>>> (from iommu_device_register)        (from insmod of a driver)
>>>>>> --------------------------         --------------------------
>>>>>> 1) probe_iommu_group()
>>>>>> (Fills just dev->iommu_group
>>>>>> under iommu_probe_device_lock)
>>>>>>                 2) iommu_probe_device():
>>>>>>                   (acquires the iommu_probe_device_lock,
>>>>>>                    but since dev->iommu_group is already
>>>>>>                    set, it just returns without
>>>>>>                    allocating the domain.)
>>>>>>
>>>>>>                 3) of_dma_configure_id()->
>>>>>>                    arch_setup_dma_ops() gets called for
>>>>>>                    this device, but returns with the
>>>>>>                    error message:
>>>>>>                    "Failed to set up IOMMU for device;
>>>>>>                     retaining platform DMA ops"
>>>>>>
>>>>>>                 4) device probe is succeeded where it
>>>>>>                    can now setup iommu mappings using
>>>>>>                    wrong ->dma_ops
>>>>>>
>>>>>> 5) domain will be allocated later
>>>>>> and fills iommu_group->domain.
>>>>>>
>>>>>> Step 3) and 4) denies iommu device from using 'iommu_dma_ops'.
>>>>>>
>>>>>> This issue does exists on 6.6 because of commit f188056352bc ("iommu:
>>>>>> Avoid locking/unlocking for iommu_probe_device()") without which 2)
>>>>>> returns only after filling the domain under group->mutex, thus no
>>>>>> issue.
>>>>>>
>>>>>> With commit b5c58b2fdc42("dma-mapping: direct calls for dma-iommu"),
>>>>>> ->iommu_dma_ops is removed altogether and iommu api are directly
>>>>>> being
>>>>>> called under "use_dma_iommu(): return dev->dma_iommu". Again,
>>>>>> ->dma_iommu flag is set only after domain allocation. So, If
>>>>>> there is a sufficient delay between steps 4) and 5), issue is still
>>>>>> exists but that seems very narrow to me.
>>>>>>
>>>>>> I am thinking, setup of the domain before returning from step2) is
>>>>>> the
>>>>>> way to fix this issue. Can you please help if you have any
>>>>>> suggestions?
>>>>>>
>>>>>> Signed-off-by: Charan Teja Kalla <quic_charante@quicinc.com>
>>>>>> ---
>>>>>>   drivers/iommu/iommu.c | 26 +++++++++++++++++++++++++-
>>>>>>   1 file changed, 25 insertions(+), 1 deletion(-)
>>>>>
>>>>> Just a quick ask if you have any comments for the below. BTW, in
>>>>> internal tests, when we revert the commit f188056352bc ("iommu: Avoid
>>>>> locking/unlocking for iommu_probe_device()", it is helping.
>>>>>
>>>>> I can share more details If the below is not sufficient.
>>>>>
>>>>> Thanks in advance...
>>>>
>>>> Can you reproduce the issue on 6.13-rc5? We bodged some of the probe
>>>> code recently:
>>>>
>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
>>>> commit/?id=229e6ee43d2a160a1592b83aad620d6027084aad
>>>>
>>>> and a better set of fixes are queued in -next.
>>>
>>> Yes, we did try a better patch from Robin[1] and the issue is still
>>> reproducible.
>>
>> Did you try the -EPROBE_DEFER hack that got merged?
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
>> commit/?id=229e6ee43d2a160a1592b83aad620d6027084aad
>>
@Will,

We just tested and the issue is still reported with just the mentioned
hack that got merged.

May be I will try to convince again with more details about why this is
a race and why the -EPROBE_DEFER hack you mentioned will not help.

S: SMMU device probe in bus_iommu_probe().
C: SMMU client device probe in of_iommu_configure().

1) S: In probe_iommu_group():
	Fills the client device ->iommu_group under  iommu_probe_device_lock.
But iommu_group->domain is not yet allocated because it is a 2 step
process from probe_iommu_group.

2) C: iommu_probe_device() --> __iommu_probe_device():
	Acquires the iommu_probe_device_lock, and since dev->iommu_group is
already set, it just returns without allocating the domain.

	static int __iommu_probe_device(struct device *dev, ...)
	{
		/* Device is probed already if in a group */
        if (dev->iommu_group)
                return 0;  --> __driver call returns from here.__

        ret = iommu_init_device(dev, ops);
        if (ret)
                return ret;
	}
The change you mentioned will come into play iff it enters
iommu_init_device(), where it returns -EPROBE_DEFER till smmu driver
bound happens. __Here we didn't even enter into this function__.

3) C: Continues probing and calls
of_dma_configure_id()->arch_setup_dma_ops()->iommu_setup_dma_ops().
Since ->domain is not yet allocated, it just returns without setting
'iommu_dma_ops' and with a warn message of 'Failed to set up IOMMU for
device' for this client device. And also the client probe is
successfully completed.
	void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit) {
	if (!domain)
		goto out_err;
			
	if (iommu_is_dma_domain(domain)) {
		.....
	dev->dma_ops = &iommu_dma_ops;<-- This step will not be executed.
	}
			
out_err:
         pr_warn("Failed to set up IOMMU for device %s; retaining
platform DMA ops\n",
                 dev_name(dev));			
	}

4) S: domain will be allocated later in the second step and fills
iommu_group->domain.


Since, the SMMU client device probe is already successfully completed
with invalid dma_ops and when client uses the dma api can lead to issues.

>> Otherwise, you can try the stack queued for 6.14 in iommu/linux.git:
>>
>> 46b3df8eb9bd iommu: Manage driver probe deferral better
>> fcbd62156742 iommu/arm-smmu-v3: Clean up more on probe failure
>> 97cb1fa02726 iommu/arm-smmu: Retire probe deferral workaround
>> 7d835134d4e1 iommu/arm-smmu: Make instance lookup robust
>>
>>> IIUC, these patches are trying to get valid smmu device, during the
>>> probe of an smmu client device, by traversing proper 'klist_devices'
>>> happening from iommu_init_device(). Please CMIW.
>>
>> Well, they're fixing a race between the IOMMU probing and its client
>> devices attaching.
>>
>>> But this bug makes iommu_init_device() to completely gets skipped.
>>>
>>> static int __iommu_probe_device(struct device *dev, ...)
>>> {
>>>     /* Device is probed already if in a group */
>>>          if (dev->iommu_group)
>>>                  return 0;  --> driver call returns from here.
>>>
>>>          ret = iommu_init_device(dev, ops);
>>>          if (ret)
>>>                  return ret;
>>>
>>>
>>> }
>>>
>>> Regarding the testing, we work on Android that is on LTS kernels, so, it
>>> may not be possible for us to try the thing on 6.13 and this seems a bug
>>> on LTS 6.6 kernel.
>>
>> We need to fix 6.13 before we fix 6.6 unless you can show that 6.13 is
>> unaffected (in which case, we can backport the fix(es)).
> 
> Certainly the reasoning can't apply to mainline as given, since
> arch_setup_dma_ops() stopped touching IOMMU stuff at all back in 6.10
> (and indeed iommu_dma_ops itself no longer exists since 6.12).
> 

@Robin

Agree that we don't have iommu_dma_ops but I can say that same race
still exists. Although dma_ops is not used, but the decision to call
into the iommu api's is determined by the 'dev->dma_iommu' flag, which
again, is set after domain is allocated for a device.

In the same race mentioned above,
1) S: Domain is not allocated but the dev->iommu_group.
2) C: Just returns as dev->iommu_group is filled.
3) C: Continues probing and succeeds.
4) C: Calls dma_alloc/map/.... But, it won't enter into iommu_ api's
because the 'dev->dma_iommu' is still 'false'.
5) S: Domain is allocated and sets the 'dev->dma_iommu' to 'true'.

4) above is the problematic step. Although issue exists but seems to me
that very narrow to get triggered. Please CMIW.

> Robin.

Thanks.

Re: [PATCH RFC] iommu: fix wrong DMA ops for iommu device
Posted by Robin Murphy 11 months ago
On 10/01/2025 1:29 pm, Charan Teja Kalla wrote:
[...]
>>> We need to fix 6.13 before we fix 6.6 unless you can show that 6.13 is
>>> unaffected (in which case, we can backport the fix(es)).
>>
>> Certainly the reasoning can't apply to mainline as given, since
>> arch_setup_dma_ops() stopped touching IOMMU stuff at all back in 6.10
>> (and indeed iommu_dma_ops itself no longer exists since 6.12).
>>
> 
> @Robin
> 
> Agree that we don't have iommu_dma_ops but I can say that same race
> still exists. Although dma_ops is not used, but the decision to call
> into the iommu api's is determined by the 'dev->dma_iommu' flag, which
> again, is set after domain is allocated for a device.
> 
> In the same race mentioned above,
> 1) S: Domain is not allocated but the dev->iommu_group.
> 2) C: Just returns as dev->iommu_group is filled.
> 3) C: Continues probing and succeeds.
> 4) C: Calls dma_alloc/map/.... But, it won't enter into iommu_ api's
> because the 'dev->dma_iommu' is still 'false'.
> 5) S: Domain is allocated and sets the 'dev->dma_iommu' to 'true'.
> 
> 4) above is the problematic step. Although issue exists but seems to me
> that very narrow to get triggered. Please CMIW.

Hmm, yes, I guess there is a fundamental race where async client driver
probe can observe a partially initialised group while the IOMMU driver
itself is still running iommu_device_register()->bus_iommu_probe()...
And in that case, this patch is wrong for two reasons: firstly, bodging
the iommu_probe_device() call (which really should not exist at all)
does not cover all cases; and secondly, forcibly creating the default
domain in that path before we know bus_iommu_probe() has seen all other
existing devices means iommu_get_default_domain_type() may miss their
requirements and thus defeat the whole point of deferred allocation in
the first place.

Having looked a bit closer, I think a more robust solution for now is
probably as below.

Thanks,
Robin.

----->8-----
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index d1af0547f553..8d90d196e38d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3120,6 +3120,11 @@ int iommu_device_use_default_domain(struct device *dev)
  		return 0;
  
  	mutex_lock(&group->mutex);
+	/* We may race against bus_iommu_probe() finalising groups here */
+	if (!group->default_domain) {
+		ret = -EPROBE_DEFER;
+		goto unlock_out;
+	}
  	if (group->owner_cnt) {
  		if (group->domain != group->default_domain || group->owner ||
  		    !xa_empty(&group->pasid_array)) {
Re: [PATCH RFC] iommu: fix wrong DMA ops for iommu device
Posted by Charan Teja Kalla 10 months, 4 weeks ago
Hi Robin,

On 1/13/2025 10:55 PM, Robin Murphy wrote:
>> @Robin
>>
>> Agree that we don't have iommu_dma_ops but I can say that same race
>> still exists. Although dma_ops is not used, but the decision to call
>> into the iommu api's is determined by the 'dev->dma_iommu' flag, which
>> again, is set after domain is allocated for a device.
>>
>> In the same race mentioned above,
>> 1) S: Domain is not allocated but the dev->iommu_group.
>> 2) C: Just returns as dev->iommu_group is filled.
>> 3) C: Continues probing and succeeds.
>> 4) C: Calls dma_alloc/map/.... But, it won't enter into iommu_ api's
>> because the 'dev->dma_iommu' is still 'false'.
>> 5) S: Domain is allocated and sets the 'dev->dma_iommu' to 'true'.
>>
>> 4) above is the problematic step. Although issue exists but seems to me
>> that very narrow to get triggered. Please CMIW.
> 
> Hmm, yes, I guess there is a fundamental race where async client driver
> probe can observe a partially initialised group while the IOMMU driver
> itself is still running iommu_device_register()->bus_iommu_probe()...
> And in that case, this patch is wrong for two reasons: firstly, bodging
> the iommu_probe_device() call (which really should not exist at all)
> does not cover all cases; and secondly, forcibly creating the default
> domain in that path before we know bus_iommu_probe() has seen all other
> existing devices means iommu_get_default_domain_type() may miss their
> requirements and thus defeat the whole point of deferred allocation in
> the first place.
> 
Got it. Thanks.
> Having looked a bit closer, I think a more robust solution for now is
> probably as below.
> 
Just want to hear of other solutions you have in mind, when you say this
solution is for now....
> Thanks,
> Robin.
> 
> ----->8-----
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index d1af0547f553..8d90d196e38d 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -3120,6 +3120,11 @@ int iommu_device_use_default_domain(struct device
> *dev)
>          return 0;
>  
>      mutex_lock(&group->mutex);
> +    /* We may race against bus_iommu_probe() finalising groups here */
> +    if (!group->default_domain) {
> +        ret = -EPROBE_DEFER;
> +        goto unlock_out;
> +    }

Please lmk If I can submit this as V2 patch, with your name as Suggested-by?

>      if (group->owner_cnt) {
>          if (group->domain != group->default_domain || group->owner ||
>              !xa_empty(&group->pasid_array)) {

Re: [PATCH RFC] iommu: fix wrong DMA ops for iommu device
Posted by Charan Teja Kalla 10 months, 3 weeks ago

On 1/20/2025 5:45 PM, Charan Teja Kalla wrote:
>> ----->8-----
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index d1af0547f553..8d90d196e38d 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -3120,6 +3120,11 @@ int iommu_device_use_default_domain(struct device
>> *dev)
>>          return 0;
>>  
>>      mutex_lock(&group->mutex);
>> +    /* We may race against bus_iommu_probe() finalising groups here */
>> +    if (!group->default_domain) {
>> +        ret = -EPROBE_DEFER;
>> +        goto unlock_out;
>> +    }
> Please lmk If I can submit this as V2 patch, with your name as Suggested-by?
I missed to mention that this patch is fixing the problem we are
seeing..Thanks.
Re: [PATCH RFC] iommu: fix wrong DMA ops for iommu device
Posted by Robin Murphy 10 months, 3 weeks ago
On 2025-01-22 5:39 am, Charan Teja Kalla wrote:
> 
> 
> On 1/20/2025 5:45 PM, Charan Teja Kalla wrote:
>>> ----->8-----
>>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>>> index d1af0547f553..8d90d196e38d 100644
>>> --- a/drivers/iommu/iommu.c
>>> +++ b/drivers/iommu/iommu.c
>>> @@ -3120,6 +3120,11 @@ int iommu_device_use_default_domain(struct device
>>> *dev)
>>>           return 0;
>>>   
>>>       mutex_lock(&group->mutex);
>>> +    /* We may race against bus_iommu_probe() finalising groups here */
>>> +    if (!group->default_domain) {
>>> +        ret = -EPROBE_DEFER;
>>> +        goto unlock_out;
>>> +    }
>> Please lmk If I can submit this as V2 patch, with your name as Suggested-by?
> I missed to mention that this patch is fixing the problem we are
> seeing..Thanks.

Ah, that's good. I did spend most of an afternoon digging through git
history to write up this patch the other day, and was planning to send
it properly soon (probably after the merge window now) - as noted, it
also turns out not to be quite right for 6.6LTS as-is, so I figured I'd
save worrying about backports until it lands.

Thanks,
Robin.

----->8-----
Subject: [PATCH] iommu: Handle race with default domain setup

It turns out that deferred default domain creation leaves a subtle
race window during iommu_device_register() wherein a client driver may
asynchronously probe in parallel and get as far as performing DMA API
operations with dma-direct, only to be switched to iommu-dma underfoot
once the default domain attachment finally happens, with obviously
disastrous consequences. Even the wonky of_iommu_configure() path is at
risk, since iommu_fwspec_init() will no longer defer client probe as the
instance ops are (necessarily) already registered, and the "replay"
iommu_probe_device() call can see dev->iommu_group already set and so
think there's nothing to do either.

Fortunately we already have the right tool in the right place in the
form of iommu_device_use_default_domain(), which just needs to ensure
that said default domain is actually ready to *be* used. Deferring the
client probe shouldn't have too much impact, given that this only
happens while the IOMMU driver is probing, and thus due to kick the
deferred probe list again once it finishes.

Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
Fixes: 98ac73f99bc4 ("iommu: Require a default_domain for all iommu drivers")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>

---

Note this fixes tag is rather nuanced - historically there was a more
general issue before deac0b3bed26 ("iommu: Split off default domain
allocation from group assignment") set the basis for the current
conditions; 1ea2a07a532b ("iommu: Add DMA ownership management
interfaces") is then the point at which it becomes logical to fix the
current race this way; however only from 98ac73f99bc4 can we rely on all
drivers supporting default domains and so avoid false negatives, thus
even though this might apply to older kernels without conflict it would
not be functionally correct. Furthermore given the other locking and API
flow changes which also happened over that time, it's not clear how long
this race has actually been exposed. LTS fixes are going to be fiddly...
---
  drivers/iommu/iommu.c | 5 +++++
  1 file changed, 5 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 851fd5aeccf5..9969531ad4ed 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3097,6 +3097,11 @@ int iommu_device_use_default_domain(struct device *dev)
  		return 0;
  
  	mutex_lock(&group->mutex);
+	/* We may race against bus_iommu_probe() finalising groups here */
+	if (!group->default_domain) {
+		ret = -EPROBE_DEFER;
+		goto unlock_out;
+	}
  	if (group->owner_cnt) {
  		if (group->domain != group->default_domain || group->owner ||
  		    !xa_empty(&group->pasid_array)) {
-- 
2.39.2.101.g768bb238c484.dirty


Re: [PATCH RFC] iommu: fix wrong DMA ops for iommu device
Posted by Charan Teja Kalla 10 months, 3 weeks ago

On 1/22/2025 8:13 PM, Robin Murphy wrote:
>>> Please lmk If I can submit this as V2 patch, with your name as
>>> Suggested-by?
>> I missed to mention that this patch is fixing the problem we are
>> seeing..Thanks.
> 
> Ah, that's good. I did spend most of an afternoon digging through git
> history to write up this patch the other day, and was planning to send
> it properly soon (probably after the merge window now) - as noted, it
> also turns out not to be quite right for 6.6LTS as-is, so I figured I'd
> save worrying about backports until it lands.

Thanks a lot Robin for sending this patch with much more details!!

> 
> Thanks,
> Robin.
> 
> ----->8-----
> Subject: [PATCH] iommu: Handle race with default domain setup
> 
> It turns out that deferred default domain creation leaves a subtle
> race window during iommu_device_register() wherein a client driver may
> asynchronously probe in parallel and get as far as performing DMA API
> operations with dma-direct, only to be switched to iommu-dma underfoot
> once the default domain attachment finally happens, with obviously
> disastrous consequences. Even the wonky of_iommu_configure() path is at
> risk, since iommu_fwspec_init() will no longer defer client probe as the
> instance ops are (necessarily) already registered, and the "replay"
> iommu_probe_device() call can see dev->iommu_group already set and so
> think there's nothing to do either.
> 
> Fortunately we already have the right tool in the right place in the
> form of iommu_device_use_default_domain(), which just needs to ensure
> that said default domain is actually ready to *be* used. Deferring the
> client probe shouldn't have too much impact, given that this only
> happens while the IOMMU driver is probing, and thus due to kick the
> deferred probe list again once it finishes.
> 
> Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
> Fixes: 98ac73f99bc4 ("iommu: Require a default_domain for all iommu
> drivers")
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>

Do you think If the below tag can be added?
Closes:
https://lore.kernel.org/all/20241213150415.653821-1-quic_charante@quicinc.com/
> 
> ---
> 
> Note this fixes tag is rather nuanced - historically there was a more
> general issue before deac0b3bed26 ("iommu: Split off default domain
> allocation from group assignment") set the basis for the current
> conditions; 1ea2a07a532b ("iommu: Add DMA ownership management
> interfaces") is then the point at which it becomes logical to fix the
> current race this way; however only from 98ac73f99bc4 can we rely on all
> drivers supporting default domains and so avoid false negatives, thus
> even though this might apply to older kernels without conflict it would
> not be functionally correct. Furthermore given the other locking and API
> flow changes which also happened over that time, it's not clear how long
> this race has actually been exposed. LTS fixes are going to be fiddly...

Thanks again for sharing such a detailed history.

My understanding is that the commit f188056352bc ("iommu: Avoid
locking/unlocking for iommu_probe_device()"), also sets the basis for
the current race conditions. Without this, I think, iommu_probe_device()
would always returns with default domain and later iommu_setup_dma_ops()
fills the 'dev->dma_iommu' flag. Please CMIW..

And, as the commit f188056352bc ("iommu: Avoid locking/unlocking for
iommu_probe_device()") is present from 6.6 LTS, I am not sure if we
really need to look before 6.6 LTS kernels to fix this race.

> ---
>  drivers/iommu/iommu.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 851fd5aeccf5..9969531ad4ed 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -3097,6 +3097,11 @@ int iommu_device_use_default_domain(struct device
> *dev)
>          return 0;
>  
>      mutex_lock(&group->mutex);
> +    /* We may race against bus_iommu_probe() finalising groups here */
> +    if (!group->default_domain) {
> +        ret = -EPROBE_DEFER;
> +        goto unlock_out;
> +    }
>      if (group->owner_cnt) {
>          if (group->domain != group->default_domain || group->owner ||
>              !xa_empty(&group->pasid_array)) {

Re: [PATCH RFC] iommu: fix wrong DMA ops for iommu device
Posted by Robin Murphy 10 months, 3 weeks ago
On 2025-01-24 12:32 am, Charan Teja Kalla wrote:
> 
> 
> On 1/22/2025 8:13 PM, Robin Murphy wrote:
>>>> Please lmk If I can submit this as V2 patch, with your name as
>>>> Suggested-by?
>>> I missed to mention that this patch is fixing the problem we are
>>> seeing..Thanks.
>>
>> Ah, that's good. I did spend most of an afternoon digging through git
>> history to write up this patch the other day, and was planning to send
>> it properly soon (probably after the merge window now) - as noted, it
>> also turns out not to be quite right for 6.6LTS as-is, so I figured I'd
>> save worrying about backports until it lands.
> 
> Thanks a lot Robin for sending this patch with much more details!!
> 
>>
>> Thanks,
>> Robin.
>>
>> ----->8-----
>> Subject: [PATCH] iommu: Handle race with default domain setup
>>
>> It turns out that deferred default domain creation leaves a subtle
>> race window during iommu_device_register() wherein a client driver may
>> asynchronously probe in parallel and get as far as performing DMA API
>> operations with dma-direct, only to be switched to iommu-dma underfoot
>> once the default domain attachment finally happens, with obviously
>> disastrous consequences. Even the wonky of_iommu_configure() path is at
>> risk, since iommu_fwspec_init() will no longer defer client probe as the
>> instance ops are (necessarily) already registered, and the "replay"
>> iommu_probe_device() call can see dev->iommu_group already set and so
>> think there's nothing to do either.
>>
>> Fortunately we already have the right tool in the right place in the
>> form of iommu_device_use_default_domain(), which just needs to ensure
>> that said default domain is actually ready to *be* used. Deferring the
>> client probe shouldn't have too much impact, given that this only
>> happens while the IOMMU driver is probing, and thus due to kick the
>> deferred probe list again once it finishes.
>>
>> Reported-by: Charan Teja Kalla <quic_charante@quicinc.com>
>> Fixes: 98ac73f99bc4 ("iommu: Require a default_domain for all iommu
>> drivers")
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> 
> Do you think If the below tag can be added?
> Closes:
> https://lore.kernel.org/all/20241213150415.653821-1-quic_charante@quicinc.com/

Sure, I guess we don't have a pedantic "Will-close-once-backported:" tag 
and it'll shut checkpatch up at least :)

>> ---
>>
>> Note this fixes tag is rather nuanced - historically there was a more
>> general issue before deac0b3bed26 ("iommu: Split off default domain
>> allocation from group assignment") set the basis for the current
>> conditions; 1ea2a07a532b ("iommu: Add DMA ownership management
>> interfaces") is then the point at which it becomes logical to fix the
>> current race this way; however only from 98ac73f99bc4 can we rely on all
>> drivers supporting default domains and so avoid false negatives, thus
>> even though this might apply to older kernels without conflict it would
>> not be functionally correct. Furthermore given the other locking and API
>> flow changes which also happened over that time, it's not clear how long
>> this race has actually been exposed. LTS fixes are going to be fiddly...
> 
> Thanks again for sharing such a detailed history.
> 
> My understanding is that the commit f188056352bc ("iommu: Avoid
> locking/unlocking for iommu_probe_device()"), also sets the basis for
> the current race conditions. Without this, I think, iommu_probe_device()
> would always returns with default domain and later iommu_setup_dma_ops()
> fills the 'dev->dma_iommu' flag. Please CMIW..
> 
> And, as the commit f188056352bc ("iommu: Avoid locking/unlocking for
> iommu_probe_device()") is present from 6.6 LTS, I am not sure if we
> really need to look before 6.6 LTS kernels to fix this race.

I think you're probably right - if I get time I'll double-check and 
tweak the writeup before I resend. From a very quick look, prior to 
f188056352bc indeed it seems like we shouldn't get the DMA ops confusion 
as the consequence, but instead we're back to potentially creating the 
wrong default domain type if bus_iommu_probe() hasn't found all the 
group devices yet, which in reality is either highly unlikely or already 
very likely for other reasons irrespective of this particular race.

Thanks,
Robin.

>> ---
>>   drivers/iommu/iommu.c | 5 +++++
>>   1 file changed, 5 insertions(+)
>>
>> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 851fd5aeccf5..9969531ad4ed 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -3097,6 +3097,11 @@ int iommu_device_use_default_domain(struct device
>> *dev)
>>           return 0;
>>   
>>       mutex_lock(&group->mutex);
>> +    /* We may race against bus_iommu_probe() finalising groups here */
>> +    if (!group->default_domain) {
>> +        ret = -EPROBE_DEFER;
>> +        goto unlock_out;
>> +    }
>>       if (group->owner_cnt) {
>>           if (group->domain != group->default_domain || group->owner ||
>>               !xa_empty(&group->pasid_array)) {
>