Commit da33e87bd2bf ("iommu: Handle yet another race around
registration") introduced a readiness check in `iommu_fwspec_init()` to
prevent client drivers from configuring their IOMMUs before
`bus_iommu_probe()` has completed.
To optimize the replay path, the readiness check was conditionally
gated behind `!dev->iommu`:
if (!dev->iommu && !READ_ONCE(iommu->ready))
return -EPROBE_DEFER;
However, this assumption breaks down for devices that map to multiple
IOMMU instances. During the initialization loop over multiple IOMMUs in
`of_iommu_configure_device()`, the first IOMMU successfully allocates
`dev->iommu`. When `iommu_fwspec_init()` is called for the second
IOMMU, `!dev->iommu` evaluates to false, short-circuiting the logic and
entirely bypassing the `iommu->ready` check.
If the second IOMMU is still executing its `bus_iommu_probe()`
concurrently, this allows the client driver to proceed prematurely,
resulting in a late IOMMU probe warning:
dev: late IOMMU probe at driver bind, something fishy here!
WARNING: drivers/iommu/iommu.c:645 at __iommu_probe_device
Fix this by making the `iommu->ready` check unconditional, ensuring
that a device will defer its probe until *all* of its required IOMMUs
are fully registered and ready.
Cc: stable@vger.kernel.org
Fixes: da33e87bd2bf ("iommu: Handle yet another race around registration")
Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path")
Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
---
The warning was observed using an Android 6.19 tree, using downstream
drivers (exynos-decon and samsung-sysmmu-v9).
---
drivers/iommu/iommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 78756c3f3c40..e61927b4d41f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -3042,7 +3042,7 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode)

 	if (!iommu)
 		return driver_deferred_probe_check_state(dev);
-	if (!dev->iommu && !READ_ONCE(iommu->ready))
+	if (!READ_ONCE(iommu->ready))
 		return -EPROBE_DEFER;

 	if (fwspec)
---
base-commit: ca3bbc9287400c1274d87ee57a16e3126ba2969a
change-id: 20260320-iommu-ready-check-4976863957c2
Best regards,
--
Tudor Ambarus <tudor.ambarus@linaro.org>
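The short-circuit described in the commit message can be reproduced in a toy userspace model. This is only a sketch, not kernel code: `model_iommu`, `model_dev`, `model_fwspec_init()` and `model_configure()` are hypothetical stand-ins for the structures and functions named in the patch, and `EPROBE_DEFER` is defined locally with the kernel's value (517).

```c
#include <stdbool.h>
#include <stddef.h>

#define EPROBE_DEFER 517	/* kernel-internal value, defined here for the model */

/* Hypothetical stand-in for struct iommu_device: only the ready flag. */
struct model_iommu {
	bool ready;		/* true once registration has finished */
};

/* Hypothetical stand-in for struct device: only "is dev->iommu set?". */
struct model_dev {
	bool has_dev_iommu;
};

/* Models the gated readiness check quoted in the commit message. */
static int model_fwspec_init(struct model_dev *dev,
			     const struct model_iommu *iommu)
{
	if (!dev->has_dev_iommu && !iommu->ready)
		return -EPROBE_DEFER;

	dev->has_dev_iommu = true;	/* first success "allocates" dev->iommu */
	return 0;
}

/* Models looping over every specifier in the client's "iommus" property. */
static int model_configure(struct model_dev *dev,
			   const struct model_iommu *iommus, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		int ret = model_fwspec_init(dev, &iommus[i]);

		if (ret)
			return ret;
	}
	return 0;
}
```

In this model, calling `model_configure()` with a ready first instance and a not-yet-ready second instance returns 0, because the second readiness test is never evaluated. Swapping the order returns `-EPROBE_DEFER`.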
On 2026-03-23 1:09 pm, Tudor Ambarus wrote:
> Commit da33e87bd2bf ("iommu: Handle yet another race around
> registration") introduced a readiness check in `iommu_fwspec_init()` to
> prevent client drivers from configuring their IOMMUs before
> `bus_iommu_probe()` has completed.
>
> To optimize the replay path, the readiness check was conditionally
> gated behind `!dev->iommu`:
> if (!dev->iommu && !READ_ONCE(iommu->ready))
> return -EPROBE_DEFER;
>
> However, this assumption breaks down for devices that map to multiple
> IOMMU instances. During the initialization loop over multiple IOMMUs in
> `of_iommu_configure_device()`, the first IOMMU successfully allocates
> `dev->iommu`. When `iommu_fwspec_init()` is called for the second
> IOMMU, `!dev->iommu` evaluates to false, short-circuiting the logic and
> entirely bypassing the `iommu->ready` check.
>
> If the second IOMMU is still executing its `bus_iommu_probe()`
> concurrently, this allows the client driver to proceed prematurely,
> resulting in a late IOMMU probe warning:
> dev: late IOMMU probe at driver bind, something fishy here!
> WARNING: drivers/iommu/iommu.c:645 at __iommu_probe_device
>
> Fix this by making the `iommu->ready` check unconditional, ensuring
> that a device will defer its probe until *all* of its required IOMMUs
> are fully registered and ready.
...which is obviously wrong, since the whole point is that we *do* want
of_iommu_xlate() to succeed in the bus_iommu_probe() path before the
IOMMU driver has finished registering, because that's how we get probe
to actually happen correctly in the expected order. This change would
effectively undo the whole thing, except leaving ACPI systems without
IOMMU functionality at all - it'll only be sort-of-working for you
because DT still has the sketchy iommu_probe_device() replay in the
driver bind path.
Honestly we just need to get rid of that replay call, which is the root
of almost all of the problems, but I don't know what to do about all the
remaining of_dma_configure() abusers that are relying on it... :(
Thanks,
Robin.
> Cc: stable@vger.kernel.org
> Fixes: da33e87bd2bf ("iommu: Handle yet another race around registration")
> Fixes: bcb81ac6ae3c ("iommu: Get DT/ACPI parsing into the proper probe path")
> Signed-off-by: Tudor Ambarus <tudor.ambarus@linaro.org>
> ---
> The warning was observed using an Android 6.19 tree, using downstream
> drivers (exynos-decon and samsung-sysmmu-v9).
> ---
> drivers/iommu/iommu.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 78756c3f3c40..e61927b4d41f 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -3042,7 +3042,7 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode)
>
> if (!iommu)
> return driver_deferred_probe_check_state(dev);
> - if (!dev->iommu && !READ_ONCE(iommu->ready))
> + if (!READ_ONCE(iommu->ready))
> return -EPROBE_DEFER;
>
> if (fwspec)
>
> ---
> base-commit: ca3bbc9287400c1274d87ee57a16e3126ba2969a
> change-id: 20260320-iommu-ready-check-4976863957c2
>
> Best regards,
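Robin's objection can be illustrated with a second toy model. It assumes, as the reply implies, that the IOMMU core's own probe path already holds `dev->iommu` when it parses the firmware description, and that `ready` is only set after that probe pass. All names (`model_register()`, `model_iommu_probe_device()`, `check_kind`) are hypothetical and the control flow is heavily simplified.

```c
#include <stdbool.h>

#define EPROBE_DEFER 517	/* kernel-internal value, defined here for the model */

struct model_iommu {
	bool ready;		/* set at the very end of registration */
};

struct model_dev {
	bool has_dev_iommu;	/* models dev->iommu != NULL */
	bool probed;		/* models a successful IOMMU probe */
};

enum check_kind {
	CHECK_GATED,		/* current: !dev->iommu && !ready */
	CHECK_UNCONDITIONAL,	/* proposed: !ready */
};

static int model_fwspec_init(const struct model_dev *dev,
			     const struct model_iommu *iommu,
			     enum check_kind kind)
{
	bool skip_check = (kind == CHECK_GATED) && dev->has_dev_iommu;

	if (!skip_check && !iommu->ready)
		return -EPROBE_DEFER;
	return 0;
}

/* Assumed shape of the core probe path: dev->iommu first, parsing second. */
static int model_iommu_probe_device(struct model_dev *dev,
				    const struct model_iommu *iommu,
				    enum check_kind kind)
{
	int ret;

	dev->has_dev_iommu = true;
	ret = model_fwspec_init(dev, iommu, kind);
	if (ret) {
		dev->has_dev_iommu = false;	/* undo on failure */
		return ret;
	}
	dev->probed = true;
	return 0;
}

/* Assumed shape of registration: probe existing devices, then mark ready. */
static int model_register(struct model_iommu *iommu, struct model_dev *dev,
			  enum check_kind kind)
{
	int ret = model_iommu_probe_device(dev, iommu, kind);

	iommu->ready = true;
	return ret;
}
```

With `CHECK_GATED` the device is probed during registration. With `CHECK_UNCONDITIONAL` the same call returns `-EPROBE_DEFER` and the device is left unprobed, which is the regression the reply warns about.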
On Mon, Mar 23, 2026 at 01:09:27PM +0000, Tudor Ambarus wrote:
> Commit da33e87bd2bf ("iommu: Handle yet another race around
> registration") introduced a readiness check in `iommu_fwspec_init()` to
> prevent client drivers from configuring their IOMMUs before
> `bus_iommu_probe()` has completed.
>
> To optimize the replay path, the readiness check was conditionally
> gated behind `!dev->iommu`:
> if (!dev->iommu && !READ_ONCE(iommu->ready))
> return -EPROBE_DEFER;
>
> However, this assumption breaks down for devices that map to multiple
> IOMMU instances.
?? We don't directly support "multiple IOMMU instances". There is only
one dev->iommu.
AFAIK if some drivers need to support multiple different instances of
the same IOMMU driver they must deal with this fully internally and
present to the core a "single instance" view.
So, your explanation doesn't make sense to me. If dev->iommu is set
then the driver must be ready, including any multi-instances it has.
If it is not ready then this is really an iommu driver bug, not a core
bug?
Jason
Hi, Jason,
On 3/23/26 3:54 PM, Jason Gunthorpe wrote:
> On Mon, Mar 23, 2026 at 01:09:27PM +0000, Tudor Ambarus wrote:
>> Commit da33e87bd2bf ("iommu: Handle yet another race around
>> registration") introduced a readiness check in `iommu_fwspec_init()` to
>> prevent client drivers from configuring their IOMMUs before
>> `bus_iommu_probe()` has completed.
>>
>> To optimize the replay path, the readiness check was conditionally
>> gated behind `!dev->iommu`:
>> if (!dev->iommu && !READ_ONCE(iommu->ready))
>> return -EPROBE_DEFER;
>>
>> However, this assumption breaks down for devices that map to multiple
>> IOMMU instances.
>
> ?? We don't directly support "multiple IOMMU instances". There is only
> one dev->iommu.
>
> AFAIK if some drivers need to support multiple different instances of
> the same IOMMU driver they must deal with this fully internally and
> present to the core a "single instance" view.
Thanks for the quick answer. I may be missing a few things; I should
have marked this as an RFC. Would you please help me understand this
topic a little better?
Downstream we have a display controller that's using:
iommus = <&sysmmu_19840000>, <&sysmmu_19c40000>;
These are 2 distinct platform devices, they probe independently, they
each call iommu_device_register() independently.
If I understood you correctly, the downstream driver shall model its
architecture and call iommu_device_register() only once after both
devices are configured.
My downstream reality is different. Here's what I'm encountering:
1/ sysmmu_19840000: dev->iommu is NULL. iommu_fwspec_init() correctly
evaluates !READ_ONCE(sysmmu_19840000->ready). Assuming it is ready,
it allocates dev->iommu.
2/ dev->iommu is now NOT NULL. iommu_fwspec_init() is called for the
second physical instance.
3/ Because of the !dev->iommu gate, the evaluation of
!READ_ONCE(sysmmu_19c40000->ready) is short-circuited and skipped
entirely.
But sysmmu_19c40000 is not ready, its specific bus_iommu_probe() is
executing asynchronously on another CPU.
If the core's intent is to strictly enforce a single IOMMU instance,
shouldn't iommu_fwspec_init() be checking
fwspec->iommu_fwnode == iommu_fwnode
instead of matching the ops? Because the core currently matches on
ops, it permits aggregating multiple physical instances with the
same ops into one fwspec.
Thanks a ton!
ta
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2940,7 +2940,7 @@ int iommu_fwspec_init(struct device *dev, struct fwnode_handle *iommu_fwnode)
 		return -EPROBE_DEFER;

 	if (fwspec)
-		return iommu->ops == iommu_fwspec_ops(fwspec) ? 0 : -EINVAL;
+		return fwspec->iommu_fwnode == iommu_fwnode ? 0 : -EINVAL;

 	if (!dev_iommu_get(dev))
 		return -ENOMEM;
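The difference between the two comparisons in the diff above can be sketched in userspace. The types below (`model_ops`, `model_fwnode`, `model_iommu`, `model_fwspec`) and both helpers are hypothetical simplifications; the real fwspec does not necessarily store an ops pointer, it is included here only to make the comparison explicit.

```c
#include <errno.h>

struct model_ops {
	int placeholder;	/* contents are irrelevant, only identity matters */
};

struct model_fwnode {
	const char *name;
};

/* One registered IOMMU instance: shared driver ops, unique firmware node. */
struct model_iommu {
	const struct model_ops *ops;
	const struct model_fwnode *fwnode;
};

/* What the client's fwspec recorded from its first "iommus" specifier. */
struct model_fwspec {
	const struct model_ops *ops;
	const struct model_fwnode *iommu_fwnode;
};

/* Current behaviour: accept any instance driven by the same ops. */
static int match_by_ops(const struct model_fwspec *fwspec,
			const struct model_iommu *iommu)
{
	return iommu->ops == fwspec->ops ? 0 : -EINVAL;
}

/* Proposed behaviour: accept only the exact same firmware node. */
static int match_by_fwnode(const struct model_fwspec *fwspec,
			   const struct model_iommu *iommu)
{
	return fwspec->iommu_fwnode == iommu->fwnode ? 0 : -EINVAL;
}
```

In this model a second instance with the same ops but a different node passes `match_by_ops()` and fails `match_by_fwnode()` with `-EINVAL`, so a client listing two distinct instances would be rejected under the proposed check.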
On Mon, Mar 23, 2026 at 06:46:39PM +0200, Tudor Ambarus wrote:
> Downstream we have a display controller that's using:
> iommus = <&sysmmu_19840000>, <&sysmmu_19c40000>;
>
> These are 2 distinct platform devices, they probe independently, they
> each call iommu_device_register() independently.

Sure, I guessed that is what you meant..

Do you have an example of this in an upstream DTS file?

> If I understood you correctly, the downstream driver shall model its
> architecture and call iommu_device_register() only once after both
> devices are configured.

No.. I'm not being so prescriptive, I'm just saying that once
iommu->ops->probe_device() returns then the device is fully setup and
dev->iommu will operate all of the iommus described in iommus=<..>

probe_device() cannot return some half setup device with only some of
the iommu instances working.

We don't have any core idea of a half setup result from
probe_device() today.

> If the core's intent is to strictly enforce a single IOMMU instance,
> shouldn't iommu_fwspec_init() be checking
> fwspec->iommu_fwnode == iommu_fwnode
> instead of matching the ops? Because the core currently matches on
> ops, it permits aggregating multiple physical instances with the
> same ops into one fwspec.

The driver is responsible to handle this, not the core. It has to hide
this mess under its covers, not rely on multiple calls to of_xlate or
however it has been hacked up.

Probably it means something like of_xlate/probe_device has to
EPROBE_DEFER if all the instances listed in iommus don't exist.

Jason
Hi, Jason,

On 3/23/26 7:31 PM, Jason Gunthorpe wrote:
> On Mon, Mar 23, 2026 at 06:46:39PM +0200, Tudor Ambarus wrote:
>
>> Downstream we have a display controller that's using:
>> iommus = <&sysmmu_19840000>, <&sysmmu_19c40000>;
>>
>> These are 2 distinct platform devices, they probe independently, they
>> each call iommu_device_register() independently.
>
> Sure, I guessed that is what you meant..
>
> Do you have an example of this in an upstream DTS file?

Yes, Exynos multimedia blocks use this upstream. For example, in
arch/arm64/boot/dts/exynos/exynos5433.dtsi, the `decon` and `decon_tv`
nodes route through multiple sysmmus:
iommus = <&sysmmu_decon0x>, <&sysmmu_decon1x>;

Looking at the upstream exynos-iommu.c driver, it doesn't return
-EPROBE_DEFER if all the instances listed in iommus don't exist. It
seems it survives the race though, but only because of the
core_initcall ordering. In downstream the IOMMU is forced to be a
module, which exposes this gap.

>
>> If I understood you correctly, the downstream driver shall model its
>> architecture and call iommu_device_register() only once after both
>> devices are configured.
>
> No.. I'm not being so prescriptive, I'm just saying that once
> iommu->ops->probe_device() returns then the device is fully setup and
> dev->iommu will operate all of the iommus described in iommus=<..>
>
> probe_device() cannot return some half setup device with only some of
> the iommu instances working.
>
> We don't have any core idea of a half setup result from
> probe_device() today.
>
>> If the core's intent is to strictly enforce a single IOMMU instance,
>> shouldn't iommu_fwspec_init() be checking
>> fwspec->iommu_fwnode == iommu_fwnode
>> instead of matching the ops? Because the core currently matches on
>> ops, it permits aggregating multiple physical instances with the
>> same ops into one fwspec.
>
> The driver is responsible to handle this, not the core. It has to hide
> this mess under its covers, not rely on multiple calls to of_xlate or
> however it has been hacked up.
>
> Probably it means something like of_xlate/probe_device has to
> EPROBE_DEFER if all the instances listed in iommus don't exist.
>

I can probably track whether all instances are ready, and defer if any
is not ready, but then I'll force the iommu clients to use the sketchy
replay path, which seems like a bad idea, according to Robin's feedback.

I haven't seen functional problems with the races, just the "something
fishy" dev_WARN. Maybe we shall downgrade that to dev_info.

Thanks!
ta
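The "track whether all instances are ready, and defer if any is not" idea could look roughly like the sketch below. It is a userspace illustration only: `model_instance` and `model_check_all_instances()` are hypothetical names, no such helper exists in the Exynos driver or the IOMMU core, and the thread does not settle on this approach.

```c
#include <stdbool.h>
#include <stddef.h>

#define EPROBE_DEFER 517	/* kernel-internal value, defined here for the model */

/* Hypothetical per-instance bookkeeping kept by the IOMMU driver. */
struct model_instance {
	bool registered;	/* true once this instance finished registering */
};

/*
 * Return 0 only when every instance referenced by the client is registered.
 * Otherwise ask for the client's probe to be retried later.
 */
static int model_check_all_instances(const struct model_instance *inst,
				     size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!inst[i].registered)
			return -EPROBE_DEFER;
	}
	return 0;
}
```

A driver-side check of this kind would be called with the instances named in the client's "iommus" property; a single unregistered entry is enough to defer.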
On Thu, Apr 02, 2026 at 02:25:54PM +0300, Tudor Ambarus wrote:
> I can probably track whether all instances are ready, and defer if any
> is not ready, but then I'll force the iommu clients to use the sketchy
> replay path, which seems like a bad idea, according to Robin's feedback.

I didn't think that was sketchy, it is part of the boot ordering
system to ensure that the iommu driver(s) is probed before the client
devices.

Half operating a device is definitely going to get things into trouble
with broken/incomplete domain attachments at least.

Jason
On 2026-04-02 12:59 pm, Jason Gunthorpe wrote:
> On Thu, Apr 02, 2026 at 02:25:54PM +0300, Tudor Ambarus wrote:
>
>> I can probably track whether all instances are ready, and defer if any
>> is not ready, but then I'll force the iommu clients to use the sketchy
>> replay path, which seems like a bad idea, according to Robin's feedback.
>
> I didn't think that was sketchy, it is part of the boot ordering
> system to ensure that the iommu driver(s) is probed before the client
> devices.
>
> Half operating a device is definitely going to get things into trouble
> with broken/incomplete domain attachments at least.

The Exynos driver itself is actually fine, and doing everything right.
We'll never have a "half-configured" client device in IOMMU API terms
currently - only once both instances are registered such that both
of_xlate calls can succeed (one for each specifier in the client
device's "iommus" property) will we proceed to calling probe_device,
which will then work as normal.

The issue here is purely in the race-avoidance scheme within
of_iommu_configure() itself, which hasn't accounted for the fact that
when it's looping over multiple specifiers, they don't necessarily all
target the same IOMMU node. And it's only during a window where the
instance targeted by the first specifier happens to be registered
already, and the second is currently in the middle of registering.

Thanks,
Robin.