driver core: Don't let a device probe until it's ready

[PATCH v2] driver core: Don't let a device probe until it's ready

Posted by Douglas Anderson 2 days, 10 hours ago

The moment we link a "struct device" into the list of devices for the
bus, it's possible probe can happen. This is because another thread
can load the driver at any time and that can cause the device to
probe. This has been seen in practice with a stack crawl that looks
like this [1]:

  really_probe()
  __driver_probe_device()
  driver_probe_device()
  __driver_attach()
  bus_for_each_dev()
  driver_attach()
  bus_add_driver()
  driver_register()
  __platform_driver_register()
  init_module() [some module]
  do_one_initcall()
  do_init_module()
  load_module()
  __arm64_sys_finit_module()
  invoke_syscall()

As a result of the above, it was seen that device_links_driver_bound()
could be called for the device before "dev->fwnode->dev" was
assigned. This prevented __fw_devlink_pickup_dangling_consumers() from
being called which meant that other devices waiting on our driver's
sub-nodes were stuck deferring forever.

It's believed that this problem is showing up suddenly for two
reasons:
1. Android has recently (last ~1 year) implemented an optimization to
   the order it loads modules [2]. When devices opt-in to this faster
   loading, modules are loaded one-after-the-other very quickly. This
   is unlike how other distributions do it. The reproduction of this
   problem has only been seen on devices that opt-in to Android's
   "parallel module loading".
2. Android devices typically opt-in to fw_devlink, and the most
   noticeable issue is the NULL "dev->fwnode->dev" in
   device_links_driver_bound(). fw_devlink is somewhat new code and
   also not in use by all Linux devices.

Even though the specific symptom where "dev->fwnode->dev" wasn't
assigned could be fixed by moving that assignment higher in
device_add(), other parts of device_add() (like the call to
device_pm_add()) are also important to run before probe. Only moving
the "dev->fwnode->dev" assignment would likely fix the current
symptoms but lead to difficult-to-debug problems in the future.

Fix the problem by preventing probe until device_add() has run far
enough that the device is ready to probe. If somehow we end up trying
to probe before we're allowed, __driver_probe_device() will return
-EPROBE_DEFER which will make certain the device is noticed.

In the race condition that was seen with Android's faster module
loading, we will temporarily add the device to the deferred list and
then take it off immediately when device_add() probes the device.

[1] Captured on a machine running a downstream 6.6 kernel
[2] https://cs.android.com/android/platform/superproject/main/+/main:system/core/libmodprobe/libmodprobe.cpp?q=LoadModulesParallel

Cc: stable@vger.kernel.org
Fixes: 2023c610dc54 ("Driver core: add new device to bus's list before probing")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
v1: https://lore.kernel.org/r/20260320200656.RFC.1.Id750b0fbcc94f23ed04b7aecabcead688d0d8c17@changeid

This v2 feels like a very safe change. It doesn't change the ordering
of any steps of probe and it _just_ prevents the early probe from
happening.

I ran tests where I turned the printout "Device not ready_to_probe" on
and I could see the printout happening, evidence of the race occurring
from other printouts, and things successfully being resolved.
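
For readers who want the gating logic in isolation, here is a minimal userspace sketch of the behaviour the commit message describes. It is not the kernel code: "struct model_device", model_probe() and model_device_add_tail() are hypothetical stand-ins, there is no locking, and the deferred-probe list is not modelled. EPROBE_DEFER is defined locally using the value from include/linux/errno.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* EPROBE_DEFER is kernel-internal; 517 is its value in include/linux/errno.h. */
#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517
#endif

/* Hypothetical stand-in for the two fields of struct device used here. */
struct model_device {
	const char *driver;	/* NULL until a driver is bound */
	bool ready_to_probe;	/* set late in the device_add() model */
};

/* Models the checks at the top of __driver_probe_device() after the patch. */
int model_probe(struct model_device *dev, const char *drv)
{
	if (dev->driver)
		return -EBUSY;		/* already bound */
	if (!dev->ready_to_probe)
		return -EPROBE_DEFER;	/* too early, try again later */
	dev->driver = drv;
	return 0;
}

/* Models the tail of device_add(): unblock probe, then probe ourselves. */
int model_device_add_tail(struct model_device *dev, const char *drv)
{
	dev->ready_to_probe = true;
	return model_probe(dev, drv);
}
```

An early attempt from another thread gets -EPROBE_DEFER and leaves the device unbound; once the flag is set, the probe from the device_add() model binds the driver, and any later attempt sees -EBUSY.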

Changes in v2:
- Instead of adjusting the ordering, use "ready_to_probe" flag

 drivers/base/core.c    | 15 +++++++++++++++
 drivers/base/dd.c      | 12 ++++++++++++
 include/linux/device.h |  3 +++
 3 files changed, 30 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 09b98f02f559..4caa3fd1ecdb 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3688,6 +3688,21 @@ int device_add(struct device *dev)
 		fw_devlink_link_device(dev);
 	}
 
+	/*
+	 * The moment the device was linked into the bus's "klist_devices" in
+	 * bus_add_device() then it's possible that probe could have been
+	 * attempted in a different thread via userspace loading a driver
+	 * matching the device. "ready_to_probe" being false would have blocked
+	 * those attempts. Now that all of the above initialization has
+	 * happened, unblock probe. If probe happens through another thread
+	 * after this point but before bus_probe_device() runs then it's fine.
+	 * bus_probe_device() -> device_initial_probe() -> __device_attach()
+	 * will notice (under device_lock) that the device is already bound.
+	 */
+	device_lock(dev);
+	dev->ready_to_probe = true;
+	device_unlock(dev);
+
 	bus_probe_device(dev);
 
 	/*
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 37c7e54e0e4c..a1762254828f 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -848,6 +848,18 @@ static int __driver_probe_device(const struct device_driver *drv, struct device
 	if (dev->driver)
 		return -EBUSY;
 
+	/*
+	 * In device_add(), the "struct device" gets linked into the subsystem's
+	 * list of devices and broadcast to userspace (via uevent) before we're
+	 * quite ready to probe. Those open pathways to driver probe before
+	 * we've finished enough of device_add() to reliably support probe.
+	 * Detect this and tell other pathways to try again later. device_add()
+	 * itself will also try to probe immediately after setting
+	 * "ready_to_probe".
+	 */
+	if (!dev->ready_to_probe)
+		return dev_err_probe(dev, -EPROBE_DEFER, "Device not ready_to_probe");
+
 	dev->can_match = true;
 	dev_dbg(dev, "bus: '%s': %s: matched device with driver %s\n",
 		drv->bus->name, __func__, drv->name);
diff --git a/include/linux/device.h b/include/linux/device.h
index e65d564f01cd..e2f83384b627 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -553,6 +553,8 @@ struct device_physical_location {
  * @dma_skip_sync: DMA sync operations can be skipped for coherent buffers.
  * @dma_iommu: Device is using default IOMMU implementation for DMA and
  *		doesn't rely on dma_ops structure.
+ * @ready_to_probe: If set to %true then device_add() has finished enough
+ *		initialization that probe could be called.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -675,6 +677,7 @@ struct device {
 #ifdef CONFIG_IOMMU_DMA
 	bool			dma_iommu:1;
 #endif
+	bool			ready_to_probe:1;
 };
 
 /**
-- 
2.53.0.1018.g2bb0e51243-goog

Re: [PATCH v2] driver core: Don't let a device probe until it's ready

Posted by Danilo Krummrich 1 day, 10 hours ago

On Mon Mar 30, 2026 at 4:28 PM CEST, Douglas Anderson wrote:
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 09b98f02f559..4caa3fd1ecdb 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -3688,6 +3688,21 @@ int device_add(struct device *dev)
>  		fw_devlink_link_device(dev);
>  	}
>  
> +	/*
> +	 * The moment the device was linked into the bus's "klist_devices" in
> +	 * bus_add_device() then it's possible that probe could have been
> +	 * attempted in a different thread via userspace loading a driver
> +	 * matching the device. "ready_to_probe" being false would have blocked
> +	 * those attempts. Now that all of the above initialization has
> +	 * happened, unblock probe. If probe happens through another thread
> +	 * after this point but before bus_probe_device() runs then it's fine.
> +	 * bus_probe_device() -> device_initial_probe() -> __device_attach()
> +	 * will notice (under device_lock) that the device is already bound.
> +	 */
> +	device_lock(dev);
> +	dev->ready_to_probe = true;
> +	device_unlock(dev);
> +
>  	bus_probe_device(dev);
>  
>  	/*
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 37c7e54e0e4c..a1762254828f 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -848,6 +848,18 @@ static int __driver_probe_device(const struct device_driver *drv, struct device
>  	if (dev->driver)
>  		return -EBUSY;
>  
> +	/*
> +	 * In device_add(), the "struct device" gets linked into the subsystem's
> +	 * list of devices and broadcast to userspace (via uevent) before we're
> +	 * quite ready to probe. Those open pathways to driver probe before
> +	 * we've finished enough of device_add() to reliably support probe.
> +	 * Detect this and tell other pathways to try again later. device_add()
> +	 * itself will also try to probe immediately after setting
> +	 * "ready_to_probe".
> +	 */
> +	if (!dev->ready_to_probe)
> +		return dev_err_probe(dev, -EPROBE_DEFER, "Device not ready_to_probe");

Are we sure this dev->ready_to_probe dance does not introduce a new subtle bug
considering that ready_to_probe is within a bitfield of struct device?

I.e. are we sure there are no potential concurrent modifications of other fields
in this bitfield that are not protected with the device lock?

For instance, in __driver_attach() we set dev->can_match if
driver_match_device() returns -EPROBE_DEFER without the device lock held.

This is exactly the case you want to protect against, i.e. device_add() racing
with __driver_attach().

So, there is a chance that the dev->ready_to_probe change gets interleaved with
a dev->can_match change.

I think all this goes away if we stop using bitfields for synchronization; we
should convert some of those to flags that we can modify with set_bit() and
friends instead.
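
To make the interleaving concrete, here is a small userspace sketch that replays it by hand. It assumes both bits share one storage unit; the bit positions and the name replay_lost_update() are made up for illustration and say nothing about the real layout of struct device.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit positions inside one shared storage unit. */
#define CAN_MATCH_BIT		(1u << 0)
#define READY_TO_PROBE_BIT	(1u << 1)

/*
 * A bitfield store is a load/modify/store of the whole unit. Replay the
 * bad ordering in a single thread: A loads, B loads, A stores, B stores.
 */
uint8_t replay_lost_update(void)
{
	uint8_t word = 0;	/* unit holding both bits */
	uint8_t a = word;	/* device_add() path loads the unit */
	uint8_t b = word;	/* __driver_attach() path loads the unit */

	a |= READY_TO_PROBE_BIT;
	b |= CAN_MATCH_BIT;

	word = a;		/* A stores its copy back */
	word = b;		/* B stores a stale copy, erasing A's bit */
	return word;
}
```

In this replay only the can_match bit survives, which is the lost update I'm worried about. The device lock does not help because only one of the two writers takes it.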

> +
>  	dev->can_match = true;
>  	dev_dbg(dev, "bus: '%s': %s: matched device with driver %s\n",
>  		drv->bus->name, __func__, drv->name);
> diff --git a/include/linux/device.h b/include/linux/device.h
> index e65d564f01cd..e2f83384b627 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -553,6 +553,8 @@ struct device_physical_location {
>   * @dma_skip_sync: DMA sync operations can be skipped for coherent buffers.
>   * @dma_iommu: Device is using default IOMMU implementation for DMA and
>   *		doesn't rely on dma_ops structure.
> + * @ready_to_probe: If set to %true then device_add() has finished enough
> + *		initialization that probe could be called.
>   *
>   * At the lowest level, every device in a Linux system is represented by an
>   * instance of struct device. The device structure contains the information
> @@ -675,6 +677,7 @@ struct device {
>  #ifdef CONFIG_IOMMU_DMA
>  	bool			dma_iommu:1;
>  #endif
> +	bool			ready_to_probe:1;
>  };

Re: [PATCH v2] driver core: Don't let a device probe until it's ready

Posted by Doug Anderson 1 day, 9 hours ago

Hi,

On Tue, Mar 31, 2026 at 7:42 AM Danilo Krummrich <dakr@kernel.org> wrote:
>
> > @@ -848,6 +848,18 @@ static int __driver_probe_device(const struct device_driver *drv, struct device
> >       if (dev->driver)
> >               return -EBUSY;
> >
> > +     /*
> > +      * In device_add(), the "struct device" gets linked into the subsystem's
> > +      * list of devices and broadcast to userspace (via uevent) before we're
> > +      * quite ready to probe. Those open pathways to driver probe before
> > +      * we've finished enough of device_add() to reliably support probe.
> > +      * Detect this and tell other pathways to try again later. device_add()
> > +      * itself will also try to probe immediately after setting
> > +      * "ready_to_probe".
> > +      */
> > +     if (!dev->ready_to_probe)
> > +             return dev_err_probe(dev, -EPROBE_DEFER, "Device not ready_to_probe");
>
> Are we sure this dev->ready_to_probe dance does not introduce a new subtle bug
> considering that ready_to_probe is within a bitfield of struct device?
>
> I.e. are we sure there are no potential concurrent modifications of other fields
> in this bitfield that are not protected with the device lock?
>
> For instance, in __driver_attach() we set dev->can_match if
> driver_match_device() returns -EPROBE_DEFER without the device lock held.

Bleh. Thank you for catching this. I naively assumed the device lock
protected the bitfield, but I didn't verify that.


> This is exactly the case you want to protect against, i.e. device_add() racing
> with __driver_attach().
>
> So, there is a chance that the dev->ready_to_probe change gets interleaved with
> a dev->can_match change.
>
> I think all this goes away if we stop using bitfields for synchronization; we
> should convert some of those to flags that we can modify with set_bit() and
> friends instead.

That sounds reasonable to me. Do you want me to send a v3 where I
create a new "unsigned long flags" in struct device and introduce this
as the first flag? If there are additional bitfields you want me to
convert, I can send them as additional patches in the series as long
as it's not too big of a change...
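
Roughly what I have in mind, as a userspace sketch only. C11 atomics stand in for the kernel's set_bit() / test_bit(), and all of the names below (struct flag_device, MODEL_DEV_*, model_set_flag(), model_test_flag()) are placeholders, not what v3 will necessarily use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Placeholder flag numbers: one bit index per former bitfield member. */
enum model_dev_flag {
	MODEL_DEV_READY_TO_PROBE,
	MODEL_DEV_CAN_MATCH,
};

/* Placeholder for the new flags word in struct device. */
struct flag_device {
	atomic_ulong flags;
};

/* Userspace analogue of set_bit(): an atomic OR on the flags word. */
void model_set_flag(struct flag_device *dev, enum model_dev_flag f)
{
	atomic_fetch_or(&dev->flags, 1UL << f);
}

/* Userspace analogue of test_bit(). */
bool model_test_flag(struct flag_device *dev, enum model_dev_flag f)
{
	return (atomic_load(&dev->flags) & (1UL << f)) != 0;
}
```

Since each update is a single atomic read-modify-write of the word, setting one flag can't clobber a concurrent update to another flag, with or without the device lock.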

-Doug

Re: [PATCH v2] driver core: Don't let a device probe until it's ready

Posted by Danilo Krummrich 4 hours ago

On Tue Mar 31, 2026 at 5:26 PM CEST, Doug Anderson wrote:
> Hi,
>
> On Tue, Mar 31, 2026 at 7:42 AM Danilo Krummrich <dakr@kernel.org> wrote:
>>
>> > @@ -848,6 +848,18 @@ static int __driver_probe_device(const struct device_driver *drv, struct device
>> >       if (dev->driver)
>> >               return -EBUSY;
>> >
>> > +     /*
>> > +      * In device_add(), the "struct device" gets linked into the subsystem's
>> > +      * list of devices and broadcast to userspace (via uevent) before we're
>> > +      * quite ready to probe. Those open pathways to driver probe before
>> > +      * we've finished enough of device_add() to reliably support probe.
>> > +      * Detect this and tell other pathways to try again later. device_add()
>> > +      * itself will also try to probe immediately after setting
>> > +      * "ready_to_probe".
>> > +      */
>> > +     if (!dev->ready_to_probe)
>> > +             return dev_err_probe(dev, -EPROBE_DEFER, "Device not ready_to_probe");
>>
>> Are we sure this dev->ready_to_probe dance does not introduce a new subtle bug
>> considering that ready_to_probe is within a bitfield of struct device?
>>
>> I.e. are we sure there are no potential concurrent modifications of other fields
>> in this bitfield that are not protected with the device lock?
>>
>> For instance, in __driver_attach() we set dev->can_match if
>> driver_match_device() returns -EPROBE_DEFER without the device lock held.
>
> Bleh. Thank you for catching this. I naively assumed the device lock
> protected the bitfield, but I didn't verify that.
>
>
>> This is exactly the case you want to protect against, i.e. device_add() racing
>> with __driver_attach().
>>
>> So, there is a chance that the dev->ready_to_probe change gets interleaved with
>> a dev->can_match change.
>>
>> I think all this goes away if we stop using bitfields for synchronization; we
>> should convert some of those to flags that we can modify with set_bit() and
>> friends instead.
>
> That sounds reasonable to me. Do you want me to send a v3 where I
> create a new "unsigned long flags" in struct device and introduce this
> as the first flag? If there are additional bitfields you want me to
> convert, I can send them as additional patches in the series as long
> as it's not too big of a change...

I think the one with the biggest potential to cause real issues is can_match, as
it is modified without the device lock held from __driver_attach(), which can be
called at any time concurrently.

(I think there are others as well, but they are more on the theoretical side of
things. For instance, dma_skip_sync is modified by dma_set_mask(), which
strictly speaking does not require the device lock to be held. In practice,
that's probably never an issue since dma_set_mask() is typically called from bus
callbacks, but it's not strictly a requirement.)

More generally, from a robustness point of view, everything that is set once at
device creation time is fine to be a bitfield; for bits that are used for
synchronization or are modified concurrently, I'd rather use bitops.

Re: [PATCH v2] driver core: Don't let a device probe until it's ready

Posted by Doug Anderson 4 hours ago

Hi,

On Wed, Apr 1, 2026 at 2:06 PM Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Tue Mar 31, 2026 at 5:26 PM CEST, Doug Anderson wrote:
> > Hi,
> >
> > On Tue, Mar 31, 2026 at 7:42 AM Danilo Krummrich <dakr@kernel.org> wrote:
> >>
> >> > @@ -848,6 +848,18 @@ static int __driver_probe_device(const struct device_driver *drv, struct device
> >> >       if (dev->driver)
> >> >               return -EBUSY;
> >> >
> >> > +     /*
> >> > +      * In device_add(), the "struct device" gets linked into the subsystem's
> >> > +      * list of devices and broadcast to userspace (via uevent) before we're
> >> > +      * quite ready to probe. Those open pathways to driver probe before
> >> > +      * we've finished enough of device_add() to reliably support probe.
> >> > +      * Detect this and tell other pathways to try again later. device_add()
> >> > +      * itself will also try to probe immediately after setting
> >> > +      * "ready_to_probe".
> >> > +      */
> >> > +     if (!dev->ready_to_probe)
> >> > +             return dev_err_probe(dev, -EPROBE_DEFER, "Device not ready_to_probe");
> >>
> >> Are we sure this dev->ready_to_probe dance does not introduce a new subtle bug
> >> considering that ready_to_probe is within a bitfield of struct device?
> >>
> >> I.e. are we sure there are no potential concurrent modifications of other fields
> >> in this bitfield that are not protected with the device lock?
> >>
> >> For instance, in __driver_attach() we set dev->can_match if
> >> driver_match_device() returns -EPROBE_DEFER without the device lock held.
> >
> > Bleh. Thank you for catching this. I naively assumed the device lock
> > protected the bitfield, but I didn't verify that.
> >
> >
> >> This is exactly the case you want to protect against, i.e. device_add() racing
> >> with __driver_attach().
> >>
> >> So, there is a chance that the dev->ready_to_probe change gets interleaved with
> >> a dev->can_match change.
> >>
> >> I think all this goes away if we stop using bitfields for synchronization; we
> >> should convert some of those to flags that we can modify with set_bit() and
> >> friends instead.
> >
> > That sounds reasonable to me. Do you want me to send a v3 where I
> > create a new "unsigned long flags" in struct device and introduce this
> > as the first flag? If there are additional bitfields you want me to
> > convert, I can send them as additional patches in the series as long
> > as it's not too big of a change...
>
> I think the one with the biggest potential to cause real issues is can_match, as
> it is modified without the device lock held from __driver_attach(), which can be
> called at any time concurrently.
>
> (I think there are others as well, but they are more on the theoretical side of
> things. For instance, dma_skip_sync is modified by dma_set_mask(), which
> strictly speaking does not require the device lock to be held. In practice,
> that's probably never an issue since dma_set_mask() is typically called from bus
> callbacks, but it's not strictly a requirement.)
>
> More generally, from a robustness point of view, everything that is set once at
> device creation time is fine to be a bitfield; for bits that are used for
> synchronization or are modified concurrently, I'd rather use bitops.

OK, thanks! I've almost finished with patches to fully move all of
them to bitops. This means we simply don't need to think about them.
Also: even if they're not truly needed as bitops, it's nice (and saves
space) not to have some flags using bitfields and some using bitops
unless there's a good reason.

I'll make sure that "can_match" is second in the list of patches. If
you hate the idea of converting the other ones we can just apply the
earlier patches in the series and drop the rest. ;-)

-Doug

Re: [PATCH v2] driver core: Don't let a device probe until it's ready

Posted by Rafael J. Wysocki 2 days, 9 hours ago

On Mon, Mar 30, 2026 at 4:29 PM Douglas Anderson <dianders@chromium.org> wrote:
>
> The moment we link a "struct device" into the list of devices for the
> bus, it's possible probe can happen. This is because another thread
> can load the driver at any time and that can cause the device to
> probe. This has been seen in practice with a stack crawl that looks
> like this [1]:
>
>   really_probe()
>   __driver_probe_device()
>   driver_probe_device()
>   __driver_attach()
>   bus_for_each_dev()
>   driver_attach()
>   bus_add_driver()
>   driver_register()
>   __platform_driver_register()
>   init_module() [some module]
>   do_one_initcall()
>   do_init_module()
>   load_module()
>   __arm64_sys_finit_module()
>   invoke_syscall()
>
> As a result of the above, it was seen that device_links_driver_bound()
> could be called for the device before "dev->fwnode->dev" was
> assigned. This prevented __fw_devlink_pickup_dangling_consumers() from
> being called which meant that other devices waiting on our driver's
> sub-nodes were stuck deferring forever.
>
> It's believed that this problem is showing up suddenly for two
> reasons:
> 1. Android has recently (last ~1 year) implemented an optimization to
>    the order it loads modules [2]. When devices opt-in to this faster
>    loading, modules are loaded one-after-the-other very quickly. This
>    is unlike how other distributions do it. The reproduction of this
>    problem has only been seen on devices that opt-in to Android's
>    "parallel module loading".
> 2. Android devices typically opt-in to fw_devlink, and the most
>    noticeable issue is the NULL "dev->fwnode->dev" in
>    device_links_driver_bound(). fw_devlink is somewhat new code and
>    also not in use by all Linux devices.
>
> Even though the specific symptom where "dev->fwnode->dev" wasn't
> assigned could be fixed by moving that assignment higher in
> device_add(), other parts of device_add() (like the call to
> device_pm_add()) are also important to run before probe. Only moving
> the "dev->fwnode->dev" assignment would likely fix the current
> symptoms but lead to difficult-to-debug problems in the future.
>
> Fix the problem by preventing probe until device_add() has run far
> enough that the device is ready to probe. If somehow we end up trying
> to probe before we're allowed, __driver_probe_device() will return
> -EPROBE_DEFER which will make certain the device is noticed.
>
> In the race condition that was seen with Android's faster module
> loading, we will temporarily add the device to the deferred list and
> then take it off immediately when device_add() probes the device.
>
> [1] Captured on a machine running a downstream 6.6 kernel
> [2] https://cs.android.com/android/platform/superproject/main/+/main:system/core/libmodprobe/libmodprobe.cpp?q=LoadModulesParallel
>
> Cc: stable@vger.kernel.org
> Fixes: 2023c610dc54 ("Driver core: add new device to bus's list before probing")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>

Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>

> ---
> v1: https://lore.kernel.org/r/20260320200656.RFC.1.Id750b0fbcc94f23ed04b7aecabcead688d0d8c17@changeid
>
> This v2 feels like a very safe change. It doesn't change the ordering
> of any steps of probe and it _just_ prevents the early probe from
> happening.
>
> I ran tests where I turned the printout "Device not ready_to_probe" on
> and I could see the printout happening, evidence of the race occurring
> from other printouts, and things successfully being resolved.
>
> Changes in v2:
> - Instead of adjusting the ordering, use "ready_to_probe" flag
>
>  drivers/base/core.c    | 15 +++++++++++++++
>  drivers/base/dd.c      | 12 ++++++++++++
>  include/linux/device.h |  3 +++
>  3 files changed, 30 insertions(+)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 09b98f02f559..4caa3fd1ecdb 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -3688,6 +3688,21 @@ int device_add(struct device *dev)
>                 fw_devlink_link_device(dev);
>         }
>
> +       /*
> +        * The moment the device was linked into the bus's "klist_devices" in
> +        * bus_add_device() then it's possible that probe could have been
> +        * attempted in a different thread via userspace loading a driver
> +        * matching the device. "ready_to_probe" being false would have blocked
> +        * those attempts. Now that all of the above initialization has
> +        * happened, unblock probe. If probe happens through another thread
> +        * after this point but before bus_probe_device() runs then it's fine.
> +        * bus_probe_device() -> device_initial_probe() -> __device_attach()
> +        * will notice (under device_lock) that the device is already bound.
> +        */
> +       device_lock(dev);
> +       dev->ready_to_probe = true;
> +       device_unlock(dev);
> +
>         bus_probe_device(dev);
>
>         /*
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 37c7e54e0e4c..a1762254828f 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -848,6 +848,18 @@ static int __driver_probe_device(const struct device_driver *drv, struct device
>         if (dev->driver)
>                 return -EBUSY;
>
> +       /*
> +        * In device_add(), the "struct device" gets linked into the subsystem's
> +        * list of devices and broadcast to userspace (via uevent) before we're
> +        * quite ready to probe. Those open pathways to driver probe before
> +        * we've finished enough of device_add() to reliably support probe.
> +        * Detect this and tell other pathways to try again later. device_add()
> +        * itself will also try to probe immediately after setting
> +        * "ready_to_probe".
> +        */
> +       if (!dev->ready_to_probe)
> +               return dev_err_probe(dev, -EPROBE_DEFER, "Device not ready_to_probe");
> +
>         dev->can_match = true;
>         dev_dbg(dev, "bus: '%s': %s: matched device with driver %s\n",
>                 drv->bus->name, __func__, drv->name);
> diff --git a/include/linux/device.h b/include/linux/device.h
> index e65d564f01cd..e2f83384b627 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -553,6 +553,8 @@ struct device_physical_location {
>   * @dma_skip_sync: DMA sync operations can be skipped for coherent buffers.
>   * @dma_iommu: Device is using default IOMMU implementation for DMA and
>   *             doesn't rely on dma_ops structure.
> + * @ready_to_probe: If set to %true then device_add() has finished enough
> + *             initialization that probe could be called.
>   *
>   * At the lowest level, every device in a Linux system is represented by an
>   * instance of struct device. The device structure contains the information
> @@ -675,6 +677,7 @@ struct device {
>  #ifdef CONFIG_IOMMU_DMA
>         bool                    dma_iommu:1;
>  #endif
> +       bool                    ready_to_probe:1;
>  };
>
>  /**
> --
> 2.53.0.1018.g2bb0e51243-goog
>

Re: [PATCH v2] driver core: Don't let a device probe until it's ready

Posted by Alan Stern 2 days, 10 hours ago

On Mon, Mar 30, 2026 at 07:28:41AM -0700, Douglas Anderson wrote:
> The moment we link a "struct device" into the list of devices for the
> bus, it's possible probe can happen. This is because another thread
> can load the driver at any time and that can cause the device to
> probe. This has been seen in practice with a stack crawl that looks
> like this [1]:
> 
>   really_probe()
>   __driver_probe_device()
>   driver_probe_device()
>   __driver_attach()
>   bus_for_each_dev()
>   driver_attach()
>   bus_add_driver()
>   driver_register()
>   __platform_driver_register()
>   init_module() [some module]
>   do_one_initcall()
>   do_init_module()
>   load_module()
>   __arm64_sys_finit_module()
>   invoke_syscall()
> 
> As a result of the above, it was seen that device_links_driver_bound()
> could be called for the device before "dev->fwnode->dev" was
> assigned. This prevented __fw_devlink_pickup_dangling_consumers() from
> being called which meant that other devices waiting on our driver's
> sub-nodes were stuck deferring forever.
> 
> It's believed that this problem is showing up suddenly for two
> reasons:
> 1. Android has recently (last ~1 year) implemented an optimization to
>    the order it loads modules [2]. When devices opt-in to this faster
>    loading, modules are loaded one-after-the-other very quickly. This
>    is unlike how other distributions do it. The reproduction of this
>    problem has only been seen on devices that opt-in to Android's
>    "parallel module loading".
> 2. Android devices typically opt-in to fw_devlink, and the most
>    noticeable issue is the NULL "dev->fwnode->dev" in
>    device_links_driver_bound(). fw_devlink is somewhat new code and
>    also not in use by all Linux devices.
> 
> Even though the specific symptom where "dev->fwnode->dev" wasn't
> assigned could be fixed by moving that assignment higher in
> device_add(), other parts of device_add() (like the call to
> device_pm_add()) are also important to run before probe. Only moving
> the "dev->fwnode->dev" assignment would likely fix the current
> symptoms but lead to difficult-to-debug problems in the future.
> 
> Fix the problem by preventing probe until device_add() has run far
> enough that the device is ready to probe. If somehow we end up trying
> to probe before we're allowed, __driver_probe_device() will return
> -EPROBE_DEFER which will make certain the device is noticed.
> 
> In the race condition that was seen with Android's faster module
> loading, we will temporarily add the device to the deferred list and
> then take it off immediately when device_add() probes the device.
> 
> [1] Captured on a machine running a downstream 6.6 kernel
> [2] https://cs.android.com/android/platform/superproject/main/+/main:system/core/libmodprobe/libmodprobe.cpp?q=LoadModulesParallel
> 
> Cc: stable@vger.kernel.org
> Fixes: 2023c610dc54 ("Driver core: add new device to bus's list before probing")
> Signed-off-by: Douglas Anderson <dianders@chromium.org>
> ---

Reviewed-by: Alan Stern <stern@rowland.harvard.edu>