[v5] driver core: enforce device_lock for driver_match_device()

[PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Gui-Dong Han 3 weeks, 4 days ago

Currently, driver_match_device() is called from three sites. One site
(__device_attach_driver) holds device_lock(dev), but the other two
(bind_store and __driver_attach) do not. This inconsistency means that
bus match() callbacks are not guaranteed to be called with the lock
held.

Fix this by introducing driver_match_device_locked(), which guarantees
holding the device lock using a scoped guard. Replace the unlocked calls
in bind_store() and __driver_attach() with this new helper. Also add a
lock assertion to driver_match_device() to enforce this guarantee.

This consistency also fixes a known race condition. The driver_override
implementation relies on the device_lock, so the missing lock led to the
use-after-free (UAF) reported in Bugzilla for buses using this field.

Stress testing the two newly locked paths for 24 hours with
CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP enabled showed no UAF recurrence
and no lockdep warnings.

Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Suggested-by: Qiu-ji Chen <chenqiuji666@gmail.com>
Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>
---
v5:
* Introduce driver_match_device_locked() helper using guard(device) to
handle locking, as suggested by Rafael J. Wysocki.
v4:
* Remove the misleading comment above device_lock_assert(), and update
subject and commit message to focus on enforcing consistent locking,
as discussed with Danilo Krummrich.
v3:
* Remove redundant locking comments at call sites and add a blank line
after the lock assertion in driver_match_device(), as suggested by Greg KH.
v2:
* Add device_lock_assert() in driver_match_device() to enforce locking
requirement, as suggested by Greg KH.
v1:
* The Bugzilla entry contains full KASAN reports and two PoCs that reliably
reproduce the UAF on both unlocked paths using a standard QEMU setup
(default e1000 device at 0000:00:03.0).
---
 drivers/base/base.h | 9 +++++++++
 drivers/base/bus.c  | 2 +-
 drivers/base/dd.c   | 2 +-
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 430cbefbc97f..677320881af1 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -182,9 +182,18 @@ void device_set_deferred_probe_reason(const struct device *dev, struct va_format
 static inline int driver_match_device(const struct device_driver *drv,
 				      struct device *dev)
 {
+	device_lock_assert(dev);
+
 	return drv->bus->match ? drv->bus->match(dev, drv) : 1;
 }
 
+static inline int driver_match_device_locked(const struct device_driver *drv,
+					     struct device *dev)
+{
+	guard(device)(dev);
+	return driver_match_device(drv, dev);
+}
+
 static inline void dev_sync_state(struct device *dev)
 {
 	if (dev->bus->sync_state)
diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 9eb7771706f0..331d750465e2 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -263,7 +263,7 @@ static ssize_t bind_store(struct device_driver *drv, const char *buf,
 	int err = -ENODEV;
 
 	dev = bus_find_device_by_name(bus, NULL, buf);
-	if (dev && driver_match_device(drv, dev)) {
+	if (dev && driver_match_device_locked(drv, dev)) {
 		err = device_driver_attach(drv, dev);
 		if (!err) {
 			/* success */
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 349f31bedfa1..98feb4c77160 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -1178,7 +1178,7 @@ static int __driver_attach(struct device *dev, void *data)
 	 * is an error.
 	 */
 
-	ret = driver_match_device(drv, dev);
+	ret = driver_match_device_locked(drv, dev);
 	if (ret == 0) {
 		/* no match */
 		return 0;
-- 
2.43.0

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Mark Brown 2 weeks, 4 days ago

On Wed, Jan 14, 2026 at 12:28:43AM +0800, Gui-Dong Han wrote:
> Currently, driver_match_device() is called from three sites. One site
> (__device_attach_driver) holds device_lock(dev), but the other two
> (bind_store and __driver_attach) do not. This inconsistency means that
> bus match() callbacks are not guaranteed to be called with the lock
> held.

I'm seeing boot hangs on Arm Juno in next/pending-fixes which bisect to
this commit.  The boot grinds to a halt near the end of boot:

[    2.570549] ledtrig-cpu: registered to indicate activity on CPUs
[    2.618301] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    2.623547] msm_serial: driver initialized
[    2.624058] SuperH (H)SCI(F) driver initialized
[    2.624312] STM32 USART driver initialized

with no further output, full log:

   https://lava.sirena.org.uk/scheduler/job/2387335#L862

We are also seeing similar looking boot hangs on some Qualcomm platforms
in Arm's test lab which aren't verified to be the same thing but are
hanging at a similar point in boot.

Bisect log:

# bad: [53d96388f8a868d3f79559a6cf39db81b34f9fb4] Merge branch 'slab/for-next-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git
# good: [24d479d26b25bce5faea3ddd9fa8f3a6c3129ea7] Linux 6.19-rc6
# good: [caa329649259d0f90c0056c9860ca659d4ba3211] spi: intel-pci: Add support for Nova Lake SPI serial flash
# good: [4b58aac989c1e3fafb1c68a733811859df388250] regmap: Fix race condition in hwspinlock irqsave routine
# good: [b062a899c997df7b9ce29c62164888baa7a85833] spi: hisi-kunpeng: Fixed the wrong debugfs node name in hisi_spi debugfs initialization
# good: [f3f380ce6b3d5c9805c7e0b3d5bc28d9ec41e2e8] regmap: maple: free entry on mas_store_gfp() failure
git bisect start '53d96388f8a868d3f79559a6cf39db81b34f9fb4' '24d479d26b25bce5faea3ddd9fa8f3a6c3129ea7' 'caa329649259d0f90c0056c9860ca659d4ba3211' '4b58aac989c1e3fafb1c68a733811859df388250' 'b062a899c997df7b9ce29c62164888baa7a85833' 'f3f380ce6b3d5c9805c7e0b3d5bc28d9ec41e2e8'
# test job: [caa329649259d0f90c0056c9860ca659d4ba3211] https://lava.sirena.org.uk/scheduler/job/2376182
# test job: [4b58aac989c1e3fafb1c68a733811859df388250] https://lava.sirena.org.uk/scheduler/job/2364586
# test job: [b062a899c997df7b9ce29c62164888baa7a85833] https://lava.sirena.org.uk/scheduler/job/2363939
# test job: [f3f380ce6b3d5c9805c7e0b3d5bc28d9ec41e2e8] https://lava.sirena.org.uk/scheduler/job/2331556
# test job: [53d96388f8a868d3f79559a6cf39db81b34f9fb4] https://lava.sirena.org.uk/scheduler/job/2387335
# bad: [53d96388f8a868d3f79559a6cf39db81b34f9fb4] Merge branch 'slab/for-next-fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git
git bisect bad 53d96388f8a868d3f79559a6cf39db81b34f9fb4
# test job: [64f6a0a031257e68060f22d217c53a50a671dfc8] https://lava.sirena.org.uk/scheduler/job/2387395
# good: [64f6a0a031257e68060f22d217c53a50a671dfc8] Merge branch 'for-next' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless.git
git bisect good 64f6a0a031257e68060f22d217c53a50a671dfc8
# test job: [686c6adb9882dd8d9fe1cc816bfd289dfa0abbd1] https://lava.sirena.org.uk/scheduler/job/2387470
# bad: [686c6adb9882dd8d9fe1cc816bfd289dfa0abbd1] Merge branch 'fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86.git
git bisect bad 686c6adb9882dd8d9fe1cc816bfd289dfa0abbd1
# test job: [e03b29b55f2b7c345a919a6ee36633b06bf3fb56] https://lava.sirena.org.uk/scheduler/job/2387535
# good: [e03b29b55f2b7c345a919a6ee36633b06bf3fb56] comedi: dmm32at: serialize use of paged registers
git bisect good e03b29b55f2b7c345a919a6ee36633b06bf3fb56
# test job: [bb25769465146dfc54e90e4066e8f41ec6b25730] https://lava.sirena.org.uk/scheduler/job/2387693
# bad: [bb25769465146dfc54e90e4066e8f41ec6b25730] Merge branch 'mtd/fixes' of https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git
git bisect bad bb25769465146dfc54e90e4066e8f41ec6b25730
# test job: [dc23806a7c47ec5f1293aba407fb69519f976ee0] https://lava.sirena.org.uk/scheduler/job/2388190
# bad: [dc23806a7c47ec5f1293aba407fb69519f976ee0] driver core: enforce device_lock for driver_match_device()
git bisect bad dc23806a7c47ec5f1293aba407fb69519f976ee0
# test job: [5f4476e98387618ce22bb93fb5c11142827458ec] https://lava.sirena.org.uk/scheduler/job/2388401
# good: [5f4476e98387618ce22bb93fb5c11142827458ec] rust: auxiliary: add Driver::unbind() callback
git bisect good 5f4476e98387618ce22bb93fb5c11142827458ec
# test job: [c1d4519e1c36ffa01973e23af4502e69dcd84f39] https://lava.sirena.org.uk/scheduler/job/2388603
# good: [c1d4519e1c36ffa01973e23af4502e69dcd84f39] rust: driver: add DEVICE_DRIVER_OFFSET to the DriverLayout trait
git bisect good c1d4519e1c36ffa01973e23af4502e69dcd84f39
# test job: [a995fe1a3aa78b7d06cc1cc7b6b8436c5e93b07f] https://lava.sirena.org.uk/scheduler/job/2388676
# good: [a995fe1a3aa78b7d06cc1cc7b6b8436c5e93b07f] rust: driver: drop device private data post unbind
git bisect good a995fe1a3aa78b7d06cc1cc7b6b8436c5e93b07f
# first bad commit: [dc23806a7c47ec5f1293aba407fb69519f976ee0] driver core: enforce device_lock for driver_match_device()

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by David Heidelberg 2 weeks, 3 days ago

On 20/01/2026 14:22, Mark Brown wrote:
> On Wed, Jan 14, 2026 at 12:28:43AM +0800, Gui-Dong Han wrote:
>> Currently, driver_match_device() is called from three sites. One site
>> (__device_attach_driver) holds device_lock(dev), but the other two
>> (bind_store and __driver_attach) do not. This inconsistency means that
>> bus match() callbacks are not guaranteed to be called with the lock
>> held.
> 
> I'm seeing boot hangs on Arm Juno in next/pending-fixes which bisect to
> this commit.  The boot grinds to a halt near the end of boot:
> 
> [    2.570549] ledtrig-cpu: registered to indicate activity on CPUs
> [    2.618301] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [    2.623547] msm_serial: driver initialized
> [    2.624058] SuperH (H)SCI(F) driver initialized
> [    2.624312] STM32 USART driver initialized
> 

Similar outcome on sdm845-based:
  - Pixel 3
  - OnePlus 6T

Reverting unblock the boot process.

David

-- 
David Heidelberg

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Marek Szyprowski 2 weeks, 4 days ago

On 20.01.2026 14:22, Mark Brown wrote:
> On Wed, Jan 14, 2026 at 12:28:43AM +0800, Gui-Dong Han wrote:
>> Currently, driver_match_device() is called from three sites. One site
>> (__device_attach_driver) holds device_lock(dev), but the other two
>> (bind_store and __driver_attach) do not. This inconsistency means that
>> bus match() callbacks are not guaranteed to be called with the lock
>> held.
> I'm seeing boot hangs on Arm Juno in next/pending-fixes which bisect to
> this commit.  The boot grinds to a halt near the end of boot:
>
> [    2.570549] ledtrig-cpu: registered to indicate activity on CPUs
> [    2.618301] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [    2.623547] msm_serial: driver initialized
> [    2.624058] SuperH (H)SCI(F) driver initialized
> [    2.624312] STM32 USART driver initialized
>
> with no further output, full log:
>
>     https://lava.sirena.org.uk/scheduler/job/2387335#L862
>
> We are also seeing similar looking boot hangs on some Qualcomm platforms
> in Arm's test lab which aren't verified to be the same thing but are
> hanging at a similar point in boot.

I've observed the same issue on Qualcomm RB5 board and bisecting lead me 
also to this patch. My kernel log also doesn't reveal much information:

...

[    3.671227] vreg_bob: Setting 3008000-4000000uV
[    3.676929] vreg_l1c_1p8: Setting 1800000-1800000uV
[    3.682826] vreg_l2c_1p2: Setting 1200000-1200000uV
[    3.688547] vreg_l3c_0p8: Setting 800000-800000uV
[    3.694080] vreg_l4c_1p7: Setting 1704000-2928000uV
[    3.699908] vreg_l5c_1p8: Setting 1800000-2928000uV
[    3.705763] vreg_l6c_2p96: Setting 1800000-2960000uV
[    3.711684] vreg_l7c_cam_vcm0_2p85: Setting 2856000-3104000uV
[    3.718408] vreg_l8c_1p8: Setting 1800000-1800000uV
[    3.724287] vreg_l9c_2p96: Setting 2704000-2960000uV
[    3.730218] vreg_l10c_3p0: Setting 3000000-3000000uV
[    3.736226] vreg_l11c_3p3: Setting 3296000-3296000uV
[    3.743413] vreg_s8c_1p3: Setting 1352000-1352000uV
[    3.771370] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
[    3.792020] msm_serial: driver initialized
[    3.797633] SuperH (H)SCI(F) driver initialized
[    3.802881] STM32 USART driver initialized

[hang/freeze]

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Jon Hunter 2 weeks, 3 days ago

On 20/01/2026 15:23, Marek Szyprowski wrote:
> On 20.01.2026 14:22, Mark Brown wrote:
>> On Wed, Jan 14, 2026 at 12:28:43AM +0800, Gui-Dong Han wrote:
>>> Currently, driver_match_device() is called from three sites. One site
>>> (__device_attach_driver) holds device_lock(dev), but the other two
>>> (bind_store and __driver_attach) do not. This inconsistency means that
>>> bus match() callbacks are not guaranteed to be called with the lock
>>> held.
>> I'm seeing boot hangs on Arm Juno in next/pending-fixes which bisect to
>> this commit.  The boot grinds to a halt near the end of boot:
>>
>> [    2.570549] ledtrig-cpu: registered to indicate activity on CPUs
>> [    2.618301] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
>> [    2.623547] msm_serial: driver initialized
>> [    2.624058] SuperH (H)SCI(F) driver initialized
>> [    2.624312] STM32 USART driver initialized
>>
>> with no further output, full log:
>>
>>      https://lava.sirena.org.uk/scheduler/job/2387335#L862
>>
>> We are also seeing similar looking boot hangs on some Qualcomm platforms
>> in Arm's test lab which aren't verified to be the same thing but are
>> hanging at a similar point in boot.
> 
> I've observed the same issue on Qualcomm RB5 board and bisecting lead me
> also to this patch. My kernel log also doesn't reveal much information:
> 
> ...
> 
> [    3.671227] vreg_bob: Setting 3008000-4000000uV
> [    3.676929] vreg_l1c_1p8: Setting 1800000-1800000uV
> [    3.682826] vreg_l2c_1p2: Setting 1200000-1200000uV
> [    3.688547] vreg_l3c_0p8: Setting 800000-800000uV
> [    3.694080] vreg_l4c_1p7: Setting 1704000-2928000uV
> [    3.699908] vreg_l5c_1p8: Setting 1800000-2928000uV
> [    3.705763] vreg_l6c_2p96: Setting 1800000-2960000uV
> [    3.711684] vreg_l7c_cam_vcm0_2p85: Setting 2856000-3104000uV
> [    3.718408] vreg_l8c_1p8: Setting 1800000-1800000uV
> [    3.724287] vreg_l9c_2p96: Setting 2704000-2960000uV
> [    3.730218] vreg_l10c_3p0: Setting 3000000-3000000uV
> [    3.736226] vreg_l11c_3p3: Setting 3296000-3296000uV
> [    3.743413] vreg_s8c_1p3: Setting 1352000-1352000uV
> [    3.771370] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [    3.792020] msm_serial: driver initialized
> [    3.797633] SuperH (H)SCI(F) driver initialized
> [    3.802881] STM32 USART driver initialized
> 
> [hang/freeze]

I am seeing a similar issue on one of our Tegra boards and bisect also 
points to this commit.

It is odd because it only appears to impact the Tegra194 Jetson Xavier 
NX board (tegra194-p3509-0000+p3668-0000.dts).

It appears to boot enough so the test can SSH into the device, but the 
kernel log does not show the us getting to the console prompt. It also 
appears that a lot of drivers are not bound as expected. I would need to 
check if those are all modules or not.

Jon

-- 
nvpublic

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 3 days ago

On Wed Jan 21, 2026 at 9:00 PM CET, Jon Hunter wrote:
> It is odd because it only appears to impact the Tegra194 Jetson Xavier 
> NX board (tegra194-p3509-0000+p3668-0000.dts).
>
> It appears to boot enough so the test can SSH into the device, but the 
> kernel log does not show the us getting to the console prompt. It also 
> appears that a lot of drivers are not bound as expected. I would need to 
> check if those are all modules or not.

The other reports were fixed by [1], but the issue in arm-smmu-qcom shouldn't be
related in this case.

I quickyl checked all drivers with "tegra194" in their compatible string, but
didn't see anything odd.

Can you please try to enable CONFIG_LOCKDEP, CONFIG_PROVE_LOCKING,
CONFIG_DEBUG_MUTEXES and see if you get a lockdep splat using the following
diff?

(You will see a lockdep warning in faux_bus_init(), it's harmless and can be
ignored.)

[1] https://lore.kernel.org/driver-core/20260121141215.29658-1-dakr@kernel.org/

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 677320881af1..4741412d7e46 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -190,8 +190,13 @@ static inline int driver_match_device(const struct device_driver *drv,
 static inline int driver_match_device_locked(const struct device_driver *drv,
                                             struct device *dev)
 {
-       guard(device)(dev);
-       return driver_match_device(drv, dev);
+       int ret;
+
+       mutex_acquire(&dev->mutex.dep_map, 0, 0, _THIS_IP_);
+       ret = driver_match_device(drv, dev);
+       mutex_release(&dev->mutex.dep_map, _THIS_IP_);
+
+       return ret;
 }

 static inline void dev_sync_state(struct device *dev)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 40de2f51a1b1..56c62b3016aa 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2557,6 +2557,8 @@ static void device_release(struct kobject *kobj)

        kfree(dev->dma_range_map);

+       lockdep_unregister_key(&dev->lock_key);
+
        if (dev->release)
                dev->release(dev);
        else if (dev->type && dev->type->release)
@@ -3159,7 +3161,9 @@ void device_initialize(struct device *dev)
        kobject_init(&dev->kobj, &device_ktype);
        INIT_LIST_HEAD(&dev->dma_pools);
        mutex_init(&dev->mutex);
-       lockdep_set_novalidate_class(&dev->mutex);
+       //lockdep_set_novalidate_class(&dev->mutex);
+       lockdep_register_key(&dev->lock_key);
+       lockdep_set_class(&dev->mutex, &dev->lock_key);
        spin_lock_init(&dev->devres_lock);
        INIT_LIST_HEAD(&dev->devres_head);
        device_pm_init(dev);
diff --git a/include/linux/device.h b/include/linux/device.h
index 0be95294b6e6..dc898a420bc2 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -579,6 +579,7 @@ struct device {
        struct mutex            mutex;  /* mutex to synchronize calls to
                                         * its driver.
                                         */
+       struct lock_class_key lock_key;

        struct dev_links_info   links;
        struct dev_pm_info      power;

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Jon Hunter 2 weeks, 2 days ago

Hi Danilo,

On 21/01/2026 21:42, Danilo Krummrich wrote:
> On Wed Jan 21, 2026 at 9:00 PM CET, Jon Hunter wrote:
>> It is odd because it only appears to impact the Tegra194 Jetson Xavier
>> NX board (tegra194-p3509-0000+p3668-0000.dts).
>>
>> It appears to boot enough so the test can SSH into the device, but the
>> kernel log does not show the us getting to the console prompt. It also
>> appears that a lot of drivers are not bound as expected. I would need to
>> check if those are all modules or not.
> 
> The other reports were fixed by [1], but the issue in arm-smmu-qcom shouldn't be
> related in this case.
> 
> I quickyl checked all drivers with "tegra194" in their compatible string, but
> didn't see anything odd.
> 
> Can you please try to enable CONFIG_LOCKDEP, CONFIG_PROVE_LOCKING,
> CONFIG_DEBUG_MUTEXES and see if you get a lockdep splat using the following
> diff?
> 
> (You will see a lockdep warning in faux_bus_init(), it's harmless and can be
> ignored.)

Thanks. I do the lockdep warning in faux_bus_init() but that's the only 
one. I have verified that all these CONFIGs are correctly enabled in the 
build. The device boots fine with the below diff, but I am guessing that 
that is expected?

Any other thoughts?

Thanks
Jon
  > [1] 
https://lore.kernel.org/driver-core/20260121141215.29658-1-dakr@kernel.org/
> 
> diff --git a/drivers/base/base.h b/drivers/base/base.h
> index 677320881af1..4741412d7e46 100644
> --- a/drivers/base/base.h
> +++ b/drivers/base/base.h
> @@ -190,8 +190,13 @@ static inline int driver_match_device(const struct device_driver *drv,
>   static inline int driver_match_device_locked(const struct device_driver *drv,
>                                               struct device *dev)
>   {
> -       guard(device)(dev);
> -       return driver_match_device(drv, dev);
> +       int ret;
> +
> +       mutex_acquire(&dev->mutex.dep_map, 0, 0, _THIS_IP_);
> +       ret = driver_match_device(drv, dev);
> +       mutex_release(&dev->mutex.dep_map, _THIS_IP_);
> +
> +       return ret;
>   }
> 
>   static inline void dev_sync_state(struct device *dev)
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 40de2f51a1b1..56c62b3016aa 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -2557,6 +2557,8 @@ static void device_release(struct kobject *kobj)
> 
>          kfree(dev->dma_range_map);
> 
> +       lockdep_unregister_key(&dev->lock_key);
> +
>          if (dev->release)
>                  dev->release(dev);
>          else if (dev->type && dev->type->release)
> @@ -3159,7 +3161,9 @@ void device_initialize(struct device *dev)
>          kobject_init(&dev->kobj, &device_ktype);
>          INIT_LIST_HEAD(&dev->dma_pools);
>          mutex_init(&dev->mutex);
> -       lockdep_set_novalidate_class(&dev->mutex);
> +       //lockdep_set_novalidate_class(&dev->mutex);
> +       lockdep_register_key(&dev->lock_key);
> +       lockdep_set_class(&dev->mutex, &dev->lock_key);
>          spin_lock_init(&dev->devres_lock);
>          INIT_LIST_HEAD(&dev->devres_head);
>          device_pm_init(dev);
> diff --git a/include/linux/device.h b/include/linux/device.h
> index 0be95294b6e6..dc898a420bc2 100644
> --- a/include/linux/device.h
> +++ b/include/linux/device.h
> @@ -579,6 +579,7 @@ struct device {
>          struct mutex            mutex;  /* mutex to synchronize calls to
>                                           * its driver.
>                                           */
> +       struct lock_class_key lock_key;
> 
>          struct dev_links_info   links;
>          struct dev_pm_info      power;
> 

-- 
nvpublic

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Gui-Dong Han 2 weeks, 2 days ago

On Fri, Jan 23, 2026 at 1:28 AM Jon Hunter <jonathanh@nvidia.com> wrote:
>
> Hi Danilo,
>
> On 21/01/2026 21:42, Danilo Krummrich wrote:
> > On Wed Jan 21, 2026 at 9:00 PM CET, Jon Hunter wrote:
> >> It is odd because it only appears to impact the Tegra194 Jetson Xavier
> >> NX board (tegra194-p3509-0000+p3668-0000.dts).
> >>
> >> It appears to boot enough so the test can SSH into the device, but the
> >> kernel log does not show the us getting to the console prompt. It also
> >> appears that a lot of drivers are not bound as expected. I would need to
> >> check if those are all modules or not.
> >
> > The other reports were fixed by [1], but the issue in arm-smmu-qcom shouldn't be
> > related in this case.
> >
> > I quickyl checked all drivers with "tegra194" in their compatible string, but
> > didn't see anything odd.
> >
> > Can you please try to enable CONFIG_LOCKDEP, CONFIG_PROVE_LOCKING,
> > CONFIG_DEBUG_MUTEXES and see if you get a lockdep splat using the following
> > diff?
> >
> > (You will see a lockdep warning in faux_bus_init(), it's harmless and can be
> > ignored.)
>
> Thanks. I do the lockdep warning in faux_bus_init() but that's the only
> one. I have verified that all these CONFIGs are correctly enabled in the
> build. The device boots fine with the below diff, but I am guessing that
> that is expected?
>
> Any other thoughts?

Can you please try applying the following commit?

https://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git/commit/?h=driver-core-linus&id=ed1ac3c977dd6b119405fa36dd41f7151bd5b4de

Robin Murphy confirmed that the qcom specific issue might actually
impact other hardware platforms (provided ARM_SMMU_QCOM/ARCH_QCOM is
enabled), as the implementation init code is still executed:

https://lore.kernel.org/driver-core/d2ddbb72-30a8-44da-b761-876b2d37567e@arm.com/

So, this patch might fix the issue on Tegra as well.

Thanks.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 2 days ago

On Thu Jan 22, 2026 at 6:55 PM CET, Gui-Dong Han wrote:
> On Fri, Jan 23, 2026 at 1:28 AM Jon Hunter <jonathanh@nvidia.com> wrote:
>>
>> Hi Danilo,
>>
>> On 21/01/2026 21:42, Danilo Krummrich wrote:
>> > On Wed Jan 21, 2026 at 9:00 PM CET, Jon Hunter wrote:
>> >> It is odd because it only appears to impact the Tegra194 Jetson Xavier
>> >> NX board (tegra194-p3509-0000+p3668-0000.dts).
>> >>
>> >> It appears to boot enough so the test can SSH into the device, but the
>> >> kernel log does not show the us getting to the console prompt. It also
>> >> appears that a lot of drivers are not bound as expected. I would need to
>> >> check if those are all modules or not.
>> >
>> > The other reports were fixed by [1], but the issue in arm-smmu-qcom shouldn't be
>> > related in this case.
>> >
>> > I quickyl checked all drivers with "tegra194" in their compatible string, but
>> > didn't see anything odd.
>> >
>> > Can you please try to enable CONFIG_LOCKDEP, CONFIG_PROVE_LOCKING,
>> > CONFIG_DEBUG_MUTEXES and see if you get a lockdep splat using the following
>> > diff?
>> >
>> > (You will see a lockdep warning in faux_bus_init(), it's harmless and can be
>> > ignored.)
>>
>> Thanks. I do the lockdep warning in faux_bus_init() but that's the only
>> one. I have verified that all these CONFIGs are correctly enabled in the
>> build. The device boots fine with the below diff, but I am guessing that
>> that is expected?

Yes, that's expected, we not actually taking the lock, but assert to lockdep
that we did. The fact that we use a dynamic lock class key for each device mutex
to avoid false positives should also be fine.

>> Any other thoughts?

With this diff, if I intentionally create a deadlock condition on my machine, I
do see a lockdep splat as expected.

Anyways, another option would be to attach a hardware debugger (I assume you
have TRACE32 or something available?) and then get a backtrace from the CPU
affected of the deadlock.

> Can you please try applying the following commit?
>
> https://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git/commit/?h=driver-core-linus&id=ed1ac3c977dd6b119405fa36dd41f7151bd5b4de
>
> Robin Murphy confirmed that the qcom specific issue might actually
> impact other hardware platforms (provided ARM_SMMU_QCOM/ARCH_QCOM is
> enabled), as the implementation init code is still executed:
>
> https://lore.kernel.org/driver-core/d2ddbb72-30a8-44da-b761-876b2d37567e@arm.com/
>
> So, this patch might fix the issue on Tegra as well.

I thought of that as well, but looking at the code in arm_smmu_impl_init(), it
seems that can't happen?

	if (of_device_is_compatible(np, "nvidia,tegra234-smmu") ||
	    of_device_is_compatible(np, "nvidia,tegra194-smmu") ||
	    of_device_is_compatible(np, "nvidia,tegra186-smmu"))
		return nvidia_smmu_impl_init(smmu);
	
	if (IS_ENABLED(CONFIG_ARM_SMMU_QCOM))
		smmu = qcom_smmu_impl_init(smmu);

But maybe there is some odd case where the first if condition does not evaluate
to true on tegra194, so maybe worth a try.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Jon Hunter 2 weeks, 2 days ago


On 22/01/2026 18:12, Danilo Krummrich wrote:

...

>>> Any other thoughts?
> 
> With this diff, if I intentionally create a deadlock condition on my machine, I
> do see a lockdep splat as expected.
> 
> Anyways, another option would be to attach a hardware debugger (I assume you
> have TRACE32 or something available?) and then get a backtrace from the CPU
> affected of the deadlock.

Unfortunately, these days I don't have such tools available so that's 
not an option.

>> Can you please try applying the following commit?
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core.git/commit/?h=driver-core-linus&id=ed1ac3c977dd6b119405fa36dd41f7151bd5b4de
>>
>> Robin Murphy confirmed that the qcom specific issue might actually
>> impact other hardware platforms (provided ARM_SMMU_QCOM/ARCH_QCOM is
>> enabled), as the implementation init code is still executed:
>>
>> https://lore.kernel.org/driver-core/d2ddbb72-30a8-44da-b761-876b2d37567e@arm.com/
>>
>> So, this patch might fix the issue on Tegra as well.
> 
> I thought of that as well, but looking at the code in arm_smmu_impl_init(), it
> seems that can't happen?
> 
> 	if (of_device_is_compatible(np, "nvidia,tegra234-smmu") ||
> 	    of_device_is_compatible(np, "nvidia,tegra194-smmu") ||
> 	    of_device_is_compatible(np, "nvidia,tegra186-smmu"))
> 		return nvidia_smmu_impl_init(smmu);
> 	
> 	if (IS_ENABLED(CONFIG_ARM_SMMU_QCOM))
> 		smmu = qcom_smmu_impl_init(smmu);
> 
> But maybe there is some odd case where the first if condition does not evaluate
> to true on tegra194, so maybe worth a try.

I gave this a shot but that did not help either.

Thanks
Jon

-- 
nvpublic

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 2 days ago

On Thu Jan 22, 2026 at 7:58 PM CET, Jon Hunter wrote:
> On 22/01/2026 18:12, Danilo Krummrich wrote:
>> With this diff, if I intentionally create a deadlock condition on my machine, I
>> do see a lockdep splat as expected.
>> 
>> Anyways, another option would be to attach a hardware debugger (I assume you
>> have TRACE32 or something available?) and then get a backtrace from the CPU
>> affected of the deadlock.
>
> Unfortunately, these days I don't have such tools available so that's 
> not an option.

Hm..slowly running out of options. :)

I remember you previously said that you can still SSH into the machine? If so,
can you please share the the first output of

	echo l > /proc/sysrq-trigger

directly after booting?

Subsequently, can you please also run

	echo w > /proc/sysrq-trigger

and

	echo t > /proc/sysrq-trigger

If the output of the last shows a task in D state, you can also run

	cat /proc/$PID/stack

Also, are there any OOT modules loaded?

Thanks,
Danilo

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Jon Hunter 1 week, 4 days ago

Hi Danilo,

On 22/01/2026 19:35, Danilo Krummrich wrote:
> On Thu Jan 22, 2026 at 7:58 PM CET, Jon Hunter wrote:
>> On 22/01/2026 18:12, Danilo Krummrich wrote:
>>> With this diff, if I intentionally create a deadlock condition on my machine, I
>>> do see a lockdep splat as expected.
>>>
>>> Anyways, another option would be to attach a hardware debugger (I assume you
>>> have TRACE32 or something available?) and then get a backtrace from the CPU
>>> affected of the deadlock.
>>
>> Unfortunately, these days I don't have such tools available so that's
>> not an option.
> 
> Hm..slowly running out of options. :)
> 
> I remember you previously said that you can still SSH into the machine? If so,
> can you please share the the first output of
> 
> 	echo l > /proc/sysrq-trigger
> 
> directly after booting?
> 
> Subsequently, can you please also run
> 
> 	echo w > /proc/sysrq-trigger
> 
> and
> 
> 	echo t > /proc/sysrq-trigger

You can find the output of the above commands here:

https://pastebin.com/PuhFURwh

If you search for 'state:D', you can see that various drivers are stuck 
waiting for the mutex. I believe that this all happens because the SPI 
driver crashed during the probe and prevents any further drivers from 
probing.

Jon

-- 
nvpublic

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Gui-Dong Han 1 week, 4 days ago

On Tue, Jan 27, 2026 at 10:53 PM Jon Hunter <jonathanh@nvidia.com> wrote:
>
> Hi Danilo,
>
> On 22/01/2026 19:35, Danilo Krummrich wrote:
> > On Thu Jan 22, 2026 at 7:58 PM CET, Jon Hunter wrote:
> >> On 22/01/2026 18:12, Danilo Krummrich wrote:
> >>> With this diff, if I intentionally create a deadlock condition on my machine, I
> >>> do see a lockdep splat as expected.
> >>>
> >>> Anyways, another option would be to attach a hardware debugger (I assume you
> >>> have TRACE32 or something available?) and then get a backtrace from the CPU
> >>> affected of the deadlock.
> >>
> >> Unfortunately, these days I don't have such tools available so that's
> >> not an option.
> >
> > Hm..slowly running out of options. :)
> >
> > I remember you previously said that you can still SSH into the machine? If so,
> > can you please share the the first output of
> >
> >       echo l > /proc/sysrq-trigger
> >
> > directly after booting?
> >
> > Subsequently, can you please also run
> >
> >       echo w > /proc/sysrq-trigger
> >
> > and
> >
> >       echo t > /proc/sysrq-trigger
>
> You can find the output of the above commands here:
>
> https://pastebin.com/PuhFURwh

Thanks for the logs.

Looking at the trace, it confirms the previous suspicion. Since there
is no circular dependency shown in the logs, it is not a classic
recursive deadlock but rather the device_lock remaining held due to
the earlier crash in the QSPI driver. This prevented other devices on
the same bus from completing their probe.

I'm glad to hear that Breno's SPI fixes resolve the issue. It's a
happy ending for this case.

Thanks for the hard work on debugging this!

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Jon Hunter 2 weeks, 1 day ago

On 22/01/2026 19:35, Danilo Krummrich wrote:
> On Thu Jan 22, 2026 at 7:58 PM CET, Jon Hunter wrote:
>> On 22/01/2026 18:12, Danilo Krummrich wrote:
>>> With this diff, if I intentionally create a deadlock condition on my machine, I
>>> do see a lockdep splat as expected.
>>>
>>> Anyways, another option would be to attach a hardware debugger (I assume you
>>> have TRACE32 or something available?) and then get a backtrace from the CPU
>>> affected of the deadlock.
>>
>> Unfortunately, these days I don't have such tools available so that's
>> not an option.
> 
> Hm..slowly running out of options. :)

No worries. There appears to be a couple issues going on with this 
board. With the patch reverted the board boots fine and tests pass. Even 
in the passing case with this patch reverted, during boot I see a NULL 
pointer deference crash log from the QSPI driver. So I disabled the QSPI 
device in device-tree and with this patch the board boots fine and tests 
pass.

There is a on-going thread for the QSPI driver to fix these NULL pointer 
deference crashes [0]. So the QSPI driver seems to be the root of the 
problem.

Cheers
Jon

[0] https://lore.kernel.org/linux-tegra/aXJWRUhAe8F67-zG@gmail.com/T/#t

-- 
nvpublic

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 1 day ago

On Fri Jan 23, 2026 at 2:57 PM CET, Jon Hunter wrote:
> No worries. There appears to be a couple issues going on with this 
> board. With the patch reverted the board boots fine and tests pass. Even 
> in the passing case with this patch reverted, during boot I see a NULL 
> pointer deference crash log from the QSPI driver. So I disabled the QSPI 
> device in device-tree and with this patch the board boots fine and tests 
> pass.
>
> There is a on-going thread for the QSPI driver to fix these NULL pointer 
> deference crashes [0]. So the QSPI driver seems to be the root of the 
> problem.
>
> [0] https://lore.kernel.org/linux-tegra/aXJWRUhAe8F67-zG@gmail.com/T/#t

So, are you saying the problems you are seeing are unrelated to this patch and
there is no deadlock? (At least this would explain why we couldn't get a lockdep
splat with the diff I shared. :)

Otherwise, can you please share the output of the commands I shared in my
pevious mail?

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Jon Hunter 2 weeks, 1 day ago

On 23/01/2026 14:09, Danilo Krummrich wrote:
> On Fri Jan 23, 2026 at 2:57 PM CET, Jon Hunter wrote:
>> No worries. There appears to be a couple issues going on with this
>> board. With the patch reverted the board boots fine and tests pass. Even
>> in the passing case with this patch reverted, during boot I see a NULL
>> pointer deference crash log from the QSPI driver. So I disabled the QSPI
>> device in device-tree and with this patch the board boots fine and tests
>> pass.
>>
>> There is a on-going thread for the QSPI driver to fix these NULL pointer
>> deference crashes [0]. So the QSPI driver seems to be the root of the
>> problem.
>>
>> [0] https://lore.kernel.org/linux-tegra/aXJWRUhAe8F67-zG@gmail.com/T/#t
> 
> So, are you saying the problems you are seeing are unrelated to this patch and
> there is no deadlock? (At least this would explain why we couldn't get a lockdep
> splat with the diff I shared. :)

Not exactly. With vanilla -next I see various tests fail on this board 
and I can see various devices are not probed as expected. Bisect pointed 
to this patch.

I can fix this by either:

1. Reverting this patch.
2. Disabling the QSPI driver.

Now the QSPI driver has issues which need to be fixed which I am 
wondering once fix will avoid this problem.

However, I guess regardless of the QSPI issue, should this patch be 
having such an impact?

Looking at the bootlog [0], you can see the crash is occurring during 
the tegra_qspi_probe() and so I am guessing this what leads to the 
deadlock? And may be there is no way to avoid that?

Please note that a lot of the boards I test are in a farm and I don't 
have direct access. So although I can see the test harness SSH'ing into 
the board, I am not accessing directly. However, we can run whatever 
tests we want.

There are no OOT drivers being used this is just vanilla -next.

Jon

[0] https://pastebin.com/wJheruPP

-- 
nvpublic

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 1 day ago

On Fri Jan 23, 2026 at 3:29 PM CET, Jon Hunter wrote:
> I can fix this by either:
>
> 1. Reverting this patch.
> 2. Disabling the QSPI driver.
>
> Now the QSPI driver has issues which need to be fixed which I am 
> wondering once fix will avoid this problem.
>
> However, I guess regardless of the QSPI issue, should this patch be 
> having such an impact?

So, this patch by itself is correct, but it reveals when drivers do the wrong
thing, that is register drivers from contexts where it neither makes sense nor
it is supported by the driver core.

The deadlock happens when a driver (A) registers another driver (B) from a
context where the device lock of the device bound to (A) is held, e.g. from bus
callbacks, such as probe(). See also [1].

While never valid, the deadlock does only occur when (A) and (B) are on the same
bus, e.g. when a platform driver registers another platform driver in its
probe() callback.

However, it is a bit more tricky than that: Let's say a platform driver
registers an SPI controller, then spi_register_controller() might scan the SPI
bus and register SPI devices (not drivers), which are then probed as well. So
far this is all fine, but if now in one of the SPI drivers probe() callbacks a
platform driver is registered, you have a deadlock condition as well.

So it seems that something of this kind is going on with
drivers/spi/spi-tegra210-quad.c.

I did already run quite thorough analysis throughout the whole kernel tree with
various static analyzers and also played around with LLMs for finding this
pattern.

The tools gave me two results:

  (1) The IOMMU one I already fixed [2].
  (2) The GPIO driver I posted a patch for in [3].

I specifically also looked for all drivers that are required to run all the
peripherals in the tegra194-p3509-0000+p3668-0000.dts hierarchy, but couldn't
catch anything.

(This is also why I asked about OOT, because there are quite some compatible
strings that are not supported by any upstream driver.)

I think to really see what's going in with spi-tegra210-quad.c, we need the
dumps of the sysrq-triggers I provided in a previous mail.

I'd also recommend to pick a stable state of the spi-tegra210-quad.c driver and
apply this patch on top (or just apply the spi-tegra210-quad.c fixes as well).

Subsequently, we could try and retest with the diff I provided and the
corresponding lockdep options enabled and with the sysrq-triggers (without the
diff).

[1] https://lore.kernel.org/lkml/DFU7CEPUSG9A.1KKGVW4HIPMSH@kernel.org/
[2] https://lore.kernel.org/all/20260121141215.29658-1-dakr@kernel.org/
[3] https://lore.kernel.org/all/20260123133614.72586-1-dakr@kernel.org/

> Please note that a lot of the boards I test are in a farm and I don't 
> have direct access. So although I can see the test harness SSH'ing into 
> the board, I am not accessing directly. However, we can run whatever 
> tests we want.

Maybe you can trigger the sysrq-trigger from a custom test?

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Gui-Dong Han 2 weeks, 1 day ago

On Sat, Jan 24, 2026 at 12:54 AM Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Fri Jan 23, 2026 at 3:29 PM CET, Jon Hunter wrote:
> > I can fix this by either:
> >
> > 1. Reverting this patch.
> > 2. Disabling the QSPI driver.
> >
> > Now the QSPI driver has issues which need to be fixed which I am
> > wondering once fix will avoid this problem.
> >
> > However, I guess regardless of the QSPI issue, should this patch be
> > having such an impact?
>
> So, this patch by itself is correct, but it reveals when drivers do the wrong
> thing, that is register drivers from contexts where it neither makes sense nor
> it is supported by the driver core.
>
> The deadlock happens when a driver (A) registers another driver (B) from a
> context where the device lock of the device bound to (A) is held, e.g. from bus
> callbacks, such as probe(). See also [1].
>
> While never valid, the deadlock does only occur when (A) and (B) are on the same
> bus, e.g. when a platform driver registers another platform driver in its
> probe() callback.
>
> However, it is a bit more tricky than that: Let's say a platform driver
> registers an SPI controller, then spi_register_controller() might scan the SPI
> bus and register SPI devices (not drivers), which are then probed as well. So
> far this is all fine, but if now in one of the SPI drivers probe() callbacks a
> platform driver is registered, you have a deadlock condition as well.
>
> So it seems that something of this kind is going on with
> drivers/spi/spi-tegra210-quad.c.
>
> I did already run quite thorough analysis throughout the whole kernel tree with
> various static analyzers and also played around with LLMs for finding this
> pattern.
>
> The tools gave me two results:
>
>   (1) The IOMMU one I already fixed [2].
>   (2) The GPIO driver I posted a patch for in [3].
>
> I specifically also looked for all drivers that are required to run all the
> peripherals in the tegra194-p3509-0000+p3668-0000.dts hierarchy, but couldn't
> catch anything.
>
> (This is also why I asked about OOT, because there are quite some compatible
> strings that are not supported by any upstream driver.)
>
> I think to really see what's going in with spi-tegra210-quad.c, we need the
> dumps of the sysrq-triggers I provided in a previous mail.

It seems the issue is simpler than a recursive registration deadlock.
Looking at the logs, tegra_qspi_probe triggers a NULL pointer
dereference (Oops) while holding the device_lock. The mutex likely
remains marked as held/orphaned, blocking subsequent driver bindings
on the same bus.

This likely explains why lockdep was silent. Since this is not a lock
dependency cycle or a recursive locking violation, but rather a lock
remaining held by a terminated task, lockdep would not flag it as a
deadlock pattern.

This is indeed a side effect of enforcing the lock here—it amplifies
the impact of a crash. However, an Oops while holding the device_lock
is generally catastrophic regardless.

Following up on our previous discussion [1], refactoring
driver_override would resolve this. We could move driver_override to
struct device and protect it with a dedicated lock (e.g.,
driver_override_lock). We would then replace driver_set_override with
dev_set_driver_override and add dev_access_driver_override with
internal lock assertions. This allows us to remove device_lock from
the 2 match paths, reducing contention and preventing a single crash
from stalling the whole bus.

However, this deviates from the current paradigm where device_lock
protects sysfs attributes (like waiting_for_supplier and
power/control). If other sysfs attributes are found to share similar
constraints or would benefit from finer-grained locking (which
requires further investigation), we might have a stronger argument for
introducing a more generic sysfs_lock to handle this class of
attributes. We would also need to carefully verify safety during
device removal.

Danilo, what are your thoughts on this refactoring plan? I am willing
to attempt it, but since it touches the driver core, documentation,
and 10+ bus drivers, and I haven't submitted such a large series
before, it may take me a few weeks to get an initial version out, and
additional time to iterate based on review feedback until it is ready
for merging. If you prefer to handle it yourself to expedite things,
please let me know so we don't duplicate efforts.

[1] https://lore.kernel.org/all/DFNI14L1K1I0.3FZ84OAWXY0LP@kernel.org/

Thanks.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 1 day ago

On Fri Jan 23, 2026 at 7:53 PM CET, Gui-Dong Han wrote:
> It seems the issue is simpler than a recursive registration deadlock.
> Looking at the logs, tegra_qspi_probe triggers a NULL pointer
> dereference (Oops) while holding the device_lock. The mutex likely
> remains marked as held/orphaned, blocking subsequent driver bindings
> on the same bus.
>
> This likely explains why lockdep was silent. Since this is not a lock
> dependency cycle or a recursive locking violation, but rather a lock
> remaining held by a terminated task, lockdep would not flag it as a
> deadlock pattern.
>
> This is indeed a side effect of enforcing the lock here—it amplifies
> the impact of a crash. However, an Oops while holding the device_lock
> is generally catastrophic regardless.

This makes sense to me; it might indeed be as simple as that.

> Following up on our previous discussion [1], refactoring
> driver_override would resolve this. We could move driver_override to
> struct device and protect it with a dedicated lock (e.g.,
> driver_override_lock). We would then replace driver_set_override with
> dev_set_driver_override and add dev_access_driver_override with
> internal lock assertions. This allows us to remove device_lock from
> the 2 match paths, reducing contention and preventing a single crash
> from stalling the whole bus.
>
> However, this deviates from the current paradigm where device_lock
> protects sysfs attributes (like waiting_for_supplier and
> power/control). If other sysfs attributes are found to share similar
> constraints or would benefit from finer-grained locking (which
> requires further investigation), we might have a stronger argument for
> introducing a more generic sysfs_lock to handle this class of
> attributes. We would also need to carefully verify safety during
> device removal.
>
> Danilo, what are your thoughts on this refactoring plan? I am willing
> to attempt it, but since it touches the driver core, documentation,
> and 10+ bus drivers, and I haven't submitted such a large series
> before, it may take me a few weeks to get an initial version out, and
> additional time to iterate based on review feedback until it is ready
> for merging. If you prefer to handle it yourself to expedite things,
> please let me know so we don't duplicate efforts.

I think moving driver_override to struct device and providing accessors with
proper lockdep assertions is the correct thing to do. With that, I do not think
a separate lock is necessary.

Please feel free to follow up on this.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Jon Hunter 1 week, 4 days ago

On 23/01/2026 19:07, Danilo Krummrich wrote:
> On Fri Jan 23, 2026 at 7:53 PM CET, Gui-Dong Han wrote:
>> It seems the issue is simpler than a recursive registration deadlock.
>> Looking at the logs, tegra_qspi_probe triggers a NULL pointer
>> dereference (Oops) while holding the device_lock. The mutex likely
>> remains marked as held/orphaned, blocking subsequent driver bindings
>> on the same bus.
>>
>> This likely explains why lockdep was silent. Since this is not a lock
>> dependency cycle or a recursive locking violation, but rather a lock
>> remaining held by a terminated task, lockdep would not flag it as a
>> deadlock pattern.
>>
>> This is indeed a side effect of enforcing the lock here—it amplifies
>> the impact of a crash. However, an Oops while holding the device_lock
>> is generally catastrophic regardless.
> 
> This makes sense to me; it might indeed be as simple as that.

Yes I believe that this is the case too.

BTW, if I apply the SPI series from Breno [0], which fixes crash in the 
SPI driver, then everything works fine.

Jon

[0] 
https://lore.kernel.org/linux-tegra/20260126-tegra_xfer-v2-0-6d2115e4f387@debian.org/T/#t
-- 
nvpublic

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 1 week, 4 days ago

On Tue Jan 27, 2026 at 3:58 PM CET, Jon Hunter wrote:
>
> On 23/01/2026 19:07, Danilo Krummrich wrote:
>> On Fri Jan 23, 2026 at 7:53 PM CET, Gui-Dong Han wrote:
>>> It seems the issue is simpler than a recursive registration deadlock.
>>> Looking at the logs, tegra_qspi_probe triggers a NULL pointer
>>> dereference (Oops) while holding the device_lock. The mutex likely
>>> remains marked as held/orphaned, blocking subsequent driver bindings
>>> on the same bus.
>>>
>>> This likely explains why lockdep was silent. Since this is not a lock
>>> dependency cycle or a recursive locking violation, but rather a lock
>>> remaining held by a terminated task, lockdep would not flag it as a
>>> deadlock pattern.
>>>
>>> This is indeed a side effect of enforcing the lock here—it amplifies
>>> the impact of a crash. However, an Oops while holding the device_lock
>>> is generally catastrophic regardless.
>> 
>> This makes sense to me; it might indeed be as simple as that.
>
> Yes I believe that this is the case too.
>
> BTW, if I apply the SPI series from Breno [0], which fixes crash in the 
> SPI driver, then everything works fine.

Thanks for confirming!

- Danilo

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Mark Brown 2 weeks, 4 days ago

On Tue, Jan 20, 2026 at 04:23:27PM +0100, Marek Szyprowski wrote:
> On 20.01.2026 14:22, Mark Brown wrote:

> > We are also seeing similar looking boot hangs on some Qualcomm platforms
> > in Arm's test lab which aren't verified to be the same thing but are
> > hanging at a similar point in boot.

> I've observed the same issue on Qualcomm RB5 board and bisecting lead me 
> also to this patch. My kernel log also doesn't reveal much information:

Yeah, that's one of the boards we're seeing issues on in Arm's test lab.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 4 days ago

On Tue Jan 20, 2026 at 2:22 PM CET, Mark Brown wrote:
> On Wed, Jan 14, 2026 at 12:28:43AM +0800, Gui-Dong Han wrote:
>> Currently, driver_match_device() is called from three sites. One site
>> (__device_attach_driver) holds device_lock(dev), but the other two
>> (bind_store and __driver_attach) do not. This inconsistency means that
>> bus match() callbacks are not guaranteed to be called with the lock
>> held.
>
> I'm seeing boot hangs on Arm Juno in next/pending-fixes which bisect to
> this commit.  The boot grinds to a halt near the end of boot:
>
> [    2.570549] ledtrig-cpu: registered to indicate activity on CPUs
> [    2.618301] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [    2.623547] msm_serial: driver initialized
> [    2.624058] SuperH (H)SCI(F) driver initialized
> [    2.624312] STM32 USART driver initialized

Hm..sounds a bit like some match() callback manually takes the device_lock() and
the reason we're not seeing anything from lockdep is because it happens with the
serial driver.

I don't have a machine to reproduce it, but for debugging it would probably help
to not actually take the lock in __driver_attach(), but only acquire / release
the corresponding lockdep map. If my suspicion is correct, we should see a
lockdep splat pointing out the issue.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Mark Brown 2 weeks, 4 days ago

On Tue, Jan 20, 2026 at 04:03:03PM +0100, Danilo Krummrich wrote:

> I don't have a machine to reproduce it, but for debugging it would probably help
> to not actually take the lock in __driver_attach(), but only acquire / release
> the corresponding lockdep map. If my suspicion is correct, we should see a
> lockdep splat pointing out the issue.

I tried lockdep but didn't see anything different.  Instrumenting with
printk() tells me it's deadlocking trying to attach arm-smmu on Juno
(that's a v1 SMMU on this platform), I'll try to poke further but it'll
likely be tomorrow at the earliest.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 4 days ago

On Tue Jan 20, 2026 at 6:38 PM CET, Mark Brown wrote:
> On Tue, Jan 20, 2026 at 04:03:03PM +0100, Danilo Krummrich wrote:
>
>> I don't have a machine to reproduce it, but for debugging it would probably help
>> to not actually take the lock in __driver_attach(), but only acquire / release
>> the corresponding lockdep map. If my suspicion is correct, we should see a
>> lockdep splat pointing out the issue.
>
> I tried lockdep but didn't see anything different.  Instrumenting with
> printk() tells me it's deadlocking trying to attach arm-smmu on Juno
> (that's a v1 SMMU on this platform), I'll try to poke further but it'll
> likely be tomorrow at the earliest.

Maybe the following diff faking the lock for lockdep helps, as it should keep
things running, i.e. with this we have the exact same semantics as if we'd
revert the patch (except for the lockdep check of course).

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 677320881af1..4741412d7e46 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -190,8 +190,13 @@ static inline int driver_match_device(const struct device_driver *drv,
 static inline int driver_match_device_locked(const struct device_driver *drv,
                                             struct device *dev)
 {
-       guard(device)(dev);
-       return driver_match_device(drv, dev);
+       int ret;
+
+       mutex_acquire(&dev->mutex.dep_map, 0, 0, _THIS_IP_);
+       ret = driver_match_device(drv, dev);
+       mutex_release(&dev->mutex.dep_map, _THIS_IP_);
+
+       return ret;
 }

 static inline void dev_sync_state(struct device *dev)

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Mark Brown 2 weeks, 4 days ago

On Tue, Jan 20, 2026 at 07:36:05PM +0100, Danilo Krummrich wrote:
> On Tue Jan 20, 2026 at 6:38 PM CET, Mark Brown wrote:

> > I tried lockdep but didn't see anything different.  Instrumenting with
> > printk() tells me it's deadlocking trying to attach arm-smmu on Juno
> > (that's a v1 SMMU on this platform), I'll try to poke further but it'll
> > likely be tomorrow at the earliest.

> Maybe the following diff faking the lock for lockdep helps, as it should keep
> things running, i.e. with this we have the exact same semantics as if we'd
> revert the patch (except for the lockdep check of course).

That does allow us to continue to make progress, the SMMU never manages
to probe AFAICT but we do boot normally.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 4 days ago

On Tue Jan 20, 2026 at 9:05 PM CET, Mark Brown wrote:
> On Tue, Jan 20, 2026 at 07:36:05PM +0100, Danilo Krummrich wrote:
>> On Tue Jan 20, 2026 at 6:38 PM CET, Mark Brown wrote:
>
>> > I tried lockdep but didn't see anything different.  Instrumenting with
>> > printk() tells me it's deadlocking trying to attach arm-smmu on Juno
>> > (that's a v1 SMMU on this platform), I'll try to poke further but it'll
>> > likely be tomorrow at the earliest.
>
>> Maybe the following diff faking the lock for lockdep helps, as it should keep
>> things running, i.e. with this we have the exact same semantics as if we'd
>> revert the patch (except for the lockdep check of course).
>
> That does allow us to continue to make progress, the SMMU never manages
> to probe AFAICT but we do boot normally.

I really would expect a lockdep splat in this case, so I was even about to ask
whether CONFIG_PROVE_LOCKING etc. is enabled. But it's me who messed it up. I
missed that we have lockdep_set_novalidate_class(&dev->mutex).

(The fact that the SMMU never manages to probe must be unrelated, i.e. a
different issue. Since my diff should be equivalent to a revert of the patch,
except that it fakes that the mutex has been taken for lockdep.)

Anyways, this should work:

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 677320881af1..4741412d7e46 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -190,8 +190,13 @@ static inline int driver_match_device(const struct device_driver *drv,
 static inline int driver_match_device_locked(const struct device_driver *drv,
                                             struct device *dev)
 {
-       guard(device)(dev);
-       return driver_match_device(drv, dev);
+       int ret;
+
+       mutex_acquire(&dev->mutex.dep_map, 0, 0, _THIS_IP_);
+       ret = driver_match_device(drv, dev);
+       mutex_release(&dev->mutex.dep_map, _THIS_IP_);
+
+       return ret;
 }

 static inline void dev_sync_state(struct device *dev)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 40de2f51a1b1..af270362aeb7 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3159,7 +3159,7 @@ void device_initialize(struct device *dev)
        kobject_init(&dev->kobj, &device_ktype);
        INIT_LIST_HEAD(&dev->dma_pools);
        mutex_init(&dev->mutex);
-       lockdep_set_novalidate_class(&dev->mutex);
+       //lockdep_set_novalidate_class(&dev->mutex);
        spin_lock_init(&dev->devres_lock);
        INIT_LIST_HEAD(&dev->devres_head);
        device_pm_init(dev);

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 3 days ago

On Tue Jan 20, 2026 at 10:18 PM CET, Danilo Krummrich wrote:
> Anyways, this should work:

I Just notied that I pasted the wrong diff, which was nonsense of course, since
it just unlocks all the suppressed false positives. (Should not have sent it
during a meeting. :)

What I actually intended (not neat, but hopefully helps):

diff --git a/drivers/base/base.h b/drivers/base/base.h
index 677320881af1..4741412d7e46 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -190,8 +190,13 @@ static inline int driver_match_device(const struct device_driver *drv,
 static inline int driver_match_device_locked(const struct device_driver *drv,
                                             struct device *dev)
 {
-       guard(device)(dev);
-       return driver_match_device(drv, dev);
+       int ret;
+
+       mutex_acquire(&dev->mutex.dep_map, 0, 0, _THIS_IP_);
+       ret = driver_match_device(drv, dev);
+       mutex_release(&dev->mutex.dep_map, _THIS_IP_);
+
+       return ret;
 }
 
 static inline void dev_sync_state(struct device *dev)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 40de2f51a1b1..56c62b3016aa 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2557,6 +2557,8 @@ static void device_release(struct kobject *kobj)
 
        kfree(dev->dma_range_map);
 
+       lockdep_unregister_key(&dev->lock_key);
+
        if (dev->release)
                dev->release(dev);
        else if (dev->type && dev->type->release)
@@ -3159,7 +3161,9 @@ void device_initialize(struct device *dev)
        kobject_init(&dev->kobj, &device_ktype);
        INIT_LIST_HEAD(&dev->dma_pools);
        mutex_init(&dev->mutex);
-       lockdep_set_novalidate_class(&dev->mutex);
+       //lockdep_set_novalidate_class(&dev->mutex);
+       lockdep_register_key(&dev->lock_key);
+       lockdep_set_class(&dev->mutex, &dev->lock_key);
        spin_lock_init(&dev->devres_lock);
        INIT_LIST_HEAD(&dev->devres_head);
        device_pm_init(dev);
diff --git a/include/linux/device.h b/include/linux/device.h
index 0be95294b6e6..dc898a420bc2 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -579,6 +579,7 @@ struct device {
        struct mutex            mutex;  /* mutex to synchronize calls to
                                         * its driver.
                                         */
+       struct lock_class_key lock_key;
 
        struct dev_links_info   links;
        struct dev_pm_info      power;

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Gui-Dong Han 2 weeks, 3 days ago

On Wed, Jan 21, 2026 at 9:11 AM Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Tue Jan 20, 2026 at 10:18 PM CET, Danilo Krummrich wrote:
> > Anyways, this should work:
>
> I Just notied that I pasted the wrong diff, which was nonsense of course, since
> it just unlocks all the suppressed false positives. (Should not have sent it
> during a meeting. :)
>
> What I actually intended (not neat, but hopefully helps):

Thanks for the updated diff.

I tested it on my QEMU setup. Since I couldn't reproduce the hang
there, I didn't see any lockdep splats regarding the deadlock.
However, since the physical lock is removed, my PoCs successfully
triggered the UAF on both paths as expected.

I did notice a lockdep warning during boot, which happens every time.
I suspect this is because faux_bus_init is an __init function, so we
are registering a key from memory that gets freed. This seems specific
to the debug code, but I'm pasting it below for reference.

[    2.093905] WARNING: CPU: 1 PID: 1 at kernel/locking/lockdep.c:1227
lockdep_register_key+0x104/0x150
[    2.094924] Modules linked in:
[    2.095682] CPU: 1 UID: 0 PID: 1 Comm: swapper/0 Not tainted
6.17.0-dirty #11 PREEMPT(voluntary)
[    2.096241] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.15.0-1 04/01/2014
[    2.096896] RIP: 0010:lockdep_register_key+0x104/0x150
[    2.097301] Code: 8b 04 ed 20 91 dd 90 4c 89 6b 08 48 89 03 48 89
1c ed 20 91 dd 90 48 85 c0 74 04 48 89 58 08 49
[    2.098909] RSP: 0018:ffff888001217e60 EFLAGS: 00000202
[    2.099260] RAX: 0000000000000001 RBX: ffffffff8f025260 RCX: 0000000000000000
[    2.099566] RDX: 1ffffffff1e04a45 RSI: 0000000000000001 RDI: ffffffff8f025260
[    2.099898] RBP: ffffffff8f025140 R08: 0000000000000004 R09: 0000000000000000
[    2.100228] R10: ffffffff8f025213 R11: 0000000000000000 R12: ffffffff8f025260
[    2.100556] R13: ffff888001209580 R14: 0000000000000000 R15: 0000000000000000
[    2.101923] FS:  0000000000000000(0000) GS:ffff8880dc98d000(0000)
knlGS:0000000000000000
[    2.102298] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    2.102560] CR2: 0000000000000000 CR3: 000000004de8c000 CR4: 00000000000006f0
[    2.102628] Call Trace:
[    2.102628]  <TASK>
[    2.102991]  device_initialize+0xc3/0x500
[    2.103307]  device_register+0xd/0x20
[    2.103525]  faux_bus_init+0x12/0x80
[    2.103730]  driver_init+0x2e/0x50
[    2.104066]  kernel_init_freeable+0x33e/0x6e0
[    2.104335]  ? __pfx_kernel_init+0x10/0x10
[    2.104566]  kernel_init+0x1a/0x1e0
[    2.104765]  ? _raw_spin_unlock_irq+0x23/0x40
[    2.105118]  ret_from_fork+0x255/0x330
[    2.105350]  ? __pfx_kernel_init+0x10/0x10
[    2.105567]  ret_from_fork_asm+0x1a/0x30
[    2.106001]  </TASK>
[    2.106218] irq event stamp: 3545
[    2.106405] hardirqs last  enabled at (3553): [<ffffffff8ace8436>]
__up_console_sem+0x66/0x70
[    2.106892] hardirqs last disabled at (3562): [<ffffffff8ace841b>]
__up_console_sem+0x4b/0x70
[    2.107276] softirqs last  enabled at (3244): [<ffffffff8ab7de83>]
handle_softirqs+0x4f3/0x750
[    2.107644] softirqs last disabled at (3239): [<ffffffff8ab7e214>]
__irq_exit_rcu+0xc4/0x100
[    2.108924] ---[ end trace 0000000000000000 ]---

Thanks.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Gui-Dong Han 2 weeks, 3 days ago

On Wed, Jan 21, 2026 at 3:18 PM Gui-Dong Han <hanguidong02@gmail.com> wrote:
>
> On Wed, Jan 21, 2026 at 9:11 AM Danilo Krummrich <dakr@kernel.org> wrote:
> >
> > On Tue Jan 20, 2026 at 10:18 PM CET, Danilo Krummrich wrote:
> > > Anyways, this should work:
> >
> > I Just notied that I pasted the wrong diff, which was nonsense of course, since
> > it just unlocks all the suppressed false positives. (Should not have sent it
> > during a meeting. :)
> >
> > What I actually intended (not neat, but hopefully helps):
>
> Thanks for the updated diff.
>
> I tested it on my QEMU setup. Since I couldn't reproduce the hang
> there, I didn't see any lockdep splats regarding the deadlock.
> However, since the physical lock is removed, my PoCs successfully
> triggered the UAF on both paths as expected.
>
> I did notice a lockdep warning during boot, which happens every time.
> I suspect this is because faux_bus_init is an __init function, so we
> are registering a key from memory that gets freed. This seems specific
> to the debug code, but I'm pasting it below for reference.

I figured out the root cause.

The warning is triggered because faux_bus_root is a static object.
lockdep_register_key() has a WARN_ON_ONCE(static_obj(key)) check that
forbids registering keys residing in static memory. It is not about
__init memory being freed.

Anyway, this is not a big deal and doesn't impact the testing results.

Thanks.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Greg KH 2 weeks, 3 days ago

On Wed, Jan 21, 2026 at 03:41:56PM +0800, Gui-Dong Han wrote:
> On Wed, Jan 21, 2026 at 3:18 PM Gui-Dong Han <hanguidong02@gmail.com> wrote:
> >
> > On Wed, Jan 21, 2026 at 9:11 AM Danilo Krummrich <dakr@kernel.org> wrote:
> > >
> > > On Tue Jan 20, 2026 at 10:18 PM CET, Danilo Krummrich wrote:
> > > > Anyways, this should work:
> > >
> > > I Just notied that I pasted the wrong diff, which was nonsense of course, since
> > > it just unlocks all the suppressed false positives. (Should not have sent it
> > > during a meeting. :)
> > >
> > > What I actually intended (not neat, but hopefully helps):
> >
> > Thanks for the updated diff.
> >
> > I tested it on my QEMU setup. Since I couldn't reproduce the hang
> > there, I didn't see any lockdep splats regarding the deadlock.
> > However, since the physical lock is removed, my PoCs successfully
> > triggered the UAF on both paths as expected.
> >
> > I did notice a lockdep warning during boot, which happens every time.
> > I suspect this is because faux_bus_init is an __init function, so we
> > are registering a key from memory that gets freed. This seems specific
> > to the debug code, but I'm pasting it below for reference.
> 
> I figured out the root cause.
> 
> The warning is triggered because faux_bus_root is a static object.
> lockdep_register_key() has a WARN_ON_ONCE(static_obj(key)) check that
> forbids registering keys residing in static memory. It is not about
> __init memory being freed.
> 
> Anyway, this is not a big deal and doesn't impact the testing results.

Ooh, nice catch.  Let me go make that a dynamic object.  It really
shouldn't be a static one, I hate static struct device usage, and
complain about it from everyone else.  So there's no reason I should
have used that myself :(

thanks,

greg k-h

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 3 days ago

On Wed Jan 21, 2026 at 8:56 AM CET, Greg KH wrote:
> On Wed, Jan 21, 2026 at 03:41:56PM +0800, Gui-Dong Han wrote:
>> The warning is triggered because faux_bus_root is a static object.
>> lockdep_register_key() has a WARN_ON_ONCE(static_obj(key)) check that
>> forbids registering keys residing in static memory. It is not about
>> __init memory being freed.
>> 
>> Anyway, this is not a big deal and doesn't impact the testing results.

Sorry, I was aware of this (and that it is harmess for testing purposes) and
should have mentioned it so people do not get distracted by it.

> Ooh, nice catch.  Let me go make that a dynamic object.

Also had it on my list to fix up today. :)

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Greg KH 2 weeks, 3 days ago

On Wed, Jan 21, 2026 at 10:54:52AM +0100, Danilo Krummrich wrote:
> On Wed Jan 21, 2026 at 8:56 AM CET, Greg KH wrote:
> > On Wed, Jan 21, 2026 at 03:41:56PM +0800, Gui-Dong Han wrote:
> >> The warning is triggered because faux_bus_root is a static object.
> >> lockdep_register_key() has a WARN_ON_ONCE(static_obj(key)) check that
> >> forbids registering keys residing in static memory. It is not about
> >> __init memory being freed.
> >> 
> >> Anyway, this is not a big deal and doesn't impact the testing results.
> 
> Sorry, I was aware of this (and that it is harmess for testing purposes) and
> should have mentioned it so people do not get distracted by it.
> 
> > Ooh, nice catch.  Let me go make that a dynamic object.
> 
> Also had it on my list to fix up today. :)

I've tested this and have sent it out for "real" now.

thanks,

greg k-h

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Greg KH 2 weeks, 3 days ago

On Wed, Jan 21, 2026 at 08:56:26AM +0100, Greg KH wrote:
> On Wed, Jan 21, 2026 at 03:41:56PM +0800, Gui-Dong Han wrote:
> > On Wed, Jan 21, 2026 at 3:18 PM Gui-Dong Han <hanguidong02@gmail.com> wrote:
> > >
> > > On Wed, Jan 21, 2026 at 9:11 AM Danilo Krummrich <dakr@kernel.org> wrote:
> > > >
> > > > On Tue Jan 20, 2026 at 10:18 PM CET, Danilo Krummrich wrote:
> > > > > Anyways, this should work:
> > > >
> > > > I Just notied that I pasted the wrong diff, which was nonsense of course, since
> > > > it just unlocks all the suppressed false positives. (Should not have sent it
> > > > during a meeting. :)
> > > >
> > > > What I actually intended (not neat, but hopefully helps):
> > >
> > > Thanks for the updated diff.
> > >
> > > I tested it on my QEMU setup. Since I couldn't reproduce the hang
> > > there, I didn't see any lockdep splats regarding the deadlock.
> > > However, since the physical lock is removed, my PoCs successfully
> > > triggered the UAF on both paths as expected.
> > >
> > > I did notice a lockdep warning during boot, which happens every time.
> > > I suspect this is because faux_bus_init is an __init function, so we
> > > are registering a key from memory that gets freed. This seems specific
> > > to the debug code, but I'm pasting it below for reference.
> > 
> > I figured out the root cause.
> > 
> > The warning is triggered because faux_bus_root is a static object.
> > lockdep_register_key() has a WARN_ON_ONCE(static_obj(key)) check that
> > forbids registering keys residing in static memory. It is not about
> > __init memory being freed.
> > 
> > Anyway, this is not a big deal and doesn't impact the testing results.
> 
> Ooh, nice catch.  Let me go make that a dynamic object.  It really
> shouldn't be a static one, I hate static struct device usage, and
> complain about it from everyone else.  So there's no reason I should
> have used that myself :(

Totally untested patch below.  Give me a few hours before I can reboot
and try this, but if you wish to use it, please do!


From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Wed, 21 Jan 2026 09:10:21 +0100
Subject: [PATCH] driver core: faux: stop using static struct device

faux_bus_root should not have been a static struct device, but rather a
dynamically created structure so that lockdep and other testing tools do
not trip over it (as well as being the right thing overall to do.)  Fix
this up by making it properly dynamic.

Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 drivers/base/faux.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/drivers/base/faux.c b/drivers/base/faux.c
index 21dd02124231..23d725817232 100644
--- a/drivers/base/faux.c
+++ b/drivers/base/faux.c
@@ -29,9 +29,7 @@ struct faux_object {
 };
 #define to_faux_object(dev) container_of_const(dev, struct faux_object, faux_dev.dev)
 
-static struct device faux_bus_root = {
-	.init_name	= "faux",
-};
+static struct device *faux_bus_root;
 
 static int faux_match(struct device *dev, const struct device_driver *drv)
 {
@@ -152,7 +150,7 @@ struct faux_device *faux_device_create_with_groups(const char *name,
 	if (parent)
 		dev->parent = parent;
 	else
-		dev->parent = &faux_bus_root;
+		dev->parent = faux_bus_root;
 	dev->bus = &faux_bus_type;
 	dev_set_name(dev, "%s", name);
 	device_set_pm_not_required(dev);
@@ -236,9 +234,15 @@ int __init faux_bus_init(void)
 {
 	int ret;
 
-	ret = device_register(&faux_bus_root);
+	faux_bus_root = kzalloc(sizeof(*faux_bus_root), GFP_KERNEL);
+	if (!faux_bus_root)
+		return -ENOMEM;
+
+	dev_set_name(faux_bus_root, "faux");
+
+	ret = device_register(faux_bus_root);
 	if (ret) {
-		put_device(&faux_bus_root);
+		put_device(faux_bus_root);
 		return ret;
 	}
 
@@ -256,6 +260,6 @@ int __init faux_bus_init(void)
 	bus_unregister(&faux_bus_type);
 
 error_bus:
-	device_unregister(&faux_bus_root);
+	device_unregister(faux_bus_root);
 	return ret;
 }
-- 
2.52.0

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Mark Brown 2 weeks, 4 days ago

On Tue, Jan 20, 2026 at 04:03:03PM +0100, Danilo Krummrich wrote:
> On Tue Jan 20, 2026 at 2:22 PM CET, Mark Brown wrote:

> > I'm seeing boot hangs on Arm Juno in next/pending-fixes which bisect to
> > this commit.  The boot grinds to a halt near the end of boot:

> Hm..sounds a bit like some match() callback manually takes the device_lock() and
> the reason we're not seeing anything from lockdep is because it happens with the
> serial driver.

> I don't have a machine to reproduce it, but for debugging it would probably help
> to not actually take the lock in __driver_attach(), but only acquire / release
> the corresponding lockdep map. If my suspicion is correct, we should see a
> lockdep splat pointing out the issue.

lockdep isn't enabled by any of the defconfigs so has limited test
coverage, the only one I'm seeing it enabled by is ps3_defconfig.  I'll
try to run some tests with lockdep enabled.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Gui-Dong Han 2 weeks, 4 days ago

On Tue, Jan 20, 2026 at 9:22 PM Mark Brown <broonie@kernel.org> wrote:
>
> On Wed, Jan 14, 2026 at 12:28:43AM +0800, Gui-Dong Han wrote:
> > Currently, driver_match_device() is called from three sites. One site
> > (__device_attach_driver) holds device_lock(dev), but the other two
> > (bind_store and __driver_attach) do not. This inconsistency means that
> > bus match() callbacks are not guaranteed to be called with the lock
> > held.
>
> I'm seeing boot hangs on Arm Juno in next/pending-fixes which bisect to
> this commit.  The boot grinds to a halt near the end of boot:
>
> [    2.570549] ledtrig-cpu: registered to indicate activity on CPUs
> [    2.618301] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [    2.623547] msm_serial: driver initialized
> [    2.624058] SuperH (H)SCI(F) driver initialized
> [    2.624312] STM32 USART driver initialized
>
> with no further output, full log:
>
>    https://lava.sirena.org.uk/scheduler/job/2387335#L862
>
> We are also seeing similar looking boot hangs on some Qualcomm platforms
> in Arm's test lab which aren't verified to be the same thing but are
> hanging at a similar point in boot.

Hi Mark,

Thanks for the report and the detailed bisect log.

I verified this on x86 without issues, but it seems I missed this
regression on Arm platforms. I will investigate the cause of this hang
immediately.

Thanks

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Wang Jiayue 2 weeks, 3 days ago

> I'm seeing boot hangs on Arm Juno in next/pending-fixes which bisect to
> this commit.  The boot grinds to a halt near the end of boot:
> 
> [    2.570549] ledtrig-cpu: registered to indicate activity on CPUs
> [    2.618301] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> [    2.623547] msm_serial: driver initialized
> [    2.624058] SuperH (H)SCI(F) driver initialized
> [    2.624312] STM32 USART driver initialized

After partially modifying juno.dts, I managed to roughly emulate kernel
boot on juno board with qemu and successfully reproduced the boot hang.
Below is the gdb backtrace:

#0  0xffff800080114ae0 in mutex_spin_on_owner (lock=0xffff0000036bfc90, owner=0xffff000003510000, ww_ctx=0x0, waiter=0x0) at kernel/locking/mutex.c:377
#1  0xffff80008118cecc in mutex_optimistic_spin (waiter=<optimized out>, ww_ctx=<optimized out>, lock=<optimized out>) at kernel/locking/mutex.c:480
#2  __mutex_lock_common (use_ww_ctx=<optimized out>, ww_ctx=<optimized out>, ip=<optimized out>, nest_lock=<optimized out>, subclass=<optimized out>, state=<optimized out>, lock=<optimized out>) at kernel/locking/mutex.c:618
#3  __mutex_lock (lock=0xffff0000036bfc90, state=0x2, ip=<optimized out>, nest_lock=<optimized out>, subclass=<optimized out>) at kernel/locking/mutex.c:776
#4  0xffff80008118d1dc in __mutex_lock_slowpath (lock=0xffff0000036bfc90) at kernel/locking/mutex.c:1065
#5  0xffff80008118d230 in mutex_lock (lock=0xffff0000036bfc90) at kernel/locking/mutex.c:290
#6  0xffff8000809cdd1c in device_lock (dev=<optimized out>) at ./include/linux/device.h:895
#7  class_device_constructor (_T=<optimized out>) at ./include/linux/device.h:913
#8  driver_match_device_locked (dev=<optimized out>, drv=<optimized out>) at drivers/base/base.h:193
#9  __driver_attach (dev=0xffff0000036bfc10, data=0xffff800082e64440 <qcom_smmu_tbu_driver+40>) at drivers/base/dd.c:1183
#10 0xffff8000809cb17c in bus_for_each_dev (bus=0xffff0000036bfc90, start=0x0, data=0xffff800082e64440 <qcom_smmu_tbu_driver+40>, fn=0xffff8000809cdcec <__driver_attach>) at drivers/base/bus.c:383
#11 0xffff8000809cd03c in driver_attach (drv=0x0) at drivers/base/dd.c:1245
#12 0xffff8000809cc748 in bus_add_driver (drv=0xffff800082e64440 <qcom_smmu_tbu_driver+40>) at drivers/base/bus.c:715
#13 0xffff8000809ced28 in driver_register (drv=0xffff800082e64440 <qcom_smmu_tbu_driver+40>) at drivers/base/driver.c:249
#14 0xffff8000809d0254 in __platform_driver_register (drv=0x0, owner=0xffff000003510000) at drivers/base/platform.c:908
#15 0xffff8000809a6208 in qcom_smmu_impl_init (smmu=0xffff0000037c0080) at drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c:780
#16 0xffff8000809a48a0 in arm_smmu_impl_init (smmu=0xffff0000037c0080) at drivers/iommu/arm/arm-smmu/arm-smmu-impl.c:224
#17 0xffff8000809a2ae0 in arm_smmu_device_probe (pdev=0xffff0000036bfc00) at drivers/iommu/arm/arm-smmu/arm-smmu.c:2155
#18 0xffff8000809d060c in platform_probe (_dev=0xffff0000036bfc10) at drivers/base/platform.c:1446
#19 0xffff8000809cd6a4 in call_driver_probe (drv=<optimized out>, dev=<optimized out>) at drivers/base/dd.c:583
#20 really_probe (dev=0xffff0000036bfc10, drv=0xffff800082e641c0 <arm_smmu_driver+40>) at drivers/base/dd.c:661
#21 0xffff8000809cd8f8 in __driver_probe_device (drv=0xffff800082e641c0 <arm_smmu_driver+40>, dev=0xffff0000036bfc10) at drivers/base/dd.c:803
#22 0xffff8000809cdb34 in driver_probe_device (drv=0xffff0000036bfc90, dev=0xffff0000036bfc10) at drivers/base/dd.c:833
#23 0xffff8000809cddb8 in __driver_attach (data=<optimized out>, dev=<optimized out>) at drivers/base/dd.c:1227
#24 __driver_attach (dev=0xffff0000036bfc10, data=0xffff800082e641c0 <arm_smmu_driver+40>) at drivers/base/dd.c:1167
#25 0xffff8000809cb17c in bus_for_each_dev (bus=0xffff0000036bfc90, start=0x0, data=0xffff800082e641c0 <arm_smmu_driver+40>, fn=0xffff8000809cdcec <__driver_attach>) at drivers/base/bus.c:383
#26 0xffff8000809cd03c in driver_attach (drv=0x0) at drivers/base/dd.c:1245
#27 0xffff8000809cc748 in bus_add_driver (drv=0xffff800082e641c0 <arm_smmu_driver+40>) at drivers/base/bus.c:715
#28 0xffff8000809ced28 in driver_register (drv=0xffff800082e641c0 <arm_smmu_driver+40>) at drivers/base/driver.c:249
#29 0xffff8000809d0254 in __platform_driver_register (drv=0x0, owner=0xffff000003510000) at drivers/base/platform.c:908
#30 0xffff800081f3d12c in arm_smmu_driver_init () at drivers/iommu/arm/arm-smmu/arm-smmu.c:2368
#31 0xffff800080015218 in do_one_initcall (fn=0xffff800081f3d10c <arm_smmu_driver_init>) at init/main.c:1378
#32 0xffff800081ed13e4 in do_initcall_level (command_line=<optimized out>, level=<optimized out>) at init/main.c:1440
#33 do_initcalls () at init/main.c:1456
#34 do_basic_setup () at init/main.c:1475
#35 kernel_init_freeable () at init/main.c:1688
#36 0xffff800081187b50 in kernel_init (unused=0xffff0000036bfc90) at init/main.c:1578
#37 0xffff800080015f58 in ret_from_fork () at arch/arm64/kernel/entry.S:860
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 3 days ago

(Cc: Rob, Will, Robin, Joerg)

On Wed Jan 21, 2026 at 9:55 AM CET, Wang Jiayue wrote:
> After partially modifying juno.dts, I managed to roughly emulate kernel
> boot on juno board with qemu and successfully reproduced the boot hang.
> Below is the gdb backtrace:
>
> #0  0xffff800080114ae0 in mutex_spin_on_owner (lock=0xffff0000036bfc90, owner=0xffff000003510000, ww_ctx=0x0, waiter=0x0) at kernel/locking/mutex.c:377
> #1  0xffff80008118cecc in mutex_optimistic_spin (waiter=<optimized out>, ww_ctx=<optimized out>, lock=<optimized out>) at kernel/locking/mutex.c:480
> #2  __mutex_lock_common (use_ww_ctx=<optimized out>, ww_ctx=<optimized out>, ip=<optimized out>, nest_lock=<optimized out>, subclass=<optimized out>, state=<optimized out>, lock=<optimized out>) at kernel/locking/mutex.c:618
> #3  __mutex_lock (lock=0xffff0000036bfc90, state=0x2, ip=<optimized out>, nest_lock=<optimized out>, subclass=<optimized out>) at kernel/locking/mutex.c:776
> #4  0xffff80008118d1dc in __mutex_lock_slowpath (lock=0xffff0000036bfc90) at kernel/locking/mutex.c:1065
> #5  0xffff80008118d230 in mutex_lock (lock=0xffff0000036bfc90) at kernel/locking/mutex.c:290
> #6  0xffff8000809cdd1c in device_lock (dev=<optimized out>) at ./include/linux/device.h:895
> #7  class_device_constructor (_T=<optimized out>) at ./include/linux/device.h:913
> #8  driver_match_device_locked (dev=<optimized out>, drv=<optimized out>) at drivers/base/base.h:193
> #9  __driver_attach (dev=0xffff0000036bfc10, data=0xffff800082e64440 <qcom_smmu_tbu_driver+40>) at drivers/base/dd.c:1183
> #10 0xffff8000809cb17c in bus_for_each_dev (bus=0xffff0000036bfc90, start=0x0, data=0xffff800082e64440 <qcom_smmu_tbu_driver+40>, fn=0xffff8000809cdcec <__driver_attach>) at drivers/base/bus.c:383
> #11 0xffff8000809cd03c in driver_attach (drv=0x0) at drivers/base/dd.c:1245
> #12 0xffff8000809cc748 in bus_add_driver (drv=0xffff800082e64440 <qcom_smmu_tbu_driver+40>) at drivers/base/bus.c:715
> #13 0xffff8000809ced28 in driver_register (drv=0xffff800082e64440 <qcom_smmu_tbu_driver+40>) at drivers/base/driver.c:249
> #14 0xffff8000809d0254 in __platform_driver_register (drv=0x0, owner=0xffff000003510000) at drivers/base/platform.c:908
> #15 0xffff8000809a6208 in qcom_smmu_impl_init (smmu=0xffff0000037c0080) at drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c:780
> #16 0xffff8000809a48a0 in arm_smmu_impl_init (smmu=0xffff0000037c0080) at drivers/iommu/arm/arm-smmu/arm-smmu-impl.c:224
> #17 0xffff8000809a2ae0 in arm_smmu_device_probe (pdev=0xffff0000036bfc00) at drivers/iommu/arm/arm-smmu/arm-smmu.c:2155
> #18 0xffff8000809d060c in platform_probe (_dev=0xffff0000036bfc10) at drivers/base/platform.c:1446
> #19 0xffff8000809cd6a4 in call_driver_probe (drv=<optimized out>, dev=<optimized out>) at drivers/base/dd.c:583
> #20 really_probe (dev=0xffff0000036bfc10, drv=0xffff800082e641c0 <arm_smmu_driver+40>) at drivers/base/dd.c:661
> #21 0xffff8000809cd8f8 in __driver_probe_device (drv=0xffff800082e641c0 <arm_smmu_driver+40>, dev=0xffff0000036bfc10) at drivers/base/dd.c:803
> #22 0xffff8000809cdb34 in driver_probe_device (drv=0xffff0000036bfc90, dev=0xffff0000036bfc10) at drivers/base/dd.c:833
> #23 0xffff8000809cddb8 in __driver_attach (data=<optimized out>, dev=<optimized out>) at drivers/base/dd.c:1227
> #24 __driver_attach (dev=0xffff0000036bfc10, data=0xffff800082e641c0 <arm_smmu_driver+40>) at drivers/base/dd.c:1167
> #25 0xffff8000809cb17c in bus_for_each_dev (bus=0xffff0000036bfc90, start=0x0, data=0xffff800082e641c0 <arm_smmu_driver+40>, fn=0xffff8000809cdcec <__driver_attach>) at drivers/base/bus.c:383
> #26 0xffff8000809cd03c in driver_attach (drv=0x0) at drivers/base/dd.c:1245
> #27 0xffff8000809cc748 in bus_add_driver (drv=0xffff800082e641c0 <arm_smmu_driver+40>) at drivers/base/bus.c:715
> #28 0xffff8000809ced28 in driver_register (drv=0xffff800082e641c0 <arm_smmu_driver+40>) at drivers/base/driver.c:249
> #29 0xffff8000809d0254 in __platform_driver_register (drv=0x0, owner=0xffff000003510000) at drivers/base/platform.c:908
> #30 0xffff800081f3d12c in arm_smmu_driver_init () at drivers/iommu/arm/arm-smmu/arm-smmu.c:2368
> #31 0xffff800080015218 in do_one_initcall (fn=0xffff800081f3d10c <arm_smmu_driver_init>) at init/main.c:1378
> #32 0xffff800081ed13e4 in do_initcall_level (command_line=<optimized out>, level=<optimized out>) at init/main.c:1440
> #33 do_initcalls () at init/main.c:1456
> #34 do_basic_setup () at init/main.c:1475
> #35 kernel_init_freeable () at init/main.c:1688
> #36 0xffff800081187b50 in kernel_init (unused=0xffff0000036bfc90) at init/main.c:1578
> #37 0xffff800080015f58 in ret_from_fork () at arch/arm64/kernel/entry.S:860
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Thanks, this backtrace is very helpful. My lockdep patch should reveal the same
issue once run on real hardware, but with this it's probably not even necessary
anymore.

So, the problem is that in the callstack of the arm-smmu driver's (a platform
driver) probe() function, the QCOM specific code (through arm_smmu_impl_init())
registers another platform driver. Since we are still in probe() of arm-smmu the
call to platform_driver_register() happens with the device lock of the arm-smmu
platform device held.

platform_driver_register() eventually results in driver_attach() which iterates
over all the devices of a bus. Since the device we are probing and the driver we
are registering are for the same bus (i.e. the platform bus) it can now happen
that by chance that we also match the exact same device that is currently probed
again. And since we take the device lock for matching now, we actually take the
same lock twice.

Now, we could avoid this by not matching bound devices, but we check this
through dev->driver while holding the device lock, so that doesn't help.

But on the other hand, I don't see any reason why a driver would call
platform_driver_register() from probe() in the first place. I think drivers
should not do that and instead just register the driver through a normal
initcall.

(If, however, it turns out that registering drivers from probe() is something we
really need for some reason, it is probably best to drop the patch and don't
make any guarantees about whether match() is called with the device lock held or
not.

Consequently, driver_override must be protected with a separate lock (which
would be the cleaner solution in any case).)

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 3 days ago

On Wed Jan 21, 2026 at 11:40 AM CET, Danilo Krummrich wrote:
> So, the problem is that in the callstack of the arm-smmu driver's (a platform
> driver) probe() function, the QCOM specific code (through arm_smmu_impl_init())
> registers another platform driver. Since we are still in probe() of arm-smmu the
> call to platform_driver_register() happens with the device lock of the arm-smmu
> platform device held.
>
> platform_driver_register() eventually results in driver_attach() which iterates
> over all the devices of a bus. Since the device we are probing and the driver we
> are registering are for the same bus (i.e. the platform bus) it can now happen
> that by chance that we also match the exact same device that is currently probed
> again. And since we take the device lock for matching now, we actually take the
> same lock twice.
>
> Now, we could avoid this by not matching bound devices, but we check this
> through dev->driver while holding the device lock, so that doesn't help.
>
> But on the other hand, I don't see any reason why a driver would call
> platform_driver_register() from probe() in the first place. I think drivers
> should not do that and instead just register the driver through a normal
> initcall.
>
> (If, however, it turns out that registering drivers from probe() is something we
> really need for some reason, it is probably best to drop the patch and don't
> make any guarantees about whether match() is called with the device lock held or
> not.
>
> Consequently, driver_override must be protected with a separate lock (which
> would be the cleaner solution in any case).)

I assume that this should resolve the problem (unless there are more drivers
that register drivers in probe()):

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 573085349df3..9bb793efc35f 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -774,10 +774,6 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
 {
        const struct device_node *np = smmu->dev->of_node;
        const struct of_device_id *match;
-       static u8 tbu_registered;
-
-       if (!tbu_registered++)
-               platform_driver_register(&qcom_smmu_tbu_driver);

 #ifdef CONFIG_ACPI
        if (np == NULL) {
@@ -802,3 +798,5 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)

        return smmu;
 }
+
+builtin_platform_driver(qcom_smmu_tbu_driver);

@qcom maintainers: I'm aware of commit 0b4eeee2876f ("iommu/arm-smmu-qcom:
Register the TBU driver in qcom_smmu_impl_init"), but I think the above patch
should work fine as it is still *not only* registered when
CONFIG_ARM_SMMU_QCOM_DEBUG?

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Jiayue Wang 2 weeks, 3 days ago

>
> On Wed Jan 21, 2026 at 11:40 AM CET, Danilo Krummrich wrote:
> > So, the problem is that in the callstack of the arm-smmu driver's (a platform
> > driver) probe() function, the QCOM specific code (through arm_smmu_impl_init())
> > registers another platform driver. Since we are still in probe() of arm-smmu the
> > call to platform_driver_register() happens with the device lock of the arm-smmu
> > platform device held.
> >
> > platform_driver_register() eventually results in driver_attach() which iterates
> > over all the devices of a bus. Since the device we are probing and the driver we
> > are registering are for the same bus (i.e. the platform bus) it can now happen
> > that by chance that we also match the exact same device that is currently probed
> > again. And since we take the device lock for matching now, we actually take the
> > same lock twice.
> >
> > Now, we could avoid this by not matching bound devices, but we check this
> > through dev->driver while holding the device lock, so that doesn't help.
> >
> > But on the other hand, I don't see any reason why a driver would call
> > platform_driver_register() from probe() in the first place. I think drivers
> > should not do that and instead just register the driver through a normal
> > initcall.
> >
> > (If, however, it turns out that registering drivers from probe() is something we
> > really need for some reason, it is probably best to drop the patch and don't
> > make any guarantees about whether match() is called with the device lock held or
> > not.
> >
> > Consequently, driver_override must be protected with a separate lock (which
> > would be the cleaner solution in any case).)
>
> I assume that this should resolve the problem (unless there are more drivers
> that register drivers in probe()):

I tested this patch on qemu, and the boot appears to be normal now
without deadlocks. I encountered some other minor issues, but those
are likely due to my forced emulation in qemu and should be unrelated.

Tested-by: Wang Jiayue <akaieurus@gmail.com>

>
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index 573085349df3..9bb793efc35f 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -774,10 +774,6 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
>  {
>         const struct device_node *np = smmu->dev->of_node;
>         const struct of_device_id *match;
> -       static u8 tbu_registered;
> -
> -       if (!tbu_registered++)
> -               platform_driver_register(&qcom_smmu_tbu_driver);
>
>  #ifdef CONFIG_ACPI
>         if (np == NULL) {
> @@ -802,3 +798,5 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
>
>         return smmu;
>  }
> +
> +builtin_platform_driver(qcom_smmu_tbu_driver);
>
> @qcom maintainers: I'm aware of commit 0b4eeee2876f ("iommu/arm-smmu-qcom:
> Register the TBU driver in qcom_smmu_impl_init"), but I think the above patch
> should work fine as it is still *not only* registered when
> CONFIG_ARM_SMMU_QCOM_DEBUG?

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Robin Murphy 2 weeks, 3 days ago

On 2026-01-21 11:02 am, Danilo Krummrich wrote:
> On Wed Jan 21, 2026 at 11:40 AM CET, Danilo Krummrich wrote:
>> So, the problem is that in the callstack of the arm-smmu driver's (a platform
>> driver) probe() function, the QCOM specific code (through arm_smmu_impl_init())
>> registers another platform driver. Since we are still in probe() of arm-smmu the
>> call to platform_driver_register() happens with the device lock of the arm-smmu
>> platform device held.
>>
>> platform_driver_register() eventually results in driver_attach() which iterates
>> over all the devices of a bus. Since the device we are probing and the driver we
>> are registering are for the same bus (i.e. the platform bus) it can now happen
>> that by chance that we also match the exact same device that is currently probed
>> again. And since we take the device lock for matching now, we actually take the
>> same lock twice.
>>
>> Now, we could avoid this by not matching bound devices, but we check this
>> through dev->driver while holding the device lock, so that doesn't help.
>>
>> But on the other hand, I don't see any reason why a driver would call
>> platform_driver_register() from probe() in the first place. I think drivers
>> should not do that and instead just register the driver through a normal
>> initcall.
>>
>> (If, however, it turns out that registering drivers from probe() is something we
>> really need for some reason, it is probably best to drop the patch and don't
>> make any guarantees about whether match() is called with the device lock held or
>> not.
>>
>> Consequently, driver_override must be protected with a separate lock (which
>> would be the cleaner solution in any case).)
> 
> I assume that this should resolve the problem (unless there are more drivers
> that register drivers in probe()):
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index 573085349df3..9bb793efc35f 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -774,10 +774,6 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
>   {
>          const struct device_node *np = smmu->dev->of_node;
>          const struct of_device_id *match;
> -       static u8 tbu_registered;
> -
> -       if (!tbu_registered++)
> -               platform_driver_register(&qcom_smmu_tbu_driver);
> 
>   #ifdef CONFIG_ACPI
>          if (np == NULL) {
> @@ -802,3 +798,5 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
> 
>          return smmu;
>   }
> +
> +builtin_platform_driver(qcom_smmu_tbu_driver);
> 
> @qcom maintainers: I'm aware of commit 0b4eeee2876f ("iommu/arm-smmu-qcom:
> Register the TBU driver in qcom_smmu_impl_init"), but I think the above patch
> should work fine as it is still *not only* registered when
> CONFIG_ARM_SMMU_QCOM_DEBUG?

In principle there should be nothing wrong with registering the driver 
unconditionally - that existing tbu_registered logic looks racy in the 
face of async_probe anyway - however I don't think the *_platform_driver 
macros will work here, as this all gets combined into arm_smmu.ko 
wherein ending up with multiple module_init declarations breaks the build.

(Please do double-check all the build permutations of ARM_SMMU, 
ARM_SMMU_QCOM and ARM_SMMU_QCOM_DEBUG)

Thanks,
Robin.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 3 days ago

On Wed Jan 21, 2026 at 2:03 PM CET, Robin Murphy wrote:
> On 2026-01-21 11:02 am, Danilo Krummrich wrote:
>> On Wed Jan 21, 2026 at 11:40 AM CET, Danilo Krummrich wrote:
>>> So, the problem is that in the callstack of the arm-smmu driver's (a platform
>>> driver) probe() function, the QCOM specific code (through arm_smmu_impl_init())
>>> registers another platform driver. Since we are still in probe() of arm-smmu the
>>> call to platform_driver_register() happens with the device lock of the arm-smmu
>>> platform device held.
>>>
>>> platform_driver_register() eventually results in driver_attach() which iterates
>>> over all the devices of a bus. Since the device we are probing and the driver we
>>> are registering are for the same bus (i.e. the platform bus) it can now happen
>>> that by chance that we also match the exact same device that is currently probed
>>> again. And since we take the device lock for matching now, we actually take the
>>> same lock twice.
>>>
>>> Now, we could avoid this by not matching bound devices, but we check this
>>> through dev->driver while holding the device lock, so that doesn't help.
>>>
>>> But on the other hand, I don't see any reason why a driver would call
>>> platform_driver_register() from probe() in the first place. I think drivers
>>> should not do that and instead just register the driver through a normal
>>> initcall.
>>>
>>> (If, however, it turns out that registering drivers from probe() is something we
>>> really need for some reason, it is probably best to drop the patch and don't
>>> make any guarantees about whether match() is called with the device lock held or
>>> not.
>>>
>>> Consequently, driver_override must be protected with a separate lock (which
>>> would be the cleaner solution in any case).)
>> 
>> I assume that this should resolve the problem (unless there are more drivers
>> that register drivers in probe()):
>> 
>> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>> index 573085349df3..9bb793efc35f 100644
>> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
>> @@ -774,10 +774,6 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
>>   {
>>          const struct device_node *np = smmu->dev->of_node;
>>          const struct of_device_id *match;
>> -       static u8 tbu_registered;
>> -
>> -       if (!tbu_registered++)
>> -               platform_driver_register(&qcom_smmu_tbu_driver);
>> 
>>   #ifdef CONFIG_ACPI
>>          if (np == NULL) {
>> @@ -802,3 +798,5 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
>> 
>>          return smmu;
>>   }
>> +
>> +builtin_platform_driver(qcom_smmu_tbu_driver);
>> 
>> @qcom maintainers: I'm aware of commit 0b4eeee2876f ("iommu/arm-smmu-qcom:
>> Register the TBU driver in qcom_smmu_impl_init"), but I think the above patch
>> should work fine as it is still *not only* registered when
>> CONFIG_ARM_SMMU_QCOM_DEBUG?
>
> In principle there should be nothing wrong with registering the driver 
> unconditionally - that existing tbu_registered logic looks racy in the 
> face of async_probe anyway - however I don't think the *_platform_driver 
> macros will work here, as this all gets combined into arm_smmu.ko 
> wherein ending up with multiple module_init declarations breaks the build.
>
> (Please do double-check all the build permutations of ARM_SMMU, 
> ARM_SMMU_QCOM and ARM_SMMU_QCOM_DEBUG)

Indeed, I accounted for this in the final patch I sent out, thanks!

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Mark Brown 2 weeks, 3 days ago

On Wed, Jan 21, 2026 at 12:02:15PM +0100, Danilo Krummrich wrote:

> I assume that this should resolve the problem (unless there are more drivers
> that register drivers in probe()):

This makes sense to me, I guess we can just fix any other instances as
they arise.  Will you send the patch?

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 3 days ago

On Wed Jan 21, 2026 at 1:49 PM CET, Mark Brown wrote:
> On Wed, Jan 21, 2026 at 12:02:15PM +0100, Danilo Krummrich wrote:
>
>> I assume that this should resolve the problem (unless there are more drivers
>> that register drivers in probe()):
>
> This makes sense to me, I guess we can just fix any other instances as
> they arise.  Will you send the patch?

Yes, I will send it soon.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Will Deacon 2 weeks, 3 days ago

On Wed, Jan 21, 2026 at 01:50:52PM +0100, Danilo Krummrich wrote:
> On Wed Jan 21, 2026 at 1:49 PM CET, Mark Brown wrote:
> > On Wed, Jan 21, 2026 at 12:02:15PM +0100, Danilo Krummrich wrote:
> >
> >> I assume that this should resolve the problem (unless there are more drivers
> >> that register drivers in probe()):
> >
> > This makes sense to me, I guess we can just fix any other instances as
> > they arise.  Will you send the patch?
> 
> Yes, I will send it soon.

Please make sure you cc Dmitry and Georgi when you do that. I'm worried
that you proposal means we run the probe code once per TBU, which looks
like it will break.

Will

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 2 weeks, 3 days ago

On Wed Jan 21, 2026 at 2:02 PM CET, Will Deacon wrote:
> On Wed, Jan 21, 2026 at 01:50:52PM +0100, Danilo Krummrich wrote:
>> On Wed Jan 21, 2026 at 1:49 PM CET, Mark Brown wrote:
>> > On Wed, Jan 21, 2026 at 12:02:15PM +0100, Danilo Krummrich wrote:
>> >
>> >> I assume that this should resolve the problem (unless there are more drivers
>> >> that register drivers in probe()):
>> >
>> > This makes sense to me, I guess we can just fix any other instances as
>> > they arise.  Will you send the patch?
>> 
>> Yes, I will send it soon.
>
> Please make sure you cc Dmitry and Georgi when you do that.

Sure!

> I'm worried that you proposal means we run the probe code once per TBU, which
> looks like it will break.

It shouldn't change anything in this regard. It only changes when the
qcom_smmu_tbu_driver is registered, initcall vs probe() of another driver.

- Danilo

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Greg KH 2 weeks, 3 days ago

On Wed, Jan 21, 2026 at 12:02:15PM +0100, Danilo Krummrich wrote:
> On Wed Jan 21, 2026 at 11:40 AM CET, Danilo Krummrich wrote:
> > So, the problem is that in the callstack of the arm-smmu driver's (a platform
> > driver) probe() function, the QCOM specific code (through arm_smmu_impl_init())
> > registers another platform driver. Since we are still in probe() of arm-smmu the
> > call to platform_driver_register() happens with the device lock of the arm-smmu
> > platform device held.
> >
> > platform_driver_register() eventually results in driver_attach() which iterates
> > over all the devices of a bus. Since the device we are probing and the driver we
> > are registering are for the same bus (i.e. the platform bus) it can now happen
> > that by chance that we also match the exact same device that is currently probed
> > again. And since we take the device lock for matching now, we actually take the
> > same lock twice.
> >
> > Now, we could avoid this by not matching bound devices, but we check this
> > through dev->driver while holding the device lock, so that doesn't help.
> >
> > But on the other hand, I don't see any reason why a driver would call
> > platform_driver_register() from probe() in the first place. I think drivers
> > should not do that and instead just register the driver through a normal
> > initcall.
> >
> > (If, however, it turns out that registering drivers from probe() is something we
> > really need for some reason, it is probably best to drop the patch and don't
> > make any guarantees about whether match() is called with the device lock held or
> > not.
> >
> > Consequently, driver_override must be protected with a separate lock (which
> > would be the cleaner solution in any case).)
> 
> I assume that this should resolve the problem (unless there are more drivers
> that register drivers in probe()):
> 
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> index 573085349df3..9bb793efc35f 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
> @@ -774,10 +774,6 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
>  {
>         const struct device_node *np = smmu->dev->of_node;
>         const struct of_device_id *match;
> -       static u8 tbu_registered;
> -
> -       if (!tbu_registered++)
> -               platform_driver_register(&qcom_smmu_tbu_driver);

Ick, yeah, that should not be happening.  We should deadlock on that no
matter what.

> 
>  #ifdef CONFIG_ACPI
>         if (np == NULL) {
> @@ -802,3 +798,5 @@ struct arm_smmu_device *qcom_smmu_impl_init(struct arm_smmu_device *smmu)
> 
>         return smmu;
>  }
> +
> +builtin_platform_driver(qcom_smmu_tbu_driver);

change makes sense to me.

thanks,

greg k-h

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Gui-Dong Han 2 weeks, 3 days ago

On Wed, Jan 21, 2026 at 4:56 PM Wang Jiayue <akaieurus@gmail.com> wrote:
>
> > I'm seeing boot hangs on Arm Juno in next/pending-fixes which bisect to
> > this commit.  The boot grinds to a halt near the end of boot:
> >
> > [    2.570549] ledtrig-cpu: registered to indicate activity on CPUs
> > [    2.618301] Serial: 8250/16550 driver, 4 ports, IRQ sharing enabled
> > [    2.623547] msm_serial: driver initialized
> > [    2.624058] SuperH (H)SCI(F) driver initialized
> > [    2.624312] STM32 USART driver initialized
>
> After partially modifying juno.dts, I managed to roughly emulate kernel
> boot on juno board with qemu and successfully reproduced the boot hang.
> Below is the gdb backtrace:

Great work, thank you very much!

This is a bit puzzling to me. Since __device_driver_lock(dev,
dev->parent) is called a few lines later in __driver_attach(), I
wonder if the original kernel would also deadlock if match() returns
true in such a nested scenario?

Thanks.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Mark Brown 2 weeks, 4 days ago

On Tue, Jan 20, 2026 at 09:30:53PM +0800, Gui-Dong Han wrote:
> On Tue, Jan 20, 2026 at 9:22 PM Mark Brown <broonie@kernel.org> wrote:

> > I'm seeing boot hangs on Arm Juno in next/pending-fixes which bisect to
> > this commit.  The boot grinds to a halt near the end of boot:

> I verified this on x86 without issues, but it seems I missed this
> regression on Arm platforms. I will investigate the cause of this hang
> immediately.

Thanks.  It's not all platforms - in particular none of the main
emulation platforms triggered the issue unfortunately.  It seems to be a
fairly limited subset of physical platforms that are impacted.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Gui-Dong Han 2 weeks, 4 days ago

On Tue, Jan 20, 2026 at 9:48 PM Mark Brown <broonie@kernel.org> wrote:
>
> On Tue, Jan 20, 2026 at 09:30:53PM +0800, Gui-Dong Han wrote:
> > On Tue, Jan 20, 2026 at 9:22 PM Mark Brown <broonie@kernel.org> wrote:
>
> > > I'm seeing boot hangs on Arm Juno in next/pending-fixes which bisect to
> > > this commit.  The boot grinds to a halt near the end of boot:
>
> > I verified this on x86 without issues, but it seems I missed this
> > regression on Arm platforms. I will investigate the cause of this hang
> > immediately.
>
> Thanks.  It's not all platforms - in particular none of the main
> emulation platforms triggered the issue unfortunately.  It seems to be a
> fairly limited subset of physical platforms that are impacted.

I am quite puzzled by this. My suspicion is a deadlock, possibly
involving PM/power domain operations within the match() callback
triggering a lock dependency.

However, since __device_attach_driver (the device-initiated binding
path) has always held device_lock while calling match(), it is strange
that this locking pattern causes issues now in the __driver_attach
(driver-initiated) path. The specific timing or device state during
late binding must be the key factor.

I do not have access to the affected hardware, but I will attempt to
reproduce it on QEMU Arm and perform a code audit of the relevant bus
match functions.

If anyone has insights into potential lock conflicts specific to this
path, please share.

Thanks.

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 3 weeks, 1 day ago

On Tue Jan 13, 2026 at 5:28 PM CET, Gui-Dong Han wrote:
> Currently, driver_match_device() is called from three sites. One site
> (__device_attach_driver) holds device_lock(dev), but the other two
> (bind_store and __driver_attach) do not. This inconsistency means that
> bus match() callbacks are not guaranteed to be called with the lock
> held.
>
> Fix this by introducing driver_match_device_locked(), which guarantees
> holding the device lock using a scoped guard. Replace the unlocked calls
> in bind_store() and __driver_attach() with this new helper. Also add a
> lock assertion to driver_match_device() to enforce this guarantee.
>
> This consistency also fixes a known race condition. The driver_override
> implementation relies on the device_lock, so the missing lock led to the
> use-after-free (UAF) reported in Bugzilla for buses using this field.
>
> Stress testing the two newly locked paths for 24 hours with
> CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP enabled showed no UAF recurrence
> and no lockdep warnings.
>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
> Suggested-by: Qiu-ji Chen <chenqiuji666@gmail.com>
> Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>

Applied to driver-core-linus, thanks!

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Danilo Krummrich 3 weeks, 4 days ago

On Tue Jan 13, 2026 at 5:28 PM CET, Gui-Dong Han wrote:
> Currently, driver_match_device() is called from three sites. One site
> (__device_attach_driver) holds device_lock(dev), but the other two
> (bind_store and __driver_attach) do not. This inconsistency means that
> bus match() callbacks are not guaranteed to be called with the lock
> held.
>
> Fix this by introducing driver_match_device_locked(), which guarantees
> holding the device lock using a scoped guard. Replace the unlocked calls
> in bind_store() and __driver_attach() with this new helper. Also add a
> lock assertion to driver_match_device() to enforce this guarantee.
>
> This consistency also fixes a known race condition. The driver_override
> implementation relies on the device_lock, so the missing lock led to the
> use-after-free (UAF) reported in Bugzilla for buses using this field.
>
> Stress testing the two newly locked paths for 24 hours with
> CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP enabled showed no UAF recurrence
> and no lockdep warnings.
>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789

Fixes: 49b420a13ff9 ("driver core: check bus->match without holding device lock")

> Suggested-by: Qiu-ji Chen <chenqiuji666@gmail.com>
> Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>

This looks good now!

Reviewed-by: Danilo Krummrich <dakr@kernel.org>

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Gui-Dong Han 3 weeks, 1 day ago

On Wed, Jan 14, 2026 at 3:23 AM Danilo Krummrich <dakr@kernel.org> wrote:
>
> On Tue Jan 13, 2026 at 5:28 PM CET, Gui-Dong Han wrote:
> > Currently, driver_match_device() is called from three sites. One site
> > (__device_attach_driver) holds device_lock(dev), but the other two
> > (bind_store and __driver_attach) do not. This inconsistency means that
> > bus match() callbacks are not guaranteed to be called with the lock
> > held.
> >
> > Fix this by introducing driver_match_device_locked(), which guarantees
> > holding the device lock using a scoped guard. Replace the unlocked calls
> > in bind_store() and __driver_attach() with this new helper. Also add a
> > lock assertion to driver_match_device() to enforce this guarantee.
> >
> > This consistency also fixes a known race condition. The driver_override
> > implementation relies on the device_lock, so the missing lock led to the
> > use-after-free (UAF) reported in Bugzilla for buses using this field.
> >
> > Stress testing the two newly locked paths for 24 hours with
> > CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP enabled showed no UAF recurrence
> > and no lockdep warnings.
> >
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
>
> Fixes: 49b420a13ff9 ("driver core: check bus->match without holding device lock")

Thanks for the review! The Fixes tag looks correct as the
inconsistency dates back to that commit.

Since a Fixes tag is present, I recall the patch bot often warns when
a Fixes tag is provided without a corresponding Cc:
stable@vger.kernel.org tag. Perhaps we should include it? This would
also allow us to fix the UAF in older kernels.

If you agree, could you please add it when picking up the patch?

Thank you very much!

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Greg KH 3 weeks, 1 day ago

On Fri, Jan 16, 2026 at 03:34:53PM +0800, Gui-Dong Han wrote:
> On Wed, Jan 14, 2026 at 3:23 AM Danilo Krummrich <dakr@kernel.org> wrote:
> >
> > On Tue Jan 13, 2026 at 5:28 PM CET, Gui-Dong Han wrote:
> > > Currently, driver_match_device() is called from three sites. One site
> > > (__device_attach_driver) holds device_lock(dev), but the other two
> > > (bind_store and __driver_attach) do not. This inconsistency means that
> > > bus match() callbacks are not guaranteed to be called with the lock
> > > held.
> > >
> > > Fix this by introducing driver_match_device_locked(), which guarantees
> > > holding the device lock using a scoped guard. Replace the unlocked calls
> > > in bind_store() and __driver_attach() with this new helper. Also add a
> > > lock assertion to driver_match_device() to enforce this guarantee.
> > >
> > > This consistency also fixes a known race condition. The driver_override
> > > implementation relies on the device_lock, so the missing lock led to the
> > > use-after-free (UAF) reported in Bugzilla for buses using this field.
> > >
> > > Stress testing the two newly locked paths for 24 hours with
> > > CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP enabled showed no UAF recurrence
> > > and no lockdep warnings.
> > >
> > > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
> >
> > Fixes: 49b420a13ff9 ("driver core: check bus->match without holding device lock")
> 
> Thanks for the review! The Fixes tag looks correct as the
> inconsistency dates back to that commit.
> 
> Since a Fixes tag is present, I recall the patch bot often warns when
> a Fixes tag is provided without a corresponding Cc:
> stable@vger.kernel.org tag. Perhaps we should include it? This would
> also allow us to fix the UAF in older kernels.

Can you reproduce this UAF?  If so, yes, otherwise fixing for
theoretical things isn't good to backport.

Anyway, looks good to me:

Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Gui-Dong Han 3 weeks, 1 day ago

On Fri, Jan 16, 2026 at 7:19 PM Greg KH <gregkh@linuxfoundation.org> wrote:
>
> On Fri, Jan 16, 2026 at 03:34:53PM +0800, Gui-Dong Han wrote:
> > On Wed, Jan 14, 2026 at 3:23 AM Danilo Krummrich <dakr@kernel.org> wrote:
> > >
> > > On Tue Jan 13, 2026 at 5:28 PM CET, Gui-Dong Han wrote:
> > > > Currently, driver_match_device() is called from three sites. One site
> > > > (__device_attach_driver) holds device_lock(dev), but the other two
> > > > (bind_store and __driver_attach) do not. This inconsistency means that
> > > > bus match() callbacks are not guaranteed to be called with the lock
> > > > held.
> > > >
> > > > Fix this by introducing driver_match_device_locked(), which guarantees
> > > > holding the device lock using a scoped guard. Replace the unlocked calls
> > > > in bind_store() and __driver_attach() with this new helper. Also add a
> > > > lock assertion to driver_match_device() to enforce this guarantee.
> > > >
> > > > This consistency also fixes a known race condition. The driver_override
> > > > implementation relies on the device_lock, so the missing lock led to the
> > > > use-after-free (UAF) reported in Bugzilla for buses using this field.
> > > >
> > > > Stress testing the two newly locked paths for 24 hours with
> > > > CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP enabled showed no UAF recurrence
> > > > and no lockdep warnings.
> > > >
> > > > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
> > >
> > > Fixes: 49b420a13ff9 ("driver core: check bus->match without holding device lock")
> >
> > Thanks for the review! The Fixes tag looks correct as the
> > inconsistency dates back to that commit.
> >
> > Since a Fixes tag is present, I recall the patch bot often warns when
> > a Fixes tag is provided without a corresponding Cc:
> > stable@vger.kernel.org tag. Perhaps we should include it? This would
> > also allow us to fix the UAF in older kernels.
>
> Can you reproduce this UAF?  If so, yes, otherwise fixing for
> theoretical things isn't good to backport.

Yes, it is reproducible. The Bugzilla entry linked in the Closes tag
contains full KASAN reports and PoCs that reliably reproduce the UAF.
(This was also noted in the v1 changelog).

Thanks for the review!

Re: [PATCH v5] driver core: enforce device_lock for driver_match_device()

Posted by Rafael J. Wysocki 3 weeks, 4 days ago

On Tue, Jan 13, 2026 at 5:29 PM Gui-Dong Han <hanguidong02@gmail.com> wrote:
>
> Currently, driver_match_device() is called from three sites. One site
> (__device_attach_driver) holds device_lock(dev), but the other two
> (bind_store and __driver_attach) do not. This inconsistency means that
> bus match() callbacks are not guaranteed to be called with the lock
> held.
>
> Fix this by introducing driver_match_device_locked(), which guarantees
> holding the device lock using a scoped guard. Replace the unlocked calls
> in bind_store() and __driver_attach() with this new helper. Also add a
> lock assertion to driver_match_device() to enforce this guarantee.
>
> This consistency also fixes a known race condition. The driver_override
> implementation relies on the device_lock, so the missing lock led to the
> use-after-free (UAF) reported in Bugzilla for buses using this field.
>
> Stress testing the two newly locked paths for 24 hours with
> CONFIG_PROVE_LOCKING and CONFIG_LOCKDEP enabled showed no UAF recurrence
> and no lockdep warnings.
>
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
> Suggested-by: Qiu-ji Chen <chenqiuji666@gmail.com>
> Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>

Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>

> ---
> v5:
> * Introduce driver_match_device_locked() helper using guard(device) to
> handle locking, as suggested by Rafael J. Wysocki.
> v4:
> * Remove the misleading comment above device_lock_assert(), and update
> subject and commit message to focus on enforcing consistent locking,
> as discussed with Danilo Krummrich.
> v3:
> * Remove redundant locking comments at call sites and add a blank line
> after the lock assertion in driver_match_device(), as suggested by Greg KH.
> v2:
> * Add device_lock_assert() in driver_match_device() to enforce locking
> requirement, as suggested by Greg KH.
> v1:
> * The Bugzilla entry contains full KASAN reports and two PoCs that reliably
> reproduce the UAF on both unlocked paths using a standard QEMU setup
> (default e1000 device at 0000:00:03.0).
> ---
>  drivers/base/base.h | 9 +++++++++
>  drivers/base/bus.c  | 2 +-
>  drivers/base/dd.c   | 2 +-
>  3 files changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/base/base.h b/drivers/base/base.h
> index 430cbefbc97f..677320881af1 100644
> --- a/drivers/base/base.h
> +++ b/drivers/base/base.h
> @@ -182,9 +182,18 @@ void device_set_deferred_probe_reason(const struct device *dev, struct va_format
>  static inline int driver_match_device(const struct device_driver *drv,
>                                       struct device *dev)
>  {
> +       device_lock_assert(dev);
> +
>         return drv->bus->match ? drv->bus->match(dev, drv) : 1;
>  }
>
> +static inline int driver_match_device_locked(const struct device_driver *drv,
> +                                            struct device *dev)
> +{
> +       guard(device)(dev);
> +       return driver_match_device(drv, dev);
> +}
> +
>  static inline void dev_sync_state(struct device *dev)
>  {
>         if (dev->bus->sync_state)
> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> index 9eb7771706f0..331d750465e2 100644
> --- a/drivers/base/bus.c
> +++ b/drivers/base/bus.c
> @@ -263,7 +263,7 @@ static ssize_t bind_store(struct device_driver *drv, const char *buf,
>         int err = -ENODEV;
>
>         dev = bus_find_device_by_name(bus, NULL, buf);
> -       if (dev && driver_match_device(drv, dev)) {
> +       if (dev && driver_match_device_locked(drv, dev)) {
>                 err = device_driver_attach(drv, dev);
>                 if (!err) {
>                         /* success */
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 349f31bedfa1..98feb4c77160 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -1178,7 +1178,7 @@ static int __driver_attach(struct device *dev, void *data)
>          * is an error.
>          */
>
> -       ret = driver_match_device(drv, dev);
> +       ret = driver_match_device_locked(drv, dev);
>         if (ret == 0) {
>                 /* no match */
>                 return 0;
> --
> 2.43.0
>