drivers/base/core.c | 13 ++++++++----- drivers/base/module.c | 4 ++++ 2 files changed, 12 insertions(+), 5 deletions(-)
uevent_show() wants to de-reference dev->driver->name. There is no clean
way for a device attribute to de-reference dev->driver unless that
attribute is defined via (struct device_driver).dev_groups. Instead, the
anti-pattern of taking the device_lock() in the attribute handler risks
deadlocks with code paths that remove device attributes while holding
the lock.
This deadlock is typically invisible to lockdep given the device_lock()
is marked lockdep_set_novalidate_class(), but some subsystems allocate a
local lockdep key for @dev->mutex to reveal reports of the form:
======================================================
WARNING: possible circular locking dependency detected
6.10.0-rc7+ #275 Tainted: G OE N
------------------------------------------------------
modprobe/2374 is trying to acquire lock:
ffff8c2270070de0 (kn->active#6){++++}-{0:0}, at: __kernfs_remove+0xde/0x220
but task is already holding lock:
ffff8c22016e88f8 (&cxl_root_key){+.+.}-{3:3}, at: device_release_driver_internal+0x39/0x210
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&cxl_root_key){+.+.}-{3:3}:
__mutex_lock+0x99/0xc30
uevent_show+0xac/0x130
dev_attr_show+0x18/0x40
sysfs_kf_seq_show+0xac/0xf0
seq_read_iter+0x110/0x450
vfs_read+0x25b/0x340
ksys_read+0x67/0xf0
do_syscall_64+0x75/0x190
entry_SYSCALL_64_after_hwframe+0x76/0x7e
-> #0 (kn->active#6){++++}-{0:0}:
__lock_acquire+0x121a/0x1fa0
lock_acquire+0xd6/0x2e0
kernfs_drain+0x1e9/0x200
__kernfs_remove+0xde/0x220
kernfs_remove_by_name_ns+0x5e/0xa0
device_del+0x168/0x410
device_unregister+0x13/0x60
devres_release_all+0xb8/0x110
device_unbind_cleanup+0xe/0x70
device_release_driver_internal+0x1c7/0x210
driver_detach+0x47/0x90
bus_remove_driver+0x6c/0xf0
cxl_acpi_exit+0xc/0x11 [cxl_acpi]
__do_sys_delete_module.isra.0+0x181/0x260
do_syscall_64+0x75/0x190
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The observation though is that driver objects are typically much longer
lived than device objects. It is reasonable to perform lockless
de-reference of a @driver pointer even if it is racing detach from a
device. Given the infrequency of driver unregistration, use
synchronize_rcu() in module_remove_driver() to close any potential
races. It is potentially overkill to suffer synchronize_rcu() just to
handle the rare module removal racing uevent_show() event.
Thanks to Tetsuo Handa for the debug analysis of the syzbot report [1].
Fixes: c0a40097f0bc ("drivers: core: synchronize really_probe() and dev_uevent()")
Reported-by: syzbot+4762dd74e32532cda5ff@syzkaller.appspotmail.com
Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Closes: http://lore.kernel.org/5aa5558f-90a4-4864-b1b1-5d6784c5607d@I-love.SAKURA.ne.jp [1]
Link: http://lore.kernel.org/669073b8ea479_5fffa294c1@dwillia2-xfh.jf.intel.com.notmuch
Cc: stable@vger.kernel.org
Cc: Ashish Sangwan <a.sangwan@samsung.com>
Cc: Namjae Jeon <namjae.jeon@samsung.com>
Cc: Dirk Behme <dirk.behme@de.bosch.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
drivers/base/core.c | 13 ++++++++-----
drivers/base/module.c | 4 ++++
2 files changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 2b4c0624b704..b5399262198a 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -25,6 +25,7 @@
#include <linux/mutex.h>
#include <linux/pm_runtime.h>
#include <linux/netdevice.h>
+#include <linux/rcupdate.h>
#include <linux/sched/signal.h>
#include <linux/sched/mm.h>
#include <linux/string_helpers.h>
@@ -2640,6 +2641,7 @@ static const char *dev_uevent_name(const struct kobject *kobj)
static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
{
const struct device *dev = kobj_to_dev(kobj);
+ struct device_driver *driver;
int retval = 0;
/* add device node properties if present */
@@ -2668,8 +2670,12 @@ static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
if (dev->type && dev->type->name)
add_uevent_var(env, "DEVTYPE=%s", dev->type->name);
- if (dev->driver)
- add_uevent_var(env, "DRIVER=%s", dev->driver->name);
+ /* Synchronize with module_remove_driver() */
+ rcu_read_lock();
+ driver = READ_ONCE(dev->driver);
+ if (driver)
+ add_uevent_var(env, "DRIVER=%s", driver->name);
+ rcu_read_unlock();
/* Add common DT information about the device */
of_device_uevent(dev, env);
@@ -2739,11 +2745,8 @@ static ssize_t uevent_show(struct device *dev, struct device_attribute *attr,
if (!env)
return -ENOMEM;
- /* Synchronize with really_probe() */
- device_lock(dev);
/* let the kset specific function add its keys */
retval = kset->uevent_ops->uevent(&dev->kobj, env);
- device_unlock(dev);
if (retval)
goto out;
diff --git a/drivers/base/module.c b/drivers/base/module.c
index a1b55da07127..b0b79b9c189d 100644
--- a/drivers/base/module.c
+++ b/drivers/base/module.c
@@ -7,6 +7,7 @@
#include <linux/errno.h>
#include <linux/slab.h>
#include <linux/string.h>
+#include <linux/rcupdate.h>
#include "base.h"
static char *make_driver_name(struct device_driver *drv)
@@ -97,6 +98,9 @@ void module_remove_driver(struct device_driver *drv)
if (!drv)
return;
+ /* Synchronize with dev_uevent() */
+ synchronize_rcu();
+
sysfs_remove_link(&drv->p->kobj, "module");
if (drv->owner)
On 12.07.2024 21:42, Dan Williams wrote:
> uevent_show() wants to de-reference dev->driver->name. There is no clean
> way for a device attribute to de-reference dev->driver unless that
> attribute is defined via (struct device_driver).dev_groups. Instead, the
> anti-pattern of taking the device_lock() in the attribute handler risks
> deadlocks with code paths that remove device attributes while holding
> the lock.
>
> This deadlock is typically invisible to lockdep given the device_lock()
> is marked lockdep_set_novalidate_class(), but some subsystems allocate a
> local lockdep key for @dev->mutex to reveal reports of the form:
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.10.0-rc7+ #275 Tainted: G OE N
> ------------------------------------------------------
> modprobe/2374 is trying to acquire lock:
> ffff8c2270070de0 (kn->active#6){++++}-{0:0}, at: __kernfs_remove+0xde/0x220
>
> but task is already holding lock:
> ffff8c22016e88f8 (&cxl_root_key){+.+.}-{3:3}, at: device_release_driver_internal+0x39/0x210
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&cxl_root_key){+.+.}-{3:3}:
> __mutex_lock+0x99/0xc30
> uevent_show+0xac/0x130
> dev_attr_show+0x18/0x40
> sysfs_kf_seq_show+0xac/0xf0
> seq_read_iter+0x110/0x450
> vfs_read+0x25b/0x340
> ksys_read+0x67/0xf0
> do_syscall_64+0x75/0x190
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> -> #0 (kn->active#6){++++}-{0:0}:
> __lock_acquire+0x121a/0x1fa0
> lock_acquire+0xd6/0x2e0
> kernfs_drain+0x1e9/0x200
> __kernfs_remove+0xde/0x220
> kernfs_remove_by_name_ns+0x5e/0xa0
> device_del+0x168/0x410
> device_unregister+0x13/0x60
> devres_release_all+0xb8/0x110
> device_unbind_cleanup+0xe/0x70
> device_release_driver_internal+0x1c7/0x210
> driver_detach+0x47/0x90
> bus_remove_driver+0x6c/0xf0
> cxl_acpi_exit+0xc/0x11 [cxl_acpi]
> __do_sys_delete_module.isra.0+0x181/0x260
> do_syscall_64+0x75/0x190
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> The observation though is that driver objects are typically much longer
> lived than device objects. It is reasonable to perform lockless
> de-reference of a @driver pointer even if it is racing detach from a
> device. Given the infrequency of driver unregistration, use
> synchronize_rcu() in module_remove_driver() to close any potential
> races. It is potentially overkill to suffer synchronize_rcu() just to
> handle the rare module removal racing uevent_show() event.
>
> Thanks to Tetsuo Handa for the debug analysis of the syzbot report [1].
>
> Fixes: c0a40097f0bc ("drivers: core: synchronize really_probe() and dev_uevent()")
> Reported-by: syzbot+4762dd74e32532cda5ff@syzkaller.appspotmail.com
> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Closes: http://lore.kernel.org/5aa5558f-90a4-4864-b1b1-5d6784c5607d@I-love.SAKURA.ne.jp [1]
> Link: http://lore.kernel.org/669073b8ea479_5fffa294c1@dwillia2-xfh.jf.intel.com.notmuch
> Cc: stable@vger.kernel.org
> Cc: Ashish Sangwan <a.sangwan@samsung.com>
> Cc: Namjae Jeon <namjae.jeon@samsung.com>
> Cc: Dirk Behme <dirk.behme@de.bosch.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Many thanks for this fix! Looks good to me:
Acked-by: Dirk Behme <dirk.behme@de.bosch.com>
Dirk
> ---
> drivers/base/core.c | 13 ++++++++-----
> drivers/base/module.c | 4 ++++
> 2 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 2b4c0624b704..b5399262198a 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -25,6 +25,7 @@
> #include <linux/mutex.h>
> #include <linux/pm_runtime.h>
> #include <linux/netdevice.h>
> +#include <linux/rcupdate.h>
> #include <linux/sched/signal.h>
> #include <linux/sched/mm.h>
> #include <linux/string_helpers.h>
> @@ -2640,6 +2641,7 @@ static const char *dev_uevent_name(const struct kobject *kobj)
> static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
> {
> const struct device *dev = kobj_to_dev(kobj);
> + struct device_driver *driver;
> int retval = 0;
>
> /* add device node properties if present */
> @@ -2668,8 +2670,12 @@ static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
> if (dev->type && dev->type->name)
> add_uevent_var(env, "DEVTYPE=%s", dev->type->name);
>
> - if (dev->driver)
> - add_uevent_var(env, "DRIVER=%s", dev->driver->name);
> + /* Synchronize with module_remove_driver() */
> + rcu_read_lock();
> + driver = READ_ONCE(dev->driver);
> + if (driver)
> + add_uevent_var(env, "DRIVER=%s", driver->name);
> + rcu_read_unlock();
>
> /* Add common DT information about the device */
> of_device_uevent(dev, env);
> @@ -2739,11 +2745,8 @@ static ssize_t uevent_show(struct device *dev, struct device_attribute *attr,
> if (!env)
> return -ENOMEM;
>
> - /* Synchronize with really_probe() */
> - device_lock(dev);
> /* let the kset specific function add its keys */
> retval = kset->uevent_ops->uevent(&dev->kobj, env);
> - device_unlock(dev);
> if (retval)
> goto out;
>
> diff --git a/drivers/base/module.c b/drivers/base/module.c
> index a1b55da07127..b0b79b9c189d 100644
> --- a/drivers/base/module.c
> +++ b/drivers/base/module.c
> @@ -7,6 +7,7 @@
> #include <linux/errno.h>
> #include <linux/slab.h>
> #include <linux/string.h>
> +#include <linux/rcupdate.h>
> #include "base.h"
>
> static char *make_driver_name(struct device_driver *drv)
> @@ -97,6 +98,9 @@ void module_remove_driver(struct device_driver *drv)
> if (!drv)
> return;
>
> + /* Synchronize with dev_uevent() */
> + synchronize_rcu();
> +
> sysfs_remove_link(&drv->p->kobj, "module");
>
> if (drv->owner)
>
>
--
======================================================================
Dirk Behme Robert Bosch Car Multimedia GmbH
CM/ESO2
Phone: +49 5121 49-3274 Dirk Behme
Fax: +49 711 811 5053274 PO Box 77 77 77
mailto:dirk.behme@de.bosch.com D-31132 Hildesheim - Germany
Bosch Group, Car Multimedia (CM)
Engineering SW Operating Systems 2 (ESO2)
Robert Bosch Car Multimedia GmbH - Ein Unternehmen der Bosch Gruppe
Sitz: Hildesheim
Registergericht: Amtsgericht Hildesheim HRB 201334
Aufsichtsratsvorsitzender: Dr. Dirk Hoheisel
Geschäftsführung: Dr. Steffen Berns;
Dr. Sven Ost, Jörg Pollak, Dr. Walter Schirm
======================================================================
On Fri, Jul 12, 2024 at 12:42:09PM -0700, Dan Williams wrote:
> uevent_show() wants to de-reference dev->driver->name. There is no clean
> way for a device attribute to de-reference dev->driver unless that
> attribute is defined via (struct device_driver).dev_groups. Instead, the
> anti-pattern of taking the device_lock() in the attribute handler risks
> deadlocks with code paths that remove device attributes while holding
> the lock.
>
> This deadlock is typically invisible to lockdep given the device_lock()
> is marked lockdep_set_novalidate_class(), but some subsystems allocate a
> local lockdep key for @dev->mutex to reveal reports of the form:
>
> ======================================================
> WARNING: possible circular locking dependency detected
> 6.10.0-rc7+ #275 Tainted: G OE N
> ------------------------------------------------------
> modprobe/2374 is trying to acquire lock:
> ffff8c2270070de0 (kn->active#6){++++}-{0:0}, at: __kernfs_remove+0xde/0x220
>
> but task is already holding lock:
> ffff8c22016e88f8 (&cxl_root_key){+.+.}-{3:3}, at: device_release_driver_internal+0x39/0x210
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #1 (&cxl_root_key){+.+.}-{3:3}:
> __mutex_lock+0x99/0xc30
> uevent_show+0xac/0x130
> dev_attr_show+0x18/0x40
> sysfs_kf_seq_show+0xac/0xf0
> seq_read_iter+0x110/0x450
> vfs_read+0x25b/0x340
> ksys_read+0x67/0xf0
> do_syscall_64+0x75/0x190
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> -> #0 (kn->active#6){++++}-{0:0}:
> __lock_acquire+0x121a/0x1fa0
> lock_acquire+0xd6/0x2e0
> kernfs_drain+0x1e9/0x200
> __kernfs_remove+0xde/0x220
> kernfs_remove_by_name_ns+0x5e/0xa0
> device_del+0x168/0x410
> device_unregister+0x13/0x60
> devres_release_all+0xb8/0x110
> device_unbind_cleanup+0xe/0x70
> device_release_driver_internal+0x1c7/0x210
> driver_detach+0x47/0x90
> bus_remove_driver+0x6c/0xf0
> cxl_acpi_exit+0xc/0x11 [cxl_acpi]
> __do_sys_delete_module.isra.0+0x181/0x260
> do_syscall_64+0x75/0x190
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
>
> The observation though is that driver objects are typically much longer
> lived than device objects. It is reasonable to perform lockless
> de-reference of a @driver pointer even if it is racing detach from a
> device. Given the infrequency of driver unregistration, use
> synchronize_rcu() in module_remove_driver() to close any potential
> races. It is potentially overkill to suffer synchronize_rcu() just to
> handle the rare module removal racing uevent_show() event.
>
> Thanks to Tetsuo Handa for the debug analysis of the syzbot report [1].
>
> Fixes: c0a40097f0bc ("drivers: core: synchronize really_probe() and dev_uevent()")
> Reported-by: syzbot+4762dd74e32532cda5ff@syzkaller.appspotmail.com
> Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Closes: http://lore.kernel.org/5aa5558f-90a4-4864-b1b1-5d6784c5607d@I-love.SAKURA.ne.jp [1]
> Link: http://lore.kernel.org/669073b8ea479_5fffa294c1@dwillia2-xfh.jf.intel.com.notmuch
> Cc: stable@vger.kernel.org
> Cc: Ashish Sangwan <a.sangwan@samsung.com>
> Cc: Namjae Jeon <namjae.jeon@samsung.com>
> Cc: Dirk Behme <dirk.behme@de.bosch.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: "Rafael J. Wysocki" <rafael@kernel.org>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
> ---
> drivers/base/core.c | 13 ++++++++-----
> drivers/base/module.c | 4 ++++
> 2 files changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 2b4c0624b704..b5399262198a 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -25,6 +25,7 @@
> #include <linux/mutex.h>
> #include <linux/pm_runtime.h>
> #include <linux/netdevice.h>
> +#include <linux/rcupdate.h>
> #include <linux/sched/signal.h>
> #include <linux/sched/mm.h>
> #include <linux/string_helpers.h>
> @@ -2640,6 +2641,7 @@ static const char *dev_uevent_name(const struct kobject *kobj)
> static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
> {
> const struct device *dev = kobj_to_dev(kobj);
> + struct device_driver *driver;
> int retval = 0;
>
> /* add device node properties if present */
> @@ -2668,8 +2670,12 @@ static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
> if (dev->type && dev->type->name)
> add_uevent_var(env, "DEVTYPE=%s", dev->type->name);
>
> - if (dev->driver)
> - add_uevent_var(env, "DRIVER=%s", dev->driver->name);
> + /* Synchronize with module_remove_driver() */
> + rcu_read_lock();
> + driver = READ_ONCE(dev->driver);
> + if (driver)
> + add_uevent_var(env, "DRIVER=%s", driver->name);
> + rcu_read_unlock();
It's a lot of work for a "simple" thing of just "tell userspace what
happened" type of thing. But it makes sense.
I'll queue this up after -rc1 is out, there's no rush before then as
it's only lockdep warnings happening now, right?
thanks,
greg k-h
Greg KH wrote: [..] > It's a lot of work for a "simple" thing of just "tell userspace what > happened" type of thing. But it makes sense. > > I'll queue this up after -rc1 is out, there's no rush before then as > it's only lockdep warnings happening now, right? Right, only theoretical lockup reports so far, no rush that I can see.
On 2024/07/13 4:42, Dan Williams wrote: > @@ -2668,8 +2670,12 @@ static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env) > if (dev->type && dev->type->name) > add_uevent_var(env, "DEVTYPE=%s", dev->type->name); > > - if (dev->driver) > - add_uevent_var(env, "DRIVER=%s", dev->driver->name); > + /* Synchronize with module_remove_driver() */ > + rcu_read_lock(); > + driver = READ_ONCE(dev->driver); > + if (driver) > + add_uevent_var(env, "DRIVER=%s", driver->name); > + rcu_read_unlock(); > Given that read of dev->driver is protected using RCU, > @@ -97,6 +98,9 @@ void module_remove_driver(struct device_driver *drv) > if (!drv) > return; > where is dev->driver = NULL; performed prior to > + /* Synchronize with dev_uevent() */ > + synchronize_rcu(); > + this synchronize_rcu(), in order to make sure that READ_ONCE(dev->driver) in dev_uevent() observes NULL? > sysfs_remove_link(&drv->p->kobj, "module"); > > if (drv->owner) >
Tetsuo Handa wrote: > On 2024/07/13 4:42, Dan Williams wrote: > > @@ -2668,8 +2670,12 @@ static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env) > > if (dev->type && dev->type->name) > > add_uevent_var(env, "DEVTYPE=%s", dev->type->name); > > > > - if (dev->driver) > > - add_uevent_var(env, "DRIVER=%s", dev->driver->name); > > + /* Synchronize with module_remove_driver() */ > > + rcu_read_lock(); > > + driver = READ_ONCE(dev->driver); > > + if (driver) > > + add_uevent_var(env, "DRIVER=%s", driver->name); > > + rcu_read_unlock(); > > > > Given that read of dev->driver is protected using RCU, > > > @@ -97,6 +98,9 @@ void module_remove_driver(struct device_driver *drv) > > if (!drv) > > return; > > > > where is > > dev->driver = NULL; > > performed prior to It happens in __device_release_driver() and several places in the driver probe failure path. However, the point of this patch is that the "dev->driver = NULL" event does not really matter for this sysfs attribute. This attribute just wants to opportunistically report the driver name to userspace, but that result is ephemeral. I.e. as soon as a dev_uevent() adds a DRIVER environment variable that result could be immediately invalidated before userspace has a chance to do anything with the result. Even with the current device_lock() solution userspace can not depend on the driver still being attached when it goes to act on the DRIVER environment variable. > > + /* Synchronize with dev_uevent() */ > > + synchronize_rcu(); > > + > > this synchronize_rcu(), in order to make sure that > READ_ONCE(dev->driver) in dev_uevent() observes NULL? No, this synchronize_rcu() is to make sure that if dev_uevent() wins the race and observes that dev->driver is not NULL that it is still safe to dereference that result because the 'struct device_driver' object is still live. A 'struct device_driver' instance is typically static data in a kernel module that does not get freed until after driver_unregister(). Calls to driver_unregister() typically only happen at module removal time. So this synchronize_rcu() delays module removal until dev_uevent() finishes reading driver->name.
On 2024/07/13 8:49, Dan Williams wrote:
>>> + /* Synchronize with dev_uevent() */
>>> + synchronize_rcu();
>>> +
>>
>> this synchronize_rcu(), in order to make sure that
>> READ_ONCE(dev->driver) in dev_uevent() observes NULL?
>
> No, this synchronize_rcu() is to make sure that if dev_uevent() wins the
> race and observes that dev->driver is not NULL that it is still safe to
> dereference that result because the 'struct device_driver' object is
> still live.
I can't catch what the pair of rcu_read_lock()/rcu_read_unlock() in dev_uevent()
and synchronize_rcu() in module_remove_driver() is for.
I think that the below race is possible.
Please explain how "/* Synchronize with module_remove_driver() */" works.
Thread1: Thread2:
static int dev_uevent(const struct kobject *kobj, struct kobj_uevent_env *env)
{
const struct device *dev = kobj_to_dev(kobj);
struct device_driver *driver;
int retval = 0;
(...snipped...)
if (dev->type && dev->type->name)
add_uevent_var(env, "DEVTYPE=%s", dev->type->name);
void module_remove_driver(struct device_driver *drv)
{
struct module_kobject *mk = NULL;
char *driver_name;
if (!drv)
return;
/* Synchronize with dev_uevent() */
synchronize_rcu(); // <= This can be no-op because rcu_read_lock() in dev_uevent() is not yet called.
// <= At this moment Thread1 can sleep for arbitrary duration due to preemption, can't it?
/* Synchronize with module_remove_driver() */
rcu_read_lock(); // <= What does this RCU want to synchronize with?
sysfs_remove_link(&drv->p->kobj, "module");
if (drv->owner)
mk = &drv->owner->mkobj;
else if (drv->p->mkobj)
mk = drv->p->mkobj;
if (mk && mk->drivers_dir) {
driver_name = make_driver_name(drv);
if (driver_name) {
sysfs_remove_link(mk->drivers_dir, driver_name);
kfree(driver_name);
}
}
}
driver = READ_ONCE(dev->driver); // <= module_remove_driver() can be already completed even with RCU protection, can't it?
if (driver)
add_uevent_var(env, "DRIVER=%s", driver->name);
rcu_read_unlock();
/* Add common DT information about the device */
of_device_uevent(dev, env);
(..snipped...)
}
Tetsuo Handa wrote:
> On 2024/07/13 8:49, Dan Williams wrote:
> >>> + /* Synchronize with dev_uevent() */
> >>> + synchronize_rcu();
> >>> +
> >>
> >> this synchronize_rcu(), in order to make sure that
> >> READ_ONCE(dev->driver) in dev_uevent() observes NULL?
> >
> > No, this synchronize_rcu() is to make sure that if dev_uevent() wins the
> > race and observes that dev->driver is not NULL that it is still safe to
> > dereference that result because the 'struct device_driver' object is
> > still live.
>
> I can't catch what the pair of rcu_read_lock()/rcu_read_unlock() in dev_uevent()
> and synchronize_rcu() in module_remove_driver() is for.
It is to extend the lifetime of @driver if dev_uevent() observes
non-NULL @dev->driver.
> I think that the below race is possible.
> Please explain how "/* Synchronize with module_remove_driver() */" works.
It is for this race:
Thread1: Thread2:
dev_uevent(...) delete_module()
driver = dev->driver; mod->exit()
if (driver) driver_unregister()
driver_detach() // <-- @dev->driver marked NULL
module_remove_driver()
free_module() // <-- @driver object destroyed
add_uevent_var(env, "DRIVER=%s", driver->name); // <-- use after free of @driver
If driver_detach() happens before Thread1 reads dev->driver then there
is no use after free risk.
The previous attempt to fix this held the device_lock() over
dev_uevent() which prevents driver_detach() from even starting, but that
causes lockdep issues and is even more heavy-handed than the
synchronize_rcu() delay. RCU makes sure that @driver stays alive between
reading @dev->driver and reading @driver->name.
© 2016 - 2025 Red Hat, Inc.