On Wed, Feb 23, 2022 at 05:00:12PM +0100, gregkh@linuxfoundation.org wrote:
> On Wed, Feb 23, 2022 at 09:38:20AM -0500, stern@rowland.harvard.edu wrote:
> > Which bus locks are you referring to? I'm not aware of any locks
> > that synchronize dev_uevent() with anything (in particular, with
> > driver unbinding).
>
> The locks in the driver core that handle the binding and unbinding of
> drivers to devices.
>
> > And as far as I know, usb_gadget_remove_driver() doesn't play any
> > odd tricks with pointers.
>
> Ah, I never noticed that this is doing a "fake" bus and does the
> bind/unbind itself outside of the driver core. It should just be a
> normal bus type and have the core do the work for it, but oh well.
>
> And there is a lock that should serialize all of this already, so it's
> odd that this is able to be triggered at all.
>>I guess at a minimum the UDC core should hold the device lock when it registers, unregisters, binds, or unbinds UDC and gadget devices.
>>Would that be enough to fix the problem? I really don't understand how sysfs file access gets synchronized with device removal.
>>>
>>>Agree with you, in usb_gadget_remove_driver() function, the udc->dev.driver and udc->gadget->dev.driver be set to null without any protection, so when the udevd accessed the dev->driver, this address may be invalid at this time.
>>>maybe the operation of dev->driver can be protected by device_lock().
>>>
Is it enough that we just need to protect "dev.driver" ?
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 3d6430eb0c6a..9bd2624973d7 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -2316,8 +2316,10 @@ static int dev_uevent(struct kobject *kobj, struct kobj_uevent_env *env)
if (dev->type && dev->type->name)
add_uevent_var(env, "DEVTYPE=%s", dev->type->name);
+ device_lock(dev);
if (dev->driver)
add_uevent_var(env, "DRIVER=%s", dev->driver->name);
+ device_unlock(dev);
/* Add common DT information about the device */
of_device_uevent(dev, env);
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
index 568534a0d17c..7877142397d3 100644
--- a/drivers/usb/gadget/udc/core.c
+++ b/drivers/usb/gadget/udc/core.c
@@ -1436,8 +1436,14 @@ static void usb_gadget_remove_driver(struct usb_udc *udc)
usb_gadget_udc_stop(udc);
udc->driver = NULL;
+
+ device_lock(&udc->dev);
udc->dev.driver = NULL;
+ device_unlock(&udc->dev);
+
+ device_lock(&udc->gadget->dev);
udc->gadget->dev.driver = NULL;
+ device_unlock(&udc->gadget->dev);
}
/**
@@ -1498,8 +1504,14 @@ static int udc_bind_to_driver(struct usb_udc *udc, struct usb_gadget_driver *dri
driver->function);
udc->driver = driver;
+
+ device_lock(&udc->dev);
udc->dev.driver = &driver->driver;
+ device_unlock(&udc->dev);
+
+ device_lock(&udc->gadget->dev);
udc->gadget->dev.driver = &driver->driver;
+ device_unlock(&udc->gadget->dev);
usb_gadget_udc_set_speed(udc, driver->max_speed);
@@ -1521,8 +1533,14 @@ static int udc_bind_to_driver(struct usb_udc *udc, struct usb_gadget_driver *dri
dev_err(&udc->dev, "failed to start %s: %d\n",
udc->driver->function, ret);
udc->driver = NULL;
+
+ device_lock(&udc->dev);
udc->dev.driver = NULL;
+ device_unlock(&udc->dev);
+
+ device_lock(&udc->gadget->dev);
udc->gadget->dev.driver = NULL;
+ device_unlock(&udc->gadget->dev);
return ret;
}
Thanks,
Zqiang
>>>Thanks,
>>>Zqiang
> Unless the device is being removed at the same time it was manually
> unbound from the driver? If so, then this really should be fixed up
> to use the driver core logic instead...
>>
>>Device removal does of course trigger unbinding, but they always take place in the same thread so it isn't an issue.
>>
>>Probably part of the reason people don't want to use the driver core here is so that they can specify which UDC a gadget driver should bind to. The driver core would always bind each new gadget to the first registered gadget driver.
>>
>>When Dave Brownell originally wrote the gadget subsystem, I believe he didn't bother to integrate it with the driver core because it was a "bus" with only a single device and a single driver. The ability to have multiple UDCs in the system was added later.
>>
>>Alan Stern
On Thu, Feb 24, 2022 at 03:14:54AM +0000, Zhang, Qiang1 wrote: > > On Wed, Feb 23, 2022 at 05:00:12PM +0100, gregkh@linuxfoundation.org wrote: > > On Wed, Feb 23, 2022 at 09:38:20AM -0500, stern@rowland.harvard.edu wrote: > > > Which bus locks are you referring to? I'm not aware of any locks > > > that synchronize dev_uevent() with anything (in particular, with > > > driver unbinding). > > > > The locks in the driver core that handle the binding and unbinding of > > drivers to devices. > > > > > And as far as I know, usb_gadget_remove_driver() doesn't play any > > > odd tricks with pointers. > > > > Ah, I never noticed that this is doing a "fake" bus and does the > > bind/unbind itself outside of the driver core. It should just be a > > normal bus type and have the core do the work for it, but oh well. > > > > And there is a lock that should serialize all of this already, so it's > > odd that this is able to be triggered at all. > > >>I guess at a minimum the UDC core should hold the device lock when it registers, unregisters, binds, or unbinds UDC and gadget devices. > >>Would that be enough to fix the problem? I really don't understand how sysfs file access gets synchronized with device removal. > > >>> > >>>Agree with you, in usb_gadget_remove_driver() function, the udc->dev.driver and udc->gadget->dev.driver be set to null without any protection, so when the udevd accessed the dev->driver, this address may be invalid at this time. > >>>maybe the operation of dev->driver can be protected by device_lock(). > >>> > > Is it enough that we just need to protect "dev.driver" ? I don't know, although I doubt it. The right way to fix it is to make sure that the existing protections, which apply to drivers that are registered in the driver core, can also work properly with gadgets. But I don't know what those protections are or how they work. > diff --git a/drivers/base/core.c b/drivers/base/core.c > index 3d6430eb0c6a..9bd2624973d7 100644 > --- a/drivers/base/core.c > +++ b/drivers/base/core.c > @@ -2316,8 +2316,10 @@ static int dev_uevent(struct kobject *kobj, struct kobj_uevent_env *env) > if (dev->type && dev->type->name) > add_uevent_var(env, "DEVTYPE=%s", dev->type->name); > > + device_lock(dev); > if (dev->driver) > add_uevent_var(env, "DRIVER=%s", dev->driver->name); > + device_unlock(dev); You probably should not do this. Unless there's a serious bug, the driver core already takes all the locks it needs. Doing this might cause a deadlock (because the caller may already hold the device lock). > > /* Add common DT information about the device */ > of_device_uevent(dev, env); > diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c > index 568534a0d17c..7877142397d3 100644 > --- a/drivers/usb/gadget/udc/core.c > +++ b/drivers/usb/gadget/udc/core.c > @@ -1436,8 +1436,14 @@ static void usb_gadget_remove_driver(struct usb_udc *udc) > usb_gadget_udc_stop(udc); > > udc->driver = NULL; > + > + device_lock(&udc->dev); > udc->dev.driver = NULL; > + device_unlock(&udc->dev); > + > + device_lock(&udc->gadget->dev); > udc->gadget->dev.driver = NULL; > + device_unlock(&udc->gadget->dev); > } These are reasonable things to do, but I don't know if they will fix the problem. We really should ask advice from somebody who understands how this stuff is supposed to work. I'm not sure who to ask, though. Tejun Heo, perhaps (CC'ed). Tejun: The USB Gadget core binds and unbinds drivers without using the normal driver core facilities (see the code in usb_gadget_remove_driver() above). As a result, unbinding races with uevent generation, which can lead to a NULL pointer dereference as found by syzbot testing. In particular, dev->driver can become NULL between the times when dev_uevent() tests it and uses it (see above). Can you tell us how this should be fixed? Alan Stern
On Thu, Feb 24, 2022 at 04:23:26PM -0500, stern@rowland.harvard.edu wrote: > On Thu, Feb 24, 2022 at 03:14:54AM +0000, Zhang, Qiang1 wrote: > > > > On Wed, Feb 23, 2022 at 05:00:12PM +0100, gregkh@linuxfoundation.org wrote: > > > On Wed, Feb 23, 2022 at 09:38:20AM -0500, stern@rowland.harvard.edu wrote: > > > > Which bus locks are you referring to? I'm not aware of any locks > > > > that synchronize dev_uevent() with anything (in particular, with > > > > driver unbinding). > > > > > > The locks in the driver core that handle the binding and unbinding of > > > drivers to devices. > > > > > > > And as far as I know, usb_gadget_remove_driver() doesn't play any > > > > odd tricks with pointers. > > > > > > Ah, I never noticed that this is doing a "fake" bus and does the > > > bind/unbind itself outside of the driver core. It should just be a > > > normal bus type and have the core do the work for it, but oh well. > > > > > > And there is a lock that should serialize all of this already, so it's > > > odd that this is able to be triggered at all. > > > > >>I guess at a minimum the UDC core should hold the device lock when it registers, unregisters, binds, or unbinds UDC and gadget devices. > > >>Would that be enough to fix the problem? I really don't understand how sysfs file access gets synchronized with device removal. > > > > >>> > > >>>Agree with you, in usb_gadget_remove_driver() function, the udc->dev.driver and udc->gadget->dev.driver be set to null without any protection, so when the udevd accessed the dev->driver, this address may be invalid at this time. > > >>>maybe the operation of dev->driver can be protected by device_lock(). > > >>> > > > > Is it enough that we just need to protect "dev.driver" ? > > I don't know, although I doubt it. The right way to fix it is to make > sure that the existing protections, which apply to drivers that are > registered in the driver core, can also work properly with gadgets. But > I don't know what those protections are or how they work. > > > diff --git a/drivers/base/core.c b/drivers/base/core.c > > index 3d6430eb0c6a..9bd2624973d7 100644 > > --- a/drivers/base/core.c > > +++ b/drivers/base/core.c > > @@ -2316,8 +2316,10 @@ static int dev_uevent(struct kobject *kobj, struct kobj_uevent_env *env) > > if (dev->type && dev->type->name) > > add_uevent_var(env, "DEVTYPE=%s", dev->type->name); > > > > + device_lock(dev); > > if (dev->driver) > > add_uevent_var(env, "DRIVER=%s", dev->driver->name); > > + device_unlock(dev); > > You probably should not do this. Unless there's a serious bug, the > driver core already takes all the locks it needs. Doing this might > cause a deadlock (because the caller may already hold the device lock). > > > > > /* Add common DT information about the device */ > > of_device_uevent(dev, env); > > diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c > > index 568534a0d17c..7877142397d3 100644 > > --- a/drivers/usb/gadget/udc/core.c > > +++ b/drivers/usb/gadget/udc/core.c > > @@ -1436,8 +1436,14 @@ static void usb_gadget_remove_driver(struct usb_udc *udc) > > usb_gadget_udc_stop(udc); > > > > udc->driver = NULL; > > + > > + device_lock(&udc->dev); > > udc->dev.driver = NULL; > > + device_unlock(&udc->dev); > > + > > + device_lock(&udc->gadget->dev); > > udc->gadget->dev.driver = NULL; > > + device_unlock(&udc->gadget->dev); > > } > > These are reasonable things to do, but I don't know if they will fix the > problem. > > We really should ask advice from somebody who understands how this stuff > is supposed to work. I'm not sure who to ask, though. Tejun Heo, > perhaps (CC'ed). > > Tejun: The USB Gadget core binds and unbinds drivers without using the > normal driver core facilities (see the code in > usb_gadget_remove_driver() above). As a result, unbinding races with > uevent generation, which can lead to a NULL pointer dereference as found > by syzbot testing. In particular, dev->driver can become NULL between > the times when dev_uevent() tests it and uses it (see above). > > Can you tell us how this should be fixed? It should be fixed by properly using the driver core to bind/unbind the driver to devices like I mentioned previously :) That will be more work, but it's the correct fix here. Otherwise it needs to take the same bus locks that the device lives on to keep things in sync, like the driver core would do if it were managing these things. That could be the "short term" fix if no one wants to do the real work needed here. Nothing should be needed to change in the driver core itself, it is rightfully thinking it owns the device and can free it when needed. thanks, greg k-h
On Thu, Feb 24, 2022 at 11:37:39PM +0100, gregkh@linuxfoundation.org wrote: > On Thu, Feb 24, 2022 at 04:23:26PM -0500, stern@rowland.harvard.edu wrote: > > Can you tell us how this should be fixed? > > It should be fixed by properly using the driver core to bind/unbind the > driver to devices like I mentioned previously :) This would involve creating a "gadget" bus_type (or should it be a device_type under the platform bus?) and registering the gadgets on it, right?. Similarly, the gadget drivers would be registered on this bus. I suppose we can control which drivers get bound to which gadgets with careful matching code. > That will be more work, but it's the correct fix here. Otherwise it > needs to take the same bus locks that the device lives on to keep things > in sync, like the driver core would do if it were managing these things. > That could be the "short term" fix if no one wants to do the real work > needed here. Nothing should be needed to change in the driver core > itself, it is rightfully thinking it owns the device and can free it > when needed. All right, thanks. I'll think about implementing it. Alan Stern
On Thu, Feb 24, 2022 at 03:14:54AM +0000, Zhang, Qiang1 wrote:
>
> On Wed, Feb 23, 2022 at 05:00:12PM +0100, gregkh@linuxfoundation.org wrote:
> > On Wed, Feb 23, 2022 at 09:38:20AM -0500, stern@rowland.harvard.edu wrote:
> > > Which bus locks are you referring to? I'm not aware of any locks
> > > that synchronize dev_uevent() with anything (in particular, with
> > > driver unbinding).
> >
> > The locks in the driver core that handle the binding and unbinding
> > of drivers to devices.
> >
> > > And as far as I know, usb_gadget_remove_driver() doesn't play any
> > > odd tricks with pointers.
> >
> > Ah, I never noticed that this is doing a "fake" bus and does the
> > bind/unbind itself outside of the driver core. It should just be a
> > normal bus type and have the core do the work for it, but oh well.
> >
> > And there is a lock that should serialize all of this already, so
> > it's odd that this is able to be triggered at all.
>
> >>I guess at a minimum the UDC core should hold the device lock when it registers, unregisters, binds, or unbinds UDC and gadget devices.
> >>Would that be enough to fix the problem? I really don't understand how sysfs file access gets synchronized with device removal.
>
> >>>
> >>>Agree with you, in usb_gadget_remove_driver() function, the udc->dev.driver and udc->gadget->dev.driver be set to null without any protection, so when the udevd accessed the dev->driver, this address may be invalid at this time.
> >>>maybe the operation of dev->driver can be protected by device_lock().
> >>>
>
> Is it enough that we just need to protect "dev.driver" ?
I don't know, although I doubt it. The right way to fix it is to make sure that the existing protections, which apply to drivers that are registered in the driver core, can also work properly with gadgets. But I don't know what those protections are or how they work.
> diff --git a/drivers/base/core.c b/drivers/base/core.c index
> 3d6430eb0c6a..9bd2624973d7 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -2316,8 +2316,10 @@ static int dev_uevent(struct kobject *kobj, struct kobj_uevent_env *env)
> if (dev->type && dev->type->name)
> add_uevent_var(env, "DEVTYPE=%s", dev->type->name);
>
> + device_lock(dev);
> if (dev->driver)
> add_uevent_var(env, "DRIVER=%s", dev->driver->name);
> + device_unlock(dev);
>>You probably should not do this. Unless there's a serious bug, the driver core already takes all the locks it needs. Doing this might cause a deadlock (because the caller may already hold the device lock).
Sorry, yes it causes recursive locks.
>
> /* Add common DT information about the device */
> of_device_uevent(dev, env);
> diff --git a/drivers/usb/gadget/udc/core.c
> b/drivers/usb/gadget/udc/core.c index 568534a0d17c..7877142397d3
> 100644
> --- a/drivers/usb/gadget/udc/core.c
> +++ b/drivers/usb/gadget/udc/core.c
> @@ -1436,8 +1436,14 @@ static void usb_gadget_remove_driver(struct usb_udc *udc)
> usb_gadget_udc_stop(udc);
>
> udc->driver = NULL;
> +
> + device_lock(&udc->dev);
> udc->dev.driver = NULL;
> + device_unlock(&udc->dev);
> +
> + device_lock(&udc->gadget->dev);
> udc->gadget->dev.driver = NULL;
> + device_unlock(&udc->gadget->dev);
> }
>>These are reasonable things to do, but I don't know if they will fix the problem.
>>
>>We really should ask advice from somebody who understands how this stuff is supposed to work. I'm not sure who to ask, though. Tejun Heo, perhaps (CC'ed).
>>
>>Tejun: The USB Gadget core binds and unbinds drivers without using the normal driver core facilities (see the code in
>>usb_gadget_remove_driver() above). As a result, unbinding races with uevent generation, which can lead to a NULL pointer dereference as found by syzbot testing. In particular, dev->driver can become NULL between the times when dev_uevent() tests it and uses it (see above).
CPU0 (task 4316) CPU1 (udevd task 3689)
usb_gadget_remove_driver() dev_uevent()
........... if (dev->driver)
udc->dev.driver = NULL; add_uevent_var(env, "DRIVER=%s", dev->driver->name)
udc->gadget->dev.driver = NULL;
Thanks,
Zqiang
>>
>>Can you tell us how this should be fixed?
>>
>>Alan Stern
© 2016 - 2026 Red Hat, Inc.