[PATCH] driver core: fix async device shutdown hang

Stuart Hayes posted 1 patch 2 months, 1 week ago
There is a newer version of this series
drivers/base/core.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
[PATCH] driver core: fix async device shutdown hang
Posted by Stuart Hayes 2 months, 1 week ago
Modify device_shutdown() so that supplier devices do not wait for
consumer devices to be shut down first when the devlink is sync state
only, since the consumer is not dependent on the supplier in this case.

Without this change, a circular dependency could hang the system.

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
---
 drivers/base/core.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index b69b82da8837..76513e360496 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -4898,8 +4898,16 @@ void device_shutdown(void)
 
 		idx = device_links_read_lock();
 		list_for_each_entry_rcu(link, &dev->links.suppliers, c_node,
-				device_links_read_lock_held())
+				device_links_read_lock_held()) {
+			/*
+			 * sync_state_only suppliers don't need to wait,
+			 * aren't reordered on devices_kset, so making them
+			 * wait could result in a hang
+			 */
+			if (device_link_flag_is_sync_state_only(link->flags))
+				continue;
 			link->supplier->p->shutdown_after = cookie;
+		}
 		device_links_read_unlock(idx);
 		put_device(dev);
 
-- 
2.39.3
Re: [PATCH] driver core: fix async device shutdown hang
Posted by Greg Kroah-Hartman 2 months, 1 week ago
On Tue, Sep 17, 2024 at 03:15:17PM -0500, Stuart Hayes wrote:
> Modify device_shutdown() so that supplier devices do not wait for
> consumer devices to be shut down first when the devlink is sync state
> only, since the consumer is not dependent on the supplier in this case.
> 
> Without this change, a circular dependency could hang the system.
> 
> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>

What commit id does this fix?  Should it go to stable?

And what driver is causing this problem, is this a regression or for
something new that just got added to the tree?

thanks,

greg k-h
Re: [PATCH] driver core: fix async device shutdown hang
Posted by stuart hayes 2 months, 1 week ago

On 9/17/2024 3:42 PM, Greg Kroah-Hartman wrote:
> On Tue, Sep 17, 2024 at 03:15:17PM -0500, Stuart Hayes wrote:
>> Modify device_shutdown() so that supplier devices do not wait for
>> consumer devices to be shut down first when the devlink is sync state
>> only, since the consumer is not dependent on the supplier in this case.
>>
>> Without this change, a circular dependency could hang the system.
>>
>> Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> 
> What commit id does this fix?  Should it go to stable?
> 
> And what driver is causing this problem, is this a regression or for
> something new that just got added to the tree?
> 
> thanks,
> 
> greg k-h

This fixes commit 8064952c65045f05ee2671fe437770e50c151776, in
driver-core-next & linux-next... it's problem with code that was just
added to the tree (in drivers/base/core.c).  It is not in stable.

Apologies, I should have mentioned that from the start.

The issue was found using qemu... a pl061 device (supplier) and
gpio-keys device (consumer), from a qemu-generated device tree with
the aarch64 architecture.  I don't know why the devlink is in the
sync_state_only state in this case.  I didn't dig into that, because
I figured the shutdown code shouldn't hang regardless.
Re: [PATCH] driver core: fix async device shutdown hang
Posted by Greg Kroah-Hartman 2 months, 1 week ago
On Tue, Sep 17, 2024 at 07:20:41PM -0500, stuart hayes wrote:
> 
> 
> On 9/17/2024 3:42 PM, Greg Kroah-Hartman wrote:
> > On Tue, Sep 17, 2024 at 03:15:17PM -0500, Stuart Hayes wrote:
> > > Modify device_shutdown() so that supplier devices do not wait for
> > > consumer devices to be shut down first when the devlink is sync state
> > > only, since the consumer is not dependent on the supplier in this case.
> > > 
> > > Without this change, a circular dependency could hang the system.
> > > 
> > > Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> > 
> > What commit id does this fix?  Should it go to stable?
> > 
> > And what driver is causing this problem, is this a regression or for
> > something new that just got added to the tree?
> > 
> > thanks,
> > 
> > greg k-h
> 
> This fixes commit 8064952c65045f05ee2671fe437770e50c151776, in
> driver-core-next & linux-next... it's problem with code that was just
> added to the tree (in drivers/base/core.c).  It is not in stable.

Ah, that wasn't obvious, sorry.

> Apologies, I should have mentioned that from the start.

Can you resend this with a "Fixes:" tag in it so I can just take it that
way and not have to edit it by hand?

thanks,

greg k-h
Re: [PATCH] driver core: fix async device shutdown hang
Posted by Laurence Oberman 2 months, 1 week ago
On Wed, 2024-09-18 at 08:12 +0200, Greg Kroah-Hartman wrote:
> On Tue, Sep 17, 2024 at 07:20:41PM -0500, stuart hayes wrote:
> > 
> > 
> > On 9/17/2024 3:42 PM, Greg Kroah-Hartman wrote:
> > > On Tue, Sep 17, 2024 at 03:15:17PM -0500, Stuart Hayes wrote:
> > > > Modify device_shutdown() so that supplier devices do not wait
> > > > for
> > > > consumer devices to be shut down first when the devlink is sync
> > > > state
> > > > only, since the consumer is not dependent on the supplier in
> > > > this case.
> > > > 
> > > > Without this change, a circular dependency could hang the
> > > > system.
> > > > 
> > > > Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
> > > 
> > > What commit id does this fix?  Should it go to stable?
> > > 
> > > And what driver is causing this problem, is this a regression or
> > > for
> > > something new that just got added to the tree?
> > > 
> > > thanks,
> > > 
> > > greg k-h
> > 
> > This fixes commit 8064952c65045f05ee2671fe437770e50c151776, in
> > driver-core-next & linux-next... it's problem with code that was
> > just
> > added to the tree (in drivers/base/core.c).  It is not in stable.
> 
> Ah, that wasn't obvious, sorry.
> 
> > Apologies, I should have mentioned that from the start.
> 
> Can you resend this with a "Fixes:" tag in it so I can just take it
> that
> way and not have to edit it by hand?
> 
> thanks,
> 
> greg k-h
> 

FYI 
This patch plus the rest of the original Bundle has been tested 
at Red Hat on a system with 24 nvme devices. 
Improvement was almost 8 times faster to shutdown.

Tested-by: Laurence Oberman <loberman@redhat.com>