net: usb: r8152: resume-reset deadlock

Sergey Senozhatsky posted 1 patch 5 months, 2 weeks ago
net: usb: r8152: resume-reset deadlock
Posted by Sergey Senozhatsky 5 months, 2 weeks ago
I'm looking into the following deadlock

<4>[ 1596.492101] schedule_preempt_disabled+0x15/0x30
<4>[ 1596.492170] __mutex_lock_common+0x256/0x490
<4>[ 1596.492209] __mutex_lock_slowpath+0x18/0x30
<4>[ 1596.492249] __rtl8152_set_mac_address+0x80/0x1f0 [r8152 (HASH:ce6f 4)]
<4>[ 1596.492327] dev_set_mac_address+0x7d/0x150
<4>[ 1596.492395] rtl8152_post_reset+0x72/0x150 [r8152 (HASH:ce6f 4)]
<4>[ 1596.492438] usb_reset_device+0x1ce/0x220
<4>[ 1596.492507] rtl8152_resume+0x99/0xc0 [r8152 (HASH:ce6f 4)]
<4>[ 1596.492550] usb_resume_interface+0x3c/0xc0
<4>[ 1596.492619] usb_resume_both+0x104/0x150
<4>[ 1596.492657] ? usb_dev_suspend+0x20/0x20
<4>[ 1596.492725] usb_resume+0x22/0x110
<4>[ 1596.492763] ? usb_dev_suspend+0x20/0x20
<4>[ 1596.492800] dpm_run_callback+0x83/0x1d0
<4>[ 1596.492873] device_resume+0x35f/0x3d0
<4>[ 1596.492912] ? pm_verb+0xa0/0xa0
<4>[ 1596.492951] async_resume+0x1d/0x30
<4>[ 1596.493019] async_run_entry_fn+0x2b/0xd0
<4>[ 1596.493060] worker_thread+0x2ce/0xef0
<4>[ 1596.493101] ? cancel_delayed_work+0x2d0/0x2d0
<4>[ 1596.493170] kthread+0x16d/0x190
<4>[ 1596.493209] ? cancel_delayed_work+0x2d0/0x2d0
<4>[ 1596.493247] ? kthread_associate_blkcg+0x80/0x80
<4>[ 1596.493316] ret_from_fork+0x1f/0x30

rtl8152_resume() seems to be tricky, because it's under tp->control
mutex, when it can see RTL8152_INACCESSIBLE and initiate a full
device reset via usb_reset_device(), which eventually re-enters rtl8152,
at which point it calls __rtl8152_set_mac_address() and deadlocks on
tp->control (I assume) mutex.

__rtl8152_set_mac_address() has in_resume flag (added by Takashi in
776ac63a986d), which is set only in "reset_resume" case, wheres what
we have is "resume_reset".  Moreover in_resume flag is not for tp->control
mutex, as far as I can tell, but for PM lock.  When we set_mac_address
from resume_reset, we lose in_resume flat, so not only we deadlock on
tp->control mutex, but also we may (I guess) deadlock on the PM lock.

Also, we still call rtl8152_resume() even in reset_resume, which I
assume still can end up resetting device and hence in set_mac_address()
in non-in_resume mode, potentially triggering the same deadlock that
Takashi fixed.  Well, unless I'm missing something.

So I don't think I want to add another flag to mark "current owns tp->control
mutex" so that we can handle re-entry.  How about moving usb reset
outside of tp->control scope?  Is there any harm in doing that?

---

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 44cba7acfe7d..7b4324c99869 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -8535,19 +8535,6 @@ static int rtl8152_system_resume(struct r8152 *tp)
 		usb_submit_urb(tp->intr_urb, GFP_NOIO);
 	}
 
-	/* If the device is RTL8152_INACCESSIBLE here then we should do a
-	 * reset. This is important because the usb_lock_device_for_reset()
-	 * that happens as a result of usb_queue_reset_device() will silently
-	 * fail if the device was suspended or if too much time passed.
-	 *
-	 * NOTE: The device is locked here so we can directly do the reset.
-	 * We don't need usb_lock_device_for_reset() because that's just a
-	 * wrapper over device_lock() and device_resume() (which calls us)
-	 * does that for us.
-	 */
-	if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
-		usb_reset_device(tp->udev);
-
 	return 0;
 }
 
@@ -8671,6 +8658,19 @@ static int rtl8152_resume(struct usb_interface *intf)
 
 	mutex_unlock(&tp->control);
 
+	/* If the device is RTL8152_INACCESSIBLE here then we should do a
+	 * reset. This is important because the usb_lock_device_for_reset()
+	 * that happens as a result of usb_queue_reset_device() will silently
+	 * fail if the device was suspended or if too much time passed.
+	 *
+	 * NOTE: The device is locked here so we can directly do the reset.
+	 * We don't need usb_lock_device_for_reset() because that's just a
+	 * wrapper over device_lock() and device_resume() (which calls us)
+	 * does that for us.
+	 */
+	if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
+		usb_reset_device(tp->udev);
+
 	return ret;
 }
Re: net: usb: r8152: resume-reset deadlock
Posted by Paolo Abeni 5 months, 1 week ago
On 8/26/25 11:55 AM, Sergey Senozhatsky wrote:
> I'm looking into the following deadlock
> 
> <4>[ 1596.492101] schedule_preempt_disabled+0x15/0x30
> <4>[ 1596.492170] __mutex_lock_common+0x256/0x490
> <4>[ 1596.492209] __mutex_lock_slowpath+0x18/0x30
> <4>[ 1596.492249] __rtl8152_set_mac_address+0x80/0x1f0 [r8152 (HASH:ce6f 4)]
> <4>[ 1596.492327] dev_set_mac_address+0x7d/0x150
> <4>[ 1596.492395] rtl8152_post_reset+0x72/0x150 [r8152 (HASH:ce6f 4)]
> <4>[ 1596.492438] usb_reset_device+0x1ce/0x220
> <4>[ 1596.492507] rtl8152_resume+0x99/0xc0 [r8152 (HASH:ce6f 4)]
> <4>[ 1596.492550] usb_resume_interface+0x3c/0xc0
> <4>[ 1596.492619] usb_resume_both+0x104/0x150
> <4>[ 1596.492657] ? usb_dev_suspend+0x20/0x20
> <4>[ 1596.492725] usb_resume+0x22/0x110
> <4>[ 1596.492763] ? usb_dev_suspend+0x20/0x20
> <4>[ 1596.492800] dpm_run_callback+0x83/0x1d0
> <4>[ 1596.492873] device_resume+0x35f/0x3d0
> <4>[ 1596.492912] ? pm_verb+0xa0/0xa0
> <4>[ 1596.492951] async_resume+0x1d/0x30
> <4>[ 1596.493019] async_run_entry_fn+0x2b/0xd0
> <4>[ 1596.493060] worker_thread+0x2ce/0xef0
> <4>[ 1596.493101] ? cancel_delayed_work+0x2d0/0x2d0
> <4>[ 1596.493170] kthread+0x16d/0x190
> <4>[ 1596.493209] ? cancel_delayed_work+0x2d0/0x2d0
> <4>[ 1596.493247] ? kthread_associate_blkcg+0x80/0x80
> <4>[ 1596.493316] ret_from_fork+0x1f/0x30
> 
> rtl8152_resume() seems to be tricky, because it's under tp->control
> mutex, when it can see RTL8152_INACCESSIBLE and initiate a full
> device reset via usb_reset_device(), which eventually re-enters rtl8152,
> at which point it calls __rtl8152_set_mac_address() and deadlocks on
> tp->control (I assume) mutex.

Decoding the above stack trace will tell for sure.

> __rtl8152_set_mac_address() has in_resume flag (added by Takashi in
> 776ac63a986d), which is set only in "reset_resume" case, wheres what
> we have is "resume_reset".  Moreover in_resume flag is not for tp->control
> mutex, as far as I can tell, but for PM lock.  When we set_mac_address
> from resume_reset, we lose in_resume flat, so not only we deadlock on
> tp->control mutex, but also we may (I guess) deadlock on the PM lock.
> 
> Also, we still call rtl8152_resume() even in reset_resume, which I
> assume still can end up resetting device and hence in set_mac_address()
> in non-in_resume mode, potentially triggering the same deadlock that
> Takashi fixed.  Well, unless I'm missing something.
> 
> So I don't think I want to add another flag to mark "current owns tp->control
> mutex" so that we can handle re-entry.  How about moving usb reset
> outside of tp->control scope?  Is there any harm in doing that?

According to commit 4933b066fefbee4f1d2d708de53c4ab7f09026ad, the
usb_reset_device() call is intentionally under tp->control protection to
vs double reset.

At very least the proposed code here could end-up causing an unexpected
reset when SELECTIVE_SUSPEND is set.

I *guess* you should track the current status explicitly to restrict the
reset at in the !test_bit(SELECTIVE_SUSPEND) scenario and explicitly
avoid delayed reset during resume.

Ad very least you should add a fixes tag, a proper Sob and use canonical
commit references.

Thanks,

Paolo
Re: net: usb: r8152: resume-reset deadlock
Posted by Sergey Senozhatsky 1 week, 6 days ago
On (25/09/02 10:36), Paolo Abeni wrote:
> On 8/26/25 11:55 AM, Sergey Senozhatsky wrote:
> > I'm looking into the following deadlock
> > 
> > <4>[ 1596.492101] schedule_preempt_disabled+0x15/0x30
> > <4>[ 1596.492170] __mutex_lock_common+0x256/0x490
> > <4>[ 1596.492209] __mutex_lock_slowpath+0x18/0x30
> > <4>[ 1596.492249] __rtl8152_set_mac_address+0x80/0x1f0 [r8152 (HASH:ce6f 4)]
> > <4>[ 1596.492327] dev_set_mac_address+0x7d/0x150
> > <4>[ 1596.492395] rtl8152_post_reset+0x72/0x150 [r8152 (HASH:ce6f 4)]
> > <4>[ 1596.492438] usb_reset_device+0x1ce/0x220
> > <4>[ 1596.492507] rtl8152_resume+0x99/0xc0 [r8152 (HASH:ce6f 4)]
> > <4>[ 1596.492550] usb_resume_interface+0x3c/0xc0
> > <4>[ 1596.492619] usb_resume_both+0x104/0x150
> > <4>[ 1596.492657] ? usb_dev_suspend+0x20/0x20
> > <4>[ 1596.492725] usb_resume+0x22/0x110
> > <4>[ 1596.492763] ? usb_dev_suspend+0x20/0x20
> > <4>[ 1596.492800] dpm_run_callback+0x83/0x1d0
> > <4>[ 1596.492873] device_resume+0x35f/0x3d0
> > <4>[ 1596.492912] ? pm_verb+0xa0/0xa0
> > <4>[ 1596.492951] async_resume+0x1d/0x30
> > <4>[ 1596.493019] async_run_entry_fn+0x2b/0xd0
> > <4>[ 1596.493060] worker_thread+0x2ce/0xef0
> > <4>[ 1596.493101] ? cancel_delayed_work+0x2d0/0x2d0
> > <4>[ 1596.493170] kthread+0x16d/0x190
> > <4>[ 1596.493209] ? cancel_delayed_work+0x2d0/0x2d0
> > <4>[ 1596.493247] ? kthread_associate_blkcg+0x80/0x80
> > <4>[ 1596.493316] ret_from_fork+0x1f/0x30
> > 
> > rtl8152_resume() seems to be tricky, because it's under tp->control
> > mutex, when it can see RTL8152_INACCESSIBLE and initiate a full
> > device reset via usb_reset_device(), which eventually re-enters rtl8152,
> > at which point it calls __rtl8152_set_mac_address() and deadlocks on
> > tp->control (I assume) mutex.
> 
> Decoding the above stack trace will tell for sure.

Apologies for the delay, somehow this slipped through.

> > __rtl8152_set_mac_address() has in_resume flag (added by Takashi in
> > 776ac63a986d), which is set only in "reset_resume" case, wheres what
> > we have is "resume_reset".  Moreover in_resume flag is not for tp->control
> > mutex, as far as I can tell, but for PM lock.  When we set_mac_address
> > from resume_reset, we lose in_resume flat, so not only we deadlock on
> > tp->control mutex, but also we may (I guess) deadlock on the PM lock.
> > 
> > Also, we still call rtl8152_resume() even in reset_resume, which I
> > assume still can end up resetting device and hence in set_mac_address()
> > in non-in_resume mode, potentially triggering the same deadlock that
> > Takashi fixed.  Well, unless I'm missing something.
> > 
> > So I don't think I want to add another flag to mark "current owns tp->control
> > mutex" so that we can handle re-entry.  How about moving usb reset
> > outside of tp->control scope?  Is there any harm in doing that?
> 
> According to commit 4933b066fefbee4f1d2d708de53c4ab7f09026ad, the
> usb_reset_device() call is intentionally under tp->control protection to
> vs double reset.
> 
> At very least the proposed code here could end-up causing an unexpected
> reset when SELECTIVE_SUSPEND is set.
> 
> I *guess* you should track the current status explicitly to restrict the
> reset at in the !test_bit(SELECTIVE_SUSPEND) scenario and explicitly
> avoid delayed reset during resume.

Something like this?

---

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 30f937527cd2..0a570b8ac7dd 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -8542,14 +8542,9 @@ static int rtl8152_system_resume(struct r8152 *tp)
 	 * reset. This is important because the usb_lock_device_for_reset()
 	 * that happens as a result of usb_queue_reset_device() will silently
 	 * fail if the device was suspended or if too much time passed.
-	 *
-	 * NOTE: The device is locked here so we can directly do the reset.
-	 * We don't need usb_lock_device_for_reset() because that's just a
-	 * wrapper over device_lock() and device_resume() (which calls us)
-	 * does that for us.
 	 */
 	if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
-		usb_reset_device(tp->udev);
+		return -ENODEV;
 
 	return 0;
 }
@@ -8674,6 +8669,14 @@ static int rtl8152_resume(struct usb_interface *intf)
 
 	mutex_unlock(&tp->control);
 
+	/*
+	 * This only happens for !SELECTIVE_SUSPEND and RTL8152_INACCESSIBLE,
+	 * drop tp->control, because reset path can re-acquire it (e.g. for
+	 * MAC address/policy update).
+	 */
+	if (ret == -ENODEV)
+		ret = usb_reset_device(tp->udev);
+
 	return ret;
 }

---

> Ad very least you should add a fixes tag, a proper Sob and use canonical
> commit references.

Well, sure, when we have a fix that we agree on I'll submit it properly.
Re: net: usb: r8152: resume-reset deadlock
Posted by Sergey Senozhatsky 1 week, 6 days ago
On (26/01/28 14:58), Sergey Senozhatsky wrote:
> Something like this?
> 
> ---
> 
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index 30f937527cd2..0a570b8ac7dd 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -8542,14 +8542,9 @@ static int rtl8152_system_resume(struct r8152 *tp)
>  	 * reset. This is important because the usb_lock_device_for_reset()
>  	 * that happens as a result of usb_queue_reset_device() will silently
>  	 * fail if the device was suspended or if too much time passed.
> -	 *
> -	 * NOTE: The device is locked here so we can directly do the reset.
> -	 * We don't need usb_lock_device_for_reset() because that's just a
> -	 * wrapper over device_lock() and device_resume() (which calls us)
> -	 * does that for us.
>  	 */
>  	if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> -		usb_reset_device(tp->udev);
> +		return -ENODEV;
>  
>  	return 0;
>  }
> @@ -8674,6 +8669,14 @@ static int rtl8152_resume(struct usb_interface *intf)
>  
>  	mutex_unlock(&tp->control);
>  
> +	/*
> +	 * This only happens for !SELECTIVE_SUSPEND and RTL8152_INACCESSIBLE,
> +	 * drop tp->control, because reset path can re-acquire it (e.g. for
> +	 * MAC address/policy update).
> +	 */
> +	if (ret == -ENODEV)
> +		ret = usb_reset_device(tp->udev);
> +
>  	return ret;
>  }

Sorry, sent it too early.  Something like instead perhaps:

---

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 30f937527cd2..c9406b89bb12 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -8538,20 +8538,18 @@ static int rtl8152_system_resume(struct r8152 *tp)
 		usb_submit_urb(tp->intr_urb, GFP_NOIO);
 	}
 
+	if (!test_bit(RTL8152_INACCESSIBLE, &tp->flags))
+		return 0;
+
 	/* If the device is RTL8152_INACCESSIBLE here then we should do a
 	 * reset. This is important because the usb_lock_device_for_reset()
 	 * that happens as a result of usb_queue_reset_device() will silently
 	 * fail if the device was suspended or if too much time passed.
-	 *
-	 * NOTE: The device is locked here so we can directly do the reset.
-	 * We don't need usb_lock_device_for_reset() because that's just a
-	 * wrapper over device_lock() and device_resume() (which calls us)
-	 * does that for us.
 	 */
-	if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
-		usb_reset_device(tp->udev);
+	if (test_bit(SELECTIVE_SUSPEND, &tp->flags))
+		return usb_reset_device(tp->udev);
 
-	return 0;
+	return -ENODEV;
 }
 
 static int rtl8152_runtime_suspend(struct r8152 *tp)
@@ -8674,6 +8672,14 @@ static int rtl8152_resume(struct usb_interface *intf)
 
 	mutex_unlock(&tp->control);
 
+	/*
+	 * This only happens for !SELECTIVE_SUSPEND and RTL8152_INACCESSIBLE,
+	 * drop tp->control, becuase reset can re-acquire it (e.g. from
+	 * max address reset)
+	 */
+	if (ret == -ENODEV)
+		ret = usb_reset_device(tp->udev);
+
 	return ret;
 }
Re: net: usb: r8152: resume-reset deadlock
Posted by Sergey Senozhatsky 1 week, 6 days ago
On (26/01/28 15:12), Sergey Senozhatsky wrote:
> diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
> index 30f937527cd2..c9406b89bb12 100644
> --- a/drivers/net/usb/r8152.c
> +++ b/drivers/net/usb/r8152.c
> @@ -8538,20 +8538,18 @@ static int rtl8152_system_resume(struct r8152 *tp)
>  		usb_submit_urb(tp->intr_urb, GFP_NOIO);
>  	}
>  
> +	if (!test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> +		return 0;
> +
>  	/* If the device is RTL8152_INACCESSIBLE here then we should do a
>  	 * reset. This is important because the usb_lock_device_for_reset()
>  	 * that happens as a result of usb_queue_reset_device() will silently
>  	 * fail if the device was suspended or if too much time passed.
> -	 *
> -	 * NOTE: The device is locked here so we can directly do the reset.
> -	 * We don't need usb_lock_device_for_reset() because that's just a
> -	 * wrapper over device_lock() and device_resume() (which calls us)
> -	 * does that for us.
>  	 */
> -	if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
> -		usb_reset_device(tp->udev);
> +	if (test_bit(SELECTIVE_SUSPEND, &tp->flags))
> +		return usb_reset_device(tp->udev);
>  
> -	return 0;
> +	return -ENODEV;
>  }
>  
>  static int rtl8152_runtime_suspend(struct r8152 *tp)
> @@ -8674,6 +8672,14 @@ static int rtl8152_resume(struct usb_interface *intf)
>  
>  	mutex_unlock(&tp->control);
>  
> +	/*
> +	 * This only happens for !SELECTIVE_SUSPEND and RTL8152_INACCESSIBLE,
> +	 * drop tp->control, becuase reset can re-acquire it (e.g. from
> +	 * max address reset)
> +	 */
> +	if (ret == -ENODEV)
> +		ret = usb_reset_device(tp->udev);
> +
>  	return ret;
>  }

That's too messy.  Something like this should be better, I guess:

---

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 30f937527cd2..2cbc07b7aab5 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -8538,19 +8538,6 @@ static int rtl8152_system_resume(struct r8152 *tp)
 		usb_submit_urb(tp->intr_urb, GFP_NOIO);
 	}
 
-	/* If the device is RTL8152_INACCESSIBLE here then we should do a
-	 * reset. This is important because the usb_lock_device_for_reset()
-	 * that happens as a result of usb_queue_reset_device() will silently
-	 * fail if the device was suspended or if too much time passed.
-	 *
-	 * NOTE: The device is locked here so we can directly do the reset.
-	 * We don't need usb_lock_device_for_reset() because that's just a
-	 * wrapper over device_lock() and device_resume() (which calls us)
-	 * does that for us.
-	 */
-	if (test_bit(RTL8152_INACCESSIBLE, &tp->flags))
-		usb_reset_device(tp->udev);
-
 	return 0;
 }
 
@@ -8661,6 +8648,7 @@ static int rtl8152_suspend(struct usb_interface *intf, pm_message_t message)
 static int rtl8152_resume(struct usb_interface *intf)
 {
 	struct r8152 *tp = usb_get_intfdata(intf);
+	bool system_resume = !test_bit(SELECTIVE_SUSPEND, &tp->flags);
 	int ret;
 
 	mutex_lock(&tp->control);
@@ -8674,6 +8662,9 @@ static int rtl8152_resume(struct usb_interface *intf)
 
 	mutex_unlock(&tp->control);
 
+	if (system_resume && test_bit(RTL8152_INACCESSIBLE, &tp->flags))
+		ret = usb_reset_device(tp->udev);
+
 	return ret;
 }