[v1] net: wwan: t7xx: fix race between TX thread and system PM suspend

[PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend

Posted by Tim JH Chen 6 days, 21 hours ago

When system suspend is triggered while the DPMAIF TX kthread
(t7xx_dpmaif_tx_hw_push_thread) is running, a deadlock can occur
leading to a CPU soft lockup.

The root cause is two-fold:

1. t7xx_dpmaif_suspend() calls t7xx_dpmaif_tx_stop() which only stops
   the TX work-queue items (by clearing txq->que_started and waiting on
   txq->tx_processing). It does NOT signal the kthread and does NOT
   update dpmaif_ctrl->state, which stays DPMAIF_STATE_PWRON.

2. The kthread's state guard (line: "if ... state != DPMAIF_STATE_PWRON")
   is only checked at the top of each loop iteration. If the thread
   already passed this guard, it proceeds unconditionally to call
   pm_runtime_resume_and_get() — which tries to acquire the PM spinlock
   also held (or contended) by the system PM suspend path.

The result is a spinlock deadlock observed as:

  watchdog: BUG: soft lockup - CPU#N stuck for 26s! [dpmaif_tx_hw_pu]
  RIP: _raw_spin_unlock_irqrestore
  Call Trace:
    __pm_runtime_resume+0x5b/0x80
    t7xx_dpmaif_tx_hw_push_thread+0xc4 [mtk_t7xx]

The condition requires ASPM L1 enabled on the endpoint (which extends
the time pm_runtime_resume_and_get() holds the PM lock during L1.2
link retraining) and hundreds of repeated suspend/resume cycles to
trigger reliably.

Fix by three coordinated changes:

- In t7xx_dpmaif_suspend(): immediately set state to DPMAIF_STATE_PWROFF
  after stopping the TX queue, then call wake_up() so any sleeping thread
  re-evaluates the wait_event condition and stops.

- In t7xx_dpmaif_resume(): restore state to DPMAIF_STATE_PWRON before
  re-enabling the TX queues, symmetric with the suspend change.
  Without this the kthread would never wake up after resume.

- In t7xx_dpmaif_tx_hw_push_thread(): add a second state check
  immediately before pm_runtime_resume_and_get() to close the TOCTOU
  window between the wait_event guard and the pm call.

Tested: no soft lockup observed over 500+ suspend/resume cycles with
SIM registered and ASPM L1 enabled (previously triggered in < 300).

Signed-off-by: Tim JH Chen <tim.jh.chen@wnc.com.tw>
---
 drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c    | 3 +++
 drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
index 7ff33c1d6..315a77e24 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
@@ -412,6 +412,8 @@ static int t7xx_dpmaif_suspend(struct t7xx_pci_dev *t7xx_dev, void *param)
 	struct dpmaif_ctrl *dpmaif_ctrl = param;
 
 	t7xx_dpmaif_tx_stop(dpmaif_ctrl);
+	dpmaif_ctrl->state = DPMAIF_STATE_PWROFF;
+	wake_up(&dpmaif_ctrl->tx_wq);
 	t7xx_dpmaif_hw_stop_all_txq(&dpmaif_ctrl->hw_info);
 	t7xx_dpmaif_hw_stop_all_rxq(&dpmaif_ctrl->hw_info);
 	t7xx_dpmaif_disable_irq(dpmaif_ctrl);
@@ -451,6 +453,7 @@ static int t7xx_dpmaif_resume(struct t7xx_pci_dev *t7xx_dev, void *param)
 	if (!dpmaif_ctrl)
 		return 0;
 
+	dpmaif_ctrl->state = DPMAIF_STATE_PWRON;
 	t7xx_dpmaif_start_txrx_qs(dpmaif_ctrl);
 	t7xx_dpmaif_enable_irq(dpmaif_ctrl);
 	t7xx_dpmaif_unmask_dlq_intr(dpmaif_ctrl);
diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
index 236d632cf..d5a5befec 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
@@ -460,6 +460,9 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
 				break;
 		}
 
+		if (dpmaif_ctrl->state != DPMAIF_STATE_PWRON)
+			continue;
+
 		ret = pm_runtime_resume_and_get(dpmaif_ctrl->dev);
 		if (ret < 0 && ret != -EACCES)
 			return ret;
-- 
2.25.1

Re: [PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend

Posted by Paolo Abeni 3 days, 18 hours ago

On 5/18/26 9:50 AM, Tim JH Chen wrote:
> When system suspend is triggered while the DPMAIF TX kthread
> (t7xx_dpmaif_tx_hw_push_thread) is running, a deadlock can occur
> leading to a CPU soft lockup.
> 
> The root cause is two-fold:
> 
> 1. t7xx_dpmaif_suspend() calls t7xx_dpmaif_tx_stop() which only stops
>    the TX work-queue items (by clearing txq->que_started and waiting on
>    txq->tx_processing). It does NOT signal the kthread and does NOT
>    update dpmaif_ctrl->state, which stays DPMAIF_STATE_PWRON.
> 
> 2. The kthread's state guard (line: "if ... state != DPMAIF_STATE_PWRON")
>    is only checked at the top of each loop iteration. If the thread
>    already passed this guard, it proceeds unconditionally to call
>    pm_runtime_resume_and_get() — which tries to acquire the PM spinlock
>    also held (or contended) by the system PM suspend path.
> 
> The result is a spinlock deadlock observed as:
> 
>   watchdog: BUG: soft lockup - CPU#N stuck for 26s! [dpmaif_tx_hw_pu]
>   RIP: _raw_spin_unlock_irqrestore
>   Call Trace:
>     __pm_runtime_resume+0x5b/0x80
>     t7xx_dpmaif_tx_hw_push_thread+0xc4 [mtk_t7xx]
> 
> The condition requires ASPM L1 enabled on the endpoint (which extends
> the time pm_runtime_resume_and_get() holds the PM lock during L1.2
> link retraining) and hundreds of repeated suspend/resume cycles to
> trigger reliably.
> 
> Fix by three coordinated changes:
> 
> - In t7xx_dpmaif_suspend(): immediately set state to DPMAIF_STATE_PWROFF
>   after stopping the TX queue, then call wake_up() so any sleeping thread
>   re-evaluates the wait_event condition and stops.
> 
> - In t7xx_dpmaif_resume(): restore state to DPMAIF_STATE_PWRON before
>   re-enabling the TX queues, symmetric with the suspend change.
>   Without this the kthread would never wake up after resume.
> 
> - In t7xx_dpmaif_tx_hw_push_thread(): add a second state check
>   immediately before pm_runtime_resume_and_get() to close the TOCTOU
>   window between the wait_event guard and the pm call.
> 
> Tested: no soft lockup observed over 500+ suspend/resume cycles with
> SIM registered and ASPM L1 enabled (previously triggered in < 300).
> 
> Signed-off-by: Tim JH Chen <tim.jh.chen@wnc.com.tw>

This is a fix, it should target the 'net' tree including such tag into
the subj prefix and should carry a 'Fixes:' tag.

Also this is v2 of:

https://lore.kernel.org/netdev/TYZPR02MB5232A8C6A2BA56226D97CF4A90062@TYZPR02MB5232.apcprd02.prod.outlook.com/

the subj prefix should have included the relevant revision number and
you should have described what changed in the commit message after a
'---' separator.

Please have a deep read at the process documentation and specifically at:

Documentation/process/maintainer-netdev.rst

before posting the next revision.

/P

[PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend

Posted by Tim JH Chen an hour ago

v2: Address two concerns raised in AI-assisted code review of v1:

1. [High] t7xx_dpmaif_resume() was unconditionally restoring state to
   DPMAIF_STATE_PWRON regardless of the state before suspend.  If the
   modem had already been moved to DPMAIF_STATE_PWROFF by
   t7xx_dpmaif_md_state_callback() (MD_STATE_EXCEPTION or
   MD_STATE_STOPPED) prior to system suspend, resume would incorrectly
   re-arm the TX kthread guard, allowing TX HW writes against a modem
   the MD state machine considers stopped or in exception.

   Fix: save dpmaif_ctrl->state into pre_suspend_state at the start of
   t7xx_dpmaif_suspend() and restore that saved value in
   t7xx_dpmaif_resume(), so a pre-suspend PWROFF is preserved across
   the suspend/resume cycle.

2. [Medium] The v1 second state check before pm_runtime_resume_and_get()
   only narrowed the TOCTOU window -- it did not close it.  The state
   field was a plain enum read and written without any lock or
   READ_ONCE/WRITE_ONCE annotation.  After the check passed on one CPU,
   the suspend path on another CPU could still set state=PWROFF and
   begin PM teardown before the kthread reached pm_runtime_resume_and_get(),
   reproducing the deadlock.

   Fix: introduce tx_pm_lock (struct mutex) held by the kthread across
   the [state check -> pm_runtime_resume_and_get -> pm_runtime_put]
   sequence.  t7xx_dpmaif_suspend() acquires this lock before setting
   DPMAIF_STATE_PWROFF, which serialises with any in-progress kthread
   PM section and guarantees the kthread cannot enter
   pm_runtime_resume_and_get() after the state flag is set.
   READ_ONCE/WRITE_ONCE are added at every access point of the state
   flag that crosses the suspend/resume boundary to prevent
   compiler-visible tearing.

The original v1 description of the root cause and tested fix still
applies (deadlock between t7xx_dpmaif_tx_hw_push_thread calling
pm_runtime_resume_and_get() and the system PM suspend path, triggered
with ASPM L1 enabled after repeated suspend/resume cycles).

Tested: no soft lockup over 500+ suspend/resume cycles with SIM
registered and ASPM L1 enabled (previously triggered in < 300).

Fixes: 05f7e89ab ("Linux 6.19")
Signed-off-by: Tim JH Chen <tim.jh.chen@wnc.com.tw>
---
 drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c    |  6 ++++++
 drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h    |  3 +++
 drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c | 16 +++++++++++++---
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
index 7ff33c1d6ac7..e1f3eeb2c947 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
@@ -412,6 +412,11 @@ static int t7xx_dpmaif_suspend(struct t7xx_pci_dev *t7xx_dev, void *param)
 	struct dpmaif_ctrl *dpmaif_ctrl = param;
 
 	t7xx_dpmaif_tx_stop(dpmaif_ctrl);
+	dpmaif_ctrl->pre_suspend_state = READ_ONCE(dpmaif_ctrl->state);
+	mutex_lock(&dpmaif_ctrl->tx_pm_lock);
+	WRITE_ONCE(dpmaif_ctrl->state, DPMAIF_STATE_PWROFF);
+	mutex_unlock(&dpmaif_ctrl->tx_pm_lock);
+	wake_up(&dpmaif_ctrl->tx_wq);
 	t7xx_dpmaif_hw_stop_all_txq(&dpmaif_ctrl->hw_info);
 	t7xx_dpmaif_hw_stop_all_rxq(&dpmaif_ctrl->hw_info);
 	t7xx_dpmaif_disable_irq(dpmaif_ctrl);
@@ -451,6 +456,7 @@ static int t7xx_dpmaif_resume(struct t7xx_pci_dev *t7xx_dev, void *param)
 	if (!dpmaif_ctrl)
 		return 0;
 
+	WRITE_ONCE(dpmaif_ctrl->state, dpmaif_ctrl->pre_suspend_state);
 	t7xx_dpmaif_start_txrx_qs(dpmaif_ctrl);
 	t7xx_dpmaif_enable_irq(dpmaif_ctrl);
 	t7xx_dpmaif_unmask_dlq_intr(dpmaif_ctrl);
diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h
index 0ce4505e813d..670ed2cca761 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h
@@ -20,6 +20,7 @@
 
 #include <linux/bitmap.h>
 #include <linux/mm_types.h>
+#include <linux/mutex.h>
 #include <linux/netdevice.h>
 #include <linux/sched.h>
 #include <linux/skbuff.h>
@@ -172,6 +173,8 @@ struct dpmaif_ctrl {
 	struct t7xx_pci_dev		*t7xx_dev;
 	struct md_pm_entity		dpmaif_pm_entity;
 	enum dpmaif_state		state;
+	enum dpmaif_state		pre_suspend_state;
+	struct mutex			tx_pm_lock;
 	bool				dpmaif_sw_init_done;
 	struct dpmaif_hw_info		hw_info;
 	struct dpmaif_tx_queue		txq[DPMAIF_TXQ_NUM];
diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
index 236d632cf591..197c0ab3fd39 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
@@ -449,10 +449,10 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
 
 	while (!kthread_should_stop()) {
 		if (t7xx_tx_lists_are_all_empty(dpmaif_ctrl) ||
-		    dpmaif_ctrl->state != DPMAIF_STATE_PWRON) {
+		    READ_ONCE(dpmaif_ctrl->state) != DPMAIF_STATE_PWRON) {
 			if (wait_event_interruptible(dpmaif_ctrl->tx_wq,
 						     (!t7xx_tx_lists_are_all_empty(dpmaif_ctrl) &&
-						     dpmaif_ctrl->state == DPMAIF_STATE_PWRON) ||
+						     READ_ONCE(dpmaif_ctrl->state) == DPMAIF_STATE_PWRON) ||
 						     kthread_should_stop()))
 				continue;
 
@@ -460,14 +460,23 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
 				break;
 		}
 
+		mutex_lock(&dpmaif_ctrl->tx_pm_lock);
+		if (READ_ONCE(dpmaif_ctrl->state) != DPMAIF_STATE_PWRON) {
+			mutex_unlock(&dpmaif_ctrl->tx_pm_lock);
+			continue;
+		}
+
 		ret = pm_runtime_resume_and_get(dpmaif_ctrl->dev);
-		if (ret < 0 && ret != -EACCES)
+		if (ret < 0 && ret != -EACCES) {
+			mutex_unlock(&dpmaif_ctrl->tx_pm_lock);
 			return ret;
+		}
 
 		t7xx_pci_disable_sleep(dpmaif_ctrl->t7xx_dev);
 		t7xx_do_tx_hw_push(dpmaif_ctrl);
 		t7xx_pci_enable_sleep(dpmaif_ctrl->t7xx_dev);
 		pm_runtime_put_autosuspend(dpmaif_ctrl->dev);
+		mutex_unlock(&dpmaif_ctrl->tx_pm_lock);
 	}
 
 	return 0;
@@ -475,6 +484,7 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
 
 int t7xx_dpmaif_tx_thread_init(struct dpmaif_ctrl *dpmaif_ctrl)
 {
+	mutex_init(&dpmaif_ctrl->tx_pm_lock);
 	init_waitqueue_head(&dpmaif_ctrl->tx_wq);
 	dpmaif_ctrl->tx_thread = kthread_run(t7xx_dpmaif_tx_hw_push_thread,
 					     dpmaif_ctrl, "dpmaif_tx_hw_push");
-- 
2.43.0

Re: [PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend

Posted by Paolo Abeni 3 days, 18 hours ago

From: AI Reviewer <ai@example.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://netdev-ai.bots.linux.dev/sashiko/
---
net: wwan: t7xx: fix race between TX thread and system PM suspend

This patch addresses a soft lockup observed when system suspend races with
the DPMAIF TX kthread. It updates dpmaif_ctrl->state in the suspend and
resume callbacks and adds a second state check in
t7xx_dpmaif_tx_hw_push_thread() before pm_runtime_resume_and_get().

> diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
> index 7ff33c1d6ac7..315a77e2465a 100644
> --- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
> +++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
> @@ -451,6 +453,7 @@ static int t7xx_dpmaif_resume(struct t7xx_pci_dev *t7xx_dev, void *param)
>  	if (!dpmaif_ctrl)
>  		return 0;
>  
> +	dpmaif_ctrl->state = DPMAIF_STATE_PWRON;
>  	t7xx_dpmaif_start_txrx_qs(dpmaif_ctrl);
>  	t7xx_dpmaif_enable_irq(dpmaif_ctrl);
>  	t7xx_dpmaif_unmask_dlq_intr(dpmaif_ctrl);

[High]
Is the unconditional assignment to DPMAIF_STATE_PWRON safe when the modem
was not running before suspend?

Prior to this patch neither suspend nor resume modified dpmaif_ctrl->state,
so a modem that had been moved to DPMAIF_STATE_PWROFF via
t7xx_dpmaif_md_state_callback() (for MD_STATE_STOPPED or MD_STATE_EXCEPTION,
which calls t7xx_dpmaif_stop()) stayed PWROFF across a suspend/resume cycle.

After the patch t7xx_dpmaif_resume() forces state to PWRON and also
restores que_started=true via t7xx_dpmaif_start_txrx_qs(). The TX kthread's
state guard then permits it to proceed:

	if (dpmaif_ctrl->state != DPMAIF_STATE_PWRON)
		continue;

	ret = pm_runtime_resume_and_get(dpmaif_ctrl->dev);
	...
	t7xx_do_tx_hw_push(dpmaif_ctrl);

Could this cause TX HW writes against a modem that the MD state machine
considers stopped or in exception?

The commit message describes the resume change as "symmetric with the
suspend change", but the suspend transition (PWRON -> PWROFF) tracks an
actual HW teardown, while the resume transition forces PWRON regardless of
the prior MD-driven state. Should the resume path only restore PWRON when
the modem was PWRON before suspend?

> diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
> index 236d632cf591..d5a5befec71e 100644
> --- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
> +++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
> @@ -460,6 +460,9 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
>  				break;
>  		}
>  
> +		if (dpmaif_ctrl->state != DPMAIF_STATE_PWRON)
> +			continue;
> +
>  		ret = pm_runtime_resume_and_get(dpmaif_ctrl->dev);
>  		if (ret < 0 && ret != -EACCES)
>  			return ret;

[Medium]
Does this new check actually close the TOCTOU window described in the
commit message, or does it just narrow it?

dpmaif_ctrl->state is a plain enum read here without any lock,
READ_ONCE/WRITE_ONCE, or memory barrier. It is also written without
locking by t7xx_dpmaif_suspend(), t7xx_dpmaif_resume(), t7xx_dpmaif_start(),
and t7xx_dpmaif_stop() (via t7xx_dpmaif_md_state_callback()), and read
without locking from t7xx_do_tx_hw_push() and the ISR path.

After this new check passes, can the system PM suspend path still set
state = DPMAIF_STATE_PWROFF on another CPU and grab the PM spinlock before
this thread reaches pm_runtime_resume_and_get(), reproducing the same
deadlock the patch is intended to fix?

The commit message states:

	add a second state check immediately before
	pm_runtime_resume_and_get() to close the TOCTOU window between
	the wait_event guard and the pm call.

Given that the window between the new check and the pm_runtime call is
non-zero and the flag has no synchronization, would it be more accurate to
say the window is reduced rather than closed?

For reference, Documentation-style guidance on flags-as-locks (the gate
guard pattern of reading a flag and then operating on shared state without
holding a lock) suggests using a real mutex/spinlock or a completion that
suspend waits on, so the kthread's PM reference is guaranteed to be
dropped before the suspend path runs. Would something along those lines be
a better fit here?
-- 
This is an AI-generated review.