[PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend

Tim JH Chen posted 1 patch 1 week ago
drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c    | 25 ++++++++++++++++------
drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h    |  3 +++
drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c | 18 ++++++++++++----
3 files changed, 35 insertions(+), 11 deletions(-)
[PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend
Posted by Tim JH Chen 1 week ago
When system suspend is triggered while the DPMAIF TX kthread
(t7xx_dpmaif_tx_hw_push_thread) is running, a deadlock can occur
leading to a CPU soft lockup.

The root cause is two-fold:

1. t7xx_dpmaif_suspend() calls t7xx_dpmaif_tx_stop() which only stops
   the TX work-queue items (by clearing txq->que_started and waiting on
   txq->tx_processing). It does NOT signal the kthread and does NOT
   update dpmaif_ctrl->state, which stays DPMAIF_STATE_PWRON.

2. The kthread's state guard is only checked at the top of each loop
   iteration. If the thread already passed this guard, it proceeds
   unconditionally to call pm_runtime_resume_and_get() — which tries to
   acquire dev->power.lock also contended by the system PM suspend path.

The result is a spinlock deadlock observed as:

  watchdog: BUG: soft lockup - CPU#N stuck for 26s! [dpmaif_tx_hw_pu]
  RIP: _raw_spin_unlock_irqrestore
  Call Trace:
    __pm_runtime_resume+0x5b/0x80
    t7xx_dpmaif_tx_hw_push_thread+0xc4 [mtk_t7xx]

The condition requires ASPM L1 enabled on the endpoint (which extends
the time pm_runtime_resume_and_get() holds dev->power.lock during L1.2
link retraining) and hundreds of repeated suspend/resume cycles to
trigger reliably.

Fix by introducing tx_pm_lock (struct mutex) and several coordinated
changes:

t7xx_dpmaif_suspend():
  After t7xx_dpmaif_tx_stop(), acquire tx_pm_lock. Under the lock,
  snapshot dpmaif_ctrl->state into pre_suspend_state (capturing the
  modem state atomically with respect to the kthread's PM section),
  then set DPMAIF_STATE_PWROFF via WRITE_ONCE(). Release the lock
  and call wake_up() so any sleeping kthread re-evaluates the
  wait_event condition and exits.

  t7xx_dpmaif_suspend() acquires tx_pm_lock without holding any PM
  lock. While it waits, the kthread may call pm_runtime_resume_and_get()
  which briefly takes and releases dev->power.lock independently.
  Because the suspend callback does not compete for dev->power.lock at
  this point, the original spinlock deadlock cannot occur. Suspend
  latency increases by at most one TX burst drain time, which is
  bounded by the DRB ring depth.

t7xx_dpmaif_resume():
  When pre_suspend_state is DPMAIF_STATE_PWRON, re-arm the HW fully
  (start_txrx_qs, enable_irq, unmask_dlq_intr, start_hw) before
  publishing the new state. This ensures the kthread cannot issue
  ul_update_hw_drb_cnt() MMIO writes before UL_ALL_Q_EN is set by
  t7xx_dpmaif_start_hw(). Publish the restored state under tx_pm_lock
  to serialise with the kthread's under-lock state check. Wake up the
  kthread only after HW and state are both consistent.

  When pre_suspend_state is DPMAIF_STATE_PWROFF (modem was already
  stopped or in exception before suspend), skip HW re-arming entirely
  to avoid leaving DMA engines running while the MD state machine
  considers the modem inactive.

t7xx_dpmaif_tx_hw_push_thread():
  Hold tx_pm_lock across the [state check -> pm_runtime_resume_and_get
  -> pm_runtime_put_autosuspend] sequence. A second READ_ONCE() state
  check under the lock closes the TOCTOU window between the wait_event
  guard at the loop top and the pm_runtime call. READ_ONCE() is used
  in all unguarded state reads in this function.

t7xx_dpmaif_start() / t7xx_dpmaif_stop():
  Use WRITE_ONCE() for state writes to match the READ_ONCE() reads
  used throughout the driver and prevent compiler optimisations from
  obscuring concurrent access.

t7xx_do_tx_hw_push():
  Use READ_ONCE() in the do/while termination condition to match the
  WRITE_ONCE() annotations on the write side.

t7xx_dpmaif_tx_thread_init():
  Initialise tx_pm_lock with mutex_init().

Note: t7xx_dpmaif_start() and t7xx_dpmaif_stop() (called from the
MD-FSM kthread via t7xx_dpmaif_md_state_callback()) do not hold
tx_pm_lock. A race where the FSM transitions the modem to
DPMAIF_STATE_PWROFF concurrently with the TX kthread's last burst is
pre-existing and not introduced by this patch; the do/while condition
in t7xx_do_tx_hw_push() now re-checks state with READ_ONCE() at each
iteration boundary, limiting exposure to at most one burst.

Tested: no soft lockup observed over 500+ suspend/resume cycles with
SIM registered and ASPM L1 enabled (previously triggered in < 300).

Fixes: 46e8f49ed7b3 ("net: wwan: t7xx: Introduce power management")
Signed-off-by: Tim JH Chen <tim.jh.chen@wnc.com.tw>
---
v2 -> v3:
  Process fixes (per Documentation/process/maintainer-netdev.rst):
  - Add target tree (net) and revision (v3) to subject prefix
  - Fix Fixes tag to point to 46e8f49ed7b3 ("net: wwan: t7xx:
    Introduce power management") instead of a kernel release tag
  - Move version changelog after '---' separator

  Code fixes (addressing AI-assisted code review of v2):
  - Capture pre_suspend_state inside tx_pm_lock (was outside the lock
    in v2), closing a race where a concurrent t7xx_dpmaif_stop() from
    the MD-FSM kthread could flip state between the snapshot and the
    mutex acquisition, causing resume to incorrectly restore PWRON
  - In resume, re-arm HW before publishing state under tx_pm_lock; in
    v2 state was published before t7xx_dpmaif_start_hw(), allowing the
    TX kthread to call ul_update_hw_drb_cnt() while UL_ALL_Q_EN=0
  - Skip HW re-arming in resume when pre_suspend_state==PWROFF, to
    avoid leaving DMA engines and IRQs live when the MD state machine
    considers the modem stopped or in exception
  - Add WRITE_ONCE() to t7xx_dpmaif_start()/stop() state writes and
    READ_ONCE() to t7xx_do_tx_hw_push() while condition
  - Document why pm_runtime_resume_and_get() under tx_pm_lock cannot
    cause a new deadlock against the suspend path
  - Document the pre-existing MD-FSM kthread / TX kthread race

v1 -> v2:
  - Resume no longer unconditionally restores DPMAIF_STATE_PWRON;
    pre_suspend_state saves the pre-suspend modem state across the cycle
  - Replace the second plain state check with mutex (tx_pm_lock) that
    wraps the full pm_runtime section, eliminating the TOCTOU window
    rather than narrowing it
  - Add READ_ONCE/WRITE_ONCE at state accesses crossing the
    suspend/resume boundary

Signed-off-by: Tim JH Chen <tim.jh.chen@wnc.com.tw>
---
 drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c    | 25 ++++++++++++++++------
 drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h    |  3 +++
 drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c | 18 ++++++++++++----
 3 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
index 7ff33c1d6ac7..845a42fdf507 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
@@ -363,7 +363,7 @@ static int t7xx_dpmaif_start(struct dpmaif_ctrl *dpmaif_ctrl)
 
 	t7xx_dpmaif_ul_clr_all_intr(hw_info);
 	t7xx_dpmaif_dl_clr_all_intr(hw_info);
-	dpmaif_ctrl->state = DPMAIF_STATE_PWRON;
+	WRITE_ONCE(dpmaif_ctrl->state, DPMAIF_STATE_PWRON);
 	t7xx_dpmaif_enable_irq(dpmaif_ctrl);
 	wake_up(&dpmaif_ctrl->tx_wq);
 	return 0;
@@ -400,7 +400,7 @@ static int t7xx_dpmaif_stop(struct dpmaif_ctrl *dpmaif_ctrl)
 		return -EFAULT;
 
 	t7xx_dpmaif_disable_irq(dpmaif_ctrl);
-	dpmaif_ctrl->state = DPMAIF_STATE_PWROFF;
+	WRITE_ONCE(dpmaif_ctrl->state, DPMAIF_STATE_PWROFF);
 	t7xx_dpmaif_stop_sw(dpmaif_ctrl);
 	t7xx_dpmaif_tx_clear(dpmaif_ctrl);
 	t7xx_dpmaif_rx_clear(dpmaif_ctrl);
@@ -412,6 +412,11 @@ static int t7xx_dpmaif_suspend(struct t7xx_pci_dev *t7xx_dev, void *param)
 	struct dpmaif_ctrl *dpmaif_ctrl = param;
 
 	t7xx_dpmaif_tx_stop(dpmaif_ctrl);
+	mutex_lock(&dpmaif_ctrl->tx_pm_lock);
+	dpmaif_ctrl->pre_suspend_state = READ_ONCE(dpmaif_ctrl->state);
+	WRITE_ONCE(dpmaif_ctrl->state, DPMAIF_STATE_PWROFF);
+	mutex_unlock(&dpmaif_ctrl->tx_pm_lock);
+	wake_up(&dpmaif_ctrl->tx_wq);
 	t7xx_dpmaif_hw_stop_all_txq(&dpmaif_ctrl->hw_info);
 	t7xx_dpmaif_hw_stop_all_rxq(&dpmaif_ctrl->hw_info);
 	t7xx_dpmaif_disable_irq(dpmaif_ctrl);
@@ -451,11 +456,17 @@ static int t7xx_dpmaif_resume(struct t7xx_pci_dev *t7xx_dev, void *param)
 	if (!dpmaif_ctrl)
 		return 0;
 
-	t7xx_dpmaif_start_txrx_qs(dpmaif_ctrl);
-	t7xx_dpmaif_enable_irq(dpmaif_ctrl);
-	t7xx_dpmaif_unmask_dlq_intr(dpmaif_ctrl);
-	t7xx_dpmaif_start_hw(&dpmaif_ctrl->hw_info);
-	wake_up(&dpmaif_ctrl->tx_wq);
+	if (dpmaif_ctrl->pre_suspend_state == DPMAIF_STATE_PWRON) {
+		t7xx_dpmaif_start_txrx_qs(dpmaif_ctrl);
+		t7xx_dpmaif_enable_irq(dpmaif_ctrl);
+		t7xx_dpmaif_unmask_dlq_intr(dpmaif_ctrl);
+		t7xx_dpmaif_start_hw(&dpmaif_ctrl->hw_info);
+	}
+	mutex_lock(&dpmaif_ctrl->tx_pm_lock);
+	WRITE_ONCE(dpmaif_ctrl->state, dpmaif_ctrl->pre_suspend_state);
+	mutex_unlock(&dpmaif_ctrl->tx_pm_lock);
+	if (dpmaif_ctrl->pre_suspend_state == DPMAIF_STATE_PWRON)
+		wake_up(&dpmaif_ctrl->tx_wq);
 	return 0;
 }
 
diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h
index 0ce4505e813d..670ed2cca761 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h
@@ -20,6 +20,7 @@
 
 #include <linux/bitmap.h>
 #include <linux/mm_types.h>
+#include <linux/mutex.h>
 #include <linux/netdevice.h>
 #include <linux/sched.h>
 #include <linux/skbuff.h>
@@ -172,6 +173,8 @@ struct dpmaif_ctrl {
 	struct t7xx_pci_dev		*t7xx_dev;
 	struct md_pm_entity		dpmaif_pm_entity;
 	enum dpmaif_state		state;
+	enum dpmaif_state		pre_suspend_state;
+	struct mutex			tx_pm_lock;
 	bool				dpmaif_sw_init_done;
 	struct dpmaif_hw_info		hw_info;
 	struct dpmaif_tx_queue		txq[DPMAIF_TXQ_NUM];
diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
index 236d632cf591..e278e9703c69 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
@@ -439,7 +439,7 @@ static void t7xx_do_tx_hw_push(struct dpmaif_ctrl *dpmaif_ctrl)
 
 		cond_resched();
 	} while (!t7xx_tx_lists_are_all_empty(dpmaif_ctrl) && !kthread_should_stop() &&
-		 (dpmaif_ctrl->state == DPMAIF_STATE_PWRON));
+		 READ_ONCE(dpmaif_ctrl->state) == DPMAIF_STATE_PWRON);
 }
 
 static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
@@ -449,10 +449,10 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
 
 	while (!kthread_should_stop()) {
 		if (t7xx_tx_lists_are_all_empty(dpmaif_ctrl) ||
-		    dpmaif_ctrl->state != DPMAIF_STATE_PWRON) {
+		    READ_ONCE(dpmaif_ctrl->state) != DPMAIF_STATE_PWRON) {
 			if (wait_event_interruptible(dpmaif_ctrl->tx_wq,
 						     (!t7xx_tx_lists_are_all_empty(dpmaif_ctrl) &&
-						     dpmaif_ctrl->state == DPMAIF_STATE_PWRON) ||
+						     READ_ONCE(dpmaif_ctrl->state) == DPMAIF_STATE_PWRON) ||
 						     kthread_should_stop()))
 				continue;
 
@@ -460,14 +460,23 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
 				break;
 		}
 
+		mutex_lock(&dpmaif_ctrl->tx_pm_lock);
+		if (READ_ONCE(dpmaif_ctrl->state) != DPMAIF_STATE_PWRON) {
+			mutex_unlock(&dpmaif_ctrl->tx_pm_lock);
+			continue;
+		}
+
 		ret = pm_runtime_resume_and_get(dpmaif_ctrl->dev);
-		if (ret < 0 && ret != -EACCES)
+		if (ret < 0 && ret != -EACCES) {
+			mutex_unlock(&dpmaif_ctrl->tx_pm_lock);
 			return ret;
+		}
 
 		t7xx_pci_disable_sleep(dpmaif_ctrl->t7xx_dev);
 		t7xx_do_tx_hw_push(dpmaif_ctrl);
 		t7xx_pci_enable_sleep(dpmaif_ctrl->t7xx_dev);
 		pm_runtime_put_autosuspend(dpmaif_ctrl->dev);
+		mutex_unlock(&dpmaif_ctrl->tx_pm_lock);
 	}
 
 	return 0;
@@ -475,6 +484,7 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
 
 int t7xx_dpmaif_tx_thread_init(struct dpmaif_ctrl *dpmaif_ctrl)
 {
+	mutex_init(&dpmaif_ctrl->tx_pm_lock);
 	init_waitqueue_head(&dpmaif_ctrl->tx_wq);
 	dpmaif_ctrl->tx_thread = kthread_run(t7xx_dpmaif_tx_hw_push_thread,
 					     dpmaif_ctrl, "dpmaif_tx_hw_push");
-- 
2.43.0

Re: [PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend
Posted by Paolo Abeni 3 days, 21 hours ago
On 6/1/26 3:52 AM, Tim JH Chen wrote:
> When system suspend is triggered while the DPMAIF TX kthread
> (t7xx_dpmaif_tx_hw_push_thread) is running, a deadlock can occur
> leading to a CPU soft lockup.
> 
> The root cause is two-fold:
> 
> 1. t7xx_dpmaif_suspend() calls t7xx_dpmaif_tx_stop() which only stops
>    the TX work-queue items (by clearing txq->que_started and waiting on
>    txq->tx_processing). It does NOT signal the kthread and does NOT
>    update dpmaif_ctrl->state, which stays DPMAIF_STATE_PWRON.
> 
> 2. The kthread's state guard is only checked at the top of each loop
>    iteration. If the thread already passed this guard, it proceeds
>    unconditionally to call pm_runtime_resume_and_get() — which tries to
>    acquire dev->power.lock also contended by the system PM suspend path.
> 
> The result is a spinlock deadlock observed as:
> 
>   watchdog: BUG: soft lockup - CPU#N stuck for 26s! [dpmaif_tx_hw_pu]
>   RIP: _raw_spin_unlock_irqrestore
>   Call Trace:
>     __pm_runtime_resume+0x5b/0x80
>     t7xx_dpmaif_tx_hw_push_thread+0xc4 [mtk_t7xx]
> 
> The condition requires ASPM L1 enabled on the endpoint (which extends
> the time pm_runtime_resume_and_get() holds dev->power.lock during L1.2
> link retraining) and hundreds of repeated suspend/resume cycles to
> trigger reliably.
> 
> Fix by introducing tx_pm_lock (struct mutex) and several coordinated
> changes:
> 
> t7xx_dpmaif_suspend():
>   After t7xx_dpmaif_tx_stop(), acquire tx_pm_lock. Under the lock,
>   snapshot dpmaif_ctrl->state into pre_suspend_state (capturing the
>   modem state atomically with respect to the kthread's PM section),
>   then set DPMAIF_STATE_PWROFF via WRITE_ONCE(). Release the lock
>   and call wake_up() so any sleeping kthread re-evaluates the
>   wait_event condition and exits.
> 
>   t7xx_dpmaif_suspend() acquires tx_pm_lock without holding any PM
>   lock. While it waits, the kthread may call pm_runtime_resume_and_get()
>   which briefly takes and releases dev->power.lock independently.
>   Because the suspend callback does not compete for dev->power.lock at
>   this point, the original spinlock deadlock cannot occur. Suspend
>   latency increases by at most one TX burst drain time, which is
>   bounded by the DRB ring depth.
> 
> t7xx_dpmaif_resume():
>   When pre_suspend_state is DPMAIF_STATE_PWRON, re-arm the HW fully
>   (start_txrx_qs, enable_irq, unmask_dlq_intr, start_hw) before
>   publishing the new state. This ensures the kthread cannot issue
>   ul_update_hw_drb_cnt() MMIO writes before UL_ALL_Q_EN is set by
>   t7xx_dpmaif_start_hw(). Publish the restored state under tx_pm_lock
>   to serialise with the kthread's under-lock state check. Wake up the
>   kthread only after HW and state are both consistent.
> 
>   When pre_suspend_state is DPMAIF_STATE_PWROFF (modem was already
>   stopped or in exception before suspend), skip HW re-arming entirely
>   to avoid leaving DMA engines running while the MD state machine
>   considers the modem inactive.
> 
> t7xx_dpmaif_tx_hw_push_thread():
>   Hold tx_pm_lock across the [state check -> pm_runtime_resume_and_get
>   -> pm_runtime_put_autosuspend] sequence. A second READ_ONCE() state
>   check under the lock closes the TOCTOU window between the wait_event
>   guard at the loop top and the pm_runtime call. READ_ONCE() is used
>   in all unguarded state reads in this function.
> 
> t7xx_dpmaif_start() / t7xx_dpmaif_stop():
>   Use WRITE_ONCE() for state writes to match the READ_ONCE() reads
>   used throughout the driver and prevent compiler optimisations from
>   obscuring concurrent access.
> 
> t7xx_do_tx_hw_push():
>   Use READ_ONCE() in the do/while termination condition to match the
>   WRITE_ONCE() annotations on the write side.
> 
> t7xx_dpmaif_tx_thread_init():
>   Initialise tx_pm_lock with mutex_init().
> 
> Note: t7xx_dpmaif_start() and t7xx_dpmaif_stop() (called from the
> MD-FSM kthread via t7xx_dpmaif_md_state_callback()) do not hold
> tx_pm_lock. A race where the FSM transitions the modem to
> DPMAIF_STATE_PWROFF concurrently with the TX kthread's last burst is
> pre-existing and not introduced by this patch; the do/while condition
> in t7xx_do_tx_hw_push() now re-checks state with READ_ONCE() at each
> iteration boundary, limiting exposure to at most one burst.
> 
> Tested: no soft lockup observed over 500+ suspend/resume cycles with
> SIM registered and ASPM L1 enabled (previously triggered in < 300).
> 
> Fixes: 46e8f49ed7b3 ("net: wwan: t7xx: Introduce power management")
> Signed-off-by: Tim JH Chen <tim.jh.chen@wnc.com.tw>

The above is way too verbose and hints that this patch should likely 
be split in a series.

Note that process wise there are still several problems:
- missing revision number in the suby prefix
- mismatch between from email message and SoB
- new revision MUST NOT be in reply-to of older ones.

Please try to be accurate with your next resubmission, or we will
have to delay processing this patch for an additional while.

Sashiko has still quite a bit of concerns:

https://netdev-ai.bots.linux.dev/sashiko/#/patchset/20260601015231.3211764-1-tim.jh.chen%40wnc.com.tw

/P