From: Tim JH Chen
To: netdev@vger.kernel.org
Cc: pabeni@redhat.com, haijun.liu@mediatek.com,
 chandrashekar.devegowda@intel.com, ricardo.martinez@linux.intel.com,
 loic.poulain@oss.qualcomm.com, ryazanov.s.a@gmail.com,
 johannes@sipsolutions.net, andrew+netdev@lunn.ch, davem@davemloft.net,
 edumazet@google.com, kuba@kernel.org, linux-kernel@vger.kernel.org,
 tim.jh.chen@wnc.com.tw, Chih.Hung.Huang@wnc.com.tw
Subject: [PATCH] net: wwan: t7xx: fix race between TX thread and system PM suspend
Date: Mon, 1 Jun 2026 09:52:31 +0800
Message-ID: <20260601015231.3211764-1-tim.jh.chen@wnc.com.tw>
X-Mailer: git-send-email 2.43.0
In-Reply-To: <2f9c5f6b-1d8d-4c8b-815d-77a40aa76e23@redhat.com>
References: <2f9c5f6b-1d8d-4c8b-815d-77a40aa76e23@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

When system suspend is triggered while the DPMAIF TX kthread
(t7xx_dpmaif_tx_hw_push_thread) is running, a deadlock can occur leading
to a CPU soft lockup.

The root cause is two-fold:

1. t7xx_dpmaif_suspend() calls t7xx_dpmaif_tx_stop() which only stops
   the TX work-queue items (by clearing txq->que_started and waiting on
   txq->tx_processing). It does NOT signal the kthread and does NOT
   update dpmaif_ctrl->state, which stays DPMAIF_STATE_PWRON.

2. The kthread's state guard is only checked at the top of each loop
   iteration. If the thread already passed this guard, it proceeds
   unconditionally to call pm_runtime_resume_and_get(), which tries to
   acquire dev->power.lock also contended by the system PM suspend path.

The result is a spinlock deadlock observed as:

  watchdog: BUG: soft lockup - CPU#N stuck for 26s! [dpmaif_tx_hw_pu]
  RIP: _raw_spin_unlock_irqrestore
  Call Trace:
    __pm_runtime_resume+0x5b/0x80
    t7xx_dpmaif_tx_hw_push_thread+0xc4 [mtk_t7xx]

The condition requires ASPM L1 enabled on the endpoint (which extends
the time pm_runtime_resume_and_get() holds dev->power.lock during L1.2
link retraining) and hundreds of repeated suspend/resume cycles to
trigger reliably.

Fix by introducing tx_pm_lock (struct mutex) and several coordinated
changes:

t7xx_dpmaif_suspend():
  After t7xx_dpmaif_tx_stop(), acquire tx_pm_lock. Under the lock,
  snapshot dpmaif_ctrl->state into pre_suspend_state (capturing the
  modem state atomically with respect to the kthread's PM section),
  then set DPMAIF_STATE_PWROFF via WRITE_ONCE(). Release the lock and
  call wake_up() so any sleeping kthread re-evaluates the wait_event
  condition and exits.

  t7xx_dpmaif_suspend() acquires tx_pm_lock without holding any PM
  lock. While it waits, the kthread may call
  pm_runtime_resume_and_get() which briefly takes and releases
  dev->power.lock independently. Because the suspend callback does not
  compete for dev->power.lock at this point, the original spinlock
  deadlock cannot occur. Suspend latency increases by at most one TX
  burst drain time, which is bounded by the DRB ring depth.

t7xx_dpmaif_resume():
  When pre_suspend_state is DPMAIF_STATE_PWRON, re-arm the HW fully
  (start_txrx_qs, enable_irq, unmask_dlq_intr, start_hw) before
  publishing the new state. This ensures the kthread cannot issue
  ul_update_hw_drb_cnt() MMIO writes before UL_ALL_Q_EN is set by
  t7xx_dpmaif_start_hw(). Publish the restored state under tx_pm_lock
  to serialise with the kthread's under-lock state check. Wake up the
  kthread only after HW and state are both consistent.

  When pre_suspend_state is DPMAIF_STATE_PWROFF (modem was already
  stopped or in exception before suspend), skip HW re-arming entirely
  to avoid leaving DMA engines running while the MD state machine
  considers the modem inactive.

t7xx_dpmaif_tx_hw_push_thread():
  Hold tx_pm_lock across the [state check ->
  pm_runtime_resume_and_get -> pm_runtime_put_autosuspend] sequence.
  A second READ_ONCE() state check under the lock closes the TOCTOU
  window between the wait_event guard at the loop top and the
  pm_runtime call. READ_ONCE() is used in all unguarded state reads in
  this function.

t7xx_dpmaif_start() / t7xx_dpmaif_stop():
  Use WRITE_ONCE() for state writes to match the READ_ONCE() reads
  used throughout the driver and prevent compiler optimisations from
  obscuring concurrent access.

t7xx_do_tx_hw_push():
  Use READ_ONCE() in the do/while termination condition to match the
  WRITE_ONCE() annotations on the write side.

t7xx_dpmaif_tx_thread_init():
  Initialise tx_pm_lock with mutex_init().

Note: t7xx_dpmaif_start() and t7xx_dpmaif_stop() (called from the MD-FSM
kthread via t7xx_dpmaif_md_state_callback()) do not hold tx_pm_lock. A
race where the FSM transitions the modem to DPMAIF_STATE_PWROFF
concurrently with the TX kthread's last burst is pre-existing and not
introduced by this patch; the do/while condition in
t7xx_do_tx_hw_push() now re-checks state with READ_ONCE() at each
iteration boundary, limiting exposure to at most one burst.

Tested: no soft lockup observed over 500+ suspend/resume cycles with
SIM registered and ASPM L1 enabled (previously triggered in < 300).

Fixes: 46e8f49ed7b3 ("net: wwan: t7xx: Introduce power management")
Signed-off-by: Tim JH Chen
---
v2 -> v3:
  Process fixes (per Documentation/process/maintainer-netdev.rst):
  - Add target tree (net) and revision (v3) to subject prefix
  - Fix Fixes tag to point to 46e8f49ed7b3 ("net: wwan: t7xx: Introduce
    power management") instead of a kernel release tag
  - Move version changelog after '---' separator

  Code fixes (addressing AI-assisted code review of v2):
  - Capture pre_suspend_state inside tx_pm_lock (was outside the lock
    in v2), closing a race where a concurrent t7xx_dpmaif_stop() from
    the MD-FSM kthread could flip state between the snapshot and the
    mutex acquisition, causing resume to incorrectly restore PWRON
  - In resume, re-arm HW before publishing state under tx_pm_lock; in
    v2 state was published before t7xx_dpmaif_start_hw(), allowing the
    TX kthread to call ul_update_hw_drb_cnt() while UL_ALL_Q_EN=0
  - Skip HW re-arming in resume when pre_suspend_state==PWROFF, to
    avoid leaving DMA engines and IRQs live when the MD state machine
    considers the modem stopped or in exception
  - Add WRITE_ONCE() to t7xx_dpmaif_start()/stop() state writes and
    READ_ONCE() to t7xx_do_tx_hw_push() while condition
  - Document why pm_runtime_resume_and_get() under tx_pm_lock cannot
    cause a new deadlock against the suspend path
  - Document the pre-existing MD-FSM kthread / TX kthread race

v1 -> v2:
  - Resume no longer unconditionally restores DPMAIF_STATE_PWRON;
    pre_suspend_state saves the pre-suspend modem state across the
    cycle
  - Replace the second plain state check with mutex (tx_pm_lock) that
    wraps the full pm_runtime section, eliminating the TOCTOU window
    rather than narrowing it
  - Add READ_ONCE/WRITE_ONCE at state accesses crossing the
    suspend/resume boundary

Signed-off-by: Tim JH Chen
---
 drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c    | 25 ++++++++++++++++------
 drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h    |  3 +++
 drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c | 18 ++++++++++++----
 3 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
index 7ff33c1d6ac7..845a42fdf507 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.c
@@ -363,7 +363,7 @@ static int t7xx_dpmaif_start(struct dpmaif_ctrl *dpmaif_ctrl)
 
 	t7xx_dpmaif_ul_clr_all_intr(hw_info);
 	t7xx_dpmaif_dl_clr_all_intr(hw_info);
-	dpmaif_ctrl->state = DPMAIF_STATE_PWRON;
+	WRITE_ONCE(dpmaif_ctrl->state, DPMAIF_STATE_PWRON);
 	t7xx_dpmaif_enable_irq(dpmaif_ctrl);
 	wake_up(&dpmaif_ctrl->tx_wq);
 	return 0;
@@ -400,7 +400,7 @@ static int t7xx_dpmaif_stop(struct dpmaif_ctrl *dpmaif_ctrl)
 		return -EFAULT;
 
 	t7xx_dpmaif_disable_irq(dpmaif_ctrl);
-	dpmaif_ctrl->state = DPMAIF_STATE_PWROFF;
+	WRITE_ONCE(dpmaif_ctrl->state, DPMAIF_STATE_PWROFF);
 	t7xx_dpmaif_stop_sw(dpmaif_ctrl);
 	t7xx_dpmaif_tx_clear(dpmaif_ctrl);
 	t7xx_dpmaif_rx_clear(dpmaif_ctrl);
@@ -412,6 +412,11 @@ static int t7xx_dpmaif_suspend(struct t7xx_pci_dev *t7xx_dev, void *param)
 	struct dpmaif_ctrl *dpmaif_ctrl = param;
 
 	t7xx_dpmaif_tx_stop(dpmaif_ctrl);
+	mutex_lock(&dpmaif_ctrl->tx_pm_lock);
+	dpmaif_ctrl->pre_suspend_state = READ_ONCE(dpmaif_ctrl->state);
+	WRITE_ONCE(dpmaif_ctrl->state, DPMAIF_STATE_PWROFF);
+	mutex_unlock(&dpmaif_ctrl->tx_pm_lock);
+	wake_up(&dpmaif_ctrl->tx_wq);
 	t7xx_dpmaif_hw_stop_all_txq(&dpmaif_ctrl->hw_info);
 	t7xx_dpmaif_hw_stop_all_rxq(&dpmaif_ctrl->hw_info);
 	t7xx_dpmaif_disable_irq(dpmaif_ctrl);
@@ -451,11 +456,17 @@ static int t7xx_dpmaif_resume(struct t7xx_pci_dev *t7xx_dev, void *param)
 	if (!dpmaif_ctrl)
 		return 0;
 
-	t7xx_dpmaif_start_txrx_qs(dpmaif_ctrl);
-	t7xx_dpmaif_enable_irq(dpmaif_ctrl);
-	t7xx_dpmaif_unmask_dlq_intr(dpmaif_ctrl);
-	t7xx_dpmaif_start_hw(&dpmaif_ctrl->hw_info);
-	wake_up(&dpmaif_ctrl->tx_wq);
+	if (dpmaif_ctrl->pre_suspend_state == DPMAIF_STATE_PWRON) {
+		t7xx_dpmaif_start_txrx_qs(dpmaif_ctrl);
+		t7xx_dpmaif_enable_irq(dpmaif_ctrl);
+		t7xx_dpmaif_unmask_dlq_intr(dpmaif_ctrl);
+		t7xx_dpmaif_start_hw(&dpmaif_ctrl->hw_info);
+	}
+	mutex_lock(&dpmaif_ctrl->tx_pm_lock);
+	WRITE_ONCE(dpmaif_ctrl->state, dpmaif_ctrl->pre_suspend_state);
+	mutex_unlock(&dpmaif_ctrl->tx_pm_lock);
+	if (dpmaif_ctrl->pre_suspend_state == DPMAIF_STATE_PWRON)
+		wake_up(&dpmaif_ctrl->tx_wq);
 	return 0;
 }
 
diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h
index 0ce4505e813d..670ed2cca761 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif.h
@@ -20,6 +20,7 @@
 
 #include
 #include
+#include
 #include
 #include
 #include
@@ -172,6 +173,8 @@ struct dpmaif_ctrl {
 	struct t7xx_pci_dev *t7xx_dev;
 	struct md_pm_entity dpmaif_pm_entity;
 	enum dpmaif_state state;
+	enum dpmaif_state pre_suspend_state;
+	struct mutex tx_pm_lock;
 	bool dpmaif_sw_init_done;
 	struct dpmaif_hw_info hw_info;
 	struct dpmaif_tx_queue txq[DPMAIF_TXQ_NUM];
diff --git a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
index 236d632cf591..e278e9703c69 100644
--- a/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
+++ b/drivers/net/wwan/t7xx/t7xx_hif_dpmaif_tx.c
@@ -439,7 +439,7 @@ static void t7xx_do_tx_hw_push(struct dpmaif_ctrl *dpmaif_ctrl)
 
 		cond_resched();
 	} while (!t7xx_tx_lists_are_all_empty(dpmaif_ctrl) && !kthread_should_stop() &&
-		 (dpmaif_ctrl->state == DPMAIF_STATE_PWRON));
+		 READ_ONCE(dpmaif_ctrl->state) == DPMAIF_STATE_PWRON);
 }
 
 static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
@@ -449,10 +449,10 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
 
 	while (!kthread_should_stop()) {
 		if (t7xx_tx_lists_are_all_empty(dpmaif_ctrl) ||
-		    dpmaif_ctrl->state != DPMAIF_STATE_PWRON) {
+		    READ_ONCE(dpmaif_ctrl->state) != DPMAIF_STATE_PWRON) {
 			if (wait_event_interruptible(dpmaif_ctrl->tx_wq,
 						     (!t7xx_tx_lists_are_all_empty(dpmaif_ctrl) &&
-						     dpmaif_ctrl->state == DPMAIF_STATE_PWRON) ||
+						     READ_ONCE(dpmaif_ctrl->state) == DPMAIF_STATE_PWRON) ||
 						     kthread_should_stop()))
 				continue;
 
@@ -460,14 +460,23 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
 				break;
 		}
 
+		mutex_lock(&dpmaif_ctrl->tx_pm_lock);
+		if (READ_ONCE(dpmaif_ctrl->state) != DPMAIF_STATE_PWRON) {
+			mutex_unlock(&dpmaif_ctrl->tx_pm_lock);
+			continue;
+		}
+
 		ret = pm_runtime_resume_and_get(dpmaif_ctrl->dev);
-		if (ret < 0 && ret != -EACCES)
+		if (ret < 0 && ret != -EACCES) {
+			mutex_unlock(&dpmaif_ctrl->tx_pm_lock);
 			return ret;
+		}
 
 		t7xx_pci_disable_sleep(dpmaif_ctrl->t7xx_dev);
 		t7xx_do_tx_hw_push(dpmaif_ctrl);
 		t7xx_pci_enable_sleep(dpmaif_ctrl->t7xx_dev);
 		pm_runtime_put_autosuspend(dpmaif_ctrl->dev);
+		mutex_unlock(&dpmaif_ctrl->tx_pm_lock);
 	}
 
 	return 0;
@@ -475,6 +484,7 @@ static int t7xx_dpmaif_tx_hw_push_thread(void *arg)
 
 int t7xx_dpmaif_tx_thread_init(struct dpmaif_ctrl *dpmaif_ctrl)
 {
+	mutex_init(&dpmaif_ctrl->tx_pm_lock);
 	init_waitqueue_head(&dpmaif_ctrl->tx_wq);
 	dpmaif_ctrl->tx_thread = kthread_run(t7xx_dpmaif_tx_hw_push_thread,
 					     dpmaif_ctrl, "dpmaif_tx_hw_push");
-- 
2.43.0
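
For readers who want to try the locking pattern outside the kernel, the
following is a minimal user-space sketch of the snapshot-under-lock and
re-check-under-lock scheme described in the commit message. It is not
driver code. It assumes pthread_mutex_t as a stand-in for struct mutex and
C11 atomics as a stand-in for READ_ONCE()/WRITE_ONCE(); every sketch_* name
is hypothetical, and the hardware re-arm and runtime PM calls are reduced
to comments and a counter.

```c
/*
 * User-space sketch only. pthread_mutex_t stands in for struct mutex and
 * C11 atomics stand in for READ_ONCE()/WRITE_ONCE(). All sketch_* names
 * are hypothetical and do not exist in the t7xx driver.
 */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

enum sketch_state { SKETCH_PWROFF, SKETCH_PWRON };

struct sketch_ctrl {
	pthread_mutex_t tx_pm_lock;
	_Atomic int state;
	int pre_suspend_state;
	int bursts_pushed;	/* counts simulated TX bursts */
};

static void sketch_init(struct sketch_ctrl *c, enum sketch_state initial)
{
	int err = pthread_mutex_init(&c->tx_pm_lock, NULL);

	assert(err == 0);
	(void)err;
	atomic_init(&c->state, initial);
	c->pre_suspend_state = SKETCH_PWROFF;
	c->bursts_pushed = 0;
}

/* Suspend side: snapshot the state and force PWROFF under the lock. */
static void sketch_suspend(struct sketch_ctrl *c)
{
	pthread_mutex_lock(&c->tx_pm_lock);
	c->pre_suspend_state = atomic_load(&c->state);
	atomic_store(&c->state, SKETCH_PWROFF);
	pthread_mutex_unlock(&c->tx_pm_lock);
}

/* Resume side: publish the saved state under the lock. */
static void sketch_resume(struct sketch_ctrl *c)
{
	/* A real driver would re-arm the hardware here, before publishing. */
	pthread_mutex_lock(&c->tx_pm_lock);
	atomic_store(&c->state, c->pre_suspend_state);
	pthread_mutex_unlock(&c->tx_pm_lock);
}

/*
 * One TX thread iteration: re-check the state under the lock and only
 * then enter the section that would touch runtime PM and the hardware.
 * Returns true if a burst was pushed, false if the iteration was skipped.
 */
static bool sketch_tx_iteration(struct sketch_ctrl *c)
{
	pthread_mutex_lock(&c->tx_pm_lock);
	if (atomic_load(&c->state) != SKETCH_PWRON) {
		pthread_mutex_unlock(&c->tx_pm_lock);
		return false;
	}
	c->bursts_pushed++;	/* placeholder for the PM get/push/put section */
	pthread_mutex_unlock(&c->tx_pm_lock);
	return true;
}
```

Because suspend and the TX iteration take the same mutex, an iteration
either completes before the state flips to PWROFF or observes PWROFF and
skips the section entirely. That is the property the patch relies on.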