[PATCH 02/17] wifi: mt76: mt7925: fix missing mutex protection in reset and ROC abort

Zac Bowling posted 17 patches 2 days, 22 hours ago
Posted by Zac Bowling 2 days, 22 hours ago

During firmware recovery and ROC (Remain On Channel) abort operations,
the driver iterates over active interfaces and calls MCU functions that
require the device mutex to be held, but the mutex was not acquired.

This causes deadlocks that leave the whole system unresponsive. From
logs on affected systems:

  INFO: task kworker/u128:0:48737 blocked for more than 122 seconds.
  Workqueue: mt76 mt7925_mac_reset_work [mt7925_common]
  Call Trace:
   __schedule+0x426/0x12c0
   schedule+0x27/0xf0
   schedule_preempt_disabled+0x15/0x30
   __mutex_lock.constprop.0+0x3d0/0x6d0
   mt7925_mac_reset_work+0x85/0x170 [mt7925_common]

The deadlock manifests approximately every 5 minutes when the adapter
tries to roam to a better BSSID, which triggers a firmware reset.
Network commands (ip, ifconfig, etc.) hang indefinitely, processes get
stuck in uninterruptible sleep (D state), and reboot hangs as well.

Add mutex protection around interface iteration in:
- mt7925_mac_reset_work(): Called during firmware recovery after MCU
  timeouts to reconnect all interfaces
- mt7925_roc_abort_sync() in the PCI suspend path
  (mt7925_pci_suspend()): Called during suspend to clean up Remain On
  Channel operations

This matches the pattern used in mt7615 and other MediaTek drivers where
interface iteration callbacks invoke MCU functions with mutex held:

  // mt7615/main.c - roc_work has mutex protection
  mt7615_mutex_acquire(phy->dev);
  ieee80211_iterate_active_interfaces(...);
  mt7615_mutex_release(phy->dev);
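
For readers less familiar with the pattern, here is a minimal userspace
sketch of "take the device mutex, iterate, release". It is not driver
code: a pthread mutex stands in for the device mutex, and every model_*
name is hypothetical and exists only for this illustration. The
callback asserts that the lock is held, which is roughly the role
lockdep_assert_held() plays in kernel code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the device; not a real mt76 structure. */
struct model_dev {
	pthread_mutex_t mutex;	/* models the device mutex */
	bool mutex_held;	/* tracked by hand; pthreads has no lockdep */
	int connected;		/* interfaces "reconnected" so far */
};

typedef void (*model_iter_fn)(struct model_dev *dev, int vif_idx);

static void model_mutex_acquire(struct model_dev *dev)
{
	pthread_mutex_lock(&dev->mutex);
	dev->mutex_held = true;
}

static void model_mutex_release(struct model_dev *dev)
{
	dev->mutex_held = false;
	pthread_mutex_unlock(&dev->mutex);
}

/* Models a per-interface callback that issues MCU commands and
 * therefore requires the caller to hold the device mutex. */
static void model_vif_connect_iter(struct model_dev *dev, int vif_idx)
{
	(void)vif_idx;
	assert(dev->mutex_held);
	dev->connected++;
}

/* Models the interface iterator: calls fn once per interface. */
static void model_iterate_interfaces(struct model_dev *dev, int n_vifs,
				     model_iter_fn fn)
{
	for (int i = 0; i < n_vifs; i++)
		fn(dev, i);
}

/* The pattern itself: the whole iteration runs under the mutex. */
static int model_reset_reconnect(struct model_dev *dev, int n_vifs)
{
	model_mutex_acquire(dev);
	model_iterate_interfaces(dev, n_vifs, model_vif_connect_iter);
	model_mutex_release(dev);
	return dev->connected;
}
```

Calling model_iterate_interfaces() directly, without the acquire and
release pair, trips the assertion in the callback. That is the class of
bug this patch addresses in the two call sites listed above.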

Note: Sean Wang from MediaTek has submitted an alternative fix for the
ROC path that uses cancel_delayed_work() instead of
cancel_delayed_work_sync(). Both approaches address the deadlock; this
one adds explicit mutex protection, which may be superseded by the
upstream fix.

Fixes: c948b5da6bbe ("wifi: mt76: mt7925: add Mediatek Wi-Fi7 driver for mt7925 chips")
Link: https://community.frame.work/t/kernel-panic-from-wifi-mediatek-mt7925-nullptr-dereference/79301
Reported-by: Zac Bowling <zac@zacbowling.com>
Tested-by: Zac Bowling <zac@zacbowling.com>
Signed-off-by: Zac Bowling <zac@zacbowling.com>
---
 drivers/net/wireless/mediatek/mt76/mt7925/mac.c | 2 ++
 drivers/net/wireless/mediatek/mt76/mt7925/pci.c | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/drivers/net/wireless/mediatek/mt76/mt7925/mac.c b/drivers/net/wireless/mediatek/mt76/mt7925/mac.c
index 184efe8afa10..06420ac6ed55 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7925/mac.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7925/mac.c
@@ -1331,9 +1331,11 @@ void mt7925_mac_reset_work(struct work_struct *work)
 	dev->hw_full_reset = false;
 	pm->suspended = false;
 	ieee80211_wake_queues(hw);
+	mt792x_mutex_acquire(dev);
 	ieee80211_iterate_active_interfaces(hw,
 					    IEEE80211_IFACE_ITER_RESUME_ALL,
 					    mt7925_vif_connect_iter, NULL);
+	mt792x_mutex_release(dev);
 	mt76_connac_power_save_sched(&dev->mt76.phy, pm);
 
 	mt7925_regd_change(&dev->phy, "00");
diff --git a/drivers/net/wireless/mediatek/mt76/mt7925/pci.c b/drivers/net/wireless/mediatek/mt76/mt7925/pci.c
index c4161754c01d..e9d62c6aee91 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7925/pci.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7925/pci.c
@@ -455,7 +455,9 @@ static int mt7925_pci_suspend(struct device *device)
 	cancel_delayed_work_sync(&pm->ps_work);
 	cancel_work_sync(&pm->wake_work);
 
+	mt792x_mutex_acquire(dev);
 	mt7925_roc_abort_sync(dev);
+	mt792x_mutex_release(dev);
 
 	err = mt792x_mcu_drv_pmctrl(dev);
 	if (err < 0)
-- 
2.51.0