[PATCH v3 00/17] wifi: mt76: mt7925/mt792x: comprehensive stability fixes

From: Zac Bowling <zac@zacbowling.com>

This patch series addresses kernel panics, system deadlocks, and various
stability issues in the MT7925 WiFi driver. The issues were discovered on
kernel 6.17 (Ubuntu 25.10) and fixes were developed and tested on 6.18.2.

These patches are based on the wireless tree (nbd168/wireless.git) as
requested by Sean Wang.

== Problem Description ==

The MT7925 driver has several bugs that cause:
- Kernel NULL pointer dereferences during BSSID roaming
- System-wide deadlocks requiring hard reboot
- Firmware reload failures after suspend/resume
- Key removal errors during MLO roaming

These issues can appear as often as every 5 minutes, whenever the adapter
tries to switch to a better BSSID, particularly in enterprise environments
with multiple access points.

== Root Causes ==

1. Missing mutex protection around ieee80211_iterate_active_interfaces()
   when the callback invokes MCU functions (patches 2, 3, 16)

2. NULL pointer dereferences: mt792x_vif_to_bss_conf(),
   mt792x_sta_to_link(), and similar helpers can return NULL during
   MLO state transitions, but callers do not check the result (patches
   1, 4, 5, 9, 10, 14, 17)

3. Ignored MCU return values that hide firmware errors (patches 6, 7, 8)

4. WARN_ON_ONCE() used on a NULL value that is expected during normal
   MLO AP setup (patch 13)

5. Firmware semaphore not released after failed load attempts (patch 15)

6. Key removal returning an error when the link has already been torn
   down (patch 12)
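
The fixes for root causes 1, 2, 3 and 5 follow familiar kernel patterns.
As a rough illustration only, here is a userspace C analogue rather than
driver code: struct dev, struct vif, vif_to_link(), mcu_send(),
reconnect_all() and load_firmware() are hypothetical names, a pthread
mutex stands in for the driver mutex, and a plain flag simulates the
firmware semaphore. The sketch holds the lock across the whole interface
walk, skips a NULL link lookup instead of dereferencing it, returns MCU
results instead of dropping them, and releases the semaphore on the error
path as well as on success.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, NOT the real mt76 structures. */
struct link_conf {
	int link_id;
};

struct vif {
	/* A slot may be NULL while an MLO link is being set up or torn down. */
	struct link_conf *link[4];
};

struct dev {
	pthread_mutex_t mutex;	/* stands in for the driver mutex */
	struct vif *vifs[8];
	int nvifs;
	int mcu_fail;		/* makes the fake MCU command fail */
	int fw_sem_held;	/* flag simulating the firmware semaphore */
};

/* Weak analogue of lockdep_assert_held(): checks only that someone holds it. */
static void assert_held(pthread_mutex_t *m)
{
	int rc = pthread_mutex_trylock(m);

	if (rc == 0)
		pthread_mutex_unlock(m);
	assert(rc == EBUSY);
}

/* Fake MCU command; callers must hold dev->mutex (root cause 1). */
static int mcu_send(struct dev *dev, int link_id)
{
	assert_held(&dev->mutex);
	(void)link_id;
	return dev->mcu_fail ? -EIO : 0;
}

/* Lookup that can legitimately return NULL during MLO transitions. */
static struct link_conf *vif_to_link(struct vif *vif, int link_id)
{
	if (link_id < 0 || link_id >= 4)
		return NULL;
	return vif->link[link_id];
}

static int connect_one(struct dev *dev, struct vif *vif)
{
	struct link_conf *conf = vif_to_link(vif, 0);

	if (!conf)		/* root cause 2: skip, don't dereference NULL */
		return 0;
	return mcu_send(dev, conf->link_id);	/* root cause 3: return the result */
}

/* Hold the mutex across the whole walk so callbacks may issue MCU commands. */
static int reconnect_all(struct dev *dev)
{
	int i, ret = 0;

	pthread_mutex_lock(&dev->mutex);
	for (i = 0; i < dev->nvifs; i++) {
		int err = connect_one(dev, dev->vifs[i]);

		if (err && !ret)
			ret = err;	/* keep the first error, visit the rest */
	}
	pthread_mutex_unlock(&dev->mutex);
	return ret;
}

/* Root cause 5: release the semaphore on every exit path, including errors. */
static int load_firmware(struct dev *dev, int simulate_failure)
{
	int ret = 0;

	dev->fw_sem_held = 1;		/* acquire */
	if (simulate_failure) {
		ret = -EIO;
		goto out;		/* an early "return ret" here would leak it */
	}
	/* ... firmware download would happen here ... */
out:
	dev->fw_sem_held = 0;		/* release */
	return ret;
}
```

One simplification: the callback passed to mac80211's
ieee80211_iterate_active_interfaces() returns void, so in the kernel an
error cannot be returned directly from the callback as in this loop; it
would have to be recorded through the iterator's data argument or handled
inside the callback itself.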

== Testing ==

Stress-tested by hammering the driver with a custom test script.

Tested on:
- Framework Desktop (AMD Ryzen AI Max 300 Series) with MT7925 (RZ717)
- The whole patch series was tested on kernels 6.18.2 and 6.17.12 (Ubuntu 25.10)
- Enterprise WiFi environment with multiple MLO-enabled WiFi 7 APs

Before patches: System hangs/panics every 5-15 minutes during BSSID roaming
After patches: Stable for 24+ hours under continuous stress testing

== Crash Traces Fixed ==

Primary NULL pointer dereference:
  BUG: kernel NULL pointer dereference, address: 0000000000000010
  Workqueue: mt76 mt7925_mac_reset_work [mt7925_common]
  RIP: 0010:mt76_connac_mcu_uni_add_dev+0x9c/0x780 [mt76_connac_lib]
  Call Trace:
   mt7925_vif_connect_iter+0xcb/0x240 [mt7925_common]
   __iterate_interfaces+0x92/0x130 [mac80211]
   ieee80211_iterate_interfaces+0x3d/0x60 [mac80211]
   mt7925_mac_reset_work+0x105/0x190 [mt7925_common]

Deadlock trace:
  INFO: task kworker/u128:0:48737 blocked for more than 122 seconds.
  Workqueue: mt76 mt7925_mac_reset_work [mt7925_common]
  Call Trace:
   __mutex_lock.constprop.0+0x3d0/0x6d0
   mt7925_mac_reset_work+0x85/0x170 [mt7925_common]

== Related Links ==

Framework Community discussion:
https://community.frame.work/t/kernel-panic-from-wifi-mediatek-mt7925-nullptr-dereference/79301

OpenWrt GitHub issues:
https://github.com/openwrt/mt76/issues/1014
https://github.com/openwrt/mt76/issues/1036

GitHub repository with additional analysis:
https://github.com/zbowling/mt7925

Zac Bowling (17):
  wifi: mt76: mt7925: fix NULL pointer dereference in vif iteration
  wifi: mt76: mt7925: fix missing mutex protection in reset and ROC abort
  wifi: mt76: mt7925: fix missing mutex protection in runtime PM and MLO PM
  wifi: mt76: mt7925: add NULL checks in MCU STA TLV functions
  wifi: mt76: mt7925: add NULL checks for link_conf and mlink in main.c
  wifi: mt76: mt7925: add error handling for AMPDU MCU commands
  wifi: mt76: mt7925: add error handling for BSS info MCU command in sta_add
  wifi: mt76: mt7925: add error handling for BSS info in key setup
  wifi: mt76: mt7925: add NULL checks in MLO link and chanctx functions
  wifi: mt76: mt792x: fix NULL pointer dereference in TX path
  wifi: mt76: mt7925: add lockdep assertions for mutex verification
  wifi: mt76: mt7925: fix key removal failure during MLO roaming
  wifi: mt76: mt7925: fix kernel warning in MLO ROC setup
  wifi: mt76: mt7925: add NULL checks for MLO link pointers in MCU functions
  wifi: mt76: mt792x: fix firmware reload failure after previous load crash
  wifi: mt76: mt7925: add mutex protection in resume path
  wifi: mt76: mt7925: add NULL checks in link station and TX queue setup

 drivers/net/wireless/mediatek/mt76/mt792x_core.c | 27 +++++++++++++++-
 drivers/net/wireless/mediatek/mt76/mt7925/mac.c  |  8 +++++
 drivers/net/wireless/mediatek/mt76/mt7925/main.c | 95 +++++++++++++++++++++---
 drivers/net/wireless/mediatek/mt76/mt7925/mcu.c  | 52 ++++++++++++++---
 drivers/net/wireless/mediatek/mt76/mt7925/pci.c  |  6 +++
 5 files changed, 170 insertions(+), 18 deletions(-)

-- 
2.51.0