Lockdep gives a splat [1] when ser_hdl_work item is executed. It is
scheduled at mac80211 workqueue via ieee80211_queue_work() and takes a
wiphy lock inside. However, this workqueue can be flushed when e.g.
closing the interface and wiphy lock is already taken in that case.
Choosing wiphy_work_queue() for SER is likely not suitable. Back on to
the global workqueue.
[1]:
WARNING: possible circular locking dependency detected
6.17.0-rc2 #17 Not tainted
------------------------------------------------------
kworker/u32:1/61 is trying to acquire lock:
ffff88811bc00768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: ser_state_run+0x5e/0x180 [rtw89_core]
but task is already holding lock:
ffffc9000048fd30 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}, at: process_one_work+0x7b5/0x1450
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}:
process_one_work+0x7c6/0x1450
worker_thread+0x49e/0xd00
kthread+0x313/0x640
ret_from_fork+0x221/0x300
ret_from_fork_asm+0x1a/0x30
-> #1 ((wq_completion)phy0){+.+.}-{0:0}:
touch_wq_lockdep_map+0x8e/0x180
__flush_workqueue+0x129/0x10d0
ieee80211_stop_device+0xa8/0x110
ieee80211_do_stop+0x14ce/0x2880
ieee80211_stop+0x13a/0x2c0
__dev_close_many+0x18f/0x510
__dev_change_flags+0x25f/0x670
netif_change_flags+0x7b/0x160
do_setlink.isra.0+0x1640/0x35d0
rtnl_newlink+0xd8c/0x1d30
rtnetlink_rcv_msg+0x700/0xb80
netlink_rcv_skb+0x11d/0x350
netlink_unicast+0x49a/0x7a0
netlink_sendmsg+0x759/0xc20
____sys_sendmsg+0x812/0xa00
___sys_sendmsg+0xf7/0x180
__sys_sendmsg+0x11f/0x1b0
do_syscall_64+0xbb/0x360
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&rdev->wiphy.mtx){+.+.}-{4:4}:
__lock_acquire+0x124c/0x1d20
lock_acquire+0x154/0x2e0
__mutex_lock+0x17b/0x12f0
ser_state_run+0x5e/0x180 [rtw89_core]
rtw89_ser_hdl_work+0x119/0x220 [rtw89_core]
process_one_work+0x82d/0x1450
worker_thread+0x49e/0xd00
kthread+0x313/0x640
ret_from_fork+0x221/0x300
ret_from_fork_asm+0x1a/0x30
other info that might help us debug this:
Chain exists of:
&rdev->wiphy.mtx --> (wq_completion)phy0 --> (work_completion)(&ser->ser_hdl_work)
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock((work_completion)(&ser->ser_hdl_work));
lock((wq_completion)phy0);
lock((work_completion)(&ser->ser_hdl_work));
lock(&rdev->wiphy.mtx);
*** DEADLOCK ***
2 locks held by kworker/u32:1/61:
#0: ffff888103835148 ((wq_completion)phy0){+.+.}-{0:0}, at: process_one_work+0xefa/0x1450
#1: ffffc9000048fd30 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}, at: process_one_work+0x7b5/0x1450
stack backtrace:
CPU: 0 UID: 0 PID: 61 Comm: kworker/u32:1 Not tainted 6.17.0-rc2 #17 PREEMPT(voluntary)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS edk2-20250523-14.fc42 05/23/2025
Workqueue: phy0 rtw89_ser_hdl_work [rtw89_core]
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
print_circular_bug.cold+0x178/0x1be
check_noncircular+0x14c/0x170
__lock_acquire+0x124c/0x1d20
lock_acquire+0x154/0x2e0
__mutex_lock+0x17b/0x12f0
ser_state_run+0x5e/0x180 [rtw89_core]
rtw89_ser_hdl_work+0x119/0x220 [rtw89_core]
process_one_work+0x82d/0x1450
worker_thread+0x49e/0xd00
kthread+0x313/0x640
ret_from_fork+0x221/0x300
ret_from_fork_asm+0x1a/0x30
</TASK>
Found by Linux Verification Center (linuxtesting.org).
Fixes: ebfc9199df05 ("wifi: rtw89: add wiphy_lock() to work that isn't held wiphy_lock() yet")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
---
drivers/net/wireless/realtek/rtw89/ser.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtw89/ser.c b/drivers/net/wireless/realtek/rtw89/ser.c
index fe7beff8c424..f99e179f7ff9 100644
--- a/drivers/net/wireless/realtek/rtw89/ser.c
+++ b/drivers/net/wireless/realtek/rtw89/ser.c
@@ -205,7 +205,6 @@ static void rtw89_ser_hdl_work(struct work_struct *work)
static int ser_send_msg(struct rtw89_ser *ser, u8 event)
{
- struct rtw89_dev *rtwdev = container_of(ser, struct rtw89_dev, ser);
struct ser_msg *msg = NULL;
if (test_bit(RTW89_SER_DRV_STOP_RUN, ser->flags))
@@ -221,7 +220,7 @@ static int ser_send_msg(struct rtw89_ser *ser, u8 event)
list_add(&msg->list, &ser->msg_q);
spin_unlock_irq(&ser->msg_q_lock);
- ieee80211_queue_work(rtwdev->hw, &ser->ser_hdl_work);
+ schedule_work(&ser->ser_hdl_work);
return 0;
}
--
2.51.0
Fedor Pchelkin <pchelkin@ispras.ru> wrote: > Lockdep gives a splat [1] when ser_hdl_work item is executed. It is > scheduled at mac80211 workqueue via ieee80211_queue_work() and takes a > wiphy lock inside. However, this workqueue can be flushed when e.g. > closing the interface and wiphy lock is already taken in that case. > > Choosing wiphy_work_queue() for SER is likely not suitable. Back on to > the global workqueue. > > [1]: > > WARNING: possible circular locking dependency detected > 6.17.0-rc2 #17 Not tainted > ------------------------------------------------------ > kworker/u32:1/61 is trying to acquire lock: > ffff88811bc00768 (&rdev->wiphy.mtx){+.+.}-{4:4}, at: ser_state_run+0x5e/0x180 [rtw89_core] > > but task is already holding lock: > ffffc9000048fd30 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}, at: > process_one_work+0x7b5/0x1450 > > which lock already depends on the new lock. > > the existing dependency chain (in reverse order) is: > > -> #2 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}: > process_one_work+0x7c6/0x1450 > worker_thread+0x49e/0xd00 > kthread+0x313/0x640 > ret_from_fork+0x221/0x300 > ret_from_fork_asm+0x1a/0x30 > > -> #1 ((wq_completion)phy0){+.+.}-{0:0}: > touch_wq_lockdep_map+0x8e/0x180 > __flush_workqueue+0x129/0x10d0 > ieee80211_stop_device+0xa8/0x110 > ieee80211_do_stop+0x14ce/0x2880 > ieee80211_stop+0x13a/0x2c0 > __dev_close_many+0x18f/0x510 > __dev_change_flags+0x25f/0x670 > netif_change_flags+0x7b/0x160 > do_setlink.isra.0+0x1640/0x35d0 > rtnl_newlink+0xd8c/0x1d30 > rtnetlink_rcv_msg+0x700/0xb80 > netlink_rcv_skb+0x11d/0x350 > netlink_unicast+0x49a/0x7a0 > netlink_sendmsg+0x759/0xc20 > ____sys_sendmsg+0x812/0xa00 > ___sys_sendmsg+0xf7/0x180 > __sys_sendmsg+0x11f/0x1b0 > do_syscall_64+0xbb/0x360 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > > -> #0 (&rdev->wiphy.mtx){+.+.}-{4:4}: > __lock_acquire+0x124c/0x1d20 > lock_acquire+0x154/0x2e0 > __mutex_lock+0x17b/0x12f0 > ser_state_run+0x5e/0x180 [rtw89_core] > rtw89_ser_hdl_work+0x119/0x220 [rtw89_core] > process_one_work+0x82d/0x1450 > worker_thread+0x49e/0xd00 > kthread+0x313/0x640 > ret_from_fork+0x221/0x300 > ret_from_fork_asm+0x1a/0x30 > > other info that might help us debug this: > > Chain exists of: > &rdev->wiphy.mtx --> (wq_completion)phy0 --> (work_completion)(&ser->ser_hdl_work) > > Possible unsafe locking scenario: > > CPU0 CPU1 > ---- ---- > lock((work_completion)(&ser->ser_hdl_work)); > lock((wq_completion)phy0); > lock((work_completion)(&ser->ser_hdl_work)); > lock(&rdev->wiphy.mtx); > > *** DEADLOCK *** > > 2 locks held by kworker/u32:1/61: > #0: ffff888103835148 ((wq_completion)phy0){+.+.}-{0:0}, at: process_one_work+0xefa/0x1450 > #1: ffffc9000048fd30 ((work_completion)(&ser->ser_hdl_work)){+.+.}-{0:0}, at: > process_one_work+0x7b5/0x1450 > > stack backtrace: > CPU: 0 UID: 0 PID: 61 Comm: kworker/u32:1 Not tainted 6.17.0-rc2 #17 PREEMPT(voluntary) > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS edk2-20250523-14.fc42 05/23/2025 > Workqueue: phy0 rtw89_ser_hdl_work [rtw89_core] > Call Trace: > <TASK> > dump_stack_lvl+0x5d/0x80 > print_circular_bug.cold+0x178/0x1be > check_noncircular+0x14c/0x170 > __lock_acquire+0x124c/0x1d20 > lock_acquire+0x154/0x2e0 > __mutex_lock+0x17b/0x12f0 > ser_state_run+0x5e/0x180 [rtw89_core] > rtw89_ser_hdl_work+0x119/0x220 [rtw89_core] > process_one_work+0x82d/0x1450 > worker_thread+0x49e/0xd00 > kthread+0x313/0x640 > ret_from_fork+0x221/0x300 > ret_from_fork_asm+0x1a/0x30 > </TASK> > > Found by Linux Verification Center (linuxtesting.org). > > Fixes: ebfc9199df05 ("wifi: rtw89: add wiphy_lock() to work that isn't held wiphy_lock() yet") > Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Acked-by: Ping-Ke Shih <pkshih@realtek.com> By the way, you mark this patchset as 'rtw'. Does it mean this patchset is urgent to you? If not, it will be more smooth (avoid possible merge conflict) if it goes via 'rtw-next'. Let me know your preference.
On Thu, 18. Sep 05:52, Ping-Ke Shih wrote: > Fedor Pchelkin <pchelkin@ispras.ru> wrote: > By the way, you mark this patchset as 'rtw'. Does it mean this patchset is > urgent to you? If not, it will be more smooth (avoid possible merge conflict) > if it goes via 'rtw-next'. Let me know your preference. The first patch of the series is rather urgent compared to the others because it addresses the issue occasionally banging on a working system. The other ones are less urgent. TBH I'm not aware of your development process in details. It's v6.17-rc6 at the moment. If I target all patches at rtw-next, are they to land in upcoming merge window for v6.18 release (a couple of weeks from now)? If yes then no hurries from me, rtw-next is ok.
Fedor Pchelkin <pchelkin@ispras.ru> wrote: > On Thu, 18. Sep 05:52, Ping-Ke Shih wrote: > > Fedor Pchelkin <pchelkin@ispras.ru> wrote: > > By the way, you mark this patchset as 'rtw'. Does it mean this patchset is > > urgent to you? If not, it will be more smooth (avoid possible merge conflict) > > if it goes via 'rtw-next'. Let me know your preference. > > The first patch of the series is rather urgent compared to the others > because it addresses the issue occasionally banging on a working system. > The other ones are less urgent. > > TBH I'm not aware of your development process in details. It's v6.17-rc6 > at the moment. If I target all patches at rtw-next, are they to land in > upcoming merge window for v6.18 release (a couple of weeks from now)? > If yes then no hurries from me, rtw-next is ok. It's v6.17-rc6 (-rc eve), so I don't plan to send a pull-request. Originally I plan to send the last pull-request to v6.18 today, so I did review this patchset yesterday to see if I can merge it before sending. Since only two or three minor changes are needed, I can wait a while and send the pull-request next Monday if you can re-spin the patchset this weekend. If not, I can still merge this patchset in v6.18-rc cycle to rtw tree. However, this might cause merge conflict with -next, so I don't prefer this. Upper maintainers need to spend extra time to resolve conflicts.
On Fri, 19. Sep 00:46, Ping-Ke Shih wrote: > Fedor Pchelkin <pchelkin@ispras.ru> wrote: > > On Thu, 18. Sep 05:52, Ping-Ke Shih wrote: > > > Fedor Pchelkin <pchelkin@ispras.ru> wrote: > > > By the way, you mark this patchset as 'rtw'. Does it mean this patchset is > > > urgent to you? If not, it will be more smooth (avoid possible merge conflict) > > > if it goes via 'rtw-next'. Let me know your preference. > > > > The first patch of the series is rather urgent compared to the others > > because it addresses the issue occasionally banging on a working system. > > The other ones are less urgent. > > > > TBH I'm not aware of your development process in details. It's v6.17-rc6 > > at the moment. If I target all patches at rtw-next, are they to land in > > upcoming merge window for v6.18 release (a couple of weeks from now)? > > If yes then no hurries from me, rtw-next is ok. > > It's v6.17-rc6 (-rc eve), so I don't plan to send a pull-request. > > Originally I plan to send the last pull-request to v6.18 today, so I did > review this patchset yesterday to see if I can merge it before sending. > Since only two or three minor changes are needed, I can wait a while and > send the pull-request next Monday if you can re-spin the patchset this > weekend. > > If not, I can still merge this patchset in v6.18-rc cycle to rtw tree. > However, this might cause merge conflict with -next, so I don't prefer > this. Upper maintainers need to spend extra time to resolve conflicts. One thing that I forgot to mention is about rtw89 USB part. "BUG: KFENCE: use-after-free write in rtw89_core_tx_kick_off_and_wait" which is fixed by the first patch of the current series is reproduced reliably with USB HCI because there is no TX completion there yet, i.e. rtw89_core_tx_kick_off_and_wait() always exits with a timeout and touches skb parts which are most probably already freed by the call to ieee80211_tx_status_irqsafe() from URB completion callback. I've got a dongle now and confirm the bug. Turns out it was reported here [1] by Bitterblue Smith as well. [1]: https://lore.kernel.org/linux-wireless/1e5e97d4-8267-4f77-a4bf-1fe23ea40f77@gmail.com/ The first patch does avoid use-after-free bug for USB, too. But, as there is no TX ACK completion implemented for USB yet, tx_wait_list will be piled up with wasted items which can't be freed due to the lack of completion. It's better than crash but still a problem. Bitterblue suggested [2] implementing the missing TX completion parts for USB to fix this entirely. I've got a bunch of patches for it which will send as a separate USB-series today or tomorrow. I expect it'll require time for review and it probably should have to be improved/reworked in several places. Anyway I'll send it soon so you've got a more complete picture and some time until Monday to decide how to handle it. [2]: https://lore.kernel.org/linux-wireless/0cb4d19b-94c7-450e-ac56-8b0d4a1d889f@gmail.com/
© 2016 - 2025 Red Hat, Inc.